
Realtime Distributed Computing At Scale: Storm And Streamparse

Description

Realtime distributed computing is hard, especially at scale: managing a large data pipeline is challenging enough, and it’s even harder to keep latency low and availability high while processing tens of thousands of items per second. Many people turn in despair to Java or Scala when it comes time to scale up, but we can do it in Python: Apache Storm is a distributed realtime computation system that lets you scale up without reaching for a new language!

This talk will walk the audience through the basics of Apache Storm and why it’s an elegant, useful solution for realtime distributed computing, and show how streamparse lets you write your Storm components and a basic Storm topology in Python. We’ll also look at how Parse.ly uses Storm in production to handle billions of realtime events a month. If we have time, we’ll touch on how Storm has several advantages over other common Python data streaming solutions, such as Spark’s micro-batching.
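
To make this concrete, here is a minimal sketch of a streamparse Bolt, in the spirit of the project’s word-count quickstart; the class name and output fields are illustrative assumptions, not code from the talk:

    from collections import Counter

    from streamparse import Bolt


    class WordCountBolt(Bolt):
        # Declare the fields this bolt emits downstream.
        outputs = ["word", "count"]

        def initialize(self, storm_conf, context):
            # Called once when the bolt starts up inside a Storm worker.
            self.counts = Counter()

        def process(self, tup):
            # Storm calls process() for each incoming tuple; here the
            # first value is assumed to be a single word.
            word = tup.values[0]
            self.counts[word] += 1
            self.emit([word, self.counts[word]])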

Goals:

By the end of the talk, you should understand:

  • What Apache Storm is, how it works generally, and what scenarios it’s useful for
  • How streamparse can be used to write your Storm topologies (a topology sketch follows this list)
  • How Storm + streamparse is used in an actual high-availability, low-latency production environment
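
To complement the goals above, here is a minimal sketch of how a streamparse Topology class could wire a spout into the bolt sketched earlier; WordSpout and the module paths are hypothetical placeholders:

    from streamparse import Grouping, Topology

    # Hypothetical imports: a word-emitting spout and the bolt sketched above.
    from spouts.words import WordSpout
    from bolts.wordcount import WordCountBolt


    class WordCount(Topology):
        # One spout feeding one bolt; tuples are routed by the 'word'
        # field so each word is always counted by the same executor.
        word_spout = WordSpout.spec()
        count_bolt = WordCountBolt.spec(
            inputs={word_spout: Grouping.fields("word")},
            par=2,  # run two parallel executors of this bolt
        )

In a project scaffolded with streamparse’s sparse CLI, a topology like this can be run against a local Storm cluster for development (sparse run) or submitted to a production cluster.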
