My journey into joining billions of rows in seconds with ScyllaDB

YouTube
MP4

Description

A talk on a technological iteration journey: replacing MongoDB + Hive by ScyllaDB in production to meet the requirements of business critical work loads.

I will share my recent experience in migrating our most intensive and JOIN hungry production work load from MongoDB + Hive to ScyllaDB.

This work and iteration allowed us to JOIN billions of rows in seconds while drastically reducing operation and development complexity by using one database (ScyllaDB) instead of two (MongoDB + Hive).

ScyllaDB is a C++ drop-in replacement of Cassandra that proved that its design was up to the challenge by squeezing every bit of performance from hardware. We will cover the approach and key aspects of this NoSQL database.

I will finally present the results of the benchmarks between Dask and Spark and highlight their differences and what we learned along the way.

Draft of the agenda

Business context and work load details
Problems and limitations in handling this work load using MongoDB + Hive
How we conducted a thorough evaluation of ScyllaDB to replace MongoDB + Hive
How we ended up challenging Spark with Dask
Lessons learned and production feedback

PyVideo

My journey into joining billions of rows in seconds with ScyllaDB

Description

Details