
My journey into joining billions of rows in seconds with ScyllaDB


A talk on a technological iteration journey: replacing MongoDB + Hive with ScyllaDB in production to meet the requirements of business-critical workloads.

I will share my recent experience migrating our most intensive and JOIN-hungry production workload from MongoDB + Hive to ScyllaDB.

This work allowed us to JOIN billions of rows in seconds while drastically reducing operational and development complexity by using one database (ScyllaDB) instead of two (MongoDB + Hive).

ScyllaDB is a drop-in replacement for Cassandra, written in C++, whose design proved up to the challenge by squeezing every bit of performance from the hardware. We will cover the approach and key aspects of this NoSQL database.

Finally, I will present the results of our benchmarks comparing Dask and Spark, highlight their differences, and share what we learned along the way.

Draft of the agenda

  • Business context and workload details
  • Problems and limitations in handling this workload using MongoDB + Hive
  • How we conducted a thorough evaluation of ScyllaDB to replace MongoDB + Hive
  • How we ended up challenging Spark with Dask
  • Lessons learned and production feedback

