
Improving Pandas and PySpark performance and interoperability with Apache Arrow

By Li Jin

Description

Apache Spark has become a popular and successful way for Python programmers to parallelize and scale up data processing. However, it is not well integrated with popular Python tools such as Pandas, and using Pandas with PySpark often results in poor performance. In this talk, we will demonstrate how we improve PySpark performance with Apache Arrow.

