Description
Apache Spark has become a popular and successful way for Python programmers to parallelize and scale up data processing. However, it is not well integrated with popular Python tools such as Pandas, and using Pandas with PySpark often results in poor performance. In this talk, we will demonstrate how we improve PySpark performance with Apache Arrow.