Description
There is a lot more to parallel programming in Python than multiprocessing.Pool().map.
In this talk I will share some hard-learned knowledge gained in several years of parallel programming.
Covered topics will include performance, ways to measure the performance, memory occupation, data transfer and ways to reduce the data transfer, how to debug parallel programs and useful libraries.
I will give some practical examples, both in enterprise programming (importing CSV files in a database) and in scientific programming (numerical simulations). The initial part of the talk will be pedagogical, advocating the convenience of parallel programming in the small (i.e. in single machine environment); the second part will be more advanced and will touch a few things to know when writing parallel programs for medium-sized clusters.
I will also briefly discuss the compatibility layer that we have developed at GEM to be independent from the underlying parallelization technology (multiprocessing, concurrent.futures, celery, ipyparallel, grid engine...).