Description
We all start out programming with a single process in mind, and parallelizing code from scratch can be a real headache. Things get worse when we have to write (often spaghetti) code in a short time while still needing high performance, which is frequently the case in data analysis.
In fact, for cases like running the same computation under different conditions, programs can be parallelized (or distributed) with only a few modifications, using existing libraries such as multiprocessing, IPython Parallel, and Celery.
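As a minimal sketch of the single-machine case (the simulate function and the parameter range here are made up for illustration), the per-condition work can be spread over a pool of worker processes with only a couple of extra lines:

    from multiprocessing import Pool

    def simulate(condition):
        """Placeholder for the real per-condition computation."""
        return condition ** 2

    if __name__ == "__main__":
        conditions = range(10)            # the different conditions to evaluate
        with Pool(processes=4) as pool:   # spread the work across 4 worker processes
            results = pool.map(simulate, conditions)
        print(results)

The original sequential loop stays almost untouched; only the call that maps the function over the conditions changes.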
This talk aims to take the fear out of parallel and distributed computing and shows different levels of parallelization: a single machine, a cluster, and a cluster with a task queue. I will only cover the basic scenarios, which should make it easier for newcomers to try these tools and see how powerful they can be.
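For the cluster level, a rough sketch with IPython Parallel (assuming an ipcluster has already been started, and reusing the hypothetical simulate function from above) keeps the same map-style structure while dispatching tasks to remote engines:

    from ipyparallel import Client

    def simulate(condition):
        """Placeholder for the real per-condition computation."""
        return condition ** 2

    rc = Client()                    # connect to the running ipcluster
    view = rc.load_balanced_view()   # hand tasks to whichever engine is free
    results = view.map_sync(simulate, range(10))
    print(results)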
A machine learning example will be given at the end to compare the performance and possible issues of the different implementations.
About the speaker
Master's student at NTU BEBI who enjoys writing R / Python and loves statistics and bioinformatics. Co-organizer of the Taiwan R Users Group and a frequent attendee of Taipei.py.