Description
I'm going to talk about the 4 main levels of parallelism in modern Computing:
- multiple (virtual) machines
- multiple processes
- multiple threads
- multiple green threads, aka asyncio
Why you might use each of them, how to go about doing so with python and some of the pitfalls you might fall into along the way.
To do so, I'll give short examples in code of achieving each level:
- leveraging multiple hosts using RQ, and also the possibility of RPC
with HTTP
- multiprocessing and threading using their respective modules from
the python standard library
- asyncio demonstrated with AIOHTTP
That sounds great, but there are "gotchas" you should know about before you get started, for example:
- multiple machines can actually be multiple virtual machines on the
same host
- effectively communicating between processes is hard, how can we go
about making it easier?
- the limitations of threading and the GIL
- run_in_executor - do we ever really need to use multiprocessing or
threading directly again
- use of asyncio when dealing with both networking between hosts and
between processes - you end up using two different kinds of
concurrency at the same time. That can be confusing, but also awesome.
I'll finish of by showcasing a library I built, arq which is a job queueing and RPC library for python which uses asyncio and Redis.