Description
This talk introduces a new implementation of NumPy arrays that supports out-of-core data analysis without changing code, without breaking APIs and without losing the performance advantage provided by FORTRAN libraries or just-in-time compilers. wendelin-core acts transparently as a distributed shared virtual memory manager for binary data handled by Python interpreters deployed on a cluster. Thanks to wendelin-core, each Python interpreter can access elementary ndarray structures of virtually 2 exabytes in a single memory block, whatever the amount of RAM available on each node. With wendelin-core, a cluster of inexpensive PCs can thus act as a tera-memory server at a much lower cost. A cluster of tera-memory servers can in turn act as an exa-memory machine.
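To make the idea of out-of-core access "without changing code" concrete, here is a minimal sketch that uses NumPy's own `numpy.memmap` as a stand-in. This is an assumption made for illustration only: it is not the wendelin-core API, it works on a single machine rather than a cluster, and the file name and array size are arbitrary.

```python
import os
import tempfile

import numpy as np

# Illustration only: numpy.memmap stands in for an out-of-core array.
# It is NOT the wendelin-core API; the path and size below are arbitrary.
path = os.path.join(tempfile.mkdtemp(), "big_array.dat")

# Create a file-backed array. The operating system pages data in and out
# on demand, so the array does not have to fit in RAM.
n = 1_000_000
big = np.memmap(path, dtype=np.float64, mode="w+", shape=(n,))
big[:] = np.arange(n, dtype=np.float64)
big.flush()
del big

# Reopen the data and run ordinary, unchanged NumPy code on it.
view = np.memmap(path, dtype=np.float64, mode="r", shape=(n,))
total = float(view.sum())
mean_tail = float(view[-10:].mean())
```

The point of the sketch is that `view` is an `ndarray` subclass, so existing NumPy code runs on it as-is; the abstract describes the same property for arrays distributed over a cluster.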
In addition to bringing true Big Data support to NumPy libraries, wendelin-core also provides native persistence of ndarrays thanks to its integration with the NEO database. Together, NEO and wendelin-core can shard and store ndarrays on a redundant array of inexpensive computers and provide native support for Python exception handling, thus enforcing a rollback of any change made to the data if a bug or error occurs during a calculation.
The talk will focus on the technical aspects of wendelin-core. It will explain the technical approach used for the first implementation: what has been achieved, what is still weak and what can be improved. It will explain how to hook wendelin-core to a persistence layer. It will then present the technical roadmap and suggest how to integrate wendelin-core with other persistence layers or with other data structures.
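The rollback-on-exception behaviour mentioned above can be illustrated with a small in-memory sketch. The helper `rollback_on_error` is hypothetical and written only for this example. It does not use NEO or wendelin-core, and it takes a naive full copy where a real storage layer would more plausibly track modified pages.

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def rollback_on_error(array):
    """Restore `array` to its previous contents if the block raises.

    Hypothetical helper for illustration; not part of NEO or wendelin-core.
    """
    snapshot = array.copy()      # naive snapshot of the whole array
    try:
        yield array
    except Exception:
        array[...] = snapshot    # undo every change made inside the block
        raise


data = np.zeros(4)

# A calculation that fails half-way: the partial update is rolled back.
try:
    with rollback_on_error(data) as a:
        a[:] = 1.0
        raise ValueError("bug during calculation")
except ValueError:
    pass

rolled_back = data.copy()

# A calculation that succeeds: the changes are kept.
with rollback_on_error(data) as a:
    a += 2.0

committed = data.copy()
```

After the failed block `data` is back to all zeros, and after the successful block it holds the updated values, which is the transactional behaviour the abstract attributes to the combination of NEO and wendelin-core.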