
Out-of-core NumPy arrays without changing your code with wendelin-core

Description

This talk introduces a new implementation of NumPy arrays that supports out-of-core data analysis without changing code, without breaking APIs and without losing the performance advantage provided by FORTRAN libraries or just-in-time compilers. wendelin-core acts transparently as a distributed shared virtual memory manager for binary data handled by Python interpreters deployed on a cluster. Thanks to wendelin-core, each Python interpreter can access elementary ndarray structures of virtually 2 exabytes in a single memory block, whatever the amount of RAM available on each node. With wendelin-core, a cluster of inexpensive PCs can thus act as a tera-memory server at much lower cost. A cluster of tera-memory servers can act as an exa-memory machine.
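To make the out-of-core idea concrete, here is a minimal sketch that uses NumPy's own `numpy.memmap` as a stand-in. It is not the wendelin-core API and it works on a single local file rather than a cluster; it only illustrates the general principle that an array can be backed by storage instead of RAM while ordinary NumPy code runs on it unchanged. The helper names, the file name and the array shape are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def make_demo_file(path, shape, dtype=np.float64):
    """Create a small file-backed array (hypothetical demo data)."""
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    arr[:] = np.arange(shape[0] * shape[1], dtype=dtype).reshape(shape)
    arr.flush()  # write the pages back to the file
    del arr      # drop the mapping


def column_means_out_of_core(path, shape, dtype=np.float64):
    """Run plain NumPy code on an array whose data lives in a file."""
    # The operating system pages data in on demand; the array is never
    # read into memory as a whole by this call.
    arr = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    return np.asarray(arr.mean(axis=0))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
demo_shape = (4, 3)
make_demo_file(demo_path, demo_shape)
means = column_means_out_of_core(demo_path, demo_shape)
print(means)
```

The calling code only sees an ndarray-like object, which is the property the talk describes; what wendelin-core adds on top, according to the abstract, is distribution across nodes and persistency.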

In addition to bringing true Big Data support to NumPy libraries, wendelin-core also provides native persistency of ndarrays thanks to its integration with the NEO database. NEO together with wendelin-core can shard and store ndarrays on a redundant array of inexpensive computers and provide native support for Python exception handling, thus enforcing a rollback of any change made to data if a bug or error occurs during a calculation.
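The rollback-on-exception behaviour can be pictured with a small, purely in-memory sketch. This is not how NEO or wendelin-core implement it (a real database would not copy the whole array); the `rollback_on_error` context manager is a hypothetical helper that only shows the contract: if the calculation raises, the array ends up as it was before.

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def rollback_on_error(arr):
    """Restore `arr` to its initial contents if the block raises."""
    snapshot = arr.copy()  # naive full copy, acceptable for a sketch only
    try:
        yield arr
    except Exception:
        arr[...] = snapshot  # undo every change made inside the block
        raise                # let the caller see the original error


data = np.zeros(4)
try:
    with rollback_on_error(data) as a:
        a[:2] = 1.0  # partial update
        raise ValueError("simulated failure mid-calculation")
except ValueError:
    pass

print(data)
```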

The talk will focus on the technical aspects of wendelin-core. It will explain the technical approach used for the first implementation: what has been achieved, what is still weak and what can be improved. It will explain how to hook wendelin-core to a persistency layer. It will then present the technical roadmap and suggest how to integrate wendelin-core with other persistency layers or other data structures.
