From Documentation
Jump to: navigation, search

When programming in Python, some common libraries, such as Numpy, Pandas, Scikit-Learn, etc. usually work well if the dataset fits into the existing RAM on a single machine. However, when dealing with large datasets, it could be a challenge to work around the memory constraints. This is where Dask can help. Dask is a Python task-graph based framework to parallelize operations. Dask provides Python APIs or libraries that can handle large datasets on a single multi-core machine or on a cluster. This webinar provides an introduction to Dask.