Netflix’s data-science team has open-sourced its Metaflow Python library, a key part of the ‘human-centered’ machine-learning infrastructure it uses for building and deploying data-science workflows.
The video-streaming giant uses machine learning across all aspects of its business, from screenplay analysis and production-schedule optimization to churn prediction, pricing, translation, and optimizing its giant content-distribution network.
According to Netflix software engineers, Metaflow was built to boost the productivity of its data scientists, who like to express business logic through Python code but don’t want to spend too much time on engineering concerns such as object hierarchies, packaging, or obscure APIs unrelated to their work.
The idea behind Metaflow was to give Netflix data scientists the ability to see early on whether a prototyped model would fail in production, allowing them to fix whatever the issue was and ideally speed up deployment times. Netflix in February revealed that Metaflow had helped reduce median deployment times from four months to just seven days.
Netflix offers this nutshell description of its Python library on the new metaflow.org website: “Metaflow helps you design your workflow, run it at scale, and deploy it to production. It versions and tracks all your experiments and data automatically. It allows you to inspect results easily in notebooks.”
It can also be used with popular Python data-science libraries, including PyTorch, TensorFlow, and scikit-learn.
Netflix, as is well known, is one of the largest users of Amazon Web Services (AWS), so it’s not surprising that Metaflow integrates with numerous AWS services, including the ability to snapshot all code and data in Amazon S3, which Netflix uses as its ‘data lake’. This ability should help users quickly scale up models using AWS’s storage, compute, and machine-learning services.
The ability to snapshot code and data in S3 is what enables Metaflow’s automated versioning and experiment tracking, so developers can safely inspect and restore past Metaflow executions.
Metaflow is also bundled with a “high-performance S3 client, which can load data up to 10Gbps”.
The client allows any organization’s data scientists to achieve what Netflix data scientists have done for the past few years. Netflix revealed in April that it used Metaflow to “push the limits of Python”, enabling it to use “parallelized and optimized Python code to fetch data at 10Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores”.
“This client has been massively popular among our users, who can now load data into their workflows an order of magnitude faster than before, enabling faster iteration cycles,” Netflix software engineers said today.
Metaflow also integrates with AWS Batch, Amazon’s container-based compute service.
Netflix argues that Metaflow on AWS lets developers combine the speed of developing on a laptop with the deeper compute resources available in the cloud.
“Metaflow makes it easy to move back and forth between the local and remote modes of execution” because neither mode requires changes to code or libraries, which in turn should make troubleshooting easier.