Adaptive systems with the ability to learn underpin nearly every core application that touches modern business today. Machine learning (ML) models are more than the brains behind purchase recommendations from Amazon or why our Twitter and Facebook feeds feel so addictive. These models run power plants, communications infrastructure, banks, warehouses, and military systems.
My early impression of the sheer power of learning systems was formed as a young computer science graduate, a military technology officer, and an early member of the MIT startup Kiva Systems, automating million-square-foot warehouses with fleets of robots and software. At Kiva, our customers were blown away by the experience of watching their $10m capital investment unexpectedly become more efficient and productive every single day as it adapted to the specific human input and materials moving through the warehouse.
When I joined GV in Silicon Valley after Kiva's acquisition by Amazon, I jumped at the chance to spend time on Stanford's campus as I had done at MIT and Harvard while recruiting technical talent for our Boston-area startup. I was lucky to cross paths with one of the most impressive technical practitioners in GV's network: Chris Ré, MacArthur Genius award winner and Associate Professor of computer science at Stanford. Chris, too, saw a future powered by learning systems from his work with Stanford's AI lab, and we loved to riff together off of his big ideas about the future of computing and its impact on our world.
Chris is the kind of human who generates an electric field of energy around him as he thinks aloud. And he has an uncanny ability to attract individuals and teams who aspire to make a dent in the computing universe. Over the course of our work together on two other startups (Lattice.io and SambaNova), Chris's wit was second only to his eye for talent. In 2017, Chris told me, "You have to meet my collaborator, Alex Ratner. He's the full package in computer science: a brilliant mind and an empathetic clear communicator, equal parts driven and kind-hearted." My curiosity was piqued.
Chris had teamed up with Alex and a group of technologists and academics in Stanford's AI Lab to create Snorkel, initially an open source project designed to automate the task of data labelling. Their big idea was to shift the huge cost of data labelling, done by armies of contract workers tagging videos, voice, posts, and text, to developers who would write code called "labelling functions". Over time their work has evolved into a full-fledged platform for making machine learning processes even faster. Along with co-founders Paroma Varma, Braden Hancock, and Henry Ehrenberg, the team has proven to be learning machines themselves, constantly iterating on their initial approach to build a broader, more applicable ML platform for businesses. Today, four years in, Snorkel is emerging from stealth with $15 million in funding, including seed and Series A investments from GV along with Greylock and IQT.
The company focuses on a problem that data scientists and developers know well: the machine learning models driving nearly every learning system today depend on the highly manual creation of training data. Snorkel AI builds upon the team's early open source success in data labelling, which has already been adopted by dozens of Fortune 500 companies including Google, Apple, and Intel. As companies both deploy machine learning models at a rapid pace and iterate on existing models with new ideas, they need training data that isn't just tagged by humans once, but is created by software, is auditable, and is able to change dynamically over time. Snorkel enables just that.
The irony of machine learning is how dependent the world was, before Snorkel, on humans hand-labelling the data sets used to teach computers to build models. The machine was still "learning", but with an extremely costly lesson plan drafted by humans. Companies were paying armies of human contractors to generate training data by hand-labelling data sets: images in photos, actions in videos, words in audio, text in documents, all of which had to be labelled before an ML system could work. The flow then was: gather data → clean data → pay human contractors to observe and tag data → train a machine learning model. Snorkel's approach automates the whole process by empowering software developers to write labelling functions. The new flow is: gather data into Snorkel → use Snorkel to implement labelling functions → train an ML model. This approach reduces human error, enables auditing of the labelling, and allows for mid-cycle adaptation and retraining along the way.
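To make the idea concrete, here is a minimal conceptual sketch of a labelling function in Python. It is not Snorkel's actual API; the function names, labels, and the simple majority-vote aggregation are illustrative assumptions (Snorkel itself learns a statistical model to weigh and combine the votes).

```python
# Conceptual sketch of programmatic labelling (illustrative, not Snorkel's API).
# Each labelling function encodes one heuristic and votes SPAM, NOT_SPAM,
# or ABSTAIN; the votes are then aggregated into a single training label.

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    # Messages with URLs are often spam.
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_short_message(text):
    # Very short messages tend to be benign.
    return NOT_SPAM if len(text.split()) < 5 else ABSTAIN

def lf_mentions_prize(text):
    # "Prize" is a common spam keyword.
    return SPAM if "prize" in text.lower() else ABSTAIN

LABELLING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]

def label(text):
    """Aggregate non-abstaining votes by simple majority."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(label("Claim your prize at http://example.com"))  # 1 (SPAM)
print(label("see you soon"))                            # 0 (NOT_SPAM)
```

Because the heuristics live in code rather than in a contractor's head, they can be versioned, audited, and rerun over new data whenever the problem shifts, which is what makes the labels dynamic rather than one-shot.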
While it's still early days, Snorkel's vision is to make it dead simple for even non-developers to build and deploy ML applications in a fraction of the time. It's an exciting prospect with significant implications for how we can dive even more deeply into machine learning for decades to come.
Please join me in congratulating the team on today's launch!