Adam Ghobarah is an expert in applying data and analysis to improve products and transform large-scale operations. As an engineering partner at Google Ventures, he guides portfolio companies on data analysis, and identifies disruptive startups in the big data and analysis space.
1. “Big Data” is one of those buzzwords that means different things to different people. How do you define the term?
“Big Data” has become a convenient catch-all term. It can be broken into three primary layers:
- Data storage and access.
- Data management and quality, which is really about making the data usable.
- Big data analytics and applications designed to get value out of data, such as insights and predictions.
Currently, the first layer is well-populated (including Hadoop providers and new database technologies), and has broad demand from customers. However, as we move up to the second and third layers, the coverage is thinner. Large business users are beginning to ask, “How do I get value from these volumes of data that I store?”
For example, a recent survey found that U.S. government agencies store significantly larger volumes of data than ever before. However, 40% of IT managers surveyed reported that their agencies do not analyze the data they collect, and almost two-thirds do not use their data to make strategic decisions.
We also see a strong need for scalable analysis tools and for industry-focused solutions. Recorded Future, ClearStory Data, Climate Corporation, DNAnexus, and others in the portfolio are great examples of companies leveraging Big Data to solve substantial challenges in government, retail, agriculture, and personalized medicine.
2. What is the next major market opportunity for companies in the Big Data space?
Scalable analysis tools and Big Data applications are obvious growth areas. Another area is perhaps less obvious: data quality and integration.
Practitioners know that organizing and cleaning data can consume up to 80% of their project time. It’s an open secret that Big Data is dirty. Inaccurate or missing data leads to inaccurate insights, and quantity does not make up for quality. Skilled analysts and data scientists spend considerable time carefully understanding patterns of missing data, outliers, and measurement error.
Historically, small data sets were manually generated, curated, and tightly controlled by a rigid schema and a fearsome database administrator. Now, massive volumes of sensor- and server-generated streaming data arrive in many formats and are subject to breakdowns at many points. Solving the quality challenge is also critical for reliably integrating data from multiple sources.
There’s an enormous opportunity for companies that can figure out how to detect and resolve data problems. The hard part is improving data quality at large scale without ham-handedly discarding rare and interesting events.
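To make the tension concrete: a minimal sketch (not from the interview; field names and thresholds are illustrative assumptions) of the kind of check an analyst might run, counting missing values and flagging candidate outliers for review rather than silently dropping them:

```python
import statistics

def quality_report(records, field):
    """Summarize missing values and flag outlier candidates for one numeric field.

    records: list of dicts; a value of None counts as missing.
    Outliers are flagged with the conservative 1.5 * IQR rule and
    surfaced for human review -- rare-but-real events should be
    inspected, not automatically thrown out.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    missing = len(records) - len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lo or v > hi]
    return {"missing": missing, "outliers": outliers, "bounds": (lo, hi)}

# Hypothetical sensor readings: one gap (None) and one suspicious spike.
readings = [{"temp": t} for t in [20.1, 20.4, 19.8, 20.0, 20.3, None, 98.6]]
report = quality_report(readings, "temp")
```

The spike (98.6) ends up in the report for a person to judge; whether it is measurement error or a rare, interesting event is exactly the call that automated cleaning at scale gets wrong when it is ham-handed.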
3. How do you work with Google Ventures portfolio companies?
I do hands-on work with our portfolio companies every day; it’s the most gratifying part of my job.
The work we do really depends on the needs of each startup. I’m often on-site with a portfolio company, helping them implement advanced machine-learning algorithms and find new ways to draw the most value from their data.
One portfolio company asked me to help them build more nuanced models to increase the accuracy of their predictions. Another one needed advice on choosing a new data architecture and analysis tools. A life-sciences firm in the portfolio needed access to computing resources, so we worked with Google to give them 1 million core hours at no charge.
I also help them evaluate candidates for specialized data scientist positions, and I conduct workshops on Big Data and analysis as part of Startup Lab.