Data quality has become an increasingly critical part of data science: enterprises are sitting on growing troves of information, but that information is only useful if it can be trusted to be accurate and usable. To that end, Validio, a startup building tools to improve and ensure data quality (specifically, tools that let users clean up data stored in data warehouses and elsewhere, as well as data arriving in real time), is announcing a seed round to mark its emergence from stealth. The Stockholm-based company has raised $15 million, funding that it plans to use for business and product development, R&D and hiring more talent.
Lakestar, the London-based VC that made early investments in companies like Facebook and Airbnb but has largely focused on backing promising-looking startups out of Europe (it also backed Skype, Spotify, Revolut and many others), led this round, with J12 and several high-profile individuals also participating.
(The list includes footballer (soccer player) Zlatan Ibrahimović, Snowflake’s CMO Denise Persson, MongoDB’s co-founder Kevin Ryan, Neo4j co-founder Emil Eifrem, DeepMind’s head of product Mehdi Ghissassi and Kim Fai Kok & Dara Gill of angel collective Framtid.)
As with a lot of enterprise startups in stealth these days, Validio has been using the time since being founded in 2019 to work quietly on its product while also signing up customers for live deployments. Its clients range across the usual suspects in the big data game — those in marketing and commerce, security companies, and business intelligence. Validio doesn’t disclose a lot of names but notes a few: Budbee and Babyshop in the e-commerce space; e-scooter company Voi; and electricity startup Tibber.
The challenge that Validio has identified and is addressing is one that CEO and co-founder Patrik Liu Tran said he encountered early on in his working life. A math and computer whiz, he graduated from school at 16 and also accelerated his time at university, going to work in 2014/2015 while still a teenager, consulting for companies on AI projects. It was still a nascent endeavor in most places (frankly, it still is), and one of the big issues, apart from having few in the field prepared to go into companies to work on their problems, was the lack of integrity and quality in the data that they were trying to use in their machine learning models, he said.
“At every company that I was advising, the thing that caught my attention was the lack of trust in data, so much that people did very little with it, and there were no tools really to help with that,” he said in an interview. He added that the first efforts to identify the issue and deal with it (such as the Great Expectations open source project, created by the people behind Superconductive) were promising but focus more on data in warehouses than on real-time information.
“But machine learning resides in streams, not the warehouse,” he said.
Beyond that, those earlier tools are generally too reliant on rules that engineers and data scientists need to set, regularly monitor and tweak.
Validio’s approach is not exactly to build low-code tools. “We’re building for data engineers. It’s very technical,” Tran said, slightly surprised by my question about that. “But we are focusing on a smooth user experience.”
That includes using machine learning and statistical analysis to “teach” a user’s system to find and respond more quickly to the data coming through the pipeline; sets of rules that are created automatically for an engineer to use or to complement with customized rules; automated thresholds and auto-resolution capabilities; and more.
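To make the idea of automated thresholds concrete, here is a minimal sketch in Python of one common statistical approach: flagging values in a stream that fall outside a band learned from recent data. This is an illustration only, not Validio's implementation, which the company has not described in detail. The names `RollingThreshold` and `flag_anomalies`, the window size and the three-standard-deviation band are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class RollingThreshold:
    """Hypothetical helper: flags values outside mean +/- k standard
    deviations of a sliding window of recently accepted values."""

    def __init__(self, window=50, k=3.0, min_history=10):
        self.history = deque(maxlen=window)  # recent accepted values
        self.k = k                           # width of the band, in std devs
        self.min_history = min_history       # values to collect before judging

    def check(self, value):
        """Return True if `value` looks anomalous given recent history."""
        if len(self.history) < self.min_history:
            # Warm-up phase: not enough data to estimate a threshold yet.
            self.history.append(value)
            return False
        mu = mean(self.history)
        sigma = stdev(self.history)
        lower, upper = mu - self.k * sigma, mu + self.k * sigma
        is_anomaly = not (lower <= value <= upper)
        if not is_anomaly:
            # Only accepted values update the baseline, so a single
            # outlier does not widen the band for later checks.
            self.history.append(value)
        return is_anomaly


def flag_anomalies(stream, **kwargs):
    """Return (index, value) pairs for every flagged value in `stream`."""
    detector = RollingThreshold(**kwargs)
    return [(i, v) for i, v in enumerate(stream) if detector.check(v)]
```

The point of the sketch is that the threshold comes from the data itself rather than from a rule an engineer sets by hand: the band moves as the window of recent values moves. A production system would need to handle seasonality, missing values and slow drift, none of which this example attempts.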
“We want to make it as seamless as possible for data engineers to do their work,” he added.
The company doesn’t apply one universal set of rules across the platform; instead, it has built the product to be tailored to individual organizations.
“‘Data quality’ is hard to define. What is good for one company might be bad for another,” Tran said. “Data is never perfect and companies also need to start to accept that.” The list of Validio’s investors (including some of those attached to strategic names) is a sign that others may well share that thinking and back how Validio specifically is building to address it: tools to improve data quality, but built for the real world.
There are a few other companies that have identified the market for data quality and are building to address it, including Great Expectations creator Superconductive, which raised $40 million earlier this year, along with heavyweights like Microsoft, SAS, and Talend. For now, though, Validio’s approach seems to be striking the right chord, enough to draw bets in what is still a young space.
“As data teams are increasingly shifting their focus toward data quality, we believe that Validio is uniquely positioned to become the next big global software player from Europe,” noted Stephen Nundy, Lakestar partner, in a statement. “Validio has built its platform with a unique architecture, enabling the management of data quality in data warehouses, lakes and streams both on the actual data and metadata in real-time. We look forward to supporting the stellar Validio team in their journey building a global data infrastructure leader.”