Our investment in Gladia: Redefining Audio Intelligence for Enterprises
Illuminate Financial is pleased to participate in Gladia’s oversubscribed $16 million Series A round led by XAnge, with participation from XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital. Founded in 2022, Gladia has now raised a total of $20.3 million, with earlier seed rounds led by New Wave, Sequoia Capital (as part of the first Sequoia Arc program), Cocoa, and GFC.

Audio data is an emerging frontier, and as enterprises increasingly rely on voice interactions, harnessing that data for actionable intelligence has become imperative. Historically, the infrastructure available to extract intelligence from audio data efficiently has been insufficient, and the problem has only worsened as audio-data volumes grow.
Frustrated by existing automatic speech recognition solutions that failed to reliably handle multilingual and accented speech, Jean-Louis Quéguiner and Jonathan Soto set out to create a solution that could address these limitations at scale.
Headquartered in Paris, Gladia aims to be an end-to-end platform that can transcribe, process, analyse, and understand complex audio data in real time — a feat that incumbent automatic speech recognition (ASR) engines, also referred to as speech-to-text (STT), have failed to do.
The Problem with Incumbent Automatic Speech Recognition Engines
Automatic speech recognition solutions from big tech (Amazon, Google, and Microsoft), as well as independent vendors, have seen widespread adoption due to their reliability in general-purpose speech tasks. Examples include transcribing meeting notes or customer calls, enabling voice commands for virtual assistants, or automating captions for videos. These use-cases typically require broad language support but not domain-specific nuance.
However, their limitations become evident in more specialized settings.
For instance, an FX trader discussing complex investment strategies might use terms like “straddle” or “put options”, which generic ASR engines may confuse with everyday words. These systems lack the context to understand industry-specific terminology, resulting in lower transcription accuracy and reducing the overall value of the insights derived from the data.
Moreover, because these solutions are built for general use, organisations usually have to dedicate significant engineering resources to customise them to their needs, driving up costs without fully solving the problem.
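To make the vocabulary problem concrete, here is a minimal sketch of the kind of customisation work organisations end up doing themselves: post-processing a generic engine’s transcript against a customer-supplied glossary. The `FINANCE_TERMS` list and the `difflib`-based fuzzy matcher are illustrative stand-ins for a real vocabulary-adaptation pipeline, not any vendor’s actual API.

```python
import difflib

# Hypothetical domain glossary an FX trading desk might supply.
FINANCE_TERMS = ["straddle", "put option", "basis point", "net asset value"]

def correct_with_glossary(transcript: str, glossary: list[str], cutoff: float = 0.8) -> str:
    """Replace words a generic ASR engine mis-heard with close glossary matches."""
    corrected = []
    for word in transcript.split():
        # Keep the word unless it is very close to a known domain term.
        matches = difflib.get_close_matches(word.lower(), glossary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

# "saddle" is a plausible generic-ASR mishearing of "straddle".
print(correct_with_glossary("the trader bought a saddle", FINANCE_TERMS))
# → "the trader bought a straddle"
```

Production systems bias the decoder itself rather than patching the transcript after the fact, but the sketch shows the core issue: without “straddle” in the vocabulary, the mishearing goes uncorrected.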
OpenAI’s Whisper: A Step Up from Incumbents, but Still Falls Short
OpenAI’s Whisper is an open-source ASR engine that has garnered significant attention since its release in 2022. It represents a marked improvement over incumbent ASR systems, offering broader language support (50+ languages) and notable robustness to background noise.
However, like the incumbents, its effectiveness declines in specialized scenarios, where domain-specific terminology and contextual nuances are critical.
Frequent users report that Whisper is prone to hallucinations — where the model generates inaccurate or misleading transcriptions in complex, niche scenarios. For instance, feedback has shown that the engine struggles to account for the different ways that people speak, often overlooking factors like accent, dialect, and even speech impediments. Factoring in these distinctions is critical, as they can vary drastically depending on the use-case and the speaker.
Traders, for example, are known for rapid speech patterns and heavy use of acronyms like BPS (basis points), NAV (net asset value), and CDO (collateralised debt obligation). Medical professionals, on the other hand, may speak more slowly to avoid overwhelming patients with scientific vocabulary when discussing health issues and prescriptions.
Furthermore, Whisper is not optimised for real-time use; it operates exclusively in a batch-processing manner, i.e. audio data must be pre-recorded, transcribed, and then analysed.
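The batch/real-time distinction is as much about the interface as the model. The minimal sketch below illustrates it in plain Python — the `transcribe_*` names and the `.upper()` stand-in for actual model inference are hypothetical, not Whisper’s or Gladia’s API. A batch engine returns nothing until the whole recording has been processed; a streaming engine yields partial transcripts as each audio chunk arrives.

```python
from typing import Iterator

def transcribe_batch(recording: list[str]) -> str:
    # Batch mode (Whisper-style): the full recording must exist up front,
    # so the result is available only after the entire call completes.
    return " ".join(chunk.upper() for chunk in recording)  # stand-in for model inference

def transcribe_stream(chunks: Iterator[str]) -> Iterator[str]:
    # Streaming mode: each incoming chunk yields a partial transcript
    # immediately, which is what makes sub-second latency possible.
    for chunk in chunks:
        yield chunk.upper()  # stand-in for incremental decoding

audio = ["hello", "world"]
print(transcribe_batch(audio))            # one result, after the fact
for partial in transcribe_stream(iter(audio)):
    print(partial)                        # partial results, as audio arrives
```

For a live trading-floor or customer-support feed, only the second shape lets downstream analysis begin while the conversation is still happening.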
While Whisper’s extensive feature set is powerful, many of its out-of-the-box capabilities are unnecessary for enterprise use, leading to higher computing costs. Customizing Whisper for specific verticals also demands significant engineering resources, further diminishing its practicality for businesses that require streamlined, cost-effective audio intelligence solutions.
Enter Gladia: Enterprise-Grade Automatic Speech Recognition and Audio Intelligence Infrastructure
Gladia is building an end-to-end platform that will redefine how businesses interact with audio data. By optimising Whisper’s base model and overhauling its source code to support real-time usage, Gladia addresses the key shortcomings of incumbent systems.
The platform enhances transcription accuracy by integrating proprietary models designed to understand industry-specific vocabulary and conversational context, making it more effective in specialized scenarios.
Gladia’s real-time transcription engine supports over 100 languages, with industry-leading latency under 300 milliseconds, enabling businesses to access accurate insights almost instantly. Additionally, Gladia extracts insights from conversations — such as sentiment and key takeaways — in real time. This makes it invaluable for high-stakes scenarios like compliance monitoring on trading floors or customer interaction analysis for SaaS platforms.
For regulated industries where data privacy is mandatory, Gladia’s on-premise deployment ensures full control over data, meeting strict compliance and governance requirements. For organisations seeking flexibility, the API/cloud options seamlessly integrate into existing tech stacks, ensuring both scalability and security.

Since launching last summer, Gladia has gained significant traction, with over 70,000 active users worldwide in work-streams including compliance, sales enablement, and customer service.
A Visionary Team
Gladia is led by Jean-Louis Quéguiner and Jonathan Soto, a dynamic duo at the intersection of artificial intelligence, cloud computing, and enterprise software. Jean-Louis, Co-founder & CEO, brings a wealth of experience in AI and cloud computing, with a background spanning multiple high-growth tech ventures, including Europe’s largest cloud software vendor, OVHCloud. At OVHCloud, Jean-Louis led the company’s AI practice as the Group VP in charge of AI & Quantum Computing.
Jonathan, Co-founder & CTO, has over fifteen years of experience building technical products and teams from the ground up. Since graduating from MIT, Jonathan has held CTO/VP Engineering roles at several VC-backed software companies, including SnappCar, ActiveViam, and SigFox, the last of which had raised over $300m in VC funding before exiting in 2022.
Together, they have positioned Gladia at the forefront of a competitive audio intelligence market, creating a platform that has hit the ground running.

We firmly believe that Jean-Louis, Jonathan, and their talented organisation are the ideal team to transform audio intelligence for enterprises. Their unique mixture of expertise, combined with a clear vision for solving complex enterprise challenges, sets them apart from other teams. At Illuminate, we are proud and excited to partner with Gladia as they scale their platform globally and empower businesses to extract actionable insights from audio data.

If you would like to connect with the Gladia team, or if you are building a next-generation enterprise technology solution and believe we could be of any help, feel free to reach out!