Embedding AI-Ready Observability in the Lakehouse: Lessons from Bill and ZoomInfo

As AI adoption accelerates, data teams face a growing mandate: ensure trust in every byte powering models, workflows, and decisions. In this panel recap from CDOIQ 2025, ZoomInfo and Bill share how they embedded real-time data observability into their lakehouse architectures—shifting from reactive monitoring to proactive, scalable systems that support AI-ready pipelines. From organizational ownership to open formats like Iceberg, this piece unpacks practical lessons on building data trust at scale.

Anoop Gopalam

July 31, 2025

As enterprises modernize toward AI-first architectures, trustworthy data pipelines have become a foundational requirement. At enterprise scale, the sheer velocity, variety, and complexity of evolving data ecosystems make it essential not just to deliver clean data, but to embed data observability deeply within lakehouse architectures. Without it, even the most sophisticated analytics or AI initiatives risk breaking under the weight of unreliable inputs.

At this year’s CDOIQ Symposium, Hasmik Sarkezians, SVP of Data Engineering at ZoomInfo, and Aindra Misra, Director of Product at Bill, joined Mona Rakibe, CEO of Telmai, for a candid panel discussion. Together, they shared hard-won insights on what it takes to operationalize real-time, proactive data observability in modern lakehouse environments—and why traditional, reactive approaches no longer meet the needs of today’s AI-driven enterprise.

Why Observability Can’t Be an Afterthought

Both Bill and ZoomInfo operate in high-velocity, high-stakes data environments. Bill powers mission-critical financial workflows for over 500,000 small businesses and 9,000+ accounting firms, with products spanning AP, AR, and spend management. ZoomInfo manages a complex pipeline of over 450 million contacts and 250 million companies, delivering enriched, AI-powered go-to-market intelligence to thousands of customers in real time.

In both cases, small data errors often snowball into systemic risks. At ZoomInfo, for instance, a misclassified company description that is used to infer industry, headcount, or revenue can, if left unchecked, ripple through downstream processes and undermine the accuracy of critical data products. As Hasmik Sarkezians put it:

A minor data issue can become a massive customer-facing problem if it slips through the cracks. Catching it at the source is 10x cheaper and 100x less painful.

Catching issues at the root, she emphasized, is far less costly than retroactively fixing the consequences after they’ve been exposed to customers.

Moving from Monitoring to Intelligent Action: Making Observability Actionable

Observability is often treated as an after-the-fact reporting function. But both Bill and ZoomInfo have pushed well beyond that model toward embedded, actionable observability that actively shapes how data flows through their systems.

At ZoomInfo, this shift has been architectural. Rather than automatically pushing updates from the source of truth to their customer-facing search platform, the data team now holds that data until it passes a battery of automated quality checks powered by Telmai. If anomalies are detected, a failure alert is sent via Slack, and the data is held back from publication until the issue is resolved.

“We prevent the bad data from being exposed to the customer,” explained Hasmik. “We catch that before it’s even published.” Updated records run through anomaly detection and policy checks orchestrated as DAGs, and only data that passes validation is published. Anything that fails is held for manual review or correction.
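The hold-then-publish flow described here resembles what is often called a write-audit-publish pattern. The Python sketch below illustrates the idea under stated assumptions. It is not Telmai's API or ZoomInfo's DAG code. The `Check`, `GateResult`, and `publish_batch` names are hypothetical, and so are the in-memory `publish` and `notify` callbacks that stand in for the search-platform push and the Slack alert.

```python
# Hypothetical sketch of a write-audit-publish gate; not Telmai's or ZoomInfo's code.
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record shape: one dict per row in the staged batch.
Record = Dict[str, object]


@dataclass
class Check:
    """A named validation rule applied to the whole staged batch (hypothetical)."""
    name: str
    passes: Callable[[List[Record]], bool]


@dataclass
class GateResult:
    published: bool
    failed_checks: List[str]


def publish_batch(
    batch: List[Record],
    checks: List[Check],
    publish: Callable[[List[Record]], None],
    notify: Callable[[str], None],
) -> GateResult:
    """Publish the batch only if every check passes; otherwise hold it and alert.

    `notify` stands in for a Slack alert and `publish` for the push to the
    customer-facing store; both are assumptions for illustration.
    """
    failed = [check.name for check in checks if not check.passes(batch)]
    if failed:
        # Nothing is published: the batch stays staged for review or correction.
        notify(f"Batch of {len(batch)} rows held back; failed checks: {', '.join(failed)}")
        return GateResult(published=False, failed_checks=failed)
    publish(batch)
    return GateResult(published=True, failed_checks=[])


# Usage: in-memory stand-ins for the customer-facing store and the alert channel.
published_rows: List[Record] = []
alerts: List[str] = []

checks = [
    Check("non_empty", lambda b: len(b) > 0),
    Check("revenue_present", lambda b: all(r.get("revenue") is not None for r in b)),
]

bad_batch = [{"company": "Acme", "revenue": 1_000_000}, {"company": "Globex", "revenue": None}]
held = publish_batch(bad_batch, checks, publish=published_rows.extend, notify=alerts.append)

good_batch = [{"company": "Acme", "revenue": 1_000_000}]
shipped = publish_batch(good_batch, checks, publish=published_rows.extend, notify=alerts.append)
```

The check list is kept as data rather than hard-coded. That way, new rules can be added without touching the gate itself.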

In one instance, a faulty proxy caused a data source to generate null revenue values for a large portion of companies. “We already caught multiple issues,” said Hasmik, referencing one such case involving SEC data. “[The] proxy had an issue [that] generated null values, and we didn’t consume it because we had this alert in place.”

The pipeline, equipped with Telmai rules and micro-batch DAGs, caught the anomaly before it could propagate to customers.
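The proxy incident is a textbook null-rate anomaly: a column that is normally populated suddenly is not. Below is a minimal sketch of one way to detect it. It assumes a fixed rule that flags a batch when a column's null fraction exceeds its historical mean by more than a set tolerance. The function names, the 5-percentage-point tolerance, and the sample data are illustrative assumptions. The panel did not describe how Telmai's detection works internally.

```python
# Hypothetical null-rate spike check; a fixed-tolerance rule, not Telmai's method.
from statistics import mean
from typing import Dict, List, Sequence

Row = Dict[str, object]


def null_rate(rows: List[Row], column: str) -> float:
    """Fraction of rows where `column` is absent or None (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    nulls = sum(1 for row in rows if row.get(column) is None)
    return nulls / len(rows)


def is_null_spike(rows: List[Row], column: str, history: Sequence[float], tolerance: float = 0.05) -> bool:
    """Flag the batch when its null rate exceeds the historical mean by more than `tolerance`.

    `history` holds null rates observed on earlier, accepted batches.
    """
    baseline = mean(history)
    return null_rate(rows, column) > baseline + tolerance


# Usage with illustrative numbers: revenue is normally about 1.5% null.
history = [0.01, 0.02, 0.015]
healthy = [{"revenue": 10} for _ in range(98)] + [{"revenue": None} for _ in range(2)]
broken = [{"revenue": None} for _ in range(60)] + [{"revenue": 5} for _ in range(40)]

print(is_null_spike(healthy, "revenue", history))  # False: 2% is within tolerance
print(is_null_spike(broken, "revenue", history))   # True: 60% nulls, like the proxy failure
```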

At Bill, the driver was slightly different. The platform team's lean data engineering resources were spread thin managing open-source frameworks like Great Expectations and ad hoc rule logic, and the overhead of maintaining rule-based tests across dynamic datasets was becoming unsustainable. With a growing number of internal and external data consumers, including AI agents, forecasting engines, and fraud models, the cost of manual triage kept rising.

Our hope with Telmai is that we’ll improve operational efficiency for our teams… and scale data quality to analytics users as well, not just engineering. – Aindra Misra, Director of Product at Bill

By introducing anomaly detection, no-code interfaces, and out-of-the-box integrations, Bill aims to empower not just data engineers but business analysts to assess trustworthiness—without relying on custom rules or engineering intervention.

For both companies, this marks a step toward making observability not just visible, but actionable—and enabling faster, safer data product delivery as a result.

The Role of Open Architectures

Both Bill and ZoomInfo emphasized the centrality of open architectures, anchoring their platforms on Apache Iceberg to support scalable, AI-ready analytics across heterogeneous, rapidly evolving ecosystems.

ZoomInfo, in particular, has leaned into architectural openness to simplify access across its vast and distributed data estate that includes cloud platforms and legacy systems. “We’ve been at GCP, we have presence in AWS. We have data all over,” said Hasmik Sarkezians. To unify this complexity, ZoomInfo adopted Starburst on top of Iceberg. “It kind of democratized how we access the data and made our integration much easier.”

Bill echoed a similar philosophy. “For us, open architecture is a combination of three different components,” explained Aindra Misra. “The first one is… open data format. Second is industry standard protocols. And the third… is modular integration.” He highlighted Bill’s use of Iceberg and adherence to standardized protocols for syncing with external accounting systems—ensuring flexibility both within their stack and across third-party integrations.

This architectural philosophy carries important implications for observability. Rather than relying on closed systems or platform-specific solutions, both teams prioritized composability—selecting tools that integrate natively into their pipelines, query layers, and governance stacks. As Mona pointed out, interoperability was “literally table stakes” in ZoomInfo’s evaluation process: “Would we integrate with their today’s data architecture, future’s data architecture, past data systems?”

Observability in these environments must adapt to the existing stack, not disrupt it. That means understanding Iceberg metadata natively, connecting easily to orchestration frameworks, and enabling cross-system validation without manual effort. In short, open data architectures demand open observability systems: ones built to meet organizations where their data lives.
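One way to work with Iceberg metadata directly is through snapshot summaries. Every Iceberg snapshot carries a string-valued summary map, and the table spec defines optional summary fields such as `added-records`. The sketch below assumes those summaries have already been loaded from table metadata (loading is not shown). It flags a snapshot whose appended row count is far outside recent history, without scanning any data files. The helper names, the z-score rule, and the default threshold of 3 are hypothetical choices, not a description of how any vendor does it.

```python
# Hypothetical volume check over Iceberg snapshot summaries (already loaded as dicts).
from statistics import mean, pstdev
from typing import Dict, List

# An Iceberg snapshot summary is a string-to-string map; counts are stored as strings.
Summary = Dict[str, str]


def added_records(summary: Summary) -> int:
    """Rows appended by a snapshot, read from its summary (absent field treated as 0 here)."""
    return int(summary.get("added-records", "0"))


def is_volume_anomaly(history: List[Summary], latest: Summary, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if its appended row count is more than `z_threshold`
    population standard deviations from the mean of recent snapshots."""
    counts = [added_records(s) for s in history]
    mu = mean(counts)
    sigma = pstdev(counts)
    latest_count = added_records(latest)
    if sigma == 0:
        # Perfectly stable history: any change at all is treated as anomalous.
        return latest_count != mu
    return abs(latest_count - mu) / sigma > z_threshold


# Illustrative summaries from four recent append snapshots, plus two candidates.
recent = [{"added-records": n} for n in ("1000", "1050", "980", "1020")]
print(is_volume_anomaly(recent, {"added-records": "1010"}))  # False
print(is_volume_anomaly(recent, {"added-records": "40"}))    # True
```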

This design philosophy lets teams keep pace with changing business and technical needs. In Hasmik’s words: “…for me it’s just democratization of… the quality process, the data itself, the data governance, all of that has to come together to tell a cohesive story.”

By rooting their approaches in open, flexible architectures, both companies have positioned themselves to scale trust and agility—making meaningful, system-wide observability possible as they pursue ever more advanced data and AI outcomes.

Organizational Lessons: Who Owns Data Quality?

Despite making significant technical strides, both panelists acknowledged that data quality ownership and building a culture around it remain a persistent challenge.

ZoomInfo initially tackled this by forming a dedicated Data Reliability Engineering (DRE) team to manage observability infrastructure and onboard new datasets. However, as Hasmik Sarkezians explained, this model soon ran up against bottlenecks and scalability concerns:

“Currently, we have a very small team. We created a team around [Telmai], which is called the DRE, the data reliability engineers… It’s a semi-automatic way of onboarding new datasets… but it’s not really automatic and it’s not really easy to get the direct cause, so there’s a lot of efforts being done to automate all of that process.”

Recognizing these limitations, ZoomInfo is actively working to decentralize data quality responsibilities. The vision is to empower product and domain teams—not only centralized data reliability engineers—to set their own Telmai policies, receive alerts directly, and react quickly via Slack integrations or future natural language interfaces:

“For me, I think we need to make sure that the owner, the data set owner, can set up the Telmai alerts, would be reactive to those alerts, and will take action.”
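Below is a minimal sketch of owner-driven alerting. It assumes each dataset owner registers an alert channel and their own threshold in a small registry, and that unclaimed datasets fall back to a central reliability team. The registry layout, the team and channel names, and the `route_alert` function are hypothetical. They are not Telmai's configuration model.

```python
# Hypothetical owner-based alert routing; names and layout are illustrative only.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DatasetPolicy:
    owner: str            # team accountable for the dataset
    channel: str          # where that team wants alerts, e.g. a Slack channel
    max_null_rate: float  # threshold the owner chose for this dataset


# Hypothetical registry: each owning team registers its own datasets and thresholds.
POLICIES: Dict[str, DatasetPolicy] = {
    "companies.revenue": DatasetPolicy("firmographics-team", "#firmographics-alerts", 0.05),
    "contacts.email": DatasetPolicy("contacts-team", "#contacts-alerts", 0.10),
}

# Datasets nobody has claimed yet fall back to a central reliability team.
DEFAULT_POLICY = DatasetPolicy("data-reliability", "#dre-triage", 0.05)


def route_alert(dataset: str, observed_null_rate: float) -> Optional[Tuple[str, str]]:
    """Return (channel, message) if the observed null rate breaches the dataset's
    policy, or None if it is within the owner's threshold."""
    policy = POLICIES.get(dataset, DEFAULT_POLICY)
    if observed_null_rate <= policy.max_null_rate:
        return None
    message = (
        f"@{policy.owner}: {dataset} null rate {observed_null_rate:.1%} "
        f"exceeds threshold {policy.max_null_rate:.1%}"
    )
    return policy.channel, message


print(route_alert("contacts.email", 0.08))     # None: within the owner's 10% threshold
print(route_alert("companies.revenue", 0.30))  # routed to #firmographics-alerts
```

Each threshold sits next to the owner's name, which makes accountability explicit. The team that set a threshold is the team that gets alerted when it is breached.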

At Bill, Aindra Misra described a similar challenge. Leaning too heavily on a small, expert engineering team created not just operational drag, but also strained handoffs and trust with analytics and business teams: “With the lean team… things get escalated and the overall trust between the handshake between internal teams like the platform engineering and analytics team—that trust loses.”

Their north star is to build an ecosystem where business analysts, Ops, GTM teams, and other data consumers have the direct context to check, understand, and act on data quality issues—without always waiting for engineering intervention.

In both organizations, it’s clear that tools alone aren’t enough. Ownership must be embedded into culture, process, and structure—with clearly defined SLAs, better cross-team handoffs, and systems that empower the people closest to the data to take accountability for its quality.

Toward AI-Ready Data Products

Both organizations are also preparing for a shift from analytics-driven to autonomous systems.

At Bill, internal applications are increasingly powered by insights and forecasts that must be accurate, explainable, and timely. Use cases like spend policy enforcement, invoice financing, and fraud detection rely on real-time decisions driven by data stored in open table formats like Iceberg. As Aindra Misra noted, delivering trust in this context is critical: “Trust is our mission—whether it’s external customers or internal teams, data SLAs need to be predictable and transparent.”

ZoomInfo, meanwhile, is layering AI copilots and signal-driven workflows on top of an extensive enrichment pipeline. As Hasmik Sarkezians explained earlier, a single issue in a base data set can cascade through derived fields—corrupting entity resolution, contact mapping, and ultimately customer-facing outputs.

In both environments, the stakes are rising. Poor data quality no longer just breaks dashboards—it can undermine automation, introduce risk, and erode customer trust. As Aindra put it:

Once the data goes into an AI… if the output of that AI application is not what you expect it to be, it’s very hard to trace it back to the exact data issue at the source… unless you observed it before it broke something.

That’s why both organizations see observability not as a reporting tool, but as a foundational enabler of AI—instrumenting every stage of the pipeline to catch issues before they scale into system-wide consequences.

Final Thoughts: Trust Is Your Data Moat

As AI models and agentic workflows become commoditized, the true differentiator isn’t your algorithm. It’s the reliability of the proprietary data you feed into it.

For both Bill and ZoomInfo, embedding observability wasn’t just about operational hygiene. It was a strategic move to scale trust, protect business outcomes, and prepare their architectures for the demands of autonomous systems.

Here are the key takeaways from the panel discussion:

Start Early. Shift Left: Observability works best when embedded at the data ingestion and pipeline layer, not added post-facto once problems reach dashboards or AI models.
Automate the Feedback Loop: Use tools that not only detect issues but can orchestrate action—blocking bad data, triggering alerts, and assigning ownership.
Democratize, Don’t Centralize: Give business and analytics teams accessible controls and visibility into data health, instead of relying solely on specialized teams.
Build for Change: Choose data observability platforms that support open standards, multi-cloud, and mixed data ecosystems, future-proofing your investments.

Want to learn how Telmai can accelerate your AI initiatives with reliable and trusted data? Click here to connect with our team for a personalized demo.

Want to stay ahead on best practices and product insights? Click here to subscribe to our newsletter for expert guidance on building reliable, AI-ready data pipelines.
