As Agentic AI adoption accelerates, the industry conversation is shifting from “Can we build AI?” to “Can we trust it?”
In the latest episode of our Data Quality podcast series, Telmai’s CEO and co-founder, Mona Rakibe, joined Ravit Jain, Alex Merced, and Scott Haines to explore how open lakehouse architectures are becoming foundational for sustainable Agentic AI infrastructures. They also highlighted critical considerations for architecting agentic infrastructures where autonomous systems and workflows don’t just analyze data, but act on it in a deterministic and trusted manner.
“Agentic-Ready” Starts With Reliable Contextual Data
“Agent-Ready” data isn’t something entirely new; it’s simply well-prepared data elevated to a new level of responsibility. Yet as AI systems evolve from analytical to autonomous, the stakes rise dramatically.
“At the end of the day, all AI is doing is understanding your data, just faster,” said Alex Merced, Head of Developer Relations at Dremio. “It’ll get to the right answer faster or the wrong one faster, depending on your data quality. That means everything we’ve always cared about (accuracy, cleanliness, and semantic definitions) now matters a lot more.”
Mona Rakibe expanded on this idea: “The biggest mental shift is understanding that for agents to be truly autonomous, the data powering them must be reliable and enriched with context. It’s no longer enough to have dashboards; the data pipeline itself needs to be self-healing and self-validating.”
This shift is fundamental, as agents interact dynamically with data across diverse domains, demanding real-time observability and proactive governance. Scott Haines echoed this sentiment, noting how teams are “giving up control” to automated systems, which makes the need for guardrails and testable context even more urgent. “You have to ensure your workflows behave predictably; that’s what makes the difference between a trusted agentic ecosystem and one that’s just automated chaos,” he said.
Building “AI-ready” or “agentic AI-ready” data isn’t just about accuracy; it’s about real-time reliability and contextual awareness. Systems must now deliver machine-consumable metadata, continuous validation, and semantic consistency at the speed of automation. In this world, data trust isn’t an afterthought; it needs to be baked into your data infrastructure.
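To make that concrete, here is a minimal sketch of what machine-consumable quality metadata could look like in Python. The field names, thresholds, and `is_agent_safe` helper are illustrative assumptions, not any vendor’s schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class QualitySignal:
    """Illustrative machine-consumable quality metadata for one dataset."""
    dataset: str
    checked_at: str
    completeness: float       # share of non-null values in required columns
    freshness_minutes: int    # minutes since last successful update
    schema_version: str
    passed_checks: list = field(default_factory=list)
    failed_checks: list = field(default_factory=list)

    def is_agent_safe(self, min_completeness: float = 0.99,
                      max_staleness: int = 60) -> bool:
        # An agent can consult this signal before acting on the data.
        return (self.completeness >= min_completeness
                and self.freshness_minutes <= max_staleness
                and not self.failed_checks)

signal = QualitySignal(
    dataset="sales.orders",
    checked_at=datetime.now(timezone.utc).isoformat(),
    completeness=0.997,
    freshness_minutes=12,
    schema_version="v3",
    passed_checks=["non_negative_totals", "valid_currency_codes"],
)
print(signal.is_agent_safe())  # True: fresh, complete, no failed checks
print(asdict(signal))          # serializable for downstream agents
```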
Open Lakehouses: The Foundation for Trusted and Autonomous AI
If the first step toward Agentic AI is reliable contextual data, the next is an open architecture that makes that trust accessible across every system, use case, and engine. As Mona Rakibe put it, today’s enterprises are moving toward a model where “everything lives in one lake,” but the engines and intelligence around that data are increasingly decoupled and composable.
“Historically, we used to ETL data, model it, and make it fit for purpose for every single use case,” she explained. “Now, what I’m seeing is a shift toward standardized open formats — Iceberg, Delta, Hudi — where data can be dumped into a lake and processed as-is. The models themselves are smart enough to handle JSON or XML, so the transformation moves closer to the use case. That’s where zero-ETL and zero-copy architectures are becoming real.”
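As a rough illustration of that pattern, the sketch below reads a single Iceberg table in place with two different engines, assuming pyiceberg and duckdb are installed and a catalog named `default` is already configured; the table and column names are placeholders:

```python
# "Everything lives in one lake": the same Iceberg table is read in place
# by whichever engine needs it, with no per-use-case ETL copy.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
orders = catalog.load_table("lake.orders")  # hypothetical namespace.table

# Engine A: pull a filtered slice into pandas for an analytical model.
recent = orders.scan(row_filter="order_date >= '2024-01-01'").to_pandas()

# Engine B: hand the same table to DuckDB for SQL, without copying the lake data.
con = orders.scan().to_duckdb(table_name="orders")
print(con.sql("SELECT count(*) FROM orders").fetchone())
```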
Alex Merced expanded on this, noting that the goal isn’t to eliminate data movement entirely, but to reduce redundancy and preserve context throughout the pipeline. “Even if you standardize on Iceberg or Delta,” he said, “you still need to move data between systems. But the transformations themselves can become more logical or virtual. That’s why semantic layers are getting so much attention. Instead of physically transforming data, you engineer meaning over it.”
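A toy example of that idea: instead of materializing a transformed copy, meaning is layered over the raw table as a view. The sketch uses DuckDB purely for illustration, and the table and column names are invented:

```python
# Engineering meaning over data rather than physically transforming it:
# a semantic layer expressed as a view over an untouched raw table.
import duckdb

con = duckdb.connect()
con.sql("CREATE TABLE raw_events (ts TIMESTAMP, amt DOUBLE, cur VARCHAR)")
con.sql("INSERT INTO raw_events VALUES (now(), 42.5, 'USD')")

# The raw table stays as-is; meaning lives in the view definition.
con.sql("""
    CREATE VIEW revenue AS
    SELECT ts AS occurred_at,
           amt AS gross_revenue_usd
    FROM raw_events
    WHERE cur = 'USD'
""")
print(con.sql("SELECT * FROM revenue").fetchall())
```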
This architectural evolution, from closed, pipeline-heavy data systems to open, semantically aware ecosystems, enables AI agents to operate with both context and consistency. It ensures that the rules, lineage, and quality signals that define trusted data travel seamlessly, regardless of where or how the data is consumed.
Scott Haines, who leads developer relations at Buf, added another dimension to this openness: data contracts embedded directly into schemas. “At Buf, we created something called Proto Validate,” he explained. “It lets you embed those data contracts right inside your schema definitions, basically adding guardrails at the edge, before data even enters the lake.” This “edge validation” approach ensures that governance begins at ingestion, not after a failure has already propagated downstream. It’s a natural complement to the metadata-driven validation Mona advocates for and the semantic standardization Alex envisions.
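The sketch below shows the edge-validation pattern in plain Python. Proto Validate itself expresses these rules as annotations inside Protobuf schema definitions, so this is a stand-in for the idea rather than Buf’s API, and the contract fields are hypothetical:

```python
# Guardrails at the edge: reject bad records before they enter the lake.
import re

CONTRACT = {  # hypothetical contract for an ingestion record
    "order_id": lambda v: isinstance(v, str) and re.fullmatch(r"ORD-\d{6}", v),
    "amount":   lambda v: isinstance(v, (int, float)) and v >= 0,
    "currency": lambda v: v in {"USD", "EUR", "GBP"},
}

def validate_at_edge(record: dict) -> list[str]:
    """Return the list of contract fields this record violates."""
    return [f for f, check in CONTRACT.items() if not check(record.get(f))]

good = {"order_id": "ORD-000123", "amount": 19.99, "currency": "USD"}
bad  = {"order_id": "123", "amount": -5, "currency": "JPY"}
print(validate_at_edge(good))  # [] -> admit to the lake
print(validate_at_edge(bad))   # ['order_id', 'amount', 'currency'] -> quarantine
```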
Open lakehouses aren’t just about flexibility or performance; they’re about trust at scale. By embracing open formats, shared semantics, and embedded contracts, data teams can finally align what humans understand with what AI agents act upon. In the era of Agentic AI, interoperability becomes the new reliability, and the open lakehouse serves as the foundation for both.
Metadata, MCP, and the Headless Future of Data Quality
If open lakehouses provide the foundation for trust, metadata is what animates that trust in real time. As AI agents begin to interact directly with enterprise data, the challenge isn’t just validating accuracy; it’s making validation context available instantly, wherever and whenever agents need it.
Mona Rakibe explained this shift clearly: “Unstructured data has now become a first-class citizen. The moment a PDF or a JSON file enters a pipeline, its validation becomes critical,” she said. “That validation context must also be accessible through an MCP — the Model Context Protocol — because when agents query data, they need to know which records can be trusted and which should be excluded.”
In her view, MCP represents the next evolution of interoperability, serving as a universal protocol that enables AI systems to access not only data but also its context, quality, and provenance in a standardized manner. “It’s almost like REST for AI,” Mona noted. “Everything now has to be accessible to the agent in a standardized format. It’s no longer optional.”
This real-time exposure of metadata marks the beginning of what she called the “headless data quality era.” In this model, validation isn’t something performed within a UI or a tool; it becomes an invisible, autonomous service that continuously surfaces reliability signals across every workflow, both human and machine.
“Data quality needs to become a headless application,” Mona said. “We need to get the context out as soon as data lands, make it part of the MCP so agents can operate on it. That’s the only way to make autonomous systems truly reliable.”
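To ground the idea, here is a minimal sketch of a headless data-quality service exposed over MCP, built with the open-source MCP Python SDK (assuming `pip install "mcp[cli]"`). The tool name, datasets, and trust scores are hypothetical, and this is not a description of Telmai’s implementation:

```python
# A headless data-quality service: no UI, just validation context
# served over MCP to any agent that asks for it.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("data-quality")

# Stand-in for a real validation store keyed by dataset.
TRUST_SCORES = {
    "sales.orders": {"trust": 0.98, "excluded_records": 412},
    "crm.contacts": {"trust": 0.71, "excluded_records": 13055},
}

@mcp.tool()
def validation_context(dataset: str) -> dict:
    """Return trust signals for a dataset so an agent can decide
    which records to rely on before acting."""
    return TRUST_SCORES.get(dataset, {"trust": None, "excluded_records": None})

if __name__ == "__main__":
    mcp.run()  # serve validation context to any MCP-capable agent
```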
Alex Merced agreed, adding that headlessness isn’t limited to data quality; it’s transforming the entire data stack. “We’re walking into a world where application building is less about designing a user interface and more about building functionality,” he said. “MCP enables that. It decouples how we interact with systems from how those systems actually run.”
Scott Haines tied this back to governance and predictability, reminding us that decentralization doesn’t mean disorder. NLP and automation enable teams to manage distributed quality responsibilities without compromising coherence, but metadata must remain the unifying thread. “In a world where agents can run checks and feed that context back into workflows, governance becomes a living process,” he observed — one that’s both autonomous and explainable.
Together, these ideas signal a dramatic transformation: metadata is becoming the interface between humans, machines, and trust.
Headless data quality, powered by MCP, ensures that every system, from an LLM querying customer data to an autonomous workflow reconciling transactions, has access to the same trusted, contextual truth.
From Reactive Pipelines to Autonomous Systems
For years, data quality was defined by dashboards, alerts, and manual intervention: systems that reacted after something went wrong. However, in an era where AI agents process data in real time, reactive monitoring can no longer keep pace. Enterprises now need self-governing, self-healing data ecosystems that detect, diagnose, and resolve anomalies autonomously.

Mona Rakibe described this as the natural endpoint of the shift Telmai itself has been preparing for. “We’re moving toward a world where data quality becomes autonomous,” she said. “Nobody loves doing DQ work, and that’s exactly what makes it the perfect candidate for automation. The system should be able to detect drifts, understand patterns, and correct itself without waiting for human approval.”
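A deliberately simple sketch of that detect-and-correct loop: flag a quality metric that drifts beyond its recent norms, then trigger a remediation hook without waiting for a human. The metric, threshold, and quarantine action are all assumptions:

```python
# Toy autonomous DQ loop: detect drift in a monitored metric, then remediate.
from statistics import mean, stdev

def detect_drift(history: list[float], latest: float, z: float = 3.0) -> bool:
    """Flag values more than z standard deviations from the recent mean."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(latest - mu) > z * sigma

def quarantine_partition(partition: str) -> None:
    # Placeholder remediation: real systems might reroute, backfill,
    # or exclude the partition from agent-visible data.
    print(f"quarantined {partition}; humans notified after the fact")

daily_null_rate = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.010]
today = 0.087  # sudden spike in nulls

if detect_drift(daily_null_rate, today):
    quarantine_partition("orders/2024-06-01")  # hypothetical partition
```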
That autonomy, however, doesn’t mean a loss of control; it means redefining it. The goal isn’t to replace human oversight, but to embed intelligence within the data fabric, making trust continuous and invisible. In this future, metadata, lineage, and validation signals form a living feedback loop that constantly informs both human decisions and AI reasoning.
Alex Merced explained how this evolution changes the way teams build and interact with systems, returning to his point about headless functionality: “With MCP and headless validation, the system itself becomes the interface. The agent can query, interpret, and act — and the data quality layer ensures it does so responsibly.”
The move toward autonomous trust systems also brings a cultural transformation. Teams must start designing for proactive reliability, not just reactive response. Instead of tracking SLAs and resolution times, success will be measured by how seamlessly systems prevent incidents altogether: by design, not by repair.
In the context of Agentic AI, this evolution isn’t optional; it’s existential. As workflows become more distributed and decisions become more automated, the only sustainable model is one where data quality operates as an autonomous, intelligent service that detects context, adapts to drift, and reinforces trust without human bottlenecks.
Ultimately, this is where observability meets agency. The systems that once monitored data will now reason about it, closing the loop between awareness and action, and transforming trust from a static KPI into a continuously orchestrated state.
Democratizing Data Quality in the Agentic Era
As architectures evolve and data quality becomes autonomous, one challenge persists: who owns trust? For years, organizations have swung between centralized governance and decentralized accountability, each with its own trade-offs. Centralization brought standardization but lacked context; decentralization gave teams control but often led to inconsistency.
Mona Rakibe captured this tension perfectly: “Initially, data quality was centralized, which nobody liked — the people who owned it didn’t have the context. So we tried to decentralize. But then the teams who had the context struggled with SQL or with using yet another tool. Data quality wasn’t part of their KPIs; they just wanted to build and ship products.”
Her point underscores a reality that many enterprises face: data quality can’t thrive as an isolated function. It must live within the workflows where data is produced and consumed. And with the rise of natural language interfaces and AI-powered validation, this is finally becoming possible. “Decentralization is possible today because of NLP,” Mona added. “We can make quality checks accessible through simple prompts, allowing agents and business users to participate without deep technical knowledge.”
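As a toy illustration of prompt-driven checks, the sketch below maps a plain-English request to a declarative rule the platform could execute. A real system would use an LLM for the parsing step; both the patterns and the rule format here are assumptions:

```python
# Quality checks through simple prompts: plain English in, declarative rule out.
import re

def prompt_to_rule(prompt: str) -> dict:
    m = re.search(r"(\w+) (?:must be|should be) (?:non-negative|>= 0)", prompt)
    if m:
        return {"column": m.group(1), "check": "min", "value": 0}
    m = re.search(r"(\w+) (?:must not|should not) be null", prompt)
    if m:
        return {"column": m.group(1), "check": "not_null"}
    raise ValueError("unrecognized check; escalate to a data engineer")

print(prompt_to_rule("order_total must be non-negative"))
# {'column': 'order_total', 'check': 'min', 'value': 0}
print(prompt_to_rule("email must not be null"))
# {'column': 'email', 'check': 'not_null'}
```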
Scott Haines described this as “governance that lives where work happens.” Instead of forcing teams to adopt new tools, observability and validation flow directly into existing workflows — Git commits, notebooks, orchestration platforms, or even chat-based agents. The result is ambient governance: always present, rarely intrusive, and fully traceable.
The broader implication is that trust itself becomes decentralized, but discipline stays centralized. Central teams define the rules, while distributed teams enforce and improve them through automated systems. NLP and agentic validation create a collaborative loop in which humans guide intent, and systems ensure consistency.
Closing Thoughts: Trust Is Your Data Moat
In the rush to operationalize Agentic AI, it’s easy to focus on compute power, model accuracy, or prompt engineering. But as every enterprise soon discovers, the true differentiator isn’t intelligence; it’s integrity.
Agentic AI doesn’t just consume data; it inherits its flaws. In an autonomous ecosystem, even a minor inconsistency can amplify into a systemic failure. That’s why building trust as infrastructure is no longer optional. From contextual validation to open lakehouses, from metadata-rich MCP layers to headless observability, every architectural decision now shapes how confidently your systems can operate independently.
As enterprises design for autonomy, this trust fabric becomes their most enduring moat. It’s what enables AI agents to reason responsibly, what gives teams confidence in automation, and what allows innovation to scale without fear of fragility. In the end, trust isn’t a checkpoint in the pipeline; it’s the currency of intelligent systems. The organizations that invest in reliable, contextual, and explainable data today will be the ones defining how AI behaves tomorrow.
Want to learn how Telmai can accelerate your AI initiatives with reliable and trusted data? Click here to connect with our team for a personalized demo.
Want to stay ahead on best practices and product insights? Click here to subscribe to our newsletter for expert guidance on building reliable, AI-ready data pipelines.


