Introducing Telmai Data Observability Platform

Telmai can monitor data anywhere, not only after it lands in a data warehouse. It is not limited to batch processing either: it also works on streaming data, so you can get alerted right away. Telmai’s engine is designed for massive parallelism in data processing and won’t unnecessarily load your critical infrastructure (e.g., a data warehouse or operational database) while analyzing hundreds of sources with thousands of attributes.
Telmai doesn’t require any prior setup or configuration. Connect to your source and start monitoring right away. With Telmai you can connect to files on cloud storage, data warehouses, operational databases, or even a message bus to observe streams of data.

For example, let's connect to a BigQuery view. All it needs is the dataset name, the table/view name, and an optional ID attribute (if you have one) so Telmai can track the number of duplicate records:
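For context, here is what those same connection details look like when querying a view directly with Google's official BigQuery Python client. This is only an illustrative sketch, not Telmai's API; the project, dataset, view, and ID column names below are hypothetical placeholders, and the query simply shows the kind of duplicate check the ID attribute enables.

```python
# Illustrative only: inspecting a BigQuery view with Google's official client.
# Project, dataset, view, and column names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# The three pieces of information the connection needs:
dataset_name = "public_safety"         # dataset name
view_name = "incident_reports_view"    # table/view name
id_attribute = "incident_id"           # optional ID attribute

# A quick duplicate check on the ID attribute, similar in spirit to the
# duplicate-record tracking that providing an ID attribute enables.
query = f"""
    SELECT {id_attribute}, COUNT(*) AS copies
    FROM `{client.project}.{dataset_name}.{view_name}`
    GROUP BY {id_attribute}
    HAVING COUNT(*) > 1
"""
for row in client.query(query).result():
    print(row[id_attribute], row["copies"])
```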

Once connected, Telmai starts observing the data. It learns historical trends in the data, makes predictions based on them, and raises an alert when an unexpected drift is detected.

In this instance there is a particularly bad batch that arrived at 9:38 AM on Jan 13. There are 18 alerts in total across all attributes due to various drifts, including a significant dip in the volume of data observed.
When this happens, Telmai proactively notifies users via Slack or email.

Telmai automatically tracks drifts in volume, completeness, and uniqueness of values, as well as the shape of the data, e.g., patterns, distributions, lengths, and special characters.
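As a rough sketch of the kinds of per-attribute signals such monitoring tracks, the snippet below computes completeness, uniqueness, mean length, and a couple of shape ratios with pandas. The column name and sample values are made up, and this is not how Telmai implements its metrics.

```python
# A minimal sketch (not Telmai's implementation) of per-attribute metrics a
# drift monitor might track: completeness, uniqueness, and "shape" signals
# such as value length and special-character ratio. Sample data is made up.
import re
import pandas as pd

def profile_attribute(values: pd.Series) -> dict:
    non_null = values.dropna().astype(str)
    return {
        "completeness": len(non_null) / max(len(values), 1),
        "uniqueness": non_null.nunique() / max(len(non_null), 1),
        "mean_length": non_null.str.len().mean(),
        "alphabetic_ratio": non_null.str.fullmatch(r"[A-Za-z]+").mean(),
        "special_char_ratio": non_null.map(
            lambda v: bool(re.search(r"[^A-Za-z0-9 ]", v))
        ).mean(),
    }

df = pd.DataFrame({"Agency": ["EPA", "DOJ", "D0J", None, "FBI!!"]})
print(profile_attribute(df["Agency"]))
```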

For each of these policies, users can select the list of attributes to be monitored, whether or not to receive a notification, and the notification channel (e.g., email).
Users can see all the information about alerts and violations for each attribute and drill down to see exactly what went wrong, getting more detail for their root cause analysis.

We can see that one of the alerts was for a drift in Record Count, so it would be helpful to understand the historical trend, the prediction based on that trend, and the observed value:

By clicking on that alert we can see that, based on the historical data, the system expected to see significantly more records than it received and hence generated the alert.
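Conceptually, the expected-versus-observed comparison looks something like the sketch below, where a simple 3-sigma band over historical record counts stands in for the learned prediction. The numbers are invented and this is not Telmai's actual model.

```python
# A conceptual sketch of expected-vs-observed drift detection on record counts.
# A simple 3-sigma band over history stands in for a learned prediction.
import statistics

history = [10500, 10280, 10930, 10710, 10640, 10820]  # past batch record counts (made up)
observed = 3200                                        # latest batch

mean = statistics.mean(history)
stdev = statistics.stdev(history)
lower_bound = mean - 3 * stdev

if observed < lower_bound:
    print(f"ALERT: expected roughly {mean:.0f} records, observed {observed}")
```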
Similarly for other types of drifts, Telmai can register a wide variety of anomalies and present them to users. Below we see that four different drifts were detected in the Agency attribute: an unexpected drop in the ratio of alphabetic values, a jump in numeric values, and increases in the mean number of tokens and in special characters. All of these are signs of an influx of bad data.

Now we can understand why the system generated an alert by comparing the predictions based on the historical trend with the observed values.

To verify this, Telmai also allows users to drill down and see the exact values that were detected as anomalous, saving a lot of time and effort by not having to scramble through multiple systems and write dozens of queries to figure it out. With a click of a button we can easily see a significant influx of bad data.

Telmai Investigator offers a variety of interactive tools for finding blind spots in the data and automatically suggests anomalies. These tools include a pattern analyzer, anomaly scores, and a distribution analyzer.

In this picture we see that each value gets scored on a number of aspects. The lower the score (highlighted in red), the more anomalous the value is. Sorting data by the various anomaly aspects greatly helps in identifying blind spots in the data.
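To make the idea concrete, here is one naive way such a per-value score could be produced: map each value to its character pattern and score it by how common that pattern is. This is only a hedged sketch with invented values, not Telmai's scoring.

```python
# A rough sketch of frequency-based anomaly scoring (not Telmai's scoring):
# the rarer a value's character pattern, the lower its score.
import re
import pandas as pd

def char_pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9', keeping other characters as-is."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(value)))

values = pd.Series(["EPA", "DOJ", "FBI", "D0J", "FBI!!", "EPA"])
patterns = values.map(char_pattern)
scores = patterns.map(patterns.value_counts(normalize=True))

# Lower score => more anomalous; sorting ascending surfaces blind spots first.
print(pd.DataFrame({"value": values, "score": scores}).sort_values("score"))
```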

When users explore their data via Telmai’s automated tools, they gain insights and might find it useful to further refine the system by manually adding relevant expectations. For example, a user may add an expectation that the Agency attribute’s values follow a 2-letter pattern and have a frequency count of at least 1,000, as shown below:

Once such an expectation is set, users can specify an alert policy based on a correctness tolerance. For example, the user may want to be alerted whenever the correctness of the Agency expectation drops below 100% (see the illustration below). In this case, the user has zero tolerance for policy violations. Had the user chosen 95% instead, the violation tolerance would be 5%, and an alert would be issued only when the expectation is violated by at least 5% of the records.
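Expressed in plain Python, such an expectation and tolerance could look like the sketch below. This is not Telmai's API or configuration format; the regular expression, frequency threshold, and synthetic sample data are stand-ins for what the user configures in the UI.

```python
# A hedged sketch of the expectation and alert policy described above, in plain
# Python (not Telmai's API): Agency values must match a 2-letter pattern and
# appear at least 1,000 times; alert when correctness falls below the threshold.
import re
import pandas as pd

CORRECTNESS_THRESHOLD = 1.00   # 100% => zero tolerance; 0.95 => 5% tolerance

def agency_correctness(values: pd.Series) -> float:
    counts = values.value_counts()
    def meets_expectation(v) -> bool:
        return bool(re.fullmatch(r"[A-Za-z]{2}", str(v))) and counts[v] >= 1000
    return values.map(meets_expectation).mean()

# Synthetic batch: 1,200 well-formed values plus a handful of bad ones.
agencies = pd.Series(["CA"] * 1200 + ["C4", "CAL", "ca "])
correctness = agency_correctness(agencies)
if correctness < CORRECTNESS_THRESHOLD:
    print(f"ALERT: Agency correctness {correctness:.2%} is below the threshold")
```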

The next time data is processed, Telmai will apply the newly defined correctness policy and alert users when the number of correct values is lower than expected:

In summary, Telmai provides a powerful AI engine for automated detection of anomalies in data without requiring any prior configuration or setup. It also gives users the ability to refine and further tune the system without writing any code, opening the platform up to broader and more diverse data teams.
You can sign up for a demo and see it for yourself.
Data profiling helps organizations understand their data, identify issues and discrepancies, and improve data quality. It is an essential part of any data-related project; without it, poor data quality can impact critical business decisions, customer trust, sales, and financial opportunities.
To get started, there are four main steps in building a complete and ongoing data profiling process:
1. Data Collection
2. Discovery & Analysis
3. Documenting the Findings
4. Data Quality Monitoring
We'll explore each of these steps in detail and discuss how they contribute to the overall goal of ensuring accurate and reliable data.
1. Data Collection
Start with data collection. Gather data from various sources and extract it into a single location for analysis. If you have multiple sources, choose a centralized data profiling tool (see our recommendation in the conclusion) that can easily connect to and analyze all your data without requiring any prep work.
2. Discovery & Analysis
Now that you have collected your data for analysis, it's time to investigate it. Depending on your use case, you may need structure discovery, content discovery, relationship discovery, or all three. If content or structure discovery is important for your use case, make sure that you collect and profile your data in its entirety and do not rely on samples, as sampling will skew your results.
Use visualizations to make your discovery and analysis more understandable. It is much easier to see outliers and anomalies in your data using graphs than in a table format.
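As a small, hedged example of what this step can look like in practice, the snippet below runs a basic structure and content discovery pass with pandas and plots one column to make outliers easy to spot. The file and column names are hypothetical.

```python
# A small discovery-and-analysis pass with pandas and matplotlib. The file and
# column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("incidents.csv")   # the extracted dataset from step 1

# Structure discovery: inferred column types and missing-value counts.
print(df.dtypes)
print(df.isna().sum())

# Content discovery: summary statistics over the full dataset, not a sample.
print(df.describe(include="all"))

# Visual check: outliers are much easier to spot in a plot than in a table.
df["response_time_minutes"].plot(kind="hist", bins=50)
plt.show()
```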
3. Documenting the Findings
Create a report or documentation outlining the results of the data profiling process, including any issues or discrepancies found.
Use this step to establish data quality rules that you may not have been aware of. For example, a United States ZIP code of 94061 could have accidentally been typed in as 94 061 with a space in the middle. Documenting this issue could help you establish new rules for the next time you profile the data.
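For instance, the documented ZIP code finding can be turned into a small, reusable rule, as in this sketch (the pattern covers the 5-digit and ZIP+4 formats; the sample values are made up):

```python
# Turning a documented finding into a reusable rule: flag US ZIP codes that
# don't match the expected 5-digit (or ZIP+4) format. Sample values are made up.
import re

ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

for zip_code in ["94061", "94 061", "94061-1234"]:
    status = "ok" if ZIP_PATTERN.match(zip_code) else "violates rule"
    print(f"{zip_code!r}: {status}")
```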
4. Data Quality Monitoring
Now that you know what you have, the next step is to make sure these issues get corrected. Some of them you may be able to fix yourself; others you will need to flag for upstream data owners to fix.
After your data profiling is done and the system goes live, your data quality assurance work is not done – in fact, it's just getting started.
Data constantly changes. Left unchecked, data quality defects will continue to occur as a result of both system and user behavior changes.
Build a platform that can measure and monitor data quality on an ongoing basis.
Take Advantage of Data Observability Tools
Automated tools can help you save time and resources and ensure accuracy in the process.
Unfortunately, traditional data profiling tools offered by legacy ETL and database vendors are complex and require data engineering and technical skills. They also only handle data that is structured and ready for analysis; semi-structured data sets, nested data formats, blob storage types, and streaming data have no place in those solutions.
Today, organizations that deal with complex data types or large amounts of data are looking for newer, more scalable solutions.
That’s where a data observability tool like Telmai comes in. Telmai is built to handle the complexity that data profiling projects face today. Some of its advantages include centralized profiling for all data types, a low-code/no-code interface, ML insights, easy integration, and scale and performance.
| Data Observability | Data Quality |
| --- | --- |
| Leverages ML and statistical analysis to learn from the data and identify potential issues, and can also validate data against predefined rules | Uses predefined metrics from a known set of policies to understand the health of the data |
| Detects, investigates the root cause of issues, and helps remediate | Detects and helps remediate |
| Examples: continuous monitoring, alerting on anomalies or drifts, and operationalizing the findings into data flows | Examples: data validation, data cleansing, data standardization |
| Low-code / no-code to accelerate time to value and lower cost | Ongoing maintenance, tweaking, and testing of data quality rules adds to its costs |
| Enables both business and technical teams to participate in data quality and monitoring initiatives | Designed mainly for technical teams who can implement ETL workflows or open source data validation software |
Start your data observability today
Connect your data and start generating a baseline in less than 10 minutes.