What's new in Telmai

Max Lukichev

I’m excited to share some of the latest features we have been working on at Telmai.

We have added some exciting new functionality to the product in this release:

  • Automatic ML-based thresholds
  • Support for Semi-Structured Data (JSON)
  • New Integrations: Snowflake, Firebolt
  • Change Data Capture for SQL sources and cloud data storage
  • Data metric segmentation
  • New Data Metrics: Table-level metrics + distribution drifts

Support for Semi-Structured Data

Most modern cloud data warehouses and data lakes now support semi-structured schemas, and data architects are leveraging them to design more efficient data models for storage and querying. Providing quality metrics and KPIs that are aware of such structures is crucial for accurate observability outcomes.
Hence, Telmai can now monitor not only flat data but also files and data warehouse tables with semi-structured schemas (i.e., nested and multi-valued attributes). Telmai is designed to support complete analysis of complex data with thousands of attributes without any impact on performance.
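
As a simple illustration (not Telmai's actual implementation), nested and multi-valued JSON can be flattened into attribute paths, so that each path can be profiled like a flat column:

```python
# A minimal sketch: flatten a nested JSON-like record into attribute paths.
def flatten(record, prefix=""):
    """Yield (attribute_path, value) pairs for a nested record."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        # Multi-valued attribute: every element maps to the same path.
        for item in record:
            yield from flatten(item, f"{prefix}[]")
    else:
        yield prefix, record

record = {"user": {"id": 1, "emails": ["a@x.com", "b@x.com"]}}
print(list(flatten(record)))
# [('user.id', 1), ('user.emails[]', 'a@x.com'), ('user.emails[]', 'b@x.com')]
```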

New Integrations: Snowflake, Firebolt

We added support for Azure Blob Storage, Snowflake, and Firebolt, in addition to BigQuery, Google Cloud Storage, S3, and local files.

All SQL sources now support both flat and semi-structured schemas and can be configured for Change Data Capture (CDC).

Stay tuned for a separate blog post on these integrations.

Change Data Capture (CDC) support

Telmai provides a way to schedule runs with a specific periodicity, i.e., hourly, daily, or weekly, for any source. Additionally, you can configure Telmai to process and monitor only the portion of the data (the delta) that changed between runs. For cloud storage, the delta is determined via file metadata. For SQL sources, users can specify an attribute holding record creation/update timestamps, which is then used to read freshly changed records.

CDC support complements full database and table analysis, i.e., Telmai can provide metrics and alerts on the total data as well as on the changed data.

This powerful functionality enables users to review trends and drifts both holistically, across the entire dataset, and for the changed data alone.
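
A minimal sketch of the delta-read idea for SQL sources, assuming a hypothetical `updated_at` timestamp column and using sqlite3 purely for illustration:

```python
# Sketch: read only records changed since the last run's high-water mark.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2023-01-01T00:00:00"), (2, 25.0, "2023-01-02T12:00:00")],
)

last_run_watermark = "2023-01-02T00:00:00"  # persisted between runs

# The delta: rows created/updated after the previous run.
delta = conn.execute(
    "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
    (last_run_watermark,),
).fetchall()
print(delta)  # [(2, 25.0, '2023-01-02T12:00:00')]
```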

Data Metric and Threshold Segmentation

Data in the same table often needs to be analyzed and monitored separately, as trends may vary along specific dimensions, such as different customers or geographic regions.

Starting with this release, Telmai allows users to specify such a dimension in the data source, enabling both holistic and segmented metric analysis.
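
A minimal sketch of segmented analysis, assuming a hypothetical `region` dimension: the same completeness metric is computed both holistically and per segment, so a drift confined to one segment is not averaged away:

```python
# Sketch: compute a completeness metric overall and per segment.
from collections import defaultdict

rows = [
    {"region": "US", "email": "a@x.com"},
    {"region": "US", "email": None},
    {"region": "EU", "email": "c@x.com"},
]

def completeness(records, field):
    """Fraction of records where `field` is non-null."""
    return sum(r[field] is not None for r in records) / len(records)

print("overall:", completeness(rows, "email"))  # 0.666...

by_segment = defaultdict(list)
for row in rows:
    by_segment[row["region"]].append(row)
for segment, records in by_segment.items():
    print(segment, completeness(records, "email"))  # US 0.5, EU 1.0
```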

Automatic ML-based Thresholds

With our low-code/no-code approach, Telmai calculates thresholds for each data metric on your dataset. These ML-based thresholds now evolve with your data without any configuration.

Telmai automatically establishes and predicts trends over key metrics, such as the percentage of non-null/empty values, the number of records, the percentage of unique values, and many more. When an observed value of a metric falls outside the prediction boundaries, Telmai issues an alert and sends a notification to subscribers.
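
A simplified illustration of the idea (Telmai's actual models are more sophisticated): derive a prediction band from a metric's recent history and alert when the newest observation falls outside it:

```python
# Sketch: alert when a metric leaves its history-derived prediction band.
import statistics

history = [98.2, 98.5, 97.9, 98.4, 98.1, 98.3]  # % non-null, past runs
observed = 91.0                                  # latest run

mean = statistics.mean(history)
std = statistics.stdev(history)
k = 3  # band width in standard deviations
lower, upper = mean - k * std, mean + k * std

if not (lower <= observed <= upper):
    print(f"ALERT: {observed} outside predicted range [{lower:.2f}, {upper:.2f}]")
```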

New Metrics: Distribution Drifts

In addition to drifts in various data metrics, such as record count and completeness, Telmai can automatically detect unexpected changes in the distributions of categorical data or in the distributions of value patterns.

Categorical data is often used to understand business segmentation, and a sudden drift in such distributions can have a direct business impact. With this automatic alert, data teams become proactively aware of such drifts and can investigate before the business is affected.
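
One common way to quantify such drift is the population stability index (PSI); the sketch below uses it purely for illustration, with a rule-of-thumb cut-off of 0.2 rather than any Telmai-specific setting:

```python
# Sketch: detect categorical distribution drift with the PSI.
import math

def psi(baseline, current):
    """PSI between two category->count distributions (shared categories)."""
    total_b, total_c = sum(baseline.values()), sum(current.values())
    score = 0.0
    for category in baseline:
        p = max(baseline[category] / total_b, 1e-6)  # avoid log(0)
        q = max(current.get(category, 0) / total_c, 1e-6)
        score += (q - p) * math.log(q / p)
    return score

baseline = {"retail": 700, "wholesale": 250, "online": 50}
current  = {"retail": 400, "wholesale": 250, "online": 350}
drift = psi(baseline, current)
print(f"PSI = {drift:.3f}", "-> drift detected" if drift > 0.2 else "-> stable")
```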

All of the above and much more has been added to our product. If you would like to learn more about these features and how they apply to your use case, feel free to schedule a demo using this link.

Data profiling helps organizations understand their data, identify issues and discrepancies, and improve data quality. It is an essential part of any data-related project; without it, poor data quality could impact critical business decisions, customer trust, sales, and financial opportunities.

To get started, there are four main steps in building a complete and ongoing data profiling process:

  1. Data Collection
  2. Discovery & Analysis
  3. Documenting the Findings
  4. Data Quality Monitoring

We'll explore each of these steps in detail and discuss how they contribute to the overall goal of ensuring accurate and reliable data. Before we get started, let's remind ourselves of what data profiling is.

What are the different kinds of data profiling?

Data profiling falls into three major categories: structure discovery, content discovery, and relationship discovery. While they all help in gaining a better understanding of the data, the types of insights they provide differ:

Structure discovery checks that data is consistent, correctly formatted, and well structured. For example, if you have a ‘Date’ field, structure discovery helps you see the various patterns of dates (e.g., YYYY-MM-DD or YYYY/DD/MM) so you can standardize your data into one format.

Structure discovery also examines simple, basic statistics in the data, for example, minimum and maximum values, means, medians, and standard deviations.
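
As a rough illustration of structure discovery, values can be reduced to shape patterns (digits to 9, letters to A) so mixed date formats surface, alongside basic statistics for a numeric column:

```python
# Sketch: surface value-shape patterns and basic column statistics.
import re
import statistics
from collections import Counter

dates = ["2023-01-15", "2023/15/01", "2023-02-20"]
patterns = Counter(
    re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value)) for value in dates
)
print(patterns)  # Counter({'9999-99-99': 2, '9999/99/99': 1})

amounts = [10.0, 25.5, 7.25, 100.0]
print("min:", min(amounts), "max:", max(amounts))
print("mean:", statistics.mean(amounts), "median:", statistics.median(amounts))
print("stdev:", round(statistics.stdev(amounts), 2))
```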

Content discovery looks more closely at individual attributes and data values to check for data quality issues. It can help you find null values, empty fields, duplicates, incomplete values, outliers, and anomalies.

For example, if you are profiling address information, content discovery helps you see whether your ‘State’ field contains two-letter abbreviations, fully spelled-out state names, both, or potentially some typos.

Content discovery can also be used to validate databases against predefined rules. This process helps improve data quality by identifying instances where the data does not conform to those rules. For example, a transaction amount should never be less than $0.
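
A minimal sketch of content discovery on a hypothetical transactions table: counting nulls, flagging duplicate keys, checking `state` format conformance, and validating the amount-is-non-negative rule:

```python
# Sketch: basic content-discovery checks over a small record set.
import re
from collections import Counter

rows = [
    {"id": 1, "state": "CA", "amount": 19.99},
    {"id": 2, "state": "California", "amount": 5.00},
    {"id": 2, "state": "NY", "amount": -3.50},   # duplicate id, bad amount
    {"id": 3, "state": None, "amount": 12.00},
]

nulls = sum(r["state"] is None for r in rows)
duplicate_ids = [i for i, n in Counter(r["id"] for r in rows).items() if n > 1]
non_conforming = [r["state"] for r in rows
                  if r["state"] and not re.fullmatch(r"[A-Z]{2}", r["state"])]
rule_violations = [r for r in rows if r["amount"] < 0]

print(nulls, duplicate_ids, non_conforming, rule_violations)
# 1 [2] ['California'] [{'id': 2, 'state': 'NY', 'amount': -3.5}]
```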

Relationship discovery identifies how different datasets are related to each other, for example, key relationships between database tables or lookup cells in a spreadsheet. Understanding relationships is most critical when designing a new database schema, a data warehouse, or an ETL flow that requires joining tables and datasets based on those key relationships.
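
A minimal sketch of one relationship-discovery check: testing whether one column's values are contained in another, which suggests a candidate foreign-key relationship:

```python
# Sketch: measure foreign-key containment between two tables.
orders = [{"order_id": 1, "customer_id": 10},
          {"order_id": 2, "customer_id": 11},
          {"order_id": 3, "customer_id": 99}]   # 99 has no match
customers = [{"customer_id": 10}, {"customer_id": 11}, {"customer_id": 12}]

fk_values = {o["customer_id"] for o in orders}
pk_values = {c["customer_id"] for c in customers}

orphans = fk_values - pk_values
overlap = 1 - len(orphans) / len(fk_values)
print(f"containment: {overlap:.0%}, orphan keys: {sorted(orphans)}")
# containment: 67%, orphan keys: [99]
```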

Data Observability vs. Data Quality

| Data Observability | Data Quality |
| --- | --- |
| Leverages ML and statistical analysis to learn from the data and identify potential issues, and can also validate data against predefined rules | Uses predefined metrics from a known set of policies to understand the health of the data |
| Detects, investigates the root cause of issues, and helps remediate | Detects and helps remediate |
| Examples: continuous monitoring, alerting on anomalies or drifts, and operationalizing the findings into data flows | Examples: data validation, data cleansing, data standardization |
| Low-code / no-code to accelerate time to value and lower cost | Ongoing maintenance, tweaking, and testing of data quality rules adds to its costs |
| Enables both business and technical teams to participate in data quality and monitoring initiatives | Designed mainly for technical teams who can implement ETL workflows or open-source data validation software |


Start your data observability today

Connect your data and start generating a baseline in less than 10 minutes. 

No sales call needed

