Technology and Architectural Pillars of Telmai

Both Max and I have been very clear that we wanted to build Telmai for DataOps teams, specifically data engineers. Our objective is to empower these highly skilled engineers with the right set of tools for data observability.
Once we were clear on the user persona and the problem we wanted to solve, our next set of decisions was about the technology foundation for Telmai. We went through a series of discussions around topics like open source, open core, the SaaS model, and the criticality of seamless integrations into data pipelines.
Here's a peek into some of our key decisions.

Software as a Service (SaaS)
Monitoring software is typically very resource-intensive, especially when ingestion rates reach millions of data points or records per second and experience significant, unpredictable spikes in volume. Handling enterprise-grade security, fast auto-scaling, throttling, and retries is additional overhead for data engineers who are already dealing with highly complex data systems. Our experience designing such highly secure systems efficiently lets us take that overhead off the data engineering team.
The SaaS model also gives us an opportunity for continuous improvement of our AI models.
Low configuration
To understand whether you have problems with your data, you need superior monitoring to detect outliers. Data quality systems have traditionally addressed this with rules; however, rules are fragile, hard to develop, and can only discover what you already know. We want to tell you what you don't know but should.
Advanced ML models significantly reduce the time and effort needed to get value, and they adapt to constantly evolving data. At the same time, you might have validation logic that relies on business rules. Such rules are typically well understood and robust, and in those cases augmenting your rules with ML makes our system even more powerful.
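To make the distinction concrete, here is a minimal sketch, not Telmai's actual models, of how a hand-written business rule and a simple statistical outlier check complement each other. The data values and thresholds are illustrative assumptions.

```python
import statistics

def rule_check(order_total: float) -> bool:
    """Business rule you already know: order totals must be non-negative."""
    return order_total >= 0

def is_anomalous(value: float, history: list[float], z_threshold: float = 3.0) -> bool:
    """Flag values that drift far from the historical distribution,
    the kind of issue a static rule would never catch."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold

history = [102.5, 98.0, 110.3, 95.7, 101.1, 99.8, 104.2]
new_value = 980.0  # roughly ten times the usual order total

print(rule_check(new_value))             # True: the rule is satisfied
print(is_anomalous(new_value, history))  # True: the value is still anomalous
```

In practice the anomaly detection is learned rather than hand-tuned, but the takeaway is the same: rules catch the failure modes you already know about, while statistical and ML checks surface the ones you don't.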
Simplicity of integration
Last but not least, whether your pipeline reads files from GCS or S3, pulls from a data warehouse like BigQuery, Redshift, or Snowflake, or processes records with Spark or Dataflow, we want your integration with Telmai to be as seamless as possible. We will provide both client libraries and REST APIs to satisfy any type of integration, all without adding latency to your pipeline.
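As a rough illustration of what a push-style REST integration could look like, here is a hypothetical sketch. The endpoint URL, payload shape, and authentication below are illustrative assumptions, not Telmai's actual API.

```python
import json
import urllib.request

# Placeholder values: substitute your real endpoint and credentials.
INGEST_ENDPOINT = "https://api.example.com/v1/records"  # hypothetical URL
API_TOKEN = "YOUR_API_TOKEN"                            # hypothetical token

def send_batch(records: list[dict]) -> int:
    """POST a batch of records to the observability service and return the HTTP status."""
    body = json.dumps({"records": records}).encode("utf-8")
    request = urllib.request.Request(
        INGEST_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

if __name__ == "__main__":
    print("ingest status:", send_batch([{"order_id": 1, "total": 102.5}]))
```

Batching records and sending them off the critical path is what keeps observability from adding latency to the pipeline itself.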
You will notice that the primary design principle for Telmai’s architecture and technology is to provide the best developer experience possible.
To summarize, we will have:
- A secure cloud platform to reduce infrastructure planning and maintenance overhead
- Advanced ML models for robust and scalable anomaly detection
- Simple integrations to reduce time to value
Now how will we do all of this? That is the magic of “Telmai”.
#dataobservability #dataquality #dataops
Data profiling helps organizations understand their data, identify issues and discrepancies, and improve data quality. It is an essential part of any data-related project; without it, poor data quality can undermine critical business decisions, customer trust, sales, and financial opportunities.
To get started, there are four main steps in building a complete and ongoing data profiling process: data collection, discovery and analysis, documenting the findings, and data quality monitoring.
We'll explore each of these steps in detail and discuss how they contribute to the overall goal of ensuring accurate and reliable data. Before we get started, remember that data profiling is the process of examining data to understand its structure, content, and quality.
1. Data Collection
Start with data collection: gather data from your various sources and extract it into a single location for analysis. If you have multiple sources, choose a centralized data profiling tool (see our recommendation in the conclusion) that can easily connect to and analyze all of your data without requiring any prep work.
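As a minimal sketch, assuming pandas is available and using illustrative file and table names, consolidating a couple of sources into one place might look like this:

```python
import sqlite3
import pandas as pd

# Source 1: a CSV export from object storage or a local drop (path is a placeholder).
orders_csv = pd.read_csv("exports/orders.csv")

# Source 2: a table in an operational database (connection details are placeholders).
conn = sqlite3.connect("app.db")
orders_db = pd.read_sql("SELECT * FROM orders", conn)

# Consolidate into a single frame, tagging each row with its origin so that
# discrepancies found later can be traced back to a source.
orders_csv["source"] = "csv_export"
orders_db["source"] = "app_db"
all_orders = pd.concat([orders_csv, orders_db], ignore_index=True)

print(all_orders.shape)
```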
2. Discovery & Analysis
Now that you have collected your data for analysis, it's time to investigate it. Depending on your use case, you may need structure discovery, content discovery, relationship discovery, or all three. If content or structure discovery matters for your use case, make sure you collect and profile your data in its entirety rather than relying on samples, since sampling will skew your results.
Use visualizations to make your discovery and analysis easier to understand. It is much easier to spot outliers and anomalies in a graph than in a table.
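Here is a minimal profiling sketch over a small in-memory example, assuming pandas and matplotlib are available; the column names and values are illustrative. It summarizes structure and content per column and plots one distribution for a quick visual check:

```python
import pandas as pd
import matplotlib.pyplot as plt

# A tiny illustrative dataset; in practice this would be the consolidated data.
all_orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, None],
    "total": [102.5, 98.0, 110.3, 9800.0, 101.1],
    "zip_code": ["94061", "94 061", "10001", None, "30301"],
})

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize structure and content: type, completeness, cardinality, and range."""
    rows = []
    for col in df.columns:
        series = df[col]
        rows.append({
            "column": col,
            "dtype": str(series.dtype),
            "null_pct": round(series.isna().mean() * 100, 2),
            "distinct": series.nunique(),
            "min": series.min() if pd.api.types.is_numeric_dtype(series) else None,
            "max": series.max() if pd.api.types.is_numeric_dtype(series) else None,
        })
    return pd.DataFrame(rows)

print(profile(all_orders))

# Outliers like the 9800.0 order total stand out immediately in a histogram.
all_orders["total"].plot.hist(bins=20)
plt.title("Distribution of order totals")
plt.show()
```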
3. Documenting the Findings
Create a report or documentation outlining the results of the data profiling process, including any issues or discrepancies found.
Use this step to establish data quality rules you may not have been aware of. For example, the United States ZIP code 94061 could accidentally have been typed as 94 061, with a space in the middle. Documenting this issue helps you establish a new rule for the next time you profile the data.
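A minimal sketch of turning that finding into a reusable check, with illustrative values and an illustrative report file name, might look like this:

```python
import json
import re

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # 5-digit US ZIP or ZIP+4, no embedded spaces

def is_valid_zip(value: str | None) -> bool:
    return value is not None and bool(ZIP_RE.match(value))

zips = ["94061", "94 061", "10001", None, "30301"]
violations = [z for z in zips if not is_valid_zip(z)]

finding = {
    "rule": "us_zip_format",
    "description": "ZIP codes must be 5 digits (or ZIP+4) with no embedded spaces",
    "violations_found": len(violations),
    "examples": [v for v in violations if v is not None],
}

# Append the finding to a simple line-delimited report so the rule and its
# history travel with the project.
with open("profiling_findings.jsonl", "a") as report:
    report.write(json.dumps(finding) + "\n")

print(finding)
```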
4. Data Quality Monitoring
Now that you know what you have, the next step is to correct the issues you found. Some of them you may be able to fix yourself; others need to be flagged for upstream data owners to fix.
After your data profiling is done and the system goes live, your data quality assurance work is not done – in fact, it's just getting started.
Data changes constantly. Left unchecked, data quality defects will keep occurring as a result of both system changes and user behavior changes.
Build a platform that can measure and monitor data quality on an ongoing basis.
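As a minimal sketch of what ongoing monitoring can look like, assume each profiling run produces a handful of metrics; the metric names, baseline values, and tolerances below are illustrative:

```python
# Metrics captured from a previous, trusted profiling run.
baseline = {"null_pct_total": 0.0, "distinct_zip": 4, "row_count": 5_000}

# Metrics from today's run.
current = {"null_pct_total": 7.5, "distinct_zip": 4, "row_count": 5_200}

# How far each metric may drift before we raise an alert.
tolerances = {"null_pct_total": 1.0, "distinct_zip": 100, "row_count": 1_000}

def drift_alerts(baseline: dict, current: dict, tolerances: dict) -> list[str]:
    """Compare current metrics against the baseline and report out-of-tolerance drift."""
    alerts = []
    for metric, base_value in baseline.items():
        delta = abs(current[metric] - base_value)
        if delta > tolerances[metric]:
            alerts.append(f"{metric}: baseline={base_value}, current={current[metric]}")
    return alerts

for alert in drift_alerts(baseline, current, tolerances):
    print("DATA QUALITY ALERT:", alert)
```

Running a check like this on a schedule, or on every pipeline run, turns a one-time profiling exercise into continuous monitoring.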
Take Advantage of Data Observability Tools
Automated tools can help you save time and resources and ensure accuracy in the process.
Unfortunately, traditional data profiling tools offered by legacy ETL and database vendors are complex and require data engineering and technical skills. They also handle only data that is structured and ready for analysis; semi-structured data sets, nested data formats, blob storage types, and streaming data have no place in those solutions.
Today, organizations that deal with complex data types or large volumes of data are looking for newer, more scalable solutions.
That’s where a data observability tool like Telmai comes in. Telmai is built to handle the complexity that data profiling projects face today. Its advantages include centralized profiling for all data types, a low-code/no-code interface, ML-driven insights, easy integration, and scale and performance.
Start your data observability today
Connect your data and start generating a baseline in less than 10 minutes.
No sales call needed