Data Wiki | Data Management


01 | What is a data warehouse?


A data warehouse is a central repository for the data collected in an organization's business systems. Data is either extracted, transformed, and loaded (ETL) or simply extracted and loaded into the warehouse, which then supports reporting, analytics, and data mining on this curated data.

02 | What is a data lake?

A data lake is a storage repository that holds large amounts of raw data in its native format until it is needed. Data lakes address two shortcomings of data warehouses. First, data can be stored in structured, semi-structured, or unstructured form. Second, the schema is applied when the data is read (schema-on-read) rather than when it is loaded or written, so the schema can be changed later for greater agility.
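
Schema-on-read can be illustrated with a minimal Python sketch. The sample records, the field names, and the helper `read_with_schema` are hypothetical and exist only for illustration; a real data lake would typically rely on a query engine rather than hand-written code like this.

```python
import json

# Raw events kept in their native format, as a data lake would hold them
# (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "ana", "amount": "12.50", "country": "PT"}',
    '{"user": "ben", "amount": "7", "note": "no country recorded"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and cast them to types."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field_name, cast in schema.items():
            value = record.get(field_name)
            # Fields missing from the raw record become None.
            row[field_name] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data with different schemas.
billing_schema = {"user": str, "amount": float}
geo_schema = {"user": str, "country": str}

billing_rows = read_with_schema(RAW_EVENTS, billing_schema)
geo_rows = read_with_schema(RAW_EVENTS, geo_schema)
```

Nothing about the stored data changes between the two reads; only the schema supplied by the reader differs, which is the agility the definition refers to.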


03 | What is a data lakehouse?

A data lakehouse is a data solution concept that combines the best elements of the data warehouse with those of the data lake. Data lakehouses implement the data structures and management features of data warehouses on top of data lakes, which are typically more cost-effective for data storage. Data lakehouses have proven useful to data scientists because they enable both machine learning and business intelligence.


04 | What is a data catalog?

Gartner definition: A data catalog maintains an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.


05 | What is a data dictionary?

A Data Dictionary is a collection of names, definitions, and attributes about data elements that are used or captured in a database, information system, or part of a research project. It describes the meanings and purposes of data elements within the context of a project and provides guidance on interpretation, accepted meanings, and representation. A Data Dictionary also provides metadata about data elements.
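
As a rough illustration, a data dictionary can be represented as a mapping from element names to their definitions and attributes. The elements, the attribute keys, and the `validate` helper below are hypothetical; real data dictionaries are usually maintained in dedicated tools or database catalogs.

```python
# A tiny, hypothetical data dictionary: each data element has a type,
# a description, and optional accepted values.
DATA_DICTIONARY = {
    "customer_id": {
        "type": int,
        "description": "Unique identifier assigned to a customer.",
        "required": True,
    },
    "country": {
        "type": str,
        "description": "Two-letter country code of the customer.",
        "required": False,
        "allowed": {"PT", "DE", "US"},
    },
}


def validate(record, dictionary):
    """Check a record against the dictionary and return a list of problems."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            if spec["required"]:
                problems.append(f"{name}: missing required element")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: value {value!r} not in accepted values")
    return problems


ok_problems = validate({"customer_id": 7, "country": "PT"}, DATA_DICTIONARY)
bad_problems = validate({"country": "XX"}, DATA_DICTIONARY)
```

The same entries that guide a human reader on meaning and accepted values can also drive simple automated checks, as `validate` does here.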


06 | What is metadata management?

Metadata management is an organization-wide agreement on how to describe information assets.

As data volumes grow sharply, metadata management becomes critical for finding the right data as quickly as possible and thereby increasing the return on investment (ROI) in data.


07 | What is data lineage?

Data lineage traces the transformations that a dataset has gone through since its origination. It describes the dataset's origin, movement, characteristics, and quality.

Data lineage provides visibility into a data analytics process and makes it much easier to trace errors back to their root cause.
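
One simple way to picture lineage is as a graph that records, for each dataset, the datasets it was derived from. The sketch below is a minimal illustration under that assumption; the dataset names and the helpers `upstream` and `sources` are hypothetical.

```python
# Hypothetical lineage graph: each dataset maps to the datasets
# it was directly derived from.
LINEAGE = {
    "sales_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}


def upstream(dataset, graph):
    """Return every dataset that `dataset` depends on, nearest first."""
    seen = []
    queue = list(graph.get(dataset, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(graph.get(current, []))
    return seen


def sources(dataset, graph):
    """Return the original sources: upstream datasets with no parents."""
    return [d for d in upstream(dataset, graph) if not graph.get(d)]


report_upstream = upstream("sales_report", LINEAGE)
report_sources = sources("sales_report", LINEAGE)
```

When a figure in `sales_report` looks wrong, walking the graph this way narrows the search to the datasets it actually depends on.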


08 | What is ETL?

ETL is a three-step process: extracting data from one or more sources, transforming it into a structure suited to its business use, and loading it into a data storage system, such as a data warehouse, for further use.
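
The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source rows, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an export from a business system.
SOURCE_ROWS = [
    {"order_id": "1", "customer": " Ana ", "total": "12.50"},
    {"order_id": "2", "customer": "ben", "total": "7"},
]


def extract():
    """Extract: read the rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: cast types and tidy names before anything is stored."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(), float(r["total"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract()))
etl_result = etl_conn.execute(
    "SELECT order_id, customer, total FROM orders ORDER BY order_id"
).fetchall()
```

Note that the target table only ever sees cleaned, typed rows; the raw values are not kept in the storage system.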


09 | What is ELT?

ELT is a modern alternative to ETL for massive amounts of data. Data is extracted from multiple sources, loaded once into a central data repository such as a data lake, and transformed as needed by BI tools, which allows for timely access, scalability, and flexibility.
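
For contrast with ETL, the following minimal Python sketch loads the raw values first and transforms them only when they are queried. The table `raw_orders`, the view `orders_clean`, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the central repository.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive (all text).
RAW_ROWS = [("1", " Ana ", "12.50"), ("2", "ben", "7")]

elt_conn = sqlite3.connect(":memory:")

# Extract + Load: store the raw values untouched.
elt_conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, total TEXT)"
)
elt_conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)
elt_conn.commit()

# Transform: defined later, inside the repository, as a view over the raw data.
elt_conn.execute(
    """
    CREATE VIEW orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           TRIM(customer)            AS customer,
           CAST(total AS REAL)       AS total
    FROM raw_orders
    """
)

elt_rows = elt_conn.execute(
    "SELECT order_id, customer, total FROM orders_clean ORDER BY order_id"
).fetchall()
raw_count = elt_conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the raw table is preserved, a different transformation can be added later as another view without extracting or loading the data again.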


10 | What is data governance?

Data governance is a set of principles and practices that ensure high data quality throughout the complete lifecycle of your data. According to the Data Governance Institute (DGI), it is a practical and actionable framework that helps a variety of data stakeholders across any organization identify and meet their information needs.