The reason is that in the past, companies deployed applications on-premise and, more importantly, the applications often used the same physical database server. This means that both applications and data were fully controlled and directly accessible by the corporate IT department.
As a result, the company’s analysts were able to work with the database directly, which made the acquisition of analytical data straightforward and easy.
Today’s world of data acquisition is not so simple. All apps are cloud based, and SaaS is the dominant type of app. Data has been moved to walled gardens and SaaS companies don’t provide direct access to DB for a variety of reasons, from security to competition.
In the past, analysts worked with one program with one SQL dialect with a server operating trouble-free in the local network. Now they have to work with dozens of completely different APIs, with incompatible data formats and different semantics.
To cope with a zoo of external data sources, companies started to employ teams of data engineers. Data engineers develop scripts which download and transform data. Some of the tasks performed by data engineers were productized as ETL tools SaaS.
Data Pipeline in Modern Companies
Data Analysts and Data Scientists are consumers of the data process. Data Analysts prepare metrics for people to make decisions; Data Scientists prepare figures for algorithms to make decisions.
ETL tools help with the first two steps of the data process.
What ETL Tools Do
ETL is an acronym that stands for Extract, Transform, Load.
Extract: The ETL service extracts data from SaaS apps and uploads it to cloud- based warehouses. The service monitors any changes in API so the client doesn’t have to take care of it.
Transform: The service automatically initiates SQL scripts which are able to supplement raw data.
Load: The data is transformed into a single schema. This allows analysts to do their job instead of identifying different field names and data formats. After that, the data is uploaded to the data warehouse.
Examples of how ETL tools operate
A marketing manager works with two ad networks. The manager wants to generate a single report with the figures for both ad networks.
The data is locked in the ad network API. The ETL tool connects to the API (Extract) and downloads data to data warehouse tables or spreadsheets (Load).
Now, the marketing manager can use an SQL query or formulas in the spreadsheet to generate a report for two data sources at once.
At this stage, simple ETL tools give up.
But with many data sources, a new problem comes to the forefront: incompatible data schemas. SaaS products name fields in different ways, e.g. for Spend and Cost. It is difficult to consider these aspects in SQL. This is why advanced ETL tools can save data according to a single schema (Transform).
Now data similar in nature is in the same table. This makes working with SQL expressions so much easier.
Defining the ETL Landscape
There are a variety of ETL tools available on the market. Here are some of them: