Missed features in Existing ETL Tools

Re-shaping of data


The existing tools download data from a variety of sources, transform it into a single schema, and upload it to the data warehouse.

Let’s have a look at a use case. An organization is conducting an ad campaign on Facebook to bring visitors to its e-commerce store. A user makes a purchase in the store and the store accepts the user’s credit card via Stripe. So, advertising cost data is in Facebook and revenue data is in Stripe.

The Marketing Director wants to calculate ROI for the ad campaign.

Simply adding rows from different systems in one table won’t help. In this case, there should be a row for each visitor, with two columns: advertising costs and revenue.

The rows in the original tables should be converted into columns. Moreover, the rows have to be matched with each other.


This conversion is called data reshaping.

ETL services can’t do such conversions. They delegate this task to users.

To perform the task, the user has to buy a Business Intelligence app. Examples of these types of apps are Looker or Mode. The user is expected to use SQL and do the conversions using SQL.

However, not all users want to work with SQL. Those who know how to work with SQL will face new problems.

For two rows to be linked properly, the rows have to have the so-called key, that is, a field with the same value. It can be a user ID or the name of an ad campaign. An SQL query finds the rows with the same key and aggregates them.

That’s the theory. In reality, platform developers (Google, Facebook, Stripe) don’t care about a uniform key. Add the human factor, and the result is data running into double percentage figures that can’t be linked with each other. No one trusts this incomplete data.

Mismatched Data and Measurement Granularities


Mismatched granularities are a massive problem.

A platform is represented by data with different levels of detail. E.g. there is a hierarchy with several levels, such as campaign, adset, and ad, for Facebook ads. Previously, the hierarchy for Google ads has just one level, campaign.

The ad network provides a report on advertising costs broken down by countries and campaigns; CRM, by countries only. But clients want to see a report with costs broken down by both campaigns and countries.

The advertising manager wants to distribute revenues in proportion to ad impressions made.


But ETL and BI tools don’t offer a solution to this problem. Clients have to employ data engineers to create a code to approximate one report out of two different reports.

Cohorts


Most SaaS apps are optimized for time-delayed events, e.g. for a 3 Days of Active Use event.

Therefore 99% of marketing experts use cohort analysis to assess ad campaign performance.

A cohort is all visitors brought by a specific ad campaign on a specific day. For each such cohort, we build metrics, e.g., Revenue Day 1, Revenue Day 7 or Revenue Month 6.

When the advertising manager compares two ad campaigns, they look at a cohort metric, not at the total revenue, which varies depending on the campaign’s life cycle.

It is very inconvenient to calculate cohort metrics using SQL. In fact, it is practically impossible.