Marketing Analytics on the Databricks Lakehouse using Fivetran and dbt



Marketing teams use many different platforms to drive marketing and sales campaigns, which can generate a significant amount of valuable but disconnected data. Bringing all of this data together can drive a large return on investment, as shown by Publicis Groupe, who were able to increase campaign revenue by as much as 50%.

The Databricks Lakehouse, which unifies data warehousing and AI use cases on a single platform, is the ideal place to build a marketing analytics solution: we maintain a single source of truth and unlock AI/ML use cases. We also leverage two Databricks partner solutions, Fivetran and dbt, to unlock a wide range of marketing analytics use cases including churn and lifetime value analysis, customer segmentation, and ad effectiveness.

Fivetran and dbt can learn and write to Delta Lake utilizing a Databricks cluster or Databricks SQL warehouse

Fivetran allows you to easily ingest data from 50+ marketing platforms into Delta Lake without the need to build and maintain complex pipelines. If any of the marketing platforms' APIs change or break, Fivetran will take care of updating and fixing the integrations so your marketing data keeps flowing in.

dbt is a popular open source framework that lets lakehouse users build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple. Once the data is ingested into Delta Lake, we use dbt to transform, test and document the data. The transformed marketing analytics data mart built on top of the ingested data is then ready to be used to help drive new marketing campaigns and initiatives.

Both Fivetran and dbt are part of Databricks Partner Connect, a one-stop portal to discover and securely connect data, analytics and AI tools directly within the Databricks platform. In just a few clicks you can configure and connect these tools (and many more) directly from within your Databricks workspace.

How to build a marketing analytics solution

In this hands-on demo, we'll show how to ingest Marketo and Salesforce data into Databricks using Fivetran and then use dbt to transform, test, and document your marketing analytics data model.

All the code for the demo is available on Github in the workflows-examples repository.

dbt lineage graph showing data sources and models

The final dbt model lineage graph will look like this. The Fivetran source tables are in green on the left and the final marketing analytics models are on the right. By selecting a model, you can see the corresponding dependencies, with the different models highlighted in purple.

Data ingestion using Fivetran

Fivetran has many marketing analytics data source connectors

Create new Salesforce and Marketo connections in Fivetran to start ingesting the marketing data into Delta Lake. When creating the connections, Fivetran will also automatically create and manage a schema for each data source in Delta Lake. We'll later use dbt to transform, clean and aggregate this data.

Define a destination schema in Delta Lake for the Salesforce data source

For the demo, name the schemas that will be created in Delta Lake marketing_salesforce and marketing_marketo. If the schemas don't exist, Fivetran will create them as part of the initial ingestion load.

Select which data source objects to synchronize as Delta Lake tables

You can then choose which objects to sync to Delta Lake, where each object will be saved as an individual table. Fivetran also makes it simple to manage and review which columns are being synchronized for each table:

Fivetran monitoring dashboard to monitor monthly active rows synchronized

Additionally, Fivetran provides a monitoring dashboard to analyze how many monthly active rows of data are synchronized daily and monthly for each table, among other useful statistics and logs.

Data modeling using dbt

Now that all the marketing data is in Delta Lake, you can use dbt to create your data model by following these steps.

Set up a dbt project locally and connect to Databricks SQL

Set up your local dbt development environment in your chosen IDE by following the setup instructions for dbt Core and dbt-databricks.

Scaffold a new dbt project and connect to a Databricks SQL Warehouse using dbt init, which will ask for the following information:

$ dbt init
Enter a name for your project (letters, digits, underscore): 
Which database would you like to use?
[1] databricks
[2] spark

Enter a number: 1
host (yourorg.databricks.com): 
http_path (HTTP Path): 
schema (default schema that dbt will build objects in): 
threads (1 or more) [1]: 

Once you have configured the profile you can test the connection using:

$ dbt debug
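dbt init writes these answers to a profile in ~/.dbt/profiles.yml. A minimal sketch of what the generated profile looks like for Databricks follows; the project name, host, HTTP path, and token shown here are placeholder values, not real ones:

```yaml
marketing_analytics:            # your project/profile name
  target: dev
  outputs:
    dev:
      type: databricks
      schema: mkt_analytics     # default schema that dbt builds objects in
      host: yourorg.databricks.com
      http_path: /sql/1.0/warehouses/abc123   # HTTP Path of your SQL warehouse
      token: dapiXXXXXXXXXXXXXXXX             # personal access token
      threads: 1
```

If dbt debug fails, the host, http_path, and token values in this file are the first things to check.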

Install Fivetran dbt model packages for staging

The first step in using the Marketo and Salesforce data is to create the tables as sources for our model. Luckily, Fivetran has made this easy to get up and running with their pre-built Fivetran dbt model packages. For this demo, let's make use of the marketo_source and salesforce_source packages.

To install the packages, just add a packages.yml file to the root of your dbt project and add the spark_utils, marketo_source and salesforce_source packages:

packages:
  - package: dbt-labs/spark_utils
    version: 0.3.0
  - package: fivetran/marketo_source
    version: [">=0.7.0", "<0.8.0"]
  - package: fivetran/salesforce_source
    version: [">=0.4.0", "<0.5.0"]

To download and use the packages, run:

 $ dbt deps

You should now see the Fivetran packages installed in the packages folder.

Update dbt_project.yml for Fivetran dbt models

There are a few configs in the dbt_project.yml file that you need to modify to make sure the Fivetran packages work correctly for Databricks.

The dbt_project.yml file can be found in the root folder of your dbt project.

spark_utils overriding dbt_utils macros

The Fivetran dbt models make use of macros in the dbt_utils package, but some of these macros need to be modified to work with Databricks, which is easily done using the spark_utils package.

It works by providing shims for certain dbt_utils macros. You enable these shims using the dispatch config in the dbt_project.yml file, so that dbt first searches the spark_utils package when resolving macros from the dbt_utils namespace.

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
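The effect of this config can be illustrated with a small Python sketch. This is a toy model of the search-order behaviour, not dbt's actual implementation, and the macro names are examples:

```python
# Toy model of dbt's dispatch config: packages are searched in
# search_order, and the first one defining the macro wins.
def resolve_macro(macro_name, search_order, packages):
    for pkg in search_order:
        if macro_name in packages.get(pkg, {}):
            return f"{pkg}.{macro_name}"
    raise KeyError(f"macro {macro_name} not found")

# spark_utils shims only some macros; others fall through to dbt_utils.
packages = {
    "spark_utils": {"dateadd": "spark-compatible shim"},
    "dbt_utils": {"dateadd": "default impl", "pivot": "default impl"},
}

print(resolve_macro("dateadd", ["spark_utils", "dbt_utils"], packages))
# spark_utils.dateadd
print(resolve_macro("pivot", ["spark_utils", "dbt_utils"], packages))
# dbt_utils.pivot
```

Because spark_utils is listed first, its Databricks-compatible shims take precedence wherever they exist, while unshimmed macros still resolve to dbt_utils.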

Variables for the marketo_source and salesforce_source schemas

The Fivetran packages require you to define the catalog (called database in dbt) and schema where the data lands when it is ingested by Fivetran.

Add these variables to the dbt_project.yml file with the correct catalog and schema names. The default catalog is hive_metastore, which will be used if the _database variables are left blank. The schema names will be what you defined when creating the connections in Fivetran.

vars:
  marketo_database: # leave blank to use the default hive_metastore catalog
  marketo_schema: marketing_marketo
  salesforce_database: # leave blank to use the default hive_metastore catalog
  salesforce_schema: marketing_salesforce

Target schema for Fivetran staging models

To avoid all of the staging tables created by the Fivetran source models landing in the default target schema, it can be useful to define a separate staging schema.

In the dbt_project.yml file, add the staging schema name; it will then be suffixed to the default schema name.

models:
  marketo_source:
    +schema: your_staging_name # leave blank to use the default target_schema
  salesforce_source:
    +schema: your_staging_name # leave blank to use the default target_schema

Based on the above, if the target schema defined in profiles.yml is mkt_analytics, the schema used for the marketo_source and salesforce_source tables will be mkt_analytics_your_staging_name.
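This suffixing behaviour follows dbt's default schema-naming rule, sketched here in Python as a simplified illustration (not dbt's actual source code):

```python
def generate_schema_name(custom_schema_name, target_schema):
    """Simplified version of dbt's default rule: a model's custom
    schema is appended to the profile's target schema with an
    underscore; with no custom schema, the target schema is used."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"

print(generate_schema_name("your_staging_name", "mkt_analytics"))
# mkt_analytics_your_staging_name
print(generate_schema_name(None, "mkt_analytics"))
# mkt_analytics
```

This default can be overridden with a custom generate_schema_name macro, but for this demo the suffixed name is what you should expect to see in Delta Lake.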

Disable missing tables

At this stage you can run the Fivetran model packages to test that they work correctly.

dbt run --select marketo_source
dbt run --select salesforce_source

If any of the models fail due to missing tables, because you chose not to sync those tables in Fivetran, you can disable those models in your source schema by updating the dbt_project.yml file.

For example, if the email bounced and email template tables are missing from the Marketo source schema, you can disable the models for those tables by adding the following under the models config:

models:
  marketo_source:
    +schema: your_staging_name
    # model names below are illustrative; disable the stg_marketo__*
    # models that correspond to the tables you chose not to sync
    tmp:
      stg_marketo__email_bounced_tmp:
        +enabled: false
      stg_marketo__email_template_tmp:
        +enabled: false
    stg_marketo__email_bounced:
      +enabled: false
    stg_marketo__email_template:
      +enabled: false

Creating the marketing analytics models

dbt lineage graph showing the star schema and aggregate tables data model

Now that the Fivetran packages have taken care of creating and testing the staging models, you can begin developing the data models for your marketing analytics use cases: a star schema data model along with materialized aggregate tables.

For example, for the first marketing analytics dashboard, you may want to see how engaged certain companies and sales regions are, measured by the number of email campaigns they have opened and clicked.

To do so, you can join the Salesforce and Marketo tables using the Salesforce user email, the Salesforce account_id, and the Marketo lead_id.

The models will be structured under the mart folder in the following way.

|-- dbt_project.yml
|-- packages.yml
|-- models
      |-- mart
             |-- core
             |-- intermediate
             |-- marketing_analytics

You can view the code for all the models on Github in the /models/mart directory; below we describe what's in each folder, along with an example.

Core models

The core models are the fact and dimension tables that all downstream models will build upon.

The dbt SQL code for the dim_user model:

with salesforce_users as (
    select
        account_id,
        email
    from {{ ref('stg_salesforce__user') }}
    where email is not null and account_id is not null
),

marketo_users as (
    select
        lead_id,
        email
    from {{ ref('stg_marketo__lead') }}
),

joined as (
    select
        marketo_users.lead_id,
        salesforce_users.account_id,
        salesforce_users.email
    from salesforce_users
    left join marketo_users
        on salesforce_users.email = marketo_users.email
)

select * from joined

You can also add documentation and tests for the models using a yaml file in the folder.

Two simple tests have been added in the core.yml file:

version: 2

models:
  - name: dim_account
    description: "The Account Dimension Table"
    columns:
      - name: account_id
        description: "Primary key"
        tests:
          - not_null
  - name: dim_user
    description: "The User Dimension Table"
    columns:
      - name: lead_id
        description: "Primary key"
        tests:
          - not_null

Intermediate models

Some of the final downstream models may depend on the same calculated metrics, so to avoid repeating SQL you can create intermediate models that can be reused.

The dbt SQL code for the int_email_open_clicks_joined model:

with opens as (
    select *
    from {{ ref('fct_email_opens') }}
),

clicks as (
    select *
    from {{ ref('fct_email_clicks') }}
),

opens_clicks_joined as (
    select
        o.lead_id as lead_id,
        o.campaign_id as campaign_id,
        o.email_send_id as email_send_id,
        o.activity_timestamp as open_ts,
        c.activity_timestamp as click_ts
    from opens as o
    left join clicks as c
        on o.email_send_id = c.email_send_id
        and o.lead_id = c.lead_id
)

select * from opens_clicks_joined

Marketing analytics models

These are the final marketing analytics models that will power the dashboards and reports used by marketing and sales teams.

The dbt SQL code for the country_email_engagement model:

with accounts as (
    select *
    from {{ ref('dim_account') }}
),

users as (
    select *
    from {{ ref('dim_user') }}
),

opens_clicks_joined as (
    select *
    from {{ ref('int_email_open_clicks_joined') }}
),

joined as (
    select *
    from users as u
    left join accounts as a
        on u.account_id = a.account_id
    left join opens_clicks_joined as oc
        on u.lead_id = oc.lead_id
)

select
    billing_country as country,
    count(open_ts) as opens,
    count(click_ts) as clicks,
    count(click_ts) / count(open_ts) as click_ratio
from joined
group by country
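To make the aggregation concrete, here is the same opens/clicks logic in plain Python over a few made-up joined rows. The data is purely illustrative; note that SQL's count(column) skips NULLs, which the sketch mirrors by ignoring None:

```python
from collections import defaultdict

# Illustrative joined rows: each row is an email open; click_ts is
# None when the open was never followed by a click.
rows = [
    {"country": "US", "open_ts": "2022-06-01", "click_ts": "2022-06-01"},
    {"country": "US", "open_ts": "2022-06-02", "click_ts": None},
    {"country": "UK", "open_ts": "2022-06-03", "click_ts": None},
    {"country": "UK", "open_ts": "2022-06-04", "click_ts": "2022-06-05"},
]

stats = defaultdict(lambda: {"opens": 0, "clicks": 0})
for row in rows:
    s = stats[row["country"]]
    s["opens"] += row["open_ts"] is not None    # count(open_ts)
    s["clicks"] += row["click_ts"] is not None  # count(click_ts)

for country, s in sorted(stats.items()):
    print(country, s["opens"], s["clicks"], s["clicks"] / s["opens"])
# UK 2 1 0.5
# US 2 1 0.5
```

Each country's click_ratio is simply clicks divided by opens, exactly as in the SQL model above.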

Run and test dbt models

Now that your model is ready, you can run all the models using

$ dbt run

And then run the tests using

$ dbt test

View the dbt docs and lineage graph

dbt lineage graph for the marketing analytics model

Once your models have run successfully, you can generate the docs and lineage graph using

$ dbt docs generate

To then view them locally, run

$ dbt docs serve

Deploying dbt models to production

Once you have developed and tested your dbt model locally, you have several options for deploying it into production, one of which is the new dbt task type in Databricks Workflows (private preview).

Your dbt project should be managed and version controlled in a Git repository. You can create a dbt task in your Databricks Workflows job that points to the Git repository.

Using a dbt task type in Databricks Workflows to orchestrate dbt

As you are using packages in your dbt project, the first command should be dbt deps, followed by dbt run for the first task and then dbt test for the next task.

Databricks Workflows job with two dependent dbt tasks

You can then run the workflow immediately using Run Now, or set the dbt project to run on a specified schedule.

Viewing the dbt logs for each dbt run

For each run you can see the logs for each dbt command, helping you debug and fix any issues.

Powering your marketing analytics with Fivetran and dbt

As shown here, using Fivetran and dbt alongside the Databricks Lakehouse Platform lets you easily build a powerful marketing analytics solution that is easy to set up and manage, and flexible enough to suit any of your data modeling requirements.

To get started with building your own solution, visit the documentation for integrating Fivetran and dbt with Databricks, and reuse the marketing_analytics_demo project example to get started quickly.

The dbt task type in Databricks Workflows is in private preview. To try the dbt task type, please reach out to your Databricks account executive.


