Hubspot Lead Analytics in SQL, Beyond CRM Reporting

This blog post is the first of a two-part series. The first part will focus on Hubspot lead analytics and will equip you with the tools you need to go beyond standard Customer Relationship Management (CRM) reporting. Part 2 will go over DATEDIFF and how to create time elapsed histograms, using the dataset from Part 1.

To guide us through the process of gaining familiarity with Hubspot lead analytics, I’ll focus on one example question: How much time does it take for a contact to convert to a “marketing qualified lead” (MQL)?

As with anything we might want to measure, we’ll first need to define what is meant by “contact”, “MQL” and also decide what unit of time we will use. We’ll consider a customer a “contact” when their contact information is added to Hubspot; we’ll capture this with ‘hubspot_contacts_id’. This is a unique value associated with each contact from the Hubspot API.

When you collect your Hubspot data into Panoply, the ‘‘hubspot_contacts_id’ value is found on every table associated with Hubspot contact data. Each business will have their own criteria for when and how a potential customer becomes a “marketing qualified lead”; and that conversion date is stored in the ‘hs_lifecyclestage_marketingqualifiedlead_date’ field from the Hubspot API (more on this in the next section). Finally, we will look at the length of time in ‘days’ using the DATEDIFF function.

Getting familiar with Hubspot Lead Data

To answer this question, we will be using Hubspot data in Panoply via the Hubspot API. It sounds hard, but it’s actually not too bad. My background is not in computer science. Or analytics. It’s in social work! With some curiosity, determination and the Hubspot - Panoply docs as a guide, I assure you, you can figure it out too!

One thing to note about Hubspot data in Panoply is that many of the table names have a “-” in them. To avoid getting errors, you have to put the “table-name” in double quotes.

Hubspot API

The Hubspot API provides data about leads as a JSON formatted list of objects with a “value” field. As a result, Panoply transforms the data into a (very) long table of key/value pairs.

DATEDIFF() using Hubspot data

 

You can use the following query to find the unique keys that you are looking for. Of course, you can sub out “hubspot_contacts_properties” for any table of interest.

SELECT distinct key 
FROM hubspot_contacts_properties;

 

I knew I wanted to find out when contacts became “contacts” (when the contacts were created) and something about “marketing qualified leads” so I ran this query to explore the available keys.

SELECT DISTINCT key 
FROM hubspot_contacts_properties 
WHERE key ILIKE '%date%' OR key ILIKE '%market%';

WHERE key ILIKE means the results will only include observations in which the key contains the specified string (case-insensitive). ‘%’ are wildcards, meaning the string can contain any character or set of characters in place of the %. WHERE lower(key) LIKE '%date%' OR lower(key) LIKE '%market%' would be another way of getting the same results.

The resulting table (of keys) contains 2 keys that will help us answer the question of how long it takes for a contact to become a MQL: createdate and hs_lifecyclestage_marketingqualifiedlead_date. Perfect. We can use ‘createdate’ for the date a contact became a “contact”, and ‘hs_lifecyclestage_marketingqualifiedlead_date’ as the date a “contact” became a “MQL”.

For clarification about any of the contact property keys, check out the Hubspot Contacts API docs. That describes the “Contacts API”, but you can click on whichever API meets your needs (located on the left rail). For what it’s worth, I found the documents helpful and not particularly jargon heavy.

To answer the question — How much time does it take for a contact to become a “marketing qualified lead” — we will do 3 things:
  1. Create a table that contains hubspot_contacts_id, createdate and MQL date

*This is where we will end this post, but stay tuned for steps 2 and 3 in a future post.

      2. Write a DATEDIFF statement to calculate the difference between createdate and MQL date

      3. Group our results by the number of days (date_diff) column

Create a table that contains contact_id, createdate and MQL date

This query combines nested data (key/value pair tables) into a single table. It also shows how to change Unix timestamps into Dates. I borrowed this straight from the Hubspot - Panoply docs and made a couple of tweaks to make it applicable to my needs.

Query (See explanation below)

SELECT 
  p.hubspot_contacts_id AS contact_id, 
  MIN(CASE WHEN "key" = 'createdate' THEN timestamp 'epoch' + p.value / 1000 * 
interval '1 second' ELSE NULL END) createdate, MIN(CASE WHEN "key" = 'hs_lifecyclestage_marketingqualifiedlead_date'
THEN timestamp 'epoch' + p.value / 1000 * interval '1 second' ELSE NULL END)
hs_lifecyclestage_marketingqualifiedlead_date FROM hubspot_contacts_properties p GROUP BY 1 ORDER BY 3;

Explanation

-- the "p" is an alias that refers to hubspot_contacts_properties table
-- the first column will contain hubspot_contacts_id from table p
SELECT 
  p.hubspot_contacts_id AS contact_id, 
  
-- the second column (createdate) will contain the minimum value for all rows
in table p where key = 'createdate' and those values will be converted from a
Unix timestamp to a Date data type
MIN(CASE WHEN "key" = 'createdate' THEN timestamp 'epoch' + p.value / 1000 *
interval '1 second' ELSE NULL END) createdate, -- the third column (hs_lifecyclestage_marketingqualifiedlead_date) will contain
the minimum value for all rows in table p where key =
'hs_lifecyclestage_marketingqualifiedlead_date' and those values will be converted
from a Unix timestamp to a Date data type
MIN(CASE WHEN "key" = 'hs_lifecyclestage_marketingqualifiedlead_date'
THEN timestamp 'epoch' + p.value / 1000 * interval '1 second' ELSE NULL END)
hs_lifecyclestage_marketingqualifiedlead_date FROM hubspot_contacts_properties p -- grouped by column 1 (hubspot_contacts_id) GROUP BY 1 --ordered by column 3 (hs_lifecyclestage_marketingqualifiedlead_date)
in ascending order
ORDER BY 3;
id-createdate-mqlGreat! We have a table that contains all the information we need to find out how much time elapsed between ‘createdate’ and ‘hs_lifecyclestage_marketingqualifiedlead_date’. To answer our question fully, we will use DATEDIFF to calculate the difference between the two dates (in days) and group our results to see the distribution. Is there a certain window of time in which most contacts become MQLs or is there an even distribution over time? Join us in Part 2 to find out and to learn more about the DATEDIFF function.
Get a free consultation with a data architect to see how to build a data warehouse in minutes.
Request Demo
Read more in:
Share this post:

Work smarter, better, and faster with monthly tips and how-tos.