Panoply Blog: Data Management, Warehousing & Data Analysis

HubSpot & SQL: A New Way to Understand Your Revenue

Written by Yulia Lukashina | Jun 22, 2021 12:13:00 PM

If you use HubSpot, you know it captures a ton of data about your customers, marketing campaigns, and so much more. And while native HubSpot reporting is a great way to get started, you’re likely to find yourself thirsty for more detail if that’s all you use.

But there’s good news: by ingesting HubSpot data into external storage and running SQL queries, you can pull much, much more out of your data. We’ll show you how.

    Table of contents

Why is SQL so great?

We know, our love for SQL is a little intense. But that’s for good reason. 

It’s (fairly) easy to learn

SQL is simple and straightforward. Most commands have intuitive names and once you understand the structure of SQL queries, you can read and write them without hesitation. 

To become fluent in SQL, you won’t need to attend a three-week course. Most people learn it ad hoc by starting with simple queries, like the ones that we prepared for you in this article. 

Customization is a snap

SQL became so popular because of its flexibility. You can create custom queries that transform your data on the fly in any possible way. You can aggregate, filter, calculate derivatives, and combine data from more than one source or table. 

It’s compatible with (pretty much) any BI tool

You can use SQL with most BI tools, such as Tableau and Power BI. Both have a workbench where you can enter your query. Pre-processing the data before it's loaded into your BI tool can save you a few seconds of wait time. But the greatest advantage here is that you can re-use the same query in just about any SQL-compatible tool. 

Plus, using SQL with a BI tool means that you can have all of your reports in one place. You don’t need to open your HubSpot account in one tab, bring up another tool in a second tab, and then either try to stare at multiple tables at the same time or export them all, combine, and pivot (yuck). With SQL, you put all the data into one table in a view that speaks for itself. And just as important, you can share that data with your stakeholders without changing anything. 

Views: A balance of granularity & performance

HubSpot data—and any data in general—is collected at different levels of granularity. For instance, one table contains general data about your contacts, such as business address, industry, etc. Another table contains data about all deals for all your contacts. Therefore, the second table is focused on deals and their KPIs. Inside it, the contact (the customer) is considered an attribute of a deal. That means you'll need to join two tables to find detailed information about customers and their deals.

Because HubSpot data is spread across multiple tables, having a sense of how they all fit together is helpful.
Click here for a downloadable PDF.

 

The problem is that when you combine data from multiple tables or run sophisticated SQL queries, the results may take some time to load. A few seconds may not not seem like a big deal, but if you use your BI tool for showing your data during important meetings—or just have no chill—this can be crucial. 

Why create a view?

A view saves the results of your query in a table. Thus, the query engine does not have to perform the transformations or calculations every time you need them. 

Sounds like just another table in the database? It’s not. A “normal” table has all historical data at the highest granularity. You rarely need it in this form, either for yourself or your stakeholders. In contrast, a view usually takes a more narrow focus by aggregating and/or transforming your latest data so it’s easier and faster to use. 

Views: Best practices

Since views are often used to capture the most recent trends in the data, you should make sure that they are regularly refreshed. Depending on your database or cloud storage, you can set up those refreshes by changing a setting in the user interface, by performing a supplementary query, or via a command in the main query. 

When designing views, the following tips are helpful for boosting performance to only milliseconds:

  • Apply a filter to limit the data to the most relevant records
  • Perform aggregations before performing a join
  • Create multiple views for the same database or data lake table if you need different data for different purposes

A practical example

We will build a few simple queries together in the next part but before we do that, let's walk through those best practices using a particular query. We've added numbered annotations in gray so you can follow along below:

---(1) 
create view hubspot_owners_consolidated AS
SELECT 
    ownerid AS owner_id,
    type,
    firstname AS first_name,
    lastname AS last_name,
    email,
---(2) 
    TIMESTAMP 'epoch' + createdat / 1000 * interval '1 second' AS created_date,
    TIMESTAMP 'epoch' + updatedat / 1000 * interval '1 second' AS updated_date
FROM hubspot_owners
---(3) 
WHERE case 
    when isactive = 1 then TRUE 
    when isactive = 0 then FALSE 
    else isactive::bigint::boolean end IS TRUE ;

 

This query states that (1) we are creating a view and not a normal database table. The query converts (2) two timestamp values in two timestamp columns into a more usual date format making them more readable. Finally, it applies a (3) custom filter that filters out inactive users and transforms data in a particular column to make it more user-friendly and select rows based on a value.  

Not too complicated, right? The end result is that this query delivers a list of active HubSpot owners with their basic data and shows when they were created and when they were last updated. 

Step-by-step HubSpot revenue queries

With that view in place, there are fewer joins to manage, so it's a whole lot easier to create SQL queries to examine your sales pipeline from different angles.  We'll walk you through a couple more sample queries that identify strong and weak points in your sales funnel.

Expected revenue by closed month

-- Expected Revenue by Close Month
SELECT date_part(YEAR, close_date) AS "Close Year",
       date_part(MONTH, close_date) AS "Close Month",
       sum(forecast_amount) AS "Pipeline - Forecasted Amount"
FROM hubspot_deals_consolidated
WHERE deal_stage NOT IN ('closedwon', 'closedlost')
GROUP BY 1, 2
ORDER BY 1, 2;

 

Each SQL query is wrapped into a pair of commands: SELECT and FROM. The FROM command tells the query engine where to get the data. Inside the SELECT statement, you list all the columns you want to return and perform minor transformations—in this case, changing the columns headers to more user-friendly names—on the data.

In our case, we take the close_date column from the database table and use it to create two new columns by extracting the year and the month with the date_part() function and renaming the columns accordingly. 

Then we take data from the forecast_amount column, aggregate it using the sum() function, and rename the new column as “Pipeline - Forecasted Amount.”

Renaming columns is often necessary—or at least desirable—because in the database table the columns have names with underscores and without capitalization. Doing some light transformation makes those columns easier to understand, especially for less data-savvy stakeholders.

After the SELECT statement, we attach a filter using a WHERE clause. We want only deals that are still open, so we exclude those that are marked as “closedwon” or “closedlost” in the deal_stage column. 

Finally, we specify how we want to group the aggregation with the GROUP BY clause: 1 and 2 stand for the first and the second columns inside the SELECT statement (so, "Close Year" and "Close Month," respectively). GROUP BY is pretty similar to pivoting in Excel—you're just rolling up data into a superset (in this case, month and year). We also sort the data by the same columns using the ORDER BY clause.

As a result, we get the data below, which shows that in November 2020 the forecasted revenue was the highest and in July 2020 it was the lowest:

Close Year

Close Month

Pipeline - Forecasted Amount

2020

1

30,028,000

2020

2

50,750,000

2020

3

76,400,000

2020

4

23,019,000

2020

5

58,790,000

2020

6

34,000,000

2020

7

10,000,000

2020

8

71,000,000

2020

9

85,000,000

2020

10

65,000,000

2020

11

120,000,000

2020

12

11,002,000

2021

1

23,000,000

2021

2

44,300,000

2021

3

50,080,000

 

Pipeline by HubSpot owner

-- Pipeline by HubSpot Owner
SELECT o.owner_id,
       o.first_name,
       o.last_name,
       sum(d.forecast_amount) AS "Pipeline - Forecasted Amount"
FROM hubspot_deals_consolidated d
INNER JOIN hubspot_owners_consolidated o ON d.owner_id = o.owner_id
WHERE deal_stage NOT IN ('closedwon', 'closedlost')
GROUP BY 1, 2, 3
ORDER BY 4 DESC;

 

In this query, we want to create a report that shows which HubSpot owner generated the most forecasted revenue. To do that, we have to combine—aka, join—the data from two tables. 

The first table, hubspot_deals_consolidated, contains detailed information about your deals and the IDs of their respective owners. The second table, hubspot_owners_consolidated, contains detailed information about the owners, including their names. 

We start by using a SELECT clause to get the data from the final table. That’s a bit of a mind bender, but you should imagine that you already joined them to write this part of the query. We select owners’ IDs, their first and last names from the second table, and calculate their individual revenues based on the data from the first table.

We add the first table alias—simply a d—that we need later in the JOIN command. Then, we use an INNER JOIN statement to tell the query engine that we want all the matching rows from the second table, which is aliased as o.

With ON, we specify which values should be used to perform the match. Similar to the previous query, we use three clauses: WHERE to filter, GROUP BY to group, and ORDER BY to order. 

Once we run the query, we get this list of owners sorted by their predicted revenues:

o.owner_id

o.first_name

o.last_name

Pipeline - Forecasted Amount

456789

John

Reed

59,983,075

123456

Maya

Reeves

40,320,682

789123

Jack

Thunders

23,382,795

In this example, you can see what happens when you don't rename the columns in the query! It's no big deal for users that are comfortable working with data directly from databases, but can be off-putting for users that are more accustomed to the tidy column headers found in most BI tools.

 

The resulting table tells us that John Reed is crushing it, bringing in more than 2x more pipeline than his colleague Jack. Not too shabby!

Conclusion

Are you getting excited about SQL queries and digging into the HubSpot data? We hope yes! 

Wishing you could run these queries on your own HubSpot data? No problem! All you have to do is give Panoply a try. It’s completely free for 14 days and setup is super simple. Just create an account, connect your HubSpot data, and run the SQL in the built-in workbench once your data’s loaded. You’ll be finding new insights in no time...and you’ll finally get why we just can’t help but squeal over a great SQL query.

Want to dive even deeper into HubSpot data? Check out our follow-up post on analyzing your email campaigns!