Google BigQuery is a serverless data warehouse solution that aims to be cost-effective and highly scalable, allowing businesses to analyze large amounts of data with speed and efficiency.
It also comes with some very useful SQL functions that make handling common business cases much more painless.
In this post, I'll walk you through a non-exhaustive list of nifty and helpful BigQuery SQL functions you should try.
I'll divide the functions into categories and provide the BigQuery syntax for each one, along with usage examples.
Let's dig in!
Without further ado, let's walk you through our list of functions.
To make things easier on you, I've categorized the functions into groups, such as "string" and "date and time."
Keep in mind that I won't cover very basic functions, e.g., those related to table creation. Instead, I'll try to focus on those that apply to common business problems you might have.
For instance, finding out the date of the last day of a month—or another time—is a fairly common need in business.
Another common need is dealing with large datasets that might contain invalid data—for instance, columns that aren't supposed to be 0 but are. In such cases, you'd probably want your operation to run without aborting, returning a "placeholder" value that indicates the areas in which there were issues.
String manipulation—including searching and replacing—is also a given when dealing with large data sets; that includes regex operations.
Also, keep in mind that what we're showing here is just the tip of the iceberg. BigQuery offers many more functions than the ones covered here, and you'll learn a lot by exploring them further.
With that out of the way, let's get started.
Handling time is an essential task for most business. So, let's start with some useful BigQuery functions related to time.
It's common that you'll need to find the date of the last day of a given period.
This is certainly more common when it relates to months; after all, "regular" months can have 30 or 31 days, and February has 28, gaining an extra day on leap years.
However, you may also need to get the last date of a different period, such as a week or a quarter.
BigQuery SQL has the LAST_DAY
function for that.
Its syntax is as follows:
LAST_DAY(date_expression[, date_part])
The date_part
parameter is optional. If you don't pass it, the default is MONTH
.
Here are some examples of the function in use; for each one, I'll show the function call, followed by its result.
select last_day('2021-07-24', MONTH) 2021-07-31
select last_day('2021-07-24') 2021-07-31
select last_day('2021-07-20', WEEK) 2021-07-24
select LAST_DAY('2021-07-22', WEEK(MONDAY)) 2021-07-25
It's easy to perform date and time arithmetic using the following functions:
Action | Function | Syntax |
Add dates | DATE_ADD |
DATE_ADD(date_expression, INTERVAL int64_expression date_part) |
Subtract dates | DATE_SUB |
DATE_SUB(date_expression, INTERVAL int64_expression date_part) |
Add a time | TIME_ADD |
TIME_ADD(time_expression, INTERVAL int64_expression part) |
Subtract a time | TIME_SUB |
TIME_SUB(time_expression, INTERVAL int64_expression part) |
Let's see the above functions in action using some examples.
select DATE_ADD(CURRENT_DATE(), INTERVAL 5 DAY) 2021-07-29
SELECT DATE_SUB("2015-10-21", INTERVAL 3 DAY) 2015-10-18
--WARNING! ERRORS ENCOUNTERED DURING SQL PARSING!
SELECT TIME_SUB("08:30:00", INTERVAL 10 MINUTE)
08:20:00
You can simply use the +
or -
operator and specify a number of days.
For example:
SELECT "2021-07-23" + 5 2021-07-28
select "2021-07-24" - 8 2021-07-16
Text manipulation is another crucial task businesses often require, so let's cover some cool BigQuery functions related to string and text.
You can easily concatenate strings in BigQuery SQL using the CONCAT
function:
SELECT CONCAT("SCHULTS", ", ", "Carlos"); SCHULTS, Carlos
However, using the CONCAT
function for many values can become cumbersome. Luckily, you can use the concatenation operator (||)
instead of using the function, like so:
SELECT "SCHULTS" || ", " || "Carlos" SCHULTS, Carlos
You can easily obtain the ASCII value of the first character of a word using the ASCII
function:
SELECT ASCII('ABCDE') 65
Note: An empty string results in 0
.
You often need to find the index of a value inside a string. The function INSTR
comes in handy for that. Its syntax is as follows:
--WARNING! ERRORS ENCOUNTERED DURING SQL PARSING!
INSTR(source_value, search_value[, position[, occurrence]])
])
It returns the index—based on 1—of the first occurrence of search_value
inside of source_value
.
The position parameter is optional; if specified, the function starts the search by that position.
Occurrence is also optional; if specified, the function searches for the nth
occurrence of the search value.
Let's look at some examples of it in action:
SELECT INSTR("Stay", "St") 1
SELECT INSTR("StackOverflow start Steel", "St", 1, 2) 21
Note: Bear in mind that this function is case sensitive.
There's an enhanced version of the previous function called REGEXP_INSTR
. It allows you to specify a regex (regular expression) as the search value:
select REGEXP_INSTR("test 123test", "[0-9]+[a-zA-Z]+") 6
BigQuery SQL offers many functions that support regex.
I'll cover another one: REGEX_REPLACE
. Here's its syntax:
REGEXP_REPLACE(value, regexp, replacement)
As its name and syntax suggest, you use it to replace a value that matches to the specified regex.
Let's look at an example:
select REGEXP_REPLACE("test 123test", "[0-9]+[a-zA-Z]+", "it works!") test it works!
Finally, let's look at some useful BigQuery math functions.
Sometimes, you want to perform a division, but you know or suspect your data set contains values that will result in errors.
For instance, you might end up attempting divisions by zero.
In such a case, it'd be helpful to carry on with the division, returning a special value—such as NULL
—instead of having an error that halts the whole execution.
That's what the SAFE_DIV
function is for:
select SAFE_DIVIDE(10, 0) null
Similar to the safe division operation, you might need to perform a safe multiplication.
For instance, your data set might contain very large numbers that would result in an overflow when multiplying.
By using the SAFE_MULTIPLY
function, you ensure you get NULL
instead of an error, allowing your operation to continue without aborting.
Google BigQuery is certainly an interesting solution for companies that need a multi-cloud data warehouse solution.
Besides its many valuable features, such as high availability, real-time analytics, and serverless architecture, Google BigQuery offers several functions that make your life easier, allowing you to handle common business problems in a less cumbersome way.
If you're interested, your organization can start using BigQuery right away, in the free tier.
And when you need a cloud data platform to connect all of your data and deliver it seamlessly to everyone inside your business, Panoply's got you covered.
As of June 2021, Panoply works on Google BigQuery, offering great features and support at an affordable price. You can even try Panoply free for 14 days and get a personalized demo for your needs.
Thanks for reading, until next time!