When I started my SQL journey not too long ago, I diligently completed interactive tutorials and to be honest, thought I had a grasp on this thing. Fast forward a couple of months, I tried to write my own queries on “real” data and… BAM! A brick wall. I spent a significant amount of time banging my head on it until finally the whole “formatting dates” thing started to click. I realized that once you can format dates properly, you can sort your data in all kinds of interesting ways.
My intention for this blog post is to create a guide that is easy to understand (even for a beginner) and to include queries you can copy and paste into your SQL editor right now. Ultimately, I hope to help you “see the light” with minimal brain damage.
As you may know, there are many different “flavors” of SQL. For Example: Microsoft SQL Server, MySQL (open source) and PostgreSQL (open source). The information in this blog post will be specific to PostgreSQL, but many of the basic commands are the same in all varieties. If you’re not sure which version of SQL you are using, try running: SELECT version()
in your SQL editor.
Let’s Get Started
If you’re feeling ready to dive in and want to run the example queries on your own, here is one way to get started:
- Sign up for a free account on Mode.com
- Sign up for a Panoply.io free trial
- Connect your data to Panoply (in minutes)
- Login to Mode > Click the circle with your username in it > ‘Connect a Database’ > Amazon Redshift > Enter the credentials from Panoply “Connect” screen > Start writing your queries in the Mode editor!
“Big ideas” about Dates
There are many ways to format dates.
For example, today is October 7, 2019. It is also 10-7-2019, 10-07-2019, 10/7/2019, 10/07/2019, 10/7/19, 2019-10-07, 07-10-2019 and Oct-7-2019. You get it. Tomato, to-mah-to. However, when writing queries in SQL, all dates are not created equal. Therefore, you may have to manipulate the format of your dates in order to use them in your queries and/or to take advantage of date functionality. It may seem overwhelming at first, but truly, it’s all a matter of syntax (rules that define the correct sequence of symbols).
Year-First
Although you “can” format dates in a variety of ways, you should format year-first in SQL (yyyy-mm-dd). Why? So that when you use the ORDER BY
function, they will be chronological.
Timestamps/Time zones
In addition to storing date information, timestamps also include time data. “Timestamp” does not include time zone data, but “timestamptz” does. Timestamptz converts and stores time values in UTC by default.
What is UTC? Glad you asked. UTC stands for Coordinated Universal Time. Yes, the acronym is jumbled. UTC isn’t exactly a time zone, but it is a standard (has an offset of 00:00 and no concept of Daylight Saving Time) and as such, it makes sense to store your time data this way in a database. To learn more about UTC/time, check out Zac Holman’s post about it. Note: It may blow your mind. If you need or want to convert time zones, PostgreSQL Tutorials can help you learn how to do that.
Bottom line, take some time to think about what you really want to do with your data. If you don’t actually need to know the time, stick with date. Time can get messy.
Dates can be represented in different data types
- Date - date data only (no time data)
- Integer- time can be represented as the total number of seconds from the “Unix Epoch” on January 1, 1970 (UTC). This is known as a “unix timestamp.”
- Interval - represents a period of time (ex. ‘1 week’)
- String - dates can be represented as a string for presentation
- Timestamp - contains date and time data, no time zone data
- Timestamptz - contains date and time data, with time zone data (UTC by default)
Helpful tips:
- If you’re performing arithmetic, the data types generally need to be the same or at least compatible (ex. you can subtract a date from another date, but you’d need to cast a string as a timestamp in order to subtract it from another timestamp).
- The data type of your results will depend on how you are using the dates/data types (ex. date + interval = date).
SELECT current_date
returns a date value (date data type) for the current date.SELECT NOW()
returns a timestamp with time zone, equivalent to current_timestamp (depends on the database server’s time zone settings).SELECT NOW()
at time zone 'utc' returns a timestamp using the standard UTC.
Converting a Date to a String
There are several ways to convert a date to a string, but we’ll stick with TO_CHAR
for now. You may want to do this for presentation. If you want to use date functions, you can turn a properly formatted string into a date.
Note: The data used in the following examples is from Panoply’s Facebook account, but you can use whatever data and metrics make the most sense for you.
TO_CHAR
The TO_CHAR()
function converts a value to a string (might be a date, timestamp, interval or other data type). You’ll always start with a SELECT
statement. Next, you’ll write your function, in this case, TO_CHAR
. Inside parenthesis, you’ll input your “expression” (which is your date and/or time value) followed by a comma and then, inside single quotes, the format that you want the output to be in (will be a string data type).
Syntax: TO_CHAR(expression, format)
Example: Use TO_CHAR()
to calculate the number of page impressions by month. This query takes a column with date values, formats the month as a string value in a column called “month”, sums the total number of page impressions by month and orders them by month.
SELECT TO_CHAR(date, 'mm') AS month, SUM(page_impressions) FROM facebook_pages GROUP BY month ORDER BY month
Output:
Here is another example. Let’s pretend I want my output to show the year that it is right now, and I want it to be a string datatype. I could use the following:
SELECT TO_CHAR(now() :: timestamptz, 'yyyy')
Output:
Resource: Check out AWS Redshift’s full documentation to see a list of options for how you can format your date (ex. YYYY, MM, DAY, Day).
EXTRACT
The EXTRACT()
function retrieves a field such as year, month, day, etc., from a date or time value. Why is this useful? Well, your data might contain a column with a date for each individual page impression. However, you really only care about which day of the week has the most page impressions. It would take a while to calculate that using the dates as they are. Luckily, you can extract the day of the week (‘dow’) from the date and then look for the day of the week that has the most page impressions (sum). I realize that page impressions are not necessarily the most meaningful metric, but I’m using it just for the sake of an example.
Syntax: EXTRACT(field FROM source)
Field Value |
TIMESTAMP |
Interval |
CENTURY |
The century |
The number of centuries |
DAY |
The day of the month (1-31) |
The number of days |
DECADE |
The decade that is the year divided by 10 |
Sames as TIMESTAMP |
DOW |
The day of week Sunday (0) to Saturday (6) |
N/A |
DOY |
The day of year that ranges from 1 to 366 |
N/A |
EPOCH |
The number of seconds since 1970-01-01 00:00:00 UTC |
The total number of seconds in the interval |
HOUR |
The hour (0-23) |
The number of hours |
ISODOW |
Day of week based on ISO 8601 Monday (1) to Saturday (7) |
N/A |
ISOYEAR |
ISO 8601 week number of year |
N/A |
MICROSECONDS |
The seconds field, including fractional parts, multiplied by 1000000 |
Sames as TIMESTAMP |
MILLENNIUM |
The millennium |
The number of millennium |
MILLISECONDS |
The seconds field, including fractional parts, multiplied by 1000 |
Sames as TIMESTAMP |
MINUTE |
The minute (0-59) |
The number of minutes |
MONTH |
Month, 1-12 |
The number of months, modulo (0-11) |
QUARTER |
Quarter of the year |
The number of quarters |
SECOND |
The second |
The number of seconds |
TIMEZONE |
The timezone offset from UTC, measured in seconds |
N/A |
TIMEZONE_HOUR |
The hour component of the time zone offset |
N/A |
TIMEZONE_MINUTE |
The minute component of the time zone offset |
N/A |
WEEK |
The number of the ISO 8601 week-numbering week of the year |
N/A |
YEAR |
The year |
Sames as TIMESTAMP |
Source for table: http://www.postgresqltutorial.com/postgresql-extract/
Example: Use EXTRACT()
to calculate the average number of page impressions by day of the week. This query calculates the average number of page impressions by day of week (Sunday=0) and then lists them in descending order (the day of week with the most page impressions, on average, is at the top). Pretty neat!
SELECT EXTRACT('dow' FROM date) AS day_of_week, AVG(page_impressions) FROM facebook_pages GROUP BY day_of_week ORDER BY AVG(page_impressions) DESC
Output:
DATE_TRUNC
The DATE_TRUNC()
function truncates (shortens) a timestamp or an interval value based on a specified date part (hour, week, month, etc.). Everything to the “right” of the datepart you selected will be “blank” or go back to the beginning (in other words, if you truncate your query at year, then the month, day and time will “reset” to 01-01 00:00).
Syntax: DATE_TRUNC('datepart', field)
Datepart can include |
Field can include |
millennium |
Timestamp(tz) or interval |
century |
Timestamp(tz) or interval |
year |
Timestamp(tz) or interval |
quarter |
Timestamp(tz) or interval |
month |
Timestamp(tz) or interval |
week |
Timestamp(tz) or interval |
day |
Timestamp(tz) or interval |
hour |
Timestamp(tz) or interval |
minute |
Timestamp(tz) or interval |
second |
Timestamp(tz) or interval |
milliseconds |
Timestamp(tz) or interval |
microseconds |
Timestamp(tz) or interval |
Note: You can use a date for the field value; it will be cast to timestamp with time 00:00.
Example: Use DATE_TRUNC()
to calculate how many days you are into this quarter. This query subtracts the beginning of this “quarter” from the current date.
SELECT CURRENT_DATE - date_trunc('quarter', CURRENT_DATE) AS days_into_this_quarter
Output:
Converting a String to a Date
CAST date or timestamp
“Cast” means to convert a value from one data type to another. You may use this to convert a string to a date.
Syntax: CAST(expression AS target_type)
SELECT CAST('10-07-2019' AS date)
Output: 2019-10-07 00:00:00
SELECT CAST ('10/07/2019' AS date)
Output: 2019-10-07 00:00:00
SELECT '2019-10-11' :: date
Output: 2019-10-11 00:00:00
TO_DATE
TO_DATE converts a string to a date value. Format here means the input format. You need to convert a string to a date if you want to use any of the date functions or perform arithmetic with another date value or interval.
Syntax: TO_DATE(text, format)
SELECT TO_DATE('10-07-2019', 'mm-dd-yyyy')
Output: 2019-10-07 00:00:00
PostgreSQL Date Functions - “Lightning Round” examples
Function |
Example |
Output |
---|---|---|
CURRENT_DATE |
SELECT CURRENT_DATE |
date 2019-10-11 00:00:00 |
CURRENT_TIME |
SELECT CURRENT_TIME |
timetz 15:40:58 |
CURRENT_TIMESTAMP |
SELECT CURRENT_TIMESTAMP |
timestamptz 2019-10-11 15:41:43 |
EXTRACT |
SELECT EXTRACT('EPOCH' from '2019-10-11'::date) |
date_part 1570752000 |
LOCALTIME |
SELECT LOCALTIME |
time 15:49:34 |
LOCALTIMESTAMP |
SELECT LOCALTIMESTAMP |
timestamp 2019-10-11 15:50:21 |
DATE_PART |
SELECT DATE_PART('year', date '2019-10-11') |
Pgdate_part 2019 |
DATE_TRUNC |
SELECT DATE_TRUNC('year', date '2019-10-11') |
date_trunc 2019-01-01 00:00:00 |
NOW |
SELECT NOW() |
now 2019-10-11 15:56:08 |
NOW (COORDINATED UNIVERSAL TIME) |
SELECT NOW() at time zone 'utc' |
timezone 2019-10-11 15:59:15 |
TO_DATE |
SELECT TO_DATE('2019/10/11', 'yyyy-mm-dd') |
to_date 2019-10-11 00:00:00 |
TO_TIMESTAMP |
SELECT TO_TIMESTAMP('2019/10/11 11:09:03', 'yyyy-mm-dd hh:mi:ss') |
to_timestamp 2019-10-11 11:09:03 |
Thank you for taking the time to read through this guide. My intention is to equip you with the tools you need to format dates in SQL so you can start analyzing your data! There are some really great, interactive resources for getting started with SQL (such as Mode Analytics SQL tutorials). If you’re like me, the best way to learn is by practicing with your own data.
At Panoply, we love to learn and grow. If you have questions or have suggestions for how to improve this blog post, we want to hear from you! Contact us here.