This article will cover how to work with time series/datetime data in Redshift.
Time series data holds the date and/or time information and is different from string or numeric data since it has components like day, month, year, hours, etc., that we might want to access/compare/modify.
For example, we might want sales in the month of May or to find the difference in two dates as the number of months.
Redshift is a database provided as a part of AWS (Amazon Web Services), based on the Postgres database, but with some differences. Like most other relational databases such as MySQL, SQL Server, Oracle, DB2, etc., Redshift can handle and manipulate time series data.
Let's see how.
Firstly, let's look at which data types for time series are supported by Redshift and what kind of data they can hold.
- DATE: Dates without the time part
- TIME: Only the time part
- TIMEZ: Time with time zone info (since time in New York is behind that in Tokyo, for instance, time zones reflect this difference)
- TIMESTAMP: Includes both the date and time
- TIMESTAMPZ: Timestamp along with the time zone info
For more information, check out the AWS documentation.
Fetching datetime results
Now we know the different data types you can get for time series. But how do we get the results of those types of queries? Does it always have to come from data stored in columns?
No, we can get datetime results in two ways:
- Selecting a datetime column in the query
- Using a function to create a datetime from some other input
Let's see an example query.
Current datetime example
select current_date today, current_timestamp now, current_time time_now, to_date( '28-10-2010', 'DD-MM-YYYY') today_from_str, current_date + 1 tomo
We've used Redshift's built-in functions to get today's date and time in the example above.
TO_DATE to create a date from a string, and we used date arithmetic to get tomorrow's date from today. Note how the "now" column has both the date and time.
GETDATE can also be used for the current timestamp, with the caveat that
SYSDATE returns the timestamp for the current transaction.
Working with time zones
We saw that the type
TIMESTAMPZ above that can hold time zone info, but what is that, and why is it needed?
Because of the earth's rotation, X a.m./p.m. of any given day can be different in different locations around the world; we have time zones to capture this info.
For example, when it's sunrise in Japan, England is still in the dark, waiting for the earth's rotation to bring it to face the sun. Thus Japan and England are in different time zones.
TIMESTAMPZ data type captures the time zone along with the timestamp. If the time zone is not specified in a
TIMESTAMPZ type, it's assumed to be UTC.
So first, how do we get a
Either we populate a
TIMESTAMPZ column with a string literal containing the time zone (e.g., '2021-10-31 13:00:00 IST'), or we use the
TIMEZONE function to create a
TIMESTAMPZ value from a timestamp. In the below example, we have specified EST as the time zone.
select TIMEZONE( 'EST', TIMESTAMP '2021-10-31 10:00:00')
The result is being displayed as UTC. UTC is ahead of EST by five hours, so the UTC time is 10+5=15.
Let's say my boss in the US wants me to attend an online meeting at 7 p.m. on Friday, October 31, 2021. What time will it be in India?
CONVERT_TIMEZONE function converts from one time zone to another, and here we want to convert from source EST to target IST. Unfortunately, it seems that Redshift is confusing IST with Israel time, so I'll also have to use the Asia string.
select CONVERT_TIMEZONE( 'EST', 'Asia/Kolkata', TIMESTAMP '2021-10-31 07:00:00 PM')
Oh no! I have to get up at 5:30 a.m. on a Saturday? :(
Creating sample data and examples
Next, let's now create some data for running more queries.
In this scenario, we'll say we have a business, and we're looking at how many of our products get sold within certain time frames.
Let's assume for the examples below that we have a table "sales," which has columns titled as follows: product_name, sale_date(timestamp), and amount(decimal). Then, we can create the sample data as shown below:
create table sales (id integer, product_name varchar(100), purchase_date timestamp, sale_date timestamp, amount integer); insert into sales values (1, 'Kellogs', null, to_timestamp('2021-10-29 07:00:00', 'YYYY-MM-DD HH:MI:SS') - interval '12 weeks', 20); insert into sales values (2, 'Kellogs', null, to_timestamp('2021-10-29 08:00:00', 'YYYY-MM-DD HH:MI:SS') + interval '2 hours', 10); insert into sales values (3, 'Kellogs', null, to_timestamp('2021-10-29 08:00:00', 'YYYY-MM-DD HH:MI:SS'), 10); insert into sales values (4, 'Milk', null, to_timestamp('2021-10-29 08:00:00', 'YYYY-MM-DD HH:MI:SS'), 5); update sales set purchase_date = dateadd( months, -id, sale_date);
TO_TIMESTAMP to parse the date from the string. HH24 should be used for hours when using the 24-hour time format.
Note the use of the
INTERVAL keyword to perform date arithmetic using specific time intervals.
Redshift may not support month, year in intervals in some contexts. In addition, we've populated the purchase_date column by subtracting the ID number of months from the sale_date, using the
It's also possible to generate such dates by adding the result of the
ROW_NUMBER function to some initial date.
Sales in the morning example
Using the sample data we created above, let's learn some more information about the products.
For example, to find out which products are sold in the morning hours between 8 a.m. and 10 a.m., we could write the following:
select product_name, sum(amount) from sales where date_part( 'hour', sale_date) between 8 and 10 group by product_name order by 2 desc;
Here we're using the
DATE_PART function to extract the hour part. Note that using a function on an indexed column means the index cannot be used. Instead, a separate function index will have to be built on the expression.
So, in the above example, even if the
sale_date column has an index on it, it won't be used, as we are not using
Sales in the current quarter example
To query for sales in the current quarter, run the following:
select * from sales where date_part( 'quarter', sale_date) = date_part( 'quarter', current_date) and date_part( 'year', sale_date) = date_part( 'year', current_date)
Here, we compare both the quarter and the year of the sale date with those of the current date.
First day of the month example
To get sales from the first day of the month, we can use the
It will truncate the date/time parts that are less significant than the specified time part. Here, years are the most significant, and microseconds are the least significant.
In the below case, if we truncate till months, then the days, hours, and seconds parts that are less significant than the month will all be cleared/truncated. Similarly, we can get the first day of the year or the first day of the week.
There is a separate
LAST_DAY function that returns the last day of the month for a given date.
select date_trunc('month', sale_date ) first_day, last_day(sale_date) from sales;
We've seen some ways of working with dates. For example, adding two dates usually doesn't make sense (e.g., add 01-Jan-19 to 02-Feb-20). So usually, we add a date and a time interval (e.g., add a day or week or month or year to a given date).
We've already seen the use of the
INTERVAL keyword above to add/subtract time intervals to/from dates. You can also use the
DATEADD function for this purpose. The advantage it has over the
INTERVAL keyword is that the value of the interval to be added can be a dynamic expression involving columns, etc.
DATEADD can be used for subtraction also, by using negative time intervals.
Generating all days of the month example
The below query uses
ROW_NUMBER to generate all dates of a given month (in this case, November).
For this purpose, we need a table that is sure to have at least 31 rows for the max of 31 days a month can have. So we use a system table
PG_ATTRIBUTE_INFO, which contains info about all columns and has many rows for system table attributes.
Here we have used
EXTRACT as an alternative to
DATE_PART. We have used common table expressions to be able to reuse the queries.
DATE_TRUNC is used to find the first day of the month.
with mdays as ( select top 31 row_number() over () rnum from pg_attribute_info), ldt as (select last_day( to_date('01-11-2021','DD-MM-YYYY')) lday ) select rnum, dateadd(day, rnum -1, date_trunc('month', lday)) from ldt, mdays where rnum <= extract( days from lday) order by rnum;
An alternative approach using the
GENERATE_SERIES function from Postgres is given below.
Though it works on Redshift, it's not officially supported per the documentation.
with ldt as (select last_day( to_date('01-11-2021','DD-MM-YYYY')) lday ) select dateadd(day, generate_series-1, date_trunc('month', lday) ) from generate_series(1,(select convert( int,date_part(day,lday)) from ldt),1), ldt;
After the addition of dates, how about subtraction? If we subtract one date from another, the result is a time interval.
While it's possible to simply subtract date1 from date2, we usually want to know the difference between dates in a particular time interval (e.g., days, months, or years). You can use the
DATEDIFF function for this purpose.
On the other hand, if we subtract an interval from a date, the result is a date. For this purpose, we can use the
Sales for the last 3 months example
select sum(amount) from sales where sale_date > current_date - interval '3 month'
Note that we could also use the
DATEADD or the
ADD_MONTHS functions instead of
Sale is 2 months later than the purchase example
select * from sales where datediff( month, purchase_date, sale_date ) > 2
select * from sales where sale_date > dateadd( month, 2, purchase_date )
We have explored various ways of working with datetime data in Redshift.
- Date arithmetic
And suppose you're looking for ways to manage this data in Redshift. In that case, the Panoply can load data from Amazon Redshift, amongst numerous other options to integrate with data sources and BI tools. So you can bring together and analyze data from various sources.
It can also perform transformations on that data with its low-code ETL tool. As they put it, "Everyone wins when data flows freely."
You can even request a free demo!