In this blog post, I will introduce you to the basics of using CASE statements/expressions. This post will also cover comparison operators (greater than, less than, equal to) and BETWEEN. My goal is to present the material in a way that is accessible to everyone, even a newbie. CASE statements are a pretty simple way to up your SQL game and are useful for returning conditional values.CASE statements
I’m actually really excited to write about CASE statements.
If you’ve ever worked in a school setting, have a child or a pet,
then CASE statements will be your jam.
Are you familiar with first/then, if/then or when/then? Great, you already get CASE statements. CASE statements are essentially the “case when” certain conditions are met.
Here’s a non-code example. My daughter wants to paint, but there are puzzle pieces all over the floor. I might say to her, “when you’ve put away the puzzle pieces, then you can paint.” Once the first condition is met, the second condition can occur. This CASE statement would go something like:
SELECT CASE WHEN puzzle_pieces = 'put away' THEN 'let’s paint!'
ELSE 'no painting'
END AS paint_status
Image by congerdesign from Pixabay
Your conditions might be straightforward, like in the example above. They could also contain comparison operators or time intervals (we’ll get to that in a later post). The ELSE statement is optional. You’ll always finish your CASE statement with END.
CASE statement structure (in plain English):
SELECT (always begin with a SELECT statement)
CASE WHEN (what are the conditions?)
THEN (when those conditions are met (or not), what will happen?)
(ELSE) (if you don’t include ELSE, the results outside of the conditions you specified will return null)
END (this ends your case statement)
AS (this is the name of the column you are adding to the table)
FROM (the name of the table you are pulling data from)
(GROUP BY) (optional)
(ORDER BY) (optional)
Now that you know the basic structure, let’s move on to Comparison Operators.
Comparison operators will return a boolean type; this is a fancy way to say TRUE or FALSE.
Here is a table of comparison operators and their descriptions.
less than or equal to
greater than or equal to
<> or !=
Table source: https://www.postgresql.org/docs/current/functions-comparison.html
Note: You can SELECT arithmetic computations without referencing a specific table. In this situation, it is helpful for me to think of SELECT as “print” in other coding languages.
SELECT 7 < 2 AS output
7 is not less than 2, so the boolean value is false.
SELECT 7 > 2 AS output
7 is greater than 2, so the boolean value is true.
Here is an example of a CASE statement (with a Comparison Operator) using data from my Twitter account:
SELECT retweet_count, CASE WHEN retweet_count > 0 THEN 'yes' ELSE 'no' END AS retweets FROM public.twitter_tweets
“Retweet_count” is a field (column) already populated by Twitter, found in the “twitter_tweets” table. The CASE statement in the example states that whenever a row (instance) was retweeted (the retweet_count was greater than 0), “yes” should be printed under the new column called “retweets”. If retweet_count is 0, then “no” should be printed.
To check your work, you could also SELECT “retweeted” for reference. “Retweeted” is a field that demonstrates whether or not a post was retweeted (same as our CASE statement) with a boolean data type. Selecting this column would show us that our CASE statement is doing what we want it to, because yes = true and no = false for each instance.
Want to try this with your own data?
Here are step-by-step instructions for how to integrate your Twitter data into Panoply in a matter of minutes. This is a fun way to learn/practice.
Similar to a comparison operator, BETWEEN is actually considered a comparison predicate. You can use BETWEEN to return a result that falls between specified values.
Let’s pretend I have a table called “employee_info”, and I want to return a list of my “millennial” employees and their birth dates. I might run the following query:
SELECT employee_name, birth_year, birth_date FROM employee_info WHERE birth_year BETWEEN 1981 AND 1996
I could also use:
SELECT employee_name, birth_year, birth_date FROM employee_info WHERE birth_year >= 1981 AND birth_year <= 1996
Both queries will return the same result because they are essentially saying the same thing in a different way. It’s like saying “what are all of the integers where x is between 5 and 10?” The answer would be: 5, 6, 7, 8, 9, 10. Another way to ask the same question would be, “what are all of the integers where x >= 5 AND x<=10?” and the answer would be... the same!
Ok, let’s put it all together in one final example:
SELECT retweet_count, CASE WHEN retweet_count BETWEEN 1 AND 10 THEN 'boring' WHEN retweet_count BETWEEN 11 AND 20 THEN 'interesting' WHEN retweet_count > 20 THEN 'on fire!' ELSE 'no retweets' END AS retweets FROM public.twitter_tweets ORDER BY retweet_count
Notice that there are no commas after each WHEN statement, but there is a comma after “retweet_count” (separating columns).
A little about ORDER BY (bonus section)
ORDER BY retweet_count and ORDER BY 1 (1st column) would have returned the same result. The default for ORDER BY is ascending order. If you wanted your results in descending order, you could write “ORDER BY retweet_count DESC”. You could also ORDER BY “retweets” or “2”, but your results would be in alphabetical order since the values are a String data type.
Until next time...
Thank you for joining me on this journey to learning SQL. Hopefully this post helped you gain a deeper understanding of CASE statements as well as the practical tools to put them to use! Boolean expressions and comparison operators will continue to prove their utility beyond their functions within CASE statements. Until next time… Happy query-ing!
At Panoply, we love to learn and grow. If you have questions or have suggestions for how to improve this blog post, we want to hear from you! Contact us here.