Panoply Blog: Data Management, Warehousing & Data Analysis

Instagram User Analytics: Getting Started with SQL Queries

Written by Peter Weinberg | Sep 23, 2019 5:52:00 PM

This post has been updated to reflect changes to Instagram's API (Facebook's Graph API). You will now need to switch to an Instagram Business account to perform this analysis in Panoply.

Even though Instagram isn’t one of the original social media platforms (shouts out Friendster and Myspace), it has become one of the most dominant social media companies out there, especially among younger people who aren’t interested in hearing about how windmills cause cancer from their Grandma on Facebook. To put it in more concrete terms, 500 million people use the app daily, including 64% of 18-29 year olds, so it shouldn’t be a surprise that an effective Instagram strategy has become a cornerstone of contemporary marketing efforts. It’s not just about buying ads on Instagram, though. 80% of Instagram accounts follow a business on Instagram, which means that, as a company, you can get a lot of exposure simply by setting up an account and building up a base of followers. 

 

Of course, if you’re here reading about Instagram SQL queries at Panoply, you’re probably a data-driven or data-curious company, so you’ll be wondering how to use your Instagram data to hone your strategy for the best possible outcomes. Today we’re going to walk through some common questions you might want to ask using your Instagram data and how to access the relevant data using Panoply. 

 

Instagram data on Panoply

First off, let’s talk about where your Instagram data ends up when you use Panoply’s native connector to collect it. If you do the standard Instagram data collection using Panoply, you’ll get a set of tables that looks like this: 

 

  • instagram_graph_user 
  • instagram_graph_comments
  • instagram_graph_media
  • instagram_graph_media_caption
  • instagram_graph_media_images (plus some very similar tables containing various resolution levels) 
  • instagram_graph_media_user

 

We won’t cover all of these tables today, but if you want to know more, you can always check out our documentation, which goes through the elements you’ll find in each table. For the purposes of what we’re talking about today, though, we’re going to focus on the instagram_media, instagram_comments, and instagram_user tables. Briefly, these tables contain these distinct types of data: 

 

instagram_media: contains the like counts, comment counts, type of media, and other data about individual posts

 

instagram_comments: contains the text of the comments left on posts, the user who left the comment, and the time the comment was created 

 

instagram_user: contains follower counts and other summary information about the user’s account 

 

Instagram metrics and SQL queries

So now that you have some basic familiarity with the data you’ll find after using Panoply to collect your Instagram data, let’s look at a few metrics you might want to dig into, and how to generate them using SQL. We selected some of the most insightful queries from submissions to our Instagram contest back in April, and we're going to go over them in depth. We’re going to cover each query individually over the next couple days, so make sure to keep an eye on this series. 

 

Best-performing type of Instagram media 

Although it started as a photo-sharing service, Instagram now allows users to upload multiple types of visual media: images, multi-picture carousels, videos and stories. Each of these types of post can be engaged with in its own way, but some types of post may perform better with your audience than others. On average, videos have a 38% higher engagement rate than other post types, but your followers might be different! Luckily, you now have the ability to see what’s doing the best in your feed. 

 

Try this SQL query from Ka's submission to our Instagram challenge. It's for analyzing the performance of different types of Instagram media: 

SELECT media_type, 
  SUM(like_count) as total_likes, 
  AVG(like_count) as avg_likes, 
  SUM(comments_count) as total_comments, 
  AVG(comments_count) as avg_comments, 
  COUNT(distinct id) as nums_of_post 
FROM public.instagram_graph_media 
  /* Make sure to enter the exact name of the instagram table as it appears 
in your Panoply instance */
GROUP BY media_type

This will return a set of results with the average and total likes counts, comments counts and number of posts organized by media type, and you can plug these results into your favorite data visualization tool to generate a chart for the visual learners on your team. 

 

Before you run off to paste this query into your SQL workbench, though, let’s take a minute to look a little bit more closely at the code. What we’re doing is basically creating a new table that summarizes the much larger instagram_media table by creating 6 columns: 

 

  • media type (in the code as type)
  • total likes (SUM(likes_count))
  • average likes (AVG(likes_count))
  • total comments (SUM(comments_count))
  • average comments (AVG(comments_count)
  • total number of posts by type (COUNT(distinct id))

 

But the aggregation functions (SUM(), AVG() and COUNT()) won’t work without the GROUP BY statement at the end of the query, and because we’re looking to get information about the three different media types, we’ll use type to aggregate (GROUP BY type).  

 

Because we piled so many different summary columns into this new table, you’ll be able to make a number of different charts using the results of this query: 

 

  • Total counts of each post type 
  • Average engagement for each post type (likes and/or comments) 
  • Total lifetime like/comment counts for each post type

 

If you liked this mini-guide, check out the rest of the series!