Businesses need to deliver insights fast if not immediately. If this were not true, we would not be writing this post, you wouldn’t be reading it and over all, big data would be less of a diamond in the rough. That said, fast is a relative term and one that is not necessarily equivalent to efficiency.
Often, when an analyst is asked a question the first instinct is to rush to find the answer. When asked how many users opened your application in the past month, the answer is quite easy, all you need to do is count the unique users that opened the app in the past month. Three minutes tops and you have your answer. So far so good, but what will happen if the question is much more difficult to answer? The time that you will need to invest in it will definitely be more than a few minutes. Often, you will save your script and if you’re lucky, next time you’re asked the same or similar question you will only need to change a few parameters in order to answer the question. However (come on, you knew that was coming), if these types of questions are asked often it turns into a tedious task and the reality is that the questions will never be exactly the same so more and more often changing the parameters simply won’t cut it.
This methodology, or to be more exact lack of methodology, will result in you building many one-time reports where each script answers a specific question or a set of questions. To break this spiral and empower data consumers to ask and retrieve data on demand you need to think ahead and create relevant KPIs in advance. Why create a report/script that shows the MAU (monthly active users) of Nov. 15′ when you can create an aggregated table (as granularly as possible) that will answer this question and many more. Just to set a few things straight here’s an example:
Q: MAU Nov. 15’?
Q: MAU Oct. 15’?
Two scripts written so far. Only now you are asked the same questions per platform:
Q: MAU Oct – Nov 15’ by platform?
Now we know MAU per platform. Did you notice that the sum of both platforms by month is higher than the total MAU we’ve received in the previous table; Apparently some of your users are both iOS and Android users. This issue will reoccur the granular the data becomes; the deeper down we will dig the more often the cases where users can be in two different rows at the same time. We can solve this by saving several different levels of aggregation (in the same table/query or a few). The result table should look like this:
This way all the user needs to do is ask what was the MAU with the relevant parameters he needs. As long as we keep the KPI table granular your users will be able to get answers for any question they have so long as sufficient data is available. Obviously you can add to the KPI table more dimensions like: country, gender, city, etc.:
|Month||Platform||Country City||Gender||MAU Installs||Revenue|
All the data consumer needs to do now is query this KPI table and ask for the relevant combinations only. Note, that now your data consumer (CMO/ CEO/ product manager and so on) has the answers for the “simple” questions at the edge of their fingertips. This will free up a huge amount of time allowing you to focus on deep analysis, data mining and so forth.
So to conclude, with a single query we can now answer many questions and not only one. The MAU KPI created enables us to ask as many varied questions regarding this set of KPI (MAU of Nov. 15’, Average MAU on Android, male MAU in Oct. 15’ and many more). All we need to do is query this KPI table/ view and we will have all or most of the answers we are looking for. Obviously, this KPI table/ query/ view will require maintenance if a new dimension or metric is added to the data but on a day to day basis there is little to no work needed on this query. The end users will be able to use this query/ table/ view in any data visualization tools the company has.