This is the third and final part in the 101 series covering big data concepts, terminology and technology.
When I joined Panoply.io I was clueless as to what data driven meant, much less big data and the technologies that drive it. I'd spent the last five years at companies that were more about go, go, go than why, when, where. That cluelessness was one of the driving factors in my decision to join this team. While I trust my gut, I wanted to learn how to objectively quantify and argue for those instincts. The distance between my aspirations and reality was acute. My coworkers can attest to this, as I'm sure that on more than one occasion they've suppressed face palms.
From my point of view, not only were data infrastructure and visualization not mutually exclusive, they were the same thing. I simply didn't know how to look beyond the cosmetics and see the engine underneath.
One should keep in mind that while analytics and data visualization are natural progressions of data infrastructure, neither requires the other to exist. Therefore a decision in favor of one does not require the other.
We are frequently asked how data warehouses are different from analytical tools such as Google Analytics and Mixpanel. The short answer is that they are two entirely different things, but let's explain this with an example. You're looking for a way to get around. You live in the city and don't need to cover any great distances. So what car should you buy? You don't buy a car, you buy a bike. A bike is more than enough for your needs, and with a car you'd be broke within a month from the parking fees alone. The same is true for niche analytical tools. If all you care about is your website traffic, then Google Analytics is more than enough, but if you want to analyze that data alongside in-store sales data, then you need something more sophisticated than an out-of-the-box plug-and-play system.
These niche analytical tools are powerful because they focus on exactly one vertical and can therefore assume common logic across their customers. For example, all Mixpanel customers will have mobile users, which can be segmented into new and returning users. Every single event monitored is within this environment, so the common logic holds true. The second you set foot outside of this environment, you'll need to rework the data.
This additional step is exactly why data warehouses exist. They are here to bridge the gap the minute you want to ask a question that is outside the limited environment of your plug-and-play analytical tool. For Panoply.io, tools like Google Analytics and Mixpanel are data sources. Replacing them is not our goal; enriching them is.
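To make the "rework the data" step a little more concrete, here is a minimal Python sketch of blending website traffic with in-store sales by date. Everything in it is an assumption made for illustration: the field names, the numbers and the `blend_by_date` helper are hypothetical, and this is not how any particular warehouse or analytics tool does it.

```python
from collections import defaultdict

# Hypothetical daily exports. Field names and numbers are made up for illustration.
web_traffic = [
    {"date": "2016-03-01", "sessions": 1200},
    {"date": "2016-03-02", "sessions": 950},
]
store_sales = [
    {"date": "2016-03-01", "store": "downtown", "revenue": 4300.0},
    {"date": "2016-03-01", "store": "airport", "revenue": 1700.0},
    {"date": "2016-03-02", "store": "downtown", "revenue": 3900.0},
]


def blend_by_date(traffic_rows, sales_rows):
    """Join web sessions and total in-store revenue on the shared date key."""
    # Sum revenue across all stores for each date.
    revenue_by_date = defaultdict(float)
    for row in sales_rows:
        revenue_by_date[row["date"]] += row["revenue"]

    blended = []
    for row in traffic_rows:
        revenue = revenue_by_date.get(row["date"], 0.0)
        sessions = row["sessions"]
        blended.append({
            "date": row["date"],
            "sessions": sessions,
            "store_revenue": revenue,
            # Avoid dividing by zero on days with no recorded sessions.
            "revenue_per_session": revenue / sessions if sessions else None,
        })
    return blended


blended = blend_by_date(web_traffic, store_sales)
for row in blended:
    print(row)
```

Neither source can answer "how much in-store revenue do we see per web session?" on its own. The question only becomes answerable once both live in one place and share a key, which here is the date.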
Let's say you love the color red. You see a red car. So you decide to buy it.
This statement is absurd. No one in their right mind goes out and buys a car because it's red. I would expect that it's pretty clear where I'm going with this… But let's go there anyway. When you look at a dashboard, you are looking at someone else's data. No one is promising you that when your data is connected to the system you will see that dashboard in exactly the same way. But don't take my word for it: here is a resource with multiple dashboards available for download to Google Analytics. Go ahead and connect them to your Google Analytics account. Once you've done that, take a look at the data and see if it makes sense. Now don't get me wrong, these dashboards are great. In fact, I use more than one of them. But none of them work out of the box. For example, if you look at the blog content you'll see that you need to filter out all of the pages that are not relevant for this dashboard. So even in the scope of a niche analytical tool environment, there is no getting around customization.
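As a rough idea of what that customization amounts to, here is a small Python sketch that keeps only blog pages before totalling views. The page paths, view counts and the `/blog/` prefix are all hypothetical. In practice you would set up the equivalent filter inside the analytics tool itself.

```python
# Hypothetical page-view rows. Paths and counts are made up for illustration.
page_views = [
    {"path": "/blog/what-is-a-data-warehouse", "views": 340},
    {"path": "/pricing", "views": 120},
    {"path": "/blog/etl-vs-elt", "views": 210},
    {"path": "/careers", "views": 45},
]


def only_blog_pages(rows, prefix="/blog/"):
    """Keep only the rows whose path starts with the assumed blog prefix."""
    return [row for row in rows if row["path"].startswith(prefix)]


blog_views = only_blog_pages(page_views)
total_blog_views = sum(row["views"] for row in blog_views)
print(total_blog_views)
```

The filter is trivial, but someone has to know that your blog lives under that prefix, and that knowledge is exactly what a downloaded dashboard doesn't have.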
So next time you get all googly-eyed about a dashboard, keep in mind that unless your business KPIs and metrics correlate exactly to the metrics in the dashboard, you're looking at customization work to get off the ground.
Big data is about the three V's: variety, velocity and volume. Data visualization is not a big data problem but rather a human problem. We are simply not capable of instinctively understanding data sets of the variety, velocity and volume that today's businesses are generating. Understanding the where, when and why of data has nothing to do with blending, speed and capacity. On the contrary, blending, speed and capacity have everything to do with how you will receive your where, when and why, but not the other way around.
There are amazing analysis and data visualization tools out there today, and they are answering a real need for businesses not only to understand their data but also to communicate those findings to the organization. That said, a business's decision to adopt data visualization and analysis technology has absolutely nothing to do with its data infrastructure. Each decision should be made independently of the other in order to ensure that each maximizes its potential.