Recently, Panoply held a webinar with Nathan Patrick Taylor, the CIO of the Symphony Post Acute Network. The focus of the webinar was on what Nathan calls the scalability paradox—the dilemma of focusing too early or too much on scalability, so much so that you derail a product's growth and it never reaches the size where it needs to scale.
Nathan offered a ton of wisdom for others trying to build a data stack and for analysts trying to build a career in data, so watching the full webinar is definitely recommended. But if you’re looking for the highlights, look no further:
- Defining the scalability paradox
- The road to building a data stack
- Doing predictive analytics with a team of two
- Advice for building a career in data
Defining the scalability paradox
Early in his career, when Nathan was working as an app developer, his team reached a point where it became clear that their code wasn't going to scale. But Nathan also realized that focusing prematurely on scalability could undermine the need for scalability—they wouldn’t need scalability if their product "never sees the light of day or if people are never going to use it."
That’s a common problem for startups, as they often find themselves trying to balance the time it takes to write scalable code and the need to build quickly and get the product in users’ hands.
Nathan said the same dilemma is present in the work that analysts do. At the beginning, scalability is far less important than getting something out there and getting feedback because there's no way to know in advance which analyses and reports are going to gain enough traction to need scalability. For example,
[I'll have] somebody email me, “When are you going to build this report, when is it going to be done?” And then I publish it and no one uses it. It just sits there, collecting digital dust. Other times, I build something on a whim because I have my feelers out.
For example, we did this geospatial analysis of some of our patients. No one asked for it, we just thought it was a cool thing to try. People loved it, and it ended up turning into a full-blown project.
The long, winding road to building a data stack
When someone gives a webinar on how they came up with the first-rate solution they're using today, the story often goes like this: First they built a solution that had serious problems, then they found an amazing tool or framework that overcame the problems, and so everyone lived happily ever after.
But the road to data success is often far messier than the cleaned-up version we're given. Nathan, by contrast, told the unvarnished story, and in the process illustrated the challenges of the scalability paradox.
Symphony started out by trying to build a traditional monolithic Microsoft data warehouse that captured all the complex data they had. After two attempts, they realized that building one data warehouse created a solution that might scale but was way too complex. So they switched to just building little data marts that were far easier to develop and "that ended up working out really well."
As the use of their data marts increased, eventually they ran into scalability issues. So, they switched to Snowflake. And as they did, they ran into a problem that Nathan thinks is one of the hidden issues of scalability for data teams: "if we build it and they come, is my estimate on how much data I'm going to need to move correct?" In their case, their estimate was way off. Instead of using 80-90 GB/month, they only ended up using 10 GB/month, which meant they were paying a lot for capacity they didn't need.
Eventually they realized that with Snowflake, "we were trying to kill a housefly with a nuclear bomb instead of a fly swatter," so they ended up going back to Microsoft's SQL Server. And then they found Panoply, which perfectly fit their need for speed of deployment and on-demand scalability.
With data infrastructure that fit their current needs, they rebuilt their data solution, using a strategy of "pick those winners first, those high-value targets that everyone's going to see." For Symphony, that meant starting with the one metric that mattered (OMTM): census, aka "heads in the beds." Next they tackled accounts receivable, invoicing, and aging, then worked their way down the list until they hit data that users need just once a month.
The value of this approach really paid off when the pandemic hit. For the first few months of the pandemic, their electronic medical records (EMR) vendor had no way to store infection control information. Symphony couldn't wait for the vendor to catch up, so they had users track the data in Excel spreadsheets. That would've been a huge pain to manage in a traditional data warehouse; with Panoply, they could easily import the spreadsheets and analyze the results in Power BI.
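The consolidation step here is conceptually simple: gather every facility's spreadsheet export, stack the rows into one table, and tag each row with its source before loading it into the warehouse. Below is a minimal, stdlib-only sketch of that pattern. The file names, column names, and sample values are all invented for illustration, and CSV files stand in for the Excel workbooks (the same logic applies either way; a real pipeline would read `.xlsx` with a library like pandas).

```python
import csv
import glob
import os
import tempfile

# Hypothetical layout: each facility submits an export with the same columns.
# We simulate two such files (CSV stand-ins for the Excel workbooks), then
# consolidate them into one table ready to load into the warehouse.
rows_by_file = {
    "facility_a.csv": [("2020-04-01", "A", 3)],
    "facility_b.csv": [("2020-04-01", "B", 1), ("2020-04-02", "B", 2)],
}

workdir = tempfile.mkdtemp()
for name, rows in rows_by_file.items():
    with open(os.path.join(workdir, name), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["report_date", "facility", "new_infections"])
        writer.writerows(rows)

# Consolidate: read every export, keep one set of column names, and add a
# source_file column so each row's origin survives the merge.
consolidated = []
for path in sorted(glob.glob(os.path.join(workdir, "*.csv"))):
    with open(path, newline="") as f:
        for record in csv.DictReader(f):
            record["source_file"] = os.path.basename(path)
            consolidated.append(record)

print(len(consolidated))  # 3 rows gathered from 2 files
```

The `source_file` column is the one design choice worth copying: once dozens of ad hoc spreadsheets feed a single table, being able to trace a bad row back to the workbook that produced it saves a lot of cleanup time.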
Symphony's "agile" data stack
Ultimately, Symphony ended up with what Nathan calls an "agile" data stack. By agile, he means a data stack that allows his team to quickly build prototypes and modify production systems. Their data stack included:
- Alteryx. They are big Alteryx fans because it allows you to move and transform data using a drag-and-drop interface. According to Nathan, you could just as easily use Talend, Paxata, or RapidMiner—any tool that allows someone who isn't a data scientist to quickly get up to speed in handling complex transformations.
- Panoply. Panoply allowed them to quickly build out a solution, easily scale up when needed, and reduced the need for a lot of data work they'd been doing in Alteryx.
- Power BI and Tableau. For most of their staff, Symphony used the cloud-based version of Power BI for their analytics and reports because it’s super easy to use and the licensing costs were already covered by the use of other Microsoft cloud products. But for staff who were comfortable slicing and dicing data, Tableau was a better tool for their in-depth work.
"Now we can do the cool stuff": Doing predictive analytics with a data team of two
Now that all the basics are working well, Symphony is beginning to explore predictive analytics using platforms like DataRobot, an automated machine learning platform. They pull their data from Panoply, then use Alteryx to move the predictions from DataRobot to their Power BI service.
So far, they've been able to build models for readmissions, falls and pressure ulcers, and operations-related issues such as turnover in a remarkably short amount of time. In less than two weeks, they built a model and a way to surface its results: a Power BI report that tells users whether a patient is likely to be readmitted.
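Stripped to its essentials, the readmission report is a scoring step: take each patient's predicted risk and flag everyone above a cutoff for frontline staff to review. Here's a tiny illustrative sketch of that last mile. The patient IDs, scores, and threshold are all made up; in Symphony's actual stack the scores come from DataRobot and reach Power BI via Alteryx.

```python
# Hypothetical risk scores keyed by patient ID. In the real pipeline these
# would be model predictions exported from DataRobot, not hand-written values.
scores = {"patient_001": 0.82, "patient_002": 0.14, "patient_003": 0.47}

# Assumed cutoff for "likely readmission"; real thresholds are tuned per
# model against the cost of false alarms vs. missed cases.
THRESHOLD = 0.5

# Flag every patient at or above the threshold for the report.
flagged = sorted(p for p, s in scores.items() if s >= THRESHOLD)
print(flagged)  # ['patient_001']
```

This matches Nathan's point about accuracy: the threshold doesn't need to be perfect, it just needs to surface the patients most worth a second look.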
They are also beginning to gear up for forecasting, which "is its own beast." Here's an example of the kind of issue they're addressing:
[We] want to know if this person's likely to have a fall in the next couple of days, and why is that? Is it because they have the wrong assistive device? Is it because they're on a medication that makes them dizzy? Are they too weak? Maybe they're trying to do too much before they've actually regained strength.
Nathan stressed that their goal isn't perfect accuracy; it's to get a better forecast of what's reasonably likely to occur and to enable frontline staff to act.
Advice for building a career in data
Nathan argued that fear is one of the biggest obstacles to building a successful career:
One of the things that I recognized as I started to move from position to position was to drop your fear of saying the wrong thing, drop your fear of putting yourself out there.
In particular, he suggested:
- Really listen in conversations with your colleagues. Nathan says it lets you "find opportunities in a nice and nonaggressive way" to become an asset in those conversations and to provide analyses that staff otherwise might not realize they need.
- Put yourself out there; it can open up amazing opportunities. Nathan started out as a developer, but when he realized that what most grabbed him was data work, he navigated his way into working on data full time. Then one day, his manager said to him, “I need you to build a data warehouse.” Nathan dove in, coming up with an estimate of the hefty amount of technical resources and staff needed to pull it off, then got the go-ahead. Then he scrambled to get up to speed—"[I] got a ton of books and did a ton of reading"—and hired the team to make it happen.
- Avoid overconfidence. Part of the way you overcome your fear is to "readily admit where you are weak." He says it’s ok not to have domain expertise about everything, but making clear where you’re coming from as an analyst and then working with stakeholders to figure out what they need is key.
- Actively learn from your mistakes. You have to be readily willing to admit when you've made mistakes and learn from them. Nathan said it was tough to go back to his team (and his boss) and say that switching to Snowflake was a mistake, but it was the right thing to do.
It’s rare for a webinar to provide an unvarnished view of reality, but that’s exactly what you’ll get with The Scalability Paradox. We’ve hit the highlights here, but there are plenty more nuggets of wisdom for you to find, whether it’s about when to use incremental loading vs. kill-and-fill or how to make a posture of humility work for you in the workplace.
You can check out the full webinar here, but we also recommend following Nathan on Twitter and subscribing to his YouTube channel so you don’t miss his takes on tooling, career tips, or tricks for taking your Power BI dashboards to the next level.