Learning SQL and a Smattering of Python
By Scott Burns
I want to let you in on a little secret: our Product, Strategic Resources Group (SRG), and Data Services team members are the real healthcare experts here at Stratasan.
Our developers are pros at what they do: running the data warehouses and computing infrastructure our applications use to store and analyze complex healthcare datasets. But we rely heavily on the expertise of our team members to prioritize what needs to be developed within our applications.
Based on their direction, we learn what our customers are asking for and then determine what new features and products to build. Working hand in hand, our teams create useful tools that can derive insightful meaning from healthcare data.
Recognizing the need to close the gap between our development know-how and their healthcare insight, we’ve taken steps to improve the flow of communication between our teams and proactively facilitate interdepartmental collaboration. Our goal is to decrease the time it takes to get a product to market so we can more efficiently address the needs and goals of our partners.
In this post, we’ll discuss the steps we’ve taken to tighten our processes and provide a platform where the best thinking from every team can be fully utilized. We hope that after reading this, you’ll pick up a few ideas on how to improve your internal workflows and increase cross-department teamwork.
Our products operate at varying distances from the raw underlying data. Our Blackbird tool is very close, allowing users to define specific extracts of the data, though it requires know-how about what columns exist in the data and how to make them meaningful. Canvas is slightly farther away in that it builds reports which may be the output of multiple queries. Launch and the other Pathways applications are a step further still from the raw data, but they add substantial visualization capabilities on top.
Historically, our SRG team members have built new products and deliverables by querying for raw data through Blackbird, loading the generated CSV into Excel, and building analyses from there. Excel does a lot of things correctly, but this process has a few pitfalls. First, unless you're deeply in tune with the data, it can take a few different Blackbird queries to pull correctly shaped data. Second, there is no version control in Excel, so it can be difficult to tweak or recreate a particular analysis.
To overcome these issues, we've provided these teams with tools that have direct access to our underlying data warehouses, including direct SQL access and Jupyter Notebooks. If you're unfamiliar with Jupyter Notebooks, they are web-based documents that contain live code, equations, visualizations, and narrative text. By pulling and analyzing data from within a Jupyter Notebook, our analysts can build more reproducible and easy-to-follow analyses, making it easier to understand what we deliver to a client. Some of our newer teammates bring years of SQL experience, so providing them read-only access to a copy of production data also lets them answer questions more quickly.
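To make the notebook workflow concrete, here is a minimal sketch of the pull-and-analyze pattern described above, using pandas and SQLAlchemy. The table name, columns, and sample data are invented for illustration, and an in-memory SQLite database stands in for the warehouse connection so the example is self-contained.

```python
import pandas as pd
from sqlalchemy import create_engine

# In practice this URL would point at a read-only copy of the data
# warehouse; an in-memory SQLite database keeps the sketch runnable.
engine = create_engine("sqlite:///:memory:")

# A hypothetical "encounters" table with made-up sample data.
sample = pd.DataFrame({
    "facility": ["A", "A", "B"],
    "visits": [120, 80, 200],
})
sample.to_sql("encounters", engine, index=False)

# The analyst's side: pull correctly shaped data with SQL, then
# continue the analysis in pandas instead of exporting a CSV to Excel.
df = pd.read_sql(
    "SELECT facility, SUM(visits) AS total_visits "
    "FROM encounters GROUP BY facility ORDER BY facility",
    engine,
)
print(df)
```

Because the query and the follow-on analysis live in the same notebook, the whole chain from raw data to deliverable can be re-run and reviewed by a teammate.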
Moving forward, we expect each new product handed to a developer to come with an accompanying notebook that defines the most valuable analyses. That way, our developers don't have to figure out how to implement them from scratch and can instead spend more time improving each analysis, making it performant, and building an elegant user interface around it. Most importantly, by allowing these teams closer and deeper access to our data warehouses, we hope to decrease the time it takes to get a product to market.
Our thinking is: let experts be experts and focus on what they're good at. By teaching our analysts SQL and a smattering of Python, they are empowered to deliver more informed product direction, and our developers can more efficiently generate new deliverables.
Maintaining Security, Increasing Efficiency
Heads up, this next section is going to get pretty technical, but stay with me!
Having deployed a Jupyter Notebook server, we've provided our analysts a full computing environment with Python and R interpreters available. We've pre-installed many capable libraries for data analysis, including pandas and SQLAlchemy. By deploying this within our Amazon Web Services infrastructure, we can maintain secure access to our data warehouses while removing the burden of requiring each internal user to install and run a Jupyter Notebook server themselves. We have also configured Jupyter so that we can share our notebooks and make copies of those written by other teammates, making it easy to collaborate on and extend others' work.
Every day, we build a version of our main application that's pointed at nightly backups of our production databases. By granting our internal users access to these databases, we ensure errant queries will not affect users in our production application while still providing very fresh data.
Empowering Analysts to Learn
Developers have been writing pure SQL and prototyping code in Jupyter Notebooks for many years now. Making this automation perform correctly and reliably is the nature of our work, but that doesn’t mean automation isn’t useful for non-developers. This isn’t meant to be a religious war of Excel vs. Python. You should always choose the best tool for the job.
We do believe that Python is a great choice for analysts hoping to add a programming language to their toolbelt and SQL will remain the lingua franca for accessing tabular datasets for many years to come. We don’t expect our analysts to write production-quality code, but there is definitely a valuable sweet spot to be found using code to automate some of the more mundane tasks around data exploration and analysis.
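As an example of that sweet spot, here is a small sketch of the kind of mundane task worth automating: summarizing missingness and basic shape for every column of a dataset, something an analyst might otherwise do by hand in Excel. The function name and sample data are invented for illustration.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return a per-column summary: dtype, missing count, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # nunique() ignores missing values
    })

# Made-up sample data standing in for a warehouse extract.
data = pd.DataFrame({
    "facility": ["A", "B", None, "A"],
    "visits": [10, 20, 30, None],
})
print(profile(data))
```

A helper like this takes a few minutes to write once, and every future extract can be sanity-checked with a single call instead of a round of manual spreadsheet inspection.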
We feel all of this work is an investment in our product pipeline and people.
We hope that by providing these tools, we're lowering the barrier to entry for our analysts to get a deeper understanding of our datasets and push them in ways that will help our clients deliver better healthcare to their markets.
More important, however, is that by building these analyses in Python, our analysts are producing the initial proof of concept for potential new features or products within our main application. This strategy confirms that we build products that have value. It also ensures that we're iteratively developing our products and delivering exactly what our users are looking for.
For more information on how to leverage Stratasan's products and services to accelerate innovation, schedule a discovery call with one of our experts today.
Article by Scott Burns, VP of Engineering for Stratasan