EC2 is not enough: Donor Bureau's need for a data science platform
Donor Bureau (DB) uses large data sets and machine learning algorithms to help clients maximize return on investment for their fundraising mailings.
DB found that it was spending too much valuable analyst time maintaining its EC2-based compute infrastructure (e.g., maintaining configurations).
DB “outsourced” these needs to Domino; its analysts now use Domino as a platform to conduct their analytical modeling for clients instead of using EC2 directly.
On net, DB estimates that Domino has saved its analysts a full day each week — time that was previously spent maintaining infrastructure can now go to billable client work.
Donor Bureau (DB) helps hundreds of non-profits and political campaigns optimize their direct mail campaigns, maximizing return for each letter sent. DB's data scientists employ a variety of machine learning techniques to develop and train targeting models against a data warehouse of over a billion transactions and tens of millions of donors.
To meet its heavy demand for computational resources, DB used Amazon's EC2 infrastructure, but managing EC2 machines for a production data science workflow was not easy. DB had to handle EC2 setup, configuration, and software installation, and then develop its own tools and techniques for transferring data to its machines, transferring results back, spinning machines up and down, maintaining configurations, and keeping snapshots of its work. In other words, DB had to build a data science workflow on top of EC2.
The original infrastructure setup took a month, and in recent months DB found that its analysts were spending more and more time maintaining and improving it. This work included dealing with configuration issues on EC2, adapting to changes that Amazon introduced to its infrastructure, working out better workflows around data, and maintaining integration with a version control system.
It was a time-consuming headache, so DB's Head of Technology, Brian Johnson, went in search of a better solution.
“We're no longer maintaining all the infrastructure required for our people to use EC2; we've outsourced that to Domino. And in total cost of ownership terms we're much better off.” Brian Johnson, Head of Technology
In May, DB moved to Domino as its data science platform, rather than continuing to use EC2 plus its custom infrastructure. Data scientists continue to use their favorite IDEs for Python and R. But now they use Domino to kick off model runs and to store and share results. And Domino handles all the behind-the-scenes issues with EC2, from spinning machines up and down to billing.
Johnson describes the transition as “painless.”
Now DB is free from maintaining its infrastructure — Domino handles all the plumbing, so DB can focus on its analysis.
Domino increases DB's productivity in two ways. First, it lets DB run its models on arbitrary cloud hardware with one click, so analysts no longer deal with the details of managing cloud machines. Second, Domino automatically keeps a revisioned history of DB's work, including code, data, and results, so its analysts can always reproduce past work and never lose results.
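To make the idea of a revisioned history concrete, here is a minimal sketch in Python. It is not Domino's implementation, which this case study does not describe; the `RunHistory` class, its methods, and the sample byte strings are all hypothetical names chosen for illustration. The sketch assumes that pinning each run's code, data, and results by content hash is enough to tell whether a past run can be reproduced from the same inputs.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes of an artifact."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class RunHistory:
    """Hypothetical append-only log: each entry pins code, data, and results by hash."""
    revisions: list = field(default_factory=list)

    def record(self, code: bytes, data: bytes, results: bytes) -> int:
        """Store one run and return its 1-based revision number."""
        entry = {
            "revision": len(self.revisions) + 1,
            "code": content_hash(code),
            "data": content_hash(data),
            "results": content_hash(results),
        }
        self.revisions.append(entry)
        return entry["revision"]

    def matches(self, revision: int, code: bytes, data: bytes) -> bool:
        """True if the given code and data are byte-identical to that revision's inputs."""
        entry = self.revisions[revision - 1]
        return (entry["code"] == content_hash(code)
                and entry["data"] == content_hash(data))


# Placeholder inputs; a real run would read files from disk.
history = RunHistory()
first = history.record(b"model code v1", b"donor data", b"model output")
print(first, history.matches(first, b"model code v1", b"donor data"))
```

Because every revision is identified by the hashes of its inputs, any later change to the code or the data shows up as a mismatch against the recorded run, which is the property that makes past work reproducible.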
Most importantly, Johnson estimates that he has gotten back about a day a week of analyst time. In other words, roughly 20% of each analyst's week, previously spent on infrastructure maintenance, is now available for analysis.
In total cost of ownership terms, Johnson considers this a no-brainer. His overall bill for infrastructure is about the same, but his team is much more productive.
And DB's analysts get a user-friendly, ever-improving data science platform as well.