Problem Description:

Made by the original developers of MySQL and committed to staying open source, MariaDB has become one of the most popular database services on offer today. The MariaDB team wanted to benchmark its column-oriented data warehouse against the Greenplum Massively Parallel PostgreSQL (MPP) database, an open-source platform for analytics, machine learning, and artificial intelligence.


Solution Highlights:

Since data warehouses are designed with scale and volume in mind, the benchmark had to be run against a large-scale database on a cluster setup. AWS was the best fit for this use case: we spun up r4.6xlarge EC2 instances and provisioned large EBS Provisioned IOPS storage volumes. On these instances, we configured MariaDB ColumnStore 1.6 and Greenplum 5.11 clusters of four nodes each.

AVM carried out an extensive study of both data warehouse engines, which allowed us to fine-tune the appropriate database parameters for a fair comparison between them. For this project we decided to use the Star Schema Benchmark, which is derived from the TPC-H benchmark defined by the TPC organisation. We designed benchmarking scripts and datasets for the two platforms, and captured and monitored usage and performance metrics using AWS CloudWatch. Final TPC-H compliant reports were delivered to the client. Because this was a large-scale compute load, we watched costs closely throughout the project so that the client could get the best ROI.