Oracle on AWS
PriceSpider provides a commerce platform that lets brands drive conversion, track performance, monitor retailers and digital commerce, ensure guideline compliance, and act on all of these inputs, unifying brand marketing and performance marketing. Their clients, including many of the world’s top brands, rely on them for reliable, timely data.
PriceSpider processes tens of millions of business events per day. Their Engineering and Business Ops teams use Kibana-based dashboards to ensure these events are flowing correctly and that customer changes become visible quickly.
To onboard new customers and act on customer feedback, critical events must appear in these dashboards as soon as possible. However, some (but not all) events were severely delayed, affecting multiple customers each day, and the team could find neither a common pattern among the delayed events nor a root cause.
AVM worked with PriceSpider’s Platform Engineering team to understand the event pipeline, which included AWS Lambda, Amazon Kinesis Data Firehose, and Amazon Elasticsearch Service with Kibana, and to establish the KPIs important to all stakeholders. Based on that work, AVM streaming experts performed an end-to-end analysis of the component configurations and historical performance using Amazon CloudWatch to pinpoint problem areas and inefficiencies.
Within weeks, AVM distilled many separate issues in the pipeline into new resiliency strategies for Firehose and Elasticsearch that exceed the performance SLAs 99.99% of the time, improved observability and alerting across the complex pipeline, and proposed additional Lambda-based tools to evolve the pipeline into a self-healing system. AVM also found cost savings of over 50% while making the architecture more resilient.
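One way to express a KPI such as "critical events appear in Kibana within the SLA" is as an attainment ratio over a window of events. The Python below is a minimal sketch, assuming each event record carries the time it was produced and the time it was indexed; the `Event` fields, the 5-minute SLA in the example, and the 99.99% default target are illustrative assumptions, not PriceSpider's actual definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Event:
    produced_at: datetime  # hypothetical: when the Lambda emitted the event
    indexed_at: datetime   # hypothetical: when the document became searchable


def sla_attainment(events: Sequence[Event], sla: timedelta) -> float:
    """Fraction of events whose end-to-end latency is within the SLA."""
    if not events:
        return 1.0  # no traffic means nothing was late
    on_time = sum(1 for e in events if e.indexed_at - e.produced_at <= sla)
    return on_time / len(events)


def meets_target(events: Sequence[Event], sla: timedelta, target: float = 0.9999) -> bool:
    """True when at least `target` of the events met the latency SLA."""
    return sla_attainment(events, sla) >= target


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1)
    sample = [Event(t0, t0 + timedelta(seconds=40)), Event(t0, t0 + timedelta(minutes=12))]
    print(sla_attainment(sample, timedelta(minutes=5)))  # 0.5
```

Tracking the ratio per hour or per customer, rather than only in aggregate, makes intermittent delays like the ones described easier to spot.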
Organized, sophisticated, and persistent cyber threat actors pose a significant challenge to large, high-value organizations. They are capable of disrupting and destroying cyber infrastructure, denying organizations access to IT services, and stealing sensitive information, including intellectual property, trade secrets, and customer data. Small and medium-sized businesses (SMBs) often struggle with incident response management, in part because they may not have established incident response procedures.
It is therefore critical for an organization to identify and respond to security incidents and events in a timely manner. Whether a breach is small or large, organizations need an incident response policy in place to mitigate the risk of falling victim to the latest cyber-attack.
Edmodo is an educational technology company offering a communication, collaboration, and coaching platform to K-12 schools and teachers. The Edmodo network enables teachers to share content, distribute quizzes and assignments, and manage communication with students, colleagues, and parents. Responding to security incidents is critical both to Edmodo’s business and to its data security compliance requirements, so Edmodo must identify and respond to security incidents and events in a timely manner. Edmodo wanted an incident response plan to manage the full lifecycle of every security incident, large or small: preparation; detection and analysis; containment, eradication, and recovery; and post-incident activity. The faster they detect and respond to an incident, the less likely it is to significantly affect their data, customer trust, and reputation, or to cause a loss of revenue.
We undertook a comprehensive analysis of their existing policies, their current team structure, past security incidents, and their preparedness to handle future ones. We evaluated the NIST SP 800-61 and ISO/IEC 27035 standards and, based on Edmodo’s existing organizational structure and specific needs, chose to base the security incident response policy on NIST SP 800-61.
Performing incident response effectively is a complex undertaking: establishing a successful incident response capability requires substantial planning and resources. Continually monitoring for attacks is essential, establishing clear procedures for prioritizing the handling of incidents is critical, and so is implementing effective methods of collecting, analysing, and reporting data.
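NIST SP 800-61 suggests prioritizing incidents by functional impact, information impact, and recoverability. The sketch below turns those three factors into a priority level; the category names follow NIST's tables, but the numeric weights and the P1 to P4 cut-offs are hypothetical and would need tuning to an organization's own policy.

```python
from typing import Dict, List

# Category names follow NIST SP 800-61; the numeric weights are hypothetical.
FUNCTIONAL = {"none": 0, "low": 1, "medium": 2, "high": 3}
INFORMATION = {"none": 0, "privacy breach": 2, "proprietary breach": 2, "integrity loss": 3}
RECOVERABILITY = {"regular": 0, "supplemented": 1, "extended": 2, "not recoverable": 3}


def priority(functional: str, information: str, recoverability: str) -> str:
    """Map the three impact factors to a priority level (P1 is most urgent)."""
    score = (FUNCTIONAL[functional.lower()]
             + INFORMATION[information.lower()]
             + RECOVERABILITY[recoverability.lower()])
    if score >= 7:
        return "P1"
    if score >= 4:
        return "P2"
    if score >= 1:
        return "P3"
    return "P4"


def triage(incidents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order incidents most urgent first ("P1" < "P2" < ... sorts correctly as text)."""
    return sorted(incidents, key=lambda i: priority(
        i["functional"], i["information"], i["recoverability"]))
```

Keeping the scoring in one function makes the prioritization rule explicit and reviewable, which is the point of "clear procedures" above.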
We created a set of practices, processes, and solutions that enabled Edmodo’s Security Incident Response Team (SIRT) to rapidly detect incidents, minimize loss and destruction, mitigate the weaknesses that were exploited, and restore IT services as quickly as possible.
With the incident response policy in place, Edmodo’s SIRT is now able to quickly detect and investigate incidents, address vulnerabilities and issues, and respond to all IT security incidents efficiently and in a timely manner. Faster responses have helped them reduce the overall impact of incidents, mitigate damage, and keep systems and services operating as planned.
Without incident management, an organization may lose valuable data, experience reduced productivity and revenues due to downtime, or be held liable for breach of service level agreements (SLAs). Even when incidents are minor with no lasting harm, IT teams must devote valuable time to investigating and correcting issues.
Artificial intelligence has become a must-have capability for almost every organization that wants to remain sustainable. Some enterprises have already embraced it, and others plan to invest in it and have added it to their roadmaps. However, the success rate of these investments is still low, mainly because organizations’ information infrastructure is not ready for artificial intelligence projects.
Incite Logix has a highly experienced team that helps organizations succeed with artificial intelligence implementations. They have built up deep knowledge of this problem domain by successfully completing projects at several organizations. However, that knowledge is full of technical and non-technical jargon that is hard for a non-specialist to read and understand.
The AVM team partnered with the Incite Logix team to bridge the gap between people and the knowledge the company had collected. Our primary goal was to present this knowledge in an understandable way and let users self-assess how ready their information infrastructure is to embrace AI projects.
As a result, the two teams decided to build a web application and a mobile application where users can register and measure their information infrastructure readiness through a question-and-answer model, which produces categorized and aggregated scores along with an action plan to work on.
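The question-and-answer scoring model can be sketched as a small aggregation: points per answer roll up into a percentage per category, an overall score, and a list of the weakest categories to tackle first. This is a hypothetical Python illustration of the idea only; the category names, the 0 to 4 answer scale, and the "three weakest categories" rule are assumptions, and the production backend described here was written in Node.js.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def readiness_report(answers: Iterable[Tuple[str, int]], max_points: int = 4) -> Dict[str, object]:
    """Aggregate (category, points) answers into category scores, an overall score, and focus areas."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for category, points in answers:
        if not 0 <= points <= max_points:
            raise ValueError(f"points out of range: {points}")
        totals[category] += points
        counts[category] += 1
    # Percentage of the maximum achievable score, per category.
    by_category = {c: round(100 * totals[c] / (counts[c] * max_points), 1) for c in totals}
    overall = round(sum(by_category.values()) / len(by_category), 1) if by_category else 0.0
    # Action plan: the three lowest-scoring categories, weakest first.
    focus_first: List[str] = sorted(by_category, key=by_category.get)[:3]
    return {"categories": by_category, "overall": overall, "focus_first": focus_first}
```

Averaging category percentages (rather than raw points) keeps a category with many questions from dominating the overall score.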
At AVM we always work with industry-leading, cutting-edge technologies, and we decided to design this solution using serverless services offered by Amazon Web Services. Following the client-server architecture pattern, we developed the frontend as a single-page application in ReactJS and deployed it on AWS using Amazon S3 and CloudFront, which let us serve the application worldwide with reduced latency. The backend was built with the Serverless Framework using Node.js and deployed on AWS using API Gateway and Lambda functions, with DynamoDB as the persistent store. We used Amazon Cognito for identity management, which simplified much of the implementation effort and provided a solid layer of security. The mobile applications were developed with the Flutter framework, which let us build for iOS and Android in parallel and greatly reduced development effort. Finally, no solution can succeed without proper monitoring and telemetry: we used Amazon CloudWatch logging, with alarms configured where necessary to keep the team informed of any failures, and tracked application usage with Amazon Pinpoint and Google Analytics.
The result is a complete, 100% serverless application, developed and deployed with a very small running-cost footprint.
In early 2017, Match.com had become the largest online dating platform, reporting over 35 million users, with its closest competitor, eHarmony, far behind at only 17.5 million. This new romantic age, in which people used online technologies in the quest for love, brought a whole new category of challenges for platform operators. The number of requests to their servers was no longer in the thousands but in the trillions. Yet these new challenges were perfectly suited to being addressed by leveraging the scale and performance benefits of cloud solutions and integrating them with traditional day-to-day IT operations.
One of the company’s first challenges was to modernize its remaining monolithic architecture for increased performance and agility. Within their software system, functionally distinct aspects of their applications, such as data I/O, processing, error handling, and user interfaces, were interwoven rather than isolated into separate architectural components. Other bottlenecks included the limited elasticity of their web servers under fluctuating demand and the high capital expense of provisioning new resources in their on-premises data centers.
To deliver performance improvements and greater agility, we conceived and implemented a full-service, end-to-end cloud migration and adoption strategy built around the cloud services offered by Google (GCP) and Oracle (OCI). First, we helped them re-architect their existing infrastructure and applications into a suite of independently deployable, modular microservices, each running in its own process and communicating through a well-defined, lightweight mechanism. Using Docker containers, we helped them migrate these services from their on-premises locations to Google Cloud Platform (GCP). Initially, our team used the ExtraHop platform to continuously auto-discover application dependencies and to identify and map the architectural requirements for running these applications on GCP. This allowed us to configure and provision Match’s new cloud-based VM environment to best serve the needs of their applications.
Furthermore, we used HashiCorp’s infrastructure-as-code and orchestration tool, Terraform, to spin up a highly elastic farm of Apache web servers in Google Cloud to meet the unpredictable and volatile volume of requests coming from the online dating platform. This enabled Match to scale flexibly with demand and delivered significant cost savings by scaling down when demand was low and stable. After this initial cloud solution, Match.com commissioned us to migrate their database as well, and we moved their Oracle DB from on-premises to the Oracle Cloud AZ in Phoenix. The aim was to maintain and further improve performance by using Oracle’s bare metal infrastructure. At the same time, we are delivering significant Oracle licensing cost savings through dynamically scalable instances (elastic CPU scaling) and automation.
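Elastic web farms like this are typically driven by a target-utilization rule: if instances run hotter than the target, add capacity in proportion; if cooler, remove it. The function below is a simplified sketch of that rule, not the exact algorithm of the Google Cloud autoscaler that a Terraform `google_compute_autoscaler` resource would configure; the 2 to 50 instance bounds are hypothetical.

```python
import math


def desired_instances(current: int, observed_util: float, target_util: float,
                      min_size: int = 2, max_size: int = 50) -> int:
    """Simplified target-tracking rule: size the group so utilization returns to target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    # Total load is roughly current * observed_util; divide by the per-instance target.
    desired = math.ceil(current * observed_util / target_util)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, desired))
```

For example, 4 instances at 75% utilization with a 50% target gives `desired_instances(4, 0.75, 0.5) == 6`; the minimum bound keeps a floor of capacity when traffic is low and stable.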
When H&R Block came to us, they were facing various challenges. Although their infrastructure and IT systems were already largely virtualized, they contained applications with many legacy features, and the performance of many applications was suboptimal. One consumer group service in particular tended to underperform and needed elastic scalability to serve fluctuating numbers of consumers. Finally, as a financial services company, H&R Block made comprehensive data security throughout their cloud solution one of its main priorities.
To improve the performance of the internal API Gateway and the consumer group service, we migrated them to AWS, using Terraform for infrastructure as code. The migration required a lot of planning and analysis, as complex interdependencies had to be discovered and mapped out, and many legacy features needed to be removed.
Because this consumer group service handled financial data and customers’ private information, overall data security within the cloud solution was paramount. Our solution design therefore had to guarantee strong encryption of data both in transit and at rest.
We tackled this challenge by establishing permissions that followed the AWS security principle of least privilege. This allowed us to minimize the blast radius and bring the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) down to under an hour.
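Least privilege in AWS means each role's IAM policy names only the actions and resources it needs. The sketch below builds such a policy for a hypothetical consumer group service and flags any Allow statement that grants wildcard actions or resources; the DynamoDB table, the KMS key, and the ARNs are illustrative assumptions, not H&R Block's actual resources.

```python
import json
from typing import Dict, List


def service_policy(table_arn: str, key_arn: str) -> Dict:
    """Least-privilege identity policy for a hypothetical consumer group service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ConsumerTableAccess", "Effect": "Allow",
             "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
             "Resource": [table_arn]},
            {"Sid": "EncryptionKeyUse", "Effect": "Allow",
             "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
             "Resource": [key_arn]},
        ],
    }


def wildcard_grants(policy: Dict) -> List[str]:
    """Return the Sids of Allow statements with wildcard actions or an all-resources grant."""
    offenders = []
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        if any("*" in a for a in actions) or any(r == "*" for r in resources):
            offenders.append(stmt.get("Sid", "<no sid>"))
    return offenders


if __name__ == "__main__":
    policy = service_policy(
        "arn:aws:dynamodb:us-east-1:123456789012:table/Consumers",
        "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab")
    print(json.dumps(policy, indent=2))
```

A check like `wildcard_grants` can run in CI against rendered policies so that a broad grant is caught before it is deployed.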
As a platform, Tickets.com must handle a large number of requests and manage many different integrations with the third-party vendors who sell tickets through Tickets.com. Integrating these many different APIs posed significant problems for setting up and managing the platform’s backend.
Furthermore, because the events industry is subject to highly volatile, seasonal demand, the platform needed to scale capacity up or down rapidly and inexpensively.
In response to these requirements, we helped Tickets.com architect an AWS-based cloud solution with a simplified third-party integration service. We also designed and provisioned elastic Amazon RDS services to help them manage their volatile loads.
These solutions gave Tickets.com the scalability and elasticity needed for the platform’s future growth and, by leveraging the predictable performance of the AWS cloud to stabilize their services, helped the company minimize costs, risk, and downtime.
Made by the original developers of MySQL, MariaDB has become one of the most popular databases available today and remains committed to open source. MariaDB wanted to benchmark its column-oriented data warehouse against the Greenplum database, an open-source massively parallel processing (MPP) platform based on PostgreSQL and used for analytics, machine learning, and artificial intelligence.
Since data warehouses are designed with scale and volume in mind, the benchmark had to be run against a large-scale database on a cluster. AWS was the best fit for this use case: we spun up r4.6xlarge EC2 instances and provisioned large EBS Provisioned IOPS volumes. On these instances, we configured MariaDB ColumnStore 1.6 and Greenplum 5.11 clusters of four nodes each.
AVM carried out an extensive study of both data warehouse engines, which allowed us to fine-tune the appropriate DB parameters for a fair comparison. We chose the Star Schema Benchmark (SSB), a star-schema adaptation of the TPC-H benchmark defined by the TPC, for this project, and designed benchmarking scripts and datasets for both platforms. Usage and performance metrics were captured and monitored with Amazon CloudWatch, and final TPC-H-compliant reports were delivered to the client. Because this was a large-scale compute workload, we watched costs closely throughout the project so the client would get the best ROI.
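A benchmark harness of this kind usually runs each query several times, keeps a robust per-query time, and summarizes the run with a geometric mean so that one long query does not dominate. The sketch below assumes a caller-supplied `execute` function for whichever engine is under test; the SSB-style query names and the three-repetition median are illustrative choices, not the scripts AVM delivered.

```python
import statistics
import time
from typing import Callable, Dict


def time_queries(queries: Dict[str, str], execute: Callable[[str], object],
                 repeats: int = 3) -> Dict[str, float]:
    """Run each query `repeats` times and keep the median wall-clock time in seconds."""
    results: Dict[str, float] = {}
    for name, sql in queries.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            execute(sql)
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


def summarize(results: Dict[str, float]) -> Dict[str, float]:
    """Total and geometric-mean query time; every timing must be positive."""
    times = list(results.values())
    if not times or min(times) <= 0:
        raise ValueError("need at least one positive timing")
    return {"total_s": sum(times), "geomean_s": statistics.geometric_mean(times)}
```

Running the same harness against both clusters, with identical query text and data scale, is what makes the comparison fair; only `execute` changes per engine.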
eHarmony is one of the most advanced development organizations we have seen: they have followed an agile methodology for years and were one of our first customers to run CI/CD pipelines. Each development team works in its own parallel development environment in AWS, which eHarmony calls a runway. The teams later merge all work from these runways to trunk and deploy to production.
The critical showstopper to fully automating this process was their Oracle database, a key component of the architecture. Oracle is an exceptionally powerful RDBMS, but it had little tooling for release automation.
eHarmony required a solution that could clone the Oracle DB, mask its data, and then apply or roll back an incremental set of release-specific changes in a repeatable, automated way. This would allow them to roll the DB forward and back in a fine-grained fashion.
AVM used Amazon Elastic Block Store (EBS) snapshots to clone the Oracle DB data before spinning up any new runway, and applied a set of custom scripts to mask and map the DB data. This is a fairly standard industry solution; the work became more interesting once we started to address incremental changes.
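Masking data while keeping it usable for testing usually means replacing sensitive values deterministically, so the same customer maps to the same masked value in every table and joins still work. The sketch below uses an HMAC for that; the column names, the secret, and the `column_digest` output format are hypothetical, and this is not the custom scripts' actual logic.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable


def mask_value(value: str, secret: bytes, label: str) -> str:
    """Deterministically replace a value: same input and secret always give the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{label}_{digest[:16]}"


def mask_row(row: Dict[str, Any], pii_columns: Iterable[str], secret: bytes) -> Dict[str, Any]:
    """Mask the listed columns, leaving keys and non-sensitive columns untouched."""
    pii = set(pii_columns)
    return {
        col: mask_value(str(val), secret, col) if col in pii and val is not None else val
        for col, val in row.items()
    }
```

Using a keyed HMAC rather than a plain hash means the masked values cannot be reversed by hashing guessed inputs, as long as the secret stays out of the cloned environment.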
Previously, scripts were applied to the DB manually and no history of them was kept, so previous activity could not be reproduced. We redesigned the DB release process, creating a framework to apply changes in a structured way. The framework consisted of a Go core and Bash scripts.
Within the framework, DB changes are first pulled from GitHub and run through a set of pre-checks. If the checks pass, the framework applies the scripts to the DB and logs all activity in the GitHub repository, tracking the execution status. A rollback script is also created as part of the process.
With a repository and an audit trail of all DB changes at our disposal, we were able to deliver two new features: rolling the DB forward and back, and rolling the DB selectively. In other words, we could choose which branch scripts to apply to which runways. This allowed us to introduce a revolutionary merging technique for the DB that works alongside git branch merges. At eHarmony’s request, we built a Ruby on Rails wrapper around the framework and plugged it into the CI/CD pipeline as Jenkins and CodePipeline jobs.
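The framework's flow (pre-check, apply, log, roll back, and apply selectively per runway) can be sketched in a few functions. The Python below is a hypothetical in-memory model of that flow; the real framework used a Go core with Bash scripts and logged to GitHub, while here `execute` stands in for the database connection, `log` stands in for the audit trail, and the pre-checks are invented examples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set


@dataclass(frozen=True)
class Change:
    change_id: str     # ordered identifier, e.g. "001"
    branch: str        # git branch the script belongs to
    apply_sql: str
    rollback_sql: str


@dataclass
class Runway:
    name: str
    execute: Callable[[str], None]                     # runs SQL against this runway's DB
    applied: List[str] = field(default_factory=list)  # change ids, in application order
    log: List[str] = field(default_factory=list)      # stand-in for the audit trail


def precheck(change: Change) -> List[str]:
    """Hypothetical pre-checks; a real framework would run many more."""
    problems = []
    if not change.rollback_sql.strip():
        problems.append("missing rollback script")
    if "drop database" in change.apply_sql.lower():
        problems.append("destructive statement")
    return problems


def roll_forward(runway: Runway, changes: Sequence[Change], branches: Set[str]) -> None:
    """Apply pending changes from the selected branches, stopping at the first failed pre-check."""
    for ch in changes:
        if ch.branch not in branches or ch.change_id in runway.applied:
            continue
        problems = precheck(ch)
        if problems:
            runway.log.append(f"BLOCKED {ch.change_id}: {', '.join(problems)}")
            return  # later changes may depend on this one
        runway.execute(ch.apply_sql)
        runway.applied.append(ch.change_id)
        runway.log.append(f"APPLIED {ch.change_id} on {runway.name}")


def roll_back(runway: Runway, changes: Sequence[Change], steps: int = 1) -> None:
    """Undo the most recent `steps` changes on this runway, newest first."""
    by_id: Dict[str, Change] = {c.change_id: c for c in changes}
    for _ in range(min(steps, len(runway.applied))):
        change_id = runway.applied.pop()
        runway.execute(by_id[change_id].rollback_sql)
        runway.log.append(f"ROLLED BACK {change_id} on {runway.name}")
```

The `branches` argument is what makes rolling selective: two runways can receive different subsets of the same ordered change list.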
This is one of the case studies we are most proud of. Have you ever had a standard requirement that you assumed others must have faced thousands of times before? Have you ever searched online and found everybody asking the same question, with no answer? In this engagement with eHarmony, we addressed exactly that situation, creating a product the world was looking for, one that can be reused to help many more customers going forward.
Felix delivers intelligent local advertising, connecting businesses with the right customers. They wanted to migrate their on-premises Percona MySQL database to AWS, taking this opportunity to upgrade their DB engine from MySQL 5.0 to 5.7 along the way.
AVM explored a range of options for the migration to AWS, including Aurora, EC2, and RDS; AWS Database Migration Service (DMS) was also considered. We compared a wide range of MySQL distributions for Felix, from Percona MySQL to the community editions. Felix benefited greatly from Percona’s threading functionality, and since RDS uses the community edition, we ruled RDS out.
The biggest challenge in this project was the upgrade from MySQL 5.0 to 5.7. It had to be a multi-step migration, since the database was 3 TB in size and at most one hour of downtime was permitted. We carried out an extensive study of the upgrade to establish how to speed it up and minimize downtime.
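A back-of-the-envelope calculation shows why a single-pass dump and reload inside the downtime window was not realistic. Assuming decimal terabytes, moving 3 TB in one hour requires a sustained rate of:

```latex
\frac{3 \times 10^{12}\ \text{bytes}}{3600\ \text{s}} \approx 8.3 \times 10^{8}\ \text{bytes/s} \approx 833\ \text{MB/s}
```

That rate would have to hold for both the export and the import, before counting index rebuilds, which is why the work had to be split into steps.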
We settled on Percona MySQL on EC2, tuning the instance type and IOPS for optimal performance. To accelerate the DB engine upgrade, we used the mysqldump utility with custom parallel execution, choosing it over native data migration tools because it handled referential constraints and triggers better.
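Parallelizing mysqldump usually means running one dump process per table (or group of tables) at the same time. The sketch below builds per-table commands and runs them on a thread pool; the database, table names, output layout, and worker count are hypothetical, and credentials are assumed to come from a MySQL option file. Note that each process takes its own snapshot, so tables are only mutually consistent if writes to the source are stopped during the dump.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

Command = List[str]


def dump_command(database: str, table: str, out_dir: str) -> Command:
    """Build a per-table mysqldump invocation (credentials assumed to come from an option file)."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent InnoDB read without locking the table
        "--triggers",            # dump the table's triggers (the default, made explicit)
        f"--result-file={out_dir}/{database}.{table}.sql",
        database,
        table,
    ]


def _run_subprocess(cmd: Command) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError if a dump fails


def parallel_dump(database: str, tables: Sequence[str], out_dir: str,
                  workers: int = 8,
                  run: Optional[Callable[[Command], object]] = None) -> List[Command]:
    """Dump tables concurrently; `run` is injectable so the plan can be tested without MySQL."""
    runner = run if run is not None else _run_subprocess
    commands = [dump_command(database, t, out_dir) for t in tables]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(runner, commands))  # consume results so worker exceptions propagate
    return commands
```

The resulting per-table files can then be loaded into the 5.7 target in parallel as well, with the worker count tuned to the instance's IOPS.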
For over a decade, Linked2pay split their infrastructure between two physical datacenters. Operating within the payments industry, they have to be PCI compliant, which meant these datacenters required substantial maintenance in order to meet PCI-DSS standards. As a result, operational costs were an ongoing concern for the company.
AVM undertook a comprehensive analysis of Linked2pay’s operations, studying their business priorities, costs, production loads, resource utilization, and service level agreements. After considering both on-premises and cloud-based solutions, we concluded that a migration to the cloud would be hugely beneficial for Linked2pay.
This migration gave the business an opportunity to rebuild their system from the ground up: reducing costs, eliminating the need to maintain underutilized hardware, and easing the patching process. We determined that Oracle Cloud was the most economically and technically viable platform for them. Working both on-site and remotely, we ran a series of workshops to prepare for the migration, documenting everything along the way.
Using GitLab for source control, we moved Linked2pay’s legacy IBM stacks for all environments, including development, staging, and production. To ease the process and bring agility to the team, we implemented a blue/green deployment model using Ansible and Terraform, while also replacing their Cisco ASA firewall with a FortiGate next-generation firewall.
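At its core, a blue/green release deploys to the idle environment, checks its health, and only then switches traffic, keeping the old environment ready for an instant rollback. In the setup described, Terraform and Ansible would provision and configure the two environments; the Python below only sketches the cutover logic, with `deploy` and `healthy` as hypothetical hooks and "v1" as an assumed starting version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BlueGreen:
    live: str = "blue"
    # Known-good version per environment; None marks a failed or unknown deployment.
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": "v1", "green": None})

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str,
                deploy: Callable[[str, str], None],
                healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle environment and switch traffic only if it passes health checks."""
        target = self.idle()
        deploy(target, version)
        if not healthy(target):
            self.versions[target] = None  # never switch traffic to this environment
            return False                  # live environment is untouched
        self.versions[target] = version
        self.live = target                # the cutover, e.g. a load balancer or DNS change
        return True

    def rollback(self) -> None:
        """Send traffic back to the previous environment if it still holds a good version."""
        if self.versions.get(self.idle()) is None:
            raise RuntimeError("idle environment has no known-good version")
        self.live = self.idle()
```

Marking a failed deployment as `None` prevents a later rollback from routing traffic to an environment that never passed its health checks.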
The IT culture change within Linked2pay formed a critical part of the migration. The move from physical infrastructure to the cloud was warmly welcomed by everyone from development and operations up to the C-suite. Linked2pay saw their costs cut in half, with system availability rising to 99.999%. Seeing such a positive impact, particularly on their IT budget, was a hugely rewarding experience!
Edlio hosted their core Tomcat-based application on Amazon Web Services. The application suffered from connection leaks and could not scale to handle variable traffic loads, so they had to restart it regularly to clear these connections and over-provision instances to handle traffic peaks.
Working both on-site and remotely, AVM delivered a complete solution that addressed both problems. Using Ansible and AWS CloudFormation, AVM implemented autoscaling for the application, allowing it to grow and shrink in line with incoming traffic.
Legacy load balancing components were replaced with AWS Elastic Load Balancers, the native equivalent, which integrated easily into the autoscaling architecture. This dramatically increased application stability while reducing costs.
To address the connection leaks, AVM profiled the application using AppDynamics, which provides real-time insight to detect anomalies and identify where they arise. This allowed AVM to trace deficiencies in legacy Java code and increase stability even further.
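A common way to find leaks like these is to record where each pooled connection was borrowed and report those held past a timeout. Tomcat's JDBC pool offers similar built-in behaviour through its `removeAbandoned`, `removeAbandonedTimeout`, and `logAbandoned` settings; the Python below is only a minimal sketch of the idea, with a hypothetical `TrackingPool` class and a 60-second threshold.

```python
import time
import traceback
from typing import Callable, Dict, List, Tuple


class TrackingPool:
    """Minimal pool that records who borrowed each connection and when."""

    def __init__(self, factory: Callable[[], object], abandon_after_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._idle: List[object] = []
        self._borrowed: Dict[int, Tuple[object, float, str]] = {}
        self._abandon_after_s = abandon_after_s
        self._clock = clock

    def acquire(self) -> object:
        conn = self._idle.pop() if self._idle else self._factory()
        # Capture the caller's stack so a leak can be traced back to the code that borrowed it.
        where = "".join(traceback.format_stack(limit=5)[:-1])
        self._borrowed[id(conn)] = (conn, self._clock(), where)
        return conn

    def release(self, conn: object) -> None:
        del self._borrowed[id(conn)]
        self._idle.append(conn)

    def suspected_leaks(self) -> List[str]:
        """Stacks of connections held longer than the abandon threshold."""
        now = self._clock()
        return [where for _conn, since, where in self._borrowed.values()
                if now - since > self._abandon_after_s]
```

Injecting the clock keeps the leak report deterministic in tests; in production the default monotonic clock is used.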
eLocal is a fast-growing advertising firm that helps consumers connect with businesses in their local community. Specialising in home services, legal, and finance, they were using two groups of AWS EC2 instances running Oracle Enterprise to distribute uneven SQL traffic loads, with third-party SharePlex replication synchronising the data between these instances.
The data often fell out of sync, and there was no single source of truth from which to re-sync it. Also, because each instance was independent, their SQL execution plans varied, which in turn presented a huge stability challenge.
AVM analysed eLocal’s entire stack from top to bottom. We concluded that the best solution for eLocal would be to switch from SharePlex to Oracle Data Guard, guaranteeing high availability, data protection, and disaster recovery for enterprise data.
Because most traffic split cleanly into read and read-write patterns across these two sets of instances, load could be balanced using a custom Oracle connection pool setup, with Data Guard on standby. This configuration allowed for a full separation of concerns between the databases and the application: in the event of a DB failure, traffic was seamlessly re-routed to the other instance without intervention from the application.
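The routing idea can be sketched as a thin layer over two connection targets: read-only statements prefer the standby and fall back to the primary, while writes go to the primary and fall back to the other instance, which only succeeds once Data Guard has promoted it. This is a hypothetical Python illustration, assuming the standby can serve read-only queries and that an unreachable database raises `ConnectionUnavailable`; the statement classification is deliberately simple.

```python
from typing import Callable, List


class ConnectionUnavailable(Exception):
    """Raised by a target when its database cannot be reached."""


Target = Callable[[str], object]


class ReadWriteRouter:
    def __init__(self, primary: Target, standby: Target):
        self.primary = primary
        self.standby = standby

    @staticmethod
    def is_read_only(sql: str) -> bool:
        text = sql.lstrip().lower()
        # SELECT ... FOR UPDATE takes row locks, so it must go to the primary.
        return text.startswith(("select", "with")) and "for update" not in text

    def execute(self, sql: str) -> object:
        order: List[Target] = ([self.standby, self.primary] if self.is_read_only(sql)
                               else [self.primary, self.standby])
        last_error = None
        for target in order:
            try:
                return target(sql)
            except ConnectionUnavailable as exc:
                last_error = exc  # try the other instance
        raise ConnectionUnavailable("no database instance available") from last_error
```

Because the application only ever calls `execute`, failover stays invisible to application code, which is the separation of concerns described above.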
This setup was advantageous on two fronts: SLAs improved with failover fully automated, while costs fell because a smaller team could maintain the infrastructure. With no additional costs from either AWS or Oracle, and with SharePlex removed entirely from the stack, we made huge savings on licensing costs, much to eLocal’s appreciation!
CityGrid helps local businesses thrive by promoting them and managing their presence across a wide range of review sites and search engines. They were running their Oracle Enterprise infrastructure across two physical datacenters, hosting production on the West Coast and a disaster recovery site on the East Coast. With Puppet automation already embedded in their processes, CityGrid’s main priority was to reduce costs.
Since their parent company, IAC, was moving to the cloud, a migration to AWS was an economically sensible move for CityGrid. AVM took responsibility for migrating their core Oracle infrastructure, which stored essential content, GEO, and financial data.
We undertook a comprehensive analysis of the infrastructure, considering both native AWS solutions and third-party options. We settled on EC2 instances with Oracle Enterprise installed, since Amazon Relational Database Service (RDS) could not support the required Oracle Enterprise features.
Between AWS costs, Oracle licensing, and database performance, we wanted to strike the right balance in our solution. The Oracle installation on EC2 was fully automated and reproduced most features of RDS while preserving sufficient licensing freedom. We adopted Oracle Data Guard for the migration, considering it the most viable, resilient, and cost-effective option for CityGrid’s stack.
It was one of the smoothest migrations CityGrid had ever performed. All environments, from development, QA, and UAT all the way to production, were migrated to the cloud, achieving the desired reduction in infrastructure costs.
CloudKnox provides a single platform for managing identity privileges across multiple cloud environments, allowing customers to protect themselves against malicious activity and compromised credentials. Their build and deployment processes relied on manual script execution, and as the business grew, these processes became more complex in order to handle multiple cloud platforms.
Without automation, these processes consumed a lot of time and left VMs without the latest security patches, vulnerable to attack. Furthermore, CloudKnox’s MongoDB database was a bottleneck for its services, impacting performance. Both costs and security were top priorities.
Working alongside CloudKnox’s Internal Operations team, AVM automated the build and deployment process using Terraform. We implemented user-based SSH access to the VMs, controlled via code, and enabled end-to-end encryption in transit using SSL. In doing so, we removed the risks associated with manual deployment, reduced costs, and increased instance security.
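Controlling SSH access in code typically means keeping a list of users, their public keys, and the machines they may reach, then rendering per-account `authorized_keys` files from it. The sketch below shows only that rendering step; the `SshUser` fields and the host-group names are hypothetical, and how the files reach the VMs (for example, through Terraform provisioning) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class SshUser:
    name: str
    public_key: str              # "type base64" only, e.g. "ssh-ed25519 AAAA..."
    host_groups: FrozenSet[str]  # hypothetical grouping of VMs, e.g. {"api", "db"}
    active: bool = True


def access_map(users: Iterable[SshUser], host_group: str) -> Dict[str, str]:
    """Map each permitted Unix account to the contents of its authorized_keys file."""
    result = {}
    for user in sorted(users, key=lambda u: u.name):
        if user.active and host_group in user.host_groups:
            # Trailing comment identifies the key owner in the file.
            result[user.name] = f"{user.public_key} {user.name}\n"
    return result
```

Revoking access then becomes a reviewed code change (setting `active=False`) followed by a redeploy, rather than a manual edit on each VM.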
To reduce bills further, we migrated CloudKnox’s services from on-demand to reserved and spot instances. We defined custom AMIs along with an automated patching process using Packer, which allowed CloudKnox to update instance AMIs with the latest software patches regularly, addressing vulnerabilities and providing better protection against malicious attackers.
Tools such as Amazon Inspector were adopted to automatically scan these instances and generate vulnerability reports. Using these reports, CloudKnox were able to benchmark their services against CIS standards, quickly identifying any deviations and remediating them to meet CIS compliance.
Logging was enabled using a combination of Loggly, Grafana, and CloudWatch for greater visibility into instance activity, allowing CloudKnox to monitor the logs for any malicious activity. Metrics and alerts were configured on top of these logs to optimise threat detection, quickly notifying teams of any unusual behaviour and automatically triggering functions to address potential threats.
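A typical alert built on these logs counts failed SSH logins per source address in a sliding window and fires once a threshold is crossed. The sketch below parses OpenSSH's `Failed password for ... from ... port` messages; the threshold of five failures in 60 seconds is a hypothetical setting, and in a CloudWatch-based setup a metric filter and alarm could play the same role.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Matches both "Failed password for root from ..." and "... for invalid user bob from ...".
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port")


def brute_force_alerts(events: Iterable[Tuple[float, str]], threshold: int = 5,
                       window_s: float = 60) -> List[Tuple[float, str, int]]:
    """events: (epoch_seconds, log_line) in time order; returns one alert per offending IP."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerted = set()
    alerts = []
    for ts, line in events:
        match = FAILED.search(line)
        if not match:
            continue
        ip = match.group(1)
        window = recent[ip]
        window.append(ts)
        # Drop failures that have aged out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold and ip not in alerted:
            alerts.append((ts, ip, len(window)))
            alerted.add(ip)
    return alerts
```

Alerting once per address keeps a sustained attack from flooding the team, while the returned alert can still trigger an automated response such as blocking the source.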