Meteorcomm is the rail industry’s trusted choice for wireless communications technology to reliably transport mission-critical information. To enable the safe and efficient operation of the world’s trains, Meteorcomm helps railroad service organizations embed kill switches and other technology into their railcars to perform functions like halting the car in case of an emergency. In addition to ensuring passenger and operator safety, these solutions enable Positive Train Control (PTC) compliance and improve operational efficiencies.
The hardware and software employed in Meteorcomm’s messaging and systems management solutions are complex and highly integrated. To deliver solutions as quickly as customers need them, Meteorcomm is committed to agile software development principles and a continuous integration/continuous delivery (CI/CD) approach. Meteorcomm’s solutions require each railcar to have custom hardware and dedicated software onboard.
Meteorcomm made significant investments in test automation, running over 3,300 complex integration tests against nightly builds of their Interoperable Train Control Messaging (ITCM) application. Historically, to support this ITCM application, they’ve relied on a large bank of massive density blade servers running in two private data centers managed by an internal IT team.
To let their engineers request compute capacity on demand, these servers were virtualized into thousands of VMware virtual machines. Typically, roughly 6,000 virtual machines ran concurrently at peak, pushing utilization beyond a manageable threshold. Outside of these peak hours, the vast majority of their hardware sat idle.
Their development, test, and training teams didn’t want to rebuild the virtual machines every time they were needed. To ensure they could meet peak demand, their solution was to keep the virtual machines running and perform cleanups after each use. This led to massive cost inefficiencies. They also lacked the governance and cost management capabilities they required, further highlighting the need for a better solution.
The AWS Solution
Meteorcomm determined that Amazon Web Services would enable them to better meet dynamic demands by provisioning infrastructure on demand and paying only for the resources they use. From there, Meteorcomm engaged AWS Premier Consulting Partner Cascadeo to design and implement a new infrastructure solution on AWS to support their regression tests.
The new solution (Figure 1) creates a greenfield test environment where each test run deploys a new Amazon VPC that houses all of the Amazon EC2 instances and other infrastructure needed to run that test, as defined by an AWS CloudFormation template.
All of the AMIs that make up each set of instances required for a test are configurable at any time. Meteorcomm has also automated the process of scaling each test set and the infrastructure that supports it up and down as needed and, like the AMIs, this template can be re-configured at any time to support their changing needs.
A user interface allows for simple re-configuration of both at any time, and also defines the cost governance rules for the regression runs to ensure they are staying as cost-effective as possible.
Figure 1: Meteorcomm’s Regression Test Infrastructure
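To make the per-run environment concrete, the following is a minimal sketch of how a template for one test run could be generated. It is not Meteorcomm's actual template: the function name, logical resource names, CIDR ranges, instance type, and AMI IDs are all assumptions chosen for illustration. Only the CloudFormation resource types (AWS::EC2::VPC, AWS::EC2::Subnet, AWS::EC2::Instance) and their listed properties are standard.

```python
import json


def build_test_run_template(run_id, ami_ids, instance_type="t3.medium"):
    """Return a CloudFormation template (as a dict) for one isolated test run.

    Hypothetical helper: names, CIDR blocks, and defaults are illustrative only.
    """
    resources = {
        # A fresh VPC per run keeps each test environment isolated.
        "TestVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": "10.0.0.0/16",
                "Tags": [{"Key": "TestRun", "Value": run_id}],
            },
        },
        "TestSubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "TestVpc"},
                "CidrBlock": "10.0.0.0/24",
            },
        },
    }
    # One instance per configured AMI; the AMI list can change between runs.
    for index, ami_id in enumerate(ami_ids):
        resources[f"TestInstance{index}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": ami_id,
                "InstanceType": instance_type,
                "SubnetId": {"Ref": "TestSubnet"},
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Regression test environment for run {run_id}",
        "Resources": resources,
    }


if __name__ == "__main__":
    # Placeholder AMI IDs; real IDs would come from the configuration UI.
    template = build_test_run_template("nightly-001", ["ami-11111111", "ami-22222222"])
    print(json.dumps(template, indent=2))
```

Because the whole environment is generated from the run's inputs, deleting the stack after the run removes the VPC and everything in it, which is what makes paying only for the duration of a test possible.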
One of the key components of Meteorcomm’s architecture is the inclusion of a Radio Network Simulation (RNS) server, which is hosted on an instance of Microsoft Windows Server running services on specific ports. When an individual service starts up, the RNS server allocates which ports the messages need to be relayed to by leveraging a configuration file that maps this out.
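The port mapping described above can be pictured with a small sketch. The real RNS configuration format is not described in this article, so the JSON layout, field names, service names, and port numbers below are assumptions for illustration only.

```python
import json

# Hypothetical configuration: each service listens on one port and relays
# messages to one or more other ports. The real RNS file format may differ.
EXAMPLE_CONFIG = """
{
  "services": [
    {"name": "radio-a", "listen_port": 5000, "relay_ports": [6000, 6001]},
    {"name": "radio-b", "listen_port": 5001, "relay_ports": [6002]}
  ]
}
"""


def load_relay_map(config_text):
    """Parse the config and return {service_name: (listen_port, relay_ports)}."""
    config = json.loads(config_text)
    relay_map = {}
    used_ports = set()
    for service in config["services"]:
        listen_port = service["listen_port"]
        # Two services cannot listen on the same port on one server.
        if listen_port in used_ports:
            raise ValueError(f"port {listen_port} is assigned to more than one service")
        used_ports.add(listen_port)
        relay_map[service["name"]] = (listen_port, list(service["relay_ports"]))
    return relay_map


def relay_ports_for(relay_map, service_name):
    """Look up where messages from a starting service should be relayed."""
    _, relay_ports = relay_map[service_name]
    return relay_ports


if __name__ == "__main__":
    relay_map = load_relay_map(EXAMPLE_CONFIG)
    print(relay_ports_for(relay_map, "radio-a"))
```

Keeping the mapping in a file rather than in code means the same server image can simulate different radio topologies simply by shipping a different configuration file.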
The individual RNS servers are created using a CloudFormation template that provisions a base Windows Server AMI. The template also contains the user data needed to pull the necessary files from Amazon S3 and install the RNS.exe file. A Windows PowerShell script is then run, which starts all of the services that correspond to the starting ports and configuration files included in the package from S3. Finally, a PowerShell Desired State Configuration (DSC) script is run to verify that the system is correctly configured.
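As a rough illustration of that bootstrap sequence, the sketch below assembles a Windows user data script and wraps it for use in a CloudFormation instance definition. The bucket, key, install directory, and the two script names (Start-RnsServices.ps1, Test-RnsConfiguration.ps1) are hypothetical, as is the assumption that the package is a zip archive. The `<powershell>` wrapper, the `Read-S3Object` cmdlet from AWS Tools for PowerShell, and the `Fn::Base64` intrinsic function are standard.

```python
def build_rns_user_data(bucket, package_key, install_dir="C:\\rns"):
    """Return a Windows user data script that fetches and starts the RNS package.

    All paths and script names are placeholders; the package is assumed to be
    a zip archive containing RNS.exe, its configuration files, and helper scripts.
    """
    lines = [
        "<powershell>",
        f"New-Item -ItemType Directory -Force -Path '{install_dir}'",
        # Download the package from S3 (requires AWS Tools for PowerShell).
        f"Read-S3Object -BucketName '{bucket}' -Key '{package_key}' -File '{install_dir}\\package.zip'",
        f"Expand-Archive -Path '{install_dir}\\package.zip' -DestinationPath '{install_dir}' -Force",
        # Hypothetical script that starts one service per starting port.
        f"& '{install_dir}\\Start-RnsServices.ps1'",
        # Hypothetical verification step standing in for the DSC check.
        f"& '{install_dir}\\Test-RnsConfiguration.ps1'",
        "</powershell>",
    ]
    return "\n".join(lines)


def user_data_property(bucket, package_key):
    """Wrap the script for the UserData property of an AWS::EC2::Instance."""
    return {"Fn::Base64": build_rns_user_data(bucket, package_key)}


if __name__ == "__main__":
    print(build_rns_user_data("example-rns-bucket", "packages/rns.zip"))
```

The ordering mirrors the text: fetch, install, start services, then verify. Running the verification last means a misconfigured server is caught before any test traffic reaches it.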
In addition to Amazon EC2, Amazon S3, Amazon VPC, and AWS CloudFormation, the following services are used:
- Jenkins extensible automation server with Jenkins Pipeline, a suite of plugins to support continuous integration and continuous delivery pipelines
- AWS CodePipeline
- Amazon CloudWatch, AWS CloudTrail, and Zenoss to support operational requirements
- AWS Lambda
The Success Story
By leveraging this automated process, Meteorcomm can spin up the infrastructure they need when they need it, eliminating the cost inefficiencies of provisioning for peak demand 24/7. They can start a regression test almost immediately, scale up to thousands of instances, each running the exact combination of interrelated services in its specific configuration, and quickly shut the instances off once tests are completed. Because provisioning and deployment are also automated, every test runs in an identical environment, so tests do not fail because of environmental differences between manually built instances. As a result of their migration to AWS, Meteorcomm has reduced time to completion for each test by 75% and estimates a 50% year-over-year cost savings. The configurable automation allows Meteorcomm to add other product test groups to the solution with minimum effort, making the solution extensible.
Based on the success of their AWS-based regression test solution for ITCM, Meteorcomm is already looking at other ways to leverage AWS. Most notably, they are looking to expand the configurable automations they’re currently using to enable other development groups to move their regression tests out of their on-premises data centers and into AWS. Meteorcomm is also looking for ways to leverage Amazon EC2 Spot Instances to further drive down costs for nightly regression runs where time is not as critical. They are also exploring how to decrease time to completion for each regression run by optimizing how groups of tests are distributed across test instances.
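One common way to approach that last goal is to balance tests across instances by estimated duration. The sketch below uses the longest-processing-time-first heuristic, a standard greedy scheduling technique; it is offered only as an illustration and is not described in the source as Meteorcomm's method. The function name, test names, and durations are hypothetical.

```python
import heapq


def group_tests(durations, instance_count):
    """Assign tests to instances, longest tests first, always to the least-loaded instance.

    durations: {test_name: estimated_seconds}
    Returns a list with one (total_seconds, [test_names]) entry per instance.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Min-heap of (current_load, instance_index) so the idlest instance pops first.
    heap = [(0.0, index) for index in range(instance_count)]
    heapq.heapify(heap)
    groups = [[] for _ in range(instance_count)]
    totals = [0.0] * instance_count
    ordered = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        groups[index].append(name)
        totals[index] = load + seconds
        heapq.heappush(heap, (totals[index], index))
    return list(zip(totals, groups))


if __name__ == "__main__":
    # Made-up durations, in seconds, purely for demonstration.
    estimates = {"test_a": 8, "test_b": 7, "test_c": 6, "test_d": 5, "test_e": 4}
    for total, names in group_tests(estimates, 2):
        print(total, names)
```

The run finishes when the most heavily loaded instance finishes, so evening out the per-instance totals directly shortens time to completion. The heuristic is simple and fast, though it does not guarantee the optimal grouping.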