The procured equipment will serve four high-level experimental purposes:
1. Infrastructure for Experimentation with Software Deployment Paradigms: This component comprises machines that will be reconfigured frequently to mimic popular software deployment schemes (e.g., blue-green deployment, canary releases, chaos engineering) in a controlled experimental setting. The machines will run popular configuration-as-code and orchestration tools, such as Ansible, Puppet, Terraform, and Kubernetes. The machines will also use virtualization and containerization technologies, such as VMware and Docker.
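To illustrate the kind of deployment scheme these machines will mimic, the following is a minimal sketch of the routing decision at the heart of a canary release. All names and the 10% canary fraction are hypothetical; real experiments would drive this logic through the orchestration tools listed above rather than application code.

```python
import hashlib

CANARY_PERCENT = 10  # hypothetical: share of traffic routed to the canary version

def route(user_id: str) -> str:
    """Deterministically route a user to the 'canary' or 'stable' version.

    Hash-based bucketing keeps a given user pinned to the same version
    across requests, a property commonly desired in canary rollouts.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_PERCENT else "stable"
```

Because routing is a pure function of the user identifier, experiments can replay identical traffic splits across reconfigurations of the testbed.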
2. Infrastructure for Performance Assessment: This component will be used to collect computational performance measurements (e.g., CPU utilization, memory consumption, I/O operations) during the execution of software build jobs. These machines will be configured with software build tools (e.g., make, Bazel, Buck, Gradle, Maven) and language toolchains (e.g., GCC, Clang, JDK). These machines may use containerization tools, but virtualization will be avoided so that measurement overhead remains carefully controlled.
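As a sketch of the measurements this component will collect, the snippet below wraps a build command and reports coarse resource usage via the POSIX `getrusage` interface. It is an illustration under the assumption of a Unix-like host, not the measurement harness itself; production measurement would sample finer-grained counters.

```python
import resource
import subprocess
import time

def measure_build(cmd: list[str]) -> dict:
    """Run a build command and report coarse resource usage of its children.

    Sketch only: uses POSIX getrusage(RUSAGE_CHILDREN), which aggregates
    CPU time and peak memory across the spawned build processes.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_seconds": wall,
        "cpu_user_seconds": after.ru_utime - before.ru_utime,
        "cpu_system_seconds": after.ru_stime - before.ru_stime,
        "max_rss_kb": after.ru_maxrss,
    }
```

Running such a wrapper directly on bare metal (rather than inside a virtual machine) is what motivates the no-virtualization constraint above.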
3. Infrastructure for Large-Scale Compute: This component features machines configured for CPU- and GPU-intensive workloads.
4. Infrastructure for Data Storage: This component features machines that will host databases where experimental results will be collected and retained (e.g., MongoDB, PostgreSQL, MySQL). In addition, a collection of commodity servers will be used for long-running data collection scripts that are not compute intensive (e.g., crawling data from large public archives while observing API rate limits). Popular scripting language toolchains will be used for these tasks (e.g., Python, Ruby, Node.js).
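The rate-limit-observing crawling mentioned above can be sketched as a minimal fixed-interval limiter. The class name and rate are hypothetical; a real crawler would additionally honor server-provided Retry-After headers and back off on errors.

```python
import time

class RateLimiter:
    """Minimal fixed-interval rate limiter for polite API crawling (sketch).

    Enforces at most `rate` requests per second by sleeping between calls.
    """

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate  # minimum spacing between requests
        self._last = 0.0

    def wait(self) -> None:
        """Block until the next request is allowed, then record the time."""
        now = time.monotonic()
        delay = self.min_interval - (now - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Because these scripts spend most of their time sleeping between requests, commodity servers are sufficient for this component.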