We hear a lot about data lakes and their importance, but we rarely talk about a testing results data lake and how it can help us. Companies have long struggled with the lack of a single place to hold all testing results and related data. Test data is typically scattered across tools, hard to merge and move around, and this fragmentation has proven to be one of the biggest barriers to successful testing.
We have seen companies where, for instance, the automation testing team has its own way of storing, managing, and maintaining its results data, while performance testing, security testing, and other testing disciplines each have their own. This pattern repeats across teams, companies, and industries.
In performance testing and engineering, for example, load testing tools store data one way, while profiling and analysis tools store it another. Because of this, it is highly challenging to move data from one store to another, and there is no simple way to connect these different data stores in a manner that gives flexibility and power to the testing community across the organization.
Data is often called the new oil, and the benefits apply equally to the testing domain.
A data lake is a centralized repository that stores both structured and unstructured data. A testing data lake can be formed by pushing all the various test results and related data into such a repository – for example, AWS S3, an object storage service.
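As a sketch of what "pushing results into a centralized repository" can look like in practice, the snippet below builds a date-partitioned object key and serializes a result record to JSON. The key layout, bucket name, and field names are illustrative assumptions, not a fixed schema:

```python
import json
from datetime import datetime, timezone

def build_lake_key(test_type: str, run_id: str, when: datetime) -> str:
    """Build a partitioned object key so results can be browsed and
    queried by test type and date (a common data-lake layout)."""
    return (
        f"testing-lake/{test_type}/"
        f"year={when.year}/month={when.month:02d}/day={when.day:02d}/"
        f"{run_id}.json"
    )

# Hypothetical result record; the fields are illustrative, not a standard schema.
record = {
    "run_id": "perf-0042",
    "tool": "example-load-tool",
    "avg_response_ms": 182.4,
    "errors": 3,
}

key = build_lake_key("performance", record["run_id"],
                     datetime(2023, 5, 1, tzinfo=timezone.utc))
payload = json.dumps(record).encode("utf-8")

# With AWS S3 as the store, the upload would look roughly like:
#   boto3.client("s3").put_object(Bucket="my-testing-lake", Key=key, Body=payload)
print(key)
```

Partitioning keys by test type and date keeps the lake browsable and lets query engines prune whole directories when scanning, which matters once many teams push results into the same bucket.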
Creating a Testing Data Lake
Consider a company with a testing team named “TCOE” that conducts automation, performance, exploratory, security, and accessibility testing across its products.
After executing performance test runs, TCOE pushes the results and related data into the data lake it has created. The data can be stored as Parquet, XML, JSON, or any other format before it is pushed – for example, response times, hits per second, errors, and CPU and memory utilization written out as XML or Parquet files.
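A minimal sketch of staging such a record before it goes to the lake might look like the following. Parquet (via pandas/pyarrow) is a common real-world choice; this sketch uses JSON so it runs with the standard library alone, and all metric names, values, and units are assumed for illustration:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metrics from one load-test run; names and units are assumptions.
perf_results = {
    "run_id": "perf-0042",
    "response_times_ms": {"p50": 120, "p95": 480, "p99": 950},
    "hits_per_second": 310.5,
    "error_count": 12,
    "cpu_percent_avg": 64.2,
    "memory_mb_peak": 2048,
}

# Stage the record in a local directory that mirrors the lake's layout,
# then upload the file to the object store as a separate step.
staging_dir = Path(tempfile.mkdtemp()) / "performance"
staging_dir.mkdir(parents=True)
out_file = staging_dir / f"{perf_results['run_id']}.json"
out_file.write_text(json.dumps(perf_results, indent=2))

print(out_file.name)  # perf-0042.json
```

Staging locally first keeps the test tooling decoupled from the lake: the same file can be pushed to S3, HDFS, or an on-premises store without changing how the run produces its results.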
Automation testing results, such as script pass/fail status, validations, screenshots, errors, and other details, can be pushed to the data lake as well.
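Automation results are naturally per-script records, so one convenient (assumed, not prescribed) shape is JSON Lines: one object per line, appended as runs complete. The script names, fields, and file path below are hypothetical:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-script automation results; fields are illustrative.
automation_results = [
    {"script": "login_test", "status": "pass", "validations": 8,
     "screenshot": None},
    {"script": "checkout_test", "status": "fail", "validations": 5,
     "screenshot": "screenshots/checkout_test_step3.png",
     "error": "Timeout waiting for #pay-button"},
]

# Append one JSON object per line (JSON Lines). Lake query engines scan this
# format easily, and new fields can appear later without migrating old records.
results_file = Path(tempfile.mkdtemp()) / "automation_results.jsonl"
with results_file.open("a") as fh:
    for result in automation_results:
        fh.write(json.dumps(result) + "\n")

print(results_file.read_text().count("\n"))  # 2 records written
```

Note that large artifacts like screenshots would typically be stored as separate objects in the lake, with the result record holding only a reference to them, as the `screenshot` field does here.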
Similarly, data from other types of testing can be pushed to the data lake.
Teams can choose the technology and service for storing, structuring, and pushing data to the lake based on their requirements, and likewise choose between cloud and on-premises options for building the lake itself. There is a plethora of options from which to build the entire ecosystem.