What Are Flaky Tests and How You Can Avoid Them

Learn more about flaky tests, common issues, and how to avoid flakiness in application testing for more secure and accurate results.


When testing in an automated fashion, we need to ensure that our tests are fit for purpose and that we can trust the results to confirm every workflow has been exercised correctly. When tests start to give us inconsistent results for no apparent reason, we fall back to a manual troubleshooting workflow, which slows down the entire testing process. SBOX helps to tackle test flakiness in a number of different areas, which we will cover in this post along with some general best practices.

What Are Flaky Tests?

At the most basic level, a flaky test is one which does not produce consistent results over time. We have all seen flaky tests before but perhaps were unsure of the terminology. For example, if a colleague and I run the exact same test against the same environment, expect the same outcome, but receive different results, we can deem this test flaky.

Flaky tests can be caused by a whole host of different factors, some of which are listed below. We will touch on each point before diving deeper into the test environment and infrastructure.

  • Poorly written code
  • Misunderstanding of expected outcome/workflow
  • Test data issues
  • Test environment issues

Poorly Written Code

The list of examples that could be given here is endless. As the team at Element 34 has had the opportunity to work closely with many different teams throughout their careers, we will give two examples which tend to go hand in hand.

The first relates to how we identify and interact with elements within the website or mobile app we are testing. Choosing the correct locator is crucial when it comes to reducing flakiness within your tests. Some locators may be easier to use but result in poorer performance, and others can behave inconsistently across different browsers, causing tests to fail only when executed against a particular browser.
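One way to make this concrete is to encode a locator preference order directly in your test helpers. The sketch below is a hypothetical, framework-agnostic helper (the attribute names and priority order are illustrative assumptions, not part of any specific framework's API): it prefers stable, unique attributes such as `id` and falls back to brittle positional XPath only as a last resort.

```python
# Hypothetical locator-priority helper. The attribute keys and the
# priority order shown here are illustrative assumptions.

def best_locator(attributes: dict) -> tuple:
    """Pick the most stable locator strategy available for an element."""
    if attributes.get("id"):
        # Unique IDs are the most stable and typically the fastest lookup
        return ("id", attributes["id"])
    if attributes.get("name"):
        return ("name", attributes["name"])
    if attributes.get("css_class"):
        # CSS selectors are generally more portable across browsers than XPath
        return ("css selector", "." + attributes["css_class"])
    # Positional XPath is the most brittle option: any layout change breaks it
    return ("xpath", attributes.get("xpath", "//*"))

print(best_locator({"id": "login-button"}))        # ('id', 'login-button')
print(best_locator({"css_class": "btn-primary"}))  # ('css selector', '.btn-primary')
```

Centralizing this choice in one helper means a locator change is a one-line fix rather than a hunt through every test script.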

In some cases, when these failures appear, the user will try to combat them by adding waits to ensure the page has loaded before executing the next command. Waits can be beneficial and are sometimes required, but hard-coded sleeps for a fixed duration will needlessly increase test execution time, and a blanket implicit wait applies to every element lookup whether it needs one or not. A correct understanding of explicit waits, implicit waits, and sleeps is important when it comes to writing good test scripts.
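The core of the explicit-wait pattern is a polling loop: check a condition repeatedly and return as soon as it holds, rather than sleeping for a fixed worst-case duration. This is a minimal framework-agnostic sketch of that pattern (the function name and parameters are our own, not a library API):

```python
import time

def wait_until(condition, timeout=10.0, poll=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Unlike a fixed sleep, this returns as soon as the condition holds,
    so the test never waits longer than necessary."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %ss" % timeout)

# Example: wait for a condition that becomes true after roughly 0.5 seconds
start = time.monotonic()
wait_until(lambda: time.monotonic() - start > 0.5, timeout=5)
```

If a page loads in 0.5 seconds, this waits 0.5 seconds; a hard-coded 10-second sleep would waste 9.5 seconds on every run.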

Misunderstanding of Expected Outcome/Workflow

A simple yet nonetheless important item to raise: know what needs to be automated and understand the user journey.

Document what the user needs to do in the correct order to successfully navigate through the user journey and ensure this is identical to what is seen in the test script.

If we mark our tests as passed or failed based on a message we see at the end of the user journey, we must ensure that message appears in all cases, or we will get tests that run as expected but are marked as failed.
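One way to make such failures diagnosable is to accept every known wording of the success message and report the actual text when none matches. The check below is a hypothetical sketch (the message strings and function name are invented for illustration):

```python
# Hypothetical end-of-journey check. The success messages are invented
# examples; in practice they would come from the application under test.
SUCCESS_MESSAGES = {"Order confirmed", "Thank you for your order"}

def assert_journey_complete(banner_text):
    """Pass if the final banner matches any known success wording;
    otherwise fail with the actual text included in the report."""
    if banner_text.strip() not in SUCCESS_MESSAGES:
        raise AssertionError(
            "Unexpected end-of-journey message: %r; expected one of %s"
            % (banner_text, sorted(SUCCESS_MESSAGES))
        )

assert_journey_complete("Order confirmed")  # passes silently
```

When the application team rewords the banner, the failure report now shows exactly which text appeared, instead of a bare "failed" that looks like a broken workflow.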

Test Data Issues

Depending on where you execute your tests, security policy may restrict you to synthetic data. As SBOX is installed within your corporate firewall, this is not a concern for Element 34 customers, but for others it can be an issue.

For example, if we are testing on devices or browsers that sit outside our corporate firewall, are we allowed to use real data? If we are not, we need to ensure the test data we use still covers every workflow we need to test, and that it behaves just as production data would. We should also understand how much synthetic data we need to create and the cost of doing so.

For parallel execution, we have seen cases where User 123 cannot be logged into more than one device at a time, or where the test script has been written in a way that can only run sequentially. Parallel execution allows us to reduce build time and release more quickly, so this is something we should consider as we build out test scripts.
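A common fix for the shared-account problem is to give each parallel worker its own test user from a pool instead of hard-coding one account. This is a minimal sketch of that idea (the class and the `test_user_NNN` naming scheme are our own illustrative assumptions):

```python
import threading

class UserPool:
    """Hand each parallel worker its own synthetic test user so no two
    concurrent sessions ever share credentials (hypothetical sketch)."""

    def __init__(self, users):
        self._users = iter(users)
        self._lock = threading.Lock()

    def acquire(self):
        # The lock makes allocation safe when workers run as threads
        with self._lock:
            return next(self._users)

pool = UserPool("test_user_%03d" % i for i in range(1, 101))
# Each worker calls pool.acquire() once instead of hard-coding "User 123"
print(pool.acquire(), pool.acquire())  # test_user_001 test_user_002
```

The same idea scales to process-based parallelism by assigning users from a database or fixture file keyed on the worker ID.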

Test Environment Issues

Whether you are testing locally, using an internal grid, or making use of SBOX or a SaaS provider, the environment you run your tests on needs to be high performing, scalable, secure, and available whenever needed.

In relation to performance, when we frequently execute a large number of tests we need to ensure performance is not an issue, or we may start to see timeouts, failures, and large differences in build time between executions.

Some offerings will perform better than others. It is important to be aware of this, as you may follow all best practices and yet the flakiness of your tests may not be through any fault of yours or the team's.

For in-house solutions and SBOX, as all of the testing is done within your corporate firewall, latency is less of a concern because the data moves within the network rather than over the internet.

In relation to scalability, building out your own internal grid can be an enjoyable experience, but scaling it as you move along your automation journey can be quite difficult. As fragmentation continues to grow in terms of browsers and devices, it may also become more and more difficult to let teams test on what is important to them and their customers.

Like many offerings available, SBOX is built for scale and grows with your needs.

In relation to security, all testing should be completed in a secure environment and when writing our tests we should always remain security conscious. With SBOX running inside your corporate firewall, no data is leaving your network and no external access is required.

Finally, we need to ensure that the environment we run our tests on is available whenever needed. As we look to automate as many user journeys as possible, these tests may be executed from CI/CD pipelines on a schedule, so our environment should be available 24/7. We also need to be aware of any usage limitations and of who else is testing on our platform. If all teams within the organization are testing on quite a small grid, are we constantly waiting for it to become available? The same logic applies if we decide to use a SaaS provider: can they support the needs of our organization along with all of their other customers?


In summary, it is crucial for us to have confidence that the tests we have created are fit for purpose. The above are just some examples of what can cause flakiness within our test scripts. Flakiness can also be caused by external factors that need to be considered, both for where you and your team are currently and for later in your automation journey.

When it comes to automated testing solutions, SBOX is the safer, more secure automated testing solution in the market. That's why top global financial institutions, automotive manufacturers, and technology companies choose SBOX over traditional SaaS solutions.

Feel free to connect with us to learn more about our SBOX offering!


1. How can teams effectively monitor for flaky tests within their continuous integration pipeline?

Implementing robust logging and reporting mechanisms can help teams identify flakiness patterns, such as tests that fail intermittently without code changes. Tools that track test stability over time can be particularly useful.
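The core signal is a test that sometimes passes and sometimes fails over a window of runs with no code change. The sketch below shows one way to flag such tests from exported CI results; the `(test_name, passed)` input format and thresholds are illustrative assumptions, not any particular CI tool's schema:

```python
from collections import defaultdict

def flaky_tests(history, min_runs=5, threshold=0.05):
    """Flag tests whose recent runs mix passes and failures.

    `history` is a list of (test_name, passed) tuples, e.g. exported
    from CI logs (hypothetical input format)."""
    runs = defaultdict(list)
    for name, passed in history:
        runs[name].append(passed)

    flagged = {}
    for name, results in runs.items():
        if len(results) < min_runs:
            continue  # not enough data to judge stability
        fail_rate = results.count(False) / len(results)
        # Intermittent means it fails sometimes but not always:
        # a 100% failure rate is a plain bug, not flakiness.
        if threshold <= fail_rate < 1.0:
            flagged[name] = round(fail_rate, 2)
    return flagged
```

Running this report on every nightly build turns "that test feels unreliable" into a concrete, trackable failure rate per test.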

2. What role do test retries play in managing flaky tests, and how can they be implemented without masking underlying issues?

Test retries can temporarily mitigate the impact of flaky tests on the development process by reducing the chance that a flaky failure blocks a build. However, it's important to use retries judiciously and investigate the root causes of flakiness to ensure they're not simply covering up deeper issues.
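One way to retry without hiding the problem is to record every retry as it happens, so intermittent failures still show up in reports even when the test ultimately passes. This decorator is a hypothetical sketch of that idea (the names and the `report` callback are our own, not a testing-framework API):

```python
import functools

def retry_with_report(retries=2, report=print):
    """Re-run a test up to `retries` extra times on failure, but record
    every failed attempt so flakiness stays visible in the report."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    # Surface the retry instead of silently swallowing it
                    report("%s: attempt %d failed" % (test_fn.__name__, attempt + 1))
                    if attempt == retries:
                        raise  # out of retries: fail the build for real
        return wrapper
    return decorator
```

Feeding `report` into the flakiness dashboard (rather than `print`) keeps the retry data and the root-cause investigation connected.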

3. Can flaky tests be completely eliminated, or should teams focus on minimizing their impact?

While it's challenging to eliminate flaky tests entirely, especially in complex, dynamic testing environments, teams should focus on reducing their occurrence through better test design, reliable test environments, and addressing known flakiness causes.

4. How do different testing frameworks and tools help in identifying and reducing flakiness in tests?

Many modern testing frameworks offer features designed to identify flaky tests, such as annotations to mark tests as flaky, mechanisms to rerun failed tests automatically, and detailed logging to help debug failures. Selecting tools that provide robust support for dealing with test flakiness can make flaky tests easier to manage and reduce. Last, but not least, selecting a test solution with low latency in test runs is also critical to minimize flakiness caused by the nature of the test infrastructure.