Getting code coverage data for each request coming to a Python web server

In this post, we demonstrate how to get coverage data for each incoming request on a Python web server built with any web framework.

What is Code Coverage?

Code coverage is a metric used in software testing to measure the extent to which the source code of a program has been executed during testing. It indicates the percentage of code that has been covered by the test cases. Code coverage analysis helps developers understand how thoroughly their tests exercise the codebase.

Code coverage tools are used to collect data on code execution during testing and generate reports showing the coverage metrics. These reports help developers identify areas of the code that are not adequately covered by tests, allowing them to write additional tests to improve coverage and increase confidence in the software's correctness and reliability.

What does it mean to get coverage data for each request?

Obtaining coverage data for each request coming to a web server offers several benefits:

  1. Granular Insights: By capturing coverage data for each request, developers gain detailed insights into which parts of the codebase are executed in response to different types of requests. This level of granularity allows for a deeper understanding of the application's behavior under various conditions.

  2. Identifying Untested Code Paths: Coverage data helps identify areas of the code that are not adequately covered by tests. By analyzing coverage reports, developers can pinpoint specific code paths that need additional testing, ensuring comprehensive test coverage across the entire codebase.

  3. Building a deduplication feature: Coverage data for each end-to-end (e2e) test case can be analyzed to identify duplicate tests and remove them.

Obtaining coverage data

To obtain the coverage data, we will use the coverage.py library. coverage.py is mostly used through its command-line interface, but it also provides an API for programmatic use.

We will define a middleware through which every incoming request passes. In this "coverage" middleware, before passing control to the rest of the application, we call the start() function from the coverage library. Coverage is only measured in functions called after start() is invoked, so if this middleware is scheduled to run first, the coverage of the other middleware is captured along with that of the main application code.

Once the application returns, we stop collecting coverage data. We can then fetch the data and process it further.
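To make the ordering point concrete, here is a minimal sketch that uses only the standard library. MeasuringMiddleware and OtherMiddleware are hypothetical stand-ins: they record events instead of calling coverage.py. The sketch only illustrates that the outermost wrapper runs first and finishes last; it is not the coverage middleware itself.

```python
from wsgiref.util import setup_testing_defaults

events = []  # records the order in which each layer runs


class MeasuringMiddleware:
    """Hypothetical stand-in that marks where start() and stop() would be called."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        events.append("measure:start")  # cov.start() would go here
        try:
            return self.app(environ, start_response)
        finally:
            events.append("measure:stop")  # cov.stop() would go here


class OtherMiddleware:
    """Hypothetical second middleware, for example logging or authentication."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        events.append("other-middleware")
        return self.app(environ, start_response)


def application(environ, start_response):
    """Tiny WSGI app standing in for the real web application."""
    events.append("application")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# The outermost wrapper runs first, so the measuring middleware goes outside.
wrapped = MeasuringMiddleware(OtherMiddleware(application))

environ = {}
setup_testing_defaults(environ)  # fill in a minimal fake request
body = wrapped(environ, lambda status, headers: None)
print(events)
# → ['measure:start', 'other-middleware', 'application', 'measure:stop']
```

Had the wrappers been nested the other way round, "other-middleware" would be recorded before "measure:start", which corresponds to that middleware's code going unmeasured.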

Below is a code snippet for the coverage middleware, which follows the WSGI callable interface and can be used in servers built with the Flask web framework:

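import coverage

class CoverageMiddleware:
    """WSGI middleware that measures the lines executed while handling one request."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        # A fresh Coverage object per request keeps each request's data separate.
        cov = coverage.Coverage(cover_pylib=False)  # do not measure the standard library
        cov.start()
        try:
            response = self.app(environ, start_response)
        finally:
            cov.stop()  # stop measuring even if the application raises
        # Write is a helper defined elsewhere that persists the collected data.
        Write(cov.get_data())
        return response

# Flask usage: app.wsgi_app = CoverageMiddleware(app.wsgi_app)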

Here, the Write function writes the coverage data to a file, say dedupData.yaml, which can then be used to identify duplicate test cases in an e2e scenario.
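The post does not show the body of the writing helper, so the following is only a sketch of what such a helper might look like, not the code from the sample repository. It relies on the measured_files() and lines(filename) methods of coverage.py's CoverageData object. The function names, the test_id parameter (the post does not say how the test id reaches the server), and the hand-written YAML output are all assumptions made for illustration. A YAML library could be used instead of the manual formatting.

```python
def coverage_to_entry(test_id, data):
    """Build one record: a test id plus the executed lines for each file.

    `data` is any object exposing measured_files() and lines(filename),
    which is the interface offered by coverage.py's CoverageData.
    """
    executed = {}
    for filename in sorted(data.measured_files()):
        lines = data.lines(filename) or []  # lines() may return None
        if lines:
            executed[filename] = sorted(lines)
    return {"id": test_id, "executedLinesByFile": executed}


def entry_to_yaml(entry):
    """Render one record as YAML text (assumes paths need no YAML quoting)."""
    files = entry["executedLinesByFile"]
    if not files:
        return f"- id: {entry['id']}\n  executedLinesByFile: {{}}\n"
    out = [f"- id: {entry['id']}", "  executedLinesByFile:"]
    for filename, lines in files.items():
        out.append(f"    {filename}:")
        out.extend(f"    - {number}" for number in lines)
    return "\n".join(out) + "\n"


def write_entry(path, test_id, data):
    """Append one test's record to the output file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry_to_yaml(coverage_to_entry(test_id, data)))


class FakeData:
    """Stand-in for CoverageData so the sketch runs without coverage.py."""

    def measured_files(self):
        return {"/srv/app.py"}

    def lines(self, filename):
        return [35, 33, 34]


print(entry_to_yaml(coverage_to_entry("test-1", FakeData())), end="")
```

In the middleware, the object returned by cov.get_data() would take the place of FakeData.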

Here is the modified sample Python application with the middleware and the writing logic in place: https://github.com/AkashKumar7902/samples-python/tree/v1.0.0/flask-mongo

The repository includes test cases generated by Keploy, which can be replayed using the command keploy test -c "pythonapp.py". Upon successful execution of this command, a dedupData.yaml file is generated. This file contains details of the executed files, including the lines covered, for each test case.

Here is a sample dedupData.yaml:

- id: test-1
  executedLinesByFile:
    /home/akash/Desktop/githubrepo/samples-python/flask-mongo/app.py:
    - 33
    - 34
    - 35
- id: test-2
  executedLinesByFile:
    /home/akash/Desktop/githubrepo/samples-python/flask-mongo/app.py:
    - 24
    - 23

Using Coverage data to identify duplicate tests

The dedupData.yaml file written earlier, which contains coverage data for each test case, can be used to identify and flag duplicate test cases by finding tests that exercise similar code paths.
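One simple way to do this, sketched below, is to treat two tests as duplicates when they executed exactly the same lines in exactly the same files. This is an assumed definition chosen for illustration; the post does not specify the criterion Keploy uses, and a looser similarity measure is equally possible. The input mirrors the shape of the sample file after parsing, with a made-up test-3 entry and shortened file names added so that there is a duplicate to find.

```python
from collections import defaultdict


def find_duplicates(entries):
    """Group the ids of tests whose executed lines are identical in every file."""
    groups = defaultdict(list)
    for entry in entries:
        # Order-independent fingerprint: (file, set of executed lines) pairs.
        fingerprint = frozenset(
            (filename, frozenset(lines))
            for filename, lines in entry["executedLinesByFile"].items()
        )
        groups[fingerprint].append(entry["id"])
    # Keep only the fingerprints shared by more than one test.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical data in the shape of the parsed YAML file.
entries = [
    {"id": "test-1", "executedLinesByFile": {"app.py": [33, 34, 35]}},
    {"id": "test-2", "executedLinesByFile": {"app.py": [24, 23]}},
    {"id": "test-3", "executedLinesByFile": {"app.py": [35, 33, 34]}},
]

print(find_duplicates(entries))  # → [['test-1', 'test-3']]
```

Each returned group lists tests that are interchangeable under this definition, so all but one test in a group could be flagged for removal.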

Keploy Cloud also offers several deduplication features for test cases based on coverage data.

Conclusion

This is how, with very little code change, you can collect coverage data for each incoming request. You can use it to prioritise increasing coverage for the most frequent requests, and to build a deduplication feature.

Reference: https://coverage.readthedocs.io/en/7.4.1/api.html

FAQs

How can coverage data for each request benefit developers?

Obtaining coverage data for each request provides granular insights into the codebase's execution under various conditions, helps identify untested code paths, and can be used to build deduplication features for test cases.

How can coverage data be obtained in a Python web server?

Coverage data can be obtained using the coverage.py library, which provides both a CLI and a programmatic API for collecting coverage metrics. In a web server, coverage data can be captured using a middleware that wraps around the application logic and collects coverage information for each incoming request.

What are some deduplication features for test cases based on coverage data?

Deduplication features for test cases based on coverage data may include identifying and flagging duplicate tests by analyzing similar code paths, removing redundant tests, and optimizing test suites for better coverage and efficiency.