We offer a translation of the post “Unit Testing is Overrated” by Alex Golub as a starting point for discussing unit tests. Are they really overrated, as the author claims, or are they a great help in everyday work? A poll is at the end of the post.

[Image: Unit test results: despair, torment, anger]

It is hard to overstate the importance of testing in modern software development. Delivering a successful product is not a matter of releasing it once and forgetting about it; it is a long, iterative process. Every time a line of code changes, the software must keep working, which implies the need for rigorous testing.

As the software industry evolved, testing practices matured as well. They gradually shifted towards automation and even influenced the structure of software itself, spawning “mantras” such as test-driven development, emphasizing patterns such as dependency inversion, and popularizing the high-level architectures built on top of them.

Today, automated testing is so deeply tied to software development in our minds that it is hard to imagine one without the other. And since it ultimately lets us build software quickly without sacrificing quality, it is hard to argue against the usefulness of testing.

However, despite the variety of approaches, modern “best practices” mostly push developers towards unit testing specifically. Tests whose scope lies higher on Mike Cohn's pyramid are either written as part of a wider effort (often by completely different people) or ignored entirely.

This approach is largely backed by the argument that unit tests provide the most value during development, because they catch errors quickly and encourage design patterns that promote modularity. This idea has become so widely accepted that the term “unit testing” is now somewhat conflated with automated testing in general, which strips it of part of its meaning and causes confusion.

When I was a less experienced developer, I strictly followed these “best practices”, believing they would make my code better. I did not particularly enjoy writing unit tests because of all the ceremony involved with abstractions and stubs, but it was the recommended approach, and who was I to argue with it?

Only later, after experimenting more and building new projects, did I begin to realize that there are much better approaches to testing, and that in most cases focusing on unit tests is a waste of time.

Aggressively promoted “best practices” often tend to grow cargo cults around them, enticing developers to apply design patterns or use specific approaches without giving them a chance to think. In the context of automated testing, this manifests as the industry's unhealthy obsession with unit testing.

In this article, I will share my observations about this testing technique and explain why I consider it inefficient. I will also describe the approaches I currently use to test my code, both in open-source projects and in my day-to-day work.

Note: the code examples in this article are written in C#, but the language itself does not (especially) matter for the points I am making.

Note 2: I have come to realize that programming terms rarely convey their meaning precisely, because everyone seems to understand them a little differently. In this article I will use the “standard” definitions: unit testing targets the smallest separable parts of the code, end-to-end testing targets the outermost entry points of the software, and integration testing covers everything in between.

Note 3: if you don’t want to read the whole article, you can immediately jump to the conclusions at the end.

Misconceptions about unit testing

Unit tests, as the name implies, revolve around the concept of a “unit”, which denotes a very small isolated part of a larger system. There is no formal definition of what a unit is or how small it should be, but it is most commonly accepted that it corresponds to an individual function of a module (or a method of an object).

Normally, if the code is written without unit tests in mind, it may be impossible to test some functions in complete isolation, because they have external dependencies. To work around this problem, we can apply the dependency inversion principle and replace concrete dependencies with abstractions. These abstractions can then be substituted with real or fake implementations, depending on whether the code is running normally or as part of a test.

Besides that, unit tests are expected to be pure. For example, if a function contains code that writes data to the file system, that part has to be abstracted away as well; otherwise, a test that verifies this behavior would be considered an integration test, because it also covers the unit's integration with the file system.
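As a minimal sketch of what “abstracting away” the file system means in practice, the hypothetical `IFileSystem`, `PhysicalFileSystem`, `InMemoryFileSystem`, and `ReportWriter` types below (none of them come from the article) put the write behind an interface. Production code would use the `System.IO.File`-backed implementation, while a unit test passes the in-memory fake so that no disk access happens.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical abstraction over the one file-system operation the unit needs.
public interface IFileSystem
{
    void WriteAllText(string path, string contents);
}

// Production implementation: delegates to the real file system.
public sealed class PhysicalFileSystem : IFileSystem
{
    public void WriteAllText(string path, string contents) => File.WriteAllText(path, contents);
}

// Test double: keeps "written" files in memory instead of touching the disk.
public sealed class InMemoryFileSystem : IFileSystem
{
    public Dictionary<string, string> Files { get; } = new Dictionary<string, string>();

    public void WriteAllText(string path, string contents) => Files[path] = contents;
}

// Hypothetical unit under test: joins report lines and saves them through the abstraction.
public sealed class ReportWriter
{
    private readonly IFileSystem _fileSystem;

    public ReportWriter(IFileSystem fileSystem) => _fileSystem = fileSystem;

    public void Save(string path, IEnumerable<string> lines) =>
        _fileSystem.WriteAllText(path, string.Join(Environment.NewLine, lines));
}
```

Note the cost even in this tiny case: an interface and two implementations exist only so that the test can avoid touching the disk.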

Given these factors, we can conclude that unit tests are only useful for verifying pure business logic inside a given function. Their scope does not extend to testing side effects or other integrations, because those belong to the domain of integration testing.

To demonstrate how these nuances affect design, let's take an example of a simple system that we want to test. Imagine that we are working on an application that calculates the time of sunrise and sunset; it performs its task using the following two classes:

```csharp
public class LocationProvider : IDisposable
{
    private readonly HttpClient _httpClient = new HttpClient();

    // Gets location by query
    public async Task<Location> GetLocationAsync(string locationQuery) { /* ... */ }

    // Gets current location by IP
    public async Task<Location> GetLocationAsync() { /* ... */ }

    public void Dispose() => _httpClient.Dispose();
}

public class SolarCalculator : IDisposable
{
    private readonly LocationProvider _locationProvider = new LocationProvider();

    // Gets solar times for current location and specified date
    public async Task<SolarTimes> GetSolarTimesAsync(DateTimeOffset date) { /* ... */ }

    public void Dispose() => _locationProvider.Dispose();
}
```

Although the design above is perfectly valid from an OOP standpoint, neither of these classes can be unit tested. Because `SolarCalculator` depends on its own instance of `LocationProvider`, and `LocationProvider` in turn depends on `HttpClient`, it is impossible to isolate the business logic that may be contained in the methods of these classes.

Let's iterate on the code and replace concrete implementations with abstractions:

```csharp
public interface ILocationProvider
{
    Task<Location> GetLocationAsync(string locationQuery);

    Task<Location> GetLocationAsync();
}

public class LocationProvider : ILocationProvider
{
    private readonly HttpClient _httpClient;

    public LocationProvider(HttpClient httpClient) => _httpClient = httpClient;

    public async Task<Location> GetLocationAsync(string locationQuery) { /* ... */ }

    public async Task<Location> GetLocationAsync() { /* ... */ }
}

public interface ISolarCalculator
{
    Task<SolarTimes> GetSolarTimesAsync(DateTimeOffset date);
}

public class SolarCalculator : ISolarCalculator
{
    private readonly ILocationProvider _locationProvider;

    public SolarCalculator(ILocationProvider locationProvider) => _locationProvider = locationProvider;

    public async Task<SolarTimes> GetSolarTimesAsync(DateTimeOffset date) { /* ... */ }
}
```

This lets us decouple `LocationProvider` from `SolarCalculator`, but in exchange the code has nearly doubled in size. Note also that we had to drop `IDisposable` from both classes, because they no longer own their dependencies and are therefore not responsible for their lifetime.

While some of these changes may look like improvements, it is important to point out that the interfaces we defined serve no practical purpose other than making unit testing possible. Our design has no need for actual polymorphism, so in our particular case these abstractions are autotelic: abstractions for the sake of abstraction.

Let's put this work to use and write a unit test for `SolarCalculator`:

```csharp
using System;
using System.Threading.Tasks;
using FluentAssertions;
using Moq;
using Xunit;

public class SolarCalculatorTests
{
    [Fact]
    public async Task GetSolarTimesAsync_ForKyiv_ReturnsCorrectSolarTimes()
    {
        // Arrange
        var location = new Location(50.45, 30.52);
        var date = new DateTimeOffset(2019, 11, 04, 00, 00, 00, TimeSpan.FromHours(+2));

        var expectedSolarTimes = new SolarTimes(
            new TimeSpan(06, 55, 00),
            new TimeSpan(16, 29, 00)
        );

        var locationProvider = Mock.Of<ILocationProvider>(lp =>
            lp.GetLocationAsync() == Task.FromResult(location)
        );

        var solarCalculator = new SolarCalculator(locationProvider);

        // Act
        var solarTimes = await solarCalculator.GetSolarTimesAsync(date);

        // Assert
        solarTimes.Should().BeEquivalentTo(expectedSolarTimes);
    }
}
```

Here we have a simple test that verifies that `SolarCalculator` works correctly for a known location. Since unit tests and their units are tightly coupled, we follow the recommended naming convention: the test class is named after the class under test, and the name of the test method follows the Method_Condition_Result pattern.

To simulate the desired precondition in the Arrange phase, we have to inject the corresponding behavior into the unit's dependency, `ILocationProvider`. In this case, we do that by replacing the return value of `GetLocationAsync()` with a location for which the correct sunrise and sunset times are known in advance.

Note that although `ILocationProvider` exposes two different methods, from the contract's point of view we have no way of knowing which one actually gets called. This means that by choosing to mock one specific method, we are making an assumption about the internal implementation of the method under test (which we deliberately hid in the previous snippets).
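To see why this assumption matters, consider a minimal hand-written stub that plays the same role as the `Mock.Of` setup in the test. The `Location` record and the `StubLocationProvider` class are simplified, hypothetical stand-ins, not the article's code. Like the mock, the stub configures only the parameterless overload; the query-based overload is left unconfigured.

```csharp
using System;
using System.Threading.Tasks;

// Simplified, hypothetical versions of the types used in the test above.
public sealed record Location(double Latitude, double Longitude);

public interface ILocationProvider
{
    Task<Location> GetLocationAsync(string locationQuery);
    Task<Location> GetLocationAsync();
}

// Hand-written stub that, like the mock in the test, configures only the
// parameterless overload. The query-based overload is left "unconfigured"
// and here simply returns null.
public sealed class StubLocationProvider : ILocationProvider
{
    private readonly Location _location;

    public StubLocationProvider(Location location) => _location = location;

    public Task<Location> GetLocationAsync() => Task.FromResult(_location);

    public Task<Location> GetLocationAsync(string locationQuery) => Task.FromResult<Location>(null!);
}
```

A `SolarCalculator` that switched from the parameterless overload to the query-based one would receive `null` from such a stub and fail, even if its behavior against a real provider were unchanged.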

In the end, the test correctly verifies that the business logic inside `SolarCalculator` works as expected. However, let's list the observations we made along the way.

1. Unit tests have limited applicability

It is important to understand that the task of any unit test is very simple: to verify business logic in an isolated scope. The applicability of unit testing depends on the interactions we need to test.

For example, does it make sense to unit test a method that calculates sunrise and sunset times using a long and complicated mathematical algorithm? Most likely, yes.

Does it make sense to unit test a method that sends a request to a REST API to get geographic coordinates? Most likely, no.

If you treat unit testing as a goal in itself, you will quickly find that, despite considerable effort, most tests fail to provide the confidence you need, simply because they are not testing what matters. In most cases, it is far more beneficial to test wider interactions with integration testing than to focus on unit tests specifically.

Interestingly, in such situations some developers end up writing integration tests but still call them unit tests. This is mostly due to the confusion surrounding the concept. Of course, one could argue that the size of a unit can be chosen arbitrarily and that a unit can span several components, but that makes the definition so fuzzy that the term becomes practically useless.

2. Unit tests complicate the structure

One of the most popular arguments in favor of unit testing is that it encourages you to design software in a highly modular way. The argument rests on the assumption that code is easier to reason about when it is split into many small components rather than a few large ones.

However, this often leads to the opposite problem: functionality ends up overly fragmented. This makes the code much harder to assess, because a developer has to look through several components that together make up what should have been one cohesive element.

In addition, the abundance of abstractions needed to isolate components introduces a lot of unnecessary indirection. Although indirection is an incredibly powerful and useful technique in itself, abstraction inevitably increases cognitive complexity, making the code even harder to reason about.

Through such indirection, we also end up losing some degree of encapsulation that we could otherwise have kept. For example, the responsibility for managing the lifetime of individual dependencies shifts from the components that contain them to some other, unrelated service (usually a dependency container).

Some of that infrastructural complexity can be delegated to a dependency injection framework, which makes configuring, managing, and activating dependencies easier. However, this makes the code less compact, which is undesirable in some cases, for example when writing a library.

Ultimately, while it is clear that unit testing influences software design, whether that influence is actually beneficial is highly debatable.

3. Unit tests are costly

It seems logical to assume that, because they are small and isolated, unit tests are easy and quick to write. Unfortunately, this is another misconception, and a rather popular one, especially among management.

Even though the modular architecture mentioned above suggests that individual components can be considered separately from each other, unit tests do not actually benefit from that. In fact, the complexity of a unit test grows in proportion to the number of its external interactions, because of all the work needed to achieve isolation while still reproducing the required behavior.

The example above is fairly simple, but in a real project the Arrange phase can easily stretch over many long lines just to set up the preconditions of a single test. In some cases, the mocked behavior can be so convoluted that it is almost impossible to untangle it and figure out what it was supposed to do.

Besides that, unit tests are by nature very tightly coupled to the code they test, so the effort of making any change is effectively doubled: the tests have to be updated to match the new code. Making matters worse, very few developers find this task exciting, so they tend to pass it on to more junior team members.

4. Unit tests depend on implementation details

The unfortunate consequence of mock-based unit testing is that any test written this way is necessarily aware of the implementation. By mocking a specific dependency, the test begins to rely on how the code under test consumes that dependency, which is not governed by the public interface.

This additional coupling often leads to unexpected issues, where seemingly non-breaking changes cause tests to fail because the mocks have become outdated. This can be very frustrating and ultimately discourages developers from refactoring, because it is never clear whether a test failure is caused by an actual regression or merely by a dependency on implementation details.

Unit testing stateful code can be even trickier, because it may not be possible to observe mutations through the public interface. To work around this, one usually introduces spies: a kind of mocked behavior that records function calls and helps verify that the unit uses its dependencies correctly.

Of course, when a test relies not only on whether a particular function was called, but also on how many times and with which arguments, it becomes even more tightly coupled to the implementation. Tests written this way are only ever useful if the internal specifics are never expected to change, which is a highly unreasonable expectation.
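A spy is easy to write by hand. In the minimal sketch below, the `ICache`, `CacheSpy`, and `SunriseReporter` types are hypothetical and not part of the article's project: the spy records every call along with its arguments, so a test can assert not only that a dependency was used, but how many times and with what values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dependency whose usage we want to observe.
public interface ICache
{
    void Set(string key, string value);
}

// Spy: a test double that only records each call and its arguments.
public sealed class CacheSpy : ICache
{
    public List<(string Key, string Value)> SetCalls { get; } = new List<(string Key, string Value)>();

    public void Set(string key, string value) => SetCalls.Add((key, value));
}

// Hypothetical unit under test: formats a sunrise time and caches the text under the location name.
public sealed class SunriseReporter
{
    private readonly ICache _cache;

    public SunriseReporter(ICache cache) => _cache = cache;

    public string Report(string locationName, TimeSpan sunrise)
    {
        var text = $"{locationName}: sunrise at {sunrise.Hours:00}:{sunrise.Minutes:00}";
        _cache.Set(locationName, text);
        return text;
    }
}
```

Assertions on `SetCalls` encode how `Report` uses its dependency; if a refactoring changed the cache key format or batched the writes, they would fail even though `Report` still returned the same text.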

Relying too much on implementation details also makes the tests themselves much more complex, given the amount of setup required to mock specific behavior; this is especially true when the interactions are non-trivial or there are many dependencies. When tests become so complicated that it is hard to understand what they do, who is going to write tests to test the tests?

5. Unit tests do not exercise user behavior

Whatever software you develop, its task is to provide value to the end user. In fact, the main reason for writing automated tests is to ensure that there are no unintended defects that could reduce this value.

In most cases, the user interacts with the software through some high-level interface such as a UI, CLI, or API. Although the code itself may involve many layers of abstraction, the only level that matters to the user is the one they actually see and interact with.

The user does not care whether there is a bug several layers down in some part of the system, as long as they never encounter it and it does not hurt the functionality. Conversely, even if we have full coverage of all the low-level parts, a flaw in the user interface can render the system essentially useless.

Of course, if you want to be sure that something works correctly, you have to check exactly that and see whether it really does. In our case, the best way to gain confidence in the system is to simulate how a real user interacts with the high-level interface and verify that the system behaves according to expectations.

The problem with unit tests is that they are the exact opposite of this approach. Since we always deal with small isolated pieces of code that the user never interacts with directly, we never exercise actual user behavior.

Mock-based testing casts even more doubt on the value of such tests, because the parts of the system that would actually be used are replaced with mocks, further distancing the simulated environment from reality. It is impossible to gain confidence in the user experience by testing something that does not resemble it.

[Image: Unit testing is a great way to test stubs]

Pyramid-driven testing

So why did we, as an industry, decide that unit testing should be the primary way of testing software, despite all its flaws? Mostly because testing at higher levels has always been considered too hard, too slow, and too unreliable.

The traditional test pyramid suggests that the most significant part of testing should be done at the unit level. The idea is that, since coarse-grained tests are assumed to be slower and more complex, you should concentrate on the bottom of the integration spectrum to get an efficient and maintainable test suite:

[Image: the traditional test pyramid, with end-to-end tests at the top, integration tests in the middle, and unit tests at the bottom]

The metaphorical model offered by the pyramid is meant to convey that a good testing approach spans many different layers, because focusing on the extremes leads to problems: tests end up either too slow and unwieldy, or useless and providing no confidence. Nevertheless, the emphasis is on the lower levels, because that is where the return on investment in test development is believed to be highest.

Although high-level tests provide the most confidence, they often turn out to be slow, hard to maintain, or too broad to fit into the typically fast-paced development workflow. That is why, in most cases, such tests are maintained by QA specialists, since it is usually believed that developers should not be writing them.

Integration testing, which conceptually sits somewhere between unit testing and full end-to-end testing, is often disregarded entirely. It is unclear which level of integration is preferable, and how such tests should be structured and organized. There are also concerns that they could spiral out of control. As a result, many developers give them up in favor of the more clearly defined extreme: unit testing.

For these reasons, all testing done during development usually stays at the very bottom of the pyramid. In fact, this has become so standard that development testing and unit testing are now practically synonymous, a confusion reinforced by conference talks, blog posts, books, and even some IDEs (according to JetBrains Rider, all tests are unit tests).

According to most developers, the testing pyramid looks something like this:

[Image: the test pyramid as most developers see it, with “not my problem” at the top and unit tests at the bottom]

While the pyramid was a respectable attempt to turn software testing into a solved problem, this model clearly has many issues. In particular, the assumptions it relies on do not hold in every context, especially the assumption that highly integrated test suites are slow or difficult.

As humans, we are naturally inclined to rely on information passed down by those more experienced than us, which lets us benefit from the knowledge of previous generations and spend our deliberate “System 2” thinking on something more useful. This is an important evolutionary trait that has greatly improved our survival as a species.

However, when we turn experience into guidelines, we tend to treat those guidelines as good in their own right, forgetting the circumstances that made them relevant. In reality, circumstances change, and once perfectly reasonable conclusions (or best practices) may no longer apply as well.

Looking back, it is clear that high-level testing was hard at the start of the 2000s, and it probably still was in 2009, but it is 2020 now and we are, in fact, living in the future. Advances in technology and software design have made these problems far less significant than they used to be.

Today, most modern application frameworks provide some sort of dedicated API layer for testing, which lets you run your application in a simulated in-memory environment that is very close to the real one. Virtualization tools such as Docker also make it possible to run tests that rely on actual infrastructure dependencies while keeping them deterministic and fast.

We have solutions such as Mountebank, WireMock, GreenMail, Appium, Selenium, Cypress, and countless others that simplify various aspects of high-level testing once considered unattainable. Unless you are developing desktop applications for Windows and are stuck with the UIAutomation framework, you most likely have plenty of options.

In one of my previous projects, we had a web service that was tested at the system boundary with nearly a hundred behavioral tests, which ran in parallel in under 10 seconds. Sure, unit tests could have run much faster, but given the confidence these tests provided, we did not even consider that option.

However, the misconception about slow tests is not the only flawed assumption the pyramid rests on. The idea of concentrating most tests at the unit level only works if those tests actually provide value, which of course depends on how much business logic the code under test contains.

Some applications contain a lot of business logic (for example, payroll systems), some contain almost none (for example, CRUD applications), and most software falls somewhere in between. Most of the projects I have personally worked on did not have enough business logic to warrant extensive unit test coverage, but they did have a lot of infrastructural complexity that would benefit from integration testing.

Of course, in an ideal world a developer would evaluate the context of a project and come up with the testing approach best suited to the problems at hand. In reality, however, most developers do not even think about it and blindly churn out mountains of unit tests, following best-practice recommendations.

Finally, I think it is fair to say that the model offered by the test pyramid is simply too simplistic. The vertical axis presents the testing spectrum as a linear scale, where any gain in confidence is offset by an equivalent loss in maintainability and speed. This may hold when comparing the extremes, but not necessarily for the points in between.

The pyramid also fails to account for the fact that isolation has a cost of its own; it does not come for free simply by “avoiding” external interactions. Given how much effort it takes to write and maintain mocks, a less isolated test may well be cheaper and ultimately provide more confidence, albeit at a somewhat lower execution speed.

Taking these aspects into account, the scale is likely non-linear, with the point of maximum return on investment somewhere closer to the middle rather than at the unit level:

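[Image: the integration scale, with the point of maximum return on investment closer to the middle than to the unit level]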

Ultimately, if you are trying to establish an efficient test suite for your project, the test pyramid is not the best model to follow. It makes much more sense to focus on what is relevant to your specific context instead of relying on “best practices.”

Reality Based Testing

At the most basic level, a test provides value if it assures us that the software works correctly. The more confident we are, the less we have to rely on ourselves to spot potential bugs and regressions when changing the code, because we delegate that task to the tests.

That trust, in turn, depends on how accurately the test reproduces the user's actual behavior. A test scenario that operates at the boundary of the system, without any knowledge of its internals, should give us more confidence (and therefore more value) than a test that operates at a lower level.

In essence, the confidence gained from tests is the primary metric by which their value should be measured, and maximizing it is the primary goal.

Of course, other factors are in play as well, such as cost, speed, parallelizability, and so on, and all of them matter. The test pyramid makes strong assumptions about how these factors scale relative to one another, but these assumptions are not universal.

Moreover, these factors are secondary to the primary goal of achieving confidence. An expensive and long-running test that provides more confidence is infinitely more useful than an extremely fast and simple test that does nothing.

Therefore, I believe it is better to write tests that are as highly integrated as possible, while keeping their speed and complexity reasonable.

Does this mean that every test we write must be end-to-end? No, but we should strive to get as far as possible in that direction, while keeping the downsides at an acceptable level.

What counts as acceptable is subjective and depends on the context. In the end, what matters is that these tests are written by developers and used during development, which means they should not be a maintenance burden and it should be possible to run them in local builds and on CI.

When we do this, we will most likely end up with tests scattered across different levels of the integration scale, with a seeming lack of structure. This problem does not arise with unit testing, because each test is tied to a specific method or function, so the structure of the tests usually mirrors the code itself.

Fortunately, this does not matter, because organizing tests by individual classes or modules is not important in itself; it is rather a side effect of unit testing. Instead, tests should be partitioned by the actual user-facing functionality they verify.

Tests like these are often called functional, because they are based on the software's functional requirements, which describe what features it has and how they work. Functional testing is not another layer of the pyramid but a concept entirely orthogonal to it.

Contrary to popular belief, writing functional tests does not require Gherkin or a BDD framework; they can be implemented with the same tools that are used for unit testing. For example, here is how we could restructure the example from the beginning of the article so that the tests are organized around supported user behavior rather than units of code:

```csharp
public class SolarTimesSpecs
{
    [Fact]
    public async Task User_can_get_solar_times_automatically_for_their_location() { /* ... */ }

    [Fact]
    public async Task User_can_get_solar_times_during_periods_of_midnight_sun() { /* ... */ }

    [Fact]
    public async Task User_can_get_solar_times_if_their_location_cannot_be_resolved() { /* ... */ }
}
```

Note that the actual implementation of the tests is omitted, because it has no bearing on the fact that they are functional. What matters is that the tests and their structure are driven by the software requirements, while their scope can theoretically range anywhere from end-to-end down to the unit level.

Naming tests after specifications rather than classes has an additional benefit: it removes this unnecessary coupling. Now, if we decide to rename `SolarCalculator` to something else or move it to a different directory, the test names will not need to change.

If we stick to this structure, our test suite effectively becomes living documentation. For example, this is how the test suite is organized in CliWrap (xUnit has replaced the underscores with spaces):

[Image: how the CliWrap test suite is organized]

As long as a piece of software does something even remotely useful, it always has functional requirements. They can be either formal (specification documents, user stories, etc.) or informal (verbally agreed upon, assumed, JIRA tickets, written on toilet paper, etc.).

Turning informal specifications into functional tests can often be difficult, because it requires stepping back from the code and forcing yourself to look at the software from the user's perspective. In my open-source projects, writing the readme file helps: I list usage examples in it and then encode them as tests.

To sum up, it is better to partition tests based on threads of behavior rather than on the internal structure of the code.

Combining both of the above guidelines yields a mental framework that gives us a clear goal for writing tests and a sensible way to organize them, without relying on any assumptions. We can use it to build a value-focused test suite for a project and then scale it according to the priorities and constraints that matter in the current context.

The idea is that instead of focusing on a specific scope or set of scopes, we build a test suite around user functionality and try to cover that functionality as accurately as possible.

Functional testing for web services (using ASP.NET Core)

You may not yet have a clear picture of what functional testing involves or what exactly it should look like, especially if you have not done it before, so it makes sense to walk through a simple but complete example. To do that, we will turn our sunrise and sunset calculator into a web service and cover it with tests according to the rules outlined in the previous part of the article. The application is built with ASP.NET Core, the web framework I am most familiar with, but the same idea should apply to any other platform.

Our web service exposes endpoints that calculate sunrise and sunset times based on the user's IP address or a specified location. To make things a bit more interesting, we will also add a Redis caching layer that stores previous calculations to speed up responses.

The tests will run the application in a simulated environment where it can receive HTTP requests, handle routing, perform validation, and exhibit behavior almost identical to the application running in production. We will also use Docker so that our tests rely on the same infrastructure dependencies as the real application.

To understand what we are dealing with, let's first look at the implementation of the web application. Note that some parts of the code snippets are omitted for brevity; the full project can be found on GitHub.

First, we need a way to determine the user's location from their IP address, which is done by the `LocationProvider` class we saw in the earlier examples. It is a simple wrapper around an external GeoIP lookup service called IP-API:

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class LocationProvider
{
    private readonly HttpClient _httpClient;

    public LocationProvider(HttpClient httpClient) =>
        _httpClient = httpClient;

    public async Task<Location> GetLocationAsync(IPAddress ip)
    {
        // If IP is local, just don't pass anything (useful when running on localhost).
        // IsLocal() and GetJsonAsync() are project helper extensions not shown in the article.
        var ipFormatted = !ip.IsLocal() ? ip.MapToIPv4().ToString() : "";

        var json = await _httpClient.GetJsonAsync($"http://ip-api.com/json/{ipFormatted}");

        var latitude = json.GetProperty("lat").GetDouble();
        var longitude = json.GetProperty("lon").GetDouble();

        return new Location
        {
            Latitude = latitude,
            Longitude = longitude
        };
    }
}
```
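The `LocationProvider` snippet relies on helpers the article does not show. Below is a minimal sketch of what an `IsLocal()` check and the coordinate extraction from the IP-API response could look like. Everything here is an assumption for illustration: the `LocationHelpers` class, the `ParseCoordinates` method, and the choice to treat loopback plus private IPv4 ranges as "local" are not taken from the original project.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Text.Json;

// Hypothetical stand-ins for helpers that the original project defines but the article omits.
public static class LocationHelpers
{
    // Treats loopback and the private IPv4 ranges 10/8, 172.16/12 and 192.168/16 as "local".
    // The real project's definition of "local" may differ.
    public static bool IsLocal(this IPAddress ip)
    {
        if (IPAddress.IsLoopback(ip)) return true;
        if (ip.IsIPv4MappedToIPv6) ip = ip.MapToIPv4();
        if (ip.AddressFamily != AddressFamily.InterNetwork) return false;

        var b = ip.GetAddressBytes();
        return b[0] == 10
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)
            || (b[0] == 192 && b[1] == 168);
    }

    // Extracts the "lat" and "lon" fields from an IP-API style JSON payload.
    public static (double Latitude, double Longitude) ParseCoordinates(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        return (root.GetProperty("lat").GetDouble(), root.GetProperty("lon").GetDouble());
    }
}
```

When the address is local, `GetLocationAsync` requests `http://ip-api.com/json/` with no IP at all, and IP-API then resolves the location of the caller's own public address, which is what makes the service usable on localhost.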

To convert a location into sunrise and sunset times, we use the sunrise/sunset algorithm published by the US Naval Observatory. The algorithm itself is too long to include here, but the rest of the `SolarCalculator` implementation is as follows:

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public class SolarCalculator
{
    private readonly LocationProvider _locationProvider;

    public SolarCalculator(LocationProvider locationProvider) =>
        _locationProvider = locationProvider;

    private static TimeSpan CalculateSolarTimeOffset(Location location, DateTimeOffset instant,
        double zenith, bool isSunrise)
    {
        /* ... */

        // Algorithm omitted for brevity

        /* ... */
    }

    public async Task<SolarTimes> GetSolarTimesAsync(Location location, DateTimeOffset date)
    {
        /* ... */
    }

    public async Task<SolarTimes> GetSolarTimesAsync(IPAddress ip, DateTimeOffset date)
    {
        var location = await _locationProvider.GetLocationAsync(ip);

        // 90.83 is the zenith angle (in degrees) used for sunrise and sunset
        var sunriseOffset = CalculateSolarTimeOffset(location, date, 90.83, true);
        var sunsetOffset = CalculateSolarTimeOffset(location, date, 90.83, false);

        // ResetTimeOfDay() is a project helper extension not shown in the article
        var sunrise = date.ResetTimeOfDay().Add(sunriseOffset);
        var sunset = date.ResetTimeOfDay().Add(sunsetOffset);

        return new SolarTimes
        {
            Sunrise = sunrise,
            Sunset = sunset
        };
    }
}
```
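`GetSolarTimesAsync` calls a `ResetTimeOfDay()` extension that the article doesn't show and passes the zenith as the bare literal 90.83. That value corresponds to the "official" sunrise/sunset zenith of 90°50′: the Sun's center is 50 arcminutes below the geometric horizon, accounting for atmospheric refraction (about 34′) and the solar semi-diameter (about 16′). A minimal sketch, assuming `ResetTimeOfDay()` should return midnight of the same calendar day while keeping the UTC offset (`SolarHelpers` and `OfficialZenith` are hypothetical names):

```csharp
using System;

// Hypothetical helpers mirroring what the snippet above appears to rely on.
public static class SolarHelpers
{
    // Official sunrise/sunset zenith: 90 degrees plus 50 arcminutes, about 90.833.
    public const double OfficialZenith = 90 + 50.0 / 60;

    // Returns midnight of the same calendar day, keeping the original UTC offset,
    // so offsets added to it are interpreted in the caller's local time.
    public static DateTimeOffset ResetTimeOfDay(this DateTimeOffset date) =>
        new DateTimeOffset(date.Date, date.Offset);
}
```

Keeping the offset matters for the tests later in the article, which express their expected times in the local offset of the requested date.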

Since this is an MVC web application, we also need a controller that exposes the application's functionality through endpoints:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("solartimes")]
public class SolarTimeController : ControllerBase
{
    private readonly SolarCalculator _solarCalculator;
    private readonly CachingLayer _cachingLayer;

    public SolarTimeController(SolarCalculator solarCalculator, CachingLayer cachingLayer)
    {
        _solarCalculator = solarCalculator;
        _cachingLayer = cachingLayer;
    }

    [HttpGet("by_ip")]
    public async Task<IActionResult> GetByIp(DateTimeOffset? date)
    {
        var ip = HttpContext.Connection.RemoteIpAddress;
        var cacheKey = $"{ip},{date}";

        // Return the cached result if one exists
        var cachedSolarTimes = await _cachingLayer.TryGetAsync<SolarTimes>(cacheKey);
        if (cachedSolarTimes != null)
            return Ok(cachedSolarTimes);

        var solarTimes = await _solarCalculator.GetSolarTimesAsync(ip, date ?? DateTimeOffset.Now);
        await _cachingLayer.SetAsync(cacheKey, solarTimes);

        return Ok(solarTimes);
    }

    [HttpGet("by_location")]
    public async Task<IActionResult> GetByLocation(double lat, double lon, DateTimeOffset? date)
    {
        /* ... */
    }
}
```

As shown above, the `/solartimes/by_ip` endpoint mostly just delegates execution to `SolarCalculator`, and it also has very simple caching logic to avoid redundant requests to the third-party service. Caching is done by the `CachingLayer` class, which encapsulates a Redis client used to store and retrieve JSON content:

```csharp
using System.Text.Json;
using System.Threading.Tasks;
using StackExchange.Redis;

public class CachingLayer
{
    private readonly IConnectionMultiplexer _redis;

    public CachingLayer(IConnectionMultiplexer connectionMultiplexer) =>
        _redis = connectionMultiplexer;

    public async Task<T> TryGetAsync<T>(string key) where T : class
    {
        var result = await _redis.GetDatabase().StringGetAsync(key);

        if (result.HasValue)
            return JsonSerializer.Deserialize<T>(result.ToString());

        return null;
    }

    public async Task SetAsync<T>(string key, T obj) where T : class =>
        await _redis.GetDatabase().StringSetAsync(key, JsonSerializer.Serialize(obj));
}
```
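To see the JSON round trip that `CachingLayer` performs without running Redis, here is a hypothetical in-memory analogue (`InMemoryCachingLayer` is not part of the project). A `ConcurrentDictionary` plays the role of the Redis string store, and the method shapes mirror `TryGetAsync` and `SetAsync`:

```csharp
using System.Collections.Concurrent;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical in-memory analogue of CachingLayer: values are stored as JSON strings,
// just like the Redis-backed version, but kept in a dictionary instead of a server.
public class InMemoryCachingLayer
{
    private readonly ConcurrentDictionary<string, string> _store =
        new ConcurrentDictionary<string, string>();

    public Task<T> TryGetAsync<T>(string key) where T : class =>
        Task.FromResult<T>(_store.TryGetValue(key, out var json)
            ? JsonSerializer.Deserialize<T>(json)
            : null);

    public Task SetAsync<T>(string key, T obj) where T : class
    {
        _store[key] = JsonSerializer.Serialize(obj);
        return Task.CompletedTask;
    }
}
```

Storing serialized JSON rather than object references keeps the behavior close to Redis, where every read returns a fresh copy of the value.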

Finally, all of the above parts are wired together in the `Startup` class, which configures the request pipeline and registers the required services:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using StackExchange.Redis;

public class Startup
{
    private readonly IConfiguration _configuration;

    public Startup(IConfiguration configuration) =>
        _configuration = configuration;

    private string GetRedisConnectionString() =>
        _configuration.GetConnectionString("Redis");

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddMvc(o => o.EnableEndpointRouting = false);

        services.AddSingleton<IConnectionMultiplexer>(
            ConnectionMultiplexer.Connect(GetRedisConnectionString()));

        services.AddSingleton<CachingLayer>();

        services.AddHttpClient<LocationProvider>();
        services.AddTransient<SolarCalculator>();
    }

    public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
    {
        if (env.IsDevelopment())
            app.UseDeveloperExceptionPage();

        app.UseMvcWithDefaultRoute();
    }
}
```

Note that we don't make our classes implement interfaces for their own sake, simply because we don't plan to use stubs. We may need to replace one of the services in tests later, but that isn't clear yet, so we can avoid unnecessary work (and damage to the code's design) until we're sure it's actually needed.

Although the project is fairly simple, the app has a fair amount of infrastructural complexity: it relies on a third-party web service (the GeoIP provider) as well as on a persistence layer (Redis). This is a very common setup in real-world projects.

In a classical, unit-test-centric approach, we would target the service layer, and possibly the controller layer, and write isolated tests to ensure that every code path executes correctly. Such an approach would be useful to some extent, but it would never give us confidence that the actual endpoints, with all their middleware and peripheral components, work as expected.

So instead, we'll write tests that target the endpoints directly. For this, we need to create a separate test project and add a few infrastructure components to support our tests. One of them is `FakeApp`, which will encapsulate a virtual instance of the application:

```csharp
using System;
using System.Net.Http;
using Microsoft.AspNetCore.Mvc.Testing;

public class FakeApp : IDisposable
{
    private readonly WebApplicationFactory<Startup> _appFactory;

    public HttpClient Client { get; }

    public FakeApp()
    {
        _appFactory = new WebApplicationFactory<Startup>();
        Client = _appFactory.CreateClient();
    }

    public void Dispose()
    {
        Client.Dispose();
        _appFactory.Dispose();
    }
}
```

Most of the work here is already done by `WebApplicationFactory`, a utility provided by the framework that lets us bootstrap the program in memory for testing purposes. It also provides an API for overriding configuration, service registrations, and the request pipeline.

We can use an instance of this object in tests to run the application, send requests with the provided `HttpClient`, and then verify that the response matches our expectations. The instance can either be shared among several tests or created separately for each test.

Since we also rely on Redis, we need a way to spin up a fresh server for the application to use. There are many ways to do this, but for this simple example I decided to use xUnit's fixture API:

```csharp
using System.Threading.Tasks;
using CliWrap;
using CliWrap.Buffered;
using Xunit;

public class RedisFixture : IAsyncLifetime
{
    private string _containerId;

    public async Task InitializeAsync()
    {
        // Simplified, but ideally should bind to a random port
        var result = await Cli.Wrap("docker")
            .WithArguments("run -d -p 6379:6379 redis")
            .ExecuteBufferedAsync();

        _containerId = result.StandardOutput.Trim();
    }

    public async Task ResetAsync() =>
        await Cli.Wrap("docker")
            .WithArguments($"exec {_containerId} redis-cli FLUSHALL")
            .ExecuteAsync();

    public async Task DisposeAsync() =>
        await Cli.Wrap("docker")
            .WithArguments($"container kill {_containerId}")
            .ExecuteAsync();
}
```

The code above implements the `IAsyncLifetime` interface, which lets us define methods that run before and after the tests. We use these methods to start a Redis container in Docker and then destroy it once testing is complete.

In addition, the `RedisFixture` class exposes a `ResetAsync` method that runs the `FLUSHALL` command, which deletes all keys from the database. We'll call this method before each test to reset Redis to a clean state. Alternatively, we could simply restart the container, which takes longer but is more reliable.
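The fixture's comment notes that the container should ideally bind to a random port instead of the fixed 6379. One common way to pick such a port, sketched here under that assumption (`DockerPorts` and its methods are hypothetical), is to let the operating system assign a free port by binding a `TcpListener` to port 0:

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helpers for running Redis on a random host port.
public static class DockerPorts
{
    // Asks the OS for a free TCP port by binding to port 0, then releases it.
    // Best-effort only: another process could take the port before Docker binds it.
    public static int GetFreeTcpPort()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            return ((IPEndPoint)listener.LocalEndpoint).Port;
        }
        finally
        {
            listener.Stop();
        }
    }

    // Arguments for `docker run` mapping the chosen host port to Redis's default port 6379.
    public static string RedisRunArguments(int hostPort) =>
        $"run -d -p {hostPort}:6379 redis";

    // Connection string in the "host:port" form accepted by StackExchange.Redis.
    public static string RedisConnectionString(int hostPort) =>
        $"localhost:{hostPort}";
}
```

The chosen port would then also have to reach the application, for example by overriding the `ConnectionStrings:Redis` configuration value that `Startup.GetRedisConnectionString()` reads.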

With the infrastructure in place, we can move on to writing our first test:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using FluentAssertions;
using Xunit;

public class SolarTimeSpecs : IClassFixture<RedisFixture>, IAsyncLifetime
{
    private readonly RedisFixture _redisFixture;

    public SolarTimeSpecs(RedisFixture redisFixture)
    {
        _redisFixture = redisFixture;
    }

    // Reset Redis before each test
    public async Task InitializeAsync() => await _redisFixture.ResetAsync();

    // Required by IAsyncLifetime; nothing to clean up after each test
    public Task DisposeAsync() => Task.CompletedTask;

    [Fact]
    public async Task User_can_get_solar_times_for_their_location_by_ip()
    {
        // Arrange
        using var app = new FakeApp();

        // Act
        var response = await app.Client.GetStringAsync("/solartimes/by_ip");
        var solarTimes = JsonSerializer.Deserialize<SolarTimes>(response);

        // Assert
        solarTimes.Sunset.Should().BeWithin(TimeSpan.FromDays(1)).After(solarTimes.Sunrise);
        solarTimes.Sunrise.Should().BeCloseTo(DateTimeOffset.Now, TimeSpan.FromDays(1));
        solarTimes.Sunset.Should().BeCloseTo(DateTimeOffset.Now, TimeSpan.FromDays(1));
    }
}
```

As you can see, the setup is very simple. We only need to instantiate `FakeApp` and use the provided `HttpClient` to send requests to one of the endpoints, just as we would with a real web application.

Specifically, this test queries the `/solartimes/by_ip` route, which determines the sunrise and sunset times for the current date based on the user's IP. Since we rely on a real GeoIP provider and don't know what the result will be, we use property-based assertions to make sure the sunrise and sunset times are valid.
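The property-based assertions in the first test boil down to a few comparisons. Here is a framework-free sketch of the same checks, for illustration only (the `SolarTimeChecks.ArePlausible` name is hypothetical; the test itself uses FluentAssertions):

```csharp
using System;

// Framework-free version of the property-based assertions in the first test.
public static class SolarTimeChecks
{
    public static bool ArePlausible(DateTimeOffset sunrise, DateTimeOffset sunset, DateTimeOffset now)
    {
        var oneDay = TimeSpan.FromDays(1);

        // Sunset must not precede sunrise and must come at most one day after it.
        var sunsetFollowsSunrise = sunset >= sunrise && sunset - sunrise <= oneDay;

        // Both times must fall within one day of the current moment (in either direction).
        var nearNow = (sunrise - now).Duration() <= oneDay && (sunset - now).Duration() <= oneDay;

        return sunsetFollowsSunrise && nearNow;
    }
}
```

Checks like these reject swapped or wildly wrong times, but any pair of plausible values still passes them.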

Although these assertions can catch many potential bugs, they don't give us full confidence that the result is correct. There are, however, a couple of ways to improve the situation.

The obvious way is to replace the real GeoIP provider with a fake instance that always returns the same location, which would let us hard-code the expected sunrise and sunset times. The downside is that this effectively reduces the integration scope, meaning we can no longer verify that the application communicates correctly with the third-party service.

Alternatively, we can replace the IP address that the test server receives from the client. This makes the test more rigorous while keeping the same integration scope.

To do this, we need to create a startup filter that injects the chosen IP address into the request context using middleware:

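```csharp
using System;
using System.Net;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

public class FakeIpStartupFilter : IStartupFilter
{
    public IPAddress Ip { get; set; } = IPAddress.Parse("::1");

    public Action<IApplicationBuilder> Configure(Action<IApplicationBuilder> nextFilter)
    {
        return app =>
        {
            // Overwrite the client IP before the rest of the pipeline runs
            app.Use(async (ctx, next) =>
            {
                ctx.Connection.RemoteIpAddress = Ip;
                await next();
            });

            nextFilter(app);
        };
    }
}
```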
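`FakeIpStartupFilter` works because middleware that is added first runs first, so it can modify the request context before the rest of the pipeline sees it. The following framework-free model of that composition is only an illustration: `FakeContext` and `MiniPipeline` are hypothetical types, not ASP.NET Core APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

// Hypothetical stand-in for HttpContext.Connection.
public class FakeContext
{
    public IPAddress RemoteIpAddress { get; set; }
}

public static class MiniPipeline
{
    // Each middleware receives the context and a "next" delegate, like app.Use(...).
    public static Func<FakeContext, Task> Build(
        IEnumerable<Func<FakeContext, Func<Task>, Task>> middlewares,
        Func<FakeContext, Task> terminal)
    {
        var app = terminal;

        // Compose in reverse so the first middleware in the list runs first.
        foreach (var mw in middlewares.Reverse())
        {
            var next = app;
            app = ctx => mw(ctx, () => next(ctx));
        }

        return app;
    }
}
```

In ASP.NET Core, the startup filter's middleware plays the role of the first entry in the list, and calling `nextFilter(app)` afterwards lets the application's own `Configure` method add the remaining middleware.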

Then we can wire it into `FakeApp` by registering it as a service:

```csharp
using System;
using System.Net;
using System.Net.Http;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.DependencyInjection;

public class FakeApp : IDisposable
{
    private readonly WebApplicationFactory<Startup> _appFactory;
    private readonly FakeIpStartupFilter _fakeIpStartupFilter = new FakeIpStartupFilter();

    public HttpClient Client { get; }

    public IPAddress ClientIp
    {
        get => _fakeIpStartupFilter.Ip;
        set => _fakeIpStartupFilter.Ip = value;
    }

    public FakeApp()
    {
        _appFactory = new WebApplicationFactory<Startup>().WithWebHostBuilder(o =>
        {
            o.ConfigureServices(s =>
            {
                s.AddSingleton<IStartupFilter>(_fakeIpStartupFilter);
            });
        });

        Client = _appFactory.CreateClient();
    }

    /* ... */
}
```

Now we can update the test to use specific data:

```csharp
[Fact]
public async Task User_can_get_solar_times_for_their_location_by_ip()
{
    // Arrange
    // The IP address literal is missing here; IPAddress.Parse("") throws FormatException,
    // so a real public IP address must be supplied
    using var app = new FakeApp
    {
        ClientIp = IPAddress.Parse("")
    };

    var date = new DateTimeOffset(2020, 07, 03, 0, 0, 0, TimeSpan.FromHours(-5));
    var expectedSunrise = new DateTimeOffset(2020, 07, 03, 05, 20, 37, TimeSpan.FromHours(-5));
    var expectedSunset = new DateTimeOffset(2020, 07, 03, 20, 28, 54, TimeSpan.FromHours(-5));

    // Act
    var query = new QueryBuilder
    {
        {"date", date.ToString("O", CultureInfo.InvariantCulture)}
    };

    var response = await app.Client.GetStringAsync($"/solartimes/by_ip{query}");
    var solarTimes = JsonSerializer.Deserialize<SolarTimes>(response);

    // Assert
    solarTimes.Sunrise.Should().BeCloseTo(expectedSunrise, TimeSpan.FromSeconds(1));
    solarTimes.Sunset.Should().BeCloseTo(expectedSunset, TimeSpan.FromSeconds(1));
}
```

Some developers may still be uneasy about relying on a third-party web service in tests, because it can lead to non-deterministic results. On the other hand, one can argue that we actually want this dependency to be part of our tests: we want to know if it breaks or changes in unexpected ways, since that could lead to bugs in our own software.

Of course, we can't always use real dependencies, for example if a service has usage limits, costs money, or is simply slow or unreliable. In such cases, we have to replace it with a fake (preferably not a stub) implementation for use in tests. In our case, however, this isn't necessary.

Similarly to the first test, we can write a test covering the second endpoint. This one is simpler because all input parameters are passed directly in the URL query:

```csharp
[Fact]
public async Task User_can_get_solar_times_for_a_specific_location_and_date()
{
    // Arrange
    using var app = new FakeApp();

    var date = new DateTimeOffset(2020, 07, 03, 0, 0, 0, TimeSpan.FromHours(+3));
    var expectedSunrise = new DateTimeOffset(2020, 07, 03, 04, 52, 23, TimeSpan.FromHours(+3));
    var expectedSunset = new DateTimeOffset(2020, 07, 03, 21, 11, 45, TimeSpan.FromHours(+3));

    // Act
    var query = new QueryBuilder
    {
        {"lat", "50.45"},
        {"lon", "30.52"},
        {"date", date.ToString("O", CultureInfo.InvariantCulture)}
    };

    var response = await app.Client.GetStringAsync($"/solartimes/by_location{query}");
    var solarTimes = JsonSerializer.Deserialize<SolarTimes>(response);

    // Assert
    solarTimes.Sunrise.Should().BeCloseTo(expectedSunrise, TimeSpan.FromSeconds(1));
    solarTimes.Sunset.Should().BeCloseTo(expectedSunset, TimeSpan.FromSeconds(1));
}
```
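The second test passes all its inputs through the query string, including a date with a `+03:00` offset. Here is a sketch of what the encoding step has to do, using plain `Uri.EscapeDataString` in place of `QueryBuilder` (the `QueryStrings` helper is hypothetical, and this is not a description of `QueryBuilder`'s internals):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: formats dates in round-trip ("O") form and percent-encodes
// every key and value so characters such as '+' and ':' survive the trip.
public static class QueryStrings
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> parameters) =>
        "?" + string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

    public static string FormatDate(DateTimeOffset date) =>
        date.ToString("O", CultureInfo.InvariantCulture);
}
```

Escaping matters here: an unescaped `+` in a query string is commonly decoded as a space, which would corrupt the UTC offset of the requested date.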

We could keep adding similar tests to make sure the application supports all possible locations and dates and also handles potential edge cases, such as the polar day. However, this approach may not scale well, since we may not want to run the entire pipeline every time just to verify the correctness of the business logic that calculates sunrise and sunset times.

It's also important to note that, even though we've tried to avoid it as much as possible, we can still reduce the integration scope if there are genuine reasons to do so. In this case, we can cover the additional cases with unit tests.

Normally, this would mean that we'd somehow need to isolate `SolarCalculator` from `LocationProvider`, which in turn implies stubs. Fortunately, there is a clever way to avoid that.

We can change the implementation of `SolarCalculator` by separating the pure and impure parts of the code:

```csharp
using System;

public class SolarCalculator
{
    private static TimeSpan CalculateSolarTimeOffset(Location location, DateTimeOffset instant,
        double zenith, bool isSunrise)
    {
        /* ... */
    }

    public SolarTimes GetSolarTimes(Location location, DateTimeOffset date)
    {
        var sunriseOffset = CalculateSolarTimeOffset(location, date, 90.83, true);
        var sunsetOffset = CalculateSolarTimeOffset(location, date, 90.83, false);

        var sunrise = date.ResetTimeOfDay().Add(sunriseOffset);
        var sunset = date.ResetTimeOfDay().Add(sunsetOffset);

        return new SolarTimes
        {
            Sunrise = sunrise,
            Sunset = sunset
        };
    }
}
```

We changed the code so that, instead of relying on `LocationProvider` to get the location, the `GetSolarTimes` method receives it as an explicit parameter. As a result, we also no longer need dependency inversion, because there are no dependencies left to invert.

To wire everything back together, we just need to update the controller:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("solartimes")]
public class SolarTimeController : ControllerBase
{
    private readonly SolarCalculator _solarCalculator;
    private readonly LocationProvider _locationProvider;
    private readonly CachingLayer _cachingLayer;

    public SolarTimeController(
        SolarCalculator solarCalculator,
        LocationProvider locationProvider,
        CachingLayer cachingLayer)
    {
        _solarCalculator = solarCalculator;
        _locationProvider = locationProvider;
        _cachingLayer = cachingLayer;
    }

    [HttpGet("by_ip")]
    public async Task<IActionResult> GetByIp(DateTimeOffset? date)
    {
        var ip = HttpContext.Connection.RemoteIpAddress;

        // Include the date so that different dates don't share a cache entry
        var cacheKey = $"{ip},{date}";

        var cachedSolarTimes = await _cachingLayer.TryGetAsync<SolarTimes>(cacheKey);
        if (cachedSolarTimes != null)
            return Ok(cachedSolarTimes);

        // Composition instead of dependency injection
        var location = await _locationProvider.GetLocationAsync(ip);
        var solarTimes = _solarCalculator.GetSolarTimes(location, date ?? DateTimeOffset.Now);

        await _cachingLayer.SetAsync(cacheKey, solarTimes);

        return Ok(solarTimes);
    }

    /* ... */
}
```
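Because the cached `SolarTimes` depend on both the client IP and the requested date, the cache key has to include both; a key built from the IP alone would return the same cached times for every date. Here is a small sketch of such a key, mirroring the `$"{ip},{date}"` format from the first version of the controller but formatting the date with the invariant round-trip pattern (`CacheKeys.ForIpAndDate` is a hypothetical helper):

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper: builds a cache key from the client IP and the requested date.
public static class CacheKeys
{
    public static string ForIpAndDate(IPAddress ip, DateTimeOffset? date) =>
        date is null
            ? $"{ip},"
            : $"{ip},{date.Value.ToString("O", CultureInfo.InvariantCulture)}";
}
```

With either format, a request that omits the date maps to the same key every time, so, as far as the snippets show (no expiry is set), the times cached for the first such request would keep being returned on later days.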

Since the existing tests are not aware of implementation details, this simple refactoring doesn't break them. With that done, we can write additional short tests that cover the business logic in more detail without needing stubs:

```csharp
[Fact]
public void User_can_get_solar_times_for_New_York_in_November()
{
    // Arrange
    var location = new Location
    {
        Latitude = 40.71,
        Longitude = -74.00
    };

    var date = new DateTimeOffset(2019, 11, 04, 00, 00, 00, TimeSpan.FromHours(-5));
    var expectedSunrise = new DateTimeOffset(2019, 11, 04, 06, 29, 34, TimeSpan.FromHours(-5));
    var expectedSunset = new DateTimeOffset(2019, 11, 04, 16, 49, 04, TimeSpan.FromHours(-5));

    // Act
    var solarTimes = new SolarCalculator().GetSolarTimes(location, date);

    // Assert
    solarTimes.Sunrise.Should().BeCloseTo(expectedSunrise, TimeSpan.FromSeconds(1));
    solarTimes.Sunset.Should().BeCloseTo(expectedSunset, TimeSpan.FromSeconds(1));
}

[Fact]
public void User_can_get_solar_times_for_Tromso_in_January()
{
    // Arrange
    var location = new Location
    {
        Latitude = 69.65,
        Longitude = 18.96
    };

    var date = new DateTimeOffset(2020, 01, 03, 00, 00, 00, TimeSpan.FromHours(+1));
    var expectedSunrise = new DateTimeOffset(2020, 01, 03, 11, 48, 31, TimeSpan.FromHours(+1));
    var expectedSunset = new DateTimeOffset(2020, 01, 03, 11, 48, 45, TimeSpan.FromHours(+1));

    // Act
    var solarTimes = new SolarCalculator().GetSolarTimes(location, date);

    // Assert
    solarTimes.Sunrise.Should().BeCloseTo(expectedSunrise, TimeSpan.FromSeconds(1));
    solarTimes.Sunset.Should().BeCloseTo(expectedSunset, TimeSpan.FromSeconds(1));
}
```

Although these tests no longer use the full integration scope, they are still derived from the application's functional requirements. Since we already have a higher-level test covering the entire endpoint, we can make these tests narrower without sacrificing overall confidence.

This trade-off is reasonable if we're aiming to improve execution speed, but I would recommend sticking to the highest-level tests as much as possible, at least until that becomes a problem.

Finally, we should make sure that the Redis caching layer works properly. Even though we use it in our tests, it never actually returns a cached response, because the database is reset to its original state between tests.

The problem with testing aspects such as caching is that they cannot be derived from functional requirements. A user who knows nothing about the application's internals has no way of telling whether a response came from the cache.

However, if our goal is only to test the integration between the application and Redis, we don't need to write implementation-aware tests; instead, we can do something like this:

```csharp
[Fact]
public async Task User_can_get_solar_times_for_their_location_by_ip_multiple_times()
{
    // Arrange
    using var app = new FakeApp();

    // Act
    var collectedSolarTimes = new List<SolarTimes>();

    for (var i = 0; i < 3; i++)
    {
        var response = await app.Client.GetStringAsync("/solartimes/by_ip");
        var solarTimes = JsonSerializer.Deserialize<SolarTimes>(response);

        collectedSolarTimes.Add(solarTimes);
    }

    // Assert
    collectedSolarTimes.Select(t => t.Sunrise).Distinct().Should().ContainSingle();
    collectedSolarTimes.Select(t => t.Sunset).Distinct().Should().ContainSingle();
}
```

The test sends the same request to one endpoint several times and asserts that the result stays the same each time. This is enough to verify that responses are cached correctly and are returned in the same way as regular responses.
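Why identical results are a reasonable signal can be seen in a framework-free model of the controller's cache-aside logic: look up the key and only compute and store the value on a miss. This is a sketch for illustration (`CacheAside<TValue>` is a hypothetical type); the `ComputeCount` property exists only in the model, since the real test cannot observe it from outside.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the controller's caching: compute on a miss, reuse on a hit.
public class CacheAside<TValue>
{
    private readonly Dictionary<string, TValue> _cache = new Dictionary<string, TValue>();
    private readonly Func<string, TValue> _compute;

    // Number of times the underlying computation actually ran.
    public int ComputeCount { get; private set; }

    public CacheAside(Func<string, TValue> compute) => _compute = compute;

    public TValue Get(string key)
    {
        if (_cache.TryGetValue(key, out var cached))
            return cached;

        var value = _compute(key);
        ComputeCount++;
        _cache[key] = value;
        return value;
    }

    // Plain LINQ equivalent of Distinct().Should().ContainSingle().
    public static bool AllSame(IEnumerable<TValue> values) => values.Distinct().Count() == 1;
}
```

A computation that depends on the current time would almost never produce identical values across calls without the cache, which is what makes the "all responses equal" check meaningful.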

In the end, we have a simple test suite that looks like this:

[Figure: results of running the test suite]

Note that the tests run quite fast: the fastest integration test completes in 55 ms, and the slowest in under a second (due to a cold start). Given that these tests exercise the entire request lifecycle, including all dependencies and infrastructure, without any stubs, I'd say this is more than acceptable.

If you'd like to experiment with the project yourself, you can find it on GitHub.

Disadvantages and limitations

Unfortunately, there is no silver bullet, and the approaches described in this article have their potential drawbacks too. In the interest of fairness, it makes sense to mention them.

One of the biggest challenges I've faced when implementing high-level functional testing is finding a satisfactory balance between usefulness and usability. Compared to approaches focused entirely on unit testing, more effort is needed to make sure such tests are sufficiently deterministic, fast, able to run independently of each other, and generally practical to use during development.

The wider scope of the tests also requires a deeper understanding of the project's dependencies and the technologies it uses. It's important to know how they're used, whether they can be easily containerized, what options are available, and what trade-offs are involved.

In the context of integration testing, "testability" is defined not by how well the code can be isolated, but by how well the real infrastructure accommodates and facilitates testing. This places certain demands on the technical expertise of both the person responsible and the team as a whole.

Additionally, setting up and configuring the test environment may take some time, because it involves creating fixtures, wiring in fake implementations, adding custom initialization and cleanup behavior, and so on. All of this will need to be maintained as the project grows in scale and complexity.

Writing functional tests also requires a bit more planning, because it's no longer about covering every method of every class, but about identifying software requirements and turning them into code. It can sometimes be hard to figure out what these requirements are and which of them are functional, since that requires thinking like a user.

Another common problem is that high-level tests often suffer from a lack of locality. If a test fails due to unmet expectations or an unhandled exception, it's usually unclear what exactly caused the error.

Although there are ways to partially mitigate this problem, it always ends up being a trade-off: isolated tests are better at pinpointing the cause of an error, while integration tests are better at showing its impact. Both kinds are equally useful, so it comes down to which you consider more important.

In the end, I still think functional testing is worth it despite all its shortcomings, because in my opinion it leads to a better development experience and higher quality. I haven't done classical unit testing in a long time and have no intention of going back.


Unit testing is a popular approach to software testing, but mostly for the wrong reasons. It is often promoted as an effective way for developers to test their code and as a way to encourage good design practices, yet many find it cumbersome and superficial.

It's important to understand that testing during development is not the same as unit testing. The main goal is not to write the most isolated tests possible, but to gain confidence that the code works according to its functional requirements. And there are better ways to achieve that.

Writing high-level tests driven by user behavior will give you a much better return on investment in the long run, and it isn't as hard as it seems. Find the approach that works best for your project and stick to it.

Here are the main takeaways:

  1. Think critically and challenge best practices
  2. Don't rely on the test pyramid
  3. Split tests by functionality, not by class, module, or scope
  4. Aim for the highest level of integration while keeping speed and cost reasonable
  5. Avoid sacrificing software design for testability
  6. Use stubs only as a last resort

There are other great articles about alternative approaches to testing in modern software development. Here are the ones I personally found interesting: