Exploring How to Define a Testing Problem as the Center of a Well-Defined Software Testing Strategy

Photo by Alexander Hoggard on Unsplash
Often, it seems like engineers faced with solving a software testing problem begin defining testing strategy in terms of the tasks that should be undertaken in order to provide sufficient coverage of work product.
The strategies they produce tend to look like tactical plans that function primarily to answer questions like these:
- Which functional areas within the solution, or which code paths established by it, should be exercised?
- Which approaches should be taken to exercise the solution? To what degree, for example, should the approach rely on positive versus negative testing?
- Which specific use cases, workflows, or areas of concern should be evaluated against? Should testing focus mainly on the happy path, or should it explore more edge cases? Where are the possible bugs or product issues that might be most valuable to discover if they exist? What are the variable inputs to user workflows that might be most valuable to consider here?
- Which failure- or boundary scenarios should testing target specifically?
At the same time, it seems like software development organizations frequently adopt a similar approach to defining testing strategy, even if the questions read differently:
- Which tasks can we assign to engineers and testers in order to maximize coverage while at the same time making the outcomes of testing efforts as consistent and predictable as possible?
- How can we tailor both task assignments and corresponding evaluations of performance to make testing efforts easy to trace, easy to report, and easy to manage?
- How can we define and manage practices and processes for the entire testing department (or across development teams department-wide) so that for every foreseeable exception or issue it's clear how to proceed in service of the above?
- If the work Software Engineers produce delivers value whose profitability is easy to measure, and we accept that the profitability of the work Quality Engineers produce is challenging to measure, how can we organize assignments between the two groups in order to maintain separation along these lines administratively?
Approaches like these tend to work relatively well until an issue surfaces that results in operational churn, delay, or missed opportunities for coverage. Usually some sort of adjustment or rework needs to take place (following some sort of investigation) in order to bridge the gap between what's already been tested and what otherwise might need to be tested in order to satisfy a mental model or working list of concerns and workflows that should be covered. What's worse is when issues like these have a cascading effect on inputs and outputs related to accurate estimates of work and assessments of the current state of coverage. This can be true even if an organization uses metrics like quantified code- or workflow coverage.
And when issues like these occur, often we find that activities related to testing software are no longer either serviceable or efficient, let alone both.
If we accept that the main approach outlined here clearly produces a strategy for what to test and how to test it, then hopefully it should also be clear that what leaves this approach prone to these sorts of issues is not a lack of strategy. Generally the problem tends to relate to committing to sets of practices understood to produce visibility (or at least to resemble producing visibility) without articulating clearly either how testing presents a problem worth solving for the organization or how a particular strategy intends to solve that problem.
If an organization (or engineers) were to define a testing problem in terms also recognizable as an engineering problem (not just as a process- or task-management problem), it should generally be easier to relate that problem to organizational needs and motivations, which should help make it clear why the problem is valuable for the organization to solve. What's more, if the goal of a testing strategy is to define how to approach what to test and how, it should also be easier to relate any testing strategy to the understanding of that problem as a potential resolution.
By providing a clearer big-picture frame of reference for testing by way of defining a testing problem as an engineering problem, it should be possible to define strategy in terms of that problem in a way that minimizes the risk of the sorts of issues listed above. In my experience as an engineer, I've used this approach more than once to do just that; in the conclusion I'll list a couple of real-world examples where I have.
This post will explore how to define a testing problem that can be used, in turn, to define testing strategy.
Articulate the Need/ Stakes for Testing
Nearly any software that's developed with the aim of completing work is produced for strategic benefit. That is to say: we develop production software because we seek to meet a need for the functionality software provides. Maybe it helps consumers complete tasks more efficiently, provides a way to store, analyze, and recall data, enforces policy, or makes data easier to visualize or interact with. Perhaps it's something else.
When we talk about software as a solution, it's generally shorthand for this reasoning. A solution applies a particular technology to solving problems for consumers.
The ability of a solution to meet this need also presents strategic benefit: not only is the solution valuable, but the relationship between the producer of that solution and the consumer of that solution is also valuable; in this, the solution is likely valuable to each in different ways.
This means the solution is valuable to somebody for some reason. Or, possibly, to some people for a number of reasons. And where it's functionality that delivers value, the ability of the solution to continue to deliver value (consistently, as expected by the producer) is also of value. Maybe it's valuable within a marketplace of similar competing solutions. Or maybe there are other reasons (and other problems the solution is potentially valuable in solving).
One reason (but not the only one) that explains why we test software has to do with examining the relationship between how the producer intends for the solution to deliver value and how it actually delivers value.
But why? Experience tells me that this question is generally a good place to start defining a testing problem. If software needs to be tested, that need likely has something to do with the way a solution applies technology to deliver value, the ways in which that value is relevant to the needs of important stakeholders, and what the risk or consequences might be in case the solution fails to deliver value. Through testing, the organization seeks to develop visibility into the current functional state of work product, so that it can confirm that the solution delivers value, consistently and as expected by the producer.
A testing strategy will need to address these concerns somehow. The more directly (and comprehensively) a strategy can address these concerns, generally the greater return on the investment in testing.
To articulate the role why plays in understanding testing as a legitimate engineering problem, it can help to start with questions like these:
- What Is the Nature of the Solution and the Value It Delivers? If a solution delivers value by applying technology somehow, what is the specific application of technology that makes it possible? What is the nature of the value that particular application provides as a solution? What are the essential things that the solution is expected to do that are of value to consumers? If the functionality is a login screen, does it deliver value by restricting access to users with accepted credentials (possibly submitted through a Web form)? In another example, does the solution enforce access rights by way of a CRUD permission set?
- In Which Specific Ways Is the Value a Solution Delivers Relevant to the Needs of the Solution's Consumers, Producer, and Other Relevant Stakeholders? What are the specific ways that this value provides benefit (including strategic benefit) to any of the parties mentioned here? Why is that benefit important? If search functionality, for example, serves to delineate effectively between expected and unexpected matches included in search results, why is that separation valuable to consumers and producers of the search solution? How about a Web page that allows and responds to specific types of interaction?
- What Is the Nature of Any Risk Presented by a Failure to Deliver Value? If a particular solution or feature set somehow failed to behave as expected, what would the consequences of such a failure be? What would the technical consequences be? What would the consequences of a failure to deliver value (and possibly also to enjoy strategic benefit) be to the consumer, the development organization, and/ or potentially to other relevant stakeholders? If a failure could be expected potentially to have impact on external stakeholders, what might the scope and consequences of that impact be? If a login screen were to fail to restrict access, what would the consequences be? What about a Web UI that failed to facilitate the expected interaction?
Articulate the Availability and Significance of System Output
In order to deliver the value intended by its producer, a solution likely produces some sort of output. This output is important, because it's what the consumer also expects; as such it serves a role as a recognizable sign (or indication) that a solution has delivered the expected value. During runtime a solution (in Test Engineering we refer to this as a system under test) either produces the expected output or it does not.
This output is what we use as an indication that the system under test has behaved as intended by those who produced it. Hopefully that output can be used to answer questions about functionality that are deterministic enough that they can be answered with a simple yes or no, depending on the nature of the output. And if the system under test fails to behave in a manner that conforms to expectations, output that has been selected well (as an indication of functional state) can also be useful as an indicator of exactly how the system under test is not behaving as intended.
At scale, the data and feedback gathered through evaluation of output work like puzzle pieces to assemble a complete (as much as possible) picture of current functional state. At the same time that certain pieces of the puzzle help fill in the picture, they might also help separate knowns from unknowns (which is an important idea in engineering, as well as in risk management and strategic decision-making). In this, the presence of one puzzle piece might also be useful to help define the shape of the puzzle piece expected to fit into the adjoining spot.
In this, output serves a role within testing as a means to evaluate functional behaviors exhibited by the system under test. If testing serves to develop visibility into functional work state, output (and the evaluation of output) helps to produce data and feedback that can be used to inform this visibility. If the questions provided in the last section defined the why for this problem, the questions in this section define in essence the what (or more specifically which output) that helps define a testing problem as an input to software testing strategy.
A suitable testing strategy will need to deal in this output in order to be able to produce the information that meets the needs that inform why.
To articulate the role what (as a function of which output) plays in understanding testing as a legitimate engineering problem, it can help to start with questions like these:
- Which Output from the System Under Test Is Available to Be Extracted/ Evaluated? When testing it's important to identify which output can be expected to serve as a clear answer to the yes-or-no question or line of inquiry a test is expected to execute in service of. Either a light turns on (when the switch is flipped to ON) or it does not. Either a value gets persisted to a database when expected or it does not (see the sketch after this list). If these are examples of system outputs, which outputs are produced by the system under test that can serve as a clear indication that the feature set has behaved as expected (or that it has not)?
- How Can Output Be Evaluated as a Means of Producing Feedback and/ or Data? Any output produced by the system under test delivers a certain amount of value as an indication that it has actually behaved as expected (or, conversely, that it has not). Which sorts of conclusions does that output serve as evidence in support of? How does an assertion related to this output (in essence, answering the yes-or-no question) fit into the puzzle? Do any conclusions that can be drawn from the evaluation of output also help to separate knowns from unknowns by serving as a piece of the puzzle?
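To make the yes-or-no framing concrete, here is a minimal sketch in Python (runnable with pytest) of the database example above. The save_order function and the in-memory SQLite table are hypothetical stand-ins for the real system under test and its persistence layer.

```python
import sqlite3


def save_order(conn, sku, quantity):
    """Hypothetical stand-in for the system under test: persists an order, returns its id."""
    cur = conn.execute(
        "INSERT INTO orders (sku, quantity) VALUES (?, ?)", (sku, quantity)
    )
    conn.commit()
    return cur.lastrowid


def test_order_is_persisted_when_submitted():
    # An in-memory database stands in for the real persistence layer.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER)"
    )

    order_id = save_order(conn, sku="WIDGET-1", quantity=3)

    # The output chosen as an indication of expected behavior: the persisted row.
    row = conn.execute(
        "SELECT sku, quantity FROM orders WHERE id = ?", (order_id,)
    ).fetchone()

    # The yes-or-no question: either the value was persisted as expected, or it was not.
    assert row == ("WIDGET-1", 3)
```

The output chosen here (the persisted row) is what makes the line of inquiry deterministic: either the row exists with the expected values, or it does not.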
Articulate the Openness of the System Under Test to Interaction
Within the set of all software, it seems like fewer and fewer new software solutions run completely in isolation. Forty years ago, for example, nearly every video game running in an arcade or on a home console ran in complete isolation on specialized hardware; today games run on nearly any type of hardware, within emulators, with internet connections, and in some cases are accessible through subscription services billed monthly. Today the Internet (40 years ago more recognizable as ARPANET) is home to some of the world's largest software service platforms, as well as additional software service platforms that run on those platforms. Over a few decades, there's a lot that time appears to have changed.
Software's openness to interaction matters to testing on two different levels. On one level, openness presents a set of opportunities to interact with the system under test in order to produce and extract output. On another level, openness to interaction presents opportunities for variation in how the system under test delivers functionality. The more a solution accepts configuration, for example, the more it is open to variation both in its runtime and in the inputs that produce functional output. The more varied the runtime environments, operating systems, or hardware configurations a solution runs on, the more deftly the solution (or its architecture, or its implementation) will likely need to navigate those variations in order to produce the expected value consistently across all platforms.
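As one way to make that variation explicit, here is a minimal sketch using pytest parametrization; format_price and the configuration values are hypothetical stand-ins for whatever configurable behavior a real solution might expose.

```python
import pytest

# Hypothetical configuration axes for a solution that accepts configuration.
CONFIGS = [
    {"currency": "USD", "separator": "."},
    {"currency": "EUR", "separator": ","},
]


def format_price(cents, currency, separator):
    """Hypothetical stand-in for configurable functionality in the system under test."""
    whole, fraction = divmod(cents, 100)
    return f"{whole}{separator}{fraction:02d} {currency}"


@pytest.mark.parametrize("config", CONFIGS, ids=lambda c: c["currency"])
def test_price_formatting_per_configuration(config):
    result = format_price(1999, config["currency"], config["separator"])
    # The same yes-or-no question is asked under each configuration.
    assert result == f"19{config['separator']}99 {config['currency']}"
```

The point is less the specific assertion than the fact that variation in configuration becomes an explicit, enumerable part of the testing surface rather than an unstated assumption.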
When attempting to define a testing problem, awareness of both of these levels (as well as any role time might play) can be important in understanding the how of the testing problem. Within this, it also pays to be able to connect how to the why to the what (or which output).
To clarify a little at the same time, though: if the first set of questions seeks to establish why it is valuable to test, and answers to the second set of questions serve to inform an understanding of what (or, more correctly: which outputs) it would be worthwhile to evaluate to respond to the need/ stakes for testing, this set of questions does not necessarily seek to define how to go about producing and retrieving that output. That involves an approach to what to test and how, which is the job of a well-defined testing strategy.
Instead, a suitable testing strategy will need to incorporate both levels of how described here, in order to match which outputs to why at each level.
To articulate the role that how plays within a testing problem that is also recognizable as an engineering problem (that is, how the system makes itself, or is made, open to interaction, and how that might inform a strategist's planning for what to test and how), it can help to start with questions like these:
- Which Role/s Does Runtime Context Play in Solution Operations? If the solution is expected to operate in a variety of environments, which role/s might those environments play in the ability of the solution to deliver the expected value? At the same time, which opportunities or constraints might exist that a variety of runtime environments provide to (or impose on) an attempt to test the solution? For example, if a solution written in Java or Python (or for Node?) needs to access different parts of the filesystem depending on whether it's running within Windows or a POSIX-based operating system, how might this affect how the solution functions in either context? Does it change the way testing might be able to gather output? Does it present additional concerns for the ability of the solution to deliver value? (See the filesystem sketch after this list.)
- How Does the Solution Open/ Close Itself to Interaction During Test Runtime? How does the solution's architecture or implementation make it either open or closed to interact with and gather output from? Is it open only at the e2e level, or is there some lower level on the testing pyramid (like the integration level, between application layers, or between microservices) that might be useful to gather output from? If the solution makes use of a UI that operates in a runtime separate from the backend, is there an opportunity to test the UI separately from the API? Is there an opportunity to test request handling logic for the API subcutaneously, without needing to make network calls (even to localhost) in order to be able to interact with that logic? (See the request-handler sketch after this list.)
- Which Tooling (or Other Means) Is Available to Interact with the Solution and Collect Output? How does the availability of (or functionality provided within, or architecture established by) existing testing tooling (or tooling we might be able to build ourselves) provide opportunities or constraints to be able to produce and gather output? If a specific Web browser is known for running JavaScript directly on the hardware and a specific Web UI automation tool does not support interaction with that browser (or maybe vice-versa; I can't remember anymore), which opportunities or constraints does that knowledge present that a corresponding testing strategy should likely account for?
- Which Role/s Could Future Developments Play in Any of the Above? If the one true constant in software engineering (as with most everything else in life) is the potential for change, that makes change a potential input to functionality over time. The more time passes, the greater the degree of likelihood that solutions, the technologies that support them, and (ultimately) the needs that give the solution (as an application of technology) relevance might change. Does this escalating degree of likelihood present opportunities for testing? How about constraints? Does our understanding of the testing problem now provide an opportunity to solve a problem today that also anticipates a potential problem tomorrow? For anybody interested in ROI, that might be worth examining.
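To illustrate the runtime-context question above (the filesystem sketch referenced in the first bullet), here is a minimal Python sketch; default_data_dir is a hypothetical stand-in for platform-aware logic in a real solution.

```python
import os
import platform
from pathlib import Path


def default_data_dir(app_name="example-app"):
    """Hypothetical stand-in for platform-aware filesystem access in a solution."""
    if platform.system() == "Windows":
        base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming"))
    else:
        base = Path(os.environ.get("XDG_DATA_HOME", Path.home() / ".local" / "share"))
    return base / app_name


def test_data_dir_ends_with_app_name():
    # The same yes-or-no question holds on any platform; what varies is the
    # runtime context the answer depends on.
    assert default_data_dir().name == "example-app"
```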
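And to illustrate the subcutaneous option (the request-handler sketch referenced in the second bullet), here is a minimal sketch assuming Flask 2.x, whose built-in test client exercises request-handling logic through the WSGI interface without opening a network socket; the /status endpoint is hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.get("/status")
def status():
    # Hypothetical request-handling logic standing in for the real API layer.
    return jsonify({"status": "ok"})


def test_status_endpoint_without_network():
    # Flask's test client drives the WSGI application directly, so the
    # request-handling logic is exercised without any network call, even to localhost.
    client = app.test_client()
    response = client.get("/status")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```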
Conclusion
One of the primary goals of software testing involves developing visibility (into the current functional state of work product) by evaluating output produced by and gathered from the system under test (that is: a software solution expected to deliver value). There is more than one way to test and more than one type of feedback (more on this in the footnote below), but where testing (of any type) seeks to interact with the system under test in order to evaluate it, it pays to have as clear an understanding as possible of which opportunities and constraints exist for the means to produce and gather output, what the significance of any such output may be, and why coverage derived from that output (by way of the above) solves problems that are valuable to the business. This helps make a testing problem recognizable as a legitimate engineering problem that is of value for the organization to solve.
A software testing strategy that works from a clear understanding of a testing problem still looks a lot like a plan of attack: it's still an approach to what to test and how. Now, though, centered on a clearly-defined testing problem, it should resemble a plan of attack with an idea of where it means to go (and, let's not forget, why). It should have a clearly-articulated sense of the bigger picture and of its own relevance to that picture. The strategy should also have an accompanying structure that helps define it and gives it operational context, while supporting the suggestions it makes as to how to proceed and helping to systematize and clarify understandings of where it might be valuable to look.
Despite this, engineers and organizations conventionally approach testing strategy in terms of the tasks that need to be undertaken and managed in order to produce some amount of visibility. As noted in the introduction, though, approaching strategy from a task-first perspective leaves those involved in delivering on that strategy open to issues related to churn, missed opportunities for coverage, and delays.
What, in my experience, guards against these sorts of issues is the clearer understanding found through questions related to why, what (or which output), and how, as outlined throughout this post.
To explore how this might work, consider the following as examples:
- If a major change is being made to the architecture of the display layer in an N-tier Web app, what is the risk that the change poses to delivery of value that is central to a solution's value statement, and what's a reasonable path of least resistance for the organization to gather sufficient data and feedback in response to that risk while at the same time limiting waste?
- If multi-statement search functionality presents quintillions of possible statement combinations to define saved searches, what is a path of least resistance to gather the most relevant and comprehensive data and feedback without turning testing into a case-by-case game of Whac-A-Mole? Which output would be valuable to gather as a means of informing visibility? How does the system under test leave itself open to producing and gathering data and feedback?
- If part of the system under test interacts with an external commercial API (hosted, for example, by a vendor), what data and feedback would be valuable to gather related to the functional state of that interaction? Why would they be valuable, and how would they serve as pieces of a puzzle? How can the organization gather as much of this information as possible (even automate testing) while at the same time avoiding unnecessary load on the external API? Does the commercial service by any chance bill for access to that API by request volume? (One possible answer is sketched after this list.)
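For that last example, one possible shape of an answer is sketched below: a stub stands in for the vendor-hosted API so that the interaction logic can be evaluated without generating billable request volume. The fetch_exchange_rate wrapper and the /rates endpoint are hypothetical.

```python
from unittest import mock


def fetch_exchange_rate(client, currency):
    """Hypothetical wrapper around a vendor-hosted API used by the system under test."""
    response = client.get(f"/rates/{currency}")
    response.raise_for_status()
    return response.json()["rate"]


def test_exchange_rate_interaction_without_calling_the_vendor():
    # A stub stands in for the external API client, so the interaction logic
    # (paths requested, response parsing) is evaluated without generating
    # billable request volume against the real service.
    stub_client = mock.Mock()
    stub_client.get.return_value.raise_for_status.return_value = None
    stub_client.get.return_value.json.return_value = {"rate": 1.08}

    rate = fetch_exchange_rate(stub_client, "EUR")

    stub_client.get.assert_called_once_with("/rates/EUR")
    assert rate == 1.08
```

A sketch like this covers the producer side of the interaction; some periodic check against the real API would still be needed to confirm the stubbed assumptions hold.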
To explore the value defining a testing problem clearly might provide in developing a well-defined testing strategy for any of these examples, consider comparing two approaches:
- What would a strategy look like that used responses to the questions listed in the main sections of this post as a starting point to define strategy?
- What would a strategy look like that used responses only to the questions listed in the introduction as a starting point?
By defining a testing problem clearly and making the definition of that problem the center of a sort of structural- and tensile integrity (something like the hub of a wheel with spokes) for the strategy used to ultimately solve it, hopefully it should be more of a challenge to miss opportunities for coverage, and there should be fewer surprises that lead to churn, rework, or other delays. Because good testing strategy makes use of the context within which that strategy aims to deliver success, any questions that develop during test planning or execution eventually stop serving as an opportunity to regroup and start to serve instead as an opportunity to restate and retest assumptions and understandings that have already survived a similar type of scrutiny in the process of producing the definition of the problem that the strategy relies on.
This is mostly because good strategy of any type makes judicious use of a clear understanding of opportunities and constraints that define the problem that strategy is tasked with resolving. Software testing strategy just happens to be another type of strategy.
When it comes to defining a strategy to solve a software testing problem, maybe there's no need to reinvent the wheel.
At the same time, though: the organization, team, or individual will hopefully not encounter issues like these as often with a solid, well-thought-out testing strategy based on a clear understanding of the testing problem that strategy aims to solve, whichever tasks may ultimately be most suitable to get a solution over the line.
Quick Footnote: To be clear, this should hopefully not serve as an exhaustive list. One noteworthy example of another category of feedback that this post intentionally sidesteps because it doesn't fit into the same box as functional validation is the sort of subjective feedback produced through acceptance testing or unstructured exploratory testing. Another is performance data, which it seems organizations for some reason are asking behavioral test engineers to be prepared to make part of automated quality gates. Yet another involves a business dimension to defining testing problems (for example, which subset of users still use Internet Explorer 11?) related directly to the business ROI for a given set of inquiries.