Within the field of black box testing, Kaner & Bach (see our course notes, Bach, 2003b, and Kaner, 2002, posted at www.testingeducation.org, and see Kaner, Bach & Pettichord, 2002) have described eleven dominant styles of black box testing:
- Function testing
- Domain testing
- Specification-based testing
- Risk-based testing
- Stress testing
- Regression testing
- User testing
- Scenario testing
- State-model based testing
- High volume automated testing
- Exploratory testing
Bach and I call these "paradigms" of testing because we have seen time and again that one or two of them dominate the thinking of a testing group or a talented tester. An analysis we find intriguing goes like this:
If I were a "scenario tester" (a person who defines testing primarily in terms of the application of scenario tests), how would I actually test the program? What makes one scenario test better than another? What types of problems would I tend to miss, what would be difficult for me to find or interpret, and what would be particularly easy? Here are thumbnail sketches of the styles, with some thoughts on what makes test cases “good” within each.
Function Testing
Test each function / feature / variable in isolation.
Most test groups start with fairly simple function testing but then switch to a different style, often involving the interaction of several functions, once the program passes the mainstream function tests.
Within this approach, a good test focuses on a single function and tests it with middle-of-the-road values. We don’t expect the program to fail a test like this, but it will if the algorithm is fundamentally wrong, the build is broken, or a change to some other part of the program has fouled this code.
These tests are highly credible and easy to evaluate but not particularly powerful.
Some test groups spend most of their effort on function tests. For them, testing is complete when every item has been thoroughly tested on its own. In my experience, the tougher function tests look like domain tests and have their strengths.
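To make this concrete, here is a minimal sketch in Python. The function, its 8% tax rate, and the test values are invented for illustration; the point is simply one function, tested in isolation, with an unremarkable input.

```python
# A minimal sketch of a function test: one function, one
# middle-of-the-road value. The function and its 8% tax rate
# are hypothetical, invented for illustration.

def add_sales_tax(amount: float, rate: float = 0.08) -> float:
    """Return the amount with sales tax added, rounded to cents."""
    return round(amount * (1 + rate), 2)

def test_add_sales_tax_typical_value():
    # An ordinary value; we don't expect this to fail unless the
    # algorithm, the build, or a related change is broken.
    assert add_sales_tax(100.00) == 108.00

if __name__ == "__main__":
    test_add_sales_tax_typical_value()
    print("function test passed")
```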
Domain Testing
The essence of this type of testing is sampling. We reduce a massive set of possible tests to a small group by dividing (partitioning) the set into subsets (subdomains) and picking one or two
representatives from each subset.
In domain testing, we focus on variables, initially one variable at a time. To test a given variable,
the set includes all the values (including invalid values) that you can imagine being assigned to the variable. Partition the set into subdomains and test at least one representative from each
subdomain. Typically, you test with a "best representative", that is, with a value that is at least as likely to expose an error as any other member of the class. If the variable can be mapped to the number line, the best representatives are typically boundary values.
Most discussions of domain testing are about input variables whose values can be mapped to the
number line. The best representatives of partitions in these cases are typically boundary cases.
A good set of domain tests for a numeric variable hits every boundary value, including the minimum, the maximum, a value barely below the minimum, and a value barely above the maximum.
The first time these tests are run, or after significant relevant changes, these tests carry a lot of
information value because boundary / extreme-value errors are common.
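As a minimal sketch, suppose a hypothetical input field accepts integers from 1 to 99 inclusive; the validator below stands in for the program under test, and the test hits each boundary plus the values barely beyond them.

```python
# A minimal sketch of domain tests for a numeric variable.
# The field that accepts integers in [1, 99] is hypothetical;
# accepts_quantity() stands in for the program under test.

MIN_QTY, MAX_QTY = 1, 99

def accepts_quantity(value: int) -> bool:
    """Hypothetical behavior under test: accept values in [1, 99]."""
    return MIN_QTY <= value <= MAX_QTY

def test_quantity_boundaries():
    # Best representatives: the boundaries themselves, plus values
    # barely below the minimum and barely above the maximum.
    assert accepts_quantity(MIN_QTY)          # minimum: valid
    assert accepts_quantity(MAX_QTY)          # maximum: valid
    assert not accepts_quantity(MIN_QTY - 1)  # barely below: invalid
    assert not accepts_quantity(MAX_QTY + 1)  # barely above: invalid
```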
Bugs found with these tests are sometimes dismissed, especially when you test extreme values of several variables at the same time. (These tests are called corner cases.) They are not necessarily credible, they don’t necessarily represent what customers will do, and thus they are not necessarily very motivating to stakeholders.
Specification-Based Testing
Check the program against every claim made in a reference document, such as a design specification, a requirements list, a user interface description, a published model, or a user manual.
These tests are highly significant (motivating) in companies that take their specifications seriously. For example, if the specification is part of a contract, conformance to the spec is very
important. Similarly, products must conform to their advertisements, and life-critical products
must conform to any safety-related specification.
Specification-driven tests are often weak, not particularly powerful representatives of the class of tests that could test a given specification item. Some groups that do specification-based testing focus narrowly on what is written in the document. To them, a good set of tests includes an unambiguous and relevant test for each claim made in the spec.
Other groups look further, for problems in the specification. They find that the most informative
tests in a well-specified product are often the ones that explore ambiguities in the spec or examine aspects of the product that were not well-specified.
Risk-Based Testing
Imagine a way the program could fail and then design one or more tests to check whether the
program will actually fail in that way.
A “complete” set of risk-based tests would be based on an exhaustive risk list, a list of every way
the program could fail.
A good risk-based test is a powerful representative of the class of tests that address a given risk.
To the extent that the tests tie back to significant failures in the field or well known failures in a
competitor’s product, a risk-based failure will be highly credible and highly motivating. However, many risk-based tests are dismissed as academic (unlikely to occur in real use). Being able to tie the “risk” (potential failure) you test for to a real failure in the field is very valuable, and makes tests more credible.
Risk-based tests tend to carry high information value because you are testing for a problem that you have some reason to believe might actually exist in the product. We learn a lot whether the program passes the test or fails it.
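As a sketch, suppose the imagined failure mode is mishandled leap days, a kind of failure with a long history in fielded products. The test below uses Python's standard date arithmetic purely as a stand-in for the program under test.

```python
# A minimal sketch of a risk-based test. The imagined risk: date
# arithmetic treats February as always having 28 days. The code
# under test here is just the standard library, as a stand-in.

from datetime import date, timedelta

def test_rollover_across_leap_day():
    # If the date logic ignored leap years, adding one day to
    # Feb 28, 2024 would skip straight to March 1.
    assert date(2024, 2, 28) + timedelta(days=1) == date(2024, 2, 29)
    assert date(2024, 2, 29) + timedelta(days=1) == date(2024, 3, 1)
```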
Stress Testing
There are a few different definitions of stress tests.
- Under one common definition, you hit the program with a peak burst of activity and see it fail.
- IEEE Standard 610.12-1990 defines it as "Testing conducted to evaluate a system or component at or beyond the limits of its specified requirements with the goal of causing the system to fail."
- A third approach involves driving the program to failure in order to watch how the program fails. For example, if the test involves excessive input, you don’t just test near the specified limits. You keep increasing the size or rate of input until either the program finally fails or you become convinced that further increases won’t yield a failure. The fact that the program eventually fails might not be particularly surprising or motivating. The interesting thinking happens when you see the failure and ask what vulnerabilities have been exposed and which of them might be triggered under less extreme circumstances.
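Here is a minimal sketch of that third approach in Python. The code under stress (a recursive list flattener) is hypothetical, chosen because deeply nested input eventually exhausts the interpreter's call stack; the driver escalates the input until the failure appears.

```python
# A minimal sketch of driving a program to failure. flatten() is
# hypothetical code under test; the driver doubles the nesting
# depth until the program fails or we give up.

def flatten(nested):
    """Recursively flatten a nested list (hypothetical code under test)."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out

def stress_flatten(max_depth=1 << 20):
    depth = 1
    while depth <= max_depth:
        nested = 0
        for _ in range(depth):
            nested = [nested]  # build input nested `depth` levels deep
        try:
            flatten(nested)
        except RecursionError:
            # The interesting question is not *that* it failed, but what
            # vulnerability the failure exposes and whether it could be
            # triggered under less extreme circumstances.
            return depth
        depth *= 2  # escalate the stress
    return None  # no failure observed up to max_depth

if __name__ == "__main__":
    print("first failing depth:", stress_flatten())
```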
Some people dismiss stress test results as not representative of customer use, and therefore not
credible and not motivating. Another problem with stress testing is that a failure may not be useful unless the test provides good troubleshooting information, or the lead tester is extremely
familiar with the application.
A good stress test pushes the limit you want to push, and includes enough diagnostic support to
make it reasonably easy for you to investigate a failure once you see it. Some testers, such as Alberto Savoia (2000), use stress-like tests to expose failures that are hard to see if the system is not running several tasks concurrently. These failures often show up well within the theoretical limits of the system and so they are more credible and more motivating. They are not necessarily easy to troubleshoot.
Regression Testing
Design, develop and save tests with the intent of regularly reusing them. Repeat the tests after
making changes to the program.
Regression testing is a good point at which to note that this is not an orthogonal list of test types. You can put domain tests or specification-based tests or any other kind of test into your set of regression tests.
So what’s the difference between these and the others? I’ll answer this by example:
Suppose a tester creates a suite of domain tests and saves them for reuse. Is this domain testing
or regression testing?
- I think of it as primarily domain testing if the tester is primarily thinking about partitioning variables and finding good representatives when she creates the tests.
- I think of it as primarily regression testing if the tester is primarily thinking about
building a set of reusable tests.
Regression tests may have been powerful, credible, and so on, when they were first designed. However, after a test has been run and passed many times, it’s not likely that the program will fail it the next time, unless there have been major changes or changes in part of the code directly involved with this test. Thus, most of the time, regression tests carry little information value. A good regression test is designed for reuse. It is adequately documented and maintainable. (For suggestions that improve maintainability of GUI-level tests, see Graham & Fewster, 1999; Kaner, 1998; Pettichord, 2002, and the papers at www.pettichord.com in general).
A good regression test is designed to be likely to fail if changes induce errors in the function(s) or area(s) of the program addressed by the regression test.
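As a sketch of what "designed for reuse" can look like at the level of a single test, the example below documents its own history so that a later maintainer can interpret a failure; the bug number and symptom are invented for illustration.

```python
# A minimal sketch of a regression test written for reuse. The bug
# number and history are invented; the point is the documentation
# that makes the test maintainable and its failures interpretable.

def test_negative_total_rounds_to_even():
    """Regression test for (hypothetical) bug #1234.

    History: release 2.1 rounded negative totals away from zero,
    so -2.5 displayed as -3. Re-run after any change to the
    rounding or formatting code.
    """
    # Python 3's round() uses banker's rounding: round(-2.5) == -2.
    assert round(-2.5) == -2
```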
User Testing
User testing is done by users. Not by testers pretending to be users. Not by secretaries or executives pretending to be testers pretending to be users. By users. People who will make use of the finished product. User tests might be designed by the users or by testers or by other people (sometimes even by lawyers, who included them as acceptance tests in a contract for custom software). The set of user tests might include boundary tests, stress tests, or any other type of test.
Some user tests are designed in such detail that the user merely executes them and reports
whether the program passed or failed them. This is a good way to design tests if your goal is to
provide a carefully scripted demonstration of the system, without much opportunity for wrong
things to show up as wrong.
If your goal is to discover what problems a user will encounter in real use of the system, your
task is much more difficult. Beta tests are often described as cheap, effective user tests but in
practice they can be quite expensive to administer and they may not yield much information. For some suggestions on beta tests, see Kaner, Falk & Nguyen (1993).
A good user test must allow enough room for cognitive activity by the user while providing
enough structure for the user to report the results effectively (in a way that helps readers
understand and troubleshoot the problem).
Failures found in user testing are typically credible and motivating. Few users run particularly
powerful tests. However, some users run complex scenarios that put the program through its
paces.
Scenario Testing
A scenario is a story that describes a hypothetical situation. In testing, you check how the
program copes with this hypothetical situation. The ideal scenario test is credible, motivating, easy to evaluate, and complex. In practice, many scenarios will be weak in at least one of these attributes, but people will still call them scenarios. The key message is that you should keep these four attributes in mind when you design a scenario test and try hard to achieve them.
Groups vary in how often they run a given scenario test.
- Some groups create a pool of scenario tests as regression tests.
- Others (like me) run a scenario once or a small number of times and then design another scenario rather than sticking with the ones they’ve used before.
Testers often design scenarios to develop insight into the product. This is especially true early in testing and again late in testing (when the product has stabilized and the tester is trying to understand advanced uses of the product).
State-Model-Based Testing
In state-model-based testing, you model the visible behavior of the program as a state machine
and drive the program through the state transitions, checking for conformance to predictions
from the model.
In general, comparisons of software behavior to the model are done using automated tests, so the failures that are found are easy to evaluate.
In general, state-model-based tests are credible, motivating and easy to troubleshoot. However,
state-based testing often involves simplifications, looking at transitions between operational
modes rather than states, because there are too many states (El-Far 1995).
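Here is a minimal sketch in Python. The model (a three-state media player) and the implementation are both invented; the test drives the program through a sequence of events and checks each observed state against the model's prediction.

```python
# A minimal sketch of state-model-based testing. MODEL records the
# expected transitions; Player stands in for the program under test
# and is implemented independently of the model.

# (current state, event) -> predicted next state; other events
# are predicted to leave the state unchanged.
MODEL = {
    ("stopped", "play"):  "playing",
    ("playing", "pause"): "paused",
    ("playing", "stop"):  "stopped",
    ("paused",  "play"):  "playing",
    ("paused",  "stop"):  "stopped",
}

class Player:
    """Hypothetical program under test."""
    def __init__(self):
        self.state = "stopped"

    def handle(self, event):
        if event == "play":
            self.state = "playing"
        elif event == "pause" and self.state == "playing":
            self.state = "paused"
        elif event == "stop":
            self.state = "stopped"

def test_against_model():
    player, expected = Player(), "stopped"
    # Drive the program through transitions, checking each observed
    # state against the model's prediction.
    for event in ["play", "pause", "play", "stop", "pause", "play"]:
        expected = MODEL.get((expected, event), expected)
        player.handle(event)
        assert player.state == expected, (event, player.state, expected)
```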
High-Volume Automated Testing
High-volume automated testing involves massive numbers of tests, comparing the results against one or more partial oracles.
High-volume testing is a diverse grouping. The essence of it is that the structure of this type of
testing is designed by a person, but the individual test cases are developed, executed, and interpreted by the computer, which flags suspected failures for human review. The almost complete automation is what makes it possible to run so many tests.
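As a minimal sketch, the run below generates a large number of random inputs and checks each result against a partial oracle: properties every correct output must have, even though no one hand-verifies individual results. Sorting is a stand-in for the code under test, chosen because its properties are easy to state.

```python
# A minimal sketch of high-volume automated testing. The human
# designs the structure (generator, properties, flagging); the
# computer develops, executes, and interprets the individual tests.

import random
from collections import Counter

def sort_under_test(data):
    """Stand-in for the code under test."""
    return sorted(data)

def check_one(data):
    result = sort_under_test(data)
    # Partial oracle: properties any correct sort must satisfy.
    assert len(result) == len(data)                         # size preserved
    assert all(a <= b for a, b in zip(result, result[1:]))  # nondecreasing
    assert Counter(result) == Counter(data)                 # same elements

def run_high_volume(n_tests=100_000, seed=0):
    rng = random.Random(seed)  # seeded so suspected failures can be replayed
    for i in range(n_tests):
        data = [rng.randrange(-1000, 1000) for _ in range(rng.randrange(50))]
        try:
            check_one(data)
        except AssertionError:
            print(f"suspected failure on test {i}: {data}")  # flag for review

if __name__ == "__main__":
    run_high_volume()
```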
Exploratory Testing
Exploratory testing is “any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.”
An exploratory tester might use any type of test--domain, specification-based, stress, risk-based, any of them. The underlying issue is not what style of testing is best but what is most likely to reveal the information the tester is looking for at the moment.
Exploratory testing is not purely spontaneous. The tester might do extensive research, such as studying competitive products, failure histories of this and analogous products, interviewing programmers and users, reading specifications, and working with the product.