The JUMBL's test analysis engine produces several analysis results; this page attempts to explain what those results signify, and how they may be used. Please note that which statistics are important and useful to you depends very much on what problem you are trying to solve. This page is just an overview, nothing more. For an example test analysis report, click here.
The JUMBL may have more than one analytical engine. Different analytical engines will produce different results. Some analytical engines may use approximations to obtain their results, and these will be slightly different from the ``analytical'' solutions of other engines. At the time of writing, the JUMBL contains only one analytical engine:
There are four kinds of statistics presented in the analysis report:
Alternately, you may think of the variance in this way. By the Chebyshev Inequality, at least 95% of the time the observed value will be within SQRT[Var(X) / 0.05] of the expectation. (SQRT[x] is the square root of x.)
Often the square root of the variance is reported; this is called the standard deviation. The Chebyshev Inequality does not assume any distribution for X. If the distribution of a random variable is known, one can often do much better than the Chebyshev Inequality. For example, for a normally-distributed random variable, approximately 68% of the time the observed value is within one standard deviation of the mean, and 95% of the time the observed value is within two standard deviations of the mean.
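The Chebyshev bound can be sketched in a few lines of Python. This is an illustration only, not part of the JUMBL; the function name chebyshev_half_width and its parameters are assumptions made for the example. It uses the inequality P(|X - E[X]| >= d) <= Var(X) / d^2 and solves for the distance d at a chosen confidence level.

```python
import math

def chebyshev_half_width(variance, confidence=0.95):
    """Return d such that P(|X - E[X]| < d) >= confidence for any distribution.

    Hypothetical helper: it sets Var(X) / d^2 equal to (1 - confidence)
    and solves for d.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if variance < 0.0:
        raise ValueError("variance must be nonnegative")
    return math.sqrt(variance / (1.0 - confidence))

# Example: Var(X) = 4, so the standard deviation is 2.
d = chebyshev_half_width(4.0)
print(d)        # about 8.94
print(d / 2.0)  # about 4.47 standard deviations
```

For comparison, a normally-distributed variable needs only about two standard deviations for the same 95% coverage, which is the sense in which knowing the distribution lets one do much better.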
The Miller model is based on Bayesian statistics and allows exploitation of prior information about system reliability (in this case, ``prior'' refers to information, either conjectural, analytical, or empirical, about the reliability of the system). This prior information is given in the form of two priors: a prior number of failures and a prior number of successes. For the case of no prior information about software reliability, both these priors are set to one.
Letting the prior failures and successes be denoted a and b, respectively, and letting the observed number of successful and unsuccessful tests in the current experiment be denoted s and f, respectively, one may compute the expected reliability using the following equation:
E[R] = (s+b) / (f+s+a+b)
Since this is an expectation, there is an associated variance. The variance may be computed as:
Var[R] = E[R] * (f+a) / (f+s+a+b) / (f+s+a+b+1)
Using the variance, one may compute a confidence interval around the expectation. This may be done, for instance, with the Chebyshev Inequality.
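As a minimal sketch, the two Miller equations can be evaluated directly. The function name miller_reliability is hypothetical and is not taken from the JUMBL; the argument names follow the symbols used above (s and f observed successes and failures, a and b prior failures and successes), with both priors defaulting to one for the no-prior-information case.

```python
def miller_reliability(s, f, a=1, b=1):
    """Expected reliability and its variance under the Miller model.

    s, f: observed successful and unsuccessful tests.
    a, b: prior failures and prior successes (both 1 when nothing is known).
    """
    n = f + s + a + b
    expected = (s + b) / n
    variance = expected * (f + a) / n / (n + 1)
    return expected, variance

# Example: 98 successful tests, no failures, no prior information.
expected, variance = miller_reliability(s=98, f=0)
print(expected)  # 0.99
print(variance)  # about 9.8e-05
```

Note that even with no observed failures the expected reliability stays below one, because the prior contributes one notional failure.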
Generation and execution counts may differ due to failures. For example, a failure may prevent running some portion of a test case. The remaining items in the test case were generated (and appear in generation counts) but not executed. Both counts are computed and presented in the report.
Certain results, such as reliabilities and the Kullback, can be computed in two ways: using the testing record of execution and failure counts, or using the generation counts.
Prior to running any tests, one may wish to compute these results using the generation counts only, to get a sense of the best one could do (the ``optimum'' values for reliability and Kullback). Once testing has been completed, one can then look at the testing record and compute the experiential values.
Both statistics are reported, with the optimum values clearly labelled. These represent what would be observed if no failure occurred in testing.
This collection of constraints may be viewed as probabilities on arcs, or as costs associated with arcs, etc. The special default distribution has no key. The following example shows two TML constraints; the first is part of the default distribution, while the second is part of the ``special'' distribution.
($ 1/2 $)
The average, per-step uncertainty in bits. This can be interpreted as the minimum average number of yes / no questions which must be asked to determine which path is taken. For example, if there are eight choices then one can determine which choice is made by asking three yes / no questions (splitting the number of choices in half each time). For n equally likely choices, one must ask CEIL(log2(n)) questions to determine which choice is made. (CEIL(n) is the smallest integer greater than or equal to n.)
When choices are not equally likely, the uncertainty is less than for the equally likely case (since some choices must be more likely than others). The uncertainty of a distribution P = {p_1,p_2,p_3,...,p_n} is given by the following famous equation derived by C. Shannon:
H{P} = -SUM(p_i log2(p_i))
(SUM(x_i) is the sum of all the x_i values; SUM(x_i) = x_1+x_2+...+x_n.)
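Shannon's formula is short enough to check by hand or in code. The sketch below is illustrative only; the function name entropy_bits is an assumption for this example. Terms with zero probability are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    probabilities = list(probabilities)
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes are skipped (0 * log2(0) is taken as 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Eight equally likely choices: three yes / no questions.
print(entropy_bits([1 / 8] * 8))        # 3.0
# Unequal choices carry less uncertainty than the equally likely case.
print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5, below log2(3), about 1.585
```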
There are many different probability distributions on a Markov chain and thus many entropies. One can compute the entropy for each node of the model (using the probability distribution on the outgoing arcs), then take the average of these values, weighted by the long run occupancy for the node. This weighted average is called the source entropy of the model, and it is useful as a measure of complexity of the model.
There are two choices for the outgoing arc probability distribution. Traditionally the next node probability distribution (sometimes called the transition probability distribution) has been used--this gives the node source entropy, or the average per-step uncertainty in choosing the next node. More recently, attention has been focused on the stimuli which label the outgoing arcs. The probability distribution on the outgoing stimuli can be used instead--this gives the arc source entropy, or the average per-step uncertainty in choosing the next node and stimulus pair.
The arc source entropy is bounded below by the node source entropy. The two are equal if and only if each stimulus labels an arc to a different node. Both entropies are reported in the model analysis report.
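The relationship between the two source entropies can be seen on a small example. The sketch below is not the JUMBL's implementation: the data layout (a dictionary mapping each node to a list of stimulus, next node, probability triples) and the function names are assumptions, and the long run occupancies are taken as given rather than computed.

```python
import math
from collections import defaultdict

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

def source_entropies(arcs, occupancy):
    """Return (node_source_entropy, arc_source_entropy).

    arcs: {node: [(stimulus, next_node, probability), ...]} (hypothetical layout)
    occupancy: {node: long run occupancy}, assumed precomputed and summing to 1.
    """
    node_h = 0.0
    arc_h = 0.0
    for node, outgoing in arcs.items():
        # Arc view: each (stimulus, next node) pair is a distinct choice.
        arc_h += occupancy[node] * entropy_bits(p for _, _, p in outgoing)
        # Node view: arcs leading to the same next node are merged.
        to_next = defaultdict(float)
        for _, next_node, p in outgoing:
            to_next[next_node] += p
        node_h += occupancy[node] * entropy_bits(to_next.values())
    return node_h, arc_h

# Two-node model; stimuli x and y both lead from A to B.
arcs = {
    "A": [("x", "B", 0.25), ("y", "B", 0.25), ("z", "A", 0.5)],
    "B": [("w", "A", 1.0)],
}
occupancy = {"A": 2 / 3, "B": 1 / 3}  # stationary distribution of this chain
node_h, arc_h = source_entropies(arcs, occupancy)
print(node_h, arc_h)  # about 0.667 and about 1.0
```

Because two stimuli lead from A to the same node B, the arc source entropy here is strictly larger than the node source entropy.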
The KL number is nonnegative, and zero only when the two distributions coincide. Thus as the statistics of the generated sample of tests approach the statistics of all tests, the KL number approaches zero. The number may therefore be used as a stopping criterion; that is, a criterion to help one determine when testing is sufficient.
The KL is not a metric; it does not satisfy the triangle inequality and is not symmetric. However, it is a convex function of the ``true'' distribution.
The KL may also be thought of as the relative entropy of one distribution p with respect to another distribution q. One can think of the KL (using log base two) as the average number of bits that are wasted by encoding events from distribution p with a code based on the distribution q.
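For two plain discrete distributions, the relative entropy described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the JUMBL's computation over a usage model: the function name kl_bits is hypothetical, and p and q are simply two lists of probabilities over the same events.

```python
import math

def kl_bits(p, q):
    """Relative entropy D(p || q) in bits for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue         # events that never occur under p contribute nothing
        if q_i == 0.0:
            return math.inf  # an event possible under p but impossible under q
        total += p_i * math.log2(p_i / q_i)
    return total

p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_bits(p, p))  # 0.0, the distributions coincide
print(kl_bits(p, q))  # about 0.2075
print(kl_bits(q, p))  # about 0.1887, so the KL is not symmetric
```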
The Kullback-Leibler (KL) number is not defined unless all of a model's arcs have been covered. It is possible, however, to introduce small perturbations in the visitation counts to allow computation of a number similar to the KL. This ``permuted'' Kullback number is asymptotically close to the true KL when the latter is defined. One may think of this permuted Kullback as an approximation to the KL, though this is not strictly true, since the permuted Kullback is defined over a larger domain than the KL. For this reason, only the permuted KL is reported.
The Kullback-Leibler (KL) number (see Kullback) is defined as the difference between an approximating entropy and the source entropy (see Arc Source Entropy), with the latter value always smaller. As testing experience approaches the distribution given by the usage model, the KL decreases toward zero.
If the source entropy is 1 bit, then a KL of 0.1 bits is small. If the source entropy is 0.1 bits, then a KL of 0.1 bits is large. The KL by itself may therefore be misleading unless it is considered together with the source entropy. For this reason the relative Kullback is computed and presented. This is the percentage difference between the testing experience entropy and the true source entropy, and is given by:
100(KL / H)
where H is the arc source entropy. Again, as testing experience approaches the distribution given by the usage model, this number approaches zero.
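The two cases mentioned above (source entropies of 1 bit and 0.1 bits, each with a KL of 0.1 bits) work out as follows. The function name relative_kullback is an assumption for this sketch and is not taken from the JUMBL.

```python
def relative_kullback(kl, source_entropy):
    """Relative Kullback as a percentage: 100 * (KL / H)."""
    if source_entropy <= 0.0:
        raise ValueError("source entropy must be positive")
    return 100.0 * kl / source_entropy

print(relative_kullback(0.1, 1.0))  # about 10 percent: small
print(relative_kullback(0.1, 0.1))  # about 100 percent: large
```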