In the memory of a computer, all data is stored in the form of bits (or binary digits). In this section we will investigate the exact manner in which various forms of information are stored as bits.
These days, most computers organise their memory in bytes of 8 bits each, and words of 2 or 4 bytes each. To write out the value of a byte one could write out the bits, but it is more economical to use hexadecimal numbers.
In the hexadecimal number system we extend the usual decimal digits 0,...,9 with A,B,C,D,E,F for 10,...,15. Since four bits are needed for one hexadecimal digit, we can write byte values using 2 hexadecimal digits.
Occasionally the octal number system is used. In it, only the digits 0,...,7 are used, and they can be represented in three bits.
If there is a possibility of confusion between the number systems, the
base is appended as a subscript to the number. Eg, 2A16
= 4210 = 528
= 1010102.
The positive and negative integers ...,-2,-1,0,1,2,... are
easily represented in the binary number system. Like our everyday decimal
numbers, this is a positional system: the rightmost positions indicates
the ones, the position to the left of that the powers of 2, then the powers
of 4, 8, 16, et cetera. For example: 10011 stands for (from
right to left) 1+2+16=19. This way, in n bits
we can store the numbers zero through 2^n-1.
However, we also need negative numbers. We declare the leftmost bit to be the sign bit: if the sign bit is zero the number is postive, if it is one the number is negative. The easiest realisation of negative numbers would now be to let a number and its negative have the same bit pattern up to the sign bit. For three bit numbers this would give:
000 001 010 011 100 101 110 111 0 1 2 3 -0 -1 -2 -3
The big disadvantage of this scheme is first of all that there would be two different numbers zero. Also, note that adding one to the bit pattern of a negative number would actually subtract one from that number.
A more sensible solution would then take a number, and complement each bit to arrive at its negative. This is called one's complement, and again there is the problem of two zeros.
000 001 010 011 100 101 110 111 0 1 2 3 -3 -2 -1 -0
The so-called two's complement system takes the complement of every bit individually, then adds one to the result. Eg,
2 = 10 -> 01 -> 10 ==> -2 = 110 000 001 010 011 100 101 110 111 0 1 2 3 -4 -3 -2 -1
Commonly, integers are stored in 2 bytes, that is, 16 bits. The two's
complement scheme stores the negative numbers in the bit patterns for 2^15
through 2^16-1. As a result, we can represent the numbers -32768=-2^15
through 32767=2^15-1. (Integers can also be stored in 4 bytes;
then the range is between plus and minus 2.14 billion.)
As a consequence, if you add two numbers with a sum of 2^15
or more, you may get a result that is negative. Maybe the computer system
has an option of telling you that a integer overflow error occurred,
but having to test for this inevitable slows your program down. It is easier
for the program to watch for potential occurrences of this problem and prevent
them.
In computer languages, the term `real' does not really correspond to real numbers in mathematics. Reals are more like fractions, binary fractions at that. Historically there are two types of reals: fixed point and floating point numbers. Fixed point numbers are hardly used any more, so you will often encounter the term floating point number as a synonym for real number.
The representation of real numbers is based on so-called scientific notation:
a number is written as a mantissa, that is, a fraction
between zero and one and an exponent, indicating the powers of
10 or 2 with which to multiply the fractional part. Example: 12.37
= 0.123 * 10^2. We could also have written 12.37 = 0.01237
* 10^3, but by convention we use a normalisation: the first
digit after the decimal point is always nonzero.
Both the mantissa and the exponent incorporate a sign: the former to represent negative numbers, the latter to represent fractions less than one in absolute value.
Real numbers, like integers, are subject to overflow errors: there is a largest representable number, and a calculation with a result larger than that will give an undefined result. However, there are a few more types of numerical error, specific to floating point numbers.
First of all, there is representation or round-off error. Just as 1/3 is not expressable as a finite decimal fraction (1/3 = 0.33333... which is a repeating fraction), most fractions are not expressable as finite binary fractions. Truncating a repeating fraction to a finite mantissa length leads to representation error. If round-off is applied here, we also call this round-off error.
Floating point numbers have a converse to overflow where numbers are too big: underflow, where numbers are too close to zero.. The smallest positive number is attained if the exponent takes its most negative value, and the mantissa is the smallest value that satisfies the normalisation condition above. Any calculation with a result less in absolute value than this number is said to cause floating point underflow.
A quantity related to the representation of real numbers is the so-called machine constant (sometimes called machine epsilon). Its formal definition is: the smallest number that, added to one, gives a result greater than one.
To understand this, consider how numbers in floating point format are
added. Suppose we havea mantissa with 2 decimal places. To add 12
(= 0.12 * 10^2) and 3.5 (= 0.35 * 10^1) we normalise
both to the exponent of the larger number: 0.12 * 10^2 and
0.035 * 10^2. Adding these would give 0.155 * 10^2,
but our mantissa can only accomodate 2 places, so the second operand is
first truncated to 0.03 * 10^2. The result of our computation
is then 15, rather than the exact sum 15.5. Even
if we had had some extra space to prevent truncation of the second operand,
we still would have round-off error in representing the final result
as a normalised floating point number.
Now, if our second operand had been even smaller, say 0.35,
its representation to the exponent of the other number would have been zero!
Hence, adding it to the first operand would simply give the first operand.
Hence the definition of the machine epsilon.
As a final instance of numerical error we
will consider loss of precision. This happens when almost equal
numbers are subtracted. For instance, in two significant digits, subtracting
0.23 from 0.24 is 0.01, or after
normalisation, 0.10 * 10^-1 in only one significant digit.
To understand this, consider that the number 0.23 can, because
of round-off or representation error, stand for anything between 0.225
and 0.235. The error is in the third decimal. However, after
the subtraction, the number 0.10 * 10^-1 has an error in the
second decimal.
This example also illustrates that addition of real numbers in a computer
is not associative: in exact arithmetic .20+.035-.20-.03
= .005, but
(.20 + .35*10^-1) - (.20 + .30*10^-1) = .24 - .23 = .10*10^-1
with a relative error of fifty percent, and
(.20 - .20) + (.35*10^-1 - .30*10^-1) = 0 + .5*10^-2
which is exact.
In computing the roots of a second degree polynomial the matter of numerical errors comes up twice. First of all, the test whether the discriminant
disc = b**2-4*a*c
is zero should not be written as
IF (disc.eq.0.0) THEN
Even if the values of a,b,c are such that in exact arithmetic the discriminant
would be zero, this is not likely to be the case in computer arithmetic.
Instead, using the machine epsilon, one should
test whether b**2 and 4*a*c differ relatively
by less than a small number of times the machine constant:
IF (abs(disc).LT.10*eps*b**2) THEN
Next, if b and sqrt(disc) are approximately
equal in absolute value, computation of one of the roots will suffer from
severe loss of precision. The solution is to
compute the stable root, and to use the fact that the product of the roots
is c/a:
IF (b.LT.0.0) THEN
x1 = (-b+sqrt(disc))/(2*a)
ELSE
x1 = (-b-sqrt(disc))/(2*a)
END IF
x2 = c/(a*x1)
There is no obvious way to represent character data in bit patterns. There only solution is to draw up a list of interpretations of the bit patterns. By far the most common character table is called Ascii: American Standard Code for Information Interchange. Occasionally you may still come across Ebcdic: the Extended Binary Coded Decimal Information Code. This is an IBM invention, so IBM mainframes are just about the only place it is still used.
There are in fact two Ascii tables: the 7-bit and 8-bit tables. While the meaning of 7-bit Ascii is pretty much standardised, various manufacturers consider the meaning of the higher Ascii codes fair game for their own purposes. Much software, eg much Internet communication software, is based on 7-bit ascii; in order to send 8-bit data you have to translate it down to 7-bit data.
In fact, the 7-bit ascii codes are conceptually split in the printable and unprintable Ascii codes. The printable codes are the alphabet, the numbers, and various other characters; the unprintable codes can have various meanings. Sending them to your screen may cause your cursor to jump, or parts of the screen to start blinking.
There are only two logical values: true and false. Strictly speaking a single bit of memory would be enough to store that, but on most computers addressing a single bit is expensive, typically requiring extraction from a byte or a word. Therefore, logical data is typically allocated a byte or a word. Various conventions are possible, eg, all zeros representing false and all ones representing true. Alternative, any nonzero bit could imply a true value.