Searching
Common Searching Algorithms
- Linear Search
- Binary Search
- Hash Coding
- Lots more
- Searching requires a key field (e.g., name, ID, code) that
identifies the target item.
- When the key field of a target item is found, a pointer to
the target item is returned. The pointer may be an address, an
index into a vector or array, or some other indication of where
to find the target.
- If a matching key field isn't found, the user is informed.
- Important factor: Speed!
How long does it take to find the target?
What is the response time?
- Response time may depend on:
- size of the list (number of records and record size)
- data structure used (vector, linked list, binary tree, etc.)
- data organization (key ordered, random, etc.)
- search strategy (linear, binary, other)
- location of the list
- external (on a disk or other device)
- internal (in memory)
- and more
- Search time can be measured with big-O notation:
O(n)   linear time
O(log2(n))   logarithmic time
O(1)   constant time
What does this mean? Assume you have n items in the list.
linear time: It takes on the order of n probes to find the target.
Double the size of the list and you double the number of probes.
logarithmic time: It takes about log2(n) probes.
Double the list size and increase the number of probes by one.
Double it again and increase the probes by only one.
constant time: The search time is independent of the size of
the list. Double the list size and the number of probes
remains the same.
What if we cut the list in half?
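To make the doubling behaviour concrete, here is a small sketch that computes worst-case probe counts. The names linear_probes, binary_probes, and print_probe_table are made up for this illustration, and the logarithmic count assumes a search that discards half of the list on every probe.

```c
#include <stdio.h>

/* Worst-case probes for a linear search of n items: every item is examined. */
long linear_probes(long n) {
    return n;
}

/* Worst-case probes for a search that halves the list on each probe:
   count how many times n can be halved before nothing is left. */
long binary_probes(long n) {
    long probes = 0;
    while (n > 0) {
        n /= 2;
        probes++;
    }
    return probes;
}

/* Print both counts for list sizes start, 2*start, 4*start, ... (rows lines). */
void print_probe_table(long start, int rows) {
    printf("%10s %10s %10s\n", "n", "linear", "binary");
    for (int i = 0; i < rows; i++, start *= 2)
        printf("%10ld %10ld %10ld\n", start,
               linear_probes(start), binary_probes(start));
}
```

Calling print_probe_table(1000, 4) lists n = 1000, 2000, 4000, 8000. The linear column doubles on every row, while the binary column only goes 10, 11, 12, 13. A constant-time search would show the same count on every row.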
- Linear Search - Array based list
Scan the list until:
- last record is searched, or
- target record is found
Return:
- index of found target or other useful information (e.g., find a name and
return the phone number), or
- flag (e.g., -1) indicating target record not found
Example  
//----------------------------linear search-----------------------
//
// Search A[N] for the first instance of a target.
//
// Input: an initialized array of N integers
// (N is assumed to be a constant defined elsewhere in the program)
// Return value: array index of the first instance of target,
// or -1 if target is not in the array
//----------------------------------------------------------------
int lsearch(int A[], int target) {
    int index = 0;
    while(index < N && A[index] != target)
        index++;
    if(index < N) return index;
    else return -1;
}
-OR-
int lsearch(int A[], int target) {
    int index;
    for(index = 0; index < N && A[index] != target; index++)
        ;   // empty body: the loop header does all the work
    if(index < N) return index;
    else return -1;
}
If A holds the following values, what would be returned if target is 0?
   +---+---+---+---+---+---+---+---+---+
A: |13 | 4 | 6 | -8| 0 | 2 | 0 | 9 | 6 |
   +---+---+---+---+---+---+---+---+---+
     0   1   2   3   4   5   6   7   8
How would you change the code to return the numbered position instead of the array index? That is, return 5 instead of 4 in the example above.
Example  
Search for the last occurrence of target.
int lsearch3(int A[], int target) {
    int index = N-1;
    while(index >= 0 && A[index] != target)
        index--;
    if(index >= 0) return index;
    else return -1;
}
-OR-
int lsearch4(int A[], int target) {
    int index;
    for(index = N-1; index >= 0 && A[index] != target; index--)
        ;   // empty body: the loop header does all the work
    if(index >= 0) return index;
    else return -1;
}
Note:
- We usually search for a key within a large record.
- Linear search is easy to implement, but is slow.
How many probes are needed on a randomly ordered list of size n?
worst case___________________
best case____________________
average case_________________
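Since we usually search for a key inside a larger record, here is a minimal sketch of a linear search over an array of records. The struct layout, the field sizes, and the name find_record are assumptions made for this illustration. Unlike the examples above, the list size is passed in as a parameter n instead of relying on a constant N.

```c
#include <string.h>

/* A hypothetical phone-book record: name is the key field. */
struct record {
    char name[32];
    char phone[16];
};

/* Linear search on the key field.
   Returns the index of the first record whose name equals key,
   or -1 if no record matches. */
int find_record(const struct record list[], int n, const char *key) {
    for (int i = 0; i < n; i++) {
        if (strcmp(list[i].name, key) == 0)
            return i;          /* key found: report where the record is */
    }
    return -1;                 /* scanned all n records without a match */
}
```

The caller can use the returned index to reach any other field of the record, for example list[i].phone once a name has been found.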
Binary Search     O(log2(n))
Binary search is fast, but the list must be ordered.
Algorithm
- Probe middle of list
- If target equals list[mid], FOUND.
- If target < list[mid], discard the upper half of the list, from list[mid]
through list[last].
- If target > list[mid], discard the lower half of the list, from list[first]
through list[mid].
- Continue searching the shortened list until either the target
is found, or there are no elements to probe.
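The steps above could be sketched in C roughly as follows. This is one possible version, not the only way to write it. The name binsearch and the choice of a char list are assumptions made for this illustration, and the list is assumed to be in ascending order.

```c
/* Binary search of an ascending list of n characters.
   Returns the index of target, or -1 if target is not in the list. */
int binsearch(const char list[], int n, char target) {
    int low = 0;
    int high = n - 1;
    while (low <= high) {                 /* elements remain to probe */
        /* Same value as (high + low) / 2, written to avoid overflow
           when the indexes are very large. */
        int mid = low + (high - low) / 2;
        if (list[mid] == target)
            return mid;                   /* FOUND */
        if (target < list[mid])
            high = mid - 1;               /* discard list[mid]..list[high] */
        else
            low = mid + 1;                /* discard list[low]..list[mid] */
    }
    return -1;                            /* nothing left to probe */
}
```

The loop ends in one of two ways: a probe hits the target, or low passes high, which means every remaining candidate has been discarded.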
Example
+---+---+---+---+---+---+---+---+---+---+---+---+
| A | B | C | D | E | F | G | H | I | J | K | L |
+---+---+---+---+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7   8   9  10  11
Search for F
low is 0, high is 11.
mid is (high + low) / 2 or (11 + 0) / 2 or 5.
list[5] is F. // FOUND!!
Search for C
mid is (high + low) / 2 or 5.
list[5] != C.
Since C < list[5], high is mid-1 or 4.
Now search list[0] ... list[4].
mid is (4 + 0) / 2 or 2.
list[2] is C. // FOUND!!
Since we discard 1/2 of the list with each probe, doubling the size
of the list adds only one probe to search time.
But...
Summary
- Searching Techniques
- Linear (sequential)
- easy to implement
- slow
- best time: one probe
- average time: n/2 probes
- worst time: n probes
- Binary
- more complicated
- list must be ordered
- fast
- best time: one probe
- average/worst time: about log2(n) probes
- Hash Coding
- far more complicated
- works best when list is in random order
- very fast
- best time: O(1) or constant time
- average time: O(1) or constant time; the worst case can degrade toward O(n) when many keys collide
- Sorting Algorithms
Selection Sort     O(n^2)
Algorithm
- Find smallest item in list.
- Exchange it with "first" item in list.
- Repeat, beginning with the rest of the list.
original:
+---+---+---+---+---+---+---+---+
| P | M | G | D | B | E | R | K |
+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7
after first pass:
+---+---+---+---+---+---+---+---+
| B | M | G | D | P | E | R | K |
+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7
after second pass:
+---+---+---+---+---+---+---+---+
| B | D | G | M | P | E | R | K |
+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7
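The selection sort steps above might be written in C along these lines. The name selection_sort and the use of a char list (to match the letters in the diagrams) are assumptions made for this sketch.

```c
/* Selection sort of n characters into ascending order. */
void selection_sort(char list[], int n) {
    /* Each pass fixes one more item at position "first". */
    for (int first = 0; first < n - 1; first++) {
        /* Find the smallest item in list[first..n-1]. */
        int smallest = first;
        for (int i = first + 1; i < n; i++) {
            if (list[i] < list[smallest])
                smallest = i;
        }
        /* Exchange it with the "first" item of the unsorted part. */
        char tmp = list[first];
        list[first] = list[smallest];
        list[smallest] = tmp;
    }
}
```

The inner loop scans the whole unsorted part on every pass, no matter how the data is arranged, which is where the O(n^2) running time comes from.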
Quicksort     O(n log2(n))
Quicksort was invented and named by C. A. R. Hoare and is one of the best
general-purpose sorting algorithms. It is built on the idea of partitions,
and it uses a divide-and-conquer strategy. The basic algorithm for
a one-dimensional array is as follows.
- Partition Step: Select an element to place in its final
position in the array. That is, all the elements to its left will be
less than the selected element, and all the elements to its right will be
greater than the selected element. We will select the first element
in the array and put it in its final place in the array.
Then we have one element in its proper location and two unsorted
subarrays.
- Recursive Step: Repeat the process on each unsorted subarray.
Each time the partition step is repeated, another element is placed in its
final position in the sorted array, and two additional subarrays are created.
When a subarray eventually contains only one element, that subarray is sorted
and the element is in its final location.
Let's consider the following array of integers.
37 2 6 4 89 8 10 12 68 45
- Start from the rightmost element in the array and compare each element
with the chosen element, 37 here. When an element less than 37 is
found (12), swap it with the 37. Now we have
12 2 6 4 89 8 10 37 68 45
- Start from the left of the array beginning with the element after
the 12, and compare each element
with 37 until an element greater than 37 is found (89). Then swap 37
with 89. Now we have
12 2 6 4 37 8 10 89 68 45
- Start from the right but begin with the element before 89. When you
find an element less than 37 (10), swap the two elements. Now we have
12 2 6 4 10 8 37 89 68 45
- Start from the left but begin with the element after 10. When you
find an element greater than 37, swap the two elements. Since there
are no elements greater than 37, we compare 37 with itself and know that
37 is in its final place in the array.
Now we have two unsorted subarrays: every element of the left one is less
than 37, and every element of the right one is greater than 37.
12 2 6 4 10 8 37 89 68 45
This is one pass.
The sort continues with both subarrays being partitioned in the same manner.
The recursive quicksort function is elegant and easy to write.
Here is one possibility.
void quicksort(int array[], int left, int right)
{
    int index;
    if(right > left) {
        index = partition(array, left, right);   // final position of the chosen element
        quicksort(array, left, index-1);         // sort the left subarray
        quicksort(array, index+1, right);        // sort the right subarray
    }
}
The partition function is a bit more laborious, but you should be
able to write it as a homework lab assignment.
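For reference, here is one possible sketch of a partition function that follows the right-then-left scanning walk-through shown earlier. It is only one of several common partition schemes and is not necessarily the version expected for the assignment. The helper swap_ints is a name invented for this sketch, and quicksort is repeated so the block compiles on its own.

```c
/* Exchange two ints (helper invented for this sketch). */
static void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Put array[left] into its final position within array[left..right]
   and return that position. The chosen element always sits at
   array[lo] or array[hi] while the scans close in on each other. */
int partition(int array[], int left, int right) {
    int lo = left;
    int hi = right;
    while (lo < hi) {
        /* Chosen element is at array[lo]: scan from the right
           for an element less than it. */
        while (lo < hi && array[hi] >= array[lo])
            hi--;
        if (lo < hi) {
            swap_ints(&array[lo], &array[hi]);   /* chosen element moves to hi */
            lo++;
        }
        /* Chosen element is at array[hi]: scan from the left
           for an element greater than it. */
        while (lo < hi && array[lo] <= array[hi])
            lo++;
        if (lo < hi) {
            swap_ints(&array[lo], &array[hi]);   /* chosen element moves back to lo */
            hi--;
        }
    }
    return lo;   /* lo == hi: the scans met at the chosen element */
}

/* The recursive function from the text, with an explicit return type. */
void quicksort(int array[], int left, int right) {
    if (right > left) {
        int index = partition(array, left, right);
        quicksort(array, left, index - 1);
        quicksort(array, index + 1, right);
    }
}
```

On the array 37 2 6 4 89 8 10 12 68 45, partition(array, 0, 9) performs the same three swaps as the walk-through and returns 6, the final index of 37.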
When the original array elements are in random order, quicksort works very
well. What happens when the original array elements are sorted? What is
the order of the algorithm?