Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 11

}

}

public void testAddingTheSameValuesDoesntChangeTheSize() { assertEquals(TEST_SIZE, _hashtable.size());

for (int i = 0; i < TEST_SIZE; ++i) { _hashtable.add(String.valueOf(i)); assertEquals(TEST_SIZE, _hashtable.size());

}

}

}

How It Works

The AbstractHashtableTestCase class defines a single variable for holding the hash table instance currently under test, which is initialized in the setUp() method by calling the abstract method createTable. As you will see later, the createTable() method is implemented by subclasses to return an instance of a specific Hashtable implementation. Notice how the setUp() method adds data to the hash table. If you had used the integers directly as numbers (0, 1, 2, and so on), then each would likely hash to its own position in the underlying table, thereby possibly eliminating any chance of collisions occurring (which, while clearly the ideal, doesn’t really reflect reality). Instead, the numbers are converted to strings in order to exercise a more complex hash function — namely, the hashCode() method defined in the String class:

public abstract class AbstractHashtableTestCase extends TestCase { private static final int TEST_SIZE = 1000;

private Hashtable _hashtable;

protected abstract Hashtable createTable(int capacity);

protected void setUp() throws Exception { super.setUp();

_hashtable = createTable(TEST_SIZE);

for (int i = 0; i < TEST_SIZE; ++i) { _hashtable.add(String.valueOf(i));

}

}

...

}

Having added a number of strings to the hash table in setUp(), the first thing you did was check that contains could actually find them again. Values would be rather useless if all we could do is store them without the ability to find them again:

public void testContains() {

for (int i = 0; i < TEST_SIZE; ++i) { assertTrue(_hashtable.contains(String.valueOf(i)));

}

}

274

Hashing

The next test checked that values you know don’t exist aren’t found by mistake:

public void testDoesntContain() {

for (int i = 0; i < TEST_SIZE; ++i) { assertFalse(_hashtable.contains(String.valueOf(i + TEST_SIZE)));

}

}

Finally, you made sure that adding the same value more than once doesn’t result in the hash table growing in size:

public void testAddingTheSameValuesDoesntChangeTheSize() { assertEquals(TEST_SIZE, _hashtable.size());

for (int i = 0; i < TEST_SIZE; ++i) { _hashtable.add(String.valueOf(i)); assertEquals(TEST_SIZE, _hashtable.size());

}

}

Here the size is checked both before and after the addition of duplicate values to ensure that the size remains constant.

Linear Probing

In the next Try It Out, you create a hash table that uses linear probing. The nice thing about linear probing is that the implementation is very simple and therefore relatively easy to understand.

Try It Out

Testing and Implementing a Hash Table That Uses Linear Probing

Start by creating the test class:

package com.wrox.algorithms.hashing;

public class LinearProbingHashtableTest extends AbstractHashtableTestCase { protected Hashtable createTable(int capacity) {

return new LinearProbingHashtable(capacity);

}

}

Follow with the hash table implementation itself:

package com.wrox.algorithms.hashing;

public class LinearProbingHashtable implements Hashtable { private Object[] _values;

private int _size;

public LinearProbingHashtable(int initialCapacity) {

assert initialCapacity > 0 : “initialCapacity can’t be < 1”; _values = new Object[initialCapacity];

}

275

Chapter 11

public void add(Object value) { ensureCapacityForOneMore();

int index = indexFor(value);

if (_values[index] == null) { _values[index] = value; ++_size;

}

}

public boolean contains(Object value) { return indexOf(value) != -1;

}

public int size() { return _size;

}

private int indexFor(Object value) { int start = startingIndexFor(value);

int index = indexFor(value, start, _values.length); if (index == -1) {

index = indexFor(value, 0, start); assert index == -1 : “no free slots”;

}

return index;

}

private int indexFor(Object value, int start, int end) { assert value != null : “value can’t be null”;

for (int i = start; i < end; ++i) {

if (_values[i] == null || value.equals(_values[i])) { return i;

}

}

return -1;

}

private int indexOf(Object value) {

int start = startingIndexFor(value);

int index = indexOf(value, start, _values.length); if (index == -1) {

index = indexOf(value, 0, start);

}

return index;

}

276

Hashing

private int indexOf(Object value, int start, int end) { assert value != null : “value can’t be null”;

for (int i = start; i < end; ++i) { if (value.equals(_values[i])) {

return i;

}

}

return -1;

}

private int startingIndexFor(Object value) { assert value != null : “value can’t be null”;

return Math.abs(value.hashCode() % _values.length);

}

private void ensureCapacityForOneMore() { if (size() == _values.length) {

resize();

}

}

private void resize() { LinearProbingHashtable copy =

new LinearProbingHashtable(_values.length * 2);

for (int i = 0; i < _values.length; ++i) { if (_values[i] != null) {

copy.add(_values[i]);

}

}

_values = copy._values;

}

}

How It Works

All the actual test cases have been defined in AbstractHashtableTestCase, so all you needed to do was extend this and implement the createTable() method to return an instance of the yet to be defined LinearProbingHashtable. When this class is executed, all the test cases from the base class will be included and run against a hash table that uses linear probing:

package com.wrox.algorithms.hashing;

public class LinearProbingHashtableTest extends AbstractHashtableTestCase { protected Hashtable createTable(int capacity) {

return new LinearProbingHashtable(capacity);

277

Chapter 11

}

}

As for the implementation code, linear probing is very straightforward, which is reflected in the class definition.

The LinearProbingHashtable class has an array for holding values, and in the single constructor you can specify the maximum number of values that can initially be stored — the capacity:

package com.wrox.algorithms.hashing;

public class LinearProbingHashtable implements Hashtable { private Object[] _values;

public LinearProbingHashtable(int initialCapacity) {

assert initialCapacity > 0 : “initialCapacity can’t be < 1”; _values = new Object[initialCapacity];

}

...

}

Speaking of capacity, the hash table needs to resize at various times to accommodate more values. For this you have the method ensureCapacityForOneMore(), which, as you may well imagine, ensures that the hash table can hold at least one more value. If not, then a resize is required:

private void ensureCapacityForOneMore() { if (size() == _values.length) {

resize();

}

}

The resize() method uses a neat but effective technique for increasing the number of available slots: A temporary table is created with twice the capacity. Into this is added all the values, and the array from the new table is used to replace the existing array:

private void resize() { LinearProbingHashtable copy =

new LinearProbingHashtable(_values.length * 2);

for (int i = 0; i < _values.length; ++i) { if (_values[i] != null) {

copy.add(_values[i]);

}

}

_values = copy._values;

}

The startingIndexFor() method is central to the operation of the hash table. This method takes a value and returns the index into the array at which it would be stored. It uses the hash code as defined by the value itself — all objects in Java define a hashCode() method — and then takes the remainder

278

Hashing

after dividing by the capacity of the table. This ensures you end up with an index that falls within the bounds of the array of values:

private int startingIndexFor(Object value) { assert value != null : “value can’t be null”;

return Math.abs(value.hashCode() % _values.length);

}

The two indexFor() methods work together to find a slot into which a new value can be placed.

The first method searches from the “natural” starting point until the end of the array. If a slot can’t be found there, then a further search is made from the start of the array up to the initial starting point:

private int indexFor(Object value) { int start = startingIndexFor(value);

int index = indexFor(value, start, _values.length); if (index == -1) {

index = indexFor(value, 0, start); assert index == -1 : “no free slots”;

}

return index;

}

The second method performs the actual search within the bounds specified by the first method. Look closely at the actual check that is made. Notice that a slot is chosen not only if it is empty (_values[i]== null), but also if it already contains the value (value.equals(_values[i])). There is little point in allowing the same value to be stored twice, as the second occurrence will likely never be found:

private int indexFor(Object value, int start, int end) { assert value != null : “value can’t be null”;

for (int i = start; i < end; ++i) {

if (_values[i] == null || value.equals(_values[i])) { return i;

}

}

return -1;

}

Implementing the add() method is made very simple: It first ensures that there is room for another value before storing it at the appropriate position:

public void add(Object value) { ensureCapacityForOneMore(); _values[indexFor(value)] = value;

}

The two indexOf() methods work together with the two indexFor() methods to find a slot into which a new value can be placed.

279

Chapter 11

The first method coordinates the search, beginning with the position calculated by startIndexFor(), and, if necessary, another search is attempted in the lower portion of the array. If a matching value is found, then its position is returned; otherwise, a value of -1 is used to indicate that no such value exists:

private int indexOf(Object value) {

int start = startingIndexFor(value);

int index = indexOf(value, start, _values.length); if (index == -1) {

index = indexOf(value, 0, start);

}

return index;

}

The second method performs a brute-force search through the array — constrained by the specified start and end positions — in search of the value:

private int indexOf(Object value, int start, int end) { assert value != null : “value can’t be null”;

for (int i = start; i < end; ++i) { if (value.equals(_values[i])) {

return i;

}

}

return -1;

}

Once the index of a value can be found, implementing contains() is a one-liner:

public boolean contains(Object value) { return indexOf(value) != -1;

}

The only other method required by the Hashtable interface is size(), which simply iterates over the array, incrementing the size each time a value is found. (As an exercise, you could try tracking the size instead of calculating it.)

public int size() { int size = 0;

for (int i = 0; i < _values.length; ++i) { if (_values[i] != null) {

++size;

}

}

return size;

}

280

Hashing

Bucketing

In the next Try It Out section, you develop a hash table that uses buckets to store values. As always, you start with the tests before moving on to the implementation.

Try It Out

Testing and Implementing a Hash Table That Uses Bucketing

Start by creating a test class as follows:

package com.wrox.algorithms.hashing;

public class BucketingHashtableTest extends AbstractHashtableTestCase { protected Hashtable createTable(int capacity) {

return new BucketingHashtable(capacity, 0.75f);

}

}

Now add the implementation class:

package com.wrox.algorithms.hashing;

import com.wrox.algorithms.iteration.Iterator; import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List;

public class BucketingHashtable implements Hashtable { private final float _loadFactor;

private List[] _buckets; private int _size;

public BucketingHashtable(int initialCapacity, float loadFactor) { assert initialCapacity > 0 : “initialCapacity can’t be < 1”; assert loadFactor > 0 : “loadFactor can’t be <= 0”;

_loadFactor = loadFactor;

_buckets = new List[initialCapacity];

}

public void add(Object value) { List bucket = bucketFor(value);

if (!bucket.contains(value)) { bucket.add(value); ++_size;

maintainLoad();

}

}

public boolean contains(Object value) {

List bucket = _buckets[bucketIndexFor(value)]; return bucket != null && bucket.contains(value);

}

281

Chapter 11

public int size() { return _size;

}

private List bucketFor(Object value) {

int bucketIndex = bucketIndexFor(value);

List bucket = _buckets[bucketIndex]; if (bucket == null) {

bucket = new LinkedList(); _buckets[bucketIndex] = bucket;

}

return bucket;

}

private int bucketIndexFor(Object value) { assert value != null : “value can’t be null”;

return Math.abs(value.hashCode() % _buckets.length);

}

private void maintainLoad() { if (loadFactorExceeded()) {

resize();

}

}

private boolean loadFactorExceeded() {

return size() > _buckets.length * _loadFactor;

}

private void resize() { BucketingHashtable copy =

new BucketingHashtable(_buckets.length * 2, _loadFactor);

for (int i = 0; i < _buckets.length; ++i) { if (_buckets[i] != null) {

copy.addAll(_buckets[i].iterator());

}

}

_buckets = copy._buckets;

}

private void addAll(Iterator values) {

assert values != null : “values can’t be null”;

for (values.first(); !values.isDone(); values.next()) { add(values.current());

}

}

}

282

Hashing

How It Works

Once again, you re-used the tests defined in AbstractHashtableTestCase, this time implementing createTable() to return an instance of a BucketingHashtable. Notice the extra constructor parameter — 0.75f. This is the load factor, which in this case specifies that the hash table should increase in size anytime the number of values stored reaches 75% of the number of available buckets:

package com.wrox.algorithms.hashing;

public class BucketingHashtableTest extends AbstractHashtableTestCase { protected Hashtable createTable(int capacity) {

return new BucketingHashtable(capacity, 0.75f);

}

}

Bucketing is a little more complex than linear probing, so the implementation class requires a little more explanation.

The BucketingHashtable class records the load factor, for later use, and an array of buckets. You may have noticed in the “Understanding Hashing” section that the buckets looked a lot like linked lists, and that’s exactly what you’ve used for your buckets here. The number of buckets to use — initialCapacity — is specified at construction time along with the desired load factor:

package com.wrox.algorithms.hashing;

import com.wrox.algorithms.iteration.Iterator; import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List;

public class BucketingHashtable implements Hashtable { private final float _loadFactor;

private List[] _buckets;

public BucketingHashtable(int initialCapacity, float loadFactor) { assert initialCapacity > 0 : “initialCapacity can’t be < 1”; assert loadFactor > 0 : “loadFactor can’t be <= 0”;

_loadFactor = loadFactor;

_buckets = new List[initialCapacity];

}

...

}

The method maintainLoad() simply checks the current load. If the desired load has been exceeded, then a resize is necessary to spread the values over a larger number of buckets. The resize() method works in a similar way to the method of the same name in LinearProbingHashtable: A new hash table is created into which all the values are added, and then the new bucket array is used to replace the existing one. Each time a resize is performed, the capacity doubles. You could choose any value for this, but it always comes down to a trade-off between space and time. The smaller the increment, the more often a resize will be required; the larger the increment, the more wasted space.

283