More Hashing and Maps

More Hashing and Maps
> Click or hit Control-Enter to run the code above

Imagine I told you that there was a function with the following properties:

Determinism: it can convert an arbitrary amount of data into a single limited-size value. If we repeat the computation on the same data, we get the same value.
Uniformity: over many inputs, each output value is equally likely.
Efficiency: it is efficient to compute.

int[] numbers = new int[] { 5, 6, 7 };
System.out.println(numbers[0]);
numbers[1] = 8;

Arrays map an index (0, 1, 2, array.length to a value).
The value can be anything, but the indices had to be be integers.
No longer!

A Java Map allows us to use any object like an array index.

import java.util.Map;
import java.util.HashMap;

Map stringValues = new HashMap();
stringLengths.put("test", 5);
System.out.println(stringLengths.get("test")); // Prints 5
stringLengths.put("test", 7);
System.out.println(stringLengths.get("test")); // Prints 7

Maps are one of the two data structures you meet in heaven. (Along with lists.) Every language has them:

Python calls them dictionaries: dict["key"] = "value"
JavaScript calls them anonymous objects dict["key"] = "value"
C++ calls them maps: dict.insert(std::make_pair("key", "value"));
Go calls them maps: dict["key"] = "value"
Even Perl had them: $dict{'key'} = "value"!
Sometimes we call them key-value stores, since each key maps to a single value

> Click or hit Control-Enter to run the code above

Remember the functions defined by Object that are shared by every Java object?

toString: to display an object in human-readable format
equals: test whether an object equals another
hashCode: return a hash value for that object, which is used internally by Map implementations and other data structures

Like toString and equals, IntelliJ will offer to generate hashCode for you:

public class Player {
  // Other things omitted
  private int id;

  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + id;
    return result;
  }
}

Let’s say that I want to process a large corpus of text and then be able to quickly answer queries about how many times particular words appear.

> Click or hit Control-Enter to run Example.main above

In Java, certain data structures (Maps, ArrayLists, etc.) only operate on objects. (Because Object provides hashCode.)

But then how do we insert primitive types (ints, longs, etc.) into them?

Integer imAnObject = new Integer(5);
imAnObject = (Integer) 5; // You can cast primitives to object wrapper
int imNotAnObject = (int) imAnObject; // And back

Primitive Type Object Wrapper

Primitive Type	Object Wrapper
`int`	`Integer`
`long`	`Long`
`boolean`	`Boolean`
`char`	`Character`
`double`	`Double`
`byte`	`Byte`
`short`	`Short`
`float`	`Float`

int

Integer

long

Long

boolean

Boolean

char

Character

double

Double

byte

Byte

short

Short

float

Float

So how do we implement a Map?

Use a hashCode to retrieve a hash code for each object.
Use that value—or a smaller part of it—as an index into an array.
But what about collisions?

$hash table$

public class Item {
  public Object key;
  public Object value;
  public Item next;
  Item(Object setKey, Object setValue, Item setNext) {
    this.key = setKey;
    this.value = setValue;
    this.next = setNext;
  }
}
public class HashMap {
  private Item[] items = new Item[16];
  public int itemCount = 0;
  private int hash(Object key) {
    int hashValue = key.hashCode() % 16;
    if (hashValue < 0) {
      hashValue += 16;
    }
    return hashValue;
  }
  public void put(Object key, Object value) {
    int bucket = hash(key);
    Item current = items[bucket];
    for (; current != null; current = current.next) {
      if (current.key.equals(key)) {
        current.value = value;
        return;
      }
    }
    Item newItem = new Item(key, value, items[bucket]);
    items[bucket] = newItem;
    itemCount++;
    return;
  }
}
public class Example {
  public static void main(String[] unused) {
    HashMap ourHashMap = new HashMap();
    ourHashMap.put("test", "me");
    System.out.println(ourHashMap.itemCount);
    ourHashMap.put("test", "another");
    System.out.println(ourHashMap.itemCount);
  }
}

> Click or hit Control-Enter to run Example.main above

Let’s consider the performance of our simple HashMap in two cases. First, if the array is very small relative to the number of items:

put: O(n) with n being the number of items
get: O(n) with n being the number of items
At this point the HashMap is acting like a linked list.

Let’s consider the performance of our simple HashMap in two cases. Second, if the array is very large relative to the number of items:

put: O(1)
get: O(1)
At this point the HashMap is acting like an array.
What’s the problem? It requires a lot of space.

In reality we want our HashMap to blend the good features of an array and a linked list.

Usually implementations will enlarge the array part of a HashMap once it gets filled past a certain point (called the load factor).

This is cool stuff!

Hash functions already provide:

Determinism: it can convert an arbitrary amount of data into a single limited-size value. If we repeat the computation on the same data, we get the same value.
Uniformity: over many inputs, each output value is equally likely.
Efficiency: it is efficient to compute.

But what if there were hash functions with the following new properties:

Given the hash, it is infeasible to determine the original input
A small change to the input produces a large change in the output
The function is difficult to compute, not easy

A hash function that satisfies these properties is known as a cryptographic hash function, largely because they are ubiquitous in modern cryptography.

I need to be able to check your password, but I don’t want to save it. Is that possible?

Yes!
Save the cryptographic hash of your password, not the password itself.
When you submit your password, I hash it and compare it with the saved hash.
If someone steals my database, they can’t recover the original passwords.
- Given the hashes, you can’t recover the original passwords
- Hash values reveal nothing about how close you are to the actual password
- Hashing inputs to test them is expensive

$ $ cat example.txt
The quick brown fox jumps over the lazy dog
$ md5 < example.txt
37c4b87edffc5d198ff5a185cee7ee09
$ cat example.txt
The quick brown fox jumps over the lazy doG
$ md5 < example.txt
75559fc9857fe9bebf65f97760e3f67d

Heard of blockchain?

Blocks are linked through cryptographic hashes. And the blockchain is secured through an old idea called Hashcash.

Given a hash value of a given size, I can estimate how much work you’ll have to do to guess the input that produced that value
So I can force you to do that work and verify that you did it easily by hashing the value that you gave me

Alice: "Here’s a challenge for you Bob, find an input that produces hash 39c3aa4015e7964914c311915316a2f78157c946.
Bob: "Geez, that’s hard. Give me a few minutes… OK, got it."
Alice: "Wow, you’re right. I computed the hash and it’s 39c3aa4015e7964914c311915316a2f78157c946 that must have been hard."

Hashcash was intended to help fight spam. Now it’s the reason that Bitcoin mining is ruining the planet.

In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the function’s output for a random input.

The existence of such one-way functions is still an open conjecture. In fact, their existence would prove that the complexity classes P and NP are not equal, thus resolving the foremost unsolved question of theoretical computer science.

New exam practice problems are available.
MP7 (the final project) is out. Please get started!
The anonymous feedback form remains available on the course website. Use it to give us feedback!
My office hours continue today at 11AM in the lounge outside of Siebel 0226.

Let’s Imagine…

Remember Arrays?

Java Maps

A Map By Any Other Name

And Why Does This Work in Java?

Example `hashCode`

Map Usage Example

(An Annoying Aside on Java Primitive Object Wrappers)

Primitive Object Wrappers

(Exciting Stuff…)

Brief `Map` Implementation

`Map` As Array + Linked List

`HashMap` Performance

`HashMap` Performance

Realistic `HashMap` Performance

Looking forward to CS 225 yet?

And cooler still…

Cryptographic Hash Functions

A Simple Example

Small Input Change, Big Output Change

A Modern Example

Hashcash Example

Unintended Consequences

And, Beautiful Theory

Questions About Hashing?

Announcements

Let’s Imagine…​

Remember Arrays?

Java Maps

A Map By Any Other Name

And Why Does This Work in Java?

Example hashCode

Map Usage Example

(An Annoying Aside on Java Primitive Object Wrappers)

Primitive Object Wrappers

(Exciting Stuff…​)

Brief Map Implementation

Map As Array + Linked List

HashMap Performance

HashMap Performance

Realistic HashMap Performance

Looking forward to CS 225 yet?

And cooler still…​

Cryptographic Hash Functions

A Simple Example

Small Input Change, Big Output Change

A Modern Example

Hashcash Example

Unintended Consequences

And, Beautiful Theory

Questions About Hashing?

Announcements

Let’s Imagine…

Example `hashCode`

(Exciting Stuff…)

Brief `Map` Implementation

`Map` As Array + Linked List

`HashMap` Performance

`HashMap` Performance

Realistic `HashMap` Performance

And cooler still…