Design HashSet

Updated on 22 May, 2025
Design HashSet header image

Problem Statement

The task is to design a custom HashSet class named MyHashSet, simulating the basic functionality of a typical HashSet in programming. The operations that this class needs to support are adding an element, checking the existence of an element, and removing an element. This implementation should avoid using any built-in hash table libraries. The steps to follow for implementing each functionality are:

  • void add(key): This method should insert the integer key into the HashSet.
  • bool contains(key): This method should return a boolean indicating whether the integer key is present in the HashSet.
  • void remove(key): This method should remove the integer key from the HashSet if it exists; if not, there should be no change and no error.

The HashSet should handle a reasonably high number of operations efficiently, given the constraints.

Examples

Example 1

Input:

["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]

Output:

[null, null, null, true, false, null, true, null, false]

Explanation:

MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1); // set = [1]
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(1); // return True
myHashSet.contains(3); // return False, (not found)
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(2); // return True
myHashSet.remove(2); // set = [1]
myHashSet.contains(2); // return False, (already removed)

Constraints

  • 0 <= key <= 106
  • At most 104 calls will be made to add, remove, and contains.

Approach and Intuition

Given the constraints and methods defined in the problem statement, an effective approach to implementing the MyHashSet class involves the following key points:

  1. Data Structure Choice:

    • Considering the absence of built-in hash libraries, a simplistic primary data structure could be an array (list in Python, vector in C++ or ArrayList in Java). However, for direct access considering the range of key values 0 to 10^6, a boolean array would be appropriate where the index of the array directly represents the key.
  2. Initialization:

    • Initialize a boolean array large enough to accommodate the maximum key value plus one (to handle the zero index). Initially, all values in this array should be set to False indicating none of the keys are present in the set initially.
  3. Implementing add(key):

    • For adding a key, simply set the boolean value at the index representing the key to True.
  4. Implementing contains(key):

    • To check if a key exists, return the boolean value at the index representing the key. If it's True, that means the key exists.
  5. Implementing remove(key):

    • To remove a key, set the boolean value at the index representing the key to False.

In this approach, each operation (add, check, or remove) is achieved in constant time O(1), which is efficient given the maximum operations cap of 10^4. This efficiency is crucial as it ensures that the solution can handle the upper limits of operation calls effectively within reasonable time bounds.

Solutions

  • Java
java
class CustomHashSet {
  private Bucket[] buckets;
  private int range;

  public CustomHashSet() {
    this.range = 769;
    this.buckets = new Bucket[this.range];
    for (int i = 0; i < this.range; i++)
      this.buckets[i] = new Bucket();
  }

  protected int hashFunction(int key) {
    return (key % this.range);
  }

  public void add(int key) {
    int idx = this.hashFunction(key);
    this.buckets[idx].addKey(key);
  }

  public void remove(int key) {
    int idx = this.hashFunction(key);
    this.buckets[idx].removeKey(key);
  }

  public boolean contains(int key) {
    int idx = this.hashFunction(key);
    return this.buckets[idx].containsKey(key);
  }
}

class Bucket {
  private BinarySearchTree bst;

  public Bucket() {
    bst = new BinarySearchTree();
  }

  public void addKey(int key) {
    this.bst.root = this.bst.insert(this.bst.root, key);
  }

  public void removeKey(int key) {
    this.bst.root = this.bst.delete(this.bst.root, key);
  }

  public boolean containsKey(int key) {
    TreeNode node = this.bst.search(this.bst.root, key);
    return (node != null);
  }
}

class TreeNode {
  int value;
  TreeNode left;
  TreeNode right;

  TreeNode(int x) {
    value = x;
  }
}

class BinarySearchTree {
  TreeNode root = null;

  public TreeNode search(TreeNode root, int value) {
    if (root == null || value == root.value)
      return root;
    return value < root.value ? search(root.left, value) : search(root.right, value);
  }

  public TreeNode insert(TreeNode root, int value) {
    if (root == null)
      return new TreeNode(value);

    if (value > root.value)
      root.right = insert(root.right, value);
    else if (value == root.value)
      return root;
    else
      root.left = insert(root.left, value);
    return root;
  }

  public TreeNode delete(TreeNode root, int key) {
    if (root == null)
      return null;

    if (key > root.value)
      root.right = delete(root.right, key);
    else if (key < root.value)
      root.left = delete(root.left, key);
    else {
      if (root.left == null && root.right == null)
        root = null;
      else if (root.right != null) {
        root.value = successor(root);
        root.right = delete(root.right, root.value);
      } else {
        root.value = predecessor(root);
        root.left = delete(root.left, root.value);
      }
    }
    return root;
  }
  
  public int successor(TreeNode root) {
    root = root.right;
    while (root.left != null)
      root = root.left;
    return root.value;
  }

  public int predecessor(TreeNode root) {
    root = root.left;
    while (root.right != null)
      root = root.right;
    return root.value;
  }
}

This Java solution implements a custom HashSet using a combination of an array and binary search trees. It's designed to efficiently perform core set operations such as adding, removing, and checking the existence of elements. Here’s a breakdown of key components and their functions:

  • CustomHashSet Class: This class initializes an array of Bucket objects. The size is set to a prime number, 769, to reduce the chances of collisions in a hash table. The hash function employed uses modulo operation to determine the index for each key.

  • Bucket Class: Each bucket contains a binary search tree to store the keys that hash to the same index. This helps manage collisions efficiently by using tree operations to store and retrieve keys.

  • BinarySearchTree and TreeNode Classes: These classes manage the binary search tree operations within each bucket. The TreeNode class represents each node of the tree, and the BinarySearchTree class includes methods to insert, delete, and search for nodes.

The operations defined include:

  • add(int key): Computes the bucket index for the key using the hash function and adds the key to the appropriate bucket.
  • remove(int key): Identifies the correct bucket and removes the key from it.
  • contains(int key): Checks whether a specific key exists in the set by searching for it in the corresponding bucket.

Efficiency considerations:

  • Using a combination of hashing and binary search trees allows the set operations to handle collisions more gracefully and maintain efficiency in terms of time complexity, especially for unbalanced data.

Use cases:

  • This implementation can be particularly useful in scenarios where simple array or linked-list based approaches to hash sets may lead to high collision rates, affecting the performance of basic operations like add, remove, and contains.

Comments

No comments yet.