Dot Product of Two Sparse Vectors

Updated on 26 May, 2025
Dot Product of Two Sparse Vectors header image

Problem Statement

The problem involves computing the dot product of two sparse vectors. A sparse vector is characterized by having most of its elements as zero, making its memory-efficient storage crucial. To address this, we define a class SparseVector with methods:

  • SparseVector(nums): This initializes the SparseVector object with the vector nums.
  • dotProduct(vec): Calculates and returns the dot product with another vector vec.

The challenge lies in efficiently storing the sparse vector and performing the dot product operation, given that the majority of elements could be zeroes, which inherently do not impact the dot product calculation. An interesting side question proposed is the efficient handling of the dot product when only one vector in the operation is sparse.

Examples

Example 1

Input:

nums1 = [1,0,0,2,3], nums2 = [0,3,0,4,0]

Output:

8

Explanation:

v1 = SparseVector(nums1) , v2 = SparseVector(nums2)
v1.dotProduct(v2) = 1*0 + 0*3 + 0*0 + 2*4 + 3*0 = 8

Example 2

Input:

nums1 = [0,1,0,0,0], nums2 = [0,0,0,0,2]

Output:

0

Explanation:

v1 = SparseVector(nums1) , v2 = SparseVector(nums2)
v1.dotProduct(v2) = 0*0 + 1*0 + 0*0 + 0*0 + 0*2 = 0

Example 3

Input:

nums1 = [0,1,0,0,2,0,0], nums2 = [1,0,0,0,3,0,4]

Output:

6

Constraints

  • n == nums1.length == nums2.length
  • 1 <= n <= 10^5
  • 0 <= nums1[i], nums2[i] <= 100

Approach and Intuition

The key idea revolves around optimizing the storage and computation for vectors that predominantly consist of zeroes. We will utilize specific data structures to minimize memory overhead and computational complexity.

  1. Data Storage:

    • Instead of storing every element of the vector, store only the non-zero elements. A dictionary or hashmap can be employed here, where the key is the index of the non-zero element and the value is the element itself.
  2. Dot Product Calculation:

    • When computing the dot product, iterate through the non-zero elements of one vector, and for each element, check if the corresponding index in the other vector is also non-zero. If it is, multiply both values and add the result to a cumulative sum.
  3. Handling Non-Sparse Vectors:

    • If only one of the vectors is sparse, the above method remains efficient. The sparse vector is used to drive the computation, ensuring that we only consider elements for which there is a corresponding non-zero element in the non-sparse vector.
    • This significantly reduces the number of multiplications and additions needed compared to a naive approach that iterates through both vectors irrespective of their sparsity.

Following this method ensures that the computational complexity is proportional to the number of non-zero elements in the sparsest vector involved in the operation, rather than being dependent on the total number of elements in both vectors. This makes the solution particularly efficient for very large vectors where non-zero elements are few.

Solutions

  • Java
  • Python
java
class SparseVector {

    List<int[]> indexPairs;

    SparseVector(int[] nums) {
        indexPairs = new ArrayList<>();
        for (int index = 0; index < nums.length; index++) {
            if (nums[index] != 0) {
                indexPairs.add(new int[]{index, nums[index]});
            }
        }
    }

    public int dotProduct(SparseVector vec) {
        int sum = 0, ptr1 = 0, ptr2 = 0;
        while (ptr1 < indexPairs.size() && ptr2 < vec.indexPairs.size()) {
            if (indexPairs.get(ptr1)[0] == vec.indexPairs.get(ptr2)[0]) {
                sum += indexPairs.get(ptr1)[1] * vec.indexPairs.get(ptr2)[1];
                ptr1++;
                ptr2++;
            }
            else if (indexPairs.get(ptr1)[0] > vec.indexPairs.get(ptr2)[0]) {
                ptr2++;
            }
            else {
                ptr1++;
            }
        }
        return sum;
    }
}

The solution implements a class SparseVector in Java to calculate the dot product of two sparse vectors efficiently. Sparse vectors are those which predominantly contain zeroes. The class utilizes:

  • A constructor SparseVector(int[] nums) that identifies non-zero elements in the input array and stores them as pairs {index, value} in an ArrayList indexPairs. This condensation captures just the necessary data, saving space and potentially speeding up operations due to reduced overhead from zero values.

  • A method dotProduct(SparseVector vec), which computes the dot product between the sparse vector instance and another sparse vector vec. The method initializes a sum and uses two pointers ptr1 and ptr2 to traverse the indexPairs of the two vectors. By comparing the indices at current pointers:

    • If indices match, it multiplies the corresponding values and adds them to the sum, and then both pointers are moved forward.
    • If an index in the first vector is greater than in the second vector, the pointer for the second vector is incremented to catch up, and vice versa.

This approach optimizes the dot product computation by focusing only on the non-zero elements, making it particularly effective for vectors with large dimensions but few non-zero entries.

python
class SparseVector:
    def __init__(self, nums):
        self.non_zero_pairs = []
        for i, num in enumerate(nums):
            if num != 0:
                self.non_zero_pairs.append((i, num))

    def dotProduct(self, other):
        output = 0
        idx1, idx2 = 0, 0

        while idx1 < len(self.non_zero_pairs) and idx2 < len(other.non_zero_pairs):
            index1, value1 = self.non_zero_pairs[idx1]
            index2, value2 = other.non_zero_pairs[idx2]
            if index1 == index2:
                output += value1 * value2
                idx1 += 1
                idx2 += 1
            elif index1 < index2:
                idx1 += 1
            else:
                idx2 += 1

        return output

The given Python3 code implements a class SparseVector to handle the dot product of two sparse vectors efficiently. Each sparse vector is represented only by its non-zero elements to optimize space usage.

Here are the specific details:

  • The __init__ method initializes a SparseVector instance by storing each non-zero element along with its index in a list named non_zero_pairs. This list uses tuples to keep pairs where the first element is the index of the non-zero element and the second is the value itself.

  • The dotProduct method calculates the dot product of the instance with another SparseVector. This method iterates simultaneously through the non_zero_pairs lists of both vectors. Moreover, calculation proceeds when indices of the elements match, in which case products of the respective values are added to the output total. If indices don’t match, the index pointer of the smaller index moves to the next element. Continue this way until one of the vectors is fully traversed.

The approach maximizes efficiency by:

  • Reducing the number of operations by skipping zero elements.
  • Utilizing tuple pairs for easy access and clarity.

This solution is particularly effective for operations involving large vectors where non-zero entries are sparse (few and far between), minimizing memory usage and processing time.

Comments

No comments yet.