H-Index

Updated on 02 June, 2025
H-Index header image

Problem Statement

The h-index is an academic metric that measures the productivity and citation impact of the publications of a researcher. The h-index is defined as the maximum value ( h ), such that a researcher has published at least ( h ) papers that have each received ( h ) or more citations. In other words, if a researcher's h-index is ( h ), they have ( h ) papers which have been cited at least ( h ) times.

In this problem, you are provided an array citations where citations[i] represents the number of citations that the ( i )th paper has received. Your task is to determine the researcher's h-index from this data.

Examples

Example 1

Input:

citations = [3,0,6,1,5]

Output:

3

Explanation:

[3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively.
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.

Example 2

Input:

citations = [1,3,1]

Output:

1

Constraints

  • n == citations.length
  • 1 <= n <= 5000
  • 0 <= citations[i] <= 1000

Approach and Intuition

To solve this problem, understanding exactly how the h-index is calculated with respect to a list of citation counts is key. Let's break down the steps:

  1. Sorting the Citations:
    Start by sorting the citations array in descending order. This arrangement ensures that higher citation counts are considered first for higher possible values of ( h ).

  2. Calculating the h-Index:
    Iterate through the sorted list, checking at each point whether the count of papers (indexed from 1 in a 0-based index system) is at least as great as the number of citations at the current index. This is crucial because, for any position ( i ) in the sorted list, if ( i+1 ) (which represents the number of papers in a 1-based index system) is greater than or equal to citations[i], the h-index is determined based on this count.

  3. Edge Cases and Complexity:
    Consider edge cases where all papers have zero citations, or all papers have a number of citations higher than the number of papers. The algorithm should still efficiently produce the correct h-index.

  • For example, in the citation array [3,0,6,1,5]:
    • Sorted, this becomes [6,5,3,1,0]
    • Here, the algorithm will confirm that the h-index 3 is correct because there are at least 3 papers that have received 3 or more citations (6, 5, and 3).

This approach runs efficiently within the given constraints, operating primarily with a time complexity of ( O(n \log n) ) due to the sorting step. The subsequent iteration to determine the h-index is linear, ( O(n) ).

Solutions

  • Java
java
public class Solution {
    public int calculateHIndex(int[] refs) {
        int len = refs.length;
        int[] counts = new int[len + 1];
        for (int cite : refs)
            counts[Math.min(len, cite)]++;
        int index = len;
        for (int accumulated = counts[len]; index > accumulated; accumulated += counts[index])
            index--;
        return index;
    }
}

The provided Java solution calculates the H-Index, which is a metric used to measure the impact and relevance of a researcher's publications. The H-Index quantifies both the number of publications and the number of citations per publication.

  • The calculateHIndex method takes an array of integers refs, representing the number of citations each publication has received.
  • An auxiliary array counts of size len + 1 (where len is the number of publications), increments counts for each citation number up to the maximum len to handle cases where citations exceed the number of publications.
  • Iterate through each citation in refs and increment the counts array using Math.min(len, cite) to ensure the index doesn't exceed the length of the counts array.
  • Calculate the H-Index by checking from the highest possible index (len) downwards, accumulating counts from the counts array until the index is less than or equal to the number of accumulated values.
  • The final index value, when the loop ceases, represents the H-Index of the researcher, defined as the maximum number of papers that have received at least that number of citations.

This method efficiently computes the H-Index in a time complexity linearly dependent on the number of publications, utilizing an array-based counting approach. The solution's auxiliary space is also linearly dependent on the number of publications.

Comments

No comments yet.