Remove Sub-Folders from the Filesystem

Updated on 10 July, 2025
Remove Sub-Folders from the Filesystem header image

Problem Statement

In this task, we are provided with a list of folder paths and need to filter out any folders that are nested within another folder from the list. A nested folder or sub-folder is identified by its path: if one folder's path is a prefix of another and is followed directly by a '/', it is considered a sub-folder. Our goal is to return a list of main folders with all their sub-folders removed. The paths are structured with a leading '/', and only contain lowercase letters. Each path in the list is distinct, ensuring no duplicate entries.

Examples

Example 1

Input:

folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]

Output:

["/a","/c/d","/c/f"]

Explanation:

Folders "/a/b" is a subfolder of "/a" and "/c/d/e" is inside of folder "/c/d" in our filesystem.

Example 2

Input:

folder = ["/a","/a/b/c","/a/b/d"]

Output:

["/a"]

Explanation:

Folders "/a/b/c" and "/a/b/d" will be removed because they are subfolders of "/a".

Example 3

Input:

folder = ["/a/b/c","/a/b/ca","/a/b/d"]

Output:

["/a/b/c","/a/b/ca","/a/b/d"]

Constraints

  • 1 <= folder.length <= 4 * 104
  • 2 <= folder[i].length <= 100
  • folder[i] contains only lowercase letters and '/'.
  • folder[i] always starts with the character '/'.
  • Each folder name is unique.

Approach and Intuition

To solve the problem of removing sub-folders from the folder list effectively, we can employ the following steps:

  1. Firs, sort the list of folder paths. This allows sub-folders to appear immediately after their parent folders, making it easier to identify when a folder is a sub-folder of the previous one in the sorted list.

  2. Initialize an empty list to collect the result. As we iterate through the sorted list, we check each folder against the path of the last added folder to our result list.

  3. For each folder in the sorted list, check if it begins with the path of the last folder added to the result list followed immediately by a '/'. If it does, it is a sub-folder and should be skipped.

  4. If it's not a sub-folder, add this folder path to our result list.

  5. This method, combined with the sorting step, ensures that we efficiently check for and exclude sub-folders, resulting in a list that contains only the top-level folders.

By implementing these steps, we efficiently filter out all sub-folders, ensuring each folder in the output list is not contained within any other folder from the input list.

Solutions

  • C++
cpp
class FileSystem {
private:
    struct Node {
        bool endOfPath;
        unordered_map<string, Node*> next;
        Node() : endOfPath(false) {}
    };
    
    Node* rootDirectory;
    
    void freeMemory(Node* node) {
        if (node == nullptr) return;
        for (auto& child : node->next) {
            freeMemory(child.second);
        }
        delete node;
    }
    
public:
    FileSystem() : rootDirectory(new Node()) {}
    
    ~FileSystem() { freeMemory(rootDirectory); }
    
    vector<string> filterSubfolders(vector<string>& folders) {
        for (string& path : folders) {
            Node* current = rootDirectory;
            istringstream stream(path);
            string part;
    
            while (getline(stream, part, '/')) {
                if (part.empty()) continue;
                if (current->next.find(part) == current->next.end()) {
                    current->next[part] = new Node();
                }
                current = current->next[part];
            }
            current->endOfPath = true;
        }
    
        vector<string> output;
        for (string& path : folders) {
            Node* current = rootDirectory;
            istringstream stream(path);
            string part;
            bool isNested = false;
    
            while (getline(stream, part, '/')) {
                if (part.empty()) continue;
                Node* next = current->next[part];
                if (next->endOfPath && stream.rdbuf()->in_avail() != 0) {
                    isNested = true;
                    break;
                }
                current = next;
            }
            if (!isNested) output.push_back(path);
        }
    
        return output;
    }
};

This solution outlines an approach to the problem of filtering out sub-folders from a list of filesystem paths using C++. In this implementation:

  1. A FileSystem class is introduced, encapsulating a trie structure to manage the hierarchy of folder paths efficiently.

  2. Inside this class, a nested Node structure represents each folder in the filesystem. Each Node contains a boolean endOfPath to mark the termination of a path, and an unordered map next that maps folder names to their corresponding child Nodes.

  3. The constructor FileSystem() initializes the root node of the filesystem.

  4. The destructor ~FileSystem() ensures all dynamically allocated nodes are properly deleted to prevent memory leaks.

  5. The primary functionality resides in filterSubfolders, which accepts a vector of strings representing folder paths and returns a vector of strings containing the filtered list of paths.

  6. During the execution:

    • Each folder path is inserted into the trie. Each component of the path (split by '/') is checked and inserted into the trie if it does not already exist.
    • After inserting all paths, the method processes each path again to determine if it is a sub-folder. This is done by traversing the trie according to the parts of the path and checking if a terminal node (endOfPath = true) is encountered before the actual end of the current path string, indicating a sub-folder.
  7. The approach embodies efficiency in both time and space by leveraging the trie structure, which allows fast inserts and lookups. The final result is a list of path strings that represent non-nested directories. The overall complexity primarily depends on the number of paths and the depth of the directory structure.

By utilizing this approach, you achieve a structured and scalable way to manage and filter folder paths, suitable for applications involving complex filesystem operations.

  • Java
java
class FileManager {
    
    static class Node {
    
        boolean isEof;
        HashMap<String, Node> subnodes;
    
        Node() {
            this.isEof = false;
            this.subnodes = new HashMap<>();
        }
    }
    
    Node rootNode;
    
    FileManager() {
        this.rootNode = new Node();
    }
    
    public List<String> filterSubfolders(String[] folders) {
        // Construct Trie with folder paths
        for (String directory : folders) {
            Node current = rootNode;
            String[] parts = directory.split("/");
    
            for (String part : parts) {
                // Ignore empty names
                if (part.isEmpty()) continue;
                // Create a new node if absent
                current.subnodes.computeIfAbsent(part, k -> new Node());
                current = current.subnodes.get(part);
            }
            // Flag as complete folder path
            current.isEof = true;
        }
    
        // Filter out subfolders
        List<String> output = new ArrayList<>();
        for (String directory : folders) {
            Node current = rootNode;
            String[] parts = directory.split("/");
            boolean skip = false;
    
            for (int j = 0; j < parts.length; j++) {
                // Ignore empty directory names
                if (parts[j].isEmpty()) continue;
    
                Node next = current.subnodes.get(parts[j]);
                // Determine if current path is within a marked folder
                if (next.isEof && j != parts.length - 1) {
                    skip = true;
                    break; // Immediate cessation on subfolder discovery
                }
    
                current = next;
            }
            // Include if it's not a subfolder
            if (!skip) output.add(directory);
        }
    
        return output;
    }
}

In the provided solution, the given Java code defines how to manage a file directory system and filter out sub-folders from a set of directory paths. By utilizing a Trie (prefix tree) data structure for efficient folder path management, the approach involves two main processes:

  • Constructing the Trie: The filterSubfolders method begins by constructing a Trie where each unique path from the directory list gets its own node sequence. The path strings are split into parts (using "/" as the delimiter), and each part is inserted into the Trie. Empty names are ignored during the splitting process. As each complete path is stored, the last node in its sequence is marked with isEof = true, indicating the end of a folder path.

  • Filtering Subfolders: Once the Trie is built, the code then re-checks each directory path, navigating through the Trie structure. For each part of a directory split, the algorithm checks if the node at that part has been finished (isEof = true). If it finds isEof set to true before reaching the end of the parts (not the last path part), it marks the path as a subfolder (skips this path) because a complete path exists before this directory's end, indicating it's a subfolder of some already recognized path. Only paths that aren't subfolders are added to the final output list.

This method efficiently identifies and excludes sub-directories from the output using a Trie to navigate and verify paths effectively—one node per directory part path, ensuring streamlined identification and exclusion based on predefined complete path nodes (isEof property). The specific implementation ensures that full paths are identified from ground up, and intermediately complete paths denote the beginning of a valid, non-subdirectory path. This solution not only allows precise directory management but also provides efficient querying capabilities for directory validation.

  • Python
python
class FileStructure:
        
    class Node:
        def __init__(self):
            self.terminal = False
            self.child = {}
        
    def __init__(self):
        self.base = self.Node()
        
    def deleteSubfolders(self, paths):
        # Construct a trie based on the directory paths
        for p in paths:
            focus = self.base
            parts = p.split("/")
                
            for dir_name in parts:
                if dir_name == "":
                    continue
                    
                # Create a new node if it doesn't exist
                if dir_name not in focus.child:
                    focus.child[dir_name] = self.Node()
                focus = focus.child[dir_name]
                
            # Set the end of a complete path
            focus.terminal = True
            
        # Determine valid folder paths
        folders = []
        for p in paths:
            focus = self.base
            parts = p.split("/")
            subfolder = False
                
            for i, dir_name in enumerate(parts):
                if dir_name == "":
                    continue
                current = focus.child[dir_name]
                # Check pre-existed terminal nodes within the path
                if current.terminal and i != len(parts) - 1:
                    subfolder = True
                    break  # Existing sub-directory structure detected
                focus = current
                
            # Add if it is not a subfolder
            if not subfolder:
                folders.append(p)
            
        return folders

The provided solution presents a method to remove subfolder paths from a list of directory paths using a Trie data structure in Python. This method effectively filters out redundant paths that are subfolders of other paths in the list. Below is a step-by-step explanation of how this is achieved:

  1. Initialize the Trie: Start by defining a Trie node class within your FileStructure class. Each node contains a dictionary to store child nodes (child) and a boolean to indicate the termination of a path (terminal).

  2. Build the Trie: Iterate over each path in the provided list. Split the path into its constituent directories and insert each directory into the Trie, creating new nodes as necessary. Mark the end of each complete path by setting terminal to True on the last node of every path.

  3. Filter Paths: Iterate over the paths again and for each directory in the path, check if a pre-existing node in the Trie is marked as terminal before reaching the final directory of the current path. This indicates that the current path is a subfolder. If a path does not meet this criterion, add it to the result list.

  4. Output the Result: After all paths are checked against the Trie, the remaining paths in your result list are those which are not subfolders. These paths are therefore not contained within any other path in the list.

This approach avoids the need for multiple nested loops or complex string operations, making the solution efficient especially for large sets of paths.

Comments

No comments yet.