C++ cstring strtok() - Tokenize String

Updated on September 27, 2024

Introduction

The strtok() function in C++ breaks a string into a sequence of tokens based on a set of delimiter characters. It is declared in the cstring header and is useful when parsing text, such as lines read from a file or user input. Note that strtok() modifies the string it tokenizes: it overwrites the delimiter that ends each token with a null character and remembers its position between calls.

In this article, you will learn how to use the strtok() function to tokenize strings in C++. The examples tokenize strings with several kinds of delimiters, including punctuation, whitespace, and symbols.

Tokenizing a String Using strtok()

Basic String Tokenization

  1. Include the cstring header in your C++ program.

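  2. Define a modifiable character array for the string to tokenize and another for the delimiters. Do not pass a string literal as the string to tokenize, because strtok() writes to it.

  3. Call strtok() with the string to find the first token, then call it in a loop with nullptr as the first argument to find subsequent tokens.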

    cpp
    #include <iostream>
    #include <cstring>
    
    int main() {
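        char str[] = "Sample text, with: several; delimiters!";  // writable: strtok() modifies it
        char delim[] = " ,:;!";
        char *token = std::strtok(str, delim);  // first call: pass the string
    
        while (token != nullptr) {
            std::cout << token << std::endl;
            token = std::strtok(nullptr, delim);  // later calls: pass nullptr to continue
        }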
    
        return 0;
    }
    

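    This code splits the string at any of the listed delimiters: spaces, commas, colons, semicolons, and exclamation marks. Adjacent delimiters, such as the comma followed by a space, count as a single separator, so no empty tokens are produced. The tokens Sample, text, with, several, and delimiters are each printed on a new line.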
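
Because strtok() overwrites delimiters with null characters, the array looks different after tokenization. The following is a minimal sketch of that effect; `count_tokens` is a hypothetical helper written for this illustration, not a standard function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns how many tokens std::strtok finds in `text`.
// `text` must be a writable, null-terminated buffer; it is modified in place.
std::size_t count_tokens(char* text, const char* delims) {
    std::size_t count = 0;
    for (char* token = std::strtok(text, delims);  // first call: pass the buffer
         token != nullptr;
         token = std::strtok(nullptr, delims)) {   // later calls: pass nullptr
        ++count;
    }
    return count;
}
```

Calling `count_tokens` on the array from the example above returns 5. Afterwards the array reads as just "Sample" when treated as a C string, because the space after the first token has been replaced with a null character. If the original text is still needed, tokenize a copy instead.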

Handling Multiple Delimiters

  1. Prepare the string and the delimiter set as before, this time using whitespace characters as delimiters.

  2. Continue to use strtok() and loop to retrieve all tokens.

    cpp
    #include <iostream>
    #include <cstring>
    
    int main() {
        char str[] = "Example\nstring with different\tdelimiters";
        char delim[] = " \n\t";
        char *token = std::strtok(str, delim);  // first call: pass the string
    
        while (token != nullptr) {
            std::cout << token << std::endl;
            token = std::strtok(nullptr, delim);  // later calls: pass nullptr to continue
        }
    
        return 0;
    }
    

    This example tokenizes a string that uses spaces, tabs, and newlines as delimiters. The tokens Example, string, with, different, and delimiters are each printed on a new line.
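
When the input arrives as a std::string, for example a line read from a file, it cannot be passed to strtok() directly without risking changes to the original. One possible approach, sketched below, copies the text into a scratch buffer and collects the tokens in a vector. The name `tokenize_copy` is an assumption made for this illustration.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `text`, so the caller's
// string is left untouched.
std::vector<std::string> tokenize_copy(const std::string& text,
                                       const char* delims) {
    // std::strtok writes '\0' into its input, so work on a mutable copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // std::strtok needs a null-terminated string

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delims);
         token != nullptr;
         token = std::strtok(nullptr, delims)) {
        tokens.emplace_back(token);  // copy each token out of the buffer
    }
    return tokens;
}
```

Calling `tokenize_copy("Example\nstring with different\tdelimiters", " \n\t")` yields the same five tokens as the example above. The extra copy costs memory, but the tokens stay valid after the buffer is gone.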

Handling Symbol Delimiters

Working with Special Characters

  1. Define the string and delimiters, including symbols such as '|' and '@'.

  2. Tokenize using strtok() with the same loop structure.

    cpp
    #include <iostream>
    #include <cstring>
    
    int main() {
        char str[] = "one,two;three|four@five";
        char delim[] = ",;|@";
        char *token = std::strtok(str, delim);  // first call: pass the string
    
        while (token != nullptr) {
            std::cout << token << std::endl;
            token = std::strtok(nullptr, delim);  // later calls: pass nullptr to continue
        }
    
        return 0;
    }
    

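    In this instance, the delimiters are common symbols: commas, semicolons, pipes, and the '@' symbol. All of them are ordinary single-byte characters, and strtok() splits the string at each one, producing the tokens one, two, three, four, and five. Note that strtok() compares individual char values, so a multi-byte delimiter (for example, a UTF-8 encoded character) is treated as a set of separate bytes rather than as one character.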
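
One consequence of how strtok() works is that adjacent delimiters never produce an empty token, so "one,,three|four" yields only three tokens. If empty fields matter, a different technique is needed. The sketch below does not use strtok() at all. It relies on std::string::find_first_of instead, and `split_keep_empty` is a hypothetical name chosen for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a strtok() wrapper: splits `text` at every
// character found in `delims` and keeps empty fields between
// adjacent delimiters.
std::vector<std::string> split_keep_empty(const std::string& text,
                                          const std::string& delims) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t end = text.find_first_of(delims, start);
        if (end == std::string::npos) {
            fields.push_back(text.substr(start));  // last field (may be empty)
            break;
        }
        fields.push_back(text.substr(start, end - start));
        start = end + 1;  // continue just past the delimiter
    }
    return fields;
}
```

With this helper, `split_keep_empty("one,,three|four", ",|")` returns four fields, the second of which is empty. It also leaves the input string unmodified.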

Conclusion

The strtok() function in C++ splits a C-style string into tokens based on a set of delimiter characters, which is useful in applications that parse files or process user commands. Keep in mind that it modifies the string it tokenizes, treats adjacent delimiters as a single separator, and matches delimiters one char at a time. With those points in mind, the same loop handles punctuation, whitespace, and symbol delimiters alike.