Arduino strtok

The Arduino strtok function splits a string into tokens. This means you can extract the data you want from a string and discard the rest. You call strtok repeatedly to obtain each token string in turn.

The prototype is:


   char *strtok(char *str, const char *delim);

On this page you can find:

  • Example code for using Arduino strtok that you can use immediately,
  • An explanation of how strtok works,
  • An explanation of the common mistakes (I made them too),
  • Code that shows the result of using strtok (the original string is destroyed).

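You give the Arduino strtok function a string and a delimiter (a constant string) that defines the separation characters in the string. For instance, you could have a line of CSV data such as:


   char mystring[] = "apples,pears,bananas";

Note that mystring is declared as a character array rather than as a pointer to a string literal: strtok writes into the string, and string literals must not be modified.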

The objective is to find the tokens (fruits), eliminate the delimiters (commas), and use the resulting tokens in your program. The Arduino strtok function lets you step through the fruits one at a time: each call returns a pointer to the next item as a properly formed (null-terminated) string.

The first call to strtok sets up the "tokeniser" and returns the first token. For subsequent calls, you set the first parameter to NULL, and the next token is returned each time until there are no more tokens. When the tokens are exhausted, strtok returns NULL (a null pointer, not an empty string).

Arduino Strtok Example

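void setup(void) {
   char *token;
   // A writable array: strtok modifies the string it is given.
   char mystring[] = "apples,pears,bananas";
   const char *delimiter = ",";

   Serial.begin(115200);

   // The first call passes the string; later calls pass NULL.
   token = strtok(mystring, delimiter);

   while (token != NULL) {
      Serial.println(token);
      token = strtok(NULL, delimiter);
   }

}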

void loop(void) {

}

In this case the output is:

apples
pears
bananas

The Arduino strtok function is part of the standard C library, and normally you have to include the following line to use it:

    #include <string.h>

...but the Arduino environment does this automatically for you in the background.
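Rather than printing each token straight away, a sketch will often want to keep the tokens for later use. The following is a minimal sketch of one way to do that, written as plain C++ so it can be compiled on a PC as well as in the Arduino environment. The helper name splitTokens and its parameters are made up for illustration; they are not part of any library.

```cpp
#include <string.h>

// Hypothetical helper (not a library function): splits `text` in place and
// stores up to `maxTokens` token pointers in `tokens`.
// Returns the number of tokens stored.
// `text` must be a writable char array, because strtok modifies it.
int splitTokens(char *text, const char *delimiters, char *tokens[], int maxTokens) {
  int count = 0;
  char *token = strtok(text, delimiters);   // first call: pass the string
  while (token != NULL && count < maxTokens) {
    tokens[count++] = token;                // keep the pointer, not a copy
    token = strtok(NULL, delimiters);       // later calls: pass NULL
  }
  return count;
}
```

In a sketch you might declare `char mystring[] = "apples,pears,bananas";` and `char *fruits[4];`, call `splitTokens(mystring, ",", fruits, 4)`, and then print `fruits[1]` to get pears. The stored pointers point into mystring, so they are only valid while that array exists.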

How Arduino strtok works

The first thing the Arduino strtok algorithm does is to skip all leading delim characters.


   char *strtok(char *str, const char *delim);

The Arduino strtok algorithm then scans through the string str looking for the next delim character. If any delim character is found, then the characters before it are returned as a token, i.e. as a string pointer.

The same process is followed for the next call to strtok, except that the str parameter is set to NULL so that the algorithm knows to obtain the next token, i.e. strtok keeps an internal pointer that remembers the position reached in the original string on the previous call.

How it works in more detail

At the start of the Arduino strtok algorithm, all leading delimiter characters are skipped, with the token pointer moved forwards on each skip. So the token pointer ends up pointing to the start of a non-delimiter string.

Then, when the next delimiter character is found, that character is overwritten with the null character '\0'. This forms a null-terminated string, which is why the function can return a 'string' as the token return value (a pointer into the original string).

On subsequent calls, scanning starts from a stored pointer that was set to just past the delimiter that ended the previous token, and the process is repeated, returning a correctly formed string again. This is repeated until the end of the original string is reached.
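The steps described above can be sketched in a few lines of code. This is an illustrative re-implementation only, assuming the behaviour described in this section; the name myStrtok and the variable savedPosition are made up here, and real library versions may differ in detail. It uses the standard functions strspn and strcspn to do the skipping and scanning.

```cpp
#include <string.h>

// Illustrative only: a simplified stand-in for strtok, not the library code.
static char *savedPosition = NULL;        // where the next call starts

char *myStrtok(char *str, const char *delim) {
  if (str == NULL) str = savedPosition;   // continue from the previous call
  if (str == NULL) return NULL;           // nothing left to scan

  str += strspn(str, delim);              // skip leading delimiter characters
  if (*str == '\0') {                     // only delimiters (or nothing) left
    savedPosition = NULL;
    return NULL;
  }

  char *tokenEnd = str + strcspn(str, delim);  // first delimiter after the token
  if (*tokenEnd == '\0') {
    savedPosition = NULL;                 // token ran to the end of the string
  } else {
    *tokenEnd = '\0';                     // overwrite the delimiter with '\0'
    savedPosition = tokenEnd + 1;         // next call starts just past it
  }
  return str;                             // pointer into the original string
}
```

The single static pointer is the reason only one string can be tokenized at a time with this approach.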

More Complex use of Arduino strtok

Since the delimiter string can contain more than one character, you can use the Arduino strtok function where several different separator characters appear in the string. However, the delimiter string should be kept the same between calls, meaning the input has to be structured in a consistent manner, i.e. delimiter characters can never form part of the tokens you are trying to extract.


   char mystring[] = ":,:::56:--:2.5:[:24";

Task: Extract the numbers from the 'oddly' delimited string.

Note: Keep the delimiter the same between separate calls to strtok.

You could write the following sketch:

void setup(void) {
   char *token;
   // A writable array: strtok modifies the string it is given.
   char mystring[] = ":,:::56:--:2.5:[:24";
   const char *delimiter = ":,-[";

   Serial.begin(115200);

   token = strtok(mystring, delimiter);

   while (token != NULL) {
      Serial.println(token);
      token = strtok(NULL, delimiter);
   }

}

void loop(void) {

}

In this case the output is:

56
2.5
24

If you don't see why this works, then check out the explanation further down this page.

If you need to split a string where you need to identify sequences of characters (that are delimiters), then use the strstr function.
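Returning to the numeric example: because each token is an ordinary null-terminated string, it can be passed directly to a conversion function such as atof. The following is a minimal sketch under that assumption; the helper name extractNumbers and its parameters are made up for illustration.

```cpp
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not a library function): tokenizes `text` in place and
// converts each token to a double with atof, storing up to `maxValues` results.
// Returns the number of values stored.
// Note: atof returns 0.0 for a token that does not start with a number.
int extractNumbers(char *text, const char *delimiters, double values[], int maxValues) {
  int count = 0;
  for (char *token = strtok(text, delimiters);
       token != NULL && count < maxValues;
       token = strtok(NULL, delimiters)) {
    values[count++] = atof(token);
  }
  return count;
}
```

Called with the oddly delimited string above (held in a writable array) and the delimiter ":,-[", this would store three values corresponding to the tokens 56, 2.5 and 24.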

Questions on how strtok works

Why does the delimiter not work?

If you thought that the delimiter parameter delim refers to a sequence of characters to be matched exactly, you'd be wrong. Each character in delim is treated as an individual delimiter. Here's an example to explain:

Let's say you have this string:


   char mystring[] = "apples::pears:grapes::bananas";

... and you want to separate out elements enclosed by two colons. You might write:

const char *delimiter ="::";

... expecting strtok to search for pairs of colons. In fact you get the following output:

Results from strtok
apples
pears
grapes
bananas

So what happened?

The function strtok does not work in that way, and unfortunately its operation is often not explained clearly. In strtok the order of characters in the delim string does not matter, and each character has equal precedence. Writing "::" just wastes time, as each character of the input string may be compared to the colon character twice.

In the Arduino strtok function the delimiter string is a set of characters: if a character in the input string matches any one of them, it is treated as a delimiter.

Note: Because leading delimiter characters are skipped, a returned token never contains a delimiter character, and a run of adjacent delimiters never produces an empty token.
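If you really do want to split only where two colons appear together, one option is to search for the exact sequence with strstr instead. The following is a minimal sketch of that idea, not a drop-in replacement for strtok; the helper name splitOnSequence and its parameters are made up for illustration.

```cpp
#include <string.h>

// Hypothetical helper (not a library function): splits `text` in place on the
// exact sequence `separator` (for example "::"), storing up to `maxParts`
// pointers in `parts`. Returns the number of parts stored.
// Unlike strtok, a single ':' stays inside its part and empty parts are kept.
// If `maxParts` is reached, the last part holds the unsplit remainder.
int splitOnSequence(char *text, const char *separator, char *parts[], int maxParts) {
  size_t separatorLength = strlen(separator);
  int count = 0;
  if (separatorLength == 0 || maxParts <= 0) return 0;

  char *start = text;
  while (count < maxParts - 1) {
    char *found = strstr(start, separator);  // next exact match, or NULL
    if (found == NULL) break;
    *found = '\0';                           // terminate the current part
    parts[count++] = start;
    start = found + separatorLength;         // continue after the separator
  }
  parts[count++] = start;                    // whatever remains
  return count;
}
```

With the string "apples::pears:grapes::bananas" and the separator "::", this yields three parts: apples, pears:grapes and bananas.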

Where is memory for the string?

One question you might have is why the returned tokens are not copied into new character strings, i.e. why don't you need to allocate more memory for each token found?

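The reason is that the original string storage area is trampled over by the function: each delimiter that ends a token is overwritten with '\0', and the returned pointers point into the original storage. So a side effect of using strtok is that the original string is destroyed!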

Warning: The original string is destroyed by Arduino strtok.
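If the original string is still needed afterwards, a common workaround is to tokenize a copy instead. The following is a minimal sketch of that idea; the helper name countTokensInCopy and its parameters are made up for illustration, and the caller is assumed to supply a work buffer that is large enough.

```cpp
#include <string.h>

// Hypothetical helper (not a library function): copies `source` into
// `workBuffer`, tokenizes the copy, and returns the number of tokens found.
// Returns -1 if the copy (including its terminating '\0') would not fit.
// `source` itself is never modified.
int countTokensInCopy(const char *source, const char *delimiters,
                      char *workBuffer, size_t bufferSize) {
  if (strlen(source) + 1 > bufferSize) return -1;  // copy would not fit
  strcpy(workBuffer, source);                      // strtok only sees the copy
  int count = 0;
  for (char *token = strtok(workBuffer, delimiters); token != NULL;
       token = strtok(NULL, delimiters)) {
    count++;
  }
  return count;
}
```

The cost is the extra RAM for the copy, which matters on small boards, so this is only worth doing when the untouched original is really required.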

Arduino strtok testing example

The following example code shows destruction of the original string by explicitly using array pointer positions (array positions counted manually) to print out each token.

void setup(void) {
   char *token;
   // A writable array: strtok modifies the string it is given.
   char mystring[] = "apples,pears,bananas";
   const char *delimiter = ",";
   char *p;

   Serial.begin(115200);

   Serial.println("Here is the original string (before strtok)");
   Serial.println(mystring);

   Serial.println("\nResults from strtok");

   token = strtok(mystring, delimiter);

   while (token != NULL) {
      Serial.println(token);
      token = strtok(NULL, delimiter);
   }

   Serial.println("\nResult of direct array address decoding");
   p = &mystring[0]; // apples
   Serial.println(p);

   p = &mystring[7]; // pears
   Serial.println(p);

   p = &mystring[13]; // bananas
   Serial.println(p);

   Serial.println("\nHere is the original string (truncated)");
   Serial.println(mystring);

}

void loop(void) {

}

In this case the output is:

Here is the original string (before strtok)
apples,pears,bananas

Results from strtok
apples
pears
bananas

Result of direct array address decoding
apples
pears
bananas

Here is the original string (truncated)
apples

The addresses of indices 0, 7 and 13 in the string are the same pointer addresses that the Arduino strtok function returns for each token.

Both the first and last print statements try to print out the same string 'mystring', but as you can see, strtok has altered the contents of the string as described above.

Can I change the delimiter values?

The algorithm lets you change the delimiter string between calls. You may think this is a good idea but in practice it is a waste of time. Consider the following code:

A working example

void setup(void) {
   char *token;
   // A writable array: strtok modifies the string it is given.
   char mystring[] = "apples,xpears,[email protected]";
   const char *delimiter = ",;";
   const char *newDelim = "[email protected]";
   int c = 1;

   Serial.begin(115200);

   token = strtok(mystring, delimiter);

   while (token != NULL) {
      Serial.println(token);
      if (c++ == 2) delimiter = newDelim;   // switch delimiter set after the second token
      token = strtok(NULL, delimiter);
   }
}

void loop(void) {

}


Results from strtok
apples
pears
grapes
bananas
coconuts

For this case, the operation does work, although it is easier to use a fixed delimiter as follows:

const char *delimiter =",;[email protected]";

It was pointless changing the delimiter because the new delimiter string covers all cases.

A flawed example

That was a trivial example. What about this string?

char mystring[] = "apples;@pears;@coconuts,p;bananas,p;grapes";

The required delimiters are ";@" followed by ",p;".

void setup(void) {
   char *token;
   // A writable array: strtok modifies the string it is given.
   char mystring[] = "apples;@pears;@coconuts,p;bananas,p;grapes";
   const char *delimiter = ";@";
   const char *newDelim = ",p;";
   int c = 1;

   Serial.begin(115200);

   token = strtok(mystring, delimiter);

   while (token != NULL) {
      Serial.println(token);
      if (c++ == 2) delimiter = newDelim;   // switch delimiter set after the second token
      token = strtok(NULL, delimiter);
   }
}

void loop(void) {

}

Results from strtok
apples
pears
@coconuts
bananas
gra
es

You can see that apples, pears and bananas are correctly parsed, but tokenising has failed for coconuts and grapes.

This is the key problem in changing the delimiter string. Once the 'pears' is found using the delimiter ";@", the delimiter is changed to ",p;".

char mystring[] = "apples;@pears;@coconuts,p;bananas,p;grapes";
                                ^ null character '\0' inserted here
                                 ^ pointer start for the next call

The first delimiter found after 'pears' is ';', which is overwritten with '\0'. The stored pointer is set to point to the following '@' character, and this is taken as the start position for the next call. If the delimiter had not been changed, the '@' would be skipped as a leading delimiter. Since the new delimiter set (",p;") does not include '@', this character is included in the next output token!

    @coconuts

The next problem is that a delimiter character occurs within a desired token, i.e. the letter 'p' is taken as a delimiter. So 'p' splits the desired token 'grapes' in two.

    gra
    es

The algorithm is working as it should!


TIP: Do not change the delimiter character set in the middle of tokenizing.

If you really want to do something clever to fix this action use strstr or write a recursive descent expression parser.

There are two things to remember:

  1. Don't include in the delimiter string any character that can appear in a token.
  2. Don't change the delimiter between calls.

Gotchas

If you have not used the function in a while and use the following program...

void setup(void) {
   char *token;
   char mystring[] = "apples,pears,bananas";

   Serial.begin(115200);

   // WRONG: ',' is a single character, not a string.
   token = strtok(mystring, ',');

   while (token != NULL) {
      Serial.println(token);
      token = strtok(NULL, ',');   // WRONG for the same reason
   }

}

void loop(void) {

}

...you may think you have written it correctly; after all, you only need a single-character delimiter because that is all you are trying to detect.

Note: The delimiter must be a character string - not a single character.

In this case the output is:

apples,pears,bananas

This is undefined behaviour, and depending on your board and compiler settings you may not get an error message. The delimiter must be a properly terminated string because it can contain more than one character.
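If the delimiter is only known as a single character at run time (for example, read from a setting), it first has to be wrapped in a terminated string. The following is a minimal sketch of that idea; the helper name countFields is made up for illustration.

```cpp
#include <string.h>

// Hypothetical helper (not a library function): tokenizes `text` in place
// using one delimiter character and returns the number of tokens found.
// The character is placed in a two-element array followed by '\0', so that
// strtok receives a properly terminated string.
int countFields(char *text, char delimiterChar) {
  char delimiters[2] = { delimiterChar, '\0' };
  int count = 0;
  for (char *token = strtok(text, delimiters); token != NULL;
       token = strtok(NULL, delimiters)) {
    count++;
  }
  return count;
}
```

For a fixed delimiter, simply writing "," with double quotes instead of ',' with single quotes is all that is needed.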

