Arduino strtok: How to Easily Extract Tokens from a String
Learn how strtok works, with walk-through examples, and how
to use it to find multiple tokens. Find out why you can't change
delimiters halfway through and avoid the one problem you'll
probably fall for!
Arduino strtok: A token is a set of characters that you want to find
within a string, and this function gives you the means to find them.
How to get tokens out of a string.
Find out Exactly How strtok Works...
...shown with walk-through examples.
Find out why you can't change the delimiters half way through.
Learn the one gotcha that you'll probably fall for and avoid it.
Learn how the function returns multiple tokens without using more memory.
Suppose you send a string of data to your Arduino through the serial
interface, with each number separated by a comma. The question is: how do you easily retrieve each number and use it in your program?
On this page you can find:
Example code for using Arduino strtok that you can use immediately,
An explanation of how strtok works,
An explanation of how you got it wrong (I did),
Code that shows the result of using strtok (the original string is destroyed).
The strtok function returns tokens from a
string one after another until there are no more. This means you can
extract wanted data from a string and ignore unwanted data (separators).
Multiple calls are made to strtok to obtain
each token string in turn.
The prototype is: char *strtok(char *str, const char *delim);
You give the Arduino strtok function a string and a delimiter (a constant
string) that defines the separation characters in the string. For
instance you could have a CSV file such as:
The objective is to find the tokens (fruits), discard the
delimiters (commas), and use the resulting tokens in your program. The
Arduino strtok function lets you step through the fruits one at a time:
each call returns a pointer to the next item as a properly formed
(null-terminated) string.
The first call to strtok sets up the "tokeniser" and returns the
first token. For subsequent calls, you set the first parameter to NULL,
and each token string is returned in turn until there are no more
tokens. When the string is exhausted, strtok returns a NULL pointer.
The Arduino strtok function is part of the standard C library, and normally you have to include its header to use it: #include <string.h>
...but the Arduino environment does this automatically for you in the background.
How Arduino strtok works
The first thing the Arduino strtok algorithm does is to ignore all leading delim characters.
The Arduino strtok algorithm then scans through the string str
looking for the next delim character. If any delim character is found
then the characters before the delimiter are returned as a token, i.e. as
a string pointer.
The same process is followed for the next call to strtok except the
str string parameter is set to NULL so that the algorithm knows to
obtain the next token i.e. a pointer is set to remember the position in
the original string from the last call.
How it works in more detail
At the start of the Arduino strtok algorithm all delimiter characters
are skipped, with the token pointer moved forwards on each skip. So the
token pointer points to the start of a non-delimiter string.
Then, when the next delimiter character is found, that character is
overwritten with the null character ('\0'). This forms a
null-terminated string, which is why the
function can return a 'string' as the token return value (a pointer into the original string).
On the subsequent calls a stored pointer is updated to just past the
end of the first delimiter and the process is repeated; returning a
correctly formed string again. This is repeated until the end of the
original string is reached.
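To make these steps concrete, here is a simplified re-implementation. It is a sketch for illustration only: myStrtok is a hypothetical name, and the real library routine may differ in detail, but it follows the same skip, terminate and remember sequence described above.

```cpp
#include <cstring>   // strspn, strcspn

// Simplified re-implementation for illustration only.
// myStrtok is a hypothetical name, not a library function.
char *myStrtok(char *str, const char *delim) {
  static char *saved = NULL;          // remembers the position between calls
  if (str == NULL) str = saved;       // NULL means "carry on where we left off"
  if (str == NULL) return NULL;       // nothing to continue from

  str += strspn(str, delim);          // skip all leading delimiter characters
  if (*str == '\0') {                 // only delimiters (or nothing) were left
    saved = NULL;
    return NULL;
  }

  char *end = str + strcspn(str, delim);  // find the next delimiter character
  if (*end == '\0') {
    saved = NULL;                     // end of string: no tokens after this one
  } else {
    *end = '\0';                      // overwrite the delimiter to end the token
    saved = end + 1;                  // the next search starts just past it
  }
  return str;                         // pointer into the original string
}
```

The single static pointer is the only state kept between calls, which is why no extra memory is needed per token. It is also why two strings cannot be tokenised at the same time with this approach.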
More Complex use of Arduino strtok
Since the delimiter string can hold more than one character, you can use
the Arduino strtok function where tokens are separated by several
different characters. However, the delimiter string must be kept the same between
calls, meaning the string has to be structured in a consistent manner, i.e.
delimiter characters can never form part of the tokens you are trying to
extract.
Task: Extract the numbers from the 'oddly' delimited string.
Note: The delimiter must be the same between separate calls to strtok.
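One possible way to tackle a task like this is sketched below. The original test string is not shown here, so the helper name extractNumbers and the sample input in the usage note are assumptions made purely for illustration.

```cpp
#include <cstring>   // strtok
#include <cstdlib>   // atoi

// Extracts integers from `text` (modified in place), treating every character
// in `delims` as a separator. Returns how many numbers were stored in `out`.
// extractNumbers is an illustrative name, not a library function.
int extractNumbers(char *text, const char *delims, int out[], int maxCount) {
  int count = 0;
  for (char *tok = strtok(text, delims);
       tok != NULL && count < maxCount;
       tok = strtok(NULL, delims)) {     // same delimiter set on every call
    out[count++] = atoi(tok);            // convert the token text to a number
  }
  return count;
}
```

For example, with a char array holding "10;@20,,30 @;40" and the delimiter string ";@, " (semicolon, at sign, comma and space), the function stores 10, 20, 30 and 40. Runs of mixed delimiter characters collapse into a single separation because strtok skips all of them before starting the next token.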
If you don't see why this works, then check out the explanation under "Why does the delimiter not work?" below.
If you need to split a string where you need to identify sequences of
characters (that are delimiters), then use the strstr function.
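As a rough sketch of the strstr approach, the helper below splits on an exact character sequence. The name nextField and its cursor-style interface are assumptions for this example, not something provided by the Arduino libraries, and the separator is assumed to be a non-empty string.

```cpp
#include <cstring>   // strstr, strlen

// Returns the next piece of the text, split on the exact sequence `sep`.
// `*cursor` must point at writable text on the first call and is advanced
// past each separator; it becomes NULL once the last piece has been returned.
// nextField is an illustrative name; `sep` must not be an empty string.
char *nextField(char **cursor, const char *sep) {
  if (*cursor == NULL) return NULL;    // everything has already been returned
  char *start = *cursor;
  char *hit = strstr(start, sep);      // look for the whole sequence
  if (hit != NULL) {
    *hit = '\0';                       // terminate this piece
    *cursor = hit + strlen(sep);       // continue after the full separator
  } else {
    *cursor = NULL;                    // no separator left: this is the last piece
  }
  return start;
}
```

With the text "a:b::c::d:e" and the separator "::", successive calls return "a:b", "c" and "d:e", so single colons stay inside the pieces. Unlike strtok, this sketch does not skip repeated separators, so two separators in a row produce an empty piece.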
Questions on how strtok works
Why does the delimiter not work?
If you thought that the delimiter parameter delim refers to an exact
sequence of characters to be matched, you'd be wrong.
Each character in delim is treated as an individual delimiter. Here's an
example to explain:
Let's say you have this string:
... and you want to separate out elements enclosed by two colons. You might write:
... expecting delimiter to search for sets of double characters. In fact you get the following output:
Results from strtok
So what happened?
The function strtok does not work in that way. Unfortunately the
operation of strtok is not explained clearly. In strtok the order
of characters in the delim string does not matter, and each character has
equal precedence. Writing "::" as the delimiter just wastes time, as each character
in the string is compared to the colon character twice.
In the Arduino strtok function, delim is a set of characters: if a character
in the string matches any one of them, it is treated as a delimiter.
Note: Because leading delimiter characters are skipped, and a token ends at
the next delimiter, a returned token never contains a delimiter character.
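The effect is easy to check with a small helper. This is a sketch under assumptions: printTokens is a made-up name, the sample string in the note below is hypothetical (not the one from the original example), and printf stands in for Serial.println.

```cpp
#include <cstring>   // strtok
#include <cstdio>    // printf (on an Arduino you would use Serial.println)

// Tokenises `text` in place with `delim` and prints each token.
// Returns the number of tokens so the result is easy to check.
// printTokens is an illustrative name, not a library function.
int printTokens(char *text, const char *delim) {
  int count = 0;
  for (char *tok = strtok(text, delim); tok != NULL; tok = strtok(NULL, delim)) {
    printf("%s\n", tok);
    count++;
  }
  return count;
}
```

Called on a char array holding "aa::bb:cc::dd" with the delimiter "::", this prints aa, bb, cc and dd: four tokens, not the three (aa, bb:cc, dd) you might expect. Passing ":" as the delimiter gives exactly the same result.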
Where is memory for the string?
One question you might have, is why are the tokens returned not
assigned to a new character string? i.e. why don't I need to assign more
memory for each found token?
The reason is that the original string storage area is trampled over
by the function, so a side effect of using strtok is that the original
string is destroyed!
Warning: The original string is destroyed by Arduino strtok.
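If you need the original text afterwards, one common workaround is to tokenise a copy instead. The sketch below assumes a caller-supplied work buffer; the name firstTokenOfCopy and its parameters are hypothetical and used only for this illustration.

```cpp
#include <cstring>   // strlen, strcpy, strtok

// Copies `source` into `work` (capacity `workSize`) and returns the first
// comma-separated token from the copy, leaving `source` untouched.
// Returns NULL if the copy would not fit or if there is no token.
// firstTokenOfCopy is an illustrative name, not a library function.
char *firstTokenOfCopy(const char *source, char *work, size_t workSize) {
  if (strlen(source) + 1 > workSize) return NULL;  // would overflow the buffer
  strcpy(work, source);        // strtok will overwrite delimiters in `work` only
  return strtok(work, ",");    // further tokens: strtok(NULL, ",")
}
```

The cost is the extra buffer, which matters on a small microcontroller, so this is only worth doing when the untouched original is really needed.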
Arduino strtok testing example
The following example code shows destruction of the original string by explicitly
using array pointer positions (array positions counted manually) to
print out each token.
Serial.println("Here is the original string (before strtok)");
Serial.println("\nResults from strtok");
Serial.println("\nResult of direct array address decoding");
Serial.println("\nHere is the original string (truncated)");
In this case the output is:
Here is the original string (before strtok)
Results from strtok
Result of direct array address decoding
Here is the original string (truncated)
The array indices 0, 7 and 13 into the string give the same addresses as the pointers
returned by the Arduino strtok function when it returns each token.
Both the first and last print statements try to print out the same string
'mystring', but as you can see, strtok has altered the contents of the
string as described above.
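The same experiment can be sketched in a form that runs off-board. The test string "apples,pears,bananas" is an assumption (chosen because its tokens start at indices 0, 7 and 13), the function names are made up for the example, and printf stands in for Serial.println.

```cpp
#include <cstring>   // strtok
#include <cstdio>    // printf (on an Arduino you would use Serial.println)

// Tokenises `buf` on commas and records each token's offset from the start
// of the buffer. Returns the number of tokens found (at most `maxCount`).
// tokenOffsets is an illustrative name, not a library function.
int tokenOffsets(char *buf, int offsets[], int maxCount) {
  int count = 0;
  for (char *tok = strtok(buf, ","); tok != NULL && count < maxCount;
       tok = strtok(NULL, ",")) {
    offsets[count++] = (int)(tok - buf);   // the token lives inside `buf`
  }
  return count;
}

void demoDestruction() {
  char mystring[] = "apples,pears,bananas";  // assumed test string
  int offsets[4];
  int n = tokenOffsets(mystring, offsets, 4);
  for (int i = 0; i < n; i++) {
    // Direct array addressing gives the same strings that strtok returned.
    printf("%d: %s\n", offsets[i], &mystring[offsets[i]]);
  }
  // The commas are now '\0', so the "whole" string stops after the first token.
  printf("%s\n", mystring);
}
```

Every token pointer is just an address inside the original array, which is why no new memory is allocated and why the array no longer prints as one complete string.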
Can I change the delimiter values?
The algorithm lets you change the delimiter string between calls. You
may think this is a good idea but in practice it is a waste of time.
Consider the following code:
Results from strtok
You can see that apples, pears and bananas are correctly parsed, but
tokenising has failed for grapes and coconuts.
This is the key problem in changing the delimiter string. Once
'pears' is found using the delimiter ";@", the delimiter is changed to ",p;":
char *mystring = "apples;@pears;@coconuts,p;bananas,p;grapes";
                               ^ Null ('\0') inserted here
                                ^ Pointer start for next call
The first delimiter found is ';'. It is overwritten with '\0' and the stored pointer is set to point to the following '@'
character, which is taken as the start position for the next call. If the
delimiter had not been changed, the '@' would be skipped as a leading delimiter. Since the new delimiter
string (",p;") does not include '@', this character is included in the next token.
The next problem is that a delimiter occurs as a character within a
desired token, i.e. the letter 'p' is taken as a delimiter. So 'p' splits
the desired token in half.
Note that the algorithm is working exactly as it should!
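The behaviour described above can be reproduced with a sketch along these lines. It is not the original listing: the name tokeniseWithSwitch and its parameters are assumptions, and the string is copied into a char array because strtok needs writable memory.

```cpp
#include <cstring>   // strtok

// Tokenises `text` in place, using `firstDelim` until `switchAfter` tokens
// have been found and `secondDelim` for every call after that. The very
// first call always uses `firstDelim`. Stores token pointers in `out` and
// returns the number of tokens found (at most `maxTokens`).
// tokeniseWithSwitch is an illustrative name, not a library function.
int tokeniseWithSwitch(char *text, const char *firstDelim,
                       const char *secondDelim, int switchAfter,
                       char *out[], int maxTokens) {
  int count = 0;
  char *tok = strtok(text, firstDelim);
  while (tok != NULL && count < maxTokens) {
    out[count++] = tok;
    // Pick the delimiter set for the next call based on tokens found so far.
    const char *delim = (count < switchAfter) ? firstDelim : secondDelim;
    tok = strtok(NULL, delim);
  }
  return count;
}
```

Run on "apples;@pears;@coconuts,p;bananas,p;grapes" with ";@" for the first two calls and ",p;" afterwards, the tokens come out as apples, pears, @coconuts, bananas, gra and es. The stray '@' and the split of grapes at the letter 'p' are exactly the two failures explained above.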
TIP: Do not change the delimiter character set in the middle of tokenizing.
If you really want to do something clever to get around this, use strstr or write a recursive descent expression parser.
There are two things to remember:
Don't include any character that can appear in a token in the delimiter string.
Don't change the delimiter between calls.
If you have not used the function in a while and use the following program...