[TIP] Word Tokenization
April 24, 2009 Matteo Bertozzi | Filed Under Tips
Sometimes it is useful to split the input text into a list of words, for example to index or search data.
Here is how to extract words from a sentence in C.
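#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strtok() overwrites delimiters in place, so str must be a writable array. */
    char str[] = "Hi, I'm a test. (This is just a test). "
                 "Join The #qt IRC Channel!"
                 "GNU/Linux - theo@gmail.com";
    char delims[] = " !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~";
    char *result = strtok(str, delims);

    while (result != NULL) {
        printf("%s\n", result);
        result = strtok(NULL, delims); /* NULL means: continue with the same string */
    }
    return 0;
}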
…and this is the Qt way.
#include <QDebug>
#include <QRegExp>
#include <QString>
QString str = "Hi, I'm a test. (This is just a test). "
              "Join The #qt IRC Channel! GNU/Linux - theo@gmail.com";
QString delim = QRegExp::escape(" !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~");
QRegExp regexp(QString("[%1]").arg(delim), Qt::CaseSensitive, QRegExp::RegExp2);
// SkipEmptyParts drops the empty strings produced by consecutive delimiters.
qDebug() << str.split(regexp, QString::SkipEmptyParts);
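One caveat about the C version: strtok() writes '\0' bytes over the delimiters in str and keeps hidden state between calls, so the original text is lost and two tokenizations cannot be interleaved. If that is a problem, here is a minimal sketch of an alternative that leaves the input untouched. It assumes the same delimiter set and relies only on the standard strspn() and strcspn() functions. The names DELIMS, next_word() and print_words() are hypothetical, introduced here for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Same delimiter set as the strtok() example: space plus ASCII punctuation. */
static const char DELIMS[] = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/*
 * Hypothetical helper: find the next word at or after s.
 * Returns a pointer to its first character and stores its length in *len,
 * or returns NULL when only delimiters (or nothing) remain.
 * The input string is never modified.
 */
static const char *next_word(const char *s, size_t *len)
{
    s += strspn(s, DELIMS);        /* skip leading delimiters */
    if (*s == '\0')
        return NULL;
    *len = strcspn(s, DELIMS);     /* length of the run of non-delimiters */
    return s;
}

/* Print one word per line and return how many words were found. */
static size_t print_words(const char *text)
{
    size_t count = 0, len;
    const char *word = text;

    while ((word = next_word(word, &len)) != NULL) {
        printf("%.*s\n", (int)len, word);
        word += len;               /* continue after the word just reported */
        count++;
    }
    return count;
}
```

Because each word is reported as a pointer plus a length, the caller also knows where the word sits in the original text, which can be handy when building an index. The trade-off is that the words are not '\0'-terminated, so they have to be printed with "%.*s" or copied before being used as ordinary C strings.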