Discussion:
Regex Localization problem
Antonio Scotti
2006-05-16 07:42:56 UTC
Hi,
I have a portability problem with regex (probably due to localization
issues).

I've found that, under Win32, the regular expression "\w" matches all the
alphanumeric characters, including non-ASCII ones (such as è or é).
This doesn't hold under Linux: there, \w only seems to match ASCII
characters. I've tried changing the locale, but it made no difference.

Is there a way to make \w behave on Linux the same way it does on Win32?

Thanks in advance.
John Maddock
2006-05-17 09:23:01 UTC
Sure, it all comes down to what the default C++ locale is:

std::locale::global(std::locale("en_US"));

Calling this before any regexes are constructed sets the locale to "en_US",
*provided that locale is supported by your implementation*.
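One likely reason that changing the locale appeared to do nothing: a C++ program starts with the classic "C" locale as its global locale regardless of LANG/LC_ALL, and calling the C function setlocale does not change it; only std::locale::global does. The minimal sketch below (try_make_locale and install_global_locale are hypothetical helper names) tries a few glibc-style locale names in turn, because std::locale's constructor throws std::runtime_error for a name the system does not provide (`locale -a` lists the installed ones). The Latin-1 name is tried first on the assumption that the text is single-byte Latin-1, as it typically is on Win32: with a narrow-character regex, é in UTF-8 text is two bytes, and typically neither byte is classified as alphanumeric.

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Hypothetical helper: return the first locale in `names` that the
// implementation supports, or the classic "C" locale if none is.
// std::locale's constructor throws std::runtime_error for unknown names.
std::locale try_make_locale(std::initializer_list<const char*> names) {
    for (const char* name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // Not installed on this system; try the next candidate.
        }
    }
    return std::locale::classic();
}

// Hypothetical helper: call once, before any regex is constructed, so that
// regexes created afterwards pick up the new global locale.
// The locale names are assumptions; "" means the user's environment locale.
void install_global_locale() {
    std::locale::global(try_make_locale(
        {"en_US.ISO-8859-1", "en_US.UTF-8", "en_US", ""}));
}
```

Calling install_global_locale() at the top of main, before any boost::regex objects exist, has the same effect as the one-liner above when "en_US" is available, but falls back to another name instead of throwing when it is not.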

Alternatively, you can set the locale in specific regex instances:

boost::regex e;
e.imbue(std::locale("en_US"));   // imbue() discards any existing expression, so call it first
e.assign(my_regular_expression); // now uses en_US as its locale
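A self-contained sketch of the per-instance approach. It uses std::wregex from the C++11 <regex> header as a stand-in for boost::regex (boost::wregex offers the same imbue() and assign() calls), and wide strings so that é is a single wchar_t rather than two UTF-8 bytes. make_localized_regex and all_word_chars are hypothetical helper names. In both libraries imbue() leaves the regex empty, which is why it must come before assign().

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: build a wide-character regex that uses `loc` for
// character classification (\w, [[:alpha:]], case-insensitive matching).
std::wregex make_localized_regex(const std::wstring& pattern,
                                 const std::locale& loc) {
    std::wregex re;
    re.imbue(loc);       // imbue() leaves the regex empty, so do it first
    re.assign(pattern);  // compiled using loc's ctype facet
    return re;
}

// Hypothetical helper: true if `text` is non-empty and every character in it
// is a word character (alphanumeric or '_') under `loc`.
bool all_word_chars(const std::wstring& text, const std::locale& loc) {
    return std::regex_match(text, make_localized_regex(L"\\w+", loc));
}
```

To match é this way, the input must first be decoded to wchar_t, and the locale passed in must classify é as alphanumeric. On a glibc system with en_US.UTF-8 installed, `all_word_chars(L"\u00e9t\u00e9", std::locale("en_US.UTF-8"))` would be expected to return true, though this depends on the installed locale data.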

HTH, John.
Antonio Scotti
2006-05-17 16:02:49 UTC
Thanks a lot. It worked wonderfully!