Discussion:
[boost.regex] Does boost::u32regex recognize Unicode named blocks like "\p{IsBasicLatin}"?
gj_uestc
2008-08-08 02:03:48 UTC
Hi all,
Nowadays I am using boost::u32regex for some regular expression processing. But it seems that "\p{IsBasicLatin}" is not an accepted expression for boost::make_u32regex(tmp). Does boost::regex not support the named Unicode blocks, or do I have to pass some other flags to the library? At the moment I am using the default flag, which selects Perl syntax.

Thanks & regards
Juan
John Maddock
2008-08-08 08:27:16 UTC
The named properties/character classes supported are here:
http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/boost_regex/syntax/character_classes/optional_char_class_names.html

As you can see I haven't added support for language-specific blocks yet :-(

John.
John Maddock
2008-08-08 08:48:38 UTC
I forgot to mention that \p{IsBasicLatin} is the same as [\x0-\x7f]; likewise, the other contiguous blocks can be expressed in the same way.

HTH, John.
gj_uestc
2008-08-11 03:34:34 UTC
Thanks John,
I have used "[\x0-\x7f]" instead of "\p{IsBasicLatin}" to construct the regular expression (expression = boost::make_u32regex("[\\x0-\\x7f]")). The expression is constructed without error, but it accepts neither of the test strings "a" or "\t" (a single tab character): boost::u32regex_match("a", expression) == false. I wonder whether this has something to do with Unicode? I have also tried expression = boost::regex("[\\x0-\\x7f]"); then "a" matches (boost::regex_match("a", expression) == true) but "\t" does not, which I think is reasonable for boost::regex since it does not support Unicode. So my question is: why doesn't boost::u32regex_match work here?

Thanks & regards
Juan
Andrea Denzler
2008-08-11 03:53:40 UTC
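Juan, I have no problems with this sample code:

#include <boost/regex/icu.hpp>  // u32regex, make_u32regex, u32regex_match (requires ICU)
#include <iostream>
#include <string>

using namespace std;
using namespace boost;

void test(const wstring& str)
{
    // [\x0-\x7f] is the Basic Latin block, i.e. what \p{IsBasicLatin} would match.
    u32regex expression = make_u32regex("^([\\x0-\\x7f]+)$");
    wsmatch what;
    if (u32regex_match(str, what, expression))
    {
        // what[0] contains the whole string
        // what[1] contains the ASCII text
        wcout << what[1] << L" is ascii" << endl;
    }
    else
    {
        wcout << str << L" is not ascii" << endl;
    }
}

int main()
{
    test(L"Hello World!");
    test(L"\u00CE\u00F6\u00F2");  // "Îöò", written as escapes to avoid source-encoding issues
    return 0;
}

Output:

Hello World! is ascii
¶÷‗ is not ascii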





gj_uestc
2008-08-11 08:31:49 UTC
Ok, it works for me now. Thanks a lot!




圚2008-08-11 11:53:40"Andrea Denzler" <***@andreaplanet.com> 写道


Juan, I have no problems with this sample code



#include "stdafx.h"



void test(wstring str)

{

u32regex expression = make_u32regex("^([\\x0-\\x7f]+)$");

wsmatch what;

if(u32regex_match(str, what, expression))

{

// what[0] contains the whole string

// what[1] contains ascii text

wcout << what[1] << _T(" is ascii") << endl;

} else {

wcout << str << _T(" is not ascii") << endl;

}

}



int _tmain(int argc, _TCHAR* argv[])

{

test(_T("Hello World!"));

test(_T("Îöò"));

return 0;

}





Hello World! is ascii

¶÷‗ is not ascii





Da:boost-users-***@lists.boost.org [mailto:boost-users-***@lists.boost.org] Per conto di gj_uestc
Inviato: lunedì 11 agosto 2008 5.35
A:boost-***@lists.boost.org; ***@johnmaddock.co.uk
Oggetto: Re: [Boost-users] [boost.regex]Does boost:u32regexrecognize theunicode named blocks like "\p{IsBasicLatin}"?



Thanks John,

I have used "[\x0-\x7f]" instead of "/p{IsBasicLatin}" to construct the regular expression (expression=boost::make_u32regex("[\\x0-\\x7f]" )). The regular expression has been constructed correctly but it cannot accecpt instance string either "a" or "&#x9;" (boost:u32match("a",expression)==false).I am wondering whether it has something to do with unicode? I have tried expression=boost::regex("[\\x0-\\x7f]" )); then I can pass the string "a" but not string "&#x9;"(boost:match("a",expression)==true), which I think is reaonable for boost:regex since it does not support the unicode. So my point is: why the boost:u32match doesn't work well?



Thanks&regards

Juan
Post by John Maddock
Post by John Maddock
Post by gj_uestc
Hi,all
    Nowadays I am using boost:u32regex to do some regular expression
processing.But it seems that "/p{IsBasicLatin}"is not a accessable
expression by boost::make_u32regex(tmp).Does boost:regex not suppor
the named unicode blocks or I have to pass some other flags to the
library? Now I was using the defult flag wich indicate using perl
syntactic.
 
http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/boost_regex/syntax/character_classes/optional_char_class_names.html
 
As you can see I haven't added support for language-specific blocks yet :-(
 
I forgot to mention that \p{IsBasicLatin} is the same as: [\x0-\x7f],
likewise the other continuous blocks can be expressed in the same way.
 
HTH, John.
 
e100办理䞚务抜取心劚倧奖惊喜连连赶快行劚
Loading...