pattern matching - re.match with '-' not working in Python
I have the following code:
if(keys==14):
    print (values)
    result = re.match("(^\-)", str(values))
    print "rr is", result, (values)
But the match is not working for a sample line. Here is the output I got:
-bob
rr is None -bob
I can't reproduce this:
>>> x = '-bob'
>>> re.match('(^\-)', x)
<_sre.SRE_Match object; span=(0, 1), match='-'>
Your values may be different, presumably having some non-printable characters in addition at the start. Try printing the repr of it to make sure that's not the case...!
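As a rough illustration of that check, here is a minimal Python 3 sketch. The string assigned to values is a made-up stand-in for the real data, and ascii() is used alongside repr() because it always escapes non-ASCII characters:

```python
# Hypothetical value: begins with U+00AD (soft hyphen), not an ASCII '-'.
values = "\u00adbob"

# Escaped forms expose what is really at the start of the string.
print(repr(values))
print(ascii(values))

# The first character is not the ASCII hyphen-minus (code point 0x2D).
print(hex(ord(values[0])), values[0] == "-")
```

If the escaped form shows anything other than a plain '-' in front of bob, the pattern was never the problem.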
Edit: and indeed, the repr reveals that values starts with a soft hyphen (encoded in UTF-8), which may display as a dash but isn't a dash. Since RE matching on encoded byte strings can be inconvenient, let's first decode to text:
>>> x = '\xc2\xadbob'  # or however else `values` is obtained
>>> y = x.decode('utf8')
>>> re.match(ur'([\xad-])', y)
<_sre.SRE_Match object at 0x1004c0af8>
(I switched to Python 2 here, which is why the repr of the match object differs from the snippet above, which was using Python 3.) On a side note, I've dropped the redundant caret in the pattern (the ^ at the start), since the match method is always anchored at the start anyway.
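The point about anchoring can be checked with a short Python 3 sketch. The sample strings are invented for illustration:

```python
import re

text = "-bob"

# re.match only tries the pattern at position 0, so '^' adds nothing here.
with_caret = re.match(r"^-", text)
without_caret = re.match(r"-", text)
print(with_caret.span(), without_caret.span())

# re.search scans the whole string, so there the caret does change the result.
found_later = re.search(r"-", "bob-")
anchored = re.search(r"^-", "bob-")
not_at_start = re.match(r"-", "bob-")
print(found_later.span(), anchored, not_at_start)
```

Both match calls succeed on the same span, while the dash at the end of "bob-" is found only by an unanchored search.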
The core ideas, of course, are: (a) use Unicode to work semantically with actual text rather than a string of bytes; (b) use repr to check which glyphs are actually there, rather than what they may look like in context (since non-displaying glyphs, and look-alike ones, may otherwise fool you); and (c) the [\xad-] part of the (Unicode) RE pattern matches either a soft hyphen ('\xad') or an actual dash. The dash must come second, right before the ], due to RE pattern syntax (otherwise it would indicate matching a range of characters, while we want to match an actual dash if that's present).
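For readers on Python 3, the same ideas might look like the sketch below. The byte string is a hypothetical stand-in for however values is actually read, and the [+-/] class is only there to show what a dash in the middle of a class does:

```python
import re

raw = b"\xc2\xadbob"        # UTF-8 bytes: U+00AD (soft hyphen) followed by "bob"
text = raw.decode("utf-8")  # decode first, then match on text

# The dash sits last in the class, so it is a literal dash, not a range.
dash_like = re.compile(r"[\xad-]")

print(dash_like.match(text) is not None)    # soft hyphen at the start
print(dash_like.match("-bob") is not None)  # real dash at the start
print(dash_like.match("bob") is not None)   # neither

# A dash between two characters defines a range instead.
print(re.match(r"[+-/]", ",") is not None)  # ',' lies between '+' and '/'
print(re.match(r"[+/-]", ",") is not None)  # literal '+', '/' and '-' only
```

Escaping the dash as \- inside the class is an alternative to placing it last.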