regex - Scala regular expression: matching a long Unicode Devanagari pattern fails
Consider the following script code:

    import scala.util.matching.Regex

    val virama = "्"
    val consonantNonVowelPattern = s"(म|त|य)([^$virama])".r
    // val consonantNonVowelPattern = s"(थ|ठ|छ|स|ब|घ|ण|ट|ज|ग|न|ष|भ|ळ|ढ|ख|श|प|ह|ध|ङ|म|झ|ड|ल|व|र|फ|क|द|च|ञ|त|य)([^$virama])".r
    var output = "असय रामः "
    // Insert a virama and "a" between each matched consonant and the character after it.
    output = consonantNonVowelPattern.replaceAllIn(output, _ match {
      case consonantNonVowelPattern(consonant, followingCharacter) =>
        consonant + virama + "a" + followingCharacter
    })
    println("after virama addition: " + output.mkString("-"))
It produces the following correct output:

    after virama addition: अ-स-य-्-a- -र-ा-म-्-a-ः-

However, if I use the longer pattern (commented out above), I get the following wrong output:

    after virama addition: अ-स-्-a-य- -र-्-a-ा-म-्-a-ः-

Is this a bug? Am I doing something wrong?
Answer by Lalit Pant:
I'm assuming the correct output in the second case is:

    अ-स-्-a-य-्-a- -र-्-a-ा-म-्-a-ः-

If that's the case, read on. If not, tell me the expected output.
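The problem appears to be that, with the bigger 'consonantNonVowelPattern', the presence of 'सय' in 'output' makes 'य' show up as 'followingCharacter' in the pattern match after the 'स' consonant. Because that match has already consumed 'य', the next match starts after it, and 'य' is consequently never reported as a consonant.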
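One possible way around this, offered as a sketch and not necessarily the fix the answer goes on to give, is to check the following character with a zero-width lookahead instead of capturing it. The lookahead `(?=[^...])` tests the next character without consuming it, so 'य' remains available as the start of the next match. The names `ViramaLookahead`, `consonantBeforeNonVirama` and `addVirama` are made up for this illustration.

```scala
object ViramaLookahead {
  val virama = "्"

  // The full consonant alternation from the question.
  val consonants =
    "थ|ठ|छ|स|ब|घ|ण|ट|ज|ग|न|ष|भ|ळ|ढ|ख|श|प|ह|ध|ङ|म|झ|ड|ल|व|र|फ|क|द|च|ञ|त|य"

  // (?=[^virama]) is a zero-width positive lookahead: it requires that a
  // non-virama character follows, but leaves that character unconsumed.
  val consonantBeforeNonVirama = s"($consonants)(?=[^$virama])".r

  // Append virama + "a" to every consonant that is followed by a non-virama
  // character. Only the consonant is part of the match, so it is the only
  // text that needs to be re-emitted in the replacement.
  def addVirama(text: String): String =
    consonantBeforeNonVirama.replaceAllIn(text, m => m.matched + virama + "a")

  def main(args: Array[String]): Unit = {
    val output = addVirama("असय रामः ")
    println("after virama addition: " + output.mkString("-"))
  }
}
```

With the input from the question, this should print the output assumed above, अ-स-्-a-य-्-a- -र-्-a-ा-म-्-a-ः-, followed by the trailing space of the input. Like the original pattern, it leaves a consonant at the very end of the string untouched, because the lookahead still requires some following character.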