Delete leaves in a tree with regex (Python)
I have a syntax tree saved in a text file in "Lisp style", where opening and closing brackets show the relations. I want to delete the leaves. For example, where I have " (det the)" I want it to become " det". I'm not a regex expert, and I wonder how to handle this in a more complex structure with nested brackets. Here is an example tree (it is on a single line in the file; indentation would only be for easier visualization):
(s (np i) (vp (vp (v shot) (np (det an) (n elephant))) (pp (p in) (np (det my) (n pajamas)))))
I would like:
(s np (vp (vp v (np det n)) (pp p (np det n))))
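Something like this?
re.sub(r"\((\w*) (\w*)\)", r"\1", t)
where t is the variable holding the syntax tree. The pattern is written as a raw string so that \( and \w reach the regex engine unchanged. Because \w cannot match a bracket, only the innermost "(label word)" pairs match, so nested brackets are left in place.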
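As a minimal sketch, the substitution above can be tried on the example tree from the question. The helper name `strip_leaves` and the constant `LEAF` are assumptions introduced for illustration, and the pattern uses `\w+` instead of `\w*` so that empty labels or words are not matched.

```python
import re

# Matches an innermost "(label word)" pair. \w cannot match a bracket,
# so a node that still contains nested brackets is never matched.
LEAF = re.compile(r"\((\w+) (\w+)\)")

def strip_leaves(tree):
    """Replace every "(label word)" node with just "label"."""
    return LEAF.sub(r"\1", tree)

t = "(s (np i) (vp (vp (v shot) (np (det an) (n elephant))) (pp (p in) (np (det my) (n pajamas)))))"
print(strip_leaves(t))
# (s np (vp (vp v (np det n)) (pp p (np det n))))
```

Call it once per tree: a single pass does not rescan its own replacements, but a second call could collapse a node such as "(np n)" that the first pass produced.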
For Unicode support, see the comments below.
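As a hedged note on the Unicode point: in Python 3, `\w` in a `str` pattern already matches Unicode word characters, so accented words need no extra flag. In Python 2 the usual approach was to use unicode strings and pass `flags=re.UNICODE`. The French example tree below is made up for illustration and assumes Python 3.

```python
import re

# Hypothetical tree with non-ASCII leaves (Python 3, str pattern).
tree = "(np (det un) (n éléphant))"

# \w matches "é" here because str patterns are Unicode-aware by default.
result = re.sub(r"\((\w+) (\w+)\)", r"\1", tree)
print(result)
# (np det n)
```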
Comments