python - NLTK chunked parse tree: saving it to a file and loading it with the CorpusReader class
Let's say I have the chunked corpus below, saved in a file called test.txt:
[Rapunzel/NNP] let/VBD down/RP [her/PP$ long/JJ golden/JJ hair/NN]
Then I can load it with ChunkedCorpusReader.
>>> from nltk.corpus.reader import ChunkedCorpusReader
>>> reader = ChunkedCorpusReader('.', 'test.txt')
>>> reader.chunked_sents()[0]
Tree('S', [Tree('NP', [('Rapunzel', 'NNP')]), ('let', 'VBD'), ('down', 'RP'), Tree('NP', [('her', 'PP$'), ('long', 'JJ'), ('golden', 'JJ'), ('hair', 'NN')])])
>>> print(reader.chunked_sents()[0])
(S
  (NP Rapunzel/NNP)
  let/VBD
  down/RP
  (NP her/PP$ long/JJ golden/JJ hair/NN))
Then I made a change to the tree object, say, switched the chunk tag NP to NPP, and called the result new.
>>> print(new)
(S
  (NPP Rapunzel/NNP)
  let/VBD
  down/RP
  (NPP her/PP$ long/JJ golden/JJ hair/NN))
Now I want to save the new tree to a file and load it with ChunkedCorpusReader or another reader, as I did with test.txt. However, I couldn't find a way to save an NLTK Tree object to a file, let alone read that file back. Can anyone help?
The default conversion to string, the one that print gave you, is not bad: it merges words with their POS tags and indents new lines properly. Since file.write() doesn't automatically convert its argument to a string, you must pass str(newtree) to your file's write method.
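As a minimal sketch of that step, the snippet below rebuilds a tree shaped like the question's new by hand and writes str(new) to a file. The output location and the name new_tree.txt are assumptions made for illustration, not something from the question.

```python
import tempfile
from pathlib import Path

from nltk.tree import Tree

# Hand-built stand-in for the question's `new` tree (NP relabelled to NPP).
new = Tree("S", [
    Tree("NPP", [("Rapunzel", "NNP")]),
    ("let", "VBD"),
    ("down", "RP"),
    Tree("NPP", [("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]),
])

# Hypothetical output location; any writable path would do.
out_path = Path(tempfile.mkdtemp()) / "new_tree.txt"

# file.write() needs a string, so the tree is converted explicitly with str().
with open(out_path, "w", encoding="utf-8") as f:
    f.write(str(new) + "\n")
```

The file then contains the same indented text that print(new) shows.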
For more control over the appearance of a tree's string representation, use the tree method pformat(). Note that Tree.pformat() was called Tree.pprint() in earlier versions of NLTK; in current versions, Tree.pformat() returns a string while Tree.pprint() writes to stdout.
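The difference between the two methods can be sketched as follows, assuming NLTK 3 or later. The tree is a hand-built stand-in for the question's new, and the margin value of 200 is just an arbitrary width large enough to keep this tree on one line.

```python
import contextlib
import io

from nltk.tree import Tree

# Hand-built stand-in for the question's `new` tree.
new = Tree("S", [
    Tree("NPP", [("Rapunzel", "NNP")]),
    ("let", "VBD"),
    ("down", "RP"),
    Tree("NPP", [("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]),
])

# pformat() returns a string; a wide margin keeps the whole tree on one line.
one_line = new.pformat(margin=200)

# With the default margin the same tree is wrapped over several indented lines.
wrapped = new.pformat()

# pprint() prints the formatted tree and returns None; stdout is captured here
# only so that the printed text can be inspected.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = new.pprint()
```

So pformat() is the one to use when the text should go into a file, and pprint() is a convenience for looking at a tree interactively.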
If you want your tree delimited by square brackets, add the option parens="[]" to pformat().
>>> print(new.pformat(parens="[]"))
[S
  [NPP Rapunzel/NNP]
  let/VBD
  down/RP
  [NPP her/PP$ long/JJ golden/JJ hair/NN]]
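For the reading half of the question, one possible route is to parse the saved text with Tree.fromstring() instead of a corpus reader. That is a swapped-in technique rather than something the answer above describes. The sketch below writes the square-bracket form to a file and reads it back; the file name is hypothetical, and nltk.tag.str2tuple is used so that leaves come back as (word, tag) tuples rather than plain strings.

```python
import tempfile
from pathlib import Path

from nltk.tag import str2tuple
from nltk.tree import Tree

# Hand-built stand-in for the question's `new` tree.
new = Tree("S", [
    Tree("NPP", [("Rapunzel", "NNP")]),
    ("let", "VBD"),
    ("down", "RP"),
    Tree("NPP", [("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]),
])

# Hypothetical file name, placed in a temporary directory.
path = Path(tempfile.mkdtemp()) / "new_tree.txt"

# Save the tree using square brackets as delimiters.
path.write_text(new.pformat(parens="[]") + "\n", encoding="utf-8")

# Parse it back: brackets must match the delimiters used when saving, and
# read_leaf turns each "word/TAG" leaf back into a (word, TAG) tuple.
restored = Tree.fromstring(
    path.read_text(encoding="utf-8"),
    brackets="[]",
    read_leaf=str2tuple,
)
```

Without read_leaf, the leaves would be plain strings such as "Rapunzel/NNP", so the restored tree would not compare equal to the original.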