treetools package - computational linguistics - Python

How can I obtain the LCFRS grammar using treetools? I used the following terminal command:
treetools grammar wsj_0001.prd output leftright --dest-format rcg --markov v:1 h:2
where wsj_0001.prd is a tree file, and the output file I get is empty.
https://pypi.python.org/pypi/treetools/0.1.0 - I used the last command from the ones listed.
Thanks.

I would suggest using a sentence from the Clark paper.

Related

How to filter command line output?

I need to filter the output of a command executed on network equipment in order to bring back only the lines that match a text like '10.13.32.34'.
I created Python code that captures all the output of the command, but I need only part of it.
I am using Python 3.7.3 on Windows 10 Pro.
The code I used is below; I need help with the filtering part because I am a network engineer without even a basic notion of Python programming (till now...).
from steelscript.steelhead.core import steelhead
from steelscript.common.service import UserAuth
from steelscript.cmdline.cli import CLIMode

# Authenticate and connect to the SteelHead appliance.
auth = UserAuth(username='admin', password='password')
sh = steelhead.SteelHead(host='em01r001', auth=auth)

# Run the CLI command and capture its output as a string.
sh.cli.exec_command("show connections optimized", mode=CLIMode.CONFIG)
output = sh.cli.exec_command("show connections optimized")
I have no idea what your output looks like, so I used the text in your question as example data. Anyway, for a simple pattern such as the one shown in your question, you could do it like this:
output = '''\
I need to filter the output of a command executed
in network equipment in order to bring only the
lines that match a text like '10.13.32.34'. I
created a python code that brings all the output
of the command but I need only part of this.
I am using Python 3.7.3 running on a Windows 10
Pro.
The code I used is below and I need the filtering
part because I am a network engineer without the
basic notion of python programming. (till now...)
'''
# Filter the lines of text in output.
filtered = '\n'.join(line for line in output.splitlines()
                     if '10.13.32.34' in line)
print(filtered)  # -> lines that match a text like '10.13.32.34'. I
You could do something similar for more complex patterns by using the re.search() function in Python's built-in regular expression module, re. Using regular expressions is more complicated, but they are extremely powerful. There are many tutorials on using them, including one in Python's own documentation titled the Regular Expression HOWTO.
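For example, here is a minimal sketch of the same filter using the re module, with a pattern that matches any dotted IPv4-style address rather than one fixed string (the pattern and variable names are illustrative, not from the original answer; output is the string built above):
import re

# Match any IPv4-style dotted address, e.g. 10.13.32.34.
ip_pattern = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')

filtered = '\n'.join(line for line in output.splitlines()
                     if ip_pattern.search(line))
print(filtered)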

Get output from SyntaxNet as a Python object, not text

After executing some of the example SyntaxNet scripts (like parse.sh), I receive output in CoNLL text format. My goal is to take some features and pass them on to the next network. One possible choice is to parse the text output into a Python object with something like nltk.corpus.reader.ConllCorpusReader. But what interests me is:
Is it possible, with some code modification, to get from SyntaxNet not text, but a Python object representing the parsed results?
I've found that in parser_eval.py, on lines 133-138, SyntaxNet has already fetched a text version of the results.
while True:
    tf_eval_epochs, tf_eval_metrics, tf_documents = sess.run([
        parser.evaluation['epochs'],
        parser.evaluation['eval_metrics'],
        parser.evaluation['documents'],
    ])
But I cannot locate from which object this text is generated, and how.
There are many ways to do it; all of the ones I know involve parsing the output of SyntaxNet and loading it into NLTK objects. I wrote a simple post on my blog exemplifying it:
http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/
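As a concrete illustration of the NLTK route, here is a minimal sketch that loads a 10-column CoNLL file produced by SyntaxNet with nltk.corpus.reader.ConllCorpusReader; the file name parse-output.conll and the column mapping are assumptions for the example, not taken from the SyntaxNet sources:
from nltk.corpus.reader import ConllCorpusReader

# Map the 10 CoNLL-X columns; keep only the word (FORM) and
# POS tag (POSTAG) columns and ignore the rest.
columntypes = ['ignore', 'words', 'ignore', 'ignore', 'pos',
               'ignore', 'ignore', 'ignore', 'ignore', 'ignore']

# Assumed file name: the CoNLL output written by parse.sh.
reader = ConllCorpusReader('.', ['parse-output.conll'], columntypes)

print(reader.tagged_sents()[0])  # [(word, pos), ...] for the first sentence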

Using custom POS tags for NLTK chunking?

Is it possible to use non-standard part-of-speech tags when making a grammar for chunking in NLTK? For example, I have the following sentence to parse:
complication/patf associated/qlco with/prep breast/noun surgery/diap
independent/adj of/prep the/det use/inpr of/prep surgical/diap device/medd ./pd
Locating the phrases I need from the text is greatly assisted by specialized tags such as "medd" or "diap". I thought that because you can use RegEx for parsing, it would be independent of anything else, but when I try to run the following code, I get an error:
grammar = r'TEST: {<diap>}'
cp = nltk.RegexpParser(grammar)
cp.parse(sentence)
ValueError: Transformation generated invalid chunkstring:
<patf><qlco><prep><noun>{<diap>}<adj><prep><det><inpr><prep>{<diap>}<medd><pd>
I think this has to do with the tags themselves, because NLTK can't generate a tree from them, but is it possible to skip that part and just get the chunked items returned? Maybe NLTK isn't the best tool; if so, can anyone recommend another module for chunking text?
I'm developing in Python 2.7.6 with the Anaconda distribution.
Thanks in advance!
Yes, it is possible to use custom tags for NLTK chunking; I have done the same myself.
Refer: How to parse custom tags using nltk.Regexp.parser()
The ValueError and the error description suggest that there is an error in the formation of your grammar, and you need to check that. You can update the question with your grammar for suggestions on corrections.
import nltk
from nltk.tokenize import word_tokenize

# Example input sentence (added so the snippet runs as-is).
example_sent = 'The quick brown fox jumps over the lazy dog.'

# POS tagging
words = word_tokenize(example_sent)
pos = nltk.pos_tag(words)
print(pos)

# Chunking: one or more adjectives followed by one or more nouns
chunk = r'Chunk: {<JJ.?>+<NN.?>+}'
par = nltk.RegexpParser(chunk)
par2 = par.parse(pos)
print('Chunking - ', par2)

print('------------------------------ Parsing the filtered chunks')
# Print only the required chunks
for i in par2.subtrees():
    if i.label() == 'Chunk':
        print(i)

print('------------------------------ NER')
# Named-entity recognition
ner = nltk.ne_chunk(pos)
print(ner)
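To confirm the point for the custom tags in the question specifically, here is a minimal sketch using the question's own tag set. It assumes the sentence is supplied as the list of (word, tag) tuples that RegexpParser expects, and the grammar here is an illustrative variant of the question's {<diap>} rule; on a recent NLTK this parses without the ValueError:
import nltk

# The question's sentence, pre-tagged with the custom tag set.
sentence = [
    ('complication', 'patf'), ('associated', 'qlco'), ('with', 'prep'),
    ('breast', 'noun'), ('surgery', 'diap'), ('independent', 'adj'),
    ('of', 'prep'), ('the', 'det'), ('use', 'inpr'), ('of', 'prep'),
    ('surgical', 'diap'), ('device', 'medd'), ('.', 'pd'),
]

grammar = r'TEST: {<diap><medd>?}'  # chunk a diap tag, optionally with a medd
cp = nltk.RegexpParser(grammar)
tree = cp.parse(sentence)

# Return just the chunked items, skipping full-tree printing.
for subtree in tree.subtrees():
    if subtree.label() == 'TEST':
        print(subtree)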

Dumbo (Python) / Hadoop unexpected output

I'm trying to execute the following code with Dumbo (Python) / Hadoop:
https://github.com/klbostee/dumbo/wiki/Short-tutorial#jobs-and-runners
I followed the tutorial and completed every step, but when I run the code in the Hadoop environment I obtain output like the following:
SEQ/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable�������ޭǡ�q���%�O��������������172.16.1.10������������������172.16.1.12������������������172.16.1.30������
It should return a list of IP addresses with connection counts.
Why do those characters appear? Is it an encoding problem? How do I fix it? Thanks.
Also, if I try the other programs in the tutorial, I have the same problem.
Answering my own question: that output is Dumbo's serialized form; there is no error.
To convert it into readable text, the following command is sufficient (the answer was in the tutorial! I just didn't see it):
dumbo cat ipcounts/part* -hadoop /usr/local/hadoop | sort -k2,2nr | head -n 5

How can I use Microsoft Word's spelling/grammar checker programmatically?

I want to process a medium to large number of text snippets using a spelling/grammar checker to get a rough approximation and ranking of their "quality." Speed is not really of concern either, so I think the easiest way is to write a script that passes off the snippets to Microsoft Word (2007) and runs its spelling and grammar checker on them.
Is there a way to do this from a script (specifically, Python)? What is a good resource for learning about controlling Word programmatically?
If not, I suppose I can try something from Open Source Grammar Checker (SO).
Update
In response to Chris' answer, is there at least a way to a) open a file (containing the snippet(s)), b) run a VBA script from inside Word that calls the spelling and grammar checker, and c) return some indication of the "score" of the snippet(s)?
Update 2
I've added an answer which seems to work, but if anyone has other suggestions I'll keep this question open for some time.
It took some digging, but I think I found a useful solution. Following the advice at http://www.nabble.com/Edit-a-Word-document-programmatically-td19974320.html I'm using the win32com module (if the SourceForge link doesn't work, according to this Stack Overflow answer you can use pip to get the module), which allows access to Word's COM objects. The following code demonstrates this nicely:
import os
import win32com.client

wdDoNotSaveChanges = 0
path = os.path.abspath('snippet.txt')

# Write the test snippet to a file for Word to open.
snippet = 'Jon Skeet lieks ponies. I can haz reputashunz? '
snippet += 'This is a correct sentence.'
f = open(path, 'w')
f.write(snippet)
f.close()

# Start Word via COM, open the file, and read the error counts.
app = win32com.client.gencache.EnsureDispatch('Word.Application')
doc = app.Documents.Open(path)
print "Grammar: %d" % (doc.GrammaticalErrors.Count,)
print "Spelling: %d" % (doc.SpellingErrors.Count,)
app.Quit(wdDoNotSaveChanges)
which produces
Grammar: 2
Spelling: 3
which match the results when invoking the check manually from Word.
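Building on that, a hypothetical helper along these lines could score and rank several snippets in one Word session; the score_snippets name, the error-count scoring, and the temp-file handling are my own illustrative choices, not part of the original answer:
import os
import tempfile
import win32com.client

wdDoNotSaveChanges = 0

def score_snippets(snippets):
    """Return snippets sorted best-first by total error count."""
    app = win32com.client.gencache.EnsureDispatch('Word.Application')
    scores = []
    for snippet in snippets:
        # Write each snippet to a temporary file for Word to open.
        fd, path = tempfile.mkstemp(suffix='.txt')
        with os.fdopen(fd, 'w') as f:
            f.write(snippet)
        doc = app.Documents.Open(path)
        errors = doc.GrammaticalErrors.Count + doc.SpellingErrors.Count
        doc.Close(wdDoNotSaveChanges)
        os.remove(path)
        scores.append((errors, snippet))
    app.Quit(wdDoNotSaveChanges)
    return [s for _, s in sorted(scores, key=lambda pair: pair[0])]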
