load_earley not available in nltk 3.0 - python

I am trying to reproduce an example from the NLTK textbook - http://www.ling.helsinki.fi/kit/2009s/clt231/NLTK/book/ch10-AnalyzingTheMeaningOfSentences.html
However, when I run this example:
>>> from nltk.parse import load_earley
>>> cp = load_earley('grammars/book_grammars/sql0.fcfg')
>>> query = 'What cities are located in China'
>>> trees = cp.nbest_parse(query.split())
>>> answer = trees[0].node['sem']
>>> q = ' '.join(answer)
>>> print q
I am getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name load_earley
Has load_earley been discontinued? If so, I cannot find its replacement. Please help.

I believe that is the case, I ran into the same issue. That course uses NLTK version 2.0. Also:
The material presented in this book assumes that you are using Python version 2.4 or 2.5

Never mind, I found the updated version of the book: http://www.nltk.org/book/ch10.html
There, the code above becomes:
>>> from nltk import load_parser
>>> cp = load_parser('grammars/book_grammars/sql0.fcfg')
>>> query = 'What cities are located in China'
>>> trees = list(cp.parse(query.split()))
>>> answer = trees[0].label()['SEM']
>>> answer = [s for s in answer if s]
>>> q = ' '.join(answer)
>>> print(q)
load_earley is replaced by load_parser
>>> trees = cp.nbest_parse(query.split())
>>> answer = trees[0].node['sem']
is replaced by
>>> trees = list(cp.parse(query.split()))
>>> answer = trees[0].label()['SEM']
>>> answer = [s for s in answer if s]


Text Merging - How to do this in Python? (R source)

I have tried several ways to translate this to Python, but none worked; I keep getting this error:
'str' object does not support item assignment
R can do the same with the following code:
f<-0
text<- c("foo", "btextr", "cool", "monsttex")
for (i in 1:length(text)){
f[i]<-paste(text[i],text[i+1], sep = "_")
}
f
The output is:
"foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
I would really appreciate help doing the same in Python. Thanks.
In R your output would have been (next time please put this in the question):
> f
[1] "foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
In Python strings are immutable. So you'll need to create new strings, e.g.:
new_strings = []
text = ['foo', 'btextr', 'cool', 'monsttex']
for i, t in enumerate(text):
    try:
        # Join each item with its successor
        new_strings.append(text[i] + '_' + text[i+1])
    except IndexError:
        # The last item has no successor, so mimic R's NA
        new_strings.append(text[i] + '_NA')
Which results in:
>>> new_strings
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']
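For context, the error quoted in the question is what Python raises when code assigns to a position inside a string. The asker's failing attempt is not shown, so the snippet below is only an assumed reconstruction of how the error can arise, together with the usual workaround of building a new string:

```python
# Assumed reconstruction: the original failing code is not shown in the question.
word = "foo"

try:
    word[0] = "F"          # strings are immutable, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Workaround: build a new string instead of modifying the old one in place.
capitalized = "F" + word[1:]

print(error_message)
print(capitalized)
```

A list of strings, as used in the answer above, avoids the problem because lists do support item assignment and append.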
This also works (Python 3, where the function is named zip_longest):
>>> from itertools import zip_longest
>>>
>>> f = ['foo', 'btextr', 'cool', 'monsttex']
>>>
>>> ['_'.join(i) for i in zip_longest(f, f[1:], fillvalue='NA')]
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']

Building regular expression for Python

\b(?:AN|AcntNumber) : (\w+)
The regex above prints the 'AcntNumber' label as well:
AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f
but I want only to print c422731c7c2a4f9cbe98fbfbf410265f. Can anyone help me please?
Split the string on ':' and take the second part to get the account number.
>>> string = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
>>> frags = string.split(':')
>>> number = frags[1].strip()
>>> number
'c422731c7c2a4f9cbe98fbfbf410265f'
Or:
>>> import re
>>> string = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
>>> e = r"\b(?:AN|AcntNumber) : (\w+)"
>>> ext = re.findall(e, string)
>>> ext[0]
'c422731c7c2a4f9cbe98fbfbf410265f'
>>>
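The question does not show how the match was printed, so this is a guess: if the code printed the whole match (group 0), the label comes back along with the value, whereas group 1 holds only the captured part. A minimal sketch, assuming the sample line from the question; the variable names are illustrative:

```python
import re

text = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
pattern = re.compile(r"\b(?:AN|AcntNumber) : (\w+)")

match = pattern.search(text)
if match:
    whole_match = match.group(0)   # everything the pattern matched, label included
    account_id = match.group(1)    # only the text captured by (\w+)
    print(whole_match)
    print(account_id)
```

re.findall returns only the captured group when the pattern contains exactly one group, which is why the findall version above yields just the value.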

AttributeError when trying to do a block partition with graph-tool

I am getting this error:
AttributeError: 'list' object has no attribute 'clear'
when trying to execute the example at this page
The example is:
>>> g = gt.collection.data["power"]
>>> bstack, mdl = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
>>> t = gt.get_hierarchy_tree(bstack)[0]
>>> tpos = pos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1), weighted=True)
>>> cts = gt.get_hierarchy_control_points(g, t, tpos)
>>> pos = g.own_property(tpos)
>>> b = bstack[0].vp["b"]
>>> gt.graph_draw(g, pos=pos, vertex_fill_color=b, vertex_shape=b, edge_control_points=cts,
... edge_color=[0, 0, 0, 0.3], vertex_anchor=0, output="power_nested_mdl.pdf")
<...>
and it gives me the exception when running the line:
>>> bstack, mdl = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
Any clue?
Thanks
list.clear() does not exist in Python 2; it was added in Python 3.3. The example runs without problem in Python 3.
Anyway, graph-tool is supposed to work on Python 2.7 and above, so this might as well be reported as a bug.
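As a side note, the in-place equivalent of list.clear() that also runs on Python 2 is slice deletion. A small sketch of that idiom (it shows the general technique only and is not a patch to graph-tool itself):

```python
# Illustrative only: this is not graph-tool's code.
items = [1, 2, 3]
alias = items          # second reference to the same list object

del items[:]           # empties the list in place; valid in Python 2 and 3

print(items, alias)
```

Because the list is emptied in place rather than rebound to a new object, every other reference to it sees the change as well, which matches the behaviour of list.clear().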

python max function with mixed strings and numbers

Could someone explain to me why the following code :
li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
print(max(li))
returns :
ansible-1.4.tar.gz
Thank you
PS: It returns 1.4.4 when there are only numbers (1.4, 1.4.4, etc)
Because they are compared lexicographically:
>>> ord('t'), ord('4')
(116, 52)
>>> 't' > '4'
True
>>> 'ansible-1.4.tar.gz' > 'ansible-1.4.4.tar.gz'
True
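To see exactly where the comparison is decided, one can look for the first position at which the two names differ. A small sketch; first_difference is a hypothetical helper written for this illustration, not a standard function:

```python
def first_difference(a, b):
    """Return the first index where a and b differ, or None if none is
    found within the length of the shorter string."""
    for index, (ch_a, ch_b) in enumerate(zip(a, b)):
        if ch_a != ch_b:
            return index
    return None

short = "ansible-1.4.tar.gz"
long_ = "ansible-1.4.4.tar.gz"

# Position where string comparison stops and the characters it compares there
pos = first_difference(short, long_)
print(pos, short[pos], long_[pos])
```

The strings agree up to that position, so the result of the whole comparison depends only on the two characters found there.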
To get ansible-1.4.4.tar.gz as the result, you need to pass a key function.
For example:
>>> li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
... u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
... u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
... u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
>>>
>>> import re
>>> def get_version(fn):
...     return list(map(int, re.findall(r'\d+', fn)))
...
>>> get_version(u'ansible-1.4.4.tar.gz')
[1, 4, 4]
>>> max(li, key=get_version)
'ansible-1.4.4.tar.gz'
Here is another good solution:
pkg_resources (a module distributed with setuptools, not part of the standard library) provides a parse_version function
>>> from pkg_resources import parse_version
>>> max(li, key=parse_version)
u'ansible-1.4.4.tar.gz'
>>>

Move an entire element in with lxml.etree

Within lxml, is it possible, given an element, to move the entire thing elsewhere in the XML document without having to read all of its children and recreate it? My best example would be changing parents. I've rummaged around the docs a bit but haven't had much luck. Thanks in advance!
.append, .insert and other operations do that by default
>>> from lxml import etree
>>> tree = etree.XML('<a><b><c/></b><d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree) # complete 'b'-branch is now under 'd', after 'e'
'<a><d><e><f/></e><b><c/></b></d></a>'
>>> node_f = tree.xpath('/a/d/e/f')[0] # Nothing stops us from moving it again
>>> node_f.append(node_b) # Now 'b' and its child are under 'f'
>>> etree.tostring(tree)
'<a><d><e><f><b><c/></b></f></e></d></a>'
Be careful when moving nodes that have tail text. In lxml, tail text belongs to the node and moves around with it. (Also, when you delete a node, its tail text is deleted too.)
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree)
'<a><d><e><f/></e><b><c/></b>TAIL</d></a>'
Sometimes that is the desired effect, but sometimes you will need something like this:
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_a = tree.xpath('/a')[0]
>>> # Manually move text
>>> node_a.text = node_b.tail
>>> node_b.tail = None
>>> node_d.append(node_b)
>>> # Now the TAIL text stays in its old place
>>> etree.tostring(tree)
'<a>TAIL<d><e><f/></e><b><c/></b></d></a>'
You could use .append(), .insert() methods to add a subelement to the existing element:
>>> from lxml import etree
>>> from_ = etree.fromstring("<from/>")
>>> to = etree.fromstring("<to/>")
>>> to.append(from_)
>>> etree.tostring(to)
'<to><from/></to>'
