Pyparsing : issues with setResultsName - python

I'm parsing multiple choice questions with multiple answers that look like this :
ParserElement.setDefaultWhitespaceChars(u""" \t""")
in_ = """1) first stem.
= option one one key
= option one two key
- option one three distractor
= option one four key
2) second stem ?
- option two one distractor
- option two two distractor
= option one three key
3) third stem.
- option three one key
= option three two distractor
"""
The equal sign represents a correct answer, the dash a distractor.
My grammar looks like this :
newline = Suppress(u"\n")
end_number = Suppress(oneOf(u') / ('))
end_stem = Suppress(oneOf(u"? .")) + newline
end_phrase = Optional(u'.').suppress() + newline
phrase = OneOrMore(Word(alphas)) + end_phrase
prefix = Word(u"-", max=1)('distractor') ^ Word(u"=", max=1)('key')
stem = Group(OneOrMore(Word(alphas))) + end_stem
number = Word(nums) + end_number
question = (number + stem('stem') +
            Group(OneOrMore(Group(prefix('prefix') + phrase('phrase'))))('options'))
And when I'm parsing the results:
for match, start, end in question.scanString(in_):
    for o in match.options:
        try:
            print('key', o.prefix.key)
        except:
            print('distractor', o.prefix.distractor)
I get :
AttributeError: 'unicode' object has no attribute 'distractor'
I'm pretty sure the result names are chainable. If so, what am I doing wrong ? I can easily work around this but it's unsatisfactory not knowing what I did wrong and what I misunderstood.

The problem is that o is actually the prefix -- when you call o.prefix, you're going one level deeper than you need to, and are retrieving the string the prefix maps to, not the ParseResults object.
You can see this by modifying the code so that it prints out the parse tree:
for match, start, end in question.scanString(in_):
    for o in match.options:
        print(o.asXML())
        try:
            print('key', o.prefix.key)
        except:
            print('distractor', o.prefix.distractor)
The code will then print out:
<prefix>
<key>=</key>
<phrase>option</phrase>
<ITEM>one</ITEM>
<ITEM>one</ITEM>
<ITEM>key</ITEM>
</prefix>
Traceback (most recent call last):
File "so07.py", line 37, in <module>
print('distractor', o.prefix.distractor)
AttributeError: 'str' object has no attribute 'distractor'
The problem then becomes clear -- if o is the prefix, then it doesn't make sense to do o.prefix. Rather, you need to simply call o.key or o.distractor.
Also, it appears that if you try and call o.key where no key exists, then pyparsing will return an empty string rather than throwing an exception.
So, your fixed code should look like this:
for match, start, end in question.scanString(in_):
    for o in match.options:
        if o.key != '':
            print('key', o.key)
        else:
            print('distractor', o.distractor)
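A minimal, self-contained sketch of the same idea, assuming pyparsing is installed. The grammar is a simplified stand-in for the one in the question (the names option and options are made up for illustration). It shows that a results name that did not match reads back as an empty string, so a truthiness test can replace the try/except:

```python
from pyparsing import Group, Literal, OneOrMore, Word, alphas

# Simplified, hypothetical stand-in for the grammar in the question.
prefix = Literal("-")("distractor") | Literal("=")("key")
option = Group(prefix + OneOrMore(Word(alphas)))
options = OneOrMore(option)

parsed = options.parseString("= first key - second distractor")

# A results name that did not match reads back as '' instead of raising,
# so each option group can be classified with a plain truthiness test.
labels = ["key" if o.key else "distractor" for o in parsed]
print(labels)
```

Because the names key and distractor are attached inside the alternation, they are visible directly on each option group, with no intermediate prefix level.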

Related

Check if character within string; pass if True, do stuff if False

I am writing code to process a list of URLs; however, some of the URLs have issues and I need to skip them in my for loop. I've tried this:
x_data = []
y_data = []
for item in drop['URL']:
    if re.search("J", str(item)) == True:
        pass
    else:
        print(item)
        var = urllib.request.urlopen(item)
        hdul = ft.open(var)
        data = hdul[0].data
        start = hdul[0].header['WMIN']
        finish = hdul[0].header['WMAX']
        start_log = np.log10(start)
        finish_log = np.log10(finish)
        redshift = hdul[0].header['Z']
        length = len(data[0])
        xaxis = np.linspace(start, finish, length)
        #calculating emitted wavelength from observed and redshift
        x_axis_nr = [xaxis[j]/(redshift+1) for j in range(len(xaxis))]
        gauss_kernel = Gaussian1DKernel(5/3)
        flux = np.convolve(data[0], gauss_kernel)
        wavelength = np.convolve(x_axis_nr, gauss_kernel)
        x_data.append(x_axis_nr)
        y_data.append(data[0])
where drop is a previously defined pandas DataFrame. Previous questions on this topic suggested regex might be the way to go, and I have tried this to filter out any URL containing the letter J (which are only the bad ones).
I get this:
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0582.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0584.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0587.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0589.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0592.fit
http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3+001155a.fit
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-2a3083a3a6d7> in <module>
14 finish_log = np.log10(finish)
15 redshift = hdul[0].header['Z']
---> 16 length = len(data[0])
17
18 xaxis = np.linspace(start, finish, length)
TypeError: object of type 'numpy.float32' has no len()
which is the same kind of error I was having before trying to remove J urls, so clearly my regex is not working. I would appreciate some advice on how to filter these, and am happy to provide more information as required.
There's no need to compare the result of re.search with True. From the documentation you can see that search returns a match object when a match is found:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
So comparing a match object with True evaluates to False, and your else branch is executed for every URL. Test the result directly instead (if re.search("J", str(item)):), since a match object is truthy and None is falsy.
In [35]: re.search('J', 'http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3+001155a.fit') == True
Out[35]: False
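A short sketch of the corrected filter, using two of the URLs from the question as sample data (the list name urls and the result list kept are made up for illustration):

```python
import re

urls = [
    "http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit",
    "http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3+001155a.fit",
]

kept = []
for item in urls:
    # re.search returns a match object (truthy) or None (falsy), never True.
    if re.search("J", str(item)):
        continue  # skip the problematic URL
    kept.append(item)

print(kept)
```

For a single literal character, the plain substring test "J" in str(item) does the same job without a regular expression.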

python googlemaps all possible distances between different locations

schools=['GSGS','GSGL','JKG','JMG','MCGD','MANGD','SLSA','WHGR','WOG','GCG','LP',
'PGG', 'WVSG', 'ASGE','CZG', 'EAG','GI']
for i in range(1, 17):
    gmaps = googlemaps.Client(key='')
    distances = gmaps.distance_matrix((GSGS), (schools), mode="driving")['rows'][0]['elements'][0]['distance']['text']
    print(distances)
The elements of the list are schools. I didn't want to make the list too long, so I used these abbreviations.
I want to get all the distances between "GSGS" and the schools in the list. I don't know what to write inside the second bracket.
distances = gmaps.distance_matrix((GSGS), (schools)
If I run it like that, it outputs this error:
Traceback (most recent call last):
File "C:/Users/helpmecoding/PycharmProjects/untitled/distance.py", line 31, in
<module>
distances = gmaps.distance_matrix((GSGS), (schools), mode="driving")['rows'][0]['elements'][0]['distance']['text']
KeyError: 'distance'
I could do it one by one, but that's not what I want. If I write another school from the list schools and delete the for loop, it works fine.
I know I have to write a loop that cycles through the list, but I don't know how to do it. Each variable, for example "GSGS", holds the address/location of the school.
I deleted the key just for safety.
My Dad helped me and we solved the problem. Now I have what I want :) Next I have to make a list of all distances between the schools, and once I have that I have to apply Dijkstra's algorithm to find the shortest route between them. Thanks for helping!
import googlemaps
GSGS = (address)
GSGL = (address)
. . .
. . .
. . .
schools = (GSGS,GSGL,JKG,JMG,MCGD,MANGD,SLSA,WHGR,WOG,GCG,LP,PGG,WVSG,ASGE,CZG,EAG,GI)
school_names = ("GSGS","GSGL","JKG","JMG","MCGD","MANGD","SLSA","WHGR","WOG","GCG","LP","PGG","WVSG","ASGE","CZG","EAG","GI")
school_distances = ()
for g in range(0,len(schools)):
    n = 0
    for i in schools:
        gmaps = googlemaps.Client(key='TOPSECRET')
        distances = gmaps.distance_matrix(schools[g], i)['rows'][0]['elements'][0]['distance']['text']
        if school_names[g] != school_names[n]:
            print(school_names[g] + " - " + school_names[n] + " " + distances)
        else:
            print(school_names[g] + " - " + school_names[n] + " " + "0 km")
        n = n + 1
In my experience, it is sometimes difficult to know what is going on when you use a third-party API. Though I am not a proponent of reinventing the wheel, sometimes it is necessary to get a full picture of what is going on. So I recommend building your own API endpoint request and seeing whether that works.
import requests
schools = ['GSGS','GSGL','JKG','JMG','MCGD','MANGD','SLSA','WHGR','WOG','GCG','LP','PGG', 'WVSG', 'ASGE','CZG', 'EAG','GI']
def gmap_dist(apikey, origins, destinations, **kwargs):
    units = kwargs.get("units", "imperial")
    mode = kwargs.get("mode", "driving")
    # the API expects several destinations as one pipe-separated string
    if not isinstance(destinations, str):
        destinations = "|".join(destinations)
    baseurl = "https://maps.googleapis.com/maps/api/distancematrix/json?"
    urlargs = {"key": apikey, "units": units, "origins": origins, "destinations": destinations, "mode": mode}
    req = requests.get(baseurl, params=urlargs)
    data = req.json()
    print(data)
    # do this for each key and index pair until you
    # find the one causing the problem if it
    # is not immediately evident from the whole data print
    print(data["rows"])
    print(data["rows"][0])
    # Check if there are elements
    try:
        distances = data['rows'][0]['elements'][0]['distance']
    except KeyError:
        raise KeyError("No elements found")
    except IndexError:
        raise IndexError("API Request Error. No response returned")
    else:
        return distances
Also as a general rule of thumb it is good to have a test case to make sure things are working as they should before testing the whole list,
#test case
try:
    test = gmap_dist(apikey="", units="imperial", origins="GSGS", destinations="GSGL", mode="driving")
except Exception as err:
    raise Exception(err)
else:
    dists = gmap_dist(apikey="", units="imperial", origins="GSGS", destinations=schools, mode="driving")
    print(dists)
Lastly, if you are testing the distance from "GSGS" to other schools, then you might want to get it out of your list of schools as the distance will be 0.
Now, I suspect that the reason you are getting this exception is that no JSON elements were returned, probably because one of your parameters was improperly formatted.
If this function still raises a KeyError, check the address spelling and make sure your API key is valid. Although if the API key were the problem, I would not expect the service to return even empty results.
Hope this helps. Comment if it doesn't work.
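One possible cause of KeyError: 'distance', offered as an assumption rather than a diagnosis: in a Distance Matrix response each element carries its own status, and an element whose status is not OK has no distance entry. The sketch below works on a hand-written dictionary shaped like such a response (the sample values and the helper name distance_texts are made up), so it runs without an API key:

```python
def distance_texts(response, destination_names):
    """Map each destination name to its distance text, or None if unavailable."""
    elements = response["rows"][0]["elements"]
    result = {}
    for name, element in zip(destination_names, elements):
        # Only elements with status "OK" carry a "distance" entry.
        if element.get("status") == "OK":
            result[name] = element["distance"]["text"]
        else:
            result[name] = None
    return result

# Hand-written sample for illustration, not real API output.
sample = {
    "status": "OK",
    "rows": [{"elements": [
        {"status": "OK", "distance": {"text": "3.2 km", "value": 3200}},
        {"status": "NOT_FOUND"},
    ]}],
}

print(distance_texts(sample, ["GSGL", "JKG"]))
```

Note that indexing ['elements'][0] only ever reads the first destination; iterating over all elements yields one entry per destination in a single request.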

Error building bitstring in python 3.5 : the datatype is being set to U32 without my control

I'm using a function to build an array of strings (which happens to be 0s and 1s only), which are rather large. The function works when I am building smaller strings, but somehow the data type seems to be restricting the size of the string to 32 characters long (U32), without my having asked for it. Am I missing something simple?
As I build the strings, I first cast them to lists so that I can more easily manipulate individual characters before joining them into a string again. Am I somehow limiting my ability to use 'larger' data types by my method? The value of np.max(CM1) in this case is something like ~300 (one recent run yielded 253), but the strings only come out 32 characters long...
''' Function to derive genome and count mutations in provided list of cells '''
def derive_genome_biopsy(biopsy_list, family_dict, CM1):
    derived_genomes_inBx = np.zeros(len(biopsy_list)).astype(str)
    for position, cell in np.ndenumerate(biopsy_list):
        if cell == 0: continue
        temp_parent = 2
        bitstring = list('1')
        bitstring += (np.max(CM1)-1)*'0'
        if cell == 1:
            derived_genomes_inBx[position] = ''.join(bitstring)
            continue
        else:
            while temp_parent > 1:
                temp_parent = family_dict[cell]
                bitstring[cell-1] = '1'
                if temp_parent == 1: break
                cell = family_dict[cell]
            derived_genomes_inBx[position] = ''.join(bitstring)
    return derived_genomes_inBx
The specific error message I get is:
Traceback (most recent call last):
File "biopsyCA.py", line 77, in <module>
if genome[site] == '1':
IndexError: string index out of range
family_dict is a dictionary which carries a list of parents and children that the algorithm above works through to reconstruct the 'genome' of individuals from the branching family tree. it basically sets positions in the bitstring to '1' if your parent had it, then if your grandparent etc... until you get to the first bit, which is always '1', then it should be done.
The 32 character limitation comes from the conversion of float64 array to string array in this line:
derived_genomes_inBx = np.zeros(len(biopsy_list)).astype(str)
The resulting array has a fixed-width string dtype (U32 on Python 3, S32 on Python 2), which limits each element to 32 characters.
To change this limit, use 'U300' (or larger) instead of str.
You may also use list(map(str, np.zeros(len(biopsy_list)))) to get a more flexible list of strings, and convert it back to a NumPy array with numpy.array() after you have populated it.
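The truncation can be reproduced in a few lines. This is a small sketch assuming NumPy is installed; the array names fixed, wide and flexible are made up for illustration, and the object-dtype array is an additional option not mentioned in the answer:

```python
import numpy as np

# Converting a float array with astype(str) yields a fixed-width string dtype.
fixed = np.zeros(2).astype(str)
fixed[0] = "1" * 40               # silently truncated to the dtype's width
print(fixed.dtype, len(fixed[0]))

# An explicit wider dtype leaves room for up to 300 characters per element.
wide = np.zeros(2).astype("U300")
wide[0] = "1" * 40
print(len(wide[0]))

# An object array stores ordinary Python strings of any length.
flexible = np.empty(2, dtype=object)
flexible[0] = "1" * 400
print(len(flexible[0]))
```

The assignment into the fixed-width array raises no error or warning, which is why the problem only surfaces later as an IndexError when the string is indexed past its truncated length.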
Thanks to help from a number of folks here and local, I finally got this working and the working function is:
''' Function to derive genome and count mutations in provided list of cells '''
def derive_genome_biopsy(biopsy_list, family_dict, CM1):
    derived_genomes_inBx = list(map(str, np.zeros(len(biopsy_list))))
    for biopsy in range(0,len(biopsy_list)):
        if biopsy_list[biopsy] == 0:
            bitstring = (np.max(CM1))*'0'
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
            continue
        bitstring = list('1')
        bitstring += (np.max(CM1)-1)*'0'
        if biopsy_list[biopsy] == 1:
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
            continue
        else:
            temp_parent = family_dict[biopsy_list[biopsy]]
            bitstring[biopsy_list[biopsy]-1] = '1'
            while temp_parent > 1:
                temp_parent = family_dict[position]
                bitstring[temp_parent-1] = '1'
                if temp_parent == 1: break
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
    return derived_genomes_inBx
The original problem was, as Teppo Tammisto pointed out, that the str dtype produced fixed-width 32-character strings. Once I changed to list(map(str, ...)), a few more issues arose with the original code, which I've now fixed. When I finish this thesis chapter, I'll publish the whole family of functions to use to virtually 'biopsy' a cellular automaton model (well, just an array really) and reconstruct 'genomes' from family tree data and the current automaton state vector.
Thanks all!

KeyError with Python dictionary

I've been practicing on a ><> (Fish) interpreter and am stuck on an error I'm getting. The problematic code seems to be here:
import sys
from random import randint
file = sys.argv[1]
code = open(file)
program = code.read()
print(str(program))
stdin = sys.argv[2]
prgmlist = program.splitlines()
length = len(prgmlist)
prgm = {}
for x in range(0,length-1):
    prgm[x+1] = list(prgmlist[x])
The goal here was to take the code and put it into a sort of grid, so that each command could be taken and computed separately. By grid, I mean a map to a list:
{line1:["code","code","code"]
line2:["code","code","code"]
line3:...}
and so on.
However, when I try to retrieve a command using cmd = prgm[y][x] it gives me KeyError: 0.
Any help is appreciated.
Here's a traceback:
Traceback (most recent call last):
File "/Users/abest/Documents/Python/><>_Interpreter.py", line 270, in <module>
cmd = prgm[cmdy][cmdx]
KeyError: 0
And a pastebin of the entire code.
The input is the hello world program from the wiki page:
!v"hello, world"r!
>l?!;o
A few issues:
You are not considering the last line, since your range is for x in range(0,length-1): and the stop argument of range is exclusive, so x never reaches length-1. You do not actually need the length or range at all; you can simply use for i, x in enumerate(prgmlist):. In each iteration, enumerate() returns the index as well as the current element.
for i, x in enumerate(prgmlist, 1):
    prgm[i] = list(x)
Secondly, from your actual code it seems that you define cmdx initially as 0, but in your for loop (as given above) the dictionary keys start at 1. So you should start that index at 1. Example:
stacks, str1, str2, cmdx, cmdy, face, register, cmd = {"now":[]}, 0, 0, 1, 0, "E", 0, None
And you should start cmdy from 0. It seems you had the two reversed.
You'll want to use something like
cmd = prgm[x][y]
the first part prgm[x] will access the list that's the value for the x key in the dictionary then [y] will pull the yth element from the list.
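To make the indexing concrete, here is a small sketch using the hello world program from the question. It keys rows from 0 rather than 1, a deliberate simplification so that rows and columns count the same way, and the helper command_at is a made-up name for illustration:

```python
program = '!v"hello, world"r!\n>l?!;o'

# Key each line by its 0-based row index; each value is that line's characters.
prgm = {row: list(line) for row, line in enumerate(program.splitlines())}

def command_at(grid, row, col):
    """Return the command at (row, col), or None when outside the grid."""
    line = grid.get(row)
    if line is None or not 0 <= col < len(line):
        return None
    return line[col]

print(command_at(prgm, 0, 1))
print(command_at(prgm, 5, 0))
```

Whichever base is chosen, the first index must be the dictionary key (the line) and the second the position within that line, and the starting values of the cursor variables must agree with the keys actually stored.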

Improving error messages with pyparsing

Edit: I wrote a first version, and Eike helped me improve it quite a bit. I'm now stuck on a more specific problem, which I describe below. You can look at the original question in the edit history.
I'm using pyparsing to parse a small language used to request specific data from a database. It features numerous keywords, operators and datatypes, as well as boolean logic.
I'm trying to improve the error message shown to users when they make a syntax error, since the current one is not very useful. I designed a small example, similar to what I'm doing with the aforementioned language but much smaller:
#!/usr/bin/env python
from pyparsing import *
def validate_number(s, loc, tokens):
    if int(tokens[0]) != 0:
        raise ParseFatalException(s, loc, "number must be 0")
def fail(s, loc, tokens):
    raise ParseFatalException(s, loc, "Unknown token %s" % tokens[0])
def fail_value(s, loc, expr, err):
    raise ParseFatalException(s, loc, "Wrong value")
number = Word(nums).setParseAction(validate_number).setFailAction(fail_value)
operator = Literal("=")
error = Word(alphas).setParseAction(fail)
rules = MatchFirst([
    Literal('x') + operator + number,
])
rules = operatorPrecedence(rules | error , [
    (Literal("and"), 2, opAssoc.RIGHT),
])
def try_parse(expression):
    try:
        rules.parseString(expression, parseAll=True)
    except Exception as e:
        msg = str(e)
        print("%s: %s" % (msg, expression))
        print(" " * (len("%s: " % msg) + (e.loc)) + "^^^")
So basically, the only thing we can do with this language is write a series of x = 0 clauses, joined together with and and parentheses.
Now, there are cases where and and parentheses are used in which the error reporting is not very good. Consider the following examples:
>>> try_parse("x = a and x = 0") # This one is actually good!
Wrong value (at char 4), (line:1, col:5): x = a and x = 0
^^^
>>> try_parse("x = 0 and x = a")
Expected end of text (at char 6), (line:1, col:1): x = 0 and x = a
^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (x = a)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (x = a)))
^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))
^^^
It seems that if the parser can't parse (and parse here is important) something after an and, it no longer produces good error messages :(
I do mean parse: if it can parse 5 but the "validation" fails in the parse action, it still produces a good error message. But if it can't parse a valid number (like a) or a valid keyword (like xxxxxx), it stops producing the right error messages.
Any idea?
Pyparsing will always have somewhat bad error messages, because it backtracks. The error message is generated in the last rule that the parser tries. The parser can't know where the error really is, it only knows that there is no matching rule.
For good error messages you need a parser that gives up early. These parsers are less flexible than Pyparsing, but most conventional programming languages can be parsed with such parsers. (C++ and Scala IMHO can't.)
To improve error messages in Pyparsing, use the - operator. It works like the + operator, but it does not backtrack. You would use it like this:
assignment = Literal("let") - varname - "=" - expression
Here is a small article on improving error reporting, by Pyparsing's author.
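The difference between + and - can be observed from the exception type. This is a minimal sketch assuming pyparsing is installed; varname and expression are placeholder definitions (the answer does not show them), and error_type is a made-up helper:

```python
from pyparsing import (Keyword, ParseException, ParseSyntaxException,
                       Word, alphas, nums)

varname = Word(alphas)
expression = Word(nums)  # placeholder: the real expression grammar is not shown

with_plus = Keyword("let") + varname + "=" + expression
with_minus = Keyword("let") - varname - "=" - expression

def error_type(parser, text):
    """Return the name of the exception class raised while parsing, or None."""
    try:
        parser.parseString(text)
    except ParseSyntaxException:
        return "ParseSyntaxException"  # failure after a '-': no backtracking
    except ParseException:
        return "ParseException"        # ordinary failure that allows backtracking
    return None

print(error_type(with_plus, "let x = a"))
print(error_type(with_minus, "let x = a"))
```

A failure after a - is reported as a ParseSyntaxException, which enclosing alternatives do not catch and retry, so the error location stays at the point where the input actually went wrong.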
Edit
You could also generate good error messages for the invalid numbers in the parse actions that do the validation. If the number is invalid you raise an exception that is not caught by Pyparsing. This exception can contain a good error message.
Parse actions can have three arguments [1]:
s = the original string being parsed (see note below)
loc = the location of the matching substring
toks = a list of the matched tokens, packaged as a ParseResults object
There are also three useful helper methods for creating good error messages [2]:
lineno(loc, string) - function to give the line number of the location within the string; the first line is line 1, newlines start new rows.
col(loc, string) - function to give the column number of the location within the string; the first column is column 1, newlines reset the column number to 1.
line(loc, string) - function to retrieve the line of text representing lineno(loc, string). Useful when printing out diagnostic messages for exceptions.
Your validating parse action would then be like this:
def validate_odd_number(s, loc, toks):
    value = toks[0]
    value = int(value)
    if value % 2 == 0:
        raise MyFatalParseException(
            "not an odd number. Line {l}, column {c}.".format(l=lineno(loc, s),
                                                              c=col(loc, s)))
[1] http://pythonhosted.org/pyparsing/pyparsing.pyparsing.ParserElement-class.html#setParseAction
[2] HowToUsePyparsing
Edit
Here [3] is an improved version of the question's current (2013-4-10) script. It gets the example errors right, but other errors are indicated at the wrong position. I believe there are bugs in my version of Pyparsing ('1.5.7'), but maybe I just don't understand how Pyparsing works. The issues are:
ParseFatalException does not always seem to be fatal. The script works as expected when I use my own exception.
The - operator does not seem to work.
[3] http://pastebin.com/7E4kSnkm
