Python pyparsing issue - python

I am very new to Python and am using pyparsing, but I get an exception with the following code:
while site_contents.find('---', line_end) != line_end + 2:
    cut_start = site_contents.find(" ", site_contents.find("\r\n", start))
    cut_end = site_contents.find(" ", cut_start+1)
    line_end = site_contents.find("\r\n", cut_end)
    name = site_contents[cut_start:cut_end].strip()
    float_num = Word(nums + '.').setParseAction(lambda t:float(t[0]))
    nonempty_line = Literal(name) + Word(nums+',') + float_num + Suppress(Literal('-')) + float_num * 2
    empty_line = Literal(name) + Literal('-')
    line = nonempty_line | empty_line
    parsed = line.parseString(site_contents[cut_start:line_end])
    start = line_end
Exception:
Traceback (most recent call last):
  File "D:\Ecllipse_Python\HellloWorld\src\HelloPython.py", line 108, in <module>
    parsed = line.parseString(site_contents[cut_start:line_end]) # parse line of data following cut name
  File "C:\Users\arbatra\AppData\Local\Continuum\Anaconda\lib\site-packages\pyparsing.py", line 1041, in parseString
    raise exc
pyparsing.ParseException: Expected W:(0123...) (at char 38), (line:1, col:39)
How can I resolve this issue?

You'll get a somewhat better exception message if you give names to your expressions using setName. From the "Expected W:(0123...)" part of the exception message, it looks like the parser is not finding a numeric value where one is expected, but the default name does not tell us which kind of numeric field that is. Modify your parser to add setName as shown below, and also change the definition of nonempty_line:
float_num = Word(nums + '.').setParseAction(lambda t:float(t[0])).setName("float_num")
integer_with_commas = Word(nums + ',').setName("int_with_commas")
nonempty_line = Literal(name) + integer_with_commas + float_num + Suppress(Literal('-')) + float_num * 2
I would also preface the call to parseString with:
print site_contents[cut_start:line_end]
at least while you are debugging. Then you can compare the string being parsed with the error message, including the column number where the parse error is occurring, as given in your posted example as "(at char 38), (line:1, col:39)". "char xx" starts with the first character as "char 0"; "col:xx" starts with the first column as "col:1".
These code changes might help you pinpoint your problem:
print "12345678901234567890123456789012345678901234567890"
print site_contents[cut_start:line_end]
try:
    parsed = line.parseString(site_contents[cut_start:line_end])
except ParseException as pe:
    print pe.loc*' ' + '^'
    print pe
Be sure to run this in a window that uses a monospaced font (so that all the character columns line up, and all characters are the same width as each other).
Once you've done this, you may have enough information to fix the problem yourself, or you'll have some better output to edit into your original question so we can help you better.
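The ruler-and-caret idea above can be tried without pyparsing at all. The following is a minimal sketch in Python 3; the helper names caret_report and col_from_loc, the sample text, and the offset 18 are made up for illustration. In real code the offset would come from pe.loc on the caught ParseException.

```python
def caret_report(text, loc):
    """Return a ruler, the text, and a caret under the 0-based offset `loc`."""
    # Ruler digits repeat 1234567890 so columns can be counted by eye.
    ruler = "".join(str((i + 1) % 10) for i in range(len(text)))
    return "\n".join([ruler, text, " " * loc + "^"])


def col_from_loc(loc):
    """Convert a 0-based 'char' offset to a 1-based 'col' (single-line input only)."""
    return loc + 1


# Hypothetical line of data; the 'x' at offset 18 is where a number was expected.
sample = "cut_a 1,234 0.5 - x.7 0.9"
report = caret_report(sample, 18)
print(report)
print("char 18 corresponds to col", col_from_loc(18))
```

With the values from the posted exception, char 38 maps to col 39 in the same way, which is why the two numbers in the message differ by one.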

Related

PyParsing and multi-line syslog messages

I have copy-pasted a PyParsing syslog parser from here and there.
It's all nice and fluffy, but I have some Syslog messages that look non-compliant to the "standard":
Apr 2 09:23:09 dawn Java App[537]: [main] ERROR ch.java.core.Verifier - Unknown validation error
java.lang.NullPointerException
    at org.databin.cms.CMSSignedData.getSignedData(Unknown Source)
    at org.databin.cms.CMSSignedData.<init>(Unknown Source)
    at org.databin.cms.CMSSignedData.<init>(Unknown Source)
And so on. Now with my PyParsing grammar I go through syslog.log line by line.
def main():
    with open("system.log", "r") as myfile:
        data = myfile.readlines()
    pattern = Parser()._pattern
    pattern.runTests(data)
if __name__ == '__main__':
    main()
I somehow need to handle multi-line syslog messages. Either I need to:
- attach the many lines of these Java exceptions to the syslog message that has already been parsed, or
- make the left side optional.
I don't know. Right now my implementation fails, because it assumes a new line is logged by a new app. Which would be... usual... unless Java...
Traceback (most recent call last):
  File "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 39, in <module>
    main()
  File "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 34, in main
    pattern.runTests(data)
  File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2305, in runTests
    if comment is not None and comment.matches(t, False) or comments and not t:
  File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2205, in matches
    self.parseString(_ustr(testString), parseAll=parseAll)
  File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 1622, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 1383, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2410, in parseImpl
    if (instring[loc] == self.firstMatchChar and
IndexError: string index out of range
Does anyone know a simple way to avoid failure here?
from pyparsing import Word, alphas, Suppress, Combine, nums, string, Regex, Optional, ParserElement, LineEnd, OneOrMore, \
    unicodeString, White
import sys
from datetime import datetime
class Parser(object):
    # log lines don't include the year, but if we don't provide one, datetime.strptime will assume 1900
    ASSUMED_YEAR = str(datetime.now().year)
    def __init__(self):
        ints = Word(nums)
        ParserElement.setDefaultWhitespaceChars(" \t")
        NL = Suppress(LineEnd())
        unicodePrintables = u''.join(unichr(c) for c in xrange(sys.maxunicode)
                                     if not unichr(c).isspace())
        # priority
        # priority = Suppress("<") + ints + Suppress(">")
        # timestamp
        month = Word(string.ascii_uppercase, string.ascii_lowercase, exact=3)
        day = ints
        hour = Combine(ints + ":" + ints + ":" + ints)
        timestamp = month + day + hour
        # a parse action will convert this timestamp to a datetime
        timestamp.setParseAction(
            lambda t: datetime.strptime(Parser.ASSUMED_YEAR + ' ' + ' '.join(t), '%Y %b %d %H:%M:%S'))
        # hostname
        # usually hostnames follow some convention
        hostname = Word(alphas + nums + "_-.")
        # appname
        # if you call your app "my big fat app with a very long name" go away
        appname = (Word(alphas + nums + "/-_.()") + Optional(Word(" ")) + Optional(Word(alphas + nums + "/-_.()")))(
            "appname") + (Suppress("[") + ints("pid") + Suppress("]")) | (Word(alphas + "/-_.")("appname"))
        appname.setName("appname")
        # message
        # supports messages with printed unicode
        message = Combine(OneOrMore(Word(unicodePrintables) | OneOrMore("\t") | OneOrMore(" "))) + Suppress(OneOrMore(NL))
        messages = OneOrMore(message) # does not work
        # pattern build
        # (add results names to make it easier to access parsed fields)
        self._pattern = timestamp("timestamp") + hostname("hostname") + Optional(appname) + Optional(Suppress(':')) + messages("message")
    def parse(self, line):
        if line.strip():
            parsed = self._pattern.parseString(line)
            return parsed.asDict()
The partly parsed result is:
[datetime.datetime(2018, 4, 2, 9, 23, 9), 'dawn', 'Java', 'App', '537', '[main] ERROR ch.databin.core.Verifier - Unknown validation error']
- appname: ['Java', 'App']
- hostname: 'dawn'
- message: '[main] ERROR ch.databin.core.Verifier - Unknown validation error'
- pid: '537'
- timestamp: datetime.datetime(2018, 4, 2, 9, 23, 9)
It only contains the first line.
So for syslog messages without linebreaks this works.
The simplest solution is to go back to parsing a line at a time and keep the valid log lines in a list. If a line parses as a valid log line, append its parsed result to the list; if it does not, append the raw line to the 'message' item of the last entry in the list.
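from pyparsing import ParseException
def main():
    valid_log_lines = []
    with open("system.log", "r") as myfile:
        data = myfile.read()
    # parse() is a method of the Parser class, not of the pyparsing pattern
    parser = Parser()
    for line in data.splitlines():
        try:
            log_dict = parser.parse(line)
            if log_dict is None:
                # blank line: parse() returned nothing
                continue
        except ParseException:
            # not a new log record: treat it as a continuation of the previous one
            if valid_log_lines:
                valid_log_lines[-1]['message'] += '\n' + line
        else:
            valid_log_lines.append(log_dict)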
To speed up detection of invalid lines, try adding timestamp.leaveWhitespace(), so that any line that does not start with a timestamp in column 1 will immediately fail.
Alternatively, you can modify your parser to handle multi-line log messages, but that is a longer topic.
I like that you were using runTests, but that is more of a development tool; in your actual code, probably use parseString or one of its ilk.
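The line-at-a-time approach can be sketched independently of the grammar. The example below is a minimal Python 3 illustration that swaps the pyparsing pattern for a plain regular expression, purely to keep it self-contained. The RECORD_START pattern, the fold_continuations helper, and the second "kernel" record are assumptions made for this sketch, not part of the original parser.

```python
import re

# Assumption: a new syslog record starts in column 1 with "Mon D HH:MM:SS ".
RECORD_START = re.compile(r'^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ')


def fold_continuations(lines):
    """Group raw lines into records; lines without a timestamp join the previous record."""
    records = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # skip blank lines
        if RECORD_START.match(line):
            records.append(line)
        elif records:
            # continuation line (for example a Java stack frame)
            records[-1] += '\n' + line
        # a continuation with no preceding record is dropped
    return records


sample = [
    "Apr 2 09:23:09 dawn Java App[537]: [main] ERROR ch.java.core.Verifier - Unknown validation error\n",
    "java.lang.NullPointerException\n",
    "at org.databin.cms.CMSSignedData.getSignedData(Unknown Source)\n",
    "Apr 2 09:23:10 dawn kernel[0]: hypothetical second record\n",
]
records = fold_continuations(sample)
for record in records:
    print(repr(record))
```

Each folded record could then be handed to the real parser, with only the first line going through the grammar and the remaining lines kept as part of the message.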

Simple string concatenation fails

I have the following code:
from yahoo_finance import Currency
symbolslist = ["EURUSD","EURGBP","EURJPY","EURRUB","USDCAD","USDCHF","AUSUSD"]
for i in range(len(symbolslist)):
    symbol = symbolslist[i]
    nomisma = Currency(symbol).get_rate()
    quota = symbol + " = " + nomisma
    print quota
And I get the result:
EURUSD = 1.0891
EURGBP = 0.7322
EURJPY = 129.7440
EURRUB = 63.0560
USDCAD = 1.2614
USDCHF = 0.9622
Traceback (most recent call last):
File "yahoopy.py", line 13, in <module>
quota = symbol + " = " + nomisma
TypeError: cannot concatenate 'str' and 'NoneType' objects
I'm aware that this error has been talked about in this link.
But I was hoping that I could overcome this bug without resorting to mysql.
The problem is a typo. Instead of AUDUSD you wrote AUSUSD. Fix it and the error will be gone:
symbolslist = ["EURUSD","EURGBP","EURJPY","EURRUB","USDCAD","USDCHF","AUDUSD"]
Still, it is a good idea to use format, as @BhargavRao suggested, so that such bugs show up in the output instead of raising an exception.
Add an if clause above your concatenation statement:
if nomisma:
    quota = symbol + " = " + nomisma
Assumption: AUSUSD is not present in the database, which is why Currency(symbol).get_rate() returns None. Also, as mentioned here, the symbol is AUDUSD, not AUSUSD.
Note: it is better to use format to build the string, as in
quota = "{} = {}".format(symbol,nomisma)
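To see both suggestions side by side without the yahoo_finance package, here is a minimal Python 3 sketch. The fake_get_rate function and its rate table are hypothetical stand-ins for Currency(symbol).get_rate(), assumed to return None for an unknown symbol as the traceback implies.

```python
def fake_get_rate(symbol):
    """Hypothetical stand-in for Currency(symbol).get_rate(); None for unknown symbols."""
    rates = {"EURUSD": "1.0891", "EURGBP": "0.7322"}
    return rates.get(symbol)


def describe(symbol, rate):
    """Build the output line, flagging a missing rate instead of raising TypeError."""
    if rate is None:
        return "{} = <no rate returned>".format(symbol)
    return "{} = {}".format(symbol, rate)


lines = [describe(s, fake_get_rate(s)) for s in ["EURUSD", "AUSUSD"]]
for line in lines:
    print(line)
```

Checking for None explicitly, rather than relying on truthiness, keeps a legitimate but falsy value from being skipped by accident.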

Python - NameError

I have the following code that uses 3 strings 'us dollars','euro', '02-11-2014',
and a number to calculate the exchange rate for that given date. I modified the
code to pass those arguments but I get an error when I try to call it with
python currencyManager.py "us dollars" "euro" 100 "02-11-2014"
Traceback (most recent call last):
  File "currencyManager.py", line 37, in <module>
    currencyManager(currTo,currFrom,currAmount,currDate)
NameError: name 'currTo' is not defined
I'm fairly new to Python so my knowledge is limited. Any help would be greatly appreciated. Thanks.
Also the version of Python I'm using is 3.4.2.
import urllib.request
import re
def currencyManager(currTo,currFrom,currAmount,currDate):
    try:
        currency_to = currTo #'us dollars'
        currency_from = currFrom #'euro'
        currency_from_amount = currAmount
        on_date = currDate # Day-Month-Year
        currency_from = currency_from.replace(' ', '+')
        currency_to = currency_to.replace(' ', '+')
        url = 'http://www.wolframalpha.com/input/?i=' + str(currency_from_amount) + '+' + str(currency_from) + '+to+' + str(currency_to) + '+on+' + str(on_date)
        req = urllib.request.Request(url)
        output = ''
        urllib.request.urlopen(req)
        page_fetch = urllib.request.urlopen(req)
        output = page_fetch.read().decode('utf-8')
        search = '<area shape="rect.*href="\/input\/\?i=(.*?)\+.*?&lk=1'
        result = re.findall(r'' + search, output, re.S)
        if len(result) > 0:
            amount = float(result[0])
            print(str(amount))
        else:
            print('No match found')
    except URLError as e:
        print(e)
currencyManager(currTo,currFrom,currAmount,currDate)
The command line
python currencyManager.py "us dollars" "euro" 100 "02-11-2014"
does not automatically assign "us dollars" "euro" 100 "02-11-2014" to currTo,currFrom,currAmount,currDate.
Instead the command line arguments are stored in a list, sys.argv.
You need to parse sys.argv and/or pass its values on to the call to currencyManager:
For example, change
currencyManager(currTo,currFrom,currAmount,currDate)
to
import sys
currencyManager(*sys.argv[1:5])
The first element in sys.argv is the script name. Thus sys.argv[1:5] consists of the next 4 arguments after the script name (assuming 4 arguments were entered on the command line.) You may want to check that the right number of arguments are passed on the command line and that they are of the right type. The argparse module can help you here.
The * in *sys.argv[1:5] unpacks the list sys.argv[1:5] and passes the items in the list as arguments to the function currencyManager.
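As a rough illustration of the argparse suggestion, the sketch below validates the four arguments before calling the function. It is written for Python 3; currency_manager here is a simplified stand-in that only formats its inputs, and the argument names are assumptions chosen for this example.

```python
import argparse


def currency_manager(curr_to, curr_from, curr_amount, curr_date):
    """Simplified stand-in for the real function: just describes the request."""
    return "{} {} -> {} on {}".format(curr_amount, curr_from, curr_to, curr_date)


def parse_args(argv):
    """Parse exactly four positional arguments; the amount must be numeric."""
    parser = argparse.ArgumentParser(description="Convert an amount between currencies")
    parser.add_argument("curr_to")
    parser.add_argument("curr_from")
    parser.add_argument("curr_amount", type=float)
    parser.add_argument("curr_date")
    return parser.parse_args(argv)


# In a script this would be parse_args(sys.argv[1:]); a fixed list is used here.
args = parse_args(["us dollars", "euro", "100", "02-11-2014"])
summary = currency_manager(args.curr_to, args.curr_from, args.curr_amount, args.curr_date)
print(summary)
```

With a wrong argument count or a non-numeric amount, argparse prints a usage message and exits instead of failing later inside the function.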

Syntax error in for line in python

I am running a Python script and getting an unexplained syntax error on the for line.
This is the code:
today = datetime.date.today()
url="http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
print "INSERT INTO Property (URL,Rooms, Place, Phonenumber1,Phonenumber2,Phonenumber3,Typeofperson, Name)"
print "VALUES ("
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
properties = soup.findAll(('a', {'title':re.compile('Bedroom')}),('i',{'class':'pdate'})
for eachproperty in properties:
    print today,","+ "http:/" + eachproperty['href'] ",", eachproperty.string"," ,.join(re.findall("'([a-zA-Z0-9,\s]*)'", eachproperty['onclick']))
print ")"
Error is
$ python properties.py
  File "properties.py", line 15
    for eachproperty in properties:
                                  ^
SyntaxError: invalid syntax
Update
Is the following line correct ?
properties = soup.findAll(('a', {'title':re.compile('Bedroom')}),('i',{'class':'pdate'}))
The preceding line has more opening parentheses than closing ones, so Python is still reading the unfinished expression when it reaches the for line:
properties = soup.findAll(('a', {'title':re.compile('Bedroom')}),('i',{'class':'pdate'})
#                      --^^                     ---^      ---^-^-^                -----^
Add one more closing ):
properties = soup.findAll(('a', {'title':re.compile('Bedroom')}),('i',{'class':'pdate'}))
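When a SyntaxError points at a line that looks fine, counting parentheses on the line before it is a quick check. The helper below is a minimal Python 3 sketch; paren_balance is a made-up name, and it deliberately ignores escaped quotes and other bracket types.

```python
def paren_balance(source_line):
    """Return opening minus closing round parentheses, skipping quoted text."""
    depth = 0
    quote = None  # the quote character we are currently inside, if any
    for ch in source_line:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
    return depth


broken = "properties = soup.findAll(('a', {'title':re.compile('Bedroom')}),('i',{'class':'pdate'})"
fixed = broken + ")"
print(paren_balance(broken), paren_balance(fixed))
```

A positive result means the statement is still open at the end of the line, which is why the interpreter reports the error on the following line.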

Error always on line 102 of my code

So I am creating a module, and I am importing it to a python shell and running some stuff to make sure all features work and such.
For some reason every time I run the code, it gives the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ryansaxe/Desktop/Code/python/modules/pymaps.py", line 102, in url_maker
    #anything can be here
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
So where the #anything can be here is, is whatever is on line 102 of my code. Originally line 102 was:
if isinstance(startindex,datetime.datetime):
and I got the error above. I put a quick print statement on line 102 to check and it gave the same error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ryansaxe/Desktop/Code/python/modules/pymaps.py", line 102, in url_maker
    print 'Hello'
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
Is this some sort of bug? Why is it telling me there is an error with datetime on the line print 'Hello'?
Because it may be helpful, I will give you the function I am having trouble with since I have no clue how this is possible. I am keeping the print 'Hello' line so you can see where line 102 is:
def url_maker(latitudes,longitudes,times=None,color='red',label=' ',zoom=12,center=None,start=None,end=None,by=None,size='600x300'):
    urls = []
    import datetime
    if isinstance(times[0],str) or isinstance(times[0],datetime.datetime):
        from dateutil import parser
        if isinstance(times[0],str):
            times = [parser.parse(x) for x in times]
        if isinstance(start,str):
            startindex = parser.parse(start)
        else:
            startindex = start
        if isinstance(end,str):
            endindex = parse.parse(end)
        else:
            endindex = end
        print 'Hello'
        if isinstance(startindex,datetime.datetime):
            startpos = between_times(times,startindex,by='start')
        elif isinstance(startindex,int):
            if isinstance(endindex,datetime.datetime):
                startpos = between_times(times,endindex,by='end') - start
            else:
                startpos = start
        else:
            pass
        if isinstance(endindex,datetime.datetime):
            endpos = between_times(times,endindex,by='end')
        elif isinstance(endindex,int):
            if isinstance(startindex,datetime.datetime):
                endpos = between_times(times,startindex,by='start') + end
            else:
                endpos = end
        else:
            pass
    else:
        times = range(1,len(latitudes) + 1)
        if isinstance(start,int):
            startpos = start
        else:
            startpos = None
        if isinstance(end,int):
            endpos = end
        else:
            endpos = None
    if isinstance(by,str):
        lat,lon,t = latitudes[startpos:endpos],latitudes[startpos:endpos],times[startpos:endpos]
        print lat
        t,lats,lons = time_sample(t,by,lat,lon)
    elif isinstance(by,int):
        lats,lons,t = latitudes[startpos:endpos:by],latitudes[startpos:endpos:by],times[startpos:endpos:by]
    else:
        lats,lons,t= latitudes[startpos:endpos],latitudes[startpos:endpos],times[startpos:endpos]
    print t
    print len(t)
    if center == None:
        latit = [str(i) for i in lats]
        longi = [str(i) for i in lons]
        center = '&center=' + common_finder(latit,longi)
    else:
        center = '&center=' + '+'.join(center.split())
    zoom = '&zoom=' + str(zoom)
    for i in range(len(lats)):
        #label = str(i)
        x,y = str(lats[i]),str(lons[i])
        marker = '&markers=color:' + color + '%7Clabel:' + label + '%7C' + x + ',' + y
        url = 'http://maps.googleapis.com/maps/api/staticmap?maptype=roadmap&size=' + size + zoom + center + marker + '&sensor=true'
        urls.append(url)
        #print i
    return urls,t
You are running with a stale bytecode cache or are re-running the code in an existing interpreter without restarting it.
The traceback code has only bytecode to work with, which contains filename and linenumber information. When an exception occurs, the source file is loaded to retrieve the original line of code, but if the source file has changed, that leads to the wrong line being shown.
Restart the interpreter and/or remove all *.pyc files; the latter will be recreated when the interpreter imports the code again.
As for your specific exception, you probably imported the datetime class from the datetime module somewhere:
from datetime import datetime
The datetime class does not have a datetime attribute, only the module does.
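The difference between the module and the class is easy to check interactively. This is a small Python 3 sketch; the aliases datetime_module and datetime_class are introduced here only to keep the two names apart.

```python
import datetime as datetime_module
from datetime import datetime as datetime_class

moment = datetime_class(2014, 11, 2)

# The module exposes the class as an attribute named "datetime" ...
module_has_attr = hasattr(datetime_module, "datetime")
# ... but the class itself has no attribute of that name.
class_has_attr = hasattr(datetime_class, "datetime")

print(module_has_attr, class_has_attr)
print(isinstance(moment, datetime_module.datetime))
```

So after from datetime import datetime, the check should read isinstance(value, datetime), while after import datetime it should read isinstance(value, datetime.datetime).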
