PyParsing and multi-line syslog messages - python

I have copy-pasted a PyParsing syslog parser from here and there.
It's all nice and fluffy, but I have some Syslog messages that look non-compliant to the "standard":
Apr 2 09:23:09 dawn Java App[537]: [main] ERROR ch.java.core.Verifier - Unknown validation error
java.lang.NullPointerException
at org.databin.cms.CMSSignedData.getSignedData(Unknown Source)
at org.databin.cms.CMSSignedData.<init>(Unknown Source)
at org.databin.cms.CMSSignedData.<init>(Unknown Source)
And so on. Now with my PyParsing grammar I go through syslog.log line by line.
def main():
with open("system.log", "r") as myfile:
data = myfile.readlines()
pattern = Parser()._pattern
pattern.runTests(data)
if __name__ == '__main__':
main()
I somehow need to handle multi-line syslog messages. Either I need
to attach the many lines of these Java exceptions to the Syslog message, that has already been parsed.
or make the left side optional.
I don't know. Right now my implementation fails, because it assumes a new line is logged by a new app. Which would be... usual... unless Java...
> Traceback (most recent call last): File
> "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 39,
> in <module>
> main() File "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 34,
> in main
> pattern.runTests(data) File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py",
> line 2305, in runTests
> if comment is not None and comment.matches(t, False) or comments and not t: File
> "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py",
> line 2205, in matches
> self.parseString(_ustr(testString), parseAll=parseAll) File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py",
> line 1622, in parseString
> loc, tokens = self._parse( instring, 0 ) File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py",
> line 1383, in _parseNoCache
> loc,tokens = self.parseImpl( instring, preloc, doActions ) File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py",
> line 2410, in parseImpl
> if (instring[loc] == self.firstMatchChar and IndexError: string index out of range
Does anyone know a simple way to avoid failure here?
from pyparsing import Word, alphas, Suppress, Combine, nums, string, Regex, Optional, ParserElement, LineEnd, OneOrMore, \
unicodeString, White
import sys
from datetime import datetime
class Parser(object):
# log lines don't include the year, but if we don't provide one, datetime.strptime will assume 1900
ASSUMED_YEAR = str(datetime.now().year)
def __init__(self):
ints = Word(nums)
ParserElement.setDefaultWhitespaceChars(" \t")
NL = Suppress(LineEnd())
unicodePrintables = u''.join(unichr(c) for c in xrange(sys.maxunicode)
if not unichr(c).isspace())
# priority
# priority = Suppress("<") + ints + Suppress(">")
# timestamp
month = Word(string.ascii_uppercase, string.ascii_lowercase, exact=3)
day = ints
hour = Combine(ints + ":" + ints + ":" + ints)
timestamp = month + day + hour
# a parse action will convert this timestamp to a datetime
timestamp.setParseAction(
lambda t: datetime.strptime(Parser.ASSUMED_YEAR + ' ' + ' '.join(t), '%Y %b %d %H:%M:%S'))
# hostname
# usually hostnames follow some convention
hostname = Word(alphas + nums + "_-.")
# appname
# if you call your app "my big fat app with a very long name" go away
appname = (Word(alphas + nums + "/-_.()") + Optional(Word(" ")) + Optional(Word(alphas + nums + "/-_.()")))(
"appname") + (Suppress("[") + ints("pid") + Suppress("]")) | (Word(alphas + "/-_.")("appname"))
appname.setName("appname")
# message
# supports messages with printed unicode
message = Combine(OneOrMore(Word(unicodePrintables) | OneOrMore("\t") | OneOrMore(" "))) + Suppress(OneOrMore(NL))
messages = OneOrMore(message) # does not work
# pattern build
# (add results names to make it easier to access parsed fields)
self._pattern = timestamp("timestamp") + hostname("hostname") + Optional(appname) + Optional(Suppress(':')) + messages("message")
def parse(self, line):
if line.strip():
parsed = self._pattern.parseString(line)
return parsed.asDict()
The partly parsed result is:
[datetime.datetime(2018, 4, 2, 9, 23, 9), 'dawn', 'Java', 'App', '537', '[main] ERROR ch.databin.core.Verifier - Unknown validation error']
- appname: ['Java', 'App']
- hostname: 'dawn'
- message: '[main] ERROR ch.databin.core.Verifier - Unknown validation error'
- pid: '537'
- timestamp: datetime.datetime(2018, 4, 2, 9, 23, 9)
It only contains the first line.
So for syslog messages without linebreaks this works.

The simplest solution is to go back to parsing a line at a time, and keep the valid log lines in a list. If you get a valid log line, just append it to the list; if you don't then append it to the 'messages' item of the last line in the list.
def main():
valid_log_lines = []
with open("system.log", "r") as myfile:
data = myfile.read()
pattern = Parser()._pattern
for line in data.splitlines():
try:
log_dict = pattern.parse(line)
if log_dict is None:
continue
except ParseException:
if valid_log_lines:
valid_log_lines[-1]['message'] += '\n' + line
else:
valid_log_lines.append(log_dict)
To speed up detection of invalid lines, try adding timestamp.leaveWhitespace(), so that any line that does not start with a timestamp in column 1 will immediately fail.
Or you can modify your parser to handle multi-line log messages, that is a longer topic.
I like that you were using runTests, but that is more of a development tool; in your actual code, probably use parseString or one of its ilk.

Related

TypeError: 'in <string>' requires string as left operand, not bytes,python

I have an error when executing a script made in python that occupies telnet, at the moment of executing I get the error that I indicate at the end. This error is executed by the line that I mention in the code but I could not find any solution , I hope you can help me solve it.
Code:
def simulate(location,loggers,sensors,host,port,interval):
'''Simulate the reception of oxygen data to cacheton '''
temp_range = MAX_TEMP - MIN_TEMP
o2_range = MAX_O2 - MIN_O2
t = 0.0
steps = -1
while 1:
for logger in loggers:
for sensor in sensors:
temp1 = random.random() * temp_range + MIN_TEMP
oxi1 = random.random() * o2_range + MIN_O2
unix_time = int(time.time())
command = "PUT /%s/oxygen/%i/%i/?oxy=%.1f&temp=%.1f&time=%i&depth=1&salinity=32&status=1" % (location, logger, sensor, oxi1, temp1, unix_time)
print (command)
tn = telnetlib.Telnet(host,port)
tn.write(command+"\n\n")#here is theerror
tn.write("\n")
tn.close()
t += interval
if steps > 0: steps -= 1
if steps == 0: break
time.sleep(interval)
Error:
Traceback (most recent call last):
File "simulate_oxygen_cacheton.py", line 57, in <module>
simulate(args.location, range(loggers), range(sensors), args.host, args.port, args.interval)
File "simulate_oxygen_cacheton.py", line 29, in simulate
tn.write(command+"\n\n")
File "/home/mauricio/anaconda3/lib/python3.7/telnetlib.py", line 287, in write
if IAC in buffer:
TypeError: 'in <string>' requires string as left operand, not bytes
The problem here is that in telnetlib.py, IAC is of type bytes but your command+"\n\n" is of type str. You may also need to convert any strings you are passing to tn.write() into byte strings by feeding them through str.encode()
Try:
tn.write(str.encode(command+"\n\n"))

string index out of range? - python

im buliding a chat and i'm keep getting the next error:
Traceback (most recent call last):
File "C:/Users/ronen/Documents/CyberLink/tryccc.py", line 651, in <module>
main()
File "C:/Users/ronen/Documents/CyberLink/tryccc.py", line 630, in main
print_message(data_from_server, len(temp_message), username)
File "C:/Users/ronen/Documents/CyberLink/tryccc.py", line 265, in print_message
temp_l = data[0]
IndexError: string index out of range
i am trying to get the first char of the data string and convert it into int but i get this error
the problem is in the first line of the code
def print_message(d_temp, line_length, this_username):
temp_l = d_temp[0] #the problematic line
len_username = int(temp_l)
username_sender = d_temp[1:(len_username + 1)]
message_sent = d_temp[(len_username + 1): -4]
hour_time = d_temp[-4: -2]
min_time = d_temp[-2:]
printed_message = "\r" + hour_time + ":" + min_time + " " + username_sender + " : " + message_sent
print printed_message, # Prints this message on top of what perhaps this client started writing.
# if this client started typing message
complete_line_with_space(len(printed_message), line_length)
data- the data (string) from the server
line_length - the length of the temp massage
this_username - the client's username
thank you to all the helpers
empty d_temp will give this error.
This might be the reason:
>>> d_temp = ""
>>> d_temp[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
This happens because print_message is executed with an empty string as first parameter
print_message("",...)
This means the problem is somewhere else in your code.

Python pyparsing issue

I am very new to python and using pyparsing but getting some exception with following code
while site_contents.find('---', line_end) != line_end + 2:
cut_start = site_contents.find(" ", site_contents.find("\r\n", start))
cut_end = site_contents.find(" ", cut_start+1)
line_end = site_contents.find("\r\n", cut_end)
name = site_contents[cut_start:cut_end].strip()
float_num = Word(nums + '.').setParseAction(lambda t:float(t[0]))
nonempty_line = Literal(name) + Word(nums+',') + float_num + Suppress(Literal('-')) + float_num * 2
empty_line = Literal(name) + Literal('-')
line = nonempty_line | empty_line
parsed = line.parseString(site_contents[cut_start:line_end])
start = line_end
Exception
Traceback (most recent call last):
File "D:\Ecllipse_Python\HellloWorld\src\HelloPython.py", line 108, in <module>
parsed = line.parseString(site_contents[cut_start:line_end]) # parse line of data following cut name
File "C:\Users\arbatra\AppData\Local\Continuum\Anaconda\lib\site-packages\pyparsing.py", line 1041, in parseString
raise exc
pyparsing.ParseException: Expected W:(0123...) (at char 38), (line:1, col:39)
how to resolve this issue?
You'll get a little better exception message if you give names to your expressions, using setName. From the "Expected W:(0123...)" part of the exception message, it looks like the parser is not finding a numeric value where it is expected. But the default name is not showing us enough to know which type of numeric field is expected. Modify your parser to add setName as shown below, and also change the defintion of nonempty_line:
float_num = Word(nums + '.').setParseAction(lambda t:float(t[0])).setName("float_num")
integer_with_commas = Word(nums + ',').setName("int_with_commas")
nonempty_line = Literal(name) + integer_with_commas + float_num + Suppress(Literal('-')) + float_num * 2
I would also preface the call to parseString with:
print site_contents[cut_start:line_end]
at least while you are debugging. Then you can compare the string being parsed with the error message, including the column number where the parse error is occurring, as given in your posted example as "(at char 38), (line:1, col:39)". "char xx" starts with the first character as "char 0"; "col:xx" starts with the first column as "col:1".
These code changes might help you pinpoint your problem:
print "12345678901234567890123456789012345678901234567890"
print site_contents[cut_start:line_end]
try:
parsed = line.parseString(site_contents[cut_start:line_end])
except ParseException as pe:
print pe.loc*' ' + '^'
print pe
Be sure to run this in a window that uses a monospaced font (so that all the character columns line up, and all characters are the same width as each other).
Once you've done this, you may have enough information to fix the problem yourself, or you'll have some better output to edit into your original question so we can help you better.

Python ValueError: chr() arg not in range(256)

So I am learning python and redoing some old projects. This project involves taking in a dictionary and a message to be translated from the command line, and translating the message. (For example: "btw, hello how r u" would be translated to "by the way, hello how are you".
We are using a scanner supplied by the professor to read in tokens and strings. If necessary I can post it here too. Heres my error:
Nathans-Air-4:py1 Nathan$ python translate.py test.xlt test.msg
Traceback (most recent call last):
File "translate.py", line 26, in <module>
main()
File "translate.py", line 13, in main
dictionary,count = makeDictionary(commandDict)
File "/Users/Nathan/cs150/extra/py1/support.py", line 12, in makeDictionary
string = s.readstring()
File "/Users/Nathan/cs150/extra/py1/scanner.py", line 105, in readstring
return self._getString()
File "/Users/Nathan/cs150/extra/py1/scanner.py", line 251, in _getString
if (delimiter == chr(0x2018)):
ValueError: chr() arg not in range(256)
Heres my main translate.py file:
from support import *
from scanner import *
import sys
def main():
arguments = len(sys.argv)
if arguments != 3:
print'Need two arguments!\n'
exit(1)
commandDict = sys.argv[1]
commandMessage = sys.argv[2]
dictionary,count = makeDictionary(commandDict)
message,messageCount = makeMessage(commandMessage)
print(dictionary)
print(message)
i = 0
while count < messageCount:
translation = translate(message[i],dictionary,messageCount)
print(translation)
count = count + 1
i = i +1
main()
And here is my support.py file I am using...
from scanner import *
def makeDictionary(filename):
fp = open(filename,"r")
s = Scanner(filename)
lyst = []
token = s.readtoken()
count = 0
while (token != ""):
lyst.append(token)
string = s.readstring()
count = count+1
lyst.append(string)
token = s.readtoken()
return lyst,count
def translate(word,dictionary,count):
i = 0
while i != count:
if word == dictionary[i]:
return dictionary[i+1]
i = i+1
else:
return word
i = i+1
return 0
def makeMessage(filename):
fp = open(filename,"r")
s = Scanner(filename)
lyst2 = []
string = s.readtoken()
count = 0
while (string != ""):
lyst2.append(string)
string = s.readtoken()
count = count + 1
return lyst2,count
Does anyone know whats going on here? I've looked through several times and i dont know why readString is throwing this error... Its probably something stupid i missed
chr(0x2018) will work if you use Python 3.
You have code that's written for Python 3 but you run it with Python 2. In Python 2 chr will give you a one character string in the ASCII range. This is an 8-bit string, so the maximum parameter value for chris 255. In Python 3 you'll get a unicode character and unicode code points can go up to much higher values.
The issue is that the character you're converting using chr isn't within the range accepted (range(256)). The value 0x2018 in decimal is 8216.
Check out unichr, and also see chr.

Error always on line 102 of my code

So I am creating a module, and I am importing it to a python shell and running some stuff to make sure all features work and such.
For some reason every time I run the code, it gives the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ryansaxe/Desktop/Code/python/modules/pymaps.py", line 102, in url_maker
#anything can be here
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
So where the #anything can be here is, is whatever is on line 102 of my code. Originally line 102 was:
if isinstance(startindex,datetime.datetime):
and I got the error above. I put a quick print statement on line 102 to check and it gave the same error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ryansaxe/Desktop/Code/python/modules/pymaps.py", line 102, in url_maker
print 'Hello'
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
Is this some sort of bug? Why is it telling me there is an error with datetime on the line print 'Hello'?
Because it may be helpful, I will give you the function I am having trouble with since I have no clue how this is possible. I am keeping the print 'Hello' line so you can see where line 102 is:
def url_maker(latitudes,longitudes,times=None,color='red',label=' ',zoom=12,center=None,start=None,end=None,by=None,size='600x300'):
urls = []
import datetime
if isinstance(times[0],str) or isinstance(times[0],datetime.datetime):
from dateutil import parser
if isinstance(times[0],str):
times = [parser.parse(x) for x in times]
if isinstance(start,str):
startindex = parser.parse(start)
else:
startindex = start
if isinstance(end,str):
endindex = parse.parse(end)
else:
endindex = end
print 'Hello'
if isinstance(startindex,datetime.datetime):
startpos = between_times(times,startindex,by='start')
elif isinstance(startindex,int):
if isinstance(endindex,datetime.datetime):
startpos = between_times(times,endindex,by='end') - start
else:
startpos = start
else:
pass
if isinstance(endindex,datetime.datetime):
endpos = between_times(times,endindex,by='end')
elif isinstance(endindex,int):
if isinstance(startindex,datetime.datetime):
endpos = between_times(times,startindex,by='start') + end
else:
endpos = end
else:
pass
else:
times = range(1,len(latitudes) + 1)
if isinstance(start,int):
startpos = start
else:
startpos = None
if isinstance(end,int):
endpos = end
else:
endpos = None
if isinstance(by,str):
lat,lon,t = latitudes[startpos:endpos],latitudes[startpos:endpos],times[startpos:endpos]
print lat
t,lats,lons = time_sample(t,by,lat,lon)
elif isinstance(by,int):
lats,lons,t = latitudes[startpos:endpos:by],latitudes[startpos:endpos:by],times[startpos:endpos:by]
else:
lats,lons,t= latitudes[startpos:endpos],latitudes[startpos:endpos],times[startpos:endpos]
print t
print len(t)
if center == None:
latit = [str(i) for i in lats]
longi = [str(i) for i in lons]
center = '&center=' + common_finder(latit,longi)
else:
center = '&center=' + '+'.join(center.split())
zoom = '&zoom=' + str(zoom)
for i in range(len(lats)):
#label = str(i)
x,y = str(lats[i]),str(lons[i])
marker = '&markers=color:' + color + '%7Clabel:' + label + '%7C' + x + ',' + y
url = 'http://maps.googleapis.com/maps/api/staticmap?maptype=roadmap&size=' + size + zoom + center + marker + '&sensor=true'
urls.append(url)
#print i
return urls,t
You are running with a stale bytecode cache or are re-running the code in an existing interpreter without restarting it.
The traceback code has only bytecode to work with, which contains filename and linenumber information. When an exception occurs, the source file is loaded to retrieve the original line of code, but if the source file has changed, that leads to the wrong line being shown.
Restart the interpreter and/or remove all *.pyc files; the latter will be recreated when the interpreter imports the code again.
As for your specific exception; you probably imported the datetime class from the datetime module somewhere:
from datetime import datetime
The datetime class does not have a datetime attribute, only the module does.

Categories

Resources