Convert formatted log string back to LogRecord - python

What is the best way to convert a log string back to the LogRecord that caused that string to be generated in the first place, provided that I have the formatter string?
I know I can use regex for this, but I was wondering if there's a better way to achieve this.
Formatter:
%(asctime)s--%(name)s--%(levelname)s--%(funcName)s:%(lineno)s---%(message)s
Sample:
2014-07-28 16:46:39,221--sys.log--DEBUG--hello:61---hello world
Regex:
^(?P<asctime>.*?)--(?P<name>.*?)--(?P<levelname>.*?)--(?P<funcName>.*?):(?P<lineno>.*?)---(?P<message>.*?)$
Regex example:
import re
pattern = re.compile('^(?P<asctime>.*?)--(?P<name>.*?)--(?P<levelname>.*?)--(?P<funcName>.*?):(?P<lineno>.*?)---(?P<message>.*?)$')
print pattern.match('2014-07-28 16:46:39,221--sys.log--DEBUG--hello:61---hello world').groupdict()
Output:
{'name': 'sys.log', 'funcName': 'hello', 'lineno': '61', 'asctime': '2014-07-28 16:46:39,221', 'message': 'hello world', 'levelname': 'DEBUG'}
References:
https://docs.python.org/2/library/logging.html
https://docs.python.org/2/howto/logging.html

For this example, just split at the double-dashes:
sample = '2014-07-28 16:46:39,221--sys.log--DEBUG--hello:61---hello world'
fields = ('asctime', 'name', 'levelname', 'funcName', 'message')
values = { k: v for k, v in zip(fields, sample.split('--', len(fields) - 1)) }
# and do some mending
values['funcName'], values['lineno'] = values['funcName'].split(':')
values['message'] = values['message'][1:]
>>> values
{'asctime': '2014-07-28 16:46:39,221',
'funcName': 'hello',
'levelname': 'DEBUG',
'lineno': '61',
'message': 'hello world',
'name': 'sys.log'}
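To get from the parsed fields to an actual LogRecord, one option is logging.makeLogRecord, which builds a record from a dict of attributes. The following is a minimal Python 3 sketch, assuming the formatter and default date format shown in the question. The names LINE_RE, LEVELS and line_to_record are made up for illustration. Anything the formatter did not write out (pathname, args, exception info) cannot be recovered this way.

```python
import logging
import re
from datetime import datetime

# Regex mirroring the formatter from the question.
LINE_RE = re.compile(
    r'^(?P<asctime>.*?)--(?P<name>.*?)--(?P<levelname>.*?)--'
    r'(?P<funcName>.*?):(?P<lineno>\d+)---(?P<message>.*)$'
)

# Standard level names only; custom levels would need to be added here.
LEVELS = {name: getattr(logging, name)
          for name in ('CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG')}


def line_to_record(line):
    """Rebuild a (partial) LogRecord from one formatted log line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError('line does not match the expected format: %r' % line)
    fields = match.groupdict()
    # Assumes the default asctime layout, e.g. '2014-07-28 16:46:39,221',
    # interpreted as local time.
    when = datetime.strptime(fields['asctime'], '%Y-%m-%d %H:%M:%S,%f')
    attrs = {
        'name': fields['name'],
        'levelname': fields['levelname'],
        'levelno': LEVELS.get(fields['levelname'], logging.NOTSET),
        'funcName': fields['funcName'],
        'lineno': int(fields['lineno']),
        'msg': fields['message'],
        'created': when.timestamp(),
        'msecs': when.microsecond / 1000.0,
    }
    return logging.makeLogRecord(attrs)


record = line_to_record(
    '2014-07-28 16:46:39,221--sys.log--DEBUG--hello:61---hello world')
```

The resulting record can be passed to a handler or formatter like any other, with the caveat that attributes missing from the log line keep the defaults that makeLogRecord assigns.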

Related

Parsing a string to dict with Python

I have a string like this:
/acommand foo='bar' msg='Hello World!' -debugMode
or like this:
/acommand
foo='bar'
msg='Hello, World!'
-debugMode
How can I parse this string to a dict and a list like this:
{"command": "/acommand", "foo": "bar", "msg": "Hello World!"}
["-debugMode"]
I've tried to use string.split to parse it, but that doesn't seem feasible.
argparse seems to be designed for command-line interfaces, so it doesn't apply.
How to achieve this with Python? Thanks!
You can try something like this:
s = "/acommand foo='bar' msg='Hello World!' -debugMode"
debug = [s.split(" ")[-1]]
s_ = "command=" + ' '.join(s.split(" ")[:-1]).replace("'","")
d = dict(x.split("=") for x in s_.split(" ",2))
print (d)
print (debug)
{'command': '/acommand', 'foo': 'bar', 'msg': 'Hello World!'}
['-debugMode']
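The split-based approach above depends on the flag being last and on there being exactly two key=value pairs. A more tolerant sketch, assuming shell-like quoting rules are acceptable for this input, uses shlex.split from the standard library, which strips the quotes and treats newlines as whitespace. The function name parse_command is made up for illustration.

```python
import shlex


def parse_command(text):
    """Split a command string into (options dict, list of flags)."""
    tokens = shlex.split(text)  # honours quotes, splits on spaces and newlines
    options = {'command': tokens[0]}
    flags = []
    for token in tokens[1:]:
        if token.startswith('-'):
            flags.append(token)
        else:
            # partition keeps any further '=' characters inside the value
            key, _, value = token.partition('=')
            options[key] = value
    return options, flags


one_line = "/acommand foo='bar' msg='Hello World!' -debugMode"
multi_line = "/acommand\nfoo='bar'\nmsg='Hello, World!'\n-debugMode"

print(parse_command(one_line))
print(parse_command(multi_line))
```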

parsing a file with python "'str' object has no attribute 'get'"

I am a fairly new dev and trying to parse "id" values from this file. Running into the issue below.
My python code:
import ast
from pathlib import Path
file = Path.home() / 'AppData' / 'Roaming' / 'user-preferences-prod'
with open(file, 'r') as f:
    contents = f.read()
ids = ast.literal_eval(contents)
profileids = []
for data in ids:
    test = data.get('id')
    profileids.append(test)
print(profileids)
This returns the error: ValueError: malformed node or string: <_ast.Name object at 0x0000023D8DA4D2E8> at ids = ast.literal_eval(contents)
A snippet of the content in my file of interest:
{"settings":{"defaults":{"value1":,"value2":,"value3":null,"value4":null,"proxyid":null,"sites":{},"sizes":[],"value5":false},"value6":true,"value11":,"user":{"value9":"","value8": ,"value7":"","value10":""},"webhook":"},'profiles':[{'billing': {'address1': '', 'address2': '', 'city': '', 'country': 'United States', 'firstName': '', 'lastName': '', 'phone': '', 'postalCode': '', 'province': '', 'usesBillingInformation': False}, 'createdAt': 123231231213212, 'id': '23123123123213, 'name': ''
I need this to run in a loop, as there are multiple id values that I am interested in, and they all need to go into a list. Hopefully I explained it all. The file type is "file" according to Windows; I just view its contents with Notepad.
It appears to me that you have a file with a string representation of a dict (dictionary). So, what you need to do is:
string_of_dict →ast.literal_eval()→ dict
Open file and read in the text into a string variable. Currently I think this string is going into ids.
Then convert the string representation of the dict into a dict using the ast library, as shown below.
import ast
string_of_dict = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
ast.literal_eval(string_of_dict)
Output:
{'muffin': 'lolz', 'foo': 'kitty'}
Solution
Something like this should most likely work. You may have to tweak it a little bit.
import ast
with open(file, 'r') as f:
    contents = f.read()
ids = ast.literal_eval(contents)
profileids = []
for data in ids:
    test = data.get('id')
    profileids.append(test)
print(profileids)
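One likely cause of the "'str' object has no attribute 'get'" error in the title: iterating over a dict yields its keys, which are strings. The snippet in the question suggests the ids live in the list stored under 'profiles', so a sketch along these lines may be closer to what is needed. The sample string is invented for illustration, and this assumes the file content parses as a Python literal at all, which the snippet shown may not.

```python
import ast

# Hypothetical, well-formed stand-in for the file contents.
sample = (
    "{'settings': {'value6': True}, "
    "'profiles': [{'id': '111', 'name': ''}, {'id': '222', 'name': ''}]}"
)

parsed = ast.literal_eval(sample)

# Looping over `parsed` directly would yield the keys 'settings' and
# 'profiles' (plain strings), hence the missing .get attribute.
# Loop over the list of profile dicts instead.
profile_ids = [profile.get('id') for profile in parsed['profiles']]
print(profile_ids)
```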

String indices must be integers - Django

I have a pretty big dictionary which looks like this:
{
    'startIndex': 1,
    'username': 'myemail#gmail.com',
    'items': [{
        'id': '67022006',
        'name': 'Adopt-a-Hydrant',
        'kind': 'analytics#accountSummary',
        'webProperties': [{
            'id': 'UA-67522226-1',
            'name': 'Adopt-a-Hydrant',
            'websiteUrl': 'https://www.udemy.com/',
            'internalWebPropertyId': '104343473',
            'profiles': [{
                'id': '108333146',
                'name': 'Adopt a Hydrant (Udemy)',
                'type': 'WEB',
                'kind': 'analytics#profileSummary'
            }, {
                'id': '132099908',
                'name': 'Unfiltered view',
                'type': 'WEB',
                'kind': 'analytics#profileSummary'
            }],
            'level': 'STANDARD',
            'kind': 'analytics#webPropertySummary'
        }]
    }, {
        'id': '44222959',
        'name': 'A223n',
        'kind': 'analytics#accountSummary',
And so on....
When I copy this dictionary into my Jupyter notebook and run the exact same function that I run in my Django code, it works as expected. Everything is literally the same: in my Django code I even print the dictionary out, then copy it to the notebook, run it, and get what I expect.
Just for more info this is the function:
google_profile = gp.google_profile # Get google_profile from DB
print(google_profile)
all_properties = []
for properties in google_profile['items']:
    all_properties.append(properties)
site_selection=[]
for single_property in all_properties:
    single_propery_name=single_property['name']
    for single_view in single_property['webProperties'][0]['profiles']:
        single_view_id = single_view['id']
        single_view_name = (single_view['name'])
        selections = single_propery_name + ' (View: '+single_view_name+' ID: '+single_view_id+')'
        site_selection.append(selections)
print (site_selection)
So my guess is that my notebook has some sort of json parser installed or something like that? Is that possible? Why in django I can't access dictionaries the same way I can on my ipython notebooks?
EDITS
More info:
The error is at the line: for properties in google_profile['items']:
Django debug is: TypeError at /gconnect/ string indices must be integers
Local Vars are:
all_properties =[]
current_user = 'myemail#gmail.com'
google_profile = `the above dictionary`
So, just to make it clear for whoever finds this question:
If you save a dictionary in a database field that stores text, Django hands it back as a string, so you can't index it by key afterwards.
To solve this, you can convert it back to a dictionary.
The answer from this post worked perfectly for me. In other words:
import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}
There are many ways to convert a string to a dictionary; this is only one. If you run into this problem, you can quickly check whether the value is a string instead of a dictionary with:
print(type(var))
In my case I had:
<class 'str'>
before converting it with the above method and then I got
<class 'dict'>
and everything worked as expected.
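Replacing the quotes works for simple values, but it can break when a value contains an apostrophe or when the string holds Python literals such as True or None, which are not valid JSON. A minimal sketch of an alternative, assuming the stored string is the repr of a Python dict; the helper to_dict and the sample data are hypothetical:

```python
import ast


def to_dict(value):
    """Return value unchanged if it is already a dict, else parse its repr."""
    if isinstance(value, dict):
        return value
    # literal_eval only accepts Python literals, so it does not execute code.
    return ast.literal_eval(value)


# Hypothetical stored value: note the apostrophe and the Python literal True,
# both of which would trip up a plain quote replacement followed by json.loads.
stored = ("{'startIndex': 1, 'items': [{'id': '67022006', "
          "'name': \"Adopt-a-Hydrant's\", 'active': True}]}")

profile = to_dict(stored)
print(type(profile))
print(profile['items'][0]['name'])
```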

Python iterate over list and join lines without a special character to the previous item

I'm wondering if anyone has a hacky / cool solution to this problem. I have a text file like so:
NAME:name
ID:id
PERSON:person
LOCATION:location
NAME:name
morenamestuff
ID:id
PERSON:person
LOCATION:location
JUNK
So I have some blocks whose lines can all be split into a dict, and some blocks where they cannot. How can I take lines without the : character and join them to the previous line? Here's what I'm currently doing:
# loop through chunk
# the first element of dat is a Title, so skip that
key_map = dict(x.split(':') for x in dat[1:])
But I of course get an error because the second chunk has a line without the : character. So I wanted my dict to look something like this after correctly splitting it:
# there will be a key_map for each chunk of data
key_map['NAME'] == 'name morenamestuff' # 3rd line appended to previous
key_map['ID'] == 'id'
key_map['PERSON'] == 'person'
key_map['LOCATION'] == 'location'
Solution
EDIT: Here's my final solution on github, and the full code here:
parseScript.py
import re
import string
bad_chars = '(){}"<>[] ' # characters we want to strip from the string
key_map = []
# parse file
with open("dat.txt") as f:
    data = f.read()
    data = data.strip('\n')
    data = re.split('}|\[{', data)
# format file
with open("format.dat") as f:
    formatData = [x.strip('\n') for x in f.readlines()]
data = filter(len, data)
# strip and split each station
for dat in data[1:-1]:
    # perform black magic, don't even try to understand this
    dat = dat.translate(string.maketrans("", "", ), bad_chars).split(',')
    key_map.append(dict(x.split(':') for x in dat if ':' in x ))
    if ':' not in dat[1]:key_map['NAME']+=dat[k][2]
for station in range(0, len(key_map)):
    for opt in formatData:
        print opt,":",key_map[station][opt]
    print ""
dat.txt
View raw here
format.dat
NAME
STID
LONGITUDE
LATITUDE
ELEVATION
STATE
ID
out.dat
View raw here
When in doubt, write your own generator.
Add in itertools.groupby to chunk by groups of text delimited by whitespace breaks.
def chunker(s):
    it = iter(s)
    out = [next(it)]
    for line in it:
        if ':' in line or not line:
            yield ' '.join(out)
            out = []
        out.append(line)
    if out:
        yield ' '.join(out)
usage:
from itertools import groupby
[dict(x.split(':') for x in g) for k,g in groupby(chunker(lines), bool) if k]
Out[65]:
[{'ID': 'id', 'LOCATION': 'location', 'NAME': 'name', 'PERSON': 'person'},
{'ID': 'id',
'LOCATION': 'location',
'NAME': 'name morenamestuff',
'PERSON': 'person'}]
(if those fields are always the same, I'd go with something like creating some namedtuples instead of a bunch of dicts)
from collections import namedtuple
Thing = namedtuple('Thing', 'ID LOCATION NAME PERSON')
[Thing(**dict(x.split(':') for x in g)) for k,g in groupby(chunker(lines), bool) if k]
Out[76]:
[Thing(ID='id', LOCATION='location', NAME='name', PERSON='person'),
Thing(ID='id', LOCATION='location', NAME='name morenamestuff', PERSON='person')]
Here is something that addresses all your requirements. It handles joining of multiple lines, ignoring blank lines, and ignoring junk lines that do not appear within a block. It is implemented as a generator that yields each dictionary as it is completed.
def parser(data):
    d = {}
    for line in data:
        line = line.strip()
        if not line:
            if d:
                yield d
            d = {}
        else:
            if ':' in line:
                key, value = line.split(':')
                d[key] = value
            else:
                if d:
                    d[key] = '{} {}'.format(d[key], line)
    if d:
        yield d
When run with this data:
ignore me
NAME:name1
ID:id1
PERSON:person1
LOCATION:location1
NAME:name2
morenamestuff
ID:id2
PERSON:person2
LOCATION:location2
junk
and
other
stuff
NAME:name3
morenamestuff
and more
ID:id3
PERSON:person3
more person stuff
LOCATION:location3
JUNK
MORE JUNK
>>> for d in parser(open('data')):
... print d
{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}
{'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}
{'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}
You can grab the lot as a list:
>>> results = list(parser(open('data')))
>>> results
[{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}, {'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}, {'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}]
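The generator above relies on blank lines to mark the end of each block. To make that explicit, here is a self-contained Python 3 variant run on an in-memory list where the blank lines are visible as empty strings. It is a sketch rather than the answer's exact code: the name parse_blocks and the sample lines are made up, and it splits on the first colon only so that values may themselves contain a colon.

```python
def parse_blocks(lines):
    """Yield one dict per blank-line-separated block of KEY:value lines."""
    block, key = {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current block, if there is one.
            if block:
                yield block
            block, key = {}, None
        elif ':' in line:
            key, value = line.split(':', 1)
            block[key] = value
        elif key is not None:
            # Continuation line: append it to the most recent key.
            block[key] = block[key] + ' ' + line
        # Lines without ':' that appear before any key are ignored as junk.
    if block:
        yield block


lines = [
    'ignore me', '',
    'NAME:name1', 'ID:id1', '',
    'NAME:name2', 'morenamestuff', 'ID:id2', '',
    'junk', '',
    'NAME:name3', 'more', 'and more', 'ID:id3',
]
blocks = list(parse_blocks(lines))
for block in blocks:
    print(block)
```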
I don't find itertools or regex particularly nice to work with, so here's a pure-Python solution:
separator = ':'
output = []
chunk = None
with open('/tmp/stuff.txt') as f:
    for line in (x.strip() for x in f):
        if not line:
            # we are between 'chunks'
            chunk, key = None, None
            continue
        if chunk is None:
            # we are at the beginning of a new 'chunk'
            chunk, key = {}, None
            output.append(chunk)
        if separator in line:
            key, val = line.split(separator)
            chunk[key] = val
        else:
            chunk[key] += line
Not as elegant as you requested, but this works:
dat=[['NAME:name',
'ID:id',
'PERSON:person',
'LOCATION:location'],
['NAME:name',
'morenamestuff',
'ID:id',
'PERSON:person',
'LOCATION:location']]
k=1
key_map = dict(x.split(':') for x in dat[k] if ':' in x )
if ':' not in dat[k][1]:key_map['NAME']+=dat[k][1]
>>> key_map
{'ID': 'id',
'LOCATION': 'location',
'NAME': 'namemorenamestuff',
'PERSON': 'person'}
Just add something to lines with no ":".
if line.find(':') == -1:
line=line+':None'
Then you won't get an error.

List in a dictionary, looping in Python

I have the following code:
TYPES = {'hotmail':{'type':'hotmail', 'lookup':'mixed', 'dkim': 'no', 'signatures':['|S|Return-Path: postmaster#hotmail.com','|R|^Return-Path:\s*[^#]+#(?:hot|msn)','^Received: from .*hotmail.com$']},
'gmail':{'type':'gmail', 'lookup':'mixed', 'dkim': 'yes', 'signatures':['|S|Subject: unsubscribe','','','']}
}
for type_key, type in TYPES.iteritems():
    for sub_type_key, sub_type in type.iteritems():
        for sig in sub_type['signatures']:
            if ("|S|" in sig):
                #String based matching
                clean_sig = sig[3:len(sig)]
                if (clean_sig in file_contents):
                    sig_match += 1
            elif ("|R|" in sig):
                clean_sig = sig[3:len(sig)]
                #REGMATCH later
        if (sig_match == sig.count):
            return sub_type['type']
return None
However, it generates the error:
for sig in sub_type['signatures']:
TypeError: string indices must be integers, not str
I assume that it would see the list being pulled from dictionary element, and allow me to loop over that?
Python newbie is a newbie :(
for type_key, type in TYPES.iteritems():
    for sub_type_key, sub_type in type.iteritems():
        for sig in sub_type['signatures']:
should be:
for type_key, type in TYPES.iteritems():
    for sig in type['signatures']:
But 'type' is a poor name choice in this case... you don't want to shadow a builtin.
Essentially, 'type_key' has the name (either 'hotmail' or 'gmail'), and 'type' has the dictionary that is the value associated with that key. So type['signatures'] is what you're wanting.
Also, you may not need to have 'gmail' inside the nested dictionary; just return 'type_key' instead of type['type'].
Bringing it all together, maybe this will work better: (Warning: untested)
providers = {
    'hotmail':{
        'type':'hotmail',
        'lookup':'mixed',
        'dkim': 'no',
        'signatures':[
            '|S|Return-Path: postmaster#hotmail.com',
            '|R|^Return-Path:\s*[^#]+#(?:hot|msn)',
            '^Received: from .*hotmail.com$']
    },
    'gmail':{
        'type':'gmail',
        'lookup':'mixed',
        'dkim': 'yes',
        'signatures':['|S|Subject: unsubscribe','','','']
    }
}
for provider, provider_info in providers.iteritems():
    for sig in provider_info['signatures']:
        if ("|S|" in sig):
            #String based matching
            clean_sig = sig[3:len(sig)]
            if (clean_sig in file_contents):
                sig_match += 1
        elif ("|R|" in sig):
            clean_sig = sig[3:len(sig)]
            #REGMATCH later
    if (sig_match == sig.count):
        return provider
return None
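For readers on Python 3, where iteritems() no longer exists, the same loop can be sketched as a self-contained function. This is an interpretation rather than the original poster's code: it assumes the intent is to return the first provider whose non-empty signatures all match, it fills in the regex branch that was left as "later", and the function name detect_provider and the sample signatures are made up for illustration.

```python
import re

# Hypothetical signature data in the same '|S|' / '|R|' prefix style.
providers = {
    'hotmail': {
        'signatures': [
            '|S|X-Mailer: hotmail',
            r'|R|^Received: from .*hotmail\.com$',
        ],
    },
    'gmail': {
        'signatures': ['|S|Subject: unsubscribe', ''],
    },
}


def detect_provider(file_contents):
    """Return the first provider whose non-empty signatures all match."""
    for provider, info in providers.items():
        signatures = [sig for sig in info['signatures'] if sig]
        matches = 0
        for sig in signatures:
            kind, pattern = sig[:3], sig[3:]
            if kind == '|S|' and pattern in file_contents:
                # plain substring match
                matches += 1
            elif kind == '|R|' and re.search(pattern, file_contents, re.MULTILINE):
                # regular expression match, one line at a time
                matches += 1
        if signatures and matches == len(signatures):
            return provider
    return None


print(detect_provider('Subject: unsubscribe\nHello'))
```

Comparing against len(signatures) instead of sig.count avoids comparing a number with a bound method, which is never equal.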
[Posted as an answer instead of a comment because retracile beat me to the answer, but the formatting is still a point worth making.]
Laying out the data helps to visualize it:
TYPES = {
    'hotmail': {
        'type': 'hotmail',
        'lookup': 'mixed',
        'dkim': 'no',
        'signatures': ['|S|Return-Path: postmaster#hotmail.com',
                       '|R|^Return-Path:\s*[^#]+#(?:hot|msn)',
                       '^Received: from .*hotmail.com$'],
    },
    'gmail': {
        'type': 'gmail',
        'lookup': 'mixed',
        'dkim': 'yes',
        'signatures': ['|S|Subject: unsubscribe', '', '', ''],
    },
}
Note: You can have an ending comma after the last item in a dict, list, or tuple (used above only for the dicts—it's not always more clear), and you don't have to worry about screwing around with that comma, which is a Good Thing™.
