I am new to Python.
I have a config file, shown below, with the blocks in this order. I need to retrieve the key/value pairs from the config file and use those values in my script.
# Name and details
(
{ group => 'abc',
host => 'pqr.com',
user => 'anonymous',
src => '/var/tmp',
dest => '/tmp',
},
{ group => 'abc',
host =>'pqr.com',
user => 'anonymous',
src => '/tmp'
dest => '/var/tmp'
},
{ group => 'pqr',
host =>'abc.com',
user => 'xyz',
src => '/home/pp',
dest => '/var/tmp',
},
{ group => 'xyz',
host =>'p.com',
user => 'x',
src => '/home/',
dest => '/tmp',
}
)
Each { ... } is considered one block. The group, user and host values may be unique or repeated across blocks.
I have to read and parse the config file and display each key and value pair. Please help.
Key: group, Value: 'abc' (say)
Key: host, Value: 'pqr.com'
Key: user, Value: 'anonymous'
Key: src, Value: '/var/tmp'
Key: dest, Value: '/tmp'
Thank you,
I have written code that takes the cfg file (shown above) as input and displays the keys and values.
# config holds the text of the cfg file
idx = 0
dictList = []
while True:
    try:
        start = config.index("{", idx)
        end = config.index("}", start+1)
        slice = config[start+1:end]
        sliceList = [s.strip() for s in slice.split(",") if s.strip()]
        dd = {}
        for item in sliceList:
            key, value = [s.strip() for s in item.split("=>")]
            print(key, value)
        idx = end + 1  # continue searching after this block
    except ValueError:  # index() found no further block
        break
Output when displaying keys and values:
key 'value'
group 'abc'
host 'pqr.com'
user 'anonymous'
src '/var/tmp'
Now the problem is how to display the value corresponding to a given key.
E.g. print group should display abc,
print host should display pqr.com, and so on.
You'll probably need to parse it; here's a small example of how to do this.
import re
def parse(data):
    '''Parse data block, return iterator over the objects inside'''
    for block in re.finditer(r'\{[^}]*\}', data):  # Split to objects
        obj = {}
        # \s* around => because the sample contains both "=> 'x'" and "=>'x'"
        for match in re.finditer(r"([a-z]+)\s*=>\s*'([^']*)'", block.group()):
            obj[match.group(1)] = match.group(2)
        yield obj
Now you have two problems :)
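As a usage sketch for the regex approach above, the parsed blocks can be collected into a list of dictionaries so that a value is looked up by its key. The name `parse_blocks` and the inline `SAMPLE` text are assumptions made for illustration, not part of the original answer; the pattern allows optional whitespace around `=>` because the sample file mixes `=> 'x'` and `=>'x'`.

```python
import re

# Hypothetical sample in the same format as the config file in the question.
SAMPLE = """
# Name and details
(
{ group => 'abc',
host => 'pqr.com',
user => 'anonymous',
src => '/var/tmp',
dest => '/tmp',
},
{ group => 'pqr',
host =>'abc.com',
user => 'xyz',
src => '/home/pp',
dest => '/var/tmp',
},
)
"""

# One key => 'value' pair; whitespace around => is optional.
PAIR = re.compile(r"(\w+)\s*=>\s*'([^']*)'")


def parse_blocks(text):
    """Return one dict per { ... } block, in file order."""
    blocks = []
    for block in re.finditer(r"\{[^}]*\}", text):
        blocks.append(dict(PAIR.findall(block.group())))
    return blocks


blocks = parse_blocks(SAMPLE)
for block in blocks:
    # Values are now addressed by key instead of by position.
    print(block["group"], block["host"])
```

Because each block becomes an ordinary dictionary, `block["group"]` or `block.get("dest")` answers the "value for a key" part of the question directly.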
Your data is slightly malformed and cannot be interpreted directly by Python, so you have to pre-process it before evaluating it:
Change all occurrences of => to : with data.replace("=>", ":")
Quote all the keys: re.sub(r" (\w+) ", r"'\1'", data.replace("=>", ":"))
You can then feed the result to ast.literal_eval. This relies on every key being surrounded by single spaces and every pair ending with a comma.
import re, ast
ast.literal_eval(re.sub(r" (\w+) ", r"'\1'", data.replace("=>", ":")))
http://docs.python.org/library/configparser.html
You may want to try that for this, but your config file would have to change to a more INI-like format:
[section]
key = value
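A minimal sketch of what the INI route could look like, assuming Python 3 (where the module is spelled `configparser`). The section names `transfer1` and `transfer2` are invented for illustration, since INI sections must be unique and the original blocks have no names of their own.

```python
import configparser

# Hypothetical INI rewrite of the first two blocks of the config file.
INI_TEXT = """
[transfer1]
group = abc
host = pqr.com
user = anonymous
src = /var/tmp
dest = /tmp

[transfer2]
group = abc
host = pqr.com
user = anonymous
src = /tmp
dest = /var/tmp
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)  # parser.read("file.ini") would read from disk

for section in parser.sections():
    # Each section behaves like a mapping of option name to string value.
    print(section, parser[section]["host"], parser[section]["src"])
```

The trade-off is that the file has to be rewritten once, but after that no custom parsing code is needed.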
http://deron.meranda.us/python/demjson/
demjson also is nice for python objects -> strings and back.
I tend to use these in this situation.
Related
I am currently trying to parse 2 text files and then produce a .csv output. One contains a list of path/file locations, and the other contains other info related to each path/file location.
1st text file contains (path.txt):
C:/Windows/System32/vssadmin.exe
C:/Users/Administrator/Desktop/google.com
2nd text file contains (filelist.txt):
-= List of files in hash: =-
$VAR1 = {
'File' => [
{
'RootkitInfo' => 'Normal',
'FileVersionLabel' => '6.1.7600.16385',
'ProductVersion' => '6.1.7601.17514',
'Path' => 'C:/Windows/System32/vssadmin.exe',
'Signer' => 'Microsoft Windows',
'Size' => '210944',
'SHA1' => 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
},
{
'RootkitInfo' => 'Normal',
'FileVersionLabel' => '6.1.7600.16385',
'ProductVersion' => '6.1.7601.17514',
'Path' => 'C:/Users/Administrator/Desktop/steam.exe',
'Signer' => 'Valve Inc.',
'Size' => '300944',
'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b65d94'
},
{
'RootkitInfo' => 'Normal',
'FileVersionLabel' => '6.1.7600.16385',
'ProductVersion' => '6.1.7601.17514',
'Path' => 'C:/Users/Administrator/Desktop/google.com',
'Signer' => 'Valve Inc.',
'Size' => '300944',
'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b78987'
},
.
.
.
]
}
How do I go about having a .csv output containing the path of the file with its corresponding hash value? Also, in case I would like to add additional column/info corresponding to the path?
Sample table output:
<table>
<tr>
<th>File Path</th>
<th>Hash Value</th>
</tr>
<tr>
<td>C:/Windows/System32/vssadmin.exe</td>
<td>da39a3ee5e6b4b0d3255bfef95601890afd80709</td>
</tr>
<tr>
<td>C:/Users/Administrator/Desktop/google.com</td>
<td>cf23df2207d99a74fbe169e3eba035e633b78987</td>
</tr>
</table>
You could construct a regex pattern that matches what you are looking for:
pattern = r"""{.*?(C:/Windows/System32/vssadmin.exe).*?'SHA1' => '([^']*)'.*?}"""
To use it with multiple file names in a loop, turn that pattern into a format string:
fmt = r"""{{.*?({}).*?'SHA1' => '([^']*)'.*?}}"""
Something like this:
import re
with open('filelist.txt') as f:
    s = f.read()
with open('path.txt') as f:
    for line in f:
        fname = line.strip()
        # re.escape so that '.' in the path is matched literally
        pattern = fmt.format(re.escape(fname))
        m = re.search(pattern, s, flags=re.DOTALL)
        if m:
            print(m.groups())
        else:
            print('no match for', fname)
It's a little inefficient and depends on the contents of the files to be exactly like you represented - like capitalization being the same.
Or without regular expressions: iterate over the lines of filelist.txt; find the Path line; extract the path with a slice, see if it is a path from path.txt; find the very next SHA1 line; extract the hash with a slice. This relies on the position of the two lines relative to each other and the position of the characters in each line. This will probably be more efficient.
with open('path.txt') as f:
    fnames = set(line.strip() for line in f)
with open('filelist.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith("'Path'") and line[11:-2] in fnames:
            name = line[11:-2]
            while not line.startswith("'SHA1'"):
                line = next(f)
                line = line.strip()
            # the SHA1 line has no trailing comma, so strip quote/comma rather than slicing [11:-2]
            print((name, line[11:].rstrip("',")))
This one also assumes the text files are as you represented them.
To parse the second file (which is not really a plain .txt), you will need to restructure it so that it looks like a normal Python data structure. It's pretty close, and there are ways to coerce it into one:
import ast
contents = "" # this will be to hold the read contents of that file
filestart = False
with open('filelist.txt') as fh:
    for line in fh:
        if not filestart and not line.startswith("$VAR"):
            continue
        elif line.startswith("$VAR"):
            contents += "{" # start the dictionary
            filestart = True # to kill the first if statement
        else:
            contents += line # fill out with rest of file
# create dictionary, we use ast here because json will fail
result = ast.literal_eval(contents.replace("=>", ":"))
# {'File': [{'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Windows/System32/vssadmin.exe', 'Signer': 'Microsoft Windows', 'Size': '210944', 'SHA1': 'da39a3ee5e6b4b0d3255bfef95601890afd80709'}, {'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Users/Administrator/Desktop/steam.exe', 'Signer': 'Valve Inc.', 'Size': '300944', 'SHA1': 'cf23df2207d99a74fbe169e3eba035e633b65d94'}, {'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Users/Administrator/Desktop/google.com', 'Signer': 'Valve Inc.', 'Size': '300944', 'SHA1': 'cf23df2207d99a74fbe169e3eba035e633b78987'}]}
files = result["File"] # get your list from here
Now that it's in a tolerable format, I'd convert it to a dict of path: hash key-value pairs for easy lookup against your other file:
files_dict = {file['Path']: file['SHA1'] for file in files}
# now grab your other file, and lookups should be quite simple
with open("path.txt") as fh:
    results = [f"{filepath.strip()}, {files_dict.get(filepath.strip())}" for filepath in fh]
# Now you can put that to a csv
with open("paths.csv", "w") as fh:
    fh.write('File Path, Hash Value\n') # write the header on its own line
    fh.write('\n'.join(results))
There are better ways to do this, but that could be left as an exercise to the reader
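One of those better ways is probably the standard `csv` module, which takes care of quoting and line endings and makes extra columns a matter of adding list items. The sketch below is an illustration under assumptions: `files_dict` and `wanted_paths` are hard-coded stand-ins for the parsed hash list and the lines of path.txt, and an in-memory buffer replaces the output file so the example runs on its own.

```python
import csv
import io

# Hypothetical parsed data: path -> SHA1, shaped like files_dict above.
files_dict = {
    "C:/Windows/System32/vssadmin.exe": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "C:/Users/Administrator/Desktop/steam.exe": "cf23df2207d99a74fbe169e3eba035e633b65d94",
    "C:/Users/Administrator/Desktop/google.com": "cf23df2207d99a74fbe169e3eba035e633b78987",
}
# Stand-in for the stripped lines of path.txt.
wanted_paths = [
    "C:/Windows/System32/vssadmin.exe",
    "C:/Users/Administrator/Desktop/google.com",
]

# A real script would use open("paths.csv", "w", newline="") here instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["File Path", "Hash Value"])
for path in wanted_paths:
    # .get() yields an empty string for paths missing from the hash list.
    writer.writerow([path, files_dict.get(path, "")])

csv_text = buffer.getvalue()
print(csv_text)
```

To add another column, store the whole record per path (for example `{file['Path']: file for file in files}`) and append further fields such as `record['Signer']` to each row.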
I am new to python and am trying to read a file and create a dictionary from it.
The format is as follows:
.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
MODEL = ArubaS3500-48T
CERTIFICATION = CERTIFIED
CONT = Aruba-Switch
HEALTH = ARUBA-Controller
VLAN = Dot1q INSTRUMENTATION:
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
Environment = ArubaSysExt:DeviceID
Interface-Fault = MIB2
Interface-Performance = MIB2
Port-Fault = MIB2
Port-Performance = MIB2
}
I want the OID on the first line (.1.3.6.1.4.1.14823.1.1.27 { ) to be the key, and the remaining lines up to the } to be the values.
I have tried a few combinations but am not able to get the correct regex to match these
Any help please?
I have tried something like
lines = cache.readlines()
for line in lines:
    searchObj = re.search(r'(^.\d.*{)(.*)$', line)
    if searchObj:
        (oid, cert ) = searchObj.groups()
        results[searchObj(oid)] = ", ".join(line[1:])
        print("searchObj.group() : ", searchObj.group(1))
        print("searchObj.group(1) : ", searchObj.group(2))
You can try this:
import re
with open('filename.txt') as f:
    data = f.read()
the_key = re.findall(r"^\n*[\.\d]+", data)
values = [re.split(r"\s+\=\s+", i) for i in re.findall(r"[a-zA-Z0-9]+\s*\=\s*[a-zA-Z0-9]+", data)]
final_data = {the_key[0]: dict(values)}
Output:
{'\n.1.3.6.1.4.1.14823.1.1.27': {'VENDOR': 'Aruba', 'CERTIFICATION': 'CERTIFIED', 'Fault': 'MIB2', 'VLAN': 'Dot1q', 'Environment': 'ArubaSysExt', 'HEALTH': 'ARUBA', 'Memory': 'ArubaController', 'Performance': 'MIB2', 'CONT': 'Aruba', 'MODEL': 'ArubaS3500', 'TYPE': 'Switch'}}
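Note that in the output above, keys such as Card-Fault and CPU/Memory come out as Fault and Memory, and values such as ArubaController:DeviceID lose their suffix, because `[a-zA-Z0-9]+` stops at hyphens, slashes and colons. A line-based sketch that avoids this is shown below; the function name `parse_oid_blocks` and the shortened `SAMPLE` are assumptions for illustration, and the code assumes each `name = value` pair sits on its own line.

```python
# Hypothetical, shortened sample in the format shown in the question.
SAMPLE = """
.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
MODEL = ArubaS3500-48T
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
}
"""


def parse_oid_blocks(text):
    """Map each OID to a dict of the 'name = value' lines inside its braces."""
    result = {}
    current = None  # dict for the block being read, or None outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            oid = line[:-1].strip()
            current = result.setdefault(oid, {})
        elif line == "}":
            current = None
        elif current is not None and "=" in line:
            # partition splits on the first '=' only, so values may contain '='.
            name, _, value = line.partition("=")
            current[name.strip()] = value.strip()
    return result


parsed = parse_oid_blocks(SAMPLE)
print(parsed[".1.3.6.1.4.1.14823.1.1.27"]["Card-Fault"])
```

Since the parser never inspects the characters inside a name or value, hyphens, slashes and colons survive intact.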
You could use a nested dict comprehension along with an outer and inner regex.
Your blocks can be separated by
.numbers...numbers.. {
// values here
}
In terms of a regular expression this can be formulated as
^\s* # start of line + optional whitespace
(?P<key>\.[\d.]+)\s* # the key
{(?P<values>[^{}]+)} # everything between { and }
As you see, we split the parts into key/value pairs.
Your "inner" structure can be formulated like
(?P<key>\b[A-Z][-/\w]+\b) # the "inner" key
\s*=\s* # whitespaces, =, whitespaces
(?P<value>.+) # the value
Now let's build the "outer" and "inner" expressions together (string holds the contents of the file):
import re
rx_outer = re.compile(r'^\s*(?P<key>\.[\d.]+)\s*{(?P<values>[^{}]+)}', re.MULTILINE)
rx_inner = re.compile(r'(?P<key>\b[A-Z][-/\w]+\b)\s*=\s*(?P<value>.+)')
result = {item.group('key'):
          {match.group('key'): match.group('value')
           for match in rx_inner.finditer(item.group('values'))}
          for item in rx_outer.finditer(string)}
print(result)
A demo can be found on ideone.com.
I have a JSON file which has duplicate keys.
Example
{
"data":"abc",
"data":"xyz"
}
I want to make this as
{
"data1":"abc",
"data2":"xyz"
}
I tried using object_pairs_hook with json.loads, but it is not working. Could anyone help me with a Python solution for the above problem?
You can pass the loads function a keyword parameter (object_pairs_hook) to handle the key/value pairs; there you can check for duplicates like this:
import json
from collections import Counter
raw_text_data = """{
"data":"abc",
"data":"xyz",
"data":"xyz22"
}"""
def manage_duplicates(pairs):
    d = {}
    k_counter = Counter()  # how many times each key has been seen so far
    for k, v in pairs:
        d[k+str(k_counter[k])] = v
        k_counter[k] += 1
    return d
print(json.loads(raw_text_data, object_pairs_hook=manage_duplicates))
I used Counter to count each key and append the current count to it: the key is saved as k+str(k_counter[k]), so every key gets a trailing number (data0, data1, data2 for the input above).
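If the numbering should start at 1 and only touch keys that really are duplicated, as in the question's desired output, a variant of the hook could look like the sketch below. The name `number_duplicates` and the sample string are assumptions for illustration; the sketch also does not guard against a renamed key colliding with an existing one such as a literal "data1".

```python
import json
from collections import Counter

# Hypothetical input: "data" is duplicated, "other" is not.
raw_text_data = '{"data": "abc", "data": "xyz", "other": 1}'


def number_duplicates(pairs):
    """object_pairs_hook that renames repeated keys to key1, key2, ..."""
    totals = Counter(key for key, _ in pairs)  # how often each key occurs
    seen = Counter()  # running count per duplicated key
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            result[f"{key}{seen[key]}"] = value
        else:
            result[key] = value  # unique keys are left untouched
    return result


renamed = json.loads(raw_text_data, object_pairs_hook=number_duplicates)
print(renamed)
```

The hook receives the pairs of each JSON object as a list, which is why it can be scanned twice: once to count, once to rename.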
P.S
If you have control over the input, I would highly recommend changing your JSON structure to:
{"data": ["abc", "xyz"]}
RFC 4627, which defines the application/json media type, recommends unique keys but doesn't explicitly forbid duplicates:
The names within an object SHOULD be unique.
A quick and dirty solution using re.
import re
s = '{ "data":"abc", "data":"xyz", "test":"one", "test":"two", "no":"numbering" }'
def find_dupes(s):
    keys = re.findall(r'"(\w+)":', s)
    return list(set(filter(lambda w: keys.count(w) > 1, keys)))
for key in find_dupes(s):
    for i in range(1, len(re.findall(r'"{}":'.format(key), s)) + 1):
        s = re.sub(r'"{}":'.format(key), r'"{}{}":'.format(key, i), s, count=1)
print(s)
Prints this string:
{
"data1":"abc",
"data2":"xyz",
"test1":"one",
"test2":"two",
"no":"numbering"
}
Given the following data received from a web form:
for key in request.form.keys():
    print key, request.form.getlist(key)
group_name [u'myGroup']
category [u'social group']
creation_date [u'03/07/2013']
notes [u'Here are some notes about the group']
members[0][name] [u'Adam']
members[0][location] [u'London']
members[0][dob] [u'01/01/1981']
members[1][name] [u'Bruce']
members[1][location] [u'Cardiff']
members[1][dob] [u'02/02/1982']
How can I turn it into a dictionary like this? It's eventually going to be used as JSON but as JSON and dictionaries are easily interchanged my goal is just to get to the following structure.
event = {
    group_name : 'myGroup',
    notes : 'Here are some notes about the group',
    category : 'social group',
    creation_date : '03/07/2013',
    members : [
        {
            name : 'Adam',
            location : 'London',
            dob : '01/01/1981'
        },
        {
            name : 'Bruce',
            location : 'Cardiff',
            dob : '02/02/1982'
        }
    ]
}
Here's what I have managed so far. Using the following list comprehension I can easily make sense of the ordinary fields:
event = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] != "members" ]
but I'm struggling with the members list. There can be any number of members. I think I need to separately create a list for them and add that to a dictionary with the non-iterative records. I can get the member data like this:
tmp_members = [(key, request.form.getlist(key)) for key in request.form.keys() if key[0:7]=="members"]
Then I can pull out the list index and field name:
members_arr = []
members_orig = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] == "members" ]
for i in members_orig:
    p1 = i[0].index('[')
    p2 = i[0].index(']')
    members_index = i[0][p1+1:p2]
    p1 = i[0].rfind('[')
    members_field = i[0][p1+1:-1]
But how do I add this to my data structure? The following won't work because I could be trying to process members[1][name] before members[0][name].
members_arr[int(members_index)] = {members_field : i[1]}
This seems very convoluted. Is there a simpler way of doing this, and if not, how can I get this working?
You could store the data in a dictionary and then use the json library.
import json
json_data = json.dumps(event)  # event is the dictionary you built
print(json_data)
This will print a JSON string.
See the json library documentation for details.
Yes, convert it to a dictionary, then use json.dumps(), with some optional parameters, to print out the JSON in the format you need:
eventdict = {
'group_name': 'myGroup',
'notes': 'Here are some notes about the group',
'category': 'social group',
'creation_date': '03/07/2013',
'members': [
{'name': 'Adam',
'location': 'London',
'dob': '01/01/1981'},
{'name': 'Bruce',
'location': 'Cardiff',
'dob': '02/02/1982'}
]
}
import json
print(json.dumps(eventdict, indent=4))
The order of the key:value pairs is not always consistent, but if you're just looking for pretty-looking JSON that can be parsed by a script while remaining human-readable, this should work. You can also sort the keys alphabetically, using:
print(json.dumps(eventdict, indent=4, sort_keys=True))
The following python functions can be used to create a nested dictionary from the flat dictionary. Just pass in the html form output to decode().
def get_key_name(str):
    first_pos = str.find('[')
    return str[:first_pos]

def get_subkey_name(str):
    '''Used with lists of dictionaries only'''
    first_pos = str.rfind('[')
    last_pos = str.rfind(']')
    return str[first_pos:last_pos+1]

def get_key_index(str):
    first_pos = str.find('[')
    last_pos = str.find(']')
    return str[first_pos:last_pos+1]

def decode(idic):
    odic = {} # Initialise an empty dictionary
    # Scan all the top level keys
    for key in idic:
        # Nested entries have [] in their key
        if '[' in key and ']' in key:
            if key.rfind('[') == key.find('[') and key.rfind(']') == key.find(']'):
                print(key, 'is a nested list')
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                # Append can't be used because we may not get the list in the correct order.
                try:
                    odic[key_name][key_index] = idic[key][0]
                except KeyError: # List doesn't yet exist
                    odic[key_name] = [None] * (key_index + 1)
                    odic[key_name][key_index] = idic[key][0]
                except IndexError: # List is too short
                    odic[key_name] = odic[key_name] + ([None] * (key_index - len(odic[key_name]) + 1 ))
                    # TO DO: This could be a function
                    odic[key_name][key_index] = idic[key][0]
            else:
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                subkey_name = get_subkey_name(key).replace('[','',1).replace(']','',1)
                try:
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except KeyError: # Dictionary doesn't yet exist
                    print("KeyError")
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = [{} for _ in range(key_index+1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except IndexError: # List is too short
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = odic[key_name] + [{} for _ in range(key_index - len(odic[key_name]) + 1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
        else:
            # This can be added to the output dictionary directly
            print(key, 'is a simple key value pair')
            odic[key] = idic[key][0]
    return odic
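A more compact alternative, offered as a sketch rather than a drop-in replacement: match the `name[index][field]` shape with one regular expression, group the values by index in an intermediate dictionary, and build the list at the end, so the order in which the keys arrive no longer matters. The names `nest`, `NESTED` and the sample `flat` dictionary are assumptions for illustration; unlike decode() above, this version handles only the two-level `name[index][field]` form and closes gaps in the indices instead of padding them.

```python
import re

# Hypothetical flat form data, shaped like {key: request.form.getlist(key)}.
flat = {
    "group_name": ["myGroup"],
    "members[1][name]": ["Bruce"],
    "members[0][name]": ["Adam"],
    "members[0][location]": ["London"],
    "members[1][location]": ["Cardiff"],
}

# Matches keys such as members[0][name].
NESTED = re.compile(r"^(\w+)\[(\d+)\]\[(\w+)\]$")


def nest(form):
    """Turn name[index][field] keys into a list of dicts stored under name."""
    plain = {}
    grouped = {}  # name -> {index -> {field: value}}
    for key, values in form.items():
        match = NESTED.match(key)
        if match:
            name, index, field = match.groups()
            grouped.setdefault(name, {}).setdefault(int(index), {})[field] = values[0]
        else:
            plain[key] = values[0]
    for name, by_index in grouped.items():
        # Sorting by index makes the result independent of key arrival order.
        plain[name] = [by_index[i] for i in sorted(by_index)]
    return plain


event = nest(flat)
print(event)
```

The resulting dictionary can be passed straight to json.dumps().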
I'm a bit new to Python and JSON.
I have this JSON file:
{ "hosts": {
"example1.lab.com" : ["mysql", "apache"],
"example2.lab.com" : ["sqlite", "nmap"],
"example3.lab.com" : ["vim", "bind9"]
}
}
What I want to do is use the hostname variable and extract the values for each hostname.
It's a bit hard to explain, but I'm using SaltStack, which already iterates over hosts, and I want it to be able to extract each host's values from the JSON file using the hostname variable.
I hope that makes sense.
thanks
o.
You could do something along these lines:
import json
j='''{ "hosts": {
"example1.lab.com" : ["mysql", "apache"],
"example2.lab.com" : ["sqlite", "nmap"],
"example3.lab.com" : ["vim", "bind9"]
}
}'''
specific_key='example2'
found=False
for key,di in json.loads(j).items(): # iteritems() on Python 2
    for k,v in di.items():
        if k.startswith(specific_key):
            found=True
            print(k,v)
            break
    if found:
        break
Or, you could do:
def pairs(args):
    for arg in args:
        if arg[0].startswith(specific_key):
            k,v=arg
            print(k,v)
json.loads(j,object_pairs_hook=pairs)
Either case, prints:
example2.lab.com ['sqlite', 'nmap']
If you have the JSON in a string, just use Python's json.loads() function to parse it and bind the resulting contents to a local name.
Example:
#!/usr/bin/env python
import json
some_json = '''{ "hosts": {
"example1.lab.com" : ["mysql", "apache"],
"example2.lab.com" : ["sqlite", "nmap"],
"example3.lab.com" : ["vim", "bind9"]
}
}'''
some_stuff = json.loads(some_json)
print(list(some_stuff['hosts'].keys()))
---> ['example1.lab.com', 'example2.lab.com', 'example3.lab.com']
As shown, you then access the contents of some_stuff just as you would any other Python dictionary: all the top-level names that were serialized (encoded) in the JSON become keys in that dictionary.
If the JSON contents are in a file, you can open it like any other file in Python and pass the file object to the json.load() function:
#!/usr/bin/env python
import json
with open("some_file.json") as f:
    some_stuff = json.load(f)
print(' '.join(some_stuff.keys()))
If the above JSON file is stored as 'samplefile.json', you can write the following in Python:
import json
with open('samplefile.json') as f:
    data = json.load(f)
value1 = data['hosts']['example1.lab.com']
value2 = data['hosts']['example2.lab.com']
value3 = data['hosts']['example3.lab.com']
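Since the question mentions a hostname variable supplied from outside, the lookup can also be done with that variable instead of hard-coded keys. The sketch below is an illustration under assumptions: the helper name `packages_for` is invented, the JSON is inlined so the example runs on its own, and `hostname` is a plain string standing in for whatever the calling tool provides.

```python
import json

# Same structure as the JSON file in the question, inlined to stay runnable.
data = json.loads("""
{ "hosts": {
    "example1.lab.com": ["mysql", "apache"],
    "example2.lab.com": ["sqlite", "nmap"],
    "example3.lab.com": ["vim", "bind9"]
  }
}
""")


def packages_for(hostname, config):
    """Return the list stored for hostname, or an empty list if it is absent."""
    return config["hosts"].get(hostname, [])


hostname = "example2.lab.com"  # stand-in for the variable supplied by the caller
print(packages_for(hostname, data))
print(packages_for("unknown.lab.com", data))
```

Using .get() with a default avoids a KeyError for hosts that have no entry in the file.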