So I have a list of elements:
elements = [room1, room2, room3]
I also have a list of key/value attributes that each room has:
keys = ["level", "finish1", "finish2"]
values = [["ground", "paint1", "carpet1"],["ground", "paint1", "paint2"], ["second level", "paint1", "paint2"]]
Is there a way to serialize these two lists into a JSON file structured like this:
{'room1': [{'level': 'ground', 'finish1': 'paint1', 'finish2': 'carpet1'}],'room2': [{'level': 'ground', 'finish1': 'paint1', 'finish2': 'paint2'}],'room3': [{'level': 'second level', 'finish1': 'paint1', 'finish2': 'paint2'}]}
I am on this weird platform that doesn't support dictionaries, so I created a class for them:
class collection():
    def __init__(self, name, key, value):
        self.name = name
        self.dict = {}
        self.dict[key] = value

    def __str__(self):
        x = str(self.name) + " collection"
        for key, value in self.dict.iteritems():
            x = x + '\n' + ' %s= %s ' % (key, value)
        return x
Then I found a piece of code that creates basic JSON from two parallel lists:
import json

def json_list(keys, values):
    lst = []
    for pn, dn in zip(values, keys):
        d = {}
        d[dn] = pn
        lst.append(d)
    return json.dumps(lst)
But this code doesn't give me the {room1: [{ ... structure.
Any ideas would be great. The software I am working with is based on IronPython 2.7.
OK, so the above worked great, and I got great feedback in the comments. There is one more variation I didn't account for. Sometimes I mix more than a single element type (rooms, columns, etc.), and they might not have the same number of attributes. For example, a room can have (level, finish and finish) while a column might have only thickness and material. If I keep everything organized in parallel key/value lists, is it possible to modify the definition below?
keys = [["thickness", "material"], ["level", "finish", "finish"]]
values = [[100, "paint"], ["ground", "paint", "paint"]]
elements = ["column", "room"]
How would I need to modify the definition below to make it work? Again, I want to export a JSON file.
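For the variation where each element has its own set of keys, the same zip() idea still works if keys and values are lists of lists: zip the three lists together and build one dict per element. A minimal sketch, assuming the key names are strings and are unique within each element (the finish1/finish2 names are my assumption; two keys both called finish would overwrite each other in a dict). It uses only zip() and dict(), so it should also run on IronPython 2.7:

```python
import json

# One name, one key list and one value list per element; the key lists
# may have different lengths. finish1/finish2 are assumed names, because
# duplicate keys within one element would collapse into a single entry.
elements = ["column", "room"]
keys = [["thickness", "material"], ["level", "finish1", "finish2"]]
values = [[100, "paint"], ["ground", "paint", "paint"]]

# Walk the three parallel lists together and pair each element's keys
# with its own values.
result = dict(
    (element, [dict(zip(element_keys, element_values))])
    for element, element_keys, element_values in zip(elements, keys, values)
)

print(json.dumps(result))
```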
I don't see how Python can work without dictionaries, so please test this and tell me what error it shows:
import json
elements = ['r1', 'r2', 'r3']
keys = ["level", "finish1", "finish2"]
values = [["ground", "paint1", "carpet1"], ["ground", "paint1", "paint2"], ["second level", "paint1", "paint2"]]
d = dict()
for (index, room) in enumerate(elements):
    d[room] = dict()
    for (index2, key) in enumerate(keys):
        d[room][key] = values[index][index2]
print(json.dumps(d))
This may work.
#-*- encoding: utf-8 -*-
import json
elements = ["room1", "room2", "room3"]
keys = ["level", "finish1", "finish2"]
values = [["ground", "paint1", "carpet1"],["ground", "paint1", "paint2"], ["second level", "paint1", "paint2"]]
what_i_want = dict((room, [dict(zip(keys, value))])
                   for room, value in zip(elements, values))
print(json.dumps(what_i_want))
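Since the goal is a JSON file rather than a printed string, json.dump() can write the same structure straight to an open file object. A minimal sketch; the output name rooms.json is a hypothetical path, and the read-back at the end only confirms the round trip:

```python
import json

elements = ["room1", "room2", "room3"]
keys = ["level", "finish1", "finish2"]
values = [["ground", "paint1", "carpet1"],
          ["ground", "paint1", "paint2"],
          ["second level", "paint1", "paint2"]]

what_i_want = dict((room, [dict(zip(keys, value))])
                   for room, value in zip(elements, values))

# json.dump writes to a file object; "rooms.json" is a hypothetical path.
with open("rooms.json", "w") as out:
    json.dump(what_i_want, out, indent=2, sort_keys=True)

# Read the file back to check that it contains the same structure.
with open("rooms.json") as f:
    loaded = json.load(f)
```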
Given the sample XML below:
<_Document>
    <_Data1> 'foo'
        <_SubData1> 'bar1' </_SubData1>
        <_SubData2> 'bar2' </_SubData2>
        <_SubData3> 'bar3' </_SubData3>
    </_Data1>
</_Document>
I want to capture each SubData value, combine it with the Data1 value in a dictionary, and then append that dictionary to a list, so that the output looks something like:
[{Data1: 'foo', SubData1: 'bar1'}, {Data1: 'foo', SubData2: 'bar2'}, {Data1: 'foo', SubData3: 'bar3'}]
My code is:
from lxml import etree
import re

new_records = []
for child in root.iter('_Document'):  # finding all children with each 'Document' string
    for top_data in child.iter():  # iterating through all of each 'Document' section's tags and text
        if "Data" in top_data.tag:
            for data in top_data:
                rec = {}
                if data.text is not None and data.text.isspace() is False:  # avoiding NoneTypes and empty data
                    g = data.tag.strip("_")  # cleaning up the tag
                    rec[g] = data.text.replace("\n", " ")  # cleaning up the value
                for b in re.finditer(r'^_SubData', data.tag):  # searching through each 'SubData' contained in a given tag
                    for subdata in data:
                        subdict = {}
                        if subdata.text is not None:  # again preventing NoneTypes
                            z = subdata.tag.strip("_")  # tag cleaning
                            subdict[z] = subdata.text.replace("\n", " ")  # text cleaning
                            rec.update(subdict)  # update the data record dictionary with the subdata
            new_records.append(rec)  # appending to the list
This, unfortunately, outputs:
[{Data1: 'foo', SubData3: 'bar3'}]
It only appends the dictionary once, after its final update.
I've tried different variations of this, including initializing a list after the first 'if' statement in the second for loop and appending after each pass, but that required quite a bit of cleanup at the end to undo the nesting it caused.
I've also tried initializing empty dictionaries outside of the loops to update to preserve the previous updates and append that way.
I'm curious if there is some functionality of lxml that I've missed or a more pythonic approach to get the desired output.
I offered what I think of as a declarative approach in another solution. If you're more comfortable explicitly defining the structure with loops, here's an imperative approach:
from xml.etree import ElementTree as ET
import pprint

new_records = []
document = ET.parse('input.xml').getroot()
for elem in document:
    if elem.tag.startswith('_Data'):
        data = elem
        data_name = data.tag[1:]  # skip leading '_'
        data_val = data.text.strip()
        for elem in data:
            if elem.tag.startswith('_SubData'):
                subdata = elem
                subdata_name = subdata.tag[1:]
                subdata_val = subdata.text.strip()
                new_records.append(
                    {data_name: data_val, subdata_name: subdata_val}
                )
pprint.pprint(new_records)
The input and output are the same as in my other solution.
You can do this with Python's built-in ElementTree module and its iterparse() function, which walks an XML document and yields an (event, element) pair for every step through the tree. We listen for when it starts parsing an element, and if it's a _Data... or _SubData... element, we act.
This is a declarative approach, and it relies on _SubData only ever appearing as a child of _Data, that is, on your small, simple sample being exactly representative of what you're actually dealing with.
You'll need to manage a little state for the _Data elements, but that's it:
from xml.etree import ElementTree as ET
import pprint

new_records = []
data_name = None
data_val = None
for event, elem in ET.iterparse('input.xml', ['start']):
    tag_name = elem.tag[1:]  # skip possible leading '_'
    if event == 'start' and tag_name.startswith('Data'):
        data_name = tag_name
        data_val = elem.text.strip()
    if event == 'start' and tag_name.startswith('SubData'):
        subdata_name = tag_name
        subdata_val = elem.text.strip()
        record = {
            data_name: data_val, subdata_name: subdata_val
        }
        new_records.append(record)
pprint.pprint(new_records)
I modified your sample; here is my input.xml:
<_Document>
    <_Data1>foo
        <_SubData1>bar1</_SubData1>
        <_SubData2>bar2</_SubData2>
        <_SubData3>bar3</_SubData3>
    </_Data1>
    <_Data2>FOO
        <_SubData1>BAR1</_SubData1>
        <_SubData2>BAR2</_SubData2>
        <_SubData3>BAR3</_SubData3>
    </_Data2>
</_Document>
When I run my script on that input, I get:
[{'Data1': 'foo', 'SubData1': 'bar1'},
{'Data1': 'foo', 'SubData2': 'bar2'},
{'Data1': 'foo', 'SubData3': 'bar3'},
{'Data2': 'FOO', 'SubData1': 'BAR1'},
{'Data2': 'FOO', 'SubData2': 'BAR2'},
{'Data2': 'FOO', 'SubData3': 'BAR3'}]
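One caveat with listening only for 'start' events: the ElementTree documentation notes that when iterparse() emits a 'start' event, the element's text and tail are not guaranteed to be populated yet. It usually works on small inputs, since the whole file tends to be parsed before the first events are handed back, but a variant that only reads text once it is complete could remember the current _Data element on 'start' and read both texts on the 'end' of each _SubData element. A minimal sketch; the inline XML string is a shortened copy of the sample above, used so the snippet runs without input.xml:

```python
import io
import pprint
from xml.etree import ElementTree as ET

# Shortened copy of the sample input.xml, inlined for a self-contained run.
xml_text = """<_Document>
  <_Data1>foo
    <_SubData1>bar1</_SubData1>
    <_SubData2>bar2</_SubData2>
  </_Data1>
  <_Data2>FOO
    <_SubData1>BAR1</_SubData1>
  </_Data2>
</_Document>"""

new_records = []
data_elem = None  # the _Data... element we are currently inside

source = io.BytesIO(xml_text.encode("utf-8"))
for event, elem in ET.iterparse(source, events=("start", "end")):
    tag_name = elem.tag.lstrip("_")
    if event == "start" and tag_name.startswith("Data"):
        data_elem = elem  # remember it; its text may not be read yet
    elif event == "end" and tag_name.startswith("SubData"):
        # Once a child has ended, the parent's text before its first
        # child has been parsed, and the child's own text is complete.
        new_records.append({
            data_elem.tag.lstrip("_"): data_elem.text.strip(),
            tag_name: elem.text.strip(),
        })

pprint.pprint(new_records)
```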
Consider a list comprehension that merges two one-item dictionaries with ** unpacking (Python 3.5+):
new_records = [
    {
        **{doc.tag.replace('_', ''): doc.text.strip().replace("'", "")},
        **{data.tag.replace('_', ''): data.text.strip().replace("'", "")}
    }
    for doc in root.iterfind('*')
    for data in doc.iterfind('*')
]
new_records
[{'Data1': 'foo', 'SubData1': 'bar1'},
{'Data1': 'foo', 'SubData2': 'bar2'},
{'Data1': 'foo', 'SubData3': 'bar3'}]
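The comprehension above assumes root is already the parsed <_Document> element. A self-contained version, assuming the standard-library ElementTree instead of lxml (both provide fromstring() and iterfind()), that parses the question's sample, quotes included, from a string:

```python
from xml.etree import ElementTree as ET

# The question's sample, including its literal single quotes.
xml_text = """<_Document>
<_Data1> 'foo'
<_SubData1> 'bar1' </_SubData1>
<_SubData2> 'bar2' </_SubData2>
<_SubData3> 'bar3' </_SubData3>
</_Data1>
</_Document>"""

root = ET.fromstring(xml_text)  # root is the <_Document> element itself

new_records = [
    {
        # ** unpacking merges the two one-item dicts (Python 3.5+).
        **{doc.tag.replace('_', ''): doc.text.strip().replace("'", "")},
        **{data.tag.replace('_', ''): data.text.strip().replace("'", "")},
    }
    for doc in root.iterfind('*')   # direct children: _Data1, ...
    for data in doc.iterfind('*')   # their children: _SubData1, ...
]
print(new_records)
```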
Good morning,
I have a configuration file with data like this:
[hostset 1]
ip = 192.168.122.136
user = test
password =
pkey = ~/.ssh/id_rsa
[hostset 2]
ip = 192.168.122.138
user = test
password =
pkey = ~/.ssh/id_rsa
I want to merge the IPs of any number of host sets in this configuration file whose other values are identical, storing the ingested and formatted data in a dict, something like this:
{
    'ip': ['192.168.122.136', '192.168.122.138'],
    'user': 'test',
    'password': '',
    'pkey': '~/.ssh/id_rsa',
}
With code like this:
from configparser import ConfigParser

def unpack(d):
    return [value for key, value in d.items()]

def parse(configuration_file):
    parser = ConfigParser()
    parser.read(configuration_file)
    hosts = [unpack(connection) for connection in [section for section in dict(parser).values()]][1:]
    return [i for i in hosts]

if __name__ == '__main__':
    parse('config.ini')
I can get a list of lists containing the elements of the configuration file, like this:
[['192.168.122.136', 'test', '', '~/.ssh/id_rsa'], ['192.168.122.138', 'test', '', '~/.ssh/id_rsa']]
Then I just need a way to compare the lists and, if all elements except the IP are identical, join them into a list like:
[['192.168.122.136','192.168.122.138'], 'test', '', '~/.ssh/id_rsa']
So I need a clean way to do this for a list of lists of arbitrary length, joining all lists that match.
I got some help from a friend and solved it. The key was to turn the values I wanted to compare into a hashable key for a dictionary (the code below uses the string form of those values, which literal_eval turns back into a list when printing) and to store the IPs as the value. If the key already exists, I append the IP to its list.
from configparser import ConfigParser
from ast import literal_eval as literal

def unpack(d):
    return [value for key, value in d.items()]

def parse(configuration_file):
    parser = ConfigParser()
    parser.read(configuration_file)
    hosts = [unpack(connection) for connection in [section for section in dict(parser).values()]][1:]
    d = dict()
    for item in hosts:
        try:
            d[str((item[1:]))].append(item[0])
        except KeyError:
            d[str((item[1:]))] = [item[0]]
    return d

if __name__ == '__main__':
    for k, v in parse('config.ini').items():
        print([v, *literal(k)])
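The same idea works with a real tuple as the dictionary key, which removes the need for str() and literal_eval(). A minimal sketch, assuming Python 3's configparser (for read_string()) and that every option other than ip must match for two host sets to be merged; it builds the dict shape shown in the question:

```python
from configparser import ConfigParser

config_text = """
[hostset 1]
ip = 192.168.122.136
user = test
password =
pkey = ~/.ssh/id_rsa

[hostset 2]
ip = 192.168.122.138
user = test
password =
pkey = ~/.ssh/id_rsa
"""

parser = ConfigParser()
parser.read_string(config_text)  # or parser.read('config.ini') for a file

groups = {}
for section in parser.sections():  # sections() skips DEFAULT, so no [1:]
    options = dict(parser.items(section))
    ip = options.pop("ip")
    # A sorted tuple of (name, value) pairs is hashable, so it can be a key.
    key = tuple(sorted(options.items()))
    groups.setdefault(key, []).append(ip)

# Rebuild one dict per group, with all matching IPs under 'ip'.
merged = [dict(rest, ip=ips) for rest, ips in groups.items()]
print(merged)
```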
In this solution, I presumed that the file format is exactly as described in the question:
First we split the host sets:
We assume your raw file contents are in the rowdata variable:
HostSets = rowdata.split("[hostset ")  # first element is empty
Dict = {}
for i in range(1, len(HostSets)):
    l = HostSets[i].split("ip = ")  # two elements; the first is trash
    ip = l[1].split()[0]
    # everything after the ip line; strip() so trailing newlines don't change the key
    conf = l[1].split("\n", 1)[1].strip()
    try:
        Dict[conf].append(ip)
    except KeyError:
        Dict[conf] = list()
        Dict[conf].append(ip)
print('{')
for element in Dict:
    print("ip: ", Dict[element], ",", element)
print('}')
My code is as follows:
import json

def reformat(importscompanies):
    #print importscompanies
    container = {}
    child = []
    item_dict = {}
    for name, imports in importscompanies.iteritems():
        item_dict['name'] = imports
        item_dict['size'] = '500'
        child.append(dict(item_dict))
        container['name'] = name
        container['children'] = child
    return container

def run(raw_data):
    raw_data2 = raw_data[0]
    the_output = reformat(raw_data2)

if __name__ == '__main__':
    raw_data = json.load(open('data/bricsinvestorsfirst.json'))
    run(raw_data)
My issue is that the code isn't going through the whole file; it only outputs one entry. Why is this? Am I overwriting something, and do I need another dict that gets appended on every loop?
Also, it seems as though the for loop is going through the iteritems for each dict key. Is there a way to make it pass only once?
The issue is indeed
raw_data2 = raw_data[0]
I ended up creating an iterator to access the dict values.
Thanks.
Lastly, I'm hoping my final JSON file will look like this, using the data I provided above:
{'name': u'name', 'children': [{'name': u'500 Startups', 'size': '500'}, {'name': u'AffinityChina', 'size': '500'}]}
Try this, although your sample input and output don't give many clues about where the "name" fields should come from. I've assumed you want the name of each original item in your list.
import json

original_json = json.load(open('data/bricsinvestorsfirst.json', 'r'))
response_json = {}
response_json["name"] = "analytics"
# where your children list will go
children = []
size = 500  # or whatever else you want
# For each item in your original list
for item in original_json:
    children.append({"name": item["name"],
                     "size": size})
response_json["children"] = children
print(json.dumps(response_json, indent=2))
"It's only outputting one entry" because you only select the first dictionary in the JSON file when you say raw_data2 = raw_data[0]
Try something like this as a starting point (I haven't tested or run it):
import json

def run():
    with open('data/bricsinvestorsfirst.json') as input_file:
        raw_data = json.load(input_file)
    children = []
    for item in raw_data:
        children.append({
            'name': item['name'],
            'size': '500'
        })
    container = {}
    container['name'] = 'name'
    container['children'] = children
    return json.dumps(container)

if __name__ == '__main__':
    print(run())
Alright, so basically I have a Google script that searches for a keyword. The results look like this:
http://www.example.com/user/1234
http://www.youtube.com/user/125
http://www.forum.com/user/12
How could I organize these results like this?
Forums:
http://www.forum.com/user/12
YouTubes:
http://www.youtube.com/user/125
Unidentified:
http://www.example.com/user/1234
By the way, I'm organizing them by keyword: if the URL contains "forum", it goes in the forum list; if it contains "youtube", it goes in the YouTube list; and if no keyword matches, it goes in Unidentified.
1. Create a dict and assign an empty list to each keyword you have, e.g.
my_dict = {'forum': [], 'youtube': [], 'unidentified': []}
2. Iterate over your URLs.
3. Generate a key for each URL (the domain name, in your case); you can extract it with the re module.
4. Look up this key in the dictionary from step 1. If it exists, append the URL to that key's list; otherwise, append it to the 'unidentified' list.
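The four steps above can be sketched as follows, assuming the category names match the second-level domain exactly (forum, youtube) and the URLs are plain http(s) links; the regular expression is one possible choice for extracting the key, not the only one:

```python
import re

urls = [
    "http://www.example.com/user/1234",
    "http://www.youtube.com/user/125",
    "http://www.forum.com/user/12",
]

# Step 1: one empty list per category, plus the fallback bucket.
my_dict = {"forum": [], "youtube": [], "unidentified": []}

# Step 3's key: the name after "//" (and an optional "www.") up to the next dot.
domain_pattern = re.compile(r"//(?:www\.)?([^./]+)\.")

for url in urls:  # Step 2
    match = domain_pattern.search(url)
    key = match.group(1) if match else None
    # Step 4: a known key gets its own list, anything else is unidentified.
    if key in my_dict and key != "unidentified":
        my_dict[key].append(url)
    else:
        my_dict["unidentified"].append(url)

print(my_dict)
```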
Something like this? You should be able to adapt this example to your needs.
import pprint
import re

urls = ['http://www.example.com/user/1234',
        'http://www.youtube.com/user/126',
        'http://www.youtube.com/user/125',
        'http://www.forum.com/useryoutube/12']
pattern = re.compile(r'//www\.(\w+)\.')  # raw string so \w and \. reach re unchanged
keys = ['forum', 'youtube']
results = dict()
for u in urls:
    ms = pattern.search(u)
    key = ms.group(1)
    if key in keys:
        results.setdefault(key, []).append(u)
pprint.pprint(results)
import urlparse  # Python 2; in Python 3 this is urllib.parse

urls = """
http://www.example.com/user/1234
http://www.youtube.com/user/125
http://www.forum.com/user/12
""".split()

categories = {
    "youtube.com": [],
    "forum.com": [],
    "unknown": [],
}

for url in urls:
    netloc = urlparse.urlparse(url).netloc
    if netloc.count(".") == 2:
        # chop sub-domain
        netloc = netloc.split(".", 1)[1]
    if netloc in categories:
        categories[netloc].append(url)
    else:
        categories["unknown"].append(url)
print(categories)
Parse each URL, find its category, and append the full URL to that category.
You should probably keep your categorized results in a dictionary and the uncategorized ones in a list. You could then sort them like so:
categorized_results = {"forum": [], "youtube": []}
uncategorized_results = []
for i in results:
    parts = i.split(".")
    j = True  # stays True until some category matches
    for k in categorized_results:
        if k in parts:
            categorized_results[k].append(i)  # store the full URL, not the split parts
            j = False
    if j:
        uncategorized_results.append(i)
If you'd like to output it neatly:
category_aliases = {"forum": "Forums:", "youtube": "Youtubes:"}
for i in categorized_results:
    print(category_aliases[i])
    for j in categorized_results[i]:
        print(j)
    print("\n")
print("Unidentified:")
print("\n".join(uncategorized_results))  # Let's not put in another for loop.
How about this:
from urlparse import urlparse  # Python 2; reduce() below is also a Python 2 builtin

class Organizing_Results(object):
    CATEGORY = {'example': [], 'youtube': [], 'forum': []}

    def __init__(self):
        self.url_list = []

    def add_single_url(self, url):
        self.url_list.append(urlparse(url))

    def _reduce_result_list(self, acc, element):
        # element[1] is the netloc, e.g. 'www.youtube.com'
        for c in self.CATEGORY:
            if c in element[1]:
                return self.CATEGORY[c].append(element)
        # anything that matches no category falls back to 'example'
        return self.CATEGORY['example'].append(element)

    def get_result(self):
        reduce(lambda x, y: self._reduce_result_list(x, y), self.url_list, [])
        return self.CATEGORY

c = Organizing_Results()
c.add_single_url('http://www.example.com/user/1234')
c.add_single_url('http://www.youtube.com/user/1234')
c.add_single_url('http://www.unidentified.com/user/1234')
c.get_result()
You can easily extend the class with more methods as needed.
I have two functions that each return a list of dictionaries, and I'm trying to encode them with json. It works with my first function, but when I add the second one I get a syntax error: ": expected". I will eventually be adding a total of 7 functions that each output a list of dicts. Is there a better way to do this?
import dmidecode
import simplejson as json

def get_bios_specs():
    BIOSlist = []
    for v in dmidecode.bios().values():
        if type(v) == dict and v['dmi_type'] == 0:
            BIOSdict = {}  # a fresh dict per entry, so list items don't share one object
            BIOSdict["Name"] = str((v['data']['Vendor']))
            BIOSdict["Description"] = str((v['data']['Vendor']))
            BIOSdict["BuildNumber"] = str((v['data']['Version']))
            BIOSdict["SoftwareElementID"] = str((v['data']['BIOS Revision']))
            BIOSdict["primaryBIOS"] = "True"
            BIOSlist.append(BIOSdict)
    return BIOSlist

def get_board_specs():
    MOBOlist = []
    for v in dmidecode.baseboard().values():
        if type(v) == dict and v['dmi_type'] == 2:
            MOBOdict = {}  # a fresh dict per entry
            MOBOdict["Manufacturer"] = str(v['data']['Manufacturer'])
            MOBOdict["Model"] = str(v['data']['Product Name'])
            MOBOlist.append(MOBOdict)
    return MOBOlist

def get_json_dumps():
    jsonOBJ = json
    # Syntax error is here: I can't use a comma to keep adding more, nor + to append.
    return jsonOBJ.dumps({'HardwareSpec':{'BIOS': get_bios_specs()},{'Motherboard': get_board_specs()}})
Use multiple items within your nested dictionary.
return jsonOBJ.dumps({
    'HardwareSpec': {
        'BIOS': get_bios_specs(),
        'Motherboard': get_board_specs()
    }
})
And if you want multiple BIOS items or Motherboard items, just use a list.
...
    'HardwareSpec': {
        'BIOS': [
            get_bios_specs(),
            get_uefi_specs()
        ]
        ...
    }
If you want a more convenient lookup of specs, you can just embed a dict:
jsonOBJ.dumps({'HardwareSpec': {'BIOS': get_bios_specs(),
                                'Motherboard': get_board_specs()
                               }
              })
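Since the question mentions eventually combining seven such functions, one way to avoid a single ever-growing dict literal is to list (key, function) pairs and build the nested dict in a loop. A minimal sketch; the two stub functions and their return values are hypothetical placeholders for the dmidecode-based functions in the question, and the standard json module stands in for simplejson, which offers the same dumps() call:

```python
import json

# Hypothetical stand-ins for the real dmidecode-based functions.
def get_bios_specs():
    return [{"Name": "ExampleVendor", "BuildNumber": "1.0"}]

def get_board_specs():
    return [{"Manufacturer": "ExampleCorp", "Model": "X1"}]

# Each JSON key paired with the function that produces its list of dicts;
# adding another section is one more line here.
SPEC_SOURCES = [
    ("BIOS", get_bios_specs),
    ("Motherboard", get_board_specs),
]

def get_json_dumps():
    hardware_spec = dict((name, func()) for name, func in SPEC_SOURCES)
    return json.dumps({"HardwareSpec": hardware_spec}, indent=2)

print(get_json_dumps())
```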