I'm creating a function that handles XML data. The data can vary, but the structure is always the same:
events (list-like)
  event
    info
      additional info
The function needs to build a dictionary for each event: if a tag has children (its length is not 0), the children are mapped into a nested dictionary; otherwise the tag's text is stored directly. Here's my solution:
def parse_items(self, xml):
    """ Builds a dynamic dictionary tree which holds each event in a dictionary
    that can be accessed by number of event """
    parsed_items = {}
    parsed_item = {}
    sub_info = {}
    for num, item in enumerate(xml):
        for tag in item:
            if len(tag) != 0:
                for info in tag:
                    sub_info[info.tag] = info.text
                parsed_item[tag.tag] = sub_info
                # Need to flush the dictionary else it will repeat info
                sub_info = {}
            else:
                parsed_item[tag.tag] = tag.text
        parsed_items[num] = parsed_item
        # Need to flush the dictionary else it will repeat info
        parsed_item = {}
    return parsed_items
My question is: is there a way to do this dynamically, without having to write a for loop for every level of data?
(Reposting as an answer, because the questioner intends to use the idea)
Python 2.7 and 3.x have dict comprehensions as well as list comprehensions, so the innermost loop can be written like this:
sub_info = {i.tag: i.text for i in tag}
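The same comprehension can be applied recursively so that no extra loop is needed per level. The following is only a sketch: `element_to_dict` and the sample XML are hypothetical names and data, and it assumes the input is an `xml.etree.ElementTree` element whose sibling tags are unique (repeated sibling tags overwrite each other, as in the loop-based version).

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively map an element's children to a nested dictionary.

    Children without sub-elements map to their text; children with
    sub-elements map to a dictionary built the same way, at any depth.
    """
    return {
        child.tag: element_to_dict(child) if len(child) else child.text
        for child in element
    }


def parse_items(xml):
    """Key each event's dictionary by its position in the events list."""
    return {num: element_to_dict(item) for num, item in enumerate(xml)}


# Hypothetical sample with three levels under each event.
sample = ET.fromstring(
    "<events>"
    "<event><name>start</name>"
    "<info><code>1</code><extra><note>a</note></extra></info></event>"
    "<event><name>stop</name></event>"
    "</events>"
)
parsed = parse_items(sample)
print(parsed)
```

Because the recursion stops at elements without children, the depth of the XML no longer has to be known in advance.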
I can't seem to wrap my head around this issue. I make API requests for 23 different sections, which are stored in a dictionary:
polygonsDict = {
    'Sect1':'100,6.3',
    'Sect2':'100,6.0',
    'Sect3':'100,5.5' # and more sections
}
urlDict = {
    'traffic': 'https://google.com'
}
Here is the code where I iterate over the sections:
section_key = list(polygonsDict.keys()) #convert the dictionary key to list for iteration
for idx, section in enumerate(section_key):
    traffics(section ,urlDict['traffic']+polygonsDict[section]).getPolygonTraffics() #This line is constructing the link for request.
Then the link is sent to a class called traffics, which has a getPolygonTraffics method:
class traffics:
    def __init__(self, name, traffics):
        self.traffics = traffics
        self.name = name

    def getPolygonTraffics(self):
        try:
            print("TRF: Running for "+self.name+"...")
            raw = req.get(self.traffics).json()
            traffics_raw = [raw] #wrap the dict to list
            traffics_ls = []
            for ls in traffics_raw:
                traffics_ls.append(ls)
            #Melt the list of dictionary to form a DataFrame
            traffics_df = pd.DataFrame(traffics_ls).explode('jams').reset_index(drop=True)
            print(traffics_df)
        #exception when `jams` is not found in the list of dict.
        except KeyError:
            print("Skip!")
In getPolygonTraffics, I want to append every traffics_raw (the JSON response of each request) to one single list and, eventually, explode that list into a DataFrame. How can I achieve this? I'm not sure how best to explain it.
The current output is a separate list for each dictionary:
[{}]
[{}]
What I want is [{},{},{}], which I would then explode into a DataFrame.
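One possible approach is to create the list once, outside the per-section call, and append each response to it. The sketch below makes several assumptions: `collect_traffics`, `fetch` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for `req.get(url).json()` so the example runs without network access.

```python
def collect_traffics(polygons, base_url, fetch):
    """Fetch one JSON payload per section and gather them in a single list.

    `fetch` is a hypothetical callable that takes a URL and returns the
    decoded JSON dict, for example: lambda url: req.get(url).json()
    """
    collected = []
    for section, polygon in polygons.items():
        print("TRF: Running for " + section + "...")
        payload = fetch(base_url + polygon)
        # Skip payloads without 'jams' instead of waiting for a KeyError.
        if "jams" not in payload:
            print("Skip!")
            continue
        collected.append(payload)
    return collected


def fake_fetch(url):
    """Offline stand-in for the real HTTP request (assumption)."""
    if url.endswith("5.5"):
        return {"error": "no data"}
    return {"url": url, "jams": [1, 2]}


polygons = {"Sect1": "100,6.3", "Sect2": "100,6.0", "Sect3": "100,5.5"}
all_traffics = collect_traffics(polygons, "https://google.com", fake_fetch)
print(all_traffics)
```

The resulting list has the [{},{},{}] shape, so it can be passed once to pd.DataFrame(all_traffics).explode('jams').reset_index(drop=True), as in the original method.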
I have the following two querysets:
opus = Work_Music.objects.filter(composerwork__person=self.kwargs['pk'], level=0).order_by('date_completed')
event = LifeEvent.objects.filter(person=self.kwargs['pk']).order_by('date_end')
The first pulls the work of a composer and the second pulls his life events.
I want to create a nested dictionary. The first level is keyed by year. The second level has two keys, 'work' and 'life', each holding a list of values, because there can be multiple works and events in a given year.
I have written the following:
# timeline = defaultdict(list)
timeline = dict()
for o in opus:
    if o.date_comp_f not in timeline:
        timeline[o.date_comp_f] = {}
        timeline[o.date_comp_f]['work'] = {}
        timeline[o.date_comp_f]['work'].append(o)
    else:
        timeline[o.date_comp_f]['work'].append(o)
for e in event:
    if e.date_end_y not in timeline:
        timeline[e.date_end_y] = {}
        timeline[e.date_end_y]['life'] = {}
        timeline[e.date_end_y]['life'].append(e)
    else:
        timeline[e.date_end_y]['life'].append(e)
timeline = dict(timeline)
I also want to sort the first-level keys in chronological order. How do I do this? I keep getting KeyErrors.
You were using {} where you wanted a list. I'm guessing this was a typo, but here is the fix (with a few simplifications). Each new year gets both lists up front, so a year first seen in one loop does not raise a KeyError in the other:
timeline = dict()
for o in opus:
    if o.date_comp_f not in timeline:
        timeline[o.date_comp_f] = {'work': [], 'life': []}
    timeline[o.date_comp_f]['work'].append(o)
for e in event:
    if e.date_end_y not in timeline:
        timeline[e.date_end_y] = {'work': [], 'life': []}
    timeline[e.date_end_y]['life'].append(e)
If this doesn't work, can you please provide the Work_Music and LifeEvent models?
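For the sorting part of the question, one option is to build the timeline with a defaultdict and then rebuild it from the sorted years. This is a sketch under assumptions: the `Work` and `Event` namedtuples and their sample rows are hypothetical stand-ins for the queryset results, and the year values are assumed to be comparable (no None).

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for Work_Music and LifeEvent rows.
Work = namedtuple("Work", "title date_comp_f")
Event = namedtuple("Event", "title date_end_y")

opus = [Work("Work B", 1802), Work("Work A", 1800)]
event = [Event("Event A", 1792), Event("Event B", 1802)]

# Every year starts with both lists, so neither loop can hit a KeyError.
timeline = defaultdict(lambda: {"work": [], "life": []})
for o in opus:
    timeline[o.date_comp_f]["work"].append(o)
for e in event:
    timeline[e.date_end_y]["life"].append(e)

# Dicts keep insertion order (Python 3.7+), so rebuilding from the sorted
# keys gives a plain dict in chronological order.
timeline = {year: timeline[year] for year in sorted(timeline)}
print(list(timeline))
```

Converting back to a plain dict also avoids a common surprise in Django templates, where a defaultdict can create empty entries on lookup.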
I am having difficulty writing a method that updates dictionaries based on a dynamic action.
I have a list of dictionaries like this:
data_list = [{'x_data':1.987, 'y_data':25.9, 'plant_id':12}, {'x_data':1.024, 'y_data':19.9, 'plant_id':14}]
action = "x_data+y_data"
# or it may be "y_data/x_data", "x_data*y_data" or "1/(x_data*y_data)".
output: [{'z_data': 27.887, 'plant_id': 12}, {'z_data': 20.924, 'plant_id': 14}]
So I want a method that takes the list and the action and returns an updated list:
def update_dictionaries(data_list, action):
    ...
    return updated_list
Thank you.
Use Python's eval function: https://docs.python.org/3/library/functions.html#eval
You should be able to adapt the following example to do what you want:
def my_func(x, y, action):
    try:
        z = eval(action)
    except Exception:
        z = None
    return z

my_func(1,2,"x+y")
#3
my_func(2,3,"x*y")
#6
Hope this helps.
def update_dictionaries(data_list, action):
    action = action.replace("x_data","dic['x_data']")
    action = action.replace("y_data","dic['y_data']")
    for dic in data_list:
        ans = eval(action)
        dic['z_data']=ans
        dic.pop('x_data',None)
        dic.pop('y_data',None)
    return data_list
Thank me later!
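Since eval runs any Python expression it is given, an alternative when the action string may come from an untrusted source is to parse it with the ast module and allow only arithmetic. The sketch below is one possible way to do that, not the approach of the answers above; `safe_eval` is a hypothetical helper, and it returns new dictionaries instead of modifying the input.

```python
import ast
import operator

# Only these binary operators are accepted (an assumption for this sketch).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr, variables):
    """Evaluate an arithmetic expression over the names in `variables`."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Calls, attribute access, unknown names, etc. are rejected.
        raise ValueError("unsupported expression: " + expr)

    return _eval(ast.parse(expr, mode="eval"))


def update_dictionaries(data_list, action):
    """Return new dicts with z_data replacing x_data and y_data."""
    updated = []
    for dic in data_list:
        rest = {k: v for k, v in dic.items() if k not in ("x_data", "y_data")}
        updated.append({"z_data": safe_eval(action, dic), **rest})
    return updated


data_list = [{'x_data': 1.987, 'y_data': 25.9, 'plant_id': 12},
             {'x_data': 1.024, 'y_data': 19.9, 'plant_id': 14}]
result = update_dictionaries(data_list, "x_data+y_data")
print(result)
```

An expression such as "1/(x_data*y_data)" is handled by the same code path, while anything containing a function call raises ValueError.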
I have a dictionary with unique keys that I update by adding new keys, depending on the data found on a webpage.
I want to process only the new keys, which may appear only after a long time. Here is a piece of code to illustrate:
a = UniqueDict()
while 1:
    webpage = update() # return a list
    for i in webpage:
        title = getTitle(i)
        a[title] = new_value # populate only new title obtained because it's a unique dictionary
    if len(a) > 50:
        a.clear() # just to clear dictionary if too big
    # Condition before entering this loop to process only new title entered
    for element in a.keys():
        process(element)
Is there a way to find only the keys newly added to the dictionary? Most of the time the keys and values will be the same, so I don't want them to be processed again.
Thank you.
You could also keep the processed keys in a set.
Then you can find the new keys with set(d.keys()) - set_already_processed,
and record each processed key with set_already_processed.add(key).
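A minimal sketch of that idea follows. `process_new_keys`, `seen` and `handled` are hypothetical names, and the keys are sorted only to make the processing order predictable (which assumes they are comparable).

```python
def process_new_keys(d, already_processed, process):
    """Call `process` only on keys not seen before, then remember them."""
    new_keys = set(d) - already_processed
    for key in sorted(new_keys):
        process(key)
        already_processed.add(key)
    return new_keys


seen = set()      # keys that have already been processed
handled = []      # stands in for the real process() side effect
titles = {"a": 1, "b": 2}

process_new_keys(titles, seen, handled.append)   # processes a and b
titles["c"] = 3
process_new_keys(titles, seen, handled.append)   # processes only c
print(handled)
```

The set lives outside the dictionary, so clearing the dictionary when it grows too big does not make old titles look new again.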
You may want to use an OrderedDict:
Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.
Make your own dict that tracks additions:
class NewKeysDict(dict):
    """A dict, but tracks keys that are added through __setitem__
    only. reset() resets tracking to begin tracking anew. self.new_keys
    is a set holding your keys.
    """
    def __init__(self, *args, **kw):
        super(NewKeysDict, self).__init__(*args, **kw)
        self.new_keys = set()

    def reset(self):
        self.new_keys = set()

    def __setitem__(self, key, value):
        super(NewKeysDict, self).__setitem__(key, value)
        self.new_keys.add(key)

d = NewKeysDict((i,str(i)) for i in range(10))
d.reset()
print(d.new_keys)
for i in range(5, 10):
    d[i] = '{} new'.format(i)
for k in d.new_keys:
    print(d[k])
(because most of the time, it will be the same keys and values so I don't want them to be processed)
You're overcomplicating this.
Dictionary keys are immutable and unique.
Each key is followed by its value, separated by a colon.
d = {"title": title}
text = "textdude"
d["keytext"] = text
This adds the value "textdude" under the new key "keytext".
To check whether a key is present, use "in":
"keytext" in d
This returns True.
I am using the following set of generators to parse XML into CSV:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree replaces it
import xml.etree.ElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import csv


def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield from flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix[:].rstrip('.'), element.text


def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = element.tag
        if element:
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    # yield a (key, value) pair; makerows unpacks two items
                    yield eprefix + k, v
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text


def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend((v,))
        else:
            headers.append(k)
            columns[k] = [k, v]
    m = max(len(c) for c in columns.values())
    for c in columns.values():
        c.extend(' ' for i in range(len(c), m))
    L = [columns[k] for k in headers]
    rows = list(zip(*L))
    return rows


def main():
    with open('2-Response_duplicate.xml', 'r', encoding='utf-8') as f:
        xml_string = f.read()
    xml_string= xml_string.replace('', '') #optional to remove ampersands.
    root = ElementTree.XML(xml_string)
    # for key, value in flatten_dict(root):
    #     key = key.rstrip('.').rsplit('.', 1)[-1]
    #     print(key,value)
    # newline='' stops the csv module from writing blank rows on Windows
    with open("try5.csv", 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerows(makerows(flatten_dict(root)))


if __name__ == "__main__":
    main()
One column of the CSV, when opened in Excel, looks like this:
ObjectGuid
2adeb916-cc43-4d73-8c90-579dd4aa050a
2e77c588-56e5-4f3f-b990-548b89c09acb
c8743bdd-04a6-4635-aedd-684a153f02f0
1cdc3d86-f9f4-4a22-81e1-2ecc20f5e558
2c19d69b-26d3-4df0-8df4-8e293201656f
6d235c85-6a3e-4cb3-9a28-9c37355c02db
c34e05de-0b0c-44ee-8572-c8efaea4a5ee
9b0fe8f5-8ec4-4f13-b797-961036f92f19
1d43d35f-61ef-4df2-bbd9-30bf014f7e10
9cb132e8-bc69-4e4f-8f29-c1f503b50018
24fd77da-030c-4cb7-94f7-040b165191ce
0a949d4f-4f4c-467e-b0a0-40c16fc95a79
801d3091-c28e-44d2-b9bd-3bad99b32547
7f355633-426d-464b-bab9-6a294e95c5d5
This is due to the fact that there are 14 tags with name ObjectGuid. For example, one of these tags looks like this:
<ObjectGuid>2adeb916-cc43-4d73-8c90-579dd4aa050a</ObjectGuid>
My question: is there an efficient way to enumerate the headers (the keys), so that each repeated key is numbered and paired with its corresponding value (the text in the XML data structure)?
It would be displayed in Excel as follows:
ObjectGuid_1 ObjectGuid_2 ObjectGuid_3 etc.
Please let me know if there is any other information that you need from me (such as sample XML). Thank you for your help.
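One way to get numbered headers without touching the XML itself is to wrap the (key, value) stream in another generator that counts how often each key has been seen. This is a sketch: `number_duplicate_keys` is a hypothetical helper and the sample pairs are shortened stand-ins for the real output of flatten_dict.

```python
from collections import Counter


def number_duplicate_keys(pairs):
    """Yield (key_n, value) pairs, numbering each occurrence of a key."""
    seen = Counter()
    for key, value in pairs:
        seen[key] += 1
        yield "{}_{}".format(key, seen[key]), value


# Shortened, hypothetical stand-in for the flatten_dict(root) output.
pairs = [("ObjectGuid", "2adeb916"),
         ("Name", "first"),
         ("ObjectGuid", "2e77c588")]
numbered = list(number_duplicate_keys(pairs))
print(numbered)
```

In the script above this would be used as makerows(number_duplicate_keys(flatten_dict(root))). Because the pairs are handled in a single pass, every key gets a suffix, including keys that occur only once (Name_1 here).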
It is a mistake to add an element, attribute, or annotative descriptor to the data set itself for the purpose of identity. Normalizing the data should only be done if you own that data and know with certainty that doing so will not have any negative effect on other consumers (for example, ones relying on attribute order to manipulate the DOM). Also, what is the point of using a dict or nested dicts (which I don't quite follow either) if the efficiency of the hash-table lookup is taken right back by making O(n) checks for this new attribute? The point of hashing is random lookup.
If the data is simply structured as (key, value) pairs, which makes sense here, why not use some other contiguous data structure and treat it like a dictionary, such as a named tuple?
A second solution, if you want to add additional state, is to wrap your generator in a class:
class order:
    def __init__(self, lines, order=None):
        self.lines = lines
        # (line number, line) pairs recorded while iterating
        self.order = order if order is not None else []

    def __iter__(self):
        for l, line in enumerate(self.lines, 1):
            self.order.append((l, line))
            yield line

with open('some file.csv') as f:
    lines = order(f)
Is messing with the data a harmless conversion? For example, suppose you were to create a conversion table (see below).
Well, that's fine, until one of the values is blank:
import csv

field_types = [('x', float),
               ('y', float)]
with open('some.csv') as f:
    for row in csv.DictReader(f):
        row.update((key, conversion(row[key]))
                   for key, conversion in field_types)
For a row like {'x': '', 'y': '2.2'} there is an empty data point, and float('') raises a ValueError. Kaboom.
So my suggestion is not to change or add to the data, but to change the algorithm that deals with it. If the problem is order, why not treat each record as a named tuple, which behaves much like a dictionary? The caveat is that it is immutable, but that makes sense when the data is uniform.
*I don't understand the nested dictionary. That is for the header values, yes? Values and order key -> key -> (key: value)? Or you could just skip the first row:
with open('some.csv') as f:
    next(f)  # skip the header row; problem solved
    for line in f:
        print(line, end='')
*** Notables
- To iterate over multiple sequences in parallel:
h = ['a', 'b', 'c']
x = [1, 2, 3]
for i in zip(h, x):
    print(i)
('a', 1)
('b', 2)
('c', 3)
- Chaining:
from itertools import chain
a = [1, 2, 3]
b = ['a', 'b', 'c']
for x in chain(a, b):
    print(x)