Assign Nested Keys and Values in Dictionaries - python

I need to assign nested values to a dictionary. I have simplified my question for ease of understanding:
Data = {}
day1 = 'March12'
day2 = 'March14'
e1 = 'experiment1'
e2 = 'experiment2'
Data[day1][e1] = 4
But the Data[day1][e1] = 4 assignment fails with a KeyError (for the same reason that test = {}; test["foo"]["bar"] = 0 fails). Is there a workaround?
I tried to do things like:
me1 = {e1 : 4}
me2 = {e2 : 5}
Data = {day1 : me1}
Data = {day2 : me2}
But I couldn't get it to work; everything I wrote either overwrote the existing values or wasn't structured the way I wanted. I'm probably missing something...
Some additional notes: at the beginning I don't have any info about the length of the dictionary or what exactly it looks like. And instead of the value 4, I assign an object as the value. I need this structure (Data[day1][e1]) because I have to assign the objects to their keys inside a loop.

You need to store a new dictionary inside of Data to make this work:
Data[day1] = {}
Data[day1][e1] = 4
but normally you'd first test to see if that dictionary is there yet:
if day1 not in Data:
    Data[day1] = {}
Data[day1][e1] = 4
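dict.setdefault() makes that test-and-create a one-step process:
Data.setdefault(day1, {})[e1] = 4
setdefault() returns the value stored for day1, inserting the {} default first if the key is missing, so the [e1] assignment always lands in a dictionary.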
The collections.defaultdict() type automates that process:
from collections import defaultdict
Data = defaultdict(dict)
Data[day1][e1] = 4
The day1 key doesn't exist yet, but the defaultdict() object then calls the configured constructor (dict here) to produce a new value for that key as needed.
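For the loop use case mentioned in the question, a minimal sketch with made-up day/experiment values:
from collections import defaultdict

Data = defaultdict(dict)
for day, experiment, result in [('March12', 'experiment1', 4),
                                ('March14', 'experiment2', 5)]:
    Data[day][experiment] = result   # the inner dict is created on first access

print(Data['March12']['experiment1'])  # 4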

You have to create an empty dict for each key, like:
Data = {}
Data['day1'] = {}
Data['day1']['e1'] = 4

Related

add multiple values to one key, but defaultdict only allows 2

In the CSV I'm reading from, there are multiple rows for each ID:
ID,timestamp,name,text
444,2022-03-01T11:05:00.000Z,Amrita Patel,Hello
444,2022-03-01T11:06:00.000Z,Amrita Patel,Nice to meet you
555,2022-03-01T12:05:00.000Z,Zach Do,Good afternoon
555,2022-03-01T11:06:00.000Z,Zach Do,I like oranges
555,2022-03-01T11:07:00.000Z,Zach Do,definitely
I need to extract these rows such that I end up with one file per ID, with the timestamp, name, and text in that file. For example, the file for ID 444 will have 2 timestamps and 2 different texts in it, along with the name.
I'm able to get the text designated to the proper ID, using this code:
from collections import defaultdict
d = {}
l = []
list_of_lists = []
for k in csv_file:
    l.append([k['ID'], k['text']])
list_of_lists.append(l)
for key, val in list_of_lists[0]:
    d.setdefault(key, []).append(val)
The problem is that this isn't enough; I need to add the other values to the one ID key. If I try:
l.append([k['ID'],[k['text'],k['name']]])
I get
ValueError: too many values to unpack
Just use a list as the value instead:
{key: [value1, value2], ...}
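For this CSV that means collecting all of each row's fields under its ID; a minimal sketch, assuming the rows are read with csv.DictReader (the filename is hypothetical):
import csv
from collections import defaultdict

d = defaultdict(list)
with open('messages.csv', newline='') as f:
    for row in csv.DictReader(f):
        # every field for this row goes into the one ID key
        d[row['ID']].append([row['timestamp'], row['name'], row['text']])

# d['444'] -> [['2022-03-01T11:05:00.000Z', 'Amrita Patel', 'Hello'],
#              ['2022-03-01T11:06:00.000Z', 'Amrita Patel', 'Nice to meet you']]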

generating duplicate values (fill down?) when parsing XML into Dataframe

I have a problem parsing XML into a data frame using Python. When I print out the values, some values seem to 'fill down', or repeat themselves (see column adres). Does anyone know what could be wrong?
import xml.etree.ElementTree as et
import pandas as pd
import xmltodict
import json

tree = et.parse('20191125_DMG_PI.xml')
root = tree.getroot()
df_cols = ["status", "priref", "full_name", "detail", "adres"]
rows = []

for record in root:
    for child in record:
        s_priref = child.get('priref')
        for field in child.findall('Address'):
            s_address = field.find('address').text
            #for sub in field.findall('address.country'):
            #    s_country = sub.find('value').text if s_country is not None else None
        for field in child.findall('name'):
            s_full_name = field.find('value').text
        for field in child.findall('name.status'):
            s_status = field.find('value').text
        for field in child.findall('level_of_detail'):
            s_detail = field.find('value').text
        rows.append({"status": s_status,
                     "priref": s_priref,
                     "full_name": s_full_name,
                     "detail": s_detail,
                     "adres": s_address},)

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)
First off, findall() returns an empty list if nothing matches the search criteria, so in the loop
for field in child.findall("..."):
    # this is only performed if child.findall() doesn't return empty
The consequence, in this case, is that s_address, s_full_name, s_status, and s_detail do not necessarily get assigned a new value on each iteration of the outer loop. Instead, they retain the value from the most recent iteration on which the respective child.findall() call returned a non-empty list.
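The mechanism is easy to see in isolation; a minimal, self-contained sketch (not the questioner's data):
import xml.etree.ElementTree as et

child = et.fromstring('<record><name><value>Zach</value></name></record>')
s_address = 'stale value from an earlier iteration'
for field in child.findall('Address'):      # no <Address> here, findall() returns []
    s_address = field.find('address').text  # so this line never runs
print(s_address)  # still 'stale value from an earlier iteration'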
The simple way to fix this is to assign them all some initial value on each iteration of the outer loop, i.e.
for child in record:
    s_priref = child.get('priref')
    s_address = ''
    s_full_name = ''
    s_detail = ''
    s_status = ''
    # ...
Although it might be better (perhaps more 'pythonic') to do something like this:
# Map each child.findall() key to the field.find() key nested under it
# (named 'fields' rather than 'dict' so the builtin isn't shadowed)
fields = {'Address': 'address',
          'name': 'value',
          'name.status': 'value',
          'level_of_detail': 'value'}
# The reference keys, in the same order as the values of s below
ref = ["adres", "full_name", "status", "detail", "priref"]
for record in root:
    for child in record:
        # Initialize a second dict from the same keys, mapping to empty
        # strings instead, so nothing carries over between children
        s = dict.fromkeys(fields, '')
        for key in fields:
            for field in child.findall(key):
                s[key] = field.find(fields[key]).text
        s["priref"] = child.get('priref')
        rows.append(dict(zip(ref, s.values())))
Which should work just the same as the other method but makes it easier to add more keys/fields as needed.
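For example, picking up the commented-out country sub-field from the question would take just one entry in each (element names assumed from the question's commented code):
fields['address.country'] = 'value'
ref.insert(4, 'country')   # keep 'priref' last to match the order of s.values()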

Extract value from key-value pair of dictionary

I have a CSV file with column names (in the first row) and values (in the rest of the rows). I wanted to create variables to store these values for every row in a loop. So I started off by creating a dictionary from the CSV file and got a list of records with key-value pairs. Now I want to create variables to store the "value" extracted from the "key" of each item, within a loop over every record. I am not sure if I am setting this up correctly.
Here is the dictionary I have.
my_dict = [{'value id':'value1', 'name':'name1','info':'info1'},
           {'value id':'value2', 'name':'name2','info':'info2'},
           {'value id':'value3', 'name':'name3','info':'info3'},
          ]
for i in len(my_dict):
    item[value id] = value1
    item[name] = name1
    item[info] = info1
The value id and name will be unique and are identifiers for the list. Ultimately, I want to create an item object, i.e. item[info] = info1, so I can add other code to modify item[info].
Try this:
my_dict = [{'value':'value1', 'name':'name1','info':'info1'},
           {'value':'value2', 'name':'name2','info':'info2'},
           {'value':'value3', 'name':'name3','info':'info3'}]
for obj in my_dict:
    value = obj['value']
    name = obj['name']
    info = obj['info']
To expand on @aws_apprentice's point, you can capture the data by creating some additional variables:
my_dict = [{'value':'value1', 'name':'name1','info':'info1'},
           {'value':'value2', 'name':'name2','info':'info2'},
           {'value':'value3', 'name':'name3','info':'info3'}]

values = []
names = []
info = []

for obj in my_dict:
    values.append(obj['value'])
    names.append(obj['name'])
    info.append(obj['info'])
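Since the question says the identifiers are unique, you could also index the records by one of them for direct lookup; a small sketch reusing my_dict from above:
# one dict keyed by the unique identifier, mapping to the full record
items = {obj['value']: obj for obj in my_dict}
items['value2']['info'] = 'some new info'   # modify one record in place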

How to implement a select-like function

I have a dataset in Python whose structure is like:
Tree Species        number of trunks
------------------------------------
Acer rubrum         1
Quercus bicolor     1
Quercus bicolor     1
aabbccdd            0
and I have a question: can I implement a function similar to
Select sum(number of trunks)
from trees.data['Number of Trunks']
where x = trees.data["Tree Species"]
group by trees.data["Tree Species"]
in Python? x is an array containing five elements:
x = array(['Acer rubrum', 'Acer saccharum', 'Acer saccharinum',
'Quercus rubra', 'Quercus bicolor'], dtype='<U16')
What I want to do is map each element in x to trees.data["Tree Species"] and calculate the sum of the number of trunks; it should return an array like
array = (sum_num(Acer rubrum), sum_num(Acer saccharum), sum_num(Acer saccharinum),
         sum_num(Quercus rubra), sum_num(Quercus bicolor))
Did you want to look at Python Pandas? That will allow you to do something like
df.groupby('Tree Species')['Number of Trunks'].sum()
Please note that df here is whatever variable name you read your data frame into. I would recommend you look at pandas and lambda functions too.
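To get the sums back in the order of x, as the question asks, one option is to reindex the grouped result; a sketch, assuming df is a DataFrame with the question's two columns (species missing from the data come back as 0):
import numpy as np

x = np.array(['Acer rubrum', 'Acer saccharum', 'Acer saccharinum',
              'Quercus rubra', 'Quercus bicolor'])
# group, sum, then put the result in the order of x
sums = df.groupby('Tree Species')['Number of Trunks'].sum()
result = sums.reindex(x, fill_value=0).to_numpy()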
You can do something like this:
import pandas as pd
df = pd.DataFrame()
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
df["Tree Species"] = tree_species
df["Number of Trunks"] = no_of_trunks
df.groupby('Tree Species').sum() #This will create a pandas dataframe
df.groupby('Tree Species')['Number of Trunks'].sum() #This will create a pandas series.
You can do the same thing by just using dictionaries too:
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
d = {}
for key, trunk in zip(tree_species, no_of_trunks):
    if key not in d:
        d[key] = 0
    d[key] += trunk
print(d)
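The zero-initialization in that loop is exactly what collections.defaultdict(int) automates, mirroring the defaultdict answer at the top of this page:
from collections import defaultdict

d = defaultdict(int)   # missing keys start at 0
for key, trunk in zip(tree_species, no_of_trunks):
    d[key] += trunk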

Searching items of large list in large python dictionary quickly

I am currently working on making a dictionary with tuples of names as keys and floats as values, of the form {(nameA, nameB): datavalue, (nameB, nameC): datavalue, ...}
The values come from a matrix that I have made into a pandas DataFrame, with the names as both the index and the column labels. I have created an ordered list of the keys for my final dictionary, called keys, with the function createDictionaryKeys(). The issue I have is that not all the names from this list appear in my data matrix. I want to include in my final dictionary only the names that do appear in the data matrix.
How can I do this search while avoiding a slow linear for loop? I also created a dictionary that has the name as key and a value of 1 if it should be included and 0 otherwise. It has the form {nameA: 1, nameB: 0, ...} and is called allow_dict. I was hoping to use this to do some sort of hash search.
def createDictionary(keynamefile, seperator, datamatrix, matrixsep):
    import pandas as pd
    keys = createDictionaryKeys(keynamefile, seperator)
    final_dict = {}
    data_df = pd.read_csv(open(datamatrix), sep = matrixsep)
    pd.set_option("display.max_rows", len(data_df))
    df_indices = list(data_df.index.values)
    df_cols = list(data_df.columns.values)[1:]
    for i in df_indices:
        data_df = data_df.rename(index = {i: df_cols[i]})
    data_df = data_df.drop("Unnamed: 0", 1)
    allow_dict = descriminatePromoters(HARDCODEDFILENAME, SEP, THRESHOLD)
    #print ( item for item in df_cols if allow_dict[item] == 0 ).next()
    present = [x for x in keys if x[0] in df_cols and x[1] in df_cols]
    for i in present:
        final_dict[i] = final_df.loc[i[0], i[1]]
    return final_dict
Testing existence in Python sets is O(1), so simply:
present = [x for x in keys if x[0] in set(df_cols) and x[1] in set(df_cols)]
...should give you some speedup, though note that an inline set(df_cols) is rebuilt for every membership test, so it's better to build the set once. Since you're iterating through in O(n) anyway (and have to, to construct your final_dict), something like:
colset = set(df_cols)
final_dict = {k: final_df.loc[k[0], k[1]]
              for k in keys if (k[0] in colset)
              and (k[1] in colset)}
Would be nice, I would think.
