How to append to a dictionary of dictionaries - python

I'm trying to create a dictionary of dictionaries like this:
food = {"Broccoli": {"Taste": "Bad", "Smell": "Bad"},
"Strawberry": {"Taste": "Good", "Smell": "Good"}}
But I am populating it from an SQL table. So I've pulled the SQL table into an SQL object called "result". And then I got the column names like this:
nutCol = [i[0] for i in result.description]
The table has about 40 characteristics, so it is quite long.
I can do this...
foodList = {}
for id, food in enumerate(result):
addMe = {str(food[1]): {nutCol[id + 2]: food[2], nulCol[idx + 3]:
food[3] ...}}
foodList.update(addMe)
But this of course would look horrible and take a while to write. And I'm still working out how I want to build this whole thing so it's possible I'll need to change it a few times...which could get extremely tedious.
Is there a DRY way of doing this?

In order to make solution position independent you can make use of dict1.update(dict2). This simply merges dict2 with dict1.
In our case since we have dict of dict, we can use dict['key'] as dict1 and simply add any additional key,value pair as dict2.
Here is an example.
food = {"Broccoli": {"Taste": "Bad", "Smell": "Bad"},
"Strawberry": {"Taste": "Good", "Smell": "Good"}}
addthis = {'foo':'bar'}
Suppose you want to add addthis dict to food['strawberry'] , we can simply use,
food["Strawberry"].update(addthis)
Getting result:
>>> food
{'Strawberry': {'Taste': 'Good', 'foo': 'bar', 'Smell': 'Good'},'Broccoli': {'Taste': 'Bad', 'Smell': 'Bad'}}
>>>

Assuming that column 0 is what you wish to use as your key, and you do wish to build a dictionary of dictionaries, then its:
detail_names = [col[0] for col in result.description[1:]]
foodList = {row[0]: dict(zip(detail_names, row[1:]))
for row in result}
Generalising, if column k is your identity then its:
foodList = {row[k]: {col[0]: row[i]
for i, col in enumerate(result.description) if i != k}
for row in result}
(Here each sub dictionary is all columns other than column k)

addMe = {str(food[1]):dict(zip(nutCol[2:],food[2:]))}
zip will take two (or more) lists of items and pair the elements, then you can pass the result to dict to turn the pairs into a dictionary.

Related

Sort the values of lists corresponding to a row of excel numbers in a dictionary in descending order using openpyxl

I currently have a task to sort the values of a list corresponding to a row of excel numbers in descending order and store them in a dictionary.
For example I have an excel file like this:
0.296178 0.434362 0.033033 0.758968
0.559323 0.455792 0.770423 0.770423
I have created a dictionary using the code below to store the value of each cell as a list, with the key for each row being the row number.
from openpyxl import load_workbook
import operator
vWB = load_workbook(filename="voting.xlsx")
vSheet = vWB.active
# Creating the Dictionary of agents and their alternative values
dictionary = {
i + 1: [cell.value for cell in row[0:]]
for i, row in enumerate(vSheet.rows)
}
output:
{1: [0.296177589742431, 0.434362232763003, 0.0330331941352554, 0.758968208500514], 2: [0.559322784244604, 0.455791535747786, 0.770423104229537, 0.770423104229537]....etc }
However, I'm unsure how to order the values of each list in descending order without changing the order of the keys too.
What I want:
{1: [0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554, 0.758968208500514], 2: [0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]....etc }
How do I sort the list of values for each key in descending order?
you can try like this
d = {1: [0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554, 0.758968208500514],
2: [0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]}
for key, value in d.items():
print(sorted(value,reverse=True))
output
[0.758968208500514, 0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554] [0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]

Creating a Python dictionary from two columns in pandas

Trying hard to create a dictionary by comparing values and to add greater values more than 90 into a new dictionary. I have an actual data set working on with pandas but to simplify, i will try to explain with lists
list_names = ['annie','betty', 'charlie','debie', 'elf', 'frank', 'goe', 'hayri']
list_ages = [19,23,44,31,55,65,15,40]
The ages are corresponding to each name in the list and the target is to create a key:value pair that contains values only older than 30.
Thank you for any help.
Try this
list_names = ['annie','betty', 'charlie','debie', 'elf', 'frank', 'goe', 'hayri']
list_ages = [19,23,44,31,55,65,15,40]
{name: age for name, age in zip(list_names,list_ages) if age>30}
Try taking example from this code:
list_names = name((k, v) for k, v in list_names.items() if v > 30)

How to iterate through nested dicts (counters) and update keys recursively

I'm reading data from a file into a series of lists as follows:
sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]
wherein the sources and topics in each list may be the same or different.
What I am trying to achieve is a dictionary which groups the topics associated with each source, and their associated scores (the scores will then be averaged, but for the purpose of this question let's just list them as values of the topic (key)).
The results would ideally look like a list of nested dicts as follows:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I think the best way to do this would be to create a Counter of the sources, and then a dict for each topics per source, and save each dict as a value for each corresponding source. However I am having trouble iterating properly to get the desired result.
Here's what I have so far:
sourceDict = {}
sourceDictList = []
for row in sourceData:
source = row[0]
score = row[1]
topic = row[2]
sourceDict = [source,{topic:score}]
sourceDictList.append(sourceDict)
sourceList.append(source)
wherein sourceDictList results in the following: [[source, {topic: score}]...], (essentially reformatting the data from the originally list of lists), and sourceList is just a list of all the source (some repeating).
Then I initialize a counter and match the source from the counter with the source from sourceDictList and if they match, save the topic:score dict as the key:
sourceCounter = Counter(sourceList)
for key,val in sourceCounter.items():
for dictitem in sourceDictList:
if dictitem[0] == key:
sourceCounter[key] = dictitem[1]
But the output is only saving the last topic:score dict to each source. So instead of the desired:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I am only getting:
Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})
I am under the impression that if there is a unique key saved to a dict, it will append that key:value pair without overwriting previous ones. Am I missing something?
Appreciate any help on this.
Simply we can do:
sourceData = [
['source1', 'topic1', 'score1'],
['source1', 'topic2', 'score1'],
['source1', 'topic1', 'score2'],
['source2', 'topic1', 'score1'],
['source2', 'topic2', 'score2'],
['source2', 'topic1', 'score3'],
]
sourceDict = {}
for row in sourceData:
source = row[0]
topic = row[1]
score = row[2]
if source not in sourceDict:
# This will be executed when the source
# comes for the first time.
sourceDict[source] = {}
if topic not in sourceDict[source]:
# This will be executed when the topic
# inside that source comes for the first time.
sourceDict[source][topic] = []
sourceDict[source][topic].append(score)
print(sourceDict)
You can simply use the collection's defaultdict
sourdata = [['source', 'topic', 2],['source', 'topic', 3], ['source', 'topic2', 3],['source2', 'topic', 4]]
from collections import defaultdict
sourceDict = defaultdict(dict)
for source, topic, score in sourdata:
topicScoreDict = sourceDict[source]
topicScoreDict[topic] = topicScoreDict.get(topic, []) + [score]
>>> print(sourceDict)
>>> defaultdict(<class 'dict'>, {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}})
>>> print(dict(sourceDict))
>>> {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}

convert list to dataframe using dictionary

I am new to Pythonland and I have a question. I have a list as below and want to convert it into a dataframe.
I read on Stackoverflow that it is better to create a dictionary then a list so I create one as follows.
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick", "nick","pick"]
data = ['100','50','A','107','62','B'] # The actual list has 1640 entries
dic = {key:[] for key in column_names}
dic['name'] = row_names
t = 0
while t< len(data):
dic['height'].append(data[t])
t = t+3
t = 1
while t< len(data):
dic['weight'].append(data[t])
t = t+3
So on and so forth, I have 10 columns so I wrote above code 10 times to complete the full dictionary. Then i convert
it to dataframe. It works perfectly fine, there has to
be a way to do this in shorter way. I don't know how to refer to key of a dictionary with a number. Should it be wrapped to a function. Also, how can I automate adding one to value of t before executing the next loop? Please help me.
You can iterate through columnn_names like this:
dic = {key:[] for key in column_names}
dic['name'] = row_names
for t, column_name in enumerate(column_names):
i = t
while i< len(data):
dic[column_name].append(data[i])
i += 3
Enumerate will automatically iterate through t form 0 to len(column_names)-1
i = 0
while True:
try:
for j in column_names:
d[j].append(data[i])
i += 1
except Exception as er: #So when i value exceed by data list it comes to exception and it will break the loop as well
print(er, "################")
break
The first issue that you have all columns data concatenated to a single list. You should first investigate how to prevent it and have list of lists with each column values in a separate list like [['100', '107'], ['50', '62'], ['A', 'B']]. Any way you need this data structure to proceed efficiently:
cl_count = len(column_names)
d_count = len(data)
spl_data = [[data[j] for j in range(i, d_count, cl_count)] for i in range(cl_count)]
Then you should use dict comprehension. This is a 3.x Python feature so it will not work in Py 2.x.
df = pd.DataFrame({j: spl_data[i] for i, j in enumerate(column_names)})
First, we should understand how an ideal dictionary for a dataframe should look like.
A Dataframe can be thought of in two different ways:
One is a traditional collection of rows..
'row 0': ['jack', 100, 50, 'A'],
'row 1': ['mick', 107, 62, 'B']
However, there is a second representation that is more useful, though perhaps not as intuitive at first.
A collection of columns:
'name': ['jack', 'mick'],
'height': ['100', '107'],
'weight': ['50', '62'],
'grade': ['A', 'B']
Now, here is the key thing to realise, the 2nd representation is more useful
because that is the representation interally supported and used in dataframes.
It does not run into conflict of datatype within a single grouping (each column needs to have 1 fixed datatype)
Across a row representation however, datatypes can vary.
Also, operations can be performed easily and consistently on an entire column
because of this consistency that cant be guaranteed in a row.
So, tl;dr DataFrames are essentially collections of equal length columns.
So, a dictionary in that representation can be easily converted into a DataFrame.
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick"]
data = [100, 50,'A', 107, 62,'B'] # The actual list has 1640 entries
So, With that in mind, the first thing to realize is that, in its current format, data is a very poor representation.
It is a collection of rows merged into a single list.
The first thing to do, if you're the one in control of how data is formed, is to not prepare it this way.
The goal is a list for each column, and ideally, prepare the list in that format.
Now, however, if it is given in this format, you need to iterate and collect the values accordingly. Here's a way to do it
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick"]
data = [100, 50,'A', 107, 62,'B'] # The actual list has 1640 entries
dic = {key:[] for key in column_names}
dic['name'] = row_names
print(dic)
Output so far:
{'height': [],
'weight': [],
'grade': [],
'name': ['jack', 'mick']} #so, now, names are a column representation with all correct values.
remaining_cols = column_names[1:]
#Explanations for the following part given at the end
data_it = iter(data)
for row in zip(*([data_it] * len(remaining_cols))):
for i, val in enumerate(row):
dic[remaining_cols[i]].append(val)
print(dic)
Output:
{'name': ['jack', 'mick'],
'height': [100, 107],
'weight': [50, 62],
'grade': ['A', 'B']}
And we are done with the representation
Finally:
import pd
df = pd.DataFrame(dic, columns = column_names)
print(df)
name height weight grade
0 jack 100 50 A
1 mick 107 62 B
Edit:
Some explanation for the zip part:
zip takes any iterables and allows us through iterate through them together.
data_it = iter(data) #prepares an iterator.
[data_it] * len(remaining_cols) #creates references to the same iterator
Here, this is similar to [data_it, data_it, data_it]
The * in *[data_it, data_it, data_it] allows us to unpack the list into 3 arguments for the zip function instead
so, f(*[data_it, data_it, data_it]) is equivalent to f(data_it, data_it, data_it) for any function f.
the magic here is that traversing through an iterator/advancing an iterator will now reflect the change across all references
Putting it all together:
zip(*([data_it] * len(remaining_cols))) will actually allow us to take 3 items from data at a time, and assign it to row
So, row = (100, 50, 'A') in first iteration of zip
for i, val in enumerate(row): #just iterate through the row, keeping index too using enumerate
dic[remaining_cols[i]].append(val) #use indexes to access the correct list in the dictionary
Hope that helps.
If you are using Python 3.x, as suggested by l159, you can use a comprehension dict and then create a Pandas DataFrame out of it, using the names as row indexes:
data = ['100', '50', 'A', '107', '62', 'B', '103', '64', 'C', '105', '78', 'D']
column_names = ["height", "weight", "grade"]
row_names = ["jack", "mick", "nick", "pick"]
df = pd.DataFrame.from_dict(
{
row_label: {
column_label: data[i * len(column_names) + j]
for j, column_label in enumerate(column_names)
} for i, row_label in enumerate(row_names)
},
orient='index'
)
Actually, the intermediate dictionary is a nested dictionary: the keys of the outer dictionary are the row labels (in this case the items of the row_names list); the value associated with each key is a dictionary whose keys are the column labels (i.e., the items in column_names) and values are the correspondent elements in the data list.
The function from_dict is used to create the DataFrame instance.
So, the previous code produces the following result:
height weight grade
jack 100 50 A
mick 107 62 B
nick 103 64 C
pick 105 78 D

Replace values from pandas dataset with dictionary

I am extracting a column from excel document with pandas. After that, I want to replace for each row of the selected column, all keys contained in multiple dictionaries grouped in a list.
import pandas as pd
file_loc = "excelFile.xlsx"
df = pd.read_excel(file_loc, usecols = "C")
In this case, my dataframe is called by df['Q10'], this data frame has more than 10k rows.
Traditionally, if I want to replace a value in df I use;
df['Q10'].str.replace('val1', 'val1')
Now, I have a dictionary of words like:
mydic = [
{
'key': 'wasn't',
'value': 'was not'
}
{
'key': 'I'm',
'value': 'I am'
}
... + tons of line of key value pairs
]
Currently, I have created a function that iterates over "mydic" and replacer one by one all occurrences.
def replaceContractions(df, mydic):
for cont in contractions:
df.str.replace(cont['key'], cont['value'])
Next I call this function passing mydic and my dataframe:
replaceContractions(df['Q10'], contractions)
First problem: this is very expensive because mydic has a lot of item and data set is iterate for each item on it.
Second: It seems that doesn't works :(
Any Ideas?
Convert your "dictionary" to a more friendly format:
m = {d['key'] : d['value'] for d in mydic}
m
{"I'm": 'I am', "wasn't": 'was not'}
Next, call replace with the regex switch and pass m to it.
df['Q10'] = df['Q10'].replace(m, regex=True)
replace accepts a dictionary of key-replacement pairs, and it should be much faster than iterating over each key-replacement at a time.

Categories

Resources