Python dictionaries shown as unequal despite no differences found

I'm comparing two dictionaries in Python, loadedAgreement and latestDBAgreement. I know from this thread that a simple == or != can be used to compare dictionaries.
My dictionaries come out as unequal, so to find the differences I used the approach recommended here:
value = { k : second_dict[k] for k in set(second_dict) - set(first_dict) }
applied in both directions. But both results are empty ({}), so why are the dictionaries still unequal?
diffvalues1 = { k : latestDBAgreement[k] for k in set(latestDBAgreement) - set(loadedAgreement) }
diffvalues2 = { k : loadedAgreement[k] for k in set(loadedAgreement) - set(latestDBAgreement) }
In the debugger, the code drops into the != branch, but both diffs are empty.

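Dicts can also differ in their values, not just in their keys; the set difference above only finds missing keys. To see which values differ, you can do something like this:
{
    k: (v, latestDBAgreement[k])
    for k, v in loadedAgreement.items()
    if v != latestDBAgreement[k]
}
(This assumes both dicts have the same keys, so it doesn't generalize: a key missing from latestDBAgreement raises KeyError.)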
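A more general version that does not assume matching keys could look like the sketch below. The function name dict_diff and the sample dicts loaded and latest are hypothetical. It reports keys present on only one side as well as keys whose values differ, which covers the usual reasons two dicts compare unequal.

```python
def dict_diff(first, second):
    """Return (only in first, only in second, changed values) for two dicts."""
    only_first = {k: first[k] for k in first.keys() - second.keys()}
    only_second = {k: second[k] for k in second.keys() - first.keys()}
    # Keys present on both sides whose values are not equal.
    changed = {
        k: (first[k], second[k])
        for k in first.keys() & second.keys()
        if first[k] != second[k]
    }
    return only_first, only_second, changed


# Hypothetical data: same keys, one differing value.
loaded = {"id": 7, "status": "draft", "amount": 100}
latest = {"id": 7, "status": "signed", "amount": 100}
print(dict_diff(loaded, latest))  # ({}, {}, {'status': ('draft', 'signed')})
```

In the situation from the question, the first two results would be empty and the third would show which values changed.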

Can I avoid multiple for-loops (without comprehension) or creating separate dictionaries in this Python scenario?

I want to construct a dictionary from which I'll pick data that needs to be inserted into six different new dataframes.
The raw data is currently in two pandas dataframes (named base_df and stim_df) with the same layout: 13 columns, with headers being the values in the columns_dict dictionary, and generally ~1200-1600 rows. The three functions called (sum_big_small, average_or_median, and interpeak_calc) each return a float computed from a subset of certain columns in the pandas dataframes (including the numpy.float64 values nan or -inf).
I didn't really think my first approach (below) through. I guess each assignment to writing_dict replaces the dictionary built by the previous line (at least I get an index error when trying to access some of it later)?
import numpy as np
self.wells = 96
columns_dict: dict[str, str] = {
'auc': 'AUC (a.u.)', 'peak_width_10': 'Peak width 10 (ms)', 'peak_width_90': 'Peak width 90 (ms)',
'slope_20': 'Slope 20 (a.u./ms)', 'slope_80': 'Slope 80 (a.u./ms)', 'decay_time': 'Decay time (ms)',
'rise_time': 'Rise time (ms)'
}
input_dfs = {'base': base_df, 'stim': stim_df}
average_median: list[str] = ['average', 'median']
base_stim: list[str] = ['base', 'stim']
big_small_total: list[str] = ['big', 'small', 'total']
writing_dict = {i: {'raw': {'number': {k: {'number': [self.sum_big_small(input_dfs[i], well, k) for well in range(1, self.wells + 1)]} for k in big_small_total}}} for i in base_stim}
writing_dict = {i: {'raw': {j: {k: {column: [self.average_or_median(input_dfs[i], well, j, k, columns_dict[column]) for well in range(1, self.wells + 1)] for column in columns} for k in big_small_total} for j in average_median}} for i in base_stim}
writing_dict = {i: {'raw': {j: {k: {'interpeak': [self.interpeak_calc(input_dfs[i], well, j, k) for well in range(1, self.wells + 1)]} for k in big_small_total} for j in average_median}} for i in base_stim}
writing_dict = {'percentage': {'percentage': {j: {k: {column: (100 * (np.array(writing_dict['stim']['raw'][j][k][column]) / np.array(writing_dict['base']['raw'][j][k][column]))) for column in columns_more} for k in big_small_total} for j in average_median}}}
But the only ways I can think of are (1) make four different dictionaries, one per dict comprehension (which seems less 'neat'), or (2) write everything as for-loops so I can put some of the key variables on the left of the assignment; something like this (but with more nested loops):
for i in base_stim:
    for j in average_median:
        writing_dict[i]['raw'][j] = SOMETHING
But that definitely seems less elegant. And I think I'd need to set up all the keys before the for-loops, right?
Is there a better way I'm missing, one that avoids setting up the writing_dict keys first and nesting for-loops, while still adding to the same dictionary?
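One common way to fill a nested dictionary without creating the intermediate keys first is an "autovivifying" defaultdict, where every missing key creates a new sub-dict on first access. The sketch below still uses plain loops, but everything goes into a single dictionary with no key setup. fake_count and fake_stat are hypothetical stand-ins for sum_big_small and average_or_median so the example runs on its own, and the column list and well count are cut down.

```python
from collections import defaultdict


def nested_dict():
    """Return a dict that creates missing sub-dicts on first access."""
    return defaultdict(nested_dict)


def to_plain(d):
    """Recursively convert nested defaultdicts back into ordinary dicts."""
    if isinstance(d, defaultdict):
        return {key: to_plain(value) for key, value in d.items()}
    return d


# Hypothetical stand-ins for sum_big_small / average_or_median.
def fake_count(group, well, size):
    return float(well)


def fake_stat(group, well, stat, size, column):
    return 2.0 * well


base_stim = ['base', 'stim']
average_median = ['average', 'median']
big_small_total = ['big', 'small', 'total']
columns = ['auc', 'rise_time']  # cut-down column list for the example
wells = range(1, 4)             # 96 wells in the question; 3 here

writing_dict = nested_dict()
for i in base_stim:
    for k in big_small_total:
        # Intermediate levels ('raw', 'number', k) are created on demand.
        writing_dict[i]['raw']['number'][k]['number'] = [fake_count(i, w, k) for w in wells]
    for j in average_median:
        for k in big_small_total:
            for column in columns:
                writing_dict[i]['raw'][j][k][column] = [
                    fake_stat(i, w, j, k, column) for w in wells
                ]

result = to_plain(writing_dict)
print(result['stim']['raw']['median']['big']['auc'])  # [2.0, 4.0, 6.0]
```

The trade-off: a mistyped key on a defaultdict silently creates a new empty branch instead of raising KeyError, which is why the sketch converts the result back to ordinary dicts with to_plain once it is built. The 'percentage' level could be filled in a second loop over the same dictionary.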
Extra: I suspect I might also be better off just staying in pandas, but I couldn't find a way to do it. I need to do calculations with subsets of each column, based on the values of two other columns, so the "final" dataframes will end up with 96 rows. In case I'm on completely the wrong track, here's one of the functions I'm using to return values to the list comprehensions nested in the dict comprehensions, to show what I'm trying to do:
df['Well number'] is 1-96 (both included), and df['Big (1) or Small (-1)'] is either 1.0 or -1.0.
import numpy as np
import pandas as pd

def average_or_median(self, df: pd.DataFrame, well: int, j: str, k: str, column: str) -> float:
    """
    Takes the DataFrame, calculates the average or median of all values for a given well number,
    either for "big" (=1), for "small" (=-1) or total ("big" + "small"), and returns the calculated value.
    Remember, numpy can return -inf or nan, both floats, when dividing by zero or averaging an empty list.
    Used to get average or median for AUC, Peak Width, Slope, Decay Time, and Rise Time.
    :param df: DataFrame of either baseline (base_df) or stimulated (stim_df).
    :param well: The well number to process.
    :param j: String, either "average" or "median".
    :param k: String, either "big", "small", or "total".
    :param column: The column header for the data to be processed.
    """
    if k == 'big':
        bs = 1.0
    elif k == 'small':
        bs = -1.0
    else:
        bs = 0
    if abs(bs) == 1.0:
        if j == 'average':
            return np.average(df.loc[(df['Well number'] == well) & (df['Big (1) or Small (-1)'] == bs)][column])
        elif j == 'median':
            return np.median(df.loc[(df['Well number'] == well) & (df['Big (1) or Small (-1)'] == bs)][column])
    else:
        # "total": select [column] here too; without it the statistic is taken over the whole frame
        if j == 'average':
            return np.average(df.loc[(df['Well number'] == well)][column])
        elif j == 'median':
            return np.median(df.loc[(df['Well number'] == well)][column])
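On the question of staying in pandas: a groupby over the well and big/small columns computes the statistics for every group in one call, instead of filtering the frame once per well. The sketch below uses a tiny made-up frame with the question's column names; "total" is just a groupby on the well number alone, and reindex is one way to make sure all 96 wells appear (with NaN where a well has no rows).

```python
import pandas as pd

# Tiny made-up frame using the column names from the question.
df = pd.DataFrame({
    'Well number': [1, 1, 1, 2, 2],
    'Big (1) or Small (-1)': [1.0, 1.0, -1.0, 1.0, -1.0],
    'AUC (a.u.)': [10.0, 20.0, 5.0, 8.0, 4.0],
})
value_cols = ['AUC (a.u.)']

# "big" and "small": one row per (well, big/small) pair, one column per (value, stat).
by_size = df.groupby(['Well number', 'Big (1) or Small (-1)'])[value_cols].agg(['mean', 'median'])

# "total": group by the well number only.
total = df.groupby('Well number')[value_cols].agg(['mean', 'median'])

# Make sure every well 1..96 has a row, even wells without any events.
total = total.reindex(range(1, 97))

print(by_size.loc[(1, 1.0), ('AUC (a.u.)', 'mean')])  # 15.0
print(total.loc[2, ('AUC (a.u.)', 'median')])         # 6.0
```

Whether this fits the rest of the pipeline (interpeak_calc, the percentage step) is an assumption; it only replaces the per-well filtering done in average_or_median.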

Nested Dictionary for loop

I'm new to programming. I'm trying to figure out how to subtract 'budgeted' from 'actual' and store the result in 'variance' using a nested for loop. However, I've read that it isn't best practice to change a dictionary while iterating over it. So far, I've been stumped on how to proceed.
for i in properties:
    for j in properties[i]:
        if j == "actual":
            sum = properties[i][j]
            print('\nActual:' , sum)
        if j == "budgeted":
            sum_two = properties[i][j]
            print('Budgeted:' , sum_two)
            diff = sum_two - sum
            print('Variance:', diff)
default_value = 0
properties = {587: {'prop_name': 'Collington'}, 'rental_income': {'apartment_rent': '5120-0000', 'resident_assistance': '5121-0000', 'gain_loss': '5120-0000'}, 51200000: {'actual': 29620, 'budgeted': 30509, 'variance': default_value}, 51210000: {'actual': 25620, 'budgeted': 40509, 'variance': default_value}, ............
Iterate through the dictionary, check whether the inner dictionary contains 'actual', 'variance' and 'budgeted', and if it does, update the variance value:
for k, v in properties.items():
    if 'actual' in v and 'variance' in v and 'budgeted' in v:
        properties[k]['variance'] = properties[k]['actual'] - properties[k]['budgeted']
There is nothing wrong with modifying the values inside a dictionary while iterating over it. What is not recommended is modifying the dictionary itself, that is, adding or removing keys.
Try something like:
for i in properties:
    properties[i]['variance'] = properties[i]['budgeted'] - properties[i]['actual']
If you aren't sure that budgeted and actual exist in the dictionaries, you should catch the KeyError and handle it appropriately:
for i in properties:
    try:
        properties[i]['variance'] = properties[i]['budgeted'] - properties[i]['actual']
    except KeyError:
        properties[i]['variance'] = -1  # Set to some special value or just pass
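To make the distinction between changing values and changing the dictionary concrete, here is a small sketch using two entries from the question's data (the demo dict is hypothetical). Updating 'variance' in place works fine; adding a key to the dict being iterated raises RuntimeError in CPython, and iterating over a snapshot of the keys avoids it.

```python
properties = {
    51200000: {'actual': 29620, 'budgeted': 30509, 'variance': 0},
    51210000: {'actual': 25620, 'budgeted': 40509, 'variance': 0},
}

# Fine: only values change; the outer dict keeps the same keys.
for key in properties:
    properties[key]['variance'] = properties[key]['budgeted'] - properties[key]['actual']
print(properties[51200000]['variance'])  # 889

# Not fine: adding a key to the dict that is being iterated.
demo = {'a': 1, 'b': 2}
error_message = ''
try:
    for key in demo:
        demo[key + '_copy'] = demo[key]
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)  # dictionary changed size during iteration

# Iterating over a snapshot of the keys avoids the error.
demo = {'a': 1, 'b': 2}
for key in list(demo):
    demo[key + '_copy'] = demo[key]
print(demo)  # {'a': 1, 'b': 2, 'a_copy': 1, 'b_copy': 2}
```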
Your data is in a strange format; I always try to group like objects together in dictionaries rather than mixing metadata and "lists" of items at the same level of a dictionary. This will work for you, though:
for prop in properties:
    p = properties[prop]
    if 'actual' in p or 'budgeted' in p:
        # get() won't raise if a key is missing; it defaults to 0 instead
        p['variance'] = p.get('budgeted', 0) - p.get('actual', 0)
import json
print(json.dumps(properties, indent=4))
Output:
{
"587": {
"prop_name": "Collington"
},
"rental_income": {
"apartment_rent": "5120-0000",
"resident_assistance": "5121-0000",
"gain_loss": "5120-0000"
},
"51200000": {
"actual": 29620,
"budgeted": 30509,
"variance": 889
},
"51210000": {
"actual": 25620,
"budgeted": 40509,
"variance": 14889
}
}
sum = None
sum_two = None
for i in properties:
    for j in properties[i]:
        if j == "actual":
            sum = properties[i]["actual"]
            print('\nActual:', sum)
        if j == "budgeted":
            sum_two = properties[i]["budgeted"]
            print('Budgeted:', sum_two)
            diff = sum_two - sum
            print('Variance:', diff)
I'm not sure exactly what you mean, but this should work.

Count occurrences of an item in a list and store it in another list if it exists more than once

Let's say I have the following list.
my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']
I would like to check the occurrence of each item and if it exists more than once I would like to store it in a new list.
For example, in the list above 4 appears three times before the / (as 4/10, 4/11, 4/12), so I would like to create a new list called new_list and store them as new_list = ['4/10', '4/11', '4/12', '19/10'] (19/10 is included because 10 appears twice after the /).
The position relative to the / also matters: if 10 appears twice, as in 4/10 and 10/16, I don't want to treat it as a duplicate, since it is after the / in one and before it in the other.
Is there any way to count the occurrences of an item in a list and store the repeated ones in a new list?
I tried the following but got an error.
from collections import Counter
new_list = []
d = Counter(my_list)
for v in d.items():
    if v > 1:
        new_list.append(v)
The error: TypeError: '>' not supported between instances of 'tuple' and 'int'
Can anyone help with this?
I think the code below is quite self-explanatory, and it will work all right. If you have any issues or need clarification, feel free to ask.
NOTE: This code is not very efficient and can be improved a lot, but it will work all right if you are not running it on extremely large data. It only looks at the part before the /.
my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']
frequency = {}
new_list = []
for string in my_list:
    x = ''
    for j in string:
        if j == '/':
            break
        x += j
    if x.isdigit():
        frequency[x] = frequency.get(x, 0) + 1
for string in my_list:
    x = ''
    for j in string:
        if j == '/':
            break
        x += j
    if x.isdigit():
        if frequency[x] > 1:
            new_list.append(string)
print(new_list)
.items() is not what you think: it returns (key, value) pairs (tuples), not just the values. You want:
from collections import Counter
d = Counter(node)
new_list = [k for (k, v) in d.items() if v > 1]
Besides, I am not sure how node is related to my_list, but I think there is some additional processing you didn't show.
Update: after reading your comment clarifying the problem, I think it requires two separate counters:
first_parts = Counter([x.split('/')[0] for x in my_list])
second_parts = Counter([x.split('/')[1] for x in my_list])
first_duplicates = { k for (k,v) in first_parts.items() if v > 1 and k != '-' }
second_duplicates = { k for (k,v) in second_parts.items() if v > 1 and k != '-' }
new_list = [e for e in my_list
            if e.split('/')[0] in first_duplicates or e.split('/')[1] in second_duplicates]
This might help: create dicts that group the entries by the part before and after the /, then extract the groups that have more than one entry. defaultdict helps with aggregating data based on common keys.
from collections import defaultdict
d = defaultdict(list)  # entries grouped by the part before the /
e = defaultdict(list)  # entries grouped by the part after the /
m = [ent for ent in my_list if '-' not in ent]
for ent in m:
    front, back = ent.split('/')
    d[front].append(ent)
    e[back].append(ent)
new_list = []
for k, v in d.items():
    if len(v) > 1:
        new_list.extend(v)
for k, v in e.items():
    if len(v) > 1:
        new_list.extend(v)
sortr = lambda x: [int(ent) for ent in x.split("/")]
# remove duplicates (4/10 is collected twice) and sort numerically
new_list = sorted(set(new_list), key=sortr)
print(new_list)
['4/10', '4/11', '4/12', '19/10']
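The two-counter and two-defaultdict answers can also be folded into a single Counter keyed by (position, value), so that 10 before the / and 10 after it are counted separately. This is a sketch; the function name position_aware_duplicates and its sep/missing parameters are hypothetical, and it keeps the original list order rather than sorting.

```python
from collections import Counter


def position_aware_duplicates(items, sep='/', missing='-'):
    """Keep items whose front or back part repeats in that same position."""
    split_items = [item.split(sep) for item in items]
    # Count (position, value) pairs so '10' before and after '/' are distinct.
    counts = Counter(
        (position, value)
        for parts in split_items
        for position, value in enumerate(parts)
        if value != missing
    )
    return [
        item
        for item, parts in zip(items, split_items)
        if any(counts[(position, value)] > 1
               for position, value in enumerate(parts)
               if value != missing)
    ]


my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']
print(position_aware_duplicates(my_list))  # ['4/10', '4/11', '19/10', '4/12']
```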

Finding the max depth of a set in a dictionary

I have a dictionary where each key is a string and its value is a list of strings that contain the key (word chaining). I'm having trouble finding the max depth of the graph, which would be the set with the most elements in the dictionary, and I'm trying to print out that max graph as well.
Right now my code prints:
{'DOG': [],
'HIPPOPOTIMUS': [],
'POT': ['SUPERPOT', 'HIPPOPOTIMUS'],
'SUPERPOT': []}
1
Where 1 is my maximum dictionary depth. I was expecting the depth to be two, but there appears to be only 1 layer to the graph of 'POT'
How can I find the maximum value set from the set of keys in a dictionary?
import pprint

def dict_depth(d, depth=0):
    if not isinstance(d, dict) or not d:
        return depth
    return max(dict_depth(v, depth + 1) for k, v in d.items())

def main():
    for keyCheck in wordDict:
        for keyCompare in wordDict:
            if keyCheck in keyCompare:
                if keyCheck != keyCompare:
                    wordDict[keyCheck].append(keyCompare)

if __name__ == "__main__":
    # load the words into a dictionary
    with open("testwordlist.txt") as f:
        wordDict = dict((x.strip(), []) for x in f)
    main()
    pprint.pprint(wordDict)
    print(dict_depth(wordDict))
testwordlist.txt:
POT
SUPERPOT
HIPPOPOTIMUS
DOG
The "depth" of a dictionary will naturally be 1 plus the maximum depth of its entries. You've defined the depth of a non-dictionary to be zero. Since your top-level dictionary doesn't contain any dictionaries of its own, the depth of your dictionary is clearly 1. Your function reports that value correctly.
However, your function isn't written expecting the data format you're providing it. We can easily come up with inputs where the depth of substring chains is more than just one. For example:
DOG
DOGMA
DOGMATIC
DOGHOUSE
POT
Output of your current script:
{'DOG': ['DOGMATIC', 'DOGMA', 'DOGHOUSE'],
'DOGHOUSE': [],
'DOGMA': ['DOGMATIC'],
'DOGMATIC': [],
'POT': []}
1
I think you want to get 2 for that input because the longest substring chain is DOG → DOGMA → DOGMATIC, which contains two hops.
To get the depth of a dictionary as you've structured it, you want to calculate the chain length for each word. That's 1 plus the maximum chain length of each of its substrings, which gives us the following two functions:
def word_chain_length(d, w):
    if len(d[w]) == 0:
        return 0
    return 1 + max(word_chain_length(d, ww) for ww in d[w])

def dict_depth(d):
    print(max(word_chain_length(d, w) for w in d))
The word_chain_length function given here isn't particularly efficient. It may end up calculating the lengths of the same chain multiple times if a string is a substring of many words. Dynamic programming is a simple way to mitigate that, which I'll leave as an exercise.
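One way to do that memoization, and also return the chain itself (the question asks to print the max graph), is functools.lru_cache on a helper that returns the longest chain starting at each word. This is a sketch under the assumption that word_dict is built as in the question's main(); because every listed word is strictly longer than its key, the graph has no cycles and the recursion terminates. The function name longest_word_chain is hypothetical.

```python
from functools import lru_cache


def longest_word_chain(word_dict):
    """Return the longest substring chain as a tuple of words."""
    @lru_cache(maxsize=None)  # each word's best chain is computed only once
    def chain_from(word):
        best = ()
        for longer in word_dict[word]:
            candidate = chain_from(longer)
            if len(candidate) > len(best):
                best = candidate
        return (word,) + best

    return max((chain_from(w) for w in word_dict), key=len, default=())


words = ['DOG', 'DOGMA', 'DOGMATIC', 'DOGHOUSE', 'POT']
word_dict = {w: [o for o in words if w in o and w != o] for w in words}
chain = longest_word_chain(word_dict)
print(chain)           # ('DOG', 'DOGMA', 'DOGMATIC')
print(len(chain) - 1)  # 2 hops, the same number word_chain_length reports
```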
Sorry, my examples won't be in Python because my Python is rusty, but you should get the idea.
Let's say this is a binary tree:
(written in C++)
int depth(TreeNode* root) {
    if (!root) return 0;
    return 1 + std::max(depth(root->left), depth(root->right));
}
Simple. Now let's extend this to more than just a left and a right child.
(Go code)
type Dic map[string]interface{}

func depthfunc(dic Dic) int {
    if dic == nil {
        return 0
    }
    level := make([]int, 0)
    for _, value := range dic {
        depth := 1
        if anotherDic, ok := value.(Dic); ok { // check if it goes down further
            depth = 1 + depthfunc(anotherDic)
        }
        level = append(level, depth)
    }
    // find max
    maxDepth := 0
    for _, value := range level {
        if value > maxDepth {
            maxDepth = value
        }
    }
    return maxDepth
}
The idea is that you go down each dictionary until there are no more dictionaries to go down, adding 1 for each level you traverse.
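A rough Python translation of the Go function above might look like this sketch. It mirrors the Go convention that a non-dict value contributes one level and an empty dict has depth 0. It also shows why the question's word dictionary reports 1: its values are lists, which this kind of depth function does not descend into.

```python
def dict_depth(d):
    """Nesting depth of dicts: non-dicts count 0, each dict level adds 1."""
    if not isinstance(d, dict):
        return 0
    # Each value contributes 1 (this level) plus the depth below it;
    # an empty dict has no values, so default=0 gives it depth 0.
    return max((1 + dict_depth(v) for v in d.values()), default=0)


nested = {'a': 1, 'b': {'c': {'d': 2}}, 'e': {'f': 3}}
print(dict_depth(nested))                                 # 3
print(dict_depth({'POT': ['SUPERPOT', 'HIPPOPOTIMUS']}))  # 1
```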

Are there any alternatives to a loop for updating a dictionary from sequence data (Python)?

I have code that updates a dictionary like this:
c = {}
for i in ID:
    d = {i: V[i]}
    c.update(d)
Both ID and V hold complex and huge items; ID is a list and V is a dictionary.
Is there any way in Python to implement this logic without a loop such as for?
The loop takes a lot of iterations, which hurts run time.
No, you can't avoid a loop, but you can try these alternatives:
c = {}
for i in ID:
    c[i] = V[i]
or
c = dict([(i, V[i]) for i in ID])
or
c = {i: V[i] for i in ID}
A shorter form of your code is:
c.update({i: V[i] for i in ID})
You could also use map, but it still iterates over ID:
c.update(dict(map(lambda i: (i, V[i]), ID)))
These are all O(n). The notations above only shift some of the looping machinery from Python into C, so at best they give a constant-factor speed-up.
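If the goal is to push the per-key lookups into C, one option not mentioned in the answers is operator.itemgetter, which fetches many keys from V in a single call. This is a sketch: the helper name select_keys is hypothetical, the approach is still O(n), and whether it is actually faster on your data is an assumption you would need to check with timeit.

```python
from operator import itemgetter


def select_keys(source, keys):
    """Return {k: source[k] for k in keys}, doing the lookups with itemgetter."""
    keys = list(keys)
    if not keys:
        return {}  # itemgetter() needs at least one key
    values = itemgetter(*keys)(source)
    if len(keys) == 1:
        values = (values,)  # with a single key, itemgetter returns the bare value
    return dict(zip(keys, values))


V = {'a': 1, 'b': 2, 'c': 3}
ID = ['c', 'a']
print(select_keys(V, ID))  # {'c': 3, 'a': 1}
```

The single-key and empty-list special cases are needed because itemgetter returns a tuple only when given two or more keys.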
