Parsing a distance matrix csv into python dictionary structure - python

I have a distance matrix laid out like this in a csv file
, A, B, C,
A, 0
B, 3, 0
C, 6, 4, 0
And I would like to parse it into a python dictionary like this...
graph = {'A': {'B': 3, 'C': 6},
'B': {'A': 3, 'C': 4},
'C': {'A': 6, 'B': 4}}

With the file exactly as you posted it, you will never get that dict in graph. If you provide a corrected CSV file (shown at the end), the code below gets close: it returns one half of the symmetric matrix, including the zero diagonal.
Note that inside the CSV file the header row (first row) must not end with a comma, and the column names in that row must not contain spaces. Otherwise you'll get a strange dict.
import math
import pandas

d = pandas.read_csv('csv_file.csv', sep=',', header=0, index_col=0)
d_dict = d.to_dict()  # use d.to_dict(orient='index') for the transpose
# Strip whitespace from the labels and drop the NaN cells of the empty upper triangle.
graph = {k.strip(): {k2.strip(): v2 for k2, v2 in v.items() if not math.isnan(v2)}
         for k, v in d_dict.items()}
print(graph)
which generates the following dict:
{'A': {'A': 0, 'B': 3, 'C': 6},
'B': {'B': 0.0, 'C': 4.0},
'C': {'C': 0.0}}
The contents of csv_file.csv:
,A,B,C
A, 0
B, 3, 0
C, 6, 4, 0
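To get the symmetric dict the question asks for (both directions, no zero diagonal), one option is to read the lower triangle with the standard csv module and mirror each entry. This is a minimal sketch, not the answer's method: the function name parse_distance_matrix and the inline sample text are illustrative assumptions, and distances are assumed to be integers.

```python
import csv
import io

# Sample data in the same lower-triangular layout as the question (assumed).
CSV_TEXT = """,A,B,C
A, 0
B, 3, 0
C, 6, 4, 0
"""

def parse_distance_matrix(text):
    """Build a symmetric {node: {neighbour: distance}} dict from a lower-triangular CSV."""
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    # Ignore the empty corner cell and any empty cell left by a trailing comma.
    header = [name.strip() for name in rows[0][1:] if name.strip()]
    graph = {name: {} for name in header}
    for row in rows[1:]:
        if not row:
            continue
        src = row[0].strip()
        for dst, cell in zip(header, row[1:]):
            cell = cell.strip()
            if not cell or dst == src:
                continue  # skip missing cells and the diagonal
            dist = int(cell)  # assumption: integer distances
            graph[src][dst] = dist
            graph[dst][src] = dist  # mirror the entry
    return graph

graph = parse_distance_matrix(CSV_TEXT)
print(graph)
```

For a real file, the same function could be fed the result of open('csv_file.csv').read().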

Related

Python combine values of identical dictionaries without using looping

I have a list of dictionaries with identical keys:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do it using for .. in .., but is there a way to do it without looping?
If I do
a, b, c = zip(*my_list)
I'm getting
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?
You need to extract all the values in my_list. You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
As pointed out by @Alexandre, this works only when every dict keeps its keys in the same order. If you can't guarantee the order, consider yatu's answer.
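If the key order might differ between dicts, one loop-free option is to name the keys explicitly with operator.itemgetter, so the result no longer depends on insertion order. A minimal sketch, assuming every dict contains the keys 'a', 'b' and 'c' (the second dict below is deliberately reordered for illustration):

```python
from operator import itemgetter

# Same values as in the question, except that the second dict lists its keys
# in a different order (an assumption made to illustrate the point).
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'c': 6, 'a': 4, 'b': 5}, {'a': 7, 'b': 8, 'c': 9}]

# itemgetter('a', 'b', 'c') returns a tuple of the three values for each dict,
# always in the order the keys are named here.
a, b, c = zip(*map(itemgetter('a', 'b', 'c'), my_list))
print(a, b, c)
```

A missing key raises KeyError here, which is arguably preferable to silently misaligned values.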
You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be a dictionary mapping each letter to a list of values. Assigning to separate variables is usually not the best idea, as it only works with a fixed number of variables.
You can iterate over the inner dictionaries, and append to a defaultdict as:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
    for k, v in d.items():
        out[k].append(v)
print(out)
# defaultdict(<class 'list'>, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
The pandas DataFrame has a factory method for exactly this, so it is worth considering if you already have pandas as a dependency or if the input data is large:
import pandas as pd
my_list = ...  # the list of dicts from the question
df = pd.DataFrame.from_records(my_list)
a = list(df['a'])  # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])
The code below does it without an explicit loop, although I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the same keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a, b, c)

How to write nested dictionaries to a CSV file

I have a dictionary like this:
count = {'lt60': {'a': 0, 'b': 0, 'c': 0, 'd': 0}, 'ge60le90': {'a': 4, 'b': 0, 'C': 0, 'd': 0}, 'gt90': {'a': 0, 'b': 1, 'c': 2, 'd': 1} }
I want to write this dictionary to a CSV file in the layout shown in the picture.
What I want is to pick the keys from lt60, ge60le90 and gt90 and write each one as a row: for example, I pick 'a', take its value from every nested dictionary, and write those values in that row.
You can use pandas to do this:
import pandas as pd
count = {'lt60': {'a': 0, 'b': 0, 'c': 0, 'd': 0},
         'ge60le90': {'a': 4, 'b': 0, 'c': 0, 'd': 0},
         'gt90': {'a': 0, 'b': 1, 'c': 2, 'd': 1}}
df = pd.DataFrame(count).rename_axis('relation_type').reset_index()
df = df.rename(columns={'ge60le90': 'confidence<90',
                        'gt90': 'confidence>90',
                        'lt60': 'confidence<60'})
df.to_csv('out.csv', index=False)
# relation_type confidence<90 confidence>90 confidence<60
# 0 a 4 0 0
# 1 b 0 1 0
# 2 c 0 2 0
# 3 d 0 1 0
Another way of doing it is to use the csv module. (Note that your dictionary contains an upper-case 'C', which I corrected in my code below.)
import csv
lookup = {'ge60le90': 'confidence<90','gt90': 'confidence>90', 'lt60': 'confidence<60'}
count = {'lt60': {'a': 0, 'b': 0, 'c': 0, 'd': 0}, 'ge60le90': {'a': 4, 'b': 0, 'c': 0, 'd': 0}, 'gt90': {'a': 0, 'b': 1, 'c': 2, 'd': 1} }
# Getting keys from dictionary that match them with titles below.
rowKeys = [k for k in count['lt60'].keys()]
titles = [['relation type'] + list(lookup[k] for k in count.keys())]
# Getting all row variable values for every title.
rows = [[count[k][i] for k in count.keys()] for i in rowKeys]
# Concatenating variables and values.
fields = [[rowKeys[i]] + rows[i] for i in range(len(rowKeys))]
# Concatenating final output to be written to file.
result = titles + fields
print("Final result to be written: ")
for r in result:
    print(r)
# Writing to file.
with open("output.csv", "w", newline="") as outFile:
    writer = csv.writer(outFile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)
Note that the ; delimiter works for European Windows and might not work for you. In this case, use , instead.
This problem can be simplified by iterating through your dict to pull out the keys and the values stored under them:
count = {'lt60': {'a': 0, 'b': 0, 'c': 0, 'd': 0},
         'ge60le90': {'a': 4, 'b': 0, 'c': 0, 'd': 0},
         'gt90': {'a': 0, 'b': 1, 'c': 2, 'd': 1}}
# Create the output file; the with-block closes it automatically
with open("C:\\Users\\<USER>\\Desktop\\OUT.csv", "w") as f:
    # Write first row
    f.write(",a,b,c,d\n")
    # Iterate through keys
    for key in count:
        print(key)
        inner = count[key]
        # Iterate through values; joining them avoids a trailing comma on each row
        values = [str(inner[k]) for k in inner]
        print(values)
        f.write(key + "," + ",".join(values) + "\n")
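If the inner dictionaries may not share exactly the same keys (as with the stray upper-case 'C' in the question), csv.DictWriter combined with dict.get is one way to produce one row per inner key without raising KeyError. This is a sketch under assumptions: the 'relation_type' column name is borrowed from the pandas answer, missing values are filled with 0, and an in-memory buffer stands in for the output file.

```python
import csv
import io

count = {'lt60': {'a': 0, 'b': 0, 'c': 0, 'd': 0},
         'ge60le90': {'a': 4, 'b': 0, 'c': 0, 'd': 0},
         'gt90': {'a': 0, 'b': 1, 'c': 2, 'd': 1}}

# Collect every inner key once, preserving first-seen order.
row_keys = list(dict.fromkeys(k for inner in count.values() for k in inner))

# Stand-in for open('out.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['relation_type', *count],
                        lineterminator='\n')
writer.writeheader()
for key in row_keys:
    row = {'relation_type': key}
    # Assumption: a key missing from an inner dict counts as 0.
    row.update({col: inner.get(key, 0) for col, inner in count.items()})
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```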

Computing the sum of all unique values in a numpy array containing rows of dicts

I have a large numpy array, with each row containing a dict of words, in a similar format to below:
data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}, ...]
Could someone please point me in the right direction for how would I go about computing the sum of all the unique values of the dicts in each row of the numpy array? From the example above, I would hope to obtain something like this:
result = {'a': 5, 'c': 2, 'ba': 3, ...}
At the moment, the only way I can think of is to iterate through each row of the data and then through each key of the dict. If a new key is found, I add it to the new dict and set its value; if the key is already in the dict, I add its value to the existing entry in 'result'. This seems like an inefficient way to do it, though.
You could use a Counter() and update it with each dictionary contained in data, in a loop:
from collections import Counter
data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}]
c = Counter()
for d in data:
    c.update(d)
output:
Counter({'a': 5, 'ba': 3, 'c': 2})
Alternative one-liner
(as proposed by @AntonVBR in the comments):
sum((Counter(dict(x)) for x in data), Counter())
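One difference between the two variants is worth knowing: Counter.update keeps every key whatever its total, while adding Counters with + discards any total that is zero or negative at each step. The data below is hypothetical and chosen only to make the difference visible; with strictly positive values, as in the question, both variants agree.

```python
from collections import Counter

# Hypothetical data in which some totals end up zero or negative.
data = [{'a': 1, 'b': -2}, {'a': -1, 'c': 3}]

updated = Counter()
for d in data:
    updated.update(d)  # keeps every key, whatever the total

# '+' drops totals that are zero or negative after each addition.
summed = sum((Counter(d) for d in data), Counter())

print(dict(updated))
print(dict(summed))
```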
A pure Python solution using for-loops:
data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}]
result = {}
for d in data:
    for k, v in d.items():
        if k in result:
            result[k] += v
        else:
            result[k] = v
output:
{'c': 2, 'a': 5, 'ba': 3}

keep highest value of duplicate keys in dicts

For school I am writing a small program that builds a ranking list for a game.
I am using dicts for this, with the player's name as the key and the score as the value.
There will be 10 games, and each game has an automatic ranking system whose output I print to a file.
I've already managed to code the ranking system, but now I'm facing a bigger challenge that I cannot solve:
I have to make an overall ranking, which means a player name can appear in several contests with several scores, but I need to keep only the highest score for each duplicate.
In short: I need some help with keeping the duplicate key with the highest value,
like this:
dict1 = {"a": 6, "b": 4, "c": 2, "g": 1}
dict2 = {"a": 3, "f": 4, "g": 5, "d": 2}
dictcombined = {'a': 6, 'b': 4, 'c': 2, 'g': 5, 'f': 4, 'd': 2}
The normal merge just takes the value from the second dict.
Thanks in advance.
You need a function that keeps track of the highest score for each player. It adds a player to the total if they are not already there; otherwise it updates the stored score only if the new one is higher.
Something like this:
def addScores(scores, total):
    for player in scores:
        if player not in total or total[player] < scores[player]:
            total[player] = scores[player]
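A function like this can be called once per game to build the overall ranking. The sketch below applies it to the two dicts from the question; the names add_scores, games, overall and ranking are illustrative, and the final sort by score is an assumption about how the ranking would be displayed.

```python
def add_scores(scores, total):
    """Merge one game's scores into total, keeping each player's best score."""
    for player, score in scores.items():
        if player not in total or total[player] < score:
            total[player] = score

# The two example games from the question.
games = [
    {"a": 6, "b": 4, "c": 2, "g": 1},
    {"a": 3, "f": 4, "g": 5, "d": 2},
]

overall = {}
for game in games:
    add_scores(game, overall)

# Highest score first; players with equal scores keep their insertion order.
ranking = sorted(overall.items(), key=lambda item: item[1], reverse=True)
print(overall)
print(ranking)
```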
This works like a charm:
dict1 = {"a": 6, "z": 4, "g": 1, "hh": 50, "ggg": 1}
dict2 = {"a": 3, "g": 5, "d": 2, "hh": 50}
# Copy the higher score for each player into dict2 (this modifies dict2 in place).
for key in dict1:
    if key not in dict2 or dict1[key] > dict2[key]:
        dict2[key] = dict1[key]
print(dict1)
print(dict2)
dict3 = {**dict1, **dict2}
print(dict3)
Now I can compare dict3 with other dicts and so on.
Here's a variation on Matt Eding's answer that compares each value individually instead of creating sets of values. As a plus, it doesn't need any imports.
def combine_dicts(chooser, *dicts):
    combined = {}
    for d in dicts:
        for k, v in d.items():
            if k not in combined:
                combined[k] = v
            else:
                combined[k] = chooser(v, combined[k])
    return combined
Usage:
>>> combine_dicts(max, dict1, dict2)
{'a': 6, 'b': 4, 'c': 2, 'g': 5, 'f': 4, 'd': 2}
Here is my generalized solution to your question. It's a function that can combine an arbitrary number of dictionaries and has an option for other comparison functions should you want to say, keep track of the minimum values instead.
import collections
def combine_dicts(func, *dicts):
    default = collections.defaultdict(set)
    for d in dicts:
        for k, v in d.items():
            default[k].add(v)
    return {k: func(v) for k, v in default.items()}
It uses a defaultdict with set as its default_factory to collect the different values seen for each key. It then uses a dictionary comprehension to apply func to each set and keep the desired value.
dict1 = {"a": 6, "b": 4, "c": 2, "g": 1}
dict2 = {"a": 3, "d": 2, "f": 4, "g": 5}
dict_comb = combine_dicts(max, dict1, dict2)
print(dict_comb) # -> {'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}
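Yet another approach, surprisingly not proposed yet (it is 100% built-in). Sorting the (key, value) pairs puts the larger value of a duplicate key last, and dict() keeps the last pair it sees for each key: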
>>> dict(sorted([*dict1.items(), *dict2.items()]))
{'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}
If your key-value pairs are less "lexicographic", you may want to target the numerics specifically, doing
>>> dict(sorted([*dict1.items(), *dict2.items()], key=lambda item: item[1]))
{'g': 5, 'c': 2, 'd': 2, 'a': 6, 'b': 4, 'f': 4}
You might consider using Pandas for this. It also has a ton of other helpful functionality for working with data.
There's probably an ideal way to solve this, but the first thing I thought of is to create two Series (which are sort of like dicts), concatenate them, group by the labels (a, b, c, etc.), then get the max for each group.
import pandas as pd
s1, s2 = [pd.Series(d, name='Scores') for d in [dict1, dict2]]
result = pd.concat([s1, s2]).groupby(level=0).max()
>>> result
a 6
b 4
c 2
d 2
f 4
g 5
Name: Scores, dtype: int64
If you want the result as a dict:
>>> result.to_dict()
{'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}

How to subtract values from dictionaries

I have two dictionaries in Python:
d1 = {'a': 10, 'b': 9, 'c': 8, 'd': 7}
d2 = {'a': 1, 'b': 2, 'c': 3, 'e': 2}
I want to subtract the values of d2 from those of d1 (d1 - d2) and get this result:
d3 = {'a': 9, 'b': 7, 'c': 5, 'd': 7 }
Right now I'm using two nested loops, but this solution is not very fast:
for x,i in enumerate(d2.keys()):
for y,j in enumerate(d1.keys()):
I think a very Pythonic way would be using dict comprehension:
d3 = {key: d1[key] - d2.get(key, 0) for key in d1}
Note that this only works in Python 2.7+ or 3.
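The comprehension ignores keys that exist only in d2 (such as 'e'), which matches the result the question asks for. As a sketch, it can be wrapped in a small function with an optional switch for including those keys as negative values; the name subtract_dicts and the include_extra parameter are illustrative assumptions, not part of any library.

```python
def subtract_dicts(d1, d2, include_extra=False):
    """Return d1 - d2 per key; keys missing from d2 are treated as 0."""
    result = {key: value - d2.get(key, 0) for key, value in d1.items()}
    if include_extra:
        # Optionally include keys found only in d2, as 0 - value.
        for key, value in d2.items():
            if key not in d1:
                result[key] = -value
    return result

d1 = {'a': 10, 'b': 9, 'c': 8, 'd': 7}
d2 = {'a': 1, 'b': 2, 'c': 3, 'e': 2}

d3 = subtract_dicts(d1, d2)
d3_all = subtract_dicts(d1, d2, include_extra=True)
print(d3)
print(d3_all)
```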
Use collections.Counter, but only if all resulting values are known to be strictly positive. The syntax is very easy:
>>> from collections import Counter
>>> d1 = Counter({'a': 10, 'b': 9, 'c': 8, 'd': 7})
>>> d2 = Counter({'a': 1, 'b': 2, 'c': 3, 'e': 2})
>>> d3 = d1 - d2
>>> print(d3)
Counter({'a': 9, 'b': 7, 'd': 7, 'c': 5})
Mind that if some values do not remain strictly positive:
elements with values that become zero will be omitted from the result;
elements with values that become negative will be missing, or replaced with wrong values. E.g., print(d2-d1) can yield Counter({'e': 2}).
Just an update to Haidro's answer.
It is recommended to use the subtract method instead of "-":
d1.subtract(d2)
When - is used, only keys with positive counts are kept in the result.
See the examples below:
c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
a = c-d
print(a) # --> Counter({'a': 3})
c.subtract(d)
print(c) # --> Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
Please note that the subtract method updates the Counter in place.
Finally, use dict(c) to get a plain dictionary from the Counter object.
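Because subtract works in place, one way to leave the original dictionaries untouched is to build a fresh Counter from d1 first. A minimal sketch using the dictionaries from the question; the variable name diff is illustrative.

```python
from collections import Counter

d1 = {'a': 10, 'b': 9, 'c': 8, 'd': 7}
d2 = {'a': 1, 'b': 2, 'c': 3, 'e': 2}

diff = Counter(d1)   # a new Counter, so d1 itself is not modified
diff.subtract(d2)    # in place on the copy; zero and negative counts are kept
d3 = dict(diff)      # back to a plain dictionary
print(d3)
```

Note that 'e', which exists only in d2, shows up with a negative value here, unlike in the result requested in the question.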
Haidro posted an easy solution, but even without collections you only need one loop:
d1 = {'a': 10, 'b': 9, 'c': 8, 'd': 7}
d2 = {'a': 1, 'b': 2, 'c': 3, 'e': 2}
d3 = {}
for k, v in d1.items():
    d3[k] = v - d2.get(k, 0)  # d2.get returns the value if k exists in d2, otherwise 0
print(d3)  # {'a': 9, 'b': 7, 'c': 5, 'd': 7}
