I am looking for a nice pythonic solution for my problem.
I have a dictionary like:
char_dist = {'b' : 0.345, 'd' : 0.158, 'c' : 0.059, 'w' : 0.437}
And I would like to get something like this:
new_dict = {'b': {'b': 0.11902,
'd': 0.05451,
'c': 0.020355,
'w': 0.150765},
'd': {'b': 0.05451,
'd': 0.024964,
'c': 0.009322,
'w': 0.069046},
'c': {'b': 0.020355,
'd': 0.009322,
'c': 0.003481,
'w': 0.025783},
'w': {'b': 0.150765,
'd': 0.069046,
'c': 0.025783,
'w': 0.190969}}
The new dict is the result of multiplying the values of the old dict with each other:
new_dict = {key1: {key2: char_dist[key1] * char_dist[key2], ...}, ...}
P.S. I tried something like this, but I am still figuring it out:
from collections import defaultdict
from pprint import pprint
new = defaultdict(dict)
for base, val in char_dist.items():
    new[base] = {base: p for base, p in
                 zip('bdcw', char_dist)}
pprint(new)
But I got the same values in every nested dictionary:
defaultdict(<class 'dict'>,
{'b': {'b': 0.11902,
'd': 0.05451,
'c': 0.020355,
'w': 0.150765},
'c': {'b': 0.11902,
'd': 0.05451,
'c': 0.020355,
'w': 0.150765},
'd': {'b': 0.11902,
'd': 0.05451,
'c': 0.020355,
'w': 0.150765},
'w': {'b': 0.11902,
'd': 0.05451,
'c': 0.020355,
'w': 0.150765}})
I want to create a kind of transition matrix.
You can do it with nested dictionary comprehensions:
expected = {kk: {k: vv*v for k, v in char_dist.items()} for kk, vv in char_dist.items()}
print(expected)
[out]:
{'b': {'b': 0.11902, 'c': 0.02035, 'd': 0.05451, 'w': 0.15076},
'c': {'b': 0.02035, 'c': 0.00348, 'd': 0.00932, 'w': 0.02578},
'd': {'b': 0.05451, 'c': 0.00932, 'd': 0.02496, 'w': 0.06905},
'w': {'b': 0.15076, 'c': 0.02578, 'd': 0.06905, 'w': 0.19097}}
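The same comprehension can be wrapped in a small helper if you need it more than once. This is only a sketch: the function name `pairwise_products` and the `ndigits` rounding parameter are my own additions for illustration, not something from the answer above.

```python
def pairwise_products(dist, ndigits=6):
    """Return {row: {col: dist[row] * dist[col]}}, rounded to ndigits places."""
    return {
        row: {col: round(p_row * p_col, ndigits) for col, p_col in dist.items()}
        for row, p_row in dist.items()
    }


char_dist = {'b': 0.345, 'd': 0.158, 'c': 0.059, 'w': 0.437}
table = pairwise_products(char_dist)

# The table is symmetric, because p_row * p_col equals p_col * p_row.
print(table['b']['d'] == table['d']['b'])
print(table['d']['w'])
```

Rounding is optional; it just keeps floating-point noise out of the printed values.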
I guess if you're dealing with distributions, some linear algebra won't hurt. Meet Pandas:
import pandas as pd
....
df = pd.DataFrame([char_dist])
df.T.dot(df)
Output:
          b         d         c         w
b  0.119025  0.054510  0.020355  0.150765
d  0.054510  0.024964  0.009322  0.069046
c  0.020355  0.009322  0.003481  0.025783
w  0.150765  0.069046  0.025783  0.190969
I think the easiest:
char_dist = {'b': 0.345, 'd': 0.158, 'c': 0.059, 'w': 0.437}
old_dict = {'b': 0.68746258423, 'd': 0.5429823052, 'c': 0.5849805243, 'w': 0.95840285}
new_dict = dict.fromkeys(char_dist, old_dict)
print(new_dict)
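One thing to keep in mind with dict.fromkeys: every key ends up pointing at the same value object, so with a mutable value such as a dict, a change made through one key shows up under all of them. A minimal sketch of that behaviour (the names `shared` and `separate` are made up for this example):

```python
char_dist = {'b': 0.345, 'd': 0.158, 'c': 0.059, 'w': 0.437}

# dict.fromkeys stores the very same object under every key.
shared = dict.fromkeys(char_dist, {})
shared['b']['x'] = 1
print(shared['d'])  # the change made through 'b' is visible under 'd' too

# A comprehension builds a fresh inner dict per key instead.
separate = {key: {} for key in char_dist}
separate['b']['x'] = 1
print(separate['d'])  # unaffected by the change made through 'b'
```

So if the nested dictionaries need to differ per key, a comprehension is probably the safer choice.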
I came up with something like this:
from collections import defaultdict
from pprint import pprint
base_distribution = {'A' : 0.345, 'C' : 0.158, 'G' : 0.059, 'T' : 0.437}
markov = defaultdict()  # no default factory, so this behaves like a plain dict
for base in base_distribution:
    markov[base] = markov.get(base, {})
    for key in base_distribution:
        # product of the two base probabilities, rounded to 4 places
        p = round(base_distribution[base] * base_distribution[key], 4)
        markov[base][key] = markov[base].get(key, p)
pprint(markov)
defaultdict(None,
{'A': {'A': 0.119, 'C': 0.0545, 'G': 0.0204, 'T': 0.1508},
'C': {'A': 0.0545, 'C': 0.025, 'G': 0.0093, 'T': 0.069},
'G': {'A': 0.0204, 'C': 0.0093, 'G': 0.0035, 'T': 0.0258},
'T': {'A': 0.1508, 'C': 0.069, 'G': 0.0258, 'T': 0.191}})
Related
I'm trying to plot a dataset contained in a dictionary:
my_dict = [{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]},
{'A': [0.4758837897858464],
'B': [0.4219886317147244],
'C': [0.6206223617183635],
'D': [0.3911170612926995],
'E': [0.5159829508133175],
'F': [0.479838956092881]},
{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]}]
then
df = pd.DataFrame(my_dict)
df.plot(kind="barh")
plt.show()
df.dtypes shows the object type for every column, and plotting raises TypeError: no numeric data to plot
I've exhausted most of my brain cells trying to figure this out, but to no avail. All help will be appreciated.
Extracting the number from each list does the job:
import pandas as pd
import matplotlib.pyplot as plt
my_dict = [{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]},
{'A': [0.4758837897858464],
'B': [0.4219886317147244],
'C': [0.6206223617183635],
'D': [0.3911170612926995],
'E': [0.5159829508133175],
'F': [0.479838956092881]},
{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]}]
for entry in my_dict:
    for k,v in entry.items():
        entry[k] = v[0]
df = pd.DataFrame(my_dict)
df.plot(kind="barh")
plt.show()
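If you would rather not modify my_dict in place, the same unwrapping can be done with a comprehension that builds new rows. This is a sketch on shortened, made-up sample data (`rows` and `flat_rows` are names I picked); it assumes every value is a one-element list, as in the question. The resulting list can be passed to pd.DataFrame just like above.

```python
rows = [
    {'A': [0.73], 'B': [0.57]},
    {'A': [0.48], 'B': [0.42]},
]

# Build new rows with the single number pulled out of each one-element list.
flat_rows = [{key: values[0] for key, values in row.items()} for row in rows]

print(flat_rows)
print(rows[0]['A'])  # the original data is left untouched
```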
I have a list of dictionaries like
[
{'a': {'q': 1}, 'b': {'r': 2}, 'c': {'s': 3}},
{'a': {'t': 4}, 'b': {'u': 5}, 'c': {'v': 6}},
{'a': {'w': 7}, 'b': {'x': 8}, 'c': {'z': 9}}
]
and I want the output to be
{
'a': {'q': 1, 't': 4, 'w': 7},
'b': {'r': 2, 'u': 5, 'x': 8},
'c': {'s': 3, 'v': 6, 'z': 9}
}
There are several ways of doing this; one of them uses collections.defaultdict:
import collections
result = collections.defaultdict(dict)
lst = [
{'a': {'q': 1}, 'b': {'r': 2}, 'c': {'s': 3}},
{'a': {'t': 4}, 'b': {'u': 5}, 'c': {'v': 6}},
{'a': {'w': 7}, 'b': {'x': 8}, 'c': {'z': 9}}
]
for dct in lst:
    for key, value in dct.items():
        result[key].update(value)
print(result)
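If you prefer to end up with a plain dict rather than a defaultdict, dict.setdefault gives the same merge. A minimal sketch, where the helper name `merge_nested` is hypothetical:

```python
def merge_nested(dicts):
    """Merge a list of {key: {inner_key: value}} dicts into one dict."""
    merged = {}
    for dct in dicts:
        for key, inner in dct.items():
            # setdefault creates the inner dict the first time a key is seen.
            merged.setdefault(key, {}).update(inner)
    return merged


lst = [
    {'a': {'q': 1}, 'b': {'r': 2}, 'c': {'s': 3}},
    {'a': {'t': 4}, 'b': {'u': 5}, 'c': {'v': 6}},
    {'a': {'w': 7}, 'b': {'x': 8}, 'c': {'z': 9}},
]
result = merge_nested(lst)
print(result)
```

Note that if two input dicts share an inner key, the later one wins, which is also how the defaultdict version behaves.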
I have recently been working on a python application that handles some sort of schedule. I have a dictionary that contains the number of days in a rotation in a schedule, and then each day contains a dictionary with each different part of the day. It looks like this:
schedule = {
'rotation': 6,
'1' : {'B': '8:32', 'C': '9:34', 'D' : '10:36', 'F':'12:11', 'G': '1:13', 'H':'2:15'},
'2' : {'A': '8:32', 'B': '9:34', 'C,' : '10:36', 'E':'12:11', 'F': '1:13', 'G,':'2:15'},
'3' : {'A': '8:32', 'B': '9:34', 'D,' : '10:36', 'E':'12:11', 'F': '1:13', 'H,':'2:15'},
'4' : {'A': '8:32', 'C': '9:34', 'D,' : '10:36', 'E':'12:11', 'G': '1:13', 'H,':'2:15'},
'5' : {'B' : '8:40', 'D' : '11:00', 'F' : '12:55', 'H' : '2:15' },
'6' : {'A' : '8:40', 'C' : '11:00', 'E' : '12:55', 'G' : '2:15' }
}
This all looks like it should work, yet when I print it out, I get a distorted dictionary that looks like it is sorted:
{'1': {'C': '9:34', 'B': '8:32', 'D': '10:36', 'G': '1:13', 'F': '12:11', 'H': '2:15'},
'3': {'A': '8:32', 'D,': '10:36', 'B': '9:34', 'E': '12:11', 'F': '1:13', 'H,': '2:15'},
'2': {'A': '8:32', 'B': '9:34', 'E': '12:11', 'F': '1:13', 'C,': '10:36', 'G,': '2:15'},
'5': {'H': '2:15', 'B': '8:40', 'D': '11:00', 'F': '12:55'},
'4': {'A': '8:32', 'C': '9:34', 'E': '12:11', 'G': '1:13', 'D,': '10:36', 'H,': '2:15'},
'6': {'A': '8:40', 'C': '11:00', 'E': '12:55', 'G': '2:15'},
'rotation': 6}
As you can see, in day 1, it starts with C instead of B when printing, and the 'rotation' is at the end of the dictionary instead of the front. Why does my dictionary print like this?
Before Python 3.7, the order of a dictionary was not guaranteed, because it depended on the hash function. On top of this, Python salts string hashes, meaning that the order could be different on each run (unless you ask for a stable dict, for example collections.OrderedDict).
A Python dictionary was not required to preserve order before Python 3.7. If order is what you want, you could use lists. If you just want to view a dictionary in sorted order, you can use sorted() on its keys or items when printing (dictionaries have no .sort() method).
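As a rough illustration of viewing a dictionary in sorted order without changing it, here is a sketch on a shortened, made-up schedule. Using json.dumps with sort_keys=True is my own suggestion for display purposes and assumes all keys are strings.

```python
import json

schedule = {
    'rotation': 2,
    '2': {'B': '9:34', 'A': '8:32'},
    '1': {'C': '9:34', 'B': '8:32'},
}

# sorted() returns the keys in order without changing the dictionary itself.
for key in sorted(schedule):
    print(key, schedule[key])

# json.dumps can sort the keys at every nesting level for display.
pretty = json.dumps(schedule, sort_keys=True, indent=2)
print(pretty)
```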
You can sort this dictionary, but you have to make an exception for rotation, since its value does not fit the format of the other entries (a dictionary with alphabetical keys):
d = {k: dict(sorted(v.items(), key=lambda x: x[0])) if k != 'rotation' else schedule[k] for k, v in schedule.items()}
print(d)
# {'rotation': 6, '1': {'B': '8:32', 'C': '9:34', 'D': '10:36', 'F': '12:11', 'G': '1:13', 'H': '2:15'}, '2': {'A': '8:32', 'B': '9:34', 'C,': '10:36', 'E': '12:11', 'F': '1:13', 'G,': '2:15'}, '3': {'A': '8:32', 'B': '9:34', 'D,': '10:36', 'E': '12:11', 'F': '1:13', 'H,': '2:15'}, '4': {'A': '8:32', 'C': '9:34', 'D,': '10:36', 'E': '12:11', 'G': '1:13', 'H,': '2:15'}, '5': {'B': '8:40', 'D': '11:00', 'F': '12:55', 'H': '2:15'}, '6': {'A': '8:40', 'C': '11:00', 'E': '12:55', 'G': '2:15'}}
I have a list of dictionaries, it looks something like this:
[{'T': 13472}, {'A': 13472}, {'C': 9866, 'T': 3606}, {'G': 13472}, {'G': 13472}, {'A': 221, 'C': 26, 'T': 12845, 'G': 380}, {'T': 13472}, {'A': 13472}, {'C': 546, 'T': 12926}, {'C': 13472}, {'A': 13472}, {'C': 10674, 'T': 2798}, {'C': 13472}, {'A': 13472}, {'C': 554, 'T': 12918}, {'C': 13472}, {'A': 13472}]
The issue is right now, it's formatted as a string. In other words, when I try to iterate through the items in the list, I get only individual characters. Is there a way to convert it back into a "list of dictionaries" type?
Use ast.literal_eval to safely convert a string to a Python object:
>>> from ast import literal_eval
>>> strs = "[{'T': 13472}, {'A': 13472}, {'C': 9866, 'T': 3606}, {'G': 13472}, {'G': 13472}, {'A': 221, 'C': 26, 'T': 12845, 'G': 380}, {'T': 13472}, {'A': 13472}, {'C': 546, 'T': 12926}, {'C': 13472}, {'A': 13472}, {'C': 10674, 'T': 2798}, {'C': 13472}, {'A': 13472}, {'C': 554, 'T': 12918}, {'C': 13472}, {'A': 13472}]"
>>> literal_eval(strs)
[{'T': 13472}, {'A': 13472}, {'C': 9866, 'T': 3606}, {'G': 13472}, {'G': 13472}, {'A': 221, 'C': 26, 'T': 12845, 'G': 380}, {'T': 13472}, {'A': 13472}, {'C': 546, 'T': 12926}, {'C': 13472}, {'A': 13472}, {'C': 10674, 'T': 2798}, {'C': 13472}, {'A': 13472}, {'C': 554, 'T': 12918}, {'C': 13472}, {'A': 13472}]
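To show what "safely" means here, a short sketch: literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises an error for anything else instead of executing it. The shortened input string is made up for the example.

```python
from ast import literal_eval

text = "[{'T': 13472}, {'C': 9866, 'T': 3606}]"
records = literal_eval(text)
print(type(records), records[1]['C'])

# Anything that is not a plain literal is rejected instead of being executed.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
```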
Do you mean you have something like:
x = "[{'T': 13472}, {'A': 13472}]"
Then you could always simply evaluate it, assuming the source is safe. Have a look at:
http://docs.python.org/2/library/functions.html#eval
http://docs.python.org/2/library/ast.html#ast.literal_eval
I have a multidictionary:
{'a': {'b': {'c': {'d': '2'}}},
'b': {'b': {'c': {'d': '7'}}},
'c': {'b': {'c': {'d': '3'}}},
'f': {'d': {'c': {'d': '1'}}}}
How can I sort it based on the values '2', '3', '7', '1',
so that my output will be:
f.d.c.d.1
a.b.c.d.2
c.b.c.d.3
b.b.c.d.7
You've got a fixed-shape structure, which is pretty simple to sort:
>>> d = {'a': {'b': {'c': {'d': '2'}}}, 'c': {'b': {'c': {'d': '3'}}}, 'b': {'b': {'c': {'d': '7'}}}, 'f': {'d': {'c': {'d': '1'}}}}
>>> sorted(d, key=lambda x: d[x].values()[0].values()[0].values()[0])
['f', 'a', 'c', 'b']
>>> sorted(d.items(), key=lambda x: x[1].values()[0].values()[0].values()[0])
[('f', {'d': {'c': {'d': '1'}}}),
('a', {'b': {'c': {'d': '2'}}}),
('c', {'b': {'c': {'d': '3'}}}),
('b', {'b': {'c': {'d': '7'}}})]
Yes, this is a bit ugly and clumsy, but only because your structure is inherently ugly and clumsy.
In fact, other than the fact that d['f'] has a key 'd' instead of 'b', it's even more straightforward. I suspect that may be a typo, in which case things are even easier:
>>> d = {'a': {'b': {'c': {'d': '2'}}}, 'c': {'b': {'c': {'d': '3'}}}, 'b': {'b': {'c': {'d': '7'}}}, 'f': {'b': {'c': {'d': '1'}}}}
>>> sorted(d.items(), key=lambda x:x[1]['b']['c']['d'])
[('f', {'b': {'c': {'d': '1'}}}),
('a', {'b': {'c': {'d': '2'}}}),
('c', {'b': {'c': {'d': '3'}}}),
('b', {'b': {'c': {'d': '7'}}})]
As others have pointed out, this is almost certainly not the right data structure for whatever it is you're trying to do. But, if it is, this is how to deal with it.
PS, it's confusing to call this a "multidictionary". That term usually means "dictionary with potentially multiple values per key" (a concept which in Python you'd probably implement as a defaultdict with list or set as its default). A single, single-valued dictionary that happens to contain dictionaries is better named a "nested dictionary".
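The `.values()[0]` indexing above relies on Python 2, where values() returns a list. In Python 3 it returns a view that cannot be indexed. A rough Python 3 equivalent, assuming every level holds exactly one key (the helper name `leaf` is hypothetical):

```python
def leaf(node):
    """Follow the single key at each level down to the final value."""
    while isinstance(node, dict):
        node = next(iter(node.values()))
    return node


d = {'a': {'b': {'c': {'d': '2'}}}, 'c': {'b': {'c': {'d': '3'}}},
     'b': {'b': {'c': {'d': '7'}}}, 'f': {'d': {'c': {'d': '1'}}}}

ordered_keys = sorted(d, key=lambda k: leaf(d[k]))
print(ordered_keys)  # ['f', 'a', 'c', 'b']
```

The leaf values here are strings, so they are compared as text; with multi-digit numbers you would probably want `int(leaf(d[k]))` as the key.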
In my opinion this kind of design is very hard to read and maintain. Can you consider replacing the internal dictionaries with string-names?
E.g.:
mydict = {
'a.b.c.d' : 2,
'b.b.c.d' : 7,
'c.b.c.d' : 3,
'f.d.c.d' : 1,
}
This one is much easier to sort and waaaay more readable.
Now, a dictionary is unsortable by nature. Thus, you have to sort something else, e.g. a list representation of it:
my_sorted_dict_as_list = sorted(mydict.items(),
                                key=lambda kv_pair: kv_pair[1])
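Put together with the flat dictionary from above, a sketch of how the output asked for in the question could be produced (the names `pairs` and `lines` are mine):

```python
mydict = {'a.b.c.d': 2, 'b.b.c.d': 7, 'c.b.c.d': 3, 'f.d.c.d': 1}

# Sort the (key, value) pairs by value.
pairs = sorted(mydict.items(), key=lambda kv_pair: kv_pair[1])

# Format each pair the way the question asks for it.
lines = ['%s.%s' % (key, value) for key, value in pairs]
print('\n'.join(lines))
```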
You can do it recursively:
d = {'a': {'b': {'c': {'d': '2'}}}, 'c': {'b': {'c': {'d': '3'}}}, 'b': {'b': {'c': {'d': '7'}}}, 'f': {'d': {'c': {'d': '1'}}}}
def nested_to_string(item):
    if hasattr(item, 'items'):
        out = ''
        for key in item.keys():
            out += '%s.' % key + nested_to_string(item[key])
        return out
    else:
        return item + '\n'
print(nested_to_string(d))
or
def nested_to_string(item):
    def rec_fun(item, temp, res):
        if hasattr(item, 'items'):
            for key in item.keys():
                # extend the prefix for this branch only
                rec_fun(item[key], temp + '%s.' % key, res)
        else:
            res.append(temp + item)
    res = []
    rec_fun(item, '', res)
    return res
Why do you want to do this?
Your data structure is basically a multi-level tree, so a good way to do what you want is a depth-first traversal of it, which can be done recursively, followed by a bit of massaging of the intermediate results to sort them and format them into the desired output.
multidict = {'a': {'b': {'c': {'d': '2'}}},
             'b': {'b': {'c': {'d': '7'}}},
             'c': {'b': {'c': {'d': '3'}}},
             'f': {'d': {'c': {'d': '1'}}}}
def nested_dict_to_string(nested_dict):
    chains = []
    for key,value in nested_dict.items():
        chains.append([key] + visit(value))
    chains = ['.'.join(chain) for chain in sorted(chains, key=lambda chain: chain[-1])]
    return '\n'.join(chains)
def visit(node):
    result = []
    try:
        for key,value in node.items():
            result += [key] + visit(value)
    except AttributeError:
        result = [node]
    return result
print(nested_dict_to_string(multidict))
Output:
f.d.c.d.1
a.b.c.d.2
c.b.c.d.3
b.b.c.d.7