I have this txt file:
A:{A:0, B:6, C:4, D:3, E:0, F:0, G:0}
B:{A:6, B:0, C:2, D:0, E:4, F:0, G:0}
C:{A:4, B:2, C:0, D:2, E:0, F:8, G:0}
D:{A:3, B:0, C:2, D:0, E:3, F:0, G:0}
E:{A:0, B:4, C:0, D:3, E:0, F:7, G:6}
F:{A:0, B:0, C:8, D:0, E:7, F:0, G:6}
G:{A:0, B:0, C:0, D:0, E:6, F:6, G:0}
titles = []
with open("graph.txt", "r") as file:
    for line in file:
        column = line.split(":")
        title = column[0]
        titles.append(title)
I need to make a dictionary for each title I got (A, B, C, ...).
Format each line properly and you can use ast.literal_eval. I used regex to find each key and replace it with the same key surrounded by quotes.
import ast
import re
KEY_PATTERN = re.compile(r'(\w+?):')
dics = []
with open('graph.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            # Quote every key so the line becomes a valid Python dict literal
            dic_str = "{" + KEY_PATTERN.sub(r'"\g<1>":', line) + "}"
            dics.append(ast.literal_eval(dic_str))
print(dics)
Can be shorter (though harder to read):
import ast
import re
KEY_PATTERN = re.compile(r'(\w+?):')
with open('graph.txt') as f:
    dics = [ast.literal_eval("{" + KEY_PATTERN.sub(r'"\g<1>":', line) + "}") for line in f if line.strip()]
print(dics)
Output:
[{'A': {'A': 0, 'B': 6, 'C': 4, 'D': 3, 'E': 0, 'F': 0, 'G': 0}}, {'B': {'A': 6, 'B': 0, 'C': 2, 'D': 0, 'E': 4, 'F': 0, 'G': 0}}, {'C': {'A': 4, 'B': 2, 'C': 0, 'D': 2, 'E': 0, 'F': 8, 'G': 0}}, {'D': {'A': 3, 'B': 0, 'C': 2, 'D': 0, 'E': 3, 'F': 0, 'G': 0}}, {'E': {'A': 0, 'B': 4, 'C': 0, 'D': 3, 'E': 0, 'F': 7, 'G': 6}}, {'F': {'A': 0, 'B': 0, 'C': 8, 'D': 0, 'E': 7, 'F': 0, 'G': 6}}, {'G': {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 6, 'F': 6, 'G': 0}}]
If you want the result to be a single dict, then change:
dics = []
# and
dics.append(ast.literal_eval(dic_str))
to
dics = {}
# and
dics.update(ast.literal_eval(dic_str))
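Put together, the single-dict version might look like the sketch below. It wraps the same regex and ast.literal_eval approach in a function (parse_graph is a hypothetical name) that accepts any iterable of lines, so it can be tried on a list of strings instead of graph.txt:

```python
import ast
import re

# A key is one or more word characters followed by a colon
KEY_PATTERN = re.compile(r'(\w+?):')


def parse_graph(lines):
    """Parse lines like 'A:{A:0, B:6}' into one dict of dicts (hypothetical helper)."""
    graph = {}
    for line in lines:
        line = line.strip()
        if line:
            # Quote every key so the line becomes a valid Python dict literal
            dic_str = "{" + KEY_PATTERN.sub(r'"\g<1>":', line) + "}"
            graph.update(ast.literal_eval(dic_str))
    return graph


sample = [
    "A:{A:0, B:6, C:4, D:3, E:0, F:0, G:0}",
    "B:{A:6, B:0, C:2, D:0, E:4, F:0, G:0}",
]
graph = parse_graph(sample)
print(graph['A']['B'])  # 6
```

With the real file, pass the open file object instead: with open('graph.txt') as f: graph = parse_graph(f).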
This can be achieved with the following sample:
import re
titles = []
with open("graph.txt", "r") as file:
    for line in file:
        if ':' in line:
            title = re.match(r"^(.*?):", line).groups()[0]
            # raw string, so "\{" is a regex escape rather than an invalid string escape
            dict_str = re.match(r"^.*?\{(.*?)\}", line).groups()[0]
            dictionary = {key: value for (key, value) in (item.strip().split(':') for item in dict_str.split(','))}
            titles.append({title: dictionary})
for item in titles:
    print(item)
This will produce output like:
{'A': {'A': '0', 'C': '4', 'B': '6', 'E': '0', 'D': '3', 'G': '0', 'F': '0'}}
{'B': {'A': '6', 'C': '2', 'B': '0', 'E': '4', 'D': '0', 'G': '0', 'F': '0'}}
{'C': {'A': '4', 'C': '0', 'B': '2', 'E': '0', 'D': '2', 'G': '0', 'F': '8'}}
{'D': {'A': '3', 'C': '2', 'B': '0', 'E': '3', 'D': '0', 'G': '0', 'F': '0'}}
{'E': {'A': '0', 'C': '0', 'B': '4', 'E': '0', 'D': '3', 'G': '6', 'F': '7'}}
{'F': {'A': '0', 'C': '8', 'B': '0', 'E': '7', 'D': '0', 'G': '6', 'F': '0'}}
{'G': {'A': '0', 'C': '0', 'B': '0', 'E': '6', 'D': '0', 'G': '0', 'F': '6'}}
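Note that the values above are strings such as '0'. If numbers are needed, a sketch like the one below converts them with int(). It swaps the regex for str.partition, which splits on the first ':' only; parse_line is a hypothetical helper name:

```python
def parse_line(line):
    """Split 'A:{A:0, B:6}' into ('A', {'A': 0, 'B': 6}) with int values."""
    # partition splits on the first ':' only, separating the title from the braces
    title, _, body = line.strip().partition(':')
    inner = {}
    for item in body.strip('{}').split(','):
        key, value = item.split(':')
        inner[key.strip()] = int(value)  # int() tolerates surrounding whitespace
    return title, inner


lines = ["A:{A:0, B:6, C:4, D:3, E:0, F:0, G:0}",
         "G:{A:0, B:0, C:0, D:0, E:6, F:6, G:0}"]
graph = dict(parse_line(line) for line in lines if ':' in line)
print(graph['G'])  # {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 6, 'F': 6, 'G': 0}
```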
Related
import json
container = {'15/09/2021': {'a': '5', 'b': '7', 'c': '9', 'd': 'missing', 'e': '18'}, '16/09/2021': {'a': '6', 'b': '7', 'c': '9', 'd': '10', 'e': '12'}, '17/09/2021': {'a': '7', 'b': '8', 'c': '10', 'd': '11', 'e': 'missing'}, '18/09/2021': {'a': '9', 'b': '12', 'c': '15', 'd': 'missing', 'e': 'missing'}}
with open('output.json', 'w') as outfile:
    json.dump(container, outfile, indent=4)
The result I'm getting:
This is the desired result:
I think both pictures show the same JSON; the only difference is the indentation.
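A quick way to confirm that two JSON outputs differ only in whitespace is to parse both and compare the resulting objects. The sketch below uses a shortened copy of container; if a particular layout is wanted, the indent argument of json.dump / json.dumps controls it (for example indent=2, or the default None for a single line):

```python
import json

container = {'15/09/2021': {'a': '5', 'd': 'missing'},
             '16/09/2021': {'a': '6', 'd': '10'}}

indented = json.dumps(container, indent=4)
compact = json.dumps(container)

print(indented == compact)                          # False: whitespace differs
print(json.loads(indented) == json.loads(compact))  # True: same data
```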
I'm trying to calculate the difference between two dictionaries to return a specific value.
I've entered different values which should return different results, but the result remains unchanged.
diets = {"normal" : {'p': '32.50', 'c': '60', 'f': '40.86'},
"oncology" : {'p': '35', 'c': '52.50', 'f': '37.63'},
"cardiology" : {'p': '32.50', 'c': '30', 'f': '26.88'},
"diabetes" : {'p': '20', 'c': '27.50', 'f': '27.95'},
"kidney" : {'p': '15', 'c': '55', 'f': '23.65'}}
amounts = {'p': p, 'c': c, 'f': f}
value = { k : diets[k] for k in set(diets) - set(amounts) }
calculate_error = min(value)
print(calculate_error)
When I input 32, 60 and 40, the returned result should be normal, but oncology is returned instead.
You should look at the values you are creating when you do this:
set(diets)
This is just a set of the keys:
{'cardiology', 'diabetes', 'kidney', 'normal', 'oncology'}
When you subtract the other set of keys, you just get the original set back, because the two sets have no keys in common.
You need to actually step through the items and do the subtraction to get the differences. Then you can sum the differences for each diet and pick the diet with the smallest sum.
One way would be:
diets = {"normal" : {'p': '32.50', 'c': '60', 'f': '40.86'},
"oncology" : {'p': '35', 'c': '52.50', 'f': '37.63'},
"cardiology" : {'p': '32.50', 'c': '30', 'f': '26.88'},
"diabetes" : {'p': '20', 'c': '27.50', 'f': '27.95'},
"kidney" : {'p': '15', 'c': '55', 'f': '23.65'}}
amounts = {'p': 32., 'c': 60., 'f': 40.}
mins = [(diet, sum([abs(amounts[k] - float(d[k])) for k in amounts])) for diet, d in diets.items()]
the_min = min(mins, key = lambda x: x[1])
mins will be:
[('normal', 1.3599999999999994),
('oncology', 12.869999999999997),
('cardiology', 43.620000000000005),
('diabetes', 56.55),
('kidney', 38.35)]
the_min will be:
('normal', 1.3599999999999994)
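If only the name of the closest diet is needed, min can also take the key function directly, without building the intermediate list. This is a sketch of the same sum-of-absolute-differences idea; total_difference is a hypothetical helper name:

```python
diets = {"normal": {'p': '32.50', 'c': '60', 'f': '40.86'},
         "oncology": {'p': '35', 'c': '52.50', 'f': '37.63'},
         "cardiology": {'p': '32.50', 'c': '30', 'f': '26.88'},
         "diabetes": {'p': '20', 'c': '27.50', 'f': '27.95'},
         "kidney": {'p': '15', 'c': '55', 'f': '23.65'}}
amounts = {'p': 32.0, 'c': 60.0, 'f': 40.0}


def total_difference(name):
    """Sum of absolute differences between amounts and one diet's (string) values."""
    return sum(abs(amounts[k] - float(diets[name][k])) for k in amounts)


# Iterating a dict yields its keys, so min() returns the diet name
best = min(diets, key=total_difference)
print(best)  # normal
```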
It looks like you are confused about what value contains:
>>> diets = {"normal" : {'p':'32.50', 'c':'60', 'f':'40.86'},
... "oncology" : {'p':'35', 'c':'52.50', 'f':'37.63'},
... "cardiology" : {'p':'32.50', 'c':'30', 'f':'26.88'},
... "diabetes" : {'p':'20', 'c':'27.50', 'f':'27.95'},
... "kidney" : {'p':'15', 'c':'55', 'f':'23.65'}}
>>> set(diets)
{'kidney', 'cardiology', 'oncology', 'normal', 'diabetes'}
>>> amounts = {'p':32, 'c':60, 'f':40}
>>> set(amounts)
{'c', 'f', 'p'}
>>> set(diets) - set(amounts)
{'cardiology', 'diabetes', 'kidney', 'oncology', 'normal'}
>>> value = { k : diets[k] for k in set(diets) - set(amounts) }
>>> value
{'cardiology': {'p': '32.50', 'c': '30', 'f': '26.88'},
'diabetes': {'p': '20', 'c': '27.50', 'f': '27.95'},
'kidney': {'p': '15', 'c': '55', 'f': '23.65'},
'oncology': {'p': '35', 'c': '52.50', 'f': '37.63'},
'normal': {'p': '32.50', 'c': '60', 'f': '40.86'}}
>>> min(value)
'cardiology'
So I would expect you to get 'cardiology', i.e. the alphabetically smallest key in diets.keys().
Also note that the values in diets are strings, e.g. '32.50'; you will need to convert them to numbers before doing any calculations.
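One way to do that conversion once, up front, is a nested dict comprehension. This is a sketch using two of the diets from the question:

```python
diets = {"normal": {'p': '32.50', 'c': '60', 'f': '40.86'},
         "kidney": {'p': '15', 'c': '55', 'f': '23.65'}}

# Convert every inner string value to float, keeping the same nesting
numeric_diets = {name: {k: float(v) for k, v in values.items()}
                 for name, values in diets.items()}
print(numeric_diets['normal'])  # {'p': 32.5, 'c': 60.0, 'f': 40.86}
```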
I have a Python dictionary like this small example:
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
I only need the value of each item, which is a sequence of the letters A, T, C and G. Every sequence is 7 letters long, so there are 7 positions. I want the frequency of each of the 4 letters at every position. For each position I will make a dictionary in which the letters are the keys and their frequencies are the values. At the end I want one dictionary for all seven positions, whose values are these per-position dictionaries.
Here is the expected output for the small example:
final = {'one': {'T': 2, 'A': 0, 'C': 0, 'G': 1}, 'two': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'three': {'T': 1, 'A': 0, 'C': 2, 'G': 0}, 'four': {'T': 0, 'A': 0, 'C': 3, 'G': 0}, 'five': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'six': {'T': 1, 'A': 2, 'C': 0, 'G': 0}, 'seven': {'T': 1, 'A': 1, 'C': 0, 'G': 1}}
To get this output I wrote the following Python code, but it does not return exactly what I want. Do you know how to fix it?
one=[]
two=[]
three=[]
four=[]
five=[]
six=[]
seven=[]
mylist = dict.values()
for threeq in mylist:
    one.append(threeq[0])
    two.append(threeq[1])
    three.append(threeq[2])
    four.append(threeq[3])
    five.append(threeq[4])
    six.append(threeq[5])
    seven.append(threeq[6])
from collections import Counter
one=Counter(one)
two=Counter(two)
three=Counter(three)
four=Counter(four)
five=Counter(five)
six=Counter(six)
seven=Counter(seven)
Here is a way to do it, using Counter:
from collections import Counter
data = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
out = {i:Counter(col) for i, col in enumerate(zip(*(data.values()))) }
# we can add the missing keys whose count is 0:
for count in out.values():
    count.update(dict.fromkeys('ATGC', 0))
print(out)
# {0: Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}), 1: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}),
# 2: Counter({'C': 2, 'T': 1, 'A': 0, 'G': 0}), 3: Counter({'C': 3, 'A': 0, 'T': 0, 'G': 0}),
# 4: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}), 5: Counter({'A': 2, 'T': 1, 'G': 0, 'C': 0}),
# 6: Counter({'G': 1, 'T': 1, 'A': 1, 'C': 0})}
I left the original indices as integers; they are probably easier to use than strings like 'one', 'two', ... But if you really want strings:
numbers_as_strings = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
out = {numbers_as_strings[key]:value for key, value in out.items()}
print(out)
# {'one': Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}),
# 'two': Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}) ....
Try this:
values = list(dict.values())
r = {}
for i in range(7):
    r[i+1] = {'T': 0, 'A': 0, 'C': 0, 'G': 0}
    for v in values:
        r[i+1][v[i]] += 1
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
options=['T','A','C','G']
innerdicts=['one','two','three','four','five','six','seven']
def getposcount(idx, letter, dict):
    count = 0
    for v in dict.values():
        if v[idx] == letter:
            count += 1
    return count
d = {x:{y:getposcount(innerdicts.index(x),y,dict) for y in options} for x in innerdicts}
print(d)
Output
{'six': {'T': 1, 'A': 2, 'G': 0, 'C': 0}, 'one': {'T': 2, 'A': 0, 'G': 1, 'C': 0}, 'two': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'five': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'three': {'T': 1, 'A': 0, 'G': 0, 'C': 2}, 'seven': {'T': 1, 'A': 1, 'G': 1, 'C': 0}, 'four': {'T': 0, 'A': 0, 'G': 0, 'C': 3}}
If you are willing to accept the integers as keys, you can do:
from collections import Counter
from pprint import pprint
def counts_with_zero(count, keys='TACG'):
    return {key: count.get(key, 0) for key in keys}
d = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT',
     'chr12:104659651-104659658': 'GACCAAA'}
values = list(d.values())
result = {i: counts_with_zero(Counter(column)) for i, column in enumerate(zip(*values), 1)}
# pprint sorts dict keys and wraps long output, which gives the layout below
pprint(result)
Output
{1: {'A': 0, 'C': 0, 'G': 1, 'T': 2},
2: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
3: {'A': 0, 'C': 2, 'G': 0, 'T': 1},
4: {'A': 0, 'C': 3, 'G': 0, 'T': 0},
5: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
6: {'A': 2, 'C': 0, 'G': 0, 'T': 1},
7: {'A': 1, 'C': 0, 'G': 1, 'T': 1}}
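A sketch that produces exactly the structure asked for in the question, with 'one' to 'seven' as keys and every letter present even when its count is 0. It relies on Counter returning 0 for a missing key; the sample dictionary is renamed data here (an assumption) so it does not shadow the built-in dict:

```python
from collections import Counter

data = {'chr2:173370685-173370692': 'TACCAAG',
        'chr5:118309829-118309836': 'TCTCCTT',
        'chr12:104659651-104659658': 'GACCAAA'}

names = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']

final = {}
# zip(*sequences) yields one tuple per position: the letters found at that position
for name, column in zip(names, zip(*data.values())):
    counts = Counter(column)
    # counts[letter] is 0 for letters that never occur at this position
    final[name] = {letter: counts[letter] for letter in 'TACG'}

print(final['one'])  # {'T': 2, 'A': 0, 'C': 0, 'G': 1}
```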
So I have a DataFrame, and I labeled the columns a - i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is "e". I know how to do this by iterating through each row, but I feel like there is a more efficient way using DataFrame.to_dict(); I just can't figure out how. Maybe DataFrame.groupby could help, but that seems to be used for grouping columns or index IDs.
How can I use pandas (or numpy) to create a Dictionary of Dictionaries efficiently without iterating through each row? I've shown an example of my current method and what the desired output should be below.
#!/usr/bin/python
import numpy as np
import pandas as pd
tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))
#print(DF)
a b c d e g h i
0 AAA 86880690 86914111 22RV1 2 2 H -
1 ABA 86880690 86914111 A549 2 2 L -
2 AAC 86880690 86914111 BFTC-905 3 3 H -
3 AAB 86880690 86914111 BT-20 2 2 H -
4 AAA 86880690 86914111 C32 2 2 H -
from collections import defaultdict
# the built-in zip works here (itertools.izip exists only in Python 2)
D_a_d_e = defaultdict(dict)
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e
#print(D_a_d_e)
#ignore the defaultdict part
defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})
I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it was a little different and it also doesn't have an answer.
There's a to_dict method:
In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}
In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
With that in mind you can do the groupby:
In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA {'C32': '2', '22RV1': '2'}
AAB {'BT-20': '2'}
AAC {'BFTC-905': '3'}
ABA {'A549': '2'}
dtype: object
That said, you may be able to use a plain MultiIndex instead of a dictionary of dictionaries:
In [31]: res = DF.set_index(["a", "d"])["e"]
In [32]: res
Out[32]:
a d
AAA 22RV1 2
ABA A549 2
AAC BFTC-905 3
AAB BT-20 2
AAA C32 2
Name: e, dtype: object
It'll work much the same way:
In [33]: res["AAA"]
Out[33]:
d
22RV1 2
C32 2
Name: e, dtype: object
In [34]: res["AAA"]["22RV1"]
Out[34]: '2'
But it will be more space-efficient, and you stay in pandas.
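If a plain dictionary of dictionaries is wanted rather than a Series of dicts, one option is a dict comprehension over the groupby object. This is a sketch, not taken from the answer above, and it rebuilds only the a, d and e columns of the question's data:

```python
import pandas as pd

DF = pd.DataFrame(
    [['AAA', '22RV1', '2'], ['ABA', 'A549', '2'], ['AAC', 'BFTC-905', '3'],
     ['AAB', 'BT-20', '2'], ['AAA', 'C32', '2']],
    columns=['a', 'd', 'e'],
)

# Each group holds all rows sharing one value of "a"; within a group,
# index by "d" and turn the "e" column into an inner {d: e} dict
nested = {a: group.set_index('d')['e'].to_dict() for a, group in DF.groupby('a')}
print(nested)
```

Grouping by a single column name yields scalar keys, so the outer keys are plain strings such as 'AAA'.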
Something along these lines:
def dictmaker(df):
    """
    Build a {d: e} dict from every row of the group. Takes a DataFrame.
    """
    # zip over all rows rather than only the first one, so groups with
    # several rows (e.g. AAA) keep every entry
    return dict(zip(df.d, df.e))
DF[['a','d','e']].groupby('a').apply(dictmaker)
a
AAA {'22RV1': '2', 'C32': '2'}
AAB {'BT-20': '2'}
AAC {'BFTC-905': '3'}
ABA {'A549': '2'}
dtype: object
What is the best way to sort a nested dictionary by value in Python 2.6? I would like to sort by the length of the inner dictionary first, and then by the largest value in the inner dictionary. For example:
d = {1: {'AA': {'a': 100, 'b': 1, 'c': 45}},
2: {'AA': {'c': 2}},
3: {'BB': {'d': 122, 'a': 4, 't': 22, 'r': 23, 'w': 12}},
4: {'CC': {'y': 12, 'g': 15, 'b': 500}}}
The desired solution is a nested list:
lst = [[3, 'BB', {'d': 122, 'a': 4, 't': 22, 'r': 23, 'w': 12}],
[4, 'CC', {'y': 12, 'g': 15, 'b': 500}],
[1, 'AA', {'a': 100, 'b': 1, 'c': 45}],
[2, 'AA', {'c': 2}]]
With your corrected data-structure:
d = {1: {'AA': {'a': 100, 'b': 1, 'c': 45}},
2: {'AA': {'c': 2}},
3: {'BB': {'d': 122, 'a': 4, 't': 22, 'r': 23, 'w': 12}},
4: {'CC': {'y': 12, 'g': 15, 'b': 500}}}
def sortkey(x):
    num, d1 = x
    key, d2 = d1.items()[0]  # Some may prefer `next(d1.iteritems())`
    return len(d2), max(d2.values())
exactly_how_you_want_it = [([k] + v.keys() + v.values()) for k,v in
                           sorted(d.items(),reverse=True,key=sortkey)]
for item in exactly_how_you_want_it:
    print item
results:
[3, 'BB', {'a': 4, 'r': 23, 'd': 122, 'w': 12, 't': 22}]
[4, 'CC', {'y': 12, 'b': 500, 'g': 15}]
[1, 'AA', {'a': 100, 'c': 45, 'b': 1}]
[2, 'AA', {'c': 2}]
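The code above is Python 2: it indexes d1.items() and concatenates the lists returned by keys() and values(), neither of which works on Python 3 dict views. A minimal Python 3 port, assuming the same data layout with exactly one inner key per outer entry, might look like this:

```python
d = {1: {'AA': {'a': 100, 'b': 1, 'c': 45}},
     2: {'AA': {'c': 2}},
     3: {'BB': {'d': 122, 'a': 4, 't': 22, 'r': 23, 'w': 12}},
     4: {'CC': {'y': 12, 'g': 15, 'b': 500}}}


def sortkey(item):
    num, outer = item
    # Each outer value holds exactly one inner dict; take its only entry
    _, inner = next(iter(outer.items()))
    # Sort by size of the inner dict, then by its largest value
    return len(inner), max(inner.values())


result = [[k, *v.keys(), *v.values()]
          for k, v in sorted(d.items(), key=sortkey, reverse=True)]
for row in result:
    print(row)
```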