I'm using the following code to unzip a dictionary and count the values at each site:
result = [Counter(site) for site in zip(*myDict.values())]
The output looks something like: Counter({'A': 74}), Counter({'G': 72, 'C': 2})
There are five possible values: A, T, G, C, and N
I only want the counter to spit out a value if one of the five values is less than 74. So for the above example, only the second would be outputted. How do you use an if statement within the counter? Furthermore, how can I label each site, so that above it could just say:
Site 2: 'G': 72, 'C': 2
myDict looks like this:
{'abc123': 'ATGGAGGACGACT', 'def332': 'ATGCATTGACGC'}
Except there are 74 entries. Each value is the same length. Basically, I don't know how to use a counter that can give me an output for when each site of each value doesn't match up. So for the sequences above, the 4th site does not match. I want the counter to output the following:
site 4: 'G': 1, 'C': 1
You can use enumerate to index the sites, and the most_common method on Counter to check whether the highest count at a site is below 74 (below 2 in this two-string example). Note that enumerate counts from 0 here, so the first site is Site 0:
from collections import Counter
myDict = {'a':'ATGTTCN','b':'ATTTCCG'}
result = [(i,Counter(site)) for i,site in enumerate(zip(*myDict.values()))]
result = [x for x in result if x[1].most_common()[0][1] < 2]
for site,count in result:
    print('Site {}: {}'.format(site,str(count)[9:-2]))
Output:
Site 2: 'T': 1, 'G': 1
Site 4: 'C': 1, 'T': 1
Site 6: 'G': 1, 'N': 1
Use a dict comprehension that stores a site only if max(Counter(x).values()) < 74 (< 2 in the two-sequence example below), and use enumerate() to get the site number:
>>> mydict={'abc123': 'ATGGAGGACGACT', 'def332': 'ATGCATTGACGC'}
>>> result={'Site {}'.format(i+1):Counter(x) for i,x in enumerate(zip(*mydict.values())) if max(Counter(x).values())<2}
>>> result
{'Site 7': Counter({'T': 1, 'G': 1}), 'Site 6': Counter({'T': 1, 'G': 1}), 'Site 4': Counter({'C': 1, 'G': 1}), 'Site 9': Counter({'A': 1, 'C': 1}), 'Site 8': Counter({'A': 1, 'G': 1}), 'Site 11': Counter({'A': 1, 'G': 1}), 'Site 10': Counter({'C': 1, 'G': 1})}
or convert Counter to dict:
>>> {'Site {}'.format(i+1):dict(Counter(x)) for i,x in enumerate(zip(*mydict.values())) if max(Counter(x).values())<2}
{'Site 7': {'T': 1, 'G': 1}, 'Site 6': {'T': 1, 'G': 1}, 'Site 4': {'C': 1, 'G': 1}, 'Site 9': {'A': 1, 'C': 1}, 'Site 8': {'A': 1, 'G': 1}, 'Site 11': {'A': 1, 'G': 1}, 'Site 10': {'C': 1, 'G': 1}}
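The approaches above can be wrapped in a small reusable function. This is only a sketch: the name variable_sites, the 1-based numbering, and the first sequence being trimmed to 12 characters (so both values have the same length) are assumptions made for illustration.

```python
from collections import Counter

def variable_sites(seqs):
    """Return {site_number: Counter} for sites where the sequences disagree.

    Site numbers are 1-based (an assumption, matching 'site 4' in the question).
    """
    columns = zip(*seqs.values())  # one tuple of characters per site
    result = {}
    for number, column in enumerate(columns, start=1):
        counts = Counter(column)
        # More than one distinct character means no value reached len(seqs).
        if len(counts) > 1:
            result[number] = counts
    return result

my_dict = {'abc123': 'ATGGAGGACGAC', 'def332': 'ATGCATTGACGC'}
sites = variable_sites(my_dict)
for number, counts in sites.items():
    pairs = ', '.join('{!r}: {}'.format(k, v) for k, v in counts.most_common())
    print('Site {}: {}'.format(number, pairs))
```

Checking len(counts) > 1 is equivalent to checking that the largest count is below the number of sequences, so no hardcoded 74 is needed.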
I have two lists of dictionaries:
a = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6}]
b = [{'a': 1, 'b': 2}, {'g': 3, 'h': 4}, {'f': 6, 'e': 5}]
Output:
a - b = [{'c': 3, 'd': 4}] ("-" symbol is only for representation, showing difference. Not mathematical minus.)
b - a = [{'g': 3, 'h': 4}]
In every list, the order of the keys within a dictionary may differ. I could try the following and check each key:
for i in range(len(a)):
    current_val = a[i]
    for x, y in current_val.items():
        # search for key x in list b and compare its value with y
        pass
But this approach doesn't feel right. Is there a simpler way to do this, or a utility library that can do it, similar to fnc or pydash?
You can use lambda:
g = lambda a,b : [x for x in a if x not in b]
g(a,b) # a-b
[{'c': 3, 'd': 4}]
g(b,a) # b-a
[{'g': 3, 'h': 4}]
Just test whether each element is in the other list:
a = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6}]
b = [{'a': 1, 'b': 2}, {'g': 3, 'h': 4}, {'f': 6, 'e': 5}]
def find_diff(array_a, array_b):
    diff = []
    for e in array_a:
        if e not in array_b:
            diff.append(e)
    return diff
print(find_diff(a, b))
print(find_diff(b, a))
The same with a list comprehension:
def find_diff(array_a, array_b):
    return [e for e in array_a if e not in array_b]
Here is code for subtracting one list of dictionaries from another:
a = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 6, 'f': 6}]
b = [{'a': 1, 'b': 2}, {'g': 3, 'h': 4}, {'f': 6, 'e': 6}]
a_b = []
b_a = []
for element in a:
    if element not in b:
        a_b.append( element )
for element in b:
    if element not in a:
        b_a.append( element )
print("a-b =",a_b)
print("b-a =",b_a)
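The `not in` test used in these answers compares each dictionary against every element of the other list, which is fine for small inputs. For larger lists, one possible sketch (assuming all values are hashable; freeze and find_diff_hashed are made-up names for illustration) converts each dictionary to a frozenset of its items so that membership can be checked against a set:

```python
def freeze(d):
    # A frozenset of items ignores key order, so {'e': 5, 'f': 6}
    # and {'f': 6, 'e': 5} produce the same value.
    return frozenset(d.items())

def find_diff_hashed(list_a, list_b):
    seen_in_b = {freeze(d) for d in list_b}
    return [d for d in list_a if freeze(d) not in seen_in_b]

a = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6}]
b = [{'a': 1, 'b': 2}, {'g': 3, 'h': 4}, {'f': 6, 'e': 5}]

print(find_diff_hashed(a, b))  # [{'c': 3, 'd': 4}]
print(find_diff_hashed(b, a))  # [{'g': 3, 'h': 4}]
```

Plain dictionary equality already ignores key order, so the simpler `not in` versions give the same result; the set only changes how fast the lookup is.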
I am trying to go from this dataframe:
   run property  low  high  abs1perc0  in1out0  weight
0  bob        a    5     9          1        1       2
1  bob        s    5     9          1        1       2
2  bob        d    1    10          0        1       2
3  tom        a    1     2          1        1       2
4  tom        s    2     3          1        1       2
5  tom        d    8     9          0        1       2
to dictionaries that are named after a concatenation of the individual 'run' names and the column names (except property). Property has to become the key and the data has to become the values i.e:
boblow = {'a':5, 's':5, 'd':1}
bobhigh = {'a':9, 's':9, 'd':10}
bobabs1perc0 = {'a':1, 's':1, 'd':0}
...
tomlow = {'a':1, 's':2, 'd':8}
...
This would have to happen for huge DataFrames, and I can't wrap my head around how to do it other than by hand. I started making a list of concatenated names from the individual values of the 'run' column, but I'm certain someone here has a much faster and smarter way of doing it.
Thanks a Bunch!!
I recommend saving the output in a dict of dicts and keeping the (run, column) tuple as the key rather than merging it into a single string. After reshaping your df, to_dict still works:
d=df.set_index(['run','property']).stack().unstack(1).to_dict('index')
{('bob', 'low'): {'a': 5, 'd': 1, 's': 5}, ('bob', 'high'): {'a': 9, 'd': 10, 's': 9}, ('bob', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('bob', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('bob', 'weight'): {'a': 2, 'd': 2, 's': 2}, ('tom', 'low'): {'a': 1, 'd': 8, 's': 2}, ('tom', 'high'): {'a': 2, 'd': 9, 's': 3}, ('tom', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('tom', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('tom', 'weight'): {'a': 2, 'd': 2, 's': 2}}
d[('bob','low')]
{'a': 5, 'd': 1, 's': 5}
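If concatenated names such as boblow are really needed, the tuple keys can be joined afterwards. The following is a minimal sketch that works on a hand-typed slice of the nested dictionary shown above; the name flat is an assumption for illustration.

```python
# A small slice of the nested dict from the answer, typed in by hand.
d = {
    ('bob', 'low'): {'a': 5, 'd': 1, 's': 5},
    ('bob', 'high'): {'a': 9, 'd': 10, 's': 9},
    ('tom', 'low'): {'a': 1, 'd': 8, 's': 2},
}

# Join each (run, column) tuple into a single string key.
flat = {run + column: values for (run, column), values in d.items()}

print(flat['boblow'])  # {'a': 5, 'd': 1, 's': 5}
```

Keeping everything in one dictionary avoids creating variables with generated names, which is hard to maintain for large inputs.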
My question is somewhat similar to this question: https://codereview.stackexchange.com/questions/175079/removing-key-value-pairs-in-list-of-dicts. Essentially, I have a list of dictionaries, and I want to remove duplicates from the list based on the unique combination of two (or more) keys within each dictionary.
Suppose I have the following list of dictionaries:
some_list_of_dicts = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
{'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
{'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
And let's suppose the combination of a, b, and c have to be unique; any other values can be whatever they want, but the combination of these three must be unique to this list. I would want to take whichever unique combo of a, b, and c came first, keep that, and discard everything else where that combination is the same.
The new list, after running it through some remove_duplicates function would look like this:
new_list = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
I've only managed to come up with this:
def remove_duplicates(old_list):
    uniqueness_check_list = []
    new_list = []
    for item in old_list:
        # The unique combination is 'a', 'b', and 'c'
        uniqueness_check = "{}{}{}".format(
            item["a"], item["b"], item["c"]
        )
        if uniqueness_check not in uniqueness_check_list:
            new_list.append(item)
            uniqueness_check_list.append(uniqueness_check)
    return new_list
But this doesn't feel very Pythonic. It also has the problem that I've hardcoded in the function which keys have to be unique; it would be better if I could specify that as an argument to the function itself, but again, not sure what's the most elegant way to do this.
You can use a dict comprehension to construct a dict from the list of dicts in the reversed order so that the values of the first of any unique combinations would take precedence. Use operator.itemgetter to get the unique keys as a tuple. Reverse again in the end for the original order:
from operator import itemgetter
list({itemgetter('a', 'b', 'c')(d): d for d in reversed(some_list_of_dicts)}.values())[::-1]
This returns:
[{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}]
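A variation on the same idea avoids the double reversal by using dict.setdefault, which only stores a key the first time it is seen. This is a sketch under two assumptions: dictionaries preserve insertion order (Python 3.7+), and at least one key name is passed. The function name remove_duplicates_by is made up for illustration.

```python
from operator import itemgetter

def remove_duplicates_by(dicts, *keys):
    key_of = itemgetter(*keys)  # returns a tuple when several keys are given
    first_seen = {}
    for d in dicts:
        # setdefault keeps the existing entry, so the first occurrence wins.
        first_seen.setdefault(key_of(d), d)
    return list(first_seen.values())

some_list_of_dicts = [
    {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
    {'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
    {'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
    {'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
    {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
    {'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
    {'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3},
]

for row in remove_duplicates_by(some_list_of_dicts, 'a', 'b', 'c'):
    print(row)
```

Passing the key names as arguments also addresses the hardcoding concern raised in the question.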
With the help of a function to keep track of duplicates, you can use some list comprehension:
def remove_duplicates(old_list, cols=('a', 'b', 'c')):
    duplicates = set()
    def is_duplicate(item):
        duplicate = item in duplicates
        duplicates.add(item)
        return duplicate
    return [x for x in old_list if not is_duplicate(tuple([x[col] for col in cols]))]
To use:
>>> remove_duplicates(some_list_of_dicts)
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 2, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 3, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 4, 'b': 1, 'e': 3, 'd': 2}
]
You can also provide different columns to key on:
>>> remove_duplicates(some_list_of_dicts, cols=('a', 'd'))
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 5},
{'a': 1, 'c': 1, 'b': 1, 'e': 8, 'd': 7},
{'a': 1, 'c': 1, 'b': 1, 'e': 6, 'd': 9}
]
I have a Python dictionary like this small example:
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
I only need the value of every item, which is a sequence of the letters A, T, C, or G. Each sequence has length 7, so there are 7 positions. I want the frequency of the four letters at every position. For every position I will make a dictionary in which the letters are the keys and the frequency of each letter is the value. At the end I want one dictionary for all seven positions, where the per-position dictionaries are the values.
Here is the expected output for the small example:
final = {'one': {'T': 2, 'A': 0, 'C': 0, 'G': 1}, 'two': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'three': {'T': 1, 'A': 0, 'C': 2, 'G': 0}, 'four': {'T': 0, 'A': 0, 'C': 3, 'G': 0}, 'five': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'six': {'T': 1, 'A': 2, 'C': 0, 'G': 0}, 'seven': {'T': 1, 'A': 1, 'C': 0, 'G': 1}}
To get this output I wrote the following Python code, but it does not return exactly what I want. Do you know how to fix it?
one=[]
two=[]
three=[]
four=[]
five=[]
six=[]
seven=[]
mylist = dict.values()
for threeq in mylist:
    one.append(threeq[0])
    two.append(threeq[1])
    three.append(threeq[2])
    four.append(threeq[3])
    five.append(threeq[4])
    six.append(threeq[5])
    seven.append(threeq[6])
from collections import Counter
one=Counter(one)
two=Counter(two)
three=Counter(three)
four=Counter(four)
five=Counter(five)
six=Counter(six)
seven=Counter(seven)
Here is a way to do it, using Counter:
from collections import Counter
data = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
out = {i:Counter(col) for i, col in enumerate(zip(*(data.values()))) }
# we can add the missing keys whose count is 0:
for count in out.values():
    count.update(dict.fromkeys('ATGC', 0))
print(out)
# {0: Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}), 1: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}),
# 2: Counter({'C': 2, 'T': 1, 'A': 0, 'G': 0}), 3: Counter({'C': 3, 'A': 0, 'T': 0, 'G': 0}),
# 4: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}), 5: Counter({'A': 2, 'T': 1, 'G': 0, 'C': 0}),
# 6: Counter({'G': 1, 'T': 1, 'A': 1, 'C': 0})}
I left the original indices as integers, it's probably easier to use them than strings like 'one', 'two'... But if you really want to:
numbers_as_strings = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
out = {numbers_as_strings[key]:value for key, value in out.items()}
print(out)
# {'one': Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}),
# 'two': Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}) ....
Try this (here dict is the dictionary from the question, not the built-in type):
values = list(dict.values())
r = {}
for i in range(7):
    r[i+1] = {'T': 0, 'A': 0, 'C': 0, 'G': 0}
    for v in values:
        r[i+1][v[i]] += 1
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
options=['T','A','C','G']
innerdicts=['one','two','three','four','five','six','seven']
def getposcount(idx,letter,dict):
    count=0
    for v in dict.values():
        if v[idx]==letter:
            count+=1
    return count
d = {x:{y:getposcount(innerdicts.index(x),y,dict) for y in options} for x in innerdicts}
print(d)
Output
{'six': {'T': 1, 'A': 2, 'G': 0, 'C': 0}, 'one': {'T': 2, 'A': 0, 'G': 1, 'C': 0}, 'two': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'five': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'three': {'T': 1, 'A': 0, 'G': 0, 'C': 2}, 'seven': {'T': 1, 'A': 1, 'G': 1, 'C': 0}, 'four': {'T': 0, 'A': 0, 'G': 0, 'C': 3}}
If you are willing to accept the integers as keys, you can do:
from collections import Counter
def counts_with_zero(count, keys='TACG'):
    return {key: count.get(key, 0) for key in keys}
d = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT',
     'chr12:104659651-104659658': 'GACCAAA'}
values = list(d.values())
result = {i: counts_with_zero(Counter(column)) for i, column in enumerate(zip(*values), 1)}
print(result)
Output
{1: {'A': 0, 'C': 0, 'G': 1, 'T': 2},
2: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
3: {'A': 0, 'C': 2, 'G': 0, 'T': 1},
4: {'A': 0, 'C': 3, 'G': 0, 'T': 0},
5: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
6: {'A': 2, 'C': 0, 'G': 0, 'T': 1},
7: {'A': 1, 'C': 0, 'G': 1, 'T': 1}}
I have two lists of dictionaries which I am trying to get the product of:
from itertools import product
list1 = [{'A': 1, 'B': 1}, {'A': 2, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]
list2 = [{'C': 1, 'D': 1}, {'C': 1, 'D': 2}]
for p in product(list1, list2):
    print(p)
and this gives me the output:
({'A': 1, 'B': 1}, {'C': 1, 'D': 1})
({'A': 1, 'B': 1}, {'C': 1, 'D': 2})
({'A': 2, 'B': 2}, {'C': 1, 'D': 1})
({'A': 2, 'B': 2}, {'C': 1, 'D': 2})
({'A': 2, 'B': 1}, {'C': 1, 'D': 1})
({'A': 2, 'B': 1}, {'C': 1, 'D': 2})
({'A': 1, 'B': 2}, {'C': 1, 'D': 1})
({'A': 1, 'B': 2}, {'C': 1, 'D': 2})
How would I flatten these so that each output is a single dict rather than a tuple of dicts?
{'A': 1, 'B': 1, 'C': 1, 'D': 1}
{'A': 1, 'B': 1, 'C': 1, 'D': 2}
{'A': 2, 'B': 2, 'C': 1, 'D': 1}
{'A': 2, 'B': 2, 'C': 1, 'D': 2}
{'A': 2, 'B': 1, 'C': 1, 'D': 1}
{'A': 2, 'B': 1, 'C': 1, 'D': 2}
{'A': 1, 'B': 2, 'C': 1, 'D': 1}
{'A': 1, 'B': 2, 'C': 1, 'D': 2}
Looks like you want to merge the dictionaries:
for p1, p2 in product(list1, list2):
    merged = {**p1, **p2}
    print(merged)
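The {**p1, **p2} syntax requires Python 3.5 or later. In earlier versions, copy one dictionary and update the copy, for example merged = dict(p1) followed by merged.update(p2). Calling p1.update(p2) directly would modify the dictionaries in list1 in place, and update() returns None rather than the merged dictionary.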
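For versions without the {**p1, **p2} syntax, a small helper can do the merge without touching the inputs. This is a minimal sketch: merge is a hypothetical helper name, and the lists are shortened versions of the ones in the question.

```python
from itertools import product

list1 = [{'A': 1, 'B': 1}, {'A': 2, 'B': 2}]
list2 = [{'C': 1, 'D': 1}, {'C': 1, 'D': 2}]

def merge(*dicts):
    # Build a fresh dict so none of the input dictionaries is modified.
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged

flattened = [merge(p1, p2) for p1, p2 in product(list1, list2)]
for row in flattened:
    print(row)
```

Because merge accepts any number of dictionaries, the same helper also works for a product of three or more lists, for example merge(*p) for each tuple p.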