My question is somewhat similar to this question: https://codereview.stackexchange.com/questions/175079/removing-key-value-pairs-in-list-of-dicts. Essentially, I have a list of dictionaries, and I want to remove duplicates from the list based on the unique combination of two (or more) keys within each dictionary.
Suppose I have the following list of dictionaries:
some_list_of_dicts = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
{'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
{'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
And let's suppose the combination of a, b, and c have to be unique; any other values can be whatever they want, but the combination of these three must be unique to this list. I would want to take whichever unique combo of a, b, and c came first, keep that, and discard everything else where that combination is the same.
The new list, after running it through some remove_duplicates function would look like this:
new_list = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
I've only managed to come up with this:
def remove_duplicates(old_list):
uniqueness_check_list = []
new_list = []
for item in old_list:
# The unique combination is 'a', 'b', and 'c'
uniqueness_check = "{}{}{}".format(
item["a"], item["b"], item["c"]
)
if uniqueness_check not in uniqueness_check_list:
new_list.append(item)
uniqueness_check_list.append(uniqueness_check)
return new_list
But this doesn't feel very Pythonic. It also has the problem that I've hardcoded in the function which keys have to be unique; it would be better if I could specify that as an argument to the function itself, but again, not sure what's the most elegant way to do this.
You can use a dict comprehension to construct a dict from the list of dicts in the reversed order so that the values of the first of any unique combinations would take precedence. Use operator.itemgetter to get the unique keys as a tuple. Reverse again in the end for the original order:
from operator import itemgetter
list({itemgetter('a', 'b', 'c')(d): d for d in reversed(some_list_of_dicts)}.values())[::-1]
This returns:
[{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}]
With the help of a function to keep track of duplicates, you can use some list comprehension:
def remove_duplicates(old_list, cols=('a', 'b', 'c')):
duplicates = set()
def is_duplicate(item):
duplicate = item in duplicates
duplicates.add(item)
return duplicate
return [x for x in old_list if not is_duplicate(tuple([x[col] for col in cols]))]
To use:
>>> remove_duplicates(some_list_of_dicts)
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 2, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 3, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 4, 'b': 1, 'e': 3, 'd': 2}
]
You can also provide different columns to key on:
>>> remove_duplicates(some_list_of_dicts, cols=('a', 'd'))
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 5},
{'a': 1, 'c': 1, 'b': 1, 'e': 8, 'd': 7},
{'a': 1, 'c': 1, 'b': 1, 'e': 6, 'd': 9}
]
I have a python dictionary like this example:
small example:
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
I only need the value part of every item which is a sequence of letters and the letters are A, T, C or G and also the length of each sequence is 7 so, for every sequence of letters there are 7 positions. I want to get the frequency of the 4 mentioned letters in every position (we have 7 positions). for every position I will make a dictionary in which the letters are key and the frequency of every letter is value. and at the end I want to make a dictionary for all seven positions and the fist dictionary would be the value of the final dictionary.
here is the expected output for the small example:
expected output:
final = {one: {'T': 2, 'A': 1, 'C': 0, 'G': 0}, two: {'T': 0, 'A': 2, 'C': 1, 'G': 0}, three: {'T': 1, 'A': 0, 'C': 2, 'G': 0}, four: {'T': 0, 'A': 0, 'C': 3, 'G': 0}, five: {'T': 0, 'A': 2, 'C': 1, 'G': 0}, six: {'T': 1, 'A': 2, 'C': 0, 'G': 0}, seven: {'T': 1, 'A': 1, 'C': 0, 'G': 1}}
to get this output I wrote a code in python but it does not return what exactly I want. do you know how to fix the following code?
one=[]
two=[]
three=[]
four=[]
five=[]
six=[]
seven=[]
mylist = dict.values()
for threeq in mylist:
one.append(threeq[0])
two.append(threeq[1])
three.append(threeq[2])
four.append(threeq[3])
five.append(threeq[4])
six.append(threeq[5])
seven.append(threeq[6])
from collections import Counter
one=Counter(one)
two=Counter(two)
three=Counter(three)
four=Counter(four)
five=Counter(five)
six=Counter(six)
seven=Counter(seven)
Here is a way to do it, using Counter:
from collections import Counter
data = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
out = {i:Counter(col) for i, col in enumerate(zip(*(data.values()))) }
# we can add the missing keys whose count is 0:
for count in out.values():
count.update(dict.fromkeys('ATGC', 0))
print(out)
# {0: Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}), 1: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}),
# 2: Counter({'C': 2, 'T': 1, 'A': 0, 'G': 0}), 3: Counter({'C': 3, 'A': 0, 'T': 0, 'G': 0}),
# 4: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}), 5: Counter({'A': 2, 'T': 1, 'G': 0, 'C': 0}),
# 6: Counter({'G': 1, 'T': 1, 'A': 1, 'C': 0})}
I left the original indices as integers, it's probably easier to use them than strings like 'one', 'two'... But if you really want to:
numbers_as_strings = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
out = {numbers_as_strings[key]:value for key, value in out.items()}
print(out)
# {'one': Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}),
# 'two': Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}) ....
Try this:
values = list(dict.values())
r = {}
for i in range(7):
r[i+1] = {'T': 0, 'A': 0, 'C': 0, 'G': 0}
for v in values:
r[i+1][v[i]] += 1
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
options=['T','A','C','G']
innerdicts=['one','two','three','four','five','six','seven']
def getposcount(idx,letter,dict):
count=0
for v in dict.values():
if v[idx]==letter:
count+=1
return count
d = {x:{y:getposcount(innerdicts.index(x),y,dict) for y in options} for x in innerdicts}
print(d)
Output
{'six': {'T': 1, 'A': 2, 'G': 0, 'C': 0}, 'one': {'T': 2, 'A': 0, 'G': 1, 'C': 0}, 'two': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'five': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'three': {'T': 1, 'A': 0, 'G': 0, 'C': 2}, 'seven': {'T': 1, 'A': 1, 'G': 1, 'C': 0}, 'four': {'T': 0, 'A': 0, 'G': 0, 'C': 3}}
If you are willing to accept the integers as keys, you can do:
from collections import Counter
def counts_with_zero(count, keys='TACG'):
return {key: count.get(key, 0) for key in keys}
d = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT',
'chr12:104659651-104659658': 'GACCAAA'}
values = list(d.values())
result = {i: counts_with_zero(Counter(column)) for i, column in enumerate(zip(*values), 1)}
print(result)
Output
{1: {'A': 0, 'C': 0, 'G': 1, 'T': 2},
2: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
3: {'A': 0, 'C': 2, 'G': 0, 'T': 1},
4: {'A': 0, 'C': 3, 'G': 0, 'T': 0},
5: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
6: {'A': 2, 'C': 0, 'G': 0, 'T': 1},
7: {'A': 1, 'C': 0, 'G': 1, 'T': 1}}
I am in the mid-way of writing a code to find all possible solutions of a input similar like "a&b|c!d|a", where a,b,c,d all are booleans and &-and, |-or !-not are the operators. By solution I mean the set of values of these variables which makes the input expression give True.
I am able to print all possible combinations of the variables, but I am not able to retain them (in this case, in a list) for later use.
What's wrong in the way I am doing it? Are there better ways to store them?
generate_combination is the method in which I am trying to do this.
Code:
import operator
# global all_combinations
all_combinations=[]
answers=[]
def solve(combination, input, rank):
try:
substituted_str=""
for i in input:
if i in combination:
substituted_str+=combination[i]
else:
substituted_str+=i
print substituted_str
# for item in rank:
except:
pass
def generate_combination(variables,comb_dict, length, current_index):
if len(comb_dict)==length:
print comb_dict #Each combination , coming out right
all_combinations.append(comb_dict)
print all_combinations,"\n" #This is not working as expected
else:
for i in [1,0]:
comb_dict[variables[current_index]]=i
generate_combination(variables,comb_dict, length,current_index+1)
comb_dict.pop(variables[current_index], None)
def main(input,variables,order):
rank=sorted(order.items(), key=operator.itemgetter(1))
generate_combination(variables, {}, len(variables), 0)
for combination in all_combinations:
print combination
ans=solve(combination, input, rank)
ans=[]
answers.extend(ans)
# for answer in answers:
# print answer
def nothing():
pass
if __name__ == '__main__':
# print "Enter your symbols for :\n"
# And=raw_input("And = ")
# Or=raw_input("Or = ")
# Not=raw_input("Not = ")
# input_str=raw_input("Enter the expression :")
And,Or,Not,input_str='&','|','!','a&b|c!d|a'
input_str=input_str.replace(" ","")
mapping={And:"&", Or:"|", Not:"!"}
order={"&":3, "|":2, "!":1}
variables=[]
processed_str=""
for i in input_str:
if i in mapping:
processed_str+=mapping[i]
else:
processed_str+=i
variables.append(i)
variables=list(set(variables))
print "Reconstituted string : ",processed_str
print "Variables : ",variables,"\n"
main(processed_str,variables,order)
Current Output:
Reconstituted string : a&b|c!d|a
Variables : ['a', 'c', 'b', 'd']
{'a': 1, 'c': 1, 'b': 1, 'd': 1}
[{'a': 1, 'c': 1, 'b': 1, 'd': 1}]
{'a': 1, 'c': 1, 'b': 1, 'd': 0}
[{'a': 1, 'c': 1, 'b': 1, 'd': 0}, {'a': 1, 'c': 1, 'b': 1, 'd': 0}]
{'a': 1, 'c': 1, 'b': 0, 'd': 1}
[{'a': 1, 'c': 1, 'b': 0, 'd': 1}, {'a': 1, 'c': 1, 'b': 0, 'd': 1}, {'a': 1, 'c': 1, 'b': 0, 'd': 1}]
{'a': 1, 'c': 1, 'b': 0, 'd': 0}
[{'a': 1, 'c': 1, 'b': 0, 'd': 0}, {'a': 1, 'c': 1, 'b': 0, 'd': 0}, {'a': 1, 'c': 1, 'b': 0, 'd': 0}, {'a': 1, 'c': 1, 'b': 0, 'd': 0}]
{'a': 1, 'c': 0, 'b': 1, 'd': 1}
[{'a': 1, 'c': 0, 'b': 1, 'd': 1}, {'a': 1, 'c': 0, 'b': 1, 'd': 1}, {'a': 1, 'c': 0, 'b': 1, 'd': 1}, {'a': 1, 'c': 0, 'b': 1, 'd': 1}, {'a': 1, 'c': 0, 'b': 1, 'd': 1}]
{'a': 1, 'c': 0, 'b': 1, 'd': 0}
[{'a': 1, 'c': 0, 'b': 1, 'd': 0}, {'a': 1, 'c': 0, 'b': 1, 'd': 0}, {'a': 1, 'c': 0, 'b': 1, 'd': 0}, {'a': 1, 'c': 0, 'b': 1, 'd': 0}, {'a': 1, 'c': 0, 'b': 1, 'd': 0}, {'a': 1, 'c': 0, 'b': 1, 'd': 0}]
{'a': 1, 'c': 0, 'b': 0, 'd': 1}
[{'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}, {'a': 1, 'c': 0, 'b': 0, 'd': 1}]
{'a': 1, 'c': 0, 'b': 0, 'd': 0}
[{'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}, {'a': 1, 'c': 0, 'b': 0, 'd': 0}]
{'a': 0, 'c': 1, 'b': 1, 'd': 1}
[{'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}, {'a': 0, 'c': 1, 'b': 1, 'd': 1}]
{'a': 0, 'c': 1, 'b': 1, 'd': 0}
[{'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}, {'a': 0, 'c': 1, 'b': 1, 'd': 0}]
{'a': 0, 'c': 1, 'b': 0, 'd': 1}
[{'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}, {'a': 0, 'c': 1, 'b': 0, 'd': 1}]
{'a': 0, 'c': 1, 'b': 0, 'd': 0}
[{'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}, {'a': 0, 'c': 1, 'b': 0, 'd': 0}]
{'a': 0, 'c': 0, 'b': 1, 'd': 1}
[{'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}, {'a': 0, 'c': 0, 'b': 1, 'd': 1}]
{'a': 0, 'c': 0, 'b': 1, 'd': 0}
[{'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}, {'a': 0, 'c': 0, 'b': 1, 'd': 0}]
{'a': 0, 'c': 0, 'b': 0, 'd': 1}
[{'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}, {'a': 0, 'c': 0, 'b': 0, 'd': 1}]
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
[{'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}, {'a': 0, 'c': 0, 'b': 0, 'd': 0}]
I think the problem is that all the items in your all_combinations list are pointed to the same comb_dict, you are overwriting each element in every call of generate_combination.
Try to make a copy of the comb_dict:
all_combinations.append(comb_dict.copy())