I have a Python dictionary like this small example:
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
I only need the value of every item. Each value is a sequence of the letters A, T, C, or G, and every sequence has length 7, so there are 7 positions. I want the frequency of each of the 4 letters at every position. For every position I will make a dictionary in which the letters are the keys and the frequency of each letter is the value. At the end I want one final dictionary covering all seven positions, with these per-position dictionaries as its values.
Here is the expected output for the small example:
final = {'one': {'T': 2, 'A': 0, 'C': 0, 'G': 1}, 'two': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'three': {'T': 1, 'A': 0, 'C': 2, 'G': 0}, 'four': {'T': 0, 'A': 0, 'C': 3, 'G': 0}, 'five': {'T': 0, 'A': 2, 'C': 1, 'G': 0}, 'six': {'T': 1, 'A': 2, 'C': 0, 'G': 0}, 'seven': {'T': 1, 'A': 1, 'C': 0, 'G': 1}}
To get this output I wrote the following Python code, but it does not return exactly what I want. Do you know how to fix it?
one=[]
two=[]
three=[]
four=[]
five=[]
six=[]
seven=[]
mylist = dict.values()
for threeq in mylist:
    one.append(threeq[0])
    two.append(threeq[1])
    three.append(threeq[2])
    four.append(threeq[3])
    five.append(threeq[4])
    six.append(threeq[5])
    seven.append(threeq[6])
from collections import Counter
one=Counter(one)
two=Counter(two)
three=Counter(three)
four=Counter(four)
five=Counter(five)
six=Counter(six)
seven=Counter(seven)
Here is a way to do it, using Counter:
from collections import Counter
data = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
out = {i:Counter(col) for i, col in enumerate(zip(*(data.values()))) }
# we can add the missing keys whose count is 0:
for count in out.values():
    count.update(dict.fromkeys('ATGC', 0))
print(out)
# {0: Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}), 1: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}),
# 2: Counter({'C': 2, 'T': 1, 'A': 0, 'G': 0}), 3: Counter({'C': 3, 'A': 0, 'T': 0, 'G': 0}),
# 4: Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}), 5: Counter({'A': 2, 'T': 1, 'G': 0, 'C': 0}),
# 6: Counter({'G': 1, 'T': 1, 'A': 1, 'C': 0})}
I left the original indices as integers, since they are probably easier to use than strings like 'one', 'two'... But if you really want the strings:
numbers_as_strings = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
out = {numbers_as_strings[key]:value for key, value in out.items()}
print(out)
# {'one': Counter({'T': 2, 'G': 1, 'A': 0, 'C': 0}),
# 'two': Counter({'A': 2, 'C': 1, 'T': 0, 'G': 0}) ....
Try this:
values = list(dict.values())
r = {}
for i in range(7):
    r[i+1] = {'T': 0, 'A': 0, 'C': 0, 'G': 0}
    for v in values:
        r[i+1][v[i]] += 1
dict = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT', 'chr12:104659651-104659658': 'GACCAAA'}
options=['T','A','C','G']
innerdicts=['one','two','three','four','five','six','seven']
def getposcount(idx,letter,dict):
    count=0
    for v in dict.values():
        if v[idx]==letter:
            count+=1
    return count
d = {x:{y:getposcount(innerdicts.index(x),y,dict) for y in options} for x in innerdicts}
print(d)
Output
{'six': {'T': 1, 'A': 2, 'G': 0, 'C': 0}, 'one': {'T': 2, 'A': 0, 'G': 1, 'C': 0}, 'two': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'five': {'T': 0, 'A': 2, 'G': 0, 'C': 1}, 'three': {'T': 1, 'A': 0, 'G': 0, 'C': 2}, 'seven': {'T': 1, 'A': 1, 'G': 1, 'C': 0}, 'four': {'T': 0, 'A': 0, 'G': 0, 'C': 3}}
If you are willing to accept the integers as keys, you can do:
from collections import Counter
def counts_with_zero(count, keys='TACG'):
    return {key: count.get(key, 0) for key in keys}
d = {'chr2:173370685-173370692': 'TACCAAG', 'chr5:118309829-118309836': 'TCTCCTT',
     'chr12:104659651-104659658': 'GACCAAA'}
values = list(d.values())
result = {i: counts_with_zero(Counter(column)) for i, column in enumerate(zip(*values), 1)}
print(result)
Output
{1: {'A': 0, 'C': 0, 'G': 1, 'T': 2},
2: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
3: {'A': 0, 'C': 2, 'G': 0, 'T': 1},
4: {'A': 0, 'C': 3, 'G': 0, 'T': 0},
5: {'A': 2, 'C': 1, 'G': 0, 'T': 0},
6: {'A': 2, 'C': 0, 'G': 0, 'T': 1},
7: {'A': 1, 'C': 0, 'G': 1, 'T': 1}}
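The answers above all hardcode or assume a sequence length of 7. As a rough sketch, the same idea can be wrapped in a function that works for any common length. The name `position_counts` and the `alphabet` parameter are made up for illustration, and the sketch assumes every sequence has the same length and uses only letters from `alphabet`:

```python
from collections import Counter

def position_counts(sequences, alphabet="TACG"):
    """Count each letter of `alphabet` at every position (keys start at 1)."""
    sequences = list(sequences)
    # zip(*...) silently truncates to the shortest sequence, so check lengths first.
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    result = {}
    for position, column in enumerate(zip(*sequences), 1):
        counts = Counter(column)
        # A Counter returns 0 for missing keys, which fills in absent letters.
        result[position] = {letter: counts[letter] for letter in alphabet}
    return result

data = {
    'chr2:173370685-173370692': 'TACCAAG',
    'chr5:118309829-118309836': 'TCTCCTT',
    'chr12:104659651-104659658': 'GACCAAA',
}
counts = position_counts(data.values())
print(counts[1])  # {'T': 2, 'A': 0, 'C': 0, 'G': 1}
```

Letters outside `alphabet` are not reported by this sketch; whether that should be an error instead depends on the data.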
Related
I am trying to create a dictionary to map the amount of times a letter appears to the letter of the alphabet, however I want to print the entire alphabet in the dictionary even if a letter does not appear in the list of strings. So i want the alphabet letter to be the key and the amount of times the letter occurs as the value.
The following is my code
import string
from collections import Counter
listy = ["hello","there","I","am","a","string"]
letter_count = dict( (key, 0) for key in string.ascii_lowercase )
print(letter_count)
My expected output should be
{a:2,b:0,c:0,d:0,e:3}
and so on until i reach z
I realize the value in each key-value pair should be something other than 0, but I cannot figure out what. I don't know how to map the number of times a letter occurs to the correct letter in my dictionary, so I just put 0 there. Would using a dictionary comprehension be better? I am new to dictionaries and dictionary comprehensions, but a friend recommended I learn them since they are apparently a powerful tool, so any help would be appreciated.
import string
listy = ["hello","there","I","am","a","string"]
concatenated_listy="".join(listy).lower()
letter_count = dict( (key, concatenated_listy.count(key)) for key in string.ascii_lowercase )
letter_count
Answer would be
{'a': 2, 'b': 0, 'c': 0, 'd': 0, 'e': 3, 'f': 0, 'g': 1, 'h': 2, 'i': 2, 'j': 0, 'k': 0, 'l': 2, 'm': 1, 'n': 1, 'o': 1, 'p': 0, 'q': 0, 'r': 2, 's': 1, 't': 2, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
You can continue your suggested code with the below to read the letters one by one and add them to a histogram of letters encoded in your Dictionary:
import string
letterHist = dict((key, 0) for key in string.ascii_lowercase)
listy = ["hello","there","I","am","a","string"]
for word in listy:
    for letter in word:
        letterHist[letter.lower()] += 1
And the above should give you:
{'a': 2, 'b': 0, 'c': 0, 'd': 0, 'e': 3, 'f': 0, 'g': 1, 'h': 2, 'i': 2, 'j': 0, 'k': 0, 'l': 2, 'm': 1, 'n': 1, 'o': 1, 'p': 0, 'q': 0, 'r': 2, 's': 1, 't': 2, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
You can use dict.fromkeys:
from string import ascii_lowercase
listy = ["hello","there","I","am","a","string"]
dict_count = dict.fromkeys(ascii_lowercase, 0)
for letter in ''.join(listy).lower():
    dict_count[letter] += 1
>>> dict_count
{'a': 2, 'b': 0, 'c': 0, 'd': 0, 'e': 3, 'f': 0, 'g': 1, 'h': 2, 'i': 2, 'j': 0,
'k': 0, 'l': 2, 'm': 1, 'n': 1, 'o': 1, 'p': 0, 'q': 0, 'r': 2, 's': 1, 't': 2,
'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
You could also use collections.Counter instead of the for-loop:
>>> from collections import Counter
>>> dict_count = dict.fromkeys(ascii_lowercase, 0)
>>> dict_count.update(Counter(''.join(listy).lower()))
>>> dict_count
{'a': 2, 'b': 0, 'c': 0, 'd': 0, 'e': 3, 'f': 0, 'g': 1, 'h': 2, 'i': 2, 'j': 0,
'k': 0, 'l': 2, 'm': 1, 'n': 1, 'o': 1, 'p': 0, 'q': 0, 'r': 2, 's': 1, 't': 2,
'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
Note: In this case, most approaches based on a dictionary comprehension perform poorly (e.g. if you use str.count, the whole string is scanned once per letter), so if you need a dict comprehension, try combining it with collections.Counter:
>>> alpha_count = Counter(''.join(listy).lower())
>>> dict_count = {alpha: alpha_count.get(alpha, 0) for alpha in ascii_lowercase}
>>> dict_count
{'a': 2, 'b': 0, 'c': 0, 'd': 0, 'e': 3, 'f': 0, 'g': 1, 'h': 2, 'i': 2, 'j': 0,
'k': 0, 'l': 2, 'm': 1, 'n': 1, 'o': 1, 'p': 0, 'q': 0, 'r': 2, 's': 1, 't': 2,
'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
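One caveat with the loop-based versions above: indexing the pre-filled dictionary raises KeyError for any character outside a-z, such as a digit, a space, or punctuation. Here is a minimal sketch that ignores such characters instead. The function name `letter_histogram` is hypothetical:

```python
from collections import Counter
from string import ascii_lowercase

def letter_histogram(words):
    """Return counts for a-z, including zeros; other characters are ignored."""
    seen = Counter("".join(words).lower())  # a single pass over the text
    return {letter: seen[letter] for letter in ascii_lowercase}

listy = ["hello", "there", "I", "am", "a", "string"]
hist = letter_histogram(listy)
print(hist['e'], hist['z'])  # 3 0

# Characters outside a-z do not raise KeyError; they are simply not reported.
print(letter_histogram(["a-b", "42"])['a'])  # 1
```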
listy = ["hello","there","I","am","a","string"]
alphabet = 'abcdefghijklmnopqrstuvwxyz'
dict1 = {}
# Create a dictionary to store the number of occurrences of each of the 26 letters,
# with each count initially set to 0
for i in alphabet:
    dict1[i] = 0
for j in listy:
    for k in j.lower():  # Converted to lowercase
        if k in dict1:  # Skip anything that is not one of the 26 letters
            dict1[k] += 1
print(dict1)
My question is somewhat similar to this question: https://codereview.stackexchange.com/questions/175079/removing-key-value-pairs-in-list-of-dicts. Essentially, I have a list of dictionaries, and I want to remove duplicates from the list based on the unique combination of two (or more) keys within each dictionary.
Suppose I have the following list of dictionaries:
some_list_of_dicts = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
{'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
{'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
And let's suppose the combination of a, b, and c has to be unique; any other values can be whatever they want, but the combination of these three must be unique within this list. I want to keep whichever item with a given combination of a, b, and c came first, and discard every later item with the same combination.
The new list, after running it through some remove_duplicates function would look like this:
new_list = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
I've only managed to come up with this:
def remove_duplicates(old_list):
    uniqueness_check_list = []
    new_list = []
    for item in old_list:
        # The unique combination is 'a', 'b', and 'c'
        uniqueness_check = "{}{}{}".format(
            item["a"], item["b"], item["c"]
        )
        if uniqueness_check not in uniqueness_check_list:
            new_list.append(item)
            uniqueness_check_list.append(uniqueness_check)
    return new_list
But this doesn't feel very Pythonic. It also has the problem that I've hardcoded in the function which keys have to be unique; it would be better if I could specify that as an argument to the function itself, but again, not sure what's the most elegant way to do this.
You can use a dict comprehension to construct a dict from the list of dicts in the reversed order so that the values of the first of any unique combinations would take precedence. Use operator.itemgetter to get the unique keys as a tuple. Reverse again in the end for the original order:
from operator import itemgetter
list({itemgetter('a', 'b', 'c')(d): d for d in reversed(some_list_of_dicts)}.values())[::-1]
This returns:
[{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}]
With the help of a function to keep track of duplicates, you can use some list comprehension:
def remove_duplicates(old_list, cols=('a', 'b', 'c')):
    duplicates = set()
    def is_duplicate(item):
        duplicate = item in duplicates
        duplicates.add(item)
        return duplicate
    return [x for x in old_list if not is_duplicate(tuple([x[col] for col in cols]))]
To use:
>>> remove_duplicates(some_list_of_dicts)
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 2, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 3, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 4, 'b': 1, 'e': 3, 'd': 2}
]
You can also provide different columns to key on:
>>> remove_duplicates(some_list_of_dicts, cols=('a', 'd'))
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 5},
{'a': 1, 'c': 1, 'b': 1, 'e': 8, 'd': 7},
{'a': 1, 'c': 1, 'b': 1, 'e': 6, 'd': 9}
]
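For comparison, here is a sketch of the same first-one-wins logic written as a plain loop, with the key names passed in as an argument. The name `unique_by` is made up for illustration, and the sketch assumes the values under the chosen keys are hashable:

```python
from operator import itemgetter

def unique_by(dicts, keys):
    """Keep the first dict seen for each distinct combination of `keys`."""
    get_combo = itemgetter(*keys)  # returns a tuple when two or more keys are given
    seen = set()
    unique = []
    for d in dicts:
        combo = get_combo(d)
        if combo not in seen:
            seen.add(combo)
            unique.append(d)
    return unique

some_list_of_dicts = [
    {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
    {'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
    {'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
    {'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
    {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
    {'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
    {'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3},
]
deduped = unique_by(some_list_of_dicts, ('a', 'b', 'c'))
print(len(deduped))  # 4
print(deduped[0])    # {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4}
```

Comparing tuples in a set also avoids a weakness of joining the values into one string, where for example the values 1 and 11 and the values 11 and 1 would both produce "111".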
I am trying to read through sequencing data and classify the contained mutations. The problem I think I am having is not properly declaring each of the nested dictionaries such that they are unique.
This is how I am creating my data structure:
baseDict = {'A':0, 'T':0, 'G':0, 'C':0}
varDict = {'A':baseDict.copy(), 'T':baseDict.copy(), 'G':baseDict.copy(), 'C':baseDict.copy()}
fullDict = {'oncoSites':varDict.copy(), 'oncoGenes':varDict.copy(), 'TIIIRegions':varDict.copy()}
Then I am adding any particular mutation I read in like this:
fullDict['oncoSites'][j][k] += 1
The problem is that when I add a mutation, it is added to multiple dictionaries. For example, if I read in a reference base of T and a variant of C that is found in oncoSites, I add it as:
fullDict['oncoSites']['T']['C'] += 1
The output I get is this:
{'TIIIRegions': {'A': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'C': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'G': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'T': {'A': 0, 'C': 1, 'G': 0, 'T': 0}},
'oncoGenes': {'A': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'C': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'G': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'T': {'A': 0, 'C': 1, 'G': 0, 'T': 0}},
'oncoSites': {'A': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'C': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'G': {'A': 0, 'C': 0, 'G': 0, 'T': 0},
'T': {'A': 0, 'C': 1, 'G': 0, 'T': 0}}}
How can I increment only a single dictionary?
You'd need a deepcopy.
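Use:
from copy import deepcopy
fullDict = {'oncoSites': deepcopy(varDict), 'oncoGenes': deepcopy(varDict), 'TIIIRegions': deepcopy(varDict)}
What was happening is: varDict.copy() makes a shallow copy. Each call creates a new outer dictionary, but its values are still references to the same inner dictionaries (the copies of baseDict), so all three entries of fullDict shared them.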
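To see the difference concretely, here is a small sketch that compares object identity after a shallow copy and after a deep copy. The variable names are shortened stand-ins for the ones in the question, and the comprehension at the end is one possible alternative that never shares inner dictionaries in the first place:

```python
from copy import deepcopy

base = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
var = {ref: base.copy() for ref in 'ATGC'}

shallow = var.copy()
deep = deepcopy(var)

# The shallow copy is a new outer dict, but its values are the same inner dicts.
print(shallow is var)            # False
print(shallow['T'] is var['T'])  # True
print(deep['T'] is var['T'])     # False

var['T']['C'] += 1
print(shallow['T']['C'], deep['T']['C'])  # 1 0

# Alternative: build every level fresh so nothing is shared.
regions = ('oncoSites', 'oncoGenes', 'TIIIRegions')
full = {region: {ref: dict.fromkeys('ATGC', 0) for ref in 'ATGC'}
        for region in regions}
full['oncoSites']['T']['C'] += 1
print(full['oncoGenes']['T']['C'])  # 0
```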
I have a list with the following elements: A, B, C, D, E, F, G.
Each is supposed to be either true or false, represented by 1 and 0 respectively.
I am supposed to generate the combinations, subject to the following restrictions:
Elements C and F are to be true in all cases, i.e. 1.
When element A is true, elements E and G can be false.
When element B is true, element D can be false.
What you want is not permutations, but product. Also, I interpret restrictions as:
C and F cannot be false
If A is false, E and G cannot be false
If B is false, D cannot be false
With that, the code is as follows:
import pprint
from itertools import product
def myproduct():
    keys = 'abcdefg'
    values = [(0, 1) for k in keys]
    for value in product(*values):
        d = dict(zip(keys, value))
        # Skip: C and F that are 0 (False)
        if d['c'] == 0 or d['f'] == 0:
            continue
        # Skip: When A is false, E and G cannot be false
        if d['a'] == 0 and (d['e'] == 0 or d['g'] == 0):
            continue
        # Skip: When B is false, D cannot be false
        if d['b'] == 0 and d['d'] == 0:
            continue
        yield d  # This 'permutation' is good
for d in myproduct():
    pprint.pprint(d)
Output:
{'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
{'a': 0, 'b': 1, 'c': 1, 'd': 0, 'e': 1, 'f': 1, 'g': 1}
{'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
{'a': 1, 'b': 0, 'c': 1, 'd': 1, 'e': 0, 'f': 1, 'g': 0}
{'a': 1, 'b': 0, 'c': 1, 'd': 1, 'e': 0, 'f': 1, 'g': 1}
{'a': 1, 'b': 0, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 0}
{'a': 1, 'b': 0, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0, 'f': 1, 'g': 0}
{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0, 'f': 1, 'g': 1}
{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 1, 'f': 1, 'g': 0}
{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 1, 'f': 1, 'g': 1}
{'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 0, 'f': 1, 'g': 0}
{'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 0, 'f': 1, 'g': 1}
{'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 0}
{'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
Notes:
values is a list of (0, 1):
[(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]
Each value is a tuple of 7 numbers such as:
(1, 1, 1, 0, 0, 1, 0)
d is a dictionary in which the keys are a, b, ... and the values are 0 and 1
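Under the same reading of the restrictions, the three skip checks can also be folded into a single predicate. This is only a sketch of that variant; the names `KEYS` and `is_allowed` are made up for illustration:

```python
from itertools import product

KEYS = 'abcdefg'

def is_allowed(d):
    """Apply the three restrictions as interpreted in the answer above."""
    c_and_f_true = d['c'] == 1 and d['f'] == 1
    # E and G may be 0 only when A is 1.
    e_g_ok = d['a'] == 1 or (d['e'] == 1 and d['g'] == 1)
    # D may be 0 only when B is 1.
    d_ok = d['b'] == 1 or d['d'] == 1
    return c_and_f_true and e_g_ok and d_ok

candidates = (dict(zip(KEYS, bits)) for bits in product((0, 1), repeat=len(KEYS)))
combos = [d for d in candidates if is_allowed(d)]
print(len(combos))  # 15
```

The count matches the 15 dictionaries listed in the output above: 3 valid (B, D) pairs times 5 valid (A, E, G) triples.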
I have a dictionary like this -
{'A': 0, 'B': 0, 'C': 0, 'D': 4}
I want to generate a list like this -
[{'A': 1, 'B': 0, 'C': 0, 'D': 4},
{'A': 0, 'B': 1, 'C': 0, 'D': 4},
{'A': 0, 'B': 0, 'C': 1, 'D': 4},
{'A': 0, 'B': 0, 'C': 0, 'D': 5}]
What is the most pythonic way to do this?
You can use list comprehension and dictionary comprehension together, like this
d = {'A': 0, 'B': 0, 'C': 0, 'D': 4}
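print([{key1: d[key1] + (key1 == key) for key1 in d} for key in d])
Output (shown one dictionary per line; the order assumes Python 3.7+, where dictionaries keep insertion order)
[{'A': 1, 'B': 0, 'C': 0, 'D': 4},
{'A': 0, 'B': 1, 'C': 0, 'D': 4},
{'A': 0, 'B': 0, 'C': 1, 'D': 4},
{'A': 0, 'B': 0, 'C': 0, 'D': 5}]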
The idea is to generate a new dictionary for each key, and to add 1 to a value only when its key matches the key the new dictionary is being built for. (key1 == key) evaluates to True only when both keys match and to False otherwise; in arithmetic, True counts as 1 and False as 0.
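The boolean arithmetic this relies on can be checked directly. Below is a small sketch of the same comprehension wrapped in a function; the name `increment_each` is made up, and the result order assumes Python 3.7+, where dictionaries keep insertion order:

```python
d = {'A': 0, 'B': 0, 'C': 0, 'D': 4}

# bool is a subclass of int, so True adds 1 and False adds 0.
print(4 + True, 4 + False)  # 5 4

def increment_each(counts):
    """Return one copy of `counts` per key, with that key's value raised by 1."""
    return [{k: v + (k == target) for k, v in counts.items()} for target in counts]

variants = increment_each(d)
print(variants[3])  # {'A': 0, 'B': 0, 'C': 0, 'D': 5}
```

The original dictionary `d` is left unchanged, because each comprehension builds a new dictionary.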