How do I make a dictionary with a repeating value pattern? - python

I am trying to create a dictionary with a repeating pattern like
{0:"A",
1:"B",
2:"C",
3:"D",
4:"A",
5:"B",
6:"C",
7:"D",}
and so on. How would I do that? I have tried using for loops, but couldn't figure it out.
I'm not even sure this is the right approach to my problem. I am solving a simulation numerous times with the same output, only changing 1 input for every loop of the simulation.
Basically I end up with a DataFrame that collects the output (4 different series) for every simulation with columns
[0, 1, 2, 3, 4, 5, 6, 7, 8, ...]
which I would like to rename
["A", "B", "C", "D", "A", "B", "C", "D",...]
Alternatively, is there some sort of datatype in Python, which can provide 2 levels of categorizing like
[Simulation 1: ["A", "B", "C", "D"],
Simulation 2: ["A", "B", "C", "D"],
Simulation 3: ["A", "B", "C", "D"],
Simulation 4: ["A", "B", "C", "D"],
Simulation 5: ["A", "B", "C", "D"],
and so on...]
where "A", "B", "C" and "D" each contains a column of data output, that is different for every simulation?

You can achieve this neatly with itertools.cycle:
In [1]: import itertools
In [2]: cols = [0, 1, 2, 3, 4, 5, 6, 7]
In [3]: dict(zip(cols, itertools.cycle('ABCD')))
Out[3]: {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D'}
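For the column-renaming case in the question, the same idea can produce a plain list of labels rather than a dict. This is a minimal sketch, assuming the number of columns is known; n_cols and new_names are names chosen here for illustration. itertools.islice cuts the endless cycle down to the required length:

```python
from itertools import cycle, islice

labels = "ABCD"   # the repeating pattern from the question
n_cols = 10       # assumed number of DataFrame columns (illustrative)

# cycle() repeats the labels forever; islice() takes only the first n_cols.
new_names = list(islice(cycle(labels), n_cols))
print(new_names)
```

The resulting list could then be assigned to df.columns, provided the frame really has n_cols columns.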

If you'd rather not import modules you could use dictionary comprehension with a modulus operator (%)
print({i:'ABCD'[i%4] for i in range(12)})
{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'A', 9: 'B', 10: 'C', 11: 'D'}

If you want to loop over the indices, you could use a dict comprehension with the modulo operator along with string.ascii_uppercase:
>>> from string import ascii_uppercase
>>> n = 8
>>> repeat_every = 4
>>> d = {i: ascii_uppercase[i % repeat_every] for i in range(n)}
>>> d
{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D'}
Alternatively, is there some sort of datatype in Python, which can
provide 2 levels of categorizing like...
You could use itertools.permutations inside a dict comprehension:
>>> from itertools import permutations
>>> from string import ascii_uppercase
>>>
>>> def pretty_print_simple_dict(d):
...     print("{")
...     for k, v in d.items():
...         print(f"\t{k}: {v}")
...     print("}")
...
>>> repeat_every = 4
>>> d = {
...     f"Simulation {i + 1}": list(p)
...     for i, p in enumerate(permutations(ascii_uppercase[:repeat_every]))
... }
>>>
>>> pretty_print_simple_dict(d)
{
Simulation 1: ['A', 'B', 'C', 'D']
Simulation 2: ['A', 'B', 'D', 'C']
Simulation 3: ['A', 'C', 'B', 'D']
Simulation 4: ['A', 'C', 'D', 'B']
Simulation 5: ['A', 'D', 'B', 'C']
Simulation 6: ['A', 'D', 'C', 'B']
Simulation 7: ['B', 'A', 'C', 'D']
Simulation 8: ['B', 'A', 'D', 'C']
Simulation 9: ['B', 'C', 'A', 'D']
Simulation 10: ['B', 'C', 'D', 'A']
Simulation 11: ['B', 'D', 'A', 'C']
Simulation 12: ['B', 'D', 'C', 'A']
Simulation 13: ['C', 'A', 'B', 'D']
Simulation 14: ['C', 'A', 'D', 'B']
Simulation 15: ['C', 'B', 'A', 'D']
Simulation 16: ['C', 'B', 'D', 'A']
Simulation 17: ['C', 'D', 'A', 'B']
Simulation 18: ['C', 'D', 'B', 'A']
Simulation 19: ['D', 'A', 'B', 'C']
Simulation 20: ['D', 'A', 'C', 'B']
Simulation 21: ['D', 'B', 'A', 'C']
Simulation 22: ['D', 'B', 'C', 'A']
Simulation 23: ['D', 'C', 'A', 'B']
Simulation 24: ['D', 'C', 'B', 'A']
}
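If the goal is two levels of categorization where every simulation holds the same four series (rather than reordered labels), a nested dict is one plain-Python option; pandas also offers MultiIndex columns for this kind of layout. The following is only a sketch with made-up numbers, and run_simulation is a hypothetical stand-in for the real simulation:

```python
def run_simulation(x):
    """Hypothetical stand-in: returns four output series for input x."""
    return {"A": [x, x + 1], "B": [x * 2, x * 3], "C": [0, x], "D": [x, 0]}

# Outer key: simulation name; inner key: series label.
results = {
    f"Simulation {i + 1}": run_simulation(x)
    for i, x in enumerate([10, 20, 30])
}

# Series "B" of the second simulation.
print(results["Simulation 2"]["B"])
```

Each series is then reached with two lookups, first by simulation and then by label.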

Related

Divide dataframe into list of rows containing all columns

From a dataframe structured like this
   A  B
0  1  2
1  3  4
I need to get list like this:
[{"A": 1, "B": 2}, {"A": 3, "B": 4}]
It looks like you want:
df.values.tolist()
example:
import pandas as pd
df = pd.DataFrame([['A', 'B', 'C'],
                   ['D', 'E', 'F']])
df.values.tolist()
output:
[['A', 'B', 'C'],
['D', 'E', 'F']]
other options
df.T.to_dict('list')
{0: ['A', 'B', 'C'],
1: ['D', 'E', 'F']}
df.to_dict('records')
[{0: 'A', 1: 'B', 2: 'C'},
{0: 'D', 1: 'E', 2: 'F'}]
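For the two-column frame in the question, the to_dict('records') form is the one whose shape matches the requested list of dicts. The same shape can be sketched without pandas, assuming the column names and the rows are available as plain lists (columns and rows are illustrative names, not part of the original answer):

```python
columns = ["A", "B"]
rows = [[1, 2], [3, 4]]

# Pair each row's values with the column names, one dict per row.
records = [dict(zip(columns, row)) for row in rows]
print(records)
```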

Make all permutation combination and replace in string

I have dataframe column with strings, similar to:
'TCCTGTAAATCAAAGGCCAAGRG', 'GNGCNCCNGAYATRGCNTTYCC', 'GATTTCTCTYCCTGTTCTTGCA'
and I have a mapping of letters:
SNPs={}
SNPs["Y"] = ['C', 'T']
SNPs["R"] = ['A', 'G']
SNPs["N"] = ['C', 'G', 'A', 'T']
where every R needs to change to A/G and so on...
ex: TCCTGTAAATCAAAGGCCAAGRG
changes to TCCTGTAAATCAAAGGCCAAGAG and TCCTGTAAATCAAAGGCCAAGGG.
I want all the permutations and combinations, with the result in another column.
Please help me with this.
import re, itertools
text = "GNGCNCCNGAYATRGCNTTYCC"
def getList(dict):
    return list(dict.keys())
lsources = getList(SNPs)
ldests = []
for source in lsources:
    ldests.append(SNPs[source])
#print(ldests)
# Generate the various pairings
for lproduct in itertools.product(*ldests):
    #print(lproduct)
    for i in text:
        output = i
        for src, dest in zip(lsources, lproduct):
            # Replace each term (you could optimise this using a single re.sub)
            output = output.replace("%s" % src, dest)
        print(output)
This is my code, but I am not getting the desired output.
Try this:
>>> import itertools
>>> text = "GNGCNCCNGAYATRGCNTTYCC"
>>> SNPs={ "Y" : ['C', 'T'] , "R" : ['A', 'G'] , "N" : ['C', 'G', 'A', 'T']}
>>> text_tmp = ""
>>> dct = {}
>>> for idx, v in enumerate(text):
...     if v in SNPs:
...         dct[idx] = SNPs.get(v)
...         text_tmp += f'_{idx}_'
...     else:
...         text_tmp += v
>>> text_tmp
'G_1_GC_4_CC_7_GA_10_AT_13_GC_16_TT_19_CC'
>>> dct
{1: ['C', 'G', 'A', 'T'],
4: ['C', 'G', 'A', 'T'],
7: ['C', 'G', 'A', 'T'],
10: ['C', 'T'],
13: ['A', 'G'],
16: ['C', 'G', 'A', 'T'],
19: ['C', 'T']}
>>> per_val = list(itertools.product(*dct.values()))
>>> per_key_val = list(map(dict,[zip(dct.keys(), p) for p in per_val]))
>>> per_key_val
[{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'C', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'C', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'G', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'G', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'A', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'A', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'T', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'A', 16: 'T', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'C', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'C', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'G', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'G', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'A', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'A', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'T', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'C', 13: 'G', 16: 'T', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'T', 13: 'A', 16: 'C', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'T', 13: 'A', 16: 'C', 19: 'T'},
{1: 'C', 4: 'C', 7: 'C', 10: 'T', 13: 'A', 16: 'G', 19: 'C'},
{1: 'C', 4: 'C', 7: 'C', 10: 'T', 13: 'A', 16: 'G', 19: 'T'},
...
]
>>> out = []
>>> for pkl in per_key_val:
...     tmp = text_tmp
...     for k,v in pkl.items():
...         tmp = tmp.replace(f'_{k}_', v)
...     out.append(tmp)
>>> out
['GCGCCCCCGACATAGCCTTCCC',
'GCGCCCCCGACATAGCCTTTCC',
'GCGCCCCCGACATAGCGTTCCC',
'GCGCCCCCGACATAGCGTTTCC',
'GCGCCCCCGACATAGCATTCCC',
'GCGCCCCCGACATAGCATTTCC',
'GCGCCCCCGACATAGCTTTCCC',
'GCGCCCCCGACATAGCTTTTCC',
'GCGCCCCCGACATGGCCTTCCC',
'GCGCCCCCGACATGGCCTTTCC',
'GCGCCCCCGACATGGCGTTCCC',
'GCGCCCCCGACATGGCGTTTCC',
'GCGCCCCCGACATGGCATTCCC',
'GCGCCCCCGACATGGCATTTCC',
'GCGCCCCCGACATGGCTTTCCC',
'GCGCCCCCGACATGGCTTTTCC',
'GCGCCCCCGATATAGCCTTCCC',
'GCGCCCCCGATATAGCCTTTCC',
'GCGCCCCCGATATAGCGTTCCC',
'GCGCCCCCGATATAGCGTTTCC',
'GCGCCCCCGATATAGCATTCCC',
'GCGCCCCCGATATAGCATTTCC',
'GCGCCCCCGATATAGCTTTCCC',
...
]
Update: (run on dataframe)
import itertools
import pandas as pd
def rplc_per(text):
    SNPs={ "Y" : ['C', 'T'] , "R" : ['A', 'G'] , "N" : ['C', 'G', 'A', 'T']}
    # Swap each ambiguous letter for a placeholder such as _4_ and remember its options
    text_tmp = ""
    dct = {}
    for idx, v in enumerate(text):
        if v in SNPs:
            dct[idx] = SNPs.get(v)
            text_tmp += f'_{idx}_'
        else:
            text_tmp += v
    # One dict per combination, mapping position -> chosen letter
    per_val = list(itertools.product(*dct.values()))
    per_key_val = list(map(dict,[zip(dct.keys(), p) for p in per_val]))
    # Fill in the placeholders for every combination
    out = []
    for pkl in per_key_val:
        tmp = text_tmp
        for k,v in pkl.items():
            tmp = tmp.replace(f'_{k}_', v)
        out.append(tmp)
    return out
df = pd.DataFrame({'String': ['TCCTGTAAATCAAAGGCCAAGRG', 'GNGCNCCNGAYATRGCNTTYCC', 'GATTTCTCTYCCTGTTCTTGCA']})
df['all_per'] = df['String'].apply(rplc_per)
print(df)
Output:
String all_per
0 TCCTGTAAATCAAAGGCCAAGRG [TCCTGTAAATCAAAGGCCAAGAG, TCCTGTAAATCAAAGGCCAA...
1 GNGCNCCNGAYATRGCNTTYCC [GCGCCCCCGACATAGCCTTCCC, GCGCCCCCGACATAGCCTTTC...
2 GATTTCTCTYCCTGTTCTTGCA [GATTTCTCTCCCTGTTCTTGCA, GATTTCTCTTCCTGTTCTTGCA]
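The placeholder step can be skipped by giving itertools.product one list of options per character. This is a compact sketch of the same expansion, not the answer's original code; expand and its codes parameter are names introduced here for illustration:

```python
from itertools import product

SNPs = {"Y": ["C", "T"], "R": ["A", "G"], "N": ["C", "G", "A", "T"]}

def expand(text, codes=SNPs):
    """Return every string obtained by resolving the ambiguity codes in text."""
    # Each position contributes either its alternatives or just itself.
    options = [codes.get(ch, [ch]) for ch in text]
    return ["".join(combo) for combo in product(*options)]

print(expand("TCCTGTAAATCAAAGGCCAAGRG"))
```

It could be applied to the column the same way, e.g. df['String'].apply(expand). Note that the number of results multiplies with every ambiguous position, so strings with many N characters produce long lists.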

Packing a multidimensional array into a multidimensional dictionary in Python [duplicate]

I have 2 arrays, one of which is two level:
lst1 = [1, 2, 3]
lst2 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
How can I make one list of dictionaries from them, like this:
dct = [{1:'a', 2:'b', 3:'c'}, {1:'d', 2:'e', 3:'f'}, {1:'g', 2:'h', 3:'i'}]
Here is one way:
lst1 = [1, 2, 3]
lst2 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
list_of_dicts = [{lst1[i]:x for i,x in enumerate(lst_tmp)} for lst_tmp in lst2 ]
list_of_dicts
>>> [{1: 'a', 2: 'b', 3: 'c'}, {1: 'd', 2: 'e', 3: 'f'}, {1: 'g', 2: 'h', 3: 'i'}]
If you intentionally used curly braces and meant a set of dictionaries, that is not possible, because dictionaries are not hashable (although there are some creative workarounds).
Alternatively, you can also zip it:
list_of_dicts = [dict(zip(lst1,lst_tmp)) for lst_tmp in lst2 ]
list_of_dicts
>>> [{1: 'a', 2: 'b', 3: 'c'}, {1: 'd', 2: 'e', 3: 'f'}, {1: 'g', 2: 'h', 3: 'i'}]
The output that you want is not a dictionary, but if you mean a list of dictionaries then you can use this compact form:
lst1 = [1, 2, 3]
lst2 = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
dict_list = [dict(zip(lst1, lst2[i])) for i in range(len(lst2))]
You can do this with zip, which combines two lists which can then be converted to a dict. You can do this in one line with list comprehension.
first = [1, 2, 3]
second = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
print([dict(zip(first, item)) for item in second])
# [{1: 'a', 2: 'b', 3: 'c'}, {1: 'd', 2: 'e', 3: 'f'}, {1: 'g', 2: 'h', 3: 'i'}]
Without list comprehension:
first = [1, 2, 3]
second = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
result = []
for item in second:
    result.append(dict(zip(first, item)))
print(result)
# [{1: 'a', 2: 'b', 3: 'c'}, {1: 'd', 2: 'e', 3: 'f'}, {1: 'g', 2: 'h', 3: 'i'}]
And a little bit more code, using enumerate:
first = [1, 2, 3]
second = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
result = []
for sub in second:
    d = {}
    for index, item in enumerate(first):
        d[item] = sub[index]
    result.append(d)
print(result)
# [{1: 'a', 2: 'b', 3: 'c'}, {1: 'd', 2: 'e', 3: 'f'}, {1: 'g', 2: 'h', 3: 'i'}]

Joining Lists of Lists of Strings

I've a list of lists, in which each element is a single character:
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]
From this, I want to generate a new single list with the content ['aa','ab','ac','ba','bb','bc','ca','cb','cc']. The individual elements of each list are appended to each other but in reverse order of the lists. I've come up with this (where np = 2):
for cnt in range(np-2,-1,-1):
    thisngrams[-1] = [a+b for (a,b) in zip(thisngrams[-1],thisngrams[cnt])]
My solution needs to handle np higher than just 2. I expect this is O(np), which isn't bad. Can someone suggest a more efficient and pythonic way to do what I want (or is this a good pythonic approach)?
You can try this:
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]
new = list(map(''.join, zip(*ngrams)))  # list() is needed in Python 3, where map returns an iterator
Output:
['aa', 'ba', 'ca', 'ab', 'bb', 'cb', 'ac', 'bc', 'cc']
For more than two lists (reversed, so the last list comes first):
n = [["a", "b", "c"], ["a", "c", "d"], ["e", "f", "g"]]
new = map(''.join, zip(*reversed(n)))
#in Python 3
#new = list(map(''.join, zip(*reversed(n))))
Output:
['eaa', 'fcb', 'gdc']
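Applied to the two lists from the question, the reversed form gives the order the asker listed, since the characters of the last list lead each string. A short sketch (joined is just an illustrative name):

```python
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
          ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]

# reversed() puts the last list first, so its characters lead each string.
joined = [''.join(chars) for chars in zip(*reversed(ngrams))]
print(joined)
```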

How can I duplicate element of a list in python?

I wonder if there is a more elegant way to do the following. For example with list comprehension.
Consider a simple list :
l = ["a", "b", "c", "d", "e"]
I want to duplicate each element n times. Thus I did the following:
n = 3
duplic = list()
for li in l:
    duplic += [li for i in range(n)]
At the end duplic is :
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e']
You can use
duplic = [li for li in l for _ in range(n)]
This does the same as your code. It adds each element of l (li for li in l) n times (for _ in range(n)).
You can use:
l = ["a", "b", "c", "d", "e"]
n=3
duplic = [ li for li in l for i in range(n)]
Every time in Python that you write
duplic = list()
for li in l:
    duplic +=
there is a good chance that it can be done with a list comprehension.
Try this (it relies on l already being in sorted order, since sorted() reorders the repeated list):
l = ["a", "b", "c", "d", "e"]
print(sorted(l * 3))
Output:
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e']
>>> from itertools import chain
>>> n = 4
>>> list(chain.from_iterable(map(lambda x: [x]*n, l)))
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'e', 'e', 'e', 'e']
In [12]: l
Out[12]: ['a', 'b', 'c', 'd', 'e']
In [13]: n
Out[13]: 3
In [14]: sum((n*[item] for item in l), [])
Out[14]: ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e']
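Summing lists onto [] copies the accumulated list at every step, which gets slow for long inputs. One possible variant, sketched here with itertools.repeat and chain.from_iterable, builds the result in a single pass and keeps the original element order whether or not the input is sorted:

```python
from itertools import chain, repeat

l = ["a", "b", "c", "d", "e"]
n = 3

# repeat(item, n) yields item n times; chain.from_iterable flattens the runs.
duplic = list(chain.from_iterable(repeat(item, n) for item in l))
print(duplic)
```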
