Python extract common patterns of length X among a set of sequences - python

Let's say I have the following :
data = ['ABCD', 'ABABC', 'BCAABCD']
I'm trying to make a function, that uses Counter taking three argvs, one for the data, second for the minimum proportion of the number of sequences that must have this pattern for being taken into account, and a third one that is the maximum pattern length.
A working function should gives me the following :
>>> check(data, 0.50, 2)
Counter({'A': 3, 'AB': 3, 'B': 3, 'BC': 3, 'C': 3, 'CD': 2, 'D': 2})
>>> check(data, 0.34, 4)
Counter({'A': 3, 'AB': 3, 'ABC': 3, 'ABCD': 2, 'B': 3, 'BC': 3, 'BCD': 2, 'C': 3, 'CD': 2, 'D': 2})
I'm really lost on this thing, I just know how to get the combinations thing for two or more letter like this :
Counter(combinations(data[0], 2)) & Counter(combinations(data[1], 2)) & Counter(combinations(data[2], 2))
And I also know how to get the sum of the letters in all elements of data with :
Counter(data[0]) + Counter(data[1]) + Counter(data[2])
(Strange thing, I couldn't manage to do this sum using list comprehension as I would've liked to do because of an error saying I can't do '+' between 'str' and 'int'
If you guys can't give me full code, no problem, I only need some guidance on how to start the whole thing and to get the logic.
Have a nice day to the one who read my whole thing :)

You can use a recursive generator function to get all combinations of the merged substrings (with length <= the maximum) in data and find the substring intersections using collections.defaultdict:
from collections import defaultdict
data = ['ABCD', 'ABABC', 'BCAABCD']
def combos(d, l, c = []):
if c:
yield ''.join(c)
if d and len(c) < l:
yield from combos(d[1:], l, c+[d[0]])
if not c:
yield from combos(d[1:], l, c)
def check(d, p, l):
_d = defaultdict(set)
for i in d:
for j in combos(i, l):
_d[j].add(i)
return {a:len(b) for a, b in _d.items() if len(b)/len(d) >= p}
print(check(data, 0.50, 2))
print(check(data, 0.34, 4))
Output:
{'A': 3, 'AB': 3, 'B': 3, 'BC': 3, 'C': 3, 'CD': 2, 'D': 2}
{'A': 3, 'AB': 3, 'ABC': 3, 'ABCD': 2, 'B': 3, 'BC': 3, 'BCD': 2, 'C': 3, 'CD': 2, 'D': 2}

Related

Python combine values of identical dictionaries without using looping

I have list of identical dictionaries:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do in using for .. in .., but is there way to do it without looping?
If i do
a, b, c = zip(*my_list)
i`m getting
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?
You need to extract all the values in my_list.You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
Pointed out by #Alexandre,This work only when the dict is ordered.If you couldn't make sure the order, consider the answer of yatu.
You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be to have a dictionary, mapping the actual letter and a list of values. Assigning to different variables is usually not the best idea, as it will only work with the fixed amount of variables.
You can iterate over the inner dictionaries, and append to a defaultdict as:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
for k,v in d.items():
out[k].append(v)
print(out)
#defaultdict(list, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
Pandas DataFrame has just a factory method for this, so if you already have it as a dependency or if the input data is large enough:
import pandas as pd
my_list = ...
df = pd.DataFrame.from_rows(my_list)
a = list(df['a']) # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])
Please find the code below. I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the sames keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a,b,c)

merging two python dicts and keeping the max key, val in the new updated dict

I need a method where I can merge two dicts keeping the max value when one of the keys, value are in both dicts.
dict_a maps "A", "B", "C" to 3, 2, 6
dict_b maps "B", "C", "D" to 7, 4, 1
final_dict map "A", "B", "C", "D" to 3, 7, 6, 1
I did get the job half done but I didn't figure out how to keep the max value for the 'C' key, value pair.
Used itertools chain() or update().
OK so this works by making a union set of all possible keys dict_a.keys() | dict_b.keys() and then using dict.get which by default returns None if the key is not present (rather than throwing an error). We then take the max (of the one which isn't None).
def none_max(a, b):
if a is None:
return b
if b is None:
return a
return max(a, b)
def max_dict(dict_a, dict_b):
all_keys = dict_a.keys() | dict_b.keys()
return {k: none_max(dict_a.get(k), dict_b.get(k)) for k in all_keys}
Note that this will work with any comparable values -- many of the other answers fail for negatives or zeros.
Example:
Inputs:
dict_a = {'a': 3, 'b': 2, 'c': 6}
dict_b = {'b': 7, 'c': 4, 'd': 1}
Outputs:
max_dict(dict_a, dict_b) # == {'b': 7, 'c': 6, 'd': 1, 'a': 3}
What about
{
k:max(
dict_a.get(k,-float('inf')),
dict_b.get(k,-float('inf'))
) for k in dict_a.keys()|dict_b.keys()
}
which returns
{'A': 3, 'D': 1, 'C': 6, 'B': 7}
With
>>> dict_a = {'A':3, 'B':2, 'C':6}
>>> dict_b = {'B':7, 'C':4, 'D':1}
Here is a working one liner
from itertools import chain
x = dict(a=30,b=40,c=50)
y = dict(a=100,d=10,c=30)
x = {k:max(x.get(k, 0), y.get(k, 0)) for k in set(chain(x,y))}
In[83]: sorted(x.items())
Out[83]: [('a', 100), ('b', 40), ('c', 50), ('d', 10)]
This is going to work in any case, i.e for common keys it will take the max of the value otherwise the existing value from corresponding dict.
Extending this so you can have any number of dictionaries in a list rather than just two:
a = {'a': 3, 'b': 2, 'c': 6}
b = {'b': 7, 'c': 4, 'd': 1}
c = {'c': 1, 'd': 5, 'e': 7}
all_dicts = [a,b,c]
from functools import reduce
all_keys = reduce((lambda x,y : x | y),[d.keys() for d in all_dicts])
max_dict = { k : max(d.get(k,0) for d in all_dicts) for k in all_keys }
If you know that all your values are non-negative (or have a clear smallest number), then this oneliner can solve your issue:
a = dict(a=3,b=2,c=6)
b = dict(b=7,c=4,d=1)
merged = { k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b) }
Use your smallest-possible-number instead of the 0. (E. g. float('-inf') or similar.)
Yet another solution:
a = {"A":3, "B":2, "C":6}
b = {"B":7, "C":4, "D":1}
Two liner:
b.update({k:max(a[k],b[k]) for k in a if b.get(k,'')})
res = {**a, **b}
Or if you don't want to change b:
b_copy = dict(b)
b_copy.update({k:max(a[k],b[k]) for k in a if b.get(k,'')})
res = {**a, **b_copy}
> {'A': 3, 'B': 7, 'C': 6, 'D': 1}

Python nested dict comprehension

I have a dictionary:
my_dict = {'a': [1, 2, 3], 'c': 3, 'b': 2}
And I want a comprehension like add_dict = (x x +1 for x my_dict)
what would be the best approach to take when writing a comprehension to deal with keys with multiple values?
So the output would look like {'a': [2, 3, 4], 'c': 4, 'b':3} or maybe I might want to only +1 to values 1 and 2 of each key, keys 'b' and 'c' ... would be skipped.
I tried this (first two lines are kind redundant / was messing about)
my_dict = {'a': [1, 2, 3], 'b': 2, 'c': 3}
D = {x: y for (x, y) in zip(my_dict.keys(), my_dict.values())}
test = (v for v in D.values())
for x in test:
try:
if len(x):
for i in x:
print i +1
except:
print x +1
if name == 'main':
main()
output was
2
3
4
object of type 'int' has no len()
object of type 'int' has no len()
I was trying to find a more elegant way of doing this that worked using comprehensions.
Here's a one-liner for you (in Python 3), assuming the dictionary values never become double-nested:
>>> {k:([x+1 for x in v] if not isinstance(v,int) else v+1) for k,v in my_dict.items()}
{'a': [2, 3, 4], 'b': 3, 'c': 4}
Replace my_dict.items() with my_dict.iteritems() for Python 2

How to create a function (Iteration/Recursion) to run over a dictionary of tuples in Python?

I have a Python dictionary of lists like this one:
d = {'A': [(4, 4, 3), [1, 2, 3, 4, 5]],
'B': [(2, 1, 2), [5, 4, 3, 2, 1]],
'C': [(4, 1, 1), [2, 4, 1, 2, 4]]}
I need to create a formula that accesses the elements of the dictionary and, for every value [t, l]:
Calculates the mean of t (let's call this m);
Takes a random sample s, with replacement and of length len(t), from l;
Compares m with the mean of s - True if m is greater than the mean of s, False otherwise;
Repeats this process 10,000 times
Returns the percentage of times m is greater than the mean of s.
The output should look like:
In [16]: test(d)
Out[16]: {'A': 0.5, 'B': 0.9, 'C': 0.4}
I think I'm not that far from an answer, this is what I have tried:
def test(dict):
def mean_diff(dict):
for k, (v0, v1) in dict.iteritems():
m = np.mean(v0) > (np.mean(npr.choice(v1, size=(1, len(v0)), replace=True)))
return ({k: m})
for k, (v0, v1) in dict.iteritems():
bootstrap = np.array([means_diff(dict) for _ in range(10000)])
rank = float(np.sum(bootstrap))/10000
return ({k: rank})
However, I got:
RuntimeError: maximum recursion depth exceeded while calling a Python object
I'd use a list comprehension that essentially selects a random value and compares it to the mean. This will produce a list of True/False. If you take the mean of that, it will be averaging a list of 1's and 0's, so it will give you the aggregate probability.
import numpy as np
d = {'A': [(4, 4, 3), [1, 2, 3, 4, 5]],
'B': [(2, 1, 2), [5, 4, 3, 2, 1]],
'C': [(4, 1, 1), [2, 4, 1, 2, 4]]}
def makeRanks(d):
rankDict = {}
for key in d:
tup = d[key][0]
mean = np.mean(tup)
l = d[key][1]
rank = np.mean([mean > np.mean(np.random.choice(l,len(tup))) for _ in range(10000)])
rankDict[key] = rank
return rankDict
Testing
>>> makeRanks(d)
{'C': 0.15529999999999999, 'A': 0.72130000000000005, 'B': 0.031899999999999998}

List of dicts to/from dict of lists

I want to change back and forth between a dictionary of (equal-length) lists:
DL = {'a': [0, 1], 'b': [2, 3]}
and a list of dictionaries:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
For those of you that enjoy clever/hacky one-liners.
Here is DL to LD:
v = [dict(zip(DL,t)) for t in zip(*DL.values())]
print(v)
and LD to DL:
v = {k: [dic[k] for dic in LD] for k in LD[0]}
print(v)
LD to DL is a little hackier since you are assuming that the keys are the same in each dict. Also, please note that I do not condone the use of such code in any kind of real system.
If you're allowed to use outside packages, Pandas works great for this:
import pandas as pd
pd.DataFrame(DL).to_dict(orient="records")
Which outputs:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
You can also use orient="list" to get back the original structure
{'a': [0, 1], 'b': [2, 3]}
Perhaps consider using numpy:
import numpy as np
arr = np.array([(0, 2), (1, 3)], dtype=[('a', int), ('b', int)])
print(arr)
# [(0, 2) (1, 3)]
Here we access columns indexed by names, e.g. 'a', or 'b' (sort of like DL):
print(arr['a'])
# [0 1]
Here we access rows by integer index (sort of like LD):
print(arr[0])
# (0, 2)
Each value in the row can be accessed by column name (sort of like LD):
print(arr[0]['b'])
# 2
To go from the list of dictionaries, it is straightforward:
You can use this form:
DL={'a':[0,1],'b':[2,3], 'c':[4,5]}
LD=[{'a':0,'b':2, 'c':4},{'a':1,'b':3, 'c':5}]
nd={}
for d in LD:
for k,v in d.items():
try:
nd[k].append(v)
except KeyError:
nd[k]=[v]
print nd
#{'a': [0, 1], 'c': [4, 5], 'b': [2, 3]}
Or use defaultdict:
nd=cl.defaultdict(list)
for d in LD:
for key,val in d.items():
nd[key].append(val)
print dict(nd.items())
#{'a': [0, 1], 'c': [4, 5], 'b': [2, 3]}
Going the other way is problematic. You need to have some information of the insertion order into the list from keys from the dictionary. Recall that the order of keys in a dict is not necessarily the same as the original insertion order.
For giggles, assume the insertion order is based on sorted keys. You can then do it this way:
nl=[]
nl_index=[]
for k in sorted(DL.keys()):
nl.append({k:[]})
nl_index.append(k)
for key,l in DL.items():
for item in l:
nl[nl_index.index(key)][key].append(item)
print nl
#[{'a': [0, 1]}, {'b': [2, 3]}, {'c': [4, 5]}]
If your question was based on curiosity, there is your answer. If you have a real-world problem, let me suggest you rethink your data structures. Neither of these seems to be a very scalable solution.
Here are the one-line solutions (spread out over multiple lines for readability) that I came up with:
if dl is your original dict of lists:
dl = {"a":[0, 1],"b":[2, 3]}
Then here's how to convert it to a list of dicts:
ld = [{key:value[index] for key,value in dl.items()}
for index in range(max(map(len,dl.values())))]
Which, if you assume that all your lists are the same length, you can simplify and gain a performance increase by going to:
ld = [{key:value[index] for key, value in dl.items()}
for index in range(len(dl.values()[0]))]
Here's how to convert that back into a dict of lists:
dl2 = {key:[item[key] for item in ld]
for key in list(functools.reduce(
lambda x, y: x.union(y),
(set(dicts.keys()) for dicts in ld)
))
}
If you're using Python 2 instead of Python 3, you can just use reduce instead of functools.reduce there.
You can simplify this if you assume that all the dicts in your list will have the same keys:
dl2 = {key:[item[key] for item in ld] for key in ld[0].keys() }
cytoolz.dicttoolz.merge_with
Docs
from cytoolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
Non-cython version
Docs
from toolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
The python module of pandas can give you an easy-understanding solution. As a complement to #chiang's answer, the solutions of both D-to-L and L-to-D are as follows:
import pandas as pd
DL = {'a': [0, 1], 'b': [2, 3]}
out1 = pd.DataFrame(DL).to_dict('records')
Output:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
In the other direction:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
out2 = pd.DataFrame(LD).to_dict('list')
Output:
{'a': [0, 1], 'b': [2, 3]}
Cleanest way I can think of a summer friday. As a bonus, it supports lists of different lengths (but in this case, DLtoLD(LDtoDL(l)) is no more identity).
From list to dict
Actually less clean than #dwerk's defaultdict version.
def LDtoDL (l) :
result = {}
for d in l :
for k, v in d.items() :
result[k] = result.get(k,[]) + [v] #inefficient
return result
From dict to list
def DLtoLD (d) :
if not d :
return []
#reserve as much *distinct* dicts as the longest sequence
result = [{} for i in range(max (map (len, d.values())))]
#fill each dict, one key at a time
for k, seq in d.items() :
for oneDict, oneValue in zip(result, seq) :
oneDict[k] = oneValue
return result
I needed such a method which works for lists of different lengths (so this is a generalization of the original question). Since I did not find any code here that the way that I expected, here's my code which works for me:
def dict_of_lists_to_list_of_dicts(dict_of_lists: Dict[S, List[T]]) -> List[Dict[S, T]]:
keys = list(dict_of_lists.keys())
list_of_values = [dict_of_lists[key] for key in keys]
product = list(itertools.product(*list_of_values))
return [dict(zip(keys, product_elem)) for product_elem in product]
Examples:
>>> dict_of_lists_to_list_of_dicts({1: [3], 2: [4, 5]})
[{1: 3, 2: 4}, {1: 3, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5]})
[{1: 3, 2: 5}, {1: 4, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6]})
[{1: 3, 2: 5}, {1: 3, 2: 6}, {1: 4, 2: 5}, {1: 4, 2: 6}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6], 7: [8, 9, 10]})
[{1: 3, 2: 5, 7: 8},
{1: 3, 2: 5, 7: 9},
{1: 3, 2: 5, 7: 10},
{1: 3, 2: 6, 7: 8},
{1: 3, 2: 6, 7: 9},
{1: 3, 2: 6, 7: 10},
{1: 4, 2: 5, 7: 8},
{1: 4, 2: 5, 7: 9},
{1: 4, 2: 5, 7: 10},
{1: 4, 2: 6, 7: 8},
{1: 4, 2: 6, 7: 9},
{1: 4, 2: 6, 7: 10}]
Here my small script :
a = {'a': [0, 1], 'b': [2, 3]}
elem = {}
result = []
for i in a['a']: # (1)
for key, value in a.items():
elem[key] = value[i]
result.append(elem)
elem = {}
print result
I'm not sure that is the beautiful way.
(1) You suppose that you have the same length for the lists
Here is a solution without any libraries used:
def dl_to_ld(initial):
finalList = []
neededLen = 0
for key in initial:
if(len(initial[key]) > neededLen):
neededLen = len(initial[key])
for i in range(neededLen):
finalList.append({})
for i in range(len(finalList)):
for key in initial:
try:
finalList[i][key] = initial[key][i]
except:
pass
return finalList
You can call it as follows:
dl = {'a':[0,1],'b':[2,3]}
print(dl_to_ld(dl))
#[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
If you don't mind a generator, you can use something like
def f(dl):
l = list((k,v.__iter__()) for k,v in dl.items())
while True:
d = dict((k,i.next()) for k,i in l)
if not d:
break
yield d
It's not as "clean" as it could be for Technical Reasons: My original implementation did yield dict(...), but this ends up being the empty dictionary because (in Python 2.5) a for b in c does not distinguish between a StopIteration exception when iterating over c and a StopIteration exception when evaluating a.
On the other hand, I can't work out what you're actually trying to do; it might be more sensible to design a data structure that meets your requirements instead of trying to shoehorn it in to the existing data structures. (For example, a list of dicts is a poor way to represent the result of a database query.)
List of dicts ⟶ dict of lists
from collections import defaultdict
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def ld_to_dl(ld: list[dict[K, V]]) -> dict[K, list[V]]:
dl = defaultdict(list)
for d in ld:
for k, v in d.items():
dl[k].append(v)
return dl
defaultdict creates an empty list if one does not exist upon key access.
Dict of lists ⟶ list of dicts
Collecting into "jagged" dictionaries
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
ld = []
for k, vs in dl.items():
ld += [{} for _ in range(len(vs) - len(ld))]
for i, v in enumerate(vs):
ld[i][k] = v
return ld
This generates a list of dictionaries ld that may be missing items if the lengths of the lists in dl are unequal. It loops over all key-values in dl, and creates empty dictionaries if ld does not have enough.
Collecting into "complete" dictionaries only
(Usually intended only for equal-length lists.)
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
ld = [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]
return ld
This generates a list of dictionaries ld that have the length of the smallest list in dl.
DL={'a':[0,1,2,3],'b':[2,3,4,5]}
LD=[{'a':0,'b':2},{'a':1,'b':3}]
Empty_list = []
Empty_dict = {}
# to find length of list in values of dictionry
len_list = 0
for i in DL.values():
if len_list < len(i):
len_list = len(i)
for k in range(len_list):
for i,j in DL.items():
Empty_dict[i] = j[k]
Empty_list.append(Empty_dict)
Empty_dict = {}
LD = Empty_list

Categories

Resources