Python: How to get possible combinations of keys in dict - python

Given a dict of vocabulary: {'A': 3, 'B': 4, 'C': 5, 'AB':6} and a sentence, which should be segmented: ABCAB.
I need to create all possible combinations of this sentence such as
[['A', 'B', 'C', 'A', 'B'], ['A', 'B', 'C', 'AB'], ['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B']]
That's what I have:
def find_words(sentence):
for i in range(len(sentence)):
for word_length in range(1, max_word_length + 1):
word = sentence[i:i+word_length]
print(word)
if word not in test_dict:
continue
if i + word_length <= len(sentence):
if word.startswith(sentence[0]) and word not in words and word not in ''.join(words):
words.append(word)
else:
continue
next_position = i + word_length
if next_position >= len(sentence):
continue
else:
find_ngrams(sentence[next_position:])
return words
But it returns me only one list.
I was also looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though.

Try all possible prefixes and recursively do the same for the rest of the sentence.
VOC = {'A', 'B', 'C', 'AB'} # could be a dict
def parse(snt):
if snt == '':
yield []
for w in VOC:
if snt.startswith(w):
for rest in parse(snt[len(w):]):
yield [w] + rest
print(list(parse('ABCAB')))
# [['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B'],
# ['A', 'B', 'C', 'AB'], ['A', 'B', 'C', 'A', 'B']]

Although not the most efficient solution, this should work:
from itertools import product
dic = {'A': 3, 'B': 4, 'C': 5, 'AB': 6}
choices = list(dic.keys())
prod = []
for a in range(1, len(choices)+2):
prod = prod + list(product(choices, repeat=a))
result = list(filter(lambda x: ''.join(x) == ''.join(choices), prod))
print(result)
# prints [('AB', 'C', 'AB'), ('A', 'B', 'C', 'AB'), ('AB', 'C', 'A', 'B'), ('A', 'B', 'C', 'A', 'B')]

Use itertools permutations to give all unique combinations.
d ={'A': 3, 'B': 4, 'C': 5, 'AB':6}
l = [k for k, v in d.items()]
print(list(itertools.permutations(l)))
[('A', 'B', 'C', 'AB'), ('A', 'B', 'AB', 'C'), ('A', 'C', 'B', 'AB'), ('A', 'C', 'AB', 'B'), ('A', 'AB', 'B', 'C'), ('A', 'AB', 'C', 'B'), ('B', 'A', 'C', 'AB'), ('B', 'A', 'AB', 'C'), ('B', 'C', 'A', 'AB'), ('B', 'C', 'AB', 'A'), ('B', 'AB', 'A', 'C'), ('B', 'AB', 'C', 'A'), ('C', 'A', 'B', 'AB'), ('C', 'A', 'AB', 'B'), ('C', 'B', 'A', 'AB'), ('C', 'B', 'AB', 'A'), ('C', 'AB', 'A', 'B'), ('C', 'AB', 'B', 'A'), ('AB', 'A', 'B', 'C'), ('AB', 'A', 'C', 'B'), ('AB', 'B', 'A', 'C'), ('AB', 'B', 'C', 'A'), ('AB', 'C', 'A', 'B'), ('AB', 'C', 'B', 'A')]

Related

How to use a dictionary to modify an array in python

I am trying to use a dictionary (created by reading the content of a first file) to modify the content of an array(created by reading the content of a second file) in order to return as many modified arrays as their are keys in the dictionary with the modification corresponding the position and change in the original array indicated in the values for each key.
From a minimal example, if my dictionary and my list are:
change={'change 1': [(1, 'B'), (3, 'B'), (5, 'B'), (7, 'B')], 'change 2': [(2, 'B'), (4, 'B'), (6, 'B')]}
s=['A', 'A', 'A', 'A', 'A', 'A', 'A']
Then, I would like my code to return two lists:
['B', 'A', 'B', 'A', 'B', 'A', 'B']
['A', 'B', 'A', 'B', 'A', 'B', 'A']
I tried to code this in python3:
change={'change 1': [(1, 'B'), (3, 'B'), (5, 'B'), (7, 'B')], 'change 2': [(2, 'B'), (4, 'B'), (6, 'B')]}
s=['A', 'A', 'A', 'A', 'A', 'A', 'A']
for key in change:
s1=s
for n in change[key]:
s1[n[0]-1] = n[1]
print(key, s1)
print(s)
However it seems that even if I am only modifying the list s1 which is a copy of s, I am nontheless modifying s as well, as it returns:
change 1 ['B', 'A', 'B', 'A', 'B', 'A', 'B']
['B', 'A', 'B', 'A', 'B', 'A', 'B']
change 2 ['B', 'B', 'B', 'B', 'B', 'B', 'B']
['B', 'B', 'B', 'B', 'B', 'B', 'B']
So although I get the first list right, the second isn't and I don't understand why.
Could you help me see what is wrong in my code, and how to make it work?
Many thanks!
In your code s1 isn't a copy of s, it's just another name for s.
I'd suggest doing this by just building a new list in a comprehension, e.g.:
>>> change={'change 1': [(1, 'B'), (3, 'B'), (5, 'B'), (7, 'B')], 'change 2': [(2, 'B'), (4, 'B'), (6, 'B')]}
>>> s=['A', 'A', 'A', 'A', 'A', 'A', 'A']
>>> for t in change.values():
... d = dict(t)
... print([d.get(i, x) for i, x in enumerate(s, 1)])
...
['B', 'A', 'B', 'A', 'B', 'A', 'B']
['A', 'B', 'A', 'B', 'A', 'B', 'A']
But if you change your original code to initialize s1 to s.copy() it seems to work:
>>> for key in change:
... s1 = s.copy()
... for n in change[key]:
... s1[n[0]-1] = n[1]
... print(s1)
...
['B', 'A', 'B', 'A', 'B', 'A', 'B']
['A', 'B', 'A', 'B', 'A', 'B', 'A']
If you want to make sure a copy of an instance of a class with all its dependencies is completely being copied, use deepcopy as follows:
import copy
change={'change 1': [(1, 'B'), (3, 'B'), (5, 'B'), (7, 'B')], 'change 2': [(2, 'B'), (4, 'B'), (6, 'B')]}
s=['A', 'A', 'A', 'A', 'A', 'A', 'A']
for key in change:
s1=copy.deepcopy(s)
for n in change[key]:
s1[n[0]-1] = n[1]
print(key, s1)
print(s)

Python replace tuples from list of tuples by highest priority list

I have those python lists :
x = [('D', 'F'), ('A', 'D'), ('B', 'G'), ('B', 'C'), ('A', 'B')]
priority_list = ['A', 'B', 'C', 'D', 'F', 'G'] # Ordered from highest to lowest priority
How can I, for each tuple in my list, keep the value with the highest priority according to priority_list? The result would be :
['D', 'A', 'B', 'B', 'A']
Another examples:
x = [('B', 'D'), ('E', 'A'), ('B', 'A'), ('D', 'F'), ('E', 'C')]
priority_list = ['A', 'B', 'C', 'D', 'E', 'F']
# Result:
['B', 'A', 'A', 'D', 'C']
x = [('B', 'C'), ('F', 'E'), ('B', 'A'), ('D', 'F'), ('E', 'C')]
priority_list = ['F', 'E', 'D', 'C', 'B', 'A'] # Notice the change in priorities
# Result:
['C', 'F', 'B', 'F', 'E']
Thanks in advance, I might be over complicating this.
You can try
[sorted(i, key=priority_list.index)[0] for i in x]
though it will throw an exception if you find a value not in the priority list.
You can try:
def get_priority_val(data, priority_list):
for single_val in priority_list:
if single_val in data:
return single_val
x = [('D', 'F'), ('A', 'D'), ('B', 'G'), ('B', 'C'), ('A', 'B')]
priority_list = ['A', 'B', 'C', 'D', 'F', 'G']
final_data = []
for data in x:
final_data.append(get_priority_val(data, priority_list))
print(final_data)
Output:
['D', 'A', 'B', 'B', 'A']
you can try using list comprehension:
ans = [d[0] if priority_list.index(d[0]) < priority_list.index(d[1] ) else d[1] for d in x ]
output:
['D', 'A', 'B', 'B', 'A']
You can do it in one line using a list comprehension :
[y[0] if priority_list.index(y[0]) < priority_list.index(y[1]) else y[1] for y in x]
Output :
['D', 'A', 'B', 'B', 'A']
You can create a dict containing priority values and just use min with a custom key
>>> priority_dict = {k:i for i,k in enumerate(priority_list)}
>>> [min(t, key=priority_dict.get) for t in x]
['D', 'A', 'B', 'B', 'A']

How to pop item from list and push it back to list based on condition using python

I have list with urls for crawling.['http://domain1.com','http://domain1.com/page1','http://domain2.com']
Code:
prev_domain = ''
while urls:
url = urls.pop()
if base_url(url) == prev_domain: # base_url is custom function return domain of an url
urls.append(url) # is this is possible?
continue
else:
crawl(url)
Basically I dont want to crawl webpages of same domain continuously. Continuosly crawling a domain url, return http response status code with 429: Too Many Requests. The user has sent too many requests in a given amount of time ("rate limiting"). To by-pass this issue, I'm planning to go with below logic.
Loop through all items in the list and compare current element base url with previously processed element base url.
If base urls are different then process for next step, otherwise do not process current element, just append this element to the same list.
Note : If urls in list are of same domain, make delay in processing each element and then execute.
Please provide your thoughts.
Your algorithm is almost correct, but not the implementation:
>>> L = [1,2,3]
>>> L.pop()
3
>>> L.append(3)
>>> L
[1, 2, 3]
That's why your program loops forever: if the domain is the same as the previous domain, you just append then pop then append, then.... What you need is not a stack, it's a round robin:
>>> L.pop()
3
>>> L.insert(0, 3)
>>> L
[3, 1, 2]
Let's take a shuffled list of permutations of "abcd":
>>> L = [('b', 'c', 'd', 'a'), ('d', 'c', 'b', 'a'), ('a', 'c', 'd', 'b'), ('c', 'd', 'a', 'b'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('a', 'b', 'd', 'c'), ('d', 'a', 'b', 'c'), ('a', 'b', 'c', 'd'), ('d', 'c', 'a', 'b'), ('a', 'd', 'c', 'b'), ('d', 'a', 'c', 'b'), ('c', 'd', 'b', 'a'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c'), ('a', 'd', 'b', 'c'), ('b', 'd', 'c', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd'), ('b', 'a', 'c', 'd')]
The first letter is the domain. Here's a slightly modified version of your code:
>>> prev = None
>>> while L:
... e = L.pop()
... if L and e[0] == prev:
... L.insert(0, e)
... else:
... print(e)
... prev = e[0]
('b', 'a', 'c', 'd')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('a', 'd', 'b', 'c')
('d', 'b', 'a', 'c')
('c', 'd', 'b', 'a')
('d', 'a', 'c', 'b')
('a', 'd', 'c', 'b')
('d', 'c', 'a', 'b')
('a', 'b', 'c', 'd')
('d', 'a', 'b', 'c')
('a', 'b', 'd', 'c')
('b', 'c', 'a', 'd')
('c', 'd', 'a', 'b')
('a', 'c', 'd', 'b')
('d', 'c', 'b', 'a')
('b', 'c', 'd', 'a')
('c', 'b', 'd', 'a')
('d', 'b', 'c', 'a')
('b', 'a', 'd', 'c')
('b', 'd', 'a', 'c')
The modification is: if L and, because if the last element of the list domain is prev, then you'll loop forever with your one element list: pop, same as prev, insert, pop, ...(as with pop/append)
Here's another option: create a dict domain -> list of urls:
>>> d = {}
>>> for e in L:
... d.setdefault(e[0], []).append(e)
>>> d
{'b': [('b', 'c', 'd', 'a'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('b', 'd', 'c', 'a'), ('b', 'a', 'c', 'd')], 'd': [('d', 'c', 'b', 'a'), ('d', 'a', 'b', 'c'), ('d', 'c', 'a', 'b'), ('d', 'a', 'c', 'b'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c')], 'a': [('a', 'c', 'd', 'b'), ('a', 'b', 'd', 'c'), ('a', 'b', 'c', 'd'), ('a', 'd', 'c', 'b'), ('a', 'd', 'b', 'c')], 'c': [('c', 'd', 'a', 'b'), ('c', 'd', 'b', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd')]}
Now, take an element of every domain and clear the dict, then loop until the dict is empty:
>>> while d:
... for k, vs in d.items():
... e = vs.pop()
... print (e)
... d = {k: vs for k, vs in d.items() if vs} # clear the dict
...
('b', 'a', 'c', 'd')
('d', 'b', 'a', 'c')
('a', 'd', 'b', 'c')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('d', 'b', 'c', 'a')
('a', 'd', 'c', 'b')
('c', 'b', 'd', 'a')
('b', 'c', 'a', 'd')
('d', 'a', 'c', 'b')
('a', 'b', 'c', 'd')
('c', 'd', 'b', 'a')
('b', 'a', 'd', 'c')
('d', 'c', 'a', 'b')
('a', 'b', 'd', 'c')
('c', 'd', 'a', 'b')
('b', 'd', 'a', 'c')
('d', 'a', 'b', 'c')
('a', 'c', 'd', 'b')
('b', 'c', 'd', 'a')
('d', 'c', 'b', 'a')
The output is more uniform.
Check the following code snippet,
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
domain = base_url(url)
if domain not in crowl_for_urls:
crawl_for_urls.update({domain:url})
crawl(url)
crawl() will be called only for unique domain.
Or you can use:
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
domain = base_url(url)
if domain not in crowl_for_urls:
crawl_for_urls.update({domain:[url]})
crawl(url)
else:
crawl_for_urls.get(domain, []).append(url)
This way you can categories the URL's based on domain and also can use crawl() for unique domain.

List of possible combinations of characters with OR statement

I managed to generate a list of all possible combinations of characters 'a', 'b' and 'c' (code below). Now I want to add a fourth character, which can be either 'd' or 'f' but NOT both in the same combination. How could I achieve this ?
items = ['a', 'b', 'c']
from itertools import permutations
for p in permutations(items):
print(p)
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
Created a new list items2 for d and f. Assuming that OP needs all combinations of [a,b,c,d] and [a,b,c,f]
items1 = ['a', 'b', 'c']
items2 = ['d','f']
from itertools import permutations
for x in items2:
for p in permutations(items1+[x]):
print(p)
A variation on #Van Peer's solution. You can modify the extended list in-place:
from itertools import permutations
items = list('abc_')
for items[3] in 'dg':
for p in permutations(items):
print(p)
itertools.product is suitable for representing these distinct groups in a way which generalizes well. Just let the exclusive elements belong to the same iterable passed to the Cartesian product.
For instance, to get a list with the items you're looking for,
from itertools import chain, permutations, product
list(chain.from_iterable(map(permutations, product(*items, 'df'))))
# [('a', 'b', 'c', 'd'),
# ('a', 'b', 'd', 'c'),
# ('a', 'c', 'b', 'd'),
# ('a', 'c', 'd', 'b'),
# ('a', 'd', 'b', 'c'),
# ('a', 'd', 'c', 'b'),
# ('b', 'a', 'c', 'd'),
# ('b', 'a', 'd', 'c'),
# ('b', 'c', 'a', 'd'),
# ('b', 'c', 'd', 'a'),
# ('b', 'd', 'a', 'c'),
# ('b', 'd', 'c', 'a'),
# ('c', 'a', 'b', 'd'),
# ('c', 'a', 'd', 'b'),
# ...
like this for example
items = ['a', 'b', 'c','d']
from itertools import permutations
for p in permutations(items):
print(p)
items = ['a', 'b', 'c','f']
from itertools import permutations
for p in permutations(items):
print(p)

How to create network data from lists

My data look like this:
data = [['A', 'B', 'C', 'D'],
['E', 'F', 'G'],
['I', 'J']]
I would like to transform the data to the following:
data = [['A', 'B'],
['A', 'C'],
['A', 'D'],
['B', 'C'],
['B', 'D'],
['C', 'D'],
['E', 'F'],
['E', 'G'],
['F', 'G'],
['I', 'J']]
My codes are not working:
for item in data:
count = len(item)
for i in range (0, count):
print item[i], item[i+1]
These codes need improvement. Any suggestion?
The main thing here is to use itertools.combinations() with each item of the list. See this example below
>>> from itertools import combinations
>>> list(combinations(['A', 'B', 'C', 'D'] , 2))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
It's fairly easy to then combine the results into a single list using a list comprehension or chain.from_iterable()
>>> data = [['A', 'B', 'C', 'D'],
... ['E', 'F', 'G'],
... ['I', 'J']]
>>> list(chain.from_iterable(combinations(x, 2) for x in data))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F'), ('E', 'G'), ('F', 'G'), ('I', 'J')]
As pointed out you can use itertool.combinations in combination with a list comprehension to flatten the list:
>>> from itertools import combinations
>>> [x for d in data for x in combinations(d, 2)]
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'),
('C', 'D'), ('E', 'F'), ('E', 'G'), ('F', 'G'), ('I', 'J')]
You can use itertools.combinations:
from itertools import combinations
data = [['A', 'B', 'C', 'D'],
['E', 'F', 'G'],
['I', 'J']]
result = []
for sublist in data:
result.extend(map(list, combinations(sublist, 2)))
print result
OUTPUT
[['A', 'B'], ['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D'], ['E', 'F'], ['E', 'G'], ['F', 'G'], ['I', 'J']]
You simply need 3 nested for loops
data2 = []
for item in data:
for i in range(0, len(item)-1):
for j in range(i+1, len(item)):
data2.append([item[i],item[j]])
print data2
Output:
[['A', 'B'], ['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D'],
['E', 'F'], ['E', 'G'], ['F', 'G'], ['I', 'J']]
Here is a python one liner
pairs = [ [item[i], item[j]] for item in data for i in range(len(item)) for j in range(i + 1, len(item))]

Categories

Resources