Iterate over every possible n-sampling of a list - python

If I understand itertools "combinatoric iterators" doc correctly, the idea is to provide a set of standard functions for every common combinatory iteration.
But I miss one today. I need to iterate over every ordered combinations of items with repetitions.
combination_with_replacement('abcd', 4) yields
('a', 'a', 'a', 'a')
('a', 'a', 'a', 'b')
('a', 'a', 'a', 'c')
('a', 'a', 'a', 'd')
('a', 'a', 'b', 'b')
('a', 'a', 'b', 'c')
('a', 'a', 'b', 'd')
('a', 'a', 'c', 'c')
('a', 'a', 'c', 'd')
... etc
but (even though the results are sorted tuples), these combinations are not ordered.
I expect more results from an ideal ordered_combination_with_replacement('abcd', 4) for I need to distinguish between
('a', 'a', 'a', 'a')
('a', 'a', 'a', 'b')
('a', 'a', 'b', 'a')
('a', 'b', 'a', 'a')
('b', 'a', 'a', 'a')
('a', 'a', 'a', 'c')
('a', 'a', 'c', 'a')
... etc
In other words: order matters today.
Does itertool provide such an iteration? Why not, or why have I missed it?
What's a standard way to iterate over those?
Do I need to write this generic iterator myself?

To wrap up some of the comments, there are (at least) two ways to do that:
itertools.combinations_with_replacement("abcd", 4)
and
itertools.product("abcd", repeat=4)
Both produce the required results of:
[('a', 'a', 'a', 'a'),
('a', 'a', 'a', 'b'),
('a', 'a', 'a', 'c'),
('a', 'a', 'a', 'd'),
('a', 'a', 'b', 'a'),
...

Related

How to pop item from list and push it back to list based on condition using python

I have list with urls for crawling.['http://domain1.com','http://domain1.com/page1','http://domain2.com']
Code:
prev_domain = ''
while urls:
url = urls.pop()
if base_url(url) == prev_domain: # base_url is custom function return domain of an url
urls.append(url) # is this is possible?
continue
else:
crawl(url)
Basically I dont want to crawl webpages of same domain continuously. Continuosly crawling a domain url, return http response status code with 429: Too Many Requests. The user has sent too many requests in a given amount of time ("rate limiting"). To by-pass this issue, I'm planning to go with below logic.
Loop through all items in the list and compare current element base url with previously processed element base url.
If base urls are different then process for next step, otherwise do not process current element, just append this element to the same list.
Note : If urls in list are of same domain, make delay in processing each element and then execute.
Please provide your thoughts.
Your algorithm is almost correct, but not the implementation:
>>> L = [1,2,3]
>>> L.pop()
3
>>> L.append(3)
>>> L
[1, 2, 3]
That's why your program loops forever: if the domain is the same as the previous domain, you just append then pop then append, then.... What you need is not a stack, it's a round robin:
>>> L.pop()
3
>>> L.insert(0, 3)
>>> L
[3, 1, 2]
Let's take a shuffled list of permutations of "abcd":
>>> L = [('b', 'c', 'd', 'a'), ('d', 'c', 'b', 'a'), ('a', 'c', 'd', 'b'), ('c', 'd', 'a', 'b'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('a', 'b', 'd', 'c'), ('d', 'a', 'b', 'c'), ('a', 'b', 'c', 'd'), ('d', 'c', 'a', 'b'), ('a', 'd', 'c', 'b'), ('d', 'a', 'c', 'b'), ('c', 'd', 'b', 'a'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c'), ('a', 'd', 'b', 'c'), ('b', 'd', 'c', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd'), ('b', 'a', 'c', 'd')]
The first letter is the domain. Here's a slightly modified version of your code:
>>> prev = None
>>> while L:
... e = L.pop()
... if L and e[0] == prev:
... L.insert(0, e)
... else:
... print(e)
... prev = e[0]
('b', 'a', 'c', 'd')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('a', 'd', 'b', 'c')
('d', 'b', 'a', 'c')
('c', 'd', 'b', 'a')
('d', 'a', 'c', 'b')
('a', 'd', 'c', 'b')
('d', 'c', 'a', 'b')
('a', 'b', 'c', 'd')
('d', 'a', 'b', 'c')
('a', 'b', 'd', 'c')
('b', 'c', 'a', 'd')
('c', 'd', 'a', 'b')
('a', 'c', 'd', 'b')
('d', 'c', 'b', 'a')
('b', 'c', 'd', 'a')
('c', 'b', 'd', 'a')
('d', 'b', 'c', 'a')
('b', 'a', 'd', 'c')
('b', 'd', 'a', 'c')
The modification is: if L and, because if the last element of the list domain is prev, then you'll loop forever with your one element list: pop, same as prev, insert, pop, ...(as with pop/append)
Here's another option: create a dict domain -> list of urls:
>>> d = {}
>>> for e in L:
... d.setdefault(e[0], []).append(e)
>>> d
{'b': [('b', 'c', 'd', 'a'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('b', 'd', 'c', 'a'), ('b', 'a', 'c', 'd')], 'd': [('d', 'c', 'b', 'a'), ('d', 'a', 'b', 'c'), ('d', 'c', 'a', 'b'), ('d', 'a', 'c', 'b'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c')], 'a': [('a', 'c', 'd', 'b'), ('a', 'b', 'd', 'c'), ('a', 'b', 'c', 'd'), ('a', 'd', 'c', 'b'), ('a', 'd', 'b', 'c')], 'c': [('c', 'd', 'a', 'b'), ('c', 'd', 'b', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd')]}
Now, take an element of every domain and clear the dict, then loop until the dict is empty:
>>> while d:
... for k, vs in d.items():
... e = vs.pop()
... print (e)
... d = {k: vs for k, vs in d.items() if vs} # clear the dict
...
('b', 'a', 'c', 'd')
('d', 'b', 'a', 'c')
('a', 'd', 'b', 'c')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('d', 'b', 'c', 'a')
('a', 'd', 'c', 'b')
('c', 'b', 'd', 'a')
('b', 'c', 'a', 'd')
('d', 'a', 'c', 'b')
('a', 'b', 'c', 'd')
('c', 'd', 'b', 'a')
('b', 'a', 'd', 'c')
('d', 'c', 'a', 'b')
('a', 'b', 'd', 'c')
('c', 'd', 'a', 'b')
('b', 'd', 'a', 'c')
('d', 'a', 'b', 'c')
('a', 'c', 'd', 'b')
('b', 'c', 'd', 'a')
('d', 'c', 'b', 'a')
The output is more uniform.
Check the following code snippet,
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
domain = base_url(url)
if domain not in crowl_for_urls:
crawl_for_urls.update({domain:url})
crawl(url)
crawl() will be called only for unique domain.
Or you can use:
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
domain = base_url(url)
if domain not in crowl_for_urls:
crawl_for_urls.update({domain:[url]})
crawl(url)
else:
crawl_for_urls.get(domain, []).append(url)
This way you can categories the URL's based on domain and also can use crawl() for unique domain.

Reading a list of tuples from a text file with strings and numbers

I have a text file where each line represents the results from a sequence mining operation. So the first element in each tuple is a tuple of strings (letters), and the second element is the frequency (int).
How can I read these back from the text file into the original format? Format as follows, copied directly from the text file.... Can't seem to find any similar examples out there but there's got to be a way to do this easily.
(('a',), 30838057)
(('a', 'b'), 23151399)
(('a', 'b', 'c'), 13865674)
(('a', 'b', 'c', 'e'), 8979035)
(('a', 'b', 'c', 'e', 'f'), 6771982)
(('a', 'b', 'c', 'e', 'f', 'g'), 4514076)
(('a', 'b', 'c', 'e', 'f', 'g', 'h'), 2403374)
As other have commented you can use the ast.literal_eval() function since your data appears to be formatted the same a Python literals:
import ast
from pprint import pprint
filename = 'tuples_list.txt'
tuple_list = []
with open(filename) as inp:
for line in inp:
values = ast.literal_eval(line)
tuple_list.append(values)
pprint(tuple_list)
Output:
[(('a',), 30838057),
(('a', 'b'), 23151399),
(('a', 'b', 'c'), 13865674),
(('a', 'b', 'c', 'e'), 8979035),
(('a', 'b', 'c', 'e', 'f'), 6771982),
(('a', 'b', 'c', 'e', 'f', 'g'), 4514076),
(('a', 'b', 'c', 'e', 'f', 'g', 'h'), 2403374)]

Python - How to get all Combinations of names from a list: TypeError: 'list' object is not callable

I get an error when trying to print the permutation/combination of a user generated list of names.
I tried a couple of things from itertools, but can't get either permutations or combinations to work. Ran into some other errors along the way regarding concatenating strings, but currently getting a: TypeError: 'list' object not callable.
I know I'm making a simple mistake, but can't sort it out. Please help!
from itertools import combinations
name_list = []
for i in range(0,20):
name = input('Add up to 20 names.\nWhen finished, enter "Done" to see all first and middle name combinations.\nName: ')
name_list.append(name)
if name != 'Done':
print(name_list)
else:
name_list.remove('Done')
print(name_list(combinations))
I expect:
1) the user adds a name to list
2) the list prints showing user contents of list
3) when finished the user inputs "Done":
a) 'Done' is removed from the list
b) all the combinations of the remaining items on the list printed
Permutations and combinations are two different beasts. Look:
>>> from itertools import permutations,combinations
>>> from pprint import pprint
>>> l = ['a', 'b', 'c', 'd']
>>> pprint(list(combinations(l, 2)))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
>>> pprint(list(permutations(l)))
[('a', 'b', 'c', 'd'),
('a', 'b', 'd', 'c'),
('a', 'c', 'b', 'd'),
('a', 'c', 'd', 'b'),
('a', 'd', 'b', 'c'),
('a', 'd', 'c', 'b'),
('b', 'a', 'c', 'd'),
('b', 'a', 'd', 'c'),
('b', 'c', 'a', 'd'),
('b', 'c', 'd', 'a'),
('b', 'd', 'a', 'c'),
('b', 'd', 'c', 'a'),
('c', 'a', 'b', 'd'),
('c', 'a', 'd', 'b'),
('c', 'b', 'a', 'd'),
('c', 'b', 'd', 'a'),
('c', 'd', 'a', 'b'),
('c', 'd', 'b', 'a'),
('d', 'a', 'b', 'c'),
('d', 'a', 'c', 'b'),
('d', 'b', 'a', 'c'),
('d', 'b', 'c', 'a'),
('d', 'c', 'a', 'b'),
('d', 'c', 'b', 'a')]
>>>
for use combinations , you need to give the r as argument.this code give all combinations for all numbers(0 to length of list),
from itertools import combinations
name_list = []
for i in range(0,20):
name = raw_input('Add up to 20 names.\nWhen finished, enter "Done" to see all first and middle name combinations.\nName: ')
name_list.append(name)
if name != 'Done':
print(name_list)
else:
name_list.remove('Done')
break
for i in range(len(name_list) + 1):
print(list(combinations(name_list, i)))
print("\n")

List of possible combinations of characters with OR statement

I managed to generate a list of all possible combinations of characters 'a', 'b' and 'c' (code below). Now I want to add a fourth character, which can be either 'd' or 'f' but NOT both in the same combination. How could I achieve this ?
items = ['a', 'b', 'c']
from itertools import permutations
for p in permutations(items):
print(p)
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
Created a new list items2 for d and f. Assuming that OP needs all combinations of [a,b,c,d] and [a,b,c,f]
items1 = ['a', 'b', 'c']
items2 = ['d','f']
from itertools import permutations
for x in items2:
for p in permutations(items1+[x]):
print(p)
A variation on #Van Peer's solution. You can modify the extended list in-place:
from itertools import permutations
items = list('abc_')
for items[3] in 'dg':
for p in permutations(items):
print(p)
itertools.product is suitable for representing these distinct groups in a way which generalizes well. Just let the exclusive elements belong to the same iterable passed to the Cartesian product.
For instance, to get a list with the items you're looking for,
from itertools import chain, permutations, product
list(chain.from_iterable(map(permutations, product(*items, 'df'))))
# [('a', 'b', 'c', 'd'),
# ('a', 'b', 'd', 'c'),
# ('a', 'c', 'b', 'd'),
# ('a', 'c', 'd', 'b'),
# ('a', 'd', 'b', 'c'),
# ('a', 'd', 'c', 'b'),
# ('b', 'a', 'c', 'd'),
# ('b', 'a', 'd', 'c'),
# ('b', 'c', 'a', 'd'),
# ('b', 'c', 'd', 'a'),
# ('b', 'd', 'a', 'c'),
# ('b', 'd', 'c', 'a'),
# ('c', 'a', 'b', 'd'),
# ('c', 'a', 'd', 'b'),
# ...
like this for example
items = ['a', 'b', 'c','d']
from itertools import permutations
for p in permutations(items):
print(p)
items = ['a', 'b', 'c','f']
from itertools import permutations
for p in permutations(items):
print(p)

How to use itertools to compute all combinations with repeating elements? [duplicate]

This question already has an answer here:
Which itertools generator doesn't skip any combinations?
(1 answer)
Closed 8 years ago.
I have tried to use itertools to compute all combinations of a list ['a', 'b', 'c'] using combinations_with_replacement with repeating elements. The problem is in the fact that the indices seem to be used to distinguish the elements:
Return r length subsequences of elements from the input iterable allowing individual elements to be repeated more than once.
Combinations are emitted in lexicographic sort order. So, if the input
iterable is sorted, the combination tuples will be produced in sorted
order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, the generated combinations
will also be unique.
Sot this code snippet:
import itertools
for item in itertools.combinations_with_replacement(['a','b','c'], 3):
print (item)
results in this output:
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
And what I need is the combination set to contain elements like: ('a', 'b', 'a') which seem to be missing. How to compute the complete combination set?
It sounds like you want itertools.product:
>>> from itertools import product
>>> for item in product(['a', 'b', 'c'], repeat=3):
... print item
...
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'a')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'a')
('a', 'c', 'b')
('a', 'c', 'c')
('b', 'a', 'a')
('b', 'a', 'b')
('b', 'a', 'c')
('b', 'b', 'a')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'a')
('b', 'c', 'b')
('b', 'c', 'c')
('c', 'a', 'a')
('c', 'a', 'b')
('c', 'a', 'c')
('c', 'b', 'a')
('c', 'b', 'b')
('c', 'b', 'c')
('c', 'c', 'a')
('c', 'c', 'b')
('c', 'c', 'c')
>>>
For such small sequences you could use no itertools at all:
abc = ("a", "b", "c")
print [(x, y, z) for x in abc for y in abc for z in abc]
# output:
[('a', 'a', 'a'),
('a', 'a', 'b'),
('a', 'a', 'c'),
('a', 'b', 'a'),
('a', 'b', 'b'),
('a', 'b', 'c'),
('a', 'c', 'a'),
('a', 'c', 'b'),
('a', 'c', 'c'),
('b', 'a', 'a'),
('b', 'a', 'b'),
('b', 'a', 'c'),
('b', 'b', 'a'),
('b', 'b', 'b'),
('b', 'b', 'c'),
('b', 'c', 'a'),
('b', 'c', 'b'),
('b', 'c', 'c'),
('c', 'a', 'a'),
('c', 'a', 'b'),
('c', 'a', 'c'),
('c', 'b', 'a'),
('c', 'b', 'b'),
('c', 'b', 'c'),
('c', 'c', 'a'),
('c', 'c', 'b'),
('c', 'c', 'c')]

Categories

Resources