concurrently iterating through even and odd items of list - python

I have a list of items (HTML table rows extracted with Beautiful Soup), and I need to iterate over the list and get one even-indexed and one odd-indexed element on each iteration of the loop.
My code looks like this:
from itertools import izip  # Python 2; on Python 3 use the built-in zip

for top, bottom in izip(table[::2], table[1::2]):
    # do something with top
    # do something else with bottom
    pass
How can I make this code less ugly? Or is this a good way to do it?
EDIT:
table[1::2], table[::2] => table[::2], table[1::2]

izip is a pretty good option, but here are a few alternatives since you're unhappy with it:
>>> def chunker(seq, size):
...     return (tuple(seq[pos:pos+size]) for pos in xrange(0, len(seq), size))
...
>>> x = range(11)
>>> x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> chunker(x, 2)
<generator object <genexpr> at 0x00B44328>
>>> list(chunker(x, 2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10,)]
>>> list(izip(x[1::2], x[::2]))
[(1, 0), (3, 2), (5, 4), (7, 6), (9, 8)]
As you can see, this has the advantage of properly handling an uneven number of elements, which may or may not be important to you. There's also this recipe from the itertools documentation itself:
>>> def grouper(n, iterable, fillvalue=None):
...     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
...     args = [iter(iterable)] * n
...     return izip_longest(fillvalue=fillvalue, *args)
...
>>>
>>> from itertools import izip_longest
>>> list(grouper(2, x))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, None)]
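On Python 3, izip_longest was renamed itertools.zip_longest. The trick in grouper is that [iter(iterable)] * n holds n references to the same iterator, so each output tuple pulls n consecutive items. A minimal Python 3 sketch of the same recipe applied to the row pairing from the question (the rows here are placeholder strings, not Beautiful Soup objects):

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    # n references to ONE iterator: zip_longest advances it n times per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

rows = ["row0", "row1", "row2", "row3", "row4"]  # placeholder table rows
for top, bottom in grouper(2, rows):
    # bottom is None for the last, unpaired row
    print(top, bottom)
```

Because the padding comes from fillvalue, an odd number of rows is kept rather than silently dropped, which is the same trade-off as chunker above.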

Try:
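def alternate(i):
    i = iter(i)
    while True:
        try:
            yield next(i), next(i)
        except StopIteration:
            return  # required on Python 3.7+, where a leaked StopIteration becomes RuntimeError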
>>> list(alternate(range(10)))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
This solution works on any iterable, not just lists, and doesn't copy the sequence (so it is far more efficient if you only want the first few elements of a long sequence).

Looks good. My only suggestion would be to wrap this in a function or method. That way, you can give it a name (evenOddIter()) which makes it much more readable.
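A minimal Python 3 sketch of that suggestion, keeping the answer's name evenOddIter and using the built-in zip in place of izip. Like the original loop, it silently drops a trailing unpaired item:

```python
def evenOddIter(seq):
    """Yield (even-indexed, odd-indexed) pairs: (seq[0], seq[1]), (seq[2], seq[3]), ..."""
    return zip(seq[::2], seq[1::2])

table = ["tr0", "tr1", "tr2", "tr3"]  # placeholder rows
for top, bottom in evenOddIter(table):
    print(top, bottom)
```

The loop body now says what it iterates over, and the slicing details live in one place.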

Related

Given a List get all the combinations of tuples without duplicated results

I have a list lst = [1, 2, 3, 4],
and I want tuples for all the positions in a matrix, so the result would be:
(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),(4,1),(4,2),(4,3),(4,4)
I've seen several snippets that return all the combinations, but I don't know how to restrict them to pairs or how to include (1,1), (2,2), (3,3), (4,4).
Thank you in advance.
You just need a double loop. A generator makes it easy to use:
lst = [1,2,3,4]
def matrix(lst):
    for i in range(len(lst)):
        for j in range(len(lst)):
            yield lst[i], lst[j]
output = [t for t in matrix(lst)]
print(output)
Output:
[(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2), (4, 3), (4, 4)]
If you just want to make pairs of all the elements in the list:
tuple_pairs = [(r,c) for r in lst for c in lst]
If you instead have maximum row/column numbers max_row and max_col, you could skip building lst = [1, 2, 3, 4] and write:
tuple_pairs = [(r,c) for r in range(1,max_row+1) for c in range(1,max_col+1)]
But that assumes lst was only ever meant to be a range such as range(1, some_num).
Use itertools.product to get all possible ordered combinations (the Cartesian product) of an iterable. product is roughly equivalent to nested for-loops, with the nesting depth set by the keyword parameter repeat. It returns an iterator.
from itertools import product
lst = [1, 2, 3, 4]
combos = product(lst, repeat=2)
combos = list(combos) # cast to list
print(*combos, sep=' ')
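To make the "roughly equivalent to nested for-loops" claim concrete, this sketch checks that product(lst, repeat=2) yields the same pairs, in the same order, as the double comprehension from the earlier answer, and shows how a simple filter drops the diagonal if it is not wanted:

```python
from itertools import product

lst = [1, 2, 3, 4]

nested = [(r, c) for r in lst for c in lst]  # outer loop varies slowest
combos = list(product(lst, repeat=2))        # same as product(lst, lst)
assert combos == nested                      # same pairs, same order

# Off-diagonal pairs only, if (1, 1), (2, 2), ... are not wanted
off_diagonal = [(r, c) for r, c in combos if r != c]
print(len(combos), len(off_diagonal))  # 16 12
```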
Diagonal terms can be found in a single loop (without any extra imports):
repeat = 2
diagonal = [(i,)*repeat for i in lst]
print(*diagonal, sep=' ')
You can do that using a list comprehension.
lst=[1,2,3,4]
out=[(i,i) for i in lst]
print(out)
Output:
[(1, 1), (2, 2), (3, 3), (4, 4)]

for loop using enumerate terminates unexpectedly

Here is a simple for loop over an enumerate object. It terminates early because of the line marked with a comment. Why is that?
enum_arr = enumerate(arr)
for ele in enum_arr:
    print(ele)
    print(list(enum_arr)[ele[0]:])  # terminates due to this line
Output:
(0, 0)
[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
If I comment out the second print statement, then:
Output:
(0, 0)
(1, 1)
(2, 2)
(3, 3)
(4, 4)
(5, 5)
As expected.
Why is this happening?
enumerate() gives you an iterator object. Iterators are like a bookmark in a book that can only be moved forward; once you reach the end of the book you can't go back anymore, and have to make a new bookmark.
You then use that iterator in two places; the for loop and list(). The list() function moved the bookmark all the way to the end, so the for loop can't move it any further.
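A small sketch of the bookmark behaviour, using a placeholder list for arr: once list() has consumed the enumerate iterator, nothing is left for the loop or for a second list() call.

```python
arr = [10, 20, 30]  # placeholder data
enum_arr = enumerate(arr)

print(next(enum_arr))  # (0, 10): the bookmark moves forward one step
print(list(enum_arr))  # [(1, 20), (2, 30)]: list() moves it to the end
print(list(enum_arr))  # []: the iterator is exhausted and cannot rewind
```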
You'd have to create a new enumerate() object in the loop if you want to use a separate, independent iterator:
enum_arr = enumerate(arr)
for ele in enum_arr:
    print(ele)
    print(list(enumerate(arr[ele[0]:], ele[0])))
This does require that arr is not itself an iterator; it has to be a sequence so you can index into it. I'm assuming here that you have a list, tuple, range or similar value.
Note that I passed in ele[0] twice; the second argument to enumerate() lets you set the start value of the counter.
It is easier to use a tuple assignment here to separate out the count and value:
for count, value in enum_arr:
    print((count, value))
    print(list(enumerate(arr[count:], count)))
Demo:
>>> arr = range(6)
>>> enum_arr = enumerate(arr)
>>> for count, value in enum_arr:
...     print((count, value))
...     print(list(enumerate(arr[count:], count)))
...
(0, 0)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
(1, 1)
[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
(2, 2)
[(2, 2), (3, 3), (4, 4), (5, 5)]
(3, 3)
[(3, 3), (4, 4), (5, 5)]
(4, 4)
[(4, 4), (5, 5)]
(5, 5)
[(5, 5)]
Coming back to the book analogy, and the requirement that arr is a sequence: as long as arr is a book with page numbers, you can add more bookmarks at any point. If it is some other iterable type, then you can't index into it, and you would have to find some other means to 'skip ahead' and back again. Stretching the analogy further: say the book is being streamed to you, one page at a time; then you can't go back once you have received all the pages. The solution could be to create a local cache of pages first; if you can spare the memory, that could be done with cached_copy = list(arr). Just make sure that the book you are receiving is not so long that it requires more space than you actually have. And some iterables are endless, so they would require infinite memory!
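Following the cache suggestion, here is a sketch for the case where arr is a one-shot iterable. The generator stream() is a hypothetical stand-in for something like a streamed file; only the name cached_copy comes from the answer. The data is copied into a list once, and the copy is then indexed as in the demo above:

```python
def stream():
    # hypothetical one-shot source that cannot be indexed or replayed
    yield from range(4)

arr = stream()
cached_copy = list(arr)  # read everything once; needs memory for every item

for count, value in enumerate(cached_copy):
    # slicing works on the cached list, unlike on the generator itself
    print((count, value), list(enumerate(cached_copy[count:], count)))
```

After list(arr) the generator itself is exhausted, so all further work must go through cached_copy.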

More memory efficient way of making a dictionary?

VERY sorry for the vagueness, but I don't actually know what part of what I'm doing is inefficient.
I've made a program that takes a list of positive integers (example*):
[1, 1, 3, 5, 16, 2, 4, 6, 6, 8, 9, 24, 200,]
*the real lists can be up to 2000 in length and the elements between 0 and 100,000 exclusive
It then creates a dictionary where each key is a number tupled with its index, (number, index), and the value for each key is a list of every number (with that number's index) in the input that the key's number divides evenly.
So the entry for the 3 would be: (3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]
My code is this:
num_dict = {}
sorted_list = sorted(beginning_list)
for a2, a in enumerate(sorted_list):
    num_dict[(a, a2)] = []
for x2, x in enumerate(sorted_list):
    for y2, y in enumerate(sorted_list[x2 + 1:]):
        if y % x == 0:
            pair = (y, y2 + x2 + 1)
            num_dict[(x, x2)].append(pair)
But, when I run this script, I hit a MemoryError.
I understand that this means I am running out of memory, but in my situation adding more RAM or switching to a 64-bit version of Python is not an option.
I am certain that the problem is not coming from the list sorting or the first for loop. It has to be the second for loop. I just included the other lines for context.
The full output for the list above would be (sorry that it's unsorted; that's just how dictionaries behave):
(200, 12): []
(6, 7): [(24, 11)]
(16, 10): []
(6, 6): [(6, 7), (24, 11)]
(5, 5): [(200, 12)]
(4, 4): [(8, 8), (16, 10), (24, 11), (200, 12)]
(9, 9): []
(8, 8): [(16, 10), (24, 11), (200, 12)]
(2, 2): [(4, 4), (6, 6), (6, 7), (8, 8), (16, 10), (24, 11), (200, 12)]
(24, 11): []
(1, 0): [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(1, 1): [(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(3, 3): [(6, 6), (6, 7), (9, 9), (24, 11)]
Is there a better way of going about this?
EDIT:
This dictionary will then be fed into this:
ans_set = set()
for x in num_dict:
    for y in num_dict[x]:
        for z in num_dict[y]:
            ans_set.add((x[0], y[0], z[0]))
return len(ans_set)
to find all unique possible triplets in which the 3rd value can be evenly divided by the 2nd value which can be evenly divided by the 1st.
If you think you know of a better way of doing the entire thing, I'm open to redoing the whole of it.
Final Edit
I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.
def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total
And here's a version of the function that explains the thought process as it goes (not suitable for huge lists because of all the print output):
def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly divided by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)
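One way to sanity-check foo is to compare it with a brute-force count over index triples i < j < k using itertools.combinations. This sketch assumes, as foo's loops do, that a triple means three positions in the list (so repeated values count separately) rather than the unique value triples collected in ans_set earlier. brute_force is a hypothetical helper, only practical for small lists:

```python
from itertools import combinations

def foo(l):
    # the counting function from the edit above
    llen = len(l)
    total = 0
    cache = {i: 0 for i in range(llen)}  # cache[k]: number of j < k with l[j] dividing l[k]
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]  # every chain ending at x extends to y
    return total

def brute_force(l):
    # O(n^3) reference: check every i < j < k directly
    return sum(1 for i, j, k in combinations(range(len(l)), 3)
               if l[j] % l[i] == 0 and l[k] % l[j] == 0)

sample = [1, 2, 3, 4, 5, 6]
print(foo(sample), brute_force(sample))  # 3 3
```

foo works because cache[x] is already final when the outer loop reaches x: every pair (w, x) with w < x was visited in an earlier outer iteration.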
Well, you could start by not unnecessarily duplicating information.
Storing full tuples (number and index) for each multiple is inefficient when you already have that information available.
For example, rather than:
(3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]
(the 16 appears to be wrong there as it's not a multiple of 3 so I'm guessing you meant 15) you could instead opt for:
(3, 2): [15, 6, 9, 24]
(6, 7): ...
That pretty much halves your storage needs since you can go from the 6 in the list and find all its indexes by searching the tuples. That will, of course, be extra processing effort to traverse the list but it's probably better to have a slower working solution than a faster non-working one :-)
You could reduce the storage even more by not storing the multiples at all, instead running through the tuple list using % to see if you have a multiple.
But, of course, this all depends on your actual requirements; you would be better off stating what you're trying to achieve rather than presupposing a solution.
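A sketch of the "don't store the multiples at all" idea: keep only the sorted (value, index) list and generate each key's multiples on demand with %. The helper name multiples_after is hypothetical, and values are assumed positive, as in the question:

```python
def multiples_after(sorted_index, pos):
    """Lazily yield the (value, index) pairs after position pos whose value is a multiple."""
    v = sorted_index[pos][0]
    for j in range(pos + 1, len(sorted_index)):  # index instead of slicing, so nothing is copied
        if sorted_index[j][0] % v == 0:
            yield sorted_index[j]

l = [3, 1, 6, 2]
sorted_index = sorted((v, i) for i, v in enumerate(l))  # [(1, 1), (2, 3), (3, 0), (6, 2)]
print(list(multiples_after(sorted_index, 1)))  # [(6, 2)]
```

Memory stays proportional to the list length instead of the number of divisor pairs, at the cost of redoing the % checks every time a key's multiples are needed.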
You rebuild tuples in places like pair = (y, y2 + x2 + 1) and num_dict[(x, x2)].append(pair) when you could build a canonical set of tuples early on and then just put references in the containers. I cobbled together a 2000-item test that works on my machine. I have Python 3.4 64-bit with a relatively modest 3.5 GB of RAM...
import random
# a test list that should generate longish lists
# (values start at 1: the question's values are positive, and a 0 would raise ZeroDivisionError below)
l = list(random.randint(1, 2000) for _ in range(2000))
# set up the canonical (value, index) tuples, sorted ascending
sorted_index = sorted((v,i) for i,v in enumerate(l))
num_dict = {}
for idx, vi in enumerate(sorted_index):
    v = vi[0]
    # store references to the existing tuples instead of building new ones
    num_dict[vi] = [vi2 for vi2 in sorted_index[idx+1:] if not vi2[0] % v]
for item in num_dict.items():
    print(item)

returning a list of tuples like zip, generate it incrementally a tuple at a time, using comprehensions

I need a function that returns one tuple at a time from a list of sequences, without using zip. I tried to do it this way:
gen1=[(x,y)for x in range(3) for y in range(4)]
which gives the following list:
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
Next I tried to get one tuple at a time with:
next(gen1)
But I got an error saying that a list is not an iterator. How can I do this using generators?
There are still some ambiguities in the question, but if what you're trying to do is make a generator version of zip that works with an arbitrary number of sequences, the following should work well:
def generator_zip(*args):
    iterators = [iter(arg) for arg in args]  # a list, not a one-shot map object
    while iterators:
        try:
            # a list comprehension lets StopIteration reach the except clause
            yield tuple([next(it) for it in iterators])
        except StopIteration:
            return  # stop once the shortest input is exhausted
First it turns each arg into an iterator, then it keeps yielding tuples that contain the next entry from each iterator until the shortest input is exhausted.
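On Python 3, advancing the iterators with tuple(map(next, iterators)) has a subtle pitfall: when the shortest input runs out, next() raises StopIteration inside map, and tuple() treats that as the normal end of map, so it silently returns a short tuple instead of stopping. A list comprehension lets the exception escape. A small sketch of the difference:

```python
a = iter([1, 2])
b = iter([])  # already exhausted

# StopIteration from next(b) ends map() early, so tuple() just returns (1,)
print(tuple(map(next, [a, b])))  # (1,)

try:
    [next(it) for it in [a, b]]  # a list comprehension lets StopIteration escape
except StopIteration:
    print("shortest input exhausted")
```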
As of Python 2.4, you can do:
gen1 = ((x, y) for x in range(3) for y in range(4))
Note that you can always make a generator (well, iterator) from a list with iter:
gen1 = iter([(x, y) for x in range(3) for y in range(4)])
There is no difference in usage. The second way requires the whole list to be in memory, though, while the first does not.
Note that you can also use the built-in zip, which returns a lazy iterator in Python 3. In Python 2, use itertools.izip.
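A short sketch of the distinction the question ran into: next() needs an iterator, so it fails on the list but works on the generator expression or on iter(list):

```python
gen1 = ((x, y) for x in range(3) for y in range(4))  # generator expression
print(next(gen1))  # (0, 0)
print(next(gen1))  # (0, 1)

lst1 = [(x, y) for x in range(3) for y in range(4)]  # a list, not an iterator
try:
    next(lst1)
except TypeError as exc:
    print(exc)  # 'list' object is not an iterator

it = iter(lst1)
print(next(it))  # (0, 0)
```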
Python 3:
>>> zip(range(0, 5), range(3, 8))
<zip object at 0x7f07519b3b90>
>>> list(zip(range(0, 5), range(3, 8)))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7)]
Python < 3:
>>> from itertools import izip
>>> izip(range(0, 5), range(3, 8))
<itertools.izip object at 0x7f5247807440>
>>> list(izip(range(0, 5), range(3, 8)))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7)]
>>> zip(range(0, 5), range(3, 8))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7)]

Outerzip / zip longest function (with multiple fill values)

Is there a Python function for an "outer zip", i.e. an extension of zip with a different default value for each iterable?
a = [1, 2, 3] # associate a default value 0
b = [4, 5, 6, 7] # associate b default value 1
zip(a,b) # [(1, 4), (2, 5), (3, 6)]
outerzip((a, 0), (b, 1)) = [(1, 4), (2, 5), (3, 6), (0, 7)]
outerzip((b, 0), (a, 1)) = [(4, 1), (5, 2), (6, 3), (7, 1)]
I can almost replicate this outerzip function using map, but with None as the only default:
map(None, a, b) # [(1, 4), (2, 5), (3, 6), (None, 7)]
Note1: The built-in zip function takes an arbitrary number of iterables, and so should an outerzip function. (e.g. one should be able to calculate outerzip((a,0),(a,0),(b,1)) similarly to zip(a,a,b) and map(None, a, a, b).)
Note2: I say "outer-zip", in the style of this haskell question, but perhaps this is not correct terminology.
It's called izip_longest (zip_longest in Python 3.x), although it takes a single fillvalue shared by all the iterables:
>>> from itertools import zip_longest
>>> a = [1,2,3]
>>> b = [4,5,6,7]
>>> list(zip_longest(a, b, fillvalue=0))
[(1, 4), (2, 5), (3, 6), (0, 7)]
You could modify zip_longest to support your use case for general iterables.
from itertools import chain, repeat
class OuterZipStopIteration(Exception):
    pass
def outer_zip(*args):
    count = len(args) - 1  # how many inputs may still run out before we stop
    def sentinel(default):
        nonlocal count
        if not count:
            raise OuterZipStopIteration  # the last remaining input just ran out
        count -= 1
        yield default
    iters = [chain(p, sentinel(default), repeat(default)) for p, default in args]
    try:
        while iters:
            yield tuple(map(next, iters))
    except OuterZipStopIteration:
        pass
print(list(outer_zip( ("abcd", '!'),
                      ("ef", '#'),
                      (map(int, '345'), '$') )))
# [('a', 'e', 3), ('b', 'f', 4), ('c', '#', 5), ('d', '#', '$')]
This function can be defined by padding each input list and zipping:
def outerzip(*args):
    # args = (a, default_a), (b, default_b), ...
    max_length = max(len(seq) for seq, _ in args)
    # pad each list with its own default up to the longest length
    extended_args = [list(seq) + [default] * (max_length - len(seq)) for seq, default in args]
    return list(zip(*extended_args))
outerzip((a, 0), (b, 1)) # [(1, 4), (2, 5), (3, 6), (0, 7)]
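Another sketch, assuming Python 3: zip_longest accepts only one fillvalue, but a private sentinel object can stand in for it and then be swapped for each column's own default. Unlike the padding approach above, this works with any iterables (not just lists) and stays lazy. The name outerzip mirrors the question; _MISSING is an illustrative helper:

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel that cannot collide with real data

def outerzip(*pairs):
    """Lazily zip (iterable, default) pairs, padding each column with its own default."""
    iterables = [iterable for iterable, _ in pairs]
    defaults = [default for _, default in pairs]
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        # replace the sentinel with the default of the column it appears in
        yield tuple(d if v is _MISSING else v for v, d in zip(row, defaults))

a = [1, 2, 3]
b = [4, 5, 6, 7]
print(list(outerzip((a, 0), (b, 1))))  # [(1, 4), (2, 5), (3, 6), (0, 7)]
print(list(outerzip((b, 0), (a, 1))))  # [(4, 1), (5, 2), (6, 3), (7, 1)]
```

Comparing with `is` rather than `==` means a real value that happens to equal some default is never mistaken for padding.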
