for loop save to array but skip saving elements - python

Basically, I want a fancy oneliner that doesn't read all of the files I'm looking at into memory, but still processes them all, and saves a nice sample of them.
The oneliner I would like to do is:
def foo(findex):
    return [bar(line) for line in findex]  # but skip every nth term
But I would like to be able to not save every nth line in that. i.e., I still want it to run (for byte position purposes), but I don't want to save the image, because I don't have enough memory for that.
So, if the output of bar(line) is 1,2,3,4,5,6,... I would like it to still run on 1,2,3,4,5,6,... but I would like the return value to be [1,3,5,7,9,...] or something of the sort.

Use enumerate to get the index, and filter with a modulo test to take every other line:
return [bar(line) for i, line in enumerate(findex) if i % 2]
Generalize that with i % n: whenever the index is divisible by n, i % n == 0 and bar(line) is not added to the list comprehension.
enumerate works with any iterable (file handle, generator, ...), so it is a much better choice than range(len(findex)).
However, the above is wrong if you need bar to be called on every value (because you rely on its side effects): the filter prevents bar from running on the skipped lines. In that case, apply the function first and filter afterwards, for instance by using map to call bar on every item of findex and then applying the same modulo filter to the results, which guarantees that every line is processed:
l = [x for i,x in enumerate(map(bar,findex)) if i%n]
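As a rough check of the map-then-filter idea, here is a minimal sketch. The bar below is a hypothetical stand-in (it records each line it sees and returns it upper-cased); the asker's real bar is not shown in the question.

```python
calls = []  # every line that bar() actually processed


def bar(line):
    # Hypothetical stand-in for the asker's bar(): the append is the side effect.
    calls.append(line)
    return line.upper()


def foo(findex, n=2):
    # map() is lazy in Python 3, so results are produced one at a time and
    # only those whose index is not divisible by n are kept in the list.
    return [x for i, x in enumerate(map(bar, findex)) if i % n]


kept = foo(["a", "b", "c", "d", "e"])
print(kept)   # the saved sample
print(calls)  # every line was still processed
```

Every line goes through bar, but only the results that pass the filter are stored.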

If findex is subscriptable (accepts the [] operator with indices), you can try this:
def foo(findex):
    return [bar(findex[i]) for i in range(0, len(findex), 2)]


change the variable in a forloop just like in list comprehension, python

I want to be able to change what something does in a for loop
here a very simple example
for i in range(10):
    print(i)
this will print 0 up to 9 and not include 10
and ofc i want 1 to 10 not 0 to 9
to fix this i would like to say:
i+1 for i in range(10):
    print(i)
but i cant
if i did list comprehension i can do:
list0 = [i+1 for i in range(10)]
this is very handy
now i have to either do
for i in range(1, 10+1):
which is very annoying
or do
print(i+1)
but if i used i 10 times i'd have to change them all
or i could say:
for i in range(10):
    i += 1
these methods are all not very nice, im just wondering if this neat way im looking for exists at all
thanks.
You ask if there exists any way to change the value received from an iterable in a for loop. The answer is yes; this can be accomplished in one of two ways. I'll continue to use your example with range to demonstrate this, but do note that I am in no way suggesting that these are ideal ways of solving that particular problem.
The first method is using the builtin map:
for i in map(lambda x: x + 1, range(10)):
map accepts a callable and an iterable, and will lazily apply the given callable to each element produced by the iterable. Do note that since this involves an additional function call during each iteration, this technique can incur a noticeable runtime penalty compared to performing the same action within the loop body.
The second method is using a generator expression (or, alternatively, any other flavor of list/set/dict comprehension):
for i in (x + 1 for x in range(10)):
As with map, using a generator will lazily produce transformed elements from the given iterable. Do note that if you opt to use a comprehension instead, the entire collection will be constructed upfront, which may be undesirable.
Again, for incrementing the values produced by range, neither of these is ideal. Simply using range(1, 11) is the natural solution for that.
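A small sketch to confirm that the two techniques and the plain range(1, 11) produce the same values. The variable names are only for illustration.

```python
# Collect what each loop header would hand to the loop body.
via_map = [i for i in map(lambda x: x + 1, range(10))]
via_genexp = [i for i in (x + 1 for x in range(10))]
via_range = list(range(1, 11))

print(via_map == via_genexp == via_range)
```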

Index error while iterating through list and pop()-ing elements [duplicate]

This question already has answers here:
How to test multiple variables for equality against a single value?
(31 answers)
Closed 6 years ago.
import os
os.chdir('G:\\f5_automation')
r = open('G:\\f5_automation\\uat.list.cmd.txt')
#print(r.read().replace('\n', ''))
t = r.read().split('\n')
for i in range(len(t)):
    if ('inherited' or 'device-group' or 'partition' or 'template' or 'traffic-group') in t[i]:
        t.pop(i)
    print(i,t[i])
In the above code, I get an index error at line 9: 'if ('inherited' or 'device-group'...etc.
I really don't understand why. How can my index be out of range if it's the perfect length by using len(t) as my range?
The goal is to pop any indexes from my list that contain any of those substrings. Thank you for any assistance!
This happens because you are editing the list while looping over it.
You first get the length, say 10, and then loop 10 times, but as soon as you delete one item the list is only 9 long, so the last index no longer exists.
A way around this is to build a new list of the things you want to keep and use that one instead.
I've slightly edited your code to do something similar.
t = ['inherited', 'cookies', 'device-group']
interesting_things = []
for i in t:
    if i not in ['inherited', 'device-group', 'partition', 'template', 'traffic-group']:
        interesting_things.append(i)
        print(i)
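The question asks about lines that contain one of the substrings, rather than lines that equal them, so here is a sketch of the same keep-what-you-want idea with a substring test. The sample lines are made up for illustration; they are not from the asker's file.

```python
unwanted = ('inherited', 'device-group', 'partition', 'template', 'traffic-group')

# Made-up sample lines standing in for the contents of the asker's file.
t = ['ltm pool a', 'inherited-traffic-group true', 'partition Common', 'members none']

# Keep a line only if none of the unwanted substrings occurs in it.
interesting_things = [line for line in t
                      if not any(word in line for word in unwanted)]
print(interesting_things)
```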
Let's say len(t) == 5.
i will take the values [0, 1, 2, 3, 4], because the range is fixed when the loop starts.
If we pop a value while processing i = 0, then len(t) == 4, so reaching i = 4 would raise an error. We are still going to try to go up to 4, because the range was already built.
A pop at the next step (i = 1) leaves three items, so i = 3 would now fail.
A pop at the next step (i = 2) leaves two items, so even index 2 is out of range, but i = 2 has already been processed.
The next step (i = 3) raises the error.
Instead, you should do something like this:
while t:
    element = t.pop()
    print(element)
On a side note, you can keep the substrings you are looking for in a set and test each one against the element:
qualities_we_need = {'inherited', 'device-group', 'partition'} # put all your qualities here
And then in the loop:
if any(quality in element for quality in qualities_we_need):
    print(element)
If you need indexes you could either use one more variable to keep track of index of value we're currently processing, or use enumerate()
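The shrinking-list failure described above can be reproduced with a tiny list. This is only a sketch with made-up data; the second loop shows one possible alternative (walking the indexes backwards), which is not something the answer itself proposes.

```python
t = ['keep 1', 'inherited x', 'keep 2', 'keep 3']
error = None
try:
    for i in range(len(t)):        # range(4) is fixed at this point
        if 'inherited' in t[i]:
            t.pop(i)               # the list now has only 3 items
except IndexError as exc:
    error = exc                    # raised when i reaches 3

# Assumed alternative: going from the last index to the first means a pop
# never shifts an item that is still waiting to be visited.
u = ['keep 1', 'inherited x', 'keep 2', 'keep 3']
for i in range(len(u) - 1, -1, -1):
    if 'inherited' in u[i]:
        u.pop(i)

print(repr(error))
print(u)
```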
As many people said in the comments, there are several problems with your code.
The or operator evaluates its operands from left to right and returns the first one that is truthy. So your parenthesized expression evaluates to 'inherited', since any non-empty string is truthy. As a result, even if your for loop were working, you would only be popping elements that contain 'inherited'.
The for loop is not working though. That happens because the size of the list you are iterating over changes as you loop, and you will get an index-out-of-range error if an element of the list actually contains 'inherited' and gets popped.
So, take a look at this:
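import os
os.chdir('G:\\f5_automation')
r = open('G:\\f5_automation\\uat.list.cmd.txt')
# print(r.read().replace('\n', ''))  # kept commented out: it would consume the file before the next read
t = r.read().split('\n')
t_dupl = t[:]
unwanted = ['inherited', 'device-group', 'partition', 'template', 'traffic-group']
for i, items in enumerate(t_dupl):
    # test each substring against the line, since the goal is "contains"
    if any(word in items for word in unwanted):
        print(i, items)
        t.remove(items)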
By duplicating the original list, we can use its items as a "pool" of items to pick from and modify the list we are actually interested in.
Finally, know that the pop() method returns the item it removes from the list and this is something you do not need in your example. remove() works just fine for you.
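The behaviour of or described earlier in this answer can be checked directly. A minimal sketch, with a made-up line for illustration:

```python
# `or` returns its first truthy operand, so this is just 'inherited'.
needle = ('inherited' or 'device-group' or 'partition')

line = 'device-group my-group'   # made-up sample line

found_with_or = needle in line   # only tests for 'inherited'

words = ('inherited', 'device-group', 'partition')
found_with_any = any(word in line for word in words)  # tests every substring

print(needle, found_with_or, found_with_any)
```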
As a side note, you can probably replace your first 5 lines of code with this:
with open('G:\\f5_automation\\uat.list.cmd.txt', 'r') as r:
    t = r.readlines()
The advantage of the with statement is that it closes the file automatically once the block is finished. Also, instead of reading the whole file and splitting it on line breaks, you can use the built-in readlines() method, which does almost the same thing; the difference is that each line keeps its trailing newline.
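One detail worth checking when switching between these approaches is what happens to the newline characters. The sketch below writes a small temporary file (the file name and contents are made up) and reads it back three ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')   # hypothetical file
    with open(path, 'w') as out:
        out.write('first\nsecond\n')

    with open(path) as r:
        from_readlines = r.readlines()          # keeps the '\n' on each line
    with open(path) as r:
        from_split = r.read().split('\n')       # drops '\n', leaves a trailing ''
    with open(path) as r:
        from_splitlines = r.read().splitlines() # drops '\n', no trailing ''

print(from_readlines)
print(from_split)
print(from_splitlines)
```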

Python, out of memory when iterating over very large numbers

I'm writing a python script that does various permutations of characters. Eventually, the script will crash with out of memory error depending on how much depth I want to go for the permutation.
I had initially thought the solution would have been emptying out the list and restarting over but doing it this way I get index out of bounds error.
This is my current set up:
for j in range(0, csetlen):
    getJ = None
    for i in range(0, char_set_len):
        getJ = word_list[j] + char_set[i]
        word_list.append(getJ)
    csetlen = csetlen - j
    del word_list[j-1:]
    word_list.append(getJ)
    j=0
Basically, csetlen can be a very large number (excess of 100,000,000). Of course I do not have enough RAM for this; so I'm trying to find out how to shrink the list in the outer for loop. How does one do this gracefully?
The memory error has to do with word_list. Currently, I am storing millions of different permutations; I need to be able to "recycle" some of the old list values. How does one do this to a python list?
What you want is an iterator that generates the values on demand (and doesn't store them in memory):
from itertools import product
getJ_iterator = product(word_list[:csetlen], char_set[:char_set_len])
This is equivalent to the following generator function:
def getJ_gen(first_list, second_list):
    for i in first_list:
        for j in second_list:
            yield (i, j)
getJ_iterator = getJ_gen(word_list[:csetlen], char_set[:char_set_len])
You would iterate over the object like so:
for item in getJ_iterator:
    pass  # do stuff with item
Note that item in this case would be a tuple of the form (word, char).
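Since each item is a (word, char) tuple, a thin generator on top of product can rebuild the concatenated strings the question was appending, still one at a time. This is a sketch with tiny made-up inputs; extended_words and words_of_length are hypothetical helper names, not part of the original code.

```python
from itertools import product


def extended_words(words, chars):
    # Yield word + char for every pair without building a list first.
    for word, char in product(words, chars):
        yield word + char


def words_of_length(chars, depth):
    # Assumed variant: every string of exactly `depth` characters from chars.
    for combo in product(chars, repeat=depth):
        yield ''.join(combo)


word_list = ['a', 'b']   # made-up sample data
char_set = 'xy'

result = list(extended_words(word_list, char_set))
depth_two = list(words_of_length(char_set, 2))
print(result)
print(depth_two)
```

The list() calls are there only to show the output; in the real script you would loop over the generator so that nothing accumulates in memory.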

Why does len() not support iterators?

Many of Python's built-in functions (any(), all(), sum() to name some) take iterables but why does len() not?
One could always use sum(1 for i in iterable) as an equivalent, but why is it len() does not take iterables in the first place?
Many iterables are defined by generators, which don't have a well-defined len. Take the following, which iterates forever:
def sequence(i=0):
    while True:
        i+=1
        yield i
Basically, to have a well defined length, you need to know the entire object up front. Contrast that to a function like sum. You don't need to know the entire object at once to sum it -- Just take one element at a time and add it to what you've already summed.
Be careful with idioms like sum(1 for i in iterable): often it will just exhaust the iterable so you can't use it anymore. It could also be slow to get the i'th element if there is a lot of computation involved. It might be worth asking yourself why you need to know the length a priori. This might give you some insight into what type of data structure to use (frequently list and tuple work just fine), or you may be able to perform your operation without needing to call len.
This is an iterable:
def forever():
    while True:
        yield 1
Yet, it has no length. If you want to find the length of a finite iterable, the only way to do so, by definition of what an iterable is (something you can repeatedly call to get the next element until you reach the end) is to expand the iterable out fully, e.g.:
len(list(the_iterable))
As mgilson pointed out, you might want to ask yourself - why do you want to know the length of a particular iterable? Feel free to comment and I'll add a specific example.
If you want to keep track of how many elements you have processed, instead of doing:
num_elements = len(the_iterable)
for element in the_iterable:
    ...
do:
num_elements = 0
for element in the_iterable:
    num_elements += 1
    ...
If you want a memory-efficient way of seeing how many elements end up in a comprehension, you might be tempted to write the following, but it raises a TypeError because a generator expression has no len():
num_relevant = len(x for x in xrange(100000) if x%14==0)
It wouldn't be efficient to do this (you don't need the whole list):
num_relevant = len([x for x in xrange(100000) if x%14==0])
sum would probably be the most handy way, but it looks quite weird and it isn't immediately clear what you're doing:
num_relevant = sum(1 for _ in (x for x in xrange(100000) if x%14==0))
So, you should probably write your own function:
def exhaustive_len(iterable):
    length = 0
    for _ in iterable: length += 1
    return length
exhaustive_len(x for x in xrange(100000) if x%14==0)
The long name is to help remind you that it does consume the iterable, for example, this won't work as you might think:
def yield_numbers():
    yield 1; yield 2; yield 3; yield 5; yield 7
the_nums = yield_numbers()
total_nums = exhaustive_len(the_nums)
for num in the_nums:
    print num
because exhaustive_len has already consumed all the elements.
EDIT: Ah in that case you would use exhaustive_len(open("file.txt")), as you have to process all lines in the file one-by-one to see how many there are, and it would be wasteful to store the entire file in memory by calling list.
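For readers on Python 3 (where xrange is spelled range), here is a sketch of the same counting helper, showing both the count and the fact that the generator is empty afterwards.

```python
def exhaustive_len(iterable):
    # Counts the items by consuming them; nothing is stored.
    return sum(1 for _ in iterable)


nums = (x for x in range(100000) if x % 14 == 0)
count = exhaustive_len(nums)

# The generator has been used up, so nothing is left to iterate over.
leftover = list(nums)

print(count, leftover)
```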

Where to use yield in Python best?

I know how yield works. I know about permutations, but think of that as just a simple math example.
But what is yield's real strength? When should I use it? A simple, good example would be best.
yield is best used when you have a function that returns a sequence and you want to iterate over that sequence, but you do not need to have every value in memory at once.
For example, I have a python script that parses a large list of CSV files, and I want to return each line to be processed in another function. I don't want to store the megabytes of data in memory all at once, so I yield each line in a python data structure. So the function to get lines from the file might look something like:
def get_lines(files):
    for f in files:
        for line in f:
            #preprocess line
            yield line
I can then use the same syntax as with lists to access the output of this function:
for line in get_lines(files):
    pass  # process line
but I use a lot less memory.
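A runnable sketch of this pattern, using in-memory io.StringIO objects in place of real CSV files. The sample contents and the rstrip preprocessing step are assumptions made for illustration.

```python
import io


def get_lines(files):
    for f in files:
        for line in f:
            # Assumed preprocessing step: drop the trailing newline.
            yield line.rstrip('\n')


# Stand-ins for open file handles.
files = [io.StringIO('a,1\nb,2\n'), io.StringIO('c,3\n')]

lines = get_lines(files)
first = next(lines)   # only the first line has been read so far
rest = list(lines)    # the remaining lines, pulled on demand

print(first, rest)
```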
Simply put, yield gives you a generator. You'd use it where you would normally use a return in a function. As a really contrived example cut and pasted from a prompt...
>>> def get_odd_numbers(i):
...     return range(1, i, 2)
...
>>> def yield_odd_numbers(i):
...     for x in range(1, i, 2):
...         yield x
...
>>> foo = get_odd_numbers(10)
>>> bar = yield_odd_numbers(10)
>>> foo
[1, 3, 5, 7, 9]
>>> bar
<generator object yield_odd_numbers at 0x1029c6f50>
>>> next(bar)
1
>>> next(bar)
3
>>> next(bar)
5
As you can see, in the first case foo holds the entire list in memory at once. It's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called. In the second case, bar just gives you a generator. A generator is an iterable--which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object "remembers" where it was in the looping the last time you called it--this way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion all at once and store the 50 billion numbers to count through. Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)
This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
Further reading:
python wiki http://wiki.python.org/moin/Generators
PEP on generators http://www.python.org/dev/peps/pep-0255/
Another use is in a network client. Use 'yield' in a generator function to round-robin through multiple sockets without the complexity of threads.
For example, I had a hardware test client that needed to send the R, G, and B planes of an image to firmware. The data had to be sent in lockstep: red, green, blue, red, green, blue. Rather than spawn three threads, I had a generator that read from the file and encoded the buffer. Each buffer was a 'yield buf'. At end of file the function returned, which gave me end-of-iteration.
My client code looped through the three generator functions, getting buffers until end-of-iteration.
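The original client code is not shown, so the following is only a guess at the general shape: one generator per colour plane, and a small driver that takes one buffer from each in turn. The names plane_buffers and lockstep, the byte strings, and the chunk size are all hypothetical.

```python
def plane_buffers(plane, data, size):
    # Yield (plane name, chunk) pairs, one chunk of `size` bytes at a time.
    for start in range(0, len(data), size):
        yield plane, data[start:start + size]


def lockstep(*planes):
    # zip() pulls one buffer from each generator per round and stops as soon
    # as any of them is exhausted.
    for group in zip(*planes):
        yield from group


sent = list(lockstep(
    plane_buffers('R', b'rrrr', 2),
    plane_buffers('G', b'gggg', 2),
    plane_buffers('B', b'bbbb', 2),
))
print(sent)
```

In a real client each yielded buffer would be written to a socket instead of being collected in a list.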
I'm reading Data Structures and Algorithms in Python
There is a fibonacci function using yield. I think it's the best moment to use yield.
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b
You can use it like this:
gen = fibonacci()
for i, f in enumerate(gen):
    print(i, f)
    if i >= 100: break
So, I think, maybe, when the next element depends on previous elements, e.g., digital filters, it's time to use yield.
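As an alternative to the enumerate-and-break loop, itertools.islice can take a fixed number of values from an infinite generator. A short sketch using the same fibonacci function:

```python
from itertools import islice


def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take the first ten values; the generator itself never terminates.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```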
