Pythonic method of iterating over subsets of a range of values? - python

I have a function which resembles:
def long_running_with_more_values(start, stop):
    headers = get_headers.delay(start, stop)
    insert_to_db.delay(headers)
This function batch-processes data that is requested from the net in parallel.
get_headers and insert_to_db are dispatched to the message queue and eventually processed by Celery workers, so they do not block execution.
The function has to process every number between start and stop, but it can split the work into sections (ranges).
I've found that get_headers performs best when the range is about 20000, where range = (stop - start).
I want to know how to split an arbitrary range into groups of 20000 and run each group through the function, so that the function is called multiple times with different start and stop values while still covering the whole original range.
So for initial start and stop values of 1 and 100000 respectively, I'd expect get_headers to be called 5 times with the following ranges:
[1,20000][20001,40000][40001,60000][60001,80000][80001,100000]

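def long_running_with_more_values(start, stop):
    while start <= stop:
        if stop - start < 20000:
            # Fewer than 20000 values left: process the remainder and stop
            headers = get_headers.delay(start, stop)
            break
        else:
            # Both ends are inclusive, so this chunk covers exactly 20000 values
            headers = get_headers.delay(start, start + 19999)
            start += 20000
    insert_to_db.delay(headers)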
Notice that headers only stores the return value of the last call to get_headers.delay(), so only the final chunk reaches insert_to_db. You might need to accumulate the results instead (for example headers += get_headers.delay(start, stop)) or call insert_to_db.delay(headers) inside the loop. I can't really tell without knowing what get_headers.delay() returns.
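As an alternative sketch, the chunking can be separated from the Celery calls with a small generator. The name split_range and its size parameter are made up for illustration, and both ends of each pair are assumed to be inclusive, as in the expected output in the question.

```python
def split_range(start, stop, size=20000):
    """Yield inclusive (chunk_start, chunk_stop) pairs covering start..stop."""
    for chunk_start in range(start, stop + 1, size):
        # The last chunk may be shorter, so clamp it to stop
        yield chunk_start, min(chunk_start + size - 1, stop)

chunks = list(split_range(1, 100000))
print(chunks)
# [(1, 20000), (20001, 40000), (40001, 60000), (60001, 80000), (80001, 100000)]
```

Each pair could then be passed to get_headers.delay(chunk_start, chunk_stop), with insert_to_db.delay called once per chunk if that suits the return value.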

Related

Is there a way to have a function invoke 'continue' affecting a loop in its caller?

I am trying to refactor some code that consists of nested loops. To replace a repeated code snippet, I am trying to write a function that contains a 'continue'. The loop is very long, and I just want to move the if statement into the function.
def calc_indexes_for_rand_switch_on(self, switch_on, rand_time, max_free_spot, rand_window):
    if np.any(self.daily_use[switch_on:rand_window[1]] != 0.001): # control to check if there are any other switch on times after the current one
        next_switch = [switch_on + k[0] for k in np.where(self.daily_use[switch_on:] != 0.001)] # identifies the position of next switch on time and sets it as a limit for the duration of the current switch on
        if (next_switch[0] - switch_on) >= self.func_cycle and max_free_spot >= self.func_cycle:
            upper_limit = min((next_switch[0] - switch_on), min(rand_time, rand_window[1] - switch_on))
        elif (next_switch[0] - switch_on) < self.func_cycle and max_free_spot >= self.func_cycle: # if next switch_on event does not allow for a minimum functioning cycle without overlapping, but there are other larger free spots, the cycle tries again from the beginning
            continue
        else:
            upper_limit = next_switch[0] - switch_on # if there are no other options to reach the total time of use, empty spaces are filled without minimum cycle restrictions until reaching the limit
    else:
        upper_limit = min(rand_time, rand_window[1] - switch_on) # if there are no other switch-on events after the current one, the upper duration limit is set this way
    return np.arange(switch_on, switch_on + (int(random.uniform(self.func_cycle, upper_limit)))) if upper_limit >= self.func_cycle \
        else np.arange(switch_on, switch_on + upper_limit)
This is the function I am trying to write. The if statement is part of a bigger while loop in the main code, but here it raises an error because the function itself contains no loop. How can I solve this?
You can return a sentinel value here, e.g. None, and handle it in the caller.
So in your function:
        elif (next_switch[0] - switch_on) < self.func_cycle and max_free_spot >= self.func_cycle: # if next switch_on event does not allow for a minimum functioning cycle without overlapping, but there are other larger free spots, the cycle tries again from the beginning
            return None
And use it like:
while some_condition():
    # do some stuff
    ...
    result = calc_indexes_for_rand_switch_on(switch_on, rand_time, max_free_spot, rand_window)
    if result is None:
        continue
    ...
    # do stuff with result
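A self-contained sketch of the same sentinel pattern may make the control flow easier to see. The helper pick_duration and its numbers are invented for illustration and are unrelated to the asker's class.

```python
def pick_duration(gap, min_cycle):
    """Hypothetical helper: return a duration, or None to tell the caller to skip."""
    if gap < min_cycle:
        return None  # sentinel value: the caller decides to `continue`
    return min(gap, 2 * min_cycle)

durations = []
for gap in [5, 1, 8, 2]:
    result = pick_duration(gap, min_cycle=3)
    if result is None:
        continue  # loop control stays in the caller, where the loop lives
    durations.append(result)

print(durations)  # [5, 6]
```

None works as a sentinel here only because it can never be a valid result. If None were a legitimate return value, a dedicated marker object or an exception would be a safer signal.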

best way to call a Python method under multiple conditions

I have a workflow in which I need to call a Python method when either of the following happens:
1. a specified timeout occurs, or
2. the size of the input data (a list) reaches a threshold, such as 10 data points
What is the best way to support this workflow?
[Edit] - The method would be called in a serverless API, so it needs to be stateless. Does it make sense to use some sort of queue to store and retrieve the data, and if so, how?
You could do it like this:
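while True:  # keep checking the conditions
    if timeoutCondition or len(data_list) >= 10:  # timeout occurred or the list reached the threshold
        instance.method()  # execute the method (data_list and instance are placeholder names)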
You can poll the status at a fixed interval; if the timeout expires or the data reaches the threshold, do the thing you want.
import time
max_count = 10
timeout = 60  # 1 min
count = get_count()  # baseline; get_count() is assumed to return a running total
next_timeout = time.time() + timeout
while True:
    current_time = time.time()
    count_increase = get_count() - count
    if count_increase >= max_count or current_time >= next_timeout:
        do_what_you_want()
        count = get_count()  # reset the baseline for the next batch
        next_timeout = time.time() + timeout
    time.sleep(1)
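The two trigger conditions can also be wrapped in a small buffer object, sketched below. The class name Batcher, its parameters and the injectable clock are all made up for illustration. Note that this keeps state in the process, so in a truly stateless serverless setup the buffer and deadline would have to live in an external queue or store instead.

```python
import time

class Batcher:
    """Collect items and flush when the batch is full or too old."""

    def __init__(self, handler, max_items=10, timeout=60.0, clock=time.monotonic):
        self.handler = handler      # called with the list of buffered items
        self.max_items = max_items
        self.timeout = timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.items = []
        self.deadline = clock() + timeout

    def add(self, item):
        self.items.append(item)
        self.flush_if_due()

    def flush_if_due(self):
        """Flush if either condition holds; return True if a flush happened."""
        full = len(self.items) >= self.max_items
        expired = self.clock() >= self.deadline
        if not (full or expired):
            return False
        if self.items:  # design choice: skip the handler when the buffer is empty
            self.handler(self.items)
        self.items = []
        self.deadline = self.clock() + self.timeout
        return True

batches = []
batcher = Batcher(batches.append, max_items=3, timeout=60.0)
for value in range(7):
    batcher.add(value)
print(batches)  # [[0, 1, 2], [3, 4, 5]]
```

The timeout is only checked when add or flush_if_due is called, so something (a timer or a scheduled invocation) still has to call flush_if_due periodically if no new data arrives.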

How to count "times per second" in a correct way?

Goal: I would like to see how many times Python is able to print something in 1 second.
For educational purposes, I'm trying to write a script that shows how many times per second a call to the random module runs in a loop. What is the fastest, most Pythonic way to do it?
At first, to count seconds, I wrote this code:
import time
sec = 0
while True:
    print(sec)
    time.sleep(1)
    sec += 1
But this seems to run slower than real seconds.
So I decided to use the local seconds instead. Also, before continuing with my script, I wanted to count manually how many times Python prints 'you fool', so I wrote the following code:
import time
def LocalSeconds():
    local_seconds = time.gmtime()[5:-3]
    local_seconds = int(local_seconds[0])
    return local_seconds
while True:
    print(LocalSeconds(), 'you fool')
Output:
first second - 14 times per second;
next second - 13 times;
next second - 12 times, etc. Why does it get slower?
Where I am stuck right now:
import time, random
def RandomNumbers():
    return random.randint(3,100)
def LocalSeconds():
    local_seconds = time.gmtime()[5:-3]
    local_seconds = int(local_seconds[0])
    return local_seconds
def LocalSecondsWithPing():
    local_seconds_ping = time.gmtime()[5:-3]
    local_seconds_ping = int(local_seconds[0:1])
    return local_seconds_ping
record_seconds = []
record_seconds_with_ping = []
while True:
    record_seconds.append(LocalSeconds())
    record_seconds_with_ping.append(LocalSecondsWithPing())
    if record_seconds == record_seconds_with_ping:
        RandomNumbers()
        del record_seconds_with_ping[0]
        del record_seconds[-1]
Also, I guess I need to use a "for" loop, not "while"? How should I write this script?
Counting over a single second won't give you a good result. The number of prints in a single second may vary depending on things like other threads currently running on your system (for the OS or other programs), and it may be influenced by other unknown factors.
Consider the following code:
import calendar
import time
NumOfSeconds = 100  # Number of seconds we average over
msg = 'Display this message'  # Message to be displayed
count = 0  # Number of times the message ends up being displayed
# Time since the epoch in seconds + number of seconds over which we count
EndTime = calendar.timegm(time.gmtime()) + NumOfSeconds
while calendar.timegm(time.gmtime()) < EndTime:  # While we are not at the end point yet
    print(msg)  # Print message
    count = count + 1  # Count message printed
print(float(count) / NumOfSeconds)  # Average number of prints per second
Here calendar.timegm(time.gmtime()) gives us the time in seconds since the epoch (the epoch is just a fixed point in time that most computer systems nowadays use as a reference point).
So we set EndTime to that point plus the number of seconds we want to average over. Then, in a loop, we print the message we want to test and count the number of times we do that, checking on every iteration that we are not past the end time.
Finally, we print the average number of times per second that we printed the message. Averaging helps with the fact that we may start counting near the end of a whole second since the epoch, in which case we won't actually have a whole second to print messages, but just a fraction of one. If we make sure NumOfSeconds is large enough, the added error will be small (for example, for NumOfSeconds=100 that error is ~1%).
We should note that the actual number also depends on the fact that we are constantly checking the current time and counting the number of prints. However, while I haven't tested that here, printing to the screen usually takes significantly longer than those operations.
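A variation on the same averaging idea, sketched with time.perf_counter(), avoids the whole-second granularity of the epoch time and divides by the time actually elapsed. The function name calls_per_second and its parameters are made up for illustration.

```python
import random
import time

def calls_per_second(action, duration=1.0):
    """Call `action` repeatedly for about `duration` seconds; return the average rate."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        action()
        count += 1
    elapsed = time.perf_counter() - start  # slightly more than `duration`
    return count / elapsed

# Measure a cheap call to the random module; pass a function that prints to measure printing
rate = calls_per_second(lambda: random.randint(3, 100), duration=0.2)
print(f"{rate:.0f} calls per second")
```

The result still includes the overhead of the loop and the clock checks, so it is an estimate of the whole loop's throughput rather than of the action alone.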

How to run a search that returns after a certain amount of time?

I have a function that runs an iterative deepening search and would like to return the value from the deepest search after a certain amount of time has passed. The code skeleton would look something like
import time
answers = []
START = time.clock()
current_depth = 1
while time.clock() - START < DESIRED_RUN_TIME:
    answers.append(IDS(depth=current_depth))
    current_depth += 1
return answers[-1]
The problem with this code is it will not return until after the time limit has passed. What is the best way to solve this? If I should just add time checks in the IDS function, how can I make sure to return the last value found? Any help would be greatly appreciated.
Your code should work unless IDS is blocking and takes a very long time. Then you have to wait until IDS is finished and the time limit may not be all that precise.
I'm not sure exactly what you mean by
"would like to return the value from the deepest search after a certain amount of time has passed."
and
"The problem with this code is it will not return until after the time limit has passed."
If you have both a time limit and an update interval, you can turn this code into a generator:
import time
def get_ids(update_time, limit_time):
    answers = []
    current_depth = 1
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    last_update = start
    while time.perf_counter() - start < limit_time:
        answers.append(IDS(depth=current_depth))
        current_depth += 1
        # Yield an intermediate answer once update_time seconds have passed
        if time.perf_counter() - last_update >= update_time:
            last_update = time.perf_counter()
            yield answers[-1]
    if answers:
        yield answers[-1]  # final answer from the deepest completed search
for i in get_ids(1, 10):  # get an answer about every second and stop after about 10 seconds
    print(i)
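If the time checks are moved into the search itself, one possible sketch is to have the search raise an exception when the deadline passes and to keep the result of the last depth that finished. Everything here (SearchTimeout, depth_limited_search, the clock parameter) is hypothetical; depth_limited_search is only a stand-in for the asker's IDS function.

```python
import time

class SearchTimeout(Exception):
    """Raised by the search when the deadline has passed."""

def depth_limited_search(depth, deadline, clock=time.monotonic):
    # Stand-in for IDS(depth=...): does `depth` units of fake work,
    # checking the deadline before each unit.
    for _ in range(depth):
        if clock() >= deadline:
            raise SearchTimeout
    return depth  # pretend the answer is the depth reached

def timed_iterative_deepening(time_limit, max_depth=50,
                              search=depth_limited_search, clock=time.monotonic):
    """Return the result of the deepest search completed within time_limit."""
    deadline = clock() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        try:
            best = search(depth, deadline, clock)
        except SearchTimeout:
            break  # discard the unfinished search and keep the previous result
    return best

print(timed_iterative_deepening(time_limit=5.0, max_depth=3))  # 3
```

How promptly this returns depends on how often the search checks the clock, so the check belongs in the innermost loop that runs frequently.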

Using return instead of yield

Is return better than yield? From what I've read, it can be. In this case I am having trouble getting iteration out of the if statement. Basically, the program takes two points, a begin and an end. If the two points are at least ten miles apart, it takes a random sample. The final if statement shown works for the first 20 miles from the begin point, begMi. nCounter.length = 10 and is a class member. So the question is, how can I adapt the code so that a return statement would work instead of a yield? Or is a yield statement fine in this instance?
def yielderOut(self):
    import math
    import random as r
    for col in self.fileData:
        corridor = str(col['CORRIDOR_CODE'])
        begMi = float(col['BEGIN_MI'])
        endMi = float(col['END_MI'])
        roughDiff = abs(begMi - endMi)
        # if the plain distance between two points is greater than length = 10
        if roughDiff > nCounter.length:
            diff = ((int(math.ceil(roughDiff/10.0))*10)-10)
            if diff > 0 and (diff % 2 == 0 or diff % 3 == 0 or diff % 5 == 0)\
                    and ((diff % roughDiff) >= diff):
                if (nCounter.length+begMi) < endMi:
                    vars1 = round(r.uniform(begMi,\
                        (begMi+nCounter.length)),nCounter.rounder)
                    yield corridor,begMi,endMi,'Output 1',vars1
                    if ((2*nCounter.length)+begMi) < endMi:
                        vars2 = round(r.uniform((begMi+nCounter.length),\
                            (begMi+ (nCounter.length*2))),nCounter.rounder)
                        yield corridor,begMi,endMi,'Output 2',vars1,vars2
So roughDiff equals the difference between the two points and is rounded down to the nearest ten. Ten is then subtracted so that the sample is taken from a full ten-mile section, and that becomes diff. So let's say a roughDiff of 24 is rounded to 20; 20 - 10, diff + begin point = the sample is taken from between mi 60 and 70 instead of between 70 and 80.
The program works, but I think it would be better if I used return instead of yield. I'm not a programmer.
return is not better, it's different. return says "I am done. Here is the result". yield says "here is the next value in a series of values"
Use the one that best expresses your intent.
Using yield makes your function a generator function: calling it returns a generator object, which produces the next value in the series each time next() is called on it (a for loop does this automatically).
This is useful when you want to process things iteratively, because it means you don't have to save all the results in a container and then process them. In addition, any preliminary work that is required before values can be generated only has to be done once, because the generator resumes execution of your code right after the last yield encountered, i.e. it effectively turns the function into what is called a coroutine.
Generator functions finish when they return rather than yield. This usually happens when execution "falls off the end" of the function, which returns None by default.
From the looks of your code, I'd say using yield would be advantageous, especially if you can process the results incrementally. The alternative would be to have it store all the values in a container like a list and return that when it was finished.
I use yield in situations where I want to continue iteration on some object. However, if I wanted to make that function recursive, I'd use return.
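A small side-by-side sketch may make the difference concrete. The function names and the sample data are invented for illustration and are much simpler than the code in the question.

```python
def samples_as_list(points):
    """Build every result first, then return them all at once."""
    results = []
    for begin, end in points:
        if end - begin > 10:
            results.append((begin, end))
    return results

def samples_as_generator(points):
    """Yield each result as soon as it is found."""
    for begin, end in points:
        if end - begin > 10:
            yield begin, end

points = [(0, 25), (30, 35), (40, 60)]
print(samples_as_list(points))             # [(0, 25), (40, 60)]
print(list(samples_as_generator(points)))  # [(0, 25), (40, 60)]

gen = samples_as_generator(points)
print(next(gen))  # (0, 25), computed on demand; the rest is not evaluated yet
```

Both versions produce the same values. The generator simply hands them over one at a time, which matters most when the input is large or the caller may stop early.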
