Using RxPY for illustration purposes.
I want to create an observable from a function, but that function must take parameters. This particular example must return, at random intervals, one of many pre-defined tickers which I want to send to it. My solution thus far is to use a closure:
from __future__ import print_function
from rx import Observable
import random
import string
import time
def make_tickers(n=300, s=123):
    """Generates up to n unique 3-letter strings, each made up of uppercase letters."""
    random.seed(s)
    tickers = [''.join(random.choice(string.ascii_uppercase) for _ in range(3))
               for y in range(n)]
    tickers = list(set(tickers))  # unique
    print(len(tickers))
    return tickers
def spawn_prices_fn(tickers):
    """Returns a function that emits a random element of tickers
    every 20-100 ms; the returned function takes an observer parameter."""
    def spawner(observer):
        while True:
            next_tick = random.choice(tickers)
            observer.on_next(next_tick)
            time.sleep(random.randint(20, 100) / 1000.0)
    return spawner
if __name__ == "__main__":
    spawned = spawn_prices_fn(make_tickers())
    xx = Observable.create(spawned)
    xx.subscribe(lambda s: print(s))
Is there a simpler way? Can further parameters be sent to Observable.create's first parameter function without requiring a closure? What is the canonical advice?
It can be done in numerous ways; here's one solution that doesn't change your code too much.
Note that ticker generation could also be broken up into a function generating a single string, combined with some rx magic, to be more rx-like.
I also slightly adjusted the code to make flake8 happy.
from __future__ import print_function
import random
import string
import time
from rx import Observable
def make_tickers(n=300, s=123):
    """
    Generates up to n unique 3-letter strings each made up of uppercase letters
    """
    random.seed(s)
    tickers = [''.join(random.choice(string.ascii_uppercase) for _ in range(3))
               for y in range(n)]
    tickers = list(set(tickers))  # unique
    print(len(tickers))
    return tickers
def random_picker(tickers):
    ticker = random.choice(tickers)
    time.sleep(random.randint(20, 100) / 1000.0)
    return ticker
if __name__ == "__main__":
    xx = Observable\
        .repeat(make_tickers())\
        .map(random_picker)\
        .subscribe(lambda s: print(s))
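The pipeline above repeatedly emits the same ticker list and maps each emission through random_picker. As a rough plain-Python analogy of that repeat-then-map shape (not RxPY, just stdlib, with the sleep omitted):

```python
import itertools
import random

tickers = ["AAA", "BBB", "CCC"]  # stand-in for make_tickers()

def random_picker(tickers):
    # pick one ticker per emission
    return random.choice(tickers)

# repeat the whole list forever, map each repetition to one random pick
stream = map(random_picker, itertools.repeat(tickers))
first_five = list(itertools.islice(stream, 5))
print(first_five)
```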
or a solution without make_tickers:
from __future__ import print_function
import random
import string
import time
from rx import Observable
def random_picker(tickers):
    ticker = random.choice(tickers)
    time.sleep(random.randint(20, 100) / 1000.0)
    return ticker
if __name__ == "__main__":
    random.seed(123)
    Observable.range(1, 300)\
        .map(lambda _: ''.join(random.choice(string.ascii_uppercase)
                               for _ in range(3)))\
        .reduce(lambda x, y: x + [y], [])\
        .do_while(lambda _: True)\
        .map(random_picker)\
        .subscribe(lambda s: print(s))
time.sleep could be moved out of random_picker, but the code would become a bit trickier.
You can also use "partials" to wrap your subscription method. This lets you bind extra arguments while still calling rx.create on a callable that expects only an Observer and a Scheduler:
def my_subscription_with_arguments(observer, scheduler, arg1):
    observer.on_next(arg1)

my_subscription_wrapper = functools.partial(my_subscription_with_arguments, arg1='hello')
source = rx.create(my_subscription_wrapper)
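The same partial-binding trick works with any callback-style API. A minimal stdlib-only sketch (no RxPY required, with a hypothetical PrintObserver standing in for a real rx observer) of how functools.partial pre-binds the extra argument:

```python
import functools

def subscription(observer, scheduler, arg1):
    # observer is anything with an on_next method; scheduler is unused here
    observer.on_next(arg1)

class PrintObserver:
    # hypothetical stand-in for an rx observer
    def __init__(self):
        self.received = []
    def on_next(self, value):
        self.received.append(value)

# pre-bind arg1 so the callable matches the (observer, scheduler) signature
wrapped = functools.partial(subscription, arg1='hello')
obs = PrintObserver()
wrapped(obs, None)  # roughly what the subscription machinery would do
print(obs.received)  # → ['hello']
```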
For the following toy example, I am attempting to parallelize some nested for loops using dask delayed/compute. Is there any way I can visualize the task graph for the following?
import time
from dask import compute, delayed

@delayed
def child(val):
    time.sleep(1)
    return val

@delayed
def p1(val):
    futs = []
    for i in range(5):
        futs += [child(val * i)]
    return compute(*futs)

@delayed
def p2(val):
    futs = []
    for i in range(10):
        futs += [p1(val * i)]
    return compute(*futs)

@delayed
def p3(val):
    futs = []
    for i in range(30):
        futs += [p2(val * i)]
    return futs

if __name__ == "__main__":
    f = p3(10)
    f.visualize()
For example, when I call the .visualize method on any of the delayed functions, it returns just one level (node?) but none of the preceding branches and functions. For instance, p3(10).visualize() returns
p3 task graph
Perhaps I am using dask.delayed improperly here?
Building off Sultan's example above, visualize(p3(10)) returns the following task graph.
If instead you modify the return to be a sum rather than a list:
import time
from dask import compute, delayed, visualize

@delayed
def child(val):
    time.sleep(1)
    return val

def p1(val):
    return sum([child(val * i) for i in range(2)])

def p2(val):
    return sum([p1(val * i) for i in range(3)])

def p3(val):
    return sum([p2(val * i) for i in range(4)])
it returns the following task graph.
Perhaps my question should have been: what do the blank boxes in the task graph represent?
dask.visualize will show the task DAG, however without evaluating the contents of a delayed task, dask will not know what to plot (since the results are delayed). Running compute within the delayed function doesn't resolve this, since this will be done only once the task itself is evaluated.
Referring to the best practices, you will want to avoid calling delayed within delayed.
The snippet below shows one way to modify the script:
import time
from dask import compute, delayed, visualize

@delayed
def child(val):
    time.sleep(1)
    return val

def p1(val):
    return [child(val * i) for i in range(2)]

def p2(val):
    return [p1(val * i) for i in range(3)]

def p3(val):
    return [p2(val * i) for i in range(4)]

if __name__ == "__main__":
    f = p3(10)
    visualize(f)
    # by default the DAG will be saved into mydask.png
I have a dataframe, where each row contains a list of integers. I also have a reference-list that I use to check what integers in the dataframe appear in this list.
I have made two implementations of this, one single-threaded and one multi-threaded. The single-threaded implementation is quite fast (takes roughly 0.1s on my machine), whereas the multithreaded takes roughly 5s.
My question is: Is this due to my implementation being poor, or is this merely a case where the overhead due to multithreading is so large that it doesn't make sense to use multiple threads?
The example is below:
import time
from random import randint
import pandas as pd
import multiprocessing
from functools import partial

class A:
    def __init__(self, N):
        self.ls = [[randint(0, 99) for i in range(20)] for j in range(N)]
        self.ls = pd.DataFrame({'col': self.ls})
        self.lst_nums = [randint(0, 99) for i in range(999)]

    @classmethod
    def helper(cls, lst_nums, col):
        return any([s in lst_nums for s in col])

    def get_idx_method1(self):
        method1 = self.ls['col'].apply(lambda nums: any(x in self.lst_nums for x in nums))
        return method1

    def get_idx_method2(self):
        pool = multiprocessing.Pool(processes=1)
        method2 = pool.map(partial(A.helper, self.lst_nums), self.ls['col'])
        pool.close()
        return method2

if __name__ == "__main__":
    a = A(50000)

    start = time.time()
    m1 = a.get_idx_method1()
    end = time.time()
    print(end - start)

    start = time.time()
    m2 = a.get_idx_method2()
    end = time.time()
    print(end - start)
First of all, multiprocessing is only useful when the cost of communicating data between the main process and the workers is small compared to the time cost of the function itself.
Another thing is that you made an error in your code:
def helper(cls, lst_nums, col):
    return any([s in lst_nums for s in col])
VS
any(x in self.lst_nums for x in nums)
The helper method wraps the expression in a list [], which forces any() to wait for the entire list to be built, while the second any() stops at the first True value.
In conclusion, if you remove the list brackets from the helper method and maybe increase the randint range in the lst_nums initializer, you will notice a speed-up when using multiple processes.
self.lst_nums = [randint(0, 10000) for i in range(999)]
and
def helper(cls, lst_nums, col):
    return any(s in lst_nums for s in col)
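The short-circuit difference is easy to see with a helper that counts how many membership tests are actually performed (a stdlib-only illustration, unrelated to the DataFrame above):

```python
calls = []

def check(x):
    # record each test actually performed
    calls.append(x)
    return x == 2

data = [1, 2, 3, 4, 5]

calls.clear()
any([check(x) for x in data])   # list comprehension: evaluates every element first
list_calls = len(calls)

calls.clear()
any(check(x) for x in data)     # generator: stops at the first True
gen_calls = len(calls)

print(list_calls, gen_calls)    # → 5 2
```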
I am having a problem with measuring the time of a function.
My function is a "linear search":
def linear_search(obj, item):
    for i in range(0, len(obj)):
        if obj[i] == item:
            return i
    return -1
And I made another function that measures the time 100 times and adds all the results to a list:
def measureTime(a):
    nl = []
    import random
    import time
    for x in range(0, 100):  # calculating time
        start = time.time()
        a
        end = time.time()
        times = end - start
        nl.append(times)
    return nl
When I'm using measureTime(linear_search(list,random.choice(range(0,50)))), the function always returns [0.0].
What can cause this problem? Thanks.
You are actually passing the result of linear_search into measureTime; you need to pass in the function and its arguments instead, so they can be executed inside measureTime, as in @martijnn2008's answer.
Or, better yet, you can use the timeit module to do the job for you:
from functools import partial
import timeit

def measureTime(n, f, *args):
    # return the total runtime for n executions
    # use a for loop with number=1 to get each individual runtime
    return timeit.timeit(partial(f, *args), number=n)

# running within the module
measureTime(100, linear_search, list, random.choice(range(0, 50)))

# if running interactively outside the module (say your module is named mymodule):
mymodule.measureTime(100, mymodule.linear_search, mymodule.list, mymodule.random.choice(range(0, 50)))
Take a look at the following example; I don't know exactly what you are trying to achieve, so I guessed ;)
import random
import time

def measureTime(method, n, *args):
    start = time.time()
    for _ in xrange(n):
        method(*args)
    end = time.time()
    return (end - start) / n

def linear_search(lst, item):
    for i, o in enumerate(lst):
        if o == item:
            return i
    return -1

lst = [random.randint(0, 10**6) for _ in xrange(10**6)]
repetitions = 100
for _ in xrange(10):
    item = random.randint(0, 10**6)
    print 'average runtime =',
    print measureTime(linear_search, repetitions, lst, item) * 1000, 'ms'
I need to generate a random 32-digit number along with a 15-character string, to get something like 09826843-5112-8345-7619-372151470268 and qcRtAhieRabnpUaQ. I use the following code to generate the number:
import random
"-".join(['%08d' % random.randrange(0, 10e7),
          '%04d' % random.randrange(0, 10e3),
          '%04d' % random.randrange(0, 10e3),
          '%04d' % random.randrange(0, 10e3),
          '%012d' % random.randrange(0, 10e11)])
Is there a similar way to create case insensitive 15-char string with just random module?
import random
import string
''.join(random.sample(string.ascii_letters, 15))
import uuid
str(uuid.uuid4())
import numpy as np
import random
import string

def random_string(num_chars, symbols):
    return "".join(random.choice(symbols)
                   for _ in range(num_chars))

def random_string2(num_chars, symbols, replace=True):
    """Random string with replacement option"""
    symbols = np.asarray(list(symbols))
    return "".join(np.random.choice(symbols, num_chars, replace))

def main():
    print(random_string(15, string.ascii_letters))
    print(random_string2(15, string.ascii_letters, False))
    print(random_string2(15, string.ascii_letters, True))

if __name__ == "__main__":
    main()
Note that the characters in the string need not be unique (which I presume is the case, since "qcRtAhieRabnpUaQ" has two 'a's).
If you want the characters to be unique, then @Sergey Gornostaev's solution is probably the most elegant, but the number of unique characters in ascii_letters then limits the longest string you can generate.
import random
rand_cap = [chr(random.randint(65, 90)) for i in range(7)]
rand_small = [chr(random.randint(97, 122)) for i in range(7)]
rand_chars_list = rand_cap + rand_small
random.shuffle(rand_chars_list)
rand_chars = ''.join(rand_chars_list)
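On Python 3.6+, random.choices (note the s) samples with replacement in a single call, which sidesteps the uniqueness limitation of random.sample:

```python
import random
import string

# choices samples WITH replacement, so repeated characters are allowed
rand_chars = ''.join(random.choices(string.ascii_letters, k=15))
print(rand_chars)
```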
I have a function f(x) that takes as input a list x of 100 random floats between 0 and 1. Different lists will result in different running times of f.
I want to find out how long f takes to run on average, over a large number of different random lists. What's the best way to do this? Should I use timeit and if so is there a way I can do this without including the time it takes to generate each random list in each trial?
This is how I would do it without timeit (pseudocode):
for i = 1 to 10000:
    x = random list
    start = current time
    f(x)
    end = current time
    results.append(end - start)
return mean(results)
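The pseudocode above translates directly to Python; time.perf_counter is a better clock for this than time.time. A sketch, with a trivial stand-in for f, that keeps list generation outside the timed region:

```python
import random
import time

def f(x):
    # stand-in for the real function under test
    return sum(v * v for v in x)

def mean_runtime(f, trials=1000, size=100):
    results = []
    for _ in range(trials):
        x = [random.random() for _ in range(size)]  # list generation is NOT timed
        start = time.perf_counter()
        f(x)
        end = time.perf_counter()
        results.append(end - start)
    return sum(results) / len(results)

avg = mean_runtime(f)
print(avg)  # average seconds per call, excluding list generation
```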
You can make a timer decorator. Here is some example code:
from time import time

class Timer(object):
    def __init__(self, func):
        """
        Decorator that times a function
        @param func: Function being decorated
        @type func: callable
        """
        self.func = func

    def __call__(self, *args, **kwargs):
        start = time()
        self.func(*args, **kwargs)
        end = time()
        return end - start

@Timer
def cheese():
    for var in xrange(9999999):
        continue

for var in xrange(100):
    print cheese()
Working example, with fewer loops.
import timeit, random

def summer(myList):
    result = 0
    for num in myList:
        result += num
    return result

for i in range(10):
    x = [random.randint(0, 100) for i in range(100000)]
    print timeit.timeit("summer(x)", setup="from __main__ import x, summer", number=100)
You can import the variable using from __main__ import x
I think this does the trick. It will execute the setup once per repeat and then execute stmt number=1 times. However, I don't think this is much better than the simple loop you posted.
import timeit
stmt = '[x*x*x for x in xrange(n)]' # just an example
setup = 'import random; n = random.randint(10, 100)'
r = 10000
times = timeit.repeat(stmt, setup, repeat=r, number=1)
print min(times), max(times), sum(times)/r
There is also a "cell mode" that you can use with timeit in the IPython shell, but it only returns the fastest time, and there is no easy way to change that (?).
import random
%%timeit -r 10000 -n 1 n = random.randint(10,100)
var = [x*x*x for x in xrange(n)]