Related
Intro:
Hello. I am exploring the Python RxPY library for my use case: building an execution pipeline using reactive programming concepts, in the hope that I will not have to manipulate too much state by hand. My solution seems to be functional, but I am having trouble composing a new Observable from other Observables.
The problem is that the way I am composing my observables causes some expensive calculations to be performed twice. For performance, I really want to avoid triggering those expensive calculations more than once per input.
I am very new to reactive programming. I have scratched my head over internet resources and the reference documentation, but they seem a little too terse for me to grasp. Please advise.
Following is a toy example which illustrates what I am doing:
import rx
from rx import operators as op
from rx.subject import Subject
root = Subject()
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r)))
)
bar_foo = foo.pipe(
    op.map(lambda x: x * 2),
    op.do_action(lambda r: print("bar(foo(x)) = %s" % str(r)))
)
bar_foo.pipe(
    op.zip(foo),
    op.map(lambda i: i[0] + i[1]),
    op.do_action(lambda r: print("foo(x) + bar(foo(x)) = %s" % str(r)))
).subscribe()
print("-------------")
root.on_next(10)
print("-------------")
Output:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) = 11 (expensive)
foo(x) + bar(foo(x)) = 33
-------------
You can think of foo() and bar() as expensive and complex operations. I first build an observable foo, then compose a new observable bar_foo that incorporates foo. Later, both are zipped together to calculate the final result foo(x) + bar(foo(x)).
Question:
What can I do to prevent foo() from getting triggered more than once for a single input?
I have strong reasons to keep foo() and bar() separate, and I also do not want to explicitly memoize foo().
Could anyone with experience using RxPY in production share their experience: does RxPY lead to better performance or to slowdowns compared with equivalent hand-crafted (but unmaintainable) code?
Adding op.share() right after the expensive calculation in the foo pipeline should help here. Changing the foo pipeline to:
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.share()  # added to the pipeline
)
will result in:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) + bar(foo(x)) = 33
-------------
I believe that .share() makes the events emitted by the expensive operation shared among the downstream subscribers, so that the result of a single expensive calculation can be used multiple times.
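For intuition, my understanding (an illustrative sketch, not quoted from the RxPY docs) is that share() is shorthand for multicasting through publish() plus ref_count(): a single upstream subscription feeds all downstream subscribers, created when the first subscriber arrives and disposed when the last one leaves. The foo pipeline above should behave the same written as:
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.publish(),    # multicast through an internal Subject
    op.ref_count()   # connect upstream on first subscriber, dispose on last
)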
Regarding your second question: I am new to RxPY as well, so I am also interested in the answers of more experienced users. So far I've noticed that, as a beginner, you can easily create (bad) pipelines in which messages and calculations are repeated in the background. .share() seems to reduce this to some extent, but I am not sure what exactly is happening behind the scenes.
I was profiling Erlang's lists:reverse Built-In Function (BIF) to see how well it scales with the size of the input. More specifically, I tried:
1> X = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> timer:tc(lists, reverse, [X]).
{57737,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
3> timer:tc(lists, reverse, [X]).
{46896,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
4> Y = lists:seq(1, 10000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
5> timer:tc(lists, reverse, [Y]).
{434079,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
6> timer:tc(lists, reverse, [Y]).
{214173,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
OK, so far it seems like the reverse BIF scales approximately linearly with the size of the input (e.g. multiply the size of the input by 10 and the time taken also increases by a factor of about 10). In pure Erlang that would make sense, since we would use something like tail recursion with an accumulator to reverse the list. I guess that even as a BIF implemented in C, the algorithm for reversing a list seems to be the same (maybe because of the way lists are represented in Erlang?).
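For intuition, here is a rough Python illustration of that accumulator idea (illustrative only, not Erlang's actual C implementation): a cons-style linked list is reversed by prepending each head onto an accumulator, one step per element, hence linear time.
def reverse_cons(lst):
    acc = None                # the empty list
    for head in lst:
        acc = (head, acc)     # "cons" the head onto the accumulator
    return acc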
Now I wanted to compare this with another language - perhaps another dynamically typed language that I already use. So I tried a similar thing in Python, taking care to, very explicitly, use actual lists instead of generators, which I anticipated would otherwise skew the results in Python's favor and give it an unfair advantage.
import time
ms_conv_factor = 10**6
def profile(func, *args):
start = time.time()
func(args)
end = time.time()
elapsed_seconds = end - start
print(elapsed_seconds * ms_conv_factor, flush=True)
x = list([i for i in range(0, 1000000)])
y = list([i for i in range(0, 10000000)])
z = list([i for i in range(0, 100000000)])
def f(m):
return m[::-1]
def g(m):
return reversed(m)
if __name__ == "__main__":
print("All done loading the lists, starting now.", flush=True)
print("f:")
profile(f, x)
profile(f, y)
print("")
profile(f, x)
profile(f, y)
print("")
profile(f, z)
print("")
print("g:")
profile(g, x)
profile(g, y)
print("")
profile(g, x)
profile(g, y)
print("")
profile(g, z)
This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.
Exact result:
All done loading the lists, starting now.
f:
1.430511474609375
0.7152557373046875
0.7152557373046875
0.2384185791015625
0.476837158203125
g:
1.9073486328125
0.7152557373046875
0.2384185791015625
0.2384185791015625
0.476837158203125
My first, naive guess was that Python might be able to recognize the reverse construct and return something like a reverse iterator (Python can work with references, right? Maybe it was using some kind of optimization here). But I don't think that theory makes sense, since the original list and the returned list are not the same (changing one shouldn't change the other).
So my question(s) here is(are):
Is my profiling technique here flawed? Have I written the tests in a way that favors one language over the other?
What is the difference in implementation of lists and their reversal in Erlang vs Python that make this situation (of Python being WAY faster) possible?
Thanks for your time (in advance).
This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.
That is because your profiling function is incorrect. It accepts variable positional arguments, but when it passes them on to the function it doesn't unpack them, so you are only ever working with a tuple of length one. You need to do the following:
def profile(func, *args):
start = time.time()
func(*args) # Make sure to unpack the args!
end = time.time()
elapsed_seconds = end - start
print(elapsed_seconds * ms_conv_factor, flush=True)
So notice the difference:
>>> def foo(*args):
... print(args)
... print(*args)
...
>>> foo(1,2,3)
(1, 2, 3)
1 2 3
Also note that reversed(m) creates a reversed iterator, so it doesn't actually do anything until you iterate over it. g will therefore still run in constant time.
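You can see that laziness in the REPL:
>>> it = reversed([1, 2, 3])   # no reversal work has happened yet
>>> next(it)
3
>>> list(it)                   # the rest is consumed only now
[2, 1]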
But rest assured, reversing a list in Python takes linear time.
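With the unpacking fixed, a quick re-run along these lines (a sketch; the absolute numbers are machine-dependent) should show roughly 10x growth in time for a 10x larger list:
import time

def profile(func, *args):
    start = time.time()
    func(*args)  # unpacked this time
    print((time.time() - start) * 10**6, flush=True)

x = list(range(1000000))
y = list(range(10000000))
profile(lambda m: m[::-1], x)
profile(lambda m: m[::-1], y)  # expect roughly 10x the previous time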
My first post:
Before beginning, I should note I am relatively new to OOP, though I have done DB/stat work in SAS, R, etc., so my question may not be well posed: please let me know if I need to clarify anything.
My question:
I am attempting to import and parse large CSV files (~6MM rows, and larger files are likely to come). The two limitations that I've run into repeatedly have been runtime and memory (I'm on a 32-bit build of Python). Below is a simplified version of my neophyte (nth) attempt at importing and parsing in reasonable time. How can I speed up this process? Due to the memory limitations, I am splitting the file as I import it and performing interim summaries, using pandas for the summarization:
Parsing and Summarization:
import math
import pandas as pd

def ParseInts(inString):
    try:
        return int(inString)
    except ValueError:
        return None

def TextToYearMo(inString):
    # "YYYY-MM" -> the integer YYYYMM; the year must be converted with
    # int() before multiplying (multiplying the raw string would repeat it)
    try:
        return 100 * int(inString[0:4]) + int(inString[5:7])
    except ValueError:
        return 100 * int(inString[0:4]) + int(inString[5:6])
def ParseAllElements(elmValue, elmPos):
    if elmPos in [0, 2, 5]:
        return elmValue
    elif elmPos == 3:
        return TextToYearMo(elmValue)
    elif elmPos == 18:
        return ParseInts(elmValue.strip('\n'))
    else:
        return ParseInts(elmValue)
def MakeAndSumList(inList):
    cols = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7',
            'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14']
    df = pd.DataFrame(inList, columns=cols)
    return df.groupby(['x1', 'x2', 'x3', 'x4', 'x5']).sum().reset_index()
Function Calls:
def ParsedSummary(longString, delimtr, rowNum):
    keepColumns = [0, 3, 2, 5, 10, 9, 11, 12, 13, 14, 15, 16, 17, 18]
    # Do some other stuff that takes very little time
    fields = longString.split(delimtr)  # split once, not once per kept column
    return [pse.ParseAllElements(fields[i], i) for i in keepColumns]
def CSVToList(fileName, delimtr=','):
    with open(fileName) as f:
        # set(enumerate(f)) exhausts the file iterator, so the line count
        # must come from the materialized set rather than a second loop
        listEnumFile = set(enumerate(f))
    lineCount = len(listEnumFile)
    maxSplit = math.floor(lineCount / 10) + 1
    Summary = pd.DataFrame({}, columns=['x1', 'x2', 'x3', 'x4', 'x5',
                                        'x6', 'x7', 'x8', 'x9', 'x10',
                                        'x11', 'x12', 'x13', 'x14'])
    for counter in range(0, 10):
        startRow = int(counter * maxSplit)
        endRow = int((counter + 1) * maxSplit)
        includedRows = set(range(startRow, endRow))
        listOfRows = [ParsedSummary(row, delimtr, rownum)
                      for rownum, row in listEnumFile if rownum in includedRows]
        Summary = pd.concat([Summary, pse.MakeAndSumList(listOfRows)])
    return Summary
(Again, this is my first question - so I apologize if I simplified too much or, more likely, too little, but I am at a loss as to how to expedite this.)
For runtime comparison:
Using Access, I can import, parse, summarize, and merge several files in this size range in under 5 minutes (though I am right at its 2GB limit). I'd hope to get comparable results in Python; at present I'm estimating a ~30-minute runtime for one file. Note: I threw something together in Access's miserable environment only because I didn't have admin rights readily available to install anything else.
Edit: Updated the parsing code. I was able to shave off five minutes (estimated runtime now at 25 min) by changing some conditional logic to try/except. Also, the runtime estimate doesn't include the pandas portion - I'd forgotten I'd commented that out while testing, but its impact seems negligible.
If you want to optimize performance, don't roll your own CSV reader in Python. There is already a standard csv module. Perhaps pandas or numpy have faster csv readers; I'm not sure.
From https://softwarerecs.stackexchange.com/questions/7463/fastest-python-library-to-read-a-csv-file:
In short, pandas.io.parsers.read_csv beats everybody else, NumPy's loadtxt is impressively slow and NumPy's from_file and load impressively fast.
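For example, something along these lines (a sketch only; header=None and the five group-key columns are placeholders for your real layout) streams the file through pandas' C parser in bounded-memory chunks and merges the partial summaries, replacing most of the hand-rolled splitting above:
import pandas as pd

def summarize_csv(file_name, chunk_rows=500000):
    parts = []
    # read_csv with chunksize yields DataFrames of at most chunk_rows rows,
    # so memory use stays bounded even for very large files.
    for chunk in pd.read_csv(file_name, header=None, chunksize=chunk_rows):
        parts.append(chunk.groupby([0, 1, 2, 3, 4]).sum())
    # Groups split across chunk boundaries are merged by a second groupby.
    return pd.concat(parts).groupby(level=[0, 1, 2, 3, 4]).sum().reset_index()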
If I want to use only the index within a loop, is it better to use the range/xrange function in combination with len()
a = [1,2,3]
for i in xrange(len(a)):
print i
or enumerate? Even if I won't use p at all?
for i,p in enumerate(a):
print i
I would use enumerate as it's more generic - e.g. it will work on arbitrary iterables as well as sequences, and the overhead of just returning a reference to an object isn't that big a deal - while xrange(len(something)), although (to me) more readable as a statement of your intent, will break on objects with no support for len().
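A quick illustration of that difference (Python 2, to match the question):
def gen():
    yield 'a'
    yield 'b'

for i, v in enumerate(gen()):   # fine: enumerate only needs an iterable
    print i, v

xrange(len(gen()))              # TypeError: object of type 'generator' has no len()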
Using xrange with len is quite a common use case, so yes, you can use it if you only need to access values by index.
But if you prefer to use enumerate for some reason, you can use an underscore (_); it's just a frequently seen convention that shows you won't use the variable in any meaningful way:
for i, _ in enumerate(a):
print i
There's also a pitfall when using the underscore (_): it's common to name 'translating' functions _ in i18n libraries and systems, so beware of using it together with gettext or another library of that kind (thanks to @lazyr).
That's a rare requirement – the only information used from the container is its length! In this case, I'd indeed make this fact explicit and use the first version.
xrange should be a little faster, but enumerate means you won't need to change anything when you realise that you need p after all.
I ran a timing test and found that range is about 2x faster than enumerate (on Python 3.6 for Win32).
best of 3, for len(a) = 1M
enumerate(a): 0.125s
range(len(a)): 0.058s
Hope it helps.
FYI: I initially started this test to compare Python's speed with VBA's... and found out VBA is actually 7x faster than the range method... is it because of my poor Python skills? Surely Python can do better than VBA somehow.
script for enumerate
import time

a = [0] * 1000000
start = time.perf_counter()  # store the start time instead of discarding it
for i, j in enumerate(a):
    pass
print(time.perf_counter() - start)
script for range
import time

a = [0] * 1000000
start = time.perf_counter()
for i in range(len(a)):
    pass
print(time.perf_counter() - start)
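For what it's worth, timeit handles the start/stop bookkeeping and repetition for you; an equivalent comparison (numbers are machine-dependent) would be:
import timeit

setup = "a = [0] * 1000000"
print(timeit.timeit("for i, j in enumerate(a): pass", setup=setup, number=10))
print(timeit.timeit("for i in range(len(a)): pass", setup=setup, number=10))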
script for vba (0.008s)
Sub timetest_for()
Dim a(1000000) As Byte
Dim i As Long
tproc = Timer
For i = 1 To UBound(a)
Next i
Debug.Print Timer - tproc
End Sub
I wrote this because I wanted to test it. The answer depends on whether you need the values to work with.
Code:
testlist = []
for i in range(10000):
testlist.append(i)
def rangelist():
a = 0
for i in range(len(testlist)):
a += i
a = testlist[i] + 1  # comment this line out for the second test
def enumlist():
b = 0
for i, x in enumerate(testlist):
b += i
b = x + 1  # comment this line out for the second test
import timeit
t = timeit.Timer(lambda: rangelist())
print("range(len()):")
print(t.timeit(number=10000))
t = timeit.Timer(lambda: enumlist())
print("enum():")
print(t.timeit(number=10000))
If you run it, you will most likely see that enum() is faster. When you comment out the lines a = testlist[i] + 1 and b = x + 1, you will see that range(len()) is faster.
For the code above I get:
range(len()):
18.766527627612255
enum():
15.353173553868345
Now when commenting as stated above I get:
range(len()):
8.231641875551514
enum():
9.974262515773656
Based on your sample code,
res = [[profiel.attr[i].x for i,p in enumerate(profiel.attr)] for profiel in prof_obj]
I would replace it with
res = [[p.x for p in profiel.attr] for profiel in prof_obj]
Just use range(). If you're going to use all the indexes anyway, xrange() provides no real benefit (unless len(a) is really large), and enumerate() creates a richer data structure that you're going to throw away immediately.
I profiled my Python program and found that the following function was taking too long to run. Perhaps I could use a different algorithm and make it run faster. However, I have read that I can also possibly increase the speed by reducing function calls, especially when a function gets called repeatedly within a loop. I am a Python newbie and would like to learn how to do this, and to see how much faster the code can get. Currently, the function is:
def potentialActualBuyers(setOfPeople,theCar,price):
count=0
for person in setOfPeople:
if person.getUtility(theCar) >= price and person.periodCarPurchased==None:
count += 1
return count
where setOfPeople is a list of person objects. I tried the following:
def potentialActualBuyers(setOfPeople,theCar,price):
count=0
Utility=person.getUtility
for person in setOfPeople:
if Utility(theCar) >= price and person.periodCarPurchased==None:
count += 1
return count
This, however, gives me an error saying local variable 'person' referenced before assignment
Any suggestions, how I can reduce function calls or any other changes that can make the code faster.
Again, I am a python newbie and even though I may possibly be able to use a better algorithm, it is still worthwhile learning the answer to the above question.
Thanks very much.
***** EDIT *****
Adding the getUtility method:
def getUtility(self,theCar):
if theCar in self.utility.keys():
return self.utility[theCar]
else:
self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
return self.utility[theCar]
***** EDIT: asking for new ideas *****
Any ideas how to speed this up further. I used the method suggested by Alex to cut the time in half. Can I speed this further?
Thanks.
I doubt you can get much speedup in this case by hoisting the lookup of person.getUtility (by class, not by instance, as other answers have pointed out). Maybe...:
return sum(1 for p in setOfPeople
if p.periodCarPurchased is None
and p.getUtility(theCar) >= price)
but I suspect most of the time is actually spent in the execution of getUtility (and possibly in the lookup of p.periodCarPurchased, if that's some fancy property as opposed to a plain old attribute -- I moved the latter test before the and just in case it is a plain attribute, which may save a number of the getUtility calls). What does your profiling say about the fraction of time spent in this function (net of its calls to others) versus in the method (and possibly property) in question?
Try instead (that's assuming all persons are of the same type Person):
Utility = Person.getUtility
for person in setOfPeople:
if Utility(person, theCar) >= ...
Also, using is None instead of == None should be marginally faster. And try whether swapping the two and terms helps.
Methods are just functions bound to an object:
Utility = Person.getUtility
for person in setOfPeople:
if Utility(person, theCar) ...
This doesn't eliminate a function call though, it eliminates an attribute lookup.
This one line made my eyes bleed:
self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
Let's make it legible and PEP8able and then see if it can be faster. First some spaces:
self.utility[theCar] = self.A * (math.pow(theCar.mpg, self.alpha)) * (math.pow(theCar.hp, self.beta)) * (math.pow(theCar.pc, self.gamma))
Now we can see there are very redundant parentheses; remove them:
self.utility[theCar] = self.A * math.pow(theCar.mpg, self.alpha) * math.pow(theCar.hp, self.beta) * math.pow(theCar.pc, self.gamma)
Hmmm: 3 lookups of math.pow and 3 function calls. You have three choices for powers: x ** y, the built-in pow(x, y[, z]), and math.pow(x, y). Unless you have a good reason to use one of the others, it's best (IMHO) to choose x ** y; you save both the attribute lookup and the function call.
self.utility[theCar] = self.A * theCar.mpg ** self.alpha * theCar.hp ** self.beta * theCar.pc ** self.gamma
annnnnnd while we're here, let's get rid of the horizontal scroll-bar:
self.utility[theCar] = (self.A
* theCar.mpg ** self.alpha
* theCar.hp ** self.beta
* theCar.pc ** self.gamma)
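If you want to verify the ** vs math.pow claim on your machine, here is a quick timeit sketch (the variables in the setup prevent the compiler from constant-folding the expression; exact numbers will vary):
import timeit

setup = "import math; x, y = 2.5, 1.3"
print(timeit.timeit("x ** y", setup=setup))           # operator: no lookup, no call
print(timeit.timeit("math.pow(x, y)", setup=setup))   # attribute lookup + function call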
A possibility that would require quite a rewrite of your existing code, and may not help anyway (in Python), would be to avoid most of the power calculations by taking logs everywhere and working with log_utility = log_A + log_mpg * alpha + ...
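A hedged sketch of that idea (log_A here is a hypothetical precomputed math.log(self.A); since log is monotonic, the comparison utility >= price becomes log_utility >= math.log(price)):
import math

def getLogUtility(self, theCar):
    # one log per attribute replaces three pow() calls
    return (self.log_A
            + self.alpha * math.log(theCar.mpg)
            + self.beta * math.log(theCar.hp)
            + self.gamma * math.log(theCar.pc))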