I have a single-threaded Python 3 program that is CPU bound; the only I/O is printing a couple of lines to output (no reading or writing of files).
On my desktop machine (AMD Ryzen 1700x 3.8 GHz, 16GB 3000 MHz DDR4) it consistently runs at 3400 episodes/second, and a run takes around 60 seconds.
On my laptop (Intel i7-6600U 2.8 GHz, 16GB 2000 MHz DDR3) performance doubles to 7000 episodes/second, and a run comes in at just under 30 seconds.
Both machines run the same operating system (Fedora 26) and the same Python version (not built from source).
What's more, when profiling, there's a line showing
10.999 tottime, 28.814 cumtime for arrayprint.py:557(fillFormat)
but only when the code is run on the desktop. On the laptop, the particular function does not appear at all (and none of the arrayprint functions use more than 1 second tottime).
Not only is it strange that the performance differs between the machines, but no arrays or lists are ever printed to the screen, converted to strings, or saved to files during the execution of the program.
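One way to find out which part of the program triggers the arrayprint calls is to ask the profiler for the callers of a function. The sketch below uses only the standard library (cProfile and pstats). The functions format_value and hot_loop are hypothetical stand-ins; in the real program the pattern would presumably be something like 'array2string' or 'fillFormat'.

```python
import cProfile
import io
import pstats


def format_value(x):
    # Hypothetical stand-in for an expensive formatting routine.
    return str(x)


def hot_loop(n):
    # Hypothetical caller that converts values to strings as a side effect.
    result = []
    for i in range(n):
        result.append(format_value(i))
    return result


def callers_report(func, *args, pattern):
    """Profile func(*args) and return the callers of functions matching pattern."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # The pattern is a regular expression matched against "file:line(function)".
    stats.sort_stats("cumulative").print_callers(pattern)
    return out.getvalue()


report = callers_report(hot_loop, 1000, pattern="format_value")
print(report)
```

The report lists, for each matching function, the functions that called it, which should point at the line of user code responsible for the string conversion.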
Here's the full profile for the desktop:
54499635 function calls (53787999 primitive calls) in 58.746 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
533727 0.359 0.000 0.514 0.000 <frozen importlib._bootstrap>:402(parent)
533727 0.469 0.000 0.697 0.000 <frozen importlib._bootstrap>:989(_handle_fromlist)
1 0.000 0.000 58.746 58.746 <string>:1(<module>)
4 0.000 0.000 0.000 0.000 __init__.py:120(getLevelName)
567524 0.237 0.000 0.727 0.000 __init__.py:1284(debug)
2 0.000 0.000 0.000 0.000 __init__.py:1296(info)
2 0.000 0.000 0.000 0.000 __init__.py:1308(warning)
2 0.000 0.000 0.000 0.000 __init__.py:1320(warn)
4 0.000 0.000 0.000 0.000 __init__.py:1374(findCaller)
4 0.000 0.000 0.000 0.000 __init__.py:1404(makeRecord)
4 0.000 0.000 0.000 0.000 __init__.py:1419(_log)
4 0.000 0.000 0.000 0.000 __init__.py:1444(handle)
4 0.000 0.000 0.000 0.000 __init__.py:1498(callHandlers)
567528 0.175 0.000 0.175 0.000 __init__.py:1528(getEffectiveLevel)
567528 0.315 0.000 0.490 0.000 __init__.py:1542(isEnabledFor)
4 0.000 0.000 0.000 0.000 __init__.py:157(<lambda>)
4 0.000 0.000 0.000 0.000 __init__.py:251(__init__)
4 0.000 0.000 0.000 0.000 __init__.py:329(getMessage)
4 0.000 0.000 0.000 0.000 __init__.py:387(usesTime)
4 0.000 0.000 0.000 0.000 __init__.py:390(format)
4 0.000 0.000 0.000 0.000 __init__.py:540(usesTime)
4 0.000 0.000 0.000 0.000 __init__.py:546(formatMessage)
4 0.000 0.000 0.000 0.000 __init__.py:562(format)
8 0.000 0.000 0.000 0.000 __init__.py:703(filter)
8 0.000 0.000 0.000 0.000 __init__.py:807(acquire)
8 0.000 0.000 0.000 0.000 __init__.py:814(release)
4 0.000 0.000 0.000 0.000 __init__.py:827(format)
4 0.000 0.000 0.000 0.000 __init__.py:850(handle)
4 0.000 0.000 0.000 0.000 __init__.py:969(flush)
4 0.000 0.000 0.000 0.000 __init__.py:980(emit)
289159 0.101 0.000 1.491 0.000 _methods.py:31(_sum)
533727 0.175 0.000 1.613 0.000 _methods.py:37(_any)
177909 0.862 0.000 33.737 0.000 arrayprint.py:237(_get_formatdict)
177909 0.370 0.000 34.214 0.000 arrayprint.py:273(_get_format_function)
177909 0.686 0.000 39.971 0.000 arrayprint.py:315(_array2string)
533727/177909 0.674 0.000 40.351 0.000 arrayprint.py:340(array2string)
1224652 0.960 0.000 1.554 0.000 arrayprint.py:467(_extendLine)
177909 1.671 0.000 4.320 0.000 arrayprint.py:475(_formatArray)
533727 0.682 0.000 29.496 0.000 arrayprint.py:543(__init__)
533727 10.999 0.000 28.814 0.000 arrayprint.py:557(fillFormat)
355336 1.600 0.000 5.432 0.000 arrayprint.py:589(<listcomp>)
2416068 2.677 0.000 3.832 0.000 arrayprint.py:642(_digits)
177909 0.720 0.000 2.378 0.000 arrayprint.py:652(__init__)
1224652 1.057 0.000 1.057 0.000 arrayprint.py:665(__call__)
533727 0.147 0.000 0.147 0.000 arrayprint.py:674(__init__)
177909 0.227 0.000 0.319 0.000 arrayprint.py:702(__init__)
177909 0.415 0.000 17.986 0.000 arrayprint.py:713(__init__)
177909 0.166 0.000 0.166 0.000 arrayprint.py:730(__init__)
177909 0.046 0.000 0.046 0.000 arrayprint.py:751(__init__)
1 0.000 0.000 0.000 0.000 enum.py:265(__call__)
1 0.000 0.000 0.000 0.000 enum.py:515(__new__)
1 0.000 0.000 0.000 0.000 enum.py:544(_missing_)
177909 0.206 0.000 0.206 0.000 enum.py:552(__str__)
177909 0.269 0.000 0.475 0.000 enum.py:564(__format__)
755408 0.248 0.000 0.366 0.000 enum.py:579(__hash__)
200000 0.037 0.000 0.037 0.000 enum.py:592(name)
27976 0.005 0.000 0.005 0.000 enum.py:597(value)
200524 0.443 0.000 0.641 0.000 eventgen.py:115(_push)
200001 0.492 0.000 0.885 0.000 eventgen.py:122(pop)
200000 0.892 0.000 1.036 0.000 eventgen.py:137(ce_str)
13988 0.017 0.000 0.034 0.000 eventgen.py:15(__lt__)
99676 0.168 0.000 0.911 0.000 eventgen.py:44(event_new)
79335 0.096 0.000 0.520 0.000 eventgen.py:52(event_end)
11689 0.078 0.000 0.261 0.000 eventgen.py:61(event_new_handoff)
9824 0.014 0.000 0.098 0.000 eventgen.py:90(event_end_handoff)
77441 0.295 0.000 0.380 0.000 eventgen.py:94(reassign)
177909 0.177 0.000 0.555 0.000 fromnumeric.py:1364(ravel)
200001 0.093 0.000 0.464 0.000 fromnumeric.py:1471(nonzero)
289159 0.542 0.000 2.148 0.000 fromnumeric.py:1710(sum)
533727 0.637 0.000 2.956 0.000 fromnumeric.py:1866(any)
200001 0.120 0.000 0.372 0.000 fromnumeric.py:55(_wrapfunc)
4 0.000 0.000 0.000 0.000 genericpath.py:117(_splitext)
49 0.000 0.000 0.001 0.000 grid.py:172(neighbors1)
49 0.001 0.000 0.001 0.000 grid.py:195(neighbors2)
533727/177909 0.311 0.000 40.472 0.000 numeric.py:1927(array_str)
1067454 1.724 0.000 4.091 0.000 numeric.py:2692(seterr)
1067454 1.466 0.000 1.603 0.000 numeric.py:2792(geterr)
533727 0.299 0.000 0.422 0.000 numeric.py:3085(__init__)
533727 0.411 0.000 2.588 0.000 numeric.py:3089(__enter__)
533727 0.461 0.000 2.374 0.000 numeric.py:3094(__exit__)
177909 0.064 0.000 0.151 0.000 numeric.py:463(asarray)
711636 0.223 0.000 0.503 0.000 numeric.py:534(asanyarray)
4 0.000 0.000 0.000 0.000 posixpath.py:119(splitext)
4 0.000 0.000 0.000 0.000 posixpath.py:142(basename)
4 0.000 0.000 0.000 0.000 posixpath.py:39(_get_sep)
6 0.000 0.000 0.000 0.000 posixpath.py:50(normcase)
4 0.000 0.000 0.000 0.000 process.py:137(name)
4 0.000 0.000 0.000 0.000 process.py:35(current_process)
1 0.000 0.000 0.000 0.000 signal.py:25(_int_to_enum)
2 0.000 0.000 0.000 0.000 signal.py:35(_enum_to_int)
1 0.000 0.000 0.000 0.000 signal.py:45(signal)
99627 0.062 0.000 0.062 0.000 stats.py:38(new)
20292 0.028 0.000 0.039 0.000 stats.py:42(new_rej)
88750 0.047 0.000 0.047 0.000 stats.py:48(end)
11623 0.005 0.000 0.005 0.000 stats.py:51(hoff_new)
1799 0.001 0.000 0.002 0.000 stats.py:54(hoff_rej)
22091 0.012 0.000 0.012 0.000 stats.py:58(rej)
200000 0.234 0.000 1.513 0.000 stats.py:64(iter)
1 0.000 0.000 0.000 0.000 stats.py:69(n_iter)
1 0.000 0.000 0.000 0.000 stats.py:86(endsim)
1 0.000 0.000 0.001 0.001 strats.py:189(get_init_action)
200000 1.070 0.000 49.964 0.000 strats.py:193(get_action)
177909 1.348 0.000 1.937 0.000 strats.py:220(execute_action)
200001 4.572 0.000 47.626 0.000 strats.py:243(optimal_ch)
89158 0.071 0.000 0.958 0.000 strats.py:299(reward)
89158 0.018 0.000 0.018 0.000 strats.py:308(discount)
1242355 0.944 0.000 0.944 0.000 strats.py:333(get_qval)
89158 0.160 0.000 0.160 0.000 strats.py:336(update_qval)
1 0.000 0.000 58.746 58.746 strats.py:40(init_sim)
1 1.271 1.271 58.745 58.745 strats.py:49(_simulate)
4 0.000 0.000 0.000 0.000 threading.py:1076(name)
4 0.000 0.000 0.000 0.000 threading.py:1230(current_thread)
227976 0.120 0.000 0.162 0.000 types.py:135(__get__)
177909 0.079 0.000 0.079 0.000 {built-in method _functools.reduce}
200001 0.192 0.000 0.222 0.000 {built-in method _heapq.heappop}
200524 0.084 0.000 0.088 0.000 {built-in method _heapq.heappush}
310143 0.064 0.000 0.064 0.000 {built-in method _operator.gt}
843054 0.152 0.000 0.152 0.000 {built-in method _operator.lt}
1 0.000 0.000 0.000 0.000 {built-in method _signal.signal}
8 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
2 0.000 0.000 0.000 0.000 {built-in method _warnings.warn}
1 0.000 0.000 58.746 58.746 {built-in method builtins.exec}
200001 0.056 0.000 0.056 0.000 {built-in method builtins.getattr}
1067468 0.228 0.000 0.228 0.000 {built-in method builtins.hasattr}
755408 0.118 0.000 0.118 0.000 {built-in method builtins.hash}
467082 0.164 0.000 0.164 0.000 {built-in method builtins.isinstance}
533727 0.107 0.000 0.107 0.000 {built-in method builtins.issubclass}
10361766 1.076 0.000 1.076 0.000 {built-in method builtins.len}
533441 0.304 0.000 0.304 0.000 {built-in method builtins.max}
533923 0.198 0.000 0.198 0.000 {built-in method builtins.min}
889545 0.368 0.000 0.368 0.000 {built-in method numpy.core.multiarray.array}
111251 0.101 0.000 0.101 0.000 {built-in method numpy.core.multiarray.where}
2134908 0.377 0.000 0.377 0.000 {built-in method numpy.core.umath.geterrobj}
1067454 0.524 0.000 0.524 0.000 {built-in method numpy.core.umath.seterrobj}
14 0.000 0.000 0.000 0.000 {built-in method posix.fspath}
4 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
4 0.000 0.000 0.000 0.000 {built-in method sys._getframe}
5 0.000 0.000 0.000 0.000 {built-in method time.time}
8 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.RLock' objects}
533727 0.299 0.000 1.912 0.000 {method 'any' of 'numpy.ndarray' objects}
875 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
16544 0.119 0.000 0.119 0.000 {method 'choice' of 'mtrand.RandomState' objects}
533727 0.872 0.000 0.872 0.000 {method 'compress' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
188835 0.641 0.000 0.641 0.000 {method 'exponential' of 'mtrand.RandomState' objects}
4 0.000 0.000 0.000 0.000 {method 'find' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'flush' of '_io.TextIOWrapper' objects}
8 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
355818 0.069 0.000 0.069 0.000 {method 'item' of 'numpy.ndarray' objects}
200001 0.196 0.000 0.196 0.000 {method 'nonzero' of 'numpy.ndarray' objects}
533727 0.123 0.000 0.123 0.000 {method 'pop' of 'dict' objects}
11689 0.053 0.000 0.053 0.000 {method 'randint' of 'mtrand.RandomState' objects}
168494 0.128 0.000 0.128 0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
177909 0.232 0.000 0.232 0.000 {method 'ravel' of 'numpy.ndarray' objects}
1889376 5.023 0.000 5.023 0.000 {method 'reduce' of 'numpy.ufunc' objects}
8 0.000 0.000 0.000 0.000 {method 'release' of '_thread.RLock' objects}
12 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
533727 0.155 0.000 0.155 0.000 {method 'rpartition' of 'str' objects}
4865786 1.100 0.000 1.100 0.000 {method 'rstrip' of 'str' objects}
8 0.000 0.000 0.000 0.000 {method 'write' of '_io.TextIOWrapper' objects}
And here's for the laptop:
27738517 function calls (26673571 primitive calls) in 28.612 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 28.612 28.612 <string>:1(<module>)
4 0.000 0.000 0.000 0.000 __init__.py:120(getLevelName)
566894 0.244 0.000 0.720 0.000 __init__.py:1284(debug)
2 0.000 0.000 0.000 0.000 __init__.py:1296(info)
2 0.000 0.000 0.000 0.000 __init__.py:1308(warning)
2 0.000 0.000 0.000 0.000 __init__.py:1320(warn)
4 0.000 0.000 0.000 0.000 __init__.py:1374(findCaller)
4 0.000 0.000 0.000 0.000 __init__.py:1404(makeRecord)
4 0.000 0.000 0.000 0.000 __init__.py:1419(_log)
4 0.000 0.000 0.000 0.000 __init__.py:1444(handle)
4 0.000 0.000 0.000 0.000 __init__.py:1498(callHandlers)
566898 0.166 0.000 0.166 0.000 __init__.py:1528(getEffectiveLevel)
566898 0.309 0.000 0.476 0.000 __init__.py:1542(isEnabledFor)
4 0.000 0.000 0.000 0.000 __init__.py:157(<lambda>)
4 0.000 0.000 0.000 0.000 __init__.py:251(__init__)
4 0.000 0.000 0.000 0.000 __init__.py:329(getMessage)
4 0.000 0.000 0.000 0.000 __init__.py:387(usesTime)
4 0.000 0.000 0.000 0.000 __init__.py:390(format)
4 0.000 0.000 0.000 0.000 __init__.py:540(usesTime)
4 0.000 0.000 0.000 0.000 __init__.py:546(formatMessage)
4 0.000 0.000 0.000 0.000 __init__.py:562(format)
8 0.000 0.000 0.000 0.000 __init__.py:703(filter)
8 0.000 0.000 0.000 0.000 __init__.py:807(acquire)
8 0.000 0.000 0.000 0.000 __init__.py:814(release)
4 0.000 0.000 0.000 0.000 __init__.py:827(format)
4 0.000 0.000 0.000 0.000 __init__.py:850(handle)
4 0.000 0.000 0.000 0.000 __init__.py:969(flush)
4 0.000 0.000 0.000 0.000 __init__.py:980(emit)
288946 0.112 0.000 1.643 0.000 _methods.py:31(_sum)
177491 0.330 0.000 0.330 0.000 arrayprint.py:256(_get_formatdict)
177491 0.169 0.000 3.542 0.000 arrayprint.py:259(<lambda>)
177491 0.465 0.000 4.419 0.000 arrayprint.py:299(_get_format_function)
177491 0.623 0.000 9.729 0.000 arrayprint.py:343(_array2string)
532473/177491 0.987 0.000 10.679 0.000 arrayprint.py:381(wrapper)
532473/177491 0.721 0.000 10.150 0.000 arrayprint.py:399(array2string)
1225350 0.971 0.000 1.470 0.000 arrayprint.py:527(_extendLine)
177491 1.458 0.000 3.920 0.000 arrayprint.py:535(_formatArray)
177491 0.768 0.000 3.373 0.000 arrayprint.py:712(__init__)
1225350 0.960 0.000 0.960 0.000 arrayprint.py:725(__call__)
1 0.000 0.000 0.000 0.000 enum.py:265(__call__)
1 0.000 0.000 0.000 0.000 enum.py:515(__new__)
1 0.000 0.000 0.000 0.000 enum.py:544(_missing_)
177491 0.209 0.000 0.209 0.000 enum.py:552(__str__)
177491 0.316 0.000 0.525 0.000 enum.py:564(__format__)
755255 0.238 0.000 0.352 0.000 enum.py:579(__hash__)
200000 0.039 0.000 0.039 0.000 enum.py:592(name)
28626 0.005 0.000 0.005 0.000 enum.py:597(value)
200505 0.443 0.000 0.643 0.000 eventgen.py:115(_push)
200001 0.474 0.000 0.863 0.000 eventgen.py:122(pop)
200000 0.834 0.000 0.983 0.000 eventgen.py:137(ce_str)
14313 0.017 0.000 0.035 0.000 eventgen.py:15(__lt__)
99673 0.186 0.000 0.939 0.000 eventgen.py:44(event_new)
78949 0.094 0.000 0.500 0.000 eventgen.py:52(event_end)
11887 0.078 0.000 0.261 0.000 eventgen.py:61(event_new_handoff)
9996 0.017 0.000 0.103 0.000 eventgen.py:90(event_end_handoff)
77374 0.284 0.000 0.364 0.000 eventgen.py:94(reassign)
177491 0.195 0.000 0.595 0.000 fromnumeric.py:1380(ravel)
200001 0.098 0.000 0.490 0.000 fromnumeric.py:1487(nonzero)
288946 0.590 0.000 2.352 0.000 fromnumeric.py:1730(sum)
200001 0.130 0.000 0.392 0.000 fromnumeric.py:55(_wrapfunc)
4 0.000 0.000 0.000 0.000 genericpath.py:117(_splitext)
49 0.000 0.000 0.001 0.000 grid.py:172(neighbors1)
49 0.001 0.000 0.001 0.000 grid.py:195(neighbors2)
532473/177491 0.365 0.000 10.826 0.000 numeric.py:1905(array_str)
177491 0.062 0.000 0.151 0.000 numeric.py:463(asarray)
177491 0.051 0.000 0.104 0.000 numeric.py:534(asanyarray)
4 0.000 0.000 0.000 0.000 posixpath.py:119(splitext)
4 0.000 0.000 0.000 0.000 posixpath.py:142(basename)
4 0.000 0.000 0.000 0.000 posixpath.py:39(_get_sep)
6 0.000 0.000 0.000 0.000 posixpath.py:50(normcase)
4 0.000 0.000 0.000 0.000 process.py:137(name)
4 0.000 0.000 0.000 0.000 process.py:35(current_process)
1 0.000 0.000 0.000 0.000 signal.py:25(_int_to_enum)
2 0.000 0.000 0.000 0.000 signal.py:35(_enum_to_int)
1 0.000 0.000 0.000 0.000 signal.py:45(signal)
99624 0.066 0.000 0.066 0.000 stats.py:38(new)
20675 0.028 0.000 0.040 0.000 stats.py:42(new_rej)
88545 0.045 0.000 0.045 0.000 stats.py:48(end)
11831 0.006 0.000 0.006 0.000 stats.py:51(hoff_new)
1835 0.001 0.000 0.002 0.000 stats.py:54(hoff_rej)
22510 0.013 0.000 0.013 0.000 stats.py:58(rej)
200000 0.261 0.000 1.490 0.000 stats.py:64(iter)
1 0.000 0.000 0.000 0.000 stats.py:69(n_iter)
1 0.000 0.000 0.000 0.000 stats.py:86(endsim)
1 0.000 0.000 0.000 0.000 strats.py:189(get_init_action)
200000 1.234 0.000 19.760 0.000 strats.py:193(get_action)
177490 1.294 0.000 1.860 0.000 strats.py:220(execute_action)
200001 3.897 0.000 17.128 0.000 strats.py:243(optimal_ch)
88945 0.074 0.000 1.112 0.000 strats.py:299(reward)
88945 0.017 0.000 0.017 0.000 strats.py:308(discount)
1241938 0.681 0.000 0.681 0.000 strats.py:333(get_qval)
88945 0.167 0.000 0.167 0.000 strats.py:336(update_qval)
1 0.000 0.000 28.612 28.612 strats.py:40(init_sim)
1 1.383 1.383 28.611 28.611 strats.py:49(_simulate)
4 0.000 0.000 0.000 0.000 threading.py:1076(name)
4 0.000 0.000 0.000 0.000 threading.py:1230(current_thread)
228626 0.122 0.000 0.166 0.000 types.py:135(__get__)
177491 0.075 0.000 0.075 0.000 {built-in method _functools.reduce}
200001 0.203 0.000 0.234 0.000 {built-in method _heapq.heappop}
200505 0.079 0.000 0.083 0.000 {built-in method _heapq.heappush}
320262 0.068 0.000 0.068 0.000 {built-in method _operator.gt}
832731 0.136 0.000 0.136 0.000 {built-in method _operator.lt}
1 0.000 0.000 0.000 0.000 {built-in method _signal.signal}
532481 0.090 0.000 0.090 0.000 {built-in method _thread.get_ident}
2 0.000 0.000 0.000 0.000 {built-in method _warnings.warn}
1 0.000 0.000 28.612 28.612 {built-in method builtins.exec}
200001 0.066 0.000 0.066 0.000 {built-in method builtins.getattr}
14 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
755255 0.113 0.000 0.113 0.000 {built-in method builtins.hash}
532473 0.092 0.000 0.092 0.000 {built-in method builtins.id}
466451 0.166 0.000 0.166 0.000 {built-in method builtins.isinstance}
532473 0.083 0.000 0.083 0.000 {built-in method builtins.issubclass}
3750044 0.325 0.000 0.325 0.000 {built-in method builtins.len}
177687 0.091 0.000 0.091 0.000 {built-in method builtins.max}
196 0.000 0.000 0.000 0.000 {built-in method builtins.min}
354982 0.142 0.000 0.142 0.000 {built-in method numpy.core.multiarray.array}
111456 0.095 0.000 0.095 0.000 {built-in method numpy.core.multiarray.where}
14 0.000 0.000 0.000 0.000 {built-in method posix.fspath}
4 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
4 0.000 0.000 0.000 0.000 {built-in method sys._getframe}
5 0.000 0.000 0.000 0.000 {built-in method time.time}
8 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.RLock' objects}
532473 0.089 0.000 0.089 0.000 {method 'add' of 'set' objects}
875 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
16345 0.110 0.000 0.110 0.000 {method 'choice' of 'mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
532473 0.097 0.000 0.097 0.000 {method 'discard' of 'set' objects}
188618 0.633 0.000 0.633 0.000 {method 'exponential' of 'mtrand.RandomState' objects}
4 0.000 0.000 0.000 0.000 {method 'find' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'flush' of '_io.TextIOWrapper' objects}
8 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
354982 0.066 0.000 0.066 0.000 {method 'item' of 'numpy.ndarray' objects}
200001 0.196 0.000 0.196 0.000 {method 'nonzero' of 'numpy.ndarray' objects}
11887 0.052 0.000 0.052 0.000 {method 'randint' of 'mtrand.RandomState' objects}
167895 0.157 0.000 0.157 0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
177491 0.251 0.000 0.251 0.000 {method 'ravel' of 'numpy.ndarray' objects}
643928 2.511 0.000 2.511 0.000 {method 'reduce' of 'numpy.ufunc' objects}
8 0.000 0.000 0.000 0.000 {method 'release' of '_thread.RLock' objects}
12 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
2451118 0.328 0.000 0.328 0.000 {method 'rstrip' of 'str' objects}
8 0.000 0.000 0.000 0.000 {method 'write' of '_io.TextIOWrapper' objects}
numpy was installed through pip on the laptop, and through the Fedora repositories on the desktop. Removing the Fedora package and installing numpy through pip on the desktop removed arrayprint.py (fillFormat) from the profiling results, and the runtime on both machines is now very much the same (which is still a bit weird). It's also strange that the other arrayprint functions are still being called, with 10 seconds of cumulative time.
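A possible explanation for the remaining arrayprint time, inferred only from the profiles above and not confirmed by the program's source: logging's debug is called roughly 567,000 times, and enum.py's __format__ is called 177,909 times, the same count as the top-level array2string calls. That pattern would fit a message being built with str.format (or similar) before it is handed to logger.debug, so the array is converted to a string even though the DEBUG level is disabled. The sketch below illustrates the difference with a hypothetical CountingState class standing in for a numpy array.

```python
import logging

logger = logging.getLogger("sim")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled for this logger


class CountingState:
    """Hypothetical stand-in for a numpy array; counts string conversions."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "<state>"


eager = CountingState()
lazy = CountingState()

for _ in range(1000):
    # Eager: the message string is built before logging checks the level.
    logger.debug("state: {}".format(eager))
    # Lazy: logging formats the arguments only if the record is actually emitted.
    logger.debug("state: %s", lazy)

print(eager.str_calls, lazy.str_calls)  # 1000 0
```

If the real program does build its debug messages eagerly, passing the array as a logging argument (the second form) would avoid the conversion whenever debug output is switched off.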
The task is to replace each image pixel with the average of its neighbours in a 3x3 window.
The image is a standard image of 2.5 MB.
To handle the edge cases, I give the out-of-image neighbours a value of -1 and filter them out.
When running the program it terminates after 624 seconds. Why is it so slow?
The program looks very minimal, but I am sure I am missing something.
import scipy
import numpy as np
import scipy.misc
import scipy.ndimage
import timeit

def average_neighbours(mat):
    # Keep only real pixels; out-of-image neighbours were filled with -1.
    interesting = mat[mat >= 0]
    return np.average(interesting)

def run_program():
    vienna = scipy.misc.imread('kaertnerstrasse.jpg')
    vienna1 = scipy.ndimage.filters.generic_filter(vienna, function=average_neighbours, size=(3,3,1), mode="constant", cval=-1.0)
    scipy.misc.imsave('kaertnerstrasse3-3.jpg', vienna1)

if __name__ == "__main__":
    start = timeit.default_timer()
    run_program()
    stop = timeit.default_timer()
    print(stop - start)
Here is the profiler data. It looks like the standard operations I want to do take a lot of time. Is it possible to do it faster?
479086307 function calls (479086303 primitive calls) in 739.517 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 739.517 739.517 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 BmpImagePlugin.py:173(DibImageFile)
1 0.001 0.001 0.001 0.001 BmpImagePlugin.py:27(<module>)
1 0.000 0.000 0.000 0.000 BmpImagePlugin.py:55(_accept)
1 0.000 0.000 0.000 0.000 BmpImagePlugin.py:61(BmpImageFile)
1 0.006 0.006 0.006 0.006 GifImagePlugin.py:28(<module>)
1 0.000 0.000 0.000 0.000 GifImagePlugin.py:47(_accept)
1 0.000 0.000 0.000 0.000 GifImagePlugin.py:54(GifImageFile)
1 0.000 0.000 0.268 0.268 Image.py:1394(save)
1 0.000 0.000 0.014 0.014 Image.py:1750(new)
1 0.000 0.000 0.029 0.029 Image.py:1786(fromstring)
1 0.000 0.000 0.011 0.011 Image.py:1943(open)
5 0.000 0.000 0.000 0.000 Image.py:2082(register_open)
3 0.000 0.000 0.000 0.000 Image.py:2094(register_mime)
5 0.000 0.000 0.000 0.000 Image.py:2104(register_save)
10 0.000 0.000 0.000 0.000 Image.py:2114(register_extension)
1 0.000 0.000 0.000 0.000 Image.py:219(_conv_type_shape)
2 0.003 0.001 0.011 0.005 Image.py:290(preinit)
2 0.000 0.000 0.000 0.000 Image.py:371(_getdecoder)
2 0.000 0.000 0.000 0.000 Image.py:387(_getencoder)
3 0.000 0.000 0.000 0.000 Image.py:449(__init__)
1 0.000 0.000 0.000 0.000 Image.py:460(_new)
6/2 0.003 0.000 0.224 0.112 Image.py:503(__getattr__)
1 0.000 0.000 0.221 0.221 Image.py:522(tostring)
1 0.000 0.000 0.015 0.015 Image.py:577(fromstring)
4 0.000 0.000 0.000 0.000 Image.py:606(load)
5 0.000 0.000 0.000 0.000 Image.py:83(isStringType)
5 0.000 0.000 0.000 0.000 Image.py:92(isTupleType)
1 0.000 0.000 0.000 0.000 Image.py:98(isImageType)
1 0.000 0.000 0.183 0.183 ImageFile.py:124(load)
1 0.000 0.000 0.004 0.004 ImageFile.py:227(load_prepare)
1 0.000 0.000 0.000 0.000 ImageFile.py:236(load_end)
1 0.000 0.000 0.000 0.000 ImageFile.py:254(StubImageFile)
1 0.000 0.000 0.000 0.000 ImageFile.py:283(_ParserFile)
1 0.000 0.000 0.000 0.000 ImageFile.py:30(<module>)
1 0.000 0.000 0.000 0.000 ImageFile.py:330(Parser)
1 0.000 0.000 0.266 0.266 ImageFile.py:466(_save)
9 0.000 0.000 0.000 0.000 ImageFile.py:516(_safe_read)
1 0.000 0.000 0.000 0.000 ImageFile.py:69(ImageFile)
1 0.000 0.000 0.000 0.000 ImageFile.py:72(__init__)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:121(SOF)
2 0.000 0.000 0.000 0.000 JpegImagePlugin.py:168(DQT)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:261(_accept)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:267(JpegImageFile)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:272(_open)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:35(<module>)
24 0.000 0.000 0.000 0.000 JpegImagePlugin.py:41(i16)
1 0.000 0.000 0.266 0.266 JpegImagePlugin.py:420(_save)
5 0.000 0.000 0.000 0.000 JpegImagePlugin.py:50(Skip)
1 0.000 0.000 0.000 0.000 JpegImagePlugin.py:54(APP)
1 0.000 0.000 0.000 0.000 PngImagePlugin.py:151(PngInfo)
1 0.000 0.000 0.000 0.000 PngImagePlugin.py:169(PngStream)
1 0.000 0.000 0.000 0.000 PngImagePlugin.py:308(PngImageFile)
1 0.001 0.001 0.001 0.001 PngImagePlugin.py:34(<module>)
1 0.000 0.000 0.000 0.000 PngImagePlugin.py:453(_idat)
1 0.000 0.000 0.000 0.000 PngImagePlugin.py:75(ChunkStream)
1 0.000 0.000 0.000 0.000 PpmImagePlugin.py:18(<module>)
1 0.000 0.000 0.000 0.000 PpmImagePlugin.py:46(PpmImageFile)
29942784 64.011 0.000 81.737 0.000 _methods.py:43(_count_reduce_items)
29942784 126.522 0.000 398.092 0.000 _methods.py:53(_mean)
1 0.000 0.000 0.000 0.000 _ni_support.py:38(_extend_mode_to_code)
2 0.000 0.000 0.000 0.000 _ni_support.py:55(_normalize_sequence)
1 0.000 0.000 0.000 0.000 _ni_support.py:70(_get_output)
29942784 151.188 0.000 701.104 0.000 ex1.py:12(average_neighbours)
1 0.007 0.007 739.513 739.513 ex1.py:18(run_program)
1 0.000 0.000 738.947 738.947 filters.py:1115(generic_filter)
29942784 67.220 0.000 549.916 0.000 function_base.py:436(average)
1 0.000 0.000 0.000 0.000 genericpath.py:85(_splitext)
1 0.000 0.000 0.000 0.000 ntpath.py:161(splitext)
1 0.000 0.000 0.000 0.000 numeric.py:141(ones)
29942789 19.069 0.000 50.684 0.000 numeric.py:394(asarray)
29942784 14.466 0.000 37.300 0.000 numeric.py:464(asanyarray)
1 0.000 0.000 0.000 0.000 numeric.py:773(flatnonzero)
1 0.000 0.000 0.247 0.247 pilutil.py:103(imread)
1 0.002 0.002 0.313 0.313 pilutil.py:130(imsave)
1 0.000 0.000 0.236 0.236 pilutil.py:174(fromimage)
1 0.000 0.000 0.042 0.042 pilutil.py:206(toimage)
1 0.000 0.000 0.000 0.000 pilutil.py:34(bytescale)
1 0.000 0.000 0.000 0.000 re.py:188(compile)
1 0.000 0.000 0.000 0.000 re.py:226(_compile)
5 0.000 0.000 0.000 0.000 sre_compile.py:178(_compile_charset)
5 0.000 0.000 0.000 0.000 sre_compile.py:207(_optimize_charset)
1 0.000 0.000 0.000 0.000 sre_compile.py:32(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:359(_compile_info)
2 0.000 0.000 0.000 0.000 sre_compile.py:472(isstring)
1 0.000 0.000 0.000 0.000 sre_compile.py:478(_code)
1 0.000 0.000 0.000 0.000 sre_compile.py:493(compile)
4 0.000 0.000 0.000 0.000 sre_parse.py:138(append)
1 0.000 0.000 0.000 0.000 sre_parse.py:140(getwidth)
1 0.000 0.000 0.000 0.000 sre_parse.py:178(__init__)
7 0.000 0.000 0.000 0.000 sre_parse.py:182(__next)
1 0.000 0.000 0.000 0.000 sre_parse.py:195(match)
6 0.000 0.000 0.000 0.000 sre_parse.py:201(get)
4 0.000 0.000 0.000 0.000 sre_parse.py:257(_escape)
1 0.000 0.000 0.000 0.000 sre_parse.py:301(_parse_sub)
1 0.000 0.000 0.000 0.000 sre_parse.py:379(_parse)
1 0.000 0.000 0.000 0.000 sre_parse.py:67(__init__)
1 0.000 0.000 0.000 0.000 sre_parse.py:675(parse)
1 0.000 0.000 0.000 0.000 sre_parse.py:90(__init__)
11 0.000 0.000 0.000 0.000 string.py:220(lower)
24 0.000 0.000 0.000 0.000 string.py:229(upper)
1 0.000 0.000 0.012 0.012 string.py:308(join)
2 0.000 0.000 0.000 0.000 type_check.py:237(iscomplexobj)
1 0.014 0.014 0.014 0.014 {PIL._imaging.fill}
1 0.004 0.004 0.004 0.004 {PIL._imaging.new}
1 0.000 0.000 0.000 0.000 {_sre.compile}
4 0.000 0.000 0.000 0.000 {apply}
41 0.193 0.005 0.193 0.005 {built-in method decode}
1 0.266 0.266 0.266 0.266 {built-in method encode_to_file}
548 0.025 0.000 0.025 0.000 {built-in method encode}
3 0.000 0.000 0.000 0.000 {built-in method pixel_access}
2 0.000 0.000 0.000 0.000 {built-in method setimage}
26 0.000 0.000 0.000 0.000 {chr}
1 0.000 0.000 0.000 0.000 {divmod}
4 0.000 0.000 0.000 0.000 {getattr}
29942787 9.386 0.000 9.386 0.000 {hasattr}
89828369 34.448 0.000 34.448 0.000 {isinstance}
29942786 12.578 0.000 12.578 0.000 {issubclass}
40 0.000 0.000 0.000 0.000 {len}
3 0.000 0.000 0.000 0.000 {max}
591 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.001 0.001 0.001 0.001 {method 'close' of 'file' objects}
1 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'fileno' of 'file' objects}
2 0.000 0.000 0.000 0.000 {method 'flush' of 'file' objects}
11 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
13 0.000 0.000 0.000 0.000 {method 'has_key' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.012 0.012 0.012 0.012 {method 'join' of 'str' objects}
11 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects}
29942784 19.872 0.000 417.964 0.000 {method 'mean' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'nonzero' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
79 0.001 0.000 0.001 0.000 {method 'read' of 'file' objects}
29942784 114.221 0.000 114.221 0.000 {method 'reduce' of 'numpy.ufunc' objects}
3 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'seek' of 'file' objects}
2 0.000 0.000 0.000 0.000 {method 'sort' of 'list' objects}
1 0.013 0.013 0.013 0.013 {method 'tostring' of 'numpy.ndarray' objects}
24 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
2 0.000 0.000 0.000 0.000 {min}
59885574 54.461 0.000 54.685 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.copyto}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
2 0.001 0.000 0.001 0.000 {open}
63 0.000 0.000 0.000 0.000 {ord}
29942785 13.675 0.000 13.675 0.000 {range}
1 37.843 37.843 738.947 738.947 {scipy.ndimage._nd_image.generic_filter}
1 0.000 0.000 0.000 0.000 {zip}
739.529455192
The line function_base.py:436(average) is interesting. It looks like it takes most of the time.
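The profile can be turned into a per-call cost with a little arithmetic. The figures below are copied from the profiler output above; the only assumption is that the JPEG has three colour channels, which would explain why the number of calls is divisible by three.

```python
# Figures copied from the profiler output above.
calls = 29942784       # ncalls of average_neighbours
cumtime_s = 701.104    # cumulative seconds spent in average_neighbours
total_s = 739.517      # total profiled run time in seconds

per_call_us = cumtime_s / calls * 1e6   # microseconds per callback
share = cumtime_s / total_s             # fraction of the whole run

# Assumption: 3 colour channels, so the filter makes one call per channel value.
pixels = calls // 3

print("%.1f us per call, %.0f%% of the run, %d pixels" % (per_call_us, share * 100, pixels))
```

Each call costs only about 23 microseconds, but it is made once for every value in the image, so the Python-level callback accounts for roughly 95% of the run time.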
I would try to replace the generic_filter method with a more general and maybe better suited solution. What you are basically trying to do is a convolution with a kernel of size 3x3x1 and values 1/9:
import numpy as np
import scipy
import scipy.misc
import scipy.ndimage
import scipy.signal
import timeit

def run_program():
    my_image = scipy.misc.imread('my_image.png')
    # Uniform 3x3 kernel, applied to each colour channel separately
    kernel = np.ones((3, 3, 1))
    kernel /= kernel.size
    my_image_smoothed = scipy.signal.fftconvolve(my_image, kernel, mode='valid')
    scipy.misc.imsave('my_image_3x3.png', my_image_smoothed)

if __name__ == '__main__':
    start = timeit.default_timer()
    run_program()
    print(timeit.default_timer() - start)
[UPDATE]
To preserve the original image size, you could use the mode 'same' instead of 'valid' for fftconvolve. This automatically zero-pads your image. However, to get better results at the boundaries of the image, pad the image yourself with one of the modes supported by numpy's pad function
numpy.pad(array, pad_width, mode=<'constant'|'edge'|'reflect'|'symmetric'|...>)
and use the fftconvolve mode 'valid' on the padded image.
padding = [(shape // 2, shape // 2) for shape in kernel.shape]
my_padded_image = np.pad(my_image, padding, mode='edge')
my_image_smoothed = scipy.signal.fftconvolve(my_padded_image, kernel, mode='valid')
[/UPDATE]
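To check that padding followed by a 'valid' convolution really returns the original image size, the shapes can be worked through without running the convolution itself. This is a small sketch on a hypothetical 4x5 RGB array; the pad width handed to np.pad is the list of per-axis (before, after) pairs derived from the kernel shape.

```python
import numpy as np

kernel = np.ones((3, 3, 1))
kernel /= kernel.size

# Hypothetical stand-in for a small RGB image (height 4, width 5, 3 channels).
my_image = np.zeros((4, 5, 3))

# One (before, after) pair per axis: half the kernel extent on each side.
padding = [(extent // 2, extent // 2) for extent in kernel.shape]
my_padded_image = np.pad(my_image, padding, mode='edge')

# A 'valid' convolution shrinks each axis by (kernel extent - 1).
valid_shape = tuple(p - k + 1 for p, k in zip(my_padded_image.shape, kernel.shape))

print(padding)                # [(1, 1), (1, 1), (0, 0)]
print(my_padded_image.shape)  # (6, 7, 3)
print(valid_shape)            # (4, 5, 3)
```

The channel axis gets a pad width of zero because the kernel has extent 1 along it, so only the two spatial axes grow and then shrink back.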
I am not sure if it will run a lot faster, but it should at least avoid the per-pixel method call overhead.
On a test image with 4876x2278 pixels the code needs ~40 seconds (the hard drive is an SSD).
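If the exact border behaviour of the question matters (averaging only the neighbours that lie inside the image), a plain-numpy alternative is to sum nine shifted views of a zero-padded copy and divide by the number of valid neighbours. This is a different technique from both generic_filter and fftconvolve, offered as a minimal sketch. The function name mean_3x3_valid_neighbours is made up for illustration and no timing is claimed for it.

```python
import numpy as np


def mean_3x3_valid_neighbours(image):
    """3x3 mean per channel, averaging only the neighbours inside the image."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape[:2]

    # Zero-pad the two spatial axes only; a channel axis, if any, is untouched.
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="constant", constant_values=0.0)
    # Same padding applied to an array of ones counts the valid neighbours.
    ones = np.pad(np.ones((h, w)), 1, mode="constant", constant_values=0.0)

    total = np.zeros_like(img)
    count = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
            count += ones[dy:dy + h, dx:dx + w]

    # Broadcast the per-pixel neighbour count over the channel axis, if any.
    count = count.reshape((h, w) + (1,) * (img.ndim - 2))
    return total / count


demo = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(mean_3x3_valid_neighbours(demo))
```

The loop runs nine times regardless of image size, so all per-pixel work happens inside numpy rather than in a Python callback.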
Best regards
I have this code:
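def getNeighbors(cfg, cand, adj):
    # Candidates that are not already part of the subgraph
    c_nocfg = np.setdiff1d(cand, cfg)
    # Number of edges from each remaining candidate into the subgraph
    deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
    degidx = np.where(deg > 0)
    nbs = c_nocfg[degidx]
    deg = deg[degidx]
    return nbs, deg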
which retrieves neighbors (and their degree in a subgraph spanned by nodes in cfg) from an adjacency matrix.
Inlining the code gives reasonable performance (~10kx10k adjacency matrix as boolean array, 10k candidates in cand, subgraph cfg spanning 500 nodes): 0.02s
However, calling the function getNeighbors results in an overhead of roughly 0.5s.
Mocking
deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
with
deg = np.random.randint(500, size=c_nocfg.shape[0])
drives down the runtime of the function call to 0.002s.
Could someone explain to me what causes the enormous overhead in my case? After all, the sum operation itself is not all too costly.
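One way to narrow this down is to time the indexing step and the reduction separately. cProfile only records function calls, and subscripting an array with [] is not reported as a separate row, so the time spent gathering adj[np.ix_(cfg, c_nocfg)] is likely to be counted in the tottime of the enclosing function rather than under sum. The sketch below uses a smaller, randomly generated adjacency matrix as a stand-in for the real data, so the absolute numbers will differ.

```python
import time
import numpy as np

rng = np.random.RandomState(0)
n = 2000                           # smaller than the 10k x 10k case
adj = rng.rand(n, n) < 0.01        # hypothetical random boolean adjacency matrix
cand = np.arange(n)
cfg = rng.choice(n, size=100, replace=False)

c_nocfg = np.setdiff1d(cand, cfg)

t0 = time.time()
sub = adj[np.ix_(cfg, c_nocfg)]    # fancy indexing: copies the sub-matrix
t1 = time.time()
deg = np.sum(sub, axis=0)          # the reduction that shows up in the profile
t2 = time.time()

mask = deg > 0                     # candidates with at least one edge into cfg
nbs, deg_nz = c_nocfg[mask], deg[mask]

print("indexing: %.4f s, sum: %.4f s" % (t1 - t0, t2 - t1))
```

If the indexing step dominates on the real data, the 0.466 s attributed to getNeighbors itself would be the cost of the fancy-indexing copy rather than function call overhead.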
Profiling output
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.466 0.466 0.488 0.488 /home/x/x/x/utils/benchmarks.py:15(getNeighbors)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.001 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
inline version:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
6 0.000 0.000 0.000 0.000 {len}
sample data for testing:
np.random.seed(0)
adj = np.zeros(10000*10000, dtype=bool)
adj[np.random.randint(low=0, high=10000*10000, size=100000)] = True
adj = adj.reshape((10000, 10000))
cand = np.arange(adj.shape[0])
cfgs = np.random.choice(cand, size=500)
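One thing worth checking, offered as a guess rather than a diagnosis: indexing with np.ix_ is advanced indexing, which always builds a new array instead of a view. The subscript is not a function call, so cProfile has no row of its own for it and its cost ends up in the tottime of whichever function contains the line. The small arrays below are made up for illustration.

```python
import numpy as np

adj = np.zeros((6, 6), dtype=bool)
adj[0, 3] = adj[1, 4] = True
rows = np.array([0, 1, 2])
cols = np.array([2, 3, 4, 5])

# Advanced indexing: allocates and fills a new (3, 4) array.
sub = adj[np.ix_(rows, cols)]
is_copy = not np.shares_memory(sub, adj)
column_sums = sub.sum(axis=0)
print(sub.shape, is_copy, column_sums)
```

Timing the indexing expression and the sum separately on the real 10k x 10k matrix would show which of the two dominates.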
I am profiling some Python code; why does it spend more time in user space?
user#terminal$ time python main.py
1964 function calls in 0.003 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.003 0.003 :1()
1 0.000 0.000 0.000 0.000 ConfigParser.py:218(init)
1 0.000 0.000 0.001 0.001 ConfigParser.py:266(read)
30 0.000 0.000 0.000 0.000 ConfigParser.py:354(optionxform)
1 0.000 0.000 0.000 0.000 ConfigParser.py:434(_read)
15 0.000 0.000 0.000 0.000 ConfigParser.py:515(get)
15 0.000 0.000 0.000 0.000 ConfigParser.py:611(_interpolate)
15 0.000 0.000 0.000 0.000 ConfigParser.py:619(_interpolate_some)
1 0.000 0.000 0.000 0.000 config.py:32(read_config_data)
1 0.000 0.000 0.001 0.001 config.py:9(init)
6 0.000 0.000 0.000 0.000 entity.py:108(add_to_filter)
1 0.000 0.000 0.002 0.002 entity.py:24(init)
1 0.001 0.001 0.002 0.002 entity.py:39(create_inverted_index)
493 0.000 0.000 0.001 0.000 entity.py:80(beautify)
1 0.000 0.000 0.000 0.000 entity.py:84(create_bucket_lookup)
1 0.000 0.000 0.000 0.000 main.py:15()
2 0.000 0.000 0.000 0.000 main.py:18()
1 0.000 0.000 0.003 0.003 main.py:23(main)
1 0.000 0.000 0.000 0.000 main.py:9(get_bag_of_words)
19 0.000 0.000 0.000 0.000 {built-in method group}
34 0.000 0.000 0.000 0.000 {built-in method match}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {len}
28 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of 'file' objects}
15 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
15 0.000 0.000 0.000 0.000 {method 'find' of 'str' objects}
19 0.000 0.000 0.000 0.000 {method 'isspace' of 'str' objects}
24 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
49 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects}
20 0.000 0.000 0.000 0.000 {method 'readline' of 'file' objects}
6 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
24 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
47 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
9 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
1030 0.000 0.000 0.000 0.000 {method 'strip' of 'str' objects}
15 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of 'file' objects}
10 0.000 0.000 0.000 0.000 {open}
2 0.000 0.000 0.000 0.000 {range}
3 0.000 0.000 0.000 0.000 {reduce}
Done
real 0m0.063s
user 0m0.050s
sys 0m0.010s
While cProfile says it took only 0.003 seconds, why does Unix time (sys) say it runs in 0.01 seconds?
time(1) is measuring the execution time of the whole process, whereas the profiler excludes Python interpreter startup time, bytecode compilation time, etc.
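A rough way to see the part the profiler never sees is to time an interpreter that does nothing at all. This sketch assumes Python 3 (subprocess.run and time.perf_counter), and the helper name is made up.

```python
import subprocess
import sys
import time

def wall_time_of_empty_interpreter():
    """Wall-clock seconds to start an interpreter, run nothing, and exit."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start

startup_seconds = wall_time_of_empty_interpreter()
print("interpreter startup and shutdown: %.3f s" % startup_seconds)
```

On most machines this alone takes tens of milliseconds, which is the same order as the 0.063 s that time(1) reports and far more than the 0.003 s measured inside the profiler.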
I currently process sections of a string like this:
for (i, j) in huge_list_of_indices:
    process(huge_text_block[i:j])
I want to avoid the overhead of generating these temporary substrings. Any ideas? Perhaps a wrapper that somehow uses index offsets? This is currently my bottleneck.
Note that process() is another python module that expects a string as input.
Edit:
A few people doubt there is a problem. Here are some sample results:
import time
import string
text = string.letters * 1000
def timeit(fn):
    t1 = time.time()
    for i in range(len(text)):
        fn(i)
    t2 = time.time()
    print '%s took %0.3f ms' % (fn.func_name, (t2-t1) * 1000)
def test_1(i):
    return text[i:]
def test_2(i):
    return text[:]
def test_3(i):
    return text
timeit(test_1)
timeit(test_2)
timeit(test_3)
Output:
test_1 took 972.046 ms
test_2 took 47.620 ms
test_3 took 43.457 ms
I think what you are looking for are buffers.
The characteristic of buffers is that they "slice" an object supporting the buffer interface without copying its content, but essentially opening a "window" on the sliced object content. Some more technical explanation is available here. An excerpt:
Python objects implemented in C can export a group of functions called the “buffer interface.” These functions can be used by an object to expose its data in a raw, byte-oriented format. Clients of the object can use the buffer interface to access the object data directly, without needing to copy it first.
In your case the code should look more or less like this:
>>> s = 'Hugely_long_string_not_to_be_copied'
>>> ij = [(0, 3), (6, 9), (12, 18)]
>>> for i, j in ij:
...     print buffer(s, i, j-i)  # Should become process(...)
Hug
_lo
string
HTH!
A wrapper that uses index offsets to a mmap object could work, yes.
But before you do that, are you sure that generating these substrings is a problem? Don't optimize before you have found out where the time and memory actually go. I wouldn't expect this to be a significant problem.
If you are using Python 3 you can use the buffer protocol and memory views. Assuming that the text is stored somewhere in the filesystem:
import os
# Read the whole file into a preallocated, mutable buffer
data = bytearray(os.path.getsize(FILENAME))
with open(FILENAME, 'rb') as f:
    f.readinto(data)
# Slicing a memoryview does not copy the underlying bytes
mv = memoryview(data)
for (i, j) in huge_list_of_indices:
    process(mv[i:j])
Check also this article. It might be useful.
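That a memoryview slice is a window onto the original data rather than a copy can be checked directly. This is a small Python 3 sketch; the sample bytes and variable names are illustrative only.

```python
data = bytearray(b'Hugely_long_string_not_to_be_copied')
view = memoryview(data)

# Slicing the view creates a new window object, not a new byte string.
window = view[0:3]
before = bytes(window)

# Changing the underlying buffer is visible through the window.
data[0] = ord('h')
after = bytes(window)

print(before, after, window.obj is data)
```

The catch is that process() then receives a bytes-like object rather than a str, so it has to be written to accept one.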
Maybe a wrapper that uses index offsets is indeed what you are looking for. Here is an example that does the job. You may have to add more checks on slices (for overflow and negative indexes) depending on your needs.
#!/usr/bin/env python
from collections import Sequence
from timeit import Timer
def process(s):
    return s[0], len(s)
class FakeString(Sequence):
    def __init__(self, string):
        self._string = string
        self.fake_start = 0
        self.fake_stop = len(string)
    def setFakeIndices(self, i, j):
        self.fake_start = i
        self.fake_stop = j
    def __len__(self):
        return self.fake_stop - self.fake_start
    def __getitem__(self, ii):
        if isinstance(ii, slice):
            # Shift slice bounds so they are relative to the fake window
            if ii.start is None:
                start = self.fake_start
            else:
                start = ii.start + self.fake_start
            if ii.stop is None:
                stop = self.fake_stop
            else:
                stop = ii.stop + self.fake_start
            ii = slice(start,
                       stop,
                       ii.step)
        else:
            ii = ii + self.fake_start
        return self._string[ii]
def initial_method():
    r = []
    for n in xrange(1000):
        r.append(process(huge_string[1:9999999]))
    return r
def alternative_method():
    r = []
    for n in xrange(1000):
        fake_string.setFakeIndices(1, 9999999)
        r.append(process(fake_string))
    return r
if __name__ == '__main__':
    huge_string = 'ABCDEFGHIJ' * 100000
    fake_string = FakeString(huge_string)
    fake_string.setFakeIndices(5,15)
    assert fake_string[:] == huge_string[5:15]
    t = Timer(initial_method)
    print "initial_method(): %fs" % t.timeit(number=1)
    t = Timer(alternative_method)
    print "alternative_method(): %fs" % t.timeit(number=1)
which gives:
initial_method(): 1.248001s
alternative_method(): 0.003416s
The example the OP gives will show nearly the biggest possible performance difference between slicing and not slicing.
If processing actually does something that takes significant time, the problem may hardly exist.
Fact is OP needs to let us know what process does. The most likely scenario is it does something significant, and therefore he should profile his code.
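As a starting point for that profiling, here is a minimal Python 3 sketch using cProfile and pstats. The process function below is a hypothetical stand-in for the real one, and the text and index list are made up.

```python
import cProfile
import io
import pstats
import re

def process(chunk):
    # Hypothetical stand-in for the real processing function.
    return re.search('m.*?m', chunk)

def run(text, indices):
    for i, j in indices:
        process(text[i:j])

text = 'abcdefghijklmnopqrstuvwxyz' * 1000
indices = [(i, i + 100) for i in range(0, len(text) - 100, 50)]

profiler = cProfile.Profile()
profiler.enable()
run(text, indices)
profiler.disable()

# Collect the report in a string so it can be inspected or logged.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
report = out.getvalue()
print(report)
```

Slicing is not a function call, so its cost appears in the tottime of run rather than on a line of its own. Comparing that figure with the cumulative time of process gives a first idea of whether the slices matter.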
Adapted from op's example:
#slice_time.py
import time
import string
text = string.letters * 1000
import random
indices = range(len(text))
random.shuffle(indices)
import re
def greater_processing(a_string):
    results = re.findall('m', a_string)
def medium_processing(a_string):
    return re.search('m.*?m', a_string)
def lesser_processing(a_string):
    return re.match('m', a_string)
def least_processing(a_string):
    return a_string
def timeit(fn, processor):
    t1 = time.time()
    for i in indices:
        fn(i, i + 1000, processor)
    t2 = time.time()
    print '%s took %0.3f ms %s' % (fn.func_name, (t2-t1) * 1000, processor.__name__)
def test_part_slice(i, j, processor):
    return processor(text[i:j])
def test_copy(i, j, processor):
    return processor(text[:])
def test_text(i, j, processor):
    return processor(text)
def test_buffer(i, j, processor):
    return processor(buffer(text, i, j - i))
if __name__ == '__main__':
    processors = [least_processing, lesser_processing, medium_processing, greater_processing]
    tests = [test_part_slice, test_copy, test_text, test_buffer]
    for processor in processors:
        for test in tests:
            timeit(test, processor)
And then the run...
In [494]: run slice_time.py
test_part_slice took 68.264 ms least_processing
test_copy took 42.988 ms least_processing
test_text took 33.075 ms least_processing
test_buffer took 76.770 ms least_processing
test_part_slice took 270.038 ms lesser_processing
test_copy took 197.681 ms lesser_processing
test_text took 196.716 ms lesser_processing
test_buffer took 262.288 ms lesser_processing
test_part_slice took 416.072 ms medium_processing
test_copy took 352.254 ms medium_processing
test_text took 337.971 ms medium_processing
test_buffer took 438.683 ms medium_processing
test_part_slice took 502.069 ms greater_processing
test_copy took 8149.231 ms greater_processing
test_text took 8292.333 ms greater_processing
test_buffer took 563.009 ms greater_processing
Notes:
Yes I tried OP's original test_1 with [i:] slice and it's much slower, making his test even more bunk.
Interesting that buffer almost always performs slightly slower than slicing. This time there is one case where it does better, though! The real test is below, and buffer seems to do better for larger substrings while slicing does better for smaller substrings.
And, yes, I do have some randomness in this test, so test away and see the different results :). It may also be interesting to change the size of the 1000's.
So, maybe some others believe you, but I don't, so I'd like to know something about what processing does and how you came to the conclusion: "slicing is the problem."
I profiled medium processing in my example, upped the string.letters multiplier to 100000, and raised the length of the slices to 10000. Also below is one with slices of length 100. I used cProfile for these (much less overhead than profile!).
test_part_slice took 77338.285 ms medium_processing
31200019 function calls in 77.338 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 77.338 77.338 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:63(write)
5200000 8.208 0.000 43.823 0.000 re.py:139(search)
5200000 9.205 0.000 12.897 0.000 re.py:228(_compile)
5200000 5.651 0.000 49.475 0.000 slice_time.py:15(medium_processing)
1 7.901 7.901 77.338 77.338 slice_time.py:24(timeit)
5200000 19.963 0.000 69.438 0.000 slice_time.py:31(test_part_slice)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
2 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5200000 3.692 0.000 3.692 0.000 {method 'get' of 'dict' objects}
5200000 22.718 0.000 22.718 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
4 0.000 0.000 0.000 0.000 {time.time}
test_buffer took 58067.440 ms medium_processing
31200103 function calls in 58.068 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 58.068 58.068 <string>:1(<module>)
3 0.000 0.000 0.000 0.000 __init__.py:185(dumps)
3 0.000 0.000 0.000 0.000 encoder.py:102(__init__)
3 0.000 0.000 0.000 0.000 encoder.py:180(encode)
3 0.000 0.000 0.000 0.000 encoder.py:206(iterencode)
1 0.000 0.000 0.001 0.001 iostream.py:37(flush)
2 0.000 0.000 0.001 0.000 iostream.py:63(write)
1 0.000 0.000 0.000 0.000 iostream.py:86(_new_buffer)
3 0.000 0.000 0.000 0.000 jsonapi.py:57(_squash_unicode)
3 0.000 0.000 0.000 0.000 jsonapi.py:69(dumps)
2 0.000 0.000 0.000 0.000 jsonutil.py:78(date_default)
1 0.000 0.000 0.000 0.000 os.py:743(urandom)
5200000 6.814 0.000 39.110 0.000 re.py:139(search)
5200000 7.853 0.000 10.878 0.000 re.py:228(_compile)
1 0.000 0.000 0.000 0.000 session.py:149(msg_header)
1 0.000 0.000 0.000 0.000 session.py:153(extract_header)
1 0.000 0.000 0.000 0.000 session.py:315(msg_id)
1 0.000 0.000 0.000 0.000 session.py:350(msg_header)
1 0.000 0.000 0.000 0.000 session.py:353(msg)
1 0.000 0.000 0.000 0.000 session.py:370(sign)
1 0.000 0.000 0.000 0.000 session.py:385(serialize)
1 0.000 0.000 0.001 0.001 session.py:437(send)
3 0.000 0.000 0.000 0.000 session.py:75(<lambda>)
5200000 4.732 0.000 43.842 0.000 slice_time.py:15(medium_processing)
1 5.423 5.423 58.068 58.068 slice_time.py:24(timeit)
5200000 8.802 0.000 52.645 0.000 slice_time.py:40(test_buffer)
7 0.000 0.000 0.000 0.000 traitlets.py:268(__get__)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
1 0.000 0.000 0.000 0.000 uuid.py:101(__init__)
1 0.000 0.000 0.000 0.000 uuid.py:197(__str__)
1 0.000 0.000 0.000 0.000 uuid.py:531(uuid4)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
1 0.000 0.000 0.000 0.000 {built-in method now}
18 0.000 0.000 0.000 0.000 {isinstance}
4 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {locals}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {method 'count' of 'list' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
5200001 3.025 0.000 3.025 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.StringIO' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
5200000 21.418 0.000 21.418 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 {method 'send_multipart' of 'zmq.core.socket.Socket' objects}
2 0.000 0.000 0.000 0.000 {method 'strftime' of 'datetime.date' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {posix.close}
1 0.000 0.000 0.000 0.000 {posix.open}
1 0.000 0.000 0.000 0.000 {posix.read}
4 0.000 0.000 0.000 0.000 {time.time}
Smaller slices (100 length).
test_part_slice took 54916.153 ms medium_processing
31200019 function calls in 54.916 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 54.916 54.916 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:63(write)
5200000 6.788 0.000 38.312 0.000 re.py:139(search)
5200000 8.014 0.000 11.257 0.000 re.py:228(_compile)
5200000 4.722 0.000 43.034 0.000 slice_time.py:15(medium_processing)
1 5.594 5.594 54.916 54.916 slice_time.py:24(timeit)
5200000 6.288 0.000 49.322 0.000 slice_time.py:31(test_part_slice)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
2 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5200000 3.242 0.000 3.242 0.000 {method 'get' of 'dict' objects}
5200000 20.268 0.000 20.268 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
4 0.000 0.000 0.000 0.000 {time.time}
test_buffer took 62019.684 ms medium_processing
31200103 function calls in 62.020 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 62.020 62.020 <string>:1(<module>)
3 0.000 0.000 0.000 0.000 __init__.py:185(dumps)
3 0.000 0.000 0.000 0.000 encoder.py:102(__init__)
3 0.000 0.000 0.000 0.000 encoder.py:180(encode)
3 0.000 0.000 0.000 0.000 encoder.py:206(iterencode)
1 0.000 0.000 0.001 0.001 iostream.py:37(flush)
2 0.000 0.000 0.001 0.000 iostream.py:63(write)
1 0.000 0.000 0.000 0.000 iostream.py:86(_new_buffer)
3 0.000 0.000 0.000 0.000 jsonapi.py:57(_squash_unicode)
3 0.000 0.000 0.000 0.000 jsonapi.py:69(dumps)
2 0.000 0.000 0.000 0.000 jsonutil.py:78(date_default)
1 0.000 0.000 0.000 0.000 os.py:743(urandom)
5200000 7.426 0.000 41.152 0.000 re.py:139(search)
5200000 8.470 0.000 11.628 0.000 re.py:228(_compile)
1 0.000 0.000 0.000 0.000 session.py:149(msg_header)
1 0.000 0.000 0.000 0.000 session.py:153(extract_header)
1 0.000 0.000 0.000 0.000 session.py:315(msg_id)
1 0.000 0.000 0.000 0.000 session.py:350(msg_header)
1 0.000 0.000 0.000 0.000 session.py:353(msg)
1 0.000 0.000 0.000 0.000 session.py:370(sign)
1 0.000 0.000 0.000 0.000 session.py:385(serialize)
1 0.000 0.000 0.001 0.001 session.py:437(send)
3 0.000 0.000 0.000 0.000 session.py:75(<lambda>)
5200000 5.399 0.000 46.551 0.000 slice_time.py:15(medium_processing)
1 5.958 5.958 62.020 62.020 slice_time.py:24(timeit)
5200000 9.510 0.000 56.061 0.000 slice_time.py:40(test_buffer)
7 0.000 0.000 0.000 0.000 traitlets.py:268(__get__)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
1 0.000 0.000 0.000 0.000 uuid.py:101(__init__)
1 0.000 0.000 0.000 0.000 uuid.py:197(__str__)
1 0.000 0.000 0.000 0.000 uuid.py:531(uuid4)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
1 0.000 0.000 0.000 0.000 {built-in method now}
18 0.000 0.000 0.000 0.000 {isinstance}
4 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {locals}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {method 'count' of 'list' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
5200001 3.158 0.000 3.158 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.StringIO' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
5200000 22.097 0.000 22.097 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 {method 'send_multipart' of 'zmq.core.socket.Socket' objects}
2 0.000 0.000 0.000 0.000 {method 'strftime' of 'datetime.date' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {posix.close}
1 0.000 0.000 0.000 0.000 {posix.open}
1 0.000 0.000 0.000 0.000 {posix.read}
4 0.000 0.000 0.000 0.000 {time.time}
process(huge_text_block[i:j])
I want to avoid the overhead of generating these temporary substrings.
(...)
Note that process() is another python module
that expects a string as input.
Completely contradictory.
How do you expect to avoid creating the very thing the function requires?!