I have a binary file with a particular format, described here for those who are interested. The format isn't the important thing. I can read and convert this data into the form that I want, but the problem is that these binary files tend to have a lot of information in them. If I just return the bytes as read, this is very quick (less than 1 second), but I can't do anything useful with those bytes; they need to be converted into genotypes first, and that is the code that appears to be slowing things down.
The conversion of a series of bytes into genotypes is as follows:
h = ['%02x' % ord(b) for b in currBytes]
b = ''.join([bin(int(i, 16))[2:].zfill(8)[::-1] for i in h])[:nBits]
genotypes = [b[i:i+2] for i in range(0, len(b), 2)]
map = {'00': 0, '01': 1, '11': 2, '10': None}
return [map[i] for i in genotypes]
What I am hoping is that there is a faster way to do this. Any ideas? Below are the results of running python -m cProfile test.py, where test.py calls a reader object I have written to read these files.
vlan1711:src davykavanagh$ python -m cProfile test.py
183, 593483, 108607389, 366, 368, 46
that took 93.6410450935
86649088 function calls in 96.396 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.248 1.248 2.753 2.753 plinkReader.py:13(__init__)
1 0.000 0.000 0.000 0.000 plinkReader.py:47(plinkReader)
1 0.000 0.000 0.000 0.000 plinkReader.py:48(__init__)
1 0.000 0.000 0.000 0.000 plinkReader.py:5(<module>)
1 0.000 0.000 0.000 0.000 plinkReader.py:55(__iter__)
593484 77.634 0.000 91.477 0.000 plinkReader.py:58(next)
1 0.000 0.000 0.000 0.000 plinkReader.py:71(SNP)
593483 1.123 0.000 1.504 0.000 plinkReader.py:75(__init__)
1 0.000 0.000 0.000 0.000 plinkReader.py:8(plinkFiles)
1 0.000 0.000 0.000 0.000 plinkReader.py:85(Person)
183 0.000 0.000 0.001 0.000 plinkReader.py:89(__init__)
1 2.166 2.166 96.396 96.396 test.py:5(<module>)
27300218 5.909 0.000 5.909 0.000 {bin}
593483 0.080 0.000 0.080 0.000 {len}
1 0.000 0.000 0.000 0.000 {math.ceil}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
593483 0.531 0.000 0.531 0.000 {method 'join' of 'str' objects}
593485 0.588 0.000 0.588 0.000 {method 'read' of 'file' objects}
593666 0.257 0.000 0.257 0.000 {method 'rsplit' of 'str' objects}
593666 0.125 0.000 0.125 0.000 {method 'rstrip' of 'str' objects}
27300218 4.098 0.000 4.098 0.000 {method 'zfill' of 'str' objects}
3 0.000 0.000 0.000 0.000 {open}
27300218 1.820 0.000 1.820 0.000 {ord}
593483 0.817 0.000 0.817 0.000 {range}
2 0.000 0.000 0.000 0.000 {time.time}
You are slowing things down by creating lists and large strings you don't need. You are just examining the bits of the bytes and converting two-bit groups into numbers. That can be achieved much more simply, e.g. with this code:
def convert(currBytes, nBits):
    for byte in currBytes:
        for p in range(4):
            bits = (ord(byte) >> (p*2)) & 3
            yield None if bits == 1 else 1 if bits == 2 else 2 if bits == 3 else 0
            nBits -= 2
            if nBits <= 0:
                return  # end the generator; raising StopIteration here breaks under PEP 479
In case you really need a list in the end, just use
list(convert(currBytes, nBits))
But I guess there can be cases in which you just want to iterate over the results:
for blurp in convert(currBytes, nBits):
    # handle your blurp (0, 1, 2, or None)
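If NumPy is an option, the same two-bit unpacking can also be vectorized, avoiding the per-byte Python loop entirely. A minimal sketch (assuming NumPy is acceptable and that -1 can stand in for None, since an integer array cannot hold None):

import numpy as np

def convert_np(currBytes, nBits):
    # View the raw bytes as unsigned 8-bit integers (no copy).
    a = np.frombuffer(currBytes, dtype=np.uint8)
    # Extract the four 2-bit groups of each byte, lowest-order pair first,
    # matching the per-byte bit reversal of the string-based code.
    pairs = np.column_stack([(a >> (2 * p)) & 3 for p in range(4)]).ravel()
    pairs = pairs[:nBits // 2]
    # Same mapping as the original: raw 0 -> 0, 2 -> 1, 3 -> 2, 1 -> missing (-1).
    lookup = np.array([0, -1, 1, 2])
    return lookup[pairs]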
I was solving LeetCode 1155, which is about the number of dice rolls with a target sum. I was using dictionary-based memoization. Here's the exact code:
class Solution:
    def numRollsToTarget(self, dices: int, faces: int, target: int) -> int:
        dp = {}
        def ways(t, rd):
            if t == 0 and rd == 0: return 1
            if t <= 0 or rd <= 0: return 0
            if dp.get((t,rd)): return dp[(t,rd)]
            dp[(t,rd)] = sum(ways(t-i, rd-1) for i in range(1,faces+1))
            return dp[(t,rd)]
        return ways(target, dices)
But this solution invariably times out for combinations of faces and dice around 15*15.
Then I found this solution, which uses functools.lru_cache; the rest of it is exactly the same. This solution works very fast.
class Solution:
    def numRollsToTarget(self, dices: int, faces: int, target: int) -> int:
        from functools import lru_cache
        @lru_cache(None)
        def ways(t, rd):
            if t == 0 and rd == 0: return 1
            if t <= 0 or rd <= 0: return 0
            return sum(ways(t-i, rd-1) for i in range(1,faces+1))
        return ways(target, dices)
Earlier, I compared the two and found that in most cases lru_cache does not outperform a dictionary-based cache by such a margin.
Can someone explain why there is such a drastic performance difference between the two approaches?
First, I ran your OP code with cProfile; this is the report:
with print(numRollsToTarget2(4, 6, 20)) (OP version)
You can spot right away that there are some heavy calls in ways, its genexpr, and sum. Those probably need close examination to improve or reduce. The next listing is for a similar memoized version, but with far fewer calls; that version passed without a timeout.
35
2864 function calls (366 primitive calls) in 0.018 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.018 0.018 <string>:1(<module>)
1 0.000 0.000 0.001 0.001 dice_rolls.py:23(numRollsToTarget2)
1075/1 0.001 0.000 0.001 0.001 dice_rolls.py:25(ways)
1253/7 0.001 0.000 0.001 0.000 dice_rolls.py:30(<genexpr>)
1 0.000 0.000 0.018 0.018 dice_rolls.py:36(main)
21 0.000 0.000 0.000 0.000 rpc.py:153(debug)
3 0.000 0.000 0.017 0.006 rpc.py:216(remotecall)
3 0.000 0.000 0.000 0.000 rpc.py:226(asynccall)
3 0.000 0.000 0.016 0.005 rpc.py:246(asyncreturn)
3 0.000 0.000 0.000 0.000 rpc.py:252(decoderesponse)
3 0.000 0.000 0.016 0.005 rpc.py:290(getresponse)
3 0.000 0.000 0.000 0.000 rpc.py:298(_proxify)
3 0.000 0.000 0.016 0.005 rpc.py:306(_getresponse)
3 0.000 0.000 0.000 0.000 rpc.py:328(newseq)
3 0.000 0.000 0.000 0.000 rpc.py:332(putmessage)
2 0.000 0.000 0.001 0.000 rpc.py:559(__getattr__)
3 0.000 0.000 0.000 0.000 rpc.py:57(dumps)
1 0.000 0.000 0.001 0.001 rpc.py:577(__getmethods)
2 0.000 0.000 0.000 0.000 rpc.py:601(__init__)
2 0.000 0.000 0.016 0.008 rpc.py:606(__call__)
4 0.000 0.000 0.000 0.000 run.py:412(encoding)
4 0.000 0.000 0.000 0.000 run.py:416(errors)
2 0.000 0.000 0.017 0.008 run.py:433(write)
6 0.000 0.000 0.000 0.000 threading.py:1306(current_thread)
3 0.000 0.000 0.000 0.000 threading.py:222(__init__)
3 0.000 0.000 0.016 0.005 threading.py:270(wait)
3 0.000 0.000 0.000 0.000 threading.py:81(RLock)
3 0.000 0.000 0.000 0.000 {built-in method _struct.pack}
3 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
6 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
1 0.000 0.000 0.018 0.018 {built-in method builtins.exec}
6 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
9 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.017 0.017 {built-in method builtins.print}
179/1 0.000 0.000 0.001 0.001 {built-in method builtins.sum}
3 0.000 0.000 0.000 0.000 {built-in method select.select}
3 0.000 0.000 0.000 0.000 {method '_acquire_restore' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method '_is_owned' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method '_release_save' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.RLock' objects}
6 0.016 0.003 0.016 0.003 {method 'acquire' of '_thread.lock' objects}
3 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'bytes' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
3 0.000 0.000 0.000 0.000 {method 'dump' of '_pickle.Pickler' objects}
2 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects}
201 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.BytesIO' objects}
3 0.000 0.000 0.000 0.000 {method 'release' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects}
Then I ran the modified/simplified version and compared the results.
35
387 function calls (193 primitive calls) in 0.006 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.006 0.006 <string>:1(<module>)
1 0.000 0.000 0.006 0.006 dice_rolls.py:36(main)
1 0.000 0.000 0.000 0.000 dice_rolls.py:5(numRollsToTarget)
195/1 0.000 0.000 0.000 0.000 dice_rolls.py:8(dp)
21 0.000 0.000 0.000 0.000 rpc.py:153(debug)
3 0.000 0.000 0.006 0.002 rpc.py:216(remotecall)
3 0.000 0.000 0.000 0.000 rpc.py:226(asynccall)
3 0.000 0.000 0.006 0.002 rpc.py:246(asyncreturn)
3 0.000 0.000 0.000 0.000 rpc.py:252(decoderesponse)
3 0.000 0.000 0.006 0.002 rpc.py:290(getresponse)
3 0.000 0.000 0.000 0.000 rpc.py:298(_proxify)
3 0.000 0.000 0.006 0.002 rpc.py:306(_getresponse)
3 0.000 0.000 0.000 0.000 rpc.py:328(newseq)
3 0.000 0.000 0.000 0.000 rpc.py:332(putmessage)
2 0.000 0.000 0.001 0.000 rpc.py:559(__getattr__)
3 0.000 0.000 0.000 0.000 rpc.py:57(dumps)
1 0.000 0.000 0.001 0.001 rpc.py:577(__getmethods)
2 0.000 0.000 0.000 0.000 rpc.py:601(__init__)
2 0.000 0.000 0.005 0.003 rpc.py:606(__call__)
4 0.000 0.000 0.000 0.000 run.py:412(encoding)
4 0.000 0.000 0.000 0.000 run.py:416(errors)
2 0.000 0.000 0.006 0.003 run.py:433(write)
6 0.000 0.000 0.000 0.000 threading.py:1306(current_thread)
3 0.000 0.000 0.000 0.000 threading.py:222(__init__)
3 0.000 0.000 0.006 0.002 threading.py:270(wait)
3 0.000 0.000 0.000 0.000 threading.py:81(RLock)
3 0.000 0.000 0.000 0.000 {built-in method _struct.pack}
3 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
6 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
1 0.000 0.000 0.006 0.006 {built-in method builtins.exec}
6 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
9 0.000 0.000 0.000 0.000 {built-in method builtins.len}
34 0.000 0.000 0.000 0.000 {built-in method builtins.max}
1 0.000 0.000 0.006 0.006 {built-in method builtins.print}
3 0.000 0.000 0.000 0.000 {built-in method select.select}
3 0.000 0.000 0.000 0.000 {method '_acquire_restore' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method '_is_owned' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method '_release_save' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.RLock' objects}
6 0.006 0.001 0.006 0.001 {method 'acquire' of '_thread.lock' objects}
3 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'bytes' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
3 0.000 0.000 0.000 0.000 {method 'dump' of '_pickle.Pickler' objects}
2 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.BytesIO' objects}
3 0.000 0.000 0.000 0.000 {method 'release' of '_thread.RLock' objects}
3 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects}
The profiling code is here:
import cProfile
from typing import List

def numRollsToTarget(d, f, target):
    memo = {}
    def dp(d, target):
        if d == 0:
            return 0 if target > 0 else 1
        if (d, target) in memo:
            return memo[(d, target)]
        result = 0
        for k in range(max(0, target-f), target):
            result += dp(d-1, k)
        memo[(d, target)] = result
        return result
    return dp(d, target) % (10**9 + 7)

def numRollsToTarget2(dices: int, faces: int, target: int) -> int:
    dp = {}
    def ways(t, rd):
        if t == 0 and rd == 0: return 1
        if t <= 0 or rd <= 0: return 0
        if dp.get((t,rd)): return dp[(t,rd)]
        dp[(t,rd)] = sum(ways(t-i, rd-1) for i in range(1,faces+1))
        return dp[(t,rd)]
    return ways(target, dices)

def numRollsToTarget3(dices: int, faces: int, target: int) -> int:
    from functools import lru_cache
    @lru_cache(None)
    def ways(t, rd):
        if t == 0 and rd == 0: return 1
        if t <= 0 or rd <= 0: return 0
        return sum(ways(t-i, rd-1) for i in range(1,faces+1))
    return ways(target, dices)

def main():
    print(numRollsToTarget(4, 6, 20))
    #print(numRollsToTarget2(4, 6, 20))
    #print(numRollsToTarget3(4, 6, 20)) # not faster than first

if __name__ == '__main__':
    cProfile.run('main()')
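The call counts above also point at the root cause of the slowdown: in numRollsToTarget2 the memo check if dp.get((t,rd)): treats a cached value of 0 as a miss, because 0 is falsy. Zero-valued subproblems (of which there are many, e.g. whenever t < rd) are therefore recomputed every time, so for larger inputs the memoization degrades toward plain exponential recursion; lru_cache has no such blind spot. A membership test fixes the dict version:

# In ways() inside numRollsToTarget2, replace the truthiness check
if dp.get((t,rd)): return dp[(t,rd)]
# with a membership test, so cached zeros also count as hits:
if (t,rd) in dp: return dp[(t,rd)]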
I am storing an index in a compressed zip on disk and want to extract a single file from this zip. Doing this in Python seems to be very slow; is it possible to fix this?
with zipfile.ZipFile("testoutput/index_doc.zip", mode='r') as myzip:
    with myzip.open("c0ibtxf_i.txt") as mytxt:
        txt = mytxt.read()
        txt = codecs.decode(txt, "utf-8")
        print(txt)
is the Python code I use. Running this script in Python takes a noticeably long time:
python3 testunzip.py 1.22s user 0.06s system 98% cpu 1.303 total
Which is annoying, especially since I know it can go much faster:
unzip -p testoutput/index_doc.zip c0ibtxf_i.txt 0.01s user 0.00s system 69% cpu 0.023 total
As per request, here is the profiling:
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.051 0.051 1.492 1.492 <string>:1(<module>)
127740 0.043 0.000 0.092 0.000 cp437.py:14(decode)
1 0.000 0.000 1.441 1.441 testunzip.py:69(toprofile)
1 0.000 0.000 0.000 0.000 threading.py:72(RLock)
1 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
1 0.000 0.000 0.000 0.000 zipfile.py:1065(__enter__)
1 0.000 0.000 0.000 0.000 zipfile.py:1068(__exit__)
1 0.692 0.692 1.441 1.441 zipfile.py:1085(_RealGetContents)
1 0.000 0.000 0.000 0.000 zipfile.py:1194(getinfo)
1 0.000 0.000 0.000 0.000 zipfile.py:1235(open)
1 0.000 0.000 0.000 0.000 zipfile.py:1591(__del__)
2 0.000 0.000 0.000 0.000 zipfile.py:1595(close)
2 0.000 0.000 0.000 0.000 zipfile.py:1713(_fpclose)
1 0.000 0.000 0.000 0.000 zipfile.py:191(_EndRecData64)
1 0.000 0.000 0.000 0.000 zipfile.py:234(_EndRecData)
127739 0.180 0.000 0.220 0.000 zipfile.py:320(__init__)
127739 0.046 0.000 0.056 0.000 zipfile.py:436(_decodeExtra)
1 0.000 0.000 0.000 0.000 zipfile.py:605(_check_compression)
1 0.000 0.000 0.000 0.000 zipfile.py:636(_get_decompressor)
1 0.000 0.000 0.000 0.000 zipfile.py:654(__init__)
3 0.000 0.000 0.000 0.000 zipfile.py:660(read)
1 0.000 0.000 0.000 0.000 zipfile.py:667(close)
1 0.000 0.000 0.000 0.000 zipfile.py:708(__init__)
1 0.000 0.000 0.000 0.000 zipfile.py:821(read)
1 0.000 0.000 0.000 0.000 zipfile.py:854(_update_crc)
1 0.000 0.000 0.000 0.000 zipfile.py:901(_read1)
1 0.000 0.000 0.000 0.000 zipfile.py:937(_read2)
1 0.000 0.000 0.000 0.000 zipfile.py:953(close)
1 0.000 0.000 1.441 1.441 zipfile.py:981(__init__)
127740 0.049 0.000 0.049 0.000 {built-in method _codecs.charmap_decode}
1 0.000 0.000 0.000 0.000 {built-in method _codecs.decode}
1 0.000 0.000 0.000 0.000 {built-in method _codecs.utf_8_decode}
127743 0.058 0.000 0.058 0.000 {built-in method _struct.unpack}
127739 0.016 0.000 0.016 0.000 {built-in method builtins.chr}
1 0.000 0.000 1.492 1.492 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
2 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
255484 0.020 0.000 0.020 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method builtins.max}
1 0.000 0.000 0.000 0.000 {built-in method builtins.min}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
2 0.000 0.000 0.000 0.000 {built-in method zlib.crc32}
1 0.000 0.000 0.000 0.000 {function ZipExtFile.close at 0x101975620}
127741 0.011 0.000 0.011 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.BufferedReader' objects}
127740 0.224 0.000 0.317 0.000 {method 'decode' of 'bytes' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
127739 0.024 0.000 0.024 0.000 {method 'find' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
7 0.006 0.001 0.006 0.001 {method 'read' of '_io.BufferedReader' objects}
510956 0.071 0.000 0.071 0.000 {method 'read' of '_io.BytesIO' objects}
8 0.000 0.000 0.000 0.000 {method 'seek' of '_io.BufferedReader' objects}
4 0.000 0.000 0.000 0.000 {method 'tell' of '_io.BufferedReader' objects}
It seems to be something that happens in the constructor. Can I avoid this overhead somehow?
I figured out what the problem was:
Python's zipfile library builds a list of info objects for each file in the zip.
This makes zipfile quite fast once it's loaded.
But when there are a lot of files in the zip and you only need a small portion of them each time you load the zip, the overhead of creating the info list costs a lot of time.
To solve this, I adapted the source of Python's zipfile. It has all the default functionality you need, but when you give the constructor a list of the filenames to extract, it will not build the entire information list.
In the particular use case where you only need a few files from a zip, this makes a big difference in performance and memory usage.
For the particular case in the example above (extracting only one file from a zip containing 128K files), the speed of the new implementation now approaches the speed of the unzip method.
A test case:
def original_zipfile():
    import zipfile
    with zipfile.ZipFile("testoutput/index_doc.zip", mode='r') as myzip:
        with myzip.open("c6kn5pu_i.txt") as mytxt:
            txt = mytxt.read()

def my_zipfile():
    import zipfile2
    with zipfile2.ZipFile("testoutput/index_doc.zip", to_extract=["c6kn5pu_i.txt"], mode='r') as myzip:
        with myzip.open("c6kn5pu_i.txt") as mytxt:
            txt = mytxt.read()

if __name__ == "__main__":
    import time
    time1 = time.time()
    original_zipfile()
    print("running time of original_zipfile = "+str(time.time()-time1))
    time1 = time.time()
    my_zipfile()
    print("running time of my_new_zipfile = "+str(time.time()-time1))
    # print(myStopwatch.getPretty())  # leftover from the author's own timing helper; not defined in this snippet
resulted in the following time readings
running time of original_zipfile = 1.0871901512145996
running time of my_new_zipfile = 0.07036209106445312
I will include the source code, but note that there are two small flaws in my implementation (they apply once you give an extract list; when you don't, the behaviour is the same as mentioned before):
it assumes all filenames are encoded in the same encoding (an optimisation I included for my own purposes)
other functionality might be altered (for example, extractall might fail, or might only extract the files you gave to the constructor)
github link
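If adapting zipfile's source is not an option, the directory parse can at least be paid once per process instead of once per extraction. A minimal sketch with the stock library, assuming many files are read over the program's lifetime:

import zipfile

# The expensive step (_RealGetContents parsing the central directory)
# runs once, in the constructor; every open()/read() afterwards is cheap,
# so keep one handle alive and reuse it for all extractions.
with zipfile.ZipFile("testoutput/index_doc.zip", mode='r') as myzip:
    for name in ["c0ibtxf_i.txt"]:  # extend with every file you need
        with myzip.open(name) as mytxt:
            txt = mytxt.read()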
I'm doing a kernel density estimation on a dataset (a collection of points).
The estimation process is OK; the problem is that when I try to get the density value for each point, the speed is very slow:
from sklearn.neighbors import KernelDensity
# this speed is ok
kde = KernelDensity(bandwidth=2.0,atol=0.0005,rtol=0.01).fit(sample)
# this is very slow
kde_result = kde.score_samples(sample)
The sample consists of 300,000 (x, y) points.
I'm wondering if it's possible to make it run in parallel, so the speed would be quicker.
For example, maybe I can divide the sample into smaller sets and run score_samples on each set at the same time? Specifically:
I'm not familiar with parallel computing at all, so I'm wondering if it's applicable in my case?
If this can really speed up the process, what should I do? I'm just running the script in an IPython notebook and have no prior experience with this; is there any good and simple example for my case?
I'm reading http://ipython.org/ipython-doc/dev/parallel/parallel_intro.html now.
UPDATE:
import cProfile
cProfile.run('kde.score_samples(sample)')
64 function calls in 8.653 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8.653 8.653 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 _methods.py:31(_sum)
2 0.000 0.000 0.000 0.000 base.py:870(isspmatrix)
1 0.000 0.000 8.653 8.653 kde.py:133(score_samples)
4 0.000 0.000 0.000 0.000 numeric.py:464(asanyarray)
2 0.000 0.000 0.000 0.000 shape_base.py:60(atleast_2d)
2 0.000 0.000 0.000 0.000 validation.py:105(_num_samples)
2 0.000 0.000 0.000 0.000 validation.py:126(_shape_repr)
6 0.000 0.000 0.000 0.000 validation.py:153(<genexpr>)
2 0.000 0.000 0.000 0.000 validation.py:268(check_array)
2 0.000 0.000 0.000 0.000 validation.py:43(_assert_all_finite)
6 0.000 0.000 0.000 0.000 {hasattr}
4 0.000 0.000 0.000 0.000 {isinstance}
12 0.000 0.000 0.000 0.000 {len}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 8.652 8.652 8.652 8.652 {method 'kernel_density' of 'sklearn.neighbors.kd_tree.BinaryTree' objects}
2 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
2 0.000 0.000 0.000 0.000 {method 'sum' of 'numpy.ndarray' objects}
6 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
Here is a simple example of parallelization using the built-in multiprocessing module:
import numpy as np
import multiprocessing
from sklearn.neighbors import KernelDensity

def parallel_score_samples(kde, samples, thread_count=int(0.875 * multiprocessing.cpu_count())):
    with multiprocessing.Pool(thread_count) as p:
        return np.concatenate(p.map(kde.score_samples, np.array_split(samples, thread_count)))

kde = KernelDensity(bandwidth=2.0,atol=0.0005,rtol=0.01).fit(sample)
kde_result = parallel_score_samples(kde, sample)
As you can see from the code above, multiprocessing.Pool lets you map a pool of worker processes, each executing kde.score_samples on a subset of your samples.
The speedup will be significant if your processor has enough cores.
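One caveat: on platforms where workers are spawned rather than forked (Windows, and macOS on recent Python versions), the Pool must be created under a main guard, otherwise each worker re-imports the script and tries to start its own pool. A sketch reusing parallel_score_samples and the imports from above (the random sample here is only a stand-in for the real data):

if __name__ == '__main__':
    sample = np.random.rand(300000, 2)  # stand-in for the real 300,000 (x, y) points
    kde = KernelDensity(bandwidth=2.0, atol=0.0005, rtol=0.01).fit(sample)
    kde_result = parallel_score_samples(kde, sample)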
I have this code:
def getNeighbors(cfg, cand, adj):
    c_nocfg = np.setdiff1d(cand, cfg)
    deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
    degidx = np.where(deg) > 0
    nbs = c_nocfg[degidx]
    deg = deg[degidx]
    return nbs, deg
which retrieves neighbors (and their degrees in the subgraph spanned by the nodes in cfg) from an adjacency matrix.
Inlining the code gives reasonable performance (~10k x 10k adjacency matrix as a boolean array, 10k candidates in cand, subgraph cfg spanning 500 nodes): 0.02s.
However, calling the function getNeighbors results in an overhead of roughly 0.5s.
Mocking
deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
with
deg = np.random.randint(500, size=c_nocfg.shape[0])
drives down the runtime of the function call to 0.002s.
Could someone explain to me what causes the enormous overhead in my case? After all, the sum operation itself is not all that costly.
Profiling output
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.466 0.466 0.488 0.488 /home/x/x/x/utils/benchmarks.py:15(getNeighbors)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.001 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
inline version:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
6 0.000 0.000 0.000 0.000 {len}
sample data for testing:
np.random.seed(0)
adj = np.zeros(10000*10000, dtype=np.bool)
adj[np.random.randint(low=0, high=10000*10000+1, size=100000)] = True
adj = adj.reshape((10000, 10000))
cand = np.arange(adj.shape[0])
cfgs = np.random.choice(cand, size=500)
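As an aside on the code as posted: degidx = np.where(deg) > 0 compares the tuple returned by np.where with a scalar, which is almost certainly not what was intended. The comparison presumably belongs inside the call:

# Probably intended: positions of candidates with nonzero degree.
degidx = np.where(deg > 0)
nbs = c_nocfg[degidx]
deg = deg[degidx]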
I currently process sections of a string like this:
for (i, j) in huge_list_of_indices:
    process(huge_text_block[i:j])
I want to avoid the overhead of generating these temporary substrings. Any ideas? Perhaps a wrapper that somehow uses index offsets? This is currently my bottleneck.
Note that process() is another python module that expects a string as input.
Edit:
A few people doubt there is a problem. Here are some sample results:
import time
import string

text = string.letters * 1000

def timeit(fn):
    t1 = time.time()
    for i in range(len(text)):
        fn(i)
    t2 = time.time()
    print '%s took %0.3f ms' % (fn.func_name, (t2-t1) * 1000)

def test_1(i):
    return text[i:]

def test_2(i):
    return text[:]

def test_3(i):
    return text

timeit(test_1)
timeit(test_2)
timeit(test_3)
Output:
test_1 took 972.046 ms
test_2 took 47.620 ms
test_3 took 43.457 ms
I think what you are looking for are buffers.
The characteristic of buffers is that they "slice" an object supporting the buffer interface without copying its content, essentially opening a "window" onto the sliced object's content. Some more technical explanation is available here. An excerpt:
Python objects implemented in C can export a group of functions called the “buffer interface.” These functions can be used by an object to expose its data in a raw, byte-oriented format. Clients of the object can use the buffer interface to access the object data directly, without needing to copy it first.
In your case the code should look more or less like this:
>>> s = 'Hugely_long_string_not_to_be_copied'
>>> ij = [(0, 3), (6, 9), (12, 18)]
>>> for i, j in ij:
... print buffer(s, i, j-i) # Should become process(...)
Hug
_lo
string
HTH!
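For Python 3 readers: buffer() is gone there, and memoryview plays the same zero-copy role, but only over bytes-like objects, so the data must be bytes rather than str:

s = b'Hugely_long_string_not_to_be_copied'
ij = [(0, 3), (6, 9), (12, 18)]
mv = memoryview(s)
for i, j in ij:
    chunk = mv[i:j]      # a zero-copy view into s
    print(bytes(chunk))  # a copy happens only here, for display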
A wrapper that uses index offsets into an mmap object could work, yes.
But before you do that, are you sure that generating these substrings is a problem? Don't optimize before you have found out where the time and memory actually go. I wouldn't expect this to be a significant problem.
If you are using Python 3 you can use the buffer protocol and memory views. Assuming the text is stored somewhere in the filesystem:
import os

f = open(FILENAME, 'rb')
data = bytearray(os.path.getsize(FILENAME))
f.readinto(data)
mv = memoryview(data)

for (i, j) in huge_list_of_indices:
    process(mv[i:j])
Check also this article. It might be useful.
Maybe a wrapper that uses index offsets is indeed what you are looking for. Here is an example that does the job. You may have to add more checks on slices (for overflow and negative indexes) depending on your needs.
#!/usr/bin/env python
from collections import Sequence
from timeit import Timer

def process(s):
    return s[0], len(s)

class FakeString(Sequence):
    def __init__(self, string):
        self._string = string
        self.fake_start = 0
        self.fake_stop = len(string)

    def setFakeIndices(self, i, j):
        self.fake_start = i
        self.fake_stop = j

    def __len__(self):
        return self.fake_stop - self.fake_start

    def __getitem__(self, ii):
        if isinstance(ii, slice):
            if ii.start is None:
                start = self.fake_start
            else:
                start = ii.start + self.fake_start
            if ii.stop is None:
                stop = self.fake_stop
            else:
                stop = ii.stop + self.fake_start
            ii = slice(start,
                       stop,
                       ii.step)
        else:
            ii = ii + self.fake_start
        return self._string[ii]

def initial_method():
    r = []
    for n in xrange(1000):
        r.append(process(huge_string[1:9999999]))
    return r

def alternative_method():
    r = []
    for n in xrange(1000):
        fake_string.setFakeIndices(1, 9999999)
        r.append(process(fake_string))
    return r

if __name__ == '__main__':
    huge_string = 'ABCDEFGHIJ' * 100000
    fake_string = FakeString(huge_string)
    fake_string.setFakeIndices(5,15)
    assert fake_string[:] == huge_string[5:15]
    t = Timer(initial_method)
    print "initial_method(): %fs" % t.timeit(number=1)
    t = Timer(alternative_method)
    print "alternative_method(): %fs" % t.timeit(number=1)
which gives:
initial_method(): 1.248001s
alternative_method(): 0.003416s
The example the OP gives will show nearly the biggest possible performance difference between slicing and not slicing.
If the processing actually does something that takes significant time, the problem may hardly exist.
The fact is the OP needs to let us know what process does. The most likely scenario is that it does something significant, and therefore he should profile his code.
Adapted from the OP's example:
#slice_time.py
import time
import string

text = string.letters * 1000

import random
indices = range(len(text))
random.shuffle(indices)

import re

def greater_processing(a_string):
    results = re.findall('m', a_string)

def medium_processing(a_string):
    return re.search('m.*?m', a_string)

def lesser_processing(a_string):
    return re.match('m', a_string)

def least_processing(a_string):
    return a_string

def timeit(fn, processor):
    t1 = time.time()
    for i in indices:
        fn(i, i + 1000, processor)
    t2 = time.time()
    print '%s took %0.3f ms %s' % (fn.func_name, (t2-t1) * 1000, processor.__name__)

def test_part_slice(i, j, processor):
    return processor(text[i:j])

def test_copy(i, j, processor):
    return processor(text[:])

def test_text(i, j, processor):
    return processor(text)

def test_buffer(i, j, processor):
    return processor(buffer(text, i, j - i))

if __name__ == '__main__':
    processors = [least_processing, lesser_processing, medium_processing, greater_processing]
    tests = [test_part_slice, test_copy, test_text, test_buffer]
    for processor in processors:
        for test in tests:
            timeit(test, processor)
And then the run...
In [494]: run slice_time.py
test_part_slice took 68.264 ms least_processing
test_copy took 42.988 ms least_processing
test_text took 33.075 ms least_processing
test_buffer took 76.770 ms least_processing
test_part_slice took 270.038 ms lesser_processing
test_copy took 197.681 ms lesser_processing
test_text took 196.716 ms lesser_processing
test_buffer took 262.288 ms lesser_processing
test_part_slice took 416.072 ms medium_processing
test_copy took 352.254 ms medium_processing
test_text took 337.971 ms medium_processing
test_buffer took 438.683 ms medium_processing
test_part_slice took 502.069 ms greater_processing
test_copy took 8149.231 ms greater_processing
test_text took 8292.333 ms greater_processing
test_buffer took 563.009 ms greater_processing
Notes:
Yes, I tried the OP's original test_1 with the [i:] slice and it's much slower, making his test even more bunk.
Interesting that buffer almost always performs slightly slower than slicing. This time there is one case where it does better, though! The real test is below: buffer seems to do better for larger substrings, while slicing does better for smaller substrings.
And, yes, I do have some randomness in this test, so test away and see the different results :). It may also be interesting to change the size of the 1000s.
So, maybe some others believe you, but I don't, so I'd like to know something about what processing does and how you came to the conclusion that "slicing is the problem".
I profiled medium_processing in my example, upped the string.letters multiplier to 100000, and raised the length of the slices to 10000. Also below is one with slices of length 100. I used cProfile for these (much less overhead than profile!).
test_part_slice took 77338.285 ms medium_processing
31200019 function calls in 77.338 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 77.338 77.338 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:63(write)
5200000 8.208 0.000 43.823 0.000 re.py:139(search)
5200000 9.205 0.000 12.897 0.000 re.py:228(_compile)
5200000 5.651 0.000 49.475 0.000 slice_time.py:15(medium_processing)
1 7.901 7.901 77.338 77.338 slice_time.py:24(timeit)
5200000 19.963 0.000 69.438 0.000 slice_time.py:31(test_part_slice)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
2 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5200000 3.692 0.000 3.692 0.000 {method 'get' of 'dict' objects}
5200000 22.718 0.000 22.718 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
4 0.000 0.000 0.000 0.000 {time.time}
test_buffer took 58067.440 ms medium_processing
31200103 function calls in 58.068 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 58.068 58.068 <string>:1(<module>)
3 0.000 0.000 0.000 0.000 __init__.py:185(dumps)
3 0.000 0.000 0.000 0.000 encoder.py:102(__init__)
3 0.000 0.000 0.000 0.000 encoder.py:180(encode)
3 0.000 0.000 0.000 0.000 encoder.py:206(iterencode)
1 0.000 0.000 0.001 0.001 iostream.py:37(flush)
2 0.000 0.000 0.001 0.000 iostream.py:63(write)
1 0.000 0.000 0.000 0.000 iostream.py:86(_new_buffer)
3 0.000 0.000 0.000 0.000 jsonapi.py:57(_squash_unicode)
3 0.000 0.000 0.000 0.000 jsonapi.py:69(dumps)
2 0.000 0.000 0.000 0.000 jsonutil.py:78(date_default)
1 0.000 0.000 0.000 0.000 os.py:743(urandom)
5200000 6.814 0.000 39.110 0.000 re.py:139(search)
5200000 7.853 0.000 10.878 0.000 re.py:228(_compile)
1 0.000 0.000 0.000 0.000 session.py:149(msg_header)
1 0.000 0.000 0.000 0.000 session.py:153(extract_header)
1 0.000 0.000 0.000 0.000 session.py:315(msg_id)
1 0.000 0.000 0.000 0.000 session.py:350(msg_header)
1 0.000 0.000 0.000 0.000 session.py:353(msg)
1 0.000 0.000 0.000 0.000 session.py:370(sign)
1 0.000 0.000 0.000 0.000 session.py:385(serialize)
1 0.000 0.000 0.001 0.001 session.py:437(send)
3 0.000 0.000 0.000 0.000 session.py:75(<lambda>)
5200000 4.732 0.000 43.842 0.000 slice_time.py:15(medium_processing)
1 5.423 5.423 58.068 58.068 slice_time.py:24(timeit)
5200000 8.802 0.000 52.645 0.000 slice_time.py:40(test_buffer)
7 0.000 0.000 0.000 0.000 traitlets.py:268(__get__)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
1 0.000 0.000 0.000 0.000 uuid.py:101(__init__)
1 0.000 0.000 0.000 0.000 uuid.py:197(__str__)
1 0.000 0.000 0.000 0.000 uuid.py:531(uuid4)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
1 0.000 0.000 0.000 0.000 {built-in method now}
18 0.000 0.000 0.000 0.000 {isinstance}
4 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {locals}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {method 'count' of 'list' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
5200001 3.025 0.000 3.025 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.StringIO' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
5200000 21.418 0.000 21.418 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 {method 'send_multipart' of 'zmq.core.socket.Socket' objects}
2 0.000 0.000 0.000 0.000 {method 'strftime' of 'datetime.date' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {posix.close}
1 0.000 0.000 0.000 0.000 {posix.open}
1 0.000 0.000 0.000 0.000 {posix.read}
4 0.000 0.000 0.000 0.000 {time.time}
Smaller slices (length 100).
test_part_slice took 54916.153 ms medium_processing
31200019 function calls in 54.916 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 54.916 54.916 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:63(write)
5200000 6.788 0.000 38.312 0.000 re.py:139(search)
5200000 8.014 0.000 11.257 0.000 re.py:228(_compile)
5200000 4.722 0.000 43.034 0.000 slice_time.py:15(medium_processing)
1 5.594 5.594 54.916 54.916 slice_time.py:24(timeit)
5200000 6.288 0.000 49.322 0.000 slice_time.py:31(test_part_slice)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
2 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5200000 3.242 0.000 3.242 0.000 {method 'get' of 'dict' objects}
5200000 20.268 0.000 20.268 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
4 0.000 0.000 0.000 0.000 {time.time}
test_buffer took 62019.684 ms medium_processing
31200103 function calls in 62.020 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 62.020 62.020 <string>:1(<module>)
3 0.000 0.000 0.000 0.000 __init__.py:185(dumps)
3 0.000 0.000 0.000 0.000 encoder.py:102(__init__)
3 0.000 0.000 0.000 0.000 encoder.py:180(encode)
3 0.000 0.000 0.000 0.000 encoder.py:206(iterencode)
1 0.000 0.000 0.001 0.001 iostream.py:37(flush)
2 0.000 0.000 0.001 0.000 iostream.py:63(write)
1 0.000 0.000 0.000 0.000 iostream.py:86(_new_buffer)
3 0.000 0.000 0.000 0.000 jsonapi.py:57(_squash_unicode)
3 0.000 0.000 0.000 0.000 jsonapi.py:69(dumps)
2 0.000 0.000 0.000 0.000 jsonutil.py:78(date_default)
1 0.000 0.000 0.000 0.000 os.py:743(urandom)
5200000 7.426 0.000 41.152 0.000 re.py:139(search)
5200000 8.470 0.000 11.628 0.000 re.py:228(_compile)
1 0.000 0.000 0.000 0.000 session.py:149(msg_header)
1 0.000 0.000 0.000 0.000 session.py:153(extract_header)
1 0.000 0.000 0.000 0.000 session.py:315(msg_id)
1 0.000 0.000 0.000 0.000 session.py:350(msg_header)
1 0.000 0.000 0.000 0.000 session.py:353(msg)
1 0.000 0.000 0.000 0.000 session.py:370(sign)
1 0.000 0.000 0.000 0.000 session.py:385(serialize)
1 0.000 0.000 0.001 0.001 session.py:437(send)
3 0.000 0.000 0.000 0.000 session.py:75(<lambda>)
5200000 5.399 0.000 46.551 0.000 slice_time.py:15(medium_processing)
1 5.958 5.958 62.020 62.020 slice_time.py:24(timeit)
5200000 9.510 0.000 56.061 0.000 slice_time.py:40(test_buffer)
7 0.000 0.000 0.000 0.000 traitlets.py:268(__get__)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
1 0.000 0.000 0.000 0.000 uuid.py:101(__init__)
1 0.000 0.000 0.000 0.000 uuid.py:197(__str__)
1 0.000 0.000 0.000 0.000 uuid.py:531(uuid4)
2 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
1 0.000 0.000 0.000 0.000 {built-in method now}
18 0.000 0.000 0.000 0.000 {isinstance}
4 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {locals}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {method 'count' of 'list' objects}
2 0.000 0.000 0.000 0.000 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
5200001 3.158 0.000 3.158 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'getvalue' of '_io.StringIO' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
5200000 22.097 0.000 22.097 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.000 0.000 0.000 0.000 {method 'send_multipart' of 'zmq.core.socket.Socket' objects}
2 0.000 0.000 0.000 0.000 {method 'strftime' of 'datetime.date' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'write' of '_io.StringIO' objects}
1 0.000 0.000 0.000 0.000 {posix.close}
1 0.000 0.000 0.000 0.000 {posix.open}
1 0.000 0.000 0.000 0.000 {posix.read}
4 0.000 0.000 0.000 0.000 {time.time}
process(huge_text_block[i:j])
I want to avoid the overhead of generating these temporary substrings.
(...)
Note that process() is another python module
that expects a string as input.
Completely contradictory.
How can you imagine finding a way not to create what the function requires?!
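To make that concrete: even with a zero-copy view, handing the data to a str-expecting function forces exactly the copy the OP wants to avoid (illustrated here with bytes data):

mv = memoryview(b'abcdef')  # zero-copy so far
chunk = mv[1:4]             # still zero-copy
# A process() that insists on str cannot take chunk directly; the
# conversion below performs precisely the copy being avoided:
text = bytes(chunk).decode('ascii')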