Profile() context vs runctx

Profile() context vs runctx - python

When I run
my_str = "res = f(content1, content2, reso)"
cProfile.runctx(my_str, globals(), locals())
I get:
3 function calls in 0.038 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.037 0.037 0.037 0.037 <string>:1(<module>)
1 0.000 0.000 0.038 0.038 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Which is Nice, however, when I run:
with cProfile.Profile() as pr:
f(content1, content2, reso)
pr.print_stats()
I get something different (and all times are = to 0):
9 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:40(print_stats)
1 0.000 0.000 0.000 0.000 cProfile.py:50(create_stats)
1 0.000 0.000 0.000 0.000 pstats.py:107(__init__)
1 0.000 0.000 0.000 0.000 pstats.py:117(init)
1 0.000 0.000 0.000 0.000 pstats.py:136(load_stats)
1 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
1 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
What's the difference between these 2? I would expect them printing the same result. Am I missing something?

Related

Identify a bottleneck that isn't explicit using cProfile

Can you help identify the bottleneck of this code? I am solving problem #7 of project Euler, and am failing to understand why this solution takes so long (30s). I know there are better solutions, I just want to understand more about why this, specifically is so bad.
def primes(n):
primes = set([2])
count = 1
i = 1
while count < n:
i += 2
if not any([i % num == 0 for num in primes]):
primes.add(i)
count += 1
print i
cProfile.run("primes(10001)") #slow! 30s
The profile is here below:
62374 function calls in 34.605 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 34.605 34.605 <string>:1(<module>)
1 34.273 34.273 34.605 34.605 problem_7.py:8(primes)
52371 0.328 0.000 0.328 0.000 {any}
10000 0.004 0.000 0.004 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

I implemented a couple tweaks to help out. See my comments and results below. In all cases I changed the accumulator from being named primes (shadowing its function name) to being named ps. I didn't test to see if this improved performance, I just hate shadowing names :o)
# Your original code
def prime_orig(n):
ps = set([2])
count = 1
i = 1
while count < n:
i += 2
if not any([i % num == 0 for num in ps]):
ps.add(i)
count += 1
# replace the set accum with a list, per #Sheshnath's comment
def prime_list(n):
ps = [2]
count = 1
i = 1
while count < n:
i += 2
if not any([i % num == 0 for num in ps]):
ps.append(i)
count += 1
# replace the listcomp with a genexp
def prime_genexp(n):
ps = set([2])
count = 1
i = 1
while count < n:
i += 2
if not any(i % num == 0 for num in ps):
ps.add(i)
count += 1
# both optimizations at once
def prime_genexp_list(n):
ps = [2]
count = 1
i = 1
while count < n:
i += 2
if not any(i % num == 0 for num in ps):
ps.append(i)
count += 1
RESULTS:
cProfile.run('prime_orig(10001)')
114746 function calls in 27.283 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.701 0.701 27.283 27.283 <ipython-input-2-3059f1e23ab4>:1(prime_orig)
52371 26.330 0.001 26.330 0.001 <ipython-input-2-3059f1e23ab4>:7(<listcomp>)
1 0.000 0.000 27.283 27.283 <string>:1(<module>)
52371 0.250 0.000 0.250 0.000 {built-in method builtins.any}
1 0.000 0.000 27.283 27.283 {built-in method builtins.exec}
10000 0.003 0.000 0.003 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
cProfile.run('prime_list(10001)')
114746 function calls in 24.523 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.666 0.666 24.523 24.523 <ipython-input-2-3059f1e23ab4>:11(prime_list)
52371 23.625 0.000 23.625 0.000 <ipython-input-2-3059f1e23ab4>:17(<listcomp>)
1 0.000 0.000 24.523 24.523 <string>:1(<module>)
52371 0.231 0.000 0.231 0.000 {built-in method builtins.any}
1 0.000 0.000 24.523 24.523 {built-in method builtins.exec}
10000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
cProfile.run('prime_genexp(10001)')
50627376 function calls in 10.577 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.040 0.040 10.577 10.577 <ipython-input-2-3059f1e23ab4>:21(prime_genexp)
50565001 7.060 0.000 7.060 0.000 <ipython-input-2-3059f1e23ab4>:27(<genexpr>)
1 0.000 0.000 10.577 10.577 <string>:1(<module>)
52371 3.475 0.000 10.530 0.000 {built-in method builtins.any}
1 0.000 0.000 10.577 10.577 {built-in method builtins.exec}
10000 0.002 0.000 0.002 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
cProfile.run('prime_genexp_list(10001)')
50400891 function calls in 9.781 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.040 0.040 9.781 9.781 <ipython-input-2-3059f1e23ab4>:31(prime_genexp_list)
50338516 6.272 0.000 6.272 0.000 <ipython-input-2-3059f1e23ab4>:37(<genexpr>)
1 0.000 0.000 9.781 9.781 <string>:1(<module>)
52371 3.468 0.000 9.735 0.000 {built-in method builtins.any}
1 0.000 0.000 9.781 9.781 {built-in method builtins.exec}
10000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Python Profiling - discrepancy between inlining and calling function

I have this code:
def getNeighbors(cfg, cand, adj):
c_nocfg = np.setdiff1d(cand, cfg)
deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
degidx = np.where(deg) > 0
nbs = c_nocfg[degidx]
deg = deg[degidx]
return nbs, deg
which retrieves neighbors (and their degree in a subgraph spanned by nodes in cfg) from an adjacency matrix.
Inlining the code gives reasonable performance (~10kx10k adjacency matrix as boolean array, 10k candidates in cand, subgraph cfg spanning 500 nodes): 0.02s
However, calling the function getNeighbors results in an overhead of roughly 0.5s.
Mocking
deg = np.sum(adj[np.ix_(cfg, c_nocfg)], axis=0)
with
deg = np.random.randint(500, size=c_nocfg.shape[0])
drives down the runtime of the function call to 0.002s.
Could someone explain me what causes the enormous overhead in my case - after all, the sum-operation itself is not all too costly.
Profiling output
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.466 0.466 0.488 0.488 /home/x/x/x/utils/benchmarks.py:15(getNeighbors)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.001 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
inline version:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:1621(sum)
1 0.000 0.000 0.019 0.019 /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:23(_sum)
1 0.019 0.019 0.019 0.019 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:410(setdiff1d)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:284(in1d)
2 0.001 0.000 0.001 0.000 {method 'argsort' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.py:93(unique)
2 0.000 0.000 0.000 0.000 {method 'sort' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate}
2 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/lib/index_tricks.py:26(ix_)
5 0.000 0.000 0.000 0.000 /usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py:392(asarray)
5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {isinstance}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {range}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2 0.000 0.000 0.000 0.000 {issubclass}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
6 0.000 0.000 0.000 0.000 {len}
sample data for testing:
np.random.seed(0)
adj = np.zeros(10000*10000, dtype=np.bool)
adj[np.random.randint(low=0, high=10000*10000+1, size=100000)] = True
adj = adj.reshape((10000, 10000))
cand = np.arange(adj.shape[0])
cfgs = np.random.choice(cand, size=500)

Best performance method for evaluating presence of substrings within strings in python?

I know I can do:
word = 'shrimp'
sentence = 'They eat shrimp in Mexico, and even more these days'
word in sentence
And so it'd would evaluate True.
But what if I have:
words = ['a','la','los','shrimp']
How can I evaluate whether sentence contains any words elements?
I care for performance as list might be large and I don't want to extend code lines using loops

Could be read as English. But might not be the most performant. First benchmark and then only if needed, use an optimized solution.
any(word in sentence for word in words)
any returns True if any element is True, otherwise False. (word in sentence for word in words) is an iterable, which produces whether each word is in the sentence or not.

>>> words = ['a','la','los','shrimp']
>>> sentence = 'They eat shrimp in Mexico, and more these days'
>>> [word for word in words if word in sentence]
['a', 'shrimp']
>>> any(word in sentence for word in words)
True
words = ['a','la','los','shrimp']
sentence = 'They eat shrimp in Mexico, and more these days'
a = map(lambda word: word in words if word in sentence else None, words)
print(a.__next__())
True
There are many ways to perform this.
Since you asked for Performance, here are the results.
In [1]: words = ['a','la','los','shrimp']
In [2]: sentence = 'They eat shrimp in Mexico
In [3]: a = map(lambda word: word in words if word in sentence else None, words)
In [4]: a.__next__()
Out[4]: True
In [5]: %timeit a
10000000 loops, best of 3: 65.3 ns per loop
In [6]: b = any(word in sentence for word in words)
In [7]: %timeit b
10000000 loops, best of 3: 50.1 ns per loop
words = ['a','la','fresh','shrimp','ban']
sentence = 'They fresh bananas in Mexico, and more these days'
a = [word for word in words if word in sentence.split(' ')]
print(a) #['fresh']
b = any(word in sentence.split(' ') for word in words)
print(b) #True

for word in words:
if word in sentence:
#do stuff

Here is what i Found.
Testing is done on 100k words:
NB: Generating sets is expensive inside the function if your list is long, BUT if you can pre-process into set before calling the function then, set will win.
Snippet:
#!/usr/bin/python
import cProfile
from timeit import Timer
from faker import Faker
def func1(sentence, words):
return any(word in sentence for word in words)
def func2(sentence, words):
for word in words:
if word in sentence:
return True
return False
def func3(sentence, words):
# using set
sets = set(sentence)
return bool(set(words).intersection(sets))
def func4(sentence, words):
# using set
sets = set(sentence)
for word in words:
if word in sets:
return True
return False
def func5(sentence, words):
# using set
sentence, words = set(sentence), set(words)
return not words.isdisjoint(sentence)
s = Faker()
sentence = s.sentence(nb_words=100000).split()
words = 'quidem necessitatibus minus id quos in neque omnis molestias'.split()
func = [ func1, func2, func3, func4, func5 ]
for fun in func:
t = Timer(lambda: fun(sentence, words))
print fun.__name__, cProfile.run('t.timeit(number=1000)')
Output:
Result: func2 wins
func1 5011 function calls in 0.032 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.032 0.032 <string>:1(<module>)
1000 0.000 0.000 0.032 0.000 exists.py:42(<lambda>)
1000 0.001 0.000 0.032 0.000 exists.py:8(func1)
2000 0.030 0.000 0.030 0.000 exists.py:9(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.032 0.032 timeit.py:178(timeit)
1 0.000 0.000 0.032 0.032 timeit.py:96(inner)
1000 0.000 0.000 0.030 0.000 {any}
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func2 2011 function calls in 0.031 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.031 0.031 <string>:1(<module>)
1000 0.031 0.000 0.031 0.000 exists.py:11(func2)
1000 0.000 0.000 0.031 0.000 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.031 0.031 timeit.py:178(timeit)
1 0.000 0.000 0.031 0.031 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func3 3011 function calls in 7.079 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.079 7.079 <string>:1(<module>)
1000 7.069 0.007 7.073 0.007 exists.py:17(func3)
1000 0.004 0.000 7.077 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.079 7.079 timeit.py:178(timeit)
1 0.002 0.002 7.079 7.079 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.004 0.000 0.004 0.000 {method 'intersection' of 'set' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func4 2011 function calls in 7.022 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.022 7.022 <string>:1(<module>)
1000 7.014 0.007 7.014 0.007 exists.py:22(func4)
1000 0.006 0.000 7.020 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.022 7.022 timeit.py:178(timeit)
1 0.002 0.002 7.022 7.022 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func5 3011 function calls in 7.142 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.142 7.142 <string>:1(<module>)
1000 7.133 0.007 7.134 0.007 exists.py:30(func5)
1000 0.006 0.000 7.140 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.142 7.142 timeit.py:178(timeit)
1 0.002 0.002 7.142 7.142 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.002 0.000 0.002 0.000 {method 'isdisjoint' of 'set' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None

Characters in Strings in Python

Implement a function with signature find_chars(string1, string2) that
takes two strings and returns a string that contains only the
characters found in string1 and string two in the order that they are
found in string1. Implement a version of order N*N and one of order N.
(Source: http://thereq.com/q/138/python-software-interview-question/characters-in-strings)
Here are my solutions:
Order N*N:
def find_chars_slow(string1, string2):
res = []
for char in string1:
if char in string2:
res.append(char)
return ''.join(res)
So the for loop goes through N elements, and each char in string2 check is another N operations so this gives N*N.
Order N:
from collections import defaultdict
def find_char_fast(string1, string2):
d = defaultdict(int)
for char in string2:
d[char] += 1
res = []
for char in string1:
if char in d:
res.append(char)
return ''.join(res)
First store the characters of string2 as a dictionary (O(N)). Then scan string1 (O(N)) and check if it is in the dict (O(1)). This gives a total runtime of O(2N) = O(N).
Is the above correct? Is there a faster method?

Your solution is algorithmically correct (the first is O(n**2), and the second is O(n)) but you're doing some things that are going to be possible red flags to an interviewer.
The first function is basically okay. You might get bonus points for writing it like this:
''.join([c for c in string1 if c in string2])
..which does essentially the same thing.
My problem (if I'm wearing my interviewer pants) with how you've written the second function is that you use a defaultdict where you don't care at all about the count - you only care about membership. This is the classic case for when to use a set.
seen = set(string2)
''.join([c for c in string1 if c in seen])
The way I've written these functions are going to be slightly faster than what you wrote, since list comprehensions loop in native code rather than in python bytecode. They are algorithmically the same complexity.

Your method is sound, and there is no method with time complexity less than O(N), since you obviously need to go through each character at least once.
That is not to say that there's no method that runs faster. There's no need to actually increment the numbers in the dictionary. You could, for example, use a set. You could also make further use of python's features, such as list comprehensions/generators:
def find_char_fast2(string1, string2):
s= set(string2)
return "".join( (x for x in string1 if x in s) )

The algorithms you have used are perfectly fine. There are few improvements you can do
Since you are converting the second string to a dictionary, I would recommend using set, like this
d = set(string2)
Apart from that you can use list comprehension, as a filter, like this
return "".join([char for char in string1 if char in d])
If the order of the characters in the output doesn't matter, you can simply convert both the strings to sets and simply find set difference, like this
return "".join(set(string1) - set(string2))

Hi, I am trying to profiling the various solution given here:
In my snippet, I am using a module called faker to generate fake words so i can test on very long string more than 20k characters:
Snippet:
from faker import Faker
from timeit import Timer
from collections import defaultdict
def first(string1, string2):
sets = set(string2)
return ''.join((c for c in string1 if c in sets))
def second(s1, s2):
res = []
for char in string1:
if char in string2:
res.append(char)
return ''.join(res)
def third(s1, s2):
d = defaultdict(int)
for char in string2:
d[char] += 1
res = []
for char in string1:
if char in d:
res.append(char)
return ''.join(res)
f = Faker()
string1 = ''.join(f.paragraph(nb_sentences=10000).split())
string2 = ''.join(f.paragraph(nb_sentences=10000).split())
funcs = [first, second, third]
import cProfile
print 'Length of String1: ', len(string1)
print 'Length of String2: ', len(string2)
print 'Time taken to execute:'
for f in funcs:
t = Timer(lambda: f(string1, string2))
print f.__name__, cProfile.run('t.timeit(number=100)')
Output:
Length of String1: 525133
Length of String2: 501050
Time taken to execute:
first 52513711 function calls in 18.169 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
100 0.001 0.000 18.164 0.182 s.py:39(<lambda>)
100 1.723 0.017 18.163 0.182 s.py:5(first)
52513400 9.442 0.000 9.442 0.000 s.py:7(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 18.169 18.169 timeit.py:178(timeit)
1 0.005 0.005 18.169 18.169 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 6.998 0.070 16.440 0.164 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
second 52513611 function calls in 22.280 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 22.280 22.280 <string>:1(<module>)
100 0.121 0.001 22.275 0.223 s.py:39(<lambda>)
100 16.957 0.170 22.153 0.222 s.py:9(second)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 22.280 22.280 timeit.py:178(timeit)
1 0.005 0.005 22.280 22.280 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
52513300 4.018 0.000 4.018 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 1.179 0.012 1.179 0.012 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
third 52513611 function calls in 28.184 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 28.184 28.184 <string>:1(<module>)
100 22.847 0.228 28.059 0.281 s.py:16(third)
100 0.120 0.001 28.179 0.282 s.py:39(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 28.184 28.184 timeit.py:178(timeit)
1 0.005 0.005 28.184 28.184 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
52513300 4.032 0.000 4.032 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 1.180 0.012 1.180 0.012 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
Conclusion:
So, the first function with comprehension is the fastest.
But when you run on strings size around 25K characters second functions wins.
Length of String1: 22959
Length of String2: 452919
Time taken to execute:
first 2296311 function calls in 2.216 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
100 0.000 0.000 2.216 0.022 s.py:39(<lambda>)
100 1.530 0.015 2.216 0.022 s.py:5(first)
2296000 0.402 0.000 0.402 0.000 s.py:7(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 2.216 2.216 timeit.py:178(timeit)
1 0.000 0.000 2.216 2.216 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.284 0.003 0.686 0.007 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
second 2296211 function calls in 0.939 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.939 0.939 <string>:1(<module>)
100 0.003 0.000 0.939 0.009 s.py:39(<lambda>)
100 0.729 0.007 0.936 0.009 s.py:9(second)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.939 0.939 timeit.py:178(timeit)
1 0.000 0.000 0.939 0.939 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
2295900 0.165 0.000 0.165 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.042 0.000 0.042 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
third 2296211 function calls in 8.361 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8.361 8.361 <string>:1(<module>)
100 8.145 0.081 8.357 0.084 s.py:16(third)
100 0.004 0.000 8.361 0.084 s.py:39(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 8.361 8.361 timeit.py:178(timeit)
1 0.000 0.000 8.361 8.361 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
2295900 0.169 0.000 0.169 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.043 0.000 0.043 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None

How to quicky create array from big file?

I have example:
for line in IN.readlines():
line = line.rstrip('\n')
mas = line.split('\t')
row = ( int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4] )
self.inetnums.append(row)
IN.close()
If ffilesize == 120mb, script time = 10 sec. Can I decrease this time ?

Remove the readlines()
Just do
for line in IN:
Using readlines you are creating a list of all lines from the file and then accessing each one, which you don't need to do. Without it the for loop simply uses the generator which returns a line each time from the file.

You may gain some speed if you use a List Comprehension
inetnums=[(int(x) for x in line.rstrip('\n').split('\t')) for line in fin]
Here is the profile information with two different versions
>>> def foo2():
fin.seek(0)
inetnums=[]
for line in fin:
line = line.rstrip('\n')
mas = line.split('\t')
row = ( int(mas[0]), int(mas[1]), mas[2], mas[3])
inetnums.append(row)
>>> def foo1():
fin.seek(0)
inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin]
>>> cProfile.run("foo1()")
444 function calls in 0.004 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 0.004 0.004 <pyshell#362>:1(foo1)
1 0.000 0.000 0.004 0.004 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
220 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'seek' of 'file' objects}
220 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
>>> cProfile.run("foo2()")
664 function calls in 0.006 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 0.006 0.006 <pyshell#360>:1(foo2)
1 0.000 0.000 0.006 0.006 <string>:1(<module>)
220 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
220 0.001 0.000 0.001 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'seek' of 'file' objects}
220 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects}
>>>

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Profile() context vs runctx - python

Related

Identify a bottleneck that isn't explicit using cProfile

Python Profiling - discrepancy between inlining and calling function

Best performance method for evaluating presence of substrings within strings in python?

Characters in Strings in Python

How to quicky create array from big file?

Categories

Resources