line_profiler not working as expected - python

I'm trying to use line_profiler as an API. Following the docs and this tutorial (scroll down to Line Profiling), I put together a minimal test case for profiling some numpy ufuncs:
import numpy as np
import line_profiler
import time

shp = (1000, 1000)
a = np.ones(shp)
o = np.zeros(shp)

def main():
    t = time.time()
    np.divide(a, 1, o)
    for i in xrange(200):
        np.multiply(a, 2, o)
        np.add(a, 1, o)
    print 'duration', time.time() - t

profiler = line_profiler.LineProfiler()
profiler.add_function(main)
main()
profiler.print_stats()
I get this on stdout, which indicates that main ran but was not profiled:
duration 2.6779999733
Timer unit: 5.59936e-07 s

File: testprof.py
Function: main at line 9
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           def main():
    10                                               t = time.time()
    11                                               np.divide(a,1,o)
    12                                               for i in xrange(200):
    13                                                   np.multiply(a,2,o)
    14                                                   np.add(a,1,o)
    15                                               print 'duration', time.time()-t
I'm new to line_profiler. See my other question if you're curious why I don't use cProfile.

Try adding
profiler.enable_by_count()
before
main()

Related

Trying to add throttle control to paralleled API calls in python

I am using the Google Places API, which has a query-per-second limit of 10. This means I cannot make more than 10 requests within a second. If we were using serial execution this wouldn't be an issue, as the API's average response time is 250 ms, so I would only be able to make about 4 calls per second.
To utilize the entire 10 QPS limit I used multithreading and made parallel API calls. But now I need to control the number of calls that can happen in a second; it should not go beyond 10 (the Google API starts throwing errors if I cross the limit).
Below is the code I have so far. I am not able to figure out why the program sometimes just gets stuck, or takes a lot longer than required.
import time
from datetime import datetime
import random
from threading import Lock
from concurrent.futures import ThreadPoolExecutor as pool
import concurrent.futures
import requests
import matplotlib.pyplot as plt
from statistics import mean
from ratelimiter import RateLimiter

def make_parallel(func, qps=10):
    lock = Lock()
    threads_execution_que = []
    limit_hit = False

    def qps_manager(arg):
        current_second = time.time()
        lock.acquire()
        if len(threads_execution_que) >= qps or limit_hit:
            limit_hit = True
            if current_second - threads_execution_que[0] <= 1:
                time.sleep(current_second - threads_execution_que[0])
        current_time = time.time()
        threads_execution_que.append(current_time)
        lock.release()
        res = func(arg)
        lock.acquire()
        threads_execution_que.remove(current_time)
        lock.release()
        return res

    def wrapper(iterable, number_of_workers=12):
        result = []
        with pool(max_workers=number_of_workers) as executer:
            bag = {executer.submit(func, i): i for i in iterable}
            for future in concurrent.futures.as_completed(bag):
                result.append(future.result())
        return result
    return wrapper

@make_parallel
def api_call(i):
    min_func_time = random.uniform(.25, .3)
    start_time = time.time()
    try:
        response = requests.get('https://jsonplaceholder.typicode.com/posts', timeout=1)
    except Exception as e:
        response = e
    if (time.time() - start_time) - min_func_time < 0:
        time.sleep(min_func_time - (time.time() - start_time))
    return response

api_call([1]*50)
Ideally the code should take no more than 1.5 seconds, but currently it takes about 12-14 seconds.
The script speeds up to its expected speed as soon as I remove the QPS manager logic.
Please suggest what I am doing wrong, and also whether there is any package available that provides this mechanism out of the box.
Looks like ratelimit does just that:
from ratelimit import limits, sleep_and_retry

@make_parallel
@sleep_and_retry
@limits(calls=10, period=1)
def api_call(i):
    try:
        response = requests.get("https://jsonplaceholder.typicode.com/posts", timeout=1)
    except Exception as e:
        response = e
    return response
EDIT: I did some testing and it looks like @sleep_and_retry is a little too optimistic, so just increase the period a little, to 1.2 seconds:

from datetime import datetime, timedelta

s = datetime.now()
api_call([1] * 50)
elapsed_time = datetime.now() - s
print(elapsed_time > timedelta(seconds=50 / 10))
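If you'd rather not depend on a third-party package, the windowing the question's qps_manager attempts can be sketched by hand. Below is a minimal thread-safe sliding-window limiter (all names are illustrative, not from any library). The key differences from the question's code: shared state is only mutated while the lock is held, and expired timestamps are pruned before counting. The api_call here returns its argument instead of hitting the network, so the sketch is self-contained:

```python
import threading
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

class SlidingWindowLimiter:
    """Allow at most `calls` acquisitions per `period` seconds, across all threads."""
    def __init__(self, calls=10, period=1.0):
        self.calls, self.period = calls, period
        self._lock = threading.Lock()
        self._stamps = deque()          # times of recent acquisitions

    def acquire(self):
        while True:
            with self._lock:
                now = time.monotonic()
                # drop timestamps that have fallen out of the window
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.calls:
                    self._stamps.append(now)
                    return
                wait = self.period - (now - self._stamps[0])
            time.sleep(max(wait, 0.01))  # retry once the window frees up

limiter = SlidingWindowLimiter(calls=10, period=1.0)

def api_call(i):
    limiter.acquire()
    return i                             # stand-in for requests.get(...)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=12) as ex:
    results = list(ex.map(api_call, range(25)))
elapsed = time.monotonic() - start       # ~2s: 25 calls at 10 per second
```

With 25 calls at 10 per second, the first 10 go through immediately and the rest are spread over the following windows, so total time is about 2 seconds instead of the serial ~6+.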

Python run function in parallel to main code

I have the following code (Python 3.7 on Windows 64-bit):
from time import sleep
import time
from multiprocessing import Process

### function ###
def func(l):
    for i in l:
        sleep(1)
        print(i)
        t1 = time.time()
        total = t1 - t0
        print('time : ', total)

### main code ###
t0 = time.time()
l = list(range(1, 4))

if __name__ == '__main__':
    p = Process(target=func, args=(l,))
    p.start()
    p.join()
    sleep(10)
    print('done')
    t1 = time.time()
    total = t1 - t0
    print('time : ', total)
The goal is to have a function run in parallel with the main block of code. When I run this I get the following result:
done
time : 10.000610828399658
1
time : 11.000777244567871
2
time : 12.001059532165527
3
time : 13.00185513496399
done
time : 23.11873483657837
However I was expecting the following:
1
time: ~1
2
time: ~2
3
time: ~3
done
time: ~10
So essentially I want the function to run in parallel with the main code. I am confused because without multiprocessing this code should run at most for 13 seconds but it is running for 23. The goal is to have it run in 10 seconds.
How can I fix this to have it work as intended?
I can't reproduce the problem where the first time printed is ~10; when I try it, I get times starting from ~1, as expected.
My final time from the parent process is ~13. This is because of p.join(), which waits for the child process to finish. If I remove that, the time printed in the parent is ~10.
Script:
from time import sleep
import time
from multiprocessing import Process

### function ###
def func(l):
    for i in l:
        sleep(1)
        print(i)
        t1 = time.time()
        total = t1 - t0
        print('time : ', total)

### main code ###
t0 = time.time()
l = list(range(1, 4))

if __name__ == '__main__':
    p = Process(target=func, args=(l,))
    p.start()
    # p.join()
    sleep(10)
    print('done')
    t1 = time.time()
    total = t1 - t0
    print('time : ', total)
Output:
$ python testmultiproc.py
1
time : 1.0065689086914062
2
time : 2.0073459148406982
3
time : 3.0085067749023438
done
time : 10.008337020874023

No module named mem_profile

I'm using this program to measure the time and memory used by two functions, and to compare which is better for processing a large amount of data. My understanding is that measuring memory usage requires the mem_profile module, but pip install mem_profile gave me the error No module named mem_profile.
import mem_profile
import random
import time

names = ['Kiran', 'King', 'John', 'Corey']
majors = ['Math', 'Comps', 'Science']

print 'Memory (Before): {}Mb'.format(mem_profile.memory_usage_resource())

def people_list(num_people):
    results = []
    for i in xrange(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        results.append(person)
    return results

def people_generator(num_people):
    for i in xrange(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

t1 = time.clock()
people = people_list(10000000)
t2 = time.clock()

# t1 = time.clock()
# people = people_generator(10000000)
# t2 = time.clock()

print 'Memory (After): {}Mb'.format(mem_profile.memory_usage_resource())
print 'Took {} Seconds'.format(t2-t1)
What has caused this error? And are there any alternative packages I could use instead?
1) First install the module:
pip install memory_profiler
2) Import it in your code like this:
import memory_profiler as mem_profile
3) Change the code: replace mem_profile.memory_usage_psutil() with mem_profile.memory_usage()
4) Convert your print statements like this:
print('Memory (Before): ' + str(mem_profile.memory_usage()) + 'MB')
print('Memory (After) : ' + str(mem_profile.memory_usage()) + 'MB')
print('Took ' + str(t2-t1) + ' Seconds')
5) You will have something like this code:
import memory_profiler as mem_profile
import random
import time

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

# print('Memory (Before): {}Mb '.format(mem_profile.memory_usage_psutil()))
print('Memory (Before): ' + str(mem_profile.memory_usage()) + 'MB')

def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

# t1 = time.clock()
# people = people_list(1000000)
# t2 = time.clock()

t1 = time.clock()
people = people_generator(1000000)
t2 = time.clock()

# print 'Memory (After) : {}Mb'.format(mem_profile.memory_usage_psutil())
print('Memory (After) : ' + str(mem_profile.memory_usage()) + 'MB')
# print 'Took {} Seconds'.format(t2-t1)
print('Took ' + str(t2-t1) + ' Seconds')
Now it works fine. I'm using Python 3.6 and it runs without any error.
I was going through the same tutorial and encountered the same problem. Upon further research, I discovered that the author of the tutorial used a package called memory_profiler, whose main file he renamed to mem_profile and imported in the tutorial code.
Just go ahead and do pip install memory_profiler. Copy and rename the package's main file to mem_profile.py in your working directory, and you should be fine. If you are on Windows, make sure you install the dependent psutil package as well.
Hope this helps somebody.
Use this for calculating time:
import time

time_start = time.time()
# run your code
time_elapsed = time.time() - time_start
As referenced by the Python documentation:
time.time() → float
Return the time in seconds since the epoch as a floating point number.
The specific date of the epoch and the handling of leap seconds is
platform dependent. On Windows and most Unix systems, the epoch is
January 1, 1970, 00:00:00 (UTC) and leap seconds are not counted
towards the time in seconds since the epoch. This is commonly referred
to as Unix time. To find out what the epoch is on a given platform,
look at gmtime(0).
Note that even though the time is always returned as a floating point
number, not all systems provide time with a better precision than 1
second. While this function normally returns non-decreasing values, it
can return a lower value than a previous call if the system clock has
been set back between the two calls.
The number returned by time() may be converted into a more common time
format (i.e. year, month, day, hour, etc…) in UTC by passing it to
gmtime() function or in local time by passing it to the localtime()
function. In both cases a struct_time object is returned, from which
the components of the
calendar date may be accessed as attributes.
Reference: https://docs.python.org/3/library/time.html#time.time
Use this for calculating memory:
import resource
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Reference: http://docs.python.org/library/resource.html
If you are using Python 3.x, use timeit:
Reference: https://docs.python.org/3/library/timeit.html
Adding to Adebayo Ibro's answer above, do the following:
In terminal, run $ pip install memory_profiler
In your script, replace import mem_profile with import memory_profiler as mem_profile
In your script, replace all mem_profile.memory_usage_resource() with mem_profile.memory_usage().
Hope this helps!
That module is hand-written (it is not on PyPI).
I got this from Corey Schafer's comment in his youtube video.
Just save this code as the module's name:
from pympler import summary, muppy
import psutil
import resource
import os
import sys

def memory_usage_psutil():
    # return the memory usage in MB
    # (note: on recent psutil versions this method is called memory_info())
    process = psutil.Process(os.getpid())
    mem = process.get_memory_info()[0] / float(2 ** 20)
    return mem

def memory_usage_resource():
    rusage_denom = 1024.
    if sys.platform == 'darwin':
        # ... it seems that in OSX the output is different units ...
        rusage_denom = rusage_denom * rusage_denom
    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / rusage_denom
    return mem
I just encountered the same problem. I solved it by installing memory_profiler (pip install -U memory_profiler) and then modifying the program as follows:
import memory_profiler
...
print('Memory (Before): {}Mb'.format(memory_profiler.memory_usage()))
A couple of Python 3.8 updates, since time.clock() was removed and print() has evolved.
Thanks everyone for this discussion, and definitely thanks to Corey Schafer's great video.
import memory_profiler as mem_profile
import random
import time

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

print(f'Memory (Before): {mem_profile.memory_usage()}Mb')

def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

# t1 = time.process_time()
# people = people_list(1000000)
# t2 = time.process_time()

t1 = time.process_time()
people = people_generator(1000000)
t2 = time.process_time()

print(f'Memory (After): {mem_profile.memory_usage()}Mb')
print(f'Took {t2-t1} Seconds')
I went through the same tutorial. The library name is memory_profiler.
To install it, run:
pip install -U memory_profiler
To import it:
import memory_profiler
Note: the library does not have a memory_usage_resource function, but you can use memory_usage for the same functionality. Also, instead of the clock() function, use the time() function:
import memory_profiler
import random
import time

names = ['John', 'Jane', 'Adam', 'Steve', 'Rick', 'George', 'Paul', 'Bill', 'Bob']
majors = ['Math', 'Engineering', 'ComSic', 'Arts', 'Stuck Broker']

print('Memory (Before): {} MB'.format(memory_profiler.memory_usage()))

# using a list
def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

# using a generator
def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

# t1 = time.time()
# people_list(1000000)
# t2 = time.time()

t1 = time.time()
people_generator(1000000)
t2 = time.time()

print('Memory (After): {} MB'.format(memory_profiler.memory_usage()))
print('Took {} seconds'.format(t2-t1))
Much simpler with sys:
import sys
...
print ('Memory (Before): {0}Mb'.format(sys.getsizeof([])))
during the pip install mem_profile it gave me error No module named mem_profile.

By default, pip downloads packages from PyPI. No package named "mem_profile" exists on PyPI, so of course you get an error.
For timing blocks of code, the timeit module is what you want to use:
https://docs.python.org/library/timeit.html
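For instance, a minimal timeit sketch (the statement being timed is just a placeholder):

```python
import timeit

# timeit runs the statement `number` times with the most precise
# clock available and returns the total elapsed seconds
elapsed = timeit.timeit('sum(range(100))', number=10000)
per_call = elapsed / 10000
print('total: {:.4f}s, per call: {:.8f}s'.format(elapsed, per_call))
```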

monitor function name error using simpy

I'm very new to Python and am trying to use a SimPy script I found online to model queue times. I get a NameError when I use Monitor, which I thought was part of SimPy. Is there somewhere else I should be importing Monitor from?
Thanks in advance for the help!
See below:
#!/usr/bin/env python
from __future__ import generators
import simpy
from multiprocessing import Queue, Process
from random import Random, expovariate, uniform

# MMC.py simulation of an M/M/c/FCFS/inft/infty queue
# 2004 Dec updated and simplified
# $Revision: 1.1.1.5 $ $Author: kgmuller $ $Date: 2006/02/02 13:35:45 $

"""Simulation of an M/M/c queue

Jobs arrive at random into a c-server queue with
exponential service-time distribution. Simulate to
determine the average number and the average time
in the system.

- c = Number of servers = 3
- rate = Arrival rate = 2.0
- stime = mean service time = 1.0
"""

__version__ = '\nModel: MMC queue'

class Generator(Process):
    """ generates Jobs at random """
    def execute(self, maxNumber, rate, stime):
        ##print "%7.4f %s starts"%(now(), self.name)
        for i in range(maxNumber):
            L = Job("Job "+`i`)
            activate(L, L.execute(stime), delay=0)
            yield hold, self, grv.expovariate(rate)

class Job(Process):
    """ Jobs request a gatekeeper and hold it for an exponential time """
    def execute(self, stime):
        global NoInSystem
        arrTime = now()
        self.trace("Hello World")
        NoInSystem += 1
        m.accum(NoInSystem)
        yield request, self, server
        self.trace("At last ")
        t = jrv.expovariate(1.0/stime)
        msT.tally(t)
        yield hold, self, t
        yield release, self, server
        NoInSystem -= 1
        m.accum(NoInSystem)
        mT.tally(now()-arrTime)
        self.trace("Geronimo ")

    def trace(self, message):
        if TRACING:
            print "%7.4f %6s %10s (%2d)"%(now(), self.name, message, NoInSystem)

TRACING = 0
print __version__
c = 3
stime = 1.0
rate = 2.0
print "%2d servers, %6.4f arrival rate,%6.4f mean service time"%(c, rate, stime)
grv = Random(333555)  # RV for Source
jrv = Random(777999)  # RV for Job
NoInSystem = 0
m = Monitor()
mT = Monitor()
msT = Monitor()
server = Resource(c, name='Gatekeeper')
initialize()
g = Generator('gen')
activate(g, g.execute(maxNumber=10, rate=rate, stime=stime), delay=0)
simulate(until=3000.0)
print "Average number in the system is %6.4f"%(m.timeAverage(),)
print "Average time in the system is %6.4f"%(mT.mean(),)
print "Actual average service-time is %6.4f"%(msT.mean(),)
You are getting a NameError because Monitor is never defined within your script. To use the Monitor from simpy, either change import simpy to from simpy import Monitor, or prefix each place you use Monitor with simpy.
Ex:
#!/usr/bin/env python
from __future__ import generators
from simpy import Monitor
Or (lines 71-73):
m=simpy.Monitor()
mT=simpy.Monitor()
msT=simpy.Monitor()
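Note that this script uses the classic SimPy 2 API (activate, simulate, Monitor); newer SimPy releases (3.x) dropped the Monitor class entirely and leave statistics collection to you. For illustration, here is a minimal hand-rolled stand-in (a sketch, not the real SimPy API) showing what the script's tally/accum/mean/timeAverage calls compute; the explicit time arguments replace SimPy's implicit now():

```python
class Monitor:
    """Minimal stand-in for the classic SimPy Monitor: records
    (time, value) observations and computes simple statistics."""

    def __init__(self):
        self.data = []                  # list of (time, value) pairs

    def tally(self, value, t=0.0):
        """Record a plain observation (time is optional for plain tallies)."""
        self.data.append((t, value))

    def accum(self, value, t):
        """Record a level change of a step function at simulation time t."""
        self.data.append((t, value))

    def mean(self):
        """Arithmetic mean of the observed values."""
        values = [v for _, v in self.data]
        return sum(values) / len(values)

    def timeAverage(self, now):
        """Time-weighted average of the step function from t=0 to t=now."""
        total, last_t, last_v = 0.0, 0.0, 0.0
        for t, v in self.data:
            total += last_v * (t - last_t)
            last_t, last_v = t, v
        total += last_v * (now - last_t)
        return total / now
```

For example, a queue that holds 1 job for the first 5 of 10 time units has a time average of 0.5, while the plain mean of tallied values ignores time entirely.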

Recover output of 2 python scripts with subprocess

I'm trying to launch two Python scripts simultaneously from another script. It works with subprocess.Popen, but I would also like to capture the output of the two scripts launched simultaneously.
When I use subprocess.check_output, I manage to capture the outputs, but the scripts are not launched at the same time.
I have made a simple example to illustrate the problem. The program 2scripts.py calls the scripts aa.py and bb.py.
aa.py :
import time

delay = 0
t0 = time.time()
print "temps " + str(t0)
print("aa")
while delay < 5:
    delay = time.time() - t0
bb.py :
import time

delay = 0
t0 = time.time()
print "temps " + str(t0)
print("bb")
while delay < 5:
    delay = time.time() - t0
This is 2scripts.py and its output with subprocess.Popen:

import subprocess

x = subprocess.Popen(["python", "aa.py"])
y = subprocess.Popen(["python", "bb.py"])
temps 1460040113.05
aa
temps 1460040113.05
bb
And here is 2scripts.py and its output with subprocess.check_output():

import subprocess

x = subprocess.check_output(["python", "aa.py"])
y = subprocess.check_output(["python", "bb.py"])
print(x)
print(y)
temps 1460040186.3
aa
temps 1460040191.31
bb
You can use multiprocessing.pool to run both at the same time and only print the outputs when both have returned results. This way, they both start at the same time and you manage to get their outputs.
Code:
import subprocess
import multiprocessing.pool
import time

pool = multiprocessing.pool.ThreadPool(2)
x, y = pool.map(lambda x: x(), [
    lambda: subprocess.check_output(["python", "aa.py"]),
    lambda: subprocess.check_output(["python", "bb.py"]),
])
print(x)
print(y)
Output:
temps 1460050982.44
aa
temps 1460050982.44
bb
An enhanced version of @Ru Hasha's example, with the number of threads as the first command-line parameter and a single script to illustrate the different commands called.
xscripts.py
import sys, subprocess, multiprocessing.pool

nb_threads = int(sys.argv[1])
pool = multiprocessing.pool.ThreadPool(nb_threads)
processes = []
for i in range(nb_threads):
    processes.append(lambda i=i: subprocess.check_output(["python", "mic.py", "mic" + str(i)]))
outputs = pool.map(lambda x: x(), processes)
for o in outputs:
    print o
mic.py
import sys, time

sys.stdout.write(sys.argv[1] + " start time\t" + str(time.time()) + '\n')
time.sleep(2)
sys.stdout.write(sys.argv[1] + " end time\t" + str(time.time()))
Output
$ python xscripts.py 4
mic0 start time 1460071350.1
mic0 end time 1460071352.1
mic1 start time 1460071350.1
mic1 end time 1460071352.1
mic2 start time 1460071350.1
mic2 end time 1460071352.1
mic3 start time 1460071350.1
mic3 end time 1460071352.1
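Another option that avoids a thread pool entirely: start both children with subprocess.Popen(stdout=PIPE) so they run concurrently, then collect each one's output with communicate(). A Python 3 sketch, using inline -c scripts in place of aa.py and bb.py so it is self-contained:

```python
import subprocess
import sys

# Start both children first so they run at the same time; PIPE captures stdout.
# The "-c" one-liners stand in for aa.py / bb.py.
procs = [
    subprocess.Popen([sys.executable, "-c", "print('aa')"], stdout=subprocess.PIPE),
    subprocess.Popen([sys.executable, "-c", "print('bb')"], stdout=subprocess.PIPE),
]

# Only now read the outputs; communicate() waits for each child to finish.
outputs = [p.communicate()[0].decode().strip() for p in procs]
print(outputs)
```

Because both Popen calls return immediately, the two scripts overlap just as in the Popen version of 2scripts.py, while the outputs are still captured as with check_output.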
