How to profile a script which uses Python's multiprocessing - python

I have a script which uses the function parallel_map (source code: qutip.parallel.parallel_map) from the QuTiP package. As you can see from the source code for the function, it uses Python's multiprocessing module.
I looked at answers to the serial version of this question and decided on SnakeViz after reading zaxiliu's solution.
But naively trying it on my code fails. So what must I do to profile my code? My heart is not set on SnakeViz; I don't mind using any other graphical tool.

This doesn't satisfy the question's requirements fully, but it will work if nothing else is available.
Try using serial_map instead of parallel_map from the same module.
Replace (or, better yet, comment out) the line
from qutip.parallel import parallel_map
with
from qutip.parallel import serial_map
Now you have a serial implementation of the code. This can be profiled using the tools described in answers to the serial version of your question.
After this (assuming you go ahead with SnakeViz):
Make the profile file:
python -m cProfile -o program.prof my_program.py
Run SnakeViz on the profile file generated in the previous step:
snakeviz program.prof
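If you do want to keep the parallel version, one general workaround (not from this answer, and not QuTiP-specific; `work` and the file names below are illustrative) is to profile each worker process separately with cProfile and dump one .prof file per process, each of which SnakeViz can open:

```python
import cProfile
import os
from multiprocessing import Pool

def work(n):
    # Stand-in for the real task passed to parallel_map.
    return sum(i * i for i in range(n))

def profiled_work(n):
    # Each worker writes its own profile, keyed by process id, so
    # the per-process .prof files can each be opened in SnakeViz.
    profiler = cProfile.Profile()
    profiler.enable()
    result = work(n)
    profiler.disable()
    profiler.dump_stats('worker_%d.prof' % os.getpid())
    return result

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        results = pool.map(profiled_work, [10000, 20000])
    print(results)
```

You end up with one profile per worker process rather than a single merged view, but it does capture what actually runs in the children.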

Related

calling precompiled module from another file

Primarily I am a C++ developer trying to use Python for certain tasks.
I made a Python module in Python 3.6 and pre-compiled it on Windows 7 using the following command:
python -m py_compile myfile.py
using information from this link. I get "myfile.pyc" created.
Now I want to use this in a Python file which is running under Python 2.7,
so I use information from this and this and write:
mod=imp.load_source('myfile.func', 'c:/folder1/folder2/myfile.pyc')
But the above statement gives me the exception
[name 'o' is not defined]
Is this because I pre-compiled in 3.6 and am using it in 2.7?
What is it that I am missing here?
First, Python 3.6 is not backwards compatible with Python 2.7. Secondly, it's usually better to import the module as normal and let the interpreter handle caching library code as compiled byte code. Also, the function load_source is meant for loading uncompiled source files; the function you want is load_compiled. Check here:
https://docs.python.org/2/library/imp.html
Lastly, if you are looking for performance improvements, this will only help reduce compile time, and only on the first import or when the imported file changes.
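A minimal sketch of the difference between the two APIs, written for the older interpreters this question is about (imp is deprecated in Python 3 and removed in 3.12; the file name myfile is illustrative):

```python
import imp  # deprecated in Python 3, removed in 3.12
import py_compile

# Create and byte-compile a tiny module.
with open('myfile.py', 'w') as f:
    f.write('def func():\n    return 42\n')
py_compile.compile('myfile.py', cfile='myfile.pyc')

# load_source expects a .py source file; for a .pyc, use load_compiled.
mod = imp.load_compiled('myfile', 'myfile.pyc')
print(mod.func())
```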
What is __pycache__?
This is the complete solution to my problem (if you do not want to go through all the comments and discussion to figure out the solution).
As Mr. Garrigan Stafford aptly pointed out, I was using the wrong API for loading the module.
The API for loading a compiled module is load_compiled, not load_source.
When I started using this API, I ran into a magic-number error: Bad magic number.
This happens because, while creating the file, the compiler inserts certain values to identify what kind of file it is (more info can be found here).
In my case, my lib was compiled in 3.6 and used in 2.7, which was causing the problem.
To overcome this, I went back to the original code, compiled my lib in 2.7, and then used it in the client code.
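The magic-number check can be reproduced by hand: the first four bytes of a .pyc identify the bytecode version, and the interpreter exposes the value it expects (a small stdlib demonstration, not part of the original fix; tiny.py is a throwaway name):

```python
import importlib.util
import py_compile

# Byte-compile a throwaway module and read back its header.
with open('tiny.py', 'w') as f:
    f.write('x = 1\n')
py_compile.compile('tiny.py', cfile='tiny.pyc')

with open('tiny.pyc', 'rb') as f:
    magic = f.read(4)

# A .pyc loads only if this matches the running interpreter's magic number.
print(magic == importlib.util.MAGIC_NUMBER)
```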
Voila!
All works fine now.
Thanks to the Stack Overflow community as a whole, and Mr. Stafford in particular, for helping out.

Why is pprofile giving no output?

I'm new to Python and I'd like to use pprofile, but I can't get it running. For example,
#pprofile --threads 0 test.py
gives me the error
bash: pprofile: Command not found.
I've tried to run pprofile as a module, as described here: https://github.com/vpelletier/pprofile , using the following script:
#!/usr/bin/env sc_python3
# coding=utf-8
import time
import pprofile
def someHotSpotCallable():
    profiler = pprofile.Profile()
    with profiler:
        time.sleep(2)
        time.sleep(1)
    profiler.print_stats()
Running this script gives no output. Changing the script in the following way
#!/usr/bin/env sc_python3
# coding=utf-8
import time
import pprofile
def someHotSpotCallable():
    profiler = pprofile.Profile()
    with profiler:
        time.sleep(2)
        time.sleep(1)
    profiler.print_stats()

print(someHotSpotCallable())
gives the output
Total duration: 3.00326s
None
How do I get the line-by-line table output shown on https://github.com/vpelletier/pprofile?
I'm using Python 3.4.3; version 2.7.3 gives the same output (only Total duration) on my system.
Do I have to install anything?
Thanks a lot!
pprofile author here.
To use pprofile as a command, you would have to install it. The only packaging I have worked on so far is via pypi. Unless you are using a dependency-gathering tool (like buildout), the easiest is probably to setup a virtualenv and install pprofile inside it:
$path_to_your_virtualenv/bin/pip install pprofile
Besides this, there is nothing else to install: pprofile only depends on python interpreter features (more on this just below).
Then you can run it like:
$path_to_your_virtualenv/bin/pprofile <args>
Another way to run pprofile is to fetch the source and run it as a Python script rather than as a standalone command:
$your_python_interpreter $path_to_pprofile/pprofile.py <args>
Then, about the surprising output: I notice your shebang mentions "sc_python3" as the interpreter. What implementation of the Python interpreter is this? Would you have some non-standard modules loaded on interpreter start?
pprofile, in deterministic mode, depends on the interpreter triggering special events each time a line changes, each time a function is called or each time it returns, and, just for completeness, it also monitors when threads are created as the tracing function is a thread local. It looks like that interpreter does not trigger these events. A possible explanation would be that something else is competing with pprofile for these events: only one function can be registered at a time. For example code coverage tools and debuggers may use this function (or another closely related one in standard sys module, setprofile). Just for completeness, setprofile was insufficient for pprofile as it only triggers events on function call/return.
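The per-line events described above come from the interpreter's tracing hook. Here is a minimal sketch of that mechanism; it illustrates what a deterministic line profiler relies on, and is not pprofile's actual code:

```python
import sys

line_counts = {}

def tracer(frame, event, arg):
    # The interpreter reports 'call', 'line', 'return' (and more) events;
    # a deterministic line profiler counts and times the 'line' events.
    if event == 'line':
        key = (frame.f_code.co_filename, frame.f_lineno)
        line_counts[key] = line_counts.get(key, 0) + 1
    return tracer  # keep tracing inside this frame

def demo():
    total = 0
    for i in range(3):
        total += i
    return total

sys.settrace(tracer)
result = demo()
sys.settrace(None)
print(result, len(line_counts))
```

If another tool (coverage, a debugger) has already called sys.settrace, registering this tracer displaces it, which is exactly the competition described above.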
You may want to try pprofile's statistical profiling mode at the expense of accuracy (but with an extreme reduction in profiler overhead), although there pprofile has to rely on another interpreter feature: the ability to list the call stacks of all running threads, which is sadly expected to be less portable than other features of the standard sys module.
All these work fine in CPython 2.x, CPython 3.x, pypy and (it has been contributed but I haven't tested it myself) IronPython.

ImageScraper does not work

Ok great. So I've installed a module called ImageScraper with
pip install ImageScraper
Running it in the console,
image-scraper 'insert url'
works just fine. However, following the documentation, when I run it in a Jupyter notebook:
import image_scraper
image_scraper.scrape_images('insert url')
I'm returned a (0, 0) tuple.
I've searched the working directory where my images are supposed to be, but they're not there.
My curiosity is no longer with scraping images, but I really just want to figure out why it's not working in my notebook.
In ImageScraper 2.0.7, the version currently available from PyPI, image_scraper.scrape_images() is bug-ridden.
It fails to properly set up the format_list, which is a list of file name extensions for filtering image URLs. Because it defaults to [], no URLs will be selected for download.
In addition, there are calls to non-existent functions, or, more accurately, attempted calls to functions that are actually methods of the ImageScraper class.
I'd avoid using it, or you could use the ImageScraper class manually. I see that you've already created an issue on the project author's GitHub page, so you might want to await the outcome of that.
This works, although it is not elegant: it calls the non-Python, command-line version of the image-scraper tool from inside Python:
import subprocess
import shlex
for link in your_list_of_links:
    subprocess.call(shlex.split('image-scraper ' + link))

Run pip in python idle

I am curious about running pip.
Every time, I run pip in a command shell on Windows like this:
c:\python27\script>pip install numpy
But I wondered if I can run it in Python IDLE:
import pip
pip.install("numpy")
Unfortunately, it is not working.
Still cannot comment, so I added another answer. Pip has had several entry points in the past, and it's not recommended to call pip directly or in-process (if you still want to do it, runpy is the recommended way):
import sys
import runpy
sys.argv=["pip", "install", "packagename"]
runpy.run_module("pip", run_name="__main__")
But this should also work:
try:
    from pip._internal import main as _pip_main
except ImportError:
    from pip import main as _pip_main
_pip_main(["install", "packagename"])
This question is, or should be, about how to run pip from a Python program. IDLE is not directly relevant to this version of the question.
To expand on J. J. Hakala's comment: a command-line such as pip install pillow is split on spaces to become sys.argv. When pip is run as a main module, it calls pip.main(sys.argv[1:]). If one imports pip, one may call pip.main(arg_line.split()), where arg_line is the part of the command line after pip.
Last September (2015) I experimented with using this unintended API from another python program and reported the initial results on tracker issue 23551. Discussion and further results followed.
The problem with executing multiple commands in one process is that some pip commands cache not only sys.path, which normally stays constant, but also the list of installed packages, which normally changes. Since pip is designed to run one command per process, and then exit, it never updates the cache. When pip.main is used to run multiple commands in one process, commands given after the caching may use a stale and no-longer-correct cache. For example, list after install shows how things were before the install.
A second problem for a program that wants to examine the output from pip is that it goes to stdout and stderr. I posted a program that captures these streams into program variables as part of running pip.
Using a subprocess call for each pip command, as suggested by L_Pav, though less efficient, solves both problems. The communicate method makes the output streams available. See the subprocess doc.
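A sketch of that subprocess approach, assuming pip is available to the current interpreter and using the harmless pip --version as the example command:

```python
import subprocess
import sys

# Run pip in a child process: each command gets a fresh process (so no
# stale caches), and the parent captures stdout/stderr instead of
# sharing them.
proc = subprocess.run(
    [sys.executable, '-m', 'pip', '--version'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print(proc.returncode, proc.stdout.strip())
```

Using `sys.executable -m pip` rather than a bare `pip` guarantees the child runs the same interpreter's pip, which matters when several Pythons are installed.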
At the moment there is no official way to do it; you could use pip.main, but your current IDLE session will not 'see' the newly installed package.
There has been a lot of discussion over how to add a "high level" programmatic API for pip; it seems promising.
Actually, I think you can use subprocess.Popen(['apt-get', 'install', 'numpy']); not sure how to do it with pip, though.
If you're on a Mac, you should be able to do it like this:
Go to your IDLE.
Run help('modules').
Find the HTML module.
Run help('HTML').
A file path should pop up, for example this/file/map/example/.
Go to Finder, press Command+Shift+G, and paste the path there. Delete the last component, so that you end up in the modules folder.
There are all the modules. If you want to add a module, download the module's files and put them there.
I hope this helps you.

Create documentation using pydoc and unknown modules

I'm afraid this will be a question for a very particular case. At university we have been told to generate documentation using pydoc. The problem is that we need to create it for a Maya script, and pydoc yells when it finds import maya.cmds as cmds.
So I tried to comment out this line, but I keep getting errors:
python C:\Python27\Lib\pydoc.py Script.py
problem in Script - <type 'exceptions.NameError'>: global name 'cmds' is not defined
I also tried Script, without the .py extension, but it's silly doing that; we are still running into the same issue.
Does anybody know how to generate documentation for Maya scripts, where import maya only works in the Maya interpreter?
maya.cmds is a stub module until it's run inside a working Maya environment; if you just import and inspect it outside of Maya, you'll see that it's basically a placeholder.
If you want to inspect the contents, you can import the maya.standalone module and initialize it before running other commands (in this case it means you won't be able to run pydoc standalone).
You can get the documentation using pydoc.writedoc:
import maya.standalone
maya.standalone.initialize()
import pydoc
import mymodule
pydoc.writedoc(mymodule)  # writes mymodule.html to the current directory
Be warned, however, that the documentation for all Maya built-in functions will be unhelpful:
'built-in function ls'
However, you can at least document your own stuff without the Maya parts crashing.
Pydoc, ironically, does not have a lot of external documentation, but you can see the code here: http://hg.python.org/cpython/file/2.7/Lib/pydoc.py (I'm not sure about the delta between this and the 2.6 version for Maya pre-2014, but it works as above in Maya 2011).
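Outside of Maya, generating HTML docs can be tried on any importable module with pydoc.writedoc, which writes <module>.html to the current directory (a small stdlib demonstration; json is just an arbitrary example):

```python
import json
import os
import pydoc

# writedoc() renders a module's documentation to <modulename>.html
# in the current working directory.
pydoc.writedoc(json)
print(os.path.exists('json.html'))
```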
