Profiling Python script from shell environment

I'm profiling a Python 3.4 script within an interactive shell environment (an IDE, if it matters). Normally I use cProfile to profile functions. This time, however, the script has some top-level code, by which I mean code that is not inside a function definition. cProfile.run won't accept a filename - normally I would pass it a function.
To get around this, I wrap the top-level code in a main() function, execute the script to create main in the shell namespace, then run cProfile.run('main()'). This is pretty annoying - I would like to fool around with several variables generated in the top-level code, and I'd rather not try to return them all from main().
I have carefully read the similar questions How can you profile a python script? and How to profile my code?.
They give great solutions for profiling top-level code from the command line and for profiling functions from a shell, but I don't think they address this specific question.

I have a kluge for getting this done, but I think there's probably a better way out there.
cProfile.run(compile(open(filename, "rb").read(), filename, 'exec'))
where filename is the filename of the script.
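For comparison, here is a slightly expanded sketch of the same idea, using cProfile.runctx so the script's variables land in a namespace you can keep poking at afterwards (profile_script is a hypothetical helper name, not anything from cProfile itself):
import cProfile

def profile_script(filename, namespace=None):
    # Compile the script's top-level code, profile it, and return the
    # namespace so its variables remain available interactively.
    if namespace is None:
        namespace = {}
    with open(filename, "rb") as f:
        code = compile(f.read(), filename, "exec")
    cProfile.runctx(code, namespace, namespace)
    return namespace

ns = profile_script("myscript.py")  # ns now holds the script's top-level variables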

python script calling another script

I wrote a Python script that works. The first line of my script reads an HDF5 file:
readFile = h5py.File('FileName_00','r')
After reading the file, my script does several mathematical operations, all working successfully. The output is a function F.
Now I want to repeat the same script for different files. Basically, I only need to change FileName_00 to FileName_01, ..., FileName_10. I was thinking of creating a script that calls this script!
I have never written a script that calls another script, so any advice would be appreciated.
One option: turn your existing code into a function which takes a filename as an argument:
def myfunc(filename):
    readFile = h5py.File(filename, 'r')
    ...
Now, after your existing code, call your function with the filenames you want to input:
myfunc('Filename_00')
myfunc('Filename_01')
myfunc('Filename_02')
...
Even more usefully, I definitely recommend looking into
if __name__ == '__main__':
and argparse (https://docs.python.org/3/library/argparse.html), as jkr noted.
Also, if you put your algorithm in a function like this, you can import it and use it in another Python script. Very useful!
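As a concrete sketch of those two suggestions combined (myfunc is the function defined above; everything else is standard library):
import argparse
import h5py

def myfunc(filename):
    readFile = h5py.File(filename, 'r')
    # ... existing mathematical operations ...

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process one HDF5 file.')
    parser.add_argument('filename', help='name of the HDF5 file to read')
    args = parser.parse_args()
    myfunc(args.filename)
The script can then be run as python script.py FileName_00, python script.py FileName_01, and so on, with no edits between runs.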
Although there are certainly many ways to achieve what you want without multiple Python scripts, as the other answers have shown, here's how you could do it.
Python has the function os.system (learn more about it here: https://docs.python.org/3/library/os.html#os.system). Simply put, you can use it like this:
os.system("INSERT COMMAND HERE")
Replace INSERT COMMAND HERE with the command you use to run your Python script. For example, with a script named script.py you could conceivably (depending on your environment) include the following line of code in a secondary Python script:
os.system("python script.py")
Running the secondary Python script would run script.py as well. FWIW, I don't necessarily think this is the best way to accomplish your goal - I tend to agree with DraftyHat's solution in most circumstances. But in case you were curious, this is certainly an option in Python. I've used this functionality in the past, albeit not to run other Python scripts, but to execute commands in the shell. Hope this helps!
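For instance, a driver along these lines would run the script once per file, assuming script.py is first changed to read its target filename from sys.argv[1] (that change is an assumption, not something os.system does for you):
import os

# Hypothetical driver: run script.py once for each input file.
# Assumes script.py was edited to use sys.argv[1] as the filename.
for i in range(11):
    os.system("python script.py FileName_%02d" % i)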

Python Setuptools: quick way to add scripts without "main" function as "console_scripts" entry points

My request seems unorthodox, but I would like to quickly package an old repository consisting mostly of executable Python scripts.
The problem is that those scripts were not designed as modules, so some of them execute code directly at the module top level, and others have the if __name__=='__main__' part.
How would you distribute those scripts using setuptools, without too much rewrite?
I know I could just put them under the scripts option of setup(), but it's not advised, and also it doesn't allow me to rename them.
I would like to skip defining a main() function in all those scripts, also because some scripts call weird recursive functions with side effects on global variables, so I'm a bit afraid of breaking stuff.
When I try providing only the module name as console_scripts (e.g "myscript=mypkg.myscript" instead of "myscript=mypkg.myscript:main"), it logically complains after installation that a module is not callable.
Is there a way to create scripts from modules? At least when they have a if __name__=='__main__'?
I just realised part of the answer:
in the case where the module executes everything at the top level, i.e. on import, it's OK to define a dummy "no-op" main function, like so:
# Content of mypkg/myscript.py
print("myscript being executed!")

def main():
    pass  # Do nothing!
This solution still forces me to add these lines to the existing scripts, but I think it's a quick yet cautious fix.
No solution if the code is under an if __name__=='__main__' guard, though...
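For that remaining case, one possible workaround (a sketch, not an established setuptools idiom; the wrapper module and function names are assumptions) is to re-run the module under the name __main__ using the standard-library runpy, which does trigger the guarded block:
# Content of mypkg/_wrappers.py (hypothetical helper module)
import runpy

def run_myscript():
    # Execute mypkg.myscript as if invoked as a script, so that its
    # `if __name__ == '__main__':` block runs.
    runpy.run_module('mypkg.myscript', run_name='__main__')
The entry point would then be declared as myscript=mypkg._wrappers:run_myscript.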
You can use the following code.
def main():
    pass  # or do something

if __name__ == "__main__":
    main()
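With a main like that in place, the console script can be declared the usual way; a minimal sketch of the setup.py excerpt (package and script names are assumptions):
# setup.py (excerpt)
from setuptools import setup

setup(
    name='mypkg',
    packages=['mypkg'],
    entry_points={
        'console_scripts': [
            'myscript = mypkg.myscript:main',
        ],
    },
)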

Compile-time checking and where is the main script's byte-code stored in Python?

I'm new to Python and following multiple online tutorials to learn. One of them is Google for Education.
In the Google's tutorial there is a section:
Code Checked at Runtime
Python does very little checking at compile time, deferring almost all
type, name, etc. checks on each line until that line runs. Suppose the
above main() calls repeat() like this:
def main():
    if name == 'Guido':
        print repeeeet(name) + '!!!'
    else:
        print repeat(name)
The if-statement contains an obvious error, where the repeat() function is accidentally typed in as repeeeet().
The funny thing in Python ... this code compiles and runs fine so long
as the name at runtime is not 'Guido'. Only when a run actually tries
to execute the repeeeet() will it notice that there is no such
function and raise an error. This just means that when you first run a
Python program, some of the first errors you see will be simple typos
like this. This is one area where languages with a more verbose type
system, like Java, have an advantage ... they can catch such errors at
compile time (but of course you have to maintain all that type
information ... it's a tradeoff).
There is a good example of a runtime check in that section, but no compile-time check example.
And I'm interested in knowing about that little checking done at compile time.
I can't find anything on the internet regarding that line. Every possible search returns results about compiling Python scripts and modules, like this, this and this.
Edit:
python myscript.py is compiled (otherwise we wouldn't be getting errors) and then interpreted for execution. The compilation process should therefore produce some code (it might be byte-code). Is that code stored in memory instead of being stored as a .pyc file in the filesystem?
Edit 2:
More on why the main script's byte-code is stored in memory and why modules are compiled can be found here.
I'm not sure about the exact compiler procedure in Python, but the "little checking" here is essentially syntax checking: when a Python file is compiled, malformed code is rejected immediately, but names, variables and their types are not checked, because in Python we don't declare variables with types. All such errors are ignored at compile time and encountered only during execution.
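A quick sketch illustrating the difference (using the built-in compile; the repeeeet typo is borrowed from the tutorial excerpt above):
# Compiles fine: `repeeeet` is only looked up when the line actually runs.
code = compile("def main():\n    print(repeeeet('Guido'))\n", "<demo>", "exec")
exec(code)       # still fine: main is defined but never called

# Caught at compile time: a syntax error is reported before anything runs.
try:
    compile("def broken(:\n    pass", "<demo>", "exec")
except SyntaxError as e:
    print("caught at compile time:", e)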
A .pyc file is created for imported modules, and it is placed in the same directory as the .py file (in Python 3, under a __pycache__ subdirectory). However... no .pyc file is created for the main script of your program. In other words, if you call "python myscript.py" on the command line, there will be no .pyc file for myscript.py. Since it is the main script, the compiled byte-code wouldn't be reusable, but if it is a module (without main) then the same .pyc can be reused whenever it is imported.
Hope it is useful!
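A small sketch to observe this behaviour (file names are assumptions):
# mod.py
print("mod imported")

# demo.py
import mod   # importing compiles mod.py and caches its byte-code

# Running `python demo.py` leaves __pycache__/mod.cpython-<ver>.pyc behind,
# but no cached byte-code for demo.py itself; demo.py's code object exists
# only in memory for the duration of the run.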

refactoring code to keep large objects/models in memory in iPython to be reused in python scripts

My script spends about a minute loading lots of variables, which are then used globally in many functions. Every time I call that script in IPython, it loads them again, taking that time again.
I tried to move these load-and-populate calls out of that script, but then the global variables are not available to the functions in the script.
It gives a NameError: name 'clf' is not defined error message.
Is there a good way to refactor this code so that these globals stay in memory and the script uses them? The script loads many variables like these, and uses them in other functions as globals:
(vectorizer_title, vectorizer_desc, clf,
 df_instance, vocab, all_tokens, df_dist_all,
 df_soc2class_proba, dict_p2s,
 dict_f2m, token_pattern, cleanup_pattern,
 excluded_words) = load_data_and_model(lang)

(dict_token2idx_all, dict_token2idx_instance,
 dist_array, token_dist_to_instance_min,
 dict_bigram_by_instance, denominate,
 similar_threshold) = populate_data(1)
I asked this question after trying
from depended_library import *
which had not worked in IPython.
Used with plain Python (and in a Flask web API) it works, though.
Importing a library with the "from" statement also executes the code outside of functions in depended_library, in addition to defining the functions.
(If someone explains the problem with IPython and suggests a solution, I shall select it as the answer.)
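A refactoring sketch along those lines (load_data_and_model, populate_data and the variable positions come from the question; the module name heavy_state and the predict example are assumptions): cache the expensive objects in a module, whose top-level code Python executes only on the first import per session:
# heavy_state.py -- hypothetical cache module; its top-level code runs once
# per interpreter session, on first import.
from depended_library import load_data_and_model, populate_data

model = load_data_and_model("en")   # tuple of vectorizers, clf, etc. (assumed lang)
populated = populate_data(1)

# analysis.py -- re-run or re-import as often as needed; the heavy objects
# are reused instead of reloaded.
import heavy_state

def predict(x):
    clf = heavy_state.model[2]  # position of clf in the returned tuple (assumed)
    return clf.predict(x)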

python modules missing in sage

I have Sage 4.7.1 installed and have run into an odd problem. Many of my older scripts that use functions like deepcopy() and uniq() no longer recognize them as global names. I have been able to fix this by importing the Python modules one by one, but this is quite tedious. Yet when I start the command-line Sage interface, I can type "list2=deepcopy(list1)" without importing the copy module, and this works fine. How is it possible that command-line Sage recognizes the global name 'deepcopy', but if I load my script that uses the same name it doesn't recognize it?
Oops, sorry, not familiar with stackoverflow yet. I type "sage_4.7.1/sage" to start the command-line interface; then I type "load jbom.py" to load up all the functions I defined in a Python script. When I use one of the functions from the script, it runs for a few seconds (it's a complex function), then hits a spot where I use some function that Sage normally has as a global name (deepcopy, uniq, etc.), but for some reason the script I loaded does not know what the function is. And to reiterate, my script jbom.py worked the last time I was working on this particular research, just as I described.
It also makes no difference if I use 'load jbom.py' or 'import jbom'. Both methods get the functions I defined in my script (but I have to use jbom. in the second case) and both get the same error about 'deepcopy' not being a global name.
REPLY TO DSM: I have been sloppy about describing the problem, for which I am sorry. I have created a new script 'experiment.py' that has "import jbom" as its first line. Executing the function in experiment.py recognizes the functions in jbom.py, but deepcopy is not recognized. I tried loading jbom.py with "load jbom.py" and I can use the functions just like I did months ago. So is this all just a problem of layering scripts without proper use of import/load etc.?
SOLVED: I added "from sage.all import *" to the beginning of jbom.py and now I can load experiment.py and execute the functions calling jbom.py functions without any problems. From the Sage doc on import/load I can't really tell what I was doing wrong exactly.
Okay, here's what's going on:
You can only import files ending with .py (ignoring .py[co]). These are standard Python files and aren't preparsed, so 1/3 == int(0), not QQ(1)/QQ(3), and you don't have the equivalent of a from sage.all import * to play with.
You can load and attach both .py and .sage files (as well as .pyx and .spyx and .m). Both have access to Sage definitions but the .py files aren't preparsed (so y=17 makes y a Python int) while the .sage files are (so y=17 makes y a Sage Integer).
So import jbom here works just like it would in Python, and you don't get the access to what Sage has put in scope. load etc. are handy but they don't scale up to larger programs so well. I've proposed improving this in the past and making .sage scripts less second-class citizens, but there hasn't yet been the mix of agreement on what to do and energy to do it. In the meantime your best bet is to import from sage.all.
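A minimal sketch of the fix described above (the example function is an assumption; the import line is the actual solution from the question):
# jbom.py -- a plain Python module imported into Sage
from sage.all import *   # brings deepcopy, uniq, QQ, ... into this module's scope

def duplicate_list(lst):
    # Example function; deepcopy now resolves even under `import jbom`.
    return deepcopy(lst)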
