I am trying to find ways to make better use of my time while programming.
I have a Python script that does some heavy work and can take hours to finish. Most of the work it does is network-related, so I have plenty of CPU resources to spare.
If the script were a C binary executable, it would be fine to git checkout a different branch and do extra work; I could even modify the binary on disk, since it has been copied to RAM, so nothing I do until it finishes running will affect the program's output.
But Python scripts are interpreted, not compiled ahead of time. What happens if I start tampering with the source file? Can I corrupt the program's output, or are the text file and associated imports copied to RAM, allowing me to tamper with the source with no risk of changing the behaviour of the running program?
In general, if you have a single Python file which you run as a script, you're fine. When you run the file, it is compiled into bytecode which is then executed. You can change the original script at this point and nothing breaks.
However, we can deliberately break it by writing some horrible but legal code like this:
horrible.py:
from time import sleep
sleep(10)
import silly
silly.thing()
silly.py:
def thing():
    print("Wow!")
You can run horrible.py and while it is running you can edit silly.py on disk to make it do something else. When silly.py is finally imported, the updated version will be loaded.
A workaround is to put all your imports at the top of the file, which you probably do anyway.
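For instance, here is a minimal sketch of horrible.py reworked that way: with the import hoisted above the sleep, silly is loaded (and cached in sys.modules) before the slow work starts, so later edits to silly.py on disk no longer affect this run.
from time import sleep
import silly  # imported up front, before the slow work begins

sleep(10)
silly.thing()  # uses the version of silly.py that existed at startup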
When a Python program is run, it is compiled (kinda, more like translated) to bytecode, which the Python interpreter then executes; for imported modules that bytecode is cached in a .pyc file. Changing a source file should NOT affect the code if it is already running.
Here is a related Stack Overflow answer: What will happen if I modify a Python script while it's running?
Why not have another working directory where you make your modifications? Is there a lot of ancillary data or something that makes it hard to set up a working directory? I.e. if your working directory is A, git clone A B, and then work in B. When you're done, you can pull the changes back from B to A:
git remote add B ../B
git pull B master
I always structure my repos in the following way.
repo/main.py
repo/scripts/script.py
…
In main.py, I import script.py in the following manner:
from scripts import script as sc
This always works unless I decide to make changes to script.py. After making changes, if I run main.py from the shell, it still imports code from the older script.py without the current changes. Until now, what I would do then is just create another branch. However, this leads to a lot of branches in the development process. Is there any way to avoid this? How else should I be importing from the scripts directory to avoid this?
Help would be highly appreciated.
UPDATE
From the answers, I can see that I have caused some confusion. When I say that I run main.py from the shell, I mean executing it with python main.py in the terminal. One can think of main.py as a script that does some math and outputs the answer. In doing that math, it imports script.py from scripts, which has additional functions that main.py uses. After running main.py N times, if I choose to update script.py and then execute main.py again in the terminal, it imports the old script.py again and does the math with the older code. The answer does not reflect the changes I just made to script.py.
Until now, when I have had to go through something like this, I have just created another new branch, literally copy-pasted the old files plus the newer script.py into the new branch, and executed main.py in the shell. It does import the newer script.py then. One more thing I have noticed is that if I just create a new file as, say, script2.py and then import it in main.py as
from scripts import script2 as sc
it imports script2.py just as it should - it reflects all the changes made to script.py.
There’s no second import statement in main.py.
On the surface, this question sounds like we're repeatedly running $ python main.py, which quickly executes and exits. But from the symptom, it must be the case that we have a long-lived REPL prompt repeatedly executing the main.py code. The module you're looking for is importlib. Do this:
from importlib import reload
from scripts import script as sc
# [do stuff, then edit script.py]
reload(sc)
# [do more stuff, and see the effect of the edit]
What is going on here? Well, if you repeatedly execute
import scripts.script
import scripts.script
it turns out that the 2nd and subsequent imports do nothing. Each one consults sys.modules, finds the module has already been loaded, and reports "cache hit!" instead of doing the hard work of pulling in that text file. The purpose of the reload() function is to invalidate that cache entry and repeat the import, so it actually pulls in Python text from the (presumably edited) source file.
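A minimal sketch of the cache in action, using the module names from the question:
import sys

import scripts.script                    # first import: reads and executes the module
print("scripts.script" in sys.modules)   # True: the module is now cached
import scripts.script                    # second import: cache hit, nothing re-executed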
Suppose instead you have a short-lived $ python main.py process that runs repeatedly, sometimes after a script.py edit. Then the in-memory sys.modules cache is not relevant, having been discarded each time the process exits. But there is another level of caching at work here.
Typically the CPython interpreter will read script.py, parse it, produce bytecode from the parse tree, and write the bytecode out to a cached file such as __pycache__/script.cpython-XY.pyc (more than one output cache directory is possible, depending on system details). Upon being asked to read script.py, the interpreter can look to see whether the corresponding .pyc bytecode file is available and fresh, and then declare "cache hit!", in which case the .py source file is not even read.
Normally this works great, because source file updates are infrequent (human editing speed) and the freshness comparison of file timestamps is effective. If there's something wrong with those timestamps, the whole mechanism won't work properly, and we might fail to notice the source was recently edited. Perhaps you suffer from this.
First, identify the relevant .pyc bytecode file. Then run main, make an edit, delete the .pyc file, re-run main, and notice that the edit took effect.
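To identify it, a sketch like this should work (using the source path from the question; adjust to taste):
from importlib.util import cache_from_source

# Maps a source path to its cached bytecode path, e.g.
# scripts/__pycache__/script.cpython-312.pyc (exact name depends on your interpreter)
print(cache_from_source("scripts/script.py"))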
Let us know how it goes.
In my project, I have a Python script that scans the source directory and updates a source file with what it finds. I'd like this script to run only when it needs to. At the moment, I have this script in a Run Script build phase with Input Files set to $(PROJECT).xcodeproj/project.pbxproj and Output Files set to the updated source file. This means that the script runs when I add new files, but it also runs whenever I change project settings. When the script runs unnecessarily, part of the project is recompiled even though none of the source files have changed. This is kind of annoying when all I want to do is tweak some settings.
Is there some way that I can avoid the unnecessary recompilation and just run the script when new source files are added or removed from the project?
I guess I could manually run the script whenever I add or remove a source file.
I think that Xcode is recompiling because the modification date on the file has changed: Python updates the modification date when you flush output to a file. So I guess I could write to the file only when the output differs from what the output file already contains; I'm pretty sure merely reading a file won't change its modification date. That seems like a lot of fluffing around, though. If anyone's got a better solution, please let me know!
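Something like this sketch is what I have in mind (the function and path names are made up): only rewrite the file when the generated content actually differs, so the modification date, and with it Xcode's dependency check, is left alone.
import os

def write_if_changed(path, new_content):
    # Skip the write when the file already holds exactly this content,
    # leaving its modification date untouched.
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == new_content:
                return
    with open(path, "w") as f:
        f.write(new_content)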
I have a Python script which takes a while to finish executing, depending on the argument passed. If I run it from two terminals with different arguments, does each run get its own version of the code? I can't see two .pyc files being generated.
Terminal 1 runs: python prog.py 1000 > out_1000.out
Before the script running in terminal 1 terminates, I start another; terminal 2 runs: python prog.py 100 > out_100.out
Basically, my question is: could the two runs interfere with each other?
If the two runs were writing their output to the same file on disk, then yes, it would be overwritten. However, you're actually printing to stdout and redirecting it to two different files, so that is not the case here.
Now the answer to your question is simple: there is no interaction between two different executions of the same code. When you execute a program or a script, the OS loads the code into memory and executes it, and subsequent changes to the code on disk have nothing to do with the code that is already running. Technically, a running program is called a process. When you run a script in two different terminals, there will be two different processes on the OS, one for each, and there is no way for two processes to interfere unless you explicitly make them (via IPC, or inter-process communication), which you are not doing here.
So, in summary: you can run your code simultaneously in different terminals; the runs will be completely independent.
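If you want to convince yourself, here is a tiny sketch (the script itself is hypothetical): run it in two terminals at once, and each invocation prints a different process id, showing the runs are separate processes with separate memory.
import os, sys, time

print("pid:", os.getpid(), "argument:", sys.argv[1])
time.sleep(30)   # keep the process alive so the two runs overlap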
Each Python interpreter process is independent. How the script reacts to itself being run multiple times depends on the exact code in use, but in general they should not interfere.
.pyc file reference: http://effbot.org/pyfaq/how-do-i-create-a-pyc-file.htm
Python automatically compiles your script to compiled code, so-called byte code, before running it. When a module is imported for the first time, or when the source is more recent than the current compiled file, a .pyc file containing the compiled code will usually be created in the same directory as the .py file.
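As an aside, you can trigger that compilation yourself with the standard py_compile module; a minimal sketch, with a placeholder file name:
import py_compile

# Writes the bytecode cache for prog.py without running the script
py_compile.compile("prog.py")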
If you're afraid your code will get overwritten by some mistake, you should put your code under VERSION CONTROL. Register on GitHub and use git for that.
The greater-than sign ">" redirects output to whatever is named on its right. If you specify a file name, the output is written to that file. Even from different terminals, if you run the code inside the same folder and point ">" at the SAME file name, the file on the right of the ">" will definitely get overwritten.
Program SOURCE CODE is NOT mutable during execution, unless you acquire high-level program-hacking skills.
Each program runs inside its own "execution workspace". Unless you write code that taps into the same resources (like changing the same file, or other shared resources), there is no interference. (Except if one exhausts all CPU or memory resources, then the second one will be interfered with, but that is another story.)
Forgive me if it's a silly question. I'm new to Python and scripting languages. Now I'm using Komodo Edit to code and run Python programs. Each time I run it, I have to wait until the program finished execution to see my "print" results in the middle. I'm wondering if it's possible to see realtime outputs as in a console. Maybe it is caused by some preference in Komodo?
Another question: I know that in the interpreter, when I store some variables, it remembers what I stored, like in a MATLAB workspace. But in Komodo Edit, the program runs from the beginning every time and stores no temporary variables for debugging. For example, if I need to read in some large file and do some operations on it, I have to read it in again every time, which takes a lot of time.
Is there a way to achieve instant output or temporary variable storage without typing every line into the interpreter directly, when using other environments like Komodo?
The Python output is realtime.
If your output is not realtime, this is likely an artefact of Komodo Edit. Run your script outside of Komodo.
And Python, like any programming language, starts from scratch when you start it. How would it work otherwise?
If you want an interpreter-like situation, you can put import pdb; pdb.set_trace() in your script. That will give you an interpreter prompt for debugging.
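For example, a minimal sketch of that workflow (the file name is made up): the expensive load runs once, then the prompt lets you inspect the result as often as you like without restarting.
import pdb

data = open("large_file.txt").read()   # the slow, expensive step
pdb.set_trace()   # drops to a (Pdb) prompt; `data` stays in memory for inspection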
I'm having the same problem on Windows and Linux. I launch any of various Python 2.6 shells and run nose.run() to execute my test suite. It works fine. However, the second time I run it, and every time thereafter, I get exactly the same output, no matter how I change code or test files. My guess is that it's holding onto file references somehow, but even after deleting the *.pyc files, I can never get the output of nose.run() to change until I restart the shell or open another one, whereupon the problem starts again on the second run. I've tried both del nose and reload(nose) to no avail.
Solved* it with some outside help. I wouldn't consider this the proper solution, but by searching through sys.modules for all of my test modules (which point to *.pyc files) and del'ing them, nose finally recognizes changes again. I'll have to delete them before each nose.run() call. These must be in-memory versions of the .pyc files, as simply deleting the files from the shell wasn't doing it. Good enough for now.
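For the record, a sketch of that workaround (it assumes the test modules' names start with test_; adjust the prefix to match your suite):
import sys
import nose

# Evict previously imported test modules so nose re-reads the edited sources
for name in list(sys.modules):
    if name.startswith("test_"):
        del sys.modules[name]

nose.run()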
Edit: *Apparently I didn't entirely solve it. It does seem to work for a bit, and then all of a sudden it won't anymore, and I have to restart my shell. Now I'm even more confused.