Python: Using multiprocessing when the function called is dependent on a file

I have a very old Fortran file that is too complex for me to convert to Python, so I have to compile it and run it from Python.
For the Fortran program to work, it requires 3 input values on 3 lines of the mobcal.run file. They are as follows:
line 1 - Name of file to run
line 2 - Name of output file
line 3 - random seed number
I change the input values per worker in the run() function.
When I run my script (see below), only 2 output files are created, even though all 32 processors were running, which I found out via the top command.
I'm guessing the issue is that there was not enough time to change the mobcal.run file for each worker.
The only solution I have come up with so far is to put a time.sleep(random.randint(1,100)) at the beginning of the run() function. But I don't find this solution very elegant, and it may not always work, since two workers may draw the same random.randint value. Is there a more Pythonic way to solve this?
def run(mfj_file):
    import shutil
    import random
    import subprocess
    #shutil.copy('./mfj_files/%s' % mfj_file, './')
    print 'Calculating cross sections for: %s' % mfj_file[:-4]
    with open('mobcal.run', 'w') as outf:
        outf.write(mfj_file+'\n'+mfj_file[:-4]+'.out\n'+str(random.randint(5000000,6000000)))
    ccs = subprocess.Popen(['./a.out'])
    ccs.wait()
    shutil.move('./'+mfj_file[:-4]+'.out', './results/%s.out' % mfj_file[:-4])

def mobcal_multi_cpu():
    from multiprocessing import Pool
    import os
    import shutil
    mfj_list = os.listdir('./mfj_files/')
    for f in mfj_list:
        shutil.copy('./mfj_files/'+f, './')
    if __name__ == '__main__':
        pool = Pool(processes=32)
        pool.map(run,mfj_list)

mobcal_multi_cpu()

I assume your a.out looks in the current working directory for its mobcal.run. If you run each instance in its own directory, then each process can have its own mobcal.run without clobbering the others. This isn't necessarily the most Pythonic way, but it's the most Unix-like.
import tempfile
import os
def run(mfj_file):
    workdir = tempfile.mkdtemp(dir=".")
    os.chdir(workdir)
    # rest of function here
    # create mobcal.run in current directory
    # while changing references to other files from "./" to "../"
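
A fuller sketch of the per-directory idea might look like the following. It assumes, as this answer does, that the compiled program reads mobcal.run from its current working directory. The names run_isolated, command and results_dir are illustrative and not from the original post. Instead of os.chdir(), it passes cwd= to subprocess so that only the child process changes directory; a pool worker that calls os.chdir() stays in the new directory for every later task it handles.

```python
import os
import random
import shutil
import subprocess
import tempfile


def run_isolated(mfj_path, command, results_dir):
    """Run `command` in a private directory that holds its own mobcal.run.

    `command` is the argument list for the compiled program, for example
    [os.path.abspath('./a.out')]. An absolute path is needed because the
    child starts in a different directory.
    """
    mfj_path = os.path.abspath(mfj_path)
    results_dir = os.path.abspath(results_dir)
    mfj_name = os.path.basename(mfj_path)
    out_name = os.path.splitext(mfj_name)[0] + '.out'

    workdir = tempfile.mkdtemp(prefix='mobcal_')
    try:
        shutil.copy(mfj_path, workdir)
        # Each job writes its own control file, so jobs cannot clobber each other.
        with open(os.path.join(workdir, 'mobcal.run'), 'w') as outf:
            seed = random.randint(5000000, 6000000)
            outf.write('%s\n%s\n%d\n' % (mfj_name, out_name, seed))
        # cwd= changes the directory for the child process only.
        subprocess.check_call(command, cwd=workdir)
        os.makedirs(results_dir, exist_ok=True)
        final_path = os.path.join(results_dir, out_name)
        shutil.move(os.path.join(workdir, out_name), final_path)
        return final_path
    finally:
        # Remove the scratch directory whether or not the run succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

With a pool, something like pool.map(functools.partial(run_isolated, command=[os.path.abspath('./a.out')], results_dir='./results'), paths) would then run one isolated job per input file.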

Create several directories, with one mobcal.run each, and run your Fortran program in them instead.
If you need a sleep() in multiprocessing you are doing it wrong.

Related

How to make a module reload in python after the script is compiled?

The basic idea involved:
I am trying to make an application where students can write code
related to a specific problem(say to check if the number is even) The
code given by the student is then checked by the application by
comparing the output given by the user's code with the correct output
given by the correct code which is already present in the application.
The basic version of the project I am working on:
An application in which you can write a python script (in tkinter text
box). The contents of the text box are first stored in a test_it.py
file. This file is then imported (on the click of a button) by the
application. The function present in test_it.py is then called to
get the output of the code(by the user).
The problem:
Since I am "importing" the contents of test_it.py , therefore,
during the runtime of the application the user can test his script
only once. The reason is that python will import the test_it.py
file only once. So even after saving the new script of the user in
test_it.py, it won't be available to the application.
The solution:
Reload test_it.py every time when the button to test the script is clicked.
The actual problem:
While this works perfectly when I run the application from the script,
this method fails to work for the compiled/executable version(.exe) of the file (which is expected since during compilation all the imported modules would be
compiled too and so modifying them later will not work)
The question:
I want my test_it.py file to be reloaded even after compiling the application.
If you would like to see the working version of the application and test it yourself, you will find it here.

Problem summary:
A test_it.py program is running and has a predicate available, e.g. is_odd().
Every few minutes, a newly written file containing a revised is_odd() predicate becomes available,
and test_it wishes to feed a test vector to the revised predicate.
There are several straightforward solutions.
Don't load the predicate in the current process at all. Serialize the test vector, send it to a newly forked child which computes and serializes results, and examine those results.
Typically eval is evil, but here you might want that, or exec.
Replace current process with a newly initialized interpreter: https://docs.python.org/3/library/os.html#os.execl
Go the memory leak route. Use a counter to assign each new file a unique module name, manipulate source file to match, and load that. As a bonus, this makes it easy to diff current results against previous results.
Reload: from importlib import reload
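
The first option above (do not load the predicate in the current process at all) could be sketched roughly as follows. This is only an illustration: evaluate_in_child and CHILD_SOURCE are made-up names, the test vector is assumed to be JSON-serializable, and the submitted code is assumed not to print anything of its own to stdout. It also assumes sys.executable is a real Python interpreter, which is not the case inside a frozen .exe, where a separately installed interpreter would have to be located instead.

```python
import json
import subprocess
import sys

# Code executed by the child: load the file given in argv[1] as a module,
# call the function named in argv[2] on each input, print results as JSON.
CHILD_SOURCE = r'''
import importlib.util, json, sys
sys.dont_write_bytecode = True  # never leave a cached .pyc of the submission
spec = importlib.util.spec_from_file_location("candidate", sys.argv[1])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
func = getattr(module, sys.argv[2])
results = [func(x) for x in json.load(sys.stdin)]
json.dump(results, sys.stdout)
'''


def evaluate_in_child(source_path, func_name, test_vector, timeout=10):
    """Return func_name(x) for each x in test_vector, computed in a new process."""
    proc = subprocess.run(
        [sys.executable, '-c', CHILD_SOURCE, source_path, func_name],
        input=json.dumps(test_vector),  # serialized test vector on stdin
        capture_output=True,
        text=True,
        timeout=timeout,                # guards against infinite loops
        check=True,                     # raise if the child exits with an error
    )
    return json.loads(proc.stdout)
```

Because every call starts a fresh interpreter, there is nothing to reload: whatever is in the file at the moment of the call is what gets tested.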

Even for the bundled application imports work the standard way. That means whenever an import is encountered, the interpreter will try to find the corresponding module. You can make your test_it.py module discoverable by appending the containing directory to sys.path. The import test_it should be dynamic, e.g. inside a function, so that it won't be discovered by PyInstaller (so that PyInstaller won't make an attempt to bundle it with the application).
Consider the following example script, where the app data is stored inside a temporary directory which hosts the test_it.py module:
import importlib
import os
import sys
import tempfile
def main():
    with tempfile.TemporaryDirectory() as td:
        f_name = os.path.join(td, 'test_it.py')
        with open(f_name, 'w') as fh: # write the code
            fh.write('foo = 1')
        sys.path.append(td) # make available for import
        import test_it
        print(f'{test_it.foo=}')
        with open(f_name, 'w') as fh: # update the code
            fh.write('foo = 2')
        importlib.reload(test_it)
        print(f'{test_it.foo=}')
main()

The key is to check whether the program is running as an .exe and, if so, add the .exe's directory to sys.path.
File program.py:
import time
import sys
import os
import msvcrt
import importlib
if getattr(sys, 'frozen', False):
    # This is .exe so we change current working dir
    # to the exe file directory:
    app_path = os.path.dirname(sys.executable)
    print(' Add .exe path to sys.path: ' + app_path)
    sys.path.append(app_path)
    os.chdir(app_path)

test_it = importlib.import_module('test_it')

def main():
    global test_it
    try:
        print(' Start')
        while True:
            if not msvcrt.kbhit(): continue
            key = msvcrt.getch()
            if key in b'rR':
                print(' Reload module')
                del sys.modules['test_it']
                del test_it
                test_it = importlib.import_module('test_it')
            elif key in b'tT':
                print(' Run test')
                test_it.test_func()
            time.sleep(0.001)
    except KeyboardInterrupt:
        print(' Exit')

if __name__ == '__main__': main()
File test_it.py:
def test_func():
print('Hi')
Create an .exe file:
pyinstaller --onefile --clean program.py
Copy test_it.py to the dist folder and it's done.
Press t in program window to run test_func. Edit test_it.py then press r to reload module and press t again to see changes.

Maybe the solution is to use the code module:
import code
# get source from file as a string
src_code = ''.join(open('test_it.py').readlines())
# compile the source
compiled_code = code.compile_command(source=src_code, symbol='exec')
# run the code
eval(compiled_code) # or exec(compiled_code)

running a .py file from another Python script using os

Can someone give me a tip on how I can use os to run a different .py file from my python script? This code below works, but only because I specify the complete file path.
How can I modify the code to run plots.py from the same directory as my main script app.py? I'm using Windows at the moment but hope it can work on any operating system. Thanks
import os
os.system('py C:/Users/benb/Desktop/flaskEconServer/plots.py')

You can execute an arbitrary Python script as a separate process using the subprocess.run() function something like this:
import os
import subprocess
import sys
#py_filepath = 'C:/Users/benb/Desktop/flaskEconServer/plots.py'
py_filepath = 'plots_test.py'
# Pass the command as a list so it also works on non-Windows systems
args = [sys.executable,                  # command
        py_filepath,                     # argv[0]
        os.path.basename(py_filepath)]   # argv[1]
proc = subprocess.run(args)
print('returncode:', proc.returncode)
If you would like to communicate with the process while it's running, that can also be done. There are other subprocess functions, including the lower-level but very general subprocess.Popen class, that support that kind of thing.

Python has built-in support for executing other scripts, without the need for the os module.
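Try (this relative form works only when app.py is part of a package; for two standalone scripts in the same directory, a plain import plots is needed instead):
from . import plots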
If you want to execute it in an independent python process, look into the multiprocessing or subprocess modules.

You can get the directory of the app.py file by using the following call in app.py
dir_path = os.path.dirname(os.path.realpath(__file__))
then join the file name you want
file_path = os.path.join(dir_path,'plots.py')
Finally your system call
os.system(f'py {file_path}') # if you're on 3.6 and above.
os.system('py %s' % file_path) # 3.5 and below
As others have said sub-processes and multi-threading may be better, but for your specific question this is what you want.
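
For comparison, a subprocess-based variant of the same idea might look like this. It is a sketch only, and run_script is a hypothetical helper name. It uses sys.executable rather than the Windows-only py launcher, so the child runs under the same interpreter as the caller on any operating system.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path, *script_args):
    """Run a Python script in a new process using the current interpreter."""
    # A list of arguments avoids shell quoting problems with spaces in paths.
    command = [sys.executable, str(Path(script_path)), *script_args]
    completed = subprocess.run(command)
    return completed.returncode
```

From app.py it could be called as run_script(Path(__file__).resolve().parent / 'plots.py'), which finds plots.py next to app.py regardless of the current working directory.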

Run python script inside another python script

I have a script 'preprocessing.py' containing the function for text preprocessing:
def preprocess():
    #...some code here
    with open('stopwords.txt') as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words
Now I want to use this function in another Python script.
So, I do the following:
import sys
sys.path.insert(0, '/path/to/first/script')
from preprocessing import preprocess
x = preprocess(my_text)
Finally, I end up with the issue:
IOError: [Errno 2] No such file or directory: 'stopwords.txt'
The problem is surely that the 'stopwords.txt' file is located next to the first script, not the second.
Is there any way to specify the path to this file, not making any changes to the script 'preprocessing.py'?
Thank you.

Since you're running on a *nix like system, it seems, why not use that marvellous environment to glue your stuff together?
cat stopwords.txt | python preprocess.py | python process.py
Of course, your scripts should just use the standard input, and produce just standard output. See! Remove code and get functionality for free!

The simplest, and possibly most sensible way is to pass in the fully pathed filename:
def preprocess(filename):
    #...some code here
    with open(filename) as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words
Then you can call it appropriately.

Looks like you can put
import os
os.chdir('path/to/first/script')
in your second script. Please try.
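
Since os.chdir() changes the working directory for the whole process, it may be safer to switch only for the duration of the call and then switch back. A small sketch, where working_directory is a made-up helper name (Python 3.11 and later ship a similar contextlib.chdir):

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always go back, even if the body raised an exception.
        os.chdir(previous)
```

The call would then be wrapped as: with working_directory('/path/to/first/script'): stop_words = preprocess(). This leaves preprocessing.py itself unchanged.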

import os
def preprocess():
    #...some code here
    # get the directory that contains this script
    path = os.path.dirname(os.path.abspath(__file__))
    # join it with the file name
    file_id = os.path.join(path, "stopwords.txt")
    with open(file_id) as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words

Python -- "Batch Processing" of multiple existing scripts

I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.
What I would like to do now is write a "batch-processing" script for them. I have many (read as potentially tens of thousands) of data files on which I want these scripts to act.
My questions about this process are as follows:
What is the most efficient way to go about this sort of thing?
I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?
Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.

If you just need to have the scripts run, probably a shell script would be the easiest thing.
If you want to stay in Python, the best way would be to have a main() (or somesuch) function in each script (and have each script importable), have the batch script import the subscript and then run its main.
If staying in Python:
- your three scripts must have the .py ending to be importable
- they should either be in Python's search path, or the batch control script can set the path
- they should each have a main function (or whatever name you choose) that will activate that script
For example:
batch_script
import sys
sys.path.insert(0, '/location/of/subscripts')
import first_script
import second_script
import third_script
first_script.main('/location/of/files')
second_script.main('/location/of/files')
third_script.main('/location/of/files')
example sub_script
import os
import sys
import some_other_stuff
SOMETHING_IMPORTANT = 'a value'
def do_frobber(a_file):
    ...
def main(path_to_files):
    all_files = os.listdir(path_to_files)
    for file in all_files:
        do_frobber(os.path.join(path_to_files, file))
if __name__ == '__main__':
    main(sys.argv[1])
This way, your subscript can be run on its own, or called from the main script.

You can write a batch script in python using os.walk() to generate a list of the files and then process them one by one with your existing python programs.
import os, re
for root, dirs, files in os.walk('/path/to/files'):
    for f in files:
        if re.match(r'.*\.dat$', f):
            # run_existing_script1/2 stand for the entry points of your existing programs
            run_existing_script1(os.path.join(root, f))
            run_existing_script2(os.path.join(root, f))
If there are other files in the directory you might want to add a regex to ensure you only process the files you're interested in.
EDIT - added regular expression to ensure only files ending ".dat" are processed.
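
A runnable variant of this loop, written as reusable functions, might look like the following. It is a sketch under assumptions: find_files and run_batch are illustrative names, and each existing script is assumed to expose a callable that takes a file path.

```python
import os


def find_files(top, suffix='.dat'):
    """Yield the full path of every file under `top` whose name ends with `suffix`."""
    for root, _dirs, files in os.walk(top):
        for name in sorted(files):
            if name.endswith(suffix):
                yield os.path.join(root, name)


def run_batch(top, steps, suffix='.dat'):
    """Apply each callable in `steps` to each matching file, in order."""
    processed = []
    for path in find_files(top, suffix):
        for step in steps:
            step(path)
        processed.append(path)
    return processed
```

For example, run_batch('/path/to/files', [first_script.main, second_script.main]) would run both steps on each .dat file, assuming those main functions accept a single file path.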

How to print the filename of the current execution file in python? [duplicate]

Is there a universal approach in Python, to find out the path to the file that is currently executing?
Failing approaches
path = os.path.abspath(os.path.dirname(sys.argv[0]))
This does not work if you are running from another Python script in another directory, for example by using execfile in 2.x.
path = os.path.abspath(os.path.dirname(__file__))
I found that this doesn't work in the following cases:
py2exe doesn't have a __file__ attribute, although there is a workaround
When the code is run from IDLE using execute(), in which case there is no __file__ attribute
On Mac OS X v10.6 (Snow Leopard), I get NameError: global name '__file__' is not defined
Test case
Directory tree
C:.
|   a.py
\---subdir
        b.py
Content of a.py
#! /usr/bin/env python
import os, sys
print "a.py: sys.argv[0]=", sys.argv[0]
print "a.py: __file__=", __file__
print "a.py: os.getcwd()=", os.getcwd()
print
execfile("subdir/b.py")
Content of subdir/b.py
#! /usr/bin/env python
import os, sys
print "b.py: sys.argv[0]=", sys.argv[0]
print "b.py: __file__=", __file__
print "b.py: os.getcwd()=", os.getcwd()
print
Output of python a.py (on Windows)
a.py: __file__= a.py
a.py: os.getcwd()= C:\zzz
b.py: sys.argv[0]= a.py
b.py: __file__= a.py
b.py: os.getcwd()= C:\zzz
Related (but these answers are incomplete)
Find path to currently running file
Path to current file depends on how I execute the program
How can I know the path of the running script in Python?
Change directory to the directory of a Python script

First, you need to import from inspect and os
from inspect import getsourcefile
from os.path import abspath
Next, wherever you want to find the source file from you just use
abspath(getsourcefile(lambda:0))

You can't directly determine the location of the main script being executed. After all, sometimes the script didn't come from a file at all. For example, it could come from the interactive interpreter or dynamically generated code stored only in memory.
However, you can reliably determine the location of a module, since modules are always loaded from a file. If you create a module with the following code and put it in the same directory as your main script, then the main script can import the module and use that to locate itself.
some_path/module_locator.py:
import os
import sys
def we_are_frozen():
    # All of the modules are built-in to the interpreter, e.g., by py2exe
    return hasattr(sys, "frozen")
def module_path():
    encoding = sys.getfilesystemencoding()
    if we_are_frozen():
        return os.path.dirname(unicode(sys.executable, encoding))
    return os.path.dirname(unicode(__file__, encoding))
some_path/main.py:
import module_locator
my_path = module_locator.module_path()
If you have several main scripts in different directories, you may need more than one copy of module_locator.
Of course, if your main script is loaded by some other tool that doesn't let you import modules that are co-located with your script, then you're out of luck. In cases like that, the information you're after simply doesn't exist anywhere in your program. Your best bet would be to file a bug with the authors of the tool.
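
The module_locator code above is written for Python 2 (it relies on unicode()). A Python 3 sketch of the same idea could look like this; the optional module_file parameter is an addition for illustration and is not part of the original answer.

```python
import os
import sys


def we_are_frozen():
    """True when running from a bundled executable (e.g. py2exe or PyInstaller)."""
    return bool(getattr(sys, 'frozen', False))


def module_path(module_file=None):
    """Return the directory of the executable if frozen, else of `module_file`."""
    if we_are_frozen():
        return os.path.dirname(os.path.abspath(sys.executable))
    if module_file is None:
        module_file = __file__  # location of this locator module
    return os.path.dirname(os.path.abspath(module_file))
```

In Python 3, paths are already str, so no decoding with sys.getfilesystemencoding() is needed.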

This solution is robust even in executables:
import inspect, os.path
filename = inspect.getframeinfo(inspect.currentframe()).filename
path = os.path.dirname(os.path.abspath(filename))

I was running into a similar problem, and I think this might solve the problem:
import inspect
import os
def module_path(local_function):
    ''' returns the module path without the use of __file__. Requires a function defined
    locally in the module.
    from http://stackoverflow.com/questions/729583/getting-file-path-of-imported-module'''
    return os.path.abspath(inspect.getsourcefile(local_function))
It works for regular scripts and in IDLE. All I can say is try it out for others!
My typical usage:
from toolbox import module_path
def main():
    pass # Do stuff
global __modpath__
__modpath__ = module_path(main)
Now I use __modpath__ instead of __file__.

You have simply called:
path = os.path.abspath(os.path.dirname(sys.argv[0]))
instead of:
path = os.path.dirname(os.path.abspath(sys.argv[0]))
abspath() gives you the absolute path of sys.argv[0] (the filename your code is in) and dirname() returns the directory path without the filename.

The short answer is that there is no guaranteed way to get the information you want, however there are heuristics that work almost always in practice. You might look at How do I find the location of the executable in C?. It discusses the problem from a C point of view, but the proposed solutions are easily transcribed into Python.

See my answer to the question Importing modules from parent folder for related information, including why my answer doesn't use the unreliable __file__ variable. This simple solution should be cross-compatible with different operating systems as the modules os and inspect come as part of Python.
First, you need to import parts of the inspect and os modules.
from inspect import getsourcefile
from os.path import abspath
Next, use the following line anywhere else it's needed in your Python code:
abspath(getsourcefile(lambda:0))
How it works:
From the built-in module os (description below), the abspath tool is imported.
OS routines for Mac, NT, or Posix depending on what system we're on.
Then getsourcefile (description below) is imported from the built-in module inspect.
Get useful information from live Python objects.
abspath(path) returns the absolute/full version of a file path
getsourcefile(lambda:0) somehow gets the internal source file of the lambda function object, so returns '<pyshell#nn>' in the Python shell or returns the file path of the Python code currently being executed.
Using abspath on the result of getsourcefile(lambda:0) should make sure that the file path generated is the full file path of the Python file.
This explained solution was originally based on code from the answer at How do I get the path of the current executed file in Python?.

This should do the trick in a cross-platform way (so long as you're not using the interpreter or something):
import os, sys
non_symbolic=os.path.realpath(sys.argv[0])
program_filepath=os.path.join(sys.path[0], os.path.basename(non_symbolic))
sys.path[0] is the directory that your calling script is in (the first place it looks for modules to be used by that script). We can take the name of the file itself off the end of sys.argv[0] (which is what I did with os.path.basename). os.path.join just sticks them together in a cross-platform way. os.path.realpath just makes sure if we get any symbolic links with different names than the script itself that we still get the real name of the script.
I don't have a Mac; so, I haven't tested this on one. Please let me know if it works, as it seems it should. I tested this in Linux (Xubuntu) with Python 3.4. Note that many solutions for this problem don't work on Macs (since I've heard that __file__ is not present on Macs).
Note that if your script is a symbolic link, it will give you the path of the file it links to (and not the path of the symbolic link).

You can use Path from the pathlib module:
from pathlib import Path
# ...
Path(__file__)
You can use call to parent to go further in the path:
Path(__file__).parent

Simply add the following:
import sys
path_to_current_file = sys.argv[0]
print(path_to_current_file)
Or:
import sys
print(sys.argv[0])

If the code is coming from a file, you can get its full name
sys._getframe().f_code.co_filename
You can also retrieve the function name as f_code.co_name
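
As a small illustration of both attributes (the helper names are made up, and sys._getframe is a CPython implementation detail that other interpreters may not provide):

```python
import sys


def where_am_i():
    """Return (filename, function name) for the code that is running right now."""
    code = sys._getframe().f_code  # frame of where_am_i itself
    return code.co_filename, code.co_name


def caller_location():
    """Return (filename, function name) of whoever called this function."""
    code = sys._getframe(1).f_code  # one level up the call stack
    return code.co_filename, code.co_name
```

If the code did not come from a file, co_filename holds a placeholder such as '<stdin>' or '<string>' rather than a real path.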

The main idea is that somebody will run your Python code, and you need the folder that contains the Python file.
My solution is:
import os
print(os.path.dirname(os.path.abspath(__file__)))
You can use os.path.dirname(os.path.abspath(__file__)) as the base directory for saving photos, output files, etc.

import os
current_file_path = os.path.dirname(os.path.realpath(__file__))
