I'm working on a proof of concept using rpy2 to tie an existing R package to a web service. I do have the source to the package, if that is needed to fix this issue. I'm also currently developing on Windows, but if this problem is solved by using Linux instead, that's fine, as that's my planned environment.
As a first step in this POC, I'm trying to capture a chart made by this package and serve it up to a web request using Flask. The complete code:
from flask import Flask, Response
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
from tempfile import TemporaryDirectory
from os import path

app = Flask(__name__)

null = ro.r("NULL")
numeric = ro.r("numeric")
grdevices = importr("grDevices")
efm = importr('euroformix')

@app.route('/')
def index():
    table = efm.tableReader('stain.txt')
    list = efm.sample_tableToList(table)
    with TemporaryDirectory() as dir_name:
        print("Working in {0}".format(dir_name))
        png_path = path.join(dir_name, "epg_mix.png")
        print("png path {0}".format(png_path))
        grdevices.png(file=png_path, width=512, height=512)
        # Do Data Science Stuff Here
        grdevices.dev_off()
        with open(png_path, 'rb') as f:
            png = f.read()
        return Response(png, "image/png")

if __name__ == '__main__':
    app.run(debug=True)
When hitting the service, I get back PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\matt\\AppData\\Local\\Temp\\tmpgg65cagq\\epg_mix.png'
Looking at the call stack, it happens when TemporaryDirectory() goes to clean up. Using the Flask debugger, I can also see that the png variable is empty.
So, how do I make grDevices close the file? Or do I need to go about my POC a different way?
rpy2 is not fully supported on Windows, and what works on Linux (or OS X) might not work there. Since you are developing a PoC with Flask, I'd encourage you to try using Docker (with docker-machine on Windows). You could use rpy2's Docker image as a base image.
However, this is just using the R functions png() and dev.off(), so it "should" work.
I have 3 suggestions:
1-
Does your "Do Data Science Stuff" block make any R plot? If not, this would explain why your Python variable png is empty.
2-
If you are using R's grid system (e.g., through lattice or ggplot2) and evaluating strings as R code, it is preferable to explicitly ask R to print the figure. For example:
p <- ggplot(mydata) + geom_point(aes(x=x, y=y))
print(p)
rather than
ggplot(mydata) + geom_point(aes(x=x, y=y))
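In rpy2 terms, an equivalent sketch (assuming the plotting is done by evaluating R code and that a data frame called mydata exists in R, as in the R example above) would be:

ro.r('library(ggplot2)')
p = ro.r('ggplot(mydata) + geom_point(aes(x=x, y=y))')
ro.r['print'](p)  # explicitly draw the figure on the currently open png device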
3-
Try moving return Response(png, "image/png") outside the context manager block for TemporaryDirectory, as sketched below.
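For example, a minimal sketch of that change, building on the code from the question (the data-science step is still a placeholder):

@app.route('/')
def index():
    with TemporaryDirectory() as dir_name:
        png_path = path.join(dir_name, "epg_mix.png")
        grdevices.png(file=png_path, width=512, height=512)
        # Do Data Science Stuff Here -- it must actually draw an R plot
        grdevices.dev_off()
        with open(png_path, 'rb') as f:
            png = f.read()
    # the temporary directory has been cleaned up by this point;
    # only the bytes already read into `png` are returned
    return Response(png, "image/png")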
Related
How can I convert an asciidoc to html using the asciidoc3 python package from within my python script? I'm not able to find a working example. The official docs are oriented mainly towards those who will use asciidoc3 as a command-line tool, not towards those who wish to do conversions in their Python apps.
I'm finding that sometimes packages are refactored with significant improvements and old examples on the interwebs are not updated. Python examples frequently omit import statements for brevity, but for newer developers like me, the correct entry point is not obvious.
In my venv, I ran
pip install asciidoc3
Then I tried...
import io
from asciidoc3.asciidoc3api import AsciiDoc3API
infile = io.StringIO('Hello world')
outfile = io.StringIO()
asciidoc3_ = AsciiDoc3API()
asciidoc3_.options('--no-header-footer')
asciidoc3_.execute(infile, outfile, backend='html4')
print(outfile.getvalue())
and
import io
from asciidoc3 import asciidoc3api
asciidoc3_ = asciidoc3api.AsciiDoc3API()
infile = io.StringIO('Hello world')
asciidoc3_.execute(infile)
Pycharm doesn't have a problem with either import attempt when it does its syntax check, and everything looks right based on what I'm seeing in my venv's site-packages... "./venv/lib/python3.10/site-packages/asciidoc3/asciidoc3api.py" is there as expected. But both of my attempts raise "AttributeError: module 'asciidoc3' has no attribute 'execute'".
That's true. asciidoc3 doesn't have any such attribute. It's a method of class AsciiDoc3API defined in asciidoc3api.py. I assume the problem is my import statement?
I figured it out. It wasn't the import statement. The error message was sending me down the wrong rabbit hole but I found this in the module's doc folder...
[NOTE]
.PyPI, venv (Windows or GNU/Linux and other POSIX OS)
Unfortunately, sometimes (not always - depends on your directory-layout, operating system etc.) AsciiDoc3 cannot find the 'asciidoc3' module when you installed via venv and/or PyPI. +
The solution:
from asciidoc3api import AsciiDoc3API
asciidoc3 = AsciiDoc3API('/full/path/to/asciidoc3.py')
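For what it's worth, here is a minimal sketch of how that note could be applied to the venv layout in the question; looking up asciidoc3.py next to the installed package via asciidoc3.__file__ is my assumption, so adjust the path to your layout:

import io
import os

import asciidoc3
from asciidoc3.asciidoc3api import AsciiDoc3API

# pass the full path to asciidoc3.py explicitly, as the doc note suggests
asciidoc3_py = os.path.join(os.path.dirname(asciidoc3.__file__), 'asciidoc3.py')
asciidoc3_ = AsciiDoc3API(asciidoc3_py)
asciidoc3_.options('--no-header-footer')

infile = io.StringIO('Hello world')
outfile = io.StringIO()
asciidoc3_.execute(infile, outfile, backend='html4')
print(outfile.getvalue())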
I am attempting to use a module named "global_data" to save global state information, without success. The code is getting large already, so I'll try to post only the bare essentials.
from view import cube_control
from ioserver.ioserver import IOServer
from manager import config_manager, global_data

if __name__ == "__main__":
    # sets up initial data
    config_manager.init_manager()
    # modifies data
    io = IOServer()
    # verify global data modified from IOServer.__init__
    global_data.test()  # success
    # start pyqt GUI
    cube_control.start_view()
So far so good. However in the last line cube_control.start_view() it enters this code:
# inside cube_control.py
def start_view():
    # verify global data modified from IOServer.__init__
    global_data.test()  # fail ?!?!
    app = QApplication(sys.argv)
    w = MainWindow()
    sys.exit(app.exec_())
Running global_data.test() in this case fails. Printing the entire global state reveals it has now somehow reverted back to the data set up by config_manager.init_manager().
How is this possible?
While Qt is running I have a scheduler called every 10 seconds, also reporting a failed test.
However once the Qt GUI is stopped (clicked "x"), and I run the test from console, it succeeds again.
Inside the global_data module I've attempted to store the data in a dict, inside both a simple Python object and a ZODB in-memory database:
# inside global_data
import ZODB

state = {
    "units": {}
}

db = ZODB.DB(None)  # creates an in-memory db

def test(identity="no-id"):
    con = db.open()
    r = con.root()
    print("test online: ", r["units"]["local-test"]["online"], identity)
    con.close()
Both have the exact same problem; above, the test is only shown using the db.
The reason I attempted to use a db is that I understand threads can end up with a completely new global dictionary. However, the first 2 tests are in the same thread. The cyclic one is in its own thread and could potentially create such a problem...?
File organization
If it helps, my program is organized with the following structure:
There is also a "view" folder with some qt5 GUI files.
The IOServer attempts to connect to a bunch of OPC-UA servers using the opcua module. No threads are manually started there, although I suppose the opcua module starts some to stay connected.
global_data id()
I also attempted to print(id(global_data)) together with the tests and found that the ID is the same in IOServer AND the top-level code, but changes inside cube_control.py#start_view. Should these not always refer to the same module?
I'm still not sure what exactly happened, but apparently this was solved by removing the __init__.py file inside the folder named manager. Now all imports of the module named "global_data" point to the same ID.
How using an __init__.py file caused a second instance of the same module remains a mystery.
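One way a second instance can appear is when the same file is importable under two names, e.g. as manager.global_data in one place and as a plain global_data in another (which can happen if the manager folder itself ends up on sys.path); each name then gets its own module object with its own state. A quick, hedged way to check for this from anywhere in the program:

import sys

# List every loaded module backed by global_data.py; two entries
# (e.g. 'global_data' and 'manager.global_data') mean two separate copies
# of the module-level state exist.
for name, mod in sys.modules.items():
    if getattr(mod, '__file__', None) and mod.__file__.endswith('global_data.py'):
        print(name, id(mod))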
I'm getting pretty confused as to how and where to initialise application configuration in Python 3.
I have configuration that consists of application specific config (db connection strings, url endpoints etc.) and logging configuration.
Before my application performs its intended function I want to initialise the application and logging config.
After a few different attempts, I eventually ended up with something like the code below in my main entry module. It has the nice effect of all imports being grouped at the top of the file (https://www.python.org/dev/peps/pep-0008/#imports), but it doesn't feel right, since the config modules are being imported for side effects alone, which is pretty unintuitive.
import config.app_config  # sets up the app config
import config.logging_config  # sets up the logging config
...

if __name__ == "__main__":
    ...
config.app_config looks something like the following:
def _get_db_url():
    # somehow get the db url
    ...

_config = {
    'DB_URL': None
}
_config['DB_URL'] = _get_db_url()

def db_url():
    return _config['DB_URL']
and config.logging_config looks like:
import json
import logging
import logging.config
import os

if not os.path.isdir('./logs'):
    os.makedirs('./logs')

if os.path.exists('logging_config.json'):
    with open('logging_config.json', 'rt') as f:
        config = json.load(f)
    logging.config.dictConfig(config)
else:
    logging.basicConfig(level=logging.INFO)  # or whatever default level you prefer
What is the common way to set up application configuration in Python, bearing in mind that I will have multiple applications each using the config.app_config and config.logging_config modules, but with different connection strings, possibly read from a file?
I ended up with a cut-down version of the Django approach: https://github.com/django/django/blob/master/django/conf/__init__.py
It seems pretty elegant and has the nice benefit of working regardless of which module imports settings first.
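A minimal sketch of that idea (placeholder names, not the actual Django code): a settings object that imports the real config module the first time an attribute is read, so it works regardless of import order.

import importlib
import os

class LazySettings:
    """Resolves the module named in APP_SETTINGS_MODULE on first attribute access."""
    def __init__(self):
        self._wrapped = None

    def __getattr__(self, name):
        if self._wrapped is None:
            module_name = os.environ.get('APP_SETTINGS_MODULE', 'config.app_config')
            self._wrapped = importlib.import_module(module_name)
        return getattr(self._wrapped, name)

settings = LazySettings()

# elsewhere:
#     from config import settings
#     db_url = settings.DB_URL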
I have a question regarding pymatbridge. I have been trying to use it as an alternative to the Matlab Engine, which for some reason broke on me recently, and I haven't been able to get it to work again. I followed the instructions from GitHub, and when testing my script in the terminal, the zmq connection works great and the connection gets established every single time. But when I copy-paste what works in the terminal into a Python script, the connection fails every single time. I'm not familiar with zmq, but the problem seems systematic, so I was wondering if there is something obvious I'm missing. Here is my code.
import os
import glob
import csv
import numpy as np
import matplotlib.pylab as plt
# Alternative to Matlab Engine: pymatbridge
import pymatbridge as pymat

matlab = pymat.Matlab(executable='/Applications/MATLAB_R2015a.app/bin/matlab')

# Directory of Matlab functions
Matlab_dir = '/Users/cynthiagerlein/Dropbox (Personal)/Scatterometer/Matlab/'
# Directory with SIR data
SIR_dir = '/Volumes/blahblahblah/OriginalData/'
# Directory with matrix data
Data_dir = '/Volumes/blahblahblah/Data/'

# Create list of names of SIR files to open and save as matrices
os.chdir(SIR_dir)
# Save list of SIR file names
SIR_File_List = glob.glob("*.sir")

# Launch Pymatbridge
matlab.start()

for the_file in SIR_File_List:
    print 'We are on file ', the_file
    Running_name = SIR_dir + the_file
    image = matlab.run_func('/Users/cynthiagerlein/Dropbox\ \(Personal\)/Scatterometer/Matlab/loadsir.m', Running_name)
    np.savetxt(Data_dir + the_file[:22] + '.txt.gz', np.array(image['result']))
I ended up using matlab_wrapper instead, and it's working great and was A LOT easier to install and set up, but I am just curious to understand why pymatbridge fails in my script while working in the terminal. By the way, I learned about both pymatbridge and matlab_wrapper in the amazing answer to this post (scroll down, 3rd answer).
I am trying to distribute some programs to a local cluster I built using Spark. The aim of this project is to pass some data to each worker, pass the data to an external Matlab function to process, and collect the results back to the master node. I ran into the problem of how to call a Matlab function. Is it possible for Spark to call an external function? In other words, can we control each function parallelized in Spark so that it searches the local path of each node to execute the external function?
Here is a small test code:
run.py
import sys
from operator import add
from pyspark import SparkContext

import callmatlab

def run(a):
    # print '__a'
    callmatlab.sparktest()

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    output = sc.parallelize(range(1, 2)).map(run)
    print output
    sc.stop()
sparktest.py
import matlab.engine as eng
import numpy as np

eng = eng.start_matlab()

def sparktest():
    print "-----------------------------------------------"
    data = eng.sparktest()
    print "----the return data:\n", type(data), data

if __name__ == "__main__":
    sparktest()
The submit script:
#!/bin/bash
path=/home/zzz/ProgramFiles/spark
$path/bin/spark-submit \
--verbose \
--py-files $path/hpc/callmatlab.py $path/hpc/sparktest.m \
--master local[4] \
$path/hpc/run.py \
README.md
It seems Spark expects all files attached via --py-files to be .py files; it does not recognize sparktest.m.
I do not know how to continue. Could anyone give me some advice? Does Spark allow this way? Or any recommendation of other distributed python framework?
Thanks
Thanks for trying to answer my question. I used a different way to solve this problem: I uploaded the Matlab files and data that need to be called and loaded to a path in each node's file system, and the Python code just adds that path and calls it using the matlab.engine module.
So my callmatlab.py becomes
import matlab.engine as eng
import numpy as np
import os

eng = eng.start_matlab()

def sparktest():
    print "-----------------------------------------------"
    eng.addpath(os.path.join(os.getenv("HOME"), 'zzz/hpc/'), nargout=0)
    data = eng.sparktest([12, 1, 2])
    print data
Firstly, I do not see any reason to pass sparktest.m via --py-files in the first place.
Secondly, the recommended way is to put your Python dependencies in a .zip file. From the documentation:
For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg.
At the end, remember that your function will be executed in an executor JVM on a remote machine, so the Spark framework ships the function, its closure, and the additional files as part of the job. Hope that helps.
Add the --files option before the sparktest.m argument in the spark-submit command above. That tells Spark to ship the sparktest.m file to all workers.
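As an alternative sketch (my own suggestion, not part of the answers above), the file can also be shipped programmatically with SparkContext.addFile and located on each worker via SparkFiles; the path and function names follow the question's code, and MATLAB plus the matlab.engine module must be installed on every node:

from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="PythonWordCount")
# ship sparktest.m to every executor alongside the job
sc.addFile("/home/zzz/ProgramFiles/spark/hpc/sparktest.m")

def run(a):
    import matlab.engine as eng
    engine = eng.start_matlab()  # note: starting MATLAB per task is expensive
    # SparkFiles points at the directory holding the shipped copy on this worker
    engine.addpath(SparkFiles.getRootDirectory(), nargout=0)
    data = engine.sparktest()
    return str(data)  # return a plain string so it can be collected on the driver

print(sc.parallelize(range(1, 2)).map(run).collect())
sc.stop()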