I have a question regarding pymatbridge. I have been trying to use it as an alternative to the MATLAB Engine, which for some reason broke on me recently and which I haven't been able to get working again. I followed the instructions on GitHub, and when testing my script in the terminal, the zmq connection works great and gets established every single time. But when I copy and paste what works in the terminal into a Python script, the connection fails every single time. I'm not familiar with zmq, but the problem seems systematic, so I was wondering if there is something obvious I'm missing. Here is my code.
import os
import glob
import csv
import numpy as np
import matplotlib.pylab as plt
#Alternative to matlab Engine: pymatbridge
import pymatbridge as pymat
matlab = pymat.Matlab(executable='/Applications/MATLAB_R2015a.app/bin/matlab')
#Directory of Matlab functions
Matlab_dir = '/Users/cynthiagerlein/Dropbox (Personal)/Scatterometer/Matlab/'
#Directory with SIR data
SIR_dir = '/Volumes/blahblahblah/OriginalData/'
#Directory with matrix data
Data_dir = '/Volumes/blahblahblah/Data/'
#Create list of names of SIR files to open and save as matrices
os.chdir(SIR_dir)
#Save list of SIR file names
SIR_File_List = glob.glob("*.sir")
#Launch Pymatbridge
matlab.start()
for the_file in SIR_File_List:
    print 'We are on file ', the_file
    Running_name = SIR_dir + the_file
    image = matlab.run_func('/Users/cynthiagerlein/Dropbox\ \(Personal\)/Scatterometer/Matlab/loadsir.m', Running_name)
    np.savetxt(Data_dir + the_file[:22] + '.txt.gz', np.array(image['result']))
I ended up using matlab_wrapper instead, and it's working great and was A LOT easier to install and set up, but I am just curious to understand why pymatbridge fails in my script while working in the terminal. By the way, I learned about both pymatbridge and matlab_wrapper in the amazing answer to this post (scroll down, 3rd answer).
Currently, I'm working on a small college project.
Today I created a small layered structure, and everything was working fine: I created some small Python modules in separate files and then used these modules and their methods in other Python files. Everything looked and worked fine, so I saved my progress and closed VS Code.
A few hours later I came back to coding, and I found that my imports had stopped working for no logical reason; I didn't make any changes to my code before or after I last closed VS Code. I tried some online solutions, but none worked for me. I'm a .NET developer, so a situation like this is a little weird for me.
Has anyone had the same or a similar issue with Python?
Fragment of structure:
Code:
ThrowHelper.py:
class appError(Exception):
    def __init__(self, message) -> None:
        self.message = message
CommonFileModule.py:
import pandas as pd
import helpers.ThrowHelper as throw
import core.Shared as core
def readCsvFile(path: str, separator: str = ","):
    try:
        with open(path, "rt") as file:
            data = pd.read_csv(file, sep=separator)
        return data
    except FileNotFoundError:
        raise throw.appError(core.AppErrorCodes.FileDoesntExist)
ErrorCode:
It seems to me that you should use full import paths, e.g.
from common.helpers.ThrowHelper import appError as throw
I tested it and it worked for me.
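For reference, assuming the package layout implied by the imports (common/helpers/ThrowHelper.py and common/core/Shared.py; these names are guesses, since the structure fragment isn't shown), CommonFileModule.py would then start with something like:
# CommonFileModule.py, using full import paths from the project root
from common.helpers.ThrowHelper import appError
from common.core import Shared as core
Also make sure each package folder contains an __init__.py so Python treats it as a regular package, and run the code from the project root (e.g. with python -m) so the top-level package is importable.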
I am trying to work with NAO (V50) using the trac_ik inverse kinematics solver. I need to use modified joint limits, because NAO has skin added onto him, which changed the limits. I tried using the pre-generated nao.urdf without modification, but that throws the same error. When I looked up this error, I found it could be related to the tf library. The included trac_ik example code for the PR2 works just fine. When I thought it was a bug in trac_ik, they responded that it is a ROS usage error.
from naoqi import ALProxy
from trac_ik_python.trac_ik import IK
import rospy
with open('data.urdf', 'r') as file:
    urdf = file.read()
Solver = IK("Torso", "LShoulderPitch", urdf_string=urdf)
Ends with:
terminate called after throwing an instance of 'ros::TimeNotInitializedException'
what(): Cannot use ros::Time::now() before the first NodeHandle has been created or ros::start() has been called. If this is a standalone app or test that just uses ros::Time and does not communicate over ROS, you may also call ros::Time::init()
Aborted (SIGABRT) (core dumped)
I also tried putting rospy.init_node("text") at the beginning, but that did not work either. I am using ROS Melodic. How do I find what is causing this, and what is the correct ROS usage?
Edit: Why the downvote?
Make sure you initialize ROS time before doing anything else, since some of the modules you are importing might need it.
import rospy
rospy.init_node("testnode")
from naoqi import ALProxy
from trac_ik_python.trac_ik import IK
with open('data.urdf', 'r') as file:
    urdf = file.read()
Solver = IK("Torso", "LShoulderPitch", urdf_string=urdf)
Update: It seems this is a tf-related issue, as you said. Can you try these steps:
1- Find the trac_ik_wrap.i file in the trac_ik_python package.
2- Add the line "ros::Time::init();" to the TRAC_IK constructor. (I added it before the urdf::Model robot_model; line.)
3- Recompile the package with catkin_make --pkg trac_ik_python
4- Run your example script again.
I'm working on a proof of concept using rpy2 to tie an existing R package to a web service. I do have the source to the package, if that is needed to fix this issue. I'm also currently developing on Windows, but if this problem is solved by using Linux instead, that's fine, as that's my planned environment.
For my first point in this POC, I'm trying to capture a chart made by this package, and serve it up to a web request using Flask. The complete code:
from flask import Flask, Response
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
from tempfile import TemporaryDirectory
from os import path
app = Flask(__name__)
null = ro.r("NULL")
numeric = ro.r("numeric")
grdevices = importr("grDevices")
efm = importr('euroformix')
@app.route('/')
def index():
    table = efm.tableReader('stain.txt')
    list = efm.sample_tableToList(table)
    with TemporaryDirectory() as dir_name:
        print("Working in {0}".format(dir_name))
        png_path = path.join(dir_name, "epg_mix.png")
        print("png path {0}".format(png_path))
        grdevices.png(file=png_path, width=512, height=512)
        # Do Data Science Stuff Here
        grdevices.dev_off()
        with open(png_path, 'rb') as f:
            png = f.read()
        return Response(png, "image/png")
if __name__ == '__main__':
    app.run(debug=True)
When hitting the service, I get back PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\matt\\AppData\\Local\\Temp\\tmpgg65cagq\\epg_mix.png'
Looking at the call stack, it happens when TemporaryDirectory() goes to clean up. Using the Flask debugger, I can also see that the png variable is empty.
So, how do I make grDevices close the file? Or do I need to go about my POC a different way?
rpy2 is not fully supported on Windows, and what works on Linux (or OS X) might not work there. Since you are developing a PoC with Flask, I'd encourage you to try using Docker (with docker-machine on Windows). You could use rpy2's Docker image as a base image.
However, this is just using the R functions png() and dev.off(), so it "should" work.
I have 3 suggestions:
1-
Does your "Do Data Science stuff" block make any R plot ? If not this would explain why your Python object png is empty.
2-
If you are using R's grid system (e.g., through lattice or ggplot2) and you are evaluating strings as R code, it is preferable to explicitly ask R to plot the figure. For example:
p <- ggplot(mydata) + geom_point(aes(x=x, y=y))
print(p)
rather than
ggplot(mydata) + geom_point(aes(x=x, y=y))
3-
Try moving return Response(png, "image/png") outside the context manager block for TemporaryDirectory.
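To illustrate suggestions 1 and 3 together, here is a rough sketch of the index() handler body; the plot call is only a placeholder, since I don't know what euroformix is supposed to draw:
with TemporaryDirectory() as dir_name:
    png_path = path.join(dir_name, "epg_mix.png")
    grdevices.png(file=png_path, width=512, height=512)
    ro.r("plot(1:10)")  # placeholder: some R call that actually draws on the device
    grdevices.dev_off()
    with open(png_path, 'rb') as f:
        png = f.read()
# the temporary directory is cleaned up here; the bytes are already in memory
return Response(png, "image/png")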
I am trying to distribute some programs to a local cluster I built using Spark. The aim of this project is to pass some data to each worker, hand the data to an external MATLAB function to process, and collect the results back on the master node. I ran into the problem of how to call the MATLAB function. Is it possible for Spark to call an external function? In other words, can each function parallelized in Spark search the local path of its node to execute an external function?
Here is a small test code:
run.py
import sys
from operator import add
from pyspark import SparkContext
import callmatlab
def run(a):
    # print '__a'
    callmatlab.sparktest()
if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    output = sc.parallelize(range(1, 2)).map(run)
    print output
    sc.stop()
callmatlab.py
import matlab.engine as eng
import numpy as np
eng = eng.start_matlab()
def sparktest():
    print "-----------------------------------------------"
    data = eng.sparktest()
    print "----the return data:\n", type(data), data
if __name__ == "__main__":
    sparktest()
submit spark
#!/bin/bash
path=/home/zzz/ProgramFiles/spark
$path/bin/spark-submit \
--verbose \
--py-files $path/hpc/callmatlab.py $path/hpc/sparktest.m \
--master local[4] \
$path/hpc/run.py \
README.md
It seems Spark requires everything attached via --py-files to be a .py file; it does not recognize sparktest.m.
I do not know how to continue. Could anyone give me some advice? Does Spark allow this approach? Or can anyone recommend another distributed Python framework?
Thanks
Thanks for trying to answer my question. I used a different way to solve this problem. I uploaded the MATLAB files and the data that need to be called and loaded to a path in each node's file system, and the Python code just adds that path and calls the function using the matlab.engine module.
So my callmatlab.py becomes
import matlab.engine as eng
import numpy as np
import os
eng = eng.start_matlab()
def sparktest():
    print "-----------------------------------------------"
    eng.addpath(os.path.join(os.getenv("HOME"), 'zzz/hpc/'), nargout=0)
    data = eng.sparktest([12, 1, 2])
    print data
Firstly, I do not see any reason to pass sparktest.m via --py-files.
Secondly, the recommended way is to put your Python files in a .zip file. From the documentation:
For Python, you can use the --py-files argument of spark-submit to add
.py, .zip or .egg files to be distributed with your application. If
you depend on multiple Python files we recommend packaging them into a
.zip or .egg.
Finally, remember your function will be executed in an executor JVM on a remote machine, so the Spark framework ships the function, its closure, and additional files as part of the job. Hope that helps.
Add the
--files
option before sparktest.m in the spark-submit command.
That tells Spark to ship the sparktest.m file to all workers.
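On the worker side, a file shipped with --files can then be located with SparkFiles. A minimal sketch of how run() in run.py might use it, assuming --files $path/hpc/sparktest.m was added to the submit command:
from pyspark import SparkFiles
import os
def run(a):
    # absolute path of the copy of sparktest.m that Spark shipped to this worker
    local_copy = SparkFiles.get("sparktest.m")
    # e.g. add its directory to the MATLAB engine path before calling the function
    print(os.path.dirname(local_copy))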
The question of how to speed up importing of Python modules has been asked previously (Speeding up the python "import" loader and Python -- Speed Up Imports?) but without specific examples and has not yielded accepted solutions. I will therefore take up the issue again here, but this time with a specific example.
I have a Python script that loads a 3-D image stack from disk, smooths it, and displays it as a movie. I call this script from the system command prompt when I want to quickly view my data. I'm OK with the 700 ms it takes to smooth the data as this is comparable to MATLAB. However, it takes an additional 650 ms to import the modules. So from the user's perspective the Python code runs at half the speed.
This is the series of modules I'm importing:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
Of course, not all modules are equally slow to import. The chief culprits are:
matplotlib.pyplot [300ms]
numpy [110ms]
scipy.signal [200ms]
I have experimented with using from ... import instead, but this isn't any faster. Since Matplotlib is the main culprit and it's got a reputation for slow screen updates, I looked for alternatives. One is PyQtGraph, but that takes 550 ms to import.
I am aware of one obvious solution, which is to call my function from an interactive Python session rather than the system command prompt. This is fine, but it's too MATLAB-like; I'd prefer the elegance of having my function available from the system prompt.
I'm new to Python and I'm not sure how to proceed at this point. Since I'm new, I'd appreciate links on how to implement proposed solutions. Ideally, I'm looking for a simple solution (aren't we all!) because the code needs to be portable between multiple Mac and Linux machines.
Not an actual answer to the question, but a hint on how to profile the import speed with Python 3.7 and tuna (a small project of mine):
python3 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
You could build a simple server/client setup: the server runs continuously, making and updating the plot, while the client just communicates the next file to process.
I wrote a simple server/client example based on the basic example from the socket module docs: http://docs.python.org/2/library/socket.html#example
here is server.py:
# expensive imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
# Echo server program
import socket
HOST = '' # Symbolic name meaning all available interfaces
PORT = 50007 # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
while 1:
    conn, addr = s.accept()
    print 'Connected by', addr
    data = conn.recv(1024)
    if not data: break
    conn.sendall("PLOTTING:" + data)
    # update plot
    conn.close()
and client.py:
# Echo client program
import socket
import sys
HOST = '' # The remote host
PORT = 50007 # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(sys.argv[1])
data = s.recv(1024)
s.close()
print 'Received', repr(data)
you just run the server:
python server.py
which does the imports, then the client just sends via the socket the filename of the new file to plot:
python client.py mytextfile.txt
then the server updates the plot.
On my machine, running your imports takes 0.6 seconds, while running client.py takes 0.03 seconds.
You can import your modules manually instead, using imp. See documentation here.
For example, import numpy as np could probably be written as
import imp
np = imp.load_module("numpy", None, "/usr/lib/python2.7/dist-packages/numpy", ('', '', 5))
This will spare python from browsing your entire sys.path to find the desired packages.
See also:
Manually importing gtk fails: module not found
1.35 seconds isn't long, but I suppose if you're used to half that for a "quick check" then perhaps it seems so.
Andrea suggests a simple client/server setup, but it seems to me that you could just as easily call a very slight modification of your script and keep its console window open while you work:
Call the script, which does the imports then waits for input
Minimize the console window, switch to your work, whatever: *Do work*
Select the console again
Provide the script with some sort of input
Receive the results with no import overhead
Switch away from the script again while it happily awaits input
I assume your script is identical every time, i.e. you don't need to give it the image stack location or any particular commands each time (but these are easy to add as well!).
Example RAAC's_Script.py:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
print('********* RAAC\'s Script Now Running *********')
while True:  # Loops forever
    # Display a message and wait for user to enter text followed by enter key.
    # In this case, we're not expecting any text at all and if there is any it's ignored
    input('Press Enter to test image stack...')
    '''
    *
    *
    **RAAC's Code Goes Here** (Make sure it's indented/inside the while loop!)
    *
    *
    '''
To end the script, close the console window or press ctrl+c.
I've made this as simple as possible, but it would require very little extra to handle things like quitting nicely, doing slightly different things based on input, etc.
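For example, quitting nicely could be as simple as checking the text entered at the prompt (a small sketch to splice into the loop above; the 'q' convention is just an arbitrary choice):
command = input('Press Enter to test image stack (or type q to quit)...')
if command.strip().lower() == 'q':
    break  # leaves the while True loop and ends the script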
You can use lazy imports, but it depends on your use case.
If it's an application, you can first import only the modules needed for the GUI; then, after the window is loaded, you can import all your other modules.
If it's a module and users do not need all the dependencies, you can import inside the functions that need them.
[warning]
It's against PEP 8, I think, and it's not recommended in some places, but the reasons behind that are mostly readability (I may be wrong though...) and some builders' (e.g. PyInstaller's) bundling (which can be solved by adding the missing dependencies to the spec file).
If you use lazy imports, use comments so users know that there are extra dependencies.
Example:
import numpy as np
# Lazy imports
# import matplotlib.pyplot as plt
def plot():
    import matplotlib.pyplot as plt
    # Your function here
    # This will be imported during runtime
For some specific libraries, I think it's a necessity.
You can also create a kind of API in __init__.py.
Take scikit-learn, for example: if you import sklearn and then try to use some model, it's not found and an error is raised. You then need to be more specific and import the submodule directly. Though it can be inconvenient for users, it's IMHO good practice and can reduce import times significantly.
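A sketch of what that behaviour looks like with scikit-learn (matching the description above; exactly which submodules need explicit imports can vary between versions):
import sklearn
# sklearn.linear_model.LinearRegression()          # AttributeError: submodule not loaded yet
from sklearn.linear_model import LinearRegression  # explicit submodule import works
model = LinearRegression()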
Usually, 10% of the imported libraries cost 90% of the import time. A very simple tool for such analysis is line_profiler:
import line_profiler
import atexit
profile = line_profiler.LineProfiler()
atexit.register(profile.print_stats)
@profile
def profiled_function():
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
profiled_function()
This gives results like:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
20 @profile
21 def profiled_function():
22
23 1 2351852.0 2351852.0 6.5 import numpy as np
24 1 6545679.0 6545679.0 18.0 import pandas as pd
25 1 27485437.0 27485437.0 75.5 import matplotlib.pyplot as plt
75% of the three libraries' import time is matplotlib (this does not mean that it's badly written; it just needs a lot of stuff for graphical output).
Note:
If you import a library in one module, importing it again in other modules costs nothing; it's globally shared...
Another note:
If you are importing modules that ship with Python itself (e.g. pathlib, subprocess, etc.), do not use lazy loading; standard-library import times are close to zero and, in my experience, don't need to be optimized...
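(A trivial timing sketch to check that claim on your own machine; the exact numbers vary, but standard-library imports typically take only a few milliseconds.)
import time
t0 = time.time()
import pathlib
import subprocess
import csv
print('stdlib imports took %.4f s' % (time.time() - t0))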
I have done just a basic test below, but it shows that runpy can be used to solve this issue when you need a whole Python script to run faster (you don't want to put any logic in test_server.py).
test_server.py
import socket
import time
import runpy
import matplotlib.pyplot
HOST = 'localhost'
PORT = 50007
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    serversocket.bind((HOST, PORT))
except:
    print("Server is already running")
    exit(1)
# Start server with maximum 100 connections
serversocket.listen(100)
while True:
    connection, address = serversocket.accept()
    buf = connection.recv(64)
    if len(buf) > 0:
        buf_str = str(buf.decode("utf-8"))
        now = time.time()
        runpy.run_path(path_name=buf_str)
        after = time.time()
        duration = after - now
        print("I received " + buf_str + " script and it took " + str(duration) + " seconds to execute it")
test_client.py
import socket
import sys
HOST = 'localhost'
PORT = 50007
clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect((HOST, PORT))
message = sys.argv[1].encode()
clientsocket.send(message)
test_lag.py
import matplotlib.pyplot
Testing:
$ python3 test_client.py test_lag.py
I received test_lag.py script and it took 0.0002799034118652344 seconds to execute it
$ time python3 test_lag.py
real 0m0.624s
user 0m1.307s
sys 0m0.180s
Based on this, the module is pre-loaded in the server process, so running the script via the client is fast.