Pass R object to Python after running R Script - python

I have a python script Test.py that runs an R script Test.R below:
import subprocess
import pandas
import pyper
# Run simple R code and print its output
proc = subprocess.Popen(['Path/To/Rscript.exe',
                         'Path/To/Test.R'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
# communicate() returns bytes on Python 3, so decode before printing
print(stdout.decode())
print(stderr.decode())
The R script is below:
library("methods")
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(2,4,6,8,10,12,14,16,18,20)
data <- data.frame(x,y)
How can I pass the R data frame (or any R object, for that matter) to Python? I've had great difficulty getting Rpy2 to work on Windows. I've seen this link to use PypeR, but it puts a lot of inline R code in the Python code, and I'd really like to keep the code in separate files (or is inline code considered acceptable practice?). Thanks.

I've had trouble getting Rpy2 to work on a Mac too, and I use the same workaround of calling R directly from Python via subprocess. I also agree that keeping the files separate helps manage complexity.
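First export your data as a .csv from R (again, with the script called through subprocess). Use write.csv rather than write.table here: write.table separates fields with spaces by default, which pandas.read_csv would not split into columns.
write.csv(data, file = 'test.csv')
After that, you can import it as a pandas data frame, using the first column (R's row names) as the index:
import pandas as pd
dat = pd.read_csv('test.csv', index_col=0)
dat
     x   y
1    1   2
2    2   4
3    3   6
4    4   8
5    5  10
6    6  12
7    7  14
8    8  16
9    9  18
10  10  20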
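If you would rather skip the intermediate file, one option is to have the R script print its data as CSV to standard output and parse the captured text in Python. This is a sketch under assumptions: it assumes you add `write.csv(data, row.names = FALSE)` at the end of Test.R (with no `file` argument, R writes to standard output) and that the script prints nothing else to stdout. `run_r_script` and `parse_csv_columns` are hypothetical helper names, the Rscript path is the question's placeholder, `subprocess.run(..., capture_output=True)` needs Python 3.7+, and parsing uses the standard-library csv module so the sketch runs without pandas.

```python
import csv
import io
import subprocess


def run_r_script(rscript, script_path):
    """Run an R script with Rscript and return everything it printed to stdout."""
    # check=True raises CalledProcessError if R exits with an error status.
    proc = subprocess.run([rscript, script_path],
                          capture_output=True, text=True, check=True)
    return proc.stdout


def parse_csv_columns(text):
    """Parse CSV text with a header row into {column name: list of floats}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(float(row[name]))
    return columns
```

Usage would be `parse_csv_columns(run_r_script('Path/To/Rscript.exe', 'Path/To/Test.R'))`. If pandas is available, `pd.read_csv(io.StringIO(text))` on the same captured text gives a DataFrame instead.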

Related

Python: tracing lines of source code that python function activates

I know how to find the source file of a Python function with the inspect module:
import inspect
inspect.getsourcefile(random_function)
However, while a python function is running, or after it has run, how would one find all of the pieces of the source code it utilized/referenced during its individual run?
For example, if I ran random_function(arg1=1, arg2=2) and then random_function(arg1=1, arg5=3.5), I would like to know which different parts of the module were used each time.
Is there anything like the example here?
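One starting point is the function's bytecode, which the dis module can show. Note that this is a static listing of everything the function could execute, not a record of what a particular call actually ran. For instance:
import dis
import numpy as np

def main():
    array = np.zeros(shape = 5)

if __name__ == '__main__':
    dis.dis(main)  # dis.dis() prints the disassembly itself and returns None
Output: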
5 0 LOAD_GLOBAL 0 (np)
2 LOAD_ATTR 1 (zeros)
4 LOAD_CONST 1 (5)
6 LOAD_CONST 2 (('shape',))
8 CALL_FUNCTION_KW 1
10 STORE_FAST 0 (array)
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
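To record which source lines one particular call actually executes, a different technique from dis is a trace function installed with sys.settrace, which the interpreter calls with a 'line' event before each new source line runs. The sketch below is a minimal illustration under assumptions: `trace_lines` is a hypothetical helper, it only traces the current thread, and it slows execution considerably. The standard-library trace module and the third-party coverage package do the same job more thoroughly.

```python
import sys


def trace_lines(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, set of (filename, lineno) executed)."""
    executed = set()

    def tracer(frame, event, arg):
        # A 'line' event fires just before a new source line executes.
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        # Returning the tracer keeps line tracing active in this frame and in
        # every function that func calls.
        return tracer

    previous = sys.gettrace()  # preserve any tracer already installed (debugger, coverage)
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, executed
```

For the question's case, you would compare the sets returned by `trace_lines(random_function, arg1=1, arg2=2)` and `trace_lines(random_function, arg1=1, arg5=3.5)`, for example with set difference.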

Find start and end of a context in Python 3

I am trying to find the line numbers of the start and the end of a with block (context). In Python 2.7 I can do so successfully as follows:
1  from contextlib import contextmanager
2  import sys
3
4  @contextmanager
5  def print_start_end_ctx():
6      frame = sys._getframe(2)
7      start_line = frame.f_lineno
8      yield
9      end_line = frame.f_lineno
10     print("start_line={}\nend_line={}".format(start_line, end_line))
11
12 with print_start_end_ctx():
13     100
14     (200,
15      300)
Output in Python 2.7:
start_line=12
end_line=15
However, in Python 3.7 the same line-number lookup on the frame object reports a different end line:
start_line=12
end_line=14
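Because the line a frame reports when the block exits varies between interpreter versions, a different approach is to ask the parser for the block's extent instead of the running frame. This is a sketch of that swapped-in technique, assuming Python 3.8 or later (where AST nodes have `end_lineno`, so it would not help on 3.7 itself). `with_block_span` is a hypothetical helper. Inside the context manager, you could obtain the source by reading the file named by `frame.f_code.co_filename` and pass `frame.f_lineno` as the start line.

```python
import ast


def with_block_span(source, start_line):
    """Return (first, last) line numbers of the with statement starting on start_line."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)) and node.lineno == start_line:
            # end_lineno is the last line of the block's final statement,
            # so a statement spanning several lines is covered completely.
            return node.lineno, node.end_lineno
    raise ValueError("no with statement starts on line {}".format(start_line))
```

For the example above, the with statement starts on line 12 and the tuple expression ends on line 15, so the expected result is `(12, 15)`, matching the Python 2.7 output.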

Print function and numpy.savetxt in python 3

Some code I am using (not written in Python) takes input files written in a specific format. I usually prepare such input files with Python scripts. One of them has the following format:
100
0 1 2
3 4 5
6 7 8
where 100 is just an overall parameter and the rest is a matrix. In python 2, I used to do it in the following way:
# python 2.7
import numpy as np
Matrix = np.arange(9)
Matrix.shape = 3,3
f = open('input.inp', 'w')
print >> f, 100
np.savetxt(f, Matrix)
I converted to Python 3 recently. Running the above script through 2to3 gives me something like:
# python 3.6
import numpy as np
Matrix = np.arange(9)
Matrix.shape = 3,3
f = open('input.inp', 'w')
print(100, file=f)
np.savetxt(f, Matrix)
The first error I got was TypeError: write() argument must be str, not bytes, because numpy.savetxt internally does something like fh.write(asbytes(format % tuple(row) + newline)). I was able to fix this by opening the file in binary mode: f = open('input.inp', 'wb'). But then print() fails. Is there a way to harmonize these two?
I ran into this same issue when converting to Python 3. In Python 3, str (Unicode text) and bytes are distinct types, and the savetxt in your NumPy version writes bytes, so you have to convert between them. The approach I found most appealing is to write the matrix to an in-memory buffer first and then write its decoded contents to the text-mode file. This is a working Python 3 version of your snippet using that method:
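# python 3.6
from io import BytesIO
import numpy as np

Matrix = np.arange(9)
Matrix.shape = 3, 3

# Render the matrix into an in-memory bytes buffer first...
fh = BytesIO()
np.savetxt(fh, Matrix, fmt='%d')
cstr = fh.getvalue()
fh.close()

# ...then write the header and the decoded text to a text-mode file.
# f.write() avoids the extra blank line print() would add after savetxt's trailing newline.
with open('input.inp', 'w') as f:
    print(100, file=f)
    f.write(cstr.decode('UTF-8'))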
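Another option is to let savetxt write the leading 100 itself through its `header` and `comments` parameters, so that only savetxt ever touches the file and the str/bytes mix never arises. This is a sketch, not the approach above: `write_input_file` is a hypothetical helper, and it assumes you pass a file name so that savetxt opens the file in whatever mode it needs.

```python
import numpy as np


def write_input_file(path, parameter, matrix):
    """Write `parameter` on the first line, then `matrix` as space-separated integers."""
    # header= is written once at the top of the file; comments='' stops savetxt
    # from prefixing it with its default comment marker '# '.
    np.savetxt(path, matrix, fmt='%d', header=str(parameter), comments='')
```

Calling `write_input_file('input.inp', 100, np.arange(9).reshape(3, 3))` should produce the four-line format from the question.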

Python multiprocessing pool swallows exception from first chunk's input

I'm writing a script that reads a bunch of files, and then processes the rows from all of those files in parallel.
My problem is that the script behaves strangely if it can't open some of the files. If it's one of the later files in the list, then it processes the earlier files, and reports the exception when it gets to the bad file. However, if it can't open one of the first files in the list, then it processes nothing, and doesn't report an exception.
How can I make the script report all exceptions, no matter where they are in the list?
The key problem seems to be the chunk size of pool.imap(). If the exception occurs before the first chunk is submitted, it fails silently.
Here's a little script to reproduce the problem:
from multiprocessing.pool import Pool

def prepare():
    for i in range(5):
        yield i+1
    raise RuntimeError('foo')

def process(x):
    return x

def test(chunk_size):
    pool = Pool(10)
    n = raised = None
    try:
        for n in pool.imap(process, prepare(), chunksize=chunk_size):
            pass
    except RuntimeError as ex:
        raised = ex
    print(chunk_size, n, raised)

def main():
    print('chunksize n raised')
    for chunk_size in range(1, 10):
        test(chunk_size)

if __name__ == '__main__':
    main()
The prepare() function generates five integers, then raises an exception. That generator is passed to pool.imap() with chunk sizes from 1 to 9. The script then prints the chunk size, the last result received (because the results arrive in order as 1 to 5, this is also the number of results received), and any exception raised.
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None None
7 None None
8 None None
9 None None
You can see that the exception is properly reported until the chunk size increases enough that the exception happens before the first chunk is submitted. Then it silently fails, and no results are returned.
If I run this (modified slightly for Python 2/3 cross-compatibility) with Python 2.7.13 and 3.5.4 on my own system, I get:
$ python2 --version
Python 2.7.13
$ python2 mptest.py
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None None
7 None None
8 None None
9 None None
$ python3 --version
Python 3.5.4
$ python3 mptest.py
chunksize n raised
1 5 foo
2 4 foo
3 3 foo
4 4 foo
5 5 foo
6 None foo
7 None foo
8 None foo
9 None foo
I presume it is not surprising that it fails (and hence prints None for n) for chunk sizes above 5: the generator returned by prepare() yields only five values before raising, so a chunk of six or more arguments can never be filled and no task is ever sent to a pool process.
What does seem surprising is that Python 2.7.13 reports None for the exception at chunk sizes above 5, while Python 3.5.4 reports foo.
This is Issue #28699, fixed in commit 794623bdb2. The fix has apparently been backported to Python 3.5.4, but not to Python 2.7.13, nor apparently to your own Python 3 version.
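On an interpreter without that fix, one workaround is to make sure the input generator never raises inside the pool: catch its exception in a wrapper, let the pool finish the items produced before the failure, then re-raise the exception in the caller. This is a sketch, not part of the multiprocessing API. `imap_checked` is a hypothetical helper, and it assumes (as multiprocessing.pool does internally) that the pool consumes the input iterable in a helper thread of the parent process, so the `errors` list is shared with the caller.

```python
def imap_checked(pool, func, iterable, chunksize=1):
    """Like pool.imap(), but re-raise an exception thrown by `iterable` itself
    after the results of the items produced before it have been yielded."""
    errors = []

    def guarded():
        # The exception is recorded here instead of escaping into the pool's
        # task-feeding thread, where older versions silently drop it.
        try:
            for item in iterable:
                yield item
        except Exception as ex:
            errors.append(ex)

    for result in pool.imap(func, guarded(), chunksize=chunksize):
        yield result
    # guarded() finished before the pool marked the results complete,
    # so any recorded error is visible by now.
    if errors:
        raise errors[0]
```

In the reproduction script, the loop in test() would become `for n in imap_checked(pool, process, prepare(), chunksize=chunk_size):`. With this change, every chunk size should deliver all five results and then raise foo.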

Listing all usb mass storage disks using python

I have a few USB disks plugged into my system. I would like to list all of them, like:
/dev/sdb
/dev/sdc
... and so on.
Note that I don't want to list their partitions, such as /dev/sdb1. I am looking for a solution for Linux. I tried cat /proc/partitions:
major minor #blocks name
8 0 488386584 sda
8 1 52428800 sda1
8 2 52428711 sda2
8 3 1 sda3
8 5 52428800 sda5
8 6 15728516 sda6
8 7 157683712 sda7
8 8 157682688 sda8
11 0 1074400 sr0
11 1 47602 sr1
8 32 3778852 sdc
8 33 1 sdc1
8 37 3773440 sdc5
But it lists all the disks, and I can't tell which ones are USB storage disks. I am looking for a solution that does not require installing an additional package.
You can convert Klaus D.'s suggestion into Python code like this:
#!/usr/bin/env python3
import os

basedir = '/dev/disk/by-path/'

print('All USB disks')
for d in os.listdir(basedir):
    # Only show USB disks, not partitions
    if 'usb' in d and 'part' not in d:
        path = os.path.join(basedir, d)
        link = os.readlink(path)
        print('/dev/' + os.path.basename(link))
path contains info in this format:
/dev/disk/by-path/pci-0000:00:1d.7-usb-0:5:1.0-scsi-0:0:0:0
which is a symbolic link, so we can get the pseudo-scsi device name using os.readlink().
But that returns info in this format:
../../sdc
so we use os.path.basename() to clean it up.
Instead of using
'/dev/' + os.path.basename(link)
you can produce a string in the '/dev/sdc' format by using
os.path.normpath(os.path.join(os.path.dirname(path), link))
but I think you'll agree that the former technique is simpler. :)
Or list the matching entries under /dev directly from the shell:
ls -l /dev/disk/by-path/*-usb-* | grep -F -v part
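A different check that also needs no extra packages uses sysfs instead of /dev/disk/by-path. Each entry in /sys/block is a whole disk (partitions appear only as subdirectories of their disk), and each entry is a symlink into /sys/devices whose resolved path passes through a usbN controller directory when the disk is attached over USB. This is a sketch assuming that sysfs layout. `usb_block_devices` is a hypothetical helper, and its `sys_block` parameter exists only so it can be pointed at a test directory.

```python
import os


def usb_block_devices(sys_block='/sys/block'):
    """Return /dev paths of whole-disk block devices that sit on a USB bus."""
    found = []
    for name in sorted(os.listdir(sys_block)):
        # e.g. /sys/block/sdc -> ../devices/pci0000:00/.../usb2/2-1/.../block/sdc
        real = os.path.realpath(os.path.join(sys_block, name))
        if '/usb' in real:
            found.append('/dev/' + name)
    return found
```

Partitions never show up here because /sys/block holds only whole disks, so no 'part' filter is needed.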
