Run Stata do file from Python

I have a Python script that cleans up and performs basic statistical calculations on a large panel dataset (2,000,000+ observations).
I find that some of these tasks are better suited to Stata, and wrote a do file with the necessary commands. Thus, I want to run a .do file within my Python code. How would I go about calling a .do file from Python?

I think @user229552 points in the correct direction. Python's subprocess module can be used. Below is an example that works for me on Linux.
Suppose you have a Python file called pydo.py with the following:
import subprocess
## Do some processing in Python
## Set do-file information
dofile = "/home/roberto/Desktop/pyexample3.do"
cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]
## Run do-file
subprocess.call(cmd)
and a Stata do-file named pyexample3.do, with the following:
clear all
set more off
local y `1'
local x1 `2'
local x2 `3'
display `"first parameter: `y'"'
display `"second parameter: `x1'"'
display `"third parameter: `x2'"'
sysuse auto
regress `y' `x1' `x2'
exit, STATA clear
Then executing pydo.py in a Terminal window works as expected.
You could also define a Python function and use that:
## Define a Python function to launch a do-file
def dostata(dofile, *params):
    ## Launch a do-file, given the full path to the do-file
    ## and a list of parameters.
    import subprocess
    cmd = ["stata", "do", dofile]
    for param in params:
        cmd.append(param)
    return subprocess.call(cmd)
## Do some processing in Python
## Run a do-file
dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")
The complete call from a Terminal, with results:
roberto@roberto-mint ~/Desktop
$ python pydo.py
  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)
Notes:
1. Command line editing enabled
. do /home/roberto/Desktop/pyexample3.do mpg weight foreign
. clear all
. set more off
.
. local y `1'
. local x1 `2'
. local x2 `3'
.
. display `"first parameter: `y'"'
first parameter: mpg
. display `"second parameter: `x1'"'
second parameter: weight
. display `"third parameter: `x2'"'
third parameter: foreign
.
. sysuse auto
(1978 Automobile Data)
. regress `y' `x1' `x2'
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------
.
. exit, STATA clear
Sources:
http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html
http://docs.python.org/2/library/subprocess.html
http://www.stata.com/support/faqs/unix/batch-mode/
A different route for using Python and Stata together can be found at
http://ideas.repec.org/c/boc/bocode/s457688.html
http://www.stata.com/statalist/archive/2013-08/msg01304.html

This answer extends @Roberto Ferrer's answer, solving a few issues I ran into.
Stata in system path
For Stata to run code, its executable must be correctly set up in the system path (on Windows at least). For me this was not set up automatically when installing Stata, and I found the simplest fix was to use the full path to the executable (which for me was "C:\Program Files (x86)\Stata12\Stata-64"), i.e.:
cmd = ["C:\Program Files (x86)\Stata12\Stata-64","do", dofile]`
How to quietly run the code in the background
It is possible to get the code to run quietly in the background (i.e. without opening up Stata each time) by adding the /e flag, i.e.:
cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
Log file storage location
Finally, if you are running quietly in the background, Stata will want to save log files. It does this in cmd's working directory, which varies depending on where the code is being run from. For me, since I was executing Python from Notepad++, it tried to save the log files in C:\Program Files (x86)\Notepad++, which Stata did not have write access to. This can be changed by specifying the working directory when the subprocess is called.
These modifications to Roberto Ferrer's code lead to:
import subprocess

def dostata(dofile, *params):
    cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
    for param in params:
        cmd.append(param)
    return subprocess.call(cmd, cwd=r'C:\location_to_save_log_files')

If you're running this in a command-line setting, you should be able to call Stata from the command line from within Python (I don't know how to invoke a shell command from within Python offhand, but it shouldn't be too hard; see Calling an external command in Python). To run Stata from the command line (aka batch mode), see here: http://www.stata.com/support/faqs/unix/batch-mode/
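As a minimal sketch of that route (assuming a Unix-like system with the stata executable on the PATH and a do-file named analysis.do; both names are just placeholders), the batch-mode call described in that FAQ can be wrapped with subprocess:
import subprocess
## Run a do-file in Stata batch mode ("stata -b do file.do"); the output goes
## to a log file in the current working directory instead of the console.
subprocess.call(["stata", "-b", "do", "analysis.do"])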

Related

torchserve model not running and giving a load of errors

I ran the following command:
torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt" --handler "./handler.py"
I created all the files and then I created a new directory and copied the model into it.
Then I executed the following command:
torchserve --start --model-store model_store --models bert=bert.mar
It then displayed a slew of errors.
Here is my error text. It is too long and repetitive; hence, I posted it on paste bin.
error
I would suggest lowering the number of workers per model (Default workers per model: 12); right now it is set to the maximum number your machine can handle.
How?
Go to the config.properties file and add the following (this line sets the workers per model to 2):
default_workers_per_model=2
Then, when you run torchserve, add the --ts-config option to point to the location of your config.properties file:
torchserve --start \
--model-store ./deployment/model-store \
--ts-config ./deployment/config.properties \
--models bert=bert.mar
Let me know if this solves the error.
Note: you can also add other parameters in the config.properties file, such as:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
default_workers_per_model=2
number_of_netty_threads=1
netty_client_threads=1
prefer_direct_buffer=true

Does Vowpal Wabbit's Python interface support a single call for model training/testing?

I've been trying to develop a contextual bandit (CB) model using Vowpal Wabbit. Following all the online tutorials I could find, I started by training and testing in Python by looping over records:
vw = pyvw.vw("--cb_explore_adf")
for i in range(df.shape[0]):
    # format example into vw-friendly format
    new_line = to_vw_format_train(df.iloc[i])
    # vw learns from each line
    vw_line = vw.parse(new_line, pyvw.vw.lContextualBandit)
    vw.learn(vw_line)
# save model for future use
vw.save('vw.model')
However, I noticed that calling VW from the command line gives you a more succinct way of training/testing. For example, if I had all records in a VW-friendly format (vw_training_set.txt) I could run:
vw -d vw_training_set.txt --cb_explore_adf -p train_predictions.txt -f vw.model
My questions are:
Assuming all parameters are equal, is there any difference in these two approaches?
If not, is there a way to perform training in Python without explicitly looping over each example?
Edit: I have tried to use pyvw to run the following:
from vowpalwabbit import pyvw
import os
pywd = "directory"
os.chdir(pywd)
# testing built in functionality
test_records = """shared |User var:5
Action 1:-1:0.5 | treatment=1
| treatment=2
shared |User var:3
| treatment=1
Action 2:-2:0.5 | treatment=2
"""
vw_train_records = open(r"test_file.txt","w+")
vw_train_records.write(test_records)
vw_train_records.close()
vw = pyvw.vw("-d test_file.txt --cb_explore_adf -p train_predictions.txt")
vw.save('vw.model')
vw.finish()
The train_predictions file is created, but it is not populated. The vw.model file is also created, but it doesn't seem like it has learned anything.
Edit 2: Updated example w/ vw.finish()
Edit 3: Updated example to include correct spacing
Assuming all parameters are equal, is there any difference in these two approaches?
I believe they should be equivalent, except for the -p train_predictions.txt, which isn't present in the Python snippet.
If not, is there a way to perform training in Python without explicitly looping over each example?
Not exactly, but it's a great suggestion and something we should look at adding.
You can sort of do this by leveraging the fact that, currently (this may change in future versions), pyvw initializes the vw instance with the options passed to the constructor, and it has logic to detect a data file passed this way and process it.
So this should work:
vw = pyvw.vw("-d vw_training_set.txt --cb_explore_adf -p train_predictions.txt")
vw.save('vw.model')

Stop new gdal command window opening each time a gdal command is executed from Python

I have written a PyQGIS script that uses GDAL's gdalwarp. The piece of code doing this is as follows:
warp = 'gdalwarp -ot Byte -q -of GTiff -tr 2.81932541777e-05 -2.81932541777e-05 -tap -cutline %s -crop_to_cutline -co COMPRESS=DEFLATE -co PREDICTOR=1 -co ZLEVEL=6 -wo OPTIMIZE_SIZE=TRUE %s %s' % (instrv, ('"' + pathsplitedit + '"'), outputpath2)
call(warp)
So I have this in a loop and all is good. However, each time it executes, a new command window is opened, which isn't ideal as it loops through up to 100 features in a shapefile. Is there a way I can avoid having the command window open at all? Any help is really appreciated!!
Since GDAL 2.1 (which recent versions of QGIS use) you can access the command line utilities from the bindings themselves, which has a lot of benefits.
Your call becomes something like this. Note that I didn't copy all your creation options; it's just to give you an idea of how to use it.
from osgeo import gdal

warpopts = gdal.WarpOptions(outputType=gdal.GDT_Byte,
                            format='GTiff',
                            xRes=2.81932541777e-05,
                            yRes=-2.81932541777e-05,
                            cutlineDSName='cutline_vec.shp',
                            cropToCutline=True,
                            targetAlignedPixels=True,
                            options=['COMPRESS=DEFLATE'])
ds = gdal.Warp('raster_out.tif', 'raster_in.tif', options=warpopts)
ds = None
One of the benefits is that the input files don't have to be on disk but can also be already-opened gdal/ogr Datasets. gdal.Warp also returns the output file as an opened Dataset, which you can then pass on to other commands.
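For example, here is a minimal sketch (the file names are just placeholders) of keeping the warp result in memory with the MEM driver and passing the returned Dataset straight into another utility such as gdal.Translate:
from osgeo import gdal

# Warp into an in-memory Dataset instead of writing an intermediate file.
mem_opts = gdal.WarpOptions(format='MEM',
                            cutlineDSName='cutline_vec.shp',
                            cropToCutline=True)
warped = gdal.Warp('', 'raster_in.tif', options=mem_opts)
# Pass the opened Dataset directly to the next utility.
gdal.Translate('raster_out.tif', warped, format='GTiff')
warped = None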

Python run .exe with input

I would like to use a script to call an executable program (it runs in a DOS window) and feed it some instructions to run it (output not required).
For example, if I run the program directly, I double-click it and type the following:
load XXX.txt
oper
quit
Here's my code; for what it's worth, I do not have a deep understanding of subprocess.
import subprocess
import os
os.chdir('D:/Design/avl3.35/Avl/Runs')
Process=subprocess.Popen(['avl.exe'], stdin=subprocess.PIPE)
Process.communicate(b'load allegro.avl\n')
When I run this code, I get the following:
===================================================
Athena Vortex Lattice Program Version 3.35
Copyright (C) 2002 Mark Drela, Harold Youngren
This software comes with ABSOLUTELY NO WARRANTY,
subject to the GNU General Public License.
Caveat computor
===================================================
==========================================================
Quit Exit program
.OPER Compute operating-point run cases
.MODE Eigenvalue analysis of run cases
.TIME Time-domain calculations
LOAD f Read configuration input file
MASS f Read mass distribution file
CASE f Read run case file
CINI Clear and initialize run cases
MSET i Apply mass file data to stored run case(s)
.PLOP Plotting options
NAME s Specify new configuration name
AVL c>
Reading file: allegro.avl ...
Configuration: Allegro-lite 2M
Building surface: WING
Reading airfoil from file: ag35.dat
Reading airfoil from file: ag36.dat
Reading airfoil from file: ag37.dat
Reading airfoil from file: ag38.dat
At line 145 of file ../src/userio.f (unit = 5, file = 'stdin') #!!!!Error here!!
Fortran runtime error: End of file #!!!!Error here!!
Building duplicate image-surface: WING (YDUP)
Building surface: Horizontal tail
Building duplicate image-surface: Horizontal tail (YDUP)
Building surface: Vertical tail
Mach = 0.0000 (default)
Nbody = 0 Nsurf = 5 Nstrp = 64 Nvor = 410
Initializing run cases...
I have no idea what's wrong with this, nor do I know why the error shows up in the middle of the output. After searching, I see that the communicate method waits for the process to finish and returns all the output; I do not need the output, but I still don't know what to do.
Could you explain what is happening here and how I could finish what I want to do?
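For context, here is a minimal sketch of what is being described (the command sequence is the one listed above; treating it as a workaround is an assumption, not a tested fix): communicate() writes its input, closes stdin, and waits for the process to exit, so sending only the load command leaves AVL reading from a closed stdin, which matches the Fortran "End of file" error on unit 5 (stdin). Sending the whole command sequence in a single communicate() call would look like this:
import subprocess
import os

os.chdir('D:/Design/avl3.35/Avl/Runs')
process = subprocess.Popen(['avl.exe'], stdin=subprocess.PIPE)
# Send every command AVL expects (ending with quit) before stdin is closed.
process.communicate(b'load allegro.avl\noper\nquit\n')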

Python Elaphe - Barcode Generation Issues

I would like to use Elaphe to generate barcodes.
I am working on a 64-bit Windows 7 machine with Python 2.7, Elaphe 0.6.0, and Ghostscript 9.10 installed.
When I run the simple example usage, nothing seems to happen. The barcode does not show up. When I execute _.show(), it hangs and nothing is displayed; I have to do a KeyboardInterrupt to get back to the prompt. What viewer is supposed to launch when I do _.show()? I do, however, see a gswin32.exe process in the Windows Task Manager.
Please refer to my Python traceback at http://dpaste.com/hold/1653582/
Is there a way to see the PS code generated? How can I troubleshoot?
Please help.
The object returned by elaphe.barcode is an EpsImageFile (where EPS means Encapsulated PostScript), but after calling barcode it hasn't yet run Ghostscript to convert the code into a bitmap image.
You can dump out the code that it has generated by looking at the fp attribute - there's a lot of it, because it embeds the full PS library code for all the different barcode types it supports. So it's probably best to write it out to a file:
import elaphe as el

b = el.barcode('qr', 'slamacow')
with open('code.eps', 'w') as outfile:
    outfile.write(b.fp.getvalue())  # fp is a StringIO instance
In the file you'll see something like this:
%!PS-Adobe-2.0
%%Pages: (attend)
%%Creator: Elaphe powered by barcode.ps
%%BoundingBox: 0 0 42 42
%%LanguageLevel: 2
%%EndComments
% --BEGIN RESOURCE preamble--
... A whole lot of included library ...
% --END ENCODER hibccodablockf--
gsave
0 0 moveto
1.000000 1.000000 scale
<74686973206973206d792064617461>
<>
/qrcode /uk.co.terryburton.bwipp findresource exec
grestore
showpage
If you want to see how PIL or pillow runs Ghostscript so you can try it yourself at the commandline, the key part from the PIL/pillow code is this (from site-packages/PIL/EpsImagePlugin.py, line 84):
# Build ghostscript command
command = ["gs",
"-q", # quiet mode
"-g%dx%d" % size, # set output geometry (pixels)
"-r%d" % (72*scale), # set input DPI (dots per inch)
"-dNOPAUSE -dSAFER", # don't pause between pages, safe mode
"-sDEVICE=ppmraw", # ppm driver
"-sOutputFile=%s" % outfile, # output file
"-c", "%d %d translate" % (-bbox[0], -bbox[1]),
# adjust for image origin
"-f", infile, # input file
]
But on Windows the gs command will be replaced with the path to the executable.
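As a rough illustration (the install path below is only an example; adjust it to wherever Ghostscript lives on your machine), on Windows the first element of that list would be the full path to the console executable:
# Hypothetical path to the 32-bit Ghostscript console executable.
command[0] = r"C:\Program Files (x86)\gs\gs9.10\bin\gswin32c.exe"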
