I have spent a few days orienting myself to the spss and spssaux modules - which are great resources. Still, I feel I am missing some conceptual understanding, even though I can do basic things like retrieving value labels via spssaux.getValueLabels or spss.DataStep():
print spssaux.getValueLabels(2)
>>> {u'1': u'Neutral', u'0': u'Disagree', u'2': u'Agree'}
or
dataset = spss.Dataset()
variable_list = dataset.varList
print variable_list[2].valueLabels.data
>>> {0.0: u'Disagree', 1.0: u'Neutral', 2.0: u'Agree'}
However, I'm struggling to figure out how to retrieve the actual data values.
I'm also having trouble figuring out how to retrieve values from analyses for use in Python. At the moment I have been running analyses using spss.Submit(), but I suspect this is limited in terms of feeding values back to Python (e.g., feeding means and significance values back to Python, which can then be used to make decisions).
If you have any suggestions, please note that I need to be operating within the Python environment, as this data retrieval/analysis is incorporated into a broader Python program.
Thanks!
The spss.Cursor class is a low-level class that is rather hard to use. The spssdata.Spssdata class provides a much friendlier interface. You can also use the spss.Dataset class, which was modeled after Spssdata and has additional capabilities, but it is slower.
For retrieving Viewer output, the basic workhorse is OMS, writing either to the XML workspace or to new datasets. You can use some functions in the spssaux module that wrap this: createDatasetOutput simplifies creating datasets from tables, while createXmlOutput and its companion getValuesFromXmlWorkspace use the XML workspace. Underneath the latter, the spss.EvaluateXPath API lets you pluck whatever piece of output you want from a table.
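As a rough sketch of the XML-workspace route (this needs a live SPSS session, and both the OMS subtype and the exact XPath below are illustrative assumptions - inspect the OXML your version produces and adjust):

```python
# Route a Descriptives table into the XML workspace, then pull the mean
# of variable X back into Python with EvaluateXPath.
import spss

spss.Submit("""
OMS SELECT TABLES
  /IF SUBTYPES='Descriptive Statistics'
  /DESTINATION FORMAT=OXML XMLWORKSPACE='desc'.
DESCRIPTIVES VARIABLES=X.
OMSEND.
""")

# Illustrative XPath: the 'Mean' cell in the row for variable X.
xpath = ("//pivotTable//dimension//category[@varName='X']"
         "//category[@text='Mean']//cell/@number")
mean_x = spss.EvaluateXPath('desc', '/outputTree', xpath)
print(mean_x)   # EvaluateXPath returns a list of strings

spss.DeleteXPathHandle('desc')
```

The values come back as strings, so convert with float() before using them in Python-side decisions.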
Also, if you are basically living in a Python world, have you discovered external mode? It lets you run Statistics from an external Python program, so you can use your Python IDE to work interactively and debug. You just import the spss module and whatever else you need and use the provided APIs. In external mode, however, there is no Viewer, so you can't use the SpssClient module APIs.
See the spss.Cursor class in the Python reference guide for SPSS. It is hard to give general advice about your workflow, but if you are producing statistics in SPSS datasets, you can then grab them for use in Python programs. Here is one example:
*Make some fake data.
DATA LIST FREE / ID X.
BEGIN DATA
1 5
2 6
3 7
END DATA.
DATASET NAME Orig.
BEGIN PROGRAM Python.
import spss, spssdata
alldata = spssdata.Spssdata().fetchall()
print alldata
#this just grabs all of the data
END PROGRAM.
*Make your mean in SPSS syntax.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK
/MeanX = MEAN(X).
BEGIN PROGRAM Python.
var = ["MeanX"]
alldata2 = spssdata.Spssdata(var).fetchone()
print alldata2
#This just grabs the mean of the variable you created
END PROGRAM.
Related
I have an SSIS package that will import an Excel file. I want to use a Python script to run through all the column headings and replace any whitespace with an underscore ('_').
Previously when doing this for a pandas dataframe, I'd use:
df.columns = [w.replace(' ','_') for w in list(df.columns)]
However, I don't know how to reference the column headers from Python. I understand I'd use an 'Execute Process Task' and how to implement that in SSIS; however, how can I refer to a dataset contained within the SSIS package from Python?
Your dataset won't be in SSIS. The only data that is "in" SSIS are the row buffers in a Data Flow Task, where you define a source, a destination, and any transformations that take place per row.
If you're going to execute a Python script, the end result is that you've expressed the original Excel file in some other format. Maybe you rewrote it as a CSV, maybe you wrote it to a table, or perhaps it's just written back as a new Excel file but with no whitespace in the column names.
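A minimal sketch of that pre-processing step, assuming the file has already been saved as CSV (the file paths are placeholders you'd pass in from the Execute Process Task):

```python
# Rewrite a CSV so the header row has underscores instead of spaces,
# leaving the data rows untouched.
import csv

def clean_csv_headers(src_path, dst_path):
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)                       # first row = headers
        writer.writerow([col.replace(" ", "_") for col in header])
        writer.writerows(reader)                    # copy remaining rows
```

Running this before the Data Flow Task means the source metadata the engine validates against is stable.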
There is no native Data Flow source that will allow you to use Python directly. There is a Script Component, which allows you to run anything, and there is IronPython, which you could run from SSIS, but that's not going to work for a Data Flow Task. A Data Flow Task is metadata-dependent at run time: before the package runs, the engine interrogates the source and destination elements to ensure they exist and that the data type of each column is the same as, or bigger than, the data type described in the contract that was built at design time.
In simple terms, you can't dynamically change the shape of the data in a Data Flow Task. If you need a generic, dynamic data importer, then you're writing all the logic yourself. You can still use SSIS as the execution framework, since it has nice logging, management, etc., but your SSIS package is going to be mostly a .NET project.
So, with all of that said, the next challenge you'll run into if you try to use IronPython with pandas is that they don't work together; at least, not well enough that the stated goal (a column rename) is worth the effort and maintenance headache you'd have.
There is an option to execute sp_execute_external_script with a Python script in a Data Flow and use it as a source. You can also save the output to a CSV or Excel file and read that in SSIS.
I have a C++ based application logging data to files and I want to load that data in Python so I can explore it. The data files are flat files with a known number of records per file. The data records are represented as a struct (with nested structs) in my C++ application. This struct (and its sub-structs) changes regularly during my development process, so I also have to make the associated changes to the Python code that loads the data. This is obviously tedious and doesn't scale well. What I am interested in is a way to automate the process of updating the Python code (or some other way to handle this problem altogether). I am exploring some libraries that convert my C++ structs to other formats such as JSON, but I have yet to find a solid solution. Can anyone suggest something?
Consider using a data serialization system/format that has both C++ and Python bindings: https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
(e.g. protobuf, or even JSON or CSV)
Alternatively, consider writing a library in C that reads the data and exposes it as structures. Then use https://docs.python.org/3.7/library/ctypes.html to call this C library and retrieve the records.
Of course, if the semantics of the data change (e.g. a new important field needs to be analyzed), you will have to handle that new stuff in the Python code. No free lunch.
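You can also skip the C shim entirely and mirror the structs in pure Python with ctypes. A minimal sketch, where the field names, types, and packing are assumptions that must match the C++ definition exactly (including any #pragma pack):

```python
# Mirror a nested C++ struct with ctypes and read fixed-size records
# from a binary log file.
import ctypes

class Inner(ctypes.Structure):
    _pack_ = 1                         # assume packed structs on the C++ side
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float)]

class Record(ctypes.Structure):
    _pack_ = 1
    _fields_ = [("id", ctypes.c_uint32),
                ("pos", Inner)]        # nested struct, nested here too

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            rec = Record()
            # ctypes structures support the writable buffer protocol,
            # so readinto() fills the struct directly from the file.
            if f.readinto(rec) < ctypes.sizeof(Record):
                break
            records.append(rec)
    return records
```

This keeps the schema in one Python class per struct, so a schema change is still a manual edit - which is why a generated format like protobuf scales better in the long run.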
I am wondering how can I convert Stata code into Python code.
For example, my Stata code looks like
if ("`var1'"=="") {
local l_QS "select distinct CountryName from `l_tableName'"
ODBCLoad, exec("`l_QS'") dsn("`dsn'") clear
}
And I want to convert it to Python code such as
if (f"{var1}"=="") :
l_QS = f"select distinct CountryName from {l_tableName}"
SQL_read(f"{l_QS}", dsn = f"{dsn}")
I am new to coding, so I don't know what branch of computer science or what tools/techniques are relevant. I suppose knowledge about compilers and/or regular expressions may help, so I put those tags on my question. Any high-level pointers are appreciated, and specific code examples would be even better. Thanks in advance.
A very simple workaround would be to use the subprocess module included with Python: write a basic command-line wrapper around your existing Stata scripts to keep their functionality, then build your new code in Python from now on.
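A sketch of such a wrapper, where the executable name and the "-b do" batch-mode flags are assumptions that depend on your Stata flavor and platform (stata, stata-se, StataMP-64.exe, ...):

```python
# Thin wrapper around Stata's batch mode, driven from Python.
import subprocess

def build_stata_command(stata_exe, do_file, *args):
    """Build the argv list for running a do-file in batch mode."""
    return [stata_exe, "-b", "do", do_file, *args]

def run_do_file(stata_exe, do_file, *args):
    # Batch mode writes a .log next to the do-file; check=True raises
    # if Stata exits with a nonzero return code.
    return subprocess.run(build_stata_command(stata_exe, do_file, *args),
                          check=True)
```

The do-file stays as-is, and the Python side only orchestrates runs and reads back whatever the do-file exports (e.g. CSVs).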
You could also look into possible API functionality in Stata if you have a whole lot of Stata code and it would take forever to convert it manually to Python. This would require access to a server and could be costly, but it would be cleaner than the subprocess route and wouldn't require the source code to live on your local machine. Note, though, that Stata may not have the tools to build an API.
As far as I am aware, there are no projects that will directly parse a file in an arbitrary language and convert it to Python. That would be a huge undertaking, and still very difficult even with machine learning. There are libraries for wrapping code written in C and C++ (and other languages too, I'm sure; those are just the ones I know are available), but I can't find anything for Stata.
I've got a kind of weird question -- but it would be immensely useful if it is possible. In Maya, using Python, can I take in several points of user input and have Python create a separate script for me? In this instance, I want to take in controller and locator names and have Python spit out a complete IKFK match script, also in Python (it's really just a lot of getAttr and setAttr commands, although with 6 if statements per limb for PV matching). The only other wrinkle is that it has to be able to prefix hierarchy names in the script if the rig is imported into a new scene rather than just opened. There's an expression component to my switches that it would be nice if Python could make for me, too.
Is this possible or am I crazy?
That's no problem. Just write a text file with a .py extension to a path where Maya can find it, then import it somewhere. Creating expressions is not a problem either.
It might make sense, though, to think about the approach you've chosen. Imagine you have written a dozen of these new Python files and then discover a problem in the script: you will have to redo all of them. I'd try to collect all the data and write only the required information to a text file, e.g. in JSON format. Then you can read the data back and rebuild your skeletons.
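A minimal sketch of that data-driven variant: store the per-rig names in JSON and keep one generic match script that loads them. The node names and the prefix handling are placeholders for your rig's naming:

```python
# Save the per-rig data once, then load it back with an optional
# namespace/hierarchy prefix for imported (rather than opened) rigs.
import json

def save_match_config(path, controllers, locators):
    with open(path, "w") as f:
        json.dump({"controllers": controllers, "locators": locators},
                  f, indent=2)

def load_match_config(path, prefix=""):
    with open(path) as f:
        cfg = json.load(f)
    # Apply the prefix (e.g. "rigA:") when the rig lives in a namespace.
    cfg["controllers"] = [prefix + c for c in cfg["controllers"]]
    cfg["locators"] = [prefix + l for l in cfg["locators"]]
    return cfg
```

A single generic IKFK match function can then run its getAttr/setAttr logic over cfg["controllers"] and cfg["locators"]; fixing a bug means fixing one script, not regenerating a dozen.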
I have two .csv files that I want to compare with LibreOffice's compare documents tool (Edit > Compare Document).
These csv files are produced by a long and involved script, and it would be nice for the compare step to be automatic as well, ending with a LibreOffice window open showing the changes just as if I had selected Compare manually. I want the specific LibreOffice GUI (which I believe does a great job of highlighting differences), not just a diff.
Looking online, it seems like there is a nice but limited set of Python wrappers for LibreOffice (pyoo).
However, despite related questions, I couldn't see any way of gaining access to the compare functionality through this or any other library. Is the Compare Documents functionality available at the Python level, at the UNO API level, or simply not available at all?
Use the dispatcher:
Dispatcher.executeDispatch(
(XDispatchProvider)Frame, ".uno:CompareDocuments", "", 0, propertyValueFile);
A complete Java example is at https://forum.openoffice.org/en/forum/viewtopic.php?f=44&t=2795.
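The same dispatch is reachable from Python through the UNO bridge (so: available at the UNO API level, callable from Python). A sketch, assuming LibreOffice was started as a listener with soffice --accept="socket,host=localhost,port=2002;urp;" and with placeholder file URLs:

```python
# Python/UNO version of the CompareDocuments dispatch.
import uno
from com.sun.star.beans import PropertyValue

local_ctx = uno.getComponentContext()
resolver = local_ctx.ServiceManager.createInstanceWithContext(
    "com.sun.star.bridge.UnoUrlResolver", local_ctx)
ctx = resolver.resolve(
    "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
smgr = ctx.ServiceManager

# Open the newer file, then compare it against the older one.
desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)
doc = desktop.loadComponentFromURL("file:///tmp/new.csv", "_blank", 0, ())
frame = doc.getCurrentController().getFrame()

dispatcher = smgr.createInstanceWithContext(
    "com.sun.star.frame.DispatchHelper", ctx)
arg = PropertyValue()
arg.Name = "URL"
arg.Value = "file:///tmp/old.csv"
dispatcher.executeDispatch(frame, ".uno:CompareDocuments", "", 0, (arg,))
```

This leaves the document open in the normal LibreOffice GUI with the tracked changes highlighted, just as the manual Edit > Compare Document does.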