Automating IBM SPSS Data Collection survey export? - python

I'm so sorry for the vague question here, but I'm hoping an SPSS expert will be able to help me out here. We have some surveys that are done via SPSS, from which we extract data for an internal report. Right now the process is very cumbersome and requires going to the SPSS Data Collection Interviewer Server Administration page and manually exporting data from two different projects (which takes hours at a time!). We then take that data, massage it, and upload it to another database that drives the internal report.
My question is, does anyone out there know how to automate this process? Is there a SQL Server database behind the SPSS data? Where does the .mdd file come in to play? Can my team (who is well-versed in extracting data from various sources) tap into the SQL Server database behind SPSS to get our data? Or do we need some sort of Python script and plugin?
If I'm missing information that would be helpful in answering the question, please let me know. I'm happy to provide it; I just don't know what to provide.
Thanks so much.

As mentioned by other contributors, there are a few ways to achieve this. The simplest I can suggest is using the DMS (data management script) and windows scheduler. Ideally you should follow below steps.
Prerequisite:
1. You should have access to the server running IBM Data collection
2. Basic knowledge of windows task scheduler
3. Knowledge of DMS scripting
Approach:
1. Create a new DMS script from the template
2. If you want to perform only data extract / transformation, you only need input and output data source
3. In the input data source, create/build the connection string pointing to your survey on IBM Data collection server. Use the data source as SQL
4. In the select query: use "Select * from VDATA" if you want to export all variables
5. Set the output data connection string by selecting the output data format as SPSS (if you want to export it in SPSS)
6. run the script manually and see if the SPSS export is what is expected
7. Create batch file using text editor (save with .bat extension). Add below lines
cd "C:\Program Files\IBM\SPSS\DataCollection\6\DDL\Scripts\Data Management\DMS"
Call DMSRun YOURDMSFILENAME.dms
Then add a line to copy (using XCOPY) the data / files extracted to the location where you want to further process it.
Save the file and open windows scheduler to schedule the execution of this batch file for data extraction.
If you want to do any further processing, you create an mrs or dms file and add to the batch file.
Hope this helps!

There are a number of different ways you can accomplish easing this task and even automate it completely. However, if you are not an IBM SPSS Data Collection expert and don't have access to somebody who is or have the time to become one, I'd suggest getting in touch with some of the consultants who offer services on the platform. Internally IBM doesn't have many skilled SPSS resources available, so they rely heavily on external partners to do services on a lot of their products. This goes for IBM SPSS Data Collection in particular, but is also largely true for SPSS Statistics.
As noted by previous contributors there is an approach using Python for data cleaning, merging and other transformations and then loading that output into your report database. For maintenance reasons I'd probably not suggest this approach. Though you are most likely able to automate the export of data from SPSS Data Collection to a sav file with a simple SPSS Syntax (and an SPSS add-on data component), it is extremely error prone when upgrading either SPSS Statistics or SPSS Data Collection.
From a best practice standpoint, you ought to use the SPSS Data Collection Data Management module. It is very flexible and hardly requires any maintenance on upgrades, because you are working within the same data model framework (e.g. survey metadata, survey versions, labels etc. is handled implicitly) right until you load your transformed data into your reporting database.
Ideally the approach would be to build the mentioned SPSS Data Collection Data Management script and trigger it at the end of each completed interview. In this way your reporting will be close to real-time (you can make it actual real-time by triggering the DM script during the interview using the interview script events - just a FYI).
All scripting on the SPSS Data Collection platform including Data Management scripting is very VB-like, so for most people knowing VB, it is very easy to get started and it is documented very well in the SPSS Data Collection DDL. There you'll also be able to find examples of extracting survey data from SPSS Data Collection surveys (as well as reading and writing data to/from other databases, files etc.). There are also many examples of data manipulation and transformation.
Lastly, to answer your specific questions:
Yes, there is always an MS SQL Server behind SPSS Data Collection -
no exceptions. However, generally speaking the data model is way to
complex to read out data directly from it. If you have a look in it,
you'll quickly realize this.
The MDD file (short for Meta Data Document) is containing all survey meta
data including data source specifications, version history etc.
Without it you'll not be able to make anything of the survey data in
the database, which is the main reason I'd suggest to stay within the
SPSS Data Collection platform for as large part of your data handling
as possible. However, it is indeed just a readable XML file.
Note that the SPSS Data Collection Data Management Module requires a separate license and if the scripting needed is large or complex, you'd probably want base professional too, if that's not what you already use for developing the questionnaires and handling the surveys.
Hope that helps.

This isn't as clean as working directly with whatever database is holding the data, but you could do something with an exported data set:
There may or may not be a way for you to write and run an export script from inside your Admin panel or whatever. If not, you could write a simple Python script using Selenium WebDriver which logs into your admin panel and exports all data to a *.sav data file.
Then you can use the Python SPSS extensions to write your analysis scripts. Note that these scripts have to run on a machine that has a copy of SPSS installed.
Once you have your data and analysis results accessible to Python, you should be able to easily write that to your other database.

Related

Easiest way to use database between two software

fellows! I hope you are doing fine these days.
Currently I am working on a project at work, whose final goal is to create an interface between 2 pieces of software (I cannot mention which software, as it is a research project). The steps are as follows:
There are 2 software which provide data about processes.
The goal is: when the data changes in the first software, the data corresponding to the same process should also change in the second software.
The data can be exported as Excel files or I can obtain it using Python (with scripts created by us). The leaders of the project proposed us to create a data base where we will store the data and compare the data from the two software in order to change it.
My question is: how it is the easiest way to do it and which database software is more suitable? Currently, I am used to work with Python, but I do not mind inserting the Excel files directly in the database.
Thank you in advance! I wish you a nice day, folks!

How to extract serialised data from Access

I want to enable a user to export some data to a web application I am building. The data from the legacy application can be accessed through MS Acces (ODBC). The web application is written in Django/Python, but that is not very relevant.
The user would have to export data from time to time and import it into the web app. The table structure in the web app more-or-less mirrors the one in the legacy application.
My question of how to get the data from Access to a format that is easily parseable in the web app. The data is from 5 different tables and interrelated. Is there a way to serialise the data from Access into an XML / JSON file? I know that you can do an XML export, but as far as I know that is limited to a query, so I wouldn't have the hierarchy... Is there a VBA library to help with the task?
You can reference Microsoft XML, v5.0 (or whatever version) in the Visual Basic Editor and create XML programmatically.
See
- Simple example
- Introduction to XML in Microsoft Windows (in depth example)
Answering my own question here. I did some googling and it looks like you can export data from a table together with selected other tables. For that, it is necessary to draw the relationships within Access.
That might also solve my problem (and without composing the XML manually). Will find out if this works and check back later.
source: http://msdn.microsoft.com/en-us/library/office/aa167823(v=office.11).aspx#odc_accessnewxmlfeatures_includingrelatedtableswhenexportingxml

Write Microsoft Word Doc with MySQL data

I'm programming a MySQL database with Web interface for remote access. I used Django as a framework. But now, I want to generate some reports using the MySQL data and modify them after generating. Therefore, I automatically think of exporting data to or importing from Word. The thing is, how I do this?
I have seen several options. One of them, using Python-docx, a library to generate docx documents in Python. I could have a problem with this, because the generated reports will be large, with lots of images, tables, pages, etc. I worked with xlsxwriter, and when the files were large it took long time to generate de xlsx. I don't know if Python-docx would be the better solution.
Other option is to import data directly from Microsoft Word, using some software for this concrete purpose or using a macro VBA. I have programmed some example code with VBA to import data of MySQL using connectors ODBC and it's immediately possible, but there is thousand of objects and classes of VBA Word to learn.
Exposed the problem, any tips or suggestions??? Thanks in advance!
Another option is to generate HTML & open as a word document.
If you take a document similar to what you want to generate & save as HTML you will see what word does. Take this file as a template for your documents

GUI for minimalistic XML database

I would like to create a very simple database describing the projects that I have worked on in the past, with a small number of text and numerical fields (volume, name of client, date, etc...).
The solution that comes to mind for this would be an XML format (with individual files for each project, or a single file containing everything).
I could of course program this with Python (I know that there are simple commands for this) but this application seems to obvious that there must already be simple GUIs doing exactly this.
So my question is: are there simple GUI programs (running in Linux) that enable the creation and management of such extremely simple XML databases ?
Thank you !
If you store your data in a MySQL database, then phpmyadmin can export that data to many formats, including XML.

ETL using Python

I am working on a data warehouse and looking for an ETL solution that uses Python.
I have played with SnapLogic as an ETL, but I was wondering if there were any other solutions out there.
This data warehouse is just getting started. Ihave not brought any data over yet. It will easily be over 100 gigs with the initial subset of data I want to load into it.
Yes. Just write Python using a DB-API interface to your database.
Most ETL programs provide fancy "high-level languages" or drag-and-drop GUI's that don't help much.
Python is just as expressive and just as easy to work with.
Eschew obfuscation. Just use plain-old Python.
We do it every day and we're very, very pleased with the results. It's simple, clear and effective.
You can use pyodbc a library python provides to extract data from various Database Sources. And than use pandas dataframes to manipulate and clean the data as per the organizational needs. And than pyodbc to load it to your data warehouse.

Categories

Resources