Importing data from Excel to REDCap using API by Python


I am new to the Python world but have a reasonable understanding of the basics. I would appreciate it if someone could share a guide on how to import data from Excel into REDCap using the REDCap API from Python. The data I have is medical data such as patient name, age, comorbidities, etc.

Approach this in two steps.
1. Use something like pandas.read_excel() to load the data into Python.
2. Then use the PyCap package's import_records() to write the records to the REDCap server.
(In future SO posts, include more details so the code can be more tailored. I know it's tricky when PHI is involved and a fake dataset must be used on SO.)
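A minimal sketch of those two steps, under stated assumptions: the spreadsheet columns either already match your REDCap variable names or can be renamed with a dictionary, and the file name, column names, API URL and token below are hypothetical. Every cell is converted to a string before import, because blank cells read by pandas come back as NaN (which does not serialize to valid JSON) and dates need to be in the YYYY-MM-DD order that matches the default YMD date format. import_to_redcap passes a list of dicts to PyCap's Project.import_records(), which sends them as JSON by default.

```python
import math
from datetime import date, datetime


def to_redcap_value(value):
    """Convert one spreadsheet cell to a string suitable for a REDCap JSON import."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""  # blank cell; NaN is not valid JSON
    if isinstance(value, (datetime, date)):
        # Date-only, YMD order; datetime fields would also need a time part.
        return value.strftime("%Y-%m-%d")
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # pandas reads 42 as 42.0 when a column has blanks
    return str(value)


def clean_records(rows, column_map=None):
    """Rename spreadsheet columns to REDCap field names and convert every value."""
    column_map = column_map or {}
    return [
        {column_map.get(col, col): to_redcap_value(val) for col, val in row.items()}
        for row in rows
    ]


def load_excel_rows(path):
    """Step 1: read the first sheet with pandas (.xlsx files need openpyxl installed)."""
    import pandas as pd

    df = pd.read_excel(path)
    # Replace NaN/NaT with None so missing cells are handled uniformly.
    df = df.astype(object).where(df.notna(), None)
    return df.to_dict(orient="records")


def import_to_redcap(records, api_url, api_token):
    """Step 2: send the records to REDCap with PyCap and return the server's response."""
    from redcap import Project

    project = Project(api_url, api_token)
    return project.import_records(records)
```

Usage with hypothetical values: records = clean_records(load_excel_rows("patients.xlsx"), {"Patient ID": "record_id"}), then import_to_redcap(records, "https://redcap.example.org/api/", "YOUR_API_TOKEN"). Try it against a test project first, since an import can change existing values for matching record IDs.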

Related

Easiest way to use database between two software

Hello, fellows! I hope you are doing well these days.
I am currently working on a project at work whose final goal is to create an interface between two pieces of software (I cannot say which ones, as it is a research project). The setup is as follows:
There are two software applications that provide data about processes.
The goal is: when the data changes in the first application, the data for the same process should also change in the second one.
The data can be exported as Excel files, or I can obtain it using Python (with scripts we wrote ourselves). The project leaders proposed that we create a database where we store the data from both applications and compare it in order to apply the changes.
My question is: what is the easiest way to do this, and which database software is most suitable? I am used to working with Python, but I do not mind inserting the Excel files directly into the database.
Thank you in advance! I wish you a nice day, folks!
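One way to picture the compare step the project leaders proposed: load both exports into tables keyed by a process identifier and let SQL list the processes whose data differ. This is a minimal sketch, not a recommendation of a particular database. It uses SQLite through Python's built-in sqlite3 module only because it needs no server, and the table and column names (first_export, second_export, process_id, value) are hypothetical stand-ins for real exports with more columns.

```python
import sqlite3


def find_changed_processes(first_rows, second_rows):
    """Return (process_id, first_value, second_value) for processes whose value differs.

    first_rows  -- (process_id, value) pairs exported from the first application
    second_rows -- (process_id, value) pairs exported from the second application
    """
    conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
    try:
        for table in ("first_export", "second_export"):
            conn.execute(f"CREATE TABLE {table} (process_id TEXT PRIMARY KEY, value TEXT)")
        conn.executemany("INSERT INTO first_export VALUES (?, ?)", first_rows)
        conn.executemany("INSERT INTO second_export VALUES (?, ?)", second_rows)
        return conn.execute(
            """
            SELECT f.process_id, f.value, s.value
            FROM first_export AS f
            JOIN second_export AS s ON s.process_id = f.process_id
            WHERE f.value IS NOT s.value  -- NULL-safe inequality in SQLite
            ORDER BY f.process_id
            """
        ).fetchall()
    finally:
        conn.close()
```

Processes that appear in only one export are not reported by the inner JOIN; a LEFT JOIN plus a check for missing rows would cover those as well.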

Import data from excel spreadsheet to django model

I'm building a website that will have a Django backend. I want to be able to serve medical billing data from a database that Django has access to. However, all of the data we receive is in Excel spreadsheets, so I've been looking for a way to get the data out of a spreadsheet and import it into a Django model. I know there are some Django packages that can do this, but I'm having a hard time understanding how to use them. On top of that, I'm using Python 3 for this project. I've used win32com for Excel automation in the past, so I could write a function that grabs the data from the spreadsheet. What I want to figure out is how to write that data to a Django model. Any advice is appreciated.

Use http://www.python-excel.org/ and consider this process:
1. Make a view where the user can upload the .xls file.
2. Open the file with xlrd: xlrd.open_workbook(filename)
3. Extract the data and build dicts that map the values you want to sync to your database.
4. Use the models to add, update or delete the information.
If you follow this process, you can learn a lot about how loading and extracting work and how they fit your requirements. I recommend doing steps 2 and 3 in the shell first, so you can experiment more quickly and avoid repeated upload/test/error cycles through a Django view.
Hope this gives you a base to start from.
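Steps 2 to 4 above can be sketched as plain functions to try in the Django shell (python manage.py shell) before wiring up the upload view. Assumptions: the first sheet row holds the column titles; the column-to-field mapping, the model and its field names are hypothetical; and xlrd 2.0 or later opens only legacy .xls files, so an .xlsx upload would need a different reader (openpyxl, for instance). Step 4 uses Django's QuerySet.update_or_create(), which updates the row matching the lookup field or creates a new one; deleting rows that vanished from the sheet is left out.

```python
def read_xls(path):
    """Step 2: return (header, rows) from the first sheet of a legacy .xls file."""
    import xlrd  # third-party; xlrd >= 2.0 reads .xls only

    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    header = sheet.row_values(0)
    rows = [sheet.row_values(i) for i in range(1, sheet.nrows)]
    return header, rows


def sheet_to_dicts(header, rows, field_map):
    """Step 3: map each row to {model field: value} using field_map.

    field_map is a hypothetical {column title: model field name} dict;
    columns missing from it are ignored.
    """
    records = []
    for row in rows:
        cells = dict(zip(header, row))
        records.append({field: cells[col] for col, field in field_map.items() if col in cells})
    return records


def sync_to_model(model, records, key_field):
    """Step 4: create or update one model instance per record (run inside Django)."""
    for record in records:
        defaults = {k: v for k, v in record.items() if k != key_field}
        model.objects.update_or_create(defaults=defaults, **{key_field: record[key_field]})
```

In the shell, with a hypothetical Claim model: header, rows = read_xls("billing.xls"); records = sheet_to_dicts(header, rows, {"Claim ID": "claim_id", "Amount": "amount"}); sync_to_model(Claim, records, "claim_id").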

Why don't you use django-import-export?
It's a Django application that lets you import Excel files from the admin site.
It's very easy to install; here you can find the installation tutorial, and here an example.

Excel spreadsheets can be saved as .csv files, and there are plenty of examples and explanations online on how to work with them, such as here and here.
In general, if you are having difficulty understanding documentation or packages, my advice would be to search for specific examples or see if whatever you are trying to do has already been done. Play with it to get a working understanding, and then modify it to fit your needs.
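For example, a CSV saved from Excel can be read with Python's built-in csv module. This is a minimal sketch; the file and column names in the usage line are hypothetical, and every value comes back as a string, so numbers and dates still need converting before they go into a Django model.

```python
import csv


def read_csv_rows(f):
    """Read a CSV export into a list of dicts, one per row, keyed by the header row.

    f can be an open file or any iterable of text lines.
    """
    reader = csv.DictReader(f)
    if reader.fieldnames is None:  # empty input: no header row
        return []
    # Hand-typed header cells often carry stray spaces; strip them.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return [dict(row) for row in reader]
```

Usage: with open("billing.csv", newline="", encoding="utf-8-sig") as f: rows = read_csv_rows(f). The utf-8-sig encoding drops the byte-order mark that Excel's "CSV UTF-8" format writes at the start of the file.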

Automating IBM SPSS Data Collection survey export?

I'm sorry for the vague question, but I'm hoping an SPSS expert will be able to help me out. We have some surveys that are run via SPSS, from which we extract data for an internal report. Right now the process is very cumbersome and requires going to the SPSS Data Collection Interviewer Server Administration page and manually exporting data from two different projects (which takes hours at a time!). We then take that data, massage it, and upload it to another database that drives the internal report.
My question is, does anyone out there know how to automate this process? Is there a SQL Server database behind the SPSS data? Where does the .mdd file come into play? Can my team (who is well-versed in extracting data from various sources) tap into the SQL Server database behind SPSS to get our data? Or do we need some sort of Python script and plugin?
If I'm missing information that would be helpful in answering the question, please let me know. I'm happy to provide it; I just don't know what to provide.
Thanks so much.

As mentioned by other contributors, there are a few ways to achieve this. The simplest I can suggest is using a DMS (Data Management Script) and Windows Task Scheduler. Ideally, you should follow the steps below.
Prerequisites:
1. You should have access to the server running IBM Data Collection
2. Basic knowledge of Windows Task Scheduler
3. Knowledge of DMS scripting
Approach:
1. Create a new DMS script from the template
2. If you only want to extract/transform data, you need only an input and an output data source
3. In the input data source, create/build the connection string pointing to your survey on the IBM Data Collection server. Use the data source as SQL
4. In the select query: use "Select * from VDATA" if you want to export all variables
5. Set the output data connection string by selecting the output data format as SPSS (if you want to export it in SPSS)
6. Run the script manually and check that the SPSS export is what you expect
7. Create a batch file in a text editor (save it with a .bat extension) and add the following lines:
cd "C:\Program Files\IBM\SPSS\DataCollection\6\DDL\Scripts\Data Management\DMS"
Call DMSRun YOURDMSFILENAME.dms
Then add a line that copies (using XCOPY) the extracted data files to the location where you want to process them further.
Save the file and open Windows Task Scheduler to schedule this batch file for the data extraction.
If you want to do any further processing, create an .mrs or .dms file and add it to the batch file.
Hope this helps!

There are a number of different ways you can accomplish easing this task and even automate it completely. However, if you are not an IBM SPSS Data Collection expert and don't have access to somebody who is or have the time to become one, I'd suggest getting in touch with some of the consultants who offer services on the platform. Internally IBM doesn't have many skilled SPSS resources available, so they rely heavily on external partners to do services on a lot of their products. This goes for IBM SPSS Data Collection in particular, but is also largely true for SPSS Statistics.
As noted by previous contributors, there is an approach using Python for data cleaning, merging and other transformations and then loading that output into your report database. For maintenance reasons I would probably not recommend this approach. Though you are most likely able to automate the export of data from SPSS Data Collection to a .sav file with a simple SPSS Syntax (and an SPSS add-on data component), it is extremely error-prone when upgrading either SPSS Statistics or SPSS Data Collection.
From a best-practice standpoint, you ought to use the SPSS Data Collection Data Management module. It is very flexible and requires hardly any maintenance on upgrades, because you are working within the same data model framework (e.g. survey metadata, survey versions, labels, etc. are handled implicitly) right up until you load your transformed data into your reporting database.
Ideally, the approach would be to build the SPSS Data Collection Data Management script mentioned above and trigger it at the end of each completed interview. That way your reporting will be close to real time (you can make it truly real time by triggering the DM script during the interview using the interview script events, just FYI).
All scripting on the SPSS Data Collection platform, including Data Management scripting, is very VB-like, so anyone who knows VB will find it easy to get started, and it is very well documented in the SPSS Data Collection DDL. There you'll also find examples of extracting survey data from SPSS Data Collection surveys (as well as reading and writing data to and from other databases, files, etc.). There are also many examples of data manipulation and transformation.
Lastly, to answer your specific questions:
Yes, there is always an MS SQL Server behind SPSS Data Collection, no exceptions. However, generally speaking, the data model is far too complex to read data out of it directly. If you have a look at it, you'll quickly realize this.
The MDD file (short for Meta Data Document) contains all survey metadata, including data source specifications, version history, etc. Without it you won't be able to make sense of the survey data in the database, which is the main reason I'd suggest staying within the SPSS Data Collection platform for as much of your data handling as possible. That said, it is just a readable XML file.
Note that the SPSS Data Collection Data Management module requires a separate license, and if the scripting needed is large or complex, you'd probably want Base Professional too, if that's not what you already use for developing the questionnaires and handling the surveys.
Hope that helps.

This isn't as clean as working directly with whatever database is holding the data, but you could do something with an exported data set:
There may or may not be a way for you to write and run an export script from inside your Admin panel or whatever. If not, you could write a simple Python script using Selenium WebDriver which logs into your admin panel and exports all data to a *.sav data file.
Then you can use the Python SPSS extensions to write your analysis scripts. Note that these scripts have to run on a machine that has a copy of SPSS installed.
Once you have your data and analysis results accessible to Python, you should be able to easily write that to your other database.

Anyone familiar with the data format of Confirmit?

I recently asked about accessing data from SPSS and got some absolutely wonderful help here. I now have an almost identical need to read data from a Confirmit data file, but I'm not finding much about the Confirmit data file format on the web. It appears that Confirmit can export to SPSS *.sav files, which might be one avenue for me. Here are the exact needs:
I need to be able to extract two different but related types of information from a market research study done using Confirmit:
1. I need to be able to discover the data "schema": what questions are being asked (the text of the questions), what type of answer each one takes (multiple choice, yes/no, text), and what text labels are associated with each answer.
2. I need to be able to read respondents' answers and populate my data model. So for each of the questions discovered in step 1 above, I need to build a table of respondent answers.
With SPSS this was easy thanks to a data access module freely available from IBM and a nice Python wrapper by Albert-Jan Roskam. Googling, I'm not finding much information. Any insight into this is helpful. Something like a Python or Java class to read the Confirmit data would be perfect!
Assuming my best option ends up being to export to SPSS *.sav file, does anyone know if it will meet both of my use cases above (contain the questions, answers schema and also contain each participant's results)?
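For the *.sav route, an SPSS .sav file stores variable labels (typically the question text) and value labels (the text for each answer code) alongside the case data, so both use cases can in principle be read from it; whether Confirmit's export actually fills in those labels is an assumption to check on a real file. The sketch below swaps in the third-party pyreadstat package rather than the IBM module and wrapper mentioned above: pyreadstat.read_sav() returns a DataFrame plus a metadata object whose column_names_to_labels and variable_value_labels attributes hold the labels. build_schema is a hypothetical helper.

```python
def build_schema(column_names_to_labels, variable_value_labels):
    """Use case 1: one entry per variable with its question text and answer labels."""
    return [
        {
            "variable": name,
            "question_text": label,  # None if the variable has no label
            "answer_labels": dict(variable_value_labels.get(name, {})),
        }
        for name, label in column_names_to_labels.items()
    ]


def read_sav(path):
    """Read the schema (use case 1) and respondent answers (use case 2) from a .sav file."""
    import pyreadstat  # third-party: pip install pyreadstat

    df, meta = pyreadstat.read_sav(path)
    schema = build_schema(meta.column_names_to_labels, meta.variable_value_labels)
    answers = df.to_dict(orient="records")  # one dict per respondent
    return schema, answers
```

Note that value labels cover coded answers only; open text answers simply appear as string columns with no labels.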

You can get the data schema from the Excel definition export in Confirmit.
You can export a .txt file from Confirmit with the same template.

I was recently given a data set from Confirmit. There are almost 4000 columns in the Excel file, and I want to load it into a MySQL database. There is no way they are producing that output from a single table. Do you know how the table schema works in Confirmit?
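A common way to get a very wide survey export into MySQL is to reshape it into a long table with one row per respondent and question, e.g. (respondent_id, question, answer), instead of one column per question. MySQL caps a table at 4096 columns and InnoDB at 1017, so 4000 columns would not fit in a single InnoDB table anyway. This is a minimal sketch of the reshape, assuming the first row of the sheet holds the column names and one column identifies the respondent; the column names in the test are hypothetical, and this says nothing about how Confirmit stores data internally. The resulting tuples could then be inserted with any DB-API driver's executemany().

```python
def wide_to_long(header, rows, id_column):
    """Reshape one-column-per-question rows into (respondent_id, question, answer) tuples.

    Blank answers (None or "") are skipped, so the long table stores only real responses.
    """
    id_index = header.index(id_column)
    long_rows = []
    for row in rows:
        respondent = row[id_index]
        for question, answer in zip(header, row):
            if question == id_column or answer in (None, ""):
                continue
            long_rows.append((respondent, question, answer))
    return long_rows
```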

ETL using Python

I am working on a data warehouse and looking for an ETL solution that uses Python.
I have played with SnapLogic as an ETL, but I was wondering if there were any other solutions out there.
This data warehouse is just getting started; I have not brought any data over yet. It will easily be over 100 GB with the initial subset of data I want to load into it.

Yes. Just write Python using a DB-API interface to your database.
Most ETL programs provide fancy "high-level languages" or drag-and-drop GUIs that don't help much.
Python is just as expressive and just as easy to work with.
Eschew obfuscation. Just use plain-old Python.
We do it every day and we're very, very pleased with the results. It's simple, clear and effective.
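To make the plain-Python approach concrete, here is a minimal extract-transform-load loop written only against the DB-API 2.0 interface (cursor execute, fetchmany and executemany), so the same function should work with any compliant driver. It streams rows in batches, which matters at the 100 GB scale mentioned in the question. sqlite3 is used only to make the sketch runnable; the table names and the transform are hypothetical. Drivers differ in parameter style (sqlite3 and pyodbc use ?, psycopg2 uses %s), so the INSERT statement is passed in rather than built inside the function.

```python
import sqlite3


def etl(source_conn, target_conn, select_sql, insert_sql, transform, batch_size=10_000):
    """Copy rows from source to target in batches, applying transform to each row."""
    src = source_conn.cursor()
    dst = target_conn.cursor()
    src.execute(select_sql)
    total = 0
    while True:
        batch = src.fetchmany(batch_size)  # holds at most batch_size rows in memory
        if not batch:
            break
        dst.executemany(insert_sql, [transform(row) for row in batch])
        total += len(batch)
    target_conn.commit()  # one commit at the end; commit per batch for very long loads
    return total


def demo():
    """Run the ETL between two in-memory SQLite databases (hypothetical tables)."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sales (region TEXT, amount_cents INTEGER)")
    src.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(" north ", 1250), ("SOUTH", 300), ("east", 99)])
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
    copied = etl(
        src, dst,
        "SELECT region, amount_cents FROM sales ORDER BY rowid",
        "INSERT INTO fact_sales VALUES (?, ?)",
        transform=lambda row: (row[0].strip().lower(), row[1] / 100),  # clean text, cents to units
        batch_size=2,
    )
    return copied, dst.execute("SELECT region, amount FROM fact_sales ORDER BY rowid").fetchall()
```

Keeping the transform a plain Python function is the point of the answer above: it can be unit-tested on its own without any ETL tool.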

You can use pyodbc, a third-party Python library, to extract data from various database sources, then use pandas DataFrames to manipulate and clean the data as your organization requires, and then use pyodbc again to load it into your data warehouse.
