Importing and processing a spreadsheet on Google App Engine

Importing and processing a spreadsheet on Google App Engine - python

I have a spreadsheet full of information and one time only I need to put this data into the Datastore by reading the rows and creating model entities out of all of the data. Each row is an entity and each column is a different property.
I am a little confused how exactly I can put the data into a form that is process-able by GAE and then what I should use to process the spreadsheet in python. I can easily move my data, which is currently in Excel, to Google Docs if that makes things easier but I am still not sure what to do from there.

An easy way is to publish the spreadsheet. See this blogpost: http://blog.pamelafox.org/2010/08/importing-data-from-spreadsheets-to-app.html
Another method is to download the the spreadsheet as a CSV and upload this CSV with the app engine bulkloader.

Related

What is the correct way to upload files to a SQL Server inside my web application?

I am developing a web application in which users can upload excel files. I know I can use the OPENROWSET function to read data from excel into a SQL Server but I am refraining from doing so because this function requires a file path.
It seems kind of indirect as I am uploading a file to a directory and then telling SQL Server go look in that directory for the file instead of just giving SQL Server the file.
The other option would be to read the Excel file into a pandas dataframe and then use the to_sql function but pandas read_excel function is quite slow and the other method I am sure would be faster.
Which of these two methods is "correct" when handling file uploads from a web application?
If the first method is not frowned upon or "incorrect", then I am almost certain it is faster and will use that. I just want an experienced developers thoughts or opinions. The webapp's backend is Python and flask.

If I am understanding your question correctly, you are trying to load the contents of an xls(s) file into a SQLServer database. This is actually not trivial to do, as depending on what is in the Excel file you might want to have one table, or more probably multiple tables based on the data. So I would step back for a bit and ask three questions:
What is the data I need to save and how should that data be structured in my SQL tables. Forget about excel at this point -- maybe just examine the first row of data and see how you need to save it.
How do I get the file into my web application? For example, when the user uploads a file you would want to use a POST form and send the file data to your server and your server to save that file (for example, either on S3, or in a /tmp folder, or into memory for temporary processing).
Now that you know what your input is (the xls(x) file and its location) and how you need to save your data (the sql schema), now it's time to decide what the best tool for the job is. Pandas is probably not going to be a good tool, unless you literally just want to load the file and dump it as-is with minimal (if any) changes to a single table. At this point I would suggest using something like xlrd if only xls files, or openpyxl for xls and xlsx files. This way you can shape your data any way you want. For example, if the user enters in malformed dates; empty cells (should they default to something?); mismatched types, etc.
In other words, the task you're describing is not trivial at all. It will take quite a bit of planning and designing, and then quite a good deal of python code once you have your design decided. Feel free to ask more questions here for more specific questions if you need to (for example, how to capture the POST data in a file update or whatever you need help with).

Import data from excel spreadsheet to django model

I'm building a website that'll have a django backend. I want to be able to serve the medical billing data from a database that django will have access to. However, all of the data we receive is in excel spreadsheets. So I've been looking for a way to get the data from a spreadsheet, and then import it into a django model. I know there are some different django packages that can do this, but I'm having a hard time understanding how to use these packages. On top of that I'm using python 3 for this project. I've used win32com for automation stuff in excel in the past. I could write a function that could grab the data from the spreadsheet. Though what I want figure out is how would I write the data to a django model? Any advice is appreciated.

Use http://www.python-excel.org/ and consider this process:
Make a view where user can upload the xls file.
Open the file with xlrd. xlrd.open_workbook(filename)
Extract, create dict to map the data you want to sync in db.
Use the models to add, update or delete the information.
If you follow the process, you can learn a lot of how loading and extracting works and how does it fits with the requirements. I recommend to you first do the step 2 and 3 in shell to get more quicker experiments and avoid to be uploading/testing/error with a django view.
Hope this kickoff base works for you.

Why don't you use django-import-export?
It's a widget that allows you to import excel files from admin section.
It's very easy to install, here you find the installation tutorial, and here an example.

Excel spreadsheets are saved as .csv files, and there are plenty of examples and explanations on how to work with them, such as here and here, online already.
In general, if you are having difficulty understanding documentation or packages, my advice would be to search for specific examples or see if whatever you are trying to do has already been done. Play with it to get a working understanding, and then modify it to fit your needs.

Writing to a Google Document With Python

I have some data that I want to write to a simple multi-column table in Google Docs. Is this way too cumbersome to even begin attempting? I would just render it in XHTML, but my client has a very specific work flow set up on Google Docs that she doesn't want me to interfere with. The Google Docs API seems more geared toward updating metadata and making simple changes to the file format than it is toward undertaking the task of writing entire documents using only raw data and a few formatting rules. Am I missing something, or are there any other libraries that might be able to achieve this?

From the docs:
What can this API do?
The Google Documents List API allows developers to create, retrieve,
update, and delete Google Docs (including but not limited to text
documents, spreadsheets, presentations, and drawings), files, and
collections.
So maybe you could save your data to an excel worksheet temporarily and then upload and convert this.
Saving to excel can be done with xlwt:
import xlwt
workbook = xlwt.Workbook()
sheet = workbook.add_sheet('Sheet 1')
sheet.write(0,1,'First text')
workbook.save('test.xls')

Google Spreadsheets API Column Titles

I am using the Google Spreadsheets API in python, and I can't get the column headers for the ListView of a worksheet.
Does anyone know how to get those titles? I can only find the underscore preceded hashes.

Take a look at a this library I created, its a simple wrapper to the spreadsheet API designed to make cases like the one you are facing much simpler to deal with.

Batch inserts into spreadsheet using python-gdata

I'm trying to use python-gdata to populate a worksheet in a spreadsheet. The problem is, updating individual cells is woefully slow. (By doing them one at a time, each request takes about 500ms!) Thus, I'm attempting to use the batch mechanism built into gdata to speed things up.
The problem is, I can't seem to insert new cells. I've scoured the web for examples, but I couldn't find any. This is my code, which I've adapted from an example in the documentation. (The documentation does not actually say how to insert cells, but it does show how to update cells. Since this is a new worksheet, it has no cells.)
Furthermore, with debugging enabled I can see that my requests returns HTTP 200 OK.
import time
import gdata.spreadsheet
import gdata.spreadsheet.service
import gdata.spreadsheets.data
email = '<snip>'
password = '<snip>'
spreadsheet_key = '<snip>'
worksheet_id = 'od6'
spr_client = gdata.spreadsheet.service.SpreadsheetsService()
spr_client.email = email
spr_client.password = password
spr_client.source = 'Example Spreadsheet Writing Application'
spr_client.ProgrammaticLogin()
# create a cells feed and batch request
cells = spr_client.GetCellsFeed(spreadsheet_key, worksheet_id)
batchRequest = gdata.spreadsheet.SpreadsheetsCellsFeed()
# create a cell entry
cell_entry = gdata.spreadsheet.SpreadsheetsCell()
cell_entry.cell = gdata.spreadsheet.Cell(inputValue="foo", text="bar", row='1', col='1')
# add the cell entry to the batch request
batchRequest.AddInsert(cell_entry)
# submit the batch request
updated = spr_client.ExecuteBatch(batchRequest, cells.GetBatchLink().href)
My hunch is that I'm simply misunderstanding the API, and that this should work with changes. Any help is much appreciated.

I recently ran across this as well (when trying to delete) but per the docs here it doesn't appear that batch insert or delete operations are supported:
A number of batch operations can be combined into a single request.
The two types of batch operations supported are query and update.
insert and delete are not supported because the cells feed cannot be
used to insert or delete cells. Remember that the worksheets feed must
be used to do that.
I'm not sure of your use case, but would using the ListFeed help at all? It still won't let you batch operations, so there will be the associated latency, but it may be more tolerable than what you're dealing with now (or were at the time).

As of Google I/O 2016, the latest Google Sheets API supports batch cell updates (and reads). Be aware however, that GData is now deprecated, along with most GData-based APIs, including your sample above as the new API is not GData. Also putting email addresses and passwords in plain text in code is a security risk, so new(er) Google APIs use OAuth2 for authorization. You need to get the latest Google APIs Client Library for Python. It's as easy as pip install -U google-api-python-client [or pip3 for Python 3].
As far as batch insert goes, here's a simple code sample. Assume you have multiple rows of data in rows. To mass-inject this into a Sheet, say with file ID SHEET_ID & starting at the upper-left in cell A1, you'd make one call like this:
SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID, range='A1',
body={'values': rows}, valueInputOption='RAW').execute()
If you want a longer example, see the first video below where those rows are read out of a relational database. For those new to this API, here's one code sample from the official docs to help get you kickstarted. For slightly longer, more "real-world" examples, see these videos & blog posts:
Migrating SQL data to a Sheet plus code deep dive post
Formatting text using the Sheets API plus code deep dive post
Generating slides from spreadsheet data plus code deep dive post
The latest Sheets API provides features not available in older releases, namely giving developers programmatic document-oriented access to a Sheet as if you were using the user interface (create frozen rows, perform cell formatting, resizing rows/columns, adding pivot tables, creating charts, etc.)
However, to perform file-level access on Sheets, such as import/export, copy, move, rename, etc., you'd use the Google Drive API. Examples of using the Drive API:
Exporting a Google Sheet as CSV (blogpost)
"Poor man's plain text to PDF" converter (blogpost) (*)
(*) - TL;DR: upload plain text file to Drive, import/convert to Google Docs format, then export that Doc as PDF. Post above uses Drive API v2; this follow-up post describes migrating it to Drive API v3, and here's a developer video combining both "poor man's converter" posts.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.