Import csv data to web2py database and process uploads - python

I've made a really simple single-user database application with web2py, to be deployed to a desktop machine. I chose web2py because of its simplicity and its unobtrusive built-in web server.
My problem is that I need to migrate an existing database from another application. I've preprocessed and prepared it into a CSV file that now imports perfectly into web2py's SQLite database.
Now, I have a problem with an 'upload' field in one of the tables, which corresponds to a small image. In the CSV I filled that field with the name of the corresponding .jpg file that I extracted from the original database. The problem is that I haven't figured out how to insert these files correctly into the uploads folder: the web2py engine automatically renames users' uploads to a safe format, so copying my files straight into the folder does not work.
My question is: does anyone know a proper way to include this image collection in the uploads folder? I don't know whether there is a way to disable this protection or whether I have to manually rename each file to a valid hash. I've also considered coding an automatic insert process into the database...
Thanks all for your attention!
EDIT (a working example):
An example database:
db.define_table('product',
    Field('name'),
    Field('color'),
    Field('picture', 'upload'),
)
Then, using the default appadmin module of my application, I import a CSV file with entries of the form:
product.name,product.color,product.picture
"p1","red","p1.jpg"
"p2","blue","p2.jpg"
Then in my application I have the usual download function:
def download():
    return response.download(request, db)
I call this to request the images uploaded into the database, for example to include them in a view:
<img src="{{=URL('download', args=product.picture)}}" />
So my problem is that I have all the images corresponding to the database records, and I need to import them into my application by properly including them in the uploads folder.

If you want the files to be named via the standard web2py file upload mechanism (which is a good idea for security reasons) and easily downloaded via the built-in response.download() method, then you can do something like the following.
In /yourapp/controllers/default.py:
def copy_files():
    import os
    for row in db().select(db.product.id, db.product.picture):
        picture = open(os.path.join(request.folder, 'private', row.picture), 'rb')
        row.update_record(picture=db.product.picture.store(picture, row.picture))
    return 'Files copied'
Then place all the files in the /yourapp/private directory and go to the URL /default/copy_files (you only need to do this once). This will copy each file into the /uploads directory and rename it, storing the new name in the db.product.picture field.
Note, the above function doesn't have to be a controller action (though if you do it that way, you should remove the function when finished). Instead, it could be a script that you run via the web2py command line (needs to be run in the app environment to have access to the database connection and model, as well as reference to the proper /uploads folder) -- in that case, you would need to call db.commit() at the end (this is not necessary during HTTP requests).
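For example, as a command-line script it might look something like the following (a minimal sketch; the script name and exact invocation are assumptions):

# copy_files_script.py -- hypothetical name; run it inside the app environment:
#   python web2py.py -S yourapp -M -R copy_files_script.py
import os

for row in db().select(db.product.id, db.product.picture):
    # Open each original image from /yourapp/private and re-store it through
    # web2py so it gets a safe, hashed filename in /uploads.
    with open(os.path.join(request.folder, 'private', row.picture), 'rb') as picture:
        row.update_record(picture=db.product.picture.store(picture, row.picture))

db.commit()  # required in a script; HTTP requests commit automatically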
Alternatively, you can leave things as they are and instead (a) manage uploads and downloads manually instead of relying on web2py's built-in mechanisms, or (b) create custom_store and custom_retrieve functions (unfortunately, I don't think these are well documented) for the picture field, which will bypass web2py's built-in store and retrieve functions. Life will probably be easier, though, if you just go through the one-time process described above.
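For completeness, here is a rough sketch of option (b), replacing the original table definition. The custom_store/custom_retrieve signatures follow the web2py book, but since they're thinly documented, verify them against your web2py version (the helper names here are made up):

import os

def keep_name_store(file, filename, path=None):
    # Hypothetical helper: keep the original filename instead of
    # letting web2py compute a hashed one.
    dest = os.path.join(request.folder, 'uploads', filename)
    with open(dest, 'wb') as out:
        out.write(file.read())
    return filename

def keep_name_retrieve(name, path=None):
    # Return (filename, stream), which response.download() expects.
    return (name, open(os.path.join(request.folder, 'uploads', name), 'rb'))

db.define_table('product',
    Field('name'),
    Field('color'),
    Field('picture', 'upload',
          custom_store=keep_name_store,
          custom_retrieve=keep_name_retrieve),
)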

Related

Mapping fields inside other fields

Hello, I would like to make an app that allows the user to import data from a source of their choice (Airtable, XLS, CSV, JSON) and export it to JSON, which will be pushed to an SQLite database using an API.
The "core" of the functionality of the app is that it allows the user to create a "template" and "map" of the source columns inside the destination columns. Which source column(s) go to which destination column is up to the user. I am attaching two photos here (used in airtable/zapier), so you can get a better idea of the end result:
[Screenshots: adding fields inside fields in Airtable and in Zapier]
I would like to know if you can recommend a library or a way to go about this problem. I have tried looking for Python or Node.js libraries, but I am lost: some suggest ETL libraries, some recommend mapping/zipping features, and others recommend coding my own classes. Do you know any libraries that do the same thing as Airtable/Zapier? Any suggestions?
Saving files in the database is really bad practice, since it takes up a lot of database storage space and adds latency to the communication.
I strongly recommend saving them on disk and storing the path in the database.
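As for the mapping itself, it may not need much library support. Below is a minimal sketch of a user-defined template applied to CSV rows; all names and the joining rule are invented for illustration:

import csv
import json

# Hypothetical user-defined template: destination field -> source columns.
# Several source columns can feed one destination, as in Airtable/Zapier.
template = {
    'full_name': ['first_name', 'last_name'],
    'account':   ['account_type'],
}

def apply_template(row, template, sep=' '):
    # Build one destination record from a source row using the mapping.
    return {dest: sep.join(row[col] for col in cols)
            for dest, cols in template.items()}

with open('source.csv', newline='') as f:
    records = [apply_template(row, template) for row in csv.DictReader(f)]

print(json.dumps(records, indent=2))  # JSON ready to push to the API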

Django custom settings for app

I am currently developing an app for Django that needs some custom settings that can be changed at runtime by admin users, and those settings have to be accessible to another, separate system that uses the same database.
On one hand, we could store those settings in a JSON file and have it accessible to both systems, as only the Django system will actually make any changes to the settings. On the other hand, we could just store those settings as a lone row in a 'settings' table in the database.
The first choice seems quite cumbersome to deal with and might cause problems with concurrent access, while the other would dedicate a whole database table to a single row.
Is either of these ideas any good, or is there something I'm overlooking?
Make a table. Remember, Django makes it really easy: just create another model class for the settings table and populate it. You should be able to access it from just about any other language/system, e.g. PHP. The data gets backed up with everything else in the database, and if you move to a different server, the data moves along with everything else. Yes, the overhead is technically a little more than a plain text file, but if the data is really that small, the overhead is insignificant. If the list of settings grows over time, having it in a searchable database will make updates and retrieval much easier than a text file would.
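A minimal sketch of such a settings model (model and field names are illustrative):

from django.db import models

class AppSetting(models.Model):
    # One row per setting; admins edit these through the Django admin.
    key = models.CharField(max_length=100, unique=True)
    value = models.TextField()

    def __str__(self):
        return self.key

# The other system can read the same table directly, e.g. in raw SQL:
#   SELECT value FROM myapp_appsetting WHERE key = 'some_setting';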

dump CSV file from Django query to Github

We want to automate a process through the Django admin where, whenever a user makes a change to a record (or adds/deletes a record), a CSV file is created and then pushed to a GitHub repository with a commit message specified by the person who made the change.
Creating the CSV file from a queryset is easy enough... But how would we then go about getting that CSV file into a folder that is git-initialized, so that we can commit it to a repository?
Any ideas would be great. Essentially we're looking for a way of tracking specific changes to the database. With CSV files in GitHub, we can really easily follow the changes, and we want to leverage that.
cheers
If you can create your CSV files, the next step would be to talk to GitHub via its API, or to keep a local clone of the git repo that gets synced after each file creation (see the sketch below).
But if I may ask: why do you want to do this with CSV files in a GitHub repo? My first response to a requirement like that would be to log changes with the Python logging infrastructure, or to create an additional model to track the specific changes in the DB.
Possibly this could also meet your requirements: https://django-simple-history.readthedocs.io/en/latest/
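For the local-repo route mentioned above, here is a rough sketch using GitPython (pip install GitPython); the repo path, remote name, and function name are assumptions:

from git import Repo

def commit_csv(csv_path, message, repo_dir='/path/to/local/clone'):
    # csv_path must point inside the working tree of repo_dir.
    repo = Repo(repo_dir)
    repo.index.add([csv_path])          # stage the freshly written CSV
    repo.index.commit(message)          # message supplied by the editing user
    repo.remote(name='origin').push()   # sync with GitHub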
This doesn't exactly answer the question, but have you thought of using something like django-simple-history?
It's a really easy-to-use Django package that tracks Django model state on every create/update/delete. It should be much easier to get going than fiddling around pushing CSVs to GitHub.

How to Prevent Overwriting an Existing File Using ftplib in Python?

I made a Python program which stores user-entered data in an SQLite database. I want to upload it to an FTP server, and I tried using ftplib in Python. The database file name is the same for every user of the program.
My problem is: if user1 uploads the file to the FTP server and user2 then uploads it too, the file is overwritten. How can I stop this?
Put simply: how can I avoid overwriting an existing file, and rename the file being uploaded on the FTP server, so that I end up with both files?
Use FTP.mlsd() from ftplib to list the directory before uploading. If the given file is already there, don't upload it.
Be careful with this: if two people are uploading at the same time, it's still possible for user A to overwrite user B.
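A minimal sketch of that check (host, credentials, and the function name are placeholders; the race window mentioned above still applies):

from ftplib import FTP

def upload_if_absent(local_path, remote_name):
    ftp = FTP('ftp.example.com')
    ftp.login('user', 'password')
    existing = [name for name, facts in ftp.mlsd()]  # list current directory
    if remote_name in existing:
        print('refusing to overwrite', remote_name)
    else:
        with open(local_path, 'rb') as f:
            ftp.storbinary('STOR ' + remote_name, f)
    ftp.quit()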
Your question lacks details (among other things: what version of Python you are using, what OS you are on, and most crucially what the naming scheme for the copied files should be).
I'll assume that each client program uses a SQLite file named "userdata.db" and that you want to make sure that on the FTP server each copy is identified by the user name.
So user Pamar will have userdata.db.pamar on the FTP server, while user Lucy will end up with userdata.db.lucy.
(I hope you have some way to be sure that no two users have the same name, btw).
The easiest solution I can think of is:
Use shutil to make a temporary (local) copy of the DB with the desired name, transfer it via FTP, then delete it.
I.e., in the case of user Pamar you'll have:
Step 1: Copy /userhome/pamar/userdata.db -> /userhome/pamar/userdata.db.pamar
Step 2: FTP transfer /userhome/pamar/userdata.db.pamar -> FTPServer
Step 3: Delete /userhome/pamar/userdata.db.pamar
It's not very elegant, and you will use some extra space on the (local) filesystem until the copy is deleted; this may be a problem if userdata.db is particularly large (but then you would probably not use SQLite in the first place).
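A rough sketch of those three steps (server, credentials, and paths are placeholders):

import os
import shutil
from ftplib import FTP

def upload_per_user(db_path, username):
    renamed = db_path + '.' + username               # Step 1: local copy
    shutil.copyfile(db_path, renamed)
    try:
        ftp = FTP('ftp.example.com')
        ftp.login('user', 'password')
        with open(renamed, 'rb') as f:               # Step 2: FTP transfer
            ftp.storbinary('STOR ' + os.path.basename(renamed), f)
        ftp.quit()
    finally:
        os.remove(renamed)                           # Step 3: delete the copy

upload_per_user('/userhome/pamar/userdata.db', 'pamar')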

Importing a CSV file into a PostgreSQL DB using Python-Django

Note: Scroll down to the Background section for useful details. Assume the project uses Python-Django and South, in the following illustration.
What's the best way to import the following CSV
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Into a PostgreSQL database with the related tables Person, Account, and AccountType considering:
Admin users can change the database model and CSV import-representation in real-time via a custom UI
The saved CSV-to-Database table/field mappings are used when regular users import CSV files
So far, two approaches have been considered:
ETL-API Approach: provide an ETL API with a spreadsheet, my CSV-to-database table/field mappings, and connection info for the target database. The API would then load the spreadsheet and populate the target database tables. Looking at pygrametl, I don't think what I'm aiming for is possible; in fact, I'm not sure any ETL APIs do this.
Row-level Insert Approach: parse the CSV-to-database table/field mappings, parse the spreadsheet, and generate SQL inserts in "join-order".
I implemented the second approach, but am struggling with algorithm defects and code complexity. Is there a Python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
Background
The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in SharePoint into databases. We're close to completing a web application that meets this need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion, transitioning to a commercial tool isn't an option. Think of the web application as a django-admin alternative (though it isn't one), with a DB modeling UI, CSV import/export functionality, customizable browsing, and modularized code to address project-specific customizations.
The implemented CSV import interface is cumbersome and buggy, so I'm trying to get feedback and find alternative approaches.
How about splitting this into two separate problems?
Create a Person class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.
Now you have two issues:
Create a Person instance from a row in the CSV.
Save a Person instance to the database.
Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, that changes the CSV-to-Person side. Now you can deal with each separately.
Does that help any?
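To make the split concrete, here is a minimal sketch using the sample CSV above (the Person class and function names are invented):

import csv

class Person(object):
    def __init__(self, first, last):
        self.first = first
        self.last = last

def csv_to_people(path):
    # CSV-to-Person: only this side changes when the CSV format changes.
    with open(path) as f:
        return [Person(first=row[0], last=row[1]) for row in csv.reader(f)]

def save_person(person):
    # Person-to-Database: swap in ORM calls here; only this side
    # changes when the admins change the schema.
    pass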
I write import sub-systems almost every month at work, and because I do that kind of task so much, I wrote django-data-importer some time ago. This importer works like a Django form and has readers for CSV, XLS and XLSX files that give you lists of dicts.
With the data_importer readers you can read a file into lists of dicts, iterate over them with a for loop, and save the lines to the DB.
With the importer you can do the same, but with the bonus of validating each field of a line, logging errors and actions, and saving everything at the end.
Please take a look at https://github.com/chronossc/django-data-importer. I'm pretty sure it will solve your problem and help you process any kind of CSV file from now on :)
To solve your problem I suggest using data-importer with Celery tasks. You upload the file and fire the import task via a simple interface. The Celery task will send the file to the importer, and you can validate lines, save them, and log errors. With some effort you can even show the progress of the task to the user who uploaded the sheet.
I ended up taking a few steps back and, per Occam's razor, addressed this problem using updatable SQL views. It meant a few sacrifices:
Removing the South.db-dependent real-time schema administration API, dynamic model loading, and dynamic ORM syncing
Defining models.py and an initial South migration by hand.
This allows for a simple approach to importing flat datasets (CSV/Excel) into a normalized database:
Define unmanaged models in models.py for each spreadsheet
Map those to updatable SQL views (INSERT/UPDATE-INSTEAD SQL RULEs) in the initial South migration, adhering to the spreadsheet field layout
Iterate through the CSV/Excel spreadsheet rows, performing an INSERT INTO <VIEW> (<COLUMNS>) VALUES (<CSV-ROW-FIELDS>); for each one
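A condensed sketch of that last step; the view and column names are illustrative, and the view plus its INSERT-INSTEAD rule are assumed to already exist from the initial South migration:

import csv
from django.db import connection

INSERT_SQL = (
    "INSERT INTO account_flat_view "
    "(first_name, last_name, account_type, account_class) "
    "VALUES (%s, %s, %s, %s)"
)

def import_csv(path):
    cursor = connection.cursor()
    with open(path) as f:
        for row in csv.reader(f):
            # The INSERT-INSTEAD rule on the view fans each flat row out
            # into the normalized Person/Account/AccountType tables.
            cursor.execute(INSERT_SQL, row)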
Here is another approach I found on GitHub. Basically it detects the schema and allows overrides. Its whole goal is just to generate raw SQL to be executed by psql or whatever driver.
https://github.com/nmccready/csv2psql
% python setup.py install
% csv2psql --schema=public --key=student_id,class_id example/enrolled.csv > enrolled.sql
% psql -f enrolled.sql
There are also a bunch of options for doing ALTERs (e.g., creating primary keys from several existing columns) and for merges/dumps.
