I am developing a website using Django 1.8 and Python 3.4 in an Anaconda environment. Basically, after a user inputs parameters and submits them, a file and a plot are generated and displayed on the web page. The user can then download the file and the image.
My approach to file storage is to save them as static files named "result.csv" and "plot.png". The files are completely different depending on the user's request, but if more than one user makes a request simultaneously, the system only saves one file under each name. This is very dangerous.
I have no idea how to deal with this situation. Could anyone give me some suggestions or point me in a direction? Thank you very much.
There are several ways to accomplish this. The first ones that spring to mind, assuming you want to keep one set of results per user (i.e. the last generated), are:
1- Create unique names based on the user ID. This allows you to access the files without first consulting the user data in the DB.
It also has the advantage that you don't have to delete previous versions of the files.
2- Create unique filenames with the uuid library module
import uuid

user_filename = str(uuid.uuid4())  # uuid4() returns a UUID object; convert before concatenating
csv_filename = user_filename + '.csv'
png_filename = user_filename + '.png'
and store the user_filename in the DB user record for later access.
3- Do the same, but using a timestamp with enough resolution.
Alternatively, you can create a subdirectory with a unique name and store both files inside it under static names.
Options 2 and 3 require you to remove the previous versions of the files when generating new ones for the same user.
As mentioned by @Wtower, store the files in the MEDIA directory, maybe under a further subdirectory.
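A minimal sketch of option 1, assuming a numeric user ID (the helper name and MEDIA path are illustrative, not from the question):

```python
import os

def result_paths(user_id, media_root='/var/www/media'):
    """Build per-user file paths so simultaneous users never overwrite each other."""
    base = 'result_{}'.format(user_id)  # one fixed name per user
    return (os.path.join(media_root, base + '.csv'),
            os.path.join(media_root, base + '.png'))

csv_path, png_path = result_paths(42)
```

Because the name is derived from the user ID, regenerating results for the same user simply overwrites that user's previous files, with no cleanup step needed.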
Hello, I would like to make an app that allows the user to import data from a source of his choice (Airtable, XLS, CSV, JSON) and export it to JSON, which will be pushed to an SQLite database using an API.
The "core" functionality of the app is that it lets the user create a "template" and "map" the source columns onto the destination columns. Which source column(s) go to which destination column is up to the user. I am attaching two photos here (taken from Airtable/Zapier) so you can get a better idea of the end result:
[screenshot: adding fields inside fields - Airtable]
[screenshot: adding fields inside fields - Zapier]
I would like to know if you can recommend a library or an approach for this problem. I have looked for Python and Node.js libraries, but I am torn between using ETL libraries, mapping/zipping features, or coding my own classes. Do you know any libraries that do the same thing as Airtable/Zapier? Any suggestions?
Saving files in the database is really bad practice, since it takes up a lot of database storage space and adds latency to the communication.
I strongly recommend saving the file on disk and storing its path in the database.
I am new to Django and I have been scouting the site to find good ways to store images generated using Python (there seem to be contradicting views on saving them to a local folder, a file, or a database). My website's sole purpose is to make QR codes all the time. On my local machine I would save one in my program folder like this:
import pyqrcode

qr = pyqrcode.create("{'name': 'myqr'}")
qr.png("horn.png", scale=6)
print("All done!")
I am running into a situation where I will be dealing with many users doing the same thing. I doubt saving to a local folder would be a viable option. I am set on saving these images as BLOBs in MySQL.
Has anyone done this sort of thing? If so, what was the best way to implement it? Example code would also be helpful.
Storing images in the database is somewhat of an edge case. To relieve the database, I would not store them there, but in a folder on your server.
I understand your question to mean that some users might create the same QR codes. In that case, I would create a database table like this:
CREATE TABLE qrcodes (
    value TEXT PRIMARY KEY,
    fname TEXT
);
With this table you can find the QR codes by the encoded value. If an entry exists for a given value, then you can just return the contents of the file whose name is stored.
That leaves one question: how to create the file names. There are many possibilities. One would be to create a UUID and make a filename from it. Another option would be to use a global counter and give every file a new number; the counter must be stored in a database table, of course.
Of course you can still store the image in the database if you want. Just don't use the fname field; use a BLOB field that stores the content instead. That solution should work on most databases, but is likely slower than the file approach when you have really large amounts of data.
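A sketch of the lookup-or-create flow described above, using SQLite and a UUID-based file name (the function name is illustrative, and the QR generation is passed in as a callback so the sketch stays library-agnostic):

```python
import sqlite3
import uuid

def get_or_create_qrcode(conn, value, generate_png):
    """Return the file name for `value`, generating and recording it only once."""
    row = conn.execute('SELECT fname FROM qrcodes WHERE value = ?',
                       (value,)).fetchone()
    if row:
        return row[0]  # already generated: reuse the existing file
    fname = str(uuid.uuid4()) + '.png'
    generate_png(value, fname)  # e.g. pyqrcode.create(value).png(fname, scale=6)
    conn.execute('INSERT INTO qrcodes (value, fname) VALUES (?, ?)',
                 (value, fname))
    conn.commit()
    return fname

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE qrcodes (value TEXT PRIMARY KEY, fname TEXT)')
```

Repeated requests for the same value hit the table and return the same file name, so each QR code is only rendered once.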
I've made a really simple single-user database application with web2py, to be deployed to a desktop machine. I chose web2py because of its simplicity and its unintrusive web server.
My problem is that I need to migrate an existing database from another application. I've preprocessed and prepared it into a CSV file that can now be imported perfectly into web2py's SQLite database.
Now, I have a problem with an 'upload' field in one of the tables, which corresponds to a small image. I've put into that CSV field the name of the corresponding .jpg file that I extracted from the original database. The problem is that I haven't figured out how to insert these files correctly into the uploads folder: the web2py engine automatically renames users' uploads to a safe format, and copying my files straight into the folder does not work.
My question is: does anyone know a proper way to include this image collection in the uploads folder? I don't know if there is a way to disable this protection, or if I have to manually rename the files to a valid hash. I've also considered coding an automatic insert process into the database...
Thanks all for your attention!
EDIT (a working example):
An example database:
db.define_table('product',
    Field('name'),
    Field('color'),
    Field('picture', 'upload'),
)
Then using the default appadmin module from my application I import a csv file with entries of the form:
product.name,product.color,product.picture
"p1","red","p1.jpg"
"p2","blue","p2.jpg"
Then in my application I have the usual download function:
def download():
    return response.download(request, db)
which I call when requesting the images stored in the database, for example to include them in a view:
<img src="{{=URL('download', args=product.picture)}}" />
So my problem is that I have all the images corresponding to the database records, and I need to import them into my application by properly including them in the uploads folder.
If you want the files to be named via the standard web2py file upload mechanism (which is a good idea for security reasons) and easily downloaded via the built-in response.download() method, then you can do something like the following.
In /yourapp/controllers/default.py:
def copy_files():
    import os
    for row in db().select(db.product.id, db.product.picture):
        picture = open(os.path.join(request.folder, 'private', row.picture), 'rb')
        row.update_record(picture=db.product.picture.store(picture, row.picture))
    return 'Files copied'
Then place all the files in the /yourapp/private directory and go to the URL /default/copy_files (you only need to do this once). This will copy each file into the /uploads directory and rename it, storing the new name in the db.product.picture field.
Note, the above function doesn't have to be a controller action (though if you do it that way, you should remove the function when finished). Instead, it could be a script that you run via the web2py command line (needs to be run in the app environment to have access to the database connection and model, as well as reference to the proper /uploads folder) -- in that case, you would need to call db.commit() at the end (this is not necessary during HTTP requests).
Alternatively, you can leave things as they are and instead (a) manage uploads and downloads manually instead of relying on web2py's built-in mechanisms, or (b) create custom_store and custom_retrieve functions (unfortunately, I don't think these are well documented) for the picture field, which will bypass web2py's built-in store and retrieve functions. Life will probably be easier, though, if you just go through the one-time process described above.
I have the names of files, with their folders, in a list. The list contains 2000 file names like this:
Countries/US/newyork/file1.pdf
Countries/Australia/Sydney/file1.pdf
Countries/Canada/Toronto/bla/blabla/file2.pdf
and so on.
I want to index those files in a database so that I can have a hierarchical directory structure.
In my Django app I want to display the root-level menus first, like:
countries --- US, Australia, Canada
Then if someone clicks on a country, it fetches the second level of folders, and so on; at the end I want to see files when there are no more folders.
Rather than querying my storage every time, I want to store all that info in my database so that my web pages are served from the DB, and when a user clicks download I fetch the file from my storage.
I am not able to figure out how I should design the model or database table for that.
I suggest the following way:
Create models to store your tree structure and files, for example:
class Node(TreeModel):
    parent  # foreign key to Node

class File(Model):
    node  # foreign key to Node
    name  # name of file
    path  # path to the file on disk, for example
After that, move your files into one or a few directories (read this: How many files can I put in a directory?).
You can also rename them (for example by using a hash of the file).
Update the File model to store the new paths to your files.
Having done this, you are able to easily show files, build paths to files, etc.
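Independent of the ORM, the core step is turning the flat path list into a nested tree; a minimal pure-Python sketch of that transformation (the helper name and '_files' key are illustrative):

```python
def build_tree(paths):
    """Nest 'a/b/c.pdf'-style paths into dicts of folders; files go in '_files'."""
    root = {}
    for path in paths:
        *folders, filename = path.split('/')
        node = root
        for folder in folders:
            node = node.setdefault(folder, {})  # descend, creating folders as needed
        node.setdefault('_files', []).append(filename)
    return root

tree = build_tree([
    'Countries/US/newyork/file1.pdf',
    'Countries/Australia/Sydney/file1.pdf',
])
```

The same walk can populate Node/File rows instead of dicts: create or fetch a Node per folder segment (with the previous segment as its parent), then attach a File row to the last Node.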
For the model Node, use django-mptt (there are other solutions for Django; google it) to get an efficient API for managing a tree-like model.
You can also create your own Django storage backend (or find one; there are many solutions on the Internet).
https://github.com/django-mptt/
https://docs.djangoproject.com/en/1.5/howto/custom-file-storage/
Updated
You can add new files by using the Django admin. You could use the Amazon S3 Django storage backend: http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html.
Change:
class File(Model):
    node  # foreign key to Node
    name  # name of file
    file  # django models.FileField
In this case you don't have to update the index.
I am interested in making an app where users can upload large files (~2MB) that are converted into HTML documents. This application will not have a database. Instead, the HTML files are stored in a particular writable directory outside of the document source tree, so this directory will grow larger and larger as more files are added to it. Users should be able to view these HTML files by visiting the appropriate URL. All security concerns aside, what do I need to worry about if this directory keeps growing? Will accessing the files inside take longer when there are more of them? Could it potentially crash because of this? Should I create a new directory every 100 files or so to prevent this?
If it matters, I want to make this app using Pyramid and Python.
You might want to partition the directories by user, app, or similar so that they're easy to manage anyway; for example, if a user stops using the service you could just delete their directory. Also, I presume you'll be zipping the files up. If you keep it well decoupled, you'll be able to change your mind later.
I'd be interested to see how something like SQLite would work for you, as you could have one SQLite DB per partitioned directory.
I presume the HTML files are larger than the files users uploaded, so why store the big HTML file at all?
Are things like MongoDB out of the question? As your app scales to multiple servers, you have the issue of accessing files held on a different server, unless you pick the right server in the first place using some technique. Then it's possible you've got servers sitting idle because no one wants their documents.
Why the limitation of just storing files in a directory? Is it a proof of concept?
EDIT
I find value in reading things like http://blog.fogcreek.com/the-trello-tech-stack/ and I'd advise you to find a site already doing what you do and read about their tech stack.
As someone already commented, why not use Amazon S3 or similar?
Ask yourself realistically how many users you imagine, and whether you really want to spend a lot of energy worrying about being the next Facebook and building the ultimate backend tech stack, when you could get your stuff out there being used.
Years ago I worked on a system that stored insurance certificates on the filesystem; we used to run out of inodes!
Dare I say it's a case of suck it and see what works for you and your app.
EDIT
HAProxy, I believe, is meant to handle those load-balancing concerns.
As a user, I imagine I'd want a URL like http://docs.yourdomain.com/myname/document.doc, although I presume there are security concerns with such an obvious name.
This greatly depends on your filesystem. You might want to look up which problems the Git folks encountered (Git also uses a purely filesystem-based database).
In general, it is wise to split that directory up, for example by taking the first two or three letters of the file name (or a hash of it) and grouping the files into subdirectories based on that key. You'd have a structure like:
uploaddir/
    00/
        files whose name's sha1 starts with 00
    01/
        files whose name's sha1 starts with 01
and so on. This takes some load off the filesystem by partitioning the possibly large directories. If you want to be sure that no user can mount a denial-of-service attack by deliberately uploading files whose names hash to the same initial characters, you can seed or salt the hash, or something along those lines.
Specifically, the effects of large directories are quite filesystem-specific. Some become slow, some cope really well, and others have per-directory limits on the number of files.
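The sharding scheme above can be sketched in a few lines (the directory layout and helper name are illustrative):

```python
import hashlib
import os

def shard_path(upload_dir, filename):
    """Place a file in a subdirectory named after the first two hex
    characters of the SHA-1 of its name, as in uploaddir/00/, uploaddir/01/..."""
    digest = hashlib.sha1(filename.encode('utf-8')).hexdigest()
    return os.path.join(upload_dir, digest[:2], filename)
```

With two hex characters there are 256 buckets, so even a million files averages under 4000 entries per directory; salting the hashed name would guard against an attacker deliberately filling one bucket.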