How to Prevent Overwriting an Existing File Using ftplib in Python?

I made a Python program which stores user-entered data in a SQLite database, and I want to upload that database to an FTP server. I tried using ftplib in Python. The database file name is the same for every user who runs the program.
My problem is this: if user1 uploads the file to the FTP server and then user2 uploads it, the first file is overwritten. How can I stop this?
Put simply: how can I stop an existing file from being overwritten and instead rename the file being uploaded on the FTP server, so that I end up with both files?

Use FTP.mlsd() from ftplib to list the remote directory before uploading. If the given file is already there, then don't upload it.
Be careful with this -- if two people are uploading at the same time, it's still possible for user A to upload and overwrite user B's file.
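A minimal sketch of that check-before-upload idea (Python 3; the host, credentials and file names are placeholders, and the server must support the MLSD command):
from ftplib import FTP

def upload_if_absent(ftp, local_path, remote_name):
    existing = {name for name, facts in ftp.mlsd()}   # names in the current remote directory
    if remote_name in existing:
        return False                                  # already there, skip the upload
    with open(local_path, 'rb') as f:
        ftp.storbinary('STOR ' + remote_name, f)
    return True

ftp = FTP('ftp.example.com')      # placeholder host
ftp.login('user', 'password')     # placeholder credentials
upload_if_absent(ftp, 'userdata.db', 'userdata.db')
ftp.quit()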

Your question lacks details (among other things: what version of Python you are using, what OS you are on, and most crucially what the naming scheme for the copied files should be).
I'll assume that each client program uses a SQLite file named "userdata.db" and that you want to make sure that on the FTP server each copy is identified by the user name.
So User:Pamar will have userdata.db.pamar on the FTP server, while User:Lucy will end up with userdata.db.lucy.
(I hope you have some way to be sure that no two users have the same name, btw).
The easiest solution I can think of is:
Use shutil to make a temporary (local) copy of the db with the desired name, transfer it by FTP, then delete it.
I.e., in the case of user Pamar you'll have:
Step 1: Copy /userhome/pamar/userdata.db -> /userhome/pamar/userdata.db.pamar
Step 2: FTP transfer /userhome/pamar/userdata.db.pamar -> FTPServer
Step 3: Delete /userhome/pamar/userdata.db.pamar
It's not very elegant, and you will use some extra space on the local filesystem until the transfer is completed, which may be a problem if userdata.db is particularly large (but then you would probably not be using SQLite in the first place).
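A rough sketch of those three steps (the paths, host name and credentials below are placeholders, not anything from the question):
import os
import shutil
from ftplib import FTP

def upload_per_user_copy(db_path, username, host, ftp_user, ftp_password):
    temp_path = db_path + '.' + username              # e.g. userdata.db.pamar
    shutil.copyfile(db_path, temp_path)               # Step 1: local copy with a unique name
    try:
        ftp = FTP(host)
        ftp.login(ftp_user, ftp_password)
        with open(temp_path, 'rb') as f:
            ftp.storbinary('STOR ' + os.path.basename(temp_path), f)   # Step 2: transfer
        ftp.quit()
    finally:
        os.remove(temp_path)                          # Step 3: delete the temporary copy

upload_per_user_copy('/userhome/pamar/userdata.db', 'pamar',
                     'ftp.example.com', 'ftpuser', 'secret')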

Related

dump CSV file from Django query to Github

We want to automate a process through django admin where, whenever a user makes a change to a record (or adds/deletes a record), a CSV file is created and then dumped into a Github repository with a commit message specified by the person who made the change.
Creating the csv file from a queryset is easy enough... But how would we go about then getting that csv file to a folder that is git initialized so that we can commit it to a repository?
Any ideas would be great. Essentially we're looking for a way of tracking specific changes to the database. With CSV files in github, we can really easily follow the changes, and we want to leverage that.
cheers
If you can create your CSV files, the next step would be to talk to GitHub via its API or to keep a local clone of the git repo which is synced after each file creation.
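If you go the local-clone route, the sync step could look roughly like this (the repository path is an assumption, the CSV is assumed to already be written inside it, and git must already be configured to push to the GitHub remote):
import subprocess

REPO_DIR = '/srv/csv-exports'   # hypothetical local clone of the GitHub repo

def commit_csv(csv_filename, commit_message):
    subprocess.check_call(['git', 'add', csv_filename], cwd=REPO_DIR)
    subprocess.check_call(['git', 'commit', '-m', commit_message], cwd=REPO_DIR)
    subprocess.check_call(['git', 'push', 'origin', 'master'], cwd=REPO_DIR)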
But if I may ask, why do you want to do this with CSV files in a GitHub repo? My first response to a requirement like that would be to log changes with the Python logging infrastructure or to create an additional model to track the specific changes in the db.
Perhaps this could also meet your requirements: https://django-simple-history.readthedocs.io/en/latest/
This doesn't exactly answer the question, but have you thought of using something like django-simple-history?
It's a really easy-to-use Django package that tracks Django model state on every create/update/delete. It should be much easier to get going with than fiddling around pushing CSVs to GitHub.
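A minimal sketch of how it is wired up (the model name and field are just examples; 'simple_history' also has to be added to INSTALLED_APPS):
from django.db import models
from simple_history.models import HistoricalRecords

class Record(models.Model):           # example model name
    name = models.CharField(max_length=100)
    history = HistoricalRecords()     # every create/update/delete gets a history row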

Deal with generated files for multiple users simultaneously

I am developing a website using Django 1.8 and Python 3.4 in an Anaconda environment. Basically, a file and a plot are generated and displayed on the web page after a user inputs parameters and submits them. Then the user can download the file and image.
My approach to handling the file storage is to store them as static files named "result.csv" and "plot.png". The files are completely different depending on the user's request, but if more than one user requests something simultaneously, the system only keeps one file under each name. This is very dangerous.
I have no idea how to deal with this situation. Could anyone give me a suggestion or point me in a direction? Thank you very much.
There are several ways to accomplish this. The first ones that spring to mind, assuming you want to keep one set of results per user (i.e. the last generated), are:
1- Create unique names based on the user-id. This allows you to access the files without first consulting the user data in the DB.
It also has the advantage of not having to delete previous versions of the files.
2- Create unique filenames with the uuid library module
import uuid
user_filename = str(uuid.uuid4())
csv_filename = user_filename + '.csv'
png_filename = user_filename + '.png'
and store the user_filename in the DB user record for later access.
3- Do the same but using a timestamp with enough resolution
Alternatively you can create a subdirectory with a unique name and store the files inside it with a static name for both.
Options 2 and 3 require you to remove previous versions of the files when generating new ones for the same user.
As mentioned by #Wtower, store the files in the MEDIA directory, maybe under a further subdirectory.
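For option 1, a small sketch of user-id based names kept under a MEDIA subdirectory (the 'results' subdirectory name and the file-name pattern are assumptions):
import os
from django.conf import settings

def result_paths(user_id):
    results_dir = os.path.join(settings.MEDIA_ROOT, 'results')   # assumed subdirectory
    os.makedirs(results_dir, exist_ok=True)
    csv_path = os.path.join(results_dir, 'result_%s.csv' % user_id)
    png_path = os.path.join(results_dir, 'plot_%s.png' % user_id)
    return csv_path, png_path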

Import csv data to web2py database and process uploads

I've made a really simple single-user database application with web2py to be deployed to a desktop machine. The reason I chose web2py is its simplicity and its non-intrusive web server.
My problem is that I need to migrate an existing database from another application, which I've preprocessed and prepared into a CSV file that can now be imported perfectly into web2py's SQLite database.
Now, I have a problem with an 'upload' field in one of the tables, which corresponds to a small image. I've put that field into the CSV as the name of the corresponding .jpg file that I extracted from the original database. The problem is that I have not figured out how to insert these images correctly into the uploads folder, as the web2py engine automatically changes the filename of users' uploads to a safe format, and copying my files straight into the folder does not work.
My question is, does anyone know a proper way to include this image collection in the uploads folder? I don't know if there is a way to disable this protection or if I have to manually change their names to a valid hash. I've also considered the idea of coding an automatic insert process into the database...
Thanks, everyone, for your attention!
EDIT (a working example):
An example database:
db.define_table('product',
    Field('name'),
    Field('color'),
    Field('picture', 'upload'),
)
Then using the default appadmin module from my application I import a csv file with entries of the form:
product.name,product.color,product.picture
"p1","red","p1.jpg"
"p2","blue","p2.jpg"
Then in my application I have the usual download function:
def download():
    return response.download(request, db)
which I call to request the images uploaded to the database, for example to include them in a view:
<img src="{{=URL('download', args=product.picture)}}" />
So my problem is that I have all the images corresponding to the database records, and I need to import them into my application by properly including them in the uploads folder.
If you want the files to be named via the standard web2py file upload mechanism (which is a good idea for security reasons) and easily downloaded via the built-in response.download() method, then you can do something like the following.
In /yourapp/controllers/default.py:
def copy_files():
    import os
    for row in db().select(db.product.id, db.product.picture):
        picture = open(os.path.join(request.folder, 'private', row.picture), 'rb')
        row.update_record(picture=db.product.picture.store(picture, row.picture))
    return 'Files copied'
Then place all the files in the /yourapp/private directory and go to the URL /default/copy_files (you only need to do this once). This will copy each file into the /uploads directory and rename it, storing the new name in the db.product.picture field.
Note, the above function doesn't have to be a controller action (though if you do it that way, you should remove the function when finished). Instead, it could be a script that you run via the web2py command line (needs to be run in the app environment to have access to the database connection and model, as well as reference to the proper /uploads folder) -- in that case, you would need to call db.commit() at the end (this is not necessary during HTTP requests).
Alternatively, you can leave things as they are and instead (a) manage uploads and downloads manually instead of relying on web2py's built-in mechanisms, or (b) create custom_store and custom_retrieve functions (unfortunately, I don't think these are well documented) for the picture field, which will bypass web2py's built-in store and retrieve functions. Life will probably be easier, though, if you just go through the one-time process described above.

What do I need to consider when scaling an application that stores files in the filesystem?

I am interested in making an app where users can upload large files (~2MB) that are converted into HTML documents. This application will not have a database. Instead, these HTML files are stored in a particular writable directory outside of the document source tree, so this directory will grow larger and larger as more files are added to it. Users should be able to view these HTML files by visiting the appropriate URL. All security concerns aside, what do I need to worry about if this directory continues to grow? Will accessing the files inside take longer when there are more of them? Could it potentially crash because of this? Should I create a new directory every 100 files or so to prevent this?
If it is important, I want to make this app using Pyramid and Python.
You might want to partition the directories by user, app or similar so that it's easy to manage anyway - like if a user stops using the service you could just delete their directory. Also I presume you'll be zipping them up. If you keep it well decoupled then you'll be able to change your mind later.
I'd be interested to see how using something like SQLite would work for you, as you could have a sqlite db per partitioned directory.
I presume the HTML files are larger than the files that were uploaded, so why store the big HTML file?
Are things like MongoDB etc. out of the question? As your app scales to multiple servers, you have the issue of accessing files that live on a different server, unless you pick the right server in the first place using some technique. Then it's possible you've got servers sitting idle because no one wants their documents.
Why the limitation of just storing files in a directory, is it a POC?
EDIT
I find value in reading things like http://blog.fogcreek.com/the-trello-tech-stack/ and I'd advise you to find a site already doing what you do and read about their tech stack.
As someone already commented, why not use Amazon S3 or similar?
Ask yourself realistically how many users you imagine, and whether you really want to spend a lot of energy worrying about being the next Facebook and building the ultimate backend tech stack when you could get your stuff out there being used.
Years ago I worked on a system that stored insurance certificates on the filesystem; we used to run out of inodes!
Dare I say it's a case of suck it and see what works for you and your app.
EDIT
HAProxy, I believe, is meant to handle all those load-balancing concerns.
As a user, I imagine I would want to go to http://docs.yourdomain.com/myname/document.doc,
although I presume there are security concerns with it being such an obvious name.
This greatly depends on your filesystem. You might want to look up which problems the git folks encountered (they also use a purely filesystem-based database).
In general, it will be wise to split that directory up, for example by taking the first two or three letters of the file name (or a hash of those) and grouping the files into subdirectories based on that key. You'd have a structure like:
uploaddir/
    00/
        files whose name's SHA-1 starts with 00
    01/
        files whose name's SHA-1 starts with 01
and so on. This takes some load off the filesystem by partitioning the possibly large directories. If you want to be sure that no user can perform a denial-of-service attack by deliberately uploading files whose names hash to the same initial characters, you can also seed the hash differently, salt it, or something like that.
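A sketch of that partitioning in Python (the UPLOAD_DIR path and the salt value are assumptions):
import hashlib
import os

UPLOAD_DIR = 'uploaddir'
SALT = 'change-me'          # seed the hash so users cannot deliberately target one bucket

def partitioned_path(filename):
    digest = hashlib.sha1((SALT + filename).encode('utf-8')).hexdigest()
    subdir = os.path.join(UPLOAD_DIR, digest[:2])    # e.g. uploaddir/3f/
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, filename)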
Specifically, the effects of large directories are pretty file-system specific. Some might become slow, some may cope really well, others may have per-directory limits for files.

Memory usage of file versus database for simple data storage

I'm writing the server for a Javascript app that has a syncing feature. Files and directories being created and modified by the client need to be synced to the server (the same changes made on the client need to be made on the server, including deletes).
Since every file is on the server, I'm debating the need for a MySQL database entry corresponding to each file. The following information needs to be kept on each file/directory for every user:
Whether it was deleted or not (since deletes need to be synced to other clients)
The timestamp of when every file was last modified (so I know whether the file needs updating by the client or not)
I could keep both of those pieces of information in files (e.g. .deleted file and .modified file in every user's directory containing file paths + timestamps in the latter) or in the database.
However, I also have to fit under an 80 MB memory constraint. Between file storage and database storage, which would be more memory-efficient for this purpose?
Edit: Files have to be stored on the filesystem (not in a database), and users have a quota for the storage space they can use.
Probably the filesystem variant will be more memory-efficient as long as the number of files is low, but that solution probably won't scale. Databases are optimized to do exactly this. Searching the filesystem, opening the file, and searching the document will be expensive as the number of files and requests increases.
But nobody says you have to use MySQL. A NoSQL database like Redis, or maybe something like CouchDB (where you could keep the file itself and include versioning), might be a more attractive solution.
Here is a quick comparison of NoSQL databases.
And a longer comparison.
Edit: From your comments, I would build it as follows: create an API abstracting the backend for all the operations you want to do. Then implement the backend part with the 2 or 3 operations that happen most often, or could be most expensive, for the filesystem and for a database (or two). Test and benchmark.
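As one concrete shape the database side of such a backend could take, here is a sketch that keeps the two pieces of metadata from the question (deleted flag, last-modified timestamp) in Redis; the key layout and function names are assumptions, using the redis-py client:
import time
import redis

r = redis.Redis()    # assumes a local Redis server

def mark_modified(user_id, path):
    # one hash per user mapping file path -> last-modified timestamp
    r.hset('files:%s:modified' % user_id, path, time.time())

def mark_deleted(user_id, path):
    r.hset('files:%s:deleted' % user_id, path, 1)

def needs_update(user_id, path, client_timestamp):
    ts = r.hget('files:%s:modified' % user_id, path)
    return ts is not None and float(ts) > client_timestamp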
I'd go for one of the NoSQL databases. You can store file contents and provide some key function based on users' IDs in order to retrieve those contents when you need them. Redis or Cassandra can be good choices for this case. There are many libs for using these databases in Python as well as in many other languages.
In my opinion, the only real way to be sure is to build a test system and compare the space requirements. It shouldn't take that long to generate some random data programmatically. One might think the file system would be more efficient, but databases can and might compress the data, deduplicate it, or whatever. Don't forget that a database would also make it easier to implement new features, perhaps access control.
