Callback for upload progress - python

I am using Dropbox client for Python (actually a Python 3 version, but I don't think it matters now) to upload some files to my Dropbox. I am also using PyQt4 to have a GUI for this.
Is it possible to specify a callback that is called while the file is uploading, so I can show the user the upload progress?

You mean, you want to show progress while the file is uploading (on a progressbar or something)?
You probably need get_chunked_uploader()
From API Docs:
DESCRIPTION: Uploads large files to Dropbox in multiple chunks. Also has the ability to resume if the upload is interrupted. This allows for uploads larger than the /files_put maximum of 150 MB.
Typical usage:
1) Send a PUT request to /chunked_upload with the first chunk of the file without setting upload_id, and receive an upload_id in return.
2) Repeatedly PUT subsequent chunks using the upload_id to identify the upload in progress and an offset representing the number of bytes transferred so far.
3) After each chunk has been uploaded, the server returns a new offset representing the total amount transferred.
...
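For reference, here is a rough sketch of what a progress-reporting chunked upload could look like. get_chunked_uploader() drives this loop for you but, as far as I know, doesn't take a progress callback, so one common workaround is to call upload_chunk yourself. The exact method names and signatures (upload_chunk, commit_chunked_upload, the '/auto' path root) are from the legacy v1 Python SDK and vary between SDK versions, so treat this as an outline rather than a drop-in solution:

import os
from dropbox.client import DropboxClient  # legacy v1 SDK; newer SDKs differ

def upload_with_progress(client, local_path, remote_path, progress_callback,
                         chunk_size=1024 * 1024):
    """Upload local_path chunk by chunk and report progress after each chunk."""
    size = os.path.getsize(local_path)
    upload_id = None
    offset = 0
    with open(local_path, 'rb') as f:
        while offset < size:
            chunk = f.read(chunk_size)
            # upload_chunk returns the new offset and the upload_id to reuse
            offset, upload_id = client.upload_chunk(chunk, size, offset, upload_id)
            progress_callback(offset, size)  # e.g. update a progress bar here
    # Commit the uploaded chunks into a real file in the user's Dropbox.
    client.commit_chunked_upload('/auto' + remote_path, upload_id, overwrite=True)

From the PyQt4 side you would typically run this in a QThread and emit a signal from progress_callback, so the progress bar is updated on the GUI thread rather than from the worker.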

Related

Verify image in Google Cloud Storage Bucket

I have a system that creates an uploadable link for Google Cloud Storage Bucket uploads. The user then uploads the file directly there from the frontend.
Is there a way to verify this image file there without downloading it to a backend app and verifying it there (e.g. using PIL for Python)?
Verification for:
is it an image at all;
is it fully uploaded;
is it not broken;
etc.
P.S. Is there anything similar for PDF?
Cloud Storage doesn't offer direct support for any particular format, be it JPEG or PDF or anything else. To fully validate what's in a file, you need to download it and check.
You can, however, get part of the way there.
First, you can have your client validate the file, then capture the size and/or a checksum (either MD5 or CRC32c) of the original and specify them as part of the upload, which ensures the bytes arrive exactly as intended. If your server knows the intended file size or checksum, it can also ask Cloud Storage for just the metadata of the object, without downloading it, and verify that it matches.
Second, many files, including JPEG, have particular headers or footers that describe their contents. Instead of downloading what is potentially a very large image, you could download only the first few bytes from Cloud Storage. If the first two bytes aren't 0xFF and 0xD8, then it's not a JPEG file. Similar magic numbers exist for many other formats.
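As a sketch of both ideas, assuming the google-cloud-storage Python client and bucket/object names of your own; the JPEG check only looks at the first two bytes, so it is a sanity check, not full validation:

from google.cloud import storage

def quick_check(bucket_name, blob_name, expected_size=None, expected_md5=None):
    client = storage.Client()
    blob = client.bucket(bucket_name).get_blob(blob_name)
    if blob is None:
        return False  # object does not exist (upload never finished, or wrong name)

    # Metadata-only checks: no download needed.
    if expected_size is not None and blob.size != expected_size:
        return False
    if expected_md5 is not None and blob.md5_hash != expected_md5:
        return False  # note: md5_hash is base64-encoded, not hex

    # Magic-number check: download only the first two bytes.
    head = blob.download_as_bytes(start=0, end=1)
    return head == b'\xff\xd8'  # JPEG SOI marker; PDFs start with b'%PDF' instead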

How to speed up Flask response download speed

My frontend web app is calling my python Flask API on an endpoint that is cached and returns a JSON that is about 80,000 lines long and 1.7 megabytes.
It takes my UI about 7.5 seconds to download all of it.
It takes Chrome about 6.5 seconds when calling the endpoint directly.
I know that I can split up this endpoint for performance gains, but out of curiosity, what are some other great options to improve the download speed of all this content?
Options I can think of so far:
1) Compress the content, but then I would have to decompress it on the frontend.
2) Use something like gRPC.
Further info:
My Flask server is using WSGIServer from gevent, and the endpoint code is below. PROJECT_DATA_CACHE is the already-jsonified data that is returned:
@blueprint_2.route("/projects")
def getInitialProjectsData():
    global PROJECT_DATA_CACHE
    if PROJECT_DATA_CACHE:
        return PROJECT_DATA_CACHE
    else:
        LOGGER.debug('No cache available for GET /projects')
        updateProjectsCache()
        return PROJECT_DATA_CACHE
Maybe you could stream the file? I cannot see any way to transfer a file 80,000 lines long without some kind of download or wait.
This would be an opportunity to compress and decompress it, like you suggested. Definitely make sure that the JSON is minified.
One way to minify a JSON: https://www.npmjs.com/package/json-minify
Streaming a file:
https://blog.al4.co.nz/2016/01/streaming-json-with-flask/
It also really depends on the project; maybe you could have the users download it completely?
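If you go the compression route suggested above, you usually don't have to decompress anything by hand on the frontend: if the server sends Content-Encoding: gzip, the browser decompresses it transparently. A minimal sketch, assuming the third-party flask-compress extension:

from flask import Flask
from flask_compress import Compress  # pip install flask-compress

app = Flask(__name__)
Compress(app)  # gzips JSON/text responses when the client sends Accept-Encoding: gzip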
The best way to do this is to break your JSON into chunks and stream it by passing a generator to the Response. You can then render the data as you receive it or show a progress bar displaying the percentage that is done. I have an example of how to stream data as a file is being downloaded from AWS S3 here. That should point you in the right direction.
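A minimal sketch of the generator approach, assuming (as in the question's code) that PROJECT_DATA_CACHE is an already-serialized JSON string; the route name here is illustrative:

from flask import Response

@blueprint_2.route("/projects-stream")
def getProjectsStream():
    def generate():
        # Yield the cached JSON in pieces so the client can start
        # receiving (and rendering) before the whole body has been sent.
        data = PROJECT_DATA_CACHE
        chunk_size = 64 * 1024
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]

    return Response(generate(), mimetype='application/json')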

How to read a part of an Amazon S3 key, assuming that "multipart upload complete" is yet to happen for that key?

I'm working on AWS S3 multipart upload, and I am facing the following issue.
Basically, I am uploading a file chunk by chunk to S3, and if any write happens to the file locally during that time, I would like to reflect that change in the S3 object that is currently being uploaded.
Here is the procedure that I am following:
Initiate the multipart upload operation.
Upload the parts one by one [5 MB chunk size], but do not complete the operation yet.
If a write goes to the file during this time [assuming I have the details of the write: offset, no_bytes_written],
I calculate the part number for the write that happened locally, and read that chunk from the S3 object being uploaded.
Read the same chunk from the local file and apply it to the part read from S3.
Upload that part to the S3 object again.
This will be an asynchronous operation; I will complete the multipart operation at the end.
The issue I am facing is reading an uploaded part while the multipart upload is still in progress. Is there any API available for this?
Any help would be greatly appreciated.
There is no API in S3 to retrieve a part of a multi-part upload. You can list the parts but I don't believe there is any way to retrieve an individual part once it has been uploaded.
You can re-upload a part. S3 will just throw away the previous part and use the new one in its place. So, if you had the old and new versions of the file locally and were keeping track of the parts yourself, I suppose you could, in theory, replace individual parts that had been modified after the multipart upload was initiated. However, it seems to me that this would be a very complicated and error-prone process. What if the change made to a file was to add several MB of data to it? Wouldn't that change your part boundaries? Would that potentially affect other parts as well?
I'm not saying it can't be done but I am saying it seems complicated and would require you to do a lot of bookkeeping on the client side.
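For completeness, re-uploading a single part with boto3 looks roughly like this; S3 simply keeps the ETag of the latest upload for that part number, and the part boundaries (and the 5 MB minimum for all but the last part) remain entirely your responsibility:

import boto3

s3 = boto3.client('s3')

def replace_part(bucket, key, upload_id, part_number, data):
    """Re-upload one part of an in-progress multipart upload.

    S3 discards the previously uploaded part with this part number;
    keep the returned ETag for the final complete_multipart_upload call.
    """
    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                          PartNumber=part_number, Body=data)
    return {'PartNumber': part_number, 'ETag': resp['ETag']}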

Big file stored in GCS is only served partially when using blob key

I am writing zip files into Google Cloud Storage using the GCS client library. Then I retrieve the blob key using the create_gs_key() function. Immediately after creating the file, I try to download it with a second HTTP request, writing the blob key obtained in the previous call to the X-AppEngine-BlobKey response header.
When the file is relatively big, usually about 30 MB or more, the first try sometimes results in an incomplete file, a few MB smaller than the target size. If I wait a little, the next try is usually fine.
I had the same problem when I tried to write files into the blobstore using the API that is now deprecated.
Is it guaranteed that when you close the file it is already available for serving?
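For context, the write-then-serve pattern described above looks roughly like this with the App Engine cloudstorage client and webapp2; the bucket path, handler, and helper names here just follow the question's description and are not tested code:

import cloudstorage as gcs
import webapp2
from google.appengine.ext import blobstore

def write_zip(filename, data):
    # filename is like '/my-bucket/archive.zip'
    with gcs.open(filename, 'w', content_type='application/zip') as f:
        f.write(data)
    # Blob key that lets App Engine serve the GCS object directly.
    return blobstore.create_gs_key('/gs' + filename)

class DownloadHandler(webapp2.RequestHandler):
    def get(self, blob_key):
        # App Engine streams the blob's contents as the response body.
        self.response.headers['X-AppEngine-BlobKey'] = blob_key
        self.response.headers['Content-Type'] = 'application/zip'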

Preventing files with size greater than some limit from being uploaded

I set up a server using CherryPy to which files can be uploaded. However, I want to prevent files from being uploaded if they exceed a certain size. I searched a bit but was not able to find an answer. Is there a way to achieve this with CherryPy, or in general?
cherrypy._cpserver.Server.max_request_body_size is probably what you want.
Before an HTTP client uploads a file, it must specify the size of its message body in the HTTP headers. Based on that, you can immediately reject the upload attempt with an HTTP 413 Request Entity Too Large error.
It's possible to circumvent this by declaring a certain size and then uploading more, but most servers are smart enough to stop reading once they've hit the maximum they intend to accept from a client.
Unfortunately, the size isn't always declared, because there's also a method of HTTP upload called chunked transfer encoding, in which the client (or server, depending on which way the data is going) is not required to advertise the size of the upload up front. You mostly see this on the server side as a way to stream data to clients.
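A minimal sketch of the setting mentioned above; the handler is just an illustrative upload endpoint, and the value is in bytes (0 means unlimited):

import cherrypy

class Uploads(object):
    @cherrypy.expose
    def upload(self, file):
        # 'file' is the multipart field; CherryPy exposes its filename and data.
        return "received %s" % file.filename

cherrypy.config.update({
    # Requests with a body larger than 10 MB are rejected with HTTP 413.
    'server.max_request_body_size': 10 * 1024 * 1024,
})
cherrypy.quickstart(Uploads())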
