I have a Google App Engine app where I need to store text files that are larger than 1 MB (the maximum entity size).
I'm currently storing them in the Blobstore and I use the Files API for reading and writing them. Current operations include uploading them from a user, reading them to process and update them, and presenting them to a user. Eventually, I would like to allow a user to edit them (likely as a Google Doc).
Are there advantages to storing such text files in Google Cloud Storage, as a Google Doc, or in some other location instead of using the Blobstore?
It really depends on what exactly you need. There are of course advantages to using one service over another, but in the end it doesn't matter much, since all of the solutions will be almost equally fast and not that expensive. If you end up with a huge amount of data after some time, you might consider switching to another solution, if only because you might save some money.
Having said that, I suggest you continue with the Blobstore API, since that will not require extra communication with external services, more secret keys, etc. Security- and speed-wise it is exactly the same. By the time you reach 10K or 100K users you will already know whether it's actually worth storing them somewhere else. Continue with what you know best, but make sure that you're following the right practices when building on Google App Engine.
If you're already using the Files API to read and write the files, I'd recommend you use Google Cloud Storage rather than the Blobstore. GCS offers a richer RESTful API (which makes it easier to do things like access control), does a number of things to accelerate serving static data, etc.
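For concreteness, here's a minimal sketch of reading and writing large text objects with the standalone google-cloud-storage client (the newer replacement for the Files API); the bucket and object names are hypothetical, and auth is assumed to come from the App Engine default service account:

```python
from google.cloud import storage

# Hypothetical bucket/object names; credentials are assumed to come
# from the environment (e.g. the App Engine default service account).
client = storage.Client()
bucket = client.bucket("my-app-texts")
blob = bucket.blob("documents/report.txt")

# Write a text object larger than the 1 MB datastore entity limit.
blob.upload_from_string("some very large text...", content_type="text/plain")

# Read it back, process/update it, and write it again.
text = blob.download_as_text()
blob.upload_from_string(text.replace("draft", "final"), content_type="text/plain")
```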
Sharing data is easier in Google Docs (now Google Drive) and Google Cloud Storage. Using Google Drive you can also use the power of Google Apps Script.
I want to reorganize a series of linked Google Drives. Currently, each drive contains files corresponding to a letter (A–Z), and I want to reorganize them so the files are organized by year instead. There is a massive amount of data in each drive, so it would take a lot of time to share and then copy files from one to another, and there are also many different file types. I've looked at some cloud transfer solutions, but if anyone knows whether this is feasible with the Drive API, please let me know. I've looked at the documentation, but I'm not sure how to apply it to a transfer this large.
Use the Google Drive API, specifically the files.list method via the Google API Python client, to list the files page by page and store the results in a file or database. Then sort them accordingly with the help of Python or the database.
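Here's a rough sketch of that listing step with the v3 Python client; the query and fields are assumptions you'd adapt, and `creds` is assumed to hold valid OAuth2 credentials for the account:

```python
from googleapiclient.discovery import build

# Assumes `creds` already holds valid OAuth2 credentials for the drive.
service = build("drive", "v3", credentials=creds)

files, page_token = [], None
while True:
    resp = service.files().list(
        q="trashed = false",
        fields="nextPageToken, files(id, name, mimeType, createdTime)",
        pageSize=1000,
        pageToken=page_token,
    ).execute()
    files.extend(resp.get("files", []))
    page_token = resp.get("nextPageToken")
    if page_token is None:
        break

# Sort by year of creation to plan the new folder layout.
files.sort(key=lambda f: f["createdTime"][:4])
```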
To copy files between accounts, try rclone. It supports server-side copying (--drive-server-side-across-configs), which means the file doesn't have to be downloaded and re-uploaded locally; instead it is copied on the Google Drive side. This should be significantly faster.
I have a Django Rest Framework project that I've integrated with django-storages to upload files to GCS. Everything works locally. However, Google App Engine imposes a hard limit of 32 MB on the size of each request, so I cannot upload any files greater than this limit.
I looked into many posts here on StackOverflow and on the internet. Some of the solutions listed the use of the Blobstore API; however, I cannot find a way to integrate this into Django. Another solution describes the use of django-filetransfers, but that plugin is obsolete.
I would appreciate it if someone can point me towards an approach I can take to fixing this problem.
PS: I would like to point out that the current setup works like this: a POST request sends the file up to the server, which then handles the process of storing the file in Google Cloud Storage. Since Google App Engine restricts request size to 32 MB, I cannot even get to the point of receiving the file. So my issue is how I can go about uploading these large files.
According to the official documentation [1], Cloud Storage can manage files of up to 5 TB in size. Nevertheless, it is recommended to take a look at the best-practices document [2], and there is also an example of how to upload objects using Python [3]. A sketch of a common workaround for the 32 MB request limit follows the references below.
[1] https://cloud.google.com/storage/docs/json_api/v1/objects/insert
[2] https://cloud.google.com/storage/docs/best-practices#uploading
[3] https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
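Since the 32 MB cap applies to any request that passes through App Engine itself, a common workaround (my suggestion, not something the documents above mandate) is to have the client upload directly to Cloud Storage using a V4 signed URL that your Django view generates; the bucket and object names below are hypothetical, and signing requires service-account credentials that include a private key:

```python
from datetime import timedelta
from google.cloud import storage

# Hypothetical bucket/object; signing needs service-account credentials
# that include a private key.
client = storage.Client()
blob = client.bucket("my-upload-bucket").blob("uploads/video.mp4")

# The view returns this URL; the browser PUTs the file straight to GCS,
# so the payload never touches App Engine's 32 MB request limit.
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),
    method="PUT",
    content_type="application/octet-stream",
)
print(url)
```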
I am wrapping up a personal project that involved using Flask, Python, and PythonAnywhere. I learned a lot, and now I have some new ideas for personal projects.
My next project involves changing video files and converting them into other file types, for example JPGs. When I drafted up how my system could work, I quickly realized that the current platform I am using for web application hosting, PythonAnywhere, will be too expensive and perhaps even too slow since I will be working with large files.
I searched around and found AWS S3 for file storage, but I am having trouble finding out how I can operate on that data to do my conversions in Python. I definitely don't want to download from S3, operate on the data in PythonAnywhere, and then re-upload the converted files to a bucket. The project will be available for use on the internet, so I am trying to make it as robust and scalable as possible.
I found it hard to even word this question, as I am not too sure I am asking the right questions. I guess I am looking for a way to manipulate large data files, preferably in Python, without having to work with the data locally, if that makes any sense.
I am open to learning new technologies if that's what it takes, and I'm looking for some direction on how I might achieve this personal project.
Have you looked into AWS Elastic Transcoder?
Amazon Elastic Transcoder lets you convert media files that you have stored in Amazon Simple Storage Service (Amazon S3) into media files in the formats required by consumer playback devices. For example, you can convert large, high-quality digital media files into formats that users can play back on mobile devices, tablets, web browsers, and connected televisions.
Like all things AWS, there are SDKs (e.g. the Python SDK) that allow you to access the service programmatically.
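As a hedged sketch of what that looks like with boto3 (the pipeline and preset IDs below are placeholders; you first create the pipeline, wired to your input/output S3 buckets, in the AWS console):

```python
import boto3

transcoder = boto3.client("elastictranscoder", region_name="us-east-1")

job = transcoder.create_job(
    PipelineId="1111111111111-abcd11",       # hypothetical pipeline ID
    Input={"Key": "uploads/source.mp4"},     # object key in the input bucket
    Outputs=[{
        "Key": "converted/output.mp4",
        "PresetId": "1351620000001-000010",  # System preset: Generic 720p
    }],
)
print(job["Job"]["Id"], job["Job"]["Status"])
```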
I tried to find a way to do a resumable upload, and to resume one, using the Drive API v3 on Python 3.5. I came across Google's official API guide on media upload; however, it used the files.insert function, which seems to not be available in v3.
Additionally, I also plan to upload large files, so a progress bar/percentage could really help. Also, do you think I should be using chunked upload? Google's official docs seem to say there's a loss in performance.
Thank you!
the files.insert function which seems to not be available in v3.
The files.insert method was changed to files.create in v3. You can check that out in the Migration Guide.
I also planned to upload large files so a progress bar/percentage
If you want to show progress bars, check out some HTML5 and JS tutorials like this one; there are plenty of additional samples on the web. On the Python side, the resumable upload sketch below also reports progress after each chunk.
do you think I should be using Chunk Upload?
Resumable upload is good for big files, as opposed to simple upload. So if you're working with large files, that's the recommended way.
Resumable upload: uploadType=resumable. For reliable transfer, especially important with larger files. With this method, you use a session initiating request, which optionally can include metadata. This is a good strategy to use for most applications, since it also works for smaller files at the cost of one additional HTTP request per upload.
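For reference, a minimal sketch of a resumable upload with the v3 Python client that reports progress per chunk; the local file name and chunk size are assumptions, and `creds` is assumed to be already-configured OAuth2 credentials:

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Assumes `creds` holds valid OAuth2 credentials for the Drive scope.
service = build("drive", "v3", credentials=creds)

media = MediaFileUpload(
    "big_video.mp4",             # hypothetical local file
    resumable=True,
    chunksize=10 * 1024 * 1024,  # 10 MB chunks (must be a multiple of 256 KB)
)
request = service.files().create(
    body={"name": "big_video.mp4"},
    media_body=media,
)

# next_chunk() sends one chunk at a time and can resume after failures;
# status.progress() drives the progress percentage.
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print("Uploaded %d%%" % int(status.progress() * 100))
print("Done, file id:", response.get("id"))
```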
I am thinking about using Google App Engine. It is going to be a huge website. In that case, what is your advice on using Google App Engine? I heard GAE has restrictions, like not being able to store images or files over the 1 MB limit (they are going to change this, from what I read in the GAE roadmap) and queries being limited to 1000 results. I am also going to use web2py with GAE. So I would like to know your comments.
Thanks
Having developed a smallish site with GAE, I have some thoughts:
If you mean "huge" like "the next YouTube", then GAE might be a great fit, because of the previously mentioned scaling.
If you mean "huge" like "massively complex, with a whole slew of screens, models, and features", then GAE might not be a good fit. Things like unit testing are hard on GAE, and there's not a built-in structure for your app that you'd get with something like (famously) (Ruby on) Rails, or (Python powered) Turbogears.
I.e.: there is no staging environment, just your development copy of the system and production. This may or may not be a bad thing, depending on your situation.
Additionally, it depends on the other Python modules you intend to pull in: some Python modules just don't run on GAE (because you can't talk to hardware, or because there are just too many files in the package).
Hope this helps
Using web2py on Google App Engine is a great strategy. It lets you get up and running fast, and if you do outgrow the restrictions of GAE, you can move your web2py application elsewhere.
However, keeping this portability means you should stay away from the advanced parts of GAE (Task Queues, Transactions, ListProperty, etc).
App Engine uses BigTable as its datastore backend. Don't try to write a traditional relational-database-driven application. BigTable is much better suited for use as a highly scalable key-value store. Avoid joins if at all possible.
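To make that concrete, here is a tiny sketch (the models are invented for illustration) using the old google.appengine.ext.db API, showing the denormalization that typically replaces a join:

```python
from google.appengine.ext import db

class Author(db.Model):
    name = db.StringProperty()

class Post(db.Model):
    # Denormalize: copy the author's name onto each post so that
    # rendering a list of posts never needs a second, join-style query.
    author = db.ReferenceProperty(Author)
    author_name = db.StringProperty()
    title = db.StringProperty()

def create_post(author, title):
    # Application code keeps the copied value in sync on writes.
    post = Post(author=author, author_name=author.name, title=title)
    post.put()
    return post
```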
I wouldn't worry about any of this. After having played with Google App Engine for a while now, I've found that it scales quite well for large data sets. If your data elements are large (i.e. photos), then you'll need to integrate with another service to handle them, but that's probably going to be true no matter what with data of that size. Also, I've found BigTable relatively easy to work with having come from a background entirely in relational databases. Finally, Django is a somewhat hidden, but awesome, "feature" of Google App Engine. If you've never used it, it's a really nice, elegant web framework that makes a lot of common tasks trivial (forms come to mind here).
Google has just released version 1.3.0 of the SDK with support for a new Blobstore API for storage of files up to 50 MB. See the post "App Engine SDK 1.3.0 Released Including Support for Larger User Uploads".
What about Google Wave? It's being built on App Engine, and once live, real-time translatable chat reaches the corporate sector... I could see it hitting the top 1000... But then again, that's an internal project that gets to do special stuff other App Engine apps can't, like hanging threads, I think... and whatever else Wave has under the hood...
If you are planning a 'huge' website, then don't use App Engine. Simple as that. App Engine is not built to deliver the next top-1000 website.
Allow me to also ask what you mean by 'huge': how many simultaneous users? Queries per second? DB load?