A little info about the project: https://ahashplace.#PreferablyAFreeHost.something/ is a grid of query strings. When you make a request to the site with x, y, and color in the query string, and the SHA-256 hash of your query is lower than that of the query currently stored at that (x, y) position, the site saves your query for that position.
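That rule can be sketched in a few lines of Python (the function name and query format here are illustrative, not the site's actual code):

```python
import hashlib

def wins(new_query: str, current_query: str) -> bool:
    """True if new_query's SHA-256 digest sorts lower than the current one.

    Hex digests of equal length compare lexicographically the same way the
    underlying numbers compare, so a plain string comparison is enough.
    """
    new_hash = hashlib.sha256(new_query.encode()).hexdigest()
    current_hash = hashlib.sha256(current_query.encode()).hexdigest()
    return new_hash < current_hash
```

For example, `wins("x=1&y=2&color=ff0000", "x=1&y=2&color=00ff00")` would decide whether the new query replaces the stored one for that cell.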
So I was running on Heroku's free tier (a recently discontinued product) with an S3 bucket as backup. Since the Heroku repo was private, I kept the AWS credentials in plain text. I need S3 to hold an up-to-date copy of a file (data.json) so that the app can fetch it, instead of rolling back to the version from deployment whenever the build re-deploys.
I would like the build to be a public repository, for a couple of reasons: first, I want this to be open source, and second, I would prefer not to give out my GitHub credentials, even though Render will deploy from your own private repos on their hardware for free.
The question: how do I get my dynamic file to persist alongside an open-source Python web app?
My thoughts so far...
It would be great if I could make whatever credentials I publish work only from the host, based on something like a known static IP and subnet mask. But I'm not sure that would be secure or viable: the host might change my network address, and an attacker could spoof it.
I've used MongoDB before and it didn't really have this problem, though I don't know why.
I could put the data in a public repo and privately run code that periodically fetches data from the app and pushes updates to the repo; the app could then read from the repo when the build starts up. This is messy, though.
I use a bot that writes the file IDs of files sent by users to a text file, then reads those file IDs back from the text file and sends them to the user. The method worked, but when I deployed it to Heroku, I could no longer see, process, or download the text file.
Is there a way to view the text files that we deploy to Heroku? Or is there a way to upload the text files to a cloud service and have the bot open (read and write) them via a URL (though I think that would let any user on the internet access and modify my text files, which means it is not safe)? Or should I create a SQL database, upload the text files, and link each text file to its own URL (but I'm new to SQL)?
Is there any other simple method to solve this problem? What do you advise me to do in this case?
https://github.com/zieadshabkalieh/a
NOTE: the text file in my code is named first.txt
Heroku has an ephemeral filesystem: every file created by the application, and any change to files deployed with it, is lost whenever there is a new deployment or an application restart.
Heroku Dynos also restart every 24 hours.
It is a good idea to persist data to remote storage (like S3) or to a DB (always a good option, but it requires a little more work).
For reading/writing simple files you can check the HerokuFiles repository, which has some Python examples and options. I would suggest S3 (using the Python boto module), as it is easy to use even if the number or size of files grows one day.
I don't really know if this even qualifies as a problem, but it surprised me, since I thought that Docker containers run in sandboxed environments.
I have a running Docker container that includes a Django REST server.
When the frontend (which is also served through a separate NGINX container on the same Docker network) makes the appropriate request, the Django server handles it and writes the contents of the request to a text file.
Now, the path to the text file points to a subdirectory in the folder where the script itself is located.
But now the problem: the file is saved in the directory on the host machine itself and (to my knowledge) not in the container.
Is that something Docker is capable of doing, or am I just not aware that Django itself (which is running under the WSGI development server, mind you) writes the file to the host machine?
from datetime import datetime
import json

from django.http import HttpResponse

def save_polygon(request, data):  # view body, wrapped here so it is runnable
    now = datetime.now()
    timestamp = datetime.timestamp(now)
    polygon = json.loads(data['polygon'])
    string = ''
    for key in polygon:
        string += f'{key}={polygon[key]}\r\n'
    # note: this relative path is resolved against the working directory
    with open(f"./path/to/images/image_{timestamp}.txt", "w+") as f:
        f.write(string)
    return HttpResponse(status=200)  # was: return 200, which is not a valid Django response
Is there something I am missing or something I do not understand about docker or django yet?
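For what it's worth, behavior like this is normally produced by a bind mount rather than by Django or WSGI itself: if the container was started with a volume mapping (the service name and paths below are hypothetical), anything written to that directory inside the container lands directly in the mapped host folder.

```yaml
services:
  api:
    build: .
    volumes:
      # hypothetical bind mount: the host folder on the left is mapped onto
      # the container path on the right, so files written under
      # /app/path/to/images inside the container appear on the host
      - ./path/to/images:/app/path/to/images
```

Running `docker inspect` on the container and looking at the Mounts section shows whether such a mapping exists.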
I'm pretty new with Google App Engine.
What I need to do is upload a pretty large CSV to Cloud SQL.
I've got an HTML page with a file-upload module; the uploaded file lands in the Blobstore.
After that I open the CSV with the Blobstore reader and insert each line into Cloud SQL using cursor.execute("insert into table values"). The problem is that the HTTP request can only run for a minute, and not all the data gets inserted in that short a time. It also keeps the screen in a loading state throughout, which I would like to avoid by running the code in the back end, if that's possible.
I also tried going the "LOAD DATA LOCAL INFILE" way.
"LOAD DATA LOCAL INFILE" works from my local machine when I'm connected to Cloud SQL via the terminal, and it's pretty quick.
How would I go about using this from within App Engine?
Or is there a better way to import a large CSV into Cloud SQL, through the Blobstore or Google Cloud Storage directly, after uploading the CSV from the HTML page?
Also, is it possible to use Task Queues with the Blobstore and then insert the data into Cloud SQL on the backend?
I have used a similar approach with the Datastore rather than Cloud SQL, but the same approach can be applied to your scenario.
Set up a non-default module (previously a "backend", deprecated now) of your application
Send an HTTP request that triggers the module's endpoint through a task queue (to avoid the 60-second deadline)
Use mapreduce with the CSV as input and process each line of the CSV within the map function (to avoid memory errors and to resume the pipeline from where it left off in case of any errors during the operation)
EDIT: Elaborating on mapreduce as per the OP's request, and also eliminating the use of the task queue
Read the mapreduce basics from the docs found here
Download the dependency folders for mapreduce to work (simplejson, graphy, mapreduce)
Download this file to your project folder and save as "custom_input_reader.py"
Now copy the code below into your main_app.py file.
main_app.py
from mapreduce import base_handler
from mapreduce import mapreduce_pipeline
from custom_input_reader import GoogleStorageLineInputReader

def testMapperFunc(row):
    # process the csv row here
    return

class TestGCSReaderPipeline(base_handler.PipelineBase):
    def run(self):
        yield mapreduce_pipeline.MapPipeline(
            "gcs_csv_reader_job",
            "main_app.testMapperFunc",
            "custom_input_reader.GoogleStorageLineInputReader",
            params={
                "input_reader": {
                    "file_paths": ['/' + bucketname + '/' + filename]
                }
            })
Create an HTTP handler which will initiate the map job
main_app.py
class BeginUpload(webapp2.RequestHandler):
    def post(self):
        # do whatever you want
        upload_task = TestGCSReaderPipeline()
        upload_task.start()
        # do whatever you want
If you want to pass any parameters, add them to the "run" method and provide the values when creating the pipeline object.
You can try importing the CSV data via the Cloud Console:
https://cloud.google.com/sql/docs/import-export?hl=en#import-csv
I am using this file storage engine to store files to Amazon S3 when they are uploaded:
http://code.welldev.org/django-storages/wiki/Home
It takes quite a long time to upload because the file must first travel from the client to the web server, and then from the web server to Amazon S3, before a response is returned to the client.
I would like to make the process of sending the file to S3 asynchronous, so the response can be returned to the user much faster. What is the best way to do this with the file storage engine?
Thanks for your advice!
I've taken another approach to this problem.
My models have two file fields: one uses the standard file storage backend and the other uses the S3 file storage backend. When the user uploads a file it gets stored locally.
I have a management command in my application that uploads all the locally stored files to S3 and updates the models.
So when a request comes in for the file, I check whether the model object has the S3 storage field set; if so, I send a redirect to the correct URL on S3, and if not, I send a redirect so that nginx can serve the file from disk.
This management command can of course be triggered by any event, such as a cron job.
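The redirect decision in that approach can be sketched as plain Python (the bucket URL and arguments are illustrative; in the real code these would come from the two Django model fields):

```python
def serving_url(s3_key, local_name):
    # If the management command has already mirrored the file to S3,
    # redirect the client there; otherwise fall back to the local copy
    # that nginx serves from disk.
    if s3_key:
        return f"https://my-bucket.s3.amazonaws.com/{s3_key}"  # placeholder bucket
    return f"/media/{local_name}"
```

The view would wrap this in a Django redirect response.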
It's possible to have your users upload files directly to S3 from their browser using a special form (with an encrypted policy document in a hidden field). They will be redirected back to your application once the upload completes.
More information here: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
There is an app for that :-)
https://github.com/jezdez/django-queued-storage
It does exactly what you need, and much more, because you can set any "local" storage and any "remote" storage. This app will store your file in fast "local" storage (for example, MogileFS storage) and then, using Celery (django-celery), will attempt to upload it asynchronously to the "remote" storage.
A few remarks:
The tricky thing is that you can set it up with a copy-and-upload strategy, or with an upload-and-delete strategy that removes the local file once it has been uploaded.
Second tricky thing: it will serve the file from "local" storage until the upload has completed.
It can also be configured to retry failed uploads a number of times.
Installation & usage is also very simple and straightforward:
pip install django-queued-storage
append to INSTALLED_APPS:
INSTALLED_APPS += ('queued_storage',)
in models.py:
from queued_storage.backends import QueuedStorage

queued_s3storage = QueuedStorage(
    'django.core.files.storage.FileSystemStorage',
    'storages.backends.s3boto.S3BotoStorage',
    task='queued_storage.tasks.TransferAndDelete')

class MyModel(models.Model):
    my_file = models.FileField(upload_to='files', storage=queued_s3storage)
You could decouple the process:
the user selects a file to upload and sends it to your server. After this he sees a page: "Thank you for uploading foofile.txt, it is now stored in our storage backend"
When the user has uploaded the file, it is stored in a temporary directory on your server and, if needed, some metadata is stored in your database.
A background process on your server then uploads the file to S3. This is only possible if you have full access to your server, so that you can create some kind of "daemon" to do this (or simply use a cronjob).*
The page that is displayed polls asynchronously and shows some kind of progress bar (or a simple "please wait" message) to the user. This is only needed if the user should be able to "use" the file (put it in a message, or something like that) directly after uploading.
[*: In case you only have shared hosting, you could possibly build a solution that uses a hidden iframe in the user's browser to start a script which then uploads the file to S3.]
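The background step could be as simple as a cron-driven script that drains a spool directory. Here is a sketch in which `upload` stands in for whatever S3 client call you use (a boto put_object wrapper, for instance); the function and directory names are illustrative:

```python
import pathlib

def flush_spool(spool_dir, upload):
    """Upload every file in spool_dir via upload(path), then delete it."""
    sent = []
    for path in sorted(pathlib.Path(spool_dir).iterdir()):
        if path.is_file():
            upload(path)   # e.g. wraps an S3 put_object call
            path.unlink()  # drop the temporary local copy once it is safe
            sent.append(path.name)
    return sent
```

A cron entry would simply invoke this every few minutes; files that arrive while it runs are picked up on the next pass.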
You can upload media directly to the S3 server without going through your web application server.
See the following references:
Amazon API Reference : http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingHTTPPOST.html
A django implementation : https://github.com/sbc/django-uploadify-s3
As some of the answers here suggest uploading directly to S3, here's a Django S3 Mixin using plupload:
https://github.com/burgalon/plupload-s3mixin
I encountered the same issue with uploaded images. You cannot pass files along to a Celery worker, because Celery needs to be able to pickle a task's arguments. My solution was to deconstruct the image data into a string, gather all the other info from the file, and pass that data and info to the task, where I reconstructed the image. After that you can save it, which will send it to your storage backend (such as S3). If you want to associate the image with a model, just pass the id of the instance to the task, retrieve the instance there, bind the image to it, and save the instance.
When a file has been uploaded via a form, it is available in your view as an UploadedFile, a file-like object. You can get it directly out of request.FILES, or better, first bind it to your form, run is_valid, and retrieve the file-like object from form.cleaned_data. At that point at least you know it is the kind of file you want it to be. After that you can get the data using read(), and get the other info using other methods/attributes. See https://docs.djangoproject.com/en/1.4/topics/http/file-uploads/
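The deconstruct/reconstruct step can be sketched like this (base64 is one safe way to turn the raw bytes into a plain string that Celery can serialize; the function names are illustrative):

```python
import base64

def pack_image(data: bytes, name: str, content_type: str) -> dict:
    # Celery serializes task arguments, so ship the bytes as text
    # together with the metadata needed to rebuild the file later.
    return {
        "name": name,
        "content_type": content_type,
        "payload": base64.b64encode(data).decode("ascii"),
    }

def unpack_image(packed: dict):
    # Inverse operation, run inside the task body.
    return packed["name"], packed["content_type"], base64.b64decode(packed["payload"])
```

Inside the task you would rebuild a file object (e.g. a Django ContentFile) from the bytes and save it, which pushes it to the storage backend.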
I actually ended up writing and distributing a little package to save an image asynchronously. Have a look at https://github.com/gterzian/django_async. Right now it's just for images, but you could fork it and add functionality for your situation. I'm using it with https://github.com/duointeractive/django-athumb and S3.