I'm using Python/Django on Heroku (Cedar stack) and I need to write a management command that pulls a file out of an S3 bucket and processes it. I'm not sure I understand how to use the ephemeral filesystem. Are there only certain directories that are writable? I found another article implying that only certain folders were writable (but it doesn't seem to apply to the Cedar stack). I also found this dev article, but it doesn't go into much detail (note: I do understand that the storage is just temporary; I only need to unzip and process the file). Can I just create a folder anywhere under the application's root? And how would I get that path? It seems like I could probably just use $HOME. I did a bit of testing by connecting via
$ heroku run bash
and running:
$ echo $HOME
returns:
/app
and running:
$ mkdir $HOME/tmp
creates a folder in the app's root with the same user and group as the other files and folders.
So... anything I'm missing here? A better way to do it? Is there an OS environment variable for this? I've run "env" and I don't see a better one.
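For what it's worth, I also checked what Python itself picks as a temp location. A quick sketch of that test (gettempdir() honours TMPDIR, TEMP and TMP, then falls back to /tmp):

import tempfile

# where Python will put temporary files on this dyno
print(tempfile.gettempdir())

# confirm the location is actually writable by creating a throwaway file
with tempfile.NamedTemporaryFile(dir=tempfile.gettempdir()) as fh:
    fh.write(b"writable")
    print("wrote", fh.name)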
To really understand the ephemeral filesystem, you need to understand what a dyno is. You can read more about how dynos work. In a nutshell, though, a process runs on Heroku in a virtual machine with its own filesystem. That virtual machine can stop for a number of reasons, taking the filesystem along with it.
The underlying filesystem will be destroyed when an app is restarted, reconfigured (e.g. heroku config ...), scaled, etc. For example, if you have two web dynos, write some files to the ephemeral filesystem, and scale to three dynos, those files will be destroyed because your app is running on new dynos.
In general, the ephemeral filesystem works just like any filesystem: you can write files to any directory you have permission to write to, such as $HOME and /tmp. Any files that require permanence should be written to S3 or a similar durable store. S3 is preferred, as Heroku runs on AWS and S3 offers some performance advantages. Any files that can be recreated at will can be stored on the dyno's ephemeral store.
You can create a file under the '/tmp' directory, and that file will be destroyed after the request is complete. I'm doing this on Cedar, and I haven't had any problems.
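For the original use case (pulling a zip out of S3 and processing it on the dyno), a rough sketch of such a management command might look like this; boto3 is assumed to be in your requirements, and the bucket name, key, and process_file helper are placeholders:

import os
import tempfile
import zipfile

import boto3  # assumed to be installed
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Download a zip from S3 into the dyno's ephemeral storage and process it"

    def handle(self, *args, **options):
        s3 = boto3.client("s3")
        # mkdtemp() respects $TMPDIR and falls back to /tmp, both writable on a dyno
        workdir = tempfile.mkdtemp(prefix="s3-import-")
        archive_path = os.path.join(workdir, "data.zip")

        # hypothetical bucket and key names -- replace with your own
        s3.download_file("my-bucket", "incoming/data.zip", archive_path)

        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(workdir)

        for name in os.listdir(workdir):
            if name != "data.zip":
                self.process_file(os.path.join(workdir, name))

    def process_file(self, path):
        # placeholder for the actual processing logic
        self.stdout.write("processing %s" % path)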
Related
I have a bit of an issue. I want to use Heroku to host my Flask web app, and I also want to use a Heroku pipeline to link to the GitHub repository where I am housing this project. The issue is that my website lets users upload files to the server, but I suspect that if I update the GitHub repository, I will lose all the files users have uploaded when the server redeploys from GitHub. I would like to know if this is a real issue, and if so, is there some way I could fix it?
Storing user-uploaded files on Heroku isn't a good idea because Heroku provides an ephemeral filesystem.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted. Each dyno boots with a clean copy of the filesystem from the most recent deploy. This is similar to how many container based systems, such as Docker, operate.
So even if you just restart your app, users will lose their files. Heroku does suggest some alternative places to store them, though. As you are using Python, this add-on may help you.
Read More - https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
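One common pattern is to stream uploads straight to S3 instead of the dyno's disk. A minimal sketch, assuming boto3 is installed and AWS credentials are set as Heroku config vars; the bucket name and route are placeholders:

import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client("s3")  # credentials come from config vars such as AWS_ACCESS_KEY_ID


@app.route("/upload", methods=["POST"])
def upload():
    uploaded = request.files["file"]
    # send the upload directly to S3 rather than writing it to the ephemeral filesystem
    s3.upload_fileobj(uploaded, "my-upload-bucket", uploaded.filename)
    return "stored", 201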
I have a free dyno instance running a simple worker that creates an RSS file and uploads it to PythonAnywhere (using it just like a web server for this static rss.xml file).
I am trying to move away from PythonAnywhere and serve the file with heroku-buildpack-static as a web process alongside the same worker dyno, but I cannot make it work. It looks like the worker and web processes run in different folders/environments, and I cannot find where the file ends up.
worker: python main.py
web: bin/boot
The main.py script writes the file to the current folder and successfully uploads it to PythonAnywhere, but I cannot see where this file is written on Heroku. I tried creating a folder /app/web and modifying main.py to write to it, but I still cannot see the file being created or updated (I used the Heroku console to check). I think the worker runs with a different home or on a different instance, but I am not sure how this structure is laid out. I also created a .profile with the following command, without success:
chmod -R 777 /app/web
The app also contains a static.json file with the following contents to point at the correct folder and avoid caching:
{
  "root": "/app/web/",
  "headers": {
    "/": {
      "Cache-Control": "no-store, no-cache"
    }
  }
}
It looks like the worker and web processes run in different folders/environments
Yes, that is exactly what is happening.
on the same worker Dyno
In fact, you are not on the same dyno. Your web process and your worker process execute in isolated environments. Consider this section of the documentation under the heading "Process types vs dynos":
A process type is the prototype from which one or more dynos are instantiated. This is similar to the way a class is the prototype from which one or more objects are instantiated in object-oriented programming.
You cannot write files to your web dyno from your worker dyno. They are entirely isolated and do not share a filesystem.
As msche has pointed out, dyno filesystems are ephemeral. Even if you do manage to write this file, e.g. by running a web service instead of a static host that has an API endpoint to accept the file, that file will be lost every time the dyno restarts. This happens frequently (at least once per day).
Even if you are writing the file every two minutes, as you say in your comment, your site will be broken for one minute every day on average. I suggest storing this data elsewhere, e.g. as a file on Amazon S3 or in a client-server data store.
Note that you can also host a static site directly from Amazon S3, which might be a good fit here.
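A sketch of what the worker could do instead of writing locally, assuming boto3 is installed and the bucket is set up for static hosting; the bucket name is a placeholder:

import boto3


def publish_rss(xml_text):
    # push the generated feed to S3 instead of the worker dyno's local filesystem
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-rss-bucket",             # placeholder bucket name
        Key="rss.xml",
        Body=xml_text.encode("utf-8"),
        ContentType="application/rss+xml",
        CacheControl="no-store, no-cache",  # mirrors the no-cache intent of static.json above
    )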
Heroku uses an ephemeral filesystem; see this link. The Heroku documentation suggests using third-party storage such as AWS S3.
I have a server and a front end. I would like to accept Python code from the user in the front end and execute it securely on the backend.
I read this article explaining the problematic aspects of each approach, e.g. the PyPy sandbox, Docker, etc.
I can define the input the code needs and the output it produces, and I can put all of them in a directory. What's important to me is that this code should not be able to harm my filesystem or exhaust disk, memory, or CPU (I need to be able to set timeouts and prevent it from accessing files that are not devoted to it).
What is the best practice here? Docker? Kubernetes? Is there a Python module for that?
Thanks.
You can run the user's code in a Python Docker container with a volume mount. The volume mount maps a directory on the local system, containing the user-supplied code, into the container. By doing this you isolate the user-supplied code to the container while it runs.
Scan your Python container against the CIS benchmark for better security.
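A rough sketch of launching such a container from Python, assuming Docker is available on the host and the python:3.11-slim image can be pulled; the paths and resource limits are placeholder values:

import subprocess


def run_untrusted(code_dir, seconds=10):
    # Sketch: run user-supplied /code/main.py in a locked-down container.
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",            # no network access
        "--memory", "256m",             # cap memory
        "--cpus", "0.5",                # cap CPU
        "--read-only",                  # read-only root filesystem
        "-v", f"{code_dir}:/code:ro",   # user code mounted read-only
        "python:3.11-slim",
        "timeout", str(seconds),        # enforce the time limit inside the container
        "python", "/code/main.py",
    ]
    # give the outer call a little extra headroom beyond the in-container timeout
    return subprocess.run(cmd, capture_output=True, text=True, timeout=seconds + 5)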
TL;DR
I would like to be able to check whether a git repo (located on a shared network drive) was updated, without using a git command. I was thinking of checking one of the files located in the .git folder, but I can't work out which file is best to check. Does anyone have a suggestion on how to achieve this?
Why:
The reason I need to do this is that I have many git repos located on a shared drive. From a Python application I built, I synchronize the contents of some of these repos onto a local drive on a lot of workstations and render nodes.
I don't want to use git directly because the git server is not powerful enough to handle the number of requests all the computers in the studio would make constantly.
That is why I ended up putting the repos on the network server and syncing the repo contents to a local cache on each computer using rsync.
That works fine, but as time goes by the repos are getting larger and the rsync is taking too long. So I would like to (ideally) check one file that tells me whether the local copy is out of sync with the network copy, and perform the rsync only when they are out of sync.
Thanks
Check .git/FETCH_HEAD for its timestamp and content.
Every time you fetch, git updates both the content and the modification time of that file.
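A sketch of how the sync script could use that, assuming FETCH_HEAD is in fact touched whenever the shared repo is updated, as described above; the paths are placeholders:

import os
import subprocess

NETWORK_REPO = "/mnt/share/myrepo"      # placeholder: repo on the shared drive
LOCAL_CACHE = "/var/cache/myrepo"       # placeholder: local copy on the workstation
STAMP_FILE = LOCAL_CACHE + ".last_sync" # records when we last synced


def needs_sync():
    fetch_head = os.path.join(NETWORK_REPO, ".git", "FETCH_HEAD")
    if not os.path.exists(fetch_head) or not os.path.exists(STAMP_FILE):
        return True
    # out of date when FETCH_HEAD is newer than our last recorded sync
    return os.path.getmtime(fetch_head) > os.path.getmtime(STAMP_FILE)


def sync():
    if needs_sync():
        subprocess.run(
            ["rsync", "-a", "--delete", NETWORK_REPO + "/", LOCAL_CACHE + "/"],
            check=True,
        )
        # touch the stamp file to record the sync time
        open(STAMP_FILE, "a").close()
        os.utime(STAMP_FILE, None)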
I am working on a Django-based application located at home/user/Documents/project/application on my disk. The application takes some values from the user and writes them into a file in a folder under the project directory, i.e. home/user/Documents/project/folder/file. While running the development server with python manage.py runserver everything worked fine; however, after deployment the application/views.py, which accesses the file via open('folder/path','w'), can no longer reach it, because by default it looks in the /var/www folder when deployed via the Apache2 server using mod_wsgi.
I am not putting the folder into /var/www because it is not good practice to put any Python code there, as it might become readable by clients, which is a major security threat. Please let me know how I can point the deployed application to read and write the correct file.
The real solution is to install your data files in /srv/data/myapp or some such location so that you can give the webserver user the correct permissions on only those directories. Whether you choose to put your code in /var/www or not is a separate question, but I would suggest putting at least your WSGI file there (and, of course, specifying your DocumentRoot correctly).
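For example, a sketch of resolving the path from settings instead of relying on the working directory; the /srv/data/myapp location follows the suggestion above and the file name is a placeholder:

# settings.py
DATA_DIR = "/srv/data/myapp"  # owned/writable by the webserver user, outside DocumentRoot

# views.py
import os

from django.conf import settings


def save_values(values):
    # build an absolute path instead of the cwd-relative open('folder/path', 'w')
    path = os.path.join(settings.DATA_DIR, "user_values.txt")
    with open(path, "w") as fh:
        fh.write("\n".join(values))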