Force GAE to update all files on deployment - python

Is there a way to force GAE to upload and update all files, even if it thinks they don't require any updates?
Clarification - If I make quick back-to-back updates, I find that certain files, that were definitely modified, refuse to be updated online. Apart from assigning version numbers to force the update, which is very painful, is there another way?
EDIT - I'm referring to JavaScript files

Those files do get updated; you don't see the update because of caching that happens somewhere along the chain. To get the latest files, load the file with a slug (e.g. http://myapp.com/scripts/script.js?slug) and update the slug each time you deploy your application.
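For example, a tiny helper along these lines (on the legacy Python runtime App Engine sets a CURRENT_VERSION_ID environment variable that changes on every deploy; the timestamp fallback is just an assumption, any value that changes per deploy will do):

    # Build script URLs with a per-deploy slug so cached copies are bypassed.
    import os
    import time

    def versioned_url(path):
        # CURRENT_VERSION_ID changes with each App Engine deploy; fall back to
        # a timestamp if it is not available (illustrative fallback).
        slug = os.environ.get('CURRENT_VERSION_ID', str(int(time.time())))
        return '%s?v=%s' % (path, slug)

    # e.g. versioned_url('/scripts/script.js') -> '/scripts/script.js?v=1.3567890123'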

Related

dump CSV file from Django query to Github

We want to automate a process through django admin where, whenever a user makes a change to a record (or adds/deletes a record), a CSV file is created and then dumped into a Github repository with a commit message specified by the person who made the change.
Creating the csv file from a queryset is easy enough... But how would we go about then getting that csv file to a folder that is git initialized so that we can commit it to a repository?
Any ideas would be great. Essentially we're looking for a way of tracking specific changes to the database. With CSV files in github, we can really easily follow the changes, and we want to leverage that.
cheers
If you can create your CSV files, the next step would be to talk to GitHub via its API, or to keep a local clone of the git repo that is synced after each file creation (sketched below).
But if I may ask: why do you want to do this with CSV files in a GitHub repo? My first response to a requirement like that would be to log changes with the Python logging infrastructure, or to create an additional model to track the specific changes in the db.
Eventually this could also meet your requirements: https://django-simple-history.readthedocs.io/en/latest/
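A minimal sketch of the "local clone synced after file creation" route, using GitPython (the repository path, file name and columns below are illustrative assumptions; the clone is assumed to already exist with push access configured):

    import csv
    import os
    from git import Repo  # pip install GitPython

    REPO_PATH = '/path/to/local/clone'   # assumption: repo already cloned here
    CSV_NAME = 'records.csv'

    def export_and_commit(queryset, commit_message):
        # Write the queryset out as CSV inside the working tree.
        csv_path = os.path.join(REPO_PATH, CSV_NAME)
        with open(csv_path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['id', 'name'])      # adjust columns to your model
            for obj in queryset:
                writer.writerow([obj.pk, obj.name])

        # Stage, commit with the user-supplied message, and push.
        repo = Repo(REPO_PATH)
        repo.index.add([CSV_NAME])
        repo.index.commit(commit_message)
        repo.remote('origin').push()

You could call export_and_commit() from the admin's save_model()/delete_model() hooks, passing along the commit message the user entered.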
This doesn't exactly answer the question, but have you thought of using something like django-simple-history?
It's a really easy-to-use Django package that tracks all Django model state on every create/update/delete. It should be much easier to get going than fiddling around pushing CSVs to GitHub.
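For reference, hooking a model up to django-simple-history is roughly this (MyRecord is a placeholder model name):

    from django.db import models
    from simple_history.models import HistoricalRecords

    class MyRecord(models.Model):
        name = models.CharField(max_length=100)
        history = HistoricalRecords()  # records every create/update/delete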

Heroku: how to store a variable that mutates?

I have deployed a small application to Heroku. The slug contains, among other things, a list in a text file. I've set a scheduled job to run, once an hour, a Python script that selects an item from that list and does something with that item.
The trouble is that I don't want to select the same item twice in sequence. So I need to be able to store the last-selected item somewhere. It turns out that Heroku apparently has a read-only filesystem, so I can't save this information to a temporary or permanent file.
How can I solve this problem? Can I use os.environ in python to set a configuration variable that stores the last-selected element from the list?
Have to agree with #KlausD: doing what you are suggesting is actually a bit more complex, since you're trying to work with a filesystem that won't persist changes while tracking state information (the last-selected item) that needs to survive. Even if you were able to store the last item in some environment variable, a restart of the server would lose that information.
Adding a database and connecting it to Python would literally take minutes on Heroku. There are plenty of well-documented libraries and ORMs available to create a simple model in which to store your list and your cursor. I normally recommend against storing pointers to information, preferring to make the correct item obvious through the architecture, but that may not be possible in your case.
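A minimal sketch of that suggestion, assuming a Heroku Postgres add-on and psycopg2, with a single-row table holding the last-selected item (table and column names are illustrative):

    import os
    import psycopg2

    def _connect():
        # Heroku exposes the add-on's connection string as DATABASE_URL.
        return psycopg2.connect(os.environ['DATABASE_URL'])

    def get_last_item():
        conn = _connect()
        with conn, conn.cursor() as cur:
            cur.execute("CREATE TABLE IF NOT EXISTS app_state "
                        "(key text PRIMARY KEY, value text)")
            cur.execute("SELECT value FROM app_state WHERE key = 'last_item'")
            row = cur.fetchone()
        conn.close()
        return row[0] if row else None

    def set_last_item(value):
        conn = _connect()
        with conn, conn.cursor() as cur:
            cur.execute("INSERT INTO app_state (key, value) VALUES ('last_item', %s) "
                        "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
                        (value,))
        conn.close()

The scheduled job would call get_last_item(), pick any other item from the list, act on it, and then call set_last_item().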

microservices and multiple databases

I have written microservices for auth, location, etc.
Each microservice has its own database, and some data (e.g. location) exists in all of the databases for these services. When any of my projects needs a user's location, it first looks in the cache; if it's not found there, it hits the database. So far so good. Now, when a location is changed in any of my databases, I need to update it in the other databases as well as update my cache.
Currently I made a model (called Subscription) with a URL as its field; whenever a location is changed in any database, an object of this Subscription is created. A periodic task checks the Subscription model, and when it finds such objects it hits the APIs of the other services, updates the location, and updates the cache.
I am wondering if there is any better way to do this?
I am wondering if there is any better way to do this?
"better" is entirely subjective. if it meets your needs, it's fine.
something to consider, though: don't store the same information in more than one place.
if you need an address, look it up from the service that provides address, every time.
this may be a performance hit, but it eliminates the problem of replicating the data everywhere.
another option would be a more proactive approach, as suggested in comments.
instead of creating a task list for changes, and doing that periodically, send a message across rabbitmq immediately when the change happens. let every service that needs to know, get a copy of the message and update it's own cache of info.
just remember, though. every time you have more than one copy of the information, you reduce the "correctness" of the system, as a whole. it will always be possible for the information found in one of your apps to be out of date, because it did not get an update from the official source.
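A minimal sketch of that "publish immediately" idea with pika and a fanout exchange (the exchange name and payload shape are assumptions; each consuming service would bind its own queue to the exchange and update its cache):

    import json
    import pika

    def publish_location_changed(user_id, new_location):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        # A fanout exchange copies the event to every bound service queue.
        channel.exchange_declare(exchange='location_changed', exchange_type='fanout')
        channel.basic_publish(
            exchange='location_changed',
            routing_key='',
            body=json.dumps({'user_id': user_id, 'location': new_location}),
        )
        connection.close()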

What do I need to consider when scaling an application that stores files in the filesystem?

I am interested in making an app where users can upload large files (~2MB) that are converted into HTML documents. This application will not have a database. Instead, these HTML files are stored in a particular writable directory outside of the document source tree, so this directory will grow larger and larger as more files are added to it. Users should be able to view these HTML files by visiting the appropriate URL. All security concerns aside, what do I need to be worried about if this directory continues to grow? Will accessing the files inside take longer when there are more of them? Will it potentially crash because of this? Should I create a new directory every 100 files or so to prevent this?
If it is important, I want to make this app using Pyramid and Python.
You might want to partition the directories by user, app or similar so that they're easy to manage anyway; for example, if a user stops using the service you could just delete their directory. Also I presume you'll be zipping them up. If you keep it well decoupled then you'll be able to change your mind later.
I'd be interested to see how using something like SQLite would work for you, as you could have a sqlite db per partitioned directory.
I presume the HTML files are larger than the files that were uploaded, so why store the big HTML file?
Are things like MongoDB etc. out of the question? As your app scales to multiple servers you'll have the issue of accessing files that live on a different server, unless you pick the right server in the first place using some technique. Then it's possible you've got servers sitting idle because no one wants their documents.
Why the limitation of just storing files in a directory? Is it a POC?
EDIT
I find value in reading things like http://blog.fogcreek.com/the-trello-tech-stack/ and I'd advise you to find a site already doing what you do and read about their tech stack.
As someone already commented, why not use Amazon S3 or similar?
Ask yourself realistically how many users you imagine, and whether you really want to spend a lot of energy worrying about being the next Facebook and building the ultimate backend tech stack when you could get your stuff out there being used.
Years ago I worked on a system that stored insurance certificates on the filesystem; we used to run out of inodes!
Dare I say it's a case of suck it and see what works for you and your app.
EDIT
HAProxy, I believe, is meant to handle all those load-balancing concerns.
As a user, I imagine I'd want to go to http://docs.yourdomain.com/myname/document.doc,
although I presume there are security concerns with such an obvious name.
This greatly depends on your filesystem. You might want to look up which problems the git folks encountered (they also use a purely filesystem-based database).
In general, it will be wise to split that directory up, for example by taking the first two or three letters of the file name (or a hash of it) and grouping the files into subdirectories based on that key. You'd have a structure like:

    uploaddir/
        00/
            ... files whose name's sha1 starts with 00
        01/
            ... files whose name's sha1 starts with 01
and so on. This takes some load off the filesystem by partitioning possibly large directories. If you want to be sure that no user can mount a denial-of-service attack by deliberately uploading files whose names hash to the same initial characters, you can seed the hash differently, salt it, or something along those lines.
Specifically, the effects of large directories are pretty filesystem-specific. Some might become slow, some may cope really well, and others may have per-directory limits on the number of files.
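A small sketch of that sharding scheme, hashing the (optionally salted) file name and using the first two hex characters of the digest as the subdirectory (the root directory and salt are illustrative):

    import hashlib
    import os

    UPLOAD_DIR = 'uploaddir'        # illustrative root
    SALT = 'some-app-secret'        # optional, makes targeted collisions harder

    def sharded_path(filename):
        digest = hashlib.sha1((SALT + filename).encode('utf-8')).hexdigest()
        subdir = os.path.join(UPLOAD_DIR, digest[:2])
        os.makedirs(subdir, exist_ok=True)
        return os.path.join(subdir, filename)

    # sharded_path('report.html') might give 'uploaddir/3f/report.html'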

Set Time Constraint for generated URL of uploaded files

I am trying to build an application (using Django) which uploads files and generates corresponding URLs. Is there some way to set a time constraint on a URL, i.e. the uploaded file should be accessible at its URL only for a little while, and after the specified time that URL should give an error?
I would be using the default Django server. In such a case, what would be the possible ways to tackle the time-constraint problem? I would be glad if you answer for both cases (a global limit and per-file limits), though even a single solution is good :)
~Newbie up with a Herculean Task! Thank You :)
You can have a DateTimeField as an additional column and expire the file as and when required.
If your uploaded files are being served by the Django app itself, then it's quite easy (and can be solved in different ways depending on whether the "time constraint" is global to all files/URLs or not).
Otherwise, that is if the files are served by Apache or anything similar, you'll have to resort to some async mechanism to collect and delete "obsolete" files, either the quick-and-dirty way (using a cron job) or with some help from Celery.
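For the "served by Django itself" case, a minimal sketch: store an expiry timestamp alongside each upload and refuse to serve the file once it has passed (model and field names are illustrative):

    from django.db import models
    from django.http import FileResponse, Http404
    from django.utils import timezone

    class Upload(models.Model):
        file = models.FileField(upload_to='uploads/')
        expires_at = models.DateTimeField()

    def serve_upload(request, pk):
        upload = Upload.objects.filter(pk=pk).first()
        if upload is None or upload.expires_at < timezone.now():
            raise Http404('This link has expired.')
        upload.file.open('rb')
        return FileResponse(upload.file)

A cron job or Celery task can still delete the expired rows and files afterwards; the check above only makes the URL stop working.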
