Heroku: how to store a variable that mutates? - python

I have deployed a small application to Heroku. The slug contains, among other things, a list in a text file. I've set up a scheduled job to run, once an hour, a python script that selects an item from that list and does something with that item.
The trouble is that I don't want to select the same item twice in sequence. So I need to be able to store the last-selected item somewhere. It turns out that Heroku apparently has a read-only filesystem, so I can't save this information to a temporary or permanent file.
How can I solve this problem? Can I use os.environ in python to set a configuration variable that stores the last-selected element from the list?

Have to agree with @KlausD: doing what you are suggesting is actually a bit more complex, since you are trying to work with a filesystem that won't persist changes while tracking state information (the last-selected item) that needs to survive between runs. Even if you were able to store the last item in some environment variable, a restart of the server would lose that information.
Adding a db and connecting it to python would literally take minutes on Heroku. There are plenty of well-documented libraries and ORMs available to create a simple model for you to store your list and your cursor. I normally recommend making the correct item obvious from the architecture rather than storing pointers to information, but that may not be possible in your case.
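For illustration, here is a minimal sketch assuming the Heroku Postgres add-on (which sets the DATABASE_URL config var) and SQLAlchemy; the table and column names are made up:

import os
import random

from sqlalchemy import create_engine, text

# Heroku sets DATABASE_URL with a "postgres://" scheme, which newer
# SQLAlchemy versions reject; rewrite it to "postgresql://".
url = os.environ["DATABASE_URL"].replace("postgres://", "postgresql://", 1)
engine = create_engine(url)

def pick_item(items):
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS app_state (key TEXT PRIMARY KEY, value TEXT)"
        ))
        row = conn.execute(text(
            "SELECT value FROM app_state WHERE key = 'last_selected'"
        )).fetchone()
        last = row[0] if row else None
        # Never pick the same item twice in a row.
        choice = random.choice([i for i in items if i != last] or items)
        conn.execute(text(
            "INSERT INTO app_state (key, value) VALUES ('last_selected', :v) "
            "ON CONFLICT (key) DO UPDATE SET value = :v"
        ), {"v": choice})
        return choice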

Related

Is there a good way to store a boolean array into a file or database in python?

I am building an image mosaic that detects whether the user's selected areas are taken or not.
My idea is to store the available_spots in a list, and I would just have to look through the list to check whether a spot is available or not.
The problem is that when I reload the website, available_spots also gets reset to a blank list,
so I want to store this array somewhere that is fast to read and write to.
I am currently thinking about using a text file to store this, but that might take forever to read, since the array length is over 1.4 million. Are there any other solutions that might be better?
You can't store the data in a file for a few reasons: (1) GAE standard won't let you, (2) the data is lost when your server is restarted, and (3) different instances will have different data.
Of course you can and should store the data in a database of your choice. Firestore is likely a better and cheaper option than SQL. It should be fast enough for you and you can implement caching if needed.
You might be able to store the data in a single Firestore entity and consider using compression if you are getting close to the max entity size.
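For illustration, a minimal sketch assuming the google-cloud-firestore client library; the collection, document, and field names are made up. Packed as a bitmask, 1.4 million booleans take roughly 175 KB, comfortably under the entity size limit even before compression:

import zlib

from google.cloud import firestore

db = firestore.Client()
doc = db.collection("mosaic").document("available_spots")

def save_spots(spots):  # spots: list of booleans
    # Pack eight flags per byte, then compress.
    packed = bytes(
        sum(bit << i for i, bit in enumerate(spots[n:n + 8]))
        for n in range(0, len(spots), 8)
    )
    doc.set({"count": len(spots), "bits": zlib.compress(packed)})

def load_spots():
    data = doc.get().to_dict()
    packed = zlib.decompress(data["bits"])
    return [bool(packed[i // 8] >> (i % 8) & 1) for i in range(data["count"])]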
If you want to store it in a database you can use the "sqlite3" module.
It's a simple database that gets stored in a file, so you don't have to install a database program. It's great for small projects.
If you want to do more complex stuff with databases you can use "sqlalchemy".
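A minimal sketch of the "sqlite3" suggestion, with illustrative file and table names; the whole array is stored as a single BLOB row:

import sqlite3

conn = sqlite3.connect("spots.db")
conn.execute("CREATE TABLE IF NOT EXISTS spots (id INTEGER PRIMARY KEY, bits BLOB)")

def save(spots):  # spots: list of booleans
    conn.execute("INSERT OR REPLACE INTO spots (id, bits) VALUES (1, ?)",
                 (bytes(spots),))  # one byte per flag; ~1.4 MB here
    conn.commit()

def load():
    row = conn.execute("SELECT bits FROM spots WHERE id = 1").fetchone()
    return [bool(b) for b in row[0]] if row else []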

microservices and multiple databases

I have written microservices for auth, location, etc.
All of these microservices have different databases; location data, for example, exists in the databases of several of these services. When any of my projects needs a user's location, it first looks in the cache; if it's not found there, it hits the database. So far so good. Now, when a location is changed in any of my different databases, I need to update it in the other databases as well as in my cache.
Currently I made a model (called subscription) with a URL as its field; whenever a location is changed in any database, an object of this subscription is created. A periodic task is running which checks the subscription model; when it finds such objects, it hits the APIs of the other services, updates the location, and updates the cache.
I am wondering if there is any better way to do this?
"I am wondering if there is any better way to do this?"

"Better" is entirely subjective. If it meets your needs, it's fine.
Something to consider, though: don't store the same information in more than one place.
If you need an address, look it up from the service that provides addresses, every time.
This may be a performance hit, but it eliminates the problem of replicating the data everywhere.
Another option would be a more proactive approach, as suggested in the comments.
Instead of creating a task list of changes and processing it periodically, send a message across RabbitMQ immediately when the change happens. Let every service that needs to know get a copy of the message and update its own cache of info, as in the sketch below.
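For illustration, a minimal sketch of the publishing side using pika, a common RabbitMQ client for Python; the exchange name and payload shape are assumptions:

import json

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
# A fanout exchange delivers a copy of each message to every bound queue,
# so each service can keep its own cache up to date.
channel.exchange_declare(exchange="location_updates", exchange_type="fanout")

def publish_location_change(user_id, location):
    channel.basic_publish(
        exchange="location_updates",
        routing_key="",
        body=json.dumps({"user_id": user_id, "location": location}),
    )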
Just remember, though: every time you have more than one copy of the information, you reduce the "correctness" of the system as a whole. It will always be possible for the information found in one of your apps to be out of date, because it did not get an update from the official source.

Is there a way to check if a Django management command is running?

The views rely on Redis being populated. Redis is populated by a management command run every 10 minutes, which deletes all existing keys and re-adds them with new data. How could I determine from a Django view whether the management command is running?
Right now I'm having the management command write to an external file and having a view read that file on each request. If Redis is being refreshed by the management command, I hold up the view until it finishes (polling style).
Django does not provide a pre-packaged way to check whether a management command is running. That said, you should never write code that explicitly blocks a view while waiting for some result. You can easily use up every thread and process the server has made available to your application, and then all your users will have a poor experience on your site, even those doing nothing related to the problem you're trying to solve here.
What I'm getting from your description is that you want users to get reasonably fresh results. For something like this I would use a solution based on versioning the data. It would go like this:
Declare a Redis-backed cache in your settings.py file that will contain the data populated by the command and read by the view. Make sure the TIMEOUT of the cache is set to None.
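For example, a sketch assuming the django-redis backend (the LOCATION URL is an assumption):

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "TIMEOUT": None,  # entries never expire on their own
    }
}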
A current version number is recorded with the key CURRENT_VERSION. This key is itself unversioned.
When the command refreshes the data in the cache, it stores it in keys with version set to CURRENT_VERSION + 1. You'll have something like:
current_version = cache.get(CURRENT_VERSION)
# Record the new data.
for ...:
    cache.set(some_key, some_value, version=current_version + 1)
Django's cache system does not readily allow getting a set of keys that correspond to a specific criterion. However, your view will want to obtain all keys that belong to a specific version. This information can be recorded as:
cache.set(ALL_RECORDS,
          [... list of keys set in the loop above ...],
          version=current_version + 1)
Where ALL_RECORDS is a key value that is guaranteed not to clash with CURRENT_VERSION or any of the keys set for the individual records.
Once the command is done, it atomically increases the value of CURRENT_VERSION:
cache.incr(CURRENT_VERSION)
The documentation on the Redis backend states that if you perform an increment on appropriate values (that's left vague but integers would seem appropriate) then Redis will perform the increment atomically.
The command should also clean up old versions from the cache. One method to ensure that old data does not stay in the cache is to set expiration times on the keys when you set their values. Your command refreshing the cache runs every 10 minutes, so you could set keys to expire after 15 minutes. But suppose a problem prevents the command from running for several cycles. What then? Your data will expire and be removed from the cache, and the view will run with an empty data set. If this is okay for your situation, then you could pass the timeout parameter every time you call cache.set, except for CURRENT_VERSION, which should never expire.
If you are not okay with your view running with an empty data set (which seems more probable to me), then you have to write code in your command to seek old versions and remove them explicitly.
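Putting the command side together, a minimal sketch (the key-name constants are illustrative):

from django.core.cache import cache

CURRENT_VERSION = "current_version"
ALL_RECORDS = "all_records"

def refresh_cache(records):  # records: dict mapping keys to fresh values
    current_version = cache.get(CURRENT_VERSION)
    if current_version is None:  # first run ever
        current_version = 0
        cache.set(CURRENT_VERSION, 0)
    new_version = current_version + 1
    # Record the new data under the next version.
    for key, value in records.items():
        cache.set(key, value, version=new_version)
    cache.set(ALL_RECORDS, list(records), version=new_version)
    # Atomically publish the new version to readers.
    cache.incr(CURRENT_VERSION)
    # Explicitly remove the now-obsolete previous version.
    for key in cache.get(ALL_RECORDS, version=current_version) or []:
        cache.delete(key, version=current_version)
    cache.delete(ALL_RECORDS, version=current_version)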
Your view accesses the cache by:
Reading the value of CURRENT_VERSION:
current_version = cache.get(CURRENT_VERSION)
Reading the list of records in the version it got:
keys = cache.get(ALL_RECORDS, version=current_version)
Processing the records:
for key in keys:
    value = cache.get(key, version=current_version)
The view should detect the case where the cache has not been initialized and fail gracefully. When deploying the application, care should be taken that the command has run at least once before the site can be accessed.
If the view starts working while the command is updating the cache, it does not matter. From the point of view of the view, it is just reading the previous version of the data. From the point of view of the command, it is busy creating the next version, but this is invisible to the view. The view does not have to block.
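On the view side, a matching sketch that fails gracefully when the command has never run (constants as in the command sketch above):

from django.core.cache import cache
from django.http import HttpResponse, JsonResponse

CURRENT_VERSION = "current_version"
ALL_RECORDS = "all_records"

def records_view(request):
    current_version = cache.get(CURRENT_VERSION)
    if current_version is None:
        # The cache has not been initialized; report it instead of blocking.
        return HttpResponse("Data is not available yet.", status=503)
    keys = cache.get(ALL_RECORDS, version=current_version) or []
    records = {key: cache.get(key, version=current_version) for key in keys}
    return JsonResponse(records)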

Reset Index in neo4j using Python

Is there a possibility to reset the indices once I have deleted the nodes, just as if I had deleted the whole folder manually?
I am deleting the whole database with node.delete() and relation.delete() and just want the indices to start at 1 again, not where I had actually stopped...
I assume you are referring to the node and relationship IDs rather than the indexes?
Quick answer: You cannot explicitly force the counter to reset.
Slightly longer answer: Generally speaking, these IDs should not carry any relevance within your application. There have been a number of discussions about this on the Neo4j mailing list and Stack Overflow, as the ID is an internal artifact and should not be used like a primary key. Its purpose is more akin to an in-memory address, and if you require unique identifiers, you are better off considering something like a UUID.
You can stop your database, delete all the files in the database folder, and start it again.
This way, the ID generation will start back from 1.
This procedure completely wipes your data, so handle with care.
Now you certainly can do this using Python; see https://stackoverflow.com/a/23310320
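For illustration, a minimal (and destructive) sketch; the data directory path is an assumption that varies by installation, and the Neo4j server must be stopped first:

import shutil

# Assumed location of the store files; adjust for your installation.
GRAPH_DB = "/var/lib/neo4j/data/databases/graph.db"

# With the server stopped, wipe the store; on restart an empty database
# is created and the internal ID counter starts over.
shutil.rmtree(GRAPH_DB)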

App engine app design questions

I want to load info from another site (this part is done), but I am doing this every time the page is loaded, and that won't do. So I was thinking of having a variable in a table of settings like 'last checked bbc site', and when the page loads it would check whether it's been long enough since the last check to check again. Is there anything silly about doing it that way?
Also, do I absolutely have to use tables to store one-off variables like this setting?
I think there are two options that would work for you, besides creating an entity in the datastore to keep track of the "last visited time".
One way is to just check the external page periodically, using the cron api as described by jldupont.
The second way is to store the last visited time in memcache. Although memcache is not permanent, it doesn't have to be if you are only storing last refresh times. If your entry in memcache were to disappear for some reason, the worst that would happen would be that you would fetch the page again, and update memcache with the current date/time.
The first way would be best if you want to check the external page at regular intervals. The second way might be better if you want to check the external page only when a user clicks on your page, and you haven't fetched that page yourself in the recent past. With this method, you aren't wasting resources fetching the external page unless someone is actually looking for data related to it.
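For illustration, a minimal sketch of the memcache approach using the legacy GAE Python APIs; the key names, URL, and refresh interval are assumptions:

import time

from google.appengine.api import memcache, urlfetch

REFRESH_SECONDS = 3600  # re-fetch the external page at most once an hour

def get_external_page():
    last_checked = memcache.get("last_checked_bbc")
    if last_checked is None or time.time() - last_checked > REFRESH_SECONDS:
        result = urlfetch.fetch("http://www.bbc.co.uk/")
        memcache.set("bbc_page", result.content)
        memcache.set("last_checked_bbc", time.time())
    return memcache.get("bbc_page")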
You could also use Scheduled Tasks.
Also, you don't absolutely need to use the Datastore for configuration parameters: you could have this in a script / config file.
If you want some handler on your GAE app (including one for a scheduled task, reception of messages, web page visits, etc.) to store some new information in such a way that some handler in the future can recover that information, then GAE's storage is the only good general way (memcache could expire from under you, for example). Not sure what you mean by "tables" (?!), but guessing that you actually mean GAE's storage, the answer is "yes". (Under very specific circumstances you might want to put that data in some different place on the network, such as your visitor's browser, e.g. via cookies, or an Amazon storage instance, etc., but it does not appear to me that those specific circumstances are applicable to your use case.)
