I'm trying to maintain a very volatile database in memory (disk would be too slow), always updating as I'm listening to hundreds of JSON websocket streams.
Currently I'm using Redis, but its built-in types don't give me nested dictionaries. You can dump a JSON string into a Redis key, but Redis won't know it's JSON, so you can't access or change a particular key's value; you have to load the whole JSON string, edit it, then dump it back into Redis. This is of course very slow, and I'd like to stop doing it.
There's a Redis module called reJSON that lets Redis recognize JSON and edit JSON dictionaries, but I'd have to rewrite a lot of my code to use it. With reJSON, though, I could directly access and edit a particular key's value instead of loading and dumping the whole JSON string.
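For reference, my understanding is that with reJSON a single nested value could be edited in place, along these lines (a sketch using redis-py's generic execute_command; the exact path syntax for numeric string keys is something I'd still have to verify):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

# JSON.SET <key> <path> <json-value> changes one nested field without touching the rest
r.execute_command(
    "JSON.SET",
    "candles_XRPBTC5MINUTE",
    '["1554792300000"].close',   # path to the nested key; syntax may need checking
    json.dumps("0.00004125"),    # placeholder close price
)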
What I'm doing now is flattening the structure by concatenating key names whenever a value is itself a dictionary. The problem is a dictionary within a dictionary: I'd have to concatenate a second time, and that would produce a -ton- of keys. I don't think this is the optimal approach.
I was also recommended to use native Redis commands and hashes instead of just storing JSON strings, but hash fields only hold flat values, not nested dictionaries.
As for the data itself, I'm listening to websocket streams where each update gives me data like this:
https://api.binance.com/api/v1/klines?symbol=XRPBTC&interval=5m&limit=1
This is a "candle" for the trading market "XRPBTC", and the candle completes every "5 minutes". I want to keep create a new key every time the first element of that API data changes (meaning 5 minutes have passed and there's now a new candle. This value is a millisecond epoch). If it didn't change, the current candle isn't new, but other elements of the array changed, and these changes need to be made in the redis DB.
Let's say I got a websocket update where the 4th element of the array of the current candle changed. This is most likely to happen, as it is the "close", or current price of the market.
What I have right now is a key called candles_XRPBTC5MINUTE. This key's value is a dictionary of dictionaries.
https://pastebin.com/rAYs0TaN
The "close" value in redis is here:
candles_XRPBTC5MINUTE["1554792300000"]["close"]
I want to edit the value in redis to be the one I got in the new websocket update. candles_XRPBTC5MINUTE contains nested dictionaries, and is 0.1 megabytes. Currently I load candles_XRPBTC5MINUTE from redis as JSON, update candles_XRPBTC5MINUTE["1554792300000"]["close"] to whatever is in the websocket update, then dump it back to redis as JSON. As you can tell, this is a lot of handling of old, unnecessary data when I'm focusing on the newest key 1554792300000.
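In code, the current flow is essentially this (a simplified sketch with redis-py; the new close value comes from the websocket update):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def update_close(new_close):
    # load the entire ~0.1 MB blob just to change one nested value...
    candles = json.loads(r.get("candles_XRPBTC5MINUTE"))
    candles["1554792300000"]["close"] = new_close
    # ...then write the whole thing back
    r.set("candles_XRPBTC5MINUTE", json.dumps(candles))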
My options seem to be:
A. Use reJSON
B. Keep using vanilla Redis, but concatenate key names again, creating tens of thousands of keys (candles_XRPBTC5MINUTE_1554792300000, candles_XRPBTC5MINUTE_1554792600000, candles_XRPBTC5MINUTE_1554792900000, etc., and likewise for 1MINUTE, 3MINUTE, 15MINUTE and for hundreds of other markets)
C. Try to store the data I retrieve from the websockets as Redis hashes instead of JSON strings
What is the best option here, and why? Are there any other options?
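For reference, option C would presumably turn each candle into its own hash, roughly like this (a sketch; the key layout and values are just illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

# one hash per candle, keyed by market, interval and open time
r.hset("candles:XRPBTC:5MINUTE:1554792300000", "open", "0.00004100")
r.hset("candles:XRPBTC:5MINUTE:1554792300000", "close", "0.00004125")

# a websocket tick that only changes the close then touches a single field
r.hset("candles:XRPBTC:5MINUTE:1554792300000", "close", "0.00004130")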
I'm relatively new to MongoDB. I've done stuff in it before, but my current project involves using collections to store values per "key". In this case, a key is a string of characters that will be used to access my software. The authentication and key generation are done on my website, with Flask as the backend, which means I can use Python to handle all the key generation and authentication. The code is complete for the most part; it generates and authenticates keys amazingly, and I'm really happy with how it works. The problem I face now is getting the collection (or key) to delete automatically after 3 days.
The reason I want them to delete after 3 days is that the keys aren't lifetime keys. The software is free, but in order to use it you must have a key. That key should expire after a certain amount of time (in this case, 3 days), and the user must go back and get another one.
Please note that I can't use individual documents: one, I've already set everything up to use collections, and two, each key needs to store multiple documents rather than a single document.
I've already tried a TTL on the collection, but it doesn't seem to be working.
What's the best way to do this? Keep in mind that the collection name is the key itself, so it can't contain a deletion date (a date that another piece of code would scan, deleting the collection once that date is reached).
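For reference, my understanding is that a TTL index works per document on a date field, roughly like this (a pymongo sketch; the names are made up), which is part of why it doesn't fit my one-collection-per-key setup:

import datetime
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]   # connection details are illustrative
keys = db["some_key_collection"]

# MongoDB removes each document roughly 3 days after the value in its date field
keys.create_index("createdAt", expireAfterSeconds=3 * 24 * 60 * 60)
keys.insert_one({"createdAt": datetime.datetime.utcnow(), "value": "example"})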
I am currently developing a Python Discord bot that uses a Mongo database to store user data.
As this data is continually changed, the database would be subjected to a massive number of queries to both extract and update the data; so I'm trying to find ways to minimize client-server communication and reduce bot response times.
With that in mind, is it a good idea to create a copy of a Mongo collection as a list of dictionaries as soon as the script is run, and manipulate the data locally instead of continually querying the database?
In particular, every time a piece of data would be searched with the collection.find() method, it is instead taken from the list. On the other hand, every time a piece of data needs to be updated with collection.update(), both the list and the database are updated.
I'll give an example to better explain what I'm trying to do. Let's say that my collection contains documents with the following structure:
{"user_id": id_of_the_user, "experience": current_amount_of_experience}
and the experience value must be continually increased.
Here's how I'm implementing it at the moment:
online_collection = db["collection_name"]            # pymongo Collection object (db is created elsewhere)
offline_collection = list(online_collection.find())  # in-memory copy of the whole collection

def updateExperience(user_id):
    # writes go to both the database and the local copy
    online_collection.update_one({"user_id": user_id}, {"$inc": {"experience": 1}})
    mydocument = next(document for document in offline_collection if document["user_id"] == user_id)
    mydocument["experience"] += 1

def findExperience(user_id):
    # reads come from the local copy only; the database is not queried
    mydocument = next(document for document in offline_collection if document["user_id"] == user_id)
    return mydocument["experience"]
As you can see, the database is involved only for the update function.
Is this a valid approach?
For very large collections (millions of documents), does the next() call keep the same execution time, or would there be slowdowns?
Also, while not explicitly asked in the question, I'd be more than happy to get any advice on how to improve the performance of a Discord bot, as long as it doesn't involve using a VPS or sharding, since I'm already using those options.
I don't really see why not, as long as you're aware of the following:
You will need the system resources to load an entire database into memory
It is your responsibility to sync the actual db and your local store
You do need to be the only person/system updating the database
Eventually this pattern will fail, e.g. the DB gets too large or more than one process needs to update it, so it isn't future-proof.
In essence you're talking about a caching solution, so there's no need to reinvent the wheel: there are many such products/solutions you could use.
It's probably not the traditional way of doing things, but if it works, then why not.
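On the next() question specifically: a linear scan over a list of millions of documents will get slower as the list grows. If that becomes a problem, one option (a sketch, assuming user_id is unique) is to key the local copy by user_id so lookups are constant time:

offline_collection = {document["user_id"]: document for document in online_collection.find()}

def findExperience(user_id):
    return offline_collection[user_id]["experience"]

def updateExperience(user_id):
    online_collection.update_one({"user_id": user_id}, {"$inc": {"experience": 1}})
    offline_collection[user_id]["experience"] += 1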
I am considering serializing a big set of database records to cache in Redis, using Python and Cassandra. I have to either serialize each record and persist it in Redis as a string, or create a dictionary for each record and persist the records in Redis as a list of dictionaries.
Which way is faster: pickling each record, or creating a dictionary for each record?
And second: is there any way to fetch from the database as a list of dicts (instead of a list of model objects)?
Instead of serializing your dictionaries into strings and storing them in a Redis LIST (which is what it sounds like you are proposing), you can store each dict as a Redis HASH. This should work well if your dicts are relatively simple key/value pairs. After creating each HASH you could add the key for the HASH to a LIST, which would provide you with an index of keys for the hashes. The benefits of this approach could be avoiding or lessening the amount of serialization needed, and may make it easier to use the data set in other applications and from other languages.
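A minimal sketch of that layout with redis-py (assuming redis-py 3.5+ for hset with mapping; the key names and record are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

record = {"id": "42", "name": "example", "price": "0.99"}   # a flat record

r.hset("record:42", mapping=record)    # one HASH per record
r.rpush("record:index", "record:42")   # LIST acting as an index of hash keys

# later: rebuild the data set without a separate deserialization step
records = [r.hgetall(key) for key in r.lrange("record:index", 0, -1)]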
There are of course many other approaches you can take and that will depend on lots of factors related to what kind of data you are dealing with and how you plan to use it.
If you do go with serialization you might want to at least consider a more language agnostic serialization format, like JSON, BSON, YAML, or one of the many others.
I'm evaluating using redis to store some session values. When constructing the redis client (we will be using this python one) I get to pass in the db to use. Is it appropriate to use the DB as a sort of prefix for my keys? E.g. store all session keys in db 0 and some messages in db 1 and so on? Or should I keep all my applications keys in the same db?
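For example, something like this (a sketch with the Python client; keys and values are illustrative):

import redis

sessions = redis.Redis(host="localhost", port=6379, db=0)   # all session keys in db 0
messages = redis.Redis(host="localhost", port=6379, db=1)   # messages in db 1

sessions.set("session:abc123", "session payload")
messages.lpush("inbox:user1", "hello")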
Quoting my answer from this question:
It depends on your use case, but my rule of thumb is: if you have a very large quantity of related data keys that are unrelated to all the rest of your data in Redis, put them in a new database. Reasons being:
You may need to (non-ideally) use the keys command to get all of that data at some point, and having the data segregated makes that much cheaper.
You may want to switch to a second Redis server later, and having related data pre-segregated makes this much easier.
You can keep your databases named somewhere, so it's easier for you, or a new employee, to figure out where to look for particular data.
Conversely, if your data is related to other data, they should always live in the same database, so you can easily write pipelines and Lua scripts that can access both.
For example, I have a user object stored in the database (Redis).
It has several fields:
String nick
String password
String email
List posts
List comments
Set followers
and so on...
In my Python program I have a class (User) with the same fields for this object. Instances of this class map to objects in the database. The question is how to get data from the DB for best performance:
1. Load the values for every field when the instance is created and initialize the fields with them.
2. Load a field's value each time it is requested.
3. Same as (2), but after loading the value, replace the field property with the loaded value so later lookups don't hit the DB.
P.S. Redis runs on localhost.
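To illustrate, options 1 and 3 would look roughly like this (a sketch; I'm assuming the simple fields live in a Redis hash under a user:<nick> key, that the lists/sets are handled separately, and that functools.cached_property is available, i.e. Python 3.8+):

import functools
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

class UserEager:
    """Option 1: fetch every simple field when the instance is created."""
    def __init__(self, nick):
        data = r.hgetall(f"user:{nick}")        # one round trip for all hash fields
        self.nick = data.get("nick")
        self.password = data.get("password")
        self.email = data.get("email")

class UserLazy:
    """Option 3: fetch a field on first access, then cache it on the instance."""
    def __init__(self, nick):
        self._key = f"user:{nick}"

    @functools.cached_property
    def email(self):
        return r.hget(self._key, "email")       # hits Redis only the first time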
The method entirely depends on the requirements.
If there is only one client reading and modifying the properties, this is a rather simple problem. When modifying data, just change the instance attributes in your current Python program and -- at the same time -- keep the DB in sync while keeping your program responsive. To that end, you should outsource blocking calls to another thread or make use of greenlets. If there is only one client, there definitely is no need to fetch a property from the DB on each value lookup.
If there are multiple clients reading the data and only one client modifying the data, you have to think about which level of synchronization you need. If you need 100 % synchronization, you will have to fetch data from the DB on each value lookup.
If there are multiple clients changing the data in the database you better look into a rock-solid industry standard solution rather than writing your own DB cache/mapper.
Your distinction between (2) and (3) does not really make sense. If you fetch data on every lookup, there is no need to 'store' data. You see, if there can be multiple clients involved these things quickly become quite complex and it's really hard to get it right.