Batch value insert in Redis list - python

Is there a way to store multiple values in a Redis list at the same time? I can only find a way to insert one value at a time into a list.
I've been looking at the commands documentation: http://redis.io/commands
Update: I have created a ticket for this feature.

You can use a pipeline. If you are using redis-py, check out the code below; I ran it against an AWS ElastiCache Redis instance and got the following timings:
import redis

rs = redis.StrictRedis(host='host', port=6379, db=0)
q = 'test_queue'

# Ran in 0.17s for 10,000 vals
def multi_push(q, vals):
    pipe = rs.pipeline()
    for val in vals:
        pipe.lpush(q, val)
    pipe.execute()

# Ran in 13.20s for 10,000 vals
def seq_push(q, vals):
    for val in vals:
        rs.lpush(q, val)
~78x faster.

Yeah, it doesn't look like that's possible. You might be able to use MULTI (transactions) to store multiple values in an atomic sequence.
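For illustration, a minimal sketch of what that could look like with redis-py, where a pipeline with transaction=True wraps the queued pushes in MULTI/EXEC (the client setup and key name here are assumptions):
import redis

rs = redis.StrictRedis(host='localhost', port=6379, db=0)

# transaction=True (the default) wraps the queued commands in MULTI/EXEC,
# so all pushes are applied as one atomic sequence.
pipe = rs.pipeline(transaction=True)
for val in ("1", "2", "3"):
    pipe.lpush('my_list', val)
pipe.execute()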

You can if you run 2.4. While it isn't marked stable yet, it should be soon, IIRC. That said, I'm running off trunk and it's been rock solid for me with many gigs of data and a high rate of churn. For more details on variadic commands see "2.4 and other news".
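For example, with a 2.4-or-newer server the push commands accept several values in one call; with a redis-py version that supports it, that looks roughly like this (client setup is an assumption):
import redis

rs = redis.StrictRedis(host='localhost', port=6379, db=0)

# Variadic LPUSH (Redis >= 2.4): all three values are pushed in a single command.
rs.lpush('test_queue', '1', '2', '3')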

For lists, not that I am aware of, but depending on your data volume, it might be efficient for you to re-cast your data to use Redis' hash multi-set command, HMSET, which does indeed give you multiple inserts in a single call:
HMSET V3620 UnixTime 1309312200 UID 64002 username "doug" level "noob"
As you would expect, HMSET creates the Redis hash keyed to V3620. The key follows the HMSET command, followed by multiple field-value pairs:
HMSET key field1 value1 field2 value2

Related

redis lrange prefix fetching way

I have list-type data in Redis, and there are so many keys that they can't all be fetched at one time. I tried to use the python redis lrange function to get them in batches, such as 1000 at a time, but it doesn't seem to work as it always returns empty. It seems lrange treats * as a literal character; how should I do this?
conn.lrange(f'test-{id}-*', 0, 1000)
conn.lrange() and the underlying Redis LRANGE command return the elements of a given list, not the keys of a database: your code returns an empty array because the given key does not exist.
To retrieve the keys of a database you should use the SCAN command, exposed by conn.scan().
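A minimal sketch using redis-py's scan_iter convenience wrapper around SCAN (the client setup and placeholder id are assumptions; the key pattern mirrors the one in the question):
import redis

conn = redis.StrictRedis(host='localhost', port=6379, db=0)

some_id = 42  # placeholder for the id used in the question's key pattern

# SCAN walks the keyspace incrementally; MATCH filters keys by pattern,
# COUNT is only a hint for how much work each SCAN step does.
for key in conn.scan_iter(match=f'test-{some_id}-*', count=1000):
    # each matching key is itself a list, so LRANGE applies to it individually
    elements = conn.lrange(key, 0, 999)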

The most efficient way to compare two dictionaries, verifying dict_2 is a complete subset of dict_1, and return all values of dict_2 which are less?

I'm working on a data pipeline that will pull data from online and store it in MongoDB. To manage the process, I've developed two dictionaries: request_totals and mongo_totals. mongo_totals contains a key for each container in the Mongo database, along with a value for the max(id) that container contains. request_totals has a key for each category data can be pulled from, along with a value for the max(id) of that category. If MongoDB is fully updated, these two dictionaries would be identical.
I've developed code that runs, but I can't shake the feeling that I'm not really being efficient here. I hope that someone can share some tips on how to better write this:
def compare(request_totals, mongo_totals):
    outdated = dict()
    # Verifies MongoDB contains no unique collections
    if request_totals | mongo_totals != request_totals:
        raise AttributeError('mongo_totals does not appear to be a subset of request_totals')
    sharedKeys = set(request_totals.keys()).intersection(mongo_totals.keys())
    unsharedKeys = set(request_totals) - set(mongo_totals)
    # Updates outdated dict with outdated key-value pairs representing MongoDB collections
    for key in sharedKeys:
        if request_totals[key] > mongo_totals[key]:
            outdated.update({key: mongo_totals[key]})
        elif request_totals[key] < mongo_totals[key]:
            raise AttributeError(
                f'mongo_total for {key}: {mongo_totals[key]} exceeds request_totals for {key}: {request_totals[key]}')
    return outdated | dict.fromkeys(unsharedKeys, 0)

compare(request_totals, mongo_totals)
The returned dictionary has key-value pairs that may be used in the following way: query the API using the key, and offset the records by the key's value. This way it allows me to keep the database updated. Is there a more efficient way to handle this comparison?
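For context, a minimal sketch of how the returned dict might drive the follow-up requests; fetch_category and insert_into_mongo are hypothetical helpers, not part of the original code:
outdated = compare(request_totals, mongo_totals)

for category, last_seen_id in outdated.items():
    # hypothetical helper: pull only records newer than the stored max(id)
    new_records = fetch_category(category, offset=last_seen_id)
    insert_into_mongo(category, new_records)  # hypothetical helper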

Cursor not found while reading all documents from a collection

I have a collection student and I want this collection as a list in Python, but unfortunately I got the following error: CursorNextError: [HTTP 404][ERR 1600] cursor not found. Is there an option to read a 'huge' collection without an error?
from arango import ArangoClient
# Initialize the ArangoDB client.
client = ArangoClient()
# Connect to database as user.
db = client.db(<db>, username=<username>, password=<password>)
print(db.collections())
students = db.collection('students')
#students.all()
students = db.collection('handlingUnits').all()
list(students)
[OUT] CursorNextError: [HTTP 404][ERR 1600] cursor not found
students = list(db.collection('students'))
[OUT] CursorNextError: [HTTP 404][ERR 1600] cursor not found
As suggested in my comment, if raising the TTL is not an option (which I wouldn't do either), I would get the data in chunks instead of all at once. In most cases you don't need the whole collection anyway, so maybe think about limiting that first. Do you really need all documents and all their fields?
That being said, I have no experience with arango, but this is what I would do:
entries = db.collection('students').count()  # get total amount of documents in collection
limit = 100  # blocksize you want to request
yourlist = []  # final output
for x in range(int(entries / limit) + 1):
    block = db.collection('students').all(skip=x * limit, limit=limit)
    yourlist.extend(block)  # assuming block is of type list. Not sure what arango returns
Something like this (based on the documentation here: https://python-driver-for-arangodb.readthedocs.io/_/downloads/en/dev/pdf/).
Limit your request to a reasonable amount and then skip that amount with your next request. You have to check whether range() works like that; you might have to think of a better way of defining the number of iterations you need.
This also assumes arango returns the results of all() in a stable order by default.
So what is the idea?
Determine the number of entries in the collection.
Based on that, determine how many requests you need (e.g. size=1000 -> 10 blocks each containing 100 entries).
Make x requests where you skip the blocks you already have: first iteration entries 1-100, second iteration 101-200, third iteration 201-300, etc.
By default, AQL queries generate the complete result, which is then held in memory and provided batch by batch. So the cursor is simply fetching the next batch of the already calculated result. In most cases this is fine, but if your query produces a huge result set, then this can take a long time and will require a lot of memory.
As an alternative you can create a streaming cursor. See https://www.arangodb.com/docs/stable/http/aql-query-cursor-accessing-cursors.html and check the stream option.
Streaming cursors calculate the next batch on demand and are therefore better suited to iterate a large collection.
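With the python-arango driver, one way to do that is to run an AQL query with stream=True; a rough sketch (connection details mirror the question's placeholders):
from arango import ArangoClient

client = ArangoClient()
db = client.db(<db>, username=<username>, password=<password>)

# A streaming cursor computes each batch on demand instead of materialising
# the whole result server-side, so a huge collection can be iterated safely.
cursor = db.aql.execute(
    'FOR doc IN students RETURN doc',
    stream=True,
    batch_size=100,
)
students = [doc for doc in cursor]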

How to Get The Counts of Redis Values for A Set in Python

I want to get the count of values given the key schema.
I have a set in Redis whose key is 'sample:key:schema'.
I want to get the total number of values associated with this key.
Currently, I do the following and it works:
import redis
redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)
key_schema = 'sample:key:schema'
count_of_values = len(redis_client.smembers(key_schema))
Is there a better way to get the counts directly without having to fetch all the records and count them?
You don't have to fetch the members with smembers and take len afterwards. You can use scard for this; see the redis-py documentation.
This is from the official redis documentation
Returns the set cardinality (number of elements) of the set stored at key.
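Applied to the snippet from the question, that would look something like this:
import redis

redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)
key_schema = 'sample:key:schema'

# SCARD returns the set's cardinality server-side, so no members
# are transferred to the client just to count them.
count_of_values = redis_client.scard(key_schema)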

using redis-py to bulk populate a redis list

In a Django project, I'm using redis as a fast backend.
I can LPUSH multiple values in a redis list like so:
lpush(list_name,"1","2","3")
However, I can't do it if I try
values_list = ["1","2","3"]
lpush(list_name,values_list)
For the record, this doesn't return an error. Instead it creates a list list_name with a single value. E.g. ['["1","2","3"]']. This is not usable if later one does AnObject.objects.filter(id__in=values_list). Nor does it work if one does AnObject.objects.filter(id__in=values_list[0]) (error: invalid literal for int() with base 10: '[').
What's the best way to LPUSH numerous values in bulk into a redis list (a Python example is preferred)?
lpush(list_name, *values_list)
This will unpack the contents of values_list as separate parameters.
If you have enormous numbers of values to insert into the database, you can pipeline them. Or you can use the command-line tool redis-cli --pipe to populate a database.
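A rough sketch of the pipelined variant with redis-py (the client setup and list name here are assumptions):
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
values_list = ["1", "2", "3"]

# Queue one LPUSH per value and send them all to the server in a single round trip.
pipe = r.pipeline()
for value in values_list:
    pipe.lpush('list_name', value)
pipe.execute()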
