using redis-py to bulk populate a redis list - python

In a Django project, I'm using redis as a fast backend.
I can LPUSH multiple values in a redis list like so:
lpush(list_name,"1","2","3")
However, I can't do it if I try
values_list = ["1","2","3"]
lpush(list_name,values_list)
For the record, this doesn't return an error. Instead, it creates a list list_name containing a single value, e.g. ['["1","2","3"]']. That is not usable if one later does AnObject.objects.filter(id__in=values_list), nor does AnObject.objects.filter(id__in=values_list[0]) work (error: invalid literal for int() with base 10: '[').
What's the best way to LPUSH numerous values in bulk into a redis list (a Python example is preferred)?

lpush(list_name, *values_list)
This unpacks the contents of values_list as separate arguments.
If you have an enormous number of values to insert, you can pipeline them, or use the command line tool redis-cli --pipe to populate the database.
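A minimal sketch of both approaches with redis-py (the connection details and list name are assumptions):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)  # assumed connection details
values_list = ["1", "2", "3"]

# Unpacking: each element of the list becomes a separate LPUSH argument.
r.lpush('list_name', *values_list)

# Pipelining: for very large batches, queue the commands client-side
# and send them to the server in one round trip.
pipe = r.pipeline()
for val in values_list:
    pipe.lpush('list_name', val)
pipe.execute()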

Related

redis lrange prefix fetching way

I have list-type data in Redis, and there are so many keys that they can't all be fetched at once. I tried to use the python redis lrange function to fetch them in batches, say 1000 at a time, but it doesn't seem to work: it always returns empty. lrange treats * as a literal character. How should I do it?
conn.lrange(f'test-{id}-*', 0, 1000)
conn.lrange() and the underlying Redis LRANGE command return the elements of a given list, not the keys of a database: your code returns an empty array because the given key does not exist.
To retrieve the keys of a database you should use the SCAN command, exposed by conn.scan().
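A minimal sketch with redis-py, assuming the keys follow a test-* pattern like the one in the question:

import redis

conn = redis.StrictRedis(host='localhost', port=6379, db=0)  # assumed connection details

# scan_iter wraps SCAN and yields the matching key names incrementally,
# without blocking the server the way KEYS would.
for key in conn.scan_iter(match='test-*'):
    # With a real key in hand, LRANGE can fetch the list in batches.
    batch = conn.lrange(key, 0, 999)  # first 1000 elements
    print(key, len(batch))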

Substitute a variable's value into a command in python

I am trying to write a simple python script to collect certain command outputs from mongodb, as part of validating the database before and after backup.
Below is the script.
import pymongo
import json
client = pymongo.MongoClient('localhost',username='admin',password='admin')
db = client['mydb']
collections = list(db.list_collection_names())
for i in collections:
    print(db.$i.estimated_document_count())
All the collections are stored in the list called collections, and I want to loop over it so that I can get the document count of each collection. I know the last print statement here is wrong; how do I get it right? I want $i to be substituted with the collection name on each iteration so that I can get the document count of that collection.
When I run print(db.audit.estimated_document_count()) it gives me the document count of the audit collection. But how do I iterate through the list in a for loop and substitute the value of i into the command?
Also, for validating backup/restore is there any other commands that I should run against database to verify backup/restore?
You cannot use a computed value as an identifier in a "dot" expression, at least not without resorting to dirty tricks.
What you can do is find some other mechanism for getting a collection given its name. According to the tutorial and API reference, you can use db['foo'] instead of db.foo.
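Applied to the script from the question, a sketch (connection details copied from the question):

import pymongo

client = pymongo.MongoClient('localhost', username='admin', password='admin')
db = client['mydb']

# db[name] looks a collection up by its string name,
# which the db.$i "dot" syntax cannot do with a computed value.
for name in db.list_collection_names():
    print(name, db[name].estimated_document_count())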

How can I put a list-type item into the database?

I use Scrapy to write a spider that gets something from a website, and I want to put the items into a database. In my code there are five items: two of them are unicode type, so I can put them into the database directly, but two of them are list type. How can I put those into the database? Here is my code for the items whose type is list:
descr = sel.xpath(
    '//*[@id="root"]/div/main/div/div[2]/div[1]/div[2]/div/div/div/div[2]/div/div/span/p[1]/text()').extract()
print 'type is:', type(descr)
answer_time = sel.xpath(
    '//*[@id="root"]/div/main/div/div[2]/div[1]/div[2]/div/div/div/div[2]/div/div/div/a/span/@data-tooltip').extract()
print 'type is:', type(answer_time)
It depends strongly on how the data will be used once it's in the db and what kind of db you're using.
If you're using something like MongoDB, it fully supports adding the list as part of the record. Some relational dbs, such as PostgreSQL, have support for JSON column types.
Beyond that there are two main options: 1) cast the list to a string using something like JSON and save it in a text column along with your other unicode columns, or 2) take full advantage of a relational db and use a second table to store a one-to-many relationship.
Casting to a string is much easier and faster to develop with, and great for rapid prototyping. However, using multiple tables has huge efficiency perks if the data is going to be read for analysis on any data set that is not small.
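A minimal sketch of option 1 with sqlite3 and json (the file, table, and sample values are assumptions):

import json
import sqlite3

conn = sqlite3.connect('items.db')  # assumed database file
conn.execute('CREATE TABLE IF NOT EXISTS items (descr TEXT, answer_time TEXT)')

descr = ['first paragraph', 'second paragraph']   # stand-ins for the extracted lists
answer_time = ['10:00', '11:30']

# Serialize each list to a JSON string so it fits in a text column.
conn.execute('INSERT INTO items VALUES (?, ?)',
             (json.dumps(descr), json.dumps(answer_time)))
conn.commit()

# Reading it back: json.loads restores the original Python list.
row = conn.execute('SELECT descr FROM items').fetchone()
print(json.loads(row[0]))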

Create SQL database from dict with different features in python

I have the following dict:
base = {}
base['id1'] = {'apple':2, 'banana':4,'coconut':1}
base['id2'] = {'apple':4, 'pear':8}
base['id3'] = {'banana':1, 'tomato':2}
....
base['idN'] = {'pineapple':1}
I want to create a SQL database to store it. I normally use sqlite, but here the number of variables (features in the dict) is not the same for all ids, and I do not know all of them in advance, so I cannot use the standard procedure.
Does someone know an easy way to do it ?
If you flatten this into plain SQL rows, the ids will get duplicated, so I would suggest Postgres, which has a JSON field type: you can put the data corresponding to each key in it. This assumes you are not constrained to plain SQL columns.
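A minimal sketch of that approach with psycopg2 (the connection details and table name are assumptions):

import psycopg2
from psycopg2.extras import Json

base = {
    'id1': {'apple': 2, 'banana': 4, 'coconut': 1},
    'id2': {'apple': 4, 'pear': 8},
}

conn = psycopg2.connect(dbname='mydb', user='me')  # assumed connection details
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS base (id TEXT PRIMARY KEY, features JSONB)')

# One row per id; the ragged feature dict goes into the JSONB column as-is.
for key, features in base.items():
    cur.execute('INSERT INTO base (id, features) VALUES (%s, %s)', (key, Json(features)))
conn.commit()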

Batch value insert in Redis list

Is there a way to store multiple values in a Redis list at the same time? I can only find a way to insert one value at a time into a list.
I've been looking at the following commands documentation: http://redis.io/commands
update: I have created a ticket for this feature.
You can use a pipeline. If you're using redis-py, check out the code below; I ran this against an AWS ElastiCache Redis instance and found the following:
import redis
rs = redis.StrictRedis(host='host', port=6379, db=0)
q='test_queue'
Ran in 0.17s for 10,000 vals
def multi_push(q, vals):
    pipe = rs.pipeline()
    for val in vals:
        pipe.lpush(q, val)
    pipe.execute()
Ran in 13.20s for 10,000 vals
def seq_push(q, vals):
    for val in vals:
        rs.lpush(q, val)
~ 78x faster.
Yeah, it doesn't look like that's possible. You might be able to use MULTI (transactions) to store multiple values in an atomic sequence.
You can if you run 2.4. While it isn't marked stable yet, it should be soon, IIRC. That said, I'm running from trunk and it's been rock solid for me with many gigs of data and a high rate of churn. For more details on variadic commands see 2.4 and other news.
For lists, not that I am aware of, but depending on your data volume, it might be efficient for you to re-cast your data to use Redis' hash multi-set command, HMSET, which does indeed give you multiple inserts in a single call:
HMSET V3620 UnixTime 1309312200 UID 64002 username "doug" level "noob"
As you'd expect, HMSET creates a Redis hash keyed to V3620. The key follows the HMSET command, which is then followed by multiple field/value pairs:
HMSET key field1 value1 field2 value2
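A minimal redis-py sketch of the same idea (key and field names taken from the example above; note that recent redis-py versions expose this as hset with a mapping argument, while older ones used hmset):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)  # assumed connection details

# One round trip sets every field of the hash keyed V3620.
r.hset('V3620', mapping={
    'UnixTime': 1309312200,
    'UID': 64002,
    'username': 'doug',
    'level': 'noob',
})
print(r.hgetall('V3620'))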
