Redis and redis-py: Storing abstract objects - python

In Python I have objects that contain other objects. What is the best way to represent this using Redis?
This answer addresses this. The solution is basically to give every object an id; if objectA contains objectB, what you store in objectA is the id of objectB. If there's nothing better, this seems reasonable.
Now my question is, how do I generate these ids? Let's say that my objects are users that contain other objects called items. I give each unique item a unique id. But when a new item is created, how do I make sure that the id I give the new item doesn't already exist, without having to check all the existing ids? Suppose, for example, that I'm storing all the existing items in the Redis namespace as item:int, e.g. item:5313, item:1234, etc. When I want to create a new item, how do I check the existing ids in a way that's efficient?
Thanks.

You can use an auto-incrementing id counter. INCR is atomic, so each call returns a value that no other client has received:
redis 127.0.0.1:6379> incr next_id:user
(integer) 1
redis 127.0.0.1:6379> incr next_id:user
(integer) 2
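The same counter can be driven from Python. The sketch below assumes a client object with an incr(key) method, which is what redis-py's Redis.incr provides. The InMemoryCounter class is a hypothetical stand-in so the example runs without a server, and the key names next_id:item and item:<id> are illustrative, not taken from the question.

```python
class InMemoryCounter:
    """Hypothetical stand-in for a Redis client; mimics only INCR."""

    def __init__(self):
        self._counters = {}

    def incr(self, key):
        # Like Redis INCR: a missing key starts at 0, then is incremented.
        self._counters[key] = self._counters.get(key, 0) + 1
        return self._counters[key]


def new_item_key(client, kind="item"):
    """Reserve the next id for `kind` and return its key, e.g. 'item:1'."""
    next_id = client.incr(f"next_id:{kind}")
    return f"{kind}:{next_id}"


client = InMemoryCounter()  # with redis-py: client = redis.Redis()
first_key = new_item_key(client)
second_key = new_item_key(client)
user_key = new_item_key(client, kind="user")
```

Each kind of object gets its own counter, so a new id never has to be checked against the existing keys.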

Related

Doing "group by" in django but still retaining complete object

I want to do a GROUP BY in Django. I saw answers on Stack Overflow that recommend:
Member.objects.values('designation').annotate(dcount=Count('designation'))
This works, but the problem is you're getting a ValuesQuerySet instead of a QuerySet, so the queryset isn't giving me full objects but only specific fields. I want to get complete objects.
Of course, since we're grouping we need to choose which object to take out of each group; I want a way to specify the object (e.g. take the one with the biggest value in a certain field, etc.)
Does anyone know how I can do that?
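If you're willing to make more than one query, you could do the following. Note that values() should list only the grouping field; adding 'id' would group by id as well and make every dcount 1.
from django.db.models import Count
dcounts = Member.objects.values('designation').annotate(dcount=Count('designation')).order_by('-dcount')
member = Member.objects.filter(designation=dcounts.first()['designation']).first()
If you wanted one object from each of the top five designations by dcount, you could do the following (one extra query per designation):
designations = [d['designation'] for d in dcounts[:5]]
members = [Member.objects.filter(designation=d).first() for d in designations]
To pick a specific object from each group, add an order_by() before first().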
It sounds like you don't necessarily need a GROUP BY, but just want to limit your selection to one item per field value (e.g. the one with the MAX value of a certain field).
On PostgreSQL, you can get distinct objects by field:
Member.objects.order_by('designation').distinct('designation')
Passing field names to distinct() is supported on PostgreSQL only, and the order_by() must start with the same fields. Adding a second ordering field after 'designation' controls which object is kept for each designation.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct

python adding unique items to a huge table

I have a very large list of items (10M+) that must be put in a table with three columns (Item_ID,Item_name,Item_count)
The items in the table must be unique.
We are adding the items one by one.
When each new item is added, we need to check:
if it is on the table, update its count +1, and retrieve its ID
if not on the table, insert it in the table, assign it an ID and set its count to 1
I have tried different database implementations (MySQL, SQLite, Python's shelve, and my own flat-file implementation), but the problem is always the same: the more rows there are in the table, the more lookup operations are needed (for a table of 10,000 rows, adding the next 10,000 items takes at least around 10,000*10,000 lookups).
Indexing the database may sound like a good way to optimize, but my understanding is that the indexing is done after the bulk of the data is inserted, not updated with each insertion.
So, how can we add such a large number of items to a table in the way described?
You can use set() to check whether the item is already in the list.
I'm assuming that you have a list of lists (rows = [[id, name, count], [id, name, count], ...]) and that item_name is the name being added.
names = [row[1] for row in rows]  # a new list that contains only the names
if item_name in set(names):  # true if the item is already in the list
    row = rows[names.index(item_name)]
    row[2] += 1  # count + 1
    item_id = row[0]  # retrieve the id
else:
    item_id = rows[-1][0] + 1 if rows else 1  # last id + 1
    rows.append([item_id, item_name, 1])  # add the item
If the list lives in your database it's the same idea: just use queries to get and set the info.
To set the id, look up the last item's id and add 1, as in the else branch above.
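As a hedged alternative to scanning a list, here is a sketch using sqlite3 from the standard library. It assumes a table named items with a UNIQUE constraint on the name; the table, column, and function names are illustrative. The UNIQUE constraint creates an index that SQLite keeps up to date on every insert, so each lookup is an index search rather than a scan of all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute(
    "CREATE TABLE items ("
    " item_id INTEGER PRIMARY KEY,"   # assigned automatically on insert
    " item_name TEXT NOT NULL UNIQUE,"  # UNIQUE creates an index on the name
    " item_count INTEGER NOT NULL DEFAULT 0)"
)


def add_item(conn, name):
    """Insert `name` if it is new, bump its count, and return its id."""
    # No-op when the name already exists, thanks to the UNIQUE constraint.
    conn.execute("INSERT OR IGNORE INTO items (item_name) VALUES (?)", (name,))
    conn.execute(
        "UPDATE items SET item_count = item_count + 1 WHERE item_name = ?",
        (name,),
    )
    row = conn.execute(
        "SELECT item_id FROM items WHERE item_name = ?", (name,)
    ).fetchone()
    return row[0]


ids = [add_item(conn, name) for name in ["apple", "pear", "apple"]]
conn.commit()
counts = dict(conn.execute("SELECT item_name, item_count FROM items"))
```

For millions of rows, committing once per batch rather than once per item is likely to matter as much as the index itself.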

Finding the distribution of a field in a list of objects?

I have a list of objects. Each object has a field called grade whose value is between 0 and 5. Now I want to see the distribution of this field across my list of objects. Is there any way to find it?
I know I can iterate over all the objects and count them, but I don't want to do that.
As near as I can tell, using a model Table with a grade field, you need something like this:
counts = Table.objects.values("grade").annotate(count=Count("grade"))
This groups the rows by grade and returns one dictionary per distinct grade, holding the grade and its count (Count is imported from django.db.models).
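If the objects are already in a plain Python list rather than a database table, collections.Counter gives the distribution in one pass. Any approach has to look at each object once; Counter just hides the loop. The Student class below is a hypothetical stand-in for the objects in the question.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Student:
    """Hypothetical object with a grade between 0 and 5."""
    name: str
    grade: int


students = [
    Student("a", 5), Student("b", 3), Student("c", 5), Student("d", 0),
]

# Map each grade to the number of objects that have it.
distribution = Counter(s.grade for s in students)

# Include grades that never occur, so all six buckets are present.
full = {grade: distribution.get(grade, 0) for grade in range(6)}
```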

Hidden id on tkInter Listbox

I'm wondering if it's possible to somehow store a hidden id along with each entry in a Listbox. The reason is that I've got a table containing a unique id from a database (not visible to the user, but used to uniquely identify each record). I'm caching the table in memory, using a dictionary keyed on the id.
I'd like to create a Listbox which allows me to select one of the records. The displayed text would not be the unique id but a descriptive field (such as 'Name'), which is probably unique, but this is not enforced and there is no index on it. So for example, if I have:
Id Name
-- ----
2 Rod
5 Jane
15 Freddy
Then selecting Jane, I would somehow be able to easily access the id 5
My problem is that I can't find a way to associate the unique id (5) with the selection (Jane) so that I can easily identify the cached record. I know that I can use control variables but this just gives me a list of all the strings in the list - not what I want. Also, the index (for example on insert) does not seem to be reliable for this purpose.
The only way that I've managed to do this is to have another dictionary mapping the name to the id. For a number of reasons, this is sub-optimal.
Am I missing something here? Is there an easier way of doing this?
Keep your ids in a list, then use the .curselection() index to map these back to the row ids, as long as you keep the ordering the same.
In your example, Jane is the second choice in your list, so if she is selected, .curselection() returns (1,), a tuple of the selected indices. If you have a rowids list in the same order, rowids[1] will be 5:
>>> rowids = [2, 5, 15]
>>> rowids[listbox.curselection()[0]]
5
This is slightly more efficient than mapping names to rowids in a dictionary.
If you use the ttk.Treeview widget instead of a Listbox, you can store the id in an invisible column.

Most efficient way to update attribute of one instance

I'm creating an arbitrary number of instances (using for loops and ranges). At some event in the future, I need to change an attribute for only one of the instances. What's the best way to do this?
Right now, I'm doing the following:
1) Manage the instances in a list.
2) Iterate through the list to find a key value.
3) Once I find the right object within the list (i.e. key value = value I'm looking for), change whatever attribute I need to change.
for Instance in ListofInstances:
    if Instance.KeyValue == SearchValue:
        Instance.AttributeToChange = 10
This feels really inefficient: I'm basically iterating over the entire list of instances, even though I only need to change an attribute in one of them.
Should I be storing the Instance references in a structure more suitable for random access (e.g. dictionary with KeyValue as the dictionary key?) Is a dictionary any more efficient in this case? Should I be using something else?
Thanks,
Mike
Should I be storing the Instance references in a structure more suitable for random access (e.g. dictionary with KeyValue as the dictionary key?)
Yes, if you are mapping from a key to a value (which you are in this case), such that one typically accesses an element via its key, then a dict rather than a list is better.
Is a dictionary any more efficient in this case?
Yes, it is much more efficient. A dictionary takes O(1) time on average to look up an item by its key, whereas a list takes O(n) time because it has to be scanned, which is what you are currently doing.
Using a Dictionary
# Construct the dictionary
d = {}
# Insert items into the dictionary
d[key1] = value1
d[key2] = value2
# ...
# Checking if an item exists
if key in d:
    # Do something requiring d[key]
    # such as updating an attribute:
    d[key].attr = val
As you mention, you need to keep an auxiliary dictionary with the key value as the key and the instance (or list of instances with that value for their attribute) as the value(s) -- way more efficient. For lookups by key, it is hard to do better than a dictionary.
It depends on what the other needs of your program are. If all you ever do with these objects is access the one with that particular key value, then sure, a dictionary is perfect. But if you need to keep the elements in a particular order, a dictionary alone may not do that: dicts preserve insertion order only since Python 3.7, and they cannot be sorted or reordered in place. (You could store the objects in both a dict and a list.) Alternatively, if more than one object can have the same key value, then you can't store both of them under one key in a single dict, at least not directly. (You could have a dict of lists or something similar.)
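Those last two points can be sketched together: a list keeps the original order, while a dict of lists handles several objects sharing a key value. Instance, key_value, and attribute_to_change are hypothetical names mirroring the question, not a definitive design.

```python
from collections import defaultdict


class Instance:
    """Hypothetical object with a lookup key and a mutable attribute."""

    def __init__(self, key_value):
        self.key_value = key_value
        self.attribute_to_change = 0


# The list preserves creation order.
instances = [Instance(k) for k in ["a", "b", "a", "c"]]

# The index maps each key value to every instance that has it.
by_key = defaultdict(list)
for inst in instances:
    by_key[inst.key_value].append(inst)

# Average O(1) lookup, then update only the matching instances.
for inst in by_key.get("a", []):
    inst.attribute_to_change = 10

changed = [inst.attribute_to_change for inst in instances]
```

Both structures hold references to the same objects, so an update made through the index is visible when iterating over the list.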
