Say I have a class Box with two attributes, self.contents and self.number. I have instances of Box in a list called Boxes. Is there any way to access/modify a specific instance by its attribute rather than iterating through Boxes? For example, if I want the box with box.number == 40 (and the list is not sorted), what would be the best way to modify its contents?
If you need to do it more frequently and you have unique numbers, then create a dictionary:
numberedBox = dict((b.number, b) for b in Boxes)
you can then access your boxes directly with numbers:
numberedBox[40]
but if you want to change their number, you will have to modify the numberedBox dictionary too...
Otherwise yes, you have to iterate over the list.
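A minimal sketch of the dictionary approach, assuming a Box class shaped like the one in the question. The sample contents and the helper name renumber are made up for illustration. It also shows the bookkeeping needed when a box's number changes:

```python
class Box(object):
    def __init__(self, number, contents):
        self.number = number
        self.contents = contents

# Hypothetical sample data; the list is deliberately unsorted.
Boxes = [Box(7, "nails"), Box(40, "screws"), Box(3, "bolts")]

# One pass over the list builds the index (assumes numbers are unique).
numberedBox = dict((b.number, b) for b in Boxes)

# Direct lookup; the dict holds the same objects as the list.
numberedBox[40].contents = "washers"

def renumber(index, old_number, new_number):
    """Change a box's number and keep the index in sync."""
    box = index.pop(old_number)
    box.number = new_number
    index[new_number] = box

renumber(numberedBox, 40, 41)
```

Because the dictionary stores references, changes made through numberedBox are visible through Boxes as well.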
The most straightforward way is to use a list comprehension:
answer=[box for box in boxes if box.number==40]
Be warned though. This actually does iterate over the whole list. Since the list is not sorted, there is no faster method than to iterate over it (and thus do a linear search), unless you want to copy all the data into some other data structure (e.g. dict, set or sort the list).
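If only the first match is needed, a generator expression passed to next stops at the first hit instead of building a full list. This is a different technique from the list comprehension above and is still a linear search in the worst case. The Box class and sample data here are assumptions for illustration:

```python
class Box(object):
    def __init__(self, number, contents):
        self.number = number
        self.contents = contents

boxes = [Box(7, "nails"), Box(40, "screws"), Box(3, "bolts")]

# next() pulls items from the generator until one matches, then stops.
# The second argument is returned if no box matches.
match = next((box for box in boxes if box.number == 40), None)
if match is not None:
    match.contents = "washers"

missing = next((box for box in boxes if box.number == 99), None)
```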
Use the filter builtin (in Python 3 it returns an iterator, so wrap the call in list() if you need a list):
wanted_boxes = filter(lambda box: box.number == 40, boxes)
Although not as flexible as using a dictionary, you might be able to get by with a simple lookup table that maps box numbers to a particular box in Boxes. For example, if you knew the box numbers could range over 0...MAX_BOX_NUMBER, then the following would be very fast. It requires only one full scan of the Boxes list to set up the table.
MAX_BOX_NUMBER = ...
# set up lookup table: one slot per possible box number
box_number = [None] * (MAX_BOX_NUMBER + 1)
for box in Boxes:
    box_number[box.number] = box
box_number[42] # box in Boxes with given number (or None)
If the box numbers are in some other arbitrary range, some minor arithmetic would have to be applied to them before their use as indices. If the range is very large, but sparsely populated, dictionaries would be the way to go to save memory but would require more computation -- the usual trade-off.
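The "minor arithmetic" mentioned above can be sketched as subtracting the smallest box number before indexing. The bounds MIN_BOX_NUMBER and MAX_BOX_NUMBER, the helper find_box, and the sample boxes are assumed for illustration:

```python
class Box(object):
    def __init__(self, number, contents):
        self.number = number
        self.contents = contents

# Assumed range of valid box numbers (inclusive).
MIN_BOX_NUMBER = 100
MAX_BOX_NUMBER = 199

Boxes = [Box(142, "nails"), Box(100, "screws"), Box(199, "bolts")]

# One slot per possible number; None marks "no such box".
table = [None] * (MAX_BOX_NUMBER - MIN_BOX_NUMBER + 1)
for box in Boxes:
    table[box.number - MIN_BOX_NUMBER] = box

def find_box(number):
    """Return the box with the given number, or None if there is none."""
    if MIN_BOX_NUMBER <= number <= MAX_BOX_NUMBER:
        return table[number - MIN_BOX_NUMBER]
    return None
```

The range check matters: without it, a number below MIN_BOX_NUMBER would produce a negative index and silently return a box from the end of the table.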
Related
I have a dict with 50,000,000 keys (strings) mapped to a count of that key (which is a subset of one with billions).
I also have a series of objects with a class set member containing a few thousand strings that may or may not be in the dict keys.
I need the fastest way to find the intersection of each of these sets.
Right now, I do it like this code snippet below:
for block in self.blocks:
    # a block is a Python object containing a set with thousands of strings
    # block.get_kmers() returns the set
    count = sum([kmerCounts[x] for x in block.get_kmers().intersection(kmerCounts)])
    # kmerCounts is the dict mapping millions of strings to ints
From my tests so far, this takes about 15 seconds per iteration. Since I have around 20,000 of these blocks, I am looking at half a week just to do this. And that is for the 50,000,000 items, not the billions I need to handle...
(And yes I should probably do this in another language, but I also need it done fast and I am not very good at non-python languages).
There's no need to do a full intersection, you just want the matching elements from the big dictionary if they exist. If an element doesn't exist you can substitute 0 and there will be no effect on the sum. There's also no need to convert the input of sum to a list.
count = sum(kmerCounts.get(x, 0) for x in block.get_kmers())
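A small check, on made-up data, that the get-based sum matches the intersection-based one. The names kmerCounts and kmers mirror the question; the values are invented:

```python
kmerCounts = {"AAC": 5, "ACG": 2, "CGT": 7, "GTA": 1}
kmers = set(["ACG", "GTA", "TTT"])  # "TTT" is not in the dict

# Original approach: build the intersection, then index the dict.
count_intersection = sum(kmerCounts[x] for x in kmers.intersection(kmerCounts))

# Suggested approach: one dict lookup per element, 0 for misses.
count_get = sum(kmerCounts.get(x, 0) for x in kmers)
```

Both sums only touch the few thousand strings in the block's set, but the second one avoids building an intermediate set for every block.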
Remove the square brackets around your list comprehension to turn it into a generator expression:
sum(kmerCounts[x] for x in block.get_kmers().intersection(kmerCounts))
That will save you some time and some memory, which may in turn reduce swapping, if you're experiencing that.
There is a lower bound to how much you can optimize here. Switching to another language may ultimately be your only option.
Essentially this is what I'm trying to do:
I have a set that I add objects to. These objects have their own equality method, and a set should never have an element equal to another element in the set. However, when attempting to insert an element, if it is equal to another element, I'd like to record a merged version of the two elements. That is, the objects have an "aux" field that is not considered in its equality method. When I'm done adding things, I would like an element's "aux" field to contain a combination of all of the "aux" fields of equal elements I've tried to add.
My thinking was, okay, before adding an element to the set, check to see if it's already in the set. If so, pull it out of the set, combine the two elements, then put it back in. However, the remove method in Python sets doesn't return anything and the pop method returns an arbitrary element.
Can I do what I'm trying to do with sets in Python, or am I barking up the wrong tree (what is the right tree?)
Sounds like you want a defaultdict
from collections import defaultdict
D = defaultdict(list)
D[somekey].append(auxfield)
Edit:
To use your merge function, you can combine the code people have given in the comments
D = {}
for something in yourthings:
    if something.key in D:
        # already seen: merge the stored aux data with the new one
        D[something.key] = merge(D[something.key], something.auxfield)
    else:
        # first occurrence: store it as is
        D[something.key] = something.auxfield
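A runnable sketch of the same idea. The Thing class, its key and auxfield attributes, and the merge function (here, list concatenation) are all hypothetical stand-ins for the asker's own types:

```python
class Thing(object):
    def __init__(self, key, auxfield):
        self.key = key            # the part that defines equality
        self.auxfield = auxfield  # the part to be combined

def merge(old_aux, new_aux):
    # Assumed merge rule: concatenate the two aux lists.
    return old_aux + new_aux

yourthings = [Thing("a", [1]), Thing("b", [2]), Thing("a", [3])]

D = {}
for something in yourthings:
    if something.key in D:
        # Seen before: fold the new aux data into the stored one.
        D[something.key] = merge(D[something.key], something.auxfield)
    else:
        # First occurrence: store it as is.
        D[something.key] = something.auxfield
```

The dictionary key plays the role of the equality method, and the value accumulates the merged aux data.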
Given a list of objects, where each has a property named x, and I want to remove all the objects whose x property contains value v from the list.
One way to do it is to use a list comprehension: [item for item in mylist if item.x != v]. My list is small (usually fewer than 10 items). Another way is to iterate through the list in a loop and check every single item.
Is there a third way that is equally fast or even faster?
You can also use a generator or the filter function. Choose what you find the most readable; efficiency doesn't really matter at this point (especially not if you're dealing with just a few elements).
Create a new list using list comprehension syntax. I don't think you can do anything faster than that. It doesn't matter that your list is small, that's even better.
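One detail worth noting: a list comprehension builds a new list, so other references to the original list are unaffected. If the list has to change in place, slice assignment is one option. The Item class and the values are assumed for illustration:

```python
class Item(object):
    def __init__(self, x):
        self.x = x

v = 2
mylist = [Item(1), Item(2), Item(3), Item(2)]
alias = mylist  # a second reference to the same list object

# Assigning to the full slice replaces the contents in place,
# so every reference to the list sees the filtered result.
mylist[:] = [item for item in mylist if item.x != v]
```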
Is there a better way to implement a paging solution using dict than this?
I have a dict with image names and URLs.
I need to return 16 key-value pairs at a time depending on the user's request, i.e. the page number.
It's a kind of paging solution.
I can implement it like this, for example:
dict = {'g1':'first', 'g2':'second', ... }
Now I can create a mapping of the keys to numbers using:
ordered={}
for i, j in enumerate(dict):
    ordered[i] = j
And then retrieve them:
dicttosent={}
for i in range(pagenumber, pagenumber+16):
    dicttosent[ordered[i]] = dict[ordered[i]]
Is this a proper method, or will this give random results?
Store g1, g2, etc in a list called imagelist
Fetch the pages using imagelist[pagenumber: pagenumber+16].
Use your original dict (image numbers to urls) to lookup the url for each of those 16 imagenames.
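A sketch of this approach, assuming pagenumber counts pages from 0, so the slice starts at pagenumber * 16 rather than at pagenumber itself. The image names, the URLs, and the helper get_page are placeholders:

```python
PER_PAGE = 16

# Placeholder data: 40 image names mapped to made-up URLs.
imagelist = ["g%d" % i for i in range(1, 41)]
images = dict((name, "http://example.com/%s.png" % name) for name in imagelist)

def get_page(pagenumber):
    """Return (name, url) pairs for one page, in list order."""
    start = pagenumber * PER_PAGE
    names = imagelist[start:start + PER_PAGE]
    return [(name, images[name]) for name in names]

first = get_page(0)
last = get_page(2)  # a partial page with the remaining 8 items
```

Slicing past the end of a list returns a shorter (or empty) list instead of raising, so the last page needs no special handling.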
1) Will this give random results ?
Sort of.
Quoting from the official documentation about dict:
Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
So for your purposes, you can't know in advance what the order of iteration will be.
OrderedDict is what you're looking for: an OrderedDict is a dict that remembers the order that keys were first inserted.
2) Is this actually a proper method?
It doesn't seem so.
I don't know if there is a library that will handle all that information for you (maybe someone else can tell you that), but it seems like you're trying to emulate the OrderedDict behaviour.
You can directly use an OrderedDict, or if you want to enumerate your info a list can do that.
It depends. If your dict doesn't change during the lifetime of the application and you don't care about ordering of the items in your dict you should be ok.
If not, you should probably use collections.OrderedDict or keep a sorted list of keys, depending on your requirements. Using normal dict doesn't give you any guarantees about iteration order, so after each modification of the input dict you can get different results.
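A minimal sketch of paging over a collections.OrderedDict with itertools.islice. Page numbers are assumed to start at 0, and the data and the helper get_page are invented:

```python
from collections import OrderedDict
from itertools import islice

PER_PAGE = 16

# Insertion order is preserved, so pages are stable across calls.
images = OrderedDict(("g%d" % i, "url-%d" % i) for i in range(1, 41))

def get_page(pagenumber):
    """Return an OrderedDict holding one page of items."""
    start = pagenumber * PER_PAGE
    return OrderedDict(islice(images.items(), start, start + PER_PAGE))

page1 = get_page(1)
```

islice still walks past the skipped items, so fetching a late page costs time proportional to its offset. For large collections, a separate list of keys may be the better fit.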
Why not just create a dict that maps to your pages? You could start off with two lists, one containing your image names and the other containing the URLs.
perPage = 16
nameList = ['g1', 'g2', ... ]
urlList = ['first', 'second', ... ]
# This is a generator expression that can create
# the nested dicts. You can also use a simple
# for loop
pageDict = dict((i, dict((nameList[j], urlList[j])
                         for j in range(i*perPage, i*perPage+perPage)))
                for i in range(len(nameList) // perPage))
It indexes from 0, so your first page will be pageDict[0].
...Now that I look at it again, that generator expression looks kind of awful. :|
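The same structure written as a plain loop, which may be easier to read. This version steps through the lists in slices, so a final partial page is kept as well (a small change from the expression above). The data is made up:

```python
perPage = 16
nameList = ["g%d" % i for i in range(1, 41)]
urlList = ["url-%d" % i for i in range(1, 41)]

pageDict = {}
for start in range(0, len(nameList), perPage):
    names = nameList[start:start + perPage]
    urls = urlList[start:start + perPage]
    # Page index counts from 0, matching pageDict[0] above.
    pageDict[start // perPage] = dict(zip(names, urls))
```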
I'm new to Python and still learning. I was wondering if there was a standard 'best practice' for storing more than one key value in a tuple. Here's an example:
I have a label called 'red' which has a value of 3, and I need to divide a number (say 10) by it. I need to store 3 values: 'red' (the name), 3 (the number of times it divides into 10) and 1 (the remainder). There are other similar values that will need to be included as well, so this is for red, but the same applies to blue, green, etc. (the numbers are different for each label).
I read around and I think the way I found was to use nested lists, but I am doing this type of storage for a billion records (and I'll need to search through it, so I thought maybe nested anything might slow me down).
I tried to create something like {'red':3:1,...} but it's not the correct syntax, and I'm considering adding a delimiter in the value and then splitting it, but I'm not sure if that's efficient (such as {'red':3a1,..}, then parse by the letter a).
I'm wondering if there's any better ways to store this or is nested tuples my only solution? I'm using Python 2.
The syntax for tuples is: (a,b,c).
If you want a dictionary with multiple values you can have a list as the value: {'red':[3,1]}.
You may want to also consider named tuples, or even classes. This will allow you to name the fields instead of accessing them by index, which will make the code more clear and structured.
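A short sketch using collections.namedtuple, with divmod computing the quotient and remainder in one step. The type name Division, its field names, and the sample values are assumptions:

```python
from collections import namedtuple

Division = namedtuple("Division", ["quotient", "remainder"])

values = {"red": 3, "blue": 4}
number = 10

# divmod(10, 3) gives (3, 1): 3 goes into 10 three times, remainder 1.
results = dict((name, Division(*divmod(number, value)))
               for name, value in values.items())

red = results["red"]
```

A named tuple uses the same memory layout as a plain tuple, so results["red"].quotient costs nothing extra over results["red"][0] in storage while being easier to read.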
I read around and I think the way I found was to use nested lists, but I am doing this type of storage for a billion records (and I'll need to search through it, so I thought maybe nested anything might slow me down).
If you have a billion records you probably should be persisting the data (for example in a database). You will likely run out of memory if you try to keep all the data in memory at once.
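As one illustration of persisting the data, here is a sketch using the standard-library sqlite3 module. The table and column names are invented, and an in-memory database stands in for a real file path:

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE divisions ("
    "name TEXT PRIMARY KEY, quotient INTEGER, remainder INTEGER)"
)

rows = [("red", 3, 1), ("blue", 2, 2)]
conn.executemany("INSERT INTO divisions VALUES (?, ?, ?)", rows)
conn.commit()

# The primary key is indexed, so lookups by name avoid a full scan.
cur = conn.execute(
    "SELECT quotient, remainder FROM divisions WHERE name = ?", ("red",)
)
red = cur.fetchone()
```

Whether SQLite or another store is appropriate for a billion records depends on the access pattern; this only shows the shape of the approach.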
Use tuple. For example:
('red', 3, 1)
Perhaps you mean dictionaries instead of tuples?
{'red': [3,1], 'blue': [2,2]}
If you are trying to store key/value pairs the best way would be to store them in a dictionary. And if you need more than one value to each key, just put those values in a list.
I don't think you would want to store such things in a tuple because tuples aren't mutable. So if you decide to change the order of the quotient and remainder (1, 3) instead of (3,1), you would need to create new tuples. Whereas with lists, you could simply rearrange the order.