Given a python dictionary and an integer n, I need to access the nth key. I need to do this repeatedly many times in my project.
I have written a function which does this:
def ix(self, dict, n):
    count = 0
    for i in sorted(dict.keys()):
        if n == count:
            return i
        else:
            count += 1
But the problem is that if the dictionary is huge, this becomes very slow when called repeatedly.
Is there an efficient way to do this?
I guess you wanted to do something like this, but as dictionaries don't have any guaranteed order, the order of keys in dict.keys() can be anything:
def ix(self, dct, n):  # don't use dict as a variable name
    try:
        return list(dct)[n]  # or sorted(dct)[n] if you want the keys to be sorted
    except IndexError:
        print('not enough keys')
In Python 2, dict.keys() returns a list, so all you need to do is dict.keys()[n].
But a dictionary is an unordered collection, so the nth element does not make any sense in this context.
Note: indexing dict.keys() is not supported in Python 3, where keys() returns a view object.
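A minimal Python 3 equivalent (d and n are placeholder names):
d = {'a': 1, 'b': 2, 'c': 3}
n = 1
print(list(d.keys())[n])  # 'b': wrap the view in list() before indexing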
For those that want to avoid the creation of a new temporary list just to access the nth element, I suggest using an iterator.
from itertools import islice

def nth_key(dct, n):
    it = iter(dct)
    # Consume n elements.
    next(islice(it, n, n), None)
    # Return the value at the current position.
    # This raises StopIteration if n is beyond the limits.
    # Use next(it, None) to suppress that exception.
    return next(it)
This can be notably faster for very large dictionaries compared to converting the keys into a temporary list first and then accessing its nth element.
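For example, a quick usage sketch:
d = {'apple': 1, 'banana': 2, 'cherry': 3}
print(nth_key(d, 1))  # 'banana' (keys come back in insertion order on Python 3.7+)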
It is mentioned in multiple answers that dictionaries are unordered. This is only true for Python versions up to 3.6. From 3.7 onwards, dictionaries are in fact ordered: they preserve insertion order.
I am initializing my list object using the following code.
list = [
    func1(centroids[0], value),
    func1(centroids[1], value),
    ....,
    func1(centroids[n], value)]
I am trying to do it in a more elegant way using some inline iteration. Following is the pseudocode of one possible way.
list = [value for value in func1(centroids[n],value)]
I am not clear how to call func1 in an iterative way. Can you suggest a possible implementation?
For a list of objects, Python knows how to iterate over it directly, so you can eliminate the index shown in most of the other answers:
res = [func1(c, value) for c in centroids]
That's all there is to it.
A simple list comprehension consists of the "template" list element, followed by the iterator needed to step through the desired values.
my_list = [func1(centroids[n], value)
           for n in range(n + 1)]
Use this code:
list = [func1(centroids[x], value) for x in range(n)]
This is called a list comprehension. Put the expression you want each list element to be up front, followed by the for loop; you can use the loop variable x inside that expression. This code builds a list of n values from calls to func1(centroids[x], value). If n equals, say, 4, it is equivalent to list = [func1(centroids[0], value), func1(centroids[1], value), func1(centroids[2], value), func1(centroids[3], value)].
I've got a list of objects in Python which I want to sort based on an attribute.
For example:
abc is a class with attributes id and count.
I have a list of objects of the class abc:
list = [abc('1', 120), abc('2', 0), abc('0', 180), abc('5', 150)]
I want to sort the list in ascending order of the attribute count.
I have done it using:
from operator import attrgetter
list.sort(key=attrgetter('count'))
Profiling my Python script, I have found that this sorting takes a lot of time.
Can anyone suggest a better and faster way to sort a list of objects based on an attribute, minimizing the sorting time?
A nitpick: you're using the name list for your list, which shadows the standard list class. You'd be better off using l as the list name.
I tested sorting a list containing 12 copies of the contents of your list, 100000 times. It took 0.848 s without a comparator function or a key (I used the sorted() function to avoid re-sorting an already sorted list).
There are at least three ways I can think of:
A. Use sort() with a comparator function (Python 2 only: the cmp builtin and the cmp= argument were removed in Python 3):
def comparator(x, y):
    return cmp(x.count, y.count)

l.sort(cmp=comparator)
This took 9.598 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.
B. Use sort() with a key function:
import operator

l.sort(key=operator.attrgetter('count'))
This took 3.111 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.
C. Use native C code to improve the performance of the sorting. I didn't test this.
So, you seem to be already using the fastest all-Python way there is and the way forward would be the use of native C code.
I believe the sort method is implemented using the Timsort algorithm, so there is not much you can improve in terms of the sorting itself.
What you can do is insert the elements differently, provided you have control over the inserting part of the code.
For example, you could use a binary heap to optimize retrieving the smallest (or, with negated keys, the largest) element (see the heapq module in Python, sketched below), or a binary search tree to maintain the sorting order.
The data structure you choose mainly depends on what you want to do with the elements.
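A minimal sketch of the heapq idea, reusing the abc class from the question (the tie-breaking counter keeps two abc instances from ever being compared directly):
import heapq
import itertools

objects = [abc('1', 120), abc('2', 0), abc('0', 180), abc('5', 150)]
counter = itertools.count()
heap = []
for obj in objects:
    heapq.heappush(heap, (obj.count, next(counter), obj))  # O(log n) per insert

count, _, smallest = heap[0]  # O(1) peek at the object with the smallest count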
If I understand you correctly:
class Abc(object):
    def __init__(self, name, count):
        self.name = name
        self.count = count

    @classmethod
    def sort_key(cls, key):
        if key == 'count':
            return lambda obj: obj.count
        elif key == 'name':
            return lambda obj: obj.name

lst = [Abc('1', 120), Abc('2', 0), Abc('0', 180), Abc('5', 150)]

lst.sort(key=Abc.sort_key('count'))
for e in lst:
    print(e.name, e.count)
print()

lst.sort(key=Abc.sort_key('name'))
for e in lst:
    print(e.name, e.count)
print()
I'd not recommend using 'id', 'abc' and 'list' as names for arbitrary variables: they aren't keywords, but id and list are Python built-ins (and abc is a standard-library module), so rebinding them shadows the originals.
I am interested in a dict implementation for Python that provides an iterating interface to sorted values. I.e., a dict with a "sortedvalues()" function.
Naively one can do sorted(dict.values()), but that's not what I want: every time items are inserted or deleted, one has to re-run a full sort, which isn't efficient.
Note that I am not asking about key-sorted dict either (for that question, there are excellent answers in Key-ordered dict in Python and Python 2.6 TreeMap/SortedDictionary?).
One solution is to write a class that inherits from dict but also maintains a list of keys sorted by their value (sorted_keys), along with the list of corresponding (sorted) values (sorted_values).
You can then define a __setitem__() method that uses the bisect module in order to know quickly the position k where the new (key, value) pair should be inserted in the two lists. You can then insert the new key and the new value both in the dictionary itself, and in the two lists that you maintain, with sorted_values[k:k] = [new_value] and sorted_keys[k:k] = [new_key]; unfortunately, the time complexity of such an insertion is O(n) (so O(n^2) for the whole dictionary).
Another approach to ordered element insertion would be to use the heapq module and insert (value, key) pairs into it. This works in O(log n) per insertion instead of the O(n) of the list-based approach of the previous paragraph (note, though, that a heap yields its elements in sorted order only as they are popped).
Iterating over the dictionary can then simply be done by iterating over the list of keys (sorted_keys) that you maintain.
This method saves you the time it would take to sort the keys each time you want to iterate over the dictionary (with sorted values), by basically shifting (and increasing, unfortunately) this time cost to the construction of the sorted lists of keys and values.
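A minimal sketch of the heapq variant, with names of my own choosing (not a complete dict: stale heap entries left behind by updates and deletions are skipped lazily, and repeated assignments of the same value to a key are not deduplicated):
import heapq

class HeapValueDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._heap = []  # (value, key) pairs; may contain stale entries

    def __setitem__(self, key, value):
        heapq.heappush(self._heap, (value, key))  # O(log n)
        dict.__setitem__(self, key, value)

    def sortedvalues(self):
        heap = list(self._heap)  # a copy of a heap is still a valid heap
        while heap:
            value, key = heapq.heappop(heap)
            if self.get(key) == value:  # skip overwritten or deleted entries
                yield value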
The problem is that you need to sort or hash it by keys to get reasonable insert and lookup performance. A naive way of implementing it would be a value-sorted tree structure of entries, and a dict to lookup the tree position for a key. You need to get deep into updating the tree though, as this lookup dictionary needs to be kept correct. Essentially, as you would do for an updatable heap.
I figure there are too many design options to make a reasonable standard-library component out of such a structure, while it is too rarely needed.
Update: a trick that might work for you is to use a dual structure:
a regular dict storing the key-value pairs as usual
any kind of sorted list, for example using bisect
Then you have to implement the common operations on both: a new value is inserted into both structures. The tricky part are the update and delete operations. You use the first structure to look up the old value, delete the old value from the second structure, then (when updating) reinsert as before.
If you need to know the keys too, store (value, key) pairs in your sorted list (the second structure).
Update 2: Try this class:
import bisect

class dictvs(dict):
    def __init__(self):
        self._list = []

    def __setitem__(self, key, value):
        old = self.get(key)
        if old is None:
            bisect.insort(self._list, value)
            dict.__setitem__(self, key, value)
        else:
            oldpos = bisect.bisect_left(self._list, old)
            newpos = bisect.bisect_left(self._list, value)
            if newpos > oldpos:
                # the old value sits left of the insertion point, so
                # removing it shifts the target position down by one
                newpos -= 1
                for i in range(oldpos, newpos):
                    self._list[i] = self._list[i + 1]
            else:
                for i in range(oldpos, newpos, -1):
                    self._list[i] = self._list[i - 1]
            self._list[newpos] = value
            dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        old = self.get(key)
        if old is not None:
            # bisect_left finds the value itself; plain bisect (bisect_right)
            # would point one past it and delete the wrong element
            oldpos = bisect.bisect_left(self._list, old)
            del self._list[oldpos]
        dict.__delitem__(self, key)

    def values(self):
        return list(self._list)
It's not a complete dict yet, I guess. I haven't tested deletions, only a tiny set of updates. You should write a larger unit test for it, and compare the return of values() with sorted(dict.values(instance)). This is just to show how to update the sorted list with bisect.
Here is another, simpler idea:
You create a class that inherits from dict.
You use a cache: you only sort the keys (by value) when iterating over the dictionary, and then mark the dictionary as sorted; insertions simply append to the list of keys.
kindall mentions in a comment that sorting lists that are almost sorted is fast, so this approach should be quite fast; a sketch follows below.
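A minimal sketch of that caching idea (the class and method names are my own; deletions are not handled):
class LazySortedDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._keys = []
        self._dirty = False

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)  # just append; sorting is deferred
        dict.__setitem__(self, key, value)
        self._dirty = True

    def sortedvalues(self):
        if self._dirty:
            self._keys.sort(key=self.__getitem__)  # Timsort is fast on nearly sorted input
            self._dirty = False
        return [self[k] for k in self._keys]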
You can use a skip dict. It is a Python dictionary that is permanently sorted by value.
Insertion is slightly more expensive than a regular dictionary, but it is well worth the cost if you frequently need to iterate in order, or perform value-based queries such as:
What's the highest / lowest item?
Which items have a value between X and Y?
I have about half a million items that need to be placed in a list, I can't have duplicates, and if an item is already there I need to get its index. So far I have:
if Item in List:
    ItemNumber = List.index(Item)
else:
    List.append(Item)
    ItemNumber = List.index(Item)
The problem is that as the list grows it gets progressively slower until at some point it just isn't worth doing. I am limited to python 2.5 because it is an embedded system.
You can use a set (in CPython since version 2.4) to efficiently look up duplicate values. If you really need an indexed system as well, you can use both a set and list.
Doing your lookups using a set will remove the overhead of if Item in List, but not that of List.index(Item)
Please note ItemNumber=List.index(Item) will be very inefficient to do after List.append(Item). You know the length of the list, so your index can be retrieved with ItemNumber = len(List)-1.
To completely remove the overhead of List.index (because that method will search through the list - very inefficient on larger sets), you can use a dict mapping Items back to their index.
I might rewrite it as follows:
# earlier in the program, NOT inside the loop
Dup = {}

# inside your loop to add items:
if Item in Dup:
    ItemNumber = Dup[Item]
else:
    List.append(Item)
    Dup[Item] = ItemNumber = len(List) - 1
If you really need to keep the data in an array, I'd use a separate dictionary to keep track of duplicates. This requires twice as much memory, but won't slow down significantly.
existing = dict()
if Item in existing:
    ItemNumber = existing[Item]
else:
    ItemNumber = existing[Item] = len(List)
    List.append(Item)
However, if you don't need to save the order of items you should just use a set instead. This will take almost as little space as a list, yet will be as fast as a dictionary.
Items = set()
# ...
Items.add(Item) # will do nothing if Item is already added
Both of these require that your object is hashable. In Python, most types are hashable unless they are a container whose contents can be modified. For example: lists are not hashable because you can modify their contents, but tuples are hashable because you cannot.
If you were trying to store values that aren't hashable, there isn't a fast general solution.
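A quick illustration of that rule:
items = set()
items.add((1, 2))        # fine: tuples are immutable, hence hashable
try:
    items.add([1, 2])    # lists are mutable, so this fails
except TypeError as e:
    print(e)             # unhashable type: 'list'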
You can improve the check a lot:
check = set(List)
for Item in NewList:
    if Item in check:
        ItemNumber = List.index(Item)
    else:
        ItemNumber = len(List)
        List.append(Item)
        check.add(Item)  # keep the set in sync so later duplicates in NewList are caught
Or, even better, if order is not important you can do this:
oldlist = set(List)
addlist = set(AddList)
newlist = list(oldlist | addlist)
And if you need to loop over the items that were duplicated:
for item in (oldlist & addlist):
    pass  # do stuff
I never actually thought I'd run into speed issues with Python, but I have. I'm trying to compare really big lists of dictionaries to each other based on the dictionary values. I compare two lists, with the first like so:
biglist1 = [{'transaction': 'somevalue', 'id': 'somevalue', 'date': 'somevalue', ...}, {'transaction': 'somevalue', 'id': 'somevalue', 'date': 'somevalue', ...}, ...]
With 'somevalue' standing for a user-generated string, int or decimal. Now, the second list is pretty similar, except the id-values are always empty, as they have not been assigned yet.
biglist2 = [{'transaction': 'somevalue', 'id': '', 'date': 'somevalue', ...}, {'transaction': 'somevalue', 'id': '', 'date': 'somevalue', ...}, ...]
So I want to get a list of the dictionaries in biglist2 that match the dictionaries in biglist1 for all other keys except id.
I've been doing
for item in biglist2:
    for transaction in biglist1:
        if item['transaction'] == transaction['transaction']:
            list_transactionnamematches.append(transaction)

for item in biglist2:
    for transaction in list_transactionnamematches:
        if item['date'] == transaction['date']:
            list_transactionnamematches.append(transaction)
... and so on, not comparing id values, until I get a final list of matches. Since the lists can be really big (around 3000+ items each), this takes quite some time for python to loop through.
I'm guessing this isn't really how this kind of comparison should be done. Any ideas?
Index on the fields you want to use for lookup. O(n+m)
matches = []
biglist1_indexed = {}

for item in biglist1:
    biglist1_indexed[(item["transaction"], item["date"])] = item

for item in biglist2:
    if (item["transaction"], item["date"]) in biglist1_indexed:
        matches.append(item)
This is probably thousands of times faster than what you're doing now.
What you want to do is to use the correct data structures:
1. Create a dictionary mapping tuples of the other values (everything except id) in the first list's dictionaries to their id.
2. Create two sets of such tuples, one per list. Then use set operations to get the tuple set you want.
3. Use the dictionary from point 1 to assign ids to those tuples.
A sketch of these steps follows below.
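A hedged sketch of those three steps (the fields helper and variable names are assumptions):
def fields(d):
    # a hashable, order-independent snapshot of everything except 'id'
    return tuple(sorted((k, v) for k, v in d.items() if k != 'id'))

by_fields = {fields(d): d['id'] for d in biglist1}         # step 1
common = {fields(d) for d in biglist2} & set(by_fields)    # step 2
matches = [dict(f, id=by_fields[f]) for f in common]       # step 3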
Forgive my rusty Python syntax, it's been a while, so consider this partially pseudocode:
import operator

biglist1.sort(key=operator.itemgetter('date', 'transaction'))
biglist2.sort(key=operator.itemgetter('date', 'transaction'))

biglist3 = []
i1 = 0
i2 = 0
while i1 < len(biglist1) and i2 < len(biglist2):
    key1 = (biglist1[i1]['date'], biglist1[i1]['transaction'])
    key2 = (biglist2[i2]['date'], biglist2[i2]['transaction'])
    if key1 == key2:
        biglist3.append(biglist1[i1])
        i1 += 1
        i2 += 1
    elif key1 < key2:
        i1 += 1
    else:
        # key1 > key2; ==, < and > are exhaustive for tuple comparison
        i2 += 1
This sorts both lists into the same order, by (date,transaction). Then it walks through them side by side, stepping through each looking for relatively adjacent matches. It assumes that (date,transaction) is unique, and that I am not completely off my rocker with regards to tuple sorting and comparison.
In O(m*n)...
for item in biglist2:
    for transaction in biglist1:
        if (item['transaction'] == transaction['transaction'] and
                item['date'] == transaction['date'] and
                item['foo'] == transaction['foo']):
            list_transactionnamematches.append(transaction)
The approach I would probably take to this is to make a very, very lightweight class with one instance variable and one method. The instance variable is a pointer to a dictionary; the method overrides the built-in special method __hash__(self), returning a value calculated from all the values in the dictionary except id.
From there the solution seems fairly obvious. Create two initially empty dictionaries: N and M (for no-matches and matches). Loop over each list exactly once, and for each dictionary representing a transaction (call it a Tx_dict), create an instance of the new class (a Tx_ptr). Then test for an item matching this Tx_ptr in N and M: if there is no matching item in N, insert the current Tx_ptr into N; if there is a matching item in N but no matching item in M, insert the current Tx_ptr into M, with the Tx_ptr itself as the key and a list containing the Tx_ptr as the value; if there is a matching item in both N and M, append the current Tx_ptr to the list associated with that key in M.
After you've gone through every item once, your dictionary M will contain pointers to all the transactions which match other transactions, all neatly grouped together into lists for you.
Edit: Oops! Obviously, the correct action when there is a matching Tx_ptr in N but not in M is to insert a key-value pair into M with the current Tx_ptr as the key and, as the value, a list of both the current Tx_ptr and the Tx_ptr that was already in N.
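A minimal sketch of such a wrapper class (the names TxPtr and tx are mine; note that __eq__ is needed as well, since matching hashes alone do not make dictionary lookups work):
class TxPtr(object):
    def __init__(self, tx_dict):
        self.tx = tx_dict  # pointer to the underlying transaction dict

    def _key(self):
        # every field except 'id', in a deterministic order
        return tuple(sorted((k, v) for k, v in self.tx.items() if k != 'id'))

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return self._key() == other._key()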
Have a look at Psyco. It's a Python compiler that can create very fast, optimized machine code from your source.
http://sourceforge.net/projects/psyco/
While this isn't a direct solution to your code's efficiency issues, it could still help speed things up without needing to write any new code. That said, I'd still highly recommend optimizing your code as much as possible AND use Psyco to squeeze as much speed out of it as possible.
Part of their guide specifically talks about using it to speed up list, string, and numeric computation heavy functions.
http://psyco.sourceforge.net/psycoguide/node8.html
I'm also a newbie. My code is structured in much the same way as his.
for A in biglist:
    for B in biglist:
        if (A.get('somekey') != B.get('somekey') and  # don't match to itself
                len(set(A.get('list')) - set(B.get('list'))) > 10):
            pass  # [do stuff...]
This takes hours to run through a list of 10000 dictionaries. Each dictionary contains lots of stuff, but I could potentially pull out just the ids ('somekey') and lists ('list') and rewrite it as a single dictionary of 10000 key:value pairs.
Question: how much faster would that be? And I assume this is faster than using a list of lists, right?
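A rough sketch of that restructuring (field names come from your snippet; everything else is an assumption). Hoisting the set() conversions out of the inner loop should help a lot, since in the original they are recomputed on every one of the 10000 x 10000 iterations:
lists_by_id = {d['somekey']: set(d['list']) for d in biglist}

for a_id, a_set in lists_by_id.items():
    for b_id, b_set in lists_by_id.items():
        if a_id != b_id and len(a_set - b_set) > 10:
            pass  # do stuff...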