Efficiently iterating a dictionary in Python

So here's the problem: I'm importing a dictionary with anywhere from 6000 to 12000 keys, then using a nested for loop to group them into a list inside another dictionary. I'm using the following code to check whether each key is in the dictionary:
for key in range(sizeOfOriginalKeys):
    if key in key_data:
        # ... grouping logic ...
As you might imagine, this is taking forever, since the sorting algorithm is fairly complex. I would like to iterate only through the keys in 'key_data', without doing 1000 to 11999 checks for whether that key is in the dictionary. Is there a way to make a list of the current keys and then iterate through them? Or at least something more efficient than what I'm currently doing?
Current code after Kevin's suggestion:
for key in key_data:
    currentKey = key_data[key].name
    if key_data[currentKey].prefList[currentPref] == currentGroup:
        key_data[currentKey].currentScore = getDotProduct()
        group_data[currentGroup].keyList.append(key_data[currentKey])
        group_data[currentGroup].sortKeys()
        del key_data[currentKey]
The key names are integers.
At the end of the sorting algorithm I delete the key if it's been sorted into a group.
Now I get an error: RuntimeError: dictionary changed size during iteration.
Thoughts?

You're trying too hard:
for key in key_data:

You can try:
for key, value in key_data.items():
    print(key)
    print(value)
This way you can access the value without calling key_data[key].
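Neither answer above addresses the RuntimeError from the question, which is raised because entries are deleted from key_data while a loop is iterating over it. A minimal sketch of one common fix, iterating over a snapshot of the items taken with list(...), is below; the integer keys and group labels are hypothetical stand-ins for the question's objects.

```python
# Hypothetical stand-in data: integer keys mapped to a preferred group label.
key_data = {3: "a", 7: "b", 11: "a", 42: "c"}
group_data = {}
current_group = "a"

# list(...) copies the (key, value) pairs up front, so deleting from
# key_data inside the loop no longer changes the object being iterated.
for key, value in list(key_data.items()):
    if value == current_group:
        group_data.setdefault(current_group, []).append(key)
        del key_data[key]

print(group_data)  # {'a': [3, 11]}
print(key_data)    # {7: 'b', 42: 'c'}
```

The snapshot costs one extra pass over the dictionary and a temporary list. In Python 2, key_data.items() already returns a list, so the list(...) call is redundant there but harmless.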

Related

Is there a way to select particular key value of dictionary present in a list of dictionaries

I am presently using a for loop to print all the required key-value pairs of a dictionary. However, is there a simpler way to select the required key-value pair?
out_temp = []
for i in out['elements']:
    social_info = i['insights'][0]['details']['socialInfo']  # a new name, so out itself is not overwritten
    out_temp.append(social_info)
The content of out is actually JSON: a list of dictionaries, where each dictionary contains a list of dictionaries.
You can use map to generate the new list as well, but I think what you are doing is fine; it is much easier to read than the alternatives.
out_temp = list(map(lambda x: x['insights'][0]['details']['socialInfo'], out['elements']))
I cannot see an unequivocally simpler way to access the data you require. However, you can apply your logic more efficiently via a list comprehension:
out_temp = [i['insights'][0]['details']['socialInfo'] for i in out['elements']]
Whether or not this is also simpler is open to debate.
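To make the two answers concrete, here is a small sketch with a made-up structure shaped like the one described in the question (the "likes" field is hypothetical). It shows that the list comprehension and the map version produce the same list.

```python
# Hypothetical JSON-like data with the nesting described in the question.
out = {
    "elements": [
        {"insights": [{"details": {"socialInfo": {"likes": 10}}}]},
        {"insights": [{"details": {"socialInfo": {"likes": 3}}}]},
    ]
}

# List comprehension: take socialInfo from the first insight of each element.
out_temp = [i["insights"][0]["details"]["socialInfo"] for i in out["elements"]]

# The map() version from the other answer builds the same list.
out_temp_map = list(map(lambda x: x["insights"][0]["details"]["socialInfo"], out["elements"]))

assert out_temp == out_temp_map
print(out_temp)  # [{'likes': 10}, {'likes': 3}]
```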

Error Changing Dictionary Keys

I have two defaultdicts I eventually want to merge, but first I need to make their keys match. According to some threads I've seen here, I can use pop() to replace keys in a dictionary. But that only updates the existing dictionary, whereas I want to create a new dictionary with the new keys. So something like:
existing_dict_one -> new_dict_one
This is what I have so far:
def split_tabs(x):
    """
    Function to split tab-separated strings, used to break up the keys that are separated by tabs.
    """
    return x.split('\t')

def create_dict(old_dict):
    """
    Function to create a new defaultdict from an existing defaultdict, just with
    different keys.
    """
    new_dict = old_dict.copy()  # Create a copy of old_dict to house the new keys, but with the same values.
    for key, value in new_dict.iteritems():
        umi = split_tabs(key)[0]  # Change key to be UMI, which is the 0th index of the tab-delimited key.
        # new_key = key.replace(key, umi)
        new_dict[umi] = new_dict.pop(key)
    return new_dict
However, I'm getting the following error
RuntimeError: dictionary changed size during iteration
and I don't know how to fix it. Does anyone know how to correct it? I'd like to use the variable "umi" as the new key.
I'd like to post the variable "key" and dictionary "old_dict" I'm using for testing this code, but it's messy and takes up a lot of space. So here's a pastebin link that contains them instead.
Note that "umi" comes from variable "key" which is separated by tabs. So I split "key" and get the first object as "umi".
Just use a dict comprehension for this:
new_dict = {split_tabs(key)[0]: value for key, value in old_dict.iteritems()}
Trying to modify a dictionary while iterating over it is not a good idea in general.
If you use .items() instead of .iteritems() in Python 2, you won't have that problem, because .items() returns a list that is disconnected from the dictionary. In Python 3 it would be list(new_dict.items()).
Also if there's any possibility that the dictionary values are mutable, you'll have to use copy.deepcopy(old_dict) instead of just old_dict.copy().
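The dict comprehension above returns a plain dict and uses the Python 2 method iteritems. A minimal sketch that keeps the defaultdict type and also runs on Python 3 is below; the tab-separated keys are made-up test data, not the question's pastebin contents. As with the comprehension, keys that share a UMI collapse to a single entry (the last one wins), and the values are shared with old_dict rather than copied.

```python
from collections import defaultdict

def split_tabs(x):
    """Split a tab-separated string into its fields."""
    return x.split('\t')

def create_dict(old_dict):
    """Return a new defaultdict keyed by the first tab-separated field (the UMI)."""
    # Reuse the original default_factory so the result behaves like old_dict.
    new_dict = defaultdict(old_dict.default_factory)
    # old_dict is only read here, never resized, so no RuntimeError.
    for key, value in old_dict.items():
        umi = split_tabs(key)[0]
        new_dict[umi] = value  # a later key with the same UMI overwrites an earlier one
    return new_dict

# Hypothetical test data: "UMI<TAB>read name" keys.
old = defaultdict(list)
old["AAAC\tread1"].append(1)
old["GGTT\tread2"].append(2)

new = create_dict(old)
print(dict(new))  # {'AAAC': [1], 'GGTT': [2]}
```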

Ignoring a key when looping through a sorted dictionary in Python

I have a dictionary in python and I'm assigning elements to an array utilizing a key with four elements. I want to plot my arrays by looping through my sorted dictionary but I'd like to ignore one of the keys in the loop. My code looks like this:
key = (process, temp, board, chip)
# Do some stuff in a loop
for key in sorted(svmDict):
    # plot some things but don't sort with the variable chip
I found some articles on removing a specific key, but in my case chip is actually a variable, and removing each key seems cumbersome and likely unnecessary.
If you're not worried about speed, I would just check whether or not you are at an acceptable key in the loop. You can directly check against one value you want to skip, or make a list of values you want to skip:
ignore_list = [chip]
for key in sorted(svmDict):
    if key not in ignore_list:
        pass  # do the thing (plotting goes here)
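Because each key is a (process, temp, board, chip) tuple, comparing the whole key against chip values only matches if the ignore list holds full tuples. A minimal sketch, assuming the goal is to skip entries by their chip component, unpacks the key instead; the svmDict contents and chip names are made up. The last line shows the other possible reading of the question, sorting on (process, temp, board) only, using the key argument of sorted.

```python
# Hypothetical svmDict keyed by (process, temp, board, chip) tuples.
svmDict = {
    ("p1", 25, "b1", "c1"): [0.1],
    ("p1", 25, "b1", "c2"): [0.2],
    ("p0", 85, "b2", "c1"): [0.3],
}

skip_chips = {"c2"}  # chip values to ignore; a set gives fast membership tests

plotted = []
for key in sorted(svmDict):
    process, temp, board, chip = key  # unpack the four-part key
    if chip in skip_chips:
        continue
    plotted.append(key)  # stand-in for the actual plotting call

# Alternative: sort on the first three components only, ignoring chip.
by_first_three = sorted(svmDict, key=lambda k: k[:3])
```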

Choose one key arbitrarily in a dictionary without iteration [duplicate]

This question already has answers here: dict.keys()[0] on Python 3. Closed 6 years ago.
I just want to make sure that there's no way to get a single key from a Python dictionary (with no particular property or relation to a certain value) other than by iterating. As far as I can tell, you have to make a list of them by going through the whole dictionary in a loop. Something like this:
list_keys = [k for k in dic.keys()]
The thing is, I just need an arbitrary key if the dictionary is not empty, and I don't care about the rest. I guess iterating over a long dictionary, in addition to creating a long list, just to get one key is a whole lot of overhead, isn't it?
Is there a better trick somebody can point out?
Thanks
A lot of the answers here produce a random key, but the original question asked for an arbitrary key. There's quite a difference between the two: randomness comes with a handful of mathematical/statistical guarantees.
Python dictionaries have no index-based access to their keys (and, before Python 3.7, no guaranteed order either). So, yes, accessing an arbitrary key requires iteration. But for a single arbitrary key, we do not need to iterate the entire dictionary. The built-in functions next and iter are useful here:
key = next(iter(mapping))
The iter built-in creates an iterator over the keys in the mapping. Before Python 3.7 the iteration order is arbitrary; from Python 3.7 on, dicts preserve insertion order, so this returns the first key inserted. The next built-in returns the first item from the iterator. Iterating the whole mapping is not necessary for an arbitrary key.
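A short sketch of the next/iter idiom with made-up sample data. The second call uses the default argument of next, which returns None instead of raising StopIteration when the dictionary is empty.

```python
d = {"rand1": "hey there", "rand2": "you love python, I know!"}

# Only the first key is pulled from the iterator; the rest of d is not visited.
first_key = next(iter(d))

# next() accepts a default, which avoids StopIteration on an empty dict.
missing = next(iter({}), None)

print(first_key)  # rand1 (insertion order is guaranteed from Python 3.7)
print(missing)    # None
```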
If you're going to end up deleting the key from the mapping, you may instead use dict.popitem. Here's the docstring:
D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple;
but raise KeyError if D is empty.
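A small sketch of dict.popitem with made-up data. From Python 3.7 on, popitem is guaranteed to remove the most recently inserted pair (LIFO order); in older versions the pair it returns is arbitrary.

```python
d = {"a": 1, "b": 2, "c": 3}

# popitem() removes and returns one (key, value) pair in a single step.
key, value = d.popitem()
print(key, value)  # c 3
print(d)           # {'a': 1, 'b': 2}

# On an empty dict it raises KeyError, as the docstring says.
raised = False
try:
    {}.popitem()
except KeyError:
    raised = True
```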
You can use random.choice:
rand_key = random.choice(d.keys())
This will only work in Python 2.x. In Python 3.x, dict.keys() returns a view object, which does not support indexing, so you'll have to convert it into a list:
rand_key = random.choice(list(d.keys()))
So, for example -
import random
d = {'rand1':'hey there', 'rand2':'you love python, I know!', 'rand3' : 'python has a method for everything!'}
random.choice(list(d.keys()))
Output (the key is chosen at random, so it may differ between runs):
rand1
You are correct: there is not a way to get a random key from an ordinary dict without using iteration. Even solutions like random.choice must iterate through the dictionary in the background.
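However, you could use a sorted dict:
import random
from sortedcontainers import SortedDict as sd
d = sd(dic)
i = random.randrange(len(d))
ran_key = d.peekitem(i)[0]  # SortedDict.iloc was removed in sortedcontainers 2.0; peekitem(i) returns the i-th (key, value) pair
More here:
http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html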
Note that whether or not using something like SortedDict will result in any efficiency gains is going to be entirely dependent upon the actual implementation. If you are creating a lot of SD objects, or adding new keys very often (which have to be sorted), and are only getting a random key occasionally in relation to those other two tasks, you are unlikely to see much of a performance gain.
How about something like this:
import random
arbitrary_key = random.choice(list(dic.keys()))  # list() is required in Python 3, where keys() returns a view
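BTW, your use of a list comprehension there really makes no sense, since in Python 2 it just rebuilds the list that dic.keys() already returns:
dic.keys() == [k for k in dic.keys()]
In Python 3, list(dic) gives the same list directly.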
Check the length of the dictionary first, like this; this should do it!
import random
if len(yourdict) > 0:
    randomKey = random.sample(list(yourdict), 1)  # Python 3 requires a sequence, hence list()
    print(randomKey[0])
else:
    pass  # handle the empty dictionary here
random.sample returns a list; since we passed 1, it is a list with one key, which we get with randomKey[0].
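The empty check and the Python 3 list() conversion from the answers above can be combined into one helper. A minimal sketch follows; random_key is a hypothetical name, and the seeded random.Random(0) instance is only there to make the run reproducible.

```python
import random

def random_key(d, rng=random):
    """Return a uniformly random key of d, or None if d is empty (hypothetical helper)."""
    if not d:
        return None
    # random.choice needs a sequence, so the keys are copied into a list first.
    return rng.choice(list(d))

d = {"rand1": "hey there", "rand2": "you love python, I know!", "rand3": "python has a method for everything!"}
rng = random.Random(0)  # seeded generator, used only for reproducibility
k = random_key(d, rng)
print(k in d)          # True
print(random_key({}))  # None
```

Copying the keys is still a full pass over the dictionary; for one arbitrary (not random) key, next(iter(d)) avoids that.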

Finding a key in a dictionary without knowing its full name

I have a dictionary with a key called ev#### where #### is some number that I do not know ahead of time. There is only one of this type of key in the dictionary and no other key starts with ev.
What's the cleanest way to access that key without knowing what the #### is?
You can try this list comprehension:
result = [v for k, v in d.items() if k.startswith('ev')][0]
Or this approach using a generator expression:
result = next(v for k, v in d.items() if k.startswith('ev'))
Note that these will both require a linear scan of the items in the dictionary, unlike an ordinary key lookup, which runs in constant time on average (assuming a good hash function). The generator expression, however, can stop as soon as it finds the key. The list comprehension will always scan the entire dictionary.
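The generator version raises StopIteration when no key matches. A minimal sketch that wraps it with a default is below; find_by_prefix is a hypothetical helper name and the sample dictionary is made up. Note that the generator expression needs its own parentheses once next gets a second argument.

```python
def find_by_prefix(d, prefix, default=None):
    """Return the value of the first key starting with prefix, or default (hypothetical helper)."""
    # The generator stops at the first match; next() falls back to default.
    return next((v for k, v in d.items() if k.startswith(prefix)), default)

d = {"name": "run 7", "ev4821": {"energy": 1.5}, "status": "ok"}
print(find_by_prefix(d, "ev"))  # {'energy': 1.5}
print(find_by_prefix(d, "zz"))  # None
```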
If there is only one such value in the dictionary, I would say it's better to use an approach similar to this:
for k, v in d.items():
    if k.startswith('ev'):
        result = v
        break
else:
    raise KeyError()  # or set to a default value
That way you don't have to loop through every value in the dictionary, only until you find the key, which should speed up the calculation by ~2x on average.
Store the item in the dictionary without the ev prefix in the first place.
If you also need to access it with the prefix, store it both ways.
If there can be multiple prefixes for a given number, use a second dictionary that stores the actual keys associated with each number as a list or sub-dictionary, and use that to find the available keys in the main dictionary matching the number.
If you can't easily do this when the dictionary is initially created (e.g. someone else's code is giving you the dict and you can't change it), and you will be doing a lot of lookups of this sort, it is probably worthwhile to iterate over the dict once and make the second dict, or use a dict to cache the lookups, or something of that sort, to avoid iterating the keys each time.
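A minimal sketch of the second-dictionary idea, assuming keys are a letter prefix followed by digits (the regular expression and sample data are assumptions, not from the question). The index is built in one pass, after which lookups by number no longer scan every key.

```python
import re
from collections import defaultdict

# Hypothetical main dictionary: keys are a letter prefix followed by a number.
main = {"ev1234": "event data", "id1234": "identifier", "ev99": "other event"}

# Build the secondary index once: number -> list of actual keys in main.
by_number = defaultdict(list)
for full_key in main:
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", full_key)
    if match:
        by_number[match.group(2)].append(full_key)

# Later lookups are ordinary dict lookups; .get avoids inserting empty entries.
keys_for_1234 = by_number.get("1234", [])
values = [main[k] for k in keys_for_1234]
print(keys_for_1234)  # ['ev1234', 'id1234']
print(values)         # ['event data', 'identifier']
```

The index has to be updated whenever keys are added to or removed from the main dictionary; if the dictionary changes often, the cache approach mentioned above may be simpler.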
