Related
What is the best way to replace the first character of all keys in a dictionary?
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
I am trying to get
new_cols= {"desc1":"adjustment1","desc23":"adjustment3"}
I have tried:
for k,v in old_cols.items():
new_cols[k]=old_cols.pop(k[1:])
old_cols = {"~desc1":"adjustment1", "~desc23":"adjustment3"}
new_cols = {}
for k, v in old_cols.items():
new_key = k[1:]
new_cols[new_key] = v
Here it is with a dictionary comprehension:
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
new_cols = {k.replace('~', ''):old_cols[k] for k in old_cols}
print(new_cols)
#{'desc1': 'adjustment1', 'desc23': 'adjustment3'}
There are many ways to do this with list-comprehension or for-loops. What is important is to understand is that dictionaries are mutable. This basically means that you can either modify the existing dictionary or create a new one.
If you want to create a new one (and I would recommend it! - see 'A word of warning ...' below), both solutions provided in the answers by #Ethem_Turgut and by #pakpe do the trick. I would probably wirte:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
new_dict = {k[1:]: v for k, v in old_dict.items()}
print(f'Is the result still the same dictionary? Answer: {old_id == id(new_dict)}')
Now, if you want to modify the dictionary in place you might loop over the keys and adapt the key/value to your liking:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
for k in [*old_dict.keys()]: # note the weird syntax here
old_dict[k[1:]] = old_dict.pop(k)
print(f'Is the result still the same dictionary? Answer: {old_id == id(old_dict)}')
A word of warning for the latter approach:
You should be aware that you are modifying the keys while looping over them. This is in most of the cases problematic and can even lead to a RuntimeError if you loop directly over old_dict. I avoided this by explicitly unpacking the keys into a list and the looping over that list with [*old_dict.keys()].
Why can modifying keys while looping over them be problematic? Imagine for example that you have the keys '~key1' and 'key1' in your old_dict. Now when your loop handles '~key1' it will modify it to 'key1' which already exists in old_dict and thus it will overwrite the value of 'key1' with the value from '~key1'.
So, only use the latter approach if you are certain to not produce issues like the example mentioned here before. If you are uncertain, simply create a new dictionary!
Let's say we have a Python dictionary d, and we're iterating over it like so:
for k, v in d.iteritems():
del d[f(k)] # remove some item
d[g(k)] = v # add a new item
(f and g are just some black-box transformations.)
In other words, we try to add/remove items to d while iterating over it using iteritems.
Is this well defined? Could you provide some references to support your answer?
See also How to avoid "RuntimeError: dictionary changed size during iteration" error? for the separate question of how to avoid the problem.
Alex Martelli weighs in on this here.
It may not be safe to change the container (e.g. dict) while looping over the container.
So del d[f(k)] may not be safe. As you know, the workaround is to use d.copy().items() (to loop over an independent copy of the container) instead of d.iteritems() or d.items() (which use the same underlying container).
It is okay to modify the value at an existing index of the dict, but inserting values at new indices (e.g. d[g(k)] = v) may not work.
It is explicitly mentioned on the Python doc page (for Python 2.7) that
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Similarly for Python 3.
The same holds for iter(d), d.iterkeys() and d.itervalues(), and I'll go as far as saying that it does for for k, v in d.items(): (I can't remember exactly what for does, but I would not be surprised if the implementation called iter(d)).
You cannot do that, at least with d.iteritems(). I tried it, and Python fails with
RuntimeError: dictionary changed size during iteration
If you instead use d.items(), then it works.
In Python 3, d.items() is a view into the dictionary, like d.iteritems() in Python 2. To do this in Python 3, instead use d.copy().items(). This will similarly allow us to iterate over a copy of the dictionary in order to avoid modifying the data structure we are iterating over.
I have a large dictionary containing Numpy arrays, so the dict.copy().keys() thing suggested by #murgatroid99 was not feasible (though it worked). Instead, I just converted the keys_view to a list and it worked fine (in Python 3.4):
for item in list(dict_d.keys()):
temp = dict_d.pop(item)
dict_d['some_key'] = 1 # Some value
I realize this doesn't dive into the philosophical realm of Python's inner workings like the answers above, but it does provide a practical solution to the stated problem.
The following code shows that this is not well defined:
def f(x):
return x
def g(x):
return x+1
def h(x):
return x+10
try:
d = {1:"a", 2:"b", 3:"c"}
for k, v in d.iteritems():
del d[f(k)]
d[g(k)] = v+"x"
print d
except Exception as e:
print "Exception:", e
try:
d = {1:"a", 2:"b", 3:"c"}
for k, v in d.iteritems():
del d[f(k)]
d[h(k)] = v+"x"
print d
except Exception as e:
print "Exception:", e
The first example calls g(k), and throws an exception (dictionary changed size during iteration).
The second example calls h(k) and throws no exception, but outputs:
{21: 'axx', 22: 'bxx', 23: 'cxx'}
Which, looking at the code, seems wrong - I would have expected something like:
{11: 'ax', 12: 'bx', 13: 'cx'}
Python 3 you should just:
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}
t2 = dict()
for k,v in t.items():
t2[k] = prefix + v
or use:
t2 = t1.copy()
You should never modify original dictionary, it leads to confusion as well as potential bugs or RunTimeErrors. Unless you just append to the dictionary with new key names.
This question asks about using an iterator (and funny enough, that Python 2 .iteritems iterator is no longer supported in Python 3) to delete or add items, and it must have a No as its only right answer as you can find it in the accepted answer. Yet: most of the searchers try to find a solution, they will not care how this is done technically, be it an iterator or a recursion, and there is a solution for the problem:
You cannot loop-change a dict without using an additional (recursive) function.
This question should therefore be linked to a question that has a working solution:
How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete")
Also helpful as it shows how to change the items of a dict on the run: How can I replace a key:value pair by its value wherever the chosen key occurs in a deeply nested dictionary? (= "replace").
By the same recursive methods, you will also able to add items as the question asks for as well.
Since my request to link this question was declined, here is a copy of the solution that can delete items from a dict. See How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete") for examples / credits / notes.
import copy
def find_remove(this_dict, target_key, bln_overwrite_dict=False):
if not bln_overwrite_dict:
this_dict = copy.deepcopy(this_dict)
for key in this_dict:
# if the current value is a dict, dive into it
if isinstance(this_dict[key], dict):
if target_key in this_dict[key]:
this_dict[key].pop(target_key)
this_dict[key] = find_remove(this_dict[key], target_key)
return this_dict
dict_nested_new = find_remove(nested_dict, "sub_key2a")
The trick
The trick is to find out in advance whether a target_key is among the next children (= this_dict[key] = the values of the current dict iteration) before you reach the child level recursively. Only then you can still delete a key:value pair of the child level while iterating over a dictionary. Once you have reached the same level as the key to be deleted and then try to delete it from there, you would get the error:
RuntimeError: dictionary changed size during iteration
The recursive solution makes any change only on the next values' sub-level and therefore avoids the error.
I got the same problem and I used following procedure to solve this issue.
Python List can be iterate even if you modify during iterating over it.
so for following code it will print 1's infinitely.
for i in list:
list.append(1)
print 1
So using list and dict collaboratively you can solve this problem.
d_list=[]
d_dict = {}
for k in d_list:
if d_dict[k] is not -1:
d_dict[f(k)] = -1 # rather than deleting it mark it with -1 or other value to specify that it will be not considered further(deleted)
d_dict[g(k)] = v # add a new item
d_list.append(g(k))
Today I had a similar use-case, but instead of simply materializing the keys on the dictionary at the beginning of the loop, I wanted changes to the dict to affect the iteration of the dict, which was an ordered dict.
I ended up building the following routine, which can also be found in jaraco.itertools:
def _mutable_iter(dict):
"""
Iterate over items in the dict, yielding the first one, but allowing
it to be mutated during the process.
>>> d = dict(a=1)
>>> it = _mutable_iter(d)
>>> next(it)
('a', 1)
>>> d
{}
>>> d.update(b=2)
>>> list(it)
[('b', 2)]
"""
while dict:
prev_key = next(iter(dict))
yield prev_key, dict.pop(prev_key)
The docstring illustrates the usage. This function could be used in place of d.iteritems() above to have the desired effect.
In a directory images, images are named like - 1_foo.png, 2_foo.png, 14_foo.png, etc.
The images are OCR'd and the text extract is stored in a dict by the code below -
data_dict = {}
for i in os.listdir(images):
if str(i[1]) != '_':
k = str(i[:2]) # Get first two characters of image name and use as 'key'
else:
k = str(i[:1]) # Get first character of image name and use 'key'
# Intiates a list for each key and allows storing multiple entries
data_dict.setdefault(k, [])
data_dict[k].append(pytesseract.image_to_string(i))
The code performs as expected.
The images can have varying numbers in their name ranging from 1 to 99.
Can this be reduced to a dictionary comprehension?
No. Each iteration in a dict comprehension assigns a value to a key; it cannot update an existing value list. Dict comprehensions aren't always better--the code you wrote seems good enough. Although maybe you could write
data_dict = {}
for i in os.listdir(images):
k = i.partition("_")[0]
image_string = pytesseract.image_to_string(i)
data_dict.setdefault(k, []).append(image_string)
Yes. Here's one way, but I wouldn't recommend it:
{k: d.setdefault(k, []).append(pytesseract.image_to_string(i)) or d[k]
for d in [{}]
for k, i in ((i.split('_')[0], i) for i in names)}
That might be as clean as I can make it, and it's still bad. Better use a normal loop, especially a clean one like Dennis's.
Slight variation (if I do the abuse once, I might as well do it twice):
{k: d.setdefault(k, []).append(pytesseract_image_to_string(i)) or d[k]
for d in [{}]
for i in names
for k in i.split('_')[:1]}
Edit: kaya3 now posted a good one using a dict comprehension. I'd recommend that over mine as well. Mine are really just the dirty results of me being like "Someone said it can't be done? Challenge accepted!".
In this case itertools.groupby can be useful; you can group the filenames by the numeric part. But making it work is not easy, because the groups have to be contiguous in the sequence.
That means before we can use groupby, we need to sort using a key function which extracts the numeric part. That's the same key function we want to group by, so it makes sense to write the key function separately.
from itertools import groupby
def image_key(image):
return str(image).partition('_')[0]
images = ['1_foo.png', '2_foo.png', '3_bar.png', '1_baz.png']
result = {
k: list(v)
for k, v in groupby(sorted(images, key=image_key), key=image_key)
}
# {'1': ['1_foo.png', '1_baz.png'],
# '2': ['2_foo.png'],
# '3': ['3_bar.png']}
Replace list(v) with list(map(pytesseract.image_to_string, v)) for your use-case.
Let's say we have a Python dictionary d, and we're iterating over it like so:
for k, v in d.iteritems():
del d[f(k)] # remove some item
d[g(k)] = v # add a new item
(f and g are just some black-box transformations.)
In other words, we try to add/remove items to d while iterating over it using iteritems.
Is this well defined? Could you provide some references to support your answer?
See also How to avoid "RuntimeError: dictionary changed size during iteration" error? for the separate question of how to avoid the problem.
Alex Martelli weighs in on this here.
It may not be safe to change the container (e.g. dict) while looping over the container.
So del d[f(k)] may not be safe. As you know, the workaround is to use d.copy().items() (to loop over an independent copy of the container) instead of d.iteritems() or d.items() (which use the same underlying container).
It is okay to modify the value at an existing index of the dict, but inserting values at new indices (e.g. d[g(k)] = v) may not work.
It is explicitly mentioned on the Python doc page (for Python 2.7) that
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Similarly for Python 3.
The same holds for iter(d), d.iterkeys() and d.itervalues(), and I'll go as far as saying that it does for for k, v in d.items(): (I can't remember exactly what for does, but I would not be surprised if the implementation called iter(d)).
You cannot do that, at least with d.iteritems(). I tried it, and Python fails with
RuntimeError: dictionary changed size during iteration
If you instead use d.items(), then it works.
In Python 3, d.items() is a view into the dictionary, like d.iteritems() in Python 2. To do this in Python 3, instead use d.copy().items(). This will similarly allow us to iterate over a copy of the dictionary in order to avoid modifying the data structure we are iterating over.
I have a large dictionary containing Numpy arrays, so the dict.copy().keys() thing suggested by #murgatroid99 was not feasible (though it worked). Instead, I just converted the keys_view to a list and it worked fine (in Python 3.4):
for item in list(dict_d.keys()):
temp = dict_d.pop(item)
dict_d['some_key'] = 1 # Some value
I realize this doesn't dive into the philosophical realm of Python's inner workings like the answers above, but it does provide a practical solution to the stated problem.
The following code shows that this is not well defined:
def f(x):
return x
def g(x):
return x+1
def h(x):
return x+10
try:
d = {1:"a", 2:"b", 3:"c"}
for k, v in d.iteritems():
del d[f(k)]
d[g(k)] = v+"x"
print d
except Exception as e:
print "Exception:", e
try:
d = {1:"a", 2:"b", 3:"c"}
for k, v in d.iteritems():
del d[f(k)]
d[h(k)] = v+"x"
print d
except Exception as e:
print "Exception:", e
The first example calls g(k), and throws an exception (dictionary changed size during iteration).
The second example calls h(k) and throws no exception, but outputs:
{21: 'axx', 22: 'bxx', 23: 'cxx'}
Which, looking at the code, seems wrong - I would have expected something like:
{11: 'ax', 12: 'bx', 13: 'cx'}
Python 3 you should just:
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}
t2 = dict()
for k,v in t.items():
t2[k] = prefix + v
or use:
t2 = t1.copy()
You should never modify original dictionary, it leads to confusion as well as potential bugs or RunTimeErrors. Unless you just append to the dictionary with new key names.
This question asks about using an iterator (and funny enough, that Python 2 .iteritems iterator is no longer supported in Python 3) to delete or add items, and it must have a No as its only right answer as you can find it in the accepted answer. Yet: most of the searchers try to find a solution, they will not care how this is done technically, be it an iterator or a recursion, and there is a solution for the problem:
You cannot loop-change a dict without using an additional (recursive) function.
This question should therefore be linked to a question that has a working solution:
How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete")
Also helpful as it shows how to change the items of a dict on the run: How can I replace a key:value pair by its value wherever the chosen key occurs in a deeply nested dictionary? (= "replace").
By the same recursive methods, you will also able to add items as the question asks for as well.
Since my request to link this question was declined, here is a copy of the solution that can delete items from a dict. See How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete") for examples / credits / notes.
import copy
def find_remove(this_dict, target_key, bln_overwrite_dict=False):
if not bln_overwrite_dict:
this_dict = copy.deepcopy(this_dict)
for key in this_dict:
# if the current value is a dict, dive into it
if isinstance(this_dict[key], dict):
if target_key in this_dict[key]:
this_dict[key].pop(target_key)
this_dict[key] = find_remove(this_dict[key], target_key)
return this_dict
dict_nested_new = find_remove(nested_dict, "sub_key2a")
The trick
The trick is to find out in advance whether a target_key is among the next children (= this_dict[key] = the values of the current dict iteration) before you reach the child level recursively. Only then you can still delete a key:value pair of the child level while iterating over a dictionary. Once you have reached the same level as the key to be deleted and then try to delete it from there, you would get the error:
RuntimeError: dictionary changed size during iteration
The recursive solution makes any change only on the next values' sub-level and therefore avoids the error.
I got the same problem and I used following procedure to solve this issue.
Python List can be iterate even if you modify during iterating over it.
so for following code it will print 1's infinitely.
for i in list:
list.append(1)
print 1
So using list and dict collaboratively you can solve this problem.
d_list=[]
d_dict = {}
for k in d_list:
if d_dict[k] is not -1:
d_dict[f(k)] = -1 # rather than deleting it mark it with -1 or other value to specify that it will be not considered further(deleted)
d_dict[g(k)] = v # add a new item
d_list.append(g(k))
Today I had a similar use-case, but instead of simply materializing the keys on the dictionary at the beginning of the loop, I wanted changes to the dict to affect the iteration of the dict, which was an ordered dict.
I ended up building the following routine, which can also be found in jaraco.itertools:
def _mutable_iter(dict):
"""
Iterate over items in the dict, yielding the first one, but allowing
it to be mutated during the process.
>>> d = dict(a=1)
>>> it = _mutable_iter(d)
>>> next(it)
('a', 1)
>>> d
{}
>>> d.update(b=2)
>>> list(it)
[('b', 2)]
"""
while dict:
prev_key = next(iter(dict))
yield prev_key, dict.pop(prev_key)
The docstring illustrates the usage. This function could be used in place of d.iteritems() above to have the desired effect.
Is it possible to manipulate each dictionary in iterable with lambdas?
To be clarify, just want to delete _id key from each element.
Wonder If there is an elegant way to achieve this simple task without 3rd parties like (toolz), functions, or copying dict objects.
Possible solutions that I do not search for.
Traditional way:
cleansed = {k:v for k,v in data.items() if k not in ['_id'] }
Another way:
def clean(iterable):
for d in iterable:
del d['_id']
return
3rd Party way:
Python functional approach: remove key from dict using filter
Functional programming tendency is to not alter the original object, but to return a new modified copy of the data. Main problem is that del is a statement and can't be used functionally.
Maybe this pleases you:
[d.pop('_id') for d in my_dicts]
that will modify all the dictionaries in my_dicts in place and remove the '_id' key from them. The resulting list is a list of _id values, but the dictionaries are edited in the original my_dicts.
I don't recommend using this at all, since it is not as easy to understand as your second example. Using functional programming to alter persistent data is not elegant at all. But if you need to...
Check if this do what you want:
# Python 2.x
func = lambda iterable, key="_id": type(iterable)([{k: v for k, v in d.iteritems() if k != key} for d in iterable])
# Python 3.x
func = lambda iterable, key="_id": type(iterable)([{k: v for k, v in d.items() if k != key} for d in iterable])