Imagine that you have:
value = ["Name:","Mike", "Hobby:", "bakset", "voly"]
What is the simplest way to produce the following dictionary?
output : {"Name:" : ["Mike"], "Hobby:" : ["bakset", "voly"]}
with python
Value with ":" would be the key for dictionary
Dict comprehension:
>>> {v: (a := []) for v in value if v[-1] == ':' or a.append(v)}
{'Name:': ['Mike'], 'Hobby:': ['bakset', 'voly']}
Though I suspect you're not sharing the original data but already processed data, and that there's a better way to directly build from the original data. Possibly with an existing parser for the format.
Related
What is the best way to replace the first character of all keys in a dictionary?
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
I am trying to get
new_cols= {"desc1":"adjustment1","desc23":"adjustment3"}
I have tried:
for k,v in old_cols.items():
new_cols[k]=old_cols.pop(k[1:])
old_cols = {"~desc1":"adjustment1", "~desc23":"adjustment3"}
new_cols = {}
for k, v in old_cols.items():
new_key = k[1:]
new_cols[new_key] = v
Here it is with a dictionary comprehension:
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
new_cols = {k.replace('~', ''):old_cols[k] for k in old_cols}
print(new_cols)
#{'desc1': 'adjustment1', 'desc23': 'adjustment3'}
There are many ways to do this with list-comprehension or for-loops. What is important is to understand is that dictionaries are mutable. This basically means that you can either modify the existing dictionary or create a new one.
If you want to create a new one (and I would recommend it! - see 'A word of warning ...' below), both solutions provided in the answers by #Ethem_Turgut and by #pakpe do the trick. I would probably wirte:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
new_dict = {k[1:]: v for k, v in old_dict.items()}
print(f'Is the result still the same dictionary? Answer: {old_id == id(new_dict)}')
Now, if you want to modify the dictionary in place you might loop over the keys and adapt the key/value to your liking:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
for k in [*old_dict.keys()]: # note the weird syntax here
old_dict[k[1:]] = old_dict.pop(k)
print(f'Is the result still the same dictionary? Answer: {old_id == id(old_dict)}')
A word of warning for the latter approach:
You should be aware that you are modifying the keys while looping over them. This is in most of the cases problematic and can even lead to a RuntimeError if you loop directly over old_dict. I avoided this by explicitly unpacking the keys into a list and the looping over that list with [*old_dict.keys()].
Why can modifying keys while looping over them be problematic? Imagine for example that you have the keys '~key1' and 'key1' in your old_dict. Now when your loop handles '~key1' it will modify it to 'key1' which already exists in old_dict and thus it will overwrite the value of 'key1' with the value from '~key1'.
So, only use the latter approach if you are certain to not produce issues like the example mentioned here before. If you are uncertain, simply create a new dictionary!
In a directory images, images are named like - 1_foo.png, 2_foo.png, 14_foo.png, etc.
The images are OCR'd and the text extract is stored in a dict by the code below -
data_dict = {}
for i in os.listdir(images):
if str(i[1]) != '_':
k = str(i[:2]) # Get first two characters of image name and use as 'key'
else:
k = str(i[:1]) # Get first character of image name and use 'key'
# Intiates a list for each key and allows storing multiple entries
data_dict.setdefault(k, [])
data_dict[k].append(pytesseract.image_to_string(i))
The code performs as expected.
The images can have varying numbers in their name ranging from 1 to 99.
Can this be reduced to a dictionary comprehension?
No. Each iteration in a dict comprehension assigns a value to a key; it cannot update an existing value list. Dict comprehensions aren't always better--the code you wrote seems good enough. Although maybe you could write
data_dict = {}
for i in os.listdir(images):
k = i.partition("_")[0]
image_string = pytesseract.image_to_string(i)
data_dict.setdefault(k, []).append(image_string)
Yes. Here's one way, but I wouldn't recommend it:
{k: d.setdefault(k, []).append(pytesseract.image_to_string(i)) or d[k]
for d in [{}]
for k, i in ((i.split('_')[0], i) for i in names)}
That might be as clean as I can make it, and it's still bad. Better use a normal loop, especially a clean one like Dennis's.
Slight variation (if I do the abuse once, I might as well do it twice):
{k: d.setdefault(k, []).append(pytesseract_image_to_string(i)) or d[k]
for d in [{}]
for i in names
for k in i.split('_')[:1]}
Edit: kaya3 now posted a good one using a dict comprehension. I'd recommend that over mine as well. Mine are really just the dirty results of me being like "Someone said it can't be done? Challenge accepted!".
In this case itertools.groupby can be useful; you can group the filenames by the numeric part. But making it work is not easy, because the groups have to be contiguous in the sequence.
That means before we can use groupby, we need to sort using a key function which extracts the numeric part. That's the same key function we want to group by, so it makes sense to write the key function separately.
from itertools import groupby
def image_key(image):
return str(image).partition('_')[0]
images = ['1_foo.png', '2_foo.png', '3_bar.png', '1_baz.png']
result = {
k: list(v)
for k, v in groupby(sorted(images, key=image_key), key=image_key)
}
# {'1': ['1_foo.png', '1_baz.png'],
# '2': ['2_foo.png'],
# '3': ['3_bar.png']}
Replace list(v) with list(map(pytesseract.image_to_string, v)) for your use-case.
In Python, the content of a list instance can be transformed nicely using a list comprehension—for example:
lst = [item for item in lst if item.startswith( '_' )]
Furthermore, if you want to change lst in place (so that any existing references to the old lst now see the updated content) then you can use the following slice-assignment idiom:
lst[:] = [item for item in lst if item.startswith( '_' )]
My question is: is there any equivalent to this for dict objects? Sure, you can transform content using a dict comprehension—for example:
dct = { key:value for key,value in dct.items() if key.startswith( '_' ) }
but this is not an in-place change—the name dct gets re-bound to a new dict instance. Since dct{:} = ... is not legal syntax, the best I can come up with to change a dict in-place is:
tmp = { key:value for key,value in dct.items() if key.startswith( '_' ) }
dct.clear()
dct.update(tmp)
Any time I see a simple operation require three lines and a variable called tmp, I tend to think it's not very Pythonic. Is there any slicker way to do this?
edit: I guess the filtering and the assignment are separate issues. Really this question is about the assignment—I want to allow arbitrary transformations of the dict content, rather than necessarily just selective removal of some keys.
I have a list like this-
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
I want something like
[ {['produce_send']:['consume_recv']},{['Send']:['Recv']},{['sender2']:['receiver2']}
How to do this?
You can not use list as the key of dictionary.
This Article explain the concept,
https://wiki.python.org/moin/DictionaryKeys
To be used as a dictionary key, an object must support the hash function (e.g. through hash), equality comparison (e.g. through eq or cmp), and must satisfy the correctness condition above.
And
lists do not provide a valid hash method.
>>> d = {['a']: 1}
TypeError: unhashable type: 'list'
If you want to specifically differentiate the key values you can use tuple as they hash able
{ (i[0][0], ): (i[1][0], ) for i in send_recv_pairs}
{('Send',): ('Recv',),
('produce_send',): ('consume_recv',),
('sender2',): ('receiver2',)}
You can't have lists as keys, only hashable types - strings, numbers, None and such.
If you still want to use a dictionary knowing that, then:
d={}
for tup in send_recv_pairs:
d[tup[0][0]]=tup[1]
If you want the value to be string as well, use tup[1][0] instead of tup[1]
As a one liner:
d={tup[0][0]]:tup[1] for tup in list} #tup[1][0] if you want values as strings
You can check it over here, in the second way of creating distionary.
https://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/
A Simple way of doing it,
First of all, your tuple is tuple of lists, so better change it to tuple of strings (It makes more sense I guess)
Anyway simple way of working with your current tuple list can be like :
mydict = {}
for i in send_recv_pairs:
print i
mydict[i[0][0]]= i[1][0]
As others pointed out, you cannot use list as key to dictionary. So the term i[0][0] first takes the first element from the tuple - which is a list- and then the first element of list, which is the only element anyway for you.
Do you mean like this?
send_recv_pairs = [(['produce_send'], ['consume_recv']),
(['Send'], ['Recv']),
(['sender2'], ['receiver2'])]
send_recv_dict = {e[0][0]: e[1][0] for e in send_recv_pairs}
Resulting in...
>>> {'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
As mentioned in other answers, you cannot use a list as a dictionary key as it is not hashable (see links in other answers).
You can therefore just use the values in your lists (assuming they stay as simple as in your example) to create the following two possibilities:
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
result1 = {}
for t in send_recv_pairs:
result1[t[0][0]] = t[1]
# without any lists
result2 = {}
for t in send_recv_pairs:
result2[t[0][0]] = t[1][0]
Which respectively gives:
>>> result1
{'produce_send': ['consume_recv'], 'Send': ['Recv'], 'sender2': ['receiver2']}
>>> result2
{'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
Try like this:
res = { x[0]: x[1] for x in pairs } # or x[0][0]: x[1][0] if you wanna store inner values without list-wrapper
It's for Python 3 and when keys are unique. If you need collect list of values per key, instead of single value, than you may use something like itertools.groupby or map+reduce. Wrote about this in comments and I'll provide example.
And yes, list cannot store key-values, only dict's, but maybe it's just typo in question.
You can not use list as the dictionary key, but instead you may type-cast it as tuple to create the dict object.
Below is the sample example using a dictionary comprehension:
>>> send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
>>> {tuple(k): v for k, v in send_recv_pairs}
{('sender2',): ['receiver2'], ('produce_send',): ['consume_recv'], ('Send',): ['Recv']}
For details, take a look at: Why can't I use a list as a dict key in python?
However if your nested tuple pairs were not list, but any other hashable object pairs, you may have type-casted it to dict for getting the desired result. For example:
>>> my_list = [('key1', 'value1'), ('key2', 'value2')]
>>> dict(my_list)
{'key1': 'value1', 'key2': 'value2'}
I am new to Python and working on revising some existing code.
There is a JSON string coming into a Python function that looks like this:
{"criteria": {"modelName":"='ALL'", "modelName": "='NEW'","fields":"*"}}
Right now it appears a dictionary is being used to create a string:
crit=data['criteria']
for crit_key in crit
crit_val = crit[crit_key]
sql+ = sql+= ' and ' + crit_key + crit_val
When the sql string is printed, only the last 'modelName' appears. It seems like a dictionary is being used as modelName is a key so the second modelName overwrites the first? I want the "sql" string in the end to contain both modelNames.
edited because of OP comments
Well, if you can't update your JSON and have to deal with.
You can make something like:
data = '{"criteria": {"modelName":"=\'ALL\'", "modelName": "=\'NEW\'","fields":"*"}}'
import json
def dict_raise_on_duplicates(ordered_pairs):
d = {}
duplicated = []
for k, v in ordered_pairs:
if k in d:
if k not in duplicated:
duplicated.append(k)
d[k] = [d[k]] + [v]
else:
d[k].append(v)
else:
d[k] = v
return d
print json.loads(data, object_pairs_hook=dict_raise_on_duplicates)
In this example, data is the JSON string with duplicated keys.
According to json.loads allows duplicate keys in a dictionary, overwriting the first value I just force json.load to handle duplicate keys.
If a duplicated key is spotted, it will create a new list containing current key data and add new value.
After, it will only append news values in created list.
Output:
{u'criteria': {u'fields': u'*', u'modelName': [u"='ALL'", u"='NEW'"]}}
You will have to update your code anyway, but you now can handle it.