I am trying to read multiple files with very similar data. Each line of this data has an accessor_key and a value associated with it. I am trying to create a dictionary with the accessor_key as the dictionary key and, as the dictionary value, a list of all the values read so far.
My code looks like this:
with open(ind_file, "r") as r:
    for line in r:
        nline = line.strip()
        spl = nline.split(",")
        if agg_d.has_key(spl[0]):
            key = spl[0]
            val = spl[1]
            dummy = agg_d[key]
            dummy.append(val)
            agg_d[key] = dummy
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
As you can see, I want the value to grow every time (the list increases in size by 1 every iteration) and to store it back in the dictionary.
However when I run this program, all keys in the dictionary take on the value of the list.
So for example, at the beginning of the program the dictionary is:
agg_d = {'some_key': [], 'another_key': []}
After running it once it becomes:
agg_d = {'some_key': ['1'], 'another_key': ['1']}
When it should be just:
agg_d = {'some_key': ['1'], 'another_key': []}
EDIT: I found the workaround I was looking for. I simply did:
with open(ind_file, "r") as r:
    for line in r:
        nline = line.strip()
        spl = nline.split(",")
        if agg_d.has_key(spl[0]):
            key = spl[0]
            val = spl[1]
            dummy = agg_d[key]
            ad = dummy[:]
            ad.append(val)
            agg_d[key] = ad
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
But I would still like to know why this is happening at all. Does 'dummy' refer to all the values of the dictionary? I am running this with Python 2.7.
Yes. You've added a reference to the list, and there can be multiple references to that same list as you have observed. To illustrate this simply, try this:
dummy = [1,2,3] # creates a list object and assigns reference to the name 'dummy'
d = dict()
d['some key'] = dummy # creates the key 'some key' and stores a reference to the same list object that 'dummy' refers to
dummy.append(4) # mutates the list referred to by name 'dummy'
# at this point, all references to that object have mutated similarly
print d['some key']
You will observe the following output:
[1, 2, 3, 4]
Your workaround is OK, but you could improve it:
with open(ind_file, "r") as r:
    for line in r:
        spl = line.strip().split(",")
        key, val = spl[0], spl[1]
        if key in agg_d:
            agg_d[key] = agg_d[key] + [val]
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
agg_d[key] = agg_d[key] + [val]
This builds a new list rather than mutating the shared list in place, and assigns it back to the dictionary. It also avoids unnecessary variables like nline, ad and dummy.
It looks like agg_d is already initialised with your expected keys. You don't show how this is done, but I'm guessing that all of the initial values are in fact the same list - to which you append values in the code above.
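That guess is easy to reproduce. One way to end up with a single shared list (an assumption, since the question does not show the initialisation) is dict.fromkeys() with a list as the default value, which gives exactly the symptom described in the question:

```python
# Hypothetical reproduction: the question does not show how agg_d is built,
# so dict.fromkeys() here is an assumed initialisation, not the asker's code.
keys = ['some_key', 'another_key']

# The [] default is evaluated once, so every key refers to the SAME list.
agg_d = dict.fromkeys(keys, [])

agg_d['some_key'].append('1')  # mutates the one shared list

print(agg_d['another_key'])                       # ['1'], although only some_key was touched
print(agg_d['some_key'] is agg_d['another_key'])  # True: both values are one object
```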
If you initialise agg_d with a new list per key, then the problem should go away. You may be able to do this with a dictionary comprehension:
>>> keys = ["a", "b", "c"]
>>> agg_d = {k:[] for k in keys}
>>> agg_d["a"].append(1)
>>> agg_d
{'a': [1], 'c': [], 'b': []}
Alternatively, depending on your needs, you could initialise each entry on demand as you encounter each key when reading the file.
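Initialising on demand could look roughly like this. The sketch assumes each line has the form key,value, uses a hypothetical in-memory list in place of ind_file, and relies on collections.defaultdict(list) so that every new key gets its own empty list:

```python
from collections import defaultdict

# Hypothetical sample data standing in for the lines of ind_file.
lines = ["some_key,1\n", "another_key,2\n", "some_key,3\n"]

agg_d = defaultdict(list)  # a missing key gets a brand-new empty list
for line in lines:
    # maxsplit=1 keeps any further commas inside the value
    key, val = line.strip().split(",", 1)
    agg_d[key].append(val)  # safe: no list is shared between keys

print(dict(agg_d))
```

Unlike the original loop, this accepts any key it encounters; if only a fixed set of keys is allowed, the "Something is wrong" check would still be needed before appending.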
Your workaround works because it replaces the original list with a new list and removes the shared reference.
The issue is that Python stores a reference to the list as the dict value, not a copy of the list itself. So the dict values can all be references to the same object. You need to copy the list explicitly, either with dummy[:] as in your workaround, or with list(dummy) or copy.copy(dummy). Use copy.deepcopy() only if the elements of the list are themselves mutable objects that must also be copied.
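For the flat list of strings in this question, dummy[:] is enough. The difference between a shallow and a deep copy only matters when the list holds mutable objects, as this small sketch with made-up data shows:

```python
import copy

dummy = [[1, 2], [3]]  # a list whose elements are themselves lists

shallow = dummy[:]           # new outer list; same as list(dummy) or copy.copy(dummy)
deep = copy.deepcopy(dummy)  # new outer list AND new inner lists

shallow.append([4])    # only changes the shallow copy's outer list
shallow[0].append(99)  # this inner list is still shared with dummy

print(dummy)  # [[1, 2, 99], [3]]
print(deep)   # [[1, 2], [3]]
```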
Related
d={1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
value = raw_input("Choose a value to be searched: ")
data = ""
if value in d:
    data = d.keys["value"]
    print(data)
else:
    print "There isn't such value in the dictionary"
So I enter 'a' and I want to get the key 1,
but it skips "data = d.keys["value"]" and "print(data)" and prints the message from the else branch.
What have I done wrong?
Containment checks for dict check the keys, not the values, and 'a' is a value in the dict, not a key.
The simplest fix would be to change your test to:
if value in d.viewvalues(): # d.values() on Python 3
but that's still sub-optimal; you can't perform efficient (O(1)) lookups in the values of a dict (nor can you do d.keys[value] as you seem to think you can; you'd have to perform a second linear scan to find the key, or perform a more complicated single scan to determine if the value exists and pull the key at the same time).
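The "more complicated single scan" mentioned above could be sketched with next() over a generator expression. find_key is a hypothetical helper name, and "first" here means first in the dict's iteration order:

```python
d = {1: 'a', 2: 'b', 3: 'c', 4: 'a', 5: 'd', 6: 'e', 7: 'a', 8: 'b'}

def find_key(mapping, value, default=None):
    """Return the first key whose value equals `value`, else `default`."""
    # The generator yields matching keys lazily; next() stops at the first one,
    # so this is a single linear scan that both tests and retrieves.
    return next((k for k, v in mapping.items() if v == value), default)

print(find_key(d, 'a'))  # 1
print(find_key(d, 'q'))  # None
```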
Really though, it seems like you want your dictionary reversed, with the keys as values and vice-versa. Doing it this way:
d = {1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
d_inv = {v: k for k, v in d.items()} # Make inverted version of d
value = raw_input("Choose a value to be searched: ")
if value in d_inv:
    data = d_inv[value]
    print(data)
else:
    print "There isn't such value in the dictionary"
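you can perform the containment check and lookup efficiently (if d isn't otherwise needed, you can just replace d with the same structure as d_inv and use d instead of d_inv uniformly). Note that 'a' and 'b' each appear under several keys in d, so d_inv keeps only the last key seen for each of them (7 for 'a', 8 for 'b'), not 1 as the question expects.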
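If all matching keys are needed, one option (an assumption about what the asker wants) is to invert d into a mapping from each value to a list of every key that carries it, using setdefault. The hard-coded value stands in for the raw_input() result:

```python
d = {1: 'a', 2: 'b', 3: 'c', 4: 'a', 5: 'd', 6: 'e', 7: 'a', 8: 'b'}

# Map each value to a list of ALL keys that carry it.
d_inv = {}
for k, v in d.items():
    d_inv.setdefault(v, []).append(k)  # new list the first time v is seen

value = 'a'  # stands in for the raw_input() result
if value in d_inv:       # O(1) containment check on the inverted dict
    print(d_inv[value])  # [1, 4, 7]
else:
    print("There isn't such value in the dictionary")
```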
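As stated, you need to test the values, not the keys. Here's an alternative:
if any(v == value for v in d.values()):
Then, instead of d.keys["value"], collect the matching keys with [k for k, v in d.items() if v == value]; d[value] would not work, because value is not a key.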
You could do something like this:
d={1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
value = 'a'
data = [key for key, val in d.items() if val == value]
if len(data) > 0:
    print(data)
else:
    print "There isn't such value in the dictionary"
Then you would get the result:
[1, 4, 7]
I am having trouble converting a 2d list into a 2d dictionary. I haven't worked much with 2d dictionaries prior to this, so please bear with me. I am just wondering why this keeps raising a KeyError. In this quick example I would want the dictionary to look like {gender: { name: [food, color, number] }}
2dList = [['male','josh','chicken','purple','10'],
          ['female','Jenny','steak','blue','11']]
dict = {}
for i in range(len(2dList)):
    dict[2dList[i][0]][2dList[i][1]] = [2dList[i][2], 2dList[i][3], 2dList[i][4]]
I keep getting the error message: KeyError: 'male'. I know this is how you add keys for a 1d dictionary, but am unsure regarding 2d dictionaries. I always believed it was:
dictionary_name[key1][key2] = value
You can try this :) It will also work if you have more than one male or female in your list.
List = [['male','josh','chicken','purple','10'],
        ['female','Jenny','steak','blue','11']]
d = {}
for l in List:
    gender = l[0]
    name = l[1]
    food = l[2]
    color = l[3]
    number = l[4]
    if gender in d: # if it exists just add new name by creating new key for name
        d[gender][name] = [food,color,number]
    else: # create new key for gender (male/female)
        d[gender] = {name:[food,color,number]}
You are attempting to build a nested dictionary, but you are not explicitly initializing the second-layer dictionaries. You need to do this each time a new first-level key is encountered. Also, 2dList is not a valid variable name in Python, because identifiers cannot start with a digit. This should work for you:
dList = [['male','josh','chicken','purple','10'],
         ['female','Jenny','steak','blue','11']]
dict = {}
for i in range(len(dList)):
    if dList[i][0] not in dict:
        dict[dList[i][0]] = {}
    dict[dList[i][0]][dList[i][1]] = [dList[i][2], dList[i][3], dList[i][4]]
print(dict)
To get a more or less "sane" result, use the following (a list of dictionaries, each in the format {gender: { name: [food, color, number] }}):
l = [['male','josh','chicken','purple','10'], ['female','Jenny','steak','blue','11']]
result = [{i[0]: {i[1]:i[2:]}} for i in l]
print(result)
The output:
[{'male': {'josh': ['chicken', 'purple', '10']}}, {'female': {'Jenny': ['steak', 'blue', '11']}}]
You are getting a KeyError because you are trying to access a non-existent entry in the dictionary with 'male' as the key.
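You can use defaultdict(dict) instead of dict, so that a missing first-level key automatically gets an empty inner dictionary. Note that 2dList is not a valid Python name, so dList is used below.
from collections import defaultdict
dList = [['male','josh','chicken','purple','10'],
         ['female','Jenny','steak','blue','11']]
d = defaultdict(dict)  # a missing gender key gets a new empty inner dict
for row in dList:
    d[row[0]][row[1]] = [row[2], row[3], row[4]]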
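To see where KeyError: 'male' comes from: in dictionary_name[key1][key2] = value, only the last subscript is an assignment, while dictionary_name[key1] is an ordinary lookup, so the inner dictionary must already exist. A minimal demonstration using the first row of the question's data:

```python
row = ['male', 'josh', 'chicken', 'purple', '10']

d = {}
error_key = None
try:
    # d[row[0]] is a normal lookup that runs before the inner assignment,
    # and 'male' is not in d yet.
    d[row[0]][row[1]] = row[2:]
except KeyError as exc:
    error_key = exc.args[0]
    print("KeyError: %s" % exc)  # KeyError: 'male'

# Creating the inner dict first makes the two-level assignment work.
d[row[0]] = {}
d[row[0]][row[1]] = row[2:]
print(d)  # {'male': {'josh': ['chicken', 'purple', '10']}}
```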
Try this
twodList = [['male','josh','chicken','purple','10'],
            ['female','Jenny','steak','blue','11']]
dic = {twodList[i][0]: {twodList[i][1]: twodList[i][2:]} for i in range(len(twodList))}
As someone mentioned in the comments, a variable name cannot start with a digit.
list1=[['male','josh','chicken','purple','10'],['female','Jenny','steak','blue','11'],['male','johnson','chicken','purple','10'],['female','jenniffer','steak','blue','11']]
dict = {}
for i in range(len(list1)):
    if list1[i][0] in dict:
        if list1[i][1] in dict[list1[i][0]]:
            dict[list1[i][0]][list1[i][1]] = [list1[i][2], list1[i][3], list1[i][4]]
        else:
            dict[list1[i][0]][list1[i][1]] = {}
            dict[list1[i][0]][list1[i][1]] = [list1[i][2], list1[i][3], list1[i][4]]
    else:
        dict[list1[i][0]] = {}
        if list1[i][1] in dict[list1[i][0]]:
            dict[list1[i][0]][list1[i][1]] = [list1[i][2], list1[i][3], list1[i][4]]
        else:
            dict[list1[i][0]][list1[i][1]] = {}
            dict[list1[i][0]][list1[i][1]] = [list1[i][2], list1[i][3], list1[i][4]]
print dict
The code above gives the following output:
{"male":{"josh":["chicken","purple","10"],"johnson":["chicken","purple","10"]},"female":{"jenniffer":["steak","blue","11"],"Jenny":["steak","blue","11"]}}
I have the following dictionary (short version, real data is much larger):
dict = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
I want to divide it into smaller nested dictionaries, depending on a part of the key name.
Expected output would be:
dict1 = {'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475}}
dict2 = {'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0}}
dict3 = {'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
The logic behind it is:
dict1: if the part of the key name is 'C-STD-B&M-SUM', add to dict1.
dict2: if the part of the key name is 'H-NSW-BAC-ART', add to dict2.
dict3: if the part of the key name is 'H-NSW-BAC-ENG', add to dict3.
Partial code so far:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            c_std_bem_sum = k[14:17], v
What I'm trying to do is create the nested dictionaries I need, then create the outer dictionary and add the nested ones to it, but I'm not sure if this is a good way to do it.
When I run the code above, the variable c_std_bem_sum becomes a tuple, with only two values that are changed at each iteration. How can I make it be a dictionary, so I can later create another dictionary, and use this one as the value for one of the keys?
One way to approach it would be to do something like
d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
def divide_dictionaries(somedict):
    out = {}
    for k,v in somedict.items():
        head, tail = k.split(":")
        subdict = out.setdefault(head, {})
        subdict[tail] = v
    return out
which gives
>>> dnew = divide_dictionaries(d)
>>> import pprint
>>> pprint.pprint(dnew)
{'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475},
'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0},
'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
A few notes:
(1) We're using nested dictionaries instead of creating separate named dictionaries, which aren't convenient.
(2) We used setdefault, which is a handy way to say "give me the value in the dictionary, but if there isn't one, add this to the dictionary and return it instead.". Saves an if.
(3) We can use .split(":") instead of hardcoding the width, which isn't very robust -- at least assuming that's the delimiter, anyway!
(4) It's a bad idea to use dict, the name of a builtin type, as a variable name.
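Building on note (3): if some keys might contain no colon or more than one, head, tail = k.split(":") raises ValueError. A variant using str.partition() avoids that; splitting at the first colon, and giving an empty tail when there is no colon, is an assumption about how such keys should be handled:

```python
d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475,
     'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0,
     'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0}

def divide_dictionaries(somedict):
    out = {}
    for k, v in somedict.items():
        # partition() splits at the first ':' only and never raises;
        # tail is '' if the key contains no ':' at all.
        head, _, tail = k.partition(":")
        out.setdefault(head, {})[tail] = v
    return out

dnew = divide_dictionaries(d)
print(sorted(dnew))  # ['C-STD-B&M-SUM', 'H-NSW-BAC-ART', 'H-NSW-BAC-ENG']
```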
That's because you're setting up your dictionary and then overwriting it with a tuple:
>>> a = 1, 2
>>> print a
(1, 2)
Now for your example:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            new_key = k[14:17] # sure you don't want [14:], open ended?
            c_std_bem_sum[new_key] = v
    return c_std_bem_sum
Basically, this takes part of the rest of the key (3 characters, as you have it; [14:None] or [14:] would get the rest of the string) and uses that as the new key in the dict.
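The difference between the fixed-width slice and the open-ended one shows up as soon as the suffix after the colon is longer than three characters. The key below is hypothetical:

```python
k = 'C-STD-B&M-SUM:-1000'  # hypothetical key with a 5-character suffix

print(k[0:13])   # C-STD-B&M-SUM
print(k[14:17])  # -10   (fixed width: the suffix is truncated)
print(k[14:])    # -1000 (open-ended: the rest of the string)
```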
How can one split a dictionary in two without creating new copies of the dictionary values?
original_dict = {'foo':'spam', 'bar':'eggs'}
keys_for_dict1 = ['foo']
dict1 = {}; dict2 = {}
for key in original_dict:
    if key in keys_for_dict1:
        dict1[key] = original_dict[key]
    else:
        dict2[key] = original_dict[key]
This has created duplicates of the values of original_dict, so that modifying it does not modify dict1 or dict2. (Iterating through original_dict.items() instead gives the same behaviour.) The values in my dictionary are very large objects which I want to avoid recreating. How can I capture the behaviour of new_dict = original_dict, which copies by reference, for this splitting-in-two scenario? Thanks.
Your code does what you want. To make it clearer, let's use lists for the values.
original_dict = {'foo':[1,2,3], 'bar':[4,5,6]}
keys_for_dict1 = ['foo']
dict1 = {}; dict2 = {}
for key in original_dict:
    if key in keys_for_dict1:
        dict1[key] = original_dict[key]
    else:
        dict2[key] = original_dict[key]
dict1['foo'][2]='a'
print dict1['foo']
print original_dict['foo']
The output is
[1, 2, 'a']
[1, 2, 'a']
So when I edited dict1['foo'] it also changed original_dict['foo'] because you have not created new copies. It is the same object.
As a general rule in Python, when you do something like a = b and b refers to some object, you are making both names refer to the same object, not to two initially identical objects. You do not get a new copy unless you explicitly make one.
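The same split can also be written with two dict comprehensions, and the is operator confirms that no values are copied. This sketch reuses the answer's sample data; turning keys_for_dict1 into a set is a choice made here for faster membership tests:

```python
original_dict = {'foo': [1, 2, 3], 'bar': [4, 5, 6]}
keys_for_dict1 = {'foo'}  # a set, so each membership test is O(1)

dict1 = {k: v for k, v in original_dict.items() if k in keys_for_dict1}
dict2 = {k: v for k, v in original_dict.items() if k not in keys_for_dict1}

# `is` compares object identity: True means the value was not copied.
print(dict1['foo'] is original_dict['foo'])  # True
print(dict2['bar'] is original_dict['bar'])  # True
```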
I am iterating through a file looking for certain attributes in each line, and if the line matches I want to insert it as an item in a list for a particular dictionary key.
For example:
list_of_names = ['aaron', 'boo', 'charlie']
for name in list_of_names:
    if name contains 'a':
        #add here: add to list in dict['has_a']
print dict['has_a']
Should print ['aaron', 'charlie'].
The reason I'm asking this is because I'm not sure how else to create multiple entries for a key in a dictionary.
You can use Python's defaultdict for this purpose. It automatically creates an empty list as the default value for each missing key.
from collections import defaultdict
mydict = defaultdict(list)
list_of_names = ['aaron', 'boo', 'charlie']
for name in list_of_names:
    if 'a' in name:
        mydict['has_a'].append(name)
print mydict['has_a']
Output:
['aaron', 'charlie']
The OP has indicated in a comment that he wants heterogeneous values in his dictionary. In that case a defaultdict may not be appropriate, and he should instead just special-case those two keys.
# Initialize our dictionary with list values for the two special cases.
mydict = {'has_a' : [], 'has_b' : []}
list_of_names = ['aaron', 'boo', 'charlie']
for name in list_of_names:
    if 'a' in name:
        mydict['has_a'].append(name)
# When not in a special case, just use the dictionary like normal to assign values.
print mydict['has_a']
I think it is a good use case for the setdefault method of the dict object:
d = dict()
for name in list_of_names:
    if 'a' in name:
        d.setdefault("has_a", []).append(name)
You can use the keys() method to get the list of keys and check whether the 'has_a' entry needs to be created. Then append as usual.
list_of_names = ['aaron', 'boo', 'charlie']
has_dictionary = {}
for name in list_of_names:
    if name.find('a') != -1:
        if 'has_a' not in has_dictionary.keys():
            has_dictionary['has_a'] = []
        has_dictionary['has_a'].append(name)
print(has_dictionary['has_a'])