split dictionary in two 'by reference', not copying values - python

How can one split a dictionary in two without creating new copies of the dictionary values?
original_dict = {'foo':'spam', 'bar':'eggs'}
keys_for_dict1 = ['foo']
dict1 = {}; dict2 = {}
for key in original_dict:
    if key in keys_for_dict1:
        dict1[key] = original_dict[key]
    else:
        dict2[key] = original_dict[key]
This has created duplicates of the values of original_dict, so that modifying it does not modify dict1 or dict2. (Iterating through original_dict.items() instead gives the same behaviour.) The values in my dictionary are very large objects which I want to avoid recreating. How can I capture the behaviour of new_dict = original_dict, which copies by reference, for this splitting-in-two scenario? Thanks.

Your code does what you want. To make it clearer, let's use lists for the values.
original_dict = {'foo':[1,2,3], 'bar':[4,5,6]}
keys_for_dict1 = ['foo']
dict1 = {}; dict2 = {}
for key in original_dict:
    if key in keys_for_dict1:
        dict1[key] = original_dict[key]
    else:
        dict2[key] = original_dict[key]
dict1['foo'][2]='a'
print dict1['foo']
print original_dict['foo']
The output is
[1, 2, 'a']
[1, 2, 'a']
So when I edited dict1['foo'] it also changed original_dict['foo'] because you have not created new copies. It is the same object.
As a general rule in Python, an assignment such as a = b never copies the object: both names end up referring to the same object, rather than to two initially identical objects. The same holds when you store a value in a dictionary. You only get a new copy if you ask for one explicitly.
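A minimal sketch to check this yourself, assuming Python 3 (the variable names same_object, copied and so on are just for illustration). It also shows the likely source of the confusion in the question: rebinding a key in one dictionary does not affect the others, whereas mutating a shared value is visible everywhere.

```python
import copy

original_dict = {'foo': [1, 2, 3], 'bar': [4, 5, 6]}
keys_for_dict1 = {'foo'}

# Split by key; the values are stored by reference, nothing is copied.
dict1 = {k: v for k, v in original_dict.items() if k in keys_for_dict1}
dict2 = {k: v for k, v in original_dict.items() if k not in keys_for_dict1}

# `is` tests object identity: both dicts hold the very same list.
same_object = dict1['foo'] is original_dict['foo']

# Mutating the shared list is visible through both dictionaries.
dict1['foo'].append(4)
seen_in_original = original_dict['foo']

# Rebinding a key only changes that one dictionary.
original_dict['bar'] = 'something else'
still_old_value = dict2['bar']

# Only an explicit copy creates a new object.
copied = copy.copy(dict1['foo'])
copy_is_shared = copied is dict1['foo']

print(same_object, seen_in_original, still_old_value, copy_is_shared)
```

So for very large values, the loop in the question is already as cheap as it can be: it only stores references.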

Related

Python: Why does the dict.fromkeys method not produce a working dictionary

I had the following dictionary:
ref_range = range(0,100)
aas = list("ACDEFGHIKLMNPQRSTVWXY*")
new_dict = {}
new_dict = new_dict.fromkeys(ref_range,{k:0 for k in aas})
Then I added a 1 to a specific key
>>> new_dict[30]['G'] += 1
>>> new_dict[30]['G']
1
but
>>> new_dict[31]['G']
1
What is going on here? I only incremented the nested key 30, 'G' by one.
Note: If I generate the dictionary this way:
new_dict = {}
for i in ref_range:
    new_dict[i] = {a:0 for a in aas}
Everything behaves fine. I think a similar question has been asked before, but I wanted to know why this is happening rather than how to solve it.
fromkeys(S, v) sets all of the keys in S to the same value v. Meaning that all of the keys in your dictionary new_dict refer to the same dictionary object, not to their own copies of that dictionary.
To set each to a different dict object you cannot use fromkeys. You need to just set each key to a new dict in a loop.
Besides what you have you could also do
{i: {a: 0 for a in aas} for i in ref_range}
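A short sketch that makes the sharing visible, assuming Python 3 (the names shared and separate are only for illustration):

```python
aas = list("ACDEFGHIKLMNPQRSTVWXY*")
ref_range = range(0, 100)

# fromkeys evaluates the value once and reuses it for every key.
shared = dict.fromkeys(ref_range, {k: 0 for k in aas})
all_same = all(v is shared[0] for v in shared.values())

shared[30]['G'] += 1
leaked = shared[31]['G']        # changed too, it is the same inner dict

# The comprehension evaluates the inner dict once per key.
separate = {i: {a: 0 for a in aas} for i in ref_range}
separate[30]['G'] += 1
not_leaked = separate[31]['G']  # untouched

print(all_same, leaked, not_leaked)
```

fromkeys is still fine when the shared value is immutable, such as 0 or None.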

Python: List as a key in dictionary

I am trying to read multiple files with very similar data. Each line of this data has an accessor_key and a value associated with it. I am trying to create a dictionary with the accessor_key as the dictionary key and, as the dictionary value, a list of all the values read so far.
My code looks like this:
with open(ind_file, "r") as r:
    for line in r:
        nline = line.strip()
        spl = nline.split(",")
        if agg_d.has_key(spl[0]):
            key = spl[0]
            val = spl[1]
            dummy = agg_d[key]
            dummy.append(val)
            agg_d[key] = dummy
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
As you can see I want the value to get bigger every time, (the list increases in size by 1 every iteration) and store it back to the dictionary.
However when I run this program, all keys in the dictionary take on the value of the list.
So for example in the beginning of the program the dictionary is :
agg_d = {'some_key': [], 'another_key': []}
After running it once it becomes:
agg_d = {'some_key': ['1'], 'another_key': ['1']}
When it should be just:
agg_d = {'some_key': ['1'], 'another_key': []}
EDIT: I found the work around I was looking for. I simply did:
with open(ind_file, "r") as r:
    for line in r:
        nline = line.strip()
        spl = nline.split(",")
        if agg_d.has_key(spl[0]):
            key = spl[0]
            val = spl[1]
            dummy = agg_d[key]
            ad = dummy[:]
            ad.append(val)
            agg_d[key] = ad
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
But I would still like to know why this is happening at all. Is 'dummy' referenced to all the values of the dictionary? I am running this with Python 2.7.
Is 'dummy' referenced to all the values of the dictionary? I am running this with Python 2.7.
Yes. You've added a reference to the list, and there can be multiple references to that same list as you have observed. To illustrate this simply, try this:
dummy = [1,2,3] # creates a list object and binds the name 'dummy' to it
d = dict()
d['some key'] = dummy # creates the key 'some key' and stores a reference to that same list object as its value
dummy.append(4) # mutates the list referred to by the name 'dummy'
# at this point, the change is visible through every reference to that object
print d['some key']
You will observe the following output:
[1, 2, 3, 4]
Your workaround is OK, but you could improve:
with open(ind_file, "r") as r:
    for line in r:
        spl = line.strip().split(",")
        key, val = spl[0], spl[1]
        if key in agg_d:
            agg_d[key] = agg_d[key] + [val]
            print key, agg_d[key]
        else:
            print "Something is wrong"
            print agg_d
            print spl[0]
            print spl[1]
agg_d[key] = agg_d[key] + [val]
This builds a new list instead of mutating the shared list in place, and assigns it back to the dictionary. (Writing agg_d[key][:].append(val) on the right-hand side would not work, because list.append returns None.) It also avoids some unnecessary variables such as nline, ad and dummy.
It looks like agg_d is already initialised with your expected keys. You don't show how this is done, but I'm guessing that all of the initial values are in fact the same list - to which you append values in the code above.
If you initialise agg_d with a new list per key, then the problem should go away. You may be able to do this with a dictionary comprehension:
>>> keys = ["a", "b", "c"]
>>> agg_d = {k:[] for k in keys}
>>> agg_d["a"].append(1)
>>> agg_d
{'a': [1], 'c': [], 'b': []}
Alternatively, depending on your needs, you could initialise each entry on demand as you encounter each key when reading the file.
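A minimal sketch of the on-demand approach, assuming Python 3 and using collections.defaultdict. The list lines is hypothetical sample data standing in for the lines of ind_file:

```python
from collections import defaultdict

# Hypothetical sample data in place of the file contents.
lines = ["some_key,1\n", "another_key,2\n", "some_key,3\n"]

# Each missing key gets its own fresh list the first time it is accessed.
agg_d = defaultdict(list)
for line in lines:
    key, val = line.strip().split(",")
    agg_d[key].append(val)

result = dict(agg_d)
print(result)
```

Because list() is called separately for every new key, no two keys can end up sharing a list.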
Your workaround works because it replaces the original list with a new list and removes the shared reference.
The issue is that Python stores a reference to the list as the dict value, not a copy of the list. If several keys were given the same list, their values are all references to one object. You need to copy the list explicitly, using either dummy[:] as in your workaround or the copy module: copy.copy() makes a shallow copy like dummy[:], and copy.deepcopy() also copies any nested objects.
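To illustrate the difference between the two kinds of copy, here is a small sketch assuming Python 3. The nested list is a made-up example, not data from the question:

```python
import copy

inner = ['1']
values = [inner]                 # a list that contains another list

shallow = copy.copy(values)      # same effect as values[:]
deep = copy.deepcopy(values)

# The shallow copy is a new outer list that still shares the inner list.
shallow_outer_shared = shallow is values
shallow_inner_shared = shallow[0] is inner

# The deep copy shares nothing with the original.
deep_inner_shared = deep[0] is inner

print(shallow_outer_shared, shallow_inner_shared, deep_inner_shared)
```

For a flat list of strings, as in the question, the shallow copy is enough.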

Dividing dictionary into nested dictionaries, based on the key's name on Python 3.4

I have the following dictionary (short version, real data is much larger):
dict = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
I want to divide it into smaller nested dictionaries, depending on a part of the key name.
Expected output would be:
dict1 = {'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475}}
dict2 = {'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0}}
dict3 = {'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
Logic behind is:
dict1: if the part of the key name is 'C-STD-B&M-SUM', add to dict1.
dict2: if the part of the key name is 'H-NSW-BAC-ART', add to dict2.
dict3: if the part of the key name is 'H-NSW-BAC-ENG', add to dict3.
Partial code so far:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            c_std_bem_sum = k[14:17], v
What I'm trying to do is to create the nested dictionaries that I need and then I'll create the dictionary and add the nested one to it, but I'm not sure if it's a good way to do it.
When I run the code above, the variable c_std_bem_sum becomes a tuple, with only two values that are changed at each iteration. How can I make it be a dictionary, so I can later create another dictionary, and use this one as the value for one of the keys?
One way to approach it would be to do something like
d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
def divide_dictionaries(somedict):
    out = {}
    for k,v in somedict.items():
        head, tail = k.split(":")
        subdict = out.setdefault(head, {})
        subdict[tail] = v
    return out
which gives
>>> dnew = divide_dictionaries(d)
>>> import pprint
>>> pprint.pprint(dnew)
{'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475},
 'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0},
 'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
A few notes:
(1) We're using nested dictionaries instead of creating separate named dictionaries, which aren't convenient.
(2) We used setdefault, which is a handy way to say "give me the value in the dictionary, but if there isn't one, add this to the dictionary and return it instead.". Saves an if.
(3) We can use .split(":") instead of hardcoding the width, which isn't very robust -- at least assuming that's the delimiter, anyway!
(4) It's a bad idea to use dict, the name of a builtin type, as a variable name.
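As a variation on the same idea, here is a sketch using collections.defaultdict in place of setdefault and str.partition in place of split. It assumes, like the code above, that ":" is the delimiter; partition splits only at the first colon, so it would not raise if the tail happened to contain another one.

```python
from collections import defaultdict

d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475,
     'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0,
     'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0}

def divide_dictionaries(somedict):
    # Missing heads get a fresh empty dict automatically.
    out = defaultdict(dict)
    for k, v in somedict.items():
        head, _, tail = k.partition(":")
        out[head][tail] = v
    return dict(out)  # convert back to a plain dict

dnew = divide_dictionaries(d)
print(dnew['C-STD-B&M-SUM'])
```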
That's because you're creating your dictionary and then overwriting it with a tuple:
>>> a = 1, 2
>>> print(a)
(1, 2)
Now for your example:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            new_key = k[14:17] # sure you don't want [14:], open ended?
            c_std_bem_sum[new_key] = v
    return c_std_bem_sum
Basically, this grabs the part of the key after the colon (3 characters as you have it; [14:] or [14:None] would get the rest of the string) and then uses that as the new key for the dict.

How to delete just one key from a python dictionary?

I am creating a dictionary, that potentially has keys that are the same, but the values of each key are different. Here is the example dict:
y = {44:0, 55.4:1, 44:2, 5656:3}
del y[44]
print y
{5656: 3, 55.399999999999999: 1}
I would like to be able to do something like:
del y[44:0]
Or something of that nature.
You never had duplicate keys:
>>> y = {44:0, 55.4:1, 44:2, 5656:3}
>>> y
{5656: 3, 55.4: 1, 44: 2}
A dictionary literal can be written with duplicate keys, but only one entry per key is kept: the value given last wins.
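A quick sketch to confirm this, assuming Python 3 (removed and missing are illustrative names). It also shows dict.pop, which removes a single key like del but returns the value and accepts a default for keys that are absent:

```python
y = {44: 0, 55.4: 1, 44: 2, 5656: 3}

count = len(y)        # only three entries survive
kept = y[44]          # the later value, 2, replaced 0

# pop removes the key and returns its value.
removed = y.pop(44)

# With a default, pop does not raise KeyError for a missing key.
missing = y.pop(44, None)

print(count, kept, removed, missing, y)
```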
Just an idea: instead of having scalar values, why not store a collection of some kind under each key? Perhaps even a set, if the values are unique:
myDict = {44:set([1, 2, 3])}
So to add or remove an object:
myDict[44].add(1)
myDict[44].remove(1)
For adding a new key:
if newKey not in myDict:
    myDict[newKey] = set() # new empty set
Your question is moot. In your y declaration, the 44:2 wouldn't go alongside 44:0, it would overwrite it. You'd need to use a different key if you want both values in the dictionary.
Before deleting:
>>> dic = {"a":"A","b":"B","c":"C","d":"D","e":"E"}
>>> del dic['a']
After deleting:
>>> dic
{'b': 'B', 'c': 'C', 'd': 'D', 'e': 'E'}

Merge 2 dictionaries in Python

I have 2 dictionaries.
dict1={('SAN RAMON', 'CA'): 1, ('UPLAND', 'CA'): 4, ('POUGHKEESIE', 'NY'): 3, ('CATTANOOGA', 'TN'): 1}
dict2={('UPLAND', 'CA'): 5223, ('PORT WASHING', 'WI'): 11174, ('PORT CLINTON', 'OH'): 6135, ('GRAIN VALLEY', 'MO'): 10352, ('GRAND JUNCTI', 'CO'): 49688, ('FAIRFIELD', 'IL'): 5165}
These are just samples, in reality each dict has hundreds of entries. I am trying to merge the two dictionaries and create dict 3 that contains {dict1.values(): dict2.values()} but only if that city appears in both dicts. So, one entry in dict3 would look like
{4:5223} # for 'UPLAND', 'CA' since it appears in both dict1 and dict2
This is just a small step in a larger function I am writing. I was going to try something like :
for item in dict1.keys():
    if item not in dict2.keys():
        del item
return dict[(dict1.keys())=(dict2.keys())]
I can't figure out how to make sure the number of complaints from dict1 matches the same city it is being referred to in dict2.
Here's what I think you want:
dict3 = dict((dict1[key], dict2[key]) for key in dict1 if key in dict2)
Expanded a little, it looks like this:
dict3 = {}
for key in dict1:
    if key in dict2:
        dict3[dict1[key]] = dict2[key]
The common keys are:
set(dict1.keys()) & set(dict2.keys())
create dict 3 that contains {dict1.values(): dict2.values()}
This doesn't make sense, dictionaries are key-value pairs... what do you really want? Tip:
dict3 = {}
for k in set(dict1.keys()) & set(dict2.keys()):
    dict3[dict1[k]]=dict2[k]
{4: 5223}
The straightforward way would be to check each key in one for membership in the other:
result = {}
for key in dict1:
    if key in dict2:
        result[dict1[key]] = dict2[key]
You could also try converting them into a set or frozenset and taking their intersection, but it's not clear to me whether that will be faster or not:
keys_in_both = frozenset(dict1) & frozenset(dict2)
result = dict((dict1[key], dict2[key]) for key in keys_in_both)
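In Python 3 the same idea can be sketched without building the sets by hand, since dict.keys() returns a set-like view that supports &. One caveat worth noting: if two cities in dict1 have the same count, they collide as keys in the result and only one survives. The by_city mapping below is a hypothetical alternative, not something the question asked for, that keeps the city as the key to avoid that.

```python
dict1 = {('SAN RAMON', 'CA'): 1, ('UPLAND', 'CA'): 4,
         ('POUGHKEESIE', 'NY'): 3, ('CATTANOOGA', 'TN'): 1}
dict2 = {('UPLAND', 'CA'): 5223, ('PORT WASHING', 'WI'): 11174,
         ('PORT CLINTON', 'OH'): 6135, ('GRAIN VALLEY', 'MO'): 10352,
         ('GRAND JUNCTI', 'CO'): 49688, ('FAIRFIELD', 'IL'): 5165}

# Key views behave like sets, so & gives the cities present in both.
common = dict1.keys() & dict2.keys()

# The mapping the question asks for: dict1 value -> dict2 value.
dict3 = {dict1[k]: dict2[k] for k in common}

# Collision-free alternative: city -> (dict1 value, dict2 value).
by_city = {k: (dict1[k], dict2[k]) for k in common}

print(dict3, by_city)
```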
