Python - Load multiple Pickle objects into a single dictionary

So my problem is this... I have multiple Pickle object files (which are Pickled Dictionaries) and I want to load them all, but essentially merge each dictionary into a single larger dictionary.
E.g.
I have pickle_file1 and pickle_file2 both contain dictionaries. I would like the contents of pickle_file1 and pickle_file2 loaded into my_dict_final.
EDIT
As requested, here is what I have so far:
for pkl_file in pkl_file_list:
    pickle_in = open(pkl_file,'rb')
    my_dict = pickle.load(pickle_in)
    pickle_in.close()
In essence it works, but it overwrites the contents of my_dict on each iteration rather than appending each pickled dictionary.
Thanks in advance for the help.

import pickle
my_dict_final = {}  # Create an empty dictionary
with open('pickle_file1', 'rb') as f:
    my_dict_final.update(pickle.load(f))  # Merge the contents of file1 into the dictionary
with open('pickle_file2', 'rb') as f:
    my_dict_final.update(pickle.load(f))  # Merge the contents of file2 into the dictionary
print(my_dict_final)
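Applied to the loop from the question, the same idea can be wrapped in a small function. This is only a minimal sketch: load_merged is a hypothetical helper name, and the two throwaway pickle files and their contents are made up for the demo.

```python
import os
import pickle
import tempfile

def load_merged(pkl_file_list):
    """Load each pickled dict and merge them all into one dictionary."""
    merged = {}
    for pkl_file in pkl_file_list:
        with open(pkl_file, 'rb') as f:
            # update() adds new keys; for duplicate keys the later file wins
            merged.update(pickle.load(f))
    return merged

# Demo: write two small pickle files into a temporary directory
tmp_dir = tempfile.mkdtemp()
paths = []
for i, d in enumerate([{'a': 1, 'b': 2}, {'b': 20, 'c': 3}]):
    path = os.path.join(tmp_dir, 'pickle_file%d' % (i + 1))
    with open(path, 'wb') as f:
        pickle.dump(d, f)
    paths.append(path)

my_dict_final = load_merged(paths)
print(my_dict_final)
```

Because update() is called once per file, the result does not depend on how many files are in the list, and the accumulator is never reassigned inside the loop.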

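You can use the dict.update method.
import pickle
with open('pickle_file1', 'rb') as f1, open('pickle_file2', 'rb') as f2:
    pickle_dict1 = pickle.load(f1)
    pickle_dict2 = pickle.load(f2)
my_dict_final = dict(pickle_dict1)  # copy, so pickle_dict1 itself is not modified
my_dict_final.update(pickle_dict2)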
Python Standard Library Docs

@Nunchux, @Vikas Ojha: if the dictionaries happen to share keys, the update method will, unfortunately, overwrite the values for those keys. Example:
>>> dict1 = {'a': 4, 'b': 3, 'c': 0, 'd': 4}
>>> dict2 = {'a': 1, 'b': 8, 'c': 5}
>>> All_dict = {}
>>> All_dict.update(dict1)
>>> All_dict.update(dict2)
>>> All_dict
{'a': 1, 'b': 8, 'c': 5, 'd': 4}
If you'd like to avoid this and keep adding the counts of common keys, one option is to use the following strategy. Applied to your example, here is a minimal working example:
import os
import pickle
from collections import Counter
dict1 = {'a': 4, 'b': 3, 'c': 0, 'd': 4}
dict2 = {'a': 1, 'b': 8, 'c': 5}
# Just creating two pickle files:
with open("dict1.pickle", "wb") as pickle_out:
    pickle.dump(dict1, pickle_out)
with open("dict2.pickle", "wb") as pickle_out:
    pickle.dump(dict2, pickle_out)
# Here comes the merge:
pkl_file_list = ["dict1.pickle", "dict2.pickle"]
All_dict = Counter()
for pkl_file in pkl_file_list:
    if os.path.exists(pkl_file):
        with open(pkl_file, "rb") as pickle_in:
            dict_i = pickle.load(pickle_in)
        # Counter addition sums the values of keys present in both
        All_dict = All_dict + Counter(dict_i)
print(dict(All_dict))
This will happily give you:
{'a': 5, 'b': 11, 'd': 4, 'c': 5}
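One caveat with this approach: adding two Counter objects with + discards every key whose summed value is zero or negative, so a key like 'c': 0 disappears unless another file contributes a positive value for it. A minimal sketch of an alternative, assuming all values are numeric, uses Counter.update, which adds in place and keeps such keys. merge_counts is a hypothetical helper name.

```python
from collections import Counter

def merge_counts(dicts):
    """Sum the values of common keys across several dictionaries."""
    total = Counter()
    for d in dicts:
        # Counter.update adds counts in place and, unlike `+`,
        # keeps keys whose total is zero or negative.
        total.update(d)
    return dict(total)

dict1 = {'a': 4, 'b': 3, 'c': 0, 'd': 4}
dict2 = {'a': 1, 'b': 8}

merged = merge_counts([dict1, dict2])
with_plus = dict(Counter(dict1) + Counter(dict2))

print(merged)      # 'c' is still present with value 0
print(with_plus)   # 'c' has been dropped
```

Which behaviour is preferable depends on whether a zero entry carries meaning in your data.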

Related

How to merge two json objects coming from a file

I have two JSON objects coming from a file. Together, those two objects make one record. They are of different lengths. I tried pandas.read_json(), but it didn't work.
Here is an example:
input:
{"a":1,"b":2,"c":3}{"x":[100],"y":"123"}
expected output:
{
"a":1,
"b":2,
"c":3,
"x":[100],
"y":"123"
}
IIUC, you want to read two JSON files and create a new JSON document from them.
import json
new_json = {}
for json_file in ['js1.json', 'js2.json']:
    with open(json_file) as f:
        d = json.load(f)
    new_json.update(d)
print(new_json)
# {'a': 1, 'b': 2, 'c': 3, 'x': [100], 'y': '123'}
# create a new JSON string that contains both old objects
res = json.dumps(new_json)
Update: if the two JSON objects are in one file, you can use ast.literal_eval.
import json
import ast
# jss.json -> {"a":1,"b":2,"c":3}{"x":[100],"y":"123"}
new_json = {}
for json_file in ['jss.json']:
    with open(json_file) as f:
        jsons = f.read()
    # Splitting on '}' only works for flat objects without nested braces
    for js in jsons.split('}')[:-1]:
        st = js + '}'
        d = ast.literal_eval(st)
        new_json.update(d)
print(new_json)
# {'a': 1, 'b': 2, 'c': 3, 'x': [100], 'y': '123'}
# create a new JSON string that contains both old objects
res = json.dumps(new_json)
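Splitting the text on '}' breaks as soon as an object contains a nested object, and ast.literal_eval rejects JSON literals such as true, false and null. A sketch of a more robust variant uses json.JSONDecoder.raw_decode from the standard library, which parses one JSON value and reports where it ended. merge_concatenated_json is a hypothetical helper name, and the sketch assumes every top-level value in the text is an object.

```python
import json

def merge_concatenated_json(text):
    """Parse back-to-back JSON objects in `text` and merge them into one dict."""
    decoder = json.JSONDecoder()
    merged = {}
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        merged.update(obj)
    return merged

record = merge_concatenated_json('{"a":1,"b":2,"c":3}{"x":[100],"y":"123"}')
nested = merge_concatenated_json('{"a": {"b": 1}}\n{"c": true}')
print(json.dumps(record))
print(nested)
```

As with dict.update in general, a key that appears in more than one object keeps the value from the last object.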

How to add multiple dictionaries to a JSON file in Python?

How can I add multiple dictionaries to a JSON file?
I want to add one or two dictionaries at first and, after a while, add one, two, or three more dictionaries to the same JSON file.
Example:
dict1 = {'a': 1, 'b':2}
-> I want to add it to a 'test.json' file, and after a while I want to add the dictionaries
dict2 = {'c': 1, 'd':2}
dict3 = {'e': 1, 'f':2}
-> and after a while I want to add these two, for example
EDIT
import json
dict1 = {'a': 1, 'b': 1}
dict2 = {'c': 2, 'd': 2}
dict3 = {'e': 3, 'f': 3}
list1 = []
list1.append(dict1)
with open('testjson_dict.json', 'a') as f:
    json.dump(list1, f)
-> this is first output
[
{
"a": 1,
"b": 1
}
]
-> then I append dict2 to list1, and this is the output: it creates a second list and puts dict2 in it. How can I change the code to put dict2 in my first list?
[
{
"a": 1,
"b": 1
}
][
{
"c": 2,
"d": 2
}
]
I am assuming you want to store these dicts as a list in json, so the final result would be:
[
{'a': 1, 'b':2},
{'c': 1, 'd':2},
{'e': 1, 'f':2}
]
Here is a possible workflow. Start with dict_list = [dict1].
1. Make sure you import json. Write dict_list to test.json:
with open('test.json', 'w', encoding='utf-8') as json_file:
    json.dump(dict_list, json_file)
2. Read the contents of test.json into a Python list:
with open('test.json', encoding='utf-8') as json_file:
    dicts = json.load(json_file)
3. Add dict2 and dict3 to the list you just read in.
4. Overwrite test.json with the resulting list (like in step 1).
Now test.json should include a list of the 3 dicts.
You can concatenate the new data as a list with +, like this:
import json
# write the first file
dict_list = [{'a': 1, 'b':2}]
with open('test.json', 'w', encoding='utf-8') as json_file:
    json.dump(dict_list, json_file)
# read the file back, concatenate, and overwrite
with open('test.json', encoding='utf-8') as json_file:
    dicts = json.load(json_file)
dicts += [{'c': 1, 'd':2}, {'e': 1, 'f':2}] # list concatenation operation
with open('test.json', 'w', encoding='utf-8') as json_file:
    json.dump(dicts, json_file)
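Since the dictionaries arrive at different times, the read, extend, and overwrite steps can be folded into one reusable function. This is a minimal sketch under the assumption that the file, when it exists, always holds a single JSON list. append_dicts is a hypothetical helper name, and the demo writes to a temporary directory rather than a real test.json.

```python
import json
import os
import tempfile

def append_dicts(path, new_dicts):
    """Append dictionaries to the JSON list stored at `path`, creating it if needed."""
    if os.path.exists(path):
        with open(path, encoding='utf-8') as json_file:
            dicts = json.load(json_file)
    else:
        dicts = []  # first call: start from an empty list
    dicts.extend(new_dicts)
    # Rewrite the whole file so it always holds exactly one valid JSON list
    with open(path, 'w', encoding='utf-8') as json_file:
        json.dump(dicts, json_file)
    return dicts

# Demo in a temporary directory so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'test.json')
append_dicts(path, [{'a': 1, 'b': 2}])
result = append_dicts(path, [{'c': 1, 'd': 2}, {'e': 1, 'f': 2}])
print(result)
```

Opening the file in 'a' mode, as in the question, cannot work here because JSON has no way to extend an already closed list by appending text to the end of the file.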

Python adding data in multi-dimensional dictionary

while (E > 0):
    line = raw_input("enter edges : ")
    data = line.split()
    mygraph[data[0]] = {data[1] : data[2]}  # this line
    print(mygraph)
    E -= 1
Desired data structure:
mygraph = {
    'B': {'A': 5, 'D': 1, 'G': 2},
    'A': {'B': 5, 'D': 3, 'E': 12, 'F': 5}}
I want to add multiple entries for the same key, as above, but my code keeps only one value per node and then replaces the earlier entries. How can I do that?
You need to first add an empty dictionary for the key data[0] if it doesn't already exist, then add the values to it. Otherwise you just wipe it out every time you loop.
The two usual ways are either to use setdefault on a normal dictionary:
mygraph.setdefault(data[0], {})[data[1]] = data[2]
or use collections.defaultdict where the default is an empty dictionary:
>>> from collections import defaultdict
>>> mygraph = defaultdict(dict)
>>> edges = [[1, 2, 3], [1, 3, 6]]
>>> for edge in edges:
...     mygraph[edge[0]][edge[1]] = edge[2]
...
>>> dict(mygraph)
{1: {2: 3, 3: 6}}
Replace this line:
mygraph[data[0]] = {data[1] : data[2]}
with these:
if data[0] not in mygraph:
    mygraph[data[0]] = {}
mygraph[data[0]][data[1]] = data[2]
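Putting the pieces together, the input loop can be separated from the graph construction so that the latter is easy to test. A minimal sketch, assuming each edge arrives as a whitespace-separated line "node neighbor weight" with an integer weight. build_graph is a hypothetical helper name, and the edge lines are taken from the desired structure in the question.

```python
from collections import defaultdict

def build_graph(edge_lines):
    """Build a nested dict {node: {neighbor: weight}} from 'node neighbor weight' lines."""
    graph = defaultdict(dict)
    for line in edge_lines:
        node, neighbor, weight = line.split()
        # defaultdict creates the inner dict on first access,
        # so later edges for the same node accumulate instead of replacing it
        graph[node][neighbor] = int(weight)
    return dict(graph)

mygraph = build_graph(["B A 5", "B D 1", "B G 2", "A B 5", "A D 3"])
print(mygraph)
```

Converting the weight with int() is optional; without it the weights stay strings, exactly as line.split() returns them.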

How to find the difference between two lists of dictionaries checking the key-value pair

I've already searched for a solution to my problem but with no success. Part of the solution for my problem is here, but this does not solve it all.
I have two lists of dictionaries like this - each dictionary is written to a csv file but I read the contents to the following variables:
list1 = [{a:1, b:2, c:3}, {a:4, b:5, c:6}, {a:7, b:8, c:9}]
list2 = [{b:2, a:1, c:3}, {c:6, b:5, a:4}, {b:8, a:7, c:9}]
Using the solution of the link above, ie:
>>> import itertools
>>> a = [{'a': '1'}, {'c': '2'}]
>>> b = [{'a': '1'}, {'b': '2'}]
>>> intersec = [item for item in a if item in b]
>>> sym_diff = [item for item in itertools.chain(a,b) if item not in intersec]
I get no matches because the order of the dictionary is different. But in fact, both lists are the same. How can I check this? Do I have to sort the dictionaries before writing them to the csv file? Can this be a solution?
This is my major problem at the moment but I have another issue also. It would be great to be able to make this match check but ignoring one or more keys defined by me. Is this also possible?
EDIT: I have the dictionaries in a CSV file and I'm reading them with the following code:
def read_csv_file(self, filename):
    '''Read CSV file and return its content as a Python list.'''
    with open(filename, 'r') as f:
        # Read every row before the file is closed
        return [row for row in csv.reader(f)]
This is very important because I think the problem is that after reading the values from the CSV they are no longer dictionaries, so the order has to be the same.
EDIT2: sample of the csv file (3 lines, it's creating an empty line but that's not an issue...)
"{u'Deletion': '0', u'Source': 'Not Applicable', u'Status': ''}"
"{u'Deletion': '0', u'Source': 'Not Applicable', u'Status': ''}"
Part of this solution was found by the OP during our last chat conversation: converting a string into a dictionary using the ast module.
csv.reader() returns each row as a list of strings, which for the OP's CSV file is a list containing a single string. Convert every row with ast.literal_eval and append the resulting dictionary to a list. After that, two list comprehensions combined with itertools.chain give the difference between the two lists.
import csv
import ast
import itertools
def csvToList(myCSVFile):
    '''Convert the strings returned by csv.reader() into a list of dictionaries.'''
    f = open(myCSVFile, 'r')
    l = []
    try:
        reader = csv.reader(f)
        for row in reader:
            if row:  # as you mentioned in your 2nd edit, there can be empty rows
                l.append(ast.literal_eval(row[0]))
    finally:
        f.close()
    return l
list1 = csvToList('myCSV1.csv')
list2 = csvToList('myCSV2.csv')
l1_sub_l2 = [d for d in list1 if d not in list2]
l2_sub_l1 = [d for d in list2 if d not in list1]
list_difference = list(itertools.chain(l1_sub_l2, l2_sub_l1))
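The question also asks whether the comparison can ignore one or more keys. One possible sketch, not taken from the answers here, is to compare copies of the dictionaries with those keys removed. strip_keys and symmetric_difference are hypothetical helper names, and the sample data is made up.

```python
def strip_keys(dicts, ignored):
    """Return copies of the dictionaries without the ignored keys."""
    return [{k: v for k, v in d.items() if k not in ignored} for d in dicts]

def symmetric_difference(list1, list2, ignored=()):
    """Dictionaries found in only one list, compared without the ignored keys."""
    a = strip_keys(list1, ignored)
    b = strip_keys(list2, ignored)
    only_a = [d for d in a if d not in b]
    only_b = [d for d in b if d not in a]
    return only_a + only_b

list1 = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]
list2 = [{'b': 2, 'a': 1, 'c': 99}, {'c': 6, 'b': 5, 'a': 4}]

diff_all = symmetric_difference(list1, list2)
diff_ignoring_c = symmetric_difference(list1, list2, ignored={'c'})
print(diff_all)          # the first dictionaries differ in 'c'
print(diff_ignoring_c)   # no difference once 'c' is ignored
```

Note that the returned dictionaries are the stripped copies, so ignored keys do not appear in the result.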
You need to double check your code. I'm not getting the issue you're bringing up.
import itertools
list1 = [{'a':1, 'b':2, 'c':3}, {'a':4, 'b':5, 'c':6}, {'a':7, 'b':8, 'c':9}]
list2 = [{'b':2, 'a':1, 'c':3}, {'c':6, 'b':5, 'a':4}, {'b':8, 'a':7, 'c':9}]
intersec = [item for item in list1 if item in list2]
sym_diff = [item for item in itertools.chain(list1,list2) if item not in intersec]
print(intersec)
print(sym_diff)
[{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c': 9, 'b': 8}]
[]
If I change the middle dictionary of list1 and list2:
list1 = [{'a':1, 'b':2, 'c':3}, {'a':7, 'b':5, 'c':2}, {'a':7, 'b':8, 'c':9}]
list2 = [{'b':2, 'a':1, 'c':3}, {'c':6, 'b':5, 'a':4}, {'b':8, 'a':7, 'c':9}]
Running the same code:
[{'a': 1, 'c': 3, 'b': 2}, {'a': 7, 'c': 9, 'b': 8}]
[{'a': 7, 'c': 2, 'b': 5}, {'a': 4, 'c': 6, 'b': 5}]
The code provided in the link seems to be working fine. Key order does not affect dictionary equality in Python, and the membership test with in does not depend on where an item sits in the list.
Use a dictionary comprehension instead of a list comprehension in your return.

Dict inside defaultdict being shared across keys

I have a dictionary inside a defaultdict. I noticed that the dictionary is being shared across keys and therefore it takes the values of the last write. How can I isolate those dictionaries?
>>> from collections import defaultdict
>>> defaults = [('a', 1), ('b', {})]
>>> dd = defaultdict(lambda: dict(defaults))
>>> dd[0]
{'a': 1, 'b': {}}
>>> dd[1]
{'a': 1, 'b': {}}
>>> dd[0]['b']['k'] = 'v'
>>> dd
defaultdict(<function <lambda> at 0x7f4b3688b398>, {0: {'a': 1, 'b': {'k': 'v'}}, 1: {'a': 1, 'b': {'k': 'v'}}})
>>> dd[1]['b']['k'] = 'v2'
>>> dd
defaultdict(<function <lambda> at 0x7f4b3688b398>, {0: {'a': 1, 'b': {'k': 'v2'}}, 1: {'a': 1, 'b': {'k': 'v2'}}})
Notice that v was set to v2 for both dictionaries. Why is that, and how can I change this behavior without much performance overhead?
When you do dict(defaults) you're not copying the inner dictionary, just making another reference to it. So when you change that dictionary, you're going to see the change everywhere it's referenced.
You need deepcopy here to avoid the problem:
import copy
from collections import defaultdict
defaults = {'a': 1, 'b': {}}
dd = defaultdict(lambda: copy.deepcopy(defaults))
Alternatively, build a fresh literal on every call, so that successive calls never share the inner mutable object from defaults:
dd = defaultdict(lambda: {'a': 1, 'b': {}})
Your values all contain references to the same object from defaults: you rebuild the outer dict, but not the inner one. Just make a function that creates a new, separate object:
def builder():
    return {'a': 1, 'b': {}}
dd = defaultdict(builder)
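To see the difference side by side, here is a small sketch that contrasts the shallow copy made by dict(defaults) with copy.deepcopy. The variable names shared and isolated are chosen only for this illustration.

```python
import copy
from collections import defaultdict

defaults = {'a': 1, 'b': {}}

shared = defaultdict(lambda: dict(defaults))             # shallow copy per key
isolated = defaultdict(lambda: copy.deepcopy(defaults))  # deep copy per key

isolated[0]['b']['k'] = 'v'
# deepcopy duplicated the inner dict, so another key is unaffected
isolated_other = isolated[1]['b']

shared[0]['b']['k'] = 'v'
# dict() copied only the outer dict, so every key sees the same inner dict
shared_other = shared[1]['b']
same_object = shared[0]['b'] is shared[1]['b']

print(isolated_other)   # empty
print(shared_other)     # contains the write made through key 0
print(same_object)
print(defaults['b'])    # the template itself was modified through `shared`
```

The deep copy costs more per missing key than the literal-based factory, so when the default value is fixed and known in advance, the factory that returns a fresh literal is the cheaper of the two fixes.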
