Finding common value pair fields from numerous json in Python


I have a list of JSON files. I want to find all the key/value pairs that are common to all of them and copy those pairs to a separate JSON file. The common pairs should also be removed from every original file.
Let's say I have a.json, b.json, c.json ... z.json,
and the key/value pair common to all of them is
"Town" : "New York"
Then this common element should be moved to a new JSON file called common.json and removed from all the other JSON files.
An example JSON file looks like this:
{
"RepetitionTime": 2,
"EchoTime": 0,
"MagneticFieldStrength": 3,
"SequenceVariant": "SK",
"MRAcquisitionType": "2D",
"FlipAngle": 90,
"ScanOptions": "FS",
"SliceTiming": [[0.0025000000000000022], [0.5], [-0.030000000000000027], [0.46625], [-0.06374999999999997], [0.43375000000999997], [-0.09624999999999995], [0.40000000001], [-0.12999999999], [0.36750000001], [-0.16249999998999998], [0.333750000005], [-0.19624999999500004], [0.301250000005], [-0.228749999995], [0.26749999999999996], [-0.26249999999500007], [0.235], [-0.29500000000000004], [0.20124999999999998], [-0.32875], [0.16875000001], [-0.36124999999999996], [0.13500000001], [-0.39499999999], [0.10250000000999998], [-0.42749999999], [0.06875000000499998], [-0.46124999999500005], [0.036250000005000005]],
"SequenceName": "epfid2d1_64",
"ManufacturerModelName": "TrioTim",
"TaskName": "dis",
"ScanningSequence": "EP",
"Manufacturer": "SIEMENS"
}
The way I am thinking of doing it is too complex: I thought of taking each line of the first JSON file and checking it against all the other JSON files.
There should be something easier and more efficient. Any pointers?

To compare all the files at once, you can also use sets and intersect all the key/value pairs with &:
>>> import json
>>> json_dict1 = json.loads('{"a":1, "b":2}')
>>> json_dict2 = json.loads('{"a":1, "b":4, "c":5}')
>>> json_dict3 = json.loads('{"a":1, "b":2, "c":5}')
>>> a = set(json_dict1.items())
>>> b = set(json_dict2.items())
>>> c = set(json_dict3.items())
>>> a & b & c
{('a', 1)}
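One caveat with the set approach: set(d.items()) only works when every value is hashable. In the sample file, "SliceTiming" holds a list of lists, so set(json_dict.items()) would raise TypeError: unhashable type: 'list'. A minimal workaround, sketched below, is to stand in for each value with its canonical JSON text (json.dumps with sort_keys=True), which is a hashable string. This assumes two values count as equal exactly when their JSON serializations match; note that this differs from Python's == for cases like 1 versus 1.0. The helper names hashable_items and common_pairs are illustrative.

```python
import json

def hashable_items(d):
    """Return a set of (key, canonical JSON string) pairs for dict d.

    Lists and nested dicts are unhashable, so each value is serialized
    with sort_keys=True to get a hashable, comparable stand-in.
    """
    return {(k, json.dumps(v, sort_keys=True)) for k, v in d.items()}

def common_pairs(dicts):
    """Key/value pairs present with equal values in every dict (needs >= 1 dict)."""
    shared = set.intersection(*(hashable_items(d) for d in dicts))
    # Decode the serialized values back into Python objects.
    return {k: json.loads(v) for k, v in shared}

d1 = json.loads('{"a": 1, "t": [[0.5], [1.0]], "b": 2}')
d2 = json.loads('{"a": 1, "t": [[0.5], [1.0]], "b": 4}')
d3 = json.loads('{"a": 1, "t": [[0.5], [1.0]], "c": 5}')
print(common_pairs([d1, d2, d3]))
```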
Note that sets support other operations as well; here is an example from the docs:
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in either a or b
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
EDIT
Finally, I asked my own SO question about (almost) the same problem as yours.
Here is an overview of the best response:
>>> list_dict = [json_dict1, json_dict2, json_dict3]
>>> {k: v
...  for k, v in list_dict[0].items()
...  if all(k in d and d[k] == v
...         for d in list_dict[1:])}
{'a': 1}
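Putting the pieces together for the original task (move the common pairs into common.json and strip them from every file), here is a minimal sketch. It assumes each file holds a single top-level JSON object, that the paths are passed in as a non-empty list, and that rewriting the files in place is acceptable; the function name move_common_pairs is hypothetical. It uses the all()-based comparison from the overview above, which, unlike the set version, also works for list values such as SliceTiming.

```python
import json
from pathlib import Path

def move_common_pairs(paths, common_path="common.json"):
    """Write the pairs shared by all JSON files to common_path and remove
    them from each file. Returns the dict of common pairs."""
    paths = [Path(p) for p in paths]
    # Load every file once; each is assumed to contain a JSON object.
    dicts = [json.loads(p.read_text()) for p in paths]

    # A pair is common if every other dict has the same key with an equal value.
    # == compares lists and nested dicts directly, so no hashing is needed.
    first, rest = dicts[0], dicts[1:]
    common = {k: v for k, v in first.items()
              if all(k in d and d[k] == v for d in rest)}

    Path(common_path).write_text(json.dumps(common, indent=2))

    # Rewrite each file without the common keys.
    for p, d in zip(paths, dicts):
        remaining = {k: v for k, v in d.items() if k not in common}
        p.write_text(json.dumps(remaining, indent=2))
    return common
```

If you collect the paths with something like glob.glob("*.json"), make sure common.json itself (for example, left over from a previous run) is not in the list.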

Since you didn't provide an exact JSON sample, I'll assume it is just a regular JSON object such as '{"key":"value"}'.
Convert the JSON string to a dictionary:
import json
json_dict = json.loads('{"a":1, "b":2}') # converts json string to dictionary
Now assume we have two converted dictionaries:
>>> dict1= {"a":1,"b":2}
>>> dict2= {"a":1,"b":3}
To compare the two dictionaries and find the common key/value pairs (the differing pairs work similarly), using Python 3:
>>> {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}
{'a': 1}
My post shows the idea for solving your issue. It may have edge cases with your specific JSON, so modify it to fit your needs. Hope it helps.
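In Python 3, dict.items() returns a set-like view, so when all values are hashable you can skip the comprehension and apply set operators to the views directly. This gives both the common pairs and the "diff" pairs mentioned above. As with the set answer, it raises TypeError if any value (for example a list) is unhashable.

```python
dict1 = {"a": 1, "b": 2}
dict2 = {"a": 1, "b": 3}

common = dict(dict1.items() & dict2.items())     # pairs in both dicts
only_in_1 = dict(dict1.items() - dict2.items())  # pairs in dict1 but not dict2
changed = dict1.items() ^ dict2.items()          # pairs in exactly one dict

print(common)           # {'a': 1}
print(only_in_1)        # {'b': 2}
print(sorted(changed))  # [('b', 2), ('b', 3)]
```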

Related

Merge two dictionaries with keys only from the first dict

I want to merge two dictionaries so that the resulting dict has keys from the first dict and values from the first and second.
>>> A = {'AB': 'a', 'A': 'a'}
>>> B = {'AB': 'b', 'B': 'b'}
>>> merge_left(A, B)
{'AB': 'b', 'A': 'a'}
This is somewhat similar to a left outer join used in merging database tables in that one side is used as the "base" and the other side is compared to it.
Here is a table of what value each key should have in the resulting dict.
Possible situations:
                Key in A         Key not in A
Key in B        Use B's value    Don't include
Key not in B    Use A's value    N/A
Is there a function merge_left or something similar that returns the dict above?

I used dict.get's ability to return a default value to make a fairly short merge_left function. This uses a dict-comprehension over key, value pairs of the first dict and checks them against the second.
def merge_left(defaults, override):
    # Keep only keys from defaults; take override's value where it has one.
    return {key: override.get(key, default) for key, default in defaults.items()}
Since this function is just returning a dict-comprehension, you can "inline" it directly into your code.
>>> A = {'AB': 'a', 'A': 'a'}
>>> B = {'AB': 'b', 'B': 'b'}
>>> {k: B.get(k, a) for k, a in A.items()}
{'AB': 'b', 'A': 'a'}
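An alternative way to write the same left merge, assuming Python 3.5+ for ** unpacking inside a dict display: start from a copy of A and overwrite only the keys the two dicts share. A.keys() & B.keys() intersects the two key views like sets, so keys that exist only in B never enter the result.

```python
A = {'AB': 'a', 'A': 'a'}
B = {'AB': 'b', 'B': 'b'}

# Keys from A; values from B where B has the key, otherwise from A.
merged = {**A, **{k: B[k] for k in A.keys() & B.keys()}}
print(merged)  # {'AB': 'b', 'A': 'a'}
```

Because the copy of A comes first, the result keeps A's key order; the comprehension over A.items() shown above does the same in a single pass.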

Having trouble defining a function that will make creating a dictionary easier

I am trying to create a player_def function that will make creating a dictionary a little easier.
Looking at it now, this is probably kind of dumb, because I can just do players["betts"]["avg"] = 340, right? Anyway, to understand how Python works, I would be grateful if someone could explain why the following code raises a KeyError instead of creating a nested dictionary.
players = {}
def player_def(x, y, z):
    players[x][y] = z  # KeyError: 'betts', since players has no "betts" key yet
player_def("betts", "avg", 340)
print(players["betts"])

The easiest solution would be to use a collections.defaultdict:
from collections import defaultdict
players = defaultdict(dict)
def player_def(x, y, z):
    players[x][y] = z
player_def("betts","avg",340)
print(players["betts"])
# {'avg': 340}
We define players as a defaultdict of dict. When we do:
players["betts"]["avg"] = 340
if players doesn't yet have a betts key, a new one is created on the fly with an empty dict as value. So, we can add "avg": 340 to this new dict.
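If you would rather keep players a plain dict, dict.setdefault does the same on-the-fly creation at the call site: it returns the existing inner dict for x, or inserts a new empty one and returns that. The second call below uses an illustrative "games" stat to show that the inner dict is reused rather than replaced.

```python
players = {}

def player_def(x, y, z):
    # Insert an empty inner dict for x if missing, then set the nested value.
    players.setdefault(x, {})[y] = z

player_def("betts", "avg", 340)
player_def("betts", "games", 10)  # illustrative second stat
print(players["betts"])  # {'avg': 340, 'games': 10}
```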

Do you mean this? Sorry, this was too long for a comment, so I'm posting it as a possible solution/explanation.
>>> d={}
>>> d
{}
>>> d['a'] = {'b' : {'c','d','e'} }
>>> d
{'a': {'b': {'c', 'e', 'd'}}}
>>>
>>> d['a']['b']
{'c', 'e', 'd'}
EDIT: Once a nested dictionary exists, you can change its contents freely, as below. However, to add a new pair under a key that doesn't exist yet, you must first create that key with the syntax above (d['a'] = {...}); you can't assign through a missing key.
>>> d['a']['b'] = "4"
>>> d
{'a': {'b': '4'}}
>>> d['a']['b'] = ["4","test","hello"]
>>> d
{'a': {'b': ['4', 'test', 'hello']}}
>>> d['a']['b'] = (1,2,3,4)
>>> d
{'a': {'b': (1, 2, 3, 4)}}
>>>
Another example from Python console:
>>> test = {}
>>> test['betts']['avg'] = 300
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'betts'
>>> test['betts'] = {}
>>> test['betts']['avg'] = 300
>>> test
{'betts': {'avg': 300}}
>>>
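For nesting deeper than two levels, a common idiom is a defaultdict whose default factory is itself, so that every missing key at any depth creates another auto-nesting dict. The names tree and to_dict are illustrative. Keep in mind that merely reading a missing key also creates it, and that the result prints as nested defaultdicts, hence the small converter.

```python
from collections import defaultdict

def tree():
    # Each missing key produces another auto-nesting defaultdict.
    return defaultdict(tree)

def to_dict(d):
    # Recursively convert nested defaultdicts to plain dicts for printing/JSON.
    return {k: to_dict(v) if isinstance(v, defaultdict) else v
            for k, v in d.items()}

test = tree()
test['betts']['avg'] = 300
test['betts']['splits']['home'] = 310
print(to_dict(test))  # {'betts': {'avg': 300, 'splits': {'home': 310}}}
```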

Reformatting a dict where the values have a dict-like relationship

I have a defaultdict that looks like this:
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
Before I go any further, I should say that A_part1 isn't the actual string; the strings are really a bunch of alphanumeric characters. I represented them this way to show that A_part1 is text associated with A, if you see what I mean.
Standing back and looking at it, what I really have is a dict where the values have their own key/value relationship, but that relationship exists only in the order they appear in, in the list.
I am attempting to end up with something like this:
['ID_001 A A_part1, A_part2',
'ID_002 A A_part3',
'ID_003 B B_part1 B_part2',
'ID_003 A A_part4',
'ID_004 C C_part1',
'ID_004 A A_part5',
'ID_004 B B_part3']
I have made a variety of attempts. I keep wanting to run through the dict's values, noting the character in the first position (e.g. the A), collecting values until I find a B or a C, then stopping, appending what I have to a list declared elsewhere, and so on, ad nauseam.
I'm running into all sorts of problems, not the least of which is bloated code. I'm missing the ability to iterate through the value in a clean way. Invariably, I seem to run into index errors.
If anyone has any ideas/philosophy/comments I'd be grateful.

What about something like:
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
def is_key(s):
    return s in ['A', 'B', 'C']
out = {}
for (k, v) in d.items():  # d.iteritems() in Python 2
    key = None
    for e in v:
        if is_key(e):
            key = e
        else:
            out_key = (k, key)
            out[out_key] = out.get(out_key, []) + [e]
which, pretty-printed with pprint, generates:
{('ID_001', 'A'): ['A_part1', 'A_part2'],
('ID_002', 'A'): ['A_part3'],
('ID_003', 'A'): ['A_part4'],
('ID_003', 'B'): ['B_part1', 'B_part2'],
('ID_004', 'A'): ['A_part5'],
('ID_004', 'B'): ['B_part3'],
('ID_004', 'C'): ['C_part1']}
It's important that you update the is_key function to match your actual input.
Also, the variable names are far from optimal, but I'm not really sure what you're doing -- you should be able to (and should) give them more appropriate names.

This may not give the order you want, but it saves you further headaches.
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
rst = []
for o in d:
    t_d = {}
    # Group this ID's items by their first character.
    for t_o in d[o]:
        if t_o[0] not in t_d:
            t_d[t_o[0]] = [t_o]
        else:
            t_d[t_o[0]].append(t_o)
    for t_o in t_d:
        rst.append(' '.join([o, t_d[t_o][0], ', '.join(t_d[t_o][1:])]))
print(rst)
https://ideone.com/FeBDLA
['ID_004 C C_part1', 'ID_004 A A_part5', 'ID_004 B B_part3', 'ID_003 A A_part4', 'ID_003 B B_part1, B_part2', 'ID_002 A A_part3', 'ID_001 A A_part1, A_part2']

Whenever you're trying to do something involving contiguous groups, you should think of itertools.groupby. You weren't very specific about what condition separates the groups, but if we take "the character in the first position" at face value:
from itertools import groupby
new_list = []
for key, sublist in sorted(d.items()):
    for _, group in groupby(sublist, key=lambda x: x[0]):
        new_list.append(' '.join([key] + list(group)))
produces
>>> for elem in new_list:
... print(elem)
...
ID_001 A A_part1 A_part2
ID_002 A A_part3
ID_003 B B_part1 B_part2
ID_003 A A_part4
ID_004 C C_part1
ID_004 A A_part5
ID_004 B B_part3
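The groupby version relies on every part sharing the first character of its header. If the real strings don't have that property, a sketch that starts a new group whenever it meets a header works on order alone. Like the first answer, it assumes a hypothetical is_key predicate that you must adapt to recognize headers in your real data; split_groups is likewise an illustrative name.

```python
d = {'ID_001': ['A', 'A_part1', 'A_part2'],
     'ID_002': ['A', 'A_part3'],
     'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
     'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']}

def is_key(s):
    # Hypothetical header test; replace with whatever marks a header in real data.
    return s in {'A', 'B', 'C'}

def split_groups(items):
    """Yield lists that each start at a header and run until the next one."""
    group = []
    for item in items:
        if is_key(item) and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group

new_list = [' '.join([id_] + group)
            for id_, values in sorted(d.items())
            for group in split_groups(values)]
for line in new_list:
    print(line)
```

This prints the same seven lines as the groupby answer for the sample data.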

Save a dictionary key as a variable

I'm working on a small framework and I've found a place where it would be beneficial to save a dictionary key as variable.
The problem I have is that the dictionary may have any number of layers, so it's not just a case of storing the final key. For example in the below I am accessing ['dig']['result'], but that could equally be ['output'] or ['some']['thing']['strange']
if result:
    if self.cli_args.json:
        pprint(result)
    else:
        print result['dig']['result']
I could save the key as a string and use eval() in something such as:
key="['test']"
test_dict = { "test" : "This works" }
eval("test_dict" + key)
# -> 'This works'
But eval is really dirty right? :-)
Is there a nice / pythonic way to do this?

To handle an arbitrary depth of key nesting, you can iterate over a sequence (e.g. tuple) of the keys:
>>> d = {'a': {'b': {'c': 'd'}}}
>>> d['a']['b']['c']
'd'
>>> keys = ('a', 'b', 'c') # or just 'abc' for this trivial example
>>> content = d
>>> for k in keys:
...     content = content[k]
...
>>> content
'd'
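The same loop can be written as a fold with functools.reduce and operator.getitem, which applies content = content[k] for each key in turn. Since getitem also accepts integer indexes, a key path stored this way can step through lists as well as dicts.

```python
from functools import reduce
from operator import getitem

d = {'a': {'b': {'c': 'd'}}}
keys = ('a', 'b', 'c')

# reduce(getitem, keys, d) computes getitem(getitem(getitem(d, 'a'), 'b'), 'c')
value = reduce(getitem, keys, d)
print(value)  # d
```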

>>> def access(o,path):
... for k in path.split('/'):
... o = o[k]
... return o
...
>>> access({'a': {'b': {'c': 'd'}}},'a/b/c')
'd'
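Both versions raise KeyError when part of the path is missing. If a fallback value is preferable, here is a sketch of a variant with a default argument; the name access_or_default is hypothetical, and the '/' separator is carried over from the answer above.

```python
def access_or_default(o, path, default=None, sep='/'):
    """Follow a sep-separated key path into nested dicts; return default
    if any step is missing or the current object is not a dict."""
    for k in path.split(sep):
        if not isinstance(o, dict) or k not in o:
            return default
        o = o[k]
    return o

data = {'a': {'b': {'c': 'd'}}}
print(access_or_default(data, 'a/b/c'))         # d
print(access_or_default(data, 'a/x/c', 'n/a'))  # n/a
```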

Python dictionary.keys() error

I am trying to use .keys(), expecting a list of the keys as I have always gotten in the past. However, I get this:
b = { 'video':0, 'music':23 }
k = b.keys()
print( k[0] )
TypeError: 'dict_keys' object does not support indexing
print( k )
dict_keys(['music', 'video'])
it should just print ['music', 'video'] unless I'm going crazy.
What's going on?

Python 3 changed the behavior of dict.keys such that it now returns a dict_keys object, which is iterable but not indexable (it's like the old dict.iterkeys, which is gone now). You can get the Python 2 result back with an explicit call to list:
>>> b = { 'video':0, 'music':23 }
>>> k = list(b.keys())
>>> k
['music', 'video']
or just
>>> list(b)
['music', 'video']

If you assigned k like so:
k = list(b.keys())
your code will work.
As the error says, the dict_keys type does not support indexing.
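If you only need a single key (as with k[0] in the question), you don't have to build a list at all: next(iter(b)) takes the first key from the dict's iterator. In Python 3.7+ that is the first-inserted key, because dicts preserve insertion order; in older versions the order is arbitrary, so the outputs below assume 3.7+.

```python
b = {'video': 0, 'music': 23}

first_key = next(iter(b))   # no intermediate list is built
all_keys = list(b)          # full list, which does support indexing
print(first_key, all_keys[1])  # video music
```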

This is one of the breaking changes between Python 2 and 3.
In Python 2:
>>> help(dict.keys)
keys(...)
D.keys() -> list of D's keys
In Python 3:
>>> help(dict.keys)
keys(...)
D.keys() -> a set-like object providing a view on D's keys
This change in behavior makes a lot of sense, since a dict's keys are unique, just like a set's, and (until Python 3.7 made insertion order a language guarantee) a dict was semantically unordered.
This change means that you don't have to create a new list of keys every time you want to do some kind of set comparison with a dict's keys.
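For example, key views support the set operators directly, so comparing the keys of two dicts needs no intermediate list or set. The dicts below are illustrative.

```python
old = {'video': 0, 'music': 23}
new = {'music': 25, 'photos': 4}

print(old.keys() & new.keys())  # {'music'}
print(old.keys() - new.keys())  # {'video'}
print(new.keys() - old.keys())  # {'photos'}
both = old.keys() | {'games'}   # key views also combine with ordinary sets
```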
Getting the same behavior in 2 and 3
To help transition to Python 3, Python 2.7 has another dict method, viewkeys. The viewkeys method is most similar to Python 3's dict.keys method:
>>> d
{'a': None, 'c': None, 'b': None, 'd': None}
>>> for k in d.viewkeys(): print k
...
a
c
b
d
>>> d.viewkeys() & set('abc')
set(['a', 'c', 'b'])
In Python 3, the closest analog to the old behavior is to pass dict.keys() to list:
>>> d
{'d': None, 'a': None, 'c': None, 'b': None}
>>> list(d.keys())
['d', 'a', 'c', 'b']
Or just pass the dict to list, since a dict will iterate over its keys anyways:
>>> list(d)
['d', 'a', 'c', 'b']
You could create a utility function to abstract the behavior over 2 and 3:
if hasattr(dict, 'viewkeys'):  # Python 2.7
    def keys(d):
        return d.viewkeys()
else:  # Python 3
    def keys(d):
        return d.keys()
Then pass a dict to list to get the list form; in both 2 and 3 you'll get the same output:
>>> d
{'b': None, 'a': None, 'c': None, 'd': None}
>>> keys(d)
dict_keys(['b', 'a', 'c', 'd'])
>>> list(d)
['b', 'a', 'c', 'd']

If you simply want a list of keys from a dictionary you can directly do like this:
b = {"name": "xyz", "class":"abc", "college": "qwert"}
key_list = list(b)
key_list will contain all the key names as a list. No key is ever repeated: a dict can't hold duplicate keys, so a key that appears more than once is stored only once.
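To see why duplicates can't appear: when a key is repeated in a dict literal, the later value replaces the earlier one while the key keeps its original position, so the dict, and therefore list(b), holds each key once. The repeated "name" below is deliberate.

```python
b = {"name": "xyz", "class": "abc", "name": "qwert"}  # "name" repeated on purpose
print(b)        # {'name': 'qwert', 'class': 'abc'}
print(list(b))  # ['name', 'class']
```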

import random
b = { 'video':0, 'music':23,"picture":12 }
random.choice(tuple(b.items()))
# Returns a random dictionary entry as a tuple:
# ('music', 23)
