Ways to compare keys from json dump with python - python

I am a newbie to programming and need some input/direction to write a cleaner solution.
I have 10 EC2 instances, and each instance has a Tag containing a dictionary of three key/value pairs. Some instances share the same keys, and a few may have different ones. I want to find out which instances have different keys within the Tag.
Comparing every key against the other nine instances' keys is not the best way to go, I think.
Please let me know how to approach this issue, and do I need to use the json module to parse the data?
Here is an example of a single instance; I have 10 of these.
"tags": [
    {
        "depid": 18,
        "key": "sales",
        "value": "31"
    },
    {
        "depid": 239,
        "key": "eng",
        "value": "steve"
    },

Is this what you were looking for?
data = {'tags': [{'key': 'key1', 'value': 'value1'},
                 {'key': 'key2', 'value': 'value2'}]}
keys = {tag['key'] for tag in data['tags']}
required_keys = {'key1', 'key2'}
print(keys == required_keys)  # check whether the keys match exactly
print(keys >= required_keys)  # check whether all required keys are present
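The same set approach scales to the 10-instance case without comparing every pair: build one key set per instance, take the union, and report which instances are missing keys that appear elsewhere. A minimal sketch (the instance IDs and tag data below are made up for illustration):

```python
# Hypothetical per-instance tag data, in the same shape as the question's example.
instances = {
    "i-001": {"tags": [{"key": "sales", "value": "31"},
                       {"key": "eng", "value": "steve"}]},
    "i-002": {"tags": [{"key": "sales", "value": "7"}]},
}

# One key set per instance.
key_sets = {iid: {tag["key"] for tag in data["tags"]}
            for iid, data in instances.items()}

# Union of keys seen across all instances.
all_keys = set.union(*key_sets.values())

# Instances missing at least one key that some other instance has.
for iid, keys in key_sets.items():
    missing = all_keys - keys
    if missing:
        print(iid, "is missing", missing)
```

This does one pass to build the sets and one pass to diff them, instead of pairwise comparisons.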

Related

Merge deep JSON files in Python

I have two JSON files: one contains a fully defined object with multiple levels of nesting; the other contains a stripped-back version of the same object that lists just the elements that need to be changed.
File 1 example
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Default Value",
                    "region": "US",
                    "inner": {
                        "name": "Another Default",
                        "setting": "help"
                    }
                }
            ]
        }
    }
}
File 2 example
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Real Value",
                    "inner": {
                        "name": "Another Real Value"
                    }
                }
            ]
        }
    }
}
I want to merge the updates from file 2 into file 1.
My output should look like:
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Real Value",
                    "region": "US",
                    "inner": {
                        "name": "Another Real Value",
                        "setting": "help"
                    }
                }
            ]
        }
    }
}
So far I've tried:
f1 = json.load(file1)
f2 = json.load(file2)
f1['toplevel']['value']['settings'][0].update(f2['toplevel']['value']['settings'][0].items())
It works perfectly for the top-level items, but it obviously overwrites the whole of the "inner" object, removing the "setting" key inside it.
Is there a way to traverse the whole tree and replace only the non-dictionary values? I don't have access to external libraries other than json and collections (for the ordered dict)
It depends slightly on what you want.
Solution 1
If you simply want to replace the values wholesale with those from the new dictionary, you can use the following options:
result = {**file_1, **file_2}
from pprint import pprint
pprint(result)
This will result in:
{'toplevel': {'value': {'settings': [{'inner': {'name': 'Another Real Value'},
'name': 'A Real Value'}]}}}
Alternatively you can use
file_1.update(file_2)
pprint(file_1)
Which will lead to the same outcome, but will update file_1 in place.
Solution 2
If you only want to update the specific key in the nesting, and leave all other values intact, you can do this using recursion. In your example you are using dict, list and str values. So I will build the recursion using the same types.
def update_dict(original, update):
    for key, value in update.items():
        # Add new key values
        if key not in original:
            original[key] = update[key]
            continue
        # Update the old key values with the new key values
        if key in original:
            if isinstance(value, dict):
                update_dict(original[key], update[key])
            if isinstance(value, list):
                update_list(original[key], update[key])
            if isinstance(value, (str, int, float)):
                original[key] = update[key]
    return original

def update_list(original, update):
    # Make sure the order is equal, otherwise it is hard to compare the items.
    assert len(original) == len(update), "Can only handle equal length lists."
    for idx, (val_original, val_update) in enumerate(zip(original, update)):
        if not isinstance(val_original, type(val_update)):
            raise ValueError(f"Different types! {type(val_original)}, {type(val_update)}")
        if isinstance(val_original, dict):
            original[idx] = update_dict(original[idx], update[idx])
        if isinstance(val_original, (tuple, list)):
            original[idx] = update_list(original[idx], update[idx])
        if isinstance(val_original, (str, int, float)):
            original[idx] = val_update
    return original
The above might be a bit harder to understand, but I will try to explain it.
There are two methods, one which will merge two dictionaries and one that tries to merge two lists.
Merging dictionaries
In order to merge the two dictionaries I go over all the keys and values of the update dictionary, because this will probably be the smaller of the two.
The first block puts new keys into the original dictionary; this adds values that weren't in the original dictionary at the start.
The second block is updating the nested values. There I distinguish three cases:
If the value is another dict, run the dictionary merge again, but one level deeper.
If the value is a list (or tuple), run the list merge function.
If the value is a str (or int, float), replace the original value with the updated value.
Merging lists
This is a bit trickier than dictionaries, because lists do not have keys that I can match on. Therefore I have to make the heavy assumption that the update list will always contain the same number of elements; see the limitations section for how to handle lists where only some elements change.
Since the lists are of the same length, I can assume that the indices of the two lists match. Now in order to update the values, we do the following:
Make sure that the value types are the same; otherwise we raise an error, since I am not sure how to handle that case.
If the values are dictionaries, use the dictionary merge.
If the values are lists (or tuples), use the list merge.
If the values are str (or int, float), override the original in place.
Result
using:
from pprint import pprint
pprint(update_dict(file_1, file_2))
The final result will be:
{'toplevel': {'value': {'settings': [{'inner': {'name': 'Another Real Value',
'setting': 'help'},
'name': 'A Real Value',
'region': 'US'}]}}}
Note that, in contrast with the first solution, the values 'setting': 'help' and 'region': 'US' are still in the original dictionary.
Limitations
Due to the same-length constraint, if you do not want to update an element in the list, you have to pass the same element type, but empty.
Example of how to ignore a list update:
... {'settings': [
        {},                      # do not update the first element
        {'name': 'A new name'}   # update the second element
    ]
}

how can I override the key and place the new value in it by dict in JSON file?

I want to fetch data from an API somewhere, such as a news website, but I have an issue that has taken me a lot of time. I asked here before and got no clear answer, so I have now narrowed down the problem. The overview of the task: I want to build an API, but the data changes over time.
My task is this: I need to create a new JSON file to save all the data, whether old or new. The old data will appear in old.html and the new data in news.html. So I need a dictionary to hold the objects, but when a request comes in with a key that already exists, I don't want the new value to override the old one (the way Python dicts do with duplicate keys); I want the new value merged into the existing key's object. For instance:
d = {
    "2020-12-16": {
        "name": "Joe"
    }
}
The above example is a simple dict, but if I reload the page and a new request comes in with the same key "2020-12-16", it overrides the first key. I want to add the new value and keep the existing key, only adding data that is actually new.
Also, the last condition is that the values of the dict must not be repeated. How can I do that?
Sorry for the long question, and thanks in advance.
Is this what you mean?
d = {
    "2020-12-16": {
        "name": "Joe"
    }
}
d2 = {
    "2020-12-16": {
        "name2": "Smith"
    }
}
for key, val in d2.items():
    new_dict = d.get(key, {})
    new_dict.update(val)
    d[key] = new_dict
print(d)
Will output:
{'2020-12-16': {'name': 'Joe', 'name2': 'Smith'}}
EDIT:
If you would like each dict to remain independent, you need your entries in d to be a list of dicts:
d = {
    "2020-12-16": [{
        "name": "Joe"
    }]
}
d2 = {
    "2020-12-16": {
        "name": "Smith"
    }
}
for key, val in d2.items():
    entry = d.setdefault(key, [])  # setdefault stores the new list back for unseen keys
    entry.append(val)
print(d)
Output:
{'2020-12-16': [{'name': 'Joe'}, {'name': 'Smith'}]}
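The question's last condition (values must not be repeated) can be handled with a membership check before appending. A minimal sketch along the same list-of-dicts lines, with made-up incoming data:

```python
# Existing store: one list of dicts per date key.
d = {"2020-12-16": [{"name": "Joe"}]}

incoming = {"2020-12-16": {"name": "Joe"},   # duplicate value: should be ignored
            "2020-12-17": {"name": "Ann"}}   # new date key: should be added

for key, val in incoming.items():
    entries = d.setdefault(key, [])  # creates the list for brand-new keys
    if val not in entries:           # skip values already stored under this key
        entries.append(val)

print(d)
# {'2020-12-16': [{'name': 'Joe'}], '2020-12-17': [{'name': 'Ann'}]}
```

`in` compares dicts by value here, so an identical {"name": "Joe"} is recognized as a repeat even though it is a different object.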

Merge/Concatenate two dictionaries (/tuples) with same keys python

I have two JSON objects represented as dictionaries, and I want to concatenate both into one; the result will be JSON.
At the moment I have:
obj1 = {
    "FS": 11440000,
    "BW": 76000000,
    "Gain": 62,
    "Fc": 70000000,
    "real": [4, 2, 3],
    "imag": [1, 1, 3],
}
obj2 = {
    "FS": 61440000,
    "BW": 56000000,
    "Gain": 62,
    "Fc": 80000000,
    "real": [1, 2, 3],
    "imag": [1, 2, 3],
}
I want to have:
[
{
[
{
"FS":61440000,
"BW":56000000,
"Gain":62,
"Fc":70000000,
"real":[ 1,2,3,],
"imag":[1,2,3,],
},
{
"FS":61440000,
"BW":56000000,
"N":8192,
"Gain":62,
"Fc":80000000,
"real":[ 1,2,3,],
"imag":[1,2,3,],
}
],
"ts":1231234165234,
"scale":[10000,-45],
"N":8192,
},
]
How do I join obj1 + obj2 and keep all the keys rather than updating them? I need all of them, as you can see in the final output I'm trying to create.
After concatenating obj1 and obj2 into one, I need to add 3 more keys.
I'm using Python 3.6.
The dict output you expect is badly formatted, so you will never be able to produce it (a dict needs a key for each value, even if the value is a list). Try something like:
foo = {"foo": "value"}
bar = {"bar": "value"}
data = {"ts": "...", "scale": [10000, -45], "N": 8192, "data": [foo, bar]}
This gives you a dict where you can access the nested objects via data['data'].
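A sketch of how the pieces fit together, using obj1 and obj2 from the question; the key name "data" is an assumption, and the ts/scale/N values are taken from the question's expected output:

```python
import json

obj1 = {"FS": 11440000, "BW": 76000000, "Gain": 62, "Fc": 70000000,
        "real": [4, 2, 3], "imag": [1, 1, 3]}
obj2 = {"FS": 61440000, "BW": 56000000, "Gain": 62, "Fc": 80000000,
        "real": [1, 2, 3], "imag": [1, 2, 3]}

# Keep both objects intact by nesting them in a list under one key,
# then attach the three extra keys alongside that list.
data = {"data": [obj1, obj2],
        "ts": 1231234165234,
        "scale": [10000, -45],
        "N": 8192}

print(json.dumps(data, indent=2))
```

Nesting under a list means neither object's keys overwrite the other's, which is what a plain dict.update() of obj1 with obj2 would do.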

Creating a nested lists of lists in python ... (really csv -> json conversion)

I've been pounding at this problem, which should be easy -- I'm just very new to Python, which is required in this case.
I'm reading in a .csv file and trying to create a nested structure so that json.dumps gives me a nicely nested .json file.
The resulting JSON is actually six levels deep, but I thought if I could get the bottom two levels working, the rest would follow the same pattern. The input side is working great, as I've ended up with job['fieldname'] for building the structure. The problem is getting the result to nest.
Ultimately I want:
"PAYLOAD": {
    "TEST": [
        {
            "JOB_ONE": {
                "details": {
                    "customerInformation": {
                        "lastName": "Chun",
                        "projectName": "N Pacific Recovery",
                        "firstName": "Wally",
                        "secondaryPhoneNumber": ""
                    },
                    "description": "N Pacific Garbage Sweep",
                    "productType": "Service Generation",
                    "address": {
                        "city": "Bristol",
                        "zipCodePlusSix": "",
                        "stateName": "",
                        "zipCode": "53104",
                        "line1": "12709 789441th Ave",
                        "county": "",
                        "stateCode": "WI",
                        "usage": "NA",
                        "zipCodePlusFour": "",
                        "territory": ""
                    }
                }
            }
        },
        {
            "JOB_TWO": {
                "details": {
                    .... similar to JOB_ONE ....
                }
            }
        }
    ],
    "environment": "N. Pacific",
    "requestorName": "Waldo P Rossem",
    "requestorEmail": "waldo# no where.com",
However, with the code below, which only deals with the "details" section, I end up with a stack of all the addresses followed by all of the customer information: the loop goes through the csv records appending customer info to one list and addresses to another, so I get one large block of customer records and a second block of their addresses instead of nested per-record structures.
for job in csv.DictReader(csv_file):
    if not job['Cancelled']:
        # actually have no idea how to get these two to work
        details['description']: job['DESCRIBE']
        details['projectType']: job['ProjectType']
        # the following cycle through the customerInformation and then
        # appends the addresses. So I end up with a large block of customer
        # records and then a second block of their addresses
        details['customerInformation'].append({
            'lastName': "job[Lastname]",
            'firstName': job['FirstName'],
            'projectName': "N Pacific Prototype",
        })
        details['address'].append({
            'city': job['City'],
            'zipCode': job['Zip'],
            'line1': job['Address'],
            'stateCode': job['State'],
            'market': job['Market']
        })
What I am trying to understand is how to fix this loop so that the description and project type appear in the right place, AND how to set up the data structure so that the bottom fields are properly structured for the final JSON dump.
This is largely due to my lack of experience with Python, but unfortunately it's a requirement -- otherwise I could have had it done hours ago using gawk!
Requested CSV follows:
Sure... took me a while to dummy it up as the above is an abbreviated snippet.
JobNumber,FirstName,Lastname,secondaryPhoneNumber,Market,Address,City,State,Zip,requestorName,requestorEmail,environment
22056,Wally,Fruitvale,,N. Pacific,81 Stone Church Rd,Little Compton,RI,17007,Waldo P Rossem,waldo# no where.com,N. Pacific
22057,William,Stevens,,Southwest,355 Vt Route 8a,Jacksonville,VT,18928,Waldo P Rossem,waldo# no where.com,N. Pacific
22058,Wallace,Chen,,Northeast,1385 Jepson Rd,Stamford,VT,19403,Waldo P Rossem,waldo# no where.com,N.
You can create the details dict as a literal instead of creating it and assigning keys one by one:
data = []
for job in csv.DictReader(csv_file):
    if job['Cancelled']:
        continue
    details = {
        'description': job['DESCRIBE'],
        'projectType': job['ProjectType'],
        'customerInformation': {
            'lastName': job['Lastname'],
            'firstName': job['FirstName'],
            ...
        },
        ...
    }
    data.append(details)
json_str = json.dumps(data)
I think all you need for your puzzle is to know a few basic things about dictionaries:
Initial assignment:
my_dict = {
    "key1": "value1",
    "key2": "value2",
    ...
}
Writing key/value pairs to an already initialized dict:
my_dict["key2"] = "new value"
Reading:
my_dict["key2"]
prints> "new value"
Looping keys:
for key in my_dict:
    print(key)
prints> "key1"
prints> "key2"
Looping both key and value:
for key, value in my_dict.items():
    ...
Looping values only:
for value in my_dict.values():
    ...
If all you want is a JSON compatible dict, then you won't need much else than this, without me going into defaultdicts, tuple keys and so on - just know that it's worth reading up on that once you've figured out basic dicts, lists, tuples and sets.
Edit: One more thing: Even when new I think it's worth trying Jupyter notebook to explore your ideas in Python. I find it to be much faster to try things out and get the results back immediately, since you don't have to switch between editor and console.
You're not far off.
You first need to initialise details as a dict:
details = {}
Then add the elements you want:
details['description'] = job['DESCRIBE']
details['projectType'] = job['ProjectType']
Then for the nested ones:
details['customerInformation'] = {
    'lastName': job['Lastname'],
    'firstName': job['FirstName'],
    'projectName': "N Pacific Prototype",
}
For more details on how to use dict: https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict.
Then you can get the JSON with json.dumps(details) (documentation here: https://docs.python.org/3/library/json.html?highlight=json#json.dumps).
Or you can first gather all the details in a list, and then turn the list into a JSON string:
all_details = []
for job in ...:
    # (build details dict)
    all_details.append(details)
output = json.dumps(all_details)

Dealing with JSON with duplicate keys [duplicate]

This question already has answers here:
json.loads allows duplicate keys in a dictionary, overwriting the first value
(3 answers)
Closed 12 days ago.
If I have JSON with duplicate keys and different values in each of the duplicate keys, how can I extract both in python?
ex:
{
    'posting': {
        'content': 'stuff',
        'timestamp': '123456789'
    },
    'posting': {
        'content': 'weird stuff',
        'timestamp': '93828492'
    }
}
If I wanted to grab both timestamps, how would I do so?
I tried a = json.loads(json_str) and then a['posting']['timestamp'], but that only returns one of the values.
You can't have duplicate keys. You can change the object to array instead.
[
    {
        'content': 'stuff',
        'timestamp': '123456789'
    },
    {
        'content': 'weird stuff',
        'timestamp': '93828492'
    }
]
Duplicate keys actually overwrite the previous entry. Instead, maintain an array for that key. An example JSON is below:
{
    'posting': [
        {
            'content': 'stuff',
            'timestamp': '123456789'
        },
        {
            'content': 'weird stuff',
            'timestamp': '93828492'
        }
    ]
}
You can now access the different elements under the posting key like this:
data['posting'][0], data['posting'][1]
As has already been covered: it is against the standard, and the outcome across systems is undefined, so avoid duplicate keys.
Yet, if a third-party software component forces this upon you, note the section about this topic in the standard library documentation: https://docs.python.org/3/library/json.html#repeated-names-within-an-object
By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name [...] The object_pairs_hook parameter can be used to alter this behavior.
So let's do it!
import itertools, json

def duplicate_object_pairs_hook(pairs):
    def _key(pair):
        (k, v) = pair
        return k
    def gpairs():
        for (k, group) in itertools.groupby(pairs, _key):
            ll = [v for (_, v) in group]
            (v, *extra) = ll
            yield (k, ll if extra else v)
    return dict(gpairs())

badj = """{
    "posting": {"content": "stuff", "timestamp": "123456789"},
    "posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""
data = json.loads(badj, object_pairs_hook=duplicate_object_pairs_hook)
Now data evals to
{
    'posting': [
        {'content': 'stuff', 'timestamp': '123456789'},
        {'content': 'weird stuff', 'timestamp': '93828492'},
    ],
}
Remember that this hook will be called for every JSON object parsed, receiving the list of key/value tuples for that object. The default behavior is equivalent to passing those tuples to the dict constructor.
Also, I assumed duplicate keys are adjacent, as that's my use-case, but you might have to sort the pairs before grouping them.
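A short sketch of that sorting step, with made-up pairs: itertools.groupby only merges adjacent runs, so sort the pairs by key first; Python's sort is stable, so values keep their original relative order within each key.

```python
import itertools

# Non-adjacent duplicate key "b" would form two separate groups without sorting.
pairs = [("b", 1), ("a", 2), ("b", 3)]

# Stable sort brings equal keys together while preserving value order.
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])

grouped = {k: [v for _, v in g]
           for k, g in itertools.groupby(sorted_pairs, key=lambda kv: kv[0])}
print(grouped)  # {'a': [2], 'b': [1, 3]}
```

Dropping `sorted(...)` here would yield {'b': [3], 'a': [2]} instead, because the second "b" run would overwrite the first in the dict comprehension.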
