DeepDiff ignore with regex

DeepDiff ignore with regex - python

I have two objects:
d1 = [ { "id": 3, "name": "test", "components": [ { "id": 1, "name": "test" }, { "id": 2, "name": "test2" } ] } ]
d2 = [ { "id": 4, "name": "test", "components": [ { "id": 2, "name": "test" }, { "id": 3, "name": "test"2 } ] } ]
As you can see, everything stays the same, but the id property changes on both root object and also inside components.
I'm using DeepDiff to compare d1 and d2 and trying to ignore comparison of id objects. However, I'm not sure how to achieve this. I tried the following which didn't seem to work.
excluded_paths = "root[\d+\]['id']"
diff = DeepDiff(d1, d2, exclude_paths=excluded_paths)

You can try using exclude_obj_callback:
from deepdiff import DeepDiff
def exclude_obj_callback(obj, path):
return True if "id" in path else False
d1 = [ { "id": 3, "name": "test", "components": [ { "id": 1, "name": "test" }, { "id": 2, "name": "test2" } ] } ]
d2 = [ { "id": 4, "name": "test", "components": [ { "id": 2, "name": "test" }, { "id": 3, "name": "test2" } ] } ]
print(DeepDiff(d1, d2, exclude_obj_callback=exclude_obj_callback))
What this does is returns a boolean for every deep component that includes the string "id" in it. You may want to be careful with this since you may exclude other objects that you didn't mean to. A way around this could be to set less generic key values for example "component_id".

Related

how do I access this json data in python?

hi I'm pretty new at coding and I was trying to create a program in python that reads and save in another file the data inside a json file (not everything, just what I want). I googled how to parse data but there's something I don't understand.
that's a part of the json file:
`
{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
]
}
`
I can access profileChanges. I wrote this to create another json file with only the profileChanges things:
`
myjsonfile= open("file.json",'r')
jsondata=myjsonfile.read()
obj=json.loads(jsondata)
ciso=obj['profileChanges']
for i in ciso:
print(i)
with open("file2", "w") as outfile:
json.dump( ciso, outfile, indent=1)
the issue I have is that I can't access "profile" (inside profileChanges) in the same way by parsing the new file and I have no idea on how to do it

Access to JSON or dict element is realized by list indexes, please look at below example:
a = [
{
"friends": [
{
"id": 0,
"name": "Reba May"
}
],
"greeting": "Hello, Doris Gallagher! You have 2 unread messages.",
"favoriteFruit": "strawberry"
},
]
b = a['friends']['id] # b = 0

I've added a couple of closing braces to make your snippet valid json:
s = '''{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
}
}
]
}
'''
d = json.loads(s)
print(d['profileChanges'][0]['profile']['version'])
This prints refund_teddy_perks_september_2022
Explanation:
d is a dict
d['profileChanges'] is a list of dicts
d['profileChanges'][0] is the first dict in the list
d['profileChanges'][0]['profile'] is a dict
d['profileChanges'][0]['profile']['version'] is the value of version key in the profile dict in the first entry of the profileChanges list.

Split Python Array based on a property

I have a Python List like this:
myList = [
{
"key": 1,
"date": "2020-01-02"
},
{
"key": 2,
"date": "2020-02-02"
},
{
"key": 3,
"date": "2020-01-03"
},
{
"key": 4,
"date": "2020-01-02"
},
{
"key": 5,
"date": "2020-02-02"
},
]
Now I want to split the array based on the property "date". I want my list to look like this
myList = [
[
{
"key": 1,
"date": "2020-01-02"
},
{
"key": 4,
"date": "2020-01-02"
},
],
[
{
"key": 2,
"date": "2020-02-02"
},
{
"key": 5,
"date": "2020-02-02"
},
],
[
{
"key": 3,
"date": "2020-01-03"
},
]
]
So I want a new array for each specific date in the current list. Can someone help me to achieve that?

d={}
for i in range(len(myList)):
d.setdefault(myList[i]['date'], []).append(i)
myList = [ [myList[i] for i in v] for k,v in d.items() ] # replace the original `myList` following PO behavior.
Logic:
You want to group the data based on 'date' that means you need a dictionary data structure. The rest are just implementation details.

How to parse JSON results by condition?

There is JSON and a Python script.
Which displays a list of Companies on the screen.
How to display all Regions for Company[id] ?
{
"data": [
{
"id": 1,
"attributes": {
"name": "Company1",
"regions": {
"data": [
{
"id": 1,
"attributes": {
"name": "Region 1",
}
},
{
"id": 2,
"attributes": {
"name": "Region 2",
}
},
]
}
}
},
{
"id": 2,
"attributes": {
"name": "Company2",
"regions": {
"data": [
{
"id": 1,
"attributes": {
"name": "Region 1",
}
},
{
"id": 2,
"attributes": {
"name": "Region 2",
}
}
]
}
}
},
],
}
Script for all companies.
import os
import json
import requests
BASE_URL = 'localhost'
res = requests.get(BASE_URL)
res_content = json.loads(res.content)
for holding in res_content['data']:
print(holding['id'], holding['attributes']['name'])
How to do the same for displaying the Region for Company[id] ?
Example: Display all Regions for Company 1

Iterate through a list of dictionaries, looking for a dictionary with the key 'name' that has the value 'Company1'. Once it finds that dictionary, it iterates through the list of dictionaries stored under the key 'regions' and prints the value of the key 'name' for each dictionary in that list.
You can try this:
for company in res_content['data']:
if company['attributes']['name'] == 'Company1':
for region in company['attributes']['regions']['data']:
print(region['attributes']['name'])

You just need to delve further down into the res_content object:
for holding in res_content['data']:
print(holding['id'], holding['attributes']['name'])
data = holding['attributes']['regions']['data']
for d in data:
print(' ', d['attributes']['name'])
Output:
1 Company1
Region 1
Region 2
2 Company2
Region 1
Region 2

How to convert a list of dictionaries to a list?

How to convert a list of dictionaries to a list?
Here is what I have:
{
"sources": [
{
"ID": "6953",
"VALUE": "https://address-jbr.ofp.ae"
},
{
"ID": "6967",
"VALUE": "https://plots.ae"
},
{
"ID": "6970",
"VALUE": "https://dubai-creek-harbour.ofp.ae"
}]}
Here is what I want it to look like:
({'6953':'https://address-jbr.ofp.ae','6967':'https://plots.ae','6970':'https://dubai-creek-harbour.ofp.ae'})

This is indeed very straightforward:
data = {
"sources": [
{
"ID": "6953",
"VALUE": "https://address-jbr.ofp.ae"
},
{
"ID": "6967",
"VALUE": "https://plots.ae"
},
{
"ID": "6970",
"VALUE": "https://dubai-creek-harbour.ofp.ae"
}]
}
Then:
data_list = [{x["ID"]: x["VALUE"]} for x in data["sources"]]
Which is the same as:
data_list = []
for x in data["sources"]:
data_list.append({
x["ID"]: x["VALUE"]
})
EDIT: You said convert to a "list" in the question and that confused me. Then this is what you want:
data_dict = {x["ID"]: x["VALUE"] for x in data["sources"]}
Which is the same as:
data_dict = {}
for x in data["sources"]:
data_dict[x["ID"]] = x["VALUE"]
P.S. Seems like you're asking for answers to your course assignments or something here, which is not what this place is for.

A solution using pandas
import pandas as pd
data = {
"sources": [
{"ID": "6953", "VALUE": "https://address-jbr.ofp.ae"},
{"ID": "6967", "VALUE": "https://plots.ae"},
{"ID": "6970", "VALUE": "https://dubai-creek-harbour.ofp.ae"},
]
}
a = pd.DataFrame.from_dict(data["sources"])
print(a.set_index("ID").T.to_dict(orient="records"))
outputs to:
[{'6953': 'https://address-jbr.ofp.ae', '6967': 'https://plots.ae', '6970': 'https://dubai-creek-harbour.ofp.ae'}]

this should work.
Dict = {
"sources": [
{
"ID": "6953",
"VALUE": "https://address-jbr.ofp.ae"
},
{
"ID": "6967",
"VALUE": "https://plots.ae"
},
{
"ID": "6970",
"VALUE": "https://dubai-creek-harbour.ofp.ae"
}]}
# Store all the keys here
value_LIST = []
for item_of_list in Dict["sources"]:
for key, value in item_of_list.items():
value_LIST.append(value)

duplicates in a JSON file based on two attributes

I have a JSON file and that is a nested JSON. I would like to remove duplicates based on two keys.
JSON example:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "2",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "Jeroge",
"main": 1
},
{
"name": "Peter",
"main": 0
},
{
"name": "John",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
Here I try to match the title and author name. For example, for id 1 and id 2 are duplicates( as the title is same and author names are also same(the author sequence doesn't matter and no need to consider the main attributes). So, in the output JSON only id:1 or id:2 will remain with id:3. In the final output I need two file.
Output_JSON:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
duplicatedID.csv:
1-2
The following method I tried but it is not giving correct results:
list= []
duplicate_Id = []
for data in (json_data['books'])[:]:
elements= []
id = data['id']
title = data['story']['title']
elements.append(title)
for i in (data['description'][0]['author']):
name = (i['name'])
elements.append(name)
if not list:
list.append(elements)
else:
for j in list:
if set(elements) == set(j):
duplicate_Id.append(id)
elements = []

The general idea is to:
Get the groups identified by some function that collects duplicates.
Then return the first entry of each group, ensuring no duplicates.
Define the key function as the sorted list of authors and. As the list of authors is by definition the unique key, but may appear in any order.
import json
from itertools import groupby
j = json.load(books)
def transform(books):
groups = [list(group) for _, group in groupby(books, key=getAuthors)]
return [group[0] for group in groups]
def getAuthors(book):
authors = book['description'][0]['author']
return sorted([author['name'] for author in authors])
print(transform(j['books']))
If we wanted to get the duplicates, then we do the same computation, but return any sublist with length > 1 as this is by our definition duplicated data.
def transform(books):
groups = [list(group) for _, group in groupby(books, key=getAuthors)]
return [group for group in groups if len(group) > 1]
Where j['books'] is the JSON you gave enclosed in an object.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

DeepDiff ignore with regex - python

Related

how do I access this json data in python?

Split Python Array based on a property

How to parse JSON results by condition?

How to convert a list of dictionaries to a list?

duplicates in a JSON file based on two attributes

Categories

Resources