Remove duplicates in a JSON file based on two attributes - python

I have a nested JSON file and would like to remove duplicates based on two keys.
JSON example:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "2",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "Jeroge",
"main": 1
},
{
"name": "Peter",
"main": 0
},
{
"name": "John",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
Here I try to match the title and the author names. For example, id 1 and id 2 are duplicates, because the title is the same and the author names are also the same (the author order doesn't matter, and the main attribute need not be considered). So in the output JSON, only id 1 or id 2 will remain, along with id 3. In the final output I need two files.
Output_JSON:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
duplicatedID.csv:
1-2
I tried the following method, but it does not give correct results:
list= []
duplicate_Id = []
for data in (json_data['books'])[:]:
    elements= []
    id = data['id']
    title = data['story']['title']
    elements.append(title)
    for i in (data['description'][0]['author']):
        name = (i['name'])
        elements.append(name)
    if not list:
        list.append(elements)
    else:
        for j in list:
            if set(elements) == set(j):
                duplicate_Id.append(id)
        elements = []

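The general idea is to:
Group the books with a key function that puts duplicates into the same group.
Then return the first entry of each group, which ensures there are no duplicates.
Define the key function as the title followed by the sorted list of author names. The question treats title plus authors as the unique key, and the authors may appear in any order. Note that groupby only merges adjacent items, so the books must be sorted by the same key first.
import json
from itertools import groupby

def getAuthors(book):
    # Key: the title plus the sorted author names, so author order does not matter
    authors = book['description'][0]['author']
    return [book['story']['title']] + sorted(author['name'] for author in authors)

def transform(books):
    # groupby only groups consecutive items, so sort by the same key first
    # (this also means the result is ordered by key, not by original position)
    books = sorted(books, key=getAuthors)
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group[0] for group in groups]

with open('books.json') as f:  # the file name is a placeholder
    j = json.load(f)
print(transform(j['books']))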
If we want the duplicates instead, we do the same computation but return every group with more than one entry, since by our definition these are the duplicated data.
def transform(books):
    # Sort first: groupby only groups consecutive items
    books = sorted(books, key=getAuthors)
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group for group in groups if len(group) > 1]
Where j['books'] is the JSON you gave enclosed in an object.
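The question also asks for a second file listing the duplicated ids. The following is only a sketch of one way to produce both results in a single pass. It swaps groupby for a plain dict keyed on the title and a frozenset of author names, which keeps the original order and needs no sorting. The names dedupe_books, book and author are hypothetical helpers introduced for illustration, and the output is built in memory rather than written to disk.

```python
import io
import json

def dedupe_books(books):
    """Return (unique_books, duplicate_id_groups), keyed on title + author names."""
    seen = {}    # key -> ids of every book that shares that key
    unique = []  # first book seen for each key, in original order
    for book in books:
        names = frozenset(a["name"] for a in book["description"][0]["author"])
        key = (book["story"]["title"], names)
        if key not in seen:
            seen[key] = []
            unique.append(book)
        seen[key].append(book["id"])
    duplicates = [ids for ids in seen.values() if len(ids) > 1]
    return unique, duplicates

# Hypothetical helpers that rebuild the sample data from the question
def author(name):
    return {"name": name, "main": 0}

def book(book_id, title, names):
    return {"id": book_id, "story": {"title": title},
            "description": [{"release": False,
                             "author": [author(n) for n in names]}]}

books = [
    book("1", "Lonely lion", ["John", "Jeroge", "Peter"]),
    book("2", "Lonely lion", ["Jeroge", "Peter", "John"]),
    book("3", "Lonely lion", ["John", "Jeroge"]),
]

unique, duplicates = dedupe_books(books)
output_json = json.dumps({"books": unique}, indent=2)

# One line per duplicate group, ids joined with "-" as in duplicatedID.csv
buffer = io.StringIO()
for ids in duplicates:
    buffer.write("-".join(ids) + "\n")
duplicated_csv = buffer.getvalue()

print([b["id"] for b in unique])  # ['1', '3']
print(duplicated_csv)             # 1-2
```

To write real files, output_json and duplicated_csv can be passed to open(..., 'w').write(...) with whatever file names are needed.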

Related

How to count occurrences of a specific dict key in dicts list and some dicts values contains list and append the count in value

I'm trying to count the number of times a specified key occurs in my list of dicts. I've used loops and sum to count up all the keys, but how can I find the count for a specific key? I have this code, which does not work currently:
for dico in data:
    for ele in dico['people']:
        print(ele['name']+str(len(ele['animals'])))
The "entries" data looks like this:
[
{
"name": "Uzuzozne",
"people": [
{
"name": "Lillie Abbott",
"animals": [
{
"name": "John Dory"
}
]
}
]
},
{
"name": "Satanwi",
"people": [
{
"name": "Anthony Bruno",
"animals": [
{
"name": "Oryx"
}
]
}
]
},
{
"name": "Dillauti",
"people": [
{
"name": "Winifred Graham",
"animals": [
{ "name": "Anoa" },
{ "name": "Duck" },
{ "name": "Narwhal" },
{ "name": "Badger" },
{ "name": "Cobra" },
{ "name": "Crow" }
]
},
{
"name": "Blanche Viciani",
"animals":
[{ "name": "Barbet" },
{ "name": "Rhea" },
{ "name": "Snakes" },
{ "name": "Antelope" },
{ "name": "Echidna" },
{ "name": "Crow" },
{ "name": "Guinea Fowl" },
{ "name": "Deer Mouse" }]
}
]
}
]
My goal is to print the counts of people and animals by counting the number of children and appending it to the name, e.g. Satanwi [2].
[ { name: 'Dillauti [5]',
people:
[ { name: 'Winifred Graham [6]',
animals:
[ { name: 'Anoa' },
{ name: 'Duck' },
{ name: 'Narwhal' },
{ name: 'Badger' },
{ name: 'Cobra' },
{ name: 'Crow' } ] },
{ name: 'Blanche Viciani [8]',
animals:
[ { name: 'Barbet' },
{ name: 'Rhea' },
{ name: 'Snakes' },
{ name: 'Antelope' },
{ name: 'Echidna' },
{ name: 'Crow' },
{ name: 'Guinea Fowl' },
{ name: 'Deer Mouse' } ] },
...
...
]

This will do what you want; feel free to ask if you need explanations:
for dico in data:
    children = 0
    for ele in dico['people']:
        animals = len(ele['animals'])
        children += 1 + animals
        ele['name'] += f" [{animals}]"
    dico['name'] += f" [{children}]"
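Because the loop above edits the names in place, running it twice on the same data would append the counts twice. As a minimal sketch, assuming it is acceptable to work on a copy, the same counting can be wrapped in a function that leaves the input untouched. The name annotate_counts is hypothetical, and the data is one entry taken from the question.

```python
import copy

def annotate_counts(entries):
    """Return a copy of entries with child counts appended to each name."""
    result = copy.deepcopy(entries)  # leave the caller's data untouched
    for entry in result:
        children = 0
        for person in entry["people"]:
            animals = len(person["animals"])
            children += 1 + animals  # the person plus their animals
            person["name"] += f" [{animals}]"
        entry["name"] += f" [{children}]"
    return result

data = [
    {"name": "Satanwi",
     "people": [{"name": "Anthony Bruno", "animals": [{"name": "Oryx"}]}]},
]
annotated = annotate_counts(data)
print(annotated[0]["name"])               # Satanwi [2]
print(annotated[0]["people"][0]["name"])  # Anthony Bruno [1]
print(data[0]["name"])                    # Satanwi
```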

How to parse JSON results by condition?

I have some JSON and a Python script that displays a list of companies on the screen.
How can I display all regions for a given company id?
{
"data": [
{
"id": 1,
"attributes": {
"name": "Company1",
"regions": {
"data": [
{
"id": 1,
"attributes": {
"name": "Region 1",
}
},
{
"id": 2,
"attributes": {
"name": "Region 2",
}
},
]
}
}
},
{
"id": 2,
"attributes": {
"name": "Company2",
"regions": {
"data": [
{
"id": 1,
"attributes": {
"name": "Region 1",
}
},
{
"id": 2,
"attributes": {
"name": "Region 2",
}
}
]
}
}
},
],
}
Script that lists all companies:
import os
import json
import requests
BASE_URL = 'localhost'
res = requests.get(BASE_URL)
res_content = json.loads(res.content)
for holding in res_content['data']:
    print(holding['id'], holding['attributes']['name'])
How can I do the same to display the regions for a given company id?
Example: display all regions for Company 1.

Iterate through the list of company dictionaries, looking for the one whose 'name' is 'Company1'. Once it is found, iterate through the list stored under 'regions' and then 'data', and print the 'name' of each region in that list.
You can try this:
for company in res_content['data']:
    if company['attributes']['name'] == 'Company1':
        for region in company['attributes']['regions']['data']:
            print(region['attributes']['name'])

You just need to delve further down into the res_content object:
for holding in res_content['data']:
    print(holding['id'], holding['attributes']['name'])
    data = holding['attributes']['regions']['data']
    for d in data:
        print(' ', d['attributes']['name'])
Output:
1 Company1
  Region 1
  Region 2
2 Company2
  Region 1
  Region 2
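Since the question asks for the regions of a company by id, here is a minimal sketch that looks the company up by its 'id' field instead of its name. The function name regions_for_company is hypothetical, and the inline res_content is a trimmed copy of the sample payload rather than a real HTTP response.

```python
def regions_for_company(payload, company_id):
    """Return the region names of the company with the given id, or [] if absent."""
    for company in payload["data"]:
        if company["id"] == company_id:
            return [region["attributes"]["name"]
                    for region in company["attributes"]["regions"]["data"]]
    return []

res_content = {
    "data": [
        {"id": 1, "attributes": {"name": "Company1", "regions": {"data": [
            {"id": 1, "attributes": {"name": "Region 1"}},
            {"id": 2, "attributes": {"name": "Region 2"}},
        ]}}},
        {"id": 2, "attributes": {"name": "Company2", "regions": {"data": [
            {"id": 1, "attributes": {"name": "Region 1"}},
        ]}}},
    ]
}

print(regions_for_company(res_content, 1))  # ['Region 1', 'Region 2']
print(regions_for_company(res_content, 3))  # []
```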

Fast way of adding fields to a nested dict

I need help improving my code.
I have a nested dict with many levels:
{
"11": {
"FacLC": {
"immty": [
"in_mm",
"in_mm"
],
"moood": [
"in_oo",
"in_oo"
]
}
},
"22": {
"FacLC": {
"immty": [
"in_mm",
"in_mm",
"in_mm"
]
}
}
}
And I want to add additional fields on every level, so my output looks like this:
[
{
"id": "",
"name": "11",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
},
{
"id": "",
"name": "moood",
"characteristics": [
{
"id": "",
"name": "in_oo"
},
{
"id": "",
"name": "in_oo"
}
]
}
]
}
]
},
{
"id": "",
"name": "22",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
}
]
}
]
}
]
I managed to write a four-level nested for loop, which I find inefficient and inelegant:
my_new_dict = []
for main_name, general in my_dict.items():
    generals = []
    for general_name, specific in general.items():
        specifics = []
        for specific_name, characteristics in specific.items():
            characteristics_dicts = []
            for characteristic in characteristics:
                characteristics_dicts.append({
                    "id": "",
                    "name": characteristic,
                })
            specifics.append({
                "id": "",
                "name": specific_name,
                "characteristics": characteristics_dicts,
            })
        generals.append({
            "id": "",
            "name": general_name,
            "specifics": specifics,
        })
    my_new_dict.append({
        "id": "",
        "name": main_name,
        "general": generals,
    })
I am wondering if there is a more compact and efficient solution.

In the past I wrote a function for this. You call it every time you need to add a new field to a nested dict, regardless of how many levels the dict has. You only have to supply the full path to the field, which I call the key map, for example ['node1', 'node1a', 'node1apart3'].
def insert_value_using_map(_nodes_list_to_be_appended, _keys_map, _value_to_be_inserted):
    # Walk down the path, creating intermediate dicts as needed
    for _key in _keys_map[:-1]:
        _nodes_list_to_be_appended = _nodes_list_to_be_appended.setdefault(_key, {})
    # Set the value on the last key of the path
    _nodes_list_to_be_appended[_keys_map[-1]] = _value_to_be_inserted
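The function above sets one value at a given path. For the conversion asked about in the question, a different technique may be a better fit: a short recursive function that names the child list according to the current depth. This is only a sketch. The names LEVEL_KEYS and add_fields are hypothetical, and it assumes every branch has exactly the three dict levels shown in the question, followed by a list of names. It does the same amount of work as the nested loops, so it is more compact rather than faster.

```python
# Name of the child list at each depth, taken from the desired output
LEVEL_KEYS = ["general", "specifics", "characteristics"]

def add_fields(node, depth=0):
    """Convert nested dicts and lists of names into lists of {"id", "name", ...} dicts."""
    if isinstance(node, dict):
        return [
            {"id": "", "name": name,
             LEVEL_KEYS[depth]: add_fields(child, depth + 1)}
            for name, child in node.items()
        ]
    # Leaf level: a plain list of names
    return [{"id": "", "name": name} for name in node]

my_dict = {
    "11": {"FacLC": {"immty": ["in_mm", "in_mm"], "moood": ["in_oo", "in_oo"]}},
    "22": {"FacLC": {"immty": ["in_mm", "in_mm", "in_mm"]}},
}

my_new_dict = add_fields(my_dict)
print(my_new_dict[1]["general"][0]["specifics"][0]["name"])  # immty
```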

Changing Key name in mongodb based on its value

I have a list of elements, element_list = ['A', 'C'], and my document in MongoDB looks like this:
{
"product_id": {
"$oid": "AA"
},
"output": [
{
"product": {
"$oid": "A"
},
"value": 1
},
{
"product": {
"$oid": "B"
},
"value": 1
},
]
}
What I want is for the key to change based on the values in element_list, like this:
{
"product_id": {
"$oid": "AA"
},
"products": [
{
"product": {
"$oid": "A"
},
"value": 1
},
{
"Offer": {
"$oid": "B"
},
"value": 1
},
]
}
'B' is not present in element_list, which is why its key is Offer. How can I automatically update multiple similar documents in Python?

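Try this, assuming data is a list of the documents already loaded into Python:
element_list = ['A', 'C']
allowed = set(element_list)
for product in data:
    new_products = []
    for output in product['output']:
        # Keep the 'product' key for ids in element_list; otherwise use 'Offer'
        key = 'product' if output['product']['$oid'] in allowed else 'Offer'
        # 'value' stays next to the renamed key, as in the desired document
        new_products.append({key: output['product'], 'value': output['value']})
    product['products'] = new_products
    del product['output']
print(data)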

Manipulating data from json to reflect a single value from each entry

Setup:
This data set has 50 "issues". From these issues I have captured the data that I need to put into my PostgreSQL database, but I run into trouble when I get to "components". I am able to get a list of all the "name" values of "components", but I want only one "name" per issue, and some issues have two. Some issues have no components, and I would like to return null for those.
Here is some sample data that should suffice:
{
"issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": null,
"customfield_2": 5
}
}
]
}
I am looking to return (just for the component name piece):
['null', 'Testing', 'Documentation']
I set up the other data for entry into the db like so:
values = list((item['key'],
               # components list,
               item['fields']['customfield_1'],
               item['fields']['customfield_2']) for item in data_story['issues'])
I am wondering if there is a way to insert the component names where I have commented "components list" above.
To recap, I want exactly one component name for each issue, null or not, and to have it included in the values variable with the rest of the data. Taking the first name in components is fine for each issue.

Here's what I would do, assuming that we are working with a data variable:
values = [(x['fields']['components'][0]['name'] if len(x['fields']['components']) != 0 else 'null') for x in data['issues']]
Let me know if you have any queries.
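To place the component name directly inside the values tuples from the question, the same conditional can be moved into a small helper. This is a sketch under one assumption: it returns None instead of the string 'null', on the basis that the database driver will store None as SQL NULL. The helper name first_component_name is hypothetical, and data_story is the sample data from the question.

```python
def first_component_name(issue):
    """Return the first component name of an issue, or None when there is none."""
    components = issue["fields"].get("components") or []
    return components[0]["name"] if components else None

data_story = {"issues": [
    {"key": "1", "fields": {"components": [],
                            "customfield_1": None, "customfield_2": None}},
    {"key": "2", "fields": {"components": [{"name": "Testing"}],
                            "customfield_1": None, "customfield_2": None}},
    {"key": "3", "fields": {"components": [{"name": "Documentation"},
                                           {"name": "Manufacturing"}],
                            "customfield_1": None, "customfield_2": 5}},
]}

values = [
    (item["key"],
     first_component_name(item),
     item["fields"]["customfield_1"],
     item["fields"]["customfield_2"])
    for item in data_story["issues"]
]
print(values)
# [('1', None, None, None), ('2', 'Testing', None, None), ('3', 'Documentation', None, 5)]
```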

Use a conditional expression (if/else) inside a list comprehension.
Example code:
results = [ (x['fields']['components'][0]['name'] if 'components' in x['fields'] and len(x['fields']['components']) > 0 else 'null') for x in data['issues'] ]
Full sample code:
import json
data = json.loads('''{ "issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": null,
"customfield_2": 5
}
}
]
}''')
results = [ (x['fields']['components'][0]['name'] if 'components' in x['fields'] and len(x['fields']['components']) > 0 else 'null') for x in data['issues'] ]
print(results)
Output (from Python 2; Python 3 prints the strings without the u prefix):
['null', u'Testing', u'Documentation']

If you just want to keep only the first name from each list and drop the rest, you can do it this way:
issues={
"issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": "null",
"customfield_2": "null"
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": "null",
"customfield_2": "null"
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": "null",
"customfield_2": 5
}
}
]
}
Data^
componentlist = []
for i in range(len(issues["issues"])):
    x = issues["issues"][i]["fields"]["components"]
    if len(x) == 0:
        x = "null"
        componentlist.append(x)
    else:
        x = issues["issues"][i]["fields"]["components"][0]
        componentlist.append(x)
print(componentlist)
>>>['null', {'name': 'Testing'}, {'name': 'Documentation'}]
Or, if you just want the values, and not the dictionary keys:
    else:
        x = issues["issues"][i]["fields"]["components"][0]["name"]
        componentlist.append(x)
['null', 'Testing', 'Documentation']
