How to filter a json file to show only the information I need?
To start off I want to say I'm fairly new to python and working with JSON so sorry if this question was asked before and I overlooked it.
I have a JSON file that looks like this:
[
{
"Store": 417,
"Item": 10,
"Name": "Burger",
"Modifiable": true,
"Price": 8.90,
"LastModified": "09/02/2019 21:30:00"
},
{
"Store": 417,
"Item": 15,
"Name": "Fries",
"Modifiable": false,
"Price": 2.60,
"LastModified": "10/02/2019 23:00:00"
}
]
I need to filter this file to only show Item and Price, like
[
{
"Item": 10,
"Price": 8.90
},
{
"Item": 15,
"Price": 2.60
}
]
I have code that looks like this:
import json

# Transform json input to python objects
with open("StorePriceList.json") as input_file:
    input_dict = json.load(input_file)
# Filter python objects with list comprehensions
output_dict = [x for x in input_dict if ] #missing logical test here.
# Transform python object back into json
output_json = json.dumps(output_dict)
# Show json
print(output_json)
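What logical test should I put there to achieve that?
A condition at the end of a list comprehension only decides which items to keep, not which keys. To keep selected keys of every item, nest a dict comprehension inside the list comprehension: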
output_dict = [{k:v for k,v in x.items() if k in ["Item", "Price"]} for x in input_dict]
You can also do it like this :)
>>> [{key: d[key] for key in ['Item', 'Price']} for d in input_dict] # you should rename it to `input_list` rather than `input_dict` :)
[{'Item': 10, 'Price': 8.9}, {'Item': 15, 'Price': 2.6}]
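The two comprehensions above differ when a record lacks one of the keys: the `if k in [...]` form silently skips it, while the `d[key]` form raises KeyError. A minimal sketch of the tolerant variant, assuming a hypothetical helper named `pick` and made-up sample records (the second one has no "Price"):

```python
import json

def pick(records, wanted):
    """Return new dicts that contain only the keys listed in `wanted`."""
    wanted = set(wanted)
    return [{k: v for k, v in record.items() if k in wanted} for record in records]

# Hypothetical sample data; the second record is missing "Price" on purpose.
sample = json.loads("""[
  {"Store": 417, "Item": 10, "Name": "Burger", "Price": 8.90},
  {"Store": 417, "Item": 15, "Name": "Fries"}
]""")

filtered = pick(sample, ["Item", "Price"])
print(json.dumps(filtered))
```

Which behaviour is preferable depends on whether a missing key should count as an error in your data.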
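import json

with open('data.json', 'r') as f:
    qe = json.load(f)
# Replace '<your data>' with the key that holds the list of records;
# if the top level is already a list, iterate over qe directly.
for item in qe['<your data>']:
    query = f'{item["Item"]} {item["Price"]}'
    print(query)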
Related
I have a json like:
pd = {
"RP": [
{
"Name": "PD",
"Value": "qwe"
},
{
"Name": "qwe",
"Value": "change"
}
],
"RFN": [
"All"
],
"RIT": [
{
"ID": "All",
"IDT": "All"
}
]
}
I am trying to change the value "change" to "changed". It sits in a dictionary within a list, which is itself within another dictionary. Is there a better, more efficient, or more pythonic way to do this than what I did below?
for key, value in pd.items():
    ls = pd[key]
    for d in ls:
        if type(d) == dict:
            for k,v in d.items():
                if v == 'change':
                    pd[key][ls.index(d)][k] = "changed"
This seems pretty inefficient because of the number of times I iterate over the data.
String replacement could work if you don't want to write depth/breadth-first search.
>>> import json
>>> json.loads(json.dumps(pd).replace('"Value": "change"', '"Value": "changed"'))
{'RP': [{'Name': 'PD', 'Value': 'qwe'}, {'Name': 'qwe', 'Value': 'changed'}],
'RFN': ['All'],
'RIT': [{'ID': 'All', 'IDT': 'All'}]}
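If you would rather not depend on how json.dumps formats its output, a recursive walk is another option. The following is only a sketch: `replace_value` is a hypothetical helper name, and the data is bound to `data` instead of `pd` to avoid clashing with the usual pandas alias. It replaces every value equal to `old`, whatever its key, which matches the loop in the question.

```python
def replace_value(node, old, new):
    """Recursively replace every value equal to `old` with `new`, in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if value == old:
                node[key] = new  # reassigning an existing key is safe while iterating
            else:
                replace_value(value, old, new)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            if value == old:
                node[index] = new
            else:
                replace_value(value, old, new)
    return node

data = {
    "RP": [{"Name": "PD", "Value": "qwe"}, {"Name": "qwe", "Value": "change"}],
    "RFN": ["All"],
    "RIT": [{"ID": "All", "IDT": "All"}],
}
replace_value(data, "change", "changed")
print(data["RP"][1])
```

Each element is visited once, so the repeated ls.index(d) lookups from the original loop are no longer needed.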
I have the following sample list of dictionaries and I would like to replace any . in the dictionary with a _, so the list would look like the list below.
I tried using replace but get the following error:
dict object has no attribute 'replace'
if I try something like this:
orig = [
{
"health": "good",
"status": "up",
"date": "2022.03.10",
"device.id": "device01"
},
{
"health": "poor",
"status": "down",
"date": "2022.03.10",
"device.id": "device02"
}
]
length = len(orig)
for i in range(length):
    orig[i].replace(".", "_")
Current list:
[
{
"health": "good",
"status": "up",
"date": "2022.03.10",
"device.id": "device01"
},
{
"health": "poor",
"status": "down",
"date": "2022.03.10",
"device.id": "device02"
}
]
The new list should look like this:
[
{
"health": "good",
"status": "up",
"date": "2022_03_10",
"device_id": "device01"
},
{
"health": "poor",
"status": "down",
"date": "2022_03_10",
"device_id": "device02"
}
]
I don't understand how what you're trying would even run. For the line orig[i].replace(".", "_"), orig[i] will be a dict, and since a dict has no replace() method, you'll get an error trying to execute this line.
You need to work one level further down, operating on each of the key/value pairs in each dict. Here's one solution:
orig = [{"health": "good", "status": "up", "date":"2022.03.10","device.id":"device01"}, {"health": "poor", "status": "down", "date":"2022.03.10","device.id":"device02"}]
result = []
for inner_dict in orig:
    new_inner = {}
    for k, v in inner_dict.items():
        new_inner[k.replace('.', '_')] = v.replace('.', '_')
    result.append(new_inner)
print(result)
If the keys didn't need to change, it would be simpler (see the other two answers that don't get it right). You then wouldn't have to create a new structure, but could just work on the values within the existing structure. But since the keys will also change, it's easiest just to build a new result from scratch, like this shows.
Result:
[{'health': 'good', 'status': 'up', 'date': '2022_03_10', 'device_id': 'device01'}, {'health': 'poor', 'status': 'down', 'date': '2022_03_10', 'device_id': 'device02'}]
Try this:
orig = list(map(lambda item: dict((k.replace('.', '_'), v.replace('.', '_')) for k, v in item.items()), orig))
The output should be what you want.
Basically, the original data is a list of dicts, and the goal is to normalize (replace . with _) each key and value in every dict.
The inner transformation uses dict() to build a new dict from the original one: dict((k.replace('.', '_'), v.replace('.', '_')) for k, v in item.items())
The outer part is a map() call that applies this transformation to each item of the list.
Actually, @CryptoFool's answer is probably clearer for beginners.
The answer by @CryptoFool seems like the one you want. A slightly more brute-force approach is to just work with strings. Note that this replaces every "." in the serialized text, so it is only safe when no value (a float, for example) needs to keep its dot.
import json
orig= [
{"health": "good", "status": "up", "date":"2022.03.10","device.id":"device01"},
{"health": "poor", "status": "down", "date":"2022.03.10","device.id":"device02"}
]
orig_new = json.loads(json.dumps(orig).replace(".","_"))
print(orig_new)
That will give you :
[
{'health': 'good', 'status': 'up', 'date': '2022_03_10', 'device_id': 'device01'},
{'health': 'poor', 'status': 'down', 'date': '2022_03_10', 'device_id': 'device02'}
]
The following seems to do the trick:
def convert(list_dict, old_text, new_text):
    def replace_dict(old_dict, old_text, new_text):
        return {key.replace(old_text, new_text) : val.replace(old_text, new_text) for key, val in old_dict.items()}
    for i in range(len(list_dict)):
        list_dict[i] = replace_dict(list_dict[i], old_text, new_text)
orig= [{"health": "good", "status": "up", "date":"2022.03.10","device.id":"device01"}, {"health": "poor", "status": "down", "date":"2022.03.10","device.id":"device02"}]
convert(orig, '.', '_')
print(orig)
Basically, it modifies the list in place, but creates a replacement dictionary for each element.
You need to iterate over the list of dictionaries, or round-trip the data through a JSON string:
# first solution
new_list_of_dictionaries = []
for dictionary in orig:
    new_dictionary = {}
    for k, v in dictionary.items():
        new_dictionary[k.replace(".", "_")] = v.replace(".", "_")
    new_list_of_dictionaries.append(new_dictionary)
orig = new_list_of_dictionaries
# second_solution
import json
orig = json.loads(json.dumps(orig).replace(".", "_"))
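All of the answers above assume that every value is a string. If the data may also contain numbers or nested containers, calling v.replace() fails on non-string values, and replacing on the serialized JSON text would also rewrite the decimal point in numbers. A recursive sketch that touches strings only follows; `replace_dots` is a hypothetical helper name and the "load" field is made up for illustration.

```python
def replace_dots(node, old=".", new="_"):
    """Return a copy of `node` with `old` replaced by `new` in every string key and value."""
    if isinstance(node, dict):
        return {replace_dots(k, old, new): replace_dots(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_dots(item, old, new) for item in node]
    if isinstance(node, str):
        return node.replace(old, new)
    return node  # numbers, booleans and None are returned unchanged

# Hypothetical sample: "load" is a float that must keep its decimal point.
orig = [
    {"health": "good", "date": "2022.03.10", "device.id": "device01", "load": 0.75},
    {"health": "poor", "date": "2022.03.10", "device.id": "device02", "load": 1.5},
]
converted = replace_dots(orig)
print(converted)
```

Because it builds new containers, the original list is left untouched.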
I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is as follows; it contains two columns, gene and disease.
gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
The expected Output format should be in the following format
{
"name": "A1BG",
"children": [
{"name": "Adenocarcinoma"},
{"name": "apnea"},
{"name": "Athritis"}
]
},
{
"name": "A2M",
"children": [
{"name": "Asthma"},
{"name": "Astrocytoma"},
{"name": "Diabetes"}
]
},
{
"name": "NAT1",
"children": [
{"name": "polyps"},
{"name": "lymphoma"},
{"name": "neoplasms"}
]
}
The Python code I have written is below. Let me know what I need to change to get the desired output.
import json
finalList = []
finalDict = {}
grouped = df.groupby(['gene'])
for key, value in grouped:
    dictionary = {}
    dictList = []
    anotherDict = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']
    for i in j.index:
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)
    dictionary['children'] = dictList
    finalList.append(dictionary)
with open('outputresult3.json', "w") as out:
    json.dump(finalList,out)
import json
json_data = []
# group the data by each unique gene
# (group by the column name, not a one-element list, so that `gene` is a plain string)
for gene, data in df.groupby("gene"):
    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()
    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    entry = {"name": gene, "children": children}
    json_data.append(entry)
with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)
Use DataFrame.groupby with a custom lambda function that converts each group's values to a list of dictionaries with DataFrame.to_dict:
L = (df.rename(columns={'disease':'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene':'name'})
       .to_dict('records')
     )
print(L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
{'name': 'apnea'},
{'name': 'Athritis'}]},
{'name': 'A2M', 'children': [{'name': 'Asthma'},
{'name': 'Astrocytoma'},
{'name': 'Diabetes'}]},
{'name': 'NAT1', 'children': [{'name': 'polyps'},
{'name': 'lymphoma'},
{'name': 'neoplasms'}]}]
with open('outputresult3.json', "w") as out:
json.dump(L,out)
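If pandas is not otherwise needed, the same grouping can be sketched with the standard library alone. This is only an illustration: the CSV text is inlined through io.StringIO instead of being read from a file, and it is a shortened copy of the sample from the question.

```python
import csv
import io
import json

# Shortened copy of the sample CSV from the question, inlined for the example.
csv_text = """gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A2M,Asthma
"""

groups = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # Create the gene's list on first sight, then append one child per disease.
    groups.setdefault(row["gene"], []).append({"name": row["disease"]})

tree = [{"name": gene, "children": children} for gene, children in groups.items()]
print(json.dumps(tree, indent=2))
```

With a real file, open it with newline="" and pass the file object to csv.DictReader instead of the StringIO wrapper.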
I'm working with CSV files. My goal is to write the information from a CSV file in JSON format. Specifically, I want to get a format similar to miserables.json.
Example:
{"source": "Napoleon", "target": "Myriel", "value": 1},
With the information I have, the output should be:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": "Germany",
"target": "USA",
"value": 2
},
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
However, with the code I used, the output looks as follows:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": null,
"target": "USA",
"value": 2
}
][
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
The null source should be Germany. This is one of the main problems, because there are more entries with the same issue. Apart from this, the information is correct. I just want to merge the separate lists into a single one and replace null with the correct country.
This is the code I used, with pandas and collections:
import json
from collections import Counter
import pandas
from pandas import DataFrame, Series

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have append mode this will be written in txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()
Any suggestions for solving this problem are appreciated!
Thanks
Consider removing the Series() around the scalar value, country. A one-element series has a single row, so when the dictionary of series is combined into a dataframe, pandas pads it with NaN (later converted to null in JSON) to match the length of the other series. You can see this by printing out the dfForce dataframe:
from pandas import Series
from pandas import DataFrame
country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]
forceData = {'source': Series(country),
'target': Series(sourceTemp),
'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 NaN USA 2
# 2 NaN Argentina 3
To resolve this, simply keep country as a scalar in the dictionary of series, so that it is broadcast to every row:
forceData = {'source': country,
'target': Series(sourceTemp),
'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 Germany USA 2
# 2 Germany Argentina 3
By the way, you do not need a dataframe object to output JSON. Simply build a list of dictionaries. The following uses an OrderedDict (to maintain the order of keys). The list grows inside the loop and is dumped to the file once at the end, instead of appending to the file on each iteration, which produces invalid JSON because adjacent opposite-facing square brackets ...][... are not allowed.
from collections import OrderedDict
...
data = []
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    for k,v in frquency.items():
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)
        data.append(inner)
newData = json.dumps(data, indent=4)
with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
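The per-country loop can also be collapsed by counting (country, target) pairs directly. The sketch below uses made-up rows in place of the CSV columns from the question, and relies on plain dicts keeping insertion order (Python 3.7+), so OrderedDict is not needed.

```python
import json
from collections import Counter

# Hypothetical (country, target) rows standing in for the two CSV columns.
rows = [
    ("Germany", "Mexico"),
    ("Germany", "USA"),
    ("Germany", "USA"),
    ("Brazil", "Argentina"),
]

# Count how often each (source, target) pair occurs.
pair_counts = Counter(rows)

links = [
    {"source": source, "target": target, "value": count}
    for (source, target), count in pair_counts.items()
]
print(json.dumps(links, indent=4))
```

Since `links` is a single list, json.dumps produces one valid JSON array however many countries there are.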
How to convert JSON data from input.json to output.json using Python? In general, what data structures are used for filtering JSON data?
File: input.json
[
{
"id":1,
"a":22,
"b":11
},
{
"id":1,
"e":44,
"c":77,
"f":55,
"d":66
},
{
"id":3,
"b":11,
"a":22
},
{
"id":3,
"d":44,
"c":88
}
]
File: output.json
[
{
"id":1,
"a":22,
"b":11,
"e":44,
"c":77,
"f":55,
"d":66
},
{
"id":3,
"b":11,
"a":22,
"d":44,
"c":88
}
]
Any pointers would be appreciated!
The idea is to:
use json.load() to load the JSON content from file to a Python list
regroup the data by the id, using collections.defaultdict and .update() method
use json.dump() to dump the result into the JSON file
Implementation:
import json
from collections import defaultdict
# read JSON data
with open("input.json") as input_file:
    old_data = json.load(input_file)
# regroup data
d = defaultdict(dict)
for item in old_data:
    d[item["id"]].update(item)
# write JSON data
with open("output.json", "w") as output_file:
    json.dump(list(d.values()), output_file, indent=4)
Now the output.json would contain:
[
{
"d": 66,
"e": 44,
"a": 22,
"b": 11,
"c": 77,
"id": 1,
"f": 55
},
{
"b": 11,
"id": 3,
"d": 44,
"c": 88,
"a": 22
}
]
from collections import defaultdict
input_list=[{"id":1, ...}, {...}]
result_dict=defaultdict(dict)
for d in input_list:
    result_dict[d['id']].update(d)
# wrap in list() so the result is a real list that json.dump can serialize
output_list=list(result_dict.values())
result_dict is a defaultdict that creates an empty dict whenever a key that does not exist yet is accessed. So we iterate through input_list and, for each dictionary, update the entry of result_dict whose key equals its id with that dictionary's values.
The output list is built from result_dict and contains only its values.
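The same regrouping also works with a plain dict and setdefault. The sketch below uses made-up records in which "b" appears twice for id 1, to show that update() lets the later value win. On Python 3.7+ dicts keep insertion order, so the merged keys come out in first-seen order.

```python
# Hypothetical records; "b" occurs twice for id 1 to show the overwrite behaviour.
records = [
    {"id": 1, "a": 22, "b": 11},
    {"id": 1, "e": 44, "b": 99},
    {"id": 3, "c": 88},
]

merged = {}
for record in records:
    # Create the entry for this id on first sight, then merge the record into it.
    merged.setdefault(record["id"], {}).update(record)

output_list = list(merged.values())
print(output_list)
```

If conflicting values should be reported instead of overwritten, compare each key before updating.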
Use the json module to work directly with the json data.