Python script to surgically replace one JSON field with another

At the moment I'm working with a large set of JSON files of the following form:
File00, at time T1:
{
    "AAA": {
        "BBB": {
            "000": "value0"
        },
        "CCC": {
            "111": "value1",
            "222": "value2",
            "333": "value3"
        },
        "DDD": {
            "444": "value4"
        }
    }
}
Now I have new input for the sub-field "DDD", and I'd like to replace its contents wholesale with the following:
"DDD": {
    "666": "value6",
    "007": "value13"
}
Accordingly the file would be changed to:
File00, at time T2:
{
    "AAA": {
        "BBB": {
            "000": "value0"
        },
        "CCC": {
            "111": "value1",
            "222": "value2",
            "333": "value3"
        },
        "DDD": {
            "666": "value6",
            "007": "value13"
        }
    }
}
There are many files similar to File00, so I'm trying to write a script that processes all the files in a particular directory, finds the JSON field "DDD", and replaces its contents with something new.
How to do this in Python?

Here are the steps I took for each file:
1. Read the JSON from the file
2. Parse it into a Python dict
3. Edit the dict
4. Serialize the dict back to JSON
5. Write it to the file
6. Repeat for the next file
Here is my code:
import json

# List of files to process.
fileList = ["file1.json", "file2.json", "file3.json"]

for jsonFile in fileList:
    # Open and read the file, then parse the JSON into a Python dictionary
    with open(jsonFile, "r") as f:
        jsonData = json.load(f)

    # Edit the Python dictionary
    jsonData["AAA"]["DDD"] = {"666": "value6", "007": "value13"}

    # Serialize the dictionary back to JSON and overwrite the file
    with open(jsonFile, "w") as f:
        json.dump(jsonData, f)
Also, I got code for iterating through a directory from here. You probably want something like this:
import os

fileList = []
directory = os.fsencode(directory_in_str)  # directory_in_str holds the directory path
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".json"):
        fileList.append(filename)
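Putting the two pieces together, here is a minimal sketch using pathlib instead of os.listdir; the function name and the replacement dict are placeholders for illustration:

```python
import json
from pathlib import Path

def replace_ddd(directory, new_ddd):
    """Replace the "DDD" sub-field in every .json file under directory."""
    for json_path in Path(directory).glob("*.json"):
        data = json.loads(json_path.read_text())
        data["AAA"]["DDD"] = new_ddd  # wholesale replacement of the sub-field
        json_path.write_text(json.dumps(data, indent=4))

# Example usage:
# replace_ddd("path/to/files", {"666": "value6", "007": "value13"})
```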

Related

How can I convert CSV to JSON the way I want

Hello, here is my problem:
I wrote the following to convert my CSV to JSON, but the result is not exactly what I want.
main.py
import csv

filename = "forcebrute.csv"
# Open the file using a "with" statement
with open(filename, 'r') as data:
    for line in csv.DictReader(data):
        print(line)
csv
name;price;profit
Action-1;20;5
Action-2;30;10
Action-3;50;15
Action-4;70;20
Action-5;60;17
The result I have:
{'name;price;profit': 'Action-1;20;5'}
{'name;price;profit': 'Action-2;30;10'}
{'name;price;profit': 'Action-3;50;15'}
{'name;price;profit': 'Action-4;70;20'}
{'name;price;profit': 'Action-5;60;17'}
And I would like this result:
You will need to specify the column delimiter; then you can use json.dumps() to produce the required output format:
import csv
import json

with open('forcebrute.csv') as data:
    print(json.dumps([d for d in csv.DictReader(data, delimiter=';')], indent=2))
Output:
[
  {
    "name": "Action-1",
    "price": "20",
    "profit": "5"
  },
  {
    "name": "Action-2",
    "price": "30",
    "profit": "10"
  },
  {
    "name": "Action-3",
    "price": "50",
    "profit": "15"
  },
  {
    "name": "Action-4",
    "price": "70",
    "profit": "20"
  },
  {
    "name": "Action-5",
    "price": "60",
    "profit": "17"
  }
]
You will need to use DictReader from the csv module to read the contents of the CSV file, then convert the contents to a list before using json.dumps to turn the data into JSON.
import csv
import json

filename = "forcebrute.csv"

# Open the CSV file and read the contents into a list of dictionaries
with open(filename, 'r') as f:
    reader = csv.DictReader(f, delimiter=';')
    csv_data = list(reader)

# Convert the data to a JSON string and print it to the console
json_data = json.dumps(csv_data)
print(json_data)
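Note that csv.DictReader yields every field as a string ("20", not 20). If numeric JSON values are wanted, the fields can be cast before dumping; a sketch assuming the price and profit columns from the sample file:

```python
import csv
import io
import json

# Inline stand-in for the contents of forcebrute.csv
sample = "name;price;profit\nAction-1;20;5\nAction-2;30;10\n"

rows = []
for row in csv.DictReader(io.StringIO(sample), delimiter=';'):
    row["price"] = int(row["price"])    # cast the numeric columns
    row["profit"] = int(row["profit"])
    rows.append(row)

print(json.dumps(rows, indent=2))  # prices and profits now appear as numbers
```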
An easy approach is to use pandas, which is also quite fast with large CSV files. It might need some tweaking, but you get the idea:
import pandas as pd
import json

df = pd.read_csv(filename, sep=';')
data = json.dumps(df.to_dict('records'))

Separate large JSON object into many different files

I have a JSON file with 10000 data entries like below in a file.
{
    "1": {
        "name": "0",
        "description": "",
        "image": ""
    },
    "2": {
        "name": "1",
        "description": "",
        "image": ""
    },
    ...
}
I need to write each entry in this object into its own file.
For example, the output of each file looks like this:
1.json
{
    "name": "",
    "description": "",
    "image": ""
}
I have the following code, but I'm not sure how to proceed from here. Can anyone help with this?
import json

with open('sample.json', 'r') as openfile:
    # Reading from the JSON file
    json_object = json.load(openfile)
You can use a for loop to iterate over all the fields in the outer object, and then create a new file for each inner object:
import json

with open('sample.json', 'r') as input_file:
    json_object = json.load(input_file)

for key, value in json_object.items():
    with open(f'{key}.json', 'w') as output_file:
        json.dump(value, output_file)
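One detail: the expected 1.json shown above is pretty-printed, while json.dump writes compact JSON by default. Passing indent makes the output match; a quick in-memory comparison:

```python
import json

entry = {"name": "", "description": "", "image": ""}

compact = json.dumps(entry)           # single line, as json.dump writes by default
pretty = json.dumps(entry, indent=4)  # each key on its own line, as in 1.json

print(compact)
print(pretty)
```

The same indent keyword works for json.dump, so `json.dump(value, output_file, indent=4)` produces files in the pretty-printed form.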

How do I maintain the same structure when reading from, modifying and writing back to a JSON file?

I am currently reading in a JSON file, adding a key and writing it back out to the same file using this procedure
import json

with open('data.json', 'r+') as f:
    data = json.load(f)
    temp_key = {"test": "val"}
    data["test"]["new_key"] = temp_key
    f.seek(0)  # <--- reset file position to the beginning
    json.dump(data, f, indent=2)
    f.truncate()  # remove any remaining part of the old content
(adopted from here)
The issue is that it does not maintain key order. For instance, if I read in:
{
    "test": {
        "something": "something_else"
    },
    "abc": {
        "what": "huh"
    }
}
the output turns out as:
{
    "abc": {
        "what": "huh"
    },
    "test": {
        "something": "something_else",
        "new_key": {
            "test": "val"
        }
    }
}
When I would like it to be:
{
    "test": {
        "something": "something_else",
        "new_key": {
            "test": "val"
        }
    },
    "abc": {
        "what": "huh"
    }
}
I realise that JSON is a key/value based structure and the order does not matter, but is there a way of making the modification and maintaining the original structure?
As I said in a comment, you can use a collections.OrderedDict along with the optional object_pairs_hook keyword argument accepted by json.load() (in Python 2.7) to preserve the order of the original data when you rewrite the file.
This is what I meant:
#!/usr/bin/env python2
from collections import OrderedDict
import json

with open('ordered_data.json', 'r+') as f:
    data = json.load(f, object_pairs_hook=OrderedDict)
    temp_key = {"test": "val"}
    data["test"]["new_key"] = temp_key
    f.seek(0)  # Reset file position to the beginning.
    json.dump(data, f, indent=2)
    f.truncate()  # Remove remaining part.
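Worth noting: in Python 3.7+ plain dicts preserve insertion order, and json.load builds plain dicts, so the round-trip keeps the original key order without OrderedDict. A quick in-memory check:

```python
import json

original = '{"test": {"something": "something_else"}, "abc": {"what": "huh"}}'

data = json.loads(original)
data["test"]["new_key"] = {"test": "val"}

# Top-level key order is preserved: "test" still comes before "abc".
print(list(data))  # ['test', 'abc']
```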

Python JSON add Key-Value pair

I'm trying to add key-value pairs to an existing JSON file. I am able to add a key at the parent level; how do I add a value to each of the child items?
JSON file:
{
    "students": [
        {
            "name": "Hendrick"
        },
        {
            "name": "Mikey"
        }
    ]
}
Code:
import json

with open("input.json") as json_file:
    json_decoded = json.load(json_file)

json_decoded['country'] = 'UK'

with open("output.json", 'w') as json_file:
    for d in json_decoded[students]:
        json.dump(json_decoded, json_file)
Expected Results:
{
    "students": [
        {
            "name": "Hendrick",
            "country": "UK"
        },
        {
            "name": "Mikey",
            "country": "UK"
        }
    ]
}
You can do the following in order to manipulate the dict the way you want:
for s in json_decoded['students']:
    s['country'] = 'UK'
json_decoded['students'] is a list of dictionaries that you can simply iterate and update in a loop. Now you can dump the entire object:
with open("output.json", 'w') as json_file:
    json.dump(json_decoded, json_file)
import json

with open("input.json", 'r') as json_file:
    json_decoded = json.load(json_file)
    for element in json_decoded['students']:
        element['country'] = 'UK'
    with open("output.json", 'w') as json_out_file:
        json.dump(json_decoded, json_out_file)
This code:
1. opens a JSON file, i.e. input.json
2. iterates through each of its elements
3. adds a key named "country" with the value "UK" to each element
4. opens a new JSON file and writes the modified JSON to it.
Edit:
Moved the writing of the output file inside the first with block. The issue with the earlier implementation is that json_decoded is never assigned if opening input.json fails, so writing the output would raise NameError: name 'json_decoded' is not defined.
This gives [None, None] but updates the dict in place:
a = {'students': [{'name': 'Hendrick'}, {'name': 'Mikey'}]}
[i.update({'country': 'UK'}) for i in a['students']]
print(a)

In Python, how to parse a complex JSON file

I want to get "path" from the JSON file below. I used json.load to read the file and then parsed it level by level with for key, value in data.items(), which leads to a lot of nested for loops (six, say) just to reach the value of "path". Is there a simpler way to retrieve it?
The complete json file can be found here and below is the snippet of it.
{
    "products": {
        "com.ubuntu.juju:12.04:amd64": {
            "version": "2.0.1",
            "arch": "amd64",
            "versions": {
                "20161129": {
                    "items": {
                        "2.0.1-precise-amd64": {
                            "release": "precise",
                            "version": "2.0.1",
                            "arch": "amd64",
                            "size": 23525972,
                            "path": "released/juju-2.0.1-precise-amd64.tgz",
                            "ftype": "tar.gz",
                            "sha256": "f548ac7b2a81d15f066674365657d3681e3d46bf797263c02e883335d24b5cda"
                        }
                    }
                }
            }
        },
        "com.ubuntu.juju:14.04:amd64": {
            "version": "2.0.1",
            "arch": "amd64",
            "versions": {
                "20161129": {
                    "items": {
                        "2.0.1-trusty-amd64": {
                            "release": "trusty",
                            "version": "2.0.1",
                            "arch": "amd64",
                            "size": 23526508,
                            "path": "released/juju-2.0.1-trusty-amd64.tgz",
                            "ftype": "tar.gz",
                            "sha256": "7b86875234477e7a59813bc2076a7c1b5f1d693b8e1f2691cca6643a2b0dc0a2"
                        }
                    }
                }
            }
        },
You can use a recursive generator:
def get_paths(data):
    if 'path' in data:
        yield data['path']
    for k in data.keys():
        if isinstance(data[k], dict):
            for i in get_paths(data[k]):
                yield i

for path in get_paths(json_data):  # json_data is the loaded JSON
    print(path)
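For instance, run against a trimmed-down version of the sample data (`yield from` is just the Python 3 shorthand for the inner loop):

```python
# Same generator as above, written with Python 3's `yield from`.
def get_paths(data):
    if 'path' in data:
        yield data['path']
    for k in data:
        if isinstance(data[k], dict):
            yield from get_paths(data[k])

# Trimmed-down stand-in for the loaded JSON.
sample = {
    "products": {
        "com.ubuntu.juju:12.04:amd64": {
            "versions": {"20161129": {"items": {
                "2.0.1-precise-amd64": {"path": "released/juju-2.0.1-precise-amd64.tgz"}
            }}}
        }
    }
}

print(list(get_paths(sample)))  # ['released/juju-2.0.1-precise-amd64.tgz']
```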
Is the path key always at the same depth in the loaded JSON (which is a dict)? If so, what about:
products = loaded_json['products']
for product in products.items():
    print product[1].items()[2][1].items()[0][1].items()[0][1].items()[0][1]['path']
If not, the answer of Yevhen Kuzmovych is clearly better, cleaner and more general than mine.
If you only care about path, I think using a full JSON parser is overkill; you can just use the built-in re module with a pattern like (\"path\":\s*\")(.*\s*)(?=\",). I didn't test it against the whole file, but you should be able to work out the best pattern fairly easily.
If you only need the file names present in the path field, you can easily get them by simply scanning the file line by line:
import re

files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')

with open('file.json') as fd:
    for line in fd:
        if "path" in line:
            m = pathre.match(line)
            if m is not None:
                files.append(m.group(1))
If you need to process simultaneously the path and sha256 fields:
files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')
share = re.compile(r'\s*"sha256"\s*:\s*"(.*?)"')
path = None

with open('file.json') as fd:
    for line in fd:
        if "path" in line:
            m = pathre.match(line)
            path = m.group(1)
        elif "sha256" in line:
            m = share.match(line)
            if path is not None:
                files.append((path, m.group(1)))
            path = None
You can use a query language like JSONPath. Here you find the Python implementation: https://pypi.python.org/pypi/jsonpath-rw
Assuming you have your JSON content already loaded, you can do something like the following:
from jsonpath_rw import jsonpath, parse

# Load your JSON content first from a file or from a string
# json_data = ...

jsonpath_expr = parse('products..path')
for match in jsonpath_expr.find(json_data):
    print(match.value)
For a further discussion you can read this: Is there a query language for JSON?
