I want to replace values in JSON files with values from a YAML file using Python 3.
I have many JSON files, and I want to take values from a master YAML file and replace only certain fields, like source server IP, email, and hostname.
E.g. I have this YAML file (mast_conf.yaml):
- sourcesystem1:
    sourceServer: 1.2.3.500
    MailTo: gokul#gmail.com
- sourcesystem2:
    sourceServer1: 2.2.3.500
    sourceServer2: 3.2.3.500
    MailTo: gokul#gmail.com
A JSON file (sourcesystem1.json):
{
    "source": "sourcesystem1",
    "frequency": "daily",
    "sourceServer": "1.2.1.2",
    "hostName": "1.2.1.3",
    "fileFormat": "csv",
    "delimiterType": "semicolon"
}
Another JSON file (sourcesystem2.json):
{
    "source": "sourcesystem2",
    "frequency": "daily",
    "sourceServer": "1.2.3.2",
    "hostName": "1.2.1.7",
    "fileFormat": "csv",
    "delimiterType": "commaseperated"
}
Below is the code I am trying, to parse the values from the JSON files:
import json
import yaml

with open("master_conf.yaml", 'r') as f:
    yaml_config = yaml.safe_load(f)

yaml_config = {
    list(config.keys()[0]): list(config[config.keys()[0]])
    for config in yaml_config
}

json_files = (
    "sourcesystem1.json",
    "sourcesystem2.json",
)

for json_file in json_files:
    with open(json_file, "r") as f:
        sourcesystem_conf = json.load(f)
    sourcesystem = sourcesystem_conf["source"]
    if sourcesystem in yaml_config:
        for key, value in yaml_config[sourcesystem].items():
            sourcesystem_conf[key] = value
        with open(json_file, "w") as f:
            json.dump(sourcesystem_conf, f, indent=2)
I am getting the below error from the program:

TypeError: 'dict_keys' object does not support indexing

When I run it individually, I get this issue for the YAML part:

>>> yaml_config = {
...     config.keys()[0]: config[config.keys()[0]]
...     for config in yaml_config
... }
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "<stdin>", line 3, in <dictcomp>
TypeError: 'dict_keys' object is not subscriptable
>>>
Is there an easier method to achieve my end goal, where I want to replace the values in the JSON files from the YAML configuration file?
This is needed to update thousands of JSON files in an automated way from a master YAML file.
The easiest way is to use pyyaml, see Jon's answer.
Then you can load your YAML file with it:
>>> import yaml
>>> yaml_config = yaml.safe_load(yaml_file)
>>> yaml_config
[{'sourcesystem1': {'MailTo': 'gokul#gmail.com', 'sourceServer': '1.2.3.500'}},
{'sourcesystem2': {'MailTo': 'gokul#gmail.com',
'sourceServer1': '2.2.3.500',
'sourceServer2': '3.2.3.500'}}]
It will be easier to manipulate a dict with source systems as keys.
In Python 2, aDict.keys() returns a list, so the following will work:

>>> yaml_config = {
...     config.keys()[0]: config[config.keys()[0]]
...     for config in yaml_config
... }
>>> yaml_config
{'sourcesystem1': {'MailTo': 'gokul#gmail.com', 'sourceServer': '1.2.3.500'},
'sourcesystem2': {'MailTo': 'gokul#gmail.com',
'sourceServer1': '2.2.3.500',
'sourceServer2': '3.2.3.500'}}
In Python 3, aDict.keys() no longer returns a list, so you can simply use a for loop:

yaml_config = {}
for config in yaml_config_raw:
    source = [key for key in config][0]
    yaml_config[source] = config[source]

Then you can just iterate over your JSON files to update them:
import json
import yaml

with open("mast_conf.yaml", 'r') as f:
    yaml_config_raw = yaml.safe_load(f)

yaml_config = {}
for config in yaml_config_raw:
    source = [key for key in config][0]
    yaml_config[source] = config[source]

json_files = (
    "sourcesystem1.json",
    "sourcesystem2.json",
)

for json_file in json_files:
    with open(json_file, "r") as f:
        sourcesystem_conf = json.load(f)
    sourcesystem = sourcesystem_conf["source"]
    if sourcesystem in yaml_config:
        for key, value in yaml_config[sourcesystem].items():
            sourcesystem_conf[key] = value
        with open(json_file, "w") as f:
            json.dump(sourcesystem_conf, f, indent=2)
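Since the question mentions updating thousands of JSON files, here is a hedged sketch of the same approach that discovers the files with glob instead of hard-coding a tuple. The flatten helper and the *.json pattern are my own additions; the "source" key and the mast_conf.yaml name are taken from the question:

```python
import glob
import json


def flatten(yaml_config_raw):
    """Turn the list of single-key dicts loaded from the YAML file
    into one {source_system: settings} dict."""
    return {source: settings
            for config in yaml_config_raw
            for source, settings in config.items()}


def update_all(yaml_path="mast_conf.yaml", pattern="*.json"):
    """Apply the master YAML settings to every matching JSON file."""
    import yaml  # third-party: pip install pyyaml

    with open(yaml_path) as f:
        yaml_config = flatten(yaml.safe_load(f))
    for json_file in glob.glob(pattern):
        with open(json_file) as f:
            conf = json.load(f)
        if conf.get("source") in yaml_config:
            conf.update(yaml_config[conf["source"]])
            with open(json_file, "w") as f:
                json.dump(conf, f, indent=2)
```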
Related
I have a CSV file with 4 columns (A, B, C, D), like this:

A,B,C,D
Example english text,Translated text,Context,Max Length
Example english text 2,Translated text 2,Context 2,Max Length 2
Example english text 3,Translated text 3,Context 3,Max Length 3
And I need code that transforms that into this JSON file:

{
    "Context": "Translated text",
    "Context 2": "Translated text 2",
    "Context 3": "Translated text 3"
}
I tried this:
import csv
import json

def make_json(csvFilePath, jsonFilePath):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for rows in csvReader:
            key = rows["C"]
            data[key] = rows
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))

csvFilePath = "main.csv"
jsonFilePath = "test.json"
make_json(csvFilePath, jsonFilePath)
But there's an error, and I'm not sure this is the best method.
How can I fix this?
The Error:
D:\Python\CSV to JSON>py csvtojson.py
Traceback (most recent call last):
  File "D:\Python\CSV to JSON\csvtojson.py", line 25, in <module>
    make_json(csvFilePath, jsonFilePath)
  File "D:\Python\CSV to JSON\csvtojson.py", line 14, in make_json
    key = rows["C"]
KeyError: 'C'
Thank you for the help!
This should do it:

import csv
import json

with open('data.csv', 'r') as f:
    csv_data = csv.reader(f)
    next(csv_data)  # skip the A,B,C,D header row
    json_data = {data[2]: data[1] for data in csv_data}

with open('data.json', 'w') as f:
    json.dump(json_data, f, indent=4)
You can use Pandas for it; it will let you do it with very few lines of code.
import pandas as pd
# Read the csv file into Python with help from Pandas
df = pd.read_csv('csv-file.csv')
# Work with your json data in Python
json_data = df.to_json(orient = 'records')
# Save your data into new json file
df.to_json("path/example.json", orient = 'records')
Load a CSV file into Python, convert it to JSON, and you are good to go! :)
If you don't have pandas, just pip install it:

pip install pandas
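For reference, the KeyError in the original code means the header row read by DictReader did not literally contain a column named "C"; a UTF-8 byte-order mark is a common cause. Here is a hedged sketch of the DictReader route producing exactly the JSON shape asked for; the utf-8-sig encoding (which strips a BOM) and the helper names are my own assumptions, while main.csv and the A..D headers come from the question:

```python
import csv
import json


def csv_to_mapping(csv_path):
    """Build the {C-column: B-column} mapping from the CSV.
    utf-8-sig strips a leading BOM that would otherwise turn the
    first header into '\ufeffA' and break key lookups."""
    with open(csv_path, encoding='utf-8-sig', newline='') as f:
        reader = csv.DictReader(f)  # header row supplies the keys A..D
        return {row["C"]: row["B"] for row in reader}


def write_json(mapping, json_path):
    """Write the mapping as pretty-printed JSON, keeping non-ASCII text."""
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(mapping, f, indent=4, ensure_ascii=False)
```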
I need to convert JSON to CSV. I just want to extract some keys from the file, but some keys do not exist in the JSON; I would like the missing keys to be filled in automatically.
import csv
import json
import sys
import codecs

def trans(path):
    jsonData = codecs.open('C:/Users/jeri/Desktop/1.json', 'r', 'utf-8')
    # csvfile = open(path + '.csv', 'w')
    # csvfile = open(path + '.csv', 'wb')
    csvfile = open('C:/Users/jeri/Desktop/1.csv', 'w', newline='', encoding='utf-8')
    writer = csv.writer(csvfile, delimiter=',')
    keys = ['dob', 'firstname', 'lastname']
    writer.writerow(keys)
    for line in jsonData:
        dic = json.loads(line)
        writer.writerow([dic['dob'], dic['firstname'], dic['lastname']])
    jsonData.close()
    csvfile.close()

if __name__ == '__main__':
    path = str(sys.argv[0])
    print(path)
    trans(path)
Console output:

Traceback (most recent call last):
  File "C:\Users\jeri\PycharmProjects\pythonProject9\main.py", line 25, in <module>
    trans(path)
  File "C:\Users\jeri\PycharmProjects\pythonProject9\main.py", line 17, in trans
    writer.writerow([dic['dob'], dic['firstname'], dic['lastname']])
KeyError: 'dob'
If the key 'dob' might be missing, instead of dic['dob'], do dic.get('dob', None). That provides the default you want.
I think this would solve your problem.
(I defined a function to test the existence of each item in the JSON; if it exists it returns the value, and if it doesn't it returns 'N/A'.)
def getValue(dic, item):
    try:
        return dic[item]
    except KeyError:
        return 'N/A'

for line in jsonData:
    dic = json.loads(line)
    writer.writerow([getValue(dic, 'dob'), getValue(dic, 'firstname'), getValue(dic, 'lastname')])
You can transform your for loop into something like this:

for line in jsonData:
    dic = json.loads(line)
    dob = dic['dob'] if "dob" in dic else None
    firstname = dic['firstname'] if "firstname" in dic else None
    lastname = dic['lastname'] if "lastname" in dic else None
    writer.writerow([dob, firstname, lastname])
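Both answers above can be collapsed with dict.get, which returns a default when a key is absent. A minimal sketch, where the key list and the 'N/A' default come from the question and the row_for helper name is my own:

```python
import json

KEYS = ['dob', 'firstname', 'lastname']


def row_for(line, keys=KEYS, default='N/A'):
    """Parse one JSON line and return the values for `keys`,
    substituting `default` for any key that is missing."""
    dic = json.loads(line)
    return [dic.get(k, default) for k in keys]
```

Each writer.writerow call then becomes writer.writerow(row_for(line)).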
I have a bunch of JSON files with multiple lines that look like this:
file1
{"id":1,"name":"Eric","height":1.80, ...},
{"id":2,"name":"Bob","height":1.90, ...}
...
file2
{"id":3,"name":"Jenny","height":1.50, ...},
{"id":4,"name":"Marlene","height":1.60, ...}
...
I want to build a generator to yield each line as a dictionary. My current code:
from typing import Iterator, Dict, Any, Optional
import io
import os

def json_gen2(file_list: list) -> Iterator[Dict[str, Any]]:
    import json
    for file in file_list:
        with open(file) as json_file:
            data = []
            for line in json_file:
                data = json.load(line)
                if not data:
                    break
                yield data

datapath = os.path.normcase(os.getcwd()) + '/data/log_data'
file_list = get_files(datapath)  # create path list of json files
jsonfile = json_gen2(file_list)
next(jsonfile)
I get the following:

Error Message

Please help :)
Oops, I misread. You are doing the same thing I was saying. Your error is due to using 'load' instead of 'loads'. Each line returned by
for line in json_file:
    data = json.load(line)
is a string, and you're attempting to read it as a file pointer.
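A corrected version of the generator might look like this. It assumes one JSON object per line (as in the files shown) and strips the trailing commas visible in the sample, which is my own guess about the file format:

```python
import json
from typing import Any, Dict, Iterator, List


def json_gen2(file_list: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield each line of each file as a dict. json.loads parses a
    string; json.load expects a file object, hence the original error."""
    for path in file_list:
        with open(path) as json_file:
            for line in json_file:
                line = line.strip().rstrip(',')  # tolerate trailing commas
                if line:
                    yield json.loads(line)
```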
I am trying to create a JSON file with the same name as an XML file and dump the XML data into it using Python.
import os
import json
import xmltodict

# Reading files from the directory
with os.scandir('C:/jsonfile/') as entries:
    for entry in entries:
        name = entry.name
        print(name)
        base = os.path.splitext(name)[0]  # Getting the name of the file
        f = open("C:/jjsonfile/" + base + ".json", "w+")  # Creating the JSON file
        with open("C:/jsonfile/" + name, 'r') as f:
            xmlString = f.read()
        jsonString = json.dumps(xmltodict.parse(xmlString), indent=4)
        with open(f, 'w') as f:  # Loading data into the JSON file.
            f.write(jsonString)
1019586313.xml
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
     13         xmlString = f.read()
     14         jsonString = json.dumps(xmltodict.parse(xmlString), indent=4)
---> 15         with open(f, 'w') as f:
     16             f.write(jsonString)
     17
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
When using open to create the JSON file, you need to pass it the path of the file you want to create. What you are doing is passing it the object f:

with open(f, 'w') as f:

which is an _io.TextIOWrapper object that you created with:

with open("C:/jsonfile/" + name, 'r') as f:

So in order to fix this, you need to pass it the name of the JSON file you want to create. Use this:

with open("C:/jjsonfile/" + base + ".json", "w+") as f:

instead of this:

with open(f, 'w') as f:

and remove this line:

f = open("C:/jjsonfile/" + base + ".json", "w+")
import os
import json
import xmltodict

with os.scandir('C:/ARP_project/') as entries:
    for entry in entries:
        name = entry.name
        print(name)
        base = os.path.splitext(name)[0]
        jsname = "C:/ARP_Json/" + base + ".json"  # variable for the JSON file name
        with open("C:/ARP_project/" + name, 'r') as f:
            xmlString = f.read()
        jsonString = json.dumps(xmltodict.parse(xmlString), indent=4)
        with open(jsname, 'w') as f:
            f.write(jsonString)
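The same loop can be written with pathlib, which sidesteps both the string concatenation and the stray file handle. This is a sketch, with the directory names taken from the snippet above and xmltodict assumed to be installed (pip install xmltodict):

```python
import json
from pathlib import Path


def json_name(xml_path, dst):
    """Target path: same base name as the XML file, .json extension, in dst."""
    return Path(dst) / (Path(xml_path).stem + '.json')


def convert_dir(src='C:/ARP_project', dst='C:/ARP_Json'):
    """Convert every *.xml file in src to a pretty-printed JSON file in dst."""
    import xmltodict  # third-party: pip install xmltodict
    for xml_path in Path(src).glob('*.xml'):
        data = xmltodict.parse(xml_path.read_text(encoding='utf-8'))
        json_name(xml_path, dst).write_text(
            json.dumps(data, indent=4), encoding='utf-8')
```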
I am interested in data mining and I would like to work with Yelp's data. Yelp's data is in JSON format, and its website provides the following code to convert JSON to CSV. However, when I open a command line and write the following:

$ python json_to_csv_converter.py yelp_academic_dataset.json

I get an error. Can you help me, please?
The code is:
# -*- coding: utf-8 -*-
"""Convert the Yelp Dataset Challenge dataset from json format to csv.

For more information on the Yelp Dataset Challenge please visit http://yelp.com/dataset_challenge
"""
import argparse
import collections
import csv
import simplejson as json


def read_and_write_file(json_file_path, csv_file_path, column_names):
    """Read in the json dataset file and write it out to a csv file, given the column names."""
    with open(csv_file_path, 'wb+') as fout:
        csv_file = csv.writer(fout)
        csv_file.writerow(list(column_names))
        with open(json_file_path) as fin:
            for line in fin:
                line_contents = json.loads(line)
                csv_file.writerow(get_row(line_contents, column_names))


def get_superset_of_column_names_from_file(json_file_path):
    """Read in the json dataset file and return the superset of column names."""
    column_names = set()
    with open(json_file_path) as fin:
        for line in fin:
            line_contents = json.loads(line)
            column_names.update(
                set(get_column_names(line_contents).keys())
            )
    return column_names


def get_column_names(line_contents, parent_key=''):
    """Return a list of flattened key names given a dict.

    Example:
        line_contents = {
            'a': {
                'b': 2,
                'c': 3,
            },
        }
    will return: ['a.b', 'a.c']

    These will be the column names for the eventual csv file.
    """
    column_names = []
    for k, v in line_contents.iteritems():
        column_name = "{0}.{1}".format(parent_key, k) if parent_key else k
        if isinstance(v, collections.MutableMapping):
            column_names.extend(
                get_column_names(v, column_name).items()
            )
        else:
            column_names.append((column_name, v))
    return dict(column_names)


def get_nested_value(d, key):
    """Return a dictionary item given a dictionary `d` and a flattened key from `get_column_names`.

    Example:
        d = {
            'a': {
                'b': 2,
                'c': 3,
            },
        }
        key = 'a.b'
    will return: 2
    """
    if '.' not in key:
        if key not in d:
            return None
        return d[key]
    base_key, sub_key = key.split('.', 1)
    if base_key not in d:
        return None
    sub_dict = d[base_key]
    return get_nested_value(sub_dict, sub_key)


def get_row(line_contents, column_names):
    """Return a csv compatible row given column names and a dict."""
    row = []
    for column_name in column_names:
        line_value = get_nested_value(
            line_contents,
            column_name,
        )
        if isinstance(line_value, unicode):
            row.append('{0}'.format(line_value.encode('utf-8')))
        elif line_value is not None:
            row.append('{0}'.format(line_value))
        else:
            row.append('')
    return row


if __name__ == '__main__':
    """Convert a yelp dataset file from json to csv."""
    parser = argparse.ArgumentParser(
        description='Convert Yelp Dataset Challenge data from JSON format to CSV.',
    )
    parser.add_argument(
        'json_file',
        type=str,
        help='The json file to convert.',
    )
    args = parser.parse_args()
    json_file = args.json_file
    csv_file = '{0}.csv'.format(json_file.split('.json')[0])
    column_names = get_superset_of_column_names_from_file(json_file)
    read_and_write_file(json_file, csv_file, column_names)
Error I am getting in the command line:
Traceback (most recent call last):
  File "json_to_csv_converter.py", line 122, in <module>
    column_names = get_superset_of_column_names_from_file(json_file)
  File "json_to_csv_converter.py", line 25, in get_superset_of_column_names_from_file
    for line in fin:
  File "C:\Users\Bengi\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input, self.errors, decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1102: character maps to <undefined>
You have a file encoding problem. You should pass encoding='utf8' when opening the JSON file, like this:

with open(json_file_path, encoding='utf8') as fin:
Judging by the error message, there appears to be something wrong with your input file. It looks like json_to_csv_converter.py has determined that the file encoding is Windows-1252, but there are one or more invalid characters in the file, namely '\x9d', which is not a valid 1252 code point.
Check that your file is properly encoded. I'd guess that the file is UTF-8 encoded, but for some reason it is being processed as if it were Windows-1252. Did you edit the file?
WinZip seems to mangle it somehow. I worked around this by:
Using 7-Zip to extract the tar file.
Editing the script to force use of UTF-8 encoding, like so:
with open(json_file_path, encoding='utf8') as fin:
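Putting the workaround together, a minimal sketch of reading the dataset line by line with an explicit UTF-8 encoding; errors='replace' is my own addition so a single bad byte cannot abort the whole run:

```python
import json


def iter_records(path):
    """Yield each line of a JSON-lines dataset as a dict, decoding as
    UTF-8 and replacing (rather than raising on) undecodable bytes."""
    with open(path, encoding='utf8', errors='replace') as fin:
        for line in fin:
            if line.strip():
                yield json.loads(line)
```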