Converting a Text file to JSON format using Python - python

I am not new to programming, but I am not good with Python data structures. I would like to know how to convert a text file into JSON format using Python, since I have heard the task is much easier with Python's json module (import json).
The file looks like
Source Target Value
B cells Streptococcus pneumoniae 226
B cells Candida albicans 136
B cells Mycoplasma 120
For the first data line, "B cells" is the source, "Streptococcus pneumoniae" is the target, and "226" is the value. I have started on the code but could not finish it. Please help.
import json
prot2_names = {}
tmpfil = open("file.txt", "r");
for lin in tmpfil.readlines():
    flds = lin.rstrip().split("\t")
    prot2_names[flds[0]] = "\"" + flds[1] + "\""
    print prot2_names+"\t",
tmpfil.close()
I want the output to look like this:
{
"nodes": [
{
"name": "B cells"
},
{
"name": "Streptococcus pneumoniae"
},
{
"name": "Candida albicans"
},
{
"name": "Mycoplasma"
}
],
"links": [
{
"source": 0,
"target": 1,
"value": "226"
},
{
"source": 0,
"target": 2,
"value": "136"
},
{
"source": 0,
"target": 3,
"value": "120"
}
]
}

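You can read it as a CSV file and convert it to JSON. Be careful, though: you are using a space as the separator, so values that themselves contain spaces need special handling. If possible, use a comma as the separator instead of a space.
Here is code for what you are trying to do. Note that it only re-joins the two words of the Source column, so in the output below a Target containing spaces still gets split across the Target and Value fields: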
import csv
import json
with open('file.txt', newline='') as csvfile:
    filereader = csv.reader(csvfile, delimiter=' ')
    header = []
    out_data = []
    for i, row in enumerate(filereader):
        # drop empty fields produced by repeated spaces
        row = [elem for elem in row if elem]
        if i == 0:
            header = row
        else:
            # "B cells" was split in two; join the first two fields back together
            row[0:2] = [row[0] + " " + row[1]]
            _dict = {}
            for elem, header_elem in zip(row, header):
                _dict[header_elem] = elem
            out_data.append(_dict)
print(json.dumps(out_data))
Output:
[
{
"Source":"B cells",
"Target":"Streptococcus",
"Value":"pneumoniae"
},
{
"Source":"B cells",
"Target":"Candida",
"Value":"albicans"
},
{
"Source":"B cells",
"Target":"Mycoplasma",
"Value":"120"
},
{
"Source":"B cells",
"Target":"Neisseria",
"Value":"111"
},
{
"Source":"B cells",
"Target":"Pseudomonas",
"Value":"aeruginosa"
}
]
UPDATE: I just noticed your updated question with the JSON sample you require. I hope you can build it from the example above.
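For the nodes/links layout requested in the question, a minimal sketch could look like the following. It assumes the input is tab-separated (as the split("\t") in the question's own attempt suggests), which avoids the problem of names that contain spaces. The function name rows_to_graph and the in-memory sample lines are made up for illustration.

```python
import json


def rows_to_graph(lines, delimiter="\t"):
    """Build a {"nodes": [...], "links": [...]} dict from delimited text lines."""
    index_of = {}  # node name -> position in the "nodes" list
    nodes = []
    links = []

    def node_index(name):
        # Give each distinct name the next free index, in order of first appearance.
        if name not in index_of:
            index_of[name] = len(nodes)
            nodes.append({"name": name})
        return index_of[name]

    rows = iter(lines)
    next(rows, None)  # skip the "Source Target Value" header line
    for line in rows:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        source, target, value = line.split(delimiter)
        links.append({
            "source": node_index(source),
            "target": node_index(target),
            "value": value,  # kept as a string, as in the desired output
        })
    return {"nodes": nodes, "links": links}


# Sample data (assumed tab-separated) standing in for file.txt.
sample = [
    "Source\tTarget\tValue\n",
    "B cells\tStreptococcus pneumoniae\t226\n",
    "B cells\tCandida albicans\t136\n",
    "B cells\tMycoplasma\t120\n",
]
graph = rows_to_graph(sample)
print(json.dumps(graph, indent=4))
```

With a real file, the same function can be fed an open file object, for example `with open("file.txt") as f: graph = rows_to_graph(f)`.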

Related

generate JSON file content

I am a beginner with JSON.
I have the following in my JSON file (file-1):
{
"Aps": [
{
"type": "opo",
"price_min": 7,
"price_max":10,
"app_time": 0,
"arguments": {
"prices": 15,
"apps_num": "112"
},
"attributes": {
"name":"user1",
"priority":"100"
}
}
]
}
How can I write Python code that generates another JSON file with the same content as file-1, but duplicated 100 times, where each copy has a different user name (user2, user3, ... user100) and a different priority?
I have tried the following, but it is not working:
for lp in range(100):
    with open("sample.json", "w") as outfile:
        outfile.write(json_object)
The required output is as follows:
{
"Aps": [
{
"type": "opo",
"price_min": 7,
"price_max":10,
"app_time": 0,
"arguments": {
"prices": 15,
"apps_num": "112"
},
"attributes": {
"name":"user1",
"priority":"100"
}
},
{
"type": "opo",
"price_min": 7,
"price_max":10,
"app_time": 0,
"arguments": {
"prices": 15,
"apps_num": "112"
},
"attributes": {
"name":"user2",
"priority":"90"
}
},
{
"type": "opo",
"price_min": 7,
"price_max":10,
"app_time": 0,
"arguments": {
"prices": 15,
"apps_num": "112"
},
"attributes": {
"name":"user3",
"priority":"80"
}
},
..............
]
}
I wrote a small script using the json and copy modules:
json for reading and writing JSON files;
copy because I ran into trouble with references (see the documentation for copy): without a deep copy, changing the 'temp' dict would affect every occurrence of it in the 'file' dict.
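import json
import copy
repeats = 99  # user1 is already in the file, so add user2 .. user100
with open('file.json', 'r') as infile:
    file = json.load(infile)
temp1 = file['Aps'][0]
for repeat in range(repeats):
    temp = copy.deepcopy(temp1)  # independent copy of the first entry
    temp['attributes']['name'] = f"user{repeat + 2}"
    # priority drops by 10 per user, as in the question: 90 for user2, 80 for user3, ...
    # (following the pattern literally, it goes negative after user11)
    temp['attributes']['priority'] = f"{90 - repeat * 10}"
    file['Aps'].append(temp)
with open('file1.json', 'w') as outfile:
    json.dump(file, outfile, indent=4)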
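The reference problem mentioned above can be seen in a short sketch. It uses a cut-down entry with only the attributes field; the variable names are made up for illustration.

```python
import copy

first = {"attributes": {"name": "user1", "priority": "100"}}

# Appending the same object again stores a second reference, not a copy.
aliased = [first]
aliased.append(aliased[0])
aliased[1]["attributes"]["name"] = "user2"
names_aliased = [entry["attributes"]["name"] for entry in aliased]

# copy.deepcopy() duplicates the nested dicts, so each entry can be edited on its own.
original = {"attributes": {"name": "user1", "priority": "100"}}
independent = [original, copy.deepcopy(original)]
independent[1]["attributes"]["name"] = "user2"
names_independent = [entry["attributes"]["name"] for entry in independent]

print(names_aliased)      # both entries show the same name
print(names_independent)  # each entry keeps its own name
```

A shallow copy (dict.copy()) would not be enough here either, because the nested attributes dict would still be shared between entries.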
You should first convert your JSON file to a Python object (a dict):
import json
with open('sample.json') as file:
    data = json.load(file)
Now you can work with your Aps list, for example by appending the first object 100 times:
for dups in range(100):
    data['Aps'].append(data['Aps'][0])
Then save your dict to a JSON file again:
with open("sample.json", "w") as outputfile:
    json.dump(data, outputfile)

Json nested encryption value - Python

I have a JSON output file and I am trying to encrypt (hash) the value of a key (name) in it using SHA-256. There are two occurrences of name in a list of dicts, but every time I write the file, only one change shows up. Can anybody tell me what I am missing?
Json structure:
Output.json
{
"site": [
{
"name": "google",
"description": "Hi I am google"
},
{
"name": "microsoft",
"description": "Hi, I am microsoft"
}
],
"veg": [
{
"status": "ok",
"slot": null
},
{
"status": "ok"
}
]
}
Code:
import hashlib
import json
class test():
    def __init__(self):
        pass
    def encrypt(self):
        with open("Output.json", "r+") as json_file:
            res = json.load(json_file)
            for i in res['site']:
                for key,val in i.iteritems():
                    if 'name' in key:
                        hs = hashlib.sha256(val.encode('utf-8')).hexdigest()
                        res['site'][0]['name'] = hs
                        json_file.seek(0)
                        json_file.write(json.dumps(res,indent=4))
                        json_file.truncate()
Current Output.json
{
"site": [
{
"name": "bbdefa2950f49882f295b1285d4fa9dec45fc4144bfb07ee6acc68762d12c2e3",
"description": "Hi I am google"
},
{
"name": "microsoft",
"description": "Hi, I am microsoft"
}
],
"veg": [
{
"status": "ok",
"slot": null
},
{
"status": "ok"
}
]
}
I think your problem is in this line:
res['site'][0]['name'] = hs
you are always changing the name field of the first map in the site list. I think you want this to be:
i['name'] = hs
so that you are updating the map you are currently working on (pointed to by i).
Instead of iterating over each item in the dictionary, you could make use of the fact that dictionaries are made for looking up values by key, and do this:
if 'name' in i:
    val = i['name']
    hs = hashlib.sha256(val.encode('utf-8')).hexdigest()
    i['name'] = hs
    json_file.seek(0)
    json_file.write(json.dumps(res, indent=4))
    json_file.truncate()
instead of this:
for key,val in i.iteritems():
    if 'name' in key:
        ...
Also, iteritems() should be items(), and if 'name' in key should be if key == 'name', as key is a string. As it is, you'd be matching any entry with a key name containing the substring 'name'.
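The difference between the substring test and the equality test can be seen in a small sketch; the list of keys is made up for illustration.

```python
keys = ["name", "username", "filename", "description"]

# "name" in key is a substring test when key is a string.
substring_matches = [key for key in keys if "name" in key]

# key == "name" matches only the exact key.
exact_matches = [key for key in keys if key == "name"]

print(substring_matches)
print(exact_matches)
```

Note that 'name' in i, where i is a dict, is a different operation again: it tests whether the dict has exactly that key, which is why the lookup version above is safe.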
UPDATE: I noticed that you are writing the entire file multiple times, once for each name entry that you encrypt. Even without this I would recommend that you open the file twice...once for reading and once for writing. This is preferred over opening a file for both reading and writing, and having to seek and truncate. So, here are all of my suggested changes, along with a few other tweaks, in a full version of your code:
import hashlib
import json
class Test:
    def encrypt(self, infile, outfile=None):
        if outfile is None:
            outfile = infile
        with open(infile) as json_file:
            res = json.load(json_file)
        for i in res['site']:
            if 'name' in i:
                i['name'] = hashlib.sha256(i['name'].encode('utf-8')).hexdigest()
        with open(outfile, "w") as json_file:
            json.dump(res, json_file, indent=4)
Test().encrypt("/tmp/input.json", "/tmp/output.json")
# Test().encrypt("/tmp/Output.json") # <- this form will read and write to the same file
Resulting file contents:
{
"site": [
{
"name": "bbdefa2950f49882f295b1285d4fa9dec45fc4144bfb07ee6acc68762d12c2e3",
"description": "Hi I am google"
},
{
"name": "9fbf261b62c1d7c00db73afb81dd97fdf20b3442e36e338cb9359b856a03bdc8",
"description": "Hi, I am microsoft"
}
],
"veg": [
{
"status": "ok",
"slot": null
},
{
"status": "ok"
}
]
}

Convert csv to json multi-document?

I have the following two requirements, using Python:
Convert CSV to multi-document JSON.
Ignore "" or null objects.
I have included both the code and the CSV below. Currently I get a single JSON array of objects, but I need multi-document JSON.
My CSV:
_id,riderDetails.0.category,riderDetails.0.code,riderDetails.1.category,riderDetails.1.code
1111,re,remg,er,error
2111,we,were,ty,
code
import csv
import json
def make_record(row):
    return {
        "_id" : row["_id"],
        "riderDetails" : [
            {
                "category" : row["riderDetails.0.category"],
                "code" : row["riderDetails.0.code"],
            },
            {
                "category" : row["riderDetails.1.category"] ,
                "code" : row["riderDetails.1.code"],
            }
        ]
    }
with open('N:/Exide/Mongo/rr22.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    with open('N:/Exide/Mongo/mm22.json', 'w') as jsonfile:
        out = json.dumps([make_record(row) for row in reader])
        jsonfile.write(out)
Code Output
[{
"_id": "1111",
"riderDetails": [
{
"category": "re",
"code": "remg"
},
{
"category": "er",
"code": "error"
}
]
},
{
"_id": "2111",
"riderDetails": [
{
"category": "we",
"code": "were"
},
{
"category": "",
"code": ""
}
]
}]
Expected Output
{
"_id": "1111",
"riderDetails": [
{
"category": "re",
"code": "remg"
},
{
"category": "er",
"code": "error"
}
]
}
{
"_id": "2111",
"riderDetails": [
{
"category": "we",
"code": "were"
}
]
}
Can someone help me in achieving expected output?
The data in the CSV file in your question doesn't produce the output shown, but that's probably due to a minor posting error, so I'll ignore it.
Also note that the file you are producing isn't strictly valid JSON; perhaps that's what you meant by the term "multi-document json".
Regardless, you can accomplish what you need by modifying the make_record() function so that it cleans up the record and removes any empty or missing values before returning it.
This is done in two steps.
First, go through each detail in riderDetails and remove any keys that have empty values.
Then, go through riderDetails again and remove any details that are completely empty (because the first step removed all of their contents, or because none were provided in the CSV file being read).
import csv
import json
csv_inp = 'rr22.csv'
json_outp = 'mm22.json'
def make_record(row):
    # Reformat data in row.
    record = {
        "_id": row["_id"],
        "riderDetails": [
            {
                "category": row["riderDetails.0.category"],
                "code": row["riderDetails.0.code"],
            },
            {
                "category": row["riderDetails.1.category"],
                "code": row["riderDetails.1.code"],
            }
        ]
    }
    # Remove empty values from each riderDetail.
    record['riderDetails'] = [{key: value for key, value in riderDetail.items() if value}
                              for riderDetail in record['riderDetails']]
    # Remove completely empty riderDetails.
    record['riderDetails'] = [riderDetail for riderDetail in record['riderDetails']
                              if riderDetail]
    return record
with open(csv_inp, 'r', newline='') as csvfile, \
     open(json_outp, 'w') as jsonfile:
    for row in csv.DictReader(csvfile, delimiter=','):
        jsonfile.write(json.dumps(make_record(row), indent=4) + '\n')
        # jsonfile.write(json.dumps(make_record(row)) + '\n')
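Because a file written this way holds several JSON documents back to back, json.load() cannot read it in one call. One possible way to read it back is json.JSONDecoder.raw_decode(), which parses a single document and reports where it ended. The sketch below is only an illustration: the helper name iter_json_documents is hypothetical, and the sample records are trimmed versions of the expected output in the question.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


records = [
    {"_id": "1111", "riderDetails": [{"category": "re", "code": "remg"}]},
    {"_id": "2111", "riderDetails": [{"category": "we", "code": "were"}]},
]
# Same layout as the file written above: one indented document after another.
docs_text = "\n".join(json.dumps(record, indent=4) for record in records) + "\n"

parsed = list(iter_json_documents(docs_text))
print(parsed)
```

If each document is instead written on a single line (the commented-out variant above), reading line by line with json.loads() is enough.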
Using glob:
import csv
import glob
import json
import os
pt = 'N:/Exide/Mongo/*.csv'
for file in glob.glob(pt):
    # basename works with whichever path separator glob returns on this platform
    get_name = os.path.basename(file).replace(".csv", ".json")
    with open(file, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        out = [make_record(row) for row in reader]  # make_record() as defined in the question
    saving_path = os.path.join('N:/Exide/Mongo/', get_name)
    with open(saving_path, 'w') as jsonfile:
        json.dump(out, jsonfile)
You get [{},{}] because you are writing a list of dictionaries to the file.

Python - Problem extracting data from nested json

I have a problem extracting data from JSON; I have tried many different ways. I was able to extract the ID itself, but unfortunately I cannot manage to show the details of each field.
Below is my JSON:
{
"params": {
"cid": "15482782896",
"datemax": "20190831",
"datemin": "20190601",
"domains": [
"url.com"
],
},
"results": {
"59107": {
"url.com": {
"1946592": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 21,
"url": "url3.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 4,
"url": "url3.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 2,
"url": "url3.com"
}
}
}
}
},
"keyword": {
"title": "python_1",
"volume": 10
}
},
"1946602": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 5,
"url": "url1.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 12,
"url": "url1.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 10.25,
"url": "url1.com"
}
}
}
}
},
"keyword": {
"title": "python_2",
"volume": 20
}
}
}
}
}
}
I tried the following code, but the result contained only the IDs:
import json
import csv
def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}
with open('me_filename') as f_input:
    json_data = json.load(f_input)
fieldnames = set()
for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)
I also tried pandas, but that also failed to parse the file properly:
import io
import json
import pandas as pd
with open('me_filename', encoding='utf-8') as f_input:
    df = pd.read_json(f_input , orient='None')
df.to_csv('output.csv', encoding='utf-8')
The result I need is:
ID       Name      page     volume  url       2019-06-01  2019-07-01  2019-08-01  2019-09-01
1946592  python_1  url.com  10      url3.com  21          4           2           null
1946602  python_2  url.com  20      url1.com  5           12          10,25       null
What am I doing wrong?
This is a somewhat convoluted solution; it looks messy and no longer resembles the code you provided, but I believe it will resolve your issue.
First of all, I had a problem with the provided JSON (due to the trailing ',' on line 8); after removing it, I managed to generate:
Output (temp.csv)
ID,Name,Page,Volume,Url,2019-08-01,2019-07-01,2019-06-01,
1946592,python_1,url.com,10,url3.com,2,4,21,
1946602,python_2,url.com,20,url1.com,10.25,12,5,
using the following:
import json
dates: set = set()
# Collect the data
def get_breakdown(json):
    collected_data = []
    for result in json['results']:
        for page in json['results'][result]:
            for _id in json['results'][result][page]:
                data_struct = {
                    'ID': _id,
                    'Name': json['results'][result][page][_id]['keyword']['title'],
                    'Page': page,
                    'Volume': json['results'][result][page][_id]['keyword']['volume'],
                    'Dates': {}
                }
                for date in dates:
                    if date in json['results'][result][page][_id]['data']:
                        data_struct['URL'] = json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['url']
                        data_struct['Dates'][date] = {'Position' : json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['position']}
                    else:
                        data_struct['Dates'][date] = {'Position' : 'null'}
                collected_data.append(data_struct)
    return collected_data
# Collect all dates across the whole data
# structure and save them to a set
def get_dates(json):
    for result in json['results']:
        for page in json['results'][result]:
            for _id in json['results'][result][page]:
                for date in json['results'][result][page][_id]['data']:
                    dates.add(date)
# Write to .csv file
def write_csv(collected_data, file_path):
    # the with block closes the file once everything is written
    with open(file_path, "w") as f:
        # CSV Title
        date_string = ''
        for date in dates:
            date_string = '{0}{1},'.format(date_string, date)
        f.write('ID,Name,Page,Volume,Url,{0}\n'.format(date_string))
        # Data
        for data in collected_data:
            position_string = ''
            for date in dates:
                position_string = '{0}{1},'.format(position_string, data['Dates'][date]['Position'])
            f.write('{0},{1},{2},{3},{4},{5}\n'.format(
                data['ID'],
                data['Name'],
                data['Page'],
                data['Volume'],
                data['URL'],
                position_string
            ))
# Code Body
with open('me_filename.json') as f_input:
    json_data = json.load(f_input)
get_dates(json_data)
write_csv(get_breakdown(json_data), "output.csv")
Hopefully you can follow the code and it does what is expected. I am sure it can be made much more robust; however, as mentioned, I couldn't make it work starting from the code you provided.
After a small modification your code works great, but I noticed that a format with each date on its own row would be a better solution.
I tried to modify your solution to produce this, but I am still too inexperienced in Python to manage it easily. Could you tell me how to achieve the following CSV format?
Output (temp.csv)
ID,Name,Page,Volume,Url,data,value
1946592,python_1,url.com,10,url3.com,2019-08-01,2
1946592,python_1,url.com,10,url3.com,2019-07-01,4
1946592,python_1,url.com,10,url3.com,2019-06-01,21
1946602,python_2,url.com,20,url1.com,2019-08-01,10.25
1946602,python_2,url.com,20,url1.com,2019-07-01,12
1946602,python_2,url.com,20,url1.com,2019-06-01,5
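One possible way to get this one-row-per-date layout is sketched below. It is not the answerer's code: the helper name long_rows is hypothetical, the sample dict is a cut-down version of the JSON in the question, the rows are written to an in-memory buffer rather than a file, and the column is labelled "date" on the assumption that this is what the "data" header was meant to say.

```python
import csv
import io


def long_rows(payload):
    """Yield one [ID, Name, Page, Volume, Url, date, value] row per date."""
    for pages in payload["results"].values():
        for page, entries in pages.items():
            for _id, entry in entries.items():
                keyword = entry["keyword"]
                # Newest date first, matching the order in the requested output.
                for date in sorted(entry["data"], reverse=True):
                    leaf = entry["data"][date]["ENGINE"]["DEVICE"][""]
                    yield [_id, keyword["title"], page, keyword["volume"],
                           leaf["url"], date, leaf["position"]]


# Cut-down sample with the same nesting as the JSON in the question.
sample = {
    "results": {
        "59107": {
            "url.com": {
                "1946592": {
                    "data": {
                        "2019-06-01": {"ENGINE": {"DEVICE": {"": {"position": 21, "url": "url3.com"}}}},
                        "2019-07-01": {"ENGINE": {"DEVICE": {"": {"position": 4, "url": "url3.com"}}}},
                    },
                    "keyword": {"title": "python_1", "volume": 10},
                }
            }
        }
    }
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "Name", "Page", "Volume", "Url", "date", "value"])
writer.writerows(long_rows(sample))
print(buffer.getvalue())
```

To write a real file, replace the StringIO buffer with `open("output.csv", "w", newline="")`. Using csv.writer instead of string formatting also takes care of quoting if a title ever contains a comma.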

How can i sort individual json like objects in a file based on one key

I have series of values like this in a file.
{
"canceled": false,
"complete_time": "2017-06-08T15:55:45.616942",
"create_time": "2017-06-08T15:55:44.370344",
"entity_list": [
{
"entity_type": 2,
"uuid": "xxxxx"
},
{
"entity_name": "",
"uuid": "xxxx"
}
],
"last_updated_time": "2017-06-08T15:55:45.616942",
"progress_status": 3,
"request": {
"arg": {
"parent_task_uuid": "xxx",
"task_uuid": "xxxx",
"transition": 2,
"vm_uuid": "xxx"
},
"method_name": """""
},
"response": {
"error_code": 0,
"error_detail": "",
"ret": {}
},
"start_time": "2017-06-08T15:55:44.452703",
"uuid": "xxxxx"
}
{
"canceled": false,
"complete_time": "2017-06-08T15:55:45.616942",
"create_time": "2017-06-08T15:55:44.370344",
"entity_list": [
{
"entity_type": 2,
"uuid": "xxxxx"
},
{
"entity_name": "",
"uuid": "xxxx"
}
],
"last_updated_time": "2017-06-08T15:55:45.616942",
"progress_status": 3,
"request": {
"arg": {
"parent_task_uuid": "xxx",
"task_uuid": "xxxx",
"transition": 2,
"vm_uuid": "xxx"
},
"method_name": """""
},
"response": {
"error_code": 0,
"error_detail": "",
"ret": {}
},
"start_time": "2017-06-08T15:55:44.452703",
"uuid": "xxxxx"
}
I want to sort these individual {} chunks by the 'last_updated_time' field. If the file were valid JSON, the code I have written in Python would work, but since it is not valid JSON, how can I make this work?
while True:
    line = sys.stdin.readline()
    if not line: break
    line = line.strip()
    json_obj = json.loads(line)
    lines.append(json_obj)
lines = sorted(lines, key=lambda k: k['last_updated_time'], reverse=True)
So your problem is not really sorting these objects; your last line should work just fine. The issue is how to read the slightly malformed JSON file. Here is a somewhat hacky way to digest your file. You could also search for a JSON package that is more tolerant of format deviations.
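import json
chunks = list()
temp_lines = list()
# FILE_PATH is the path to your input file
with open(FILE_PATH) as fp:
    for line in fp:
        # turn the malformed """"" into a valid JSON string holding one quote
        line = line.replace(r'"""""', r'"\""')
        temp_lines.append(line)
        # a "}" at the start of a line ends one top-level object
        if line.startswith('}'):
            chunks.append(json.loads(''.join(temp_lines)))
            temp_lines.clear()
chunks = sorted(chunks, key=lambda x: x['last_updated_time'], reverse=True)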
To get back to your original horrible format, add these two lines
chunks = '\n'.join([json.dumps(x, indent=4) for x in chunks])
chunks = chunks.replace(r'"\""', r'"""""')
You could try accumulating lines until they form a valid JSON object. Put all of these objects in a list, then sort it the way you already know how, e.g. something like:
import sys
all_jsons = []
current_lines = []
counter = 0
while True:
    line = sys.stdin.readline()
    if not line:
        break
    line = line.strip()
    current_lines.append(line)
    nb_inc = line.count("{")
    nb_dec = line.count("}")
    counter += nb_inc
    counter -= nb_dec
    if counter == 0:
        # We have met as many opening bracket as closing, this is a full json
        all_jsons.append("\n".join(current_lines))
        current_lines = []
# sort your json files
Here I accumulate lines until I have encountered as many closing brackets as opening ones, at which point I join all the lines into one JSON-formatted string.
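A minimal sketch of the remaining steps (parsing each accumulated chunk and sorting) might look like this. The helper names split_objects and parse_chunk are hypothetical, the sample text is a shortened stand-in for the file in the question, and the `"""""` repair is borrowed from the other answer. Brace counting is a simplification: it would miscount if a string value itself contained "{" or "}".

```python
import json


def split_objects(lines):
    """Group lines into top-level {...} chunks by counting braces."""
    chunks, current, depth = [], [], 0
    for line in lines:
        if depth == 0 and not line.strip():
            continue  # skip blank lines between objects
        current.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            chunks.append("".join(current))
            current = []
    return chunks


def parse_chunk(chunk):
    # Repair the malformed value """"" before parsing; it becomes a
    # JSON string that holds a single double quote.
    return json.loads(chunk.replace('"""""', '"\\""'))


# Shortened stand-in for the file contents shown in the question.
text = '''{
    "last_updated_time": "2017-06-08T15:55:45.616942",
    "request": {"method_name": """""},
    "uuid": "a"
}
{
    "last_updated_time": "2017-06-09T10:00:00.000000",
    "request": {"method_name": """""},
    "uuid": "b"
}
'''

records = sorted(
    (parse_chunk(chunk) for chunk in split_objects(text.splitlines(keepends=True))),
    key=lambda record: record["last_updated_time"],
    reverse=True,
)
print([record["uuid"] for record in records])
```

Sorting the timestamps as plain strings works here because they are all in the same ISO 8601 layout, so lexicographic order matches chronological order.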
