Convert pandas data frame to JSON with strings separated - python

I have a pandas DataFrame named 'df' with the following format:
group_name  Positive_Sentiment      Negative_Sentiment
group1      helpful, great support  slow customer service, weak interface, bad management
I would like to convert this dataframe to a JSON file with the following format:
[{
"Group Name": "group1",
"Positive Sentiment": [
"helpful",
"great support"
],
"Negative Sentiment": [
"slow customer service",
"weak interface",
"bad management"
]
}
]
So far I have used this:
import json
b = []
for i in range(len(df)):
    x = {}
    x['Group Name'] = df.iloc[i]['group_name']
    x['Positive Sentiment'] = [df.iloc[i]['Positive_Sentiment']]
    x['Negative Sentiment'] = [df.iloc[i]['Negative_Sentiment']]
    b.append(x)
## Export
with open('AnalysisResults.json', 'w') as f:
    json.dump(b, f, indent=2)
This results in:
[{
"Group Name": "group1",
"Positive Sentiment": [
"helpful, great support"
],
"Negative Sentiment": [
"slow customer service, weak interface, bad management"
]
}
]
You can see it is quite close. The crucial difference is the double-quotes around the ENTIRE contents of each row (e.g., "helpful, great support") instead of each comma-separated string in the row (e.g., "helpful", "great support"). I would like double-quotes around each string.

You can apply split(",") to your columns:
from io import StringIO
import pandas as pd
import json
# Columns in the sample are separated by two or more spaces.
inp = StringIO("""group_name  Positive_Sentiment  Negative_Sentiment
group1  helpful, great support  slow customer service, weak interface, bad management
group2  great, good support  interface meeeh, bad management""")
# A regex separator needs the python parsing engine.
df = pd.read_csv(inp, sep=r"\s{2,}", engine="python")
def split_and_strip(sentiment):
    # Split on commas and trim the whitespace around each item.
    return [x.strip() for x in sentiment.split(",")]
df["Positive_Sentiment"] = df["Positive_Sentiment"].apply(split_and_strip)
df["Negative_Sentiment"] = df["Negative_Sentiment"].apply(split_and_strip)
print(json.dumps(df.to_dict(orient="records"), indent=4))
# to save directly to a file:
with open("your_file.json", "w+") as f:
    json.dump(df.to_dict(orient="records"), f, indent=4)
Output:
[
{
"group_name": "group1",
"Positive_Sentiment": [
"helpful",
"great support"
],
"Negative_Sentiment": [
"slow customer service",
"weak interface",
"bad management"
]
},
{
"group_name": "group2",
"Positive_Sentiment": [
"great",
"good support"
],
"Negative_Sentiment": [
"interface meeeh",
"bad management"
]
}
]
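If the output should also use the display keys from the question ("Group Name" and so on), the same split can be applied while each record is built. This is only a sketch: plain dicts stand in for the DataFrame rows so it runs without pandas, and KEY_LABELS and to_record are hypothetical names, not part of the answer above.

```python
import json

# Stand-in for the DataFrame rows; values copied from the question.
rows = [
    {
        "group_name": "group1",
        "Positive_Sentiment": "helpful, great support",
        "Negative_Sentiment": "slow customer service, weak interface, bad management",
    },
]

# Hypothetical mapping from column names to the keys wanted in the JSON output.
KEY_LABELS = {
    "group_name": "Group Name",
    "Positive_Sentiment": "Positive Sentiment",
    "Negative_Sentiment": "Negative Sentiment",
}


def split_and_strip(text):
    """Split a comma-separated string and trim each piece."""
    return [part.strip() for part in text.split(",")]


def to_record(row):
    """Rename the keys and split every column except the group name."""
    record = {}
    for column, label in KEY_LABELS.items():
        value = row[column]
        record[label] = value if column == "group_name" else split_and_strip(value)
    return record


records = [to_record(row) for row in rows]
print(json.dumps(records, indent=2))
```

With a real DataFrame, the rows list could presumably be obtained from df.to_dict(orient="records") before the split is applied.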

Related

How can I format json via python?

I have a very basic problem that I can't figure out. I keep getting an "End of file expected" error when trying to write object data to a JSON file. How can I fix that? I write the objects in a for loop and am not sure how to format the output.
This is the code in question
with open("data.json", "w") as outfile:
    for x, y in structures.infrastructures.items():
        outfile.write(Sector(x, y["Depended on by"], y["Depends on"], y["Sub sections"]).toJson())
and this is the output
{
"name": "Chemical",
"depended_on": [
"Critical Manufacturing",
"Nuclear Reactors, Waste, and Material Management",
"Commercial",
"Healthcare and Public Health",
"Food and Agriculture",
"Energy"
],
"depends_on": [
"Emergency Services",
"Energy",
"Food and Agriculture",
"Healthcare and Public Health",
"Information Technology",
"Nuclear Reactors, Waste, and Material Management",
"Transportation Systems",
"Water"
],
"sub_sections": [
"Chemical Plants",
"Chemical Refineries",
"Labs"
],
"Status": 0,
"Strain": 0
}{ -> this is where the error is
"name": "Commercial",
"depended_on": [
....
....
etc
This is my toJson method:
def toJson(self):
    return json.dumps(self, default=lambda o: o.__dict__, indent=4)
How can I write my object data so that the file is valid JSON?
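A valid JSON file can contain only one top-level value. Writing several objects back to back produces "}{", which is exactly where the parser gives up. Collect your data into a list and write it with a single call, or emit the enclosing brackets and the separating commas yourself.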
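The "collect into a list" suggestion could look roughly like the sketch below. The Sector class and the infrastructures dict here are made-up stand-ins shaped like the ones in the question, since the real definitions were not posted.

```python
import json


class Sector:
    """Hypothetical stand-in for the Sector class in the question."""

    def __init__(self, name, depended_on, depends_on, sub_sections):
        self.name = name
        self.depended_on = depended_on
        self.depends_on = depends_on
        self.sub_sections = sub_sections
        self.Status = 0
        self.Strain = 0


# Made-up sample shaped like structures.infrastructures in the question.
infrastructures = {
    "Chemical": {"Depended on by": ["Energy"], "Depends on": ["Water"], "Sub sections": ["Labs"]},
    "Commercial": {"Depended on by": [], "Depends on": ["Energy"], "Sub sections": []},
}

# Build every object first instead of writing them one at a time.
sectors = [
    Sector(name, info["Depended on by"], info["Depends on"], info["Sub sections"])
    for name, info in infrastructures.items()
]

# One dumps call over the whole list yields a single top-level JSON array.
text = json.dumps(sectors, default=lambda o: o.__dict__, indent=4)

# Round-trip check: the text parses as one document.
parsed = json.loads(text)
print(len(parsed), parsed[0]["name"])
```

To write to a file instead, json.dump(sectors, outfile, default=lambda o: o.__dict__, indent=4) inside the with block does the same thing in one call.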
You have to use a function such as json.load to parse the .json file.
I hope this helps:
https://scriptcrunch.com/parse-json-file-using-python/#:~:text=How%20To%20Parse%20JSON%20File%20Content%20Using%20Python,also%20loop%20through%20all%20the%20JSON%20objects.%20

Replacing words in JSON file using Python

smpl.json file:
[
{
"add":"dtlz",
"emp_details":[
[
"Shubham",
"ksing.shubh#gmail.com",
"intern"
],
[
"Gaurav",
"gaurav.singh#cobol.in",
"developer"
],
[
"Nikhil",
"nikhil#geeksforgeeks.org",
"Full Time"
]
]
}
]
Python file:
import json
with open('smpl.json', 'r') as file:
    json_data = json.load(file)
for item in json_data["emp_details"]:
    if item[''] in ['Shubham']:
        item[''] = 'Indra'
with open('zz_smpl.json', 'w') as file:
    json.dump(json_data, file, indent=4)
I'm having trouble with this code. Any help would be great.
Thanks in advance!
First, you need to understand the list/array and map data structures and how JSON represents them. Seriously, you must understand those data structures in order to use JSON.
An empty array a1
a1 = []
An array with 3 integers
a2 = [1, 2, 3]
To address individual values
a2[0] is the 1st value
a2[1] is the 2nd value
In Python, to slice a2 down to its 2nd and 3rd values
a3 = a2[1:]
Maps/dicts are containers of key:value pairs.
An empty map (called a dict in Python)
d1 = {}
Maps with 2 and 3 pairs
d2 = { 'name' : 'Chandra Gupta Maurya' , 'age' : 2360 }
d3 = { 'street' : 'ashoka' , 'location' : 'windsor place' , 'city' : 'delhi' }
such that value of
d2['name'] is 'Chandra Gupta Maurya'
An array of two maps. When you do this in python (and javaScript)
ad1 = [ d2, d3 ]
you are equivalently doing this:
ad1 = [
{ 'name' : 'Chandra Gupta Maurya' , 'age' : 2360 } ,
{ 'street' : 'ashoka' , 'location' : 'windsor place' , 'city' : 'delhi' }
]
so that ad1[0] is
{ 'name' : 'Chandra Gupta Maurya' , 'age' : 2360 }
In your file, the map that holds "emp_details" sits at position 0 of the root array, so you reach it with
json_data[0]['emp_details']
json_data[0]['emp_details'] is itself an array of arrays (one inner array per employee), not an array of maps.
>>> json.dumps (json_data[0]["emp_details"] , indent=2)
produces
'[\n [\n "Shubham",\n "ksing.shubh#gmail.com",\n "intern"\n ],\n [\n "Gaurav",\n "gaurav.singh#cobol.in",\n "developer"\n ],\n [\n "Nikhil",\n "nikhil#geeksforgeeks.org",\n "Full Time"\n ]\n]'
and
>>> print ( json.dumps (json_data[0]["emp_details"], indent=2) )
produces
[
[
"Shubham",
"ksing.shubh#gmail.com",
"intern"
],
[
"Gaurav",
"gaurav.singh#cobol.in",
"developer"
],
[
"Nikhil",
"nikhil#geeksforgeeks.org",
"Full Time"
]
]
Therefore,
>>> json_data[0]["emp_details"][1]
['Gaurav', 'gaurav.singh#cobol.in', 'developer']
Then you might wish to do the replacement
>>> json_data[0]["emp_details"][1][2] = 'the rain in maine falls plainly insane'
>>> json_data[0]["emp_details"][1][1] = "I'm sure the lure in jaipur pours with furore"
>>> print ( json.dumps (json_data, indent=2) )
produces
[
{
"add": "dtlz",
"emp_details": [
[
"Shubham",
"ksing.shubh#gmail.com",
"intern"
],
[
"Gaurav",
"I'm sure the lure in jaipur pours with furore",
"the rain in maine falls plainly insane"
],
[
"Nikhil",
"nikhil#geeksforgeeks.org",
"Full Time"
]
]
}
]
There are 2 problems with your code.
First, the JSON contains an array as the root. Therefore you need to get emp_details property of the first item:
for item in json_data[0]["emp_details"]:
Then in item variable, you need to check the item at index zero:
if item[0] in ['Shubham']:
Here is the full working code:
import json
with open('smpl.json', 'r') as file:
    json_data = json.load(file)
for item in json_data[0]["emp_details"]:
    if item[0] in ['Shubham']:
        item[0] = 'Indra'
with open('zz_smpl.json', 'w') as file:
    json.dump(json_data, file, indent=4)
The working repl.it link: https://repl.it/#HarunYlmaz/python-json-write
Here's a more generic solution where outermost json array could have multiple entries (dictionaries):
import json
with open('test.json', 'r') as file:
    json_data = json.load(file)
for item in json_data:
    for emp in item['emp_details']:
        if emp[0] in ['Shubham']:
            emp[0] = 'Indra'
with open('zz_smpl.json', 'w') as file:
    json.dump(json_data, file, indent=4)
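For experimenting without any files, the same nested loop can be wrapped in a small function and run against an inlined copy of the data. This is a sketch only; rename_employee is a hypothetical helper name, and the sample is a trimmed copy of smpl.json from the question.

```python
import json


def rename_employee(records, old_name, new_name):
    """Replace old_name with new_name in every emp_details row; return the count."""
    replaced = 0
    for record in records:
        for emp in record.get("emp_details", []):
            # Each row is a list: [name, email, role].
            if emp and emp[0] == old_name:
                emp[0] = new_name
                replaced += 1
    return replaced


# Trimmed copy of smpl.json from the question, inlined so no file is needed.
json_data = json.loads("""
[{"add": "dtlz",
  "emp_details": [["Shubham", "ksing.shubh#gmail.com", "intern"],
                  ["Gaurav", "gaurav.singh#cobol.in", "developer"]]}]
""")

count = rename_employee(json_data, "Shubham", "Indra")
print(count, json_data[0]["emp_details"][0])
```

Returning the number of replacements makes it easy to notice when the name was not found at all.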

extract urls from json file without data name using python

I have a JSON file that contains the metadata of 900 articles, and I want to extract the URLs from it. My file starts like this:
[
{
"title": "The histologic phenotypes of …",
"authors": [
{
"name": "JE Armes"
},
],
"publisher": "Wiley Online Library",
"article_url": "https://onlinelibrary.wiley.com/doi/abs/10.1002/(SICI)1097-0142(19981201)83:11%3C2335::AID-CNCR13%3E3.0.CO;2-N",
"cites": 261,
"use": true
},
{
"title": "Comparative epidemiology of pemphigus in ...",
"authors": [
{
"name": "S Bastuji-Garin"
},
{
"name": "R Souissi"
}
],
"year": 1995,
"publisher": "search.ebscohost.com",
"article_url": "http://search.ebscohost.com/login.aspx?direct=true&profile=ehost&scope=site&authtype=crawler&jrnl=0022202X&AN=12612836&h=B9CC58JNdE8SYy4M4RyVS%2FrPdlkoZF%2FM5hifWcv%2FwFvGxUCbEaBxwQghRKlK2vLtwY2WrNNl%2B3z%2BiQawA%2BocoA%3D%3D&crl=c",
"use": true
},
.........
I want to inspect the file with objectpath and build a JSON tree for extracting the URLs. This is the code I want to execute:
1. import json
2. import objectpath
3. with open("Data_sample.json") as datafile: data = json.load(datafile)
4. jsonnn_tree = objectpath.Tree(data['name of data'])
5. result_tuple = tuple(jsonnn_tree.execute('$..article_url'))
But in step 4, to create the tree, I have to insert the name of the data, which I don't think my file has. How can I replace this line?
You can get all the article urls using a list comprehension.
import json
with open("Data_sample.json") as fh:
    articles = json.load(fh)
article_urls = [article['article_url'] for article in articles]
You can instantiate the tree directly from the loaded data (op here is assumed to be objectpath imported under that alias):
tobj = op.Tree(your_data)
results = tobj.execute("$.article_url")
And in the end:
results = [x for x in results]
will yield:
["url1", "url2", ...]
Did you try removing the reference and just using:
jsonnn_tree = objectpath.Tree(data)
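If adding a dependency is not desirable, the "find this key at any depth" search that a query like $..article_url expresses can be approximated with a small recursive generator. This is a sketch, not objectpath's implementation: find_values is a hypothetical helper, and the sample data and example.org URLs are invented. Unlike indexing with article['article_url'], it simply skips entries that lack the key.

```python
def find_values(node, key):
    """Yield every value stored under `key` at any depth of parsed JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending in case the key also appears deeper down.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Shortened, made-up stand-in for the list loaded from Data_sample.json.
data = [
    {"title": "A", "authors": [{"name": "JE Armes"}], "article_url": "https://example.org/a"},
    {"title": "B", "authors": [], "year": 1995},
    {"title": "C", "article_url": "https://example.org/c"},
]

urls = list(find_values(data, "article_url"))
print(urls)
```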

How to read this JSON into a dataframe with a specific format

This is my JSON string. I want to read it into a dataframe in the following tabular format.
I have no idea what I should do after pd.DataFrame(json.loads(data))
JSON data, edited
{
"data":[
{
"data":{
"actual":"(0.2)",
"upper_end_of_central_tendency":"-"
},
"title":"2009"
},
{
"data":{
"actual":"2.8",
"upper_end_of_central_tendency":"-"
},
"title":"2010"
},
{
"data":{
"actual":"-",
"upper_end_of_central_tendency":"2.3"
},
"title":"longer_run"
}
],
"schedule_id":"2014-03-19"
}
That's a somewhat overly nested JSON. But if that's what you have to work with, and assuming your parsed JSON is in jdata:
datapts = jdata['data']
rownames = ['actual', 'upper_end_of_central_tendency']
colnames = [ item['title'] for item in datapts ] + ['schedule_id' ]
sched_id = jdata['schedule_id']
rows = [ [item['data'][rn] for item in datapts ] + [sched_id] for rn in rownames]
df = pd.DataFrame(rows, index=rownames, columns=colnames)
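df is now:
                                2009 2010 longer_run schedule_id
actual                         (0.2)  2.8          -  2014-03-19
upper_end_of_central_tendency      -    -        2.3  2014-03-19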
If you wanted to simplify that a bit, you could construct the core data without the asymmetric schedule_id field, then add that after the fact:
datapts = jdata['data']
rownames = ['actual', 'upper_end_of_central_tendency']
colnames = [ item['title'] for item in datapts ]
rows = [ [item['data'][rn] for item in datapts ] for rn in rownames]
d2 = pd.DataFrame(rows, index=rownames, columns=colnames)
d2['schedule_id'] = jdata['schedule_id']
That will make an identical DataFrame (i.e. df.equals(d2) is True). When learning pandas, it helps to try a few different construction strategies and get a feel for which is more straightforward. There are more powerful tools for unfolding nested structures into flatter tables, but they're not as easy to understand the first time around.
(Update) If you wanted a better structuring on your JSON to make it easier to put into this format, ask pandas what it likes. E.g. df.to_json() output, slightly prettified:
{
"2009": {
"actual": "(0.2)",
"upper_end_of_central_tendency": "-"
},
"2010": {
"actual": "2.8",
"upper_end_of_central_tendency": "-"
},
"longer_run": {
"actual": "-",
"upper_end_of_central_tendency": "2.3"
},
"schedule_id": {
"actual": "2014-03-19",
"upper_end_of_central_tendency": "2014-03-19"
}
}
That is a format from which pandas' read_json function will immediately construct the DataFrame you desire.
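Reshaping the original nested JSON into that column-oriented layout takes only a few lines of standard-library Python. The sketch below reuses the question's data in compacted form; the names columns and row_names are illustrative.

```python
import json

# The JSON from the question, compacted.
raw = """
{"data": [
   {"data": {"actual": "(0.2)", "upper_end_of_central_tendency": "-"}, "title": "2009"},
   {"data": {"actual": "2.8", "upper_end_of_central_tendency": "-"}, "title": "2010"},
   {"data": {"actual": "-", "upper_end_of_central_tendency": "2.3"}, "title": "longer_run"}],
 "schedule_id": "2014-03-19"}
"""
jdata = json.loads(raw)

# One entry per column: title -> {row name -> value}.
columns = {item["title"]: dict(item["data"]) for item in jdata["data"]}

# Repeat schedule_id for every row name so that column has the same shape.
row_names = list(jdata["data"][0]["data"])
columns["schedule_id"] = {rn: jdata["schedule_id"] for rn in row_names}

print(json.dumps(columns, indent=2))
```

Since pandas reads a dict of dicts as column -> row -> value, passing columns to pd.DataFrame should give the same table as the constructions above.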

Python JSON parse - Extract attribute

I'm having more difficulty with this than I should be!
I'm trying to extract postalCode from the Bing Maps JSON below:
{
"authenticationResultCode":"ValidCredentials",
"brandLogoUri":"http:\/\/dev.virtualearth.net\/Branding\/logo_powered_by.png",
"copyright":"Copyright © 2014 Microsoft and its suppliers. All rights reserved. This API cannot be accessed and the content and any results may not be used, reproduced or transmitted in any manner without express written permission from Microsoft Corporation.",
"resourceSets":[
{
"estimatedTotal":1,
"resources":[
{
"__type":"Location:http:\/\/schemas.microsoft.com\/search\/local\/ws\/rest\/v1",
"bbox":[
56.216052482429326,
-2.9494141659354827,
56.223777917570679,
-2.9308900340645176
],
"name":"Street, Leven, KY8 5",
"point":{
"type":"Point",
"coordinates":[
56.2199152,
-2.9401521
]
},
"address":{
"addressLine":"Street",
"adminDistrict":"Scotland",
"adminDistrict2":"Fife",
"countryRegion":"United Kingdom",
"formattedAddress":"Street, Leven, KY8 5",
"locality":"Leven",
"postalCode":"KY8 5"
},
"confidence":"Medium",
"entityType":"Address",
"geocodePoints":[
{
"type":"Point",
"coordinates":[
56.2199152,
-2.9401521
],
"calculationMethod":"Interpolation",
"usageTypes":[
"Display",
"Route"
]
}
],
"matchCodes":[
"Good"
]
}
]
}
],
"statusCode":200,
"statusDescription":"OK",
"traceId":"8fdd75362a694e02a45fa17d6e7c0e95|DB40080932|02.00.108.1000|DB4SCH010061257, DB4SCH010061346"
}
My code only returns the field names and not the values:
r = requests.get(current_url)
json_data = r.json()
for item in json_data['resourceSets'][0]['resources']:
    for field in item['address']:
        print(field)
What am I missing? Sorry for the novice question!
for field in item['address'] iterates only over the keys of item['address'] (a dictionary) by default, so you also need to look up each value:
for item in json_data['resourceSets'][0]['resources']:
    for field in item['address']:
        print(field, item['address'][field])
For loops over dictionaries in Python iterate over just the keys. If you want the values as well, you should use .items():
for field, value in item['address'].items():
    print(field, value)
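Since the goal was postalCode specifically, it can also be read directly by key instead of looping over every address field. This is a sketch on a trimmed copy of the response from the question; postal_codes is a hypothetical helper, and using .get to tolerate missing keys is an assumption about how defensive the code should be.

```python
def postal_codes(payload):
    """Collect address.postalCode from every resource, skipping ones without it."""
    codes = []
    for resource_set in payload.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            code = resource.get("address", {}).get("postalCode")
            if code is not None:
                codes.append(code)
    return codes


# Trimmed copy of the response in the question.
json_data = {
    "resourceSets": [
        {
            "estimatedTotal": 1,
            "resources": [
                {
                    "name": "Street, Leven, KY8 5",
                    "address": {"locality": "Leven", "postalCode": "KY8 5"},
                }
            ],
        }
    ],
    "statusCode": 200,
}

print(postal_codes(json_data))
```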
