DataFrame to JSON giving absurd output - python

I have this code
eventSummary = {}
eventSummary['jobClassID'] = data['jobClassID'].split()
eventSummary['modelID'] =data['modelID'].split()
eventSummary['startDate'] = pd.to_datetime(start).date().strftime("%Y-%m-%d").split()
eventSummary['hourlySummary']=merged_df.to_json(orient = 'records')
dataF = pd.DataFrame(eventSummary)
dataF.to_json(path_or_buf = '*\merged2.json', orient="records")
which is giving me json like this :
{
"jobClassID":"NorthernRegion|FarNorthDistrict|GeneralDuties|Constable|FIR|Senior|Lead|5",
"modelID":"4",
"startDate":
"2019-01-01",
"hourlySummary":
"[{\"Day\":1,\"Hour\":0,\"Month\":1,\"priority\":3,\"Count\":3,\"Hourly Season\":1.5589041096,\"Weekly Season\":1.9205974843,\"Monthly Season\":2.0497311828,\"Noise Factor\":0.6517894485,\"groupId\":0,\"K (Service)\":0.6955185685,\"Theta (Service)\":10924.8330481261,\"K (Travel)\":0.6819906983,\"Theta (Travel)\":5694.9711557023}]
}
But I want output to be like this:
{
"jobClassID":["NorthernRegion|FarNorthDistrict|GeneralDuties|Constable|FIR|Senior|Lead|5"],
"modelID":["4"],
"startDate":["2019-01-01"],
"hourlySummary":[
{"Day":1,"Hour":0,"Month":1,"priority":3,"Count":3,"Hourly Season":1.5589041096,"Weekly Season":1.9205974843,"Monthly Season":2.0497311828,"Noise Factor":0.6517894485,"groupId":0,"K (Service)":0.6955185685,"Theta (Service)":10924.8330481261,"K (Travel)":0.6819906983,"Theta (Travel)":5694.9711557023}]
}
How to achieve this?
Thanks in advance.

to_json returns a string, not a dictionary which is what you want. You should use to_dictmethod instead.

eventSummary['jobClassID'] = [data['jobClassID'].split()]
eventSummary['modelID'] = [data['modelID'].split()]
eventSummary['startDate'] = [pd.to_datetime(start).date().strftime("%Y-%m-
%d").split()]
eventSummary['hourlySummary']= [merged_df.to_dict(orient='records')]
dataF = pd.DataFrame(eventSummary)
dataF.to_json(path_or_buf = '*\merged12.json', orient="records", lines = True)
This code is giving results as expected.
Thanks for your efforts, everyone.

Related

Python How to add nested fields to Yaml file

I need to modify a YAML file and add several fields.I am using the ruamel.yaml package.
First I load the YAML file:
data = yaml.load(file_name)
I can easily add new simple fields, like-
data['prop1'] = "value1"
The problem I face is that I need to add a nested dictionary incorporate with array:
prop2:
prop3:
- prop4:
prop5: "Some title"
prop6: "Some more data"
I tried to define-
record_to_add = dict(prop2 = dict(prop3 = ['prop4']))
This is working, but when I try to add beneath it prop5 it fails-
record_to_add = dict(prop2 = dict(prop3 = ['prop4'= dict(prop5 = "Value")]))
I get
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
What am I doing wrong?
The problem has little to do with ruamel.yaml. This:
['prop4'= dict(prop5 = "Value")]
is invalid Python as a list ([ ]) expects comma separated values. You would need to use something like:
record_to_add = dict(prop2 = dict(prop3 = dict(prop4= [dict(prop5 = "Some title"), dict(prop6='Some more data'),])))
As your program is incomplete I am not sure if you are using the old API or not. Make sure to use
import ruamel.yaml
yaml = ruamel.yaml.YAML()
and not
import ruamel.yaml as yaml
Its because of having ['prop4'= <> ].Instead record_to_add = dict(prop2 = dict(prop3 = [dict(prop4 = dict(prop5 = "Value"))])) should work.
Another alternate would be,
import yaml
data = {
"prop1": {
"prop3":
[{ "prop4":
{
"prop5": "some title",
"prop6": "some more data"
}
}]
}
}
with open(filename, 'w') as outfile:
yaml.dump(data, outfile, default_flow_style=False)

Python write mutiple array value into csv

with my code, i read the values of JSON data and insert into array
def retrive_json():
with open('t_v1.json') as json_data:
d = json.load(json_data)
array = []
for i in d['ride']:
origin_lat = i['origin']['lat']
origin_lng = i['origin']['lng']
destination_lat = i['destination']['lat']
destination_lng = i['destination']['lng']
array.append([origin_lat,origin_lng,destination_lat,destination_lng])
return array
the result array is this :
[[39.72417, -104.99984, 39.77446, -104.9379], [39.77481, -104.93618, 39.6984, -104.9652]]
how i can write each element of each array into specific field in csv?
i have try in this way:
wrt = csv.writer(open(t_.csv', 'w'), delimiter=',',lineterminator='\n')
for x in jjson:
wrt.writerow([x])
but the value of each array are store all in one field
How can solved it and write each in a field?
this is my json file:
{
"ride":[
{
"origin":{
"lat":39.72417,
"lng":-104.99984,
"eta_seconds":null,
"address":""
},
"destination":{
"lat":39.77446,
"lng":-104.9379,
"eta_seconds":null,
"address":null
}
},
{
"origin":{
"lat":39.77481,
"lng":-104.93618,
"eta_seconds":null,
"address":"10 Albion Street"
},
"destination":{
"lat":39.6984,
"lng":-104.9652,
"eta_seconds":null,
"address":null
}
}
]
}
Let's say we have this:
jsonstring = """{
"ride":[
{
"origin":{
"lat":39.72417,
"lng":-104.99984,
"eta_seconds":null,
"address":""
},
"destination":{
"lat":39.77446,
"lng":-104.9379,
"eta_seconds":null,
"address":null
}
},
{
"origin":{
"lat":39.77481,
"lng":-104.93618,
"eta_seconds":null,
"address":"10 Albion Street"
},
"destination":{
"lat":39.6984,
"lng":-104.9652,
"eta_seconds":null,
"address":null
}
}
]
}"""
Here is a pandas solution:
import pandas as pd
import json
# Load json to dataframe
df = pd.DataFrame(json.loads(jsonstring)["ride"])
# Create the new columns
df["o1"] = df["origin"].apply(lambda x: x["lat"])
df["o2"] = df["origin"].apply(lambda x: x["lng"])
df["d1"] = df["destination"].apply(lambda x: x["lat"])
df["d2"] = df["destination"].apply(lambda x: x["lng"])
#export
print(df.iloc[:,2:].to_csv(index=False, header=True))
#use below for file
#df.iloc[:,2:].to_csv("output.csv", index=False, header=True)
Returns:
o1,o2,d1,d2
39.72417,-104.99984,39.77446,-104.9379
39.77481,-104.93618,39.6984,-104.9652
Condensed answer:
import pandas as pd
import json
with open('data.json') as json_data:
d = json.load(json_data)
df = pd.DataFrame(d["ride"])
df["o1"],df["o2"] = zip(*df["origin"].apply(lambda x: (x["lat"],x["lng"])))
df["d1"],df["d2"] = zip(*df["destination"].apply(lambda x: (x["lat"],x["lng"])))
df.iloc[:,2:].to_csv("t_.csv",index=False,header=False)
Or, maybe the most readable solution:
import json
from pandas.io.json import json_normalize
open('data.json') as json_data:
d = json.load(json_data)
df = json_normalize(d["ride"])
cols = ["origin.lat","origin.lng","destination.lat","destination.lng"]
df[cols].to_csv("output.csv",index=False,header=False)
This might help:
import json
import csv
def retrive_json():
with open('data.json') as json_data:
d = json.load(json_data)
array = []
for i in d['ride']:
origin_lat = i['origin']['lat']
origin_lng = i['origin']['lng']
destination_lat = i['destination']['lat']
destination_lng = i['destination']['lng']
array.append([origin_lat,origin_lng,destination_lat,destination_lng])
return array
res = retrive_json()
csv_cols = ["orgin_lat", "origin_lng", "dest_lat", "dest_lng"]
with open("output_csv.csv", 'w') as out:
writer = csv.DictWriter(out, fieldnames=csv_cols)
writer.writeheader()
for each_list in res:
d = dict(zip(csv_cols,each_list))
writer.writerow(d)
Output csv generated is:
orgin_lat,origin_lng,dest_lat,dest_lng
39.72417,-104.99984,39.77446,-104.9379
39.77481,-104.93618,39.6984,-104.9652
To me it looks like you've got an array of arrays and you want the individual elements. Therefore you'll want to use a nested for loop. Your current for loop is getting each array, to then split up each array into it's elements you'll want to loop through those. I'd suggest something like this:
for x in jjson:
for y in x:
wrt.writerow([y])
Obviously you might want to update your bracketing etc this is just me giving you an idea of how to solve your issue.
Let me know how it goes!
Why the csv-Library?
array = [[1, 2, 3, 4], [5, 6, 7, 8]]
with open('test.csv', 'w') as csv_file :
csv_file.write("# Header Info\n" \
"# Value1, Value2, Value3, Value4\n") # The header might be optional
for row in array :
csv_file.write(",".join(row) + "\n")

How to go about storing users data in Json

I have been messing with Json a bit today and doing some trial and error but I couldn't seem to make this work:
def checkForNewBooty(chan):
j = urllib2.urlopen('http://tmi.twitch.tv/group/user/' + chan + '/chatters')
j_obj = json.load(j)
viewers = j_obj['chatters']['viewers']
moderators = j_obj['chatters']['moderators']
for x in viewers and moderators:
print(json.dumps('users' = {'johhny' = {'Points' = 0, 'Time Joined' = 9938}}))
Json example of what I'm trying to do:
{
users = {
"johhnyknoxville"
}
}
What is the proper way of doing this?
Python dictionaries (which are serialized using JSON) use : and not =.
Try:
json.dumps({'users': {'johhny': {'Points': 0, 'Time Joined': 9938}}})

JSON Parsing help in Python

I have below data in JSON format, I have started with code below which throws a KEY ERROR.
Not sure how to get all data listed in headers section.
I know I am not doing it right in json_obj['offers'][0]['pkg']['Info']: but not sure how to do it correctly.
how can I get to different nodes like info,PricingInfo,Flt_Info etc?
{
"offerInfo":{
"siteID":"1",
"language":"en_US",
"currency":"USD"
},
"offers":{
"pkg":[
{
"offerDateRange":{
"StartDate":[
2015,
11,
8
],
"EndDate":[
2015,
11,
14
]
},
"Info":{
"Id":"111"
},
"PricingInfo":{
"BaseRate":1932.6
},
"flt_Info":{
"Carrier":"AA"
}
}
]
}
}
import os
import json
import csv
f = open('api.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['Id' , 'StartDate', 'EndDate', 'Id', 'BaseRate', 'Carrier']
default = ''
writer.writerow(headers)
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
for pkg in json_obj['offers'][0]['pkg']['Info']:
row = []
row.append(json_obj['id']) # just to test,but I need column values listed in header section
writer.writerow(row)
It looks like you're accessing the json incorrectly. After you have accessed json_obj['offers'], you accessed [0], but there is no array there. json_obj['offers'] gives you another dictionary.
For example, to get PricingInfo like you asked, access like this:
json_obj['offers']['pkg'][0]['PricingInfo']
or 11 from the StartDate like this:
json_obj['offers']['pkg'][0]['offerDateRange']['StartDate'][1]
And I believe you get the KEY ERROR because you access [0] in the dictionary, which since that isn't a key, you get the error.
try to substitute this piece of code:
for pkg in json_obj['offers'][0]['pkg']['Info']:
row = []
row.append(json_obj['id']) # just to test,but I need column values listed in header section
writer.writerow(row)
With this:
for pkg in json_obj['offers']['pkg']:
row.append(pkg['Info']['Id'])
year = pkg['offerDateRange']['StartDate'][0]
month = pkg['offerDateRange']['StartDate'][1]
day = pkg['offerDateRange']['StartDate'][2]
StartDate = "%d-%d-%d" % (year,month,day)
print StartDate
writer.writerow(row)
Try this
import os
import json
import csv
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
print json_obj["offers"]["pkg"][0]["Info"]["Id"]
print str(json_obj["offers"]["pkg"][0]["offerDateRange"]["StartDate"][0]) +'-'+ str(json_obj["offers"]["pkg"][0]["offerDateRange"]["StartDate"][1])+'-'+str(json_obj["offers"]["pkg"][0]
["offerDateRange"]["StartDate"][2])
print str(json_obj["offers"]["pkg"][0]["offerDateRange"]["EndDate"][0]) +'-'+ str(json_obj["offers"]["pkg"][0]["offerDateRange"]["EndDate"][1])+'-'+str(json_obj["offers"]["pkg"][0]
["offerDateRange"]["EndDate"][2])
print json_obj["offers"]["pkg"][0]["Info"]["Id"]
print json_obj["offers"]["pkg"][0]["PricingInfo"]["BaseRate"]
print json_obj["offers"]["pkg"][0]["flt_Info"]["Carrier"]

Getting certain information from string

I'm new to python as was wondering how I could get the estimatedWait and routeName from this string.
{
"lastUpdated": "07:52",
"filterOut": [],
"arrivals": [
{
"routeId": "B16",
"routeName": "B16",
"destination": "Kidbrooke",
"estimatedWait": "due",
"scheduledTime": "06: 53",
"isRealTime": true,
"isCancelled": false
},
{
"routeId":"B13",
"routeName":"B13",
"destination":"New Eltham",
"estimatedWait":"29 min",
"scheduledTime":"07:38",
"isRealTime":true,
"isCancelled":false
}
],
"serviceDisruptions":{
"infoMessages":[],
"importantMessages":[],
"criticalMessages":[]
}
}
And then save this to another string which would be displayed on the lxterminal of the raspberry pi 2. I would like only the 'routeName' of B16 to be saved to the string. How do I do that?
You just have to deserialise the object and then use the index to access the data you want.
To find only the B16 entries you can filter the arrivals list.
import json
obj = json.loads(json_string)
# filter only the b16 objects
b16_objs = filter(lambda a: a['routeName'] == 'B16', obj['arrivals'])
if b16_objs:
# get the first item
b16 = b16_objs[0]
my_estimatedWait = b16['estimatedWait']
print(my_estimatedWait)
You can use string.find() to get the indices of those value identifiers
and extract them.
Example:
def get_vaules(string):
waitIndice = string.find('"estimatedWait":"')
routeIndice = string.find('"routeName":"')
estimatedWait = string[waitIndice:string.find('"', waitIndice)]
routeName = string[routeIndice:string.find('"', routeIndice)]
return estimatedWait, routeName
Or you could just deserialize the json object (highly recommended)
import json
def get_values(string):
jsonData = json.loads(string)
estimatedWait = jsonData['arrivals'][0]['estimatedWait']
routeName = jsonData['arrivals'][0]['routeName']
return estimatedWait, routeName
Parsing values from a JSON file using Python?

Categories

Resources