map JSON to CSV - python

I'm trying to request data from a site's XHR endpoint and then save it as a CSV. The response is JSON, and one part of the response is nested.
I have two issues:
First: it doesn't iterate. It returns data for only the first object, but in two rows. The data is the same on both rows except for the last column, where the first row has floorPlan:altText and the second row floorPlan:url.
Second: it doesn't like some of my characters: "'charmap' codec can't encode character '\u2560' in position 147:...". It seems to be some UTF-8 problem.
The format of the response is (it is shortened to be more readable here):
[{
"id":"67d3686f-848b-e911-a971",
"name":"1302",
"url":"/site/",
"residenceType":"3",
"objectStatus":"1",
"price":570000.0,
"fee":245.0,
"apartmentNumber":"1302",
"address":"Major street 8",
"rooms":4.0,
"floor":3.0,
"primaryArea":92.0,
"inhabitDate":"2022-02-28T23:00:00Z",
"floorPlan":{"url":"/externalfiles/image/1.jpg","altText":"Drawing"}},
{"id":"69d3686f-848b-e911-a971-000d3ab795ed",
"name":"1303",
"url":"/site2/",
"residenceType":"3",
"objectStatus":"1",
"price":320000.0,
"fee":113.0,
"apartmentNumber":"1303",
"address":"Major Street 8",
"rooms":2.0,
"floor":3.0,
"primaryArea":47.0,
"inhabitDate":"2022-02-28T23:00:00Z",
"floorPlan":{"url":"/externalfiles/image/2.jpg","altText":"Drawing"}},
And my code is:
import requests
import pandas as pd
import csv
import json
h = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
u = "https://cdn-search-standard-prod.azureedge.net/api/v1/search/getstageobjects/23d8dbc1-005a-e911-a961-000d3aba65fd"
x = requests.get(u, headers=h).json()
f = csv.writer(open("test.csv", "w+"))
f.writerow(["id", "name", "residenceType", "objectStatus", "price", "fee", "apartmentNumber"])
for x in x:
    f.writerow([x["id"],
                x["name"],
                x["residenceType"],
                x["objectStatus"],
                x["price"],
                x["fee"],
                x["apartmentNumber"],
                x["floorPlan"]["url"]])
df = pd.DataFrame(x)
df.to_csv(r'C:\Users\abc\Documents\Python Scripts\file_20200627.csv', index=False, sep=';', encoding='utf-8')

If you want to convert x to a CSV, use pandas. You are trying to convert x to a CSV, but x is not a csv writer object, and you are also writing rows with the csv writer at the same time:
import requests
import pandas as pd
import csv
import json
h = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
u = "https://cdn-search-standard-prod.azureedge.net/api/v1/search/getstageobjects/23d8dbc1-005a-e911-a961-000d3aba65fd"
x = requests.get(u, headers=h).json()
f = csv.writer(open("test.csv", "w+"))
f.writerow(["id", "name", "residenceType", "objectStatus", "price", "fee", "apartmentNumber"])
print(x)
for x in x:
    f.writerow([x["id"],
                x["name"],
                x["residenceType"],
                x["objectStatus"],
                x["price"],
                x["fee"],
                x["apartmentNumber"],
                x["floorPlan"]["url"]])
df = pd.DataFrame(x)
df.to_csv(r'C:\Users\abc\Documents\file.csv', index=False, sep=';', encoding='utf-8')
output
id name url residenceType objectStatus price fee secondaryArea apartmentNumber address rooms floor maxFloor concept primaryArea plotArea inhabitDate inhabitDateEnd inhabitDateTimeStamp inhabitDateDetermined swanLabel parkingSpaceStatus floorPlan
altText 75d3686f-848b-e911-a971-000d3ab795ed 1504 /stockholm-lan/jarfalla-kommun/bolinder-strand... 3 1 3840000.0 1974.0 0.0 1504 Fabriksvägen 8 3.0 5.0 15.0 782ffe43-f9ef-40eb-8d92-cfd90aa6b147 73.0 0.0 2022-02-28T23:00:00Z 2022-04-30T22:00:00Z 1646089200000 False True 1 Planritning
url 75d3686f-848b-e911-a971-000d3ab795ed 1504 /stockholm-lan/jarfalla-kommun/bolinder-strand... 3 1 3840000.0 1974.0 0.0 1504 Fabriksvägen 8 3.0 5.0 15.0 782ffe43-f9ef-40eb-8d92-cfd90aa6b147 73.0 0.0 2022-02-28T23:00:00Z 2022-04-30T22:00:00Z 1646089200000 False True 1 /externalfiles/image/23d8dbc1-005a-e911-a961-0...
(screenshot of the output CSV file)
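For reference, a corrected sketch that addresses both issues (assuming the endpoint returns the list shown above): don't reuse x as the loop variable, flatten the nested floorPlan with pd.json_normalize (available under that name since pandas 1.0, earlier as pandas.io.json.json_normalize), and write with encoding='utf-8', which avoids the Windows 'charmap' codec error:
import requests
import pandas as pd

h = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
u = "https://cdn-search-standard-prod.azureedge.net/api/v1/search/getstageobjects/23d8dbc1-005a-e911-a961-000d3aba65fd"
data = requests.get(u, headers=h).json()  # a list of dicts

# json_normalize flattens the nested floorPlan dict into
# floorPlan.url / floorPlan.altText columns, one row per object
df = pd.json_normalize(data)
cols = ["id", "name", "residenceType", "objectStatus",
        "price", "fee", "apartmentNumber", "floorPlan.url"]
# encoding='utf-8' avoids the 'charmap' codec error on Windows
df[cols].to_csv("test.csv", index=False, encoding="utf-8")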

Related

How to change Json data output in table format

import requests
from pprint import pprint
import pandas as pd
baseurl = "https://www.nseindia.com/"
url = 'https://www.nseindia.com/api/live-analysis-oi-spurts-underlyings'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                         'like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
session = requests.Session()
request = session.get(baseurl, headers=headers, timeout=30)
cookies = dict(request.cookies)
res = session.get(url, headers=headers, timeout=30, cookies=cookies)
print(res.json())
I tried df = pd.DataFrame(res.json()) but couldn't get the data in table format. How do I do that? Also, how can I select only a few particular columns in the output instead of all of them?
Try this:
import json
import codecs
df = pd.DataFrame(json.loads(codecs.decode(bytes(res.text, 'utf-8'), 'utf-8-sig'))['data'])
And to select specific columns, you can use:
mini_df = df[['symbol', 'latestOI', 'prevOI', 'changeInOI', 'avgInOI']]
>>> print(mini_df)
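The utf-8-sig decode matters here because the response body apparently starts with a byte-order mark (BOM), which plain JSON parsing would trip over. An equivalent sketch working from the raw bytes (res being the response object from the question):
import json
import pandas as pd

# decoding with 'utf-8-sig' strips a leading BOM if one is present
payload = json.loads(res.content.decode('utf-8-sig'))
df = pd.DataFrame(payload['data'])
mini_df = df[['symbol', 'latestOI', 'prevOI', 'changeInOI', 'avgInOI']]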

How to get correct date format from JSON string in Python?

I am trying to get some data from a JSON URL using Python and convert it into a pandas DataFrame. Everything is working OK, except there is a column for dates that comes out weird. How can I turn it into the correct date format? My code is given below:
import json

import pandas as pd
import requests

sym_1 = 'NIFTY'
headers_gen = {"accept-encoding": "gzip, deflate, br",
               "accept-language": "en-US,en;q=0.9",
               "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}

def PCR(sym_1):
    url_pcr = "https://opstra.definedge.com/api/futures/pcr/chart/" + sym_1
    req_pcr = requests.get(url_pcr, headers=headers_gen)
    text_data_pcr = req_pcr.text
    json_dict_pcr = json.loads(text_data_pcr)
    df_pcr = pd.DataFrame.from_dict(json_dict_pcr['data'])
    print(df_pcr)
    return df_pcr
pd.to_datetime(..., unit="ms") fixes things.
I also simplified the requests code a tiny bit and added error handling.
import pandas as pd
import requests
headers_gen = {
    "accept-encoding": "gzip, deflate",
    "accept-language": "en-US,en;q=0.9",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36",
}

def PCR(sym_1):
    req_pcr = requests.get(f"https://opstra.definedge.com/api/futures/pcr/chart/{sym_1}", headers=headers_gen)
    req_pcr.raise_for_status()  # fail early on HTTP errors
    data = req_pcr.json()
    df_pcr = pd.DataFrame.from_dict(data['data'])
    df_pcr[0] = pd.to_datetime(df_pcr[0], unit='ms')  # epoch milliseconds -> datetime
    return df_pcr

if __name__ == '__main__':
    print(PCR('NIFTY'))
outputs
0 1 2 ... 6 7 8
0 2019-04-26 05:30:00 11813.50 1.661348 ... NaN NaN NaN
1 2019-04-30 05:30:00 11791.55 1.587803 ... NaN NaN NaN
2 2019-05-02 05:30:00 11765.40 1.634619 ... NaN NaN NaN
.. ... ... ... ... ... ... ...
735 2022-04-18 00:00:00 17229.60 1.169555 ... 0.963420 0.771757 1.328892
736 2022-04-19 00:00:00 16969.35 1.014768 ... 1.385167 0.980847
import json
from datetime import datetime

import pandas as pd
import pytz
import requests

sym_1 = 'NIFTY'
headers_gen = {"accept-encoding": "gzip, deflate, br",
               "accept-language": "en-US,en;q=0.9",
               "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}

def PCR(sym_1):
    url_pcr = "https://opstra.definedge.com/api/futures/pcr/chart/" + sym_1
    req_pcr = requests.get(url_pcr, headers=headers_gen)
    text_data_pcr = req_pcr.text
    json_dict_pcr = json.loads(text_data_pcr)
    df_pcr = pd.DataFrame.from_dict(json_dict_pcr['data'])
    # parse epoch ms as UTC, then convert to IST; a bare utcfromtimestamp()
    # result is naive, so astimezone() on it would wrongly assume local time
    df_pcr[0] = df_pcr[0].apply(
        lambda x: datetime.fromtimestamp(x / 1000, tz=pytz.utc)
                          .astimezone(pytz.timezone('Asia/Kolkata')))
    print(df_pcr)
    return df_pcr
Updated to use apply and return a datetime instead of a string, but AKX's answer is much more elegant. Also updated to use IST.
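For what it's worth, the IST conversion can also be done vectorized in pandas instead of with apply; a sketch, assuming column 0 holds epoch milliseconds as in the question:
import pandas as pd

# epoch ms -> naive UTC timestamps, then make them tz-aware and convert to IST
df_pcr[0] = (pd.to_datetime(df_pcr[0], unit='ms')
             .dt.tz_localize('UTC')
             .dt.tz_convert('Asia/Kolkata'))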

Formatting Python beautifulsoup data and remove duplicates first columns values

I have the following snippet that already works; however, I want to clean up the formatting a bit by removing duplicated first-column data to make the output more readable.
from urllib.request import Request, urlopen
from urllib.parse import urljoin  # needed for urljoin() below
from bs4 import BeautifulSoup
import re, random, ctypes
import requests
from time import sleep
url = 'https://bscscan.com/tokentxns'
user_agent_list = [
"header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0Gecko/20100101 Firefox/86.0'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'}"
]
header = random.choice(user_agent_list)
pausesleep = float(random.randint(10000,30000)) / 10000 #orig
req = requests.get(url,header, timeout=10)
soup = BeautifulSoup(req.content, 'html.parser')
rows = soup.findAll('table')[0].findAll('tr')
for row in rows[1:]:
    tds = row.find_all('td')
    txnhash = tds[1].text[0:]
    age = tds[2].text[0:]
    value = tds[7].text[0:]
    token = tds[8].text[0:]
    link = urljoin(url, tds[8].find('a')['href'])
    print(str(txnhash) + " " + str(value) + " " + str(token))
Current Output:
0x70e16e1cbcd30d1c3a2abb03a3d3c43fc324aa794c45b10cd5ef1001e9af0915 899.885819768 TrusterCoin (TSC)
0x70e16e1cbcd30d1c3a2abb03a3d3c43fc324aa794c45b10cd5ef1001e9af0915 0.62679168 Wrapped BNB (WBNB)
0x52d862d3f920370d84039f2dccb40edc7343699310d3436b71738d4176997398 388,214,984,514.909719227 WoofCoin (WOOF)
0x52d862d3f920370d84039f2dccb40edc7343699310d3436b71738d4176997398 0.003 Wrapped BNB (WBNB)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 26.737674146727101117 Binance-Peg ... (BUSD)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 1.251364193609566793 Binance-Peg ... (ADA)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 0.03997685638568537 Binance-Peg ... (ADA)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 0.041171860015645402 Binance-Peg ... (ADA)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 0.089939749761843203 Wrapped BNB (WBNB)
Wanted Improvement:
0x70e16e1cbcd30d1c3a2abb03a3d3c43fc324aa794c45b10cd5ef1001e9af0915 899.885819768 TrusterCoin (TSC)
0.62679168 Wrapped BNB (WBNB)
0x52d862d3f920370d84039f2dccb40edc7343699310d3436b71738d4176997398 388,214,984,514.909719227 WoofCoin (WOOF)
0.003 Wrapped BNB (WBNB)
0x4fe83f2ebad772b4292e81f418a6f54572f7462934358a356787f8d777c58c8b 26.737674146727101117 Binance-Peg ... (BUSD)
1.251364193609566793 Binance-Peg ... (ADA)
0.03997685638568537 Binance-Peg ... (ADA)
0.041171860015645402 Binance-Peg ... (ADA)
0.089939749761843203 Wrapped BNB (WBNB)
Try this:
from urllib.request import Request, urlopen
from urllib.parse import urljoin  # urljoin is in urllib.parse, not urllib.request
from bs4 import BeautifulSoup
import re, random, ctypes
import requests
from time import sleep
url = 'https://bscscan.com/tokentxns'
user_agent_list = [
"header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0Gecko/20100101 Firefox/86.0'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'}",
"header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'}",
"header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'}"
]
header = random.choice(user_agent_list)
pausesleep = float(random.randint(10000,30000)) / 10000
req = requests.get(url,header, timeout=10)
soup = BeautifulSoup(req.content, 'html.parser')
rows = soup.findAll('table')[0].findAll('tr')
ne = []
for row in rows[1:]:
    tds = row.find_all('td')
    txnhash = tds[1].text[0:]
    age = tds[2].text[0:]
    value = tds[7].text[0:]
    token = tds[8].text[0:]
    link = urljoin(url, tds[8].find('a')['href'])
    if str(txnhash) not in ne:
        ne.append(str(txnhash))
        print(str(txnhash), end=" ")
    else:  # if you want those tabs as well; otherwise remove the else
        print("\t\t\t", end=" ")
    print(str(value) + " " + str(token))
We build a list of txnhash values in ne, then check each time whether the current txnhash is already in that list.
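A small variation, as a sketch: a set makes the membership check O(1) instead of scanning the list each time, which matters if the table grows (names mirror the answer's loop):
seen = set()
for row in rows[1:]:
    tds = row.find_all('td')
    txnhash = tds[1].text
    value = tds[7].text
    token = tds[8].text
    if txnhash not in seen:
        seen.add(txnhash)
        print(txnhash, end=" ")
    else:
        print("\t\t\t", end=" ")
    print(value, token)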

Pandas include Key to json file

import requests
import pandas as pd
import json
url = 'http://www.fundamentus.com.br/resultado.php'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
fundamentus = requests.get(url, headers=headers)
dfs = pd.read_html(fundamentus.text)
table = dfs[0]
table.to_json('table7.json', orient='records', indent=2)
This is giving me the following:
[{
"Papel":"VNET3",
"Cota\u00e7\u00e3o":0.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"0,00%",
"Mrg. L\u00edq.":"0,00%",
"Liq. Corr.":0,
"ROIC":"0,00%",
"ROE":"12,99%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"9.257.250.00000",
"D\u00edv.Brut\/ Patrim.":0.0,
"Cresc. Rec.5a":"-2,71%"
},
{
"Papel":"CFLU4",
"Cota\u00e7\u00e3o":1.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"8,88%",
"Mrg. L\u00edq.":"10,72%",
"Liq. Corr.":110,
"ROIC":"17,68%",
"ROE":"32,15%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"60.351.00000",
"D\u00edv.Brut\/ Patrim.":6.0,
"Cresc. Rec.5a":"8,14%"
}
]
But I need the following.
[ VNET3 = {
"Cota\u00e7\u00e3o":0.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"0,00%",
"Mrg. L\u00edq.":"0,00%",
"Liq. Corr.":0,
"ROIC":"0,00%",
"ROE":"12,99%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"9.257.250.00000",
"D\u00edv.Brut\/ Patrim.":0.0,
"Cresc. Rec.5a":"-2,71%"
},
CFLU4 = {
"Cota\u00e7\u00e3o":1.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"8,88%",
"Mrg. L\u00edq.":"10,72%",
"Liq. Corr.":110,
"ROIC":"17,68%",
"ROE":"32,15%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"60.351.00000",
"D\u00edv.Brut\/ Patrim.":6.0,
"Cresc. Rec.5a":"8,14%"
}
]
The encoding is coming out wrong as well.
For example: "Cota\u00e7\u00e3o"
I tried: table.to_json('table7.json', force_ascii=True, orient='records', indent=2)
I also tried:
table.to_json('table7.json', encoding='utf8', orient='records', indent=2)
But no success.
So I tried to read it back with json, because the idea was to read it and convert it. This is the JSON reader code:
jasonfile = open('table7.json', 'r')
stocks = jasonfile.read()
jason_object = json.loads(stocks)
print(str(jason_object['Papel']))
But I got:
print(str(jason_object['Papel']))
TypeError: list indices must be integers or slices, not str
Thanks in advance.
You have a list with many dictionaries, so you have to use an index like [0] to get one dictionary:
print( jason_object[0]['Papel'] )
And the text Cota\u00e7\u00e3o can be correct; that is how JSON stores non-ASCII characters. But if you print it:
print('Cota\u00e7\u00e3o')
then you should get
Cotação
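A quick way to see both behaviours with the json module (ensure_ascii is a standard json.dumps flag and defaults to True):
import json

print(json.dumps({"Cotação": 0.0}))                      # {"Cota\u00e7\u00e3o": 0.0}
print(json.dumps({"Cotação": 0.0}, ensure_ascii=False))  # {"Cotação": 0.0}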
When I run
for key in jason_object[0].keys():
    print(key)
then I get on screen
Papel
Cotação
P/L
P/VP
PSR
Div.Yield
P/Ativo
P/Cap.Giro
P/EBIT
P/Ativ Circ.Liq
EV/EBIT
EV/EBITDA
Mrg Ebit
Mrg. Líq.
Liq. Corr.
ROIC
ROE
Liq.2meses
Patrim. Líq
Dív.Brut/ Patrim.
Cresc. Rec.5a
But if I open table7.json in a text editor, then I see Cota\u00e7\u00e3o.
The list [ VNET3 = { .. } ] is not a correct JSON or Python structure.
The correct JSON and Python structure is a dictionary: { "VNET3": { .. } }
new_data = dict()
for item in jason_object:
    key = item['Papel']
    item.pop('Papel')
    val = item
    new_data[key] = val

print(new_data)
Minimal working code
import requests
import pandas as pd
import json
url = 'http://www.fundamentus.com.br/resultado.php'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
response = requests.get(url, headers=headers)
dfs = pd.read_html(response.text)
table = dfs[0]
table.to_json('table7.json', orient='records', indent=2)
jasonfile = open('table7.json', 'r')
jason_object = json.loads(jasonfile.read())
#print(jason_object[0]['Papel'])
#for key in jason_object[0].keys():
#    print(key)

new_data = dict()
for item in jason_object:
    key = item['Papel']
    item.pop('Papel')
    val = item
    new_data[key] = val

print(new_data)
Tested on Python 3.7 on Linux Mint, which by default uses UTF-8 in the console/terminal.
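As an aside, pandas can write the desired {Papel: {column: value}} shape directly; a sketch, assuming table is the DataFrame from the code above. orient='index' maps the index to the outer keys, and force_ascii=False keeps characters like ç readable in the file:
# writes {"VNET3": {...}, "CFLU4": {...}, ...} keyed by the Papel column
table.set_index('Papel').to_json('table7.json', orient='index',
                                 force_ascii=False, indent=2)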

How to get a string formatted JSON into a table

I have the following string-formatted JSON data. How can I convert the data into a table format in R or Python?
I've tried df = pd.DataFrame(data), but that doesn't work, because data is a string.
data = '{"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20, "CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh", "OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0, "UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily", "TimeZone":"E. South America Standard Time","Days":["All"], "Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]}, "IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
Desired Output:
output =
------------------------------------------------------------------
ID | RecordType | CreationTime
048f7de7-81a4-464d-bd6d-df3be3b1e7e8 | 20 | 2019-10-08T12:12:32
Error:
ValueError Traceback (most recent call last)
<ipython-input-26-039b238b38ef> in <module>
----> 1 df = pd.DataFrame(data)
e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
483 )
484 else:
--> 485 raise ValueError("DataFrame constructor not properly called!")
486
487 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
In Python:
Given data:
str.replace true with True
Use ast.literal_eval to convert data from a str to a dict
Use pandas.io.json.json_normalize to convert the dict to a pandas dataframe
import pandas as pd
from ast import literal_eval
from pandas.io.json import json_normalize
data = '{"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20, "CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh", "OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0, "UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily", "TimeZone":"E. South America Standard Time","Days":["All"], "Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]}, "IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
data = data.replace('true', 'True')
data = literal_eval(data)
{'ActivityId': '4e8b4514-24be-4ba5-a7d3-a69e8cb8229e',
'CreationTime': '2019-10-08T12:12:32',
'Id': '048f7de7-81a4-464d-bd6d-df3be3b1e7e8',
'IsSuccess': True,
'ItemName': 'ASO Daily Statistics',
'Operation': 'SetScheduledRefresh',
'OrganizationId': '39b03722-b836-496a-85ec-850f0957ca6b',
'RecordType': 20,
'Schedules': {'Days': ['All'],
'RefreshFrequency': 'Daily',
'Time': ['07:30:00',
'10:30:00',
'13:30:00',
'16:30:00',
'19:30:00',
'22:30:00'],
'TimeZone': 'E. South America Standard Time'},
'UserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
'UserType': 0}
Create the dataframe:
df = json_normalize(data)
Id RecordType CreationTime Operation OrganizationId UserType UserAgent ItemName IsSuccess ActivityId Schedules.RefreshFrequency Schedules.TimeZone Schedules.Days Schedules.Time
048f7de7-81a4-464d-bd6d-df3be3b1e7e8 20 2019-10-08T12:12:32 SetScheduledRefresh 39b03722-b836-496a-85ec-850f0957ca6b 0 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 ASO Daily Statistics True 4e8b4514-24be-4ba5-a7d3-a69e8cb8229e Daily E. South America Standard Time [All] [07:30:00, 10:30:00, 13:30:00, 16:30:00, 19:30:00, 22:30:00]
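Note that the replace/literal_eval steps can be skipped entirely, since json.loads understands JSON's lowercase true natively. A minimal sketch using the same data string (pd.json_normalize is the modern spelling, pandas 1.0+):
import json
import pandas as pd

# json.loads maps true/false/null to True/False/None on its own
df = pd.json_normalize(json.loads(data))
print(df[['Id', 'RecordType', 'CreationTime']])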
You will need the reticulate library, and you will need to change all true to True. Look at the code below:
a <- 'string = {"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20,
"CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh",
"OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0,
"UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
"ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily",
"TimeZone":"E. South America Standard Time","Days":["All"],
"Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]},
"IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
data.frame(reticulate::py_eval(gsub('true','True',sub('.*=\\s+','',a))))
