How can I convert this byte or string to a dataframe? - python

I have data in this format (bytes):
b'{"datatable":{"data":[["AAPL","1980-12-12",28.75,28.87,28.75,28.75,2093900.0,0.0,1.0,0.42270591588018,0.42447025361603,0.42270591588018,0.42270591588018,117258400.0],
["AAPL","1980-12-15",27.38,27.38,27.25,27.25,785200.0,0.0,1.0,0.40256306006259,0.40256306006259,0.40065169418209,0.40065169418209,43971200.0],
["AAPL","1980-12-16",25.37,25.37,25.25,25.25,472000.0,0.0,1.0,0.37301040298714,0.37301040298714,0.37124606525129,0.37124606525129,26432000.0],
["AAPL","1980-12-17",25.87,26.0,25.87,25.87,385900.0,0.0,1.0,0.38036181021984,0.38227317610034,0.38036181021984,0.38036181021984,21610400.0],
["AAPL","1980-12-18",26.63,26.75,26.63,26.63,327900.0,0.0,1.0,0.39153594921354,0.39330028694939,0.39153594921354,0.39153594921354,18362400.0],
["AAPL","1980-12-19",28.25,28.38,28.25,28.25,217100.0,0.0,1.0,0.41535450864748,0.41726587452798,0.41535450864748,0.41535450864748,12157600.0],
.....,{"name":"adj_high","type":"BigDecimal(50,28)"},{"name":"adj_low","type":"BigDecimal(50,28)"},{"name":"adj_close","type":"BigDecimal(50,28)"},{"name":"adj_volume","type":"double"}]},"meta":{"next_cursor_id":null}}'
I can decode this with .decode('utf-8'). However, I want to convert it into a DataFrame or some other format so that I can work with the data.
Any help would be appreciated.
Here is the error I get when I try pd.DataFrame():
ValueError: DataFrame constructor not properly called!
Thank you for pointing me in the right direction!
I have used
apple = json.loads(apple1)
apple
to get
{'datatable': {'columns': [{'name': 'ticker', 'type': 'String'},
{'name': 'date', 'type': 'Date'},
{'name': 'open', 'type': 'BigDecimal(34,12)'},
{'name': 'high', 'type': 'BigDecimal(34,12)'},
{'name': 'low', 'type': 'BigDecimal(34,12)'},
{'name': 'close', 'type': 'BigDecimal(34,12)'},
{'name': 'volume', 'type': 'BigDecimal(37,15)'},
{'name': 'ex-dividend', 'type': 'BigDecimal(42,20)'},
{'name': 'split_ratio', 'type': 'double'},
{'name': 'adj_open', 'type': 'BigDecimal(50,28)'},
{'name': 'adj_high', 'type': 'BigDecimal(50,28)'},
{'name': 'adj_low', 'type': 'BigDecimal(50,28)'},
{'name': 'adj_close', 'type': 'BigDecimal(50,28)'},
{'name': 'adj_volume', 'type': 'double'}],
'data': [['AAPL',
'1980-12-12',
28.75,
28.87,
28.75,
28.75,
2093900.0,
0.0,
1.0,
0.42270591588018,
0.42447025361603,
0.42270591588018,
0.42270591588018,
117258400.0],
['AAPL',
'1980-12-15',
27.38,
27.38,
27.25,
27.25,
785200.0,
0.0,
1.0,
0.40256306006259,
0.40256306006259,
0.40065169418209,
0.40065169418209,
43971200.0],
and if I run:
pd.DataFrame(apple['datatable']['data'])
I get a DataFrame whose columns are labelled with integers (screenshot omitted).
That is good, but I would like the column names to be [date, open, high, low, close, volume, ex-dividend, split_ratio, adj_open, adj_high, adj_low, adj_close, adj_volume] rather than [0,1,2,3,4,5,6,7,8,9,10,11,12,13].
I would also like to drop the first column (the 'AAPL' ticker) and the numeric index, so that it looks like a time series with date as the first column.
Can you help me with this?

You might need to tidy up the data first, but the following works.
import json
import pandas as pd
pd.DataFrame(json.loads(data.decode('utf-8'))['datatable']['data'])
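To also get the column names, drop the ticker, and index by date, one possible sketch is shown here. It assumes the payload carries a 'columns' list next to 'data', as the question's json.loads output suggests; the byte string raw is a hypothetical, shortened stand-in for the real response.

```python
import json
import pandas as pd

# Hypothetical, shortened payload with the same layout as the question's data.
raw = (
    b'{"datatable":{"data":['
    b'["AAPL","1980-12-12",28.75,28.87],'
    b'["AAPL","1980-12-15",27.38,27.38]],'
    b'"columns":[{"name":"ticker","type":"String"},'
    b'{"name":"date","type":"Date"},'
    b'{"name":"open","type":"BigDecimal(34,12)"},'
    b'{"name":"high","type":"BigDecimal(34,12)"}]},'
    b'"meta":{"next_cursor_id":null}}'
)

table = json.loads(raw.decode("utf-8"))["datatable"]

# Take the column labels from the "columns" metadata instead of 0..N.
names = [col["name"] for col in table["columns"]]
df = pd.DataFrame(table["data"], columns=names)

# Drop the ticker column and use the parsed date as the index.
df["date"] = pd.to_datetime(df["date"])
df = df.drop(columns="ticker").set_index("date")
print(df)
```

With the date as the index, the frame behaves like a time series and the numeric row labels are gone.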

Related

Apply function to specific element's value of a list of dictionaries [duplicate]

This question already has answers here:
Getting a map() to return a list in Python 3.x
(11 answers)
Closed last month.
tbl_headers = db_admin.execute("SELECT name, type FROM PRAGMA_TABLE_INFO(?);", table_name)
tbl_headers looks like this:
[{'name': 'id', 'type': 'INTEGER'}, {'name': 'abasdfasd', 'type': 'TEXT'}, {'name': 'sx', 'type': 'TEXT'}, {'name': 'password', 'type': 'NULL'}, {'name': 'asdf', 'type': 'TEXT'}]
I need to apply the hash_in() function to the 'name' value of each dictionary in the list above.
I have tried these:
tbl_headers = [hash_in(i['name']) for i in tbl_headers]
This drops the dictionaries and returns only a list of the hashed 'name' values:
['sxtw001c001h', 'sxtw001c001r001Z001e001c001r001Z001a001Z', 'sxtw001w001r', 'sxtw001c001q001n001v001r001r001Z001o', 'sxtw001e001c001r001Z']
OR
tbl_headers = map(hash_in, tbl_headers)
This raises an error.
Update
The output I am looking for is:
[{'name': hash_in('id'), 'type': 'INTEGER'}, {'name': hash_in('abasdfasd'), 'type': 'TEXT'}, {'name': hash_in('sx'), 'type': 'TEXT'}, {'name': ('password'), 'type': 'NULL'}, {'name': ('asdf'), 'type': 'TEXT'}]
Thank you.
Try this list comprehension:
tbl_headers = [{'name': hash_in(i['name']), 'type': i['type']} for i in tbl_headers]
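As a minimal sketch of the same idea: the map(hash_in, tbl_headers) attempt most likely fails because it hands each whole dictionary to hash_in, and in Python 3 map() returns a lazy iterator rather than a list. The hash_in below is a hypothetical stand-in, since the real function is not shown in the question.

```python
def hash_in(text):
    # Hypothetical stand-in for the question's hash_in(); it only tags the value.
    return "h_" + text

tbl_headers = [
    {"name": "id", "type": "INTEGER"},
    {"name": "sx", "type": "TEXT"},
]

# {**row, ...} copies every key and overrides only 'name',
# so extra keys survive without being listed one by one.
hashed = [{**row, "name": hash_in(row["name"])} for row in tbl_headers]

# map() is lazy in Python 3; wrap it in list() to get a list back.
hashed_via_map = list(
    map(lambda row: {**row, "name": hash_in(row["name"])}, tbl_headers)
)

print(hashed)
```

Both variants build new dictionaries, so the original tbl_headers is left unchanged.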

Reading a JSON with multiple nested lists and transforming into a DataFrame

I have JSON inside a list, and this JSON has lists nested inside lists. Something like this:
my_data = [{'page': 1,
'page_size': 100,
'total_pages': 11,
'total_results': 1057,
'items': [{'jw_entity_id': 'ts88361',
'id': 88361,
'title': 'Love, Death & Robots',
'object_type': 'show',
'scoring': [{'provider_type': 'imdb:votes', 'value': 131937},
{'provider_type': 'tmdb:score', 'value': 8.2},
{'provider_type': 'imdb:score', 'value': 8.4}]},
{'jw_entity_id': 'tm374139',
'id': 374139,
'title': 'Sonic - O Filme',
'object_type': 'movie',
'scoring': [{'provider_type': 'tmdb:id', 'value': 454626},
{'provider_type': 'imdb:score', 'value': 6.5},
{'provider_type': 'tmdb:score', 'value': 7.4}]
I managed to transform it into a DataFrame, but the scoring column still holds nested values (provider_type/value pairs). How can I "unpack" that list and integrate it into the DataFrame?
import pandas as pd
from pandas import json_normalize
df = pd.concat([json_normalize(entry, 'items')
                for entry in my_data])
This is what I get now:
{'jw_entity_id': {0: 'ts88361', 1: 'tm374139'},
'id': {0: 88361, 1: 374139},
'title': {0: 'Love, Death & Robots', 1: 'Sonic - O Filme'},
'object_type': {0: 'show', 1: 'movie'},
'scoring': {0: [{'provider_type': 'imdb:votes', 'value': 131937},
{'provider_type': 'tmdb:score', 'value': 8.2},
{'provider_type': 'imdb:score', 'value': 8.4}],
1: [{'provider_type': 'tmdb:id', 'value': 454626},
{'provider_type': 'imdb:score', 'value': 6.5},
{'provider_type': 'tmdb:score', 'value': 7.4}]}}
I need the scoring column "unpacked", with the imdb:score as a column.
The structure of the dictionaries in the scoring column is a bit convoluted because the keys repeat.
You can concatenate DataFrames created from these lists:
df = pd.concat([pd.json_normalize(entry, 'items')
                for entry in my_data],
               ignore_index=True)  # unique row labels, so the join below lines up
# one single-row DataFrame per title, with provider_type values as columns
df_scor = pd.concat([
    pd.DataFrame({x['provider_type']: [x['value']]
                  for x in l})
    for l in df['scoring'].to_list()
]).reset_index(drop=True)
df = df.drop('scoring', axis=1).join(df_scor['imdb:score'])  # here we keep only imdb:score
print(df)
Output:
  jw_entity_id      id                 title object_type  imdb:score
0      ts88361   88361  Love, Death & Robots        show         8.4
1     tm374139  374139       Sonic - O Filme       movie         6.5
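An alternative sketch, not taken from the answer above, lets json_normalize unpack the scoring lists through record_path and then pivots the providers into columns. The my_data below is a shortened copy of the question's structure, and passing a list as the pivot index assumes pandas 1.1 or newer.

```python
import pandas as pd

# Shortened copy of the question's structure (one page, two items).
my_data = [{
    "page": 1,
    "items": [
        {"jw_entity_id": "ts88361", "title": "Love, Death & Robots",
         "scoring": [{"provider_type": "imdb:votes", "value": 131937},
                     {"provider_type": "imdb:score", "value": 8.4}]},
        {"jw_entity_id": "tm374139", "title": "Sonic - O Filme",
         "scoring": [{"provider_type": "tmdb:id", "value": 454626},
                     {"provider_type": "imdb:score", "value": 6.5}]},
    ],
}]

# Collect the items of every page into one flat list.
items = [item for page in my_data for item in page["items"]]

# One row per (title, provider) pair, keeping the id and title as metadata.
long_df = pd.json_normalize(items, record_path="scoring",
                            meta=["jw_entity_id", "title"])

# Spread the providers into columns; providers missing for a title become NaN.
wide_df = (long_df
           .pivot(index=["jw_entity_id", "title"],
                  columns="provider_type", values="value")
           .reset_index())
print(wide_df[["jw_entity_id", "title", "imdb:score"]])
```

Note that pivot sorts the rows by the index columns, so the row order may differ from the input order.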

How to load a nested json file into a pandas DataFrame

Please help, I cannot seem to get this JSON data into a DataFrame.
I loaded the data:
data = json.load(open(r'path'))  # this works fine and displays:
json data
{'type': 'FeatureCollection', 'name': 'Altstadt Nord', 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}, 'features': [{'type': 'Feature', 'properties': {'Name': 'City-Martinsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9595637, 50.9418396], [6.956624, 50.9417382], [6.9543173, 50.941603], [6.9529869, 50.9413664], [6.953062, 50.9408593], [6.9532873, 50.9396289], [6.9533624, 50.9388176], [6.9529333, 50.9378373], [6.9527509, 50.9371815], [6.9528367, 50.9360659], [6.9532122, 50.9352884], [6.9540705, 50.9350653], [6.9553258, 50.9350044], [6.9568815, 50.9351667], [6.9602074, 50.9355047], [6.9608189, 50.9349165], [6.9633939, 50.9348827], [6.9629433, 50.9410622], [6.9616236, 50.9412176], [6.9603898, 50.9414881], [6.9595637, 50.9418396]]]}}, {'type': 'Feature', 'properties': {'Name': 'Gereonsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9410622], [6.9629433, 50.9431646], [6.9611408, 50.9433539], [6.9601752, 50.9436649], [6.9588234, 50.9443409], [6.9579651, 50.9449763], [6.9573213, 50.945801], [6.9563128, 50.9451926], [6.9551756, 50.9448546], [6.9535663, 50.9446518], [6.9523432, 50.9449763], [6.9494464, 50.9452602], [6.9473435, 50.9454495], [6.9466998, 50.9456928], [6.9458415, 50.946531], [6.9434168, 50.9453954], [6.9424726, 50.9451926], [6.9404342, 50.9429888], [6.9404771, 50.9425156], [6.9403269, 50.9415016], [6.9400479, 50.9405281], [6.9426228, 50.9399872], [6.9439103, 50.9400143], [6.9453051, 50.9404875], [6.9461634, 50.9408931], [6.9467427, 50.941096], [6.9475581, 50.9410013], [6.9504227, 50.9413191], [6.9529869, 50.9413664], [6.9547464, 50.9416368], [6.9595637, 50.9418396], [6.9603898, 50.9414881], [6.9616236, 50.9412176], [6.9629433, 50.9410622]]]}}, {'type': 'Feature', 'properties': {'Name': 'Kunibertsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9431646], [6.9637129, 50.9454917], [6.9651506, 
50.9479252], [6.9666097, 50.9499124], [6.9667599, 50.9500882], [6.9587777, 50.9502504], [6.9573213, 50.945801], [6.9579651, 50.9449763], [6.9588234, 50.9443409], [6.9601752, 50.9436649], [6.9611408, 50.9433539], [6.9629433, 50.9431646]]]}}, {'type': 'Feature', 'properties': {'Name': 'Nördlich Neumarkt', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9390331, 50.9364418], [6.9417153, 50.9358738], [6.9462214, 50.9358062], [6.9490109, 50.9355628], [6.9505129, 50.9353329], [6.9523798, 50.9352924], [6.9532122, 50.9352884], [6.9528367, 50.9360659], [6.9527509, 50.9371815], [6.9529333, 50.9378373], [6.9533624, 50.9388176], [6.9532381, 50.9398222], [6.9529869, 50.9413664], [6.9504227, 50.9413191], [6.9475581, 50.9410013], [6.9467427, 50.941096], [6.9453051, 50.9404875], [6.9439103, 50.9400143], [6.9424663, 50.9399574], [6.9400479, 50.9405281], [6.9390331, 50.9364418]]]}}]}
Now I cannot seem to fit it into a DataFrame:
pd.DataFrame(data) --> ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
I tried to flatten it with json_flatten, but got ModuleNotFoundError: No module named 'flatten_json', even though I installed json-flatten via pip install.
I also tried df = pd.DataFrame.from_dict(data, orient='index')
df
Out[22]:
                                                          0
type                                      FeatureCollection
name                                          Altstadt Nord
crs       {'type': 'name', 'properties': {'name': 'urn:o...
features  [{'type': 'Feature', 'properties': {'Name': 'C...
I think you can use json_normalize to load it into pandas.
path_to_json.json in this case is your full JSON file (with double quotes).
import json
from pandas import json_normalize
with open('path_to_json.json') as f:
    data = json.load(f)
df = json_normalize(data, record_path=['features'], meta=['name'])
print(df)
This results in a DataFrame with one row per feature.
You can pass a deeper record_path to json_normalize to create more columns for the polygon coordinates.
You can find more documentation at https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html
Hope that helps.
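As a rough illustration of what that call produces, and of one way to spread the polygon points into rows, here is a sketch on a hypothetical, heavily shortened version of the question's GeoJSON. It assumes each polygon has a single outer ring, which holds for the data shown.

```python
import pandas as pd

# Hypothetical, heavily shortened version of the question's GeoJSON.
data = {
    "type": "FeatureCollection",
    "name": "Altstadt Nord",
    "features": [
        {"type": "Feature",
         "properties": {"Name": "City-Martinsviertel", "description": None},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[6.9595637, 50.9418396],
                                       [6.956624, 50.9417382],
                                       [6.9595637, 50.9418396]]]}},
        {"type": "Feature",
         "properties": {"Name": "Gereonsviertel", "description": None},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[6.9629433, 50.9410622],
                                       [6.9629433, 50.9431646],
                                       [6.9629433, 50.9410622]]]}},
    ],
}

# One row per feature; nested dicts become dotted columns such as
# 'properties.Name' and 'geometry.coordinates'.
df = pd.json_normalize(data, record_path=["features"], meta=["name"])

# Take the outer ring of each polygon, then give every point its own row.
rings = [coords[0] for coords in df["geometry.coordinates"]]
points = (df[["properties.Name"]]
          .assign(point=rings)
          .explode("point")
          .reset_index(drop=True))
points["lon"] = [p[0] for p in points["point"]]
points["lat"] = [p[1] for p in points["point"]]
points = points.drop(columns="point")
print(points)
```

For real geospatial work a dedicated library may be a better fit; this only shows the plain pandas route.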
The json data contains elements with different datatypes and these cannot be loaded into one single dataframe.
View datatypes in the json:
[type(data[k]) for k in data.keys()]
# Out: [str, str, dict, list]
data.keys()
# Out: dict_keys(['type', 'name', 'crs', 'features'])
You can load each single chunk of data in a separate dataframe like this:
df_crs = pd.DataFrame(data['crs'])
df_features = pd.DataFrame(data['features'])
data['type'] and data['name'] are strings
data['type']
# Out 'FeatureCollection'
data['name']
# Out 'Altstadt Nord'

How to convert json into a pandas dataframe?

I'm trying to convert an API response from JSON to a DataFrame in pandas. The problem I am having is that the data is nested in the JSON and I am not getting the right columns in my DataFrame.
The data is collected from an API in the following format:
{'tickets': [{'url': 'https...',
'id': 1,
'external_id': None,
'via': {'channel': 'web',
'source': {'from': {}, 'to': {}, 'rel': None}},
'created_at': '2020-05-01T04:16:33Z',
'updated_at': '2020-05-23T03:02:49Z',
'type': 'incident',
'subject': 'Subject',
'raw_subject': 'Raw subject',
'description': 'Hi, this is the description',
'priority': 'normal',
'status': 'closed',
'recipient': None,
'requester_id': 409467360874,
'submitter_id': 409126461453,
'assignee_id': 409126461453,
'organization_id': None,
'group_id': 360009916453,
'collaborator_ids': [],
'follower_ids': [],
'email_cc_ids': [],
'forum_topic_id': None,
'problem_id': None,
'has_incidents': False,
'is_public': True,
'due_at': None,
'tags': ['tag_1',
'tag_2',
'tag_3',
'tag_4'],
'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'satisfaction_rating': {'score': 'unoffered'},
'sharing_agreement_ids': [],
'comment_count': 2,
'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'followup_ids': [],
'ticket_form_id': 360003608013,
'deleted_ticket_form_id': 360003608013,
'brand_id': 360004571673,
'satisfaction_probability': None,
'allow_channelback': False,
'allow_attachments': True},
What I have already tried is the following: I converted the JSON into a dict as follows:
x = response.json()
df = pd.DataFrame(x['tickets'])
But I'm struggling with the output. I don't know how to get a correct, ordered, normalized DataFrame.
(I'm new to this :) )
Let's suppose you get your request data with r = requests.get(url, auth=auth)
Your data isn't clean yet, so let's get a DataFrame from it: data = pd.read_json(json.dumps(r.json(), ensure_ascii=False))
But you will probably get a DataFrame with one single row.
When I faced a problem like this, I wrote this function to get the full data:
listParam = []
def listDict(entry):
    # collect every dict found in a (possibly nested) list
    if type(entry) is dict:
        listParam.append(entry)
    elif type(entry) is list:
        for ent in entry:
            listDict(ent)
Because your data is wrapped in a dict ({'tickets': ...}), you will need to get the information like this:
listDict(data.iloc[0][0])
And then,
pd.DataFrame(listParam)
I can't show the results because you didn't post the complete data or say where I can find data to test with, but this will probably work.
You have to convert the JSON to a dictionary first and then convert the value for the key 'tickets' into a DataFrame.
file = open('file.json').read()
ticketDictionary = json.loads(file)
df = pd.DataFrame(ticketDictionary['tickets'])
'file.json' contains your data here.
df now contains your DataFrame.
For the lists within the response you can build separate DataFrames if required:
for field in df['fields']:
    field_df = pd.DataFrame(field)  # separate name so df is not overwritten
It will give you this for each ticket's fields:
             id                             value
0  360042034433   value of the first custom field
1  360041487874  value of the second custom field
2  360041489414   value of the third custom field
3  360040980053                correo_electrónico
4  360040980373               suscribe_newsletter
5  360042046173                              None
6  360041028574                           product
7  360042103034                              None
This is one way to structure it, since you haven't mentioned the exact expected format.
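Another option, offered as a sketch rather than as what either answer does, is pandas.json_normalize: it flattens nested dicts into dotted column names and can unpack one nested list per call. The response_json below is a hypothetical, shortened stand-in for response.json().

```python
import pandas as pd

# Hypothetical, shortened response with the same shape as the question's data.
response_json = {
    "tickets": [
        {"id": 1,
         "status": "closed",
         "via": {"channel": "web"},
         "satisfaction_rating": {"score": "unoffered"},
         "tags": ["tag_1", "tag_2"],
         "custom_fields": [{"id": 360042034433, "value": "first"},
                           {"id": 360042046173, "value": None}]},
    ]
}
tickets = response_json["tickets"]

# Nested dicts become dotted columns such as 'via.channel'; lists stay as lists.
tickets_df = pd.json_normalize(tickets)

# One row per custom field, tagged with the id of the ticket it belongs to.
# record_prefix avoids a name clash between the field 'id' and the ticket 'id'.
fields_df = pd.json_normalize(tickets, record_path="custom_fields",
                              meta=["id"], record_prefix="field_")

print(tickets_df.columns.tolist())
print(fields_df)
```

The two frames can then be merged on the ticket id if a single table is wanted.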

Convert nested dictionary within JSON from a string

I have loaded some JSON data with a messy structure: the nested dictionaries are wrapped in single quotes and recognized as strings, rather than as dictionaries I can loop through. What is the best way to drop the single quotes from the 'value' property?
Provided below is an example of the structure:
for val in json_data:
    print(val)
{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'},
If I add a nested loop targeting ['value'], it loops character by character rather than over the key-value pairs of the dictionary.
Use json.loads to convert the string to a dict:
import json
json_data = [{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'}]
# each result is a Python dictionary:
for val in json_data:
    print(json.loads(val['value']))
This should work!
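If the parsed dictionaries are needed afterwards rather than only printed, a small sketch is to rebuild each row with the decoded 'value' and loop over its key-value pairs. The data here is a shortened copy of the question's list, and the json_normalize step at the end is an optional extra that the question does not ask for.

```python
import json
import pandas as pd

json_data = [
    {"id": "status6", "text": "> 2 days",
     "value": '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}'},
    {"id": "tags", "text": "Internal",
     "value": '{"tag_ids":[3223513]}'},
]

# Replace each JSON string with the dict it encodes (copying, not mutating).
parsed = [{**row, "value": json.loads(row["value"])} for row in json_data]

for row in parsed:
    for key, val in row["value"].items():  # now iterates key-value pairs
        print(row["id"], key, val)

# Optional: flatten into columns such as 'value.index' and 'value.tag_ids'.
df = pd.json_normalize(parsed)
```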
