convert json file to data frame in python - python

I am new to json file handling. I have a below json file
{"loanAccount":{"openAccount":[{"accountNumber":"986985874","accountOpenDate":"2020-02-045-11:00","accountCode":"NA","relationship":"Main account","loanTerm":"120"}]}}
I want to convert this into dataframe. I am using the below code :
import pandas as pd
from pandas.io.json import json_normalize
data1 = pd.read_json (r'./DLResponse1.json',lines=True)
df = pd.DataFrame.from_dict(data1, orient='columns')
This is giving me the below output :
index loanAccount
0 {'openAccount': [{'accountNumber': '986985874', 'accountOpenDate': '2020-02-045-11:00', 'accountCode': 'NA', 'relationship': 'Main account', 'loanTerm': '120'}]}}
However I want to extract in the below format :
loanAccount openAccount accountNumber accountOpenDate accountCode relationship loanTerm
986985874 2020-02-045-11:00 NA Main account 120

you may use:
# s is your json, you can read from file
pd.DataFrame(s["loanAccount"]["openAccount"])
output:
if you want also to have the other json keys as columns you may use:
pd.DataFrame([{"loanAccount": '', "openAccount": '', **s["loanAccount"]["openAccount"][0]}])

Related

How to transform json format into string column for python dataframe?

I got this dataframe:
Dataframe: df_case_1
Id RecordType
0 1234 {'attributes': {'type': 'RecordType', 'url': '/services/data/v55.0/sobjects/RecordType/1234', 'name', 'XYZ'}}
1 4321 {'attributes': {'type': 'RecordType', 'url': '/services/data/v55.0/sobjects/RecordType/4321', 'name', 'ABC'}}
I want to have this dataframe:
Dataframe: df_case_final
Id RecordType
0 1234 'XYZ'
1 4321 'ABC'
At the moment I use this statemane but it gives me the name on position 0 for every case object.
df_case_1['RecordType'] = df_case_1.RecordType[0]['Name']
How to build the statement, that I give me the correct name for every id, like in df_case_final?
Thanks
There are 3 Ways you can convert JSON to Pandas Dataframe
# 1. Use json_normalize() to convert JSON to DataFrame
dict= json.loads(data)
df = json_normalize(dict['technologies'])
# 2. Convert JSON to DataFrame Using read_json()
df2 = pd.read_json(jsonStr, orient ='index')
# 3. Use pandas.DataFrame.from_dict() to Convert JSON to DataFrame
dict= json.loads(data)
df2 = pd.DataFrame.from_dict(dict, orient="index")
Now, after converting Json to df take the last column and append it to your original dataframe
split your df by coma & trim un-neccessary cols
import pandas as pd
df=pd.read_csv(r"Hansmuff.csv")
df[['1', '2','3','required']]=df['RecordType'].str.split(',', expand=True)
df = df.drop(columns=['RecordType', '1','2','3'])
df['required'] = df['required'].str.strip('{}')
print(df)
output
Id required
0 1234 'XYZ'
1 4321 'ABC'

JSON_Normalize (nested json) to csv

I have been trying via pandas to extract data from a txt file containing json utf-8 encoded data.
Direct link to data file - http://download.companieshouse.gov.uk/psc-snapshot-2022-02-06_8of20.zip
Data's structure looks like the following examples:
{"company_number":"04732933","data":{"address":{"address_line_1":"Windsor Road","locality":"Torquay","postal_code":"TQ1 1ST","premises":"Windsor Villas","region":"Devon"},"country_of_residence":"England","date_of_birth":{"month":1,"year":1964},"etag":"5623f35e4bb5dc9cb37e134cb2ac0ca3151cd01f","kind":"individual-person-with-significant-control","links":{"self":"/company/04732933/persons-with-significant-control/individual/8X3LALP5gAh5dAYEOYimeiRiJMQ"},"name":"Ms Karen Mychals","name_elements":{"forename":"Karen","surname":"Mychals","title":"Ms"},"nationality":"British","natures_of_control":["ownership-of-shares-50-to-75-percent"],"notified_on":"2016-04-06"}}
{"company_number":"10118870","data":{"address":{"address_line_1":"Hilltop Road","address_line_2":"Bearpark","country":"England","locality":"Durham","postal_code":"DH7 7TL","premises":"54"},"ceased_on":"2019-04-15","country_of_residence":"England","date_of_birth":{"month":9,"year":1983},"etag":"5b3c984156794e5519851b7f1b22d1bbd2a5c5df","kind":"individual-person-with-significant-control","links":{"self":"/company/10118870/persons-with-significant-control/individual/hS6dYoZ234aXhmI6Q9y83QbAhSY"},"name":"Mr Patrick John Burns","name_elements":{"forename":"Patrick","middle_name":"John","surname":"Burns","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-25-to-50-percent","voting-rights-25-to-50-percent"],"notified_on":"2017-04-06"}}
The simplepd.read_json did not work initially (I would get ValueError: Trailing data errors) until lines=true was used (using jupyternotebook for this).
import pandas as pd
import json
df = pd.read_json(r'E:\JSON_data\psc-snapshot-2022-02-06_8of20.txt', encoding='utf8', lines=True)
this is how the data structure is displayed via df.head() :
company_number data
0 06851805 {'address': {'address_line_1': 'Briar Road', '...
1 04732933 {'address': {'address_line_1': 'Windsor Road',...
2 10118870 {'address': {'address_line_1': 'Hilltop Road',...
3 10118870 {'address': {'address_line_1': 'Hilltop Road',...
4 09565353 {'address': {'address_line_1': 'Old Hertford R...
After looking through stackoverflow and several online tutorials I tried using pd.json_normalize(df) but keep getting a AttributeError: 'str' object has no attribute 'values' error. I would like to ultimately export this json file into a csv file.
thank you in advance for any advice!
You can solve that problem by applying the json_normalize just to the data column.
import pandas as pd
import json
df = pd.read_json(r'E:\JSON_data\psc-snapshot-2022-02-06_8of20.txt', encoding='utf8', lines=True)
#json_normalize
df2 = pd.json_normalize(df['data'])
df = pd.concat([df, df2], axis=1)
#output to csv
df.to_csv("./OUTPUT_FILE_NAME")
print(df)
company_number ... name_elements.middle_name
0 4732933 ... NaN
1 10118870 ... John
[2 rows x 24 columns]

Extract the country zip code based on the full country code - DataFrame Python

I have in my data information about place as full post code for example CZ25145. I would like to create new column for this with value CZ. How to do this?
I have this:
import pandas as pd
df = pd.DataFrame({
'CODE_LOAD_PLACE' : ['PL43100', 'CZ25905', 'DE29333', 'DE29384', 'SK92832']
},)
I would like to get it like below:
df = pd.DataFrame({
'CODE_LOAD_PLACE' : ['PL43100', 'CZ25905', 'DE29333', 'DE29384', 'SK92832'],
'COUNTRY_LOAD_PLACE' : ['PL', 'CZ', 'DE', 'DE', 'SK']
},)
I try use .factorize and .groupby but no positive final effect.
Use .str and select the first 2 characters:
df["COUNTRY_LOAD_PLACE"] = df["CODE_LOAD_PLACE"].str[:2]

MemoryError in Python when saving list to dataframe

I was trying to do the following, which is to save a python list that contains json strings into a dataframe in jupyternotebook
df = pd.io.json.json_normalize(mon_list)
df[['gfmsStr','_id']]
But then I received this error:
MemoryError
Then if I run other blocks, they all start to show the memory error. I am wondering what caused this and if there is anyway I can increase the memory to avoid the error.
Thanks!
update:
what's in mon_list is like the following:
mon_list[1]
[{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
{'name': {'given': 'Mose', 'family': 'Regner'}},
{'id': 2, 'name': 'Faye Raker'}]
Do you really have a list? Or do you have a JSON file? What format is the "mon_list" variable?
This is how you convert a list to a Dataframe
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/

Convert list of Dictionaries to a Dataframe [duplicate]

This question already has answers here:
Convert list of dictionaries to a pandas DataFrame
(7 answers)
Closed 4 years ago.
I am facing a basic problem of converting a list of dictionaries obtained from parsing a column with text in json format. Below is the brief snapshot of data:
[{u'PAGE TYPE': u'used-serp.model.brand.city'},
{u'BODY TYPE': u'MPV Cars',
u'ENGINE CAPACITY': u'1461',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Renault Lodgy',
u'OEM NAME': u'Renault',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'PAGE TYPE': u'used-serp.brand.city'},
{u'BODY TYPE': u'SUV Cars',
u'ENGINE CAPACITY': u'2477',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Mitsubishi Pajero',
u'OEM NAME': u'Mitsubishi',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'BODY TYPE': u'Hatchback Cars',
u'ENGINE CAPACITY': u'1198',
u'FUEL TYPE': u' Petrol , Diesel',
u'MODEL NAME': u'Volkswagen Polo',
u'OEM NAME': u'Volkswagen',
u'PAGE TYPE': u'New-ModelPage.GalleryTab'},
Furthermore, the code i am using to parse is detailed below:
stdf_noncookie = []
stdf_noncookiejson = []
for index, row in df_noncookie.iterrows():
try:
loop_data = json.loads(row['attributes'])
stdf_noncookie.append(loop_data)
except ValueError:
loop_nondata = row['attributes']
stdf_noncookiejson.append(loop_nondata)
stdf_noncookie is the list of dictionaries i am trying to convert into a pandas dataframe. 'attributes' is the column with text in json format. I have tried to get some learning from this link, however this was not able to solve my problem. Any suggestion/tips for converting a list of dictionaries to panda dataframe will be helpful.
To convert your list of dicts to a pandas dataframe use the following:
stdf_noncookiejson = pd.DataFrame.from_records(data)
pandas.DataFrame.from_records
DataFrame.from_records (data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
You can set the index, name the columns etc as you read it in
If youre working with json you can also use the read_json method
stdf_noncookiejson = pd.read_json(data)
pandas.read_json
pandas.read_json (path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True,
keep_default_dates=True, numpy=False, precise_float=False,
date_unit=None, encoding=None, lines=False)
Reference this answer.
Assuming d is your List of Dictionaries, simply use:
df = pd.DataFrame(d)
Simply, you can use the pandas DataFrame constructor.
import pandas as pd
print (pd.DataFrame(data))
Finally found a way to convert a list of dict to panda dataframe. Below is the code:
Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)
Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())
Method A is much quicker than Method B. I will create another post asking for help on the difference between the two methods. Also, on some datasets Method B is not working.
I was able to do it with a list comprehension. But my problem was that I left my dict's json encoded so they looked like strings.
d = r.zrangebyscore('live-ticks', '-inf', time.time())
dform = [json.loads(i) for i in d]
df = pd.DataFram(dfrom)

Categories

Resources