Convert list of Dictionaries to a Dataframe [duplicate]

This question already has answers here:
Convert list of dictionaries to a pandas DataFrame
(7 answers)
Closed 4 years ago.
I am facing a basic problem: converting a list of dictionaries, obtained by parsing a column of JSON-formatted text, into a pandas DataFrame. Below is a brief snapshot of the data:
[{u'PAGE TYPE': u'used-serp.model.brand.city'},
{u'BODY TYPE': u'MPV Cars',
u'ENGINE CAPACITY': u'1461',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Renault Lodgy',
u'OEM NAME': u'Renault',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'PAGE TYPE': u'used-serp.brand.city'},
{u'BODY TYPE': u'SUV Cars',
u'ENGINE CAPACITY': u'2477',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Mitsubishi Pajero',
u'OEM NAME': u'Mitsubishi',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'BODY TYPE': u'Hatchback Cars',
u'ENGINE CAPACITY': u'1198',
u'FUEL TYPE': u' Petrol , Diesel',
u'MODEL NAME': u'Volkswagen Polo',
u'OEM NAME': u'Volkswagen',
u'PAGE TYPE': u'New-ModelPage.GalleryTab'},
Furthermore, the code I am using to parse the column is detailed below:
import json
stdf_noncookie = []
stdf_noncookiejson = []
for index, row in df_noncookie.iterrows():
    try:
        loop_data = json.loads(row['attributes'])
        stdf_noncookie.append(loop_data)
    except ValueError:
        loop_nondata = row['attributes']
        stdf_noncookiejson.append(loop_nondata)
stdf_noncookie is the list of dictionaries I am trying to convert into a pandas DataFrame. 'attributes' is the column with text in JSON format. I tried to follow this link, but it did not solve my problem. Any suggestions or tips for converting a list of dictionaries to a pandas DataFrame would be helpful.

To convert your list of dicts to a pandas dataframe use the following:
stdf_noncookiejson = pd.DataFrame.from_records(data)
pandas.DataFrame.from_records
DataFrame.from_records (data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
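You can set the index, name the columns, etc. as you read it in.
If you're working with JSON, you can also use the read_json method. Note that read_json expects JSON text, a path, or a file-like object, not an already-parsed list of dicts: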
stdf_noncookiejson = pd.read_json(data)
pandas.read_json
pandas.read_json (path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True,
keep_default_dates=True, numpy=False, precise_float=False,
date_unit=None, encoding=None, lines=False)
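As a minimal sketch of the two calls above: from_records is given dicts that are already parsed, while read_json is given the same data as JSON text. The two sample rows come from the question; the variable names are illustrative. Wrapping the text in io.StringIO is a choice made here because it should work across pandas versions, including ones that warn about literal JSON strings.

```python
import json
from io import StringIO

import pandas as pd

# Two sample records taken from the question's data.
records = [
    {"PAGE TYPE": "used-serp.brand.city"},
    {"MODEL NAME": "Renault Lodgy", "OEM NAME": "Renault",
     "PAGE TYPE": "New-ModelPage.OverviewTab"},
]

# Already-parsed dicts: from_records builds one row per dict.
df_records = pd.DataFrame.from_records(records)

# JSON text: read_json parses it; StringIO makes the text file-like.
json_text = json.dumps(records)
df_json = pd.read_json(StringIO(json_text))

print(df_records)
print(df_json)
```

Keys missing from a record simply show up as missing values in that row.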

Reference this answer.
Assuming d is your List of Dictionaries, simply use:
df = pd.DataFrame(d)

You can simply use the pandas DataFrame constructor.
import pandas as pd
print(pd.DataFrame(data))
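To illustrate what the constructor does with records whose keys differ, here is a small sketch using two rows from the question. The clean-up steps at the end (stripping whitespace, converting the engine capacity to a number) are optional extras assumed for illustration, not part of the original answers.

```python
import pandas as pd

# Two rows from the question; the first has only one key.
data = [
    {"PAGE TYPE": "used-serp.model.brand.city"},
    {"BODY TYPE": "MPV Cars", "ENGINE CAPACITY": "1461", "FUEL TYPE": " Diesel",
     "MODEL NAME": "Renault Lodgy", "OEM NAME": "Renault",
     "PAGE TYPE": "New-ModelPage.OverviewTab"},
]

# The columns are the union of all keys; absent keys become NaN.
df = pd.DataFrame(data)

# Optional clean-up: trim stray spaces and make the capacity numeric.
df["FUEL TYPE"] = df["FUEL TYPE"].str.strip()
df["ENGINE CAPACITY"] = pd.to_numeric(df["ENGINE CAPACITY"])

print(df)
```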

I finally found a way to convert a list of dicts to a pandas DataFrame. Below is the code:
Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)
Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())
Method A is much quicker than Method B. I will create another post asking about the difference between the two methods. Also, Method B does not work on some datasets.
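The two methods can be compared side by side on a tiny frame. This is only a sketch: the df_noncookie below is a hypothetical stand-in with two JSON strings, and passing index=parsed.index in Method B is an addition made here so both results keep the original row labels. Relative speed probably depends on the data and the pandas version, so it is worth timing both on your own data.

```python
import json

import pandas as pd

# Hypothetical stand-in for the question's df_noncookie.
df_noncookie = pd.DataFrame({"attributes": [
    '{"PAGE TYPE": "used-serp.brand.city"}',
    '{"OEM NAME": "Renault", "PAGE TYPE": "New-ModelPage.OverviewTab"}',
]})

# Parse each JSON string into a dict.
parsed = df_noncookie["attributes"].apply(json.loads)

# Method A: expand each dict into a row via pd.Series.
method_a = parsed.apply(pd.Series)

# Method B: hand the list of dicts to the DataFrame constructor.
method_b = pd.DataFrame(parsed.tolist(), index=parsed.index)

print(method_a)
print(method_b)
```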

I was able to do it with a list comprehension. My problem was that I had left my dicts JSON-encoded, so they looked like strings.
d = r.zrangebyscore('live-ticks', '-inf', time.time())
dform = [json.loads(i) for i in d]
df = pd.DataFrame(dform)

Related

JSON Fields with Panda DataFrame

I'm using Python (Google Colab) and I have JSON data with fields like:
[{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
I need to get the "ActedAt" value in order to extract the date. How can I do this?
Thanks!

You have a list of dictionaries. First, grab a dictionary from the list by index, then get its ActedAt value. Something like this:
# named records rather than json, so the json module is not shadowed
records = [{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
# index into a variable for explicit readability
index = 0
# get the date you want
date = records[index]['ActedAt']
print(date)
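If the goal is the calendar date rather than the raw timestamp string, one possible approach is to load the records into a DataFrame and parse the column with pd.to_datetime. This is a sketch under that assumption; the extra "date" column name is made up for illustration.

```python
import pandas as pd

records = [{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z',
            'Status': 'Completed', 'LAB': 'No'}]

df = pd.DataFrame(records)

# Parse the ISO 8601 strings (the trailing Z marks UTC) into timestamps.
df['ActedAt'] = pd.to_datetime(df['ActedAt'])

# Keep only the calendar date in a new, illustrative column.
df['date'] = df['ActedAt'].dt.date

print(df[['ActedAt', 'date']])
```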

python convert text rows to dictionary based on conditional match

I have the string below and need help writing an if condition in a for loop that checks whether row.startswith('name'), then takes the value and stores it in a variable called name, and similarly for dob.
Once the for loop completes, the output should be a dictionary like the one below, which I can convert to a pandas DataFrame.
'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'
This is what I have tried so far, but I do not know how to get the values into a dictionary:
for row in lines.split('\n'):
    if row.startswith('name'):
        name = row.split()[-1]
Final Output
data = {"name":"john", "dob": "12/08/1984"}

Try using a list comprehension and split:
s = '''name
john

dob
12/08/1984

current company
google'''
d = dict([i.splitlines() for i in s.split('\n\n')])
print(d)
Output:
{'name': 'john', 'dob': '12/08/1984', 'current company': 'google'}
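The answer above assumes each key sits on its own line with a blank line between pairs. For the exact string in the question, where "name john" shares a line and "DOB" is followed by its value on the next line, a loop along the lines the asker started might look like this. It is a sketch that handles only those two layouts; the lower-cased "dob" key is chosen to match the desired output.

```python
import pandas as pd

text = 'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'

# Drop blank and whitespace-only lines, and trim the rest.
lines = [ln.strip() for ln in text.split('\n') if ln.strip()]

data = {}
for i, line in enumerate(lines):
    if line.lower().startswith('name'):
        # Key and value share one line: take the last word.
        data['name'] = line.split()[-1]
    elif line.lower() == 'dob' and i + 1 < len(lines):
        # Key on its own line: the value is the next non-blank line.
        data['dob'] = lines[i + 1]

# A list holding one dict becomes a one-row DataFrame.
df = pd.DataFrame([data])
print(data)
print(df)
```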

String to dataframe

I am developing a QR code scanner that writes to CSV. I am very new to Python. I get this output after scanning a QR code: 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'.
Can anyone help me convert it to CSV? Thanks in advance.

You can convert the incoming string to a Python dictionary and then easily write the values to a CSV file.
s = 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'
dict_1 = {(item.split(':')[0]).strip():(item.split(':')[1]).strip() for item in s.split('\n')}
print(dict_1)
Output:
{'Employee ID': '101',
'Employee Name': 'Abhinav Jha',
'Designation': 'Student',
'Department': 'Mechanical'}
Now, if you want to access the values, use the .values() method:
print(dict_1.values()) # prints dict_values(['101', 'Abhinav Jha', 'Student', 'Mechanical'])
I'm assuming you want to write these values to a CSV file.
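The remaining step, writing the dictionary out as CSV, could be sketched with the standard csv module. The example writes to an in-memory buffer so it runs anywhere; the file name in the comment is hypothetical. Using partition(':') instead of split(':') is a small change made here so that a value containing a colon is not cut short.

```python
import csv
import io

s = 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'

# partition(':') splits on the first colon only.
record = {}
for line in s.split('\n'):
    key, _, value = line.partition(':')
    record[key.strip()] = value.strip()

# In-memory target; to write a real file, use something like
# open('employees.csv', 'w', newline='') instead (hypothetical file name).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()      # header row from the dict keys
writer.writerow(record)   # one data row from the dict values

csv_text = buffer.getvalue()
print(csv_text)
```

Each further scan could be appended with another writer.writerow call, as long as it has the same keys.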

MemoryError in Python when saving list to dataframe

I was trying to load a Python list that contains JSON strings into a DataFrame in a Jupyter notebook:
df = pd.io.json.json_normalize(mon_list)
df[['gfmsStr','_id']]
But then I received this error:
MemoryError
After that, every other cell I run also raises the memory error. I am wondering what caused this and whether there is any way I can increase the available memory to avoid the error.
Thanks!
Update:
The contents of mon_list look like the following:
mon_list[1]
[{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
{'name': {'given': 'Mose', 'family': 'Regner'}},
{'id': 2, 'name': 'Faye Raker'}]

Do you really have a list? Or do you have a JSON file? What format is the "mon_list" variable?
This is how you convert a list to a DataFrame:
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/
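For nested records like the mon_list[1] sample in the question, it may help to look at what json_normalize produces on one small element before running it on everything. The sketch below assumes pandas 1.0 or later, where the function is available as pd.json_normalize. Each distinct flattened key becomes its own column, so records with widely varying keys can yield a wide, mostly empty frame; whether that explains the MemoryError here is only a guess.

```python
import pandas as pd

# The sample element shown in the question.
sample = [
    {'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
    {'name': {'given': 'Mose', 'family': 'Regner'}},
    {'id': 2, 'name': 'Faye Raker'},
]

# Nested dicts are flattened into dotted column names.
df = pd.json_normalize(sample)

print(df.columns.tolist())
# Rough size of the frame in bytes, including string contents.
print(df.memory_usage(deep=True).sum())
```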

Flatten nested JSON and concatenate to dataframe using pandas

I have searched for a lot of similar topics online, but I have not found the solution yet.
My pandas dataframe looks like this:
index FOR
0 [{'id': '2766', 'name': '0803 Computer Softwar...
1 [{'id': '2766', 'name': '0803 Computer Softwar...
2 [{'id': '2766', 'name': '0803 Computer Softwar...
3 [{'id': '2766', 'name': '0803 Computer Softwar...
4 [{'id': '2766', 'name': '0803 Computer Softwar...
I would like to flatten all of the rows into a dataframe like the following (shown here for the first row only):
index id name
0 2766 0803 Computer Software
I found a similar solution here. Unfortunately, I got the following TypeError:
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
My code was:
dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)
df = pd.concat(dfs).reset_index(inplace = True)
print(df)
Can anyone help me here?
Thank you very much!

Try using literal_eval from the ast standard library:
from ast import literal_eval
df_flattened = pd.json_normalize(df['FOR'].map(literal_eval))
Then drop duplicates:
print(df_flattened.drop_duplicates())
id name
0 2766 0803 Computer Software

After a few weeks away from this work, I encountered a similar case, and I think I have found a solution.
Please feel free to correct me or suggest other ideas.
I really appreciate all the help and generous support!
chuck = []
for i in range(len(test)):
    chuck.append(pd.json_normalize(test.iloc[i,:]['FOR']))
test_df = pd.concat(chuck)
Then drop the duplicated columns from test_df.
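An alternative to the explicit loop, assuming the FOR column already holds Python lists of dicts (as the TypeError in the question suggests), is to explode the column and normalize the result in one call. This is a sketch on a small stand-in frame named source, not the asker's actual data.

```python
import pandas as pd

# Stand-in for the question's frame: each cell is a list of dicts.
source = pd.DataFrame({
    "FOR": [
        [{"id": "2766", "name": "0803 Computer Software"}],
        [{"id": "2766", "name": "0803 Computer Software"}],
    ]
})

# explode() yields one row per dict; dropna() skips cells that were empty lists.
exploded = source["FOR"].explode().dropna()

# Flatten the dicts into columns, then remove repeated rows.
flat = pd.json_normalize(exploded.tolist())
result = flat.drop_duplicates().reset_index(drop=True)

print(result)
```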
