This question already has answers here:
Convert list of dictionaries to a pandas DataFrame
(7 answers)
Closed 4 years ago.
I am facing a basic problem: converting a list of dictionaries, obtained by parsing a column of JSON-formatted text, into a pandas DataFrame. Below is a brief snapshot of the data:
[{u'PAGE TYPE': u'used-serp.model.brand.city'},
{u'BODY TYPE': u'MPV Cars',
u'ENGINE CAPACITY': u'1461',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Renault Lodgy',
u'OEM NAME': u'Renault',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'PAGE TYPE': u'used-serp.brand.city'},
{u'BODY TYPE': u'SUV Cars',
u'ENGINE CAPACITY': u'2477',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Mitsubishi Pajero',
u'OEM NAME': u'Mitsubishi',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'BODY TYPE': u'Hatchback Cars',
u'ENGINE CAPACITY': u'1198',
u'FUEL TYPE': u' Petrol , Diesel',
u'MODEL NAME': u'Volkswagen Polo',
u'OEM NAME': u'Volkswagen',
u'PAGE TYPE': u'New-ModelPage.GalleryTab'},
The code I am using to parse the column is below:
stdf_noncookie = []
stdf_noncookiejson = []
for index, row in df_noncookie.iterrows():
    try:
        loop_data = json.loads(row['attributes'])
        stdf_noncookie.append(loop_data)
    except ValueError:
        loop_nondata = row['attributes']
        stdf_noncookiejson.append(loop_nondata)
stdf_noncookie is the list of dictionaries I am trying to convert into a pandas DataFrame; 'attributes' is the column holding JSON-formatted text. I tried to learn from this link, but it did not solve my problem. Any suggestions or tips for converting a list of dictionaries to a pandas DataFrame would be helpful.
To convert your list of dicts to a pandas DataFrame, use the following, where data is the list of dicts (stdf_noncookie in the question):
stdf_noncookiejson = pd.DataFrame.from_records(data)
pandas.DataFrame.from_records
DataFrame.from_records (data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
You can set the index, name the columns, etc. as you read it in.
If you're working with JSON, you can also use the read_json method. It expects a JSON string, file path, or file-like object rather than a list of dicts:
stdf_noncookiejson = pd.read_json(data)
pandas.read_json
pandas.read_json (path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True,
keep_default_dates=True, numpy=False, precise_float=False,
date_unit=None, encoding=None, lines=False)
Reference this answer.
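A minimal sketch of read_json on JSON text rather than on parsed dicts. The sample text is made up for illustration, loosely based on the question's records. Recent pandas versions warn when a literal JSON string is passed directly, so the sketch wraps it in io.StringIO:

```python
import io

import pandas as pd

# Made-up JSON text: an array of objects with differing keys
json_text = (
    '[{"PAGE TYPE": "used-serp.brand.city"},'
    ' {"PAGE TYPE": "New-ModelPage.GalleryTab", "OEM NAME": "Volkswagen"}]'
)

# read_json parses the text itself; no json.loads step is needed
df = pd.read_json(io.StringIO(json_text))
print(df)
```

Keys missing from a record show up as NaN in that row.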
Assuming d is your List of Dictionaries, simply use:
df = pd.DataFrame(d)
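As a quick illustration, here is the constructor applied to a shortened copy of the records from the question. The dictionaries do not need to share the same keys; a key missing from a record becomes NaN in that row:

```python
import pandas as pd

# Shortened copy of the records shown in the question
records = [
    {"PAGE TYPE": "used-serp.model.brand.city"},
    {
        "BODY TYPE": "MPV Cars",
        "ENGINE CAPACITY": "1461",
        "MODEL NAME": "Renault Lodgy",
        "PAGE TYPE": "New-ModelPage.OverviewTab",
    },
]

df = pd.DataFrame(records)
print(df.shape)         # rows x columns (columns = union of all keys)
print(df.isna().sum())  # missing values per column
```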
Simply, you can use the pandas DataFrame constructor.
import pandas as pd
print (pd.DataFrame(data))
Finally found a way to convert a list of dicts to a pandas DataFrame. Below is the code:
Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)
Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())
Method A is much quicker than Method B on my data. I will create another post asking about the difference between the two methods. Also, Method B does not work on some datasets.
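A small check, using made-up rows in place of the real 'attributes' column, that both methods build the same table. The sample strings are assumptions for illustration. Timing depends on the data, so it is worth measuring both methods on your own dataset (for example with timeit) rather than relying on a single run:

```python
import json

import pandas as pd

# Made-up stand-in for the real df_noncookie
df_noncookie = pd.DataFrame({"attributes": [
    '{"PAGE TYPE": "used-serp.brand.city"}',
    '{"PAGE TYPE": "New-ModelPage.GalleryTab", "OEM NAME": "Volkswagen"}',
]})

parsed = df_noncookie["attributes"].apply(json.loads)

# Method A: expand each dict into a row via pd.Series
method_a = parsed.apply(pd.Series)

# Method B: hand the list of dicts to the DataFrame constructor
method_b = pd.DataFrame(parsed.tolist(), index=parsed.index)

print(method_a.shape, method_b.shape)
```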
I was able to do it with a list comprehension. My problem was that I had left my dicts JSON-encoded, so they were still strings.
d = r.zrangebyscore('live-ticks', '-inf', time.time())
dform = [json.loads(i) for i in d]
df = pd.DataFrame(dform)
I'm using Python (Google Colab) and I have a JSON dataframe with some fields like:
[{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
I need to get the "ActedAt" value in order to get the date. How can I get this?
Thanks!
You have an array of dictionaries. First, grab a dictionary from the array by index, then proceed to get the ActedAt property. Something like this:
records = [{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z', 'Status': 'Completed', 'LAB': 'No'}]
# index into a variable for explicit readability
index = 0
# get the date you want (the name records avoids shadowing the json module)
date = records[index]['ActedAt']
print(date)
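If the goal is an actual date rather than the raw string, the value can be parsed with the standard library. This is a sketch that assumes every timestamp follows the same pattern as the sample, ending in a literal Z:

```python
from datetime import datetime

records = [{'ActedBy': ['team'], 'ActedAt': '2022-03-07T22:43:46Z',
            'Status': 'Completed', 'LAB': 'No'}]

# Parse the timestamp string; the trailing 'Z' is matched literally
acted_at = datetime.strptime(records[0]['ActedAt'], '%Y-%m-%dT%H:%M:%SZ')

# Keep only the calendar date
print(acted_at.date())  # 2022-03-07
```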
I have the string below and need help writing an if condition in a for loop that checks whether row.startswith('name'), then takes the value and stores it in a variable called name, and similarly for dob.
Once the for loop completes, the output should be a dictionary like the one below, which I can convert to a pandas DataFrame.
'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'
This is what I have tried so far, but I do not know how to get the values into a dictionary:
for row in lines.split('\n'):
    if row.startswith('name'):
        name = row.split()[-1]
Final Output
data = {"name":"john", "dob": "12/08/1984"}
Try using a list comprehension and split:
s = '''name
john

dob
12/08/1984

current company
google'''
d = dict([i.splitlines() for i in s.split('\n\n')])
print(d)
Output:
{'name': 'john', 'dob': '12/08/1984', 'current company': 'google'}
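For the exact string in the question, where the name shares a line with its label and DOB sits on the line after its label, a loop along the lines the asker describes could look like this. It is a sketch that assumes this layout holds for every input:

```python
lines = 'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'

# Strip whitespace and drop blank lines first
rows = [row.strip() for row in lines.split('\n') if row.strip()]

data = {}
for i, row in enumerate(rows):
    if row.lower().startswith('name'):
        # value is the last word on the same line
        data['name'] = row.split()[-1]
    elif row.lower() == 'dob' and i + 1 < len(rows):
        # value is on the next non-blank line
        data['dob'] = rows[i + 1]

print(data)  # {'name': 'john', 'dob': '12/08/1984'}
```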
I am developing a QR code scanner that writes to CSV, and I am very new to Python. After scanning a QR code I get this output: 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'.
Can anyone help me convert it to CSV? Thanks in advance.
Well, you can convert this incoming string to a Python dictionary and then you can easily write the values to a CSV file.
s = 'Employee ID: 101\nEmployee Name: Abhinav Jha\nDesignation: Student\nDepartment: Mechanical'
dict_1 = {key.strip(): value.strip() for key, value in (item.split(':', 1) for item in s.split('\n'))}
print(dict_1)
output -
{'Employee ID': '101',
'Employee Name': 'Abhinav Jha',
'Designation': 'Student',
'Department': 'Mechanical'}
Now, if you want to access the values, use the .values() method -
print(list(dict_1.values())) # will print ['101', 'Abhinav Jha', 'Student', 'Mechanical']
I'm assuming you wanna write these values in a CSV file.
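One way to finish the job is the standard library's csv module. The sketch below writes to an in-memory buffer so it runs anywhere; the file name employees.csv in the comment is a hypothetical placeholder:

```python
import csv
import io

s = ('Employee ID: 101\nEmployee Name: Abhinav Jha\n'
     'Designation: Student\nDepartment: Mechanical')

# Build one row: text before the first ':' is the column name
row = {}
for item in s.split('\n'):
    key, value = item.split(':', 1)
    row[key.strip()] = value.strip()

# To write a real file, replace the buffer with
# open('employees.csv', 'w', newline='')  (hypothetical file name)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

For several scans, call writer.writerow once per scanned dictionary after writing the header a single time.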
I was trying to save a Python list that contains JSON strings into a dataframe in a Jupyter notebook:
df = pd.io.json.json_normalize(mon_list)
df[['gfmsStr','_id']]
But then I received this error:
MemoryError
Then, if I run other blocks, they all start to show the memory error. I am wondering what caused this and whether there is any way I can increase the memory to avoid the error.
Thanks!
update:
what's in mon_list is like the following:
mon_list[1]
[{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
{'name': {'given': 'Mose', 'family': 'Regner'}},
{'id': 2, 'name': 'Faye Raker'}]
Do you really have a list? Or do you have a JSON file? What format is the "mon_list" variable?
This is how you convert a list to a DataFrame:
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/
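If the list really holds JSON strings, as the question describes, each string has to be decoded before normalizing. The sketch below uses made-up strings modeled on the sample in the update. It shows the decoding step only and does not address what caused the MemoryError, which cannot be determined from the question:

```python
import json

import pandas as pd

# Made-up JSON strings modeled on the sample in the question
raw = [
    '{"id": 1, "name": {"first": "Coleen", "last": "Volk"}}',
    '{"id": 2, "name": "Faye Raker"}',
]

# Decode each string into a dict, then flatten nested keys
records = [json.loads(item) for item in raw]
df = pd.json_normalize(records)

print(df.columns.tolist())
```

Nested keys are flattened into dotted column names such as name.first.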
I have searched for a lot of similar topics online, but I have not found the solution yet.
My pandas dataframe looks like this:
index FOR
0 [{'id': '2766', 'name': '0803 Computer Softwar...
1 [{'id': '2766', 'name': '0803 Computer Softwar...
2 [{'id': '2766', 'name': '0803 Computer Softwar...
3 [{'id': '2766', 'name': '0803 Computer Softwar...
4 [{'id': '2766', 'name': '0803 Computer Softwar...
I would like to flatten all the rows into a dataframe like the following (shown here for the first row only):
index id name
0 2766 0803 Computer Software
I found a similar solution here. Unfortunately, I got the following TypeError:
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
My code was:
dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)
df = pd.concat(dfs).reset_index(inplace = True)
print(df)
Could anyone help me here?
Thank you very much!
Try using literal_eval from the ast standard library.
from ast import literal_eval
df_flattened = pd.json_normalize(df['FOR'].map(literal_eval))
then drop duplicates.
print(df_flattened.drop_duplicates())
id name
0 2766 0803 Computer Software
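The TypeError in the question suggests the cells already hold Python lists rather than strings. Under that assumption no parsing step is needed, and one option is to explode the lists and normalize the resulting dicts. The two-row frame below is a made-up stand-in for the real data:

```python
import pandas as pd

# Made-up stand-in: each cell is already a list of dicts
test = pd.DataFrame({'FOR': [
    [{'id': '2766', 'name': '0803 Computer Software'}],
    [{'id': '2766', 'name': '0803 Computer Software'}],
]})

# One dict per row; empty lists become NaN and are dropped
exploded = test['FOR'].explode().dropna()

# Turn the dicts into columns, keeping the original row labels
flat = pd.json_normalize(exploded.tolist())
flat.index = exploded.index

print(flat.drop_duplicates())
```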
After a few weeks away from this work, I encountered a similar case, and I think I have found a solution. Please feel free to correct me or suggest other ideas. I really appreciate all the help and generous support!
chuck = []
for i in range(len(test)):
    # each cell of 'FOR' is a list of dicts; normalize it into a small frame
    chuck.append(pd.json_normalize(test.iloc[i, :]['FOR']))
test_df = pd.concat(chuck)
And then drop duplicated columns for the test_df