If we have the following list of dictionaries:
df=[{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}]
How can I split the above list into two dictionaries, each with only a single value for the answers key, like this:
df=[{'answers': ['Yes'], 'question': 'status', 'type': 'string'}, {'answers': ['No'], 'question': 'status', 'type': 'string'}]
I think you mean that you work with DataFrames (hence your variable name df). In that case pandas already has a function for this specific use case, explode:
import pandas as pd
df = pd.DataFrame([{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}])
print(df.explode('answers'))
Output:
  answers question    type
0     Yes   status  string
0      No   status  string
Edit: you can easily get back to a dictionary form with to_dict:
df = df.explode('answers')
print(df.to_dict(orient='records'))
Output:
[{'answers': 'Yes', 'question': 'status', 'type': 'string'}, {'answers': 'No', 'question': 'status', 'type': 'string'}]
You can try this:
import copy
df = [{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}]
df2 = copy.deepcopy(df)
content = df[0]
for x, y in content.items():
    # only split values that are lists with more than one element
    if isinstance(y, list) and len(y) > 1:
        df[0][x] = y[0]
        df2[0][x] = y[1]
print(df2)
print(df)
However, this only handles lists with two elements; any further elements are dropped.
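For lists of any length, a plain-Python sketch could look like the following. The helper name split_answers and its key parameter are made up for illustration, and the sketch assumes every dictionary holds a list under that key.

```python
def split_answers(records, key='answers'):
    """Return one copy of each record per element found in record[key]."""
    result = []
    for record in records:
        for value in record[key]:
            # {**record, key: [value]} copies the record and overrides only `key`
            result.append({**record, key: [value]})
    return result


df = [{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}]
print(split_answers(df))
```

Each resulting dictionary keeps answers as a one-element list, which matches the format asked for in the question.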
I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raised an error because some of the keys are missing when 'status' is 'PAUSED'. Is there a way to skip the entries whose keys are missing, or another way of using json_normalize()? I tried errors='ignore', but that doesn't work either.
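One possible approach, sketched in plain Python rather than with json_normalize(), is to build the rows by hand and skip records that have no 'insights' key. The names wanted and rows are only illustrative, and the sketch assumes the structure shown in the snapshot above.

```python
test = {'data': [{'name': 'john',
                  'insights': {'data': [{'account_id': '123',
                                         'test_id': '456',
                                         'date_start': '2022-12-31',
                                         'date_stop': '2023-01-29',
                                         'impressions': '4070',
                                         'spend': '36.14'}],
                               'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
                  'status': 'ACTIVE',
                  'id': '789'},
                 {'name': 'jack', 'status': 'PAUSED', 'id': '420'}]}

wanted = ['date_start', 'date_stop', 'impressions', 'spend']
rows = []
for record in test['data']:
    # records without 'insights' (such as the PAUSED one) contribute no rows
    for entry in record.get('insights', {}).get('data', []):
        row = {'name': record['name']}
        row.update({key: entry.get(key) for key in wanted})
        rows.append(row)

print(rows)
```

Passing rows to pd.DataFrame(rows) should then give one column per key.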
I have a sample dataframe as below
I want this dataframe converted to this below format in python so I can pass it into dtype
{
'FirstName':'string',
'LastName':'string',
'Department':'integer',
'EmployeeID':'string', }
Could anyone please let me know how this can be done.
To note above: I need the exact string {'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'} from the exact dataframe.
The dataframe has list of primary key names and its datatype. I want to pass this primary_key and datatype combination into concat_df.to_csv(csv_buffer, sep=",", index=False, dtype = {'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'})
Use dict and zip on the two columns:
import pandas as pd
data = pd.DataFrame({
'Column_Name': ['FirstName', 'LastName', 'Department', 'EmployeeID'],
'Datatype': ['string', 'string', 'integer', 'string'],
})
mapping = dict(zip(data['Column_Name'], data['Datatype']))
print(mapping)
prints out
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Use to_records, which is much handier (df here is the two-column DataFrame from the question):
print(dict(df.to_records(index=False)))
This should give:
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Edit:
If you want the keys alone:
d = dict(df.to_records(index=False))
print(list(d.keys()))
This should give:
['FirstName', 'LastName', 'Department', 'EmployeeID']
You can do an easy dict comprehension with your data:
Input data:
data = pd.DataFrame({'Column_Name' : ['FirstName', 'LastName', 'Department'], 'Datatype' : ['Jane', 'Doe', 666]})
Dict comprehension:
{n[0]:n[1] for n in data.to_numpy()}
This will give you:
{'FirstName': 'Jane', 'LastName': 'Doe', 'Department': 666}
There are for sure other ways, e.g. using the pandas to_dict function, but I am not very familiar with this.
Edit:
But keep in mind that a dictionary needs unique keys.
Your categories (first name, last name) look like general categories. This will only work for a single person; with several people you would have duplicate keys.
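To illustrate the point about unique keys, here is a small sketch with made-up data in which 'FirstName' occurs twice. A plain dict keeps only the last value for a repeated key; grouping the values in a list per key is one way to keep them all.

```python
columns = ['FirstName', 'LastName', 'FirstName']
values = ['Jane', 'Doe', 'John']

# a repeated key is overwritten, so only the last 'FirstName' survives
mapping = dict(zip(columns, values))
print(mapping)

# collecting the values in a list per key keeps every entry
grouped = {}
for column, value in zip(columns, values):
    grouped.setdefault(column, []).append(value)
print(grouped)
```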
I am trying to extract a set of data from a column that is of type pandas.core.series.Series.
I tried
df['col1'] = df['details'].astype(str).str.findall(r'name\=(.*?),')
but the above returns null
Below is what the data looks like in column df['details']:
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
Trying to extract value corresponding to name field
Expected output : Name1
Try this simple example and adapt it to your needs:
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
print(df['name'][0])
# or, if the dict is stored inside a 'details' column of your own DataFrame:
# df['details'][0]['name']
NOTE: as you mentioned, details is one of the columns in your existing dataset.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
# Name column
print(df['name'])
# Find specific values in the Series
mask = df['name'].str.contains("Name")  # boolean mask of the rows whose name contains "Name"
df[mask]  # returns all columns of the rows whose name contains "Name"
df['name'][mask]  # returns the values of column name that contain "Name"
Hope this example helps.
EDIT:
Your DataFrame has a column 'details', which contains a dict {'id': 101, ...}
>>> df['details']
0 {'id': 101, 'name': 'Name1', 'state': 'active'...
And you want to get value from field 'name', so just try:
>>> df['details'][0]['name']
'Name1'
The structure in your series is a dictionary.
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
You can just point to the element 'name' from that dict with the following command
df['details'][0]['name']
If the field's name could be different, you can get the list of keys in the dictionary and apply your regex to that list to find the field's name.
Hope this helps.
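Since the question shows each cell as a list holding one dictionary, and the cell may also be stored as text, a more defensive sketch could look like this. The helper name extract_name is made up for illustration, and it assumes a cell is a string, a list of dicts, or a single dict.

```python
import ast


def extract_name(cell):
    """Return the 'name' of the first dict in a cell, or None if there is none."""
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)  # turn the text back into Python objects
    if isinstance(cell, dict):
        cell = [cell]  # treat a bare dict like a one-element list
    if not isinstance(cell, list) or not cell:
        return None
    return cell[0].get('name')


details = "[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101}]"
print(extract_name(details))
```

It could then be applied to the whole column with df['col1'] = df['details'].apply(extract_name).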
I am pulling json data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry like shown above, but others can contain up to 7:
[{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with columns in a dataframe each containing a dictionary of the above entry i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can do a dict comprehension. From your list of dicts, compose a new dict where the key is the id of each element and the value is the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# {'AnnualUsage': '13071',
# 'Number': '06860702',
# 'NumberOfMe': '3',
# 'Prem': '960002',
# 'ProjectID': '0039',
# 'Region': 'CHR',
# 'Tariff': 'Multiple',
# 'TestId': 'Z13753'}
You can iterate through the entries and store each one under its id in the new dictionary:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]
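Because some rows hold a single dictionary and others a list of them, a small helper can cover both cases before the columns are expanded. This is only a sketch; the name flatten_udfs is hypothetical.

```python
def flatten_udfs(cell):
    """Turn {'id': ..., 'value': ...} entries into one {id: value} dict."""
    if isinstance(cell, dict):
        cell = [cell]  # a single entry is handled like a one-element list
    return {item['id']: item['value'] for item in cell}


single = {'id': 'AnnualUsage', 'value': '13071'}
several = [{'id': 'AnnualUsage', 'value': '13071'},
           {'id': 'TestId', 'value': 'Z13753'},
           {'id': 'NumberOfMe', 'value': '3'}]

print(flatten_udfs(single))
print(flatten_udfs(several))
```

Assuming the column is named applicationUDFs as in the question, CTG_df['applicationUDFs'].apply(flatten_udfs).apply(pd.Series) should then produce one column per id.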
I am extracting some data from an API and having challenges transforming it into a proper dataframe.
The resulting DataFrame df is arranged as such:
Index Column
0 {'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
1 {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}
I am trying to split the emails into one column and the list into a separate column:
Index Column1 Column2
0 email#email.com [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
Ideally, each 'action'/'date' pair would have its own separate row; however, I believe I can do the further unpacking myself.
After looking around I tried/failed lots of solutions such as:
df.apply(pd.Series) # does nothing
pd.DataFrame(df['column'].values.tolist()) # makes each dictionary key a separate column,
where most of the rows are NaN except the one that holds the value for that key
Edit:
As many of the comments asked about the initial format of the data from the API, it's a list of dictionaries:
[{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]},{'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
Thanks
One naive way of doing this is as below:
import pandas as pd
inp = [{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]},
       {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
rows = []
for each in inp:  # iterate through the list of dicts
    for k, v in each.items():  # take each key/value pair
        for eachv in v:  # the value is a list, so iterate through each entry
            # DataFrame.set_value was removed in pandas 1.0, so collect the rows instead
            rows.append({'Column1': k, 'Column2': str(eachv)})
df = pd.DataFrame(rows)
print(df)
I am sure there might be a better way of writing this. Hope this helps :)
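Assuming you have already read it as a DataFrame, you can use the following:
import ast
# literal_eval is only needed if the cells are strings rather than dicts
df['Column'] = df['Column'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# in Python 3, keys() and values() return views, so wrap them in list() before indexing
df['email'] = df['Column'].apply(lambda x: list(x.keys())[0])
df['value'] = df['Column'].apply(lambda x: list(x.values())[0])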
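If each 'action'/'date' pair should end up in its own row with separate columns, as the question wishes, one way is to flatten the original API list before building the DataFrame. This is a sketch in plain Python; the column name email is an assumption.

```python
inp = [{'email#email.com': [{'action': 'data', 'date': 'date'},
                            {'action': 'data', 'date': 'date'}]},
       {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]

# one row per action/date entry, with the email repeated on every row
rows = [
    {'email': email, **entry}
    for record in inp
    for email, entries in record.items()
    for entry in entries
]

for row in rows:
    print(row)
```

pd.DataFrame(rows) should then give the columns email, action and date.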