I have a sample dataframe as below
I want this dataframe converted to this below format in python so I can pass it into dtype
{
'FirstName':'string',
'LastName':'string',
'Department':'integer',
'EmployeeID':'string', }
Could anyone please let me know how this can be done.
To note above: I need the exact string {'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'} from the exact dataframe.
The dataframe has list of primary key names and its datatype. I want to pass this primary_key and datatype combination into concat_df.to_csv(csv_buffer, sep=",", index=False, dtype = {'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'})
dict/zip the two series:
import pandas as pd
data = pd.DataFrame({
'Column_Name': ['FirstName', 'LastName', 'Department', 'EmployeeID'],
'Datatype': ['string', 'string', 'integer', 'string'],
})
mapping = dict(zip(data['Column_Name'], data['Datatype']))
print(mapping)
prints out
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Use to_records, which is much handier:
print(dict(df.to_records(index=False)))
This should give:
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Edit :
If you want keys alone then
d = dict(df.to_records(index=False))
print(list(d.keys()))
This should give:
['FirstName', 'LastName', 'Department', 'EmployeeID']
You can do an easy dict comprehension with your data:
Input data:
data = pd.DataFrame({'Column_Name' : ['FirstName', 'LastName', 'Department'], 'Datatype' : ['Jane', 'Doe', 666]})
Dict comprehension:
{n[0]:n[1] for n in data.to_numpy()}
This will give you:
{'FirstName': 'Jane', 'LastName': 'Doe', 'Department': 666}
There are surely other ways, e.g. using the pandas to_dict function, but I am not very familiar with it.
Edit:
But keep in mind that a dictionary needs unique keys.
Your categories (FirstName, LastName) look like general categories. This will only work for a single person; otherwise you end up with duplicate keys.
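The to_dict route mentioned above can be sketched as follows, assuming the same two-column frame (Column_Name, Datatype) used in the earlier answers. The duplicate check is an added precaution for the unique-key caveat, not something taken from the original answers.

```python
import pandas as pd

data = pd.DataFrame({
    'Column_Name': ['FirstName', 'LastName', 'Department', 'EmployeeID'],
    'Datatype': ['string', 'string', 'integer', 'string'],
})

# Duplicate column names would silently overwrite each other in a dict,
# so look for them before building the mapping.
duplicates = data.loc[data['Column_Name'].duplicated(), 'Column_Name'].tolist()
if duplicates:
    raise ValueError(f"Duplicate keys: {duplicates}")

# Index by the key column, then turn the remaining Series into a dict.
mapping = data.set_index('Column_Name')['Datatype'].to_dict()
print(mapping)
```

This gives the same dictionary as the dict/zip approach; which one to use is mostly a matter of taste.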
I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raised an error because some of the keys are missing when 'status' is 'PAUSED'. Is there a way to skip the records whose keys are missing, or another way of using json_normalize()? I tried errors='ignore', but it doesn't work either.
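One possible way around the missing keys, sketched here with plain Python instead of json_normalize(), is to build the rows yourself and fall back to empty containers when 'insights' is absent. The variable names rows and wanted are illustrative, and the sketch assumes records without 'insights' should simply be skipped.

```python
import pandas as pd

test = {'data': [{'name': 'john',
                  'insights': {'data': [{'account_id': '123',
                                         'test_id': '456',
                                         'date_start': '2022-12-31',
                                         'date_stop': '2023-01-29',
                                         'impressions': '4070',
                                         'spend': '36.14'}],
                               'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
                  'status': 'ACTIVE',
                  'id': '789'},
                 {'name': 'jack', 'status': 'PAUSED', 'id': '420'}]}

# One row per insight entry; records without 'insights' contribute no rows
# because .get() falls back to an empty dict / empty list.
rows = [
    {'name': item['name'], **insight}
    for item in test['data']
    for insight in item.get('insights', {}).get('data', [])
]

wanted = ['name', 'date_start', 'date_stop', 'impressions', 'spend']
df = pd.DataFrame(rows, columns=wanted)
print(df)
```

If the paused entries should appear as rows with empty values instead of being dropped, the inner loop would need a default such as [{}] in place of [].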
If we have the following list of dictionaries:
df=[{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}]
How can I split the above list into two dictionaries, each with only a single value for the answers key?
df=[{'answers': ['Yes'], 'question': 'status', 'type': 'string'}, {'answers': ['No'], 'question': 'status', 'type': 'string'}]
I think you mean that you work with DataFrames (hence your variable name df). In that case, pandas already has a function for this specific use case: explode.
import pandas as pd
df = pd.DataFrame([{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}])
print(df.explode('answers'))
Output:
answers question type
0 Yes status string
0 No status string
Edit: you can easily get back to a dictionary form with to_dict:
df = df.explode('answers')
print(df.to_dict(orient='records'))
Output:
[{'answers': 'Yes', 'question': 'status', 'type': 'string'}, {'answers': 'No', 'question': 'status', 'type': 'string'}]
You can try this:
import copy
df=[{'answers': ['Yes', 'No'], 'question': 'status', 'type': 'string'}]
df2=copy.deepcopy(df)
content = df[0]
for x,y in content.items():
    # Only split values that are lists with more than one element
    if isinstance(y,list) and (len(y) > 1):
        df[0][x]=y[0]
        df2[0][x]=y[1]
# print(df)
print(df2)
print(df)
However, this only handles lists with two elements, not more.
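For lists of any length, a minimal pure-Python sketch is a nested comprehension that copies each record once per answer. The names records and split are illustrative, and the third answer 'Maybe' is added only to show that more than two values work.

```python
records = [{'answers': ['Yes', 'No', 'Maybe'], 'question': 'status', 'type': 'string'}]

# For every record, emit one copy per answer, each with a one-element list.
split = [
    {**record, 'answers': [answer]}
    for record in records
    for answer in record['answers']
]
print(split)
```

Unlike the loop above, this keeps each answer wrapped in a list, which matches the output format asked for in the question.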
I am trying to extract a set of data from a column that is of type pandas.core.series.Series.
I tried
df['col1'] = df['details'].astype(str).str.findall(r'name\=(.*?),')
but the above returns null
Below is what the data looks like in column df['details']:
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
Trying to extract value corresponding to name field
Expected output : Name1
Try this simple approach and adapt it to your needs.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
print(df['name'][0])
# or, if the dict is stored inside a column of the DataFrame
df['details'][0]['name']
NOTE: as you mentioned details is one of the dataset that you have in the existing dataset
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
# Name column
print(df.name)
# Find rows whose name contains a specific value
mask = df.name.str.contains("Name")  # Boolean mask, True where name contains "Name"
df[mask]  # Returns all columns of the rows whose name contains "Name"
df.name[mask]  # Returns the values of column name that contain "Name"
I hope this example helps.
EDIT:
Your data frame has a column 'details', which contains a dict {'id':101, ...}
>>> df['details']
0 {'id': 101, 'name': 'Name1', 'state': 'active'...
And you want to get value from field 'name', so just try:
>>> df['details'][0]['name']
'Name1'
The structure in your series is a dictionary.
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
You can just point to the element 'name' from that dict with the following command
df['details'][0]['name']
If the name could be different you can get the list of the keys in the dictionary and apply your regex on that list to get your field's name.
Hope that it can help you.
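Since the sample shows each cell as a list wrapping a single dictionary, the lookups above may need one more indexing step. A hedged sketch that applies the lookup to every row and tolerates a few cell shapes is shown below; extract_name is a hypothetical helper, and the assumption that a cell may be a list, a dict, or the string form of either is mine, not something the question confirms.

```python
import ast
import pandas as pd

details = [{'id': 101, 'name': 'Name1', 'state': 'active'}]
df = pd.DataFrame({'details': [details, str(details), {'id': 102, 'name': 'Name2'}]})

def extract_name(cell):
    """Return the 'name' field from a cell, or None if it cannot be found."""
    if isinstance(cell, str):
        # Assumption: the string is a Python literal such as "[{'id': 101, ...}]".
        cell = ast.literal_eval(cell)
    if isinstance(cell, list):
        # Use the first dictionary in the list, if there is one.
        cell = cell[0] if cell else {}
    return cell.get('name') if isinstance(cell, dict) else None

df['col1'] = df['details'].apply(extract_name)
print(df['col1'].tolist())
```

Working on the Python objects directly avoids having to write a regex against the string representation of the dictionary.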
I am pulling json data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry like shown above, but others can contain up to 7:
[{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with columns in a dataframe each containing a dictionary of the above entry i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can do a dict comprehension. From your list of dicts, compose a new dict where the key is the id of each element and the value is the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# {'AnnualUsage': '13071',
# 'Number': '06860702',
# 'NumberOfMe': '3',
# 'Prem': '960002',
# 'ProjectID': '0039',
# 'Region': 'CHR',
# 'Tariff': 'Multiple',
# 'TestId': 'Z13753'}
You can iterate through the values and set each of them to the dictionary value:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]
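To get from these dictionaries to new columns, the same comprehension could be applied to each row and the results expanded into a frame. This is a sketch under assumptions: the sample rows and their id values are made up, to_flat_dict is a hypothetical helper, and a row holding a single dictionary instead of a list is wrapped in a list first.

```python
import pandas as pd

CTG_df = pd.DataFrame({
    'id': ['A1', 'A2'],  # hypothetical row identifiers
    'applicationUDFs': [
        {'id': 'AnnualUsage', 'value': '13071'},
        [{'id': 'AnnualUsage', 'value': '500'}, {'id': 'Region', 'value': 'CHR'}],
    ],
})

def to_flat_dict(items):
    """Turn {'id': k, 'value': v} entries into a single {k: v} dictionary."""
    if isinstance(items, dict):
        items = [items]  # a lone entry is treated as a one-element list
    return {d['id']: d['value'] for d in items}

flat = CTG_df['applicationUDFs'].apply(to_flat_dict)

# One column per attribute; rows lacking an attribute get NaN.
expanded = pd.DataFrame(flat.tolist(), index=CTG_df.index)
CTG_df2 = pd.concat([CTG_df[['id']], expanded], axis=1)
print(CTG_df2)
```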
I have a nested dictionary with the following structure. I am trying to convert it to a pandas dataframe, but I have problems splitting the 'map' dictionary into separate columns.
{'16':
{'label': 't1',
'prefLab': 'name',
'altLabel': ['test1', 'test3'],
'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]
},
'17':
{'label': 't2',
'prefLab': 'name2',
'broader': ['18'],
'altLabel': ['test2'],
'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]
}
}
ideal outcome would be a dataframe with the following structure.
label prefLab broader altLab ciID, map1, map2, map3 ...
16
17
Try this, assuming your dictionary is stored in a variable named "data":
train = pd.DataFrame.from_dict(data, orient='index')
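from_dict on its own leaves the 'map' column as a list of dictionaries. One way to split it further, sketched under the assumption that every 'map' list holds exactly one entry with an 'idMap' dictionary, is to expand that dictionary into its own columns and join it back. The names id_maps, id_cols, and result are illustrative.

```python
import pandas as pd

data = {
    '16': {'label': 't1', 'prefLab': 'name',
           'altLabel': ['test1', 'test3'],
           'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
    '17': {'label': 't2', 'prefLab': 'name2', 'broader': ['18'],
           'altLabel': ['test2'],
           'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]},
}

train = pd.DataFrame.from_dict(data, orient='index')

# Pull the idMap dictionary out of the single-element 'map' list of each row.
id_maps = train['map'].apply(lambda entries: entries[0]['idMap'])

# Expand those dictionaries into columns, aligned on the original index.
id_cols = pd.DataFrame(id_maps.tolist(), index=train.index)

result = train.drop(columns='map').join(id_cols)
print(result)
```

Keys that appear in only some rows (broader, map1, map2, map3) end up as NaN where they are missing.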