I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
{'id' : i, 'data' : j}
for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version,
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that if you pop id inside the expression, it has to be the first argument to zip, so that the column is already removed when df.to_dict is evaluated.
print(data)
[{'data': {'date': '2017-09-31',
'price': 299433,
'zipcode': '038ZJKE'},
'id': 'u029'},
{'data': {'date': '2017-01-05',
'price': 8923944,
'zipcode': 'AERIU87'},
'id': 'u734'},
{'data': {'date': '2017-07-28',
'price': 9084582,
'zipcode': 'BJDHEU3'},
'id': 'uh72'}]
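As for the SyntaxError in the question: braces holding two key: value pairs followed by for are neither a dict literal nor a dict comprehension, so the dict has to be the element of a list comprehension instead. A minimal sketch, assuming a two-row sample of the question's DataFrame (the names records and dt are illustrative; dt avoids reusing the name d). Either approach makes a single pass over the rows, so the running time grows linearly with the number of rows.

```python
import pandas as pd

# Two-row sample of the question's data (assumed for illustration)
df = pd.DataFrame({
    'id': ['u734', 'uh72'],
    'price': [8923944, 9084582],
    'date': ['2017-01-05', '2017-07-28'],
    'zipcode': ['AERIU87', 'BJDHEU3'],
})

# The dict is the element of a list comprehension, wrapped in [ ]
records = [
    {'id': idx, 'data': {'price': p, 'date': dt, 'zipcode': z}}
    for idx, p, dt, z in zip(df['id'], df['price'], df['date'], df['zipcode'])
]
print(records)
```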
I am getting output in this format
But I want output in this format
Any help will be appreciated.
Thank you in advance.
I've tried to convert my data into an array, but it doesn't work the way I want.
This is my output:
{'date': '2021-12-30 17:31:05.865139', 'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2}, {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]}
{'date': '2021-12-31 17:31:05.907697', 'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0}, {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]}]
There are a few ways to generate the pandas DataFrame you want. The output data you provide is heavily nested, so the values have to be pulled out first. One problem is that in the sub-data the dictionary keys are literally called 'key' and 'value' instead of carrying the actual column name. With a custom function you can prepare the data as needed:
Option I:
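import pandas as pd
from datetime import datetime

def generate_dataframe(dataset):
    # Collect one dict per row and build the DataFrame once at the end
    # (DataFrame.append was removed in pandas 2.0)
    rows = []
    for data in dataset:
        dataframe_row = {}
        # Convert date, e.g. '2021-12-30 17:31:05.865139' -> '30Dec21'
        date_time_obj = datetime.strptime(data['date'], '%Y-%m-%d %H:%M:%S.%f')
        dataframe_row['date'] = date_time_obj.strftime("%d%b%y")
        # Each sub_data entry becomes one column: key -> value
        for vals in data['sub_data']:
            dataframe_row[vals['key']] = vals['value']
        rows.append(dataframe_row)
    return pd.DataFrame(rows)

# output_I and output_II are the two dictionaries from the question
dataset = [output_I, output_II]
df = generate_dataframe(dataset)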
Option II:
Extract the data and transpose the sub-data
def process_sub_data(data):
    # Convert the sub-data to a DataFrame first
    df_data = pd.DataFrame(data['sub_data'])
    # Transpose the DataFrame
    df_data = df_data.T
    # Promote the first row (the keys) to column names
    df_data.columns = df_data.iloc[0]
    df_data = df_data.iloc[1:].reset_index(drop=True)
    # Return the single remaining row of values
    return df_data
Option III
You can try to format nested data with
df_res = pd.json_normalize(data, max_level=2)
This will not work properly, as your column names (day1, ..., day30) are stored as values under 'key' and are not the keys of the dict.
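One way json_normalize can still be used is to unpack the sub_data records into long form and pivot them afterwards. This is a sketch only, assuming the input is a list of dictionaries shaped like the output in the question (abbreviated here to two keys per row); the names long_df and wide_df are illustrative.

```python
import pandas as pd

# Abbreviated version of the data from the question (assumed to be a list)
data = [
    {'date': '2021-12-30 17:31:05.865139',
     'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}]},
    {'date': '2021-12-31 17:31:05.907697',
     'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}]},
]

# One row per sub_data entry, with the parent date repeated on each row
long_df = pd.json_normalize(data, record_path='sub_data', meta=['date'])

# Turn the 'key' values into column names, one row per date
wide_df = long_df.pivot(index='date', columns='key', values='value')
print(wide_df)
```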
Hope I could help :)
I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
Use pd.json_normalize (before pandas 1.0: from pandas.io.json import json_normalize). It flattens the nested weight dictionary into separate columns:
id name weight.number weight.unit gender
123 joe 100 lbs male
If you want to discard the weight unit, just flatten the JSON:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
Then turn it into a DataFrame. The dict is wrapped in a list because pd.DataFrame raises a ValueError when given only scalar values and no index:
pd.DataFrame([temp])
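The question also asks for the number values to be converted from strings. A minimal sketch combining json_normalize with pd.to_numeric, assuming only id and weight should become numeric (the variable name record is illustrative):

```python
import pandas as pd

record = {'id': '123', 'name': 'joe',
          'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}

# Flatten the nested dict: weight becomes weight.number and weight.unit
df = pd.json_normalize(record)

# Drop the unit and keep the number under the plain name 'weight'
df = df.drop(columns='weight.unit').rename(columns={'weight.number': 'weight'})

# Convert the string columns that hold numbers
df[['id', 'weight']] = df[['id', 'weight']].apply(pd.to_numeric)
print(df)
```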
Something like this should do the trick:
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df
I have a set:
CompanyList={'Apple','LG','Samsung'}
and a pandas DataFrame:
sales=[{'name':'Samsung Korea','model':'S1'},
{'name':'Samsung Vienam','model':'J1'},
{'name':'LG America','model':'L1'}
]
df=pd.DataFrame(sales)
I'd like to go through the CompanyList, then generate new Sub-DataFrame from 'sales' DataFrame. The expected results are
dataSamsung = [{'name': 'Samsung', 'model': 'S1'},{'name': 'Samsung', 'model': 'J1'}]
dataLG = [{'name': 'LG', 'model': 'L1'}]
I tried:
customer={}
for i in companyList:
customer[i] = df[df.name.str.contains('i')]
but this gives me a wrong answer. Could you help me to fix this case?
Try apply:
df['name']=df['name'].apply(lambda x: [i for i in CompanyList if i in x][0])
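This uses apply with a list comprehension to replace each name with the first entry of CompanyList that it contains. Note that it raises an IndexError if a name matches no company.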
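If the goal is the per-company dictionary from the question, the bug in the original loop is that contains('i') searches for the literal letter i instead of the loop variable. A minimal sketch, assuming plain substring matching is enough (regex=False) and using illustrative names (company_list, customer):

```python
import pandas as pd

company_list = {'Apple', 'LG', 'Samsung'}
sales = [{'name': 'Samsung Korea', 'model': 'S1'},
         {'name': 'Samsung Vienam', 'model': 'J1'},
         {'name': 'LG America', 'model': 'L1'}]
df = pd.DataFrame(sales)

customer = {}
for company in company_list:
    # Pass the variable itself, not the string 'i'
    mask = df['name'].str.contains(company, regex=False)
    if mask.any():
        # Replace the full name with the company name and export as records
        customer[company] = (
            df.loc[mask].assign(name=company).to_dict(orient='records')
        )
print(customer)
```

Companies without any matching rows (Apple here) are simply left out of the dictionary.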
I am extracting some data from an API and having challenges transforming it into a proper dataframe.
The resulting DataFrame df is arranged as such:
Index Column
0 {'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
1 {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}
I am trying to split the emails into one column and the list into a separate column:
Index Column1 Column2
0 email#email.com [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
Ideally, each 'action'/'date' would have its own separate row; however, I believe I can do the further unpacking myself.
After looking around I tried/failed lots of solutions such as:
df.apply(pd.Series) # does nothing
pd.DataFrame(df['column'].values.tolist()) # makes each dictionary key a separate column,
# where most of the rows are NaN except the one which has the paired value
Edit:
As many of the comments asked about the initial format of the data from the API, it's a list of dictionaries:
[{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]},{'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
Thanks
One naive way of doing this is as below:
import pandas as pd

inp = [{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
       , {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
# DataFrame.set_value was removed in pandas 1.0, so collect the rows first
rows = []
for each in inp:  # iterate through the list of dicts
    for k, v in each.items():  # take each key/value pair
        for eachv in v:  # the value is a list, so iterate through it
            print(str(eachv))
            rows.append({'Column1': k, 'Column2': str(eachv)})
df = pd.DataFrame(rows)
I am sure there might be a better way of writing this. Hope this helps :)
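Assuming you have already read it into a DataFrame, you can use the following (the literal_eval step is only needed if the column holds strings rather than dicts; in Python 3, keys() and values() must be wrapped in list() before indexing):
import ast
df['Column'] = df['Column'].apply(ast.literal_eval)
df['email'] = df['Column'].apply(lambda x: list(x.keys())[0])
df['value'] = df['Column'].apply(lambda x: list(x.values())[0])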
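Since the question would ideally have one row per 'action'/'date' entry, the long form can also be built directly from the API's list of dictionaries. A minimal sketch under that assumption; the names rows, record, events and event are illustrative:

```python
import pandas as pd

inp = [
    {'email#email.com': [{'action': 'data', 'date': 'date'},
                         {'action': 'data', 'date': 'date'}]},
    {'different-email#email.com': [{'action': 'data', 'date': 'date'}]},
]

# One flat dict per event, with the email copied onto every row
rows = [
    {'email': email, **event}
    for record in inp
    for email, events in record.items()
    for event in events
]
df = pd.DataFrame(rows)
print(df)
```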
I am very new to python programming and have yet to buy a textbook on the matter (I am buying one from the store or Amazon today). In the meantime, can you help me with the following problem I have encountered?
I have a list of dictionary objects like this:
stock = [
{ 'date': '2012', 'amount': '1.45', 'type': 'one'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'}
]
I would like to sort the list by the date column and then by the amount column so that the sorted list looks like this:
stock = [
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2012', 'amount': '1.45', 'type': 'one'}
]
I now think I need to use sorted(), but as a beginner I am having difficulty understanding the concepts I see.
I tried this:
from operator import itemgetter
all_amounts = itemgetter("amount")
stock.sort(key = all_amounts)
but this resulted in a list that was sorted alphanumerically rather than numerically.
Can someone please tell me how to achieve this seemingly simple sort? Thank-you!
Your sorting condition is too complicated for an operator.itemgetter. You will have to use a lambda function:
stock.sort(key=lambda x: (int(x['date']), float(x['amount'])))
or
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock.sort(key=all_amounts)
Start by converting your data into a proper format:
stock = [
{ 'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
for x in stock
]
Now stock.sort(key=all_amounts) will sort the amounts numerically. To sort by date first and then by amount, use itemgetter('date', 'amount') as the key.
As you appear to be new in programming, here's a word of general advice if I may:
Proper data structure is 90 percent of success. Do not try to work around broken data by writing more code. Create a structure adequate to your task and write as little code as possible.
You can also use the fact that python's sort is stable:
stock.sort(key=lambda x: float(x["amount"]))
stock.sort(key=lambda x: int(x["date"]))
Since the items with the same key keep their relative positions when sorting (they're never swapped), you can build up a complicated sort by sorting multiple times.
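To illustrate, here is a small sketch on the question's data showing that sorting twice (least significant key first) gives the same order as a single sort with a tuple key. The names by_tuple and two_pass are illustrative.

```python
stock = [
    {'date': '2012', 'amount': '1.45', 'type': 'one'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'},
]

# Single pass with a (date, amount) tuple key
by_tuple = sorted(stock, key=lambda x: (int(x['date']), float(x['amount'])))

# Two passes: secondary key (amount) first, then primary key (date)
two_pass = sorted(stock, key=lambda x: float(x['amount']))
two_pass.sort(key=lambda x: int(x['date']))

print([x['type'] for x in by_tuple])
print(by_tuple == two_pass)
```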