I want to create a new data frame using for loop.
for (name, series) in Quantitative.iteritems():
data = {'Name': pd.Series(name), 'Count': pd.Series(Quantitative[name].size),
'% Miss.': pd.Series((sum(Quantitative[name].isnull().values.ravel()) / Quantitative[name].size) * 100),
'Card.': pd.Series(Quantitative[name].unique().size), 'Min': pd.Series(Quantitative[name].min()),
'1st Qrt.': pd.Series(Quantitative[name].quantile(0.25)), 'Mean': pd.Series(Quantitative[name].mean()),
'Median': pd.Series(Quantitative[name].median()), '3rd Qrt.': pd.Series(Quantitative[name].quantile(0.75)),
'Max': pd.Series(Quantitative[name].max()), 'Std.': pd.Series(Quantitative[name].std())}
dt = pd.DataFrame(data)
print(pd.DataFrame(dt))
However, it creates multiple dictionaries. How can I merge them together?
You don't need to create dataframe for each item in dict and then merge them. Use list comprehension to create rows list and create dataframe from it:
rows = [[
name, series.size, (sum(series.isnull().values.ravel()) / series.size) * 100,
series.unique().size, series.min(), series.quantile(0.25), series.mean(),
series.median(), series.quantile(0.75), series.max(), series.std()
] for name, series in Quantitative.iteritems()]
dt = pd.DataFrame(
rows,
columns=['Name', 'Count', '% Miss.', 'Card.', 'Min', '1st Qrt.',
'Mean', 'Median', '3rd Qrt.', 'Max', 'Std.'])
print(dt)
Related
I have this dataframe df where
>>> df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/11'],
'Event':['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost':[10000, 5000, 15000, 2000],
'Name':['Roy', 'Abraham', 'Blythe', 'Sophia'],
'Age':['20', '10', '13', '17']})
I want to determine the column index with the corresponding name. I tried it with this:
>>> list(df.columns)
But the solution above only returns the column names without index numbers.
How can I code it so that it would return the column names and the corresponding index for that column? Like This:
0 Date
1 Event
2 Cost
3 Name
4 Age
Simpliest is add pd.Series constructor:
pd.Series(list(df.columns))
Or convert columns to Series and create default index:
df.columns.to_series().reset_index(drop=True)
Or:
df.columns.to_series(index=False)
You can use loop like this:
myList = list(df.columns)
index = 0
for value in myList:
print(index, value)
index += 1
A nice short way to get a dictionary:
d = dict(enumerate(df))
output: {0: 'Date', 1: 'Event', 2: 'Cost', 3: 'Name', 4: 'Age'}
For a Series, pd.Series(list(df)) is sufficient as iteration occurs directly on the column names
In addition to using enumerate, this also can get a numbers in order using zip, as follows:
import pandas as pd
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/11'],
'Event':['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost':[10000, 5000, 15000, 2000],
'Name':['Roy', 'Abraham', 'Blythe', 'Sophia'],
'Age':['20', '10', '13', '17']})
result = list(zip([i for i in range(len(df.columns))], df.columns.values,))
for r in result:
print(r)
#(0, 'Date')
#(1, 'Event')
#(2, 'Cost')
#(3, 'Name')
#(4, 'Age')
I have a dictionary like so: {key_1: pd.Dataframe, key_2: pd.Dataframe, ...}.
Each of these dfs within the dictionary has a column called 'ID'.
Not all instances appear in each dataframe meaning that the dataframes are of different size.
Is there anyway I could combine these into one large dataframe?
Here's a minimal reproducible example of the data:
data1 = [{'ID': 's1', 'country': 'Micronesia', 'Participants':3},
{'ID':'s2', 'country': 'Thailand', 'Participants': 90},
{'ID':'s3', 'country': 'China', 'Participants': 36},
{'ID':'s4', 'country': 'Peru', 'Participants': 30}]
data2 = [{'ID': '1', 'country': 'Micronesia', 'Kids_per_participant':3},
{'ID':'s2', 'country': 'Thailand', 'Kids_per_participant': 9},
{'ID':'s3', 'country': 'China', 'Kids_per_participant': 39}]
data3= [{'ID': 's1', 'country': 'Micronesia', 'hair_style_rank':3},
{'ID':'s2', 'country': 'Thailand', 'hair_style_rank': 9}]
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
dict_example={'df1_key':df1,'df2_key':df2,'df3_key':df3}
pd.merge(dict_example.values(), on="ID", how="outer")
For a dict with arbitrary number of keys you could do this
i=list(dict_example.keys())
newthing = dict_example[i[0]]
for j in range(1,len(i)):
newthing = newthing.merge(dict_example[i[j]],on='ID', how = 'outer')
First make a list of your dataframes. Second create a first DataFrame. Then iterate through the rest of your DataFrames and merge each one after that. I did notice you have country for each ID, but it's not listing in your initial on statement. Do you want to join on country also? If so replace the merge above with this changing the join criteria to a list including country
newthing = newthing.merge(dict_example[i[j]],on=['ID','country'], how = 'outer')
Documents on merge
If you don't care about altering your DataFrames code could be shorter like this
for j in range(1,len(i)):
df1 = df1.merge(dict_example[i[j]],on=['ID','country'], how = 'outer')
I want to convert this dict into a pandas dataframe where each key becomes a column and values in the list become the rows:
my_dict:
{'Last updated': ['2021-05-18T15:24:19.000Z', '2021-05-18T15:24:19.000Z'],
'Symbol': ['BTC', 'BNB', 'XRP', 'ADA', 'BUSD'],
'Name': ['Bitcoin', 'Binance Coin', 'XRP', 'Cardano', 'Binance USD'],
'Rank': [1, 3, 7, 4, 25],
}
The lists in my_dict can also have some missing values, which should appear as NaNs in dataframe.
This is how I'm currently trying to append it into my dataframe:
df = pd.DataFrame(columns = ['Last updated',
'Symbol',
'Name',
'Rank',]
df = df.append(my_dict, ignore_index=True)
#print(df)
df.to_excel(r'\walletframe.xlsx', index = False, header = True)
But my output only has a single row containing all the values.
The answer was pretty simple, instead of using
df = df.append(my_dict)
I used
df = pd.DataFrame.from_dict(my_dict).T
Which transposes the dataframe so it doesn't has any missing values for columns.
Credits to #Ank who helped me find the solution!
I pulled a list of historical option price of AAPL from the RobinHoood function robin_stocks.get_option_historicals(). The data was returned in a form of dictional of list of dictionary as shown below.
I am having difficulties to convert the below object (named historicalData) into a DataFrame. Can someone please help?
historicalData = {'data_points': [{'begins_at': '2020-10-05T13:30:00Z',
'open_price': '1.430000',
'close_price': '1.430000',
'high_price': '1.430000',
'low_price': '1.430000',
'volume': 0,
'session': 'reg',
'interpolated': False},
{'begins_at': '2020-10-05T13:40:00Z',
'open_price': '1.430000',
'close_price': '1.340000',
'high_price': '1.440000',
'low_price': '1.320000',
'volume': 0,
'session': 'reg',
'interpolated': False}],
'open_time': '0001-01-01T00:00:00Z',
'open_price': '0.000000',
'previous_close_time': '0001-01-01T00:00:00Z',
'previous_close_price': '0.000000',
'interval': '10minute',
'span': 'week',
'bounds': 'regular',
'id': '22b49380-8c50-4c76-8fb1-a4d06058f91e',
'instrument': 'https://api.robinhood.com/options/instruments/22b49380-8c50-4c76-8fb1-a4d06058f91e/'}
I tried the below code code but that didn't help:
import pandas as pd
df = pd.DataFrame(historicalData)
df
You didn't write that you want only data_points (as in the
other answer), so I assume that you want your whole dictionary
converted to a DataFrame.
To do it, start with your code:
df = pd.DataFrame(historicalData)
It creates a DataFrame, with data_points "exploded" to
consecutive rows, but they are still dictionaries.
Then rename open_price column to open_price_all:
df.rename(columns={'open_price': 'open_price_all'}, inplace=True)
The reason is to avoid duplicated column names after join
to be performed soon (data_points contain also open_price
attribute and I want the corresponding column from data_points
to "inherit" this name).
The next step is to create a temporary DataFrame - a split of
dictionaries in data_points to individual columns:
wrk = df.data_points.apply(pd.Series)
Print wrk to see the result.
And the last step is to join df with wrk and drop
data_points column (not needed any more, since it was
split into separate columns):
result = df.join(wrk).drop(columns=['data_points'])
This is pretty easy to solve with the below. I have chucked the dataframe to a list via list comprehension
import pandas as pd
df_list = [pd.DataFrame(dic.items(), columns=['Parameters', 'Value']) for dic in historicalData['data_points']]
You then could do:
df_list[0]
which will yield
Parameters Value
0 begins_at 2020-10-05T13:30:00Z
1 open_price 1.430000
2 close_price 1.430000
3 high_price 1.430000
4 low_price 1.430000
5 volume 0
6 session reg
7 interpolated False
I am extracting some data from an API and having challenges transforming it into a proper dataframe.
The resulting DataFrame df is arranged as such:
Index Column
0 {'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
1 {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}
I am trying to split the emails into one column and the list into a separate column:
Index Column1 Column2
0 email#email.com [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
Ideally, each 'action'/'date' would have it's own separate row, however I believe I can do the further unpacking myself.
After looking around I tried/failed lots of solutions such as:
df.apply(pd.Series) # does nothing
pd.DataFrame(df['column'].values.tolist()) # makes each dictionary key as a separate colum
where most of the rows are NaN except one which has the pair value
Edit:
As many of the questions asked the initial format of the data in the API, it's a list of dictionaries:
[{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]},{'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
Thanks
One naive way of doing this is as below:
inp = [{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
, {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
index = 0
df = pd.DataFrame()
for each in inp: # iterate through the list of dicts
for k, v in each.items(): #take each key value pairs
for eachv in v: #the values being a list, iterate through each
print (str(eachv))
df.set_value(index,'Column1',k)
df.set_value(index,'Column2',str(eachv))
index += 1
I am sure there might be a better way of writing this. Hope this helps :)
Assuming you have already read it as dataframe, you can use following -
import ast
df['Column'] = df['Column'].apply(lambda x: ast.literal_eval(x))
df['email'] = df['Column'].apply(lambda x: x.keys()[0])
df['value'] = df['Column'].apply(lambda x: x.values()[0])