I have a DataFrame with time-series data and want to plot the number of each item ordered over time.
date item ordered
1 01-05-2020 1 1
2 01-05-2020 1 23
3 03-06-2020 2 4
4 03-07-2020 2 5
5 04-09-2020 3 4
df_new = df.groupby(df[['date','item']])['ordered'].sum().reset_index()
df_new.plot()
Use DataFrame.pivot_table before plotting, and don't convert the date index back to a column with reset_index before plotting:
df_new = df.pivot_table(index='date', columns='item', values='ordered', aggfunc='sum')
print (df_new)
item 1 2 3
date
01-05-2020 24.0 NaN NaN
03-06-2020 NaN 4.0 NaN
03-07-2020 NaN 5.0 NaN
04-09-2020 NaN NaN 4.0
df_new.plot()
Your solution also works if you group by the column names and reshape with unstack:
df_new = df.groupby(['date','item'])['ordered'].sum().unstack()
print (df_new)
item 1 2 3
date
01-05-2020 24.0 NaN NaN
03-06-2020 NaN 4.0 NaN
03-07-2020 NaN 5.0 NaN
04-09-2020 NaN NaN 4.0
df_new.plot()
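A minimal, self-contained sketch of the pivot step on the sample data. It assumes the dates are day-month-year strings (the question does not say), and parses them first so the x axis is spaced by real time rather than by string labels:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'date': ['01-05-2020', '01-05-2020', '03-06-2020', '03-07-2020', '04-09-2020'],
    'item': [1, 1, 2, 2, 3],
    'ordered': [1, 23, 4, 5, 4],
})

# Assumption: day-month-year; change the format string if the data is month-first
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

# One row per date, one column per item, summed quantities as values
df_new = df.pivot_table(index='date', columns='item', values='ordered', aggfunc='sum')
print(df_new)

# df_new.plot(marker='o')  # requires matplotlib
```

Because each item is missing (NaN) on most dates, a marker can help isolated points show up in the plot.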
I have some data I would like to organize for visualization and statistics, but I don't know how to proceed.
The data come from a pairwise comparison test and are stored in a pandas DataFrame with 3 columns (stimA, stimB and subjectAnswer) and 10 rows (one per pair). Example:
stimA  stimB  subjectAnswer
1      2      36
3      1      55
5      3      98
...    ...    ...
My goal is to organize them as a matrix in which each row and each column corresponds to one stimulus, with the subjectAnswer values grouped below the matrix diagonal (in my example, the subjectAnswer 36 corresponding to stimA 1 and stimB 2 should go to index [2][1]), like this:
stimA/stimB  1    2    3    4    5
1            ...
2            36
3            55
4            ...
5            ...  ...  98
I managed to pivot the first table into a matrix, but I could not get the values arranged below the diagonal. Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
session1 = pd.read_csv(filepath, names=['stimA', 'stimB', 'subjectAnswer'])
pivoted = session1.pivot(index='stimA', columns='stimB', values='subjectAnswer')
Which gives :
session1 :
stimA stimB subjectAnswer
0 1 3 6
1 4 3 21
2 4 5 26
3 2 3 10
4 1 2 6
5 1 5 6
6 4 1 6
7 5 2 13
8 3 5 15
9 2 4 26
pivoted :
stimB 1 2 3 4 5
stimA
1 NaN 6.0 6.0 NaN 6.0
2 NaN NaN 10.0 26.0 NaN
3 NaN NaN NaN NaN 15.0
4 6.0 NaN 21.0 NaN 26.0
5 NaN 13.0 NaN NaN NaN
The expected output for pivoted :
stimB 1 2 3 4 5
stimA
1 NaN NaN NaN NaN NaN
2 6.0 NaN NaN NaN NaN
3 6.0 10.0 NaN NaN NaN
4 6.0 26.0 21.0 NaN NaN
5 6.0 13.0 15.0 26.0 NaN
Thanks a lot for your help !
If I understand you correctly, the stimuli A and B are interchangeable. So to get the matrix layout you want, you can swap A with B in those rows where A is smaller than B. In other words, you don't use the original A and B for the pivot table, but the maximum and minimum of A and B:
session1['stim_min'] = np.min(session1[['stimA', 'stimB']], axis=1)
session1['stim_max'] = np.max(session1[['stimA', 'stimB']], axis=1)
pivoted = session1.pivot(index='stim_max', columns='stim_min', values='subjectAnswer')
pivoted
stim_min 1 2 3 4
stim_max
2 6.0 NaN NaN NaN
3 6.0 10.0 NaN NaN
4 6.0 26.0 21.0 NaN
5 6.0 13.0 15.0 26.0
Sort the values of stimA and stimB within each row (along the columns axis) and assign them to two temporary columns, x and y. Sorting is required to ensure that the upper-right side of the resulting matrix stays empty.
Then pivot the DataFrame with y as the index, x as the columns and subjectAnswer as the values, and reindex the reshaped frame so that every unique stimulus name appears in both the index and the columns of the matrix:
session1[['x', 'y']] = np.sort(session1[['stimA', 'stimB']], axis=1)
i = np.union1d(session1['x'], session1['y'])
session1.pivot(index='y', columns='x', values='subjectAnswer').reindex(index=i, columns=i)
x 1 2 3 4 5
y
1 NaN NaN NaN NaN NaN
2 6.0 NaN NaN NaN NaN
3 6.0 10.0 NaN NaN NaN
4 6.0 26.0 21.0 NaN NaN
5 6.0 13.0 15.0 26.0 NaN
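For reference, a self-contained sketch of the sort-then-pivot idea run on the session1 data from the question. The helper name lower_triangle and the temporary column names row, col and answer are made up for illustration; keyword arguments are used for pivot and reindex because recent pandas versions no longer accept them positionally:

```python
import numpy as np
import pandas as pd

# Data copied from the question's session1 printout
session1 = pd.DataFrame({
    'stimA': [1, 4, 4, 2, 1, 1, 4, 5, 3, 2],
    'stimB': [3, 3, 5, 3, 2, 5, 1, 2, 5, 4],
    'subjectAnswer': [6, 21, 26, 10, 6, 6, 6, 13, 15, 26],
})

def lower_triangle(frame):
    """Hypothetical helper: square matrix with the answers below the diagonal."""
    # Sort each pair so the smaller stimulus comes first
    pairs = np.sort(frame[['stimA', 'stimB']].to_numpy(), axis=1)
    tidy = pd.DataFrame({
        'col': pairs[:, 0],   # smaller stimulus -> column
        'row': pairs[:, 1],   # larger stimulus  -> row
        'answer': frame['subjectAnswer'].to_numpy(),
    })
    stimuli = np.union1d(tidy['row'], tidy['col'])
    # pivot raises if the same unordered pair occurs twice
    wide = tidy.pivot(index='row', columns='col', values='answer')
    return wide.reindex(index=stimuli, columns=stimuli)

matrix = lower_triangle(session1)
print(matrix)
```

If a pair can appear more than once, pivot_table with an aggregation function would be needed instead of pivot.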
I've imported a .csv into pandas and want to extract specific values and put them into a new column whilst maintaining the existing shape.
df[::3] extracts the data, but the result looks like this:
1 1
2 4
3 7
4
5
6
7
I want it to look like
1 1
2
3
4 4
5
6
7 7
Here is a solution:
df = pd.read_csv(r"C:/users/k_sego/colsplit.csv",sep=";")
df1 = df[['col1']]
df2 = df[['col2']]
DF = pd.merge(df1,df2, how='outer',left_on=['col1'],right_on=['col2'])
and the result is
col1 col2
0 1.0 1.0
1 2.0 NaN
2 3.0 NaN
3 4.0 4.0
4 5.0 NaN
5 6.0 NaN
6 7.0 7.0
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 NaN NaN
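As an alternative sketch that avoids the merge, assignment by index alignment also keeps the original shape. The frame below is made up for illustration (a single column named col1 holding 1 to 7); the point is that assigning a Series to a new column matches rows by index label and leaves the other rows as NaN:

```python
import pandas as pd

# Hypothetical stand-in for the imported CSV
df = pd.DataFrame({'col1': range(1, 8)})

# Every third row keeps its original index label (0, 3, 6),
# so the assignment aligns on those labels and fills the rest with NaN
df['col2'] = df['col1'].iloc[::3]
print(df)
```

The new column becomes float because NaN cannot be stored in an integer column.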
I'm working with a DataFrame like this:
id type1 type2 type3
0 1 dog NaN NaN
1 2 cat NaN NaN
2 3 dog cat NaN
3 4 cow NaN NaN
4 5 dog NaN NaN
5 6 cat NaN NaN
6 7 cat dog cow
7 8 dog NaN NaN
How can I transform it into the following DataFrame? Thank you.
id dog cat cow
0 1 1.0 NaN NaN
1 2 NaN 1.0 NaN
2 3 1.0 1.0 NaN
3 4 NaN NaN 1.0
4 5 1.0 NaN NaN
5 6 NaN 1.0 NaN
6 7 1.0 1.0 1.0
7 8 1.0 NaN NaN
First select only the type columns with DataFrame.filter and reshape with DataFrame.stack, so that Series.str.get_dummies can be called. Then, for 0/1 output, take the max per first level of the MultiIndex and change 0 to NaN with DataFrame.mask. Finally, add the first column back with DataFrame.join:
# groupby(level=0).max() replaces max(level=0), which was removed in pandas 2.0
df1 = df.filter(like='type').stack().str.get_dummies().groupby(level=0).max().mask(lambda x: x == 0)
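Or use get_dummies, take the max per column name, and finally change 0 to NaN:
# dtype=int keeps 0/1 output; transposing lets groupby merge the duplicate column names
df1 = (pd.get_dummies(df.filter(like='type'), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T
         .mask(lambda x: x == 0))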
df = df[['id']].join(df1)
print (df)
id cat cow dog
0 1 NaN NaN 1.0
1 2 1.0 NaN NaN
2 3 1.0 NaN 1.0
3 4 NaN 1.0 NaN
4 5 NaN NaN 1.0
5 6 1.0 NaN NaN
6 7 1.0 1.0 1.0
7 8 NaN NaN 1.0
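A self-contained sketch of the stack and str.get_dummies approach on the sample data, for anyone who wants to run it end to end. The variable names indicators and result are made up here, and groupby(level=0).max() is used for the per-row maximum because the level argument of max is not available in pandas 2.0 and later:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': range(1, 9),
    'type1': ['dog', 'cat', 'dog', 'cow', 'dog', 'cat', 'cat', 'dog'],
    'type2': [np.nan, np.nan, 'cat', np.nan, np.nan, np.nan, 'dog', np.nan],
    'type3': [np.nan] * 6 + ['cow', np.nan],
})

indicators = (
    df.filter(like='type')    # keep only type1..type3
      .stack()                # long Series indexed by (row, column)
      .dropna()               # make sure empty cells are gone
      .str.get_dummies()      # one indicator column per animal
      .groupby(level=0).max() # collapse back to one row per original row
)

# Replace the zeros with NaN to match the requested output
indicators = indicators.astype(float).mask(indicators == 0)

result = df[['id']].join(indicators)
print(result)
```

The indicator columns come out in alphabetical order (cat, cow, dog); select them explicitly, for example result[['id', 'dog', 'cat', 'cow']], if the order from the question matters.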
So I have two dataframes
eqdf
symbol qty
0 DABIND 1
1 INFTEC 6
2 DISHTV 8
3 HINDAL 40
4 NATMIN 5
5 POWGRI 40
6 CHEPET 6
premdf
share strike lprice premperc d_strike
0 HINDAL 250.0 237.90 1.975620 5.086171
1 RELIND 1280.0 1254.30 1.642350 2.048952
2 POWGRI 205.0 201.15 1.118568 1.913995
I want to compare the columns premdf['share'] and eqdf['symbol'], and wherever there is a match, append the premperc, d_strike and strike values to the end of the matching eqdf row.
I have tried
eqdf.loc[eqdf['symbol']==premdf['share'],eqdf['premperc'] == premdf['premperc']]
I keep getting errors
ValueError: Can only compare identically-labeled Series objects
Expected Output:
eqdf
symbol qty premperc d_strike strike
0 DABIND 1 NaN NaN NaN
1 INFTEC 6 NaN NaN NaN
2 DISHTV 8 NaN NaN NaN
3 HINDAL 40 1.975620 5.086171 250.0
4 NATMIN 5 NaN NaN NaN
5 POWGRI 40 1.118568 1.913995 205.0
6 CHEPET 6 NaN NaN NaN
What is the correct way to do this?
Thanks
Rename the share column to symbol and do a left merge:
eqdf.merge(premdf.rename(columns={'share': 'symbol'}), how='left')
symbol qty strike lprice premperc d_strike
0 DABIND 1 NaN NaN NaN NaN
1 INFTEC 6 NaN NaN NaN NaN
2 DISHTV 8 NaN NaN NaN NaN
3 HINDAL 40 250.0 237.90 1.975620 5.086171
4 NATMIN 5 NaN NaN NaN NaN
5 POWGRI 40 205.0 201.15 1.118568 1.913995
6 CHEPET 6 NaN NaN NaN NaN
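The merge above also carries lprice along. A sketch that reproduces the expected output from the question exactly, assuming only premperc, d_strike and strike are wanted, is to select those columns before merging and join on the differently named key columns:

```python
import pandas as pd

# Frames copied from the question
eqdf = pd.DataFrame({
    'symbol': ['DABIND', 'INFTEC', 'DISHTV', 'HINDAL', 'NATMIN', 'POWGRI', 'CHEPET'],
    'qty': [1, 6, 8, 40, 5, 40, 6],
})
premdf = pd.DataFrame({
    'share': ['HINDAL', 'RELIND', 'POWGRI'],
    'strike': [250.0, 1280.0, 205.0],
    'lprice': [237.90, 1254.30, 201.15],
    'premperc': [1.975620, 1.642350, 1.118568],
    'd_strike': [5.086171, 2.048952, 1.913995],
})

wanted = ['share', 'premperc', 'd_strike', 'strike']
result = (
    eqdf.merge(premdf[wanted], how='left', left_on='symbol', right_on='share')
        .drop(columns='share')   # the key is already present as 'symbol'
)
print(result)
```

A left merge keeps every row of eqdf in its original order, so symbols without a match (and RELIND, which is not in eqdf) simply yield NaN or are dropped.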
I have the following DataFrame:
df = pandas.DataFrame({'Buy':[10,np.nan,2,np.nan,np.nan,4],'Sell':[np.nan,7,np.nan,9,np.nan,np.nan]})
Out[37]:
Buy Sell
0 10.0 NaN
1 NaN 7.0
2 2.0 NaN
3 NaN 9.0
4 NaN NaN
5 4.0 NaN
I want to create two more columns called Quant and B/S.
For Quant, the following works fine:
df['Quant'] = df['Buy'].fillna(df['Sell']) # Fetch available value from both column and if both values are Nan then output is Nan.
Output is:
df
Out[39]:
Buy Sell Quant
0 10.0 NaN 10.0
1 NaN 7.0 7.0
2 2.0 NaN 2.0
3 NaN 9.0 9.0
4 NaN NaN NaN
5 4.0 NaN 4.0
But I want B/S to record which column each Quant value was taken from.
You can perform an equality test and feed into numpy.where:
df['B/S'] = np.where(df['Quant'] == df['Buy'], 'B', 'S')
For the case where both values are null, you can use an additional step:
df.loc[df[['Buy', 'Sell']].isnull().all(axis=1), 'B/S'] = np.nan
Example
from io import StringIO
import numpy as np
import pandas as pd
mystr = StringIO("""Buy Sell
10 nan
nan 8
4 nan
nan 5
nan 7
3 nan
2 nan
nan nan""")
df = pd.read_csv(mystr, sep=r'\s+')  # split on any run of whitespace
df['Quant'] = df['Buy'].fillna(df['Sell'])
df['B/S'] = np.where(df['Quant'] == df['Buy'], 'B', 'S')
df.loc[df[['Buy', 'Sell']].isnull().all(axis=1), 'B/S'] = np.nan
Result
print(df)
Buy Sell Quant B/S
0 10.0 NaN 10.0 B
1 NaN 8.0 8.0 S
2 4.0 NaN 4.0 B
3 NaN 5.0 5.0 S
4 NaN 7.0 7.0 S
5 3.0 NaN 3.0 B
6 2.0 NaN 2.0 B
7 NaN NaN NaN NaN
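An alternative sketch that does not rely on comparing Quant with Buy: since fillna prefers Buy and only falls back to Sell, the source column can be derived directly from which values are present. This is a variation on the answer above, not part of it, and it uses the DataFrame from the question:

```python
import numpy as np
import pandas as pd

# DataFrame copied from the question
df = pd.DataFrame({
    'Buy': [10, np.nan, 2, np.nan, np.nan, 4],
    'Sell': [np.nan, 7, np.nan, 9, np.nan, np.nan],
})

df['Quant'] = df['Buy'].fillna(df['Sell'])

# Start with an all-NaN object column, then label rows by where Quant came from
df['B/S'] = pd.Series(np.nan, index=df.index, dtype=object)
df.loc[df['Buy'].notna(), 'B/S'] = 'B'                         # value taken from Buy
df.loc[df['Buy'].isna() & df['Sell'].notna(), 'B/S'] = 'S'     # fell back to Sell
print(df)
```

Rows where both Buy and Sell are missing keep NaN in B/S without a separate clean-up step.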