DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759)
0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651
1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682
1.500 (1.228) 1.840 (1.594) (0.282) (0.907)
(1.540) 0.689 (0.683) 0.005 0.543
(0.197) (0.664) (0.636) 0.878
(0.942) 0.764 (0.137)
0.693 1.647
0.197
I have the dataframe above.
I need a dataframe like the one below, with the same shape but filled with random values drawn from the dataframe above:
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.664) 1.290 0.682 0.030 (0.683) (0.636) (0.683) 1.840 (1.540)
1.490 (0.907) (0.850) (0.197) (1.228) 0.682 1.290 0.939
0.047 0.682 0.346 0.689 (0.137) 1.490 0.197
0.047 0.878 0.651 0.047 0.047 (0.197)
(1.141) 0.758 0.878 1.490 0.651
1.647 1.490 0.772 1.490
(0.519) 0.693 0.346
(0.137) 0.850
0.197
I've tried this code:
df2= df1.sample(len(df1))
print(df2)
But the output is only a row-wise shuffle:
DP1 DP2 DP3 DP4 DP5 DP6 DP7 DP8 DP9
OP8 0.735590 1.762630 NaN NaN NaN NaN NaN NaN NaN
OP7 -0.999665 0.817949 -0.147698 NaN NaN NaN NaN NaN NaN
OP2 0.031430 0.049994 0.682040 -0.667445 -0.360034 1.089516 0.426642 0.916619 NaN
OP3 1.368955 -0.191781 0.006623 0.932736 -1.277548 0.880056 0.841018 NaN NaN
OP1 -0.551065 -1.195305 -1.243199 0.847178 1.668630 -0.986300 -1.465904 -1.069986 NaN
OP4 1.592201 -1.314628 1.985683 -1.749389 -0.315828 -1.052629 NaN NaN NaN
OP6 -0.208647 -0.710424 -0.686654 0.963221 NaN NaN NaN NaN NaN
OP10 NaN NaN NaN NaN NaN NaN NaN NaN NaN
OP9 0.209244 NaN NaN NaN NaN NaN NaN NaN NaN
OP5 -1.635306 0.737937 -0.736907 0.005545 0.607974 NaN NaN NaN NaN
You can use np.random.choice() for the sampling.
Assuming df is something like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'DP 1': ['(0.519)','0.030','1.290','1.500','(1.540)','(0.197)','(0.942)','0.693','0.197'],
    'DP 2': ['(1.117)','0.047','(0.179)','(1.228)','0.689','(0.664)','0.764','1.647',np.nan],
    'DP 3': ['(1.152)','0.632','0.006','1.840','(0.683)','(0.636)','(0.137)',np.nan,np.nan],
    'DP 4': ['0.772','(0.608)','0.850','(1.594)','0.005','0.878',np.nan,np.nan,np.nan],
    'DP 5': ['1.490','(0.322)','(1.141)','(0.282)','0.543',np.nan,np.nan,np.nan,np.nan],
    'DP 6': ['(0.850)','0.939','0.758','(0.907)',np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 7': ['(1.189)','0.346','0.682',np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 8': ['(0.759)','0.651',np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 9': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 10': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
})
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759) NaN NaN
# 1 0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651 NaN NaN
# 2 1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682 NaN NaN NaN
# 3 1.500 (1.228) 1.840 (1.594) (0.282) (0.907) NaN NaN NaN NaN
# 4 (1.540) 0.689 (0.683) 0.005 0.543 NaN NaN NaN NaN NaN
# 5 (0.197) (0.664) (0.636) 0.878 NaN NaN NaN NaN NaN NaN
# 6 (0.942) 0.764 (0.137) NaN NaN NaN NaN NaN NaN NaN
# 7 0.693 1.647 NaN NaN NaN NaN NaN NaN NaN NaN
# 8 0.197 NaN NaN NaN NaN NaN NaN NaN NaN NaN
First extract the choices from all non-null values of df:
choices = df.values[~pd.isnull(df.values)]
# array(['(0.519)', '(1.117)', '(1.152)', '0.772', '1.490', '(0.850)',
# '(1.189)', '(0.759)', '0.030', '0.047', '0.632', '(0.608)',
# '(0.322)', '0.939', '0.346', '0.651', '1.290', '(0.179)', '0.006',
# '0.850', '(1.141)', '0.758', '0.682', '1.500', '(1.228)', '1.840',
# '(1.594)', '(0.282)', '(0.907)', '(1.540)', '0.689', '(0.683)',
# '0.005', '0.543', '(0.197)', '(0.664)', '(0.636)', '0.878',
# '(0.942)', '0.764', '(0.137)', '0.693', '1.647', '0.197'],
# dtype=object)
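Then draw a value with np.random.choice() from choices for every non-null cell, leaving the null cells as they are:
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map, which takes the same function.
df = df.applymap(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)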
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.179) 0.682 0.758 (1.152) (0.137) (1.152) 0.939 (0.759) NaN NaN
# 1 1.500 (1.152) (0.197) 0.772 1.840 1.840 0.772 (0.850) NaN NaN
# 2 0.878 0.005 (1.540) 0.764 (0.519) 0.682 (1.152) NaN NaN NaN
# 3 0.758 (0.137) 1.840 1.647 1.647 (0.942) NaN NaN NaN NaN
# 4 0.693 (0.683) (0.759) 1.500 (0.197) NaN NaN NaN NaN NaN
# 5 0.006 (0.137) 0.764 (1.117) NaN NaN NaN NaN NaN NaN
# 6 (0.664) 0.632 (1.141) NaN NaN NaN NaN NaN NaN NaN
# 7 0.543 (0.664) NaN NaN NaN NaN NaN NaN NaN NaN
# 8 (0.137) NaN NaN NaN NaN NaN NaN NaN NaN NaN
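The two steps above call np.random.choice() once per cell. As a rough alternative sketch, all draws can be made in a single call and written back through a boolean mask. The three-column frame, the seed, and the names mask, values and sampled are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame above (assumed data, same triangular shape).
df = pd.DataFrame({
    "DP 1": ["(0.519)", "0.030", "1.290"],
    "DP 2": ["(1.117)", "0.047", np.nan],
    "DP 3": ["(1.152)", np.nan, np.nan],
})

rng = np.random.default_rng(0)                 # seeded only so the sketch is repeatable
mask = df.notna().to_numpy()                   # True where a cell holds a value
values = df.to_numpy(dtype=object, copy=True)  # writable copy of all cells
choices = values[mask]                         # pool of every non-null value

# Draw one value per non-null cell (with replacement); NaN cells are untouched.
values[mask] = rng.choice(choices, size=int(mask.sum()))
sampled = pd.DataFrame(values, index=df.index, columns=df.columns)
print(sampled)
```

Like the applymap version, this samples with replacement, so a value can appear more than once. Passing replace=False to rng.choice would instead shuffle the existing values among the non-null cells.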
I have a dataframe like:
ID Sim Items
1 0.345 [7,7]
2 0.604 [2,7,3,8,5]
3 0.082 [9,1,9,1]
I want to form a pivot_table by:
df.pivot_table(index ="ID" , columns = "Items", values="Sim")
To do that, I have to extract the list elements in the Items column and repeat the ID and Sim values for each unique element of the row's list, so that the dataframe becomes:
ID Sim Items
1 0.345 7
2 0.604 2
2 0.604 7
2 0.604 3
2 0.604 8
2 0.604 5
3 0.082 9
3 0.082 1
and the pivot table becomes:
7 2 3 8 5 1 9
1 0.345 - - - - - -
2 0.604 0.604 0.604 0.604 0.604
3 - - - - - 0.082 0.082
Is there any pythonic approach for that? Or any suggestions?
Use explode (new in pandas 0.25) before pivoting:
df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")
Items 1 2 3 5 7 8 9
ID
1 NaN NaN NaN NaN 0.345 NaN NaN
2 NaN 0.604 0.604 0.604 0.604 0.604 NaN
3 0.082 NaN NaN NaN NaN NaN 0.082
For lower versions of pandas, you can try:
(df.drop('Items', axis=1).join(pd.DataFrame(df['Items'].tolist())
.stack(dropna=False).droplevel(1).rename('Items'))
.pivot_table(index ="ID" , columns = "Items", values="Sim"))
Items 1 2 3 5 7 8 9
ID
1 NaN NaN NaN NaN 0.345 NaN NaN
2 NaN 0.604 0.604 0.604 0.604 0.604 NaN
3 0.082 NaN NaN NaN NaN NaN 0.082
If the exact column ordering matters, reindex with the unique values of Items after explode:
(df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")
.reindex(df.explode('Items')['Items'].unique(),axis=1))
Items 7 2 3 8 5 9 1
ID
1 0.345 NaN NaN NaN NaN NaN NaN
2 0.604 0.604 0.604 0.604 0.604 NaN NaN
3 NaN NaN NaN NaN NaN 0.082 0.082
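For reference, here is a self-contained sketch that also produces the intermediate long-format frame from the question. It assumes the Items column holds real Python lists (not strings), and the names long_df and wide are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sim": [0.345, 0.604, 0.082],
    "Items": [[7, 7], [2, 7, 3, 8, 5], [9, 1, 9, 1]],
})

# One row per list element; ID and Sim are repeated automatically.
# drop_duplicates keeps each (ID, Sim, Items) combination once, and the
# cast turns the object column produced by explode back into integers.
long_df = df.explode("Items").drop_duplicates().astype({"Items": int})

# With duplicates removed no aggregation is needed, so plain pivot is enough.
wide = long_df.pivot(index="ID", columns="Items", values="Sim")
print(long_df)
print(wide)
```

pivot_table, as used above, gives the same result without drop_duplicates, because its default aggregation (the mean) collapses the repeated entries.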
I am working with a dataframe such as below.
df.head()
Out[20]:
Date Price Open High ... Vol. Change % A Day % OC %
0 2016-04-25 9577.5 9650.0 9685.0 ... 306230.0 -0.83 1.79 -0.75
1 2016-04-26 9660.0 9567.5 9695.0 ... 389490.0 0.86 1.52 0.97
2 2016-04-27 9627.5 9660.0 9682.5 ... 277940.0 -0.34 1.02 -0.34
3 2016-04-28 9595.0 9625.0 9667.5 ... 75120.0 -0.34 1.36 -0.31
4 2016-04-29 9532.5 9567.5 9597.5 ... 138340.0 -0.65 0.73 -0.37
I sliced it with some conditions, which gave me a list of row positions, con_down_success, of length 96.
I also built a second list, shifted by one row:
con_down_success_D1 = [x+1 for x in con_down_success]
What I want to compute is:
df.iloc[con_down_success_D1,:].Low/df.iloc[con_down_success,:].Price
This should return the computed series, but most of the values are NaN:
(df.iloc[con_down_success_D1,:].Low/df.iloc[con_down_success,:].Price).tail(12)
Out[26]:
778 0.995716
779 NaN
787 NaN
788 NaN
794 NaN
795 NaN
821 NaN
822 NaN
827 NaN
828 NaN
830 NaN
831 NaN
Both series contain actual numbers, not NaN or NA. For example, the following works without a problem:
df.iloc[831,:].Low/df.iloc[830,:].Price
Out[18]: 0.9968354430379747
Could you tell me how to handle the dataframe to show what I want?
Thanks in advance.
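A likely cause, assuming the dataframe has a default integer index, is label alignment: pandas matches the two series on their index labels before dividing. The labels in con_down_success_D1 are shifted by one, so only labels that happen to appear in both lists produce a number. The sketch below uses made-up prices, and the names low_next, price, aligned and ratio are illustrative only.

```python
import pandas as pd

# Made-up data standing in for the real price table.
df = pd.DataFrame({
    "Price": [100.0, 102.0, 101.0, 99.0, 98.0],
    "Low": [99.0, 100.0, 100.5, 97.0, 96.0],
})
con_down_success = [0, 2]
con_down_success_D1 = [x + 1 for x in con_down_success]

low_next = df.iloc[con_down_success_D1].Low  # index labels 1 and 3
price = df.iloc[con_down_success].Price      # index labels 0 and 2

# Division aligns on labels; no label is shared here, so everything is NaN.
aligned = low_next / price

# Dividing the underlying arrays pairs the values by position instead.
ratio = pd.Series(low_next.to_numpy() / price.to_numpy(), index=low_next.index)
print(aligned)
print(ratio)
```

An equivalent option under the same assumption is to compute df.Low.shift(-1) / df.Price for the whole frame and then select the rows in con_down_success.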
I have the following multi-level dataframe (partial)
Px_last FINAL RETURN Stock_RES WANTED
Stock Date
ALKM 10/27/2016 0.0013 1 -53.85 NaN -53.85
1/17/2017 0.0009 1 111.11 NaN 57.26
1/18/2017 0.0012 1 233.33 NaN 290.60
1/23/2018 0.0012 1 16.67 NaN 307.26
1/30/2018 0.0019 1 -42.11 NaN 265.16
ANDI 12/28/2017 0.0017 1 370.59 NaN 370.59
2/14/2018 0.0324 1 20.00 NaN 390.59
APPZ 9/22/2017 0.0002 1 -50.00 NaN -50.00
12/5/2017 0.0001 1 -100.00 NaN -150.00
12/6/2017 0.0001 1 0.00 NaN -150.00
I can do a cumulative sum for the entire dataframe with the following code
df3['TTL_SUM'] = df3['RETURN'].cumsum()
But what I want is a cumulative sum for each stock. When I do the following, I get a column of NaN (see the Stock_RES column in the dataframe above). Does anyone know what I am doing wrong here?
df3['Stock_RES'] = df3.groupby(level=0)['RETURN'].sum()
It does seem to work when I assign the result to a variable, but ultimately I want it in the dataframe:
RESULTS = df3.groupby(level=0)['RETURN'].sum()
Can someone help me out? It seems like the same code to me, so I'm not sure why it won't add directly to the dataframe.
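You used sum rather than cumsum in the groupby. sum returns a single value per stock, whereas cumsum returns a running total for every row, so its result lines up with the original index: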
df.assign(WANTED1=df.groupby('Stock').RETURN.cumsum())
Px_last FINAL RETURN Stock_RES WANTED WANTED1
Stock Date
ALKM 10/27/2016 0.0013 1 -53.85 NaN -53.85 -53.85
1/17/2017 0.0009 1 111.11 NaN 57.26 57.26
1/18/2017 0.0012 1 233.33 NaN 290.60 290.59
1/23/2018 0.0012 1 16.67 NaN 307.26 307.26
1/30/2018 0.0019 1 -42.11 NaN 265.16 265.15
ANDI 12/28/2017 0.0017 1 370.59 NaN 370.59 370.59
2/14/2018 0.0324 1 20.00 NaN 390.59 390.59
APPZ 9/22/2017 0.0002 1 -50.00 NaN -50.00 -50.00
12/5/2017 0.0001 1 -100.00 NaN -150.00 -150.00
12/6/2017 0.0001 1 0.00 NaN -150.00 -150.00
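To make the difference concrete, here is a small sketch on a two-stock subset of the data above. The frame is rebuilt by hand and the variable totals is introduced here for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("ALKM", "10/27/2016"),
        ("ALKM", "1/17/2017"),
        ("ANDI", "12/28/2017"),
        ("ANDI", "2/14/2018"),
    ],
    names=["Stock", "Date"],
)
df3 = pd.DataFrame({"RETURN": [-53.85, 111.11, 370.59, 20.00]}, index=index)

# sum() collapses each group to one row, indexed by Stock only.
totals = df3.groupby(level="Stock")["RETURN"].sum()

# cumsum() keeps one row per original row, with the same (Stock, Date) index,
# so it can be assigned straight back as a column.
df3["Stock_RES"] = df3.groupby(level="Stock")["RETURN"].cumsum()
print(totals)
print(df3)
```

Because totals is indexed by Stock alone while df3 is indexed by (Stock, Date), assigning it as a column has no matching labels to line up with, which is consistent with the NaN column reported in the question.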
I am trying to split a data set for training and testing using Pandas.
testing = pd.read_csv("housingdata.csv", header=None)
train = testing.sample(frac=0.6)
train.reindex()
test = testing.loc[~testing.index.isin(train.index)]
print(train)
print(test)
When I print the data, I get:
0 1 2 3 4
9 0.17004 12.5 7.87 0 0.524
1 0.02731 0.0 7.07 0 0.469
5 0.02985 0.0 2.18 0 0.458
3 0.03237 0.0 2.18 0 0.458
7 0.14455 12.5 7.87 0 0.524
6 0.08829 12.5 7.87 0 0.524
0 1 2 3 4
0 0.00632 18.0 2.31 0 0.538
2 0.02729 0.0 7.07 0 0.469
4 0.06905 0.0 2.18 0 0.458
8 0.21124 12.5 7.87 0 0.524
As you can see, the row indices are shuffled. How can I re-index the rows in both data sets?
This, however, does not change the global settings. For example,
train.iloc[0,4]
gives 0.524
As @EdChum's comments point out, it's not exactly clear what behavior you're looking for. But if all you want is to give both new dataframes indices running 0, 1, 2, ..., n-1, you can use reset_index():
train.reset_index(inplace=True, drop=True)
test.reset_index(inplace=True, drop=True)
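Put together, the whole split might look like the sketch below. The 10-row frame stands in for housingdata.csv, and random_state is an added assumption so that the split is repeatable.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("housingdata.csv", header=None).
data = pd.DataFrame(np.arange(50).reshape(10, 5))

# Sample 60% of the rows for training; the rest become the test set.
train = data.sample(frac=0.6, random_state=0)
test = data.drop(train.index)

# Replace the shuffled labels with 0..n-1; drop=True discards the old index
# instead of keeping it as a new column.
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)
print(train)
print(test)
```

Reassigning the result, rather than using inplace=True, leaves data untouched and avoids modifying a frame that was derived from a slice.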