I have a dataframe with 5 columns: M1, M2, M3, M4 and M5, each containing floating-point values. I want to combine the data from all 5 columns into a single column.
I tried
cols = list(df.columns)
df_new['Total'] = []
df_new['Total'] = [df_new['Total'].append(df[i], ignore_index=True) for i in cols]
but it raises an error.
I'm using Python 3.8.5 and Pandas 1.1.2.
Here's a part of my df
M1 M2 M3 M4 M5
0 5 12 20 26
0.5 5.5 12.5 20.5 26.5
1 6 13 21 27
1.5 6.5 13.5 21.5 27.5
2 7 14 22 28
2.5 7.5 14.5 22.5 28.5
10 15 22 30 36
10.5 15.5 22.5 30.5 36.5
11 16 23 31 37
11.5 16.5 23.5 31.5 37.5
12 17 24 32 38
12.5 17.5 24.5 32.5 38.5
And this is what I'm expecting
0
0.5
1
1.5
2
2.5
10
10.5
11
11.5
12
12.5
5
5.5
6
6.5
7
7.5
15
15.5
16
16.5
17
17.5
12
12.5
13
13.5
14
14.5
22
22.5
23
23.5
24
24.5
20
20.5
21
21.5
22
22.5
30
30.5
31
31.5
32
32.5
26
26.5
27
27.5
28
28.5
36
36.5
37
37.5
38
38.5
Just make use of the concat() method with a generator expression:
import pandas as pd
result = pd.concat((df[x] for x in df.columns), ignore_index=True)
Now if you print result you will get your desired output.
Performance note (concat() vs unstack()): df.unstack().reset_index(drop=True) yields the same values in the same order, so benchmark both on your data if speed matters.
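A self-contained sketch of both approaches on a tiny frame (two rows per column, names as in the question); the unstack() variant is only an alternative sketch, and which is faster should be measured on real data:

```python
import pandas as pd

# Tiny frame mirroring the question's layout
df = pd.DataFrame({
    "M1": [0.0, 0.5],
    "M2": [5.0, 5.5],
    "M3": [12.0, 12.5],
    "M4": [20.0, 20.5],
    "M5": [26.0, 26.5],
})

# concat: glue the columns end-to-end, renumbering the index 0..n-1
result = pd.concat((df[c] for c in df.columns), ignore_index=True)

# unstack: same values in the same column-major order, after dropping the MultiIndex
alt = df.unstack().reset_index(drop=True)
```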
I want to color the Share Price cell green if it is higher than the target price, and red if it is lower than the alert price, but my code keeps raising errors.
This is the code I use:
temp_df.style.apply(lambda x: ["background: red" if v < x.iloc[:,1:] and x.iloc[:,1:] != 0 else "" for v in x], subset=['Share Price'], axis = 0)
temp_df.style.apply(lambda x: ["background: green" if v > x.iloc[:,2:] and x.iloc[:,2:] != 0 else "" for v in x], subset=['Share Price'], axis = 0)
Can anyone give me an idea on how to do it?
Index Share Price Alert/Entry Target
0 622.0 424.0 950.0
1 6880.0 5200.0 7450.0
2 62860.0 40000.0 60000.0
3 7669.0 5500.0 8000.0
4 5295.0 3500.0 5500.0
5 227.0 165.0 250.0
6 3970.0 3200.0 4250.0
7 1300.0 850.0 1650.0
8 8480.0 6500.0 8500.0
9 11.3 0.0 0.0
10 66.0 58.0 75.0
11 7.3 6.4 9.6
12 114.8 75.0 130.0
13 172.3 90.0 0.0
14 2.6 2.4 3.2
15 76.8 68.0 85.0
16 19.6 15.4 21.0
17 21.9 11.0 18.6
18 35.4 29.0 42.0
19 12.5 9.2 0.0
20 15.5 0.0 0.0
21 449.8 0.0 0.0
22 4.3 3.6 5.0
23 47.4 40.0 55.0
24 0.6 0.5 0.6
25 49.2 45.0 72.0
26 13.9 0.0 0.0
27 3.0 2.4 4.5
28 2.4 1.8 4.2
29 54.0 0.0 0.0
30 293.5 100.0 250.0
31 190000.0 140000.0 220000.0
32 52200.0 46000.0 58000.0
33 100500.0 75000.0 115000.0
34 4.9 3.8 6.5
35 0.2 0.0 0.0
36 1430.0 980.0 1450.0
37 1585.0 0.0 0.0
38 15.6 11.0 18.0
39 3.3 2.8 6.0
40 52.5 45.0 68.0
41 46.5 35.0 0.0
42 193.6 135.0 0.0
43 122.8 90.0 0.0
44 222.6 165.0 265.0
Provided that "Index" is also a column:
temp_df.style.apply(lambda x: ["background: green" if (i==1 and v > x.iloc[3] and x.iloc[3] != 0) else ("background: red" if (i==1 and v < x.iloc[2]) else "") for i, v in enumerate(x)], axis=1)
Here i selects the column to style: Share Price is at position 1 (with Index at position 0).
I have the following multi-index dataframe, where one block holds the daily high of hypothetical stocks and the other their previous-day close.
High_Price Yest_Close
Ticker ABC XYZ RST ABC XYZ RST
2/1/19 3 10 90 2 9 88
1/31/19 3.5 9 88 4 9.5 89
1/30/19 2.5 9.5 86 3 9.8 85
1/29/19 4 8.5 92 3.5 8 93
1/28/19 4.5 8.2 95 4.8 8 96
1/27/19 2.8 7 94 2.6 6.5 93
1/26/19 2.6 6.5 93 2.7 7 92
I want to append a third block that holds the element-wise max of the two blocks (High_Price and Yest_Close). So the result should look like the following:
High_Price Yest_Close Max
Ticker ABC XYZ RST ABC XYZ RST ABC XYZ RST
2/1/19 3 10 90 2 9 88 3 10 90
1/31/19 3.5 9 88 4 9.5 89 4 9.5 89
1/30/19 2.5 9.5 86 3 9.8 85 3 9.8 86
1/29/19 4 8.5 92 3.5 8 93 4 8.5 93
1/28/19 4.5 8.2 95 4.8 8 96 4.8 8.2 96
1/27/19 2.8 7 94 2.6 6.5 93 2.8 7 94
1/26/19 2.6 6.5 93 2.7 7 92 2.7 7 93
I tried the following logic but it's not getting me the proper result:
df['Max',ticker] = df[['High_Price','Yest_Close']].max(axis=1)
How should I fix my code to get the result I'm looking for?
You want level=1 inside max, then build a MultiIndex for the new columns, followed by df.join:
m = df[['High_Price','Yest_Close']].max(level=1,axis=1)
m.columns = pd.MultiIndex.from_product((['Max'],m.columns))
out = df.join(m)
High_Price Yest_Close Max
ABC XYZ RST ABC XYZ RST ABC XYZ RST
Ticker
2/1/19 3.0 10.0 90 2.0 9.0 88 3.0 10.0 90.0
1/31/19 3.5 9.0 88 4.0 9.5 89 4.0 9.5 89.0
1/30/19 2.5 9.5 86 3.0 9.8 85 3.0 9.8 86.0
1/29/19 4.0 8.5 92 3.5 8.0 93 4.0 8.5 93.0
1/28/19 4.5 8.2 95 4.8 8.0 96 4.8 8.2 96.0
1/27/19 2.8 7.0 94 2.6 6.5 93 2.8 7.0 94.0
1/26/19 2.6 6.5 93 2.7 7.0 92 2.7 7.0 93.0
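Note that DataFrame.max(level=...) was later deprecated and removed in pandas 2.x; an equivalent that works across versions is an element-wise np.maximum of the two blocks. A minimal sketch on two made-up rows (tickers trimmed to ABC and XYZ):

```python
import numpy as np
import pandas as pd

# Two-level columns (block, ticker), mirroring the question's layout
cols = pd.MultiIndex.from_product([["High_Price", "Yest_Close"], ["ABC", "XYZ"]])
df = pd.DataFrame(
    [[3.0, 10.0, 2.0, 9.0],
     [3.5, 9.0, 4.0, 9.5]],
    index=["2/1/19", "1/31/19"],
    columns=cols,
)

# Element-wise max of the two blocks (both carry identical ticker columns)
m = np.maximum(df["High_Price"], df["Yest_Close"])
m.columns = pd.MultiIndex.from_product([["Max"], m.columns])
out = df.join(m)
```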
This is my dataset. For this problem just consider the first and the last column.
45,37.25,14.5,-43.15,8.6
46,37.25,13.5,-42.15,8.6
47,37.25,12.5,-41.15,8.6
48,37.25,11.5,-40.15,8.6
49,37.25,10.5,-39.15,8.6
50,37.25,9.5,-38.15,8.6
51,36.25,8.5,-37.15,7.6
52,35.25,7.5,-36.15,6.6
53,34.25,6.5,-35.15,5.6
54,33.25,5.5,-34.15,4.6
55,32.25,4.5,-33.15,3.6
56,31.25,3.5,-32.15,2.6
57,30.25,2.5,-31.15,1.6
58,29.25,1.5,-30.15,0.6
59,28.25,0.5,-29.15,-0.4
60,27.25,-0.5,-28.15,-1.4
61,26.25,-0.5,-27.15,-1.4
62,25.25,-0.5,-26.15,-1.4
63,24.25,-0.5,-25.15,-1.4
64,23.25,-0.5,-24.15,-1.4
65,22.25,-0.5,-23.15,-1.4
The expected output is:
Below 50,8.6
51,7.6
52,6.6
53,5.6
54,4.6
55,3.6
56,2.6
57,1.6
58,0.6
59,-0.4
Above 60, -1.4
The logic here is: if the value of the last column stays the same for 5 continuous rows, collapse that run into a single row, producing the output above.
I am trying to solve this the Pandas way, but I have no idea where to start. Any help will be appreciated.
As suggested in the comments by @Erfan, there is probably a mistake in the first column of the expected output.
Here is one solution, assuming you want to keep the first row of each group:
# I renamed the columns
print(df)
# a x y z b
# 0 45 37.25 14.5 -43.15 8.6
# 1 46 37.25 13.5 -42.15 8.6
# 2 47 37.25 12.5 -41.15 8.6
# 3 48 37.25 11.5 -40.15 8.6
# 4 49 37.25 10.5 -39.15 8.6
# 5 50 37.25 9.5 -38.15 8.6
# 6 51 36.25 8.5 -37.15 7.6
# 7 52 35.25 7.5 -36.15 6.6
# 8 53 34.25 6.5 -35.15 5.6
# 9 54 33.25 5.5 -34.15 4.6
# 10 55 32.25 4.5 -33.15 3.6
# 11 56 31.25 3.5 -32.15 2.6
# 12 57 30.25 2.5 -31.15 1.6
# 13 58 29.25 1.5 -30.15 0.6
# 14 59 28.25 0.5 -29.15 -0.4
# 15 60 27.25 -0.5 -28.15 -1.4
# 16 61 26.25 -0.5 -27.15 -1.4
# 17 62 25.25 -0.5 -26.15 -1.4
# 18 63 24.25 -0.5 -25.15 -1.4
# 19 64 23.25 -0.5 -24.15 -1.4
# 20 65 22.25 -0.5 -23.15 -1.4
def valid(x):
    # keep short runs (< 5 rows) intact; collapse longer runs to their first row
    if len(x) < 5:
        return x
    return x.head(1)

# run id: increments whenever b changes from the previous row
df["ids"] = (df.b != df.b.shift()).cumsum()
output = df.groupby("ids").apply(valid).reset_index(level=0, drop=True)[df.columns[:-1]]
print(output)
# a x y z b
# 0 45 37.25 14.5 -43.15 8.6
# 6 51 36.25 8.5 -37.15 7.6
# 7 52 35.25 7.5 -36.15 6.6
# 8 53 34.25 6.5 -35.15 5.6
# 9 54 33.25 5.5 -34.15 4.6
# 10 55 32.25 4.5 -33.15 3.6
# 11 56 31.25 3.5 -32.15 2.6
# 12 57 30.25 2.5 -31.15 1.6
# 13 58 29.25 1.5 -30.15 0.6
# 14 59 28.25 0.5 -29.15 -0.4
# 15 60 27.25 -0.5 -28.15 -1.4
If you want the last row instead (when there are 5 or more consecutive identical values), replace x.head(1) with x.tail(1), or any other function you like.
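The approach above, runnable end-to-end on a trimmed two-column copy of the data (column names a and b as in the renaming above):

```python
import pandas as pd

# Trimmed data: two runs of six equal b-values around two singletons
df = pd.DataFrame({
    "a": [45, 46, 47, 48, 49, 50, 51, 52, 60, 61, 62, 63, 64, 65],
    "b": [8.6] * 6 + [7.6, 6.6] + [-1.4] * 6,
})

def valid(x):
    # keep short runs (< 5 rows) intact; collapse longer runs to their first row
    if len(x) < 5:
        return x
    return x.head(1)

# run id: increments whenever b changes from the previous row
df["ids"] = (df.b != df.b.shift()).cumsum()
output = (df.groupby("ids").apply(valid)
            .reset_index(level=0, drop=True)[df.columns[:-1]])
```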
Here is one way:
n=5
s1=df.iloc[:,-1].diff().ne(0).cumsum()              # run id over the last column
s2=s1.groupby(s1).transform('count')>n              # rows belonging to runs longer than n
pd.concat([df[s2].groupby(s1).head(1),df[~s2]]).sort_index()
1 2 3 4 5
0 45 37.25 14.5 -43.15 8.6
6 51 36.25 8.5 -37.15 7.6
7 52 35.25 7.5 -36.15 6.6
8 53 34.25 6.5 -35.15 5.6
9 54 33.25 5.5 -34.15 4.6
10 55 32.25 4.5 -33.15 3.6
11 56 31.25 3.5 -32.15 2.6
12 57 30.25 2.5 -31.15 1.6
13 58 29.25 1.5 -30.15 0.6
14 59 28.25 0.5 -29.15 -0.4
15 60 27.25 -0.5 -28.15 -1.4
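The same idea, runnable on a trimmed copy of the data (column names first/last are mine; I subset the grouper to s1[s2] so it matches the filtered frame's length):

```python
import pandas as pd

# Trimmed data: two runs of six equal values around two singletons
df = pd.DataFrame({
    "first": [45, 46, 47, 48, 49, 50, 51, 52, 60, 61, 62, 63, 64, 65],
    "last": [8.6] * 6 + [7.6, 6.6] + [-1.4] * 6,
})

n = 5
# run id over the last column: increments on every value change
s1 = df.iloc[:, -1].diff().ne(0).cumsum()
# True for rows belonging to runs longer than n
s2 = s1.groupby(s1).transform("count") > n
# first row of each long run, plus all rows of short runs, in original order
out = pd.concat([df[s2].groupby(s1[s2]).head(1), df[~s2]]).sort_index()
```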
I am trying to do some analysis of rainfall data. An example of the data looks like this:
10 18/05/2016 26.9 40 20.8 34 52.2 20.8 46.5 45
11 19/05/2016 25.5 32 0.3 41.6 42 0.3 56.3 65.2
12 20/05/2016 8.5 29 18.4 9 36 18.4 28.6 46
13 21/05/2016 24.5 18 TRACE 3.5 17 TRACE 4.4 40
14 22/05/2016 0.6 18 0 6.5 14 0 8.6 20
15 23/05/2016 3.5 9 0.6 4.3 14 0.6 7 15
16 24/05/2016 3.6 25 T 3 12 T 14.9 9
17 25/05/2016 25 21 2.2 25.6 50 2.2 25 9
The rainfall data contain the strings 'TRACE' and 'T' (both meaning a non-measurable rainfall amount). For analysis, I would like to convert these strings to 1.0 (float). The desired data should look like this, so that I can plot the values as a line diagram:
10 18/05/2016 26.9 40 20.8 34 52.2 20.8 46.5 45
11 19/05/2016 25.5 32 0.3 41.6 42 0.3 56.3 65.2
12 20/05/2016 8.5 29 18.4 9 36 18.4 28.6 46
13 21/05/2016 24.5 18 1.0 3.5 17 1.0 4.4 40
14 22/05/2016 0.6 18 0 6.5 14 0 8.6 20
15 23/05/2016 3.5 9 0.6 4.3 14 0.6 7 15
16 24/05/2016 3.6 25 1.0 3 12 1.0 14.9 9
17 25/05/2016 25 21 2.2 25.6 50 2.2 25 9
Can someone point me in the right direction?
You can use df.replace and then convert the numeric columns to float using df.astype (otherwise their dtype stays object, so operations on these columns would suffer performance issues):
df = df.replace('^T(RACE)?$', 1.0, regex=True)
df.iloc[:, 1:] = df.iloc[:, 1:].astype(float) # converting object columns to floats
This will replace all T or TRACE elements with 1.0.
Output:
10 18/05/2016 26.9 40 20.8 34.0 52.2 20.8 46.5 45.0
11 19/05/2016 25.5 32 0.3 41.6 42.0 0.3 56.3 65.2
12 20/05/2016 8.5 29 18.4 9.0 36.0 18.4 28.6 46.0
13 21/05/2016 24.5 18 1 3.5 17.0 1 4.4 40.0
14 22/05/2016 0.6 18 0 6.5 14.0 0 8.6 20.0
15 23/05/2016 3.5 9 0.6 4.3 14.0 0.6 7.0 15.0
16 24/05/2016 3.6 25 1 3.0 12.0 1 14.9 9.0
17 25/05/2016 25.0 21 2.2 25.6 50.0 2.2 25.0 9.0
Use replace with a dict:
df = df.replace({'T':1.0, 'TRACE':1.0})
Then, if necessary, convert the columns to float:
cols = df.columns.difference(['Date','another cols dont need convert'])
df[cols] = df[cols].astype(float)
df = df.replace({'T':1.0, 'TRACE':1.0})
cols = df.columns.difference(['Date','a'])
df[cols] = df[cols].astype(float)
print (df)
a Date 2 3 4 5 6 7 8 9
0 10 18/05/2016 26.9 40.0 20.8 34.0 52.2 20.8 46.5 45.0
1 11 19/05/2016 25.5 32.0 0.3 41.6 42.0 0.3 56.3 65.2
2 12 20/05/2016 8.5 29.0 18.4 9.0 36.0 18.4 28.6 46.0
3 13 21/05/2016 24.5 18.0 1.0 3.5 17.0 1.0 4.4 40.0
4 14 22/05/2016 0.6 18.0 0.0 6.5 14.0 0.0 8.6 20.0
5 15 23/05/2016 3.5 9.0 0.6 4.3 14.0 0.6 7.0 15.0
6 16 24/05/2016 3.6 25.0 1.0 3.0 12.0 1.0 14.9 9.0
7 17 25/05/2016 25.0 21.0 2.2 25.6 50.0 2.2 25.0 9.0
print (df.dtypes)
a int64
Date object
2 float64
3 float64
4 float64
5 float64
6 float64
7 float64
8 float64
9 float64
dtype: object
Extending the answer from @jezrael, you can replace and convert to float in a single statement (this assumes the first column is Date and the remaining columns are the desired numeric ones):
df.iloc[:, 1:] = df.iloc[:, 1:].replace({'T':1.0, 'TRACE':1.0}).astype(float)
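Putting the answers together, a minimal end-to-end sketch (the frame and the column names Date, r1, r2 are made up for illustration):

```python
import pandas as pd

# Toy frame in the shape of the question: a date column, then rainfall readings
# read from text, so everything starts out as strings
df = pd.DataFrame({
    "Date": ["21/05/2016", "22/05/2016", "24/05/2016"],
    "r1": ["24.5", "0.6", "3.6"],
    "r2": ["TRACE", "0", "T"],
})

# every column except Date is a reading; replace trace markers, then cast to float
cols = df.columns.difference(["Date"])
df[cols] = df[cols].replace({"T": 1.0, "TRACE": 1.0}).astype(float)
```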