I'm using pandas in Python. After some crosstab calculations and concatenations, I end up with a data frame that looks like this:
ID 5 6 7 8 9 10 11 12 13
Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
The problem is that I want the last four rows, the ones starting with Superior, to be placed right after the Total row. In other words, I want to swap the positions of the last four rows with the four rows that start with Regular. How can I achieve this in pandas, so that I get this:
ID 5 6 7 8 9 10 11 12 13
Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
A more generalized solution using Categorical and argsort. I know this df was already ordered, so ffill is safe here:
import numpy as np

# forward-fill the group labels so every row knows which block it belongs to
s = df.ID
s = s.where(s.isin(['Total', 'Regular', 'Superior'])).ffill()
# an ordered Categorical defines the desired block order
s = pd.Categorical(s, ['Total', 'Superior', 'Regular'], ordered=True)
# reorder the rows by block order
df = df.iloc[np.argsort(s)]
df
Out[188]:
ID 5 6 7 8 9 10 11 12 13
0 Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
5 Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
6 CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
7 HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
8 PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
1 Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
2 CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
3 HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
4 PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
Here's one way:
import numpy as np
# roll the 8 rows below Total down by 4, so the Superior block wraps around to sit directly under Total
df.iloc[1:, :] = np.roll(df.iloc[1:, :].values, 4, axis=0)
ID 5 6 7 8 9 10 11 12 13
0 Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
1 Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
2 CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
3 HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
4 PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
5 Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
6 CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
7 HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
8 PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
For a specific answer to this question, just use iloc
df.iloc[[0,5,6,7,8,1,2,3,4],:]
For a more generalized solution,
# cumulative block number: 0 = rows before the first group header, 1 = the Regular block, 2 = the Superior block
m = (df.ID.eq('Superior') | df.ID.eq('Regular')).cumsum()
# keep the Total block first, then emit the Superior block (2) before the Regular block (1)
pd.concat([df[m==0], df[m==2], df[m==1]])
or
order = (2,1)
pd.concat([df[m==0], *[df[m==c] for c in order]])
where order defines the mapping from previous ordering to new ordering.
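For illustration, here is a minimal sketch of how the same cumsum trick generalizes when there are more group headers. The frame, its block labels ('A', 'B', 'C') and the val column are made up for the example, not from the question:
import pandas as pd

# hypothetical frame: a Total row followed by three labelled blocks A, B, C
demo = pd.DataFrame({'ID': ['Total', 'A', 'x', 'B', 'y', 'C', 'z'],
                     'val': [7, 3, 1, 2, 1, 1, 1]})

# number the blocks: 0 = rows before the first header, then 1, 2, 3 for A, B, C
m = demo.ID.isin(['A', 'B', 'C']).cumsum()

# keep block 0 (the Total row) first, then emit the labelled blocks in any order
order = (3, 1, 2)
out = pd.concat([demo[m == 0], *[demo[m == c] for c in order]])
print(out)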
Related
I have a DataFrame looking like this:
year 2015 2016 2017 2018 2019 2015 2016 2017 2018 2019 ... 2015 2016 2017 2018 2019 2015 2016 2017 2018 2019
PATIENTS PATIENTS PATIENTS PATIENTS PATIENTS month month month month month ... diffs_24h diffs_24h diffs_24h diffs_24h diffs_24h diffs_168h diffs_168h diffs_168h diffs_168h diffs_168h
date
2016-01-01 00:00:00 0.0 2.0 1.0 7.0 3.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 -4.0 2.0 -2.0 NaN -3.0 -2.0 -3.0 -6.0
2016-01-01 01:00:00 6.0 6.0 7.0 6.0 7.0 1.0 1.0 1.0 1.0 1.0 ... NaN 4.0 0.0 0.0 1.0 NaN 3.0 1.0 2.0 -1.0
2016-01-01 02:00:00 2.0 7.0 6.0 2.0 3.0 1.0 1.0 1.0 1.0 1.0 ... NaN 4.0 3.0 -1.0 0.0 NaN 6.0 2.0 -3.0 0.0
2016-01-01 03:00:00 0.0 2.0 2.0 4.0 6.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 0.0 2.0 4.0 NaN -1.0 -2.0 3.0 3.0
2016-01-01 04:00:00 1.0 2.0 5.0 8.0 0.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 5.0 7.0 -1.0 NaN -2.0 3.0 5.0 -2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2016-12-31 19:00:00 6.0 7.0 6.0 6.0 6.0 12.0 12.0 12.0 12.0 12.0 ... -9.0 -1.0 -7.0 1.0 -2.0 1.0 0.0 -6.0 -4.0 0.0
2016-12-31 20:00:00 2.0 2.0 5.0 5.0 3.0 12.0 12.0 12.0 12.0 12.0 ... -9.0 -7.0 -12.0 -1.0 -10.0 -2.0 -6.0 -2.0 -1.0 -4.0
2016-12-31 21:00:00 4.0 5.0 3.0 3.0 3.0 12.0 12.0 12.0 12.0 12.0 ... -2.0 -3.0 -10.0 -2.0 -11.0 -2.0 -2.0 -2.0 -3.0 -2.0
2016-12-31 22:00:00 5.0 2.0 6.0 6.0 3.0 12.0 12.0 12.0 12.0 12.0 ... 0.0 -6.0 -4.0 5.0 -4.0 2.0 -1.0 0.0 2.0 -3.0
2016-12-31 23:00:00 1.0 3.0 4.0 4.0 6.0 12.0 12.0 12.0 12.0 12.0 ... -6.0 -1.0 -11.0 2.0 -3.0 -4.0 -2.0 -7.0 -2.0 -2.0
and I want to end up with a DataFrame in which the first column level is still the year, but each year groups all of its columns together underneath it. How can I achieve that?
Example:
year 2015 2016 2017 2018 2019
PATIENTS month PATIENTS month PATIENTS month PATIENTS month PATIENTS month ...
date
2016-01-01 00:00:00 0.0 2.0 1.0 7.0 3.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 -4.0 2.0 -2.0 NaN -3.0 -2.0 -3.0 -6.0
2016-01-01 01:00:00 6.0 6.0 7.0 6.0 7.0 1.0 1.0 1.0 1.0 1.0 ... NaN 4.0 0.0 0.0 1.0 NaN 3.0 1.0 2.0 -1.0
2016-01-01 02:00:00 2.0 7.0 6.0 2.0 3.0 1.0 1.0 1.0 1.0 1.0 ... NaN 4.0 3.0 -1.0 0.0 NaN 6.0 2.0 -3.0 0.0
2016-01-01 03:00:00 0.0 2.0 2.0 4.0 6.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 0.0 2.0 4.0 NaN -1.0 -2.0 3.0 3.0
2016-01-01 04:00:00 1.0 2.0 5.0 8.0 0.0 1.0 1.0 1.0 1.0 1.0 ... NaN -1.0 5.0 7.0 -1.0 NaN -2.0 3.0 5.0 -2.0
... ... ... ... ... ... ... ... ... ... ... .
I think you only need to sort your columns:
new_df = df.sort_index(axis=1, level=0)
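As a minimal sketch of what this does (the year/measure names below are made up, not the asker's data), sorting on level 0 gathers every column of a year next to each other:
import numpy as np
import pandas as pd

# hypothetical frame with interleaved (year, measure) columns
cols = pd.MultiIndex.from_tuples(
    [(2015, 'PATIENTS'), (2016, 'PATIENTS'), (2015, 'month'), (2016, 'month')],
    names=['year', None])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# sort on the first (year) level; each year now has all of its columns together
new_df = df.sort_index(axis=1, level=0)
print(new_df.columns.tolist())
# [(2015, 'PATIENTS'), (2015, 'month'), (2016, 'PATIENTS'), (2016, 'month')]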
This question already has answers here:
How are iloc and loc different?
I have a dataframe named df_expanded that looks like this. The key column is 'A' and the important indices for this question are 29-31 (this is a modified version since the actual dataframe is huge):
>>> display(df_expanded)
A C Cl Co D E F Fa G Ga H HW
index
...
2.0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 11.0 2.0 8.0 0.0 0.0
2.0 0.0 0.0 0.0 0.0 0.0 4.2 0.0 11.8 2.4 8.6 0.0 0.0
2.0 0.0 0.0 0.0 0.0 0.0 4.4 0.0 12.6 2.8 9.2 0.0 0.0
2.0 0.0 0.0 0.0 0.0 0.0 4.6 0.0 13.4 3.2 9.8 0.0 0.0
2.0 0.0 0.0 0.0 0.0 0.0 4.8 0.0 14.2 3.6 10.4 0.0 0.0
3.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 15.0 4.0 11.0 0.0 0.0
3.0 0.4 0.0 0.0 0.0 0.0 5.2 0.0 16.0 4.0 11.6 0.0 0.0
3.0 0.8 0.0 0.0 0.0 0.0 5.4 0.0 17.0 4.0 12.2 0.0 0.0
3.0 1.2 0.0 0.0 0.0 0.0 5.6 0.0 18.0 4.0 12.8 0.0 0.0
3.0 1.6 0.0 0.0 0.0 0.0 5.8 0.0 19.0 4.0 13.4 0.0 0.0
4.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 20.0 4.0 14.0 0.0 0.0
4.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 21.2 4.0 14.4 0.0 0.0
4.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 22.4 4.0 14.8 0.0 0.0
4.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 23.6 4.0 15.2 0.0 0.0
4.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 24.8 4.0 15.6 0.0 0.0
5.0 2.0 0.0 0.0 0.0 0.0 6.0 0.0 26.0 4.0 16.0 0.0 0.0
5.0 2.0 0.0 0.0 0.0 0.0 6.2 0.0 27.4 4.0 16.6 0.0 0.0
5.0 2.0 0.0 0.0 0.0 0.0 6.4 0.0 28.8 4.0 17.2 0.0 0.0
5.0 2.0 0.0 0.0 0.0 0.0 6.6 0.0 30.2 4.0 17.8 0.0 0.0
5.0 2.0 0.0 0.0 0.0 0.0 6.8 0.0 31.6 4.0 18.4 0.0 0.0
6.0 2.0 0.0 0.0 0.0 0.0 7.0 0.0 33.0 4.0 19.0 0.0 0.0
6.0 2.0 0.0 0.0 0.0 0.0 7.0 1.0 33.4 4.0 19.2 0.0 0.0
6.0 2.0 0.0 0.0 0.0 0.0 7.0 2.0 33.8 4.0 19.4 0.0 0.0
6.0 2.0 0.0 0.0 0.0 0.0 7.0 3.0 34.2 4.0 19.6 0.0 0.0
6.0 2.0 0.0 0.0 0.0 0.0 7.0 4.0 34.6 4.0 19.8 0.0 0.0
7.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 35.0 4.0 20.0 0.0 0.0
7.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 36.2 4.0 20.4 0.0 0.0
7.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 37.4 4.0 20.8 0.0 0.0
7.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 38.6 4.0 21.2 0.0 0.0
7.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 39.8 4.0 21.6 0.0 0.0
8.0 2.0 0.0 0.0 0.0 0.0 7.0 5.0 41.0 4.0 22.0 0.0 0.0
8.0 2.0 0.0 0.0 0.0 0.0 7.2 5.0 41.0 4.0 22.2 0.0 0.0
8.0 2.0 0.0 0.0 0.0 0.0 7.4 5.0 41.0 4.0 22.4 0.0 0.0
8.0 2.0 0.0 0.0 0.0 0.0 7.6 5.0 41.0 4.0 22.6 0.0 0.0
8.0 2.0 0.0 0.0 0.0 0.0 7.8 5.0 41.0 4.0 22.8 0.0 0.0
9.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 41.0 4.0 23.0 0.0 0.0
9.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 41.6 4.0 23.4 0.0 0.0
9.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 42.2 4.0 23.8 0.0 0.0
9.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 42.8 4.0 24.2 0.0 0.0
9.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 43.4 4.0 24.6 0.0 0.0
10.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 44.0 4.0 25.0 0.0 0.0
10.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 45.2 4.0 25.6 0.0 0.0
10.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 46.4 4.0 26.2 0.0 0.0
10.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 47.6 4.0 26.8 0.0 0.0
10.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 48.8 4.0 27.4 0.0 0.0
11.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 50.0 4.0 28.0 0.0 0.0
11.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 50.4 4.4 28.4 0.2 0.0
11.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 50.8 4.8 28.8 0.4 0.0
11.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 51.2 5.2 29.2 0.6 0.0
11.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 51.6 5.6 29.6 0.8 0.0
12.0 2.0 0.0 0.0 0.0 0.0 8.0 5.0 52.0 6.0 30.0 1.0 0.0
12.0 2.0 0.0 0.0 0.0 0.0 8.0 5.8 52.6 6.0 30.6 1.0 0.0
12.0 2.0 0.0 0.0 0.0 0.0 8.0 6.6 53.2 6.0 31.2 1.0 0.0
12.0 2.0 0.0 0.0 0.0 0.0 8.0 7.4 53.8 6.0 31.8 1.0 0.0
12.0 2.0 0.0 0.0 0.0 0.0 8.0 8.2 54.4 6.0 32.4 1.0 0.0
13.0 2.0 0.0 0.0 0.0 0.0 8.0 9.0 55.0 6.0 33.0 1.0 0.0
13.0 2.0 0.0 0.0 0.0 0.0 8.0 10.0 55.4 6.0 33.6 1.0 0.0
13.0 2.0 0.0 0.0 0.0 0.0 8.0 11.0 55.8 6.0 34.2 1.0 0.0
13.0 2.0 0.0 0.0 0.0 0.0 8.0 12.0 56.2 6.0 34.8 1.0 0.0
13.0 2.0 0.0 0.0 0.0 0.0 8.0 13.0 56.6 6.0 35.4 1.0 0.0
14.0 2.0 0.0 0.0 0.0 0.0 8.0 14.0 57.0 6.0 36.0 1.0 0.0
14.0 2.0 0.0 0.0 0.0 0.0 8.0 15.6 57.0 6.4 36.2 1.0 0.0
14.0 2.0 0.0 0.0 0.0 0.0 8.0 17.2 57.0 6.8 36.4 1.0 0.0
14.0 2.0 0.0 0.0 0.0 0.0 8.0 18.8 57.0 7.2 36.6 1.0 0.0
14.0 2.0 0.0 0.0 0.0 0.0 8.0 20.4 57.0 7.6 36.8 1.0 0.0
15.0 2.0 0.0 0.0 0.0 0.0 8.0 22.0 57.0 8.0 37.0 1.0 0.0
15.0 2.0 0.0 0.2 0.0 0.0 8.0 22.6 58.0 8.4 37.0 1.0 0.0
15.0 2.0 0.0 0.4 0.0 0.0 8.0 23.2 59.0 8.8 37.0 1.0 0.0
15.0 2.0 0.0 0.6 0.0 0.0 8.0 23.8 60.0 9.2 37.0 1.0 0.0
15.0 2.0 0.0 0.8 0.0 0.0 8.0 24.4 61.0 9.6 37.0 1.0 0.0
16.0 2.0 0.0 1.0 0.0 0.0 8.0 25.0 62.0 10.0 37.0 1.0 0.0
16.0 2.0 0.0 1.0 0.0 0.0 8.2 25.0 63.2 10.0 37.0 1.0 0.0
16.0 2.0 0.0 1.0 0.0 0.0 8.4 25.0 64.4 10.0 37.0 1.0 0.0
16.0 2.0 0.0 1.0 0.0 0.0 8.6 25.0 65.6 10.0 37.0 1.0 0.0
16.0 2.0 0.0 1.0 0.0 0.0 8.8 25.0 66.8 10.0 37.0 1.0 0.0
17.0 2.0 0.0 1.0 0.0 0.0 9.0 25.0 68.0 10.0 37.0 1.0 0.0
17.0 2.0 0.0 1.2 0.0 0.0 9.0 26.2 68.4 10.4 37.2 1.0 0.0
17.0 2.0 0.0 1.4 0.0 0.0 9.0 27.4 68.8 10.8 37.4 1.0 0.0
17.0 2.0 0.0 1.6 0.0 0.0 9.0 28.6 69.2 11.2 37.6 1.0 0.0
17.0 2.0 0.0 1.8 0.0 0.0 9.0 29.8 69.6 11.6 37.8 1.0 0.0
18.0 2.0 0.0 2.0 0.0 0.0 9.0 31.0 70.0 12.0 38.0 1.0 0.0
18.0 2.0 0.0 2.0 0.0 0.0 9.0 31.8 70.0 12.0 38.0 1.0 0.0
18.0 2.0 0.0 2.0 0.0 0.0 9.0 32.6 70.0 12.0 38.0 1.0 0.0
18.0 2.0 0.0 2.0 0.0 0.0 9.0 33.4 70.0 12.0 38.0 1.0 0.0
18.0 2.0 0.0 2.0 0.0 0.0 9.0 34.2 70.0 12.0 38.0 1.0 0.0
19.0 2.0 0.0 2.0 0.0 0.0 9.0 35.0 70.0 12.0 38.0 1.0 0.0
19.0 2.0 0.6 2.0 0.0 0.0 9.4 35.2 70.0 12.4 38.0 1.0 0.0
19.0 2.0 1.2 2.0 0.0 0.0 9.8 35.4 70.0 12.8 38.0 1.0 0.0
19.0 2.0 1.8 2.0 0.0 0.0 10.2 35.6 70.0 13.2 38.0 1.0 0.0
19.0 2.0 2.4 2.0 0.0 0.0 10.6 35.8 70.0 13.6 38.0 1.0 0.0
20.0 2.0 3.0 2.0 0.0 0.0 11.0 36.0 70.0 14.0 38.0 1.0 0.0
20.0 2.0 3.4 2.0 0.0 0.0 11.4 36.0 70.0 14.2 38.4 1.0 0.0
20.0 2.0 3.8 2.0 0.0 0.0 11.8 36.0 70.0 14.4 38.8 1.0 0.0
20.0 2.0 4.2 2.0 0.0 0.0 12.2 36.0 70.0 14.6 39.2 1.0 0.0
20.0 2.0 4.6 2.0 0.0 0.0 12.6 36.0 70.0 14.8 39.6 1.0 0.0
21.0 2.0 5.0 2.0 0.0 0.0 13.0 36.0 70.0 15.0 40.0 1.0 0.0
21.0 2.0 5.6 2.0 0.0 0.0 13.2 36.6 70.0 15.4 40.0 1.0 0.0
21.0 2.0 6.2 2.0 0.0 0.0 13.4 37.2 70.0 15.8 40.0 1.0 0.0
21.0 2.0 6.8 2.0 0.0 0.0 13.6 37.8 70.0 16.2 40.0 1.0 0.0
21.0 2.0 7.4 2.0 0.0 0.0 13.8 38.4 70.0 16.6 40.0 1.0 0.0
22.0 2.0 8.0 2.0 0.0 0.0 14.0 39.0 70.0 17.0 40.0 1.0 0.0
22.0 2.0 8.2 2.2 0.0 0.0 14.0 40.2 70.4 17.0 40.4 1.0 0.0
22.0 2.0 8.4 2.4 0.0 0.0 14.0 41.4 70.8 17.0 40.8 1.0 0.0
22.0 2.0 8.6 2.6 0.0 0.0 14.0 42.6 71.2 17.0 41.2 1.0 0.0
22.0 2.0 8.8 2.8 0.0 0.0 14.0 43.8 71.6 17.0 41.6 1.0 0.0
23.0 2.0 9.0 3.0 0.0 0.0 14.0 45.0 72.0 17.0 42.0 1.0 0.0
23.0 2.0 9.0 3.0 0.0 0.0 14.2 45.6 72.0 17.6 42.4 1.0 0.0
23.0 2.0 9.0 3.0 0.0 0.0 14.4 46.2 72.0 18.2 42.8 1.0 0.0
23.0 2.0 9.0 3.0 0.0 0.0 14.6 46.8 72.0 18.8 43.2 1.0 0.0
23.0 2.0 9.0 3.0 0.0 0.0 14.8 47.4 72.0 19.4 43.6 1.0 0.0
24.0 2.0 9.0 3.0 0.0 0.0 15.0 48.0 72.0 20.0 44.0 1.0 0.0
24.0 2.0 9.0 3.0 0.0 0.0 15.2 48.0 72.0 20.6 44.2 1.0 0.0
24.0 2.0 9.0 3.0 0.0 0.0 15.4 48.0 72.0 21.2 44.4 1.0 0.0
24.0 2.0 9.0 3.0 0.0 0.0 15.6 48.0 72.0 21.8 44.6 1.0 0.0
24.0 2.0 9.0 3.0 0.0 0.0 15.8 48.0 72.0 22.4 44.8 1.0 0.0
25.0 2.0 9.0 3.0 0.0 0.0 16.0 48.0 72.0 23.0 45.0 1.0 0.0
25.0 2.0 9.4 3.0 0.0 0.0 16.4 48.0 72.0 23.2 45.0 1.0 0.4
25.0 2.0 9.8 3.0 0.0 0.0 16.8 48.0 72.0 23.4 45.0 1.0 0.8
25.0 2.0 10.2 3.0 0.0 0.0 17.2 48.0 72.0 23.6 45.0 1.0 1.2
25.0 2.0 10.6 3.0 0.0 0.0 17.6 48.0 72.0 23.8 45.0 1.0 1.6
26.0 2.0 11.0 3.0 0.0 0.0 18.0 48.0 72.0 24.0 45.0 1.0 2.0
26.0 2.0 11.6 3.0 0.2 0.0 18.2 48.0 72.0 24.0 45.0 1.0 2.6
26.0 2.0 12.2 3.0 0.4 0.0 18.4 48.0 72.0 24.0 45.0 1.0 3.2
26.0 2.0 12.8 3.0 0.6 0.0 18.6 48.0 72.0 24.0 45.0 1.0 3.8
26.0 2.0 13.4 3.0 0.8 0.0 18.8 48.0 72.0 24.0 45.0 1.0 4.4
27.0 2.0 14.0 3.0 1.0 0.0 19.0 48.0 72.0 24.0 45.0 1.0 5.0
27.0 2.0 14.4 3.0 1.0 0.0 19.4 48.0 72.4 24.4 45.2 1.0 5.4
27.0 2.0 14.8 3.0 1.0 0.0 19.8 48.0 72.8 24.8 45.4 1.0 5.8
27.0 2.0 15.2 3.0 1.0 0.0 20.2 48.0 73.2 25.2 45.6 1.0 6.2
27.0 2.0 15.6 3.0 1.0 0.0 20.6 48.0 73.6 25.6 45.8 1.0 6.6
28.0 2.0 16.0 3.0 1.0 0.0 21.0 48.0 74.0 26.0 46.0 1.0 7.0
28.0 2.0 16.6 3.0 1.0 0.0 21.2 48.2 74.0 26.6 46.0 1.0 7.2
28.0 2.0 17.2 3.0 1.0 0.0 21.4 48.4 74.0 27.2 46.0 1.0 7.4
28.0 2.0 17.8 3.0 1.0 0.0 21.6 48.6 74.0 27.8 46.0 1.0 7.6
28.0 2.0 18.4 3.0 1.0 0.0 21.8 48.8 74.0 28.4 46.0 1.0 7.8
29.0 2.0 19.0 3.0 1.0 0.0 22.0 49.0 74.0 29.0 46.0 1.0 8.0
29.0 2.0 19.2 3.4 1.0 0.0 22.0 50.0 74.0 29.0 46.4 1.0 8.0
29.0 2.0 19.4 3.8 1.0 0.0 22.0 51.0 74.0 29.0 46.8 1.0 8.0
29.0 2.0 19.6 4.2 1.0 0.0 22.0 52.0 74.0 29.0 47.2 1.0 8.0
29.0 2.0 19.8 4.6 1.0 0.0 22.0 53.0 74.0 29.0 47.6 1.0 8.0
30.0 2.0 20.0 5.0 1.0 0.0 22.0 54.0 74.0 29.0 48.0 1.0 8.0
30.0 1.6 16.0 4.2 0.8 0.0 17.8 44.2 59.6 23.8 38.4 0.8 6.8
30.0 1.2 12.0 3.4 0.6 0.0 13.6 34.4 45.2 18.6 28.8 0.6 5.6
30.0 0.8 8.0 2.6 0.4 0.0 9.4 24.6 30.8 13.4 19.2 0.4 4.4
30.0 0.4 4.0 1.8 0.2 0.0 5.2 14.8 16.4 8.2 9.6 0.2 3.2
31.0 0.0 0.0 1.0 0.0 0.0 1.0 5.0 2.0 3.0 0.0 0.0 2.0
31.0 0.0 0.0 1.0 0.0 0.0 1.2 5.4 2.2 3.6 0.0 0.0 2.8
31.0 0.0 0.0 1.0 0.0 0.0 1.4 5.8 2.4 4.2 0.0 0.0 3.6
31.0 0.0 0.0 1.0 0.0 0.0 1.6 6.2 2.6 4.8 0.0 0.0 4.4
31.0 0.0 0.0 1.0 0.0 0.0 1.8 6.6 2.8 5.4 0.0 0.0 5.2
...
When indexing this dataframe, I feel like df_expanded.iloc[31] should return the following:
A C Cl Co D E F Fa G Ga H HW
index
31.0 0.0 0.0 1.0 0.0 0.0 1.0 5.0 2.0 3.0 0.0 0.0 2.0
31.0 0.0 0.0 1.0 0.0 0.0 1.2 5.4 2.2 3.6 0.0 0.0 2.8
31.0 0.0 0.0 1.0 0.0 0.0 1.4 5.8 2.4 4.2 0.0 0.0 3.6
31.0 0.0 0.0 1.0 0.0 0.0 1.6 6.2 2.6 4.8 0.0 0.0 4.4
31.0 0.0 0.0 1.0 0.0 0.0 1.8 6.6 2.8 5.4 0.0 0.0 5.2
However, the following is returned:
>>> print(df_expanded.iloc[31])
A 2.0
C 0.0
Cl 0.0
Co 0.0
D 0.0
E 7.0
F 1.0
Fa 33.4
G 4.0
Ga 19.2
H 0.0
HW 0.0
Why is it that indexing the 31st index returns 2.0 for A (and the cumulative values for other columns as well) instead of what is shown when df_expanded is displayed? I can't figure out why it's working like this, so any kind of help would be greatly appreciated!
df_expanded.iloc[31]
does not return what you expect because iloc is purely positional: it returns the row at integer position 31 (the 32nd row of the DataFrame), not the rows whose index label is 31. If you look at the row in that position, it is exactly what you got.
For what you want, use label-based indexing instead:
df_expanded.loc[31]
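A minimal sketch of the difference (toy data, not the asker's frame): iloc is positional, while loc matches index labels, and with a repeated label it returns every matching row:
import pandas as pd

# toy frame with a repeated index label 31.0, loosely mimicking df_expanded
df = pd.DataFrame({'A': [10, 20, 30, 40]},
                  index=[29.0, 31.0, 31.0, 31.0])

print(df.iloc[1])   # the single row at integer position 1 (the second row)
print(df.loc[31])   # all three rows whose index label is 31.0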
I have a csv file like the below:
A, B
1,2
3,4
5,6
C,D
7,8
9,10
11,12
E,F
13,14
15,16
As you can see (and imagine), when I import this data using pd.read_csv, pandas reads the whole thing as two columns (A, B) and a long run of rows, which is correct given the shape. However, I want to create separate columns (A, B, C, D, ...). Fortunately, there is a blank line at the end of each "column" block, and I think this could be used to separate these sections somehow, but I don't know how to proceed.
The data:
https://raw.githubusercontent.com/AlessandroMDO/Dinamica_de_Voo/master/data.csv
This is the normal behavior of pandas.read_csv, but data is not usually stored in CSV files this way.
You can read the file as text, strip extra whitespace, and split it into parts on the empty lines first. Then read each part with pandas.read_csv via StringIO and concatenate the parts together with pandas.concat.
import pandas as pd
from io import StringIO

# read the whole file, strip surrounding whitespace, and split it on blank lines
with open('test.csv', 'r') as f:
    parts = f.read().strip().split('\n\n')

# parse each part as its own CSV and join the resulting frames side by side
df = pd.concat([pd.read_csv(StringIO(part)) for part in parts], axis=1)
I have tried this with your csv:
Alpha Cd Alpha CL Alpha ... Cnp Alpha Cnr Alpha Clr
0 -14.0 0.08941 -14.0 -0.19430 -14.0 ... 0.0 -14.0 0.0 -14.0 0.0
1 -12.0 0.07646 -12.0 -0.17150 -12.0 ... 0.0 -12.0 0.0 -12.0 0.0
2 -10.0 0.06509 -10.0 -0.14710 -10.0 ... 0.0 -10.0 0.0 -10.0 0.0
3 -8.0 0.05545 -8.0 -0.12150 -8.0 ... 0.0 -8.0 0.0 -8.0 0.0
4 -6.0 0.04766 -6.0 -0.09479 -6.0 ... 0.0 -6.0 0.0 -6.0 0.0
5 -4.0 0.04181 -4.0 -0.06722 -4.0 ... 0.0 -4.0 0.0 -4.0 0.0
6 -2.0 0.03797 -2.0 -0.03905 -2.0 ... 0.0 -2.0 0.0 -2.0 0.0
7 0.0 0.03620 0.0 -0.01054 0.0 ... 0.0 0.0 0.0 0.0 0.0
8 2.0 0.03651 2.0 0.01806 2.0 ... 0.0 2.0 0.0 2.0 0.0
9 4.0 0.03960 4.0 0.05879 4.0 ... 0.0 4.0 0.0 4.0 0.0
10 6.0 0.04814 6.0 0.12650 6.0 ... 0.0 6.0 0.0 6.0 0.0
11 8.0 0.06494 8.0 0.22050 8.0 ... 0.0 8.0 0.0 8.0 0.0
12 10.0 0.09268 10.0 0.33960 10.0 ... 0.0 10.0 0.0 10.0 0.0
13 12.0 0.13390 12.0 0.48240 12.0 ... 0.0 12.0 0.0 12.0 0.0
14 14.0 0.19110 14.0 0.64710 14.0 ... 0.0 14.0 0.0 14.0 0.0
[15 rows x 36 columns]
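If you want to read straight from the linked URL instead of a local test.csv, the same splitting idea works after downloading the text first (a sketch; assumes network access and uses the standard-library urllib):
import pandas as pd
from io import StringIO
from urllib.request import urlopen

url = 'https://raw.githubusercontent.com/AlessandroMDO/Dinamica_de_Voo/master/data.csv'
text = urlopen(url).read().decode('utf-8')

# same idea: split on blank lines, parse each block, join side by side
parts = text.strip().split('\n\n')
df = pd.concat([pd.read_csv(StringIO(part)) for part in parts], axis=1)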
I read in a .csv file, and I have the following code that counts the vowels and consonants in the string column Description. This works great, but my problem is that I want to split Description into 8 columns and count the consonants and vowels for each of those columns. The second part of my code splits Description into 8 columns. How can I count the vowels and consonants on all 8 columns that Description is split into?
import pandas as pd
import re
def anti_vowel(s):
    result = re.sub(r'[AEIOU]', '', s, flags=re.IGNORECASE)
    return result
data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')
data.dropna(inplace = True)
data['Vowels'] = data['Description'].str.count(r'[aeiou]', flags=re.I)
data['Consonant'] = data['Description'].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)
print (data)
This is the code I'm using to split the column Description into 8 columns.
import pandas as pd
data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')
data.dropna(inplace = True)
data = data["Description"].str.split(" ", n = 8, expand = True)
print (data)
Now how can I put it all together?
In order to read each of the 8 columns and count consonants, I know I can use the following, replacing the 0 with 0 through 7:
testconsonant = data[0].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)
testvowel = data[0].str.count(r'[aeiou]', flags=re.I)
Desired output would be:
Description[0], vowel count, consonant count, Description[1], vowel count, consonant count, Description[2], vowel count, consonant count, Description[3], vowel count, consonant count, Description[4], vowel count, consonant count, ... all the way to Description[7]
stack, then unstack:
import re

# stack the 8 split columns into one long Series, one word per (row, column) pair
stacked = data.stack()

# count on the stacked Series, then unstack so each original column gets its own counts
pd.concat({
    'Vowels': stacked.str.count('[aeiou]', flags=re.I),
    'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()
Consonant Vowels
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
0 3.0 5.0 5.0 1.0 2.0 NaN NaN NaN NaN 1.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN
1 8.0 5.0 1.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN
2 8.0 5.0 1.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN
3 8.0 5.0 1.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN
4 3.0 5.0 3.0 1.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 NaN
5 3.0 5.0 3.0 1.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 NaN
6 3.0 4.0 0.0 1.0 0.0 0.0 0.0 NaN NaN 3.0 1.0 0.0 0.0 0.0 0.0 0.0 NaN NaN
7 3.0 3.0 0.0 1.0 0.0 0.0 0.0 NaN NaN 3.0 1.0 0.0 1.0 0.0 0.0 0.0 NaN NaN
8 3.0 3.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 3.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
9 3.0 3.0 0.0 1.0 0.0 0.0 0.0 NaN NaN 3.0 1.0 0.0 1.0 0.0 0.0 0.0 NaN NaN
10 3.0 3.0 0.0 1.0 0.0 0.0 0.0 0.0 NaN 3.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN
11 3.0 3.0 0.0 2.0 2.0 NaN NaN NaN NaN 3.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN
12 3.0 3.0 0.0 1.0 0.0 0.0 0.0 0.0 NaN 3.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN
13 3.0 3.0 0.0 2.0 2.0 NaN NaN NaN NaN 3.0 1.0 0.0 0.0 0.0 NaN NaN NaN NaN
14 3.0 5.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15 3.0 3.0 0.0 3.0 1.0 NaN NaN NaN NaN 3.0 0.0 0.0 0.0 1.0 NaN NaN NaN NaN
If you want to combine this with the data dataframe, you can do:
stacked = data.stack()
pd.concat({
'Data': data,
'Vowels': stacked.str.count('[aeiou]', flags=re.I),
'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()
I have a dataframe whose columns are not in numeric sequence. If I use len(df.columns), my data has 3586 columns. How can I re-order the columns so they follow the numeric sequence?
ID V1 V10 V100 V1000 V1001 V1002 ... V990 V991 V992 V993 V994
A 1 9.0 2.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
B 1 1.2 0.1 3.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
C 2 8.6 8.0 2.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
D 3 0.0 2.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0
E 4 7.8 6.6 3.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0
I tried df = df.reindex(sorted(df.columns), axis=1) (based on this question: Re-ordering columns in pandas dataframe based on column name), but it still doesn't work: the columns are sorted as strings, so V10 and V100 come before V2.
Thank you.
First get all columns that do not match the pattern V + number by filtering with str.contains, then sort the remaining column names (obtained via Index.difference) by their numeric suffix, join the two lists, and pass them to DataFrame.reindex. This puts the non-matching columns first and the V + number columns after them in numeric order:
# columns that do NOT match the pattern "V<number>" (here just ID) stay at the front
L1 = df.columns[~df.columns.str.contains(r'^V\d+$')].tolist()
# the remaining V<number> columns, sorted by their numeric suffix rather than as strings
L2 = sorted(df.columns.difference(L1), key=lambda x: float(x[1:]))
df = df.reindex(L1 + L2, axis=1)
print (df)
ID V1 V10 V100 V990 V991 V992 V993 V994 V1000 V1001 V1002
A 1 9.0 2.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
B 1 1.2 0.1 3.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
C 2 8.6 8.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
D 3 0.0 2.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E 4 7.8 6.6 3.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
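As a hedged alternative, if the third-party natsort package is available, a natural sort of the whole column list gives the same layout here, since 'ID' also sorts before the V columns alphabetically:
from natsort import natsorted  # pip install natsort; not part of pandas

df = df[natsorted(df.columns)]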