Below is an example DataFrame.
0 1 2 3 4
0 0.0 13.00 4.50 30.0 0.0,13.0
1 0.0 13.00 4.75 30.0 0.0,13.0
2 0.0 13.00 5.00 30.0 0.0,13.0
3 0.0 13.00 5.25 30.0 0.0,13.0
4 0.0 13.00 5.50 30.0 0.0,13.0
5 0.0 13.00 5.75 0.0 0.0,13.0
6 0.0 13.00 6.00 30.0 0.0,13.0
7 1.0 13.25 0.00 30.0 0.0,13.25
8 1.0 13.25 0.25 0.0 0.0,13.25
9 1.0 13.25 0.50 30.0 0.0,13.25
10 1.0 13.25 0.75 30.0 0.0,13.25
11 2.0 13.25 1.00 30.0 0.0,13.25
12 2.0 13.25 1.25 30.0 0.0,13.25
13 2.0 13.25 1.50 30.0 0.0,13.25
14 2.0 13.25 1.75 30.0 0.0,13.25
15 2.0 13.25 2.00 30.0 0.0,13.25
16 2.0 13.25 2.25 30.0 0.0,13.25
I want to split this into new dataframes when the value in column 0 changes.
0 1 2 3 4
0 0.0 13.00 4.50 30.0 0.0,13.0
1 0.0 13.00 4.75 30.0 0.0,13.0
2 0.0 13.00 5.00 30.0 0.0,13.0
3 0.0 13.00 5.25 30.0 0.0,13.0
4 0.0 13.00 5.50 30.0 0.0,13.0
5 0.0 13.00 5.75 0.0 0.0,13.0
6 0.0 13.00 6.00 30.0 0.0,13.0
7 1.0 13.25 0.00 30.0 0.0,13.25
8 1.0 13.25 0.25 0.0 0.0,13.25
9 1.0 13.25 0.50 30.0 0.0,13.25
10 1.0 13.25 0.75 30.0 0.0,13.25
11 2.0 13.25 1.00 30.0 0.0,13.25
12 2.0 13.25 1.25 30.0 0.0,13.25
13 2.0 13.25 1.50 30.0 0.0,13.25
14 2.0 13.25 1.75 30.0 0.0,13.25
15 2.0 13.25 2.00 30.0 0.0,13.25
16 2.0 13.25 2.25 30.0 0.0,13.25
I've tried adapting the following solutions without any luck so far:
Split array at value in numpy
Split a large pandas dataframe
Looks like you want to groupby the first column. You could create a dictionary from the groupby object, and have the groupby keys be the dictionary keys:
out = dict(tuple(df.groupby(0)))
Or we could build a list from the groupby object instead. This is more useful when we only want positional indexing rather than indexing by the grouping key:
out = [sub_df for _, sub_df in df.groupby(0)]
We could then index the dict based on the grouping key, or the list based on the group's position:
print(out[0])
0 1 2 3 4
0 0.0 13.0 4.50 30.0 0.0,13.0
1 0.0 13.0 4.75 30.0 0.0,13.0
2 0.0 13.0 5.00 30.0 0.0,13.0
3 0.0 13.0 5.25 30.0 0.0,13.0
4 0.0 13.0 5.50 30.0 0.0,13.0
5 0.0 13.0 5.75 0.0 0.0,13.0
6 0.0 13.0 6.00 30.0 0.0,13.0
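For reference, here is a small self-contained sketch of both variants (a toy frame with the same integer column labels as in the question; the values are just an illustration):
import pandas as pd

# toy frame with default integer column labels 0..4, as in the question
df = pd.DataFrame(
    [[0.0, 13.00, 4.50, 30.0, "0.0,13.0"],
     [0.0, 13.00, 4.75, 30.0, "0.0,13.0"],
     [1.0, 13.25, 0.00, 30.0, "0.0,13.25"],
     [2.0, 13.25, 1.00, 30.0, "0.0,13.25"]]
)

# dict keyed by the value in column 0
by_key = dict(tuple(df.groupby(0)))
print(by_key[1.0])

# list indexed by group position
by_pos = [sub_df for _, sub_df in df.groupby(0)]
print(by_pos[-1])   # last group (the 2.0 rows)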
Based on
I want to split this into new dataframes when the value in column 0 changes.
If you only want to start a new group when the value in column 0 changes, you can try:
d=dict([*df.groupby(df['0'].ne(df['0'].shift()).cumsum())])
print(d[1])
print(d[2])
0 1 2 3 4
0 0.0 13.0 4.50 30.0 0.0,13.0
1 0.0 13.0 4.75 30.0 0.0,13.0
2 0.0 13.0 5.00 30.0 0.0,13.0
3 0.0 13.0 5.25 30.0 0.0,13.0
4 0.0 13.0 5.50 30.0 0.0,13.0
5 0.0 13.0 5.75 0.0 0.0,13.0
6 0.0 13.0 6.00 30.0 0.0,13.0
0 1 2 3 4
7 1.0 13.25 0.00 30.0 0.0,13.25
8 1.0 13.25 0.25 0.0 0.0,13.25
9 1.0 13.25 0.50 30.0 0.0,13.25
10 1.0 13.25 0.75 30.0 0.0,13.25
I will use GroupBy.__iter__:
d = dict(df.groupby(df['0'].diff().ne(0).cumsum()).__iter__())
#d = dict(df.groupby(df[0].diff().ne(0).cumsum()).__iter__())  # use df[0] if the column label is the integer 0 rather than the string '0'
Note that if there are repeated non-consecutive values, different groups will be created; if you only use groupby(0), they will all end up in the same group.
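As a small illustration of that note, here is a toy sketch (made-up values) contrasting plain groupby(0) with the run-based grouping:
import pandas as pd

df = pd.DataFrame({0: [0.0, 0.0, 1.0, 1.0, 0.0], 1: list('abcde')})

# runs of equal consecutive values in column 0: labels 1, 1, 2, 2, 3
runs = df[0].diff().ne(0).cumsum()

print(len(dict(tuple(df.groupby(runs)))))  # 3 groups: the two runs of 0.0 stay separate
print(len(dict(tuple(df.groupby(0)))))     # 2 groups: both runs of 0.0 are merged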
I made the following groupby with my pandas dataframe:
df.groupby([df.Hora.dt.hour, df.Hora.dt.minute]).describe()['Qtd']
after groupby the data is as follows:
count mean std min 25% 50% 75% max
Hora Hora
9 0 11.0 5.909091 2.022600 5.000 5.0 5.0 5.00 10.0
1 197.0 6.421320 4.010210 5.000 5.0 5.0 5.00 30.0
2 125.0 6.040000 4.679054 5.000 5.0 5.0 5.00 50.0
3 131.0 6.450382 5.700491 5.000 5.0 5.0 5.00 60.0
4 182.0 6.401099 5.212458 5.000 5.0 5.0 5.00 50.0
5 147.0 6.054422 5.402666 5.000 5.0 5.0 5.00 60.0
6 59.0 6.779661 6.416756 5.000 5.0 5.0 5.00 45.0
7 16.0 6.875000 5.123475 5.000 5.0 5.0 5.00 25.0
when trying to use reset_index() I get an error, because the index names are the same:
ValueError: cannot insert Hora, already exists
How do I reset_index and get the data as follows:
Hora Minute count
9 0 11.0
9 1 197.0
9 2 125.0
9 3 131.0
9 4 182.0
9 5 147.0
9 6 59.0
9 7 16.0
You can first rename the index levels and then reset_index:
(
    df.rename_axis(index=['Hora', 'Minute'])
      .reset_index()
      [['Hora', 'Minute', 'count']]
)
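A minimal self-contained sketch of that fix, assuming a MultiIndex whose two levels are both named 'Hora', which is what grouping by df.Hora.dt.hour and df.Hora.dt.minute produces (toy numbers taken from the question):
import pandas as pd

idx = pd.MultiIndex.from_tuples([(9, 0), (9, 1), (9, 2)], names=['Hora', 'Hora'])
stats = pd.DataFrame({'count': [11.0, 197.0, 125.0]}, index=idx)

# stats.reset_index() would raise: ValueError: cannot insert Hora, already exists
out = (stats.rename_axis(index=['Hora', 'Minute'])
            .reset_index()[['Hora', 'Minute', 'count']])
print(out)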
I'm using pandas in Python, and I have performed some crosstab calculations and concatenations, and at the end up with a data frame that looks like this:
ID 5 6 7 8 9 10 11 12 13
Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
The problem is that I want the last 4 rows, the ones starting with Superior, to come right after the Total row and before the Regular block. So, simply put, I want to swap the positions of the last 4 rows with the 4 rows that start with Regular. How can I achieve this in pandas, so that I get this:
ID 5 6 7 8 9 10 11 12 13
Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
A more generalized solution uses Categorical and argsort. I know this df is already ordered, so ffill is safe here:
import numpy as np
import pandas as pd

# forward-fill the block label (Total/Regular/Superior) onto the detail rows
s = df.ID
s = s.where(s.isin(['Total', 'Regular', 'Superior'])).ffill()
# define the desired block order, then sort the rows by it
s = pd.Categorical(s, ['Total', 'Superior', 'Regular'], ordered=True)
df = df.iloc[np.argsort(s)]
df
Out[188]:
ID 5 6 7 8 9 10 11 12 13
0 Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
5 Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
6 CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
7 HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
8 PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
1 Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
2 CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
3 HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
4 PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
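The Categorical spells out the desired block order explicitly, so the same idea extends to any number of named blocks; the ffill step simply assigns each detail row (CR/HDG/PPG) to the header row above it before sorting.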
Here's one way:
import numpy as np
df.iloc[1:,:] = np.roll(df.iloc[1:,:].values, 4, axis=0)
ID 5 6 7 8 9 10 11 12 13
0 Total 87.0 3.0 9.0 6.0 92.0 7.0 3.0 3.0 20.0
1 Superior 15.0 1.0 1.0 1.0 11.0 0.0 0.0 0.0 2.0
2 CR 3.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
3 HDG 5.0 1.0 1.0 1.0 4.0 0.0 0.0 0.0 0.0
4 PPG 7.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 2.0
5 Regular 72.0 2.0 8.0 5.0 81.0 7.0 3.0 3.0 18.0
6 CR 22.0 0.0 0.0 0.0 17.0 0.0 0.0 0.0 3.0
7 HDG 20.0 0.0 0.0 0.0 24.0 4.0 0.0 0.0 1.0
8 PPG 30.0 2.0 8.0 5.0 40.0 3.0 3.0 3.0 14.0
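Here np.roll rotates the rows below Total so that the last 4 rows (the Superior block) wrap around to the top of that slice; the shift amount has to match the number of rows in the block being moved up.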
For a specific answer to this question, just use iloc
df.iloc[[0,5,6,7,8,1,2,3,4],:]
For a more generalized solution,
m = (df.ID.eq('Superior') | df.ID.eq('Regular')).cumsum()
pd.concat([df[m==0], df[m==2], df[m==1]])
or
order = (2,1)
pd.concat([df[m==0], *[df[m==c] for c in order]])
where order defines the mapping from previous ordering to new ordering.
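For reference, a self-contained sketch of the mask/concat idea on toy data (values trimmed to a single numeric column, made up for illustration):
import pandas as pd

df = pd.DataFrame({
    'ID': ['Total', 'Regular', 'CR', 'HDG', 'PPG', 'Superior', 'CR', 'HDG', 'PPG'],
    '5':  [87.0, 72.0, 22.0, 20.0, 30.0, 15.0, 3.0, 5.0, 7.0],
})

# m is 0 for the Total row, 1 for the Regular block, 2 for the Superior block
m = (df.ID.eq('Superior') | df.ID.eq('Regular')).cumsum()

order = (2, 1)  # put the Superior block before the Regular block
out = pd.concat([df[m == 0], *[df[m == c] for c in order]])
print(out)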
I have a dataframe that looks like the following:
index Player Team Matchup Game_Date WL Min PTS FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% OREB DREB REB AST STL BLK TOV PF Plus_Minus Triple_Double Double_Double FPT 2PA 2PM 2P%
1 John Long TOR TOR # BOS 04/20/1997 W 6 0 0 3.0 0.0 0 1.0 0.0 0 0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 3.00 2.0 0 0.000000
2 Walt Williams TOR TOR # BOS 04/20/1997 W 29 7 3 9.0 33.3 1 2.0 50.0 0 0 0.0 0.0 3.0 3.0 2.0 2.0 1.0 1.0 5.0 20.0 0.0 1.0 21.25 7.0 2 28.571429
3 Todd Day BOS BOS vs. TOR 04/20/1997 L 36 22 8 17.0 47.1 4 8.0 50.0 2 2 100.0 0.0 6.0 6.0 4.0 0.0 0.0 3.0 2.0 -21.0 0.0 1.0 37.50 9.0 4 44.444444
4 Doug Christie TOR TOR # BOS 04/20/1997 W 39 27 8 19.0 42.1 3 9.0 33.3 8 8 100.0 0.0 1.0 1.0 5.0 3.0 1.0 0.0 2.0 30.0 0.0 1.0 46.75 10.0 5 50.000000
5 Brett Szabo BOS BOS vs. TOR 04/20/1997 L 25 5 1 4.0 25.0 0 0.0 0 3 4 75.0 1.0 2.0 3.0 1.0 0.0 0.0 0.0 5.0 -11.0 0.0 1.0 11.75 4.0 1 25.000000
I would like to create a dataframe that groups on team and game_date. I am trying the following code:
df2 = df.groupby(['Game_Date', 'Team', 'Matchup', 'WL'], as_index=False)[['Min', 'PTS', 'FGM', 'FGA', '3PM', '3PA', 'FTM', 'FTA',
    'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', '2PM', '2PA',
    'PF']].sum()
However, when I run it, I get the following dataframe:
Game_Date Team Matchup WL Min PTS FGM FGA 3PM 3PA OREB DREB REB AST STL BLK TOV 2PM 2PA PF
04/11/2018 DEN DEN # MIN L 21966 12552 4707 11506.0 2615 5230.0 1046.0 3138.0 4184.0 2615.0 523.0 523.0 1046.0 2092 6276.0 2092.0
04/11/2018 MEM MEM # OKC L 125520 64329 23012 47593.0 6799 16213.0 4707.0 16736.0 21443.0 11506.0 4707.0 1046.0 6799.0 16213 31380.0 11506.0
04/11/2018 MIN MIN vs. DEN W 40271 20397 7322 15167.0 523 2092.0 1046.0 4707.0 5753.0 2615.0 1569.0 1046.0 2092.0 6799 13075.0 4184.0
04/11/2018 NOP NOP vs. SAS W 124997 63806 27196 46024.0 2615 9937.0 5753.0 20920.0 26673.0 15690.0 5753.0 2615.0 9937.0 24581 36087.0 10460.0
04/11/2018 OKC OKC vs. MEM W 126043 71651 24581 44455.0 10460 22489.0 4184.0 19351.0 23535.0 16736.0 4184.0 1569.0 7322.0 14121 21966.0 12029.0
Why is it grouping incorrectly?
I have a .txt file that looks like this:
08/19/93 UW ARCHIVE 100.0 1962 W IEEE 14 Bus Test Case
BUS DATA FOLLOWS 14 ITEMS
1 Bus 1 HV 1 1 3 1.060 0.0 0.0 0.0 232.4 -16.9 0.0 1.060 0.0 0.0 0.0 0.0 0
2 Bus 2 HV 1 1 2 1.045 -4.98 21.7 12.7 40.0 42.4 0.0 1.045 50.0 -40.0 0.0 0.0 0
3 Bus 3 HV 1 1 2 1.010 -12.72 94.2 19.0 0.0 23.4 0.0 1.010 40.0 0.0 0.0 0.0 0
4 Bus 4 HV 1 1 0 1.019 -10.33 47.8 -3.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
5 Bus 5 HV 1 1 0 1.020 -8.78 7.6 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
6 Bus 6 LV 1 1 2 1.070 -14.22 11.2 7.5 0.0 12.2 0.0 1.070 24.0 -6.0 0.0 0.0 0
7 Bus 7 ZV 1 1 0 1.062 -13.37 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
8 Bus 8 TV 1 1 2 1.090 -13.36 0.0 0.0 0.0 17.4 0.0 1.090 24.0 -6.0 0.0 0.0 0
9 Bus 9 LV 1 1 0 1.056 -14.94 29.5 16.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.19 0
10 Bus 10 LV 1 1 0 1.051 -15.10 9.0 5.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
11 Bus 11 LV 1 1 0 1.057 -14.79 3.5 1.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
12 Bus 12 LV 1 1 0 1.055 -15.07 6.1 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
13 Bus 13 LV 1 1 0 1.050 -15.16 13.5 5.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
14 Bus 14 LV 1 1 0 1.036 -16.04 14.9 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I need to remove the text fields from this file and keep only the numerical data in matrix form. I am relatively new to Python, so any kind of help will be really appreciated. Thank you.
I would suggest reading the data into a pandas DataFrame and then deleting the columns with text, or creating a second frame without the text columns.
Try:
import pandas as pd

data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]
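A hedged sketch of that idea, assuming the two header lines are skipped, whitespace is the separator, and the text fields sit at the column positions read off the sample rows (1 = 'Bus', 3 = 'HV'/'LV'/...):
import pandas as pd

data = pd.read_csv('output_list.txt', sep=r'\s+', header=None, skiprows=2)

# either drop the known text columns by position ...
numeric_only = data.drop(columns=[1, 3])
# ... or simply keep every column that parsed as a number
numeric_only = data.select_dtypes(include='number')
print(numeric_only.head())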
Since this is simple to do in pandas if the data is well-formed, here is my take:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is gone in current pandas
data = '''\
08/19/93 UW ARCHIVE 100.0 1962 W IEEE 14 Bus Test Case
BUS DATA FOLLOWS 14 ITEMS
1 Bus 1 HV 1 1 3 1.060 0.0 0.0 0.0 232.4 -16.9 0.0 1.060 0.0 0.0 0.0 0.0 0
2 Bus 2 HV 1 1 2 1.045 -4.98 21.7 12.7 40.0 42.4 0.0 1.045 50.0 -40.0 0.0 0.0 0
3 Bus 3 HV 1 1 2 1.010 -12.72 94.2 19.0 0.0 23.4 0.0 1.010 40.0 0.0 0.0 0.0 0
4 Bus 4 HV 1 1 0 1.019 -10.33 47.8 -3.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
5 Bus 5 HV 1 1 0 1.020 -8.78 7.6 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
6 Bus 6 LV 1 1 2 1.070 -14.22 11.2 7.5 0.0 12.2 0.0 1.070 24.0 -6.0 0.0 0.0 0
7 Bus 7 ZV 1 1 0 1.062 -13.37 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
8 Bus 8 TV 1 1 2 1.090 -13.36 0.0 0.0 0.0 17.4 0.0 1.090 24.0 -6.0 0.0 0.0 0
9 Bus 9 LV 1 1 0 1.056 -14.94 29.5 16.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.19 0
10 Bus 10 LV 1 1 0 1.051 -15.10 9.0 5.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
11 Bus 11 LV 1 1 0 1.057 -14.79 3.5 1.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
12 Bus 12 LV 1 1 0 1.055 -15.07 6.1 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
13 Bus 13 LV 1 1 0 1.050 -15.16 13.5 5.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
'''
fileobj = StringIO(data)
# change fileobj to the file path (and sep to '\t') when reading the real file
df = pd.read_csv(fileobj, sep=r'\s+', header=None, skiprows=2)
df = df.loc[:, df.dtypes != 'object']  # keep only the non-text columns
print(df)
Returns:
0 2 4 5 6 7 8 9 10 11 12 13 14 \
0 1 1 1 1 3 1.060 0.00 0.0 0.0 232.4 -16.9 0.0 1.060
1 2 2 1 1 2 1.045 -4.98 21.7 12.7 40.0 42.4 0.0 1.045
2 3 3 1 1 2 1.010 -12.72 94.2 19.0 0.0 23.4 0.0 1.010
3 4 4 1 1 0 1.019 -10.33 47.8 -3.9 0.0 0.0 0.0 0.000
4 5 5 1 1 0 1.020 -8.78 7.6 1.6 0.0 0.0 0.0 0.000
5 6 6 1 1 2 1.070 -14.22 11.2 7.5 0.0 12.2 0.0 1.070
6 7 7 1 1 0 1.062 -13.37 0.0 0.0 0.0 0.0 0.0 0.000
7 8 8 1 1 2 1.090 -13.36 0.0 0.0 0.0 17.4 0.0 1.090
8 9 9 1 1 0 1.056 -14.94 29.5 16.6 0.0 0.0 0.0 0.000
9 10 10 1 1 0 1.051 -15.10 9.0 5.8 0.0 0.0 0.0 0.000
10 11 11 1 1 0 1.057 -14.79 3.5 1.8 0.0 0.0 0.0 0.000
11 12 12 1 1 0 1.055 -15.07 6.1 1.6 0.0 0.0 0.0 0.000
12 13 13 1 1 0 1.050 -15.16 13.5 5.8 0.0 0.0 0.0 0.000
15 16 17 18 19
0 0.0 0.0 0.0 0.00 0
1 50.0 -40.0 0.0 0.00 0
2 40.0 0.0 0.0 0.00 0
3 0.0 0.0 0.0 0.00 0
4 0.0 0.0 0.0 0.00 0
5 24.0 -6.0 0.0 0.00 0
6 0.0 0.0 0.0 0.00 0
7 24.0 -6.0 0.0 0.00 0
8 0.0 0.0 0.0 0.19 0
9 0.0 0.0 0.0 0.00 0
10 0.0 0.0 0.0 0.00 0
11 0.0 0.0 0.0 0.00 0
12 0.0 0.0 0.0 0.00 0
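Since the question asks for the values in matrix form, one more step converts the cleaned frame to a plain NumPy array:
matrix = df.to_numpy()   # or df.values on older pandas
print(matrix.shape)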
If I run
import numpy as np
import pandas as pd
import sys
df = pd.read_csv(sys.argv[1])  # argv[1] is the input csv (argv[0] is the script path)
description = df.groupby(['option','subcase']).describe()
totals = df.groupby('option').describe().set_index(np.array(['total'] * df['option'].nunique()), append=True)
description = pd.concat([description, totals]).sort_index()  # DataFrame.append was removed in pandas 2.0
print(description)
on this .csv
option,subcase,cost,time
A,sub1,4,3
A,sub1,2,0
A,sub2,3,8
A,sub2,1,2
B,sub1,13,0
B,sub1,11,0
B,sub2,5,2
B,sub2,3,4
I get an output like this:
cost time \
count mean std min 25% 50% 75% max count
option subcase
A sub1 2.0 3.0 1.414214 2.0 2.50 3.0 3.50 4.0 2.0
sub2 2.0 2.0 1.414214 1.0 1.50 2.0 2.50 3.0 2.0
total 4.0 2.5 1.290994 1.0 1.75 2.5 3.25 4.0 4.0
B sub1 2.0 12.0 1.414214 11.0 11.50 12.0 12.50 13.0 2.0
sub2 2.0 4.0 1.414214 3.0 3.50 4.0 4.50 5.0 2.0
total 4.0 8.0 4.760952 3.0 4.50 8.0 11.50 13.0 4.0
mean std min 25% 50% 75% max
option subcase
A sub1 1.50 2.121320 0.0 0.75 1.5 2.25 3.0
sub2 5.00 4.242641 2.0 3.50 5.0 6.50 8.0
total 3.25 3.403430 0.0 1.50 2.5 4.25 8.0
B sub1 0.00 0.000000 0.0 0.00 0.0 0.00 0.0
sub2 3.00 1.414214 2.0 2.50 3.0 3.50 4.0
total 1.50 1.914854 0.0 0.00 1.0 2.50 4.0
This is annoying, especially if you want to save it as a .csv instead of displaying it in a console.
(e.g. python myscript.py my.csv > my.summary)
How do I stop this line wrapping from happening?
Add pd.set_option to turn off the wrapped frame repr:
pd.set_option('expand_frame_repr', False)
print(description)
cost time
count mean std min 25% 50% 75% max count mean std min 25% 50% 75% max
option subcase
A sub1 2.0 3.0 1.414214 2.0 2.50 3.0 3.50 4.0 2.0 1.50 2.121320 0.0 0.75 1.5 2.25 3.0
sub2 2.0 2.0 1.414214 1.0 1.50 2.0 2.50 3.0 2.0 5.00 4.242641 2.0 3.50 5.0 6.50 8.0
total 4.0 2.5 1.290994 1.0 1.75 2.5 3.25 4.0 4.0 3.25 3.403430 0.0 1.50 2.5 4.25 8.0
B sub1 2.0 12.0 1.414214 11.0 11.50 12.0 12.50 13.0 2.0 0.00 0.000000 0.0 0.00 0.0 0.00 0.0
sub2 2.0 4.0 1.414214 3.0 3.50 4.0 4.50 5.0 2.0 3.00 1.414214 2.0 2.50 3.0 3.50 4.0
total 4.0 8.0 4.760952 3.0 4.50 8.0 11.50 13.0 4.0 1.50 1.914854 0.0 0.00 1.0 2.50 4.0
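As a follow-up: if the real goal is a file rather than console output (as in python myscript.py my.csv > my.summary), writing the frame directly sidesteps the display wrapping entirely, and pd.option_context changes the setting only temporarily. A small sketch (the output file name is just an example):
description.to_csv('my.summary')

with pd.option_context('display.expand_frame_repr', False):
    print(description)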