Get the subarray with same numbers and consecutive index - python

I have a text file like this
0, 23.00, 78.00, 75.00, 105.00, 2,0.97
1, 371.00, 305.00, 38.00, 48.00, 0,0.85
1, 24.00, 78.00, 75.00, 116.00, 2,0.98
1, 372.00, 306.00, 37.00, 48.00, 0,0.84
2, 28.00, 87.00, 74.00, 101.00, 2,0.97
2, 372.00, 307.00, 35.00, 47.00, 0,0.80
3, 32.00, 86.00, 73.00, 98.00, 2,0.98
3, 363.00, 310.00, 34.00, 46.00, 0,0.83
4, 40.00, 77.00, 71.00, 98.00, 2,0.94
4, 370.00, 307.00, 38.00, 47.00, 0,0.84
4, 46.00, 78.00, 74.00, 116.00, 2,0.97
5, 372.00, 308.00, 34.00, 46.00, 0,0.57
5, 43.00, 66.00, 67.00, 110.00, 2,0.96
Code I tried:
import numpy as np

frames = []
x = []
y = []
labels = []
with open(file, 'r') as lb:
    for line in lb:
        line = line.replace(',', ' ')
        arr = line.split()
        frames.append(arr[0])
        x.append(arr[1])
        y.append(arr[2])
        labels.append(arr[5])
print(np.shape(frames))
for d, a in enumerate(frames):
    compare = []
    if a == frames[d+2]:
        compare.append(x[d])
        compare.append(x[d+1])
        compare.append(x[d+2])
        xm = np.argmin(compare)
        label = {0: int(labels[d]), 1: int(labels[d+1]), 2: int(labels[d+2])}.get(xm)
    elif a == frames[d+1]:
        compare.append(x[d])
        compare.append(x[d+1])
        xm = np.argmin(compare)
        label = {0: int(labels[d]), 1: int(labels[d+1])}.get(xm)
In the first line, because the first number (0) is unique, I can extract the sixth number (2) easily.
But after that there are many lines with the same first number, so I want to somehow group all the lines that share a first number, compare their second numbers, and then extract the sixth number of the line with the lowest second number.
Can someone suggest a Python solution? I tried readline() and next() but don't know how to solve it.
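For reference, here is a minimal pure-Python sketch of that grouping logic using itertools.groupby; it assumes the file is already sorted by the first number, as in the sample, and file is a placeholder for your path:
from itertools import groupby

with open(file) as fh:
    rows = [line.split(',') for line in fh]

for frame, group in groupby(rows, key=lambda r: r[0]):
    # within one frame, pick the line with the lowest second number
    best = min(group, key=lambda r: float(r[1]))
    print(frame, int(best[5]))  # the frame and its sixth number (the label)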

You can read the file with pandas.read_csv instead, and things will be much easier:
import pandas as pd
df = pd.read_csv(file_path, header=None)
You'll read the file as a table
0 1 2 3 4 5 6
0 0 23.0 78.0 75.0 105.0 2 0.97
1 1 371.0 305.0 38.0 48.0 0 0.85
2 1 24.0 78.0 75.0 116.0 2 0.98
3 1 372.0 306.0 37.0 48.0 0 0.84
4 2 28.0 87.0 74.0 101.0 2 0.97
5 2 372.0 307.0 35.0 47.0 0 0.80
6 3 32.0 86.0 73.0 98.0 2 0.98
7 3 363.0 310.0 34.0 46.0 0 0.83
8 4 40.0 77.0 71.0 98.0 2 0.94
9 4 370.0 307.0 38.0 47.0 0 0.84
10 4 46.0 78.0 74.0 116.0 2 0.97
11 5 372.0 308.0 34.0 46.0 0 0.57
12 5 43.0 66.0 67.0 110.0 2 0.96
Then you can group into sub-tables based on one of the columns (in your case, column 0):
for group, sub_df in df.groupby(0):
    row = sub_df[1].idxmin()   # index of the minimum value in column 1
    print(df.loc[row, 5])      # this is the number you are looking for
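As a follow-up, the loop can be collapsed into a single indexing expression; a sketch, using the same header=None integer column labels:
labels = df.loc[df.groupby(0)[1].idxmin(), 5]
For the sample file this yields label 2 for every frame, since the row with the smallest column-1 value always carries label 2.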

I think this is what you need using pandas:
import pandas as pd
df = pd.read_table('./test.txt', sep=',', names=('1', '2', '3', '4', '5', '6', '7'))
print(df)
# 1 2 3 4 5 6 7
# 0 0 23.0 78.0 75.0 105.0 2 0.97
# 1 1 371.0 305.0 38.0 48.0 0 0.85
# 2 1 24.0 78.0 75.0 116.0 2 0.98
# 3 1 372.0 306.0 37.0 48.0 0 0.84
# 4 2 28.0 87.0 74.0 101.0 2 0.97
# 5 2 372.0 307.0 35.0 47.0 0 0.80
# 6 3 32.0 86.0 73.0 98.0 2 0.98
# 7 3 363.0 310.0 34.0 46.0 0 0.83
# 8 4 40.0 77.0 71.0 98.0 2 0.94
# 9 4 370.0 307.0 38.0 47.0 0 0.84
# 10 4 46.0 78.0 74.0 116.0 2 0.97
# 11 5 372.0 308.0 34.0 46.0 0 0.57
# 12 5 43.0 66.0 67.0 110.0 2 0.96
df.groupby("1")["2"].idxmin() gives, for each value in column "1" (the frame), the index of the row with the lowest value in column "2" (the second number); loc then selects those rows:
df_new = df.loc[df.groupby("1")["2"].idxmin()]
print(df_new)
# 1 2 3 4 5 6 7
# 0 0 23.0 78.0 75.0 105.0 2 0.97
# 2 1 24.0 78.0 75.0 116.0 2 0.98
# 4 2 28.0 87.0 74.0 101.0 2 0.97
# 6 3 32.0 86.0 73.0 98.0 2 0.98
# 8 4 40.0 77.0 71.0 98.0 2 0.94
# 12 5 43.0 66.0 67.0 110.0 2 0.96
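An equivalent approach, as a sketch: sort by the second column and keep the first row of each frame, so no groupby is needed:
df_new = df.sort_values("2").drop_duplicates("1").sort_values("1")
The sixth number is then df_new["6"].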


Randomly replace 10% of dataframe with NaNs?

I have a randomly generated 10x10 dataset and I need to randomly replace 10% of the dataset with NaN.
import pandas as pd
import numpy as np
Dataset = pd.DataFrame(np.random.randint(0, 100, size=(10, 10)))
Try the following method. I used it when I was setting up a hackathon and needed to inject missing data for the competition.
You can use np.random.choice to create a mask of the same shape as the dataframe. Just make sure to set the probabilities p for the True and False choices, where True marks the values that will be replaced by NaNs.
Then simply apply the mask using df.mask.
import pandas as pd
import numpy as np
p = 0.1  # fraction of missing data required
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 10)))
mask = np.random.choice([True, False], size=df.shape, p=[p, 1-p])
new_df = df.mask(mask)
print(new_df)
0 1 2 3 4 5 6 7 8 9
0 50.0 87 NaN 14 78.0 44.0 19.0 94 28 28.0
1 NaN 58 3.0 75 90.0 NaN 29.0 11 47 NaN
2 91.0 30 98.0 77 3.0 72.0 74.0 42 69 75.0
3 68.0 92 90.0 90 NaN 60.0 74.0 72 58 NaN
4 39.0 51 NaN 81 67.0 43.0 33.0 37 13 40.0
5 73.0 0 59.0 77 NaN NaN 21.0 74 55 98.0
6 33.0 64 0.0 59 27.0 32.0 17.0 3 31 43.0
7 75.0 56 21.0 9 81.0 92.0 89.0 82 89 NaN
8 53.0 44 49.0 31 76.0 64.0 NaN 23 37 NaN
9 65.0 15 31.0 21 84.0 7.0 24.0 3 76 34.0
EDIT:
I updated my answer to produce exactly 10% NaN values, as you were looking for. It uses itertools.product and random.sample to draw a set of indexes to mask, then sets those positions to NaN. This should be exact, as you expected.
from itertools import product
from random import sample

p = 0.1
n = int(df.shape[0] * df.shape[1] * p)  # number of NaNs to insert
# sample exactly n (row, column) index pairs
ids = sample(list(product(range(df.shape[0]), range(df.shape[1]))), n)
idx, idy = list(zip(*ids))
data = df.to_numpy().astype(float)  # get the data as a float numpy array
data[idx, idy] = np.nan             # set the sampled positions to np.nan
# assign to a new dataframe
new_df = pd.DataFrame(data, columns=df.columns, index=df.index)
print(new_df)
0 1 2 3 4 5 6 7 8 9
0 52.0 50.0 24.0 81.0 10.0 NaN NaN 75.0 14.0 81.0
1 45.0 3.0 61.0 67.0 93.0 NaN 90.0 34.0 39.0 4.0
2 1.0 NaN NaN 71.0 57.0 88.0 8.0 9.0 62.0 20.0
3 78.0 3.0 82.0 1.0 75.0 50.0 33.0 66.0 52.0 8.0
4 11.0 46.0 58.0 23.0 NaN 64.0 47.0 27.0 NaN 21.0
5 70.0 35.0 54.0 NaN 70.0 82.0 69.0 94.0 20.0 NaN
6 54.0 84.0 16.0 76.0 77.0 50.0 82.0 31.0 NaN 31.0
7 71.0 79.0 93.0 11.0 46.0 27.0 19.0 84.0 67.0 30.0
8 91.0 85.0 63.0 1.0 91.0 79.0 80.0 14.0 75.0 1.0
9 50.0 34.0 8.0 8.0 10.0 56.0 49.0 45.0 39.0 13.0
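A shorter way to get an exact count, as a sketch: draw n distinct flat cell positions with np.random.choice (reusing df and p from above) and build the boolean mask directly:
n = int(df.size * p)  # exact number of cells to blank out
flat = np.random.choice(df.size, n, replace=False)
mask = np.zeros(df.shape, dtype=bool)
mask.ravel()[flat] = True  # mark the sampled cells
new_df = df.mask(mask)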

Is there a way to replace a whole pandas dataframe row using ffill, if one value of a specific column is NaN?

I am trying to sort out a dataframe where some rows are all NaN. I want to fill these using ffill. I'm currently trying this, although I feel like it's a mishmash of a few commands:
df.loc[df['A'].isna(), :] = df.fillna(method='ffill')
This gives an error:
AttributeError: 'NoneType' object has no attribute 'fillna'
but I want to restrict the ffill to rows where one specific column is NaN, i.e.
A B C D E
0 45 88 NaN NaN 3
1 62 34 2 86 NaN
2 85 65 11 31 5
3 NaN NaN NaN NaN NaN
4 90 38 34 93 8
5 0 94 45 10 10
6 58 NaN 23 60 11
7 10 32 5 15 11
8 NaN NaN NaN NaN NaN
So I would only like to ffill a row if and only if the value of A is NaN, whilst leaving the NaNs at (0, C) and (0, D) untouched, giving the dataframe below:
A B C D E
0 45 88 NaN NaN 3
1 62 34 2 86 NaN
2 85 65 11 31 5
3 85 65 11 31 5
4 90 38 34 93 8
5 0 94 45 10 10
6 58 NaN 23 60 11
7 10 32 5 15 11
8 10 32 5 15 11
So just to clarify, the ONLY rows that get replaced with ffill are 3 and 8, because the value of column A in those rows is NaN.
Thanks
---Update---
When I'm debugging and evaluate the expression df.loc[df['A'].isna(), :], I get
3 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
So I assume what's happening here is that I then attempt ffill on this new dataframe containing only rows 3 and 8, and obviously I can't ffill NaNs with NaNs.
Change values only in those rows that start with NaN:
df.loc[df['A'].isna(), :] = df.ffill().loc[df['A'].isna(), :]
A B C D E
0 45.0 88.0 NaN NaN 3.0
1 62.0 34.0 2.0 86.0 NaN
2 85.0 65.0 11.0 31.0 5.0
3 85.0 65.0 11.0 31.0 5.0
4 90.0 38.0 34.0 93.0 8.0
5 0.0 94.0 45.0 10.0 10.0
6 58.0 NaN 23.0 60.0 11.0
7 10.0 32.0 5.0 15.0 11.0
8 10.0 32.0 5.0 15.0 11.0
Try using a mask to identify the relevant rows where column A is null, then take those same rows from the forward-filled dataframe:
mask = df['A'].isnull()
df.loc[mask, :] = df.ffill().loc[mask, :]
>>> df
A B C D E
0 45.0 88.0 NaN NaN 3.0
1 62.0 34.0 2.0 86.0 NaN
2 85.0 65.0 11.0 31.0 5.0
3 85.0 65.0 11.0 31.0 5.0
4 90.0 38.0 34.0 93.0 8.0
5 0.0 94.0 45.0 10.0 10.0
6 58.0 NaN 23.0 60.0 11.0
7 10.0 32.0 5.0 15.0 11.0
8 10.0 32.0 5.0 15.0 11.0
You just want to forward-fill (DataFrame.ffill) where (DataFrame.where) df['A'] is NaN, and leave the rest as it was before (df):
df = df.ffill().where(df['A'].isna(), df)
print(df)
A B C D E
0 45.0 88.0 NaN NaN 3.0
1 62.0 34.0 2.0 86.0 NaN
2 85.0 65.0 11.0 31.0 5.0
3 85.0 65.0 11.0 31.0 5.0
4 90.0 38.0 34.0 93.0 8.0
5 0.0 94.0 45.0 10.0 10.0
6 58.0 NaN 23.0 60.0 11.0
7 10.0 32.0 5.0 15.0 11.0
8 10.0 32.0 5.0 15.0 11.0
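For completeness, the same row-wise selection can be written with NumPy broadcasting; a minimal sketch, assuming pandas and numpy are imported as pd and np:
filled = df.ffill()
# df[['A']].isna() is an (n, 1) boolean frame, so it broadcasts across all columns
df = pd.DataFrame(np.where(df[['A']].isna(), filled, df),
                  columns=df.columns, index=df.index)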

Parse data frame by rows

I have a data frame that has 5 columns named as '0','1','2','3','4'
small_pd
Out[53]:
0 1 2 3 4
0 93.0 94.0 93.0 33.0 0.0
1 92.0 94.0 92.0 33.0 0.0
2 92.0 93.0 92.0 33.0 0.0
3 92.0 94.0 20.0 33.0 76.0
I want to feed the input above, row-wise, into a function that does the following. As an example for the first and second rows:
first row:
takeValue[0,0] - takeValue[0,1] + takeValue[0,2] - takeValue[0,3] + takeValue[0,4]
second row:
takeValue[1,0] - takeValue[1,1] + takeValue[1,2] - takeValue[1,3] + takeValue[1,4]
and so on from the third row onwards, and then assign all those results as an extra column:
small_pd['extracolumn']
Is there a way to avoid a typical for loop in Python and do it in a much better way?
Can you please advise me?
Thanks a lot
Alex
You can use pd.apply:
df = pd.DataFrame(data={"0": [93, 92, 92, 92],
                        "1": [94, 94, 93, 94],
                        "2": [93, 92, 92, 20],
                        "3": [33, 33, 33, 33],
                        "4": [0, 0, 0, 76]})

def calculation(row):
    return row["0"] - row["1"] + row["2"] - row["3"] + row["4"]

df['extracolumn'] = df.apply(calculation, axis=1)
print(df)
    0   1   2   3   4  extracolumn
0  93  94  93  33   0           59
1  92  94  92  33   0           57
2  92  93  92  33   0           58
3  92  94  20  33  76           61
Don't use apply, because it loops under the hood, so it is slow.
Get the even and odd columns by indexing with DataFrame.iloc, sum each set, and then subtract, for a fast, vectorized solution:
small_pd['extracolumn'] = small_pd.iloc[:, ::2].sum(1) - small_pd.iloc[:, 1::2].sum(1)
print (small_pd)
0 1 2 3 4 extracolumn
0 93.0 94.0 93.0 33.0 0.0 59.0
1 92.0 94.0 92.0 33.0 0.0 57.0
2 92.0 93.0 92.0 33.0 0.0 58.0
3 92.0 94.0 20.0 33.0 76.0 61.0
Verify:
a = (small_pd.iloc[0, 0] - small_pd.iloc[0, 1] + small_pd.iloc[0, 2]
     - small_pd.iloc[0, 3] + small_pd.iloc[0, 4])
b = (small_pd.iloc[1, 0] - small_pd.iloc[1, 1] + small_pd.iloc[1, 2]
     - small_pd.iloc[1, 3] + small_pd.iloc[1, 4])
print(a, b)
59.0 57.0
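Another vectorized formulation, as a sketch: the alternating sum is just a dot product with a +1/-1 weight vector (assuming small_pd still contains only the five original columns):
import numpy as np
weights = np.array([1, -1, 1, -1, 1])  # +, -, +, -, + for columns '0'..'4'
small_pd['extracolumn'] = small_pd.to_numpy() @ weights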

Get total of Pandas column

I have a Pandas data frame, as shown below, with multiple columns and would like to get the total of column, MyColumn.
print df
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
My attempt:
I have attempted to get the sum of the column using groupby and .sum():
Total = df.groupby['MyColumn'].sum()
print Total
This causes the following error:
TypeError: 'instancemethod' object has no attribute '__getitem__'
Expected Output
I'd have expected the output to be as follows:
319
Or alternatively, I would like df to be edited with a new row entitled TOTAL containing the total:
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
TOTAL 319
You should use sum:
Total = df['MyColumn'].sum()
print(Total)
319
Then use loc with a Series; in that case, the index should be set to the name of the specific column you need to sum:
df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index=['MyColumn'])
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
because if you pass a scalar, the values of all columns in that row will be filled with it:
df.loc['Total'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
Total 319 319 319.0 319.0
Two other solutions are with at and ix; see their application below:
df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
Note: Since Pandas v0.20, ix has been deprecated. Use loc or iloc instead.
Another option you can go with here:
df.loc["Total", "MyColumn"] = df.MyColumn.sum()
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#Total NaN 319.0 NaN NaN
You can also use the append() method:
df.append(pd.DataFrame(df.MyColumn.sum(), index=["Total"], columns=["MyColumn"]))
Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead.
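On pandas 2.x, where append() is gone, the same row can be added with pd.concat; a minimal sketch:
total_row = pd.DataFrame({'MyColumn': [df['MyColumn'].sum()]}, index=['Total'])
df = pd.concat([df, total_row])  # the other columns become NaN in the Total row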
Update:
In case you need to append the sum for all numeric columns, you can do one of the following:
Use append to do this in a functional manner (doesn't change the original data frame):
import numpy as np
# select numeric columns and calculate the sums
sums = df.select_dtypes(np.number).sum().rename('total')
# append the sums to the data frame
df.append(sums)
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#total NaN 319.0 400.0 398.0
Use loc to mutate the data frame in place:
df.loc['total'] = df.select_dtypes(np.number).sum()
df
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#total NaN 319.0 400.0 398.0
Similar to getting the length of a dataframe with len(df), the following worked for pandas and blaze:
Total = sum(df['MyColumn'])
or alternatively
Total = sum(df.MyColumn)
print(Total)
Note that, unlike Series.sum(), the built-in sum does not skip NaN values.
There are two ways to sum a column:
dataset = pd.read_csv("data.csv")
1: sum(dataset.Column_name)
2: dataset['Column_Name'].sum()
Please correct me if there is any issue with this.
As another option, you can do something like the following.
Group Valuation amount
0 BKB Tube 156
1 BKB Tube 143
2 BKB Tube 67
3 BAC Tube 176
4 BAC Tube 39
5 JDK Tube 75
6 JDK Tube 35
7 JDK Tube 155
8 ETH Tube 38
9 ETH Tube 56
You can use the script below for the above data:
import pandas as pd
data = pd.read_csv("data1.csv")
bytreatment = data.groupby('Group')
bytreatment['amount'].sum()
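For the sample above, this prints the per-group totals:
Group
BAC    215
BKB    366
ETH     94
JDK    265
Name: amount, dtype: int64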
