Pandas, Dataframe, conditional sum of column for each row - python

I am new to Python and trying to move some of my work from Excel to Python. I want an Excel SUMIFS equivalent in pandas, for example something like:
SUMIFS(F:F, D:D, "<="&C2, B:B, B2, F:F, ">"&0)
In my case, I have 6 columns: a unique Trade ID, an Issuer, a Trade date, a Release date, a Trader, and a Quantity. I want a column which shows the sum of quantity available for release at each row. Something like the below:
A B C D E F G
ID Issuer TradeDate ReleaseDate Trader Quantity SumOfAvailableRelease
1 Horse 1/1/2012 13/3/2012 Amy 7 0
2 Horse 2/2/2012 15/5/2012 Dave 2 0
3 Horse 14/3/2012 NaN Dave -3 7
4 Horse 16/5/2012 NaN John -4 9
5 Horse 20/5/2012 10/6/2012 John 2 9
6 Fish 6/6/2013 20/6/2013 John 11 0
7 Fish 25/6/2013 9/9/2013 Amy 4 11
8 Fish 8/8/2013 15/9/2013 Dave 5 11
9 Fish 25/9/2013 NaN Amy -3 20
Usually, in Excel, I just drag the SUMIFS formula down the whole column and it works; I am not sure how I can do that in Python.
Many thanks!

What you could do is use df.where.
So for example you could say
Qdf = df.where(df["Quantity"] >= 5)
and then take your sum. I'm not sure exactly what you want to do, since I have no knowledge of Excel, but I hope this helps.
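For the actual SUMIFS equivalent, one option is a row-wise apply that re-creates the three criteria: same Issuer, ReleaseDate on or before the row's TradeDate, and positive Quantity. A sketch using the question's data, assuming the dates are day-first:

```python
import pandas as pd

# Sample data mirroring the question's table (dates assumed day-first)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Issuer': ['Horse'] * 5 + ['Fish'] * 4,
    'TradeDate': pd.to_datetime(
        ['1/1/2012', '2/2/2012', '14/3/2012', '16/5/2012', '20/5/2012',
         '6/6/2013', '25/6/2013', '8/8/2013', '25/9/2013'], dayfirst=True),
    'ReleaseDate': pd.to_datetime(
        ['13/3/2012', '15/5/2012', None, None, '10/6/2012',
         '20/6/2013', '9/9/2013', '15/9/2013', None], dayfirst=True),
    'Quantity': [7, 2, -3, -4, 2, 11, 4, 5, -3],
})

def sum_available(row):
    # SUMIFS criteria: same Issuer, ReleaseDate <= this row's TradeDate, Quantity > 0.
    # NaT comparisons are False, so rows with no ReleaseDate are excluded automatically.
    mask = ((df['Issuer'] == row['Issuer'])
            & (df['ReleaseDate'] <= row['TradeDate'])
            & (df['Quantity'] > 0))
    return df.loc[mask, 'Quantity'].sum()

df['SumOfAvailableRelease'] = df.apply(sum_available, axis=1)
```

This reproduces the SumOfAvailableRelease column from the question; for large frames a merge-based approach would be faster than apply, but this is the most direct translation of the formula.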


Pivot table rank by Name(Index) and Title(Column)

I have a dataset that looks like this:
The count represents the number of times they worked.
Title Name Count
Coach Bob 4
teacher sam 5
driver mark 8
Coach tina 10
teacher kate 3
driver frank 2
I want to create a table, which I think will have to be a pivot, that sorts name and title by count of times worked. For example, the output would look like this:
coach teacher driver
tina 10 sam 5 mark 8
bob 4 kate 3 frank 2
I am familiar with general pivot table code, but I think I'm going to need to use something a little bit more comprehensive.
DF_PIV = pd.pivot_table(DF, values=['count'], index=['title', 'Name'], columns=['title'],
                        aggfunc=np.max)
I get the error ValueError: Grouper for 'view_title' not 1-dimensional, but I do not even think I am on the right track here.
You can try:
(df.set_index(['Title', df.groupby('Title').cumcount()])
.unstack(0)
.astype(str)
.T
.groupby(level=1).agg(' '.join)
.T)
Output:
Title Coach driver teacher
0 Bob 4 mark 8 sam 5
1 tina 10 frank 2 kate 3
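For reference, the chain above runs end to end on the question's data (column names as shown there); this is the same code made self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    'Title': ['Coach', 'teacher', 'driver', 'Coach', 'teacher', 'driver'],
    'Name': ['Bob', 'sam', 'mark', 'tina', 'kate', 'frank'],
    'Count': [4, 5, 8, 10, 3, 2],
})

# cumcount() numbers rows within each Title (0, 1, ...) so unstack has a
# unique index; joining Name and Count as strings gives "Bob 4"-style cells
out = (df.set_index(['Title', df.groupby('Title').cumcount()])
         .unstack(0)
         .astype(str)
         .T
         .groupby(level=1).agg(' '.join)
         .T)
```

Each row of `out` is one "rank slot" per title, with the name and count joined into a single string, matching the output shown above.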

Get latest value looked up from other dataframe

My first data frame
product=pd.DataFrame({
'Product_ID':[101,102,103,104,105,106,107,101],
'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop','New Watch'],
'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics','Electronics'],
'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0,9898.0],
'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore','New York']
})
My 2nd data frame has transactions
customer=pd.DataFrame({
'id':[1,2,3,4,5,6,7,8,9],
'name':['Olivia','Aditya','Cory','Isabell','Dominic','Tyler','Samuel','Daniel','Jeremy'],
'age':[20,25,15,10,30,65,35,18,23],
'Product_ID':[101,0,106,0,103,104,0,0,107],
'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
'City':['Mumbai','Delhi','Bangalore','Chennai','Chennai','Delhi','Kolkata','Delhi','Mumbai']
})
I want Price from the 1st data frame to come into the merged dataframe, the common element being 'Product_ID'. Note that against Product_ID 101 there are 2 prices: 299.00 and 9898.00. I want the latter one (9898.0) to come into the merged data set, since this is the latest price.
Currently my code is not giving the right answer; it is giving both:
customerpur = pd.merge(customer,product[['Price','Product_ID']], on="Product_ID", how = "left")
customerpur
id name age Product_ID Purchased_Product City Price
0 1 Olivia 20 101 Watch Mumbai 299.0
1 1 Olivia 20 101 Watch Mumbai 9898.0
There is no explicit timestamp so I assume the index is the order of the dataframe. You can drop duplicates at the end:
customerpur.drop_duplicates(subset = ['id'], keep = 'last')
result:
id name age Product_ID Purchased_Product City Price
1 1 Olivia 20 101 Watch Mumbai 9898.0
2 2 Aditya 25 0 NA Delhi NaN
3 3 Cory 15 106 Oil Bangalore 110.0
4 4 Isabell 10 0 NA Chennai NaN
5 5 Dominic 30 103 Shoes Chennai 2999.0
6 6 Tyler 65 104 Smartphone Delhi 14999.0
7 7 Samuel 35 0 NA Kolkata NaN
8 8 Daniel 18 0 NA Delhi NaN
9 9 Jeremy 23 107 Laptop Mumbai 79999.0
Please note the keep='last' argument, since we are keeping only the last price registered.
Deduplication should be done before merging if you care about performance or the dataset is huge:
product = product.drop_duplicates(subset=['Product_ID'], keep='last')
In your data frame there is no indicator of the latest entry, so you might first need to remove the first entry for Product_ID 101 from the product dataframe as follows:
result_product = product.drop_duplicates(subset=['Product_ID'], keep='last')
It will keep the last entry based on Product_ID and you can do the merge as:
pd.merge(result_product, customer, on='Product_ID')
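Putting the pre-merge deduplication together with the left join, a sketch using the question's data (trimmed to the columns that matter for the join):

```python
import pandas as pd

# Product table: 101 appears twice; the last row carries the latest price
product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104, 105, 106, 107, 101],
    'Price': [299.0, 1350.50, 2999.0, 14999.0, 145.0, 110.0, 79999.0, 9898.0],
})
# Customer table: Product_ID 0 means no purchase
customer = pd.DataFrame({
    'id': [1, 2, 3],
    'Product_ID': [101, 0, 106],
})

# Keep only the latest (last-listed) price per product before merging
latest = product.drop_duplicates(subset=['Product_ID'], keep='last')
merged = pd.merge(customer, latest, on='Product_ID', how='left')
```

Deduplicating first means the left join can never fan out, so each customer row appears exactly once in the result; customers with no matching product get NaN for Price.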

Sum based on grouping in pandas dataframe?

I have a pandas dataframe df which contains:
major men women rank
Art 5 4 1
Art 3 5 3
Art 2 4 2
Engineer 7 8 3
Engineer 7 4 4
Business 5 5 4
Business 3 4 2
Basically I need to find the total number of students, men and women combined, per major, regardless of the rank column. For Art, for example, the total should be all men + women, i.e. 23; Engineer 26, Business 17.
I have tried
df.groupby(['major_category']).sum()
But this separately sums the men and women rather than combining their totals.
Just add both columns and then groupby:
(df.men+df.women).groupby(df.major).sum()
major
Art 23
Business 17
Engineer 26
dtype: int64
melt() then groupby():
df.drop(columns='rank').melt('major').groupby('major', as_index=False)['value'].sum()
major value
0 Art 23
1 Business 17
2 Engineer 26
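Equivalently, you can group first and then sum the two columns across the result; a small self-contained check of all three totals:

```python
import pandas as pd

df = pd.DataFrame({
    'major': ['Art', 'Art', 'Art', 'Engineer', 'Engineer', 'Business', 'Business'],
    'men':   [5, 3, 2, 7, 7, 5, 3],
    'women': [4, 5, 4, 8, 4, 5, 4],
    'rank':  [1, 3, 2, 3, 4, 4, 2],
})

# Group by major first, then add the men and women subtotals row-wise
totals = df.groupby('major')[['men', 'women']].sum().sum(axis=1)
```

Selecting only the men and women columns before summing also sidesteps the original problem of rank being pulled into the aggregation.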

Pandas: Fill in missing indexes with specific ordered values that are already in column.

I have extracted a one-column dataframe with specific values. Now this is what the dataframe looks like:
Commodity
0 Cocoa
4 Coffee
6 Maize
7 Rice
10 Sugar
12 Wheat
Now I want to fill each missing index with the value above it in the column, so it should look something like this:
Commodity
0 Cocoa
1 Cocoa
2 Cocoa
3 Cocoa
4 Coffee
5 Coffee
6 Maize
7 Rice
8 Rice
9 Rice
10 Sugar
11 Sugar
12 Wheat
I couldn't find anything for this in the pandas documentation section Working with Text Data. Thanks for your help!
I create a new index with pd.RangeIndex. It works like range so I need to pass it a number one greater than the max number in the current index.
df.reindex(pd.RangeIndex(df.index.max() + 1)).ffill()
Commodity
0 Cocoa
1 Cocoa
2 Cocoa
3 Cocoa
4 Coffee
5 Coffee
6 Maize
7 Rice
8 Rice
9 Rice
10 Sugar
11 Sugar
12 Wheat
First expand the index to include all numbers
s = pd.Series(['Cocoa', 'Coffee', 'Maize', 'Rice', 'Sugar', 'Wheat',], index=[0,4,6,7,10, 12], name='Commodity')
s = s.reindex(range(s.index.max() + 1))
Then forward-fill, so each gap takes the value above it (a backfill would pull Coffee up into rows 1 to 3 instead):
s.ffill()
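Either way, the reindex-then-forward-fill pattern can be verified end to end:

```python
import pandas as pd

s = pd.Series(['Cocoa', 'Coffee', 'Maize', 'Rice', 'Sugar', 'Wheat'],
              index=[0, 4, 6, 7, 10, 12], name='Commodity')

# Expand to a contiguous 0..12 index, then fill each gap with the value above it
filled = s.reindex(pd.RangeIndex(s.index.max() + 1)).ffill()
```

reindex inserts NaN at the missing positions, and ffill propagates the last seen commodity downwards, which is exactly the desired output in the question.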

get the distinct column values and union the dataframes

I am trying to convert this SQL statement to pandas:
SELECT distinct table1.[Name],table1.[Phno]
FROM table1
union
select distinct table2.[Name],table2.[Phno] from table2
UNION
select distinct table3.[Name],table3.[Phno] from table3;
Now I have 3 dataframes: table1, table2, table3.
table1
Name Phno
0 Andrew 6175083617
1 Andrew 6175083617
2 Frank 7825942358
3 Jerry 3549856785
4 Liu 9659875695
table2
Name Phno
0 Sandy 7859864125
1 Nikhil 9526412563
2 Sandy 7859864125
3 Tina 7459681245
4 Surat 9637458725
table3
Name Phno
0 Patel 9128257489
1 Mary 3679871478
2 Sandra 9871359654
3 Mary 3679871478
4 Hali 9835167465
Now I need to get the distinct values of these dataframes, union them, and get the output below:
sample output
Name Phno
0 Andrew 6175083617
1 Frank 7825942358
2 Jerry 3549856785
3 Liu 9659875695
4 Sandy 7859864125
5 Nikhil 9526412563
6 Tina 7459681245
7 Surat 9637458725
8 Patel 9128257489
9 Mary 3679871478
10 Sandra 9871359654
11 Hali 9835167465
I tried to get the unique values for one dataframe table1 as shown below:
table1_unique = pd.unique(table1.values.ravel()) #which gives me
table1_unique
array(['Andrew', 6175083617L, 'Frank', 7825942358L, 'Jerry', 3549856785L,
'Liu', 9659875695L], dtype=object)
But I get them as an array. I even tried converting it to a dataframe using:
table1_unique1 = pd.DataFrame(table1_unique)
table1_unique1
0
0 Andrew
1 6175083617
2 Frank
3 7825942358
4 Jerry
5 3549856785
6 Liu
7 9659875695
How do I get unique values in a dataframe so that I can concat them as per my sample output? Hope this is clear. Thanks!!
a = table1[['Name', 'Phno']].drop_duplicates()
b = table2[['Name', 'Phno']].drop_duplicates()
c = table3[['Name', 'Phno']].drop_duplicates()
result = pd.concat([a, b, c], ignore_index=True)
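Since SQL UNION also removes duplicates across tables (not just within each one), a single drop_duplicates after the concat is closer to the original query; a sketch with the question's data:

```python
import pandas as pd

table1 = pd.DataFrame({'Name': ['Andrew', 'Andrew', 'Frank', 'Jerry', 'Liu'],
                       'Phno': [6175083617, 6175083617, 7825942358, 3549856785, 9659875695]})
table2 = pd.DataFrame({'Name': ['Sandy', 'Nikhil', 'Sandy', 'Tina', 'Surat'],
                       'Phno': [7859864125, 9526412563, 7859864125, 7459681245, 9637458725]})
table3 = pd.DataFrame({'Name': ['Patel', 'Mary', 'Sandra', 'Mary', 'Hali'],
                       'Phno': [9128257489, 3679871478, 9871359654, 3679871478, 9835167465]})

# UNION dedupes across all tables, so drop duplicates once after concatenating
result = (pd.concat([table1, table2, table3], ignore_index=True)
            .drop_duplicates()
            .reset_index(drop=True))
```

Working on whole rows with drop_duplicates keeps Name and Phno paired, which avoids the flattened-array problem from ravel() in the question.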
