Get max calls by a person Pandas Python

Let's say I have a number A and it calls several numbers B:
A B
123 987
123 987
123 124
435 567
435 789
653 876
653 876
999 654
999 654
999 654
999 123
I want to find whom each person in A has called the maximum number of times, and also the number of times.
OUTPUT:
A B Count
123 987 2
435 567 or 789 1
653 876 2
999 654 3
One way to think of it is:
A B Count
123 987 2
124 1
435 567 1
789 1
653 876 2
999 654 3
123 1
Can somebody help me out on how to do this?

Try this
# count the unique values in rows
df.value_counts(['A','B']).sort_index()
A B
123 124 1
987 2
435 567 1
789 1
653 876 2
999 123 1
654 3
dtype: int64
To get the highest values for each unique A:
v = df.value_counts(['A','B'])
# keep only the first (i.e. highest-count) row for each A
v[~v.reset_index(level=0).duplicated('A').values]
A B
999 654 3
123 987 2
653 876 2
435 567 1
dtype: int64

Use SeriesGroupBy.value_counts, which sorts values by default, then take the first row per A with GroupBy.head:
df = df.groupby('A')['B'].value_counts().groupby(level=0).head(1).reset_index(name='Count')
print (df)
A B Count
0 123 987 2
1 435 567 1
2 653 876 2
3 999 654 3
Another idea:
df = df.value_counts(['A','B']).reset_index(name='Count').drop_duplicates('A')
print (df)
A B Count
0 999 654 3
1 123 987 2
2 653 876 2
4 435 567 1
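If ties should also be kept (the 435 case above, where 567 and 789 each appear once), a minimal sketch along these lines should work; the inline data simply reproduces the example from the question:
import pandas as pd

df = pd.DataFrame({'A': [123, 123, 123, 435, 435, 653, 653, 999, 999, 999, 999],
                   'B': [987, 987, 124, 567, 789, 876, 876, 654, 654, 654, 123]})

# count calls per (A, B) pair
counts = df.groupby(['A', 'B']).size().reset_index(name='Count')
# keep every B that ties for the maximum count within its A (435 keeps both 567 and 789)
out = counts[counts['Count'] == counts.groupby('A')['Count'].transform('max')]
print(out)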

Related

how to merge or join pandas pivot table with python

I am trying to merge or join two pandas pivot tables.
I tried using merge and it works, but the result wasn't what I was expecting; it gives me duplicates from either table.
df1.fillna('', inplace=True)
df1.reset_index(inplace=True)
df2.fillna('', inplace=True)
df2.reset_index(inplace=True)
df = pd.merge(df1, df2, how='left', on=['KEY1', 'KEY2'])
First Pivot Table:
KEY1 KEY2 KEY3 Column_A0 Column_A1 Column_A2
Row_X0 Row_Y0 Row_Z0 123 123
Row_Y1 Row_Z0 456
Row_Z1 789 789
Row_X1 Row_Y0 Row_Z0 123
Row_Z1 789
Row_Z2 456 789
Second Pivot Table:
KEY1 KEY2 KEY3 Column_B0 Column_B1 Column_B2 Column_B3
Row_X0 Row_Y0 Row_W0 1 234
Row_W1 2 345
Row_W2 3 456
Row_Y1 Row_W0 4 567 1 234
Row_W1 7 890 2 345
Row_W2 8 901 3 456
Row_W3 9 12 4 567
Row_X1 Row_Y0 Row_W0 7 890
Row_W1 8 901
Row_W2 9 12
The result I expect:
KEY1 KEY2 KEY3_X Column_A0 Column_A1 Column_A2 KEY3_Y Column_B0 Column_B1 Column_B2 Column_B3
Row_X0 Row_Y0 Row_Z0 123 123 Row_W0 1 234
Row_W1 2 345
Row_W2 3 456
Row_Y1 Row_Z0 456 Row_W0 4 567 1 234
Row_Z1 789 789 Row_W1 7 890 2 345
Row_W2 8 901 3 456
Row_W3 9 12 4 567
Row_W0 7 890
Row_X1 Row_Y0 Row_Z0 123 Row_W1 8 901
Row_Z1 789 Row_W2 9 12
Row_Z2 456 789
Is there anything I can do to make this happen? Thank you.
Concat() by row or column. The pd.concat function allows you to combine tables along either the row or the column axis.
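A minimal sketch of that idea, assuming both pivot tables are indexed by the shared keys (the tiny df1/df2 below are stand-ins, not the asker's data):
import pandas as pd

# stand-ins for the two pivot tables, both indexed by the shared keys
idx = pd.MultiIndex.from_tuples([('Row_X0', 'Row_Y0'), ('Row_X0', 'Row_Y1')],
                                names=['KEY1', 'KEY2'])
df1 = pd.DataFrame({'Column_A0': [123, 456]}, index=idx)
df2 = pd.DataFrame({'Column_B0': [1, 4]}, index=idx)

# axis=1 places the tables side by side, aligning rows on the shared index;
# axis=0 (the default) would instead stack them on top of each other
combined = pd.concat([df1, df2], axis=1)
print(combined)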

How to join all columns in dataframe? [duplicate]

This question already has answers here:
Pandas: Multiple columns into one column
(4 answers)
How to stack/append all columns into one column in Pandas? [duplicate]
(4 answers)
Closed 10 months ago.
I would like one column to contain all the other columns of the dataframe combined.
Here is what the dataframe looks like:
0 1 2
0 123 321 231
1 232 321 231
2 432 432 432
dataframe name = task_ba
I would like it to look like this
0
0 123
1 232
2 432
3 321
4 321
5 432
6 231
7 231
8 432
Easiest and fastest option: use the underlying numpy array:
df2 = pd.DataFrame(df.values.ravel(order='F'))
NB: if you prefer a Series, use pd.Series instead.
Output:
0
0 123
1 232
2 432
3 321
4 321
5 432
6 231
7 231
8 432
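For reference, the Series variant mentioned above would look like this (a sketch using the same sample data):
import pandas as pd

df = pd.DataFrame({0: [123, 232, 432], 1: [321, 321, 432], 2: [231, 231, 432]})

# same column-major flattening, but returning a Series rather than a DataFrame
s = pd.Series(df.values.ravel(order='F'))
print(s)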
You can use pd.DataFrame.melt() and then drop the variable column:
>>> df
0 1 2
0 123 321 231
1 232 321 231
2 432 432 432
>>> df.melt().drop("variable", axis=1) # Drops the 'variable' column
value
0 123
1 232
2 432
3 321
4 321
5 432
6 231
7 231
8 432
Or if you want 0 as your column name:
>>> df.melt(value_name=0).drop("variable", axis=1)
0
0 123
1 232
2 432
3 321
4 321
5 432
6 231
7 231
8 432
You can learn all this (and more!) in the official documentation.

Selecting Items in dataframe

Using Python 3
I have a dataframe sort of like this:
productCode productType storeCode salesAmount moreInfo
111 1 111 111 info
111 1 112 112 info
456 4 456 456 info
and so on for thousands of rows
I want to select (and have a list of the codes for) the top X best-selling unique products for each store.
How would I accomplish that?
Data:
df = pd.DataFrame({'productCode': [111, 111, 456, 123, 125],
                   'productType': [1, 1, 4, 3, 3],
                   'storeCode': [111, 112, 112, 456, 456],
                   'salesAmount': [111, 112, 34, 456, 1235]})
productCode productType storeCode salesAmount
0 111 1 111 111
1 111 1 112 112
2 456 4 112 34
3 123 3 456 456
4 125 3 456 1235
It sounds like you want the best selling product at each storeCode? In which case:
df.sort_values('salesAmount', ascending=False).groupby('storeCode').head(1)
productCode productType storeCode salesAmount
4 125 3 456 1235
1 111 1 112 112
0 111 1 111 111
Instead, if you want the best selling of each productType at each storeCode, then:
df.sort_values('salesAmount', ascending=False).groupby(['storeCode', 'productType']).head(1)
productCode productType storeCode salesAmount
4 125 3 456 1235
1 111 1 112 112
0 111 1 111 111
2 456 4 112 34
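Since the question asks for the top X products per store and a list of their codes, a sketch extending the same idea might look like this (X = 2 here is arbitrary):
import pandas as pd

df = pd.DataFrame({'productCode': [111, 111, 456, 123, 125],
                   'productType': [1, 1, 4, 3, 3],
                   'storeCode': [111, 112, 112, 456, 456],
                   'salesAmount': [111, 112, 34, 456, 1235]})

X = 2  # number of top-selling products to keep per store
top_x = (df.sort_values('salesAmount', ascending=False)
           .groupby('storeCode')
           .head(X))
# collect the product codes per store as plain Python lists
codes_per_store = top_x.groupby('storeCode')['productCode'].apply(list)
print(codes_per_store)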

merge_asof with multiple columns and forward direction

I have 2 dataframes:
q = pd.DataFrame({'ID':[700,701,701,702,703,703,702],'TX':[0,0,1,0,0,1,1],'REF':[100,120,144,100,103,105,106]})
ID TX REF
0 700 0 100
1 701 0 120
2 701 1 144
3 702 0 100
4 703 0 103
5 703 1 105
6 702 1 106
and
p = pd.DataFrame({'ID':[700,701,701,702,703,703,702,708],'REF':[100,121,149,100,108,105,106,109],'NOTE':['A','B','V','V','T','A','L','M']})
ID REF NOTE
0 700 100 A
1 701 121 B
2 701 149 V
3 702 100 V
4 703 108 T
5 703 105 A
6 702 106 L
7 708 109 M
I wish to merge p with q in such a way that the IDs are equal AND the REF is an exact match OR higher.
Example 1:
for p: ID=700 and REF=100, and
for q: ID=700 and REF=100. So that's a clear match!
Example 2
for q:
1 701 0 120
2 701 1 144
they would match to:
1 701 121 B
2 701 149 V
this way:
1 701 0 120 121 B 121 is just after 120
2 701 1 144 149 V 149 comes after 144
When I use the code below (NOTE: I only indicate REF, which is wrong; it should be ID AND REF):
p = p.sort_values(by=['REF'])
q = q.sort_values(by=['REF'])
pd.merge_asof(p, q, on='REF', direction='forward').sort_values(by=['ID_x','TX'])
I get the wrong result. My expected result should be something like this:
ID TX REF REF_2 NOTE
0 700 0 100 100 A
1 701 0 120 121 B
2 701 1 144 149 V
3 702 0 100 100 V
4 703 0 103 108 T
5 703 1 105 105 A
6 702 1 106 109 L
Does this work?
pd.merge_asof(q.sort_values(['REF', 'ID']),
              p.sort_values(['REF', 'ID']),
              on='REF',
              direction='forward',
              by='ID').sort_values('ID')
Output:
ID TX REF NOTE
0 700 0 100 A
5 701 0 120 B
6 701 1 144 V
1 702 0 100 V
4 702 1 106 L
2 703 0 103 A
3 703 1 105 A
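To also keep the matched REF from p (the REF_2 column in the expected output), one option is to duplicate that column before the merge; a sketch building on the answer above:
import pandas as pd

q = pd.DataFrame({'ID': [700, 701, 701, 702, 703, 703, 702],
                  'TX': [0, 0, 1, 0, 0, 1, 1],
                  'REF': [100, 120, 144, 100, 103, 105, 106]})
p = pd.DataFrame({'ID': [700, 701, 701, 702, 703, 703, 702, 708],
                  'REF': [100, 121, 149, 100, 108, 105, 106, 109],
                  'NOTE': ['A', 'B', 'V', 'V', 'T', 'A', 'L', 'M']})

# copy p's REF into REF_2 so the matched value survives the asof merge
res = pd.merge_asof(q.sort_values('REF'),
                    p.assign(REF_2=p['REF']).sort_values('REF'),
                    on='REF', by='ID',
                    direction='forward').sort_values(['ID', 'TX'])
print(res)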

Concat column name with data of first row, Python 3.6 Dataframe

I want to append the data of the first row of the dataframe to its column names and then delete the first row.
Source DataFrame:
2013K2 2013K3 2013K4 2013K5
ABC1 ABC2 ABC3 ABC4
324 5435 543 543
6543 543 657 765
765 876 876 9876
I need to rename each column as column name + '|' + data of the first row:
2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
324 5435 543 543
6543 543 657 765
765 876 876 9876
IIUC
df.columns=df.columns+'|'+df.iloc[0,:]
df.iloc[1:,]
Out[41]:
0 2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
1 324 5435 543 543
2 6543 543 657 765
3 765 876 876 9876
df=df.iloc[1:,]
df
Out[43]:
0 2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
1 324 5435 543 543
2 6543 543 657 765
3 765 876 876 9876
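If the stray 0 shown above the column names is unwanted (it appears to be the columns' name inherited from the first row's index label), it can be cleared afterwards, e.g.:
# clear the leftover columns name so it no longer prints above the header
df.columns.name = None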
You could do it like this, using T and set_index, then combining the MultiIndex columns into a single column heading using map and format.
df_out = df.T.set_index(0, append=True).T
df_out.columns = df_out.columns.map('{0[0]}|{0[1]}'.format)
df_out
Output:
2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
1 324 5435 543 543
2 6543 543 657 765
3 765 876 876 9876
you can use the following one-liner:
In [148]: df = df.rename(columns=lambda x: x+'|'+df.iloc[0][x]).iloc[1:]
In [149]: df
Out[149]:
2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
1 324 5435 543 543
2 6543 543 657 765
3 765 876 876 9876
One can use a simple for loop and other basic functions to build a new list of column names and change the dataframe:
newcols = []  # empty list to hold the new column names
for i in range(len(df.columns)):
    newcols.append(df.columns[i] + '|' + df.iloc[0, i])  # make and add new name to list
df.columns = newcols  # assign column names
df = df.iloc[1:, :]   # exclude first row
print(df)
Output:
2013K2|ABC1 2013K3|ABC2 2013K4|ABC3 2013K5|ABC4
1 324 5435 543 543
2 6543 543 657 765
3 765 876 876 9876
