I would like to calculate the average 'service response time' per conversation id as a variable in a dataframe (in minutes).
The 'service response time' is the difference, in minutes, between the 'created_at' values of Y and X, where:
X = the first row where owner_type == "User" and is_interaction == 1.
Y = the first row after X where owner_type == "Agent" and owner_id != 1.
Update:
id      owner_type  owner_id  conversation_id  message  created_at        is_interaction
260943  Agent       1         26276            a        01/03/2022 15:00
265544  Agent       1         26276            b        05/03/2022 12:01
266749  User        153263    26276            c        05/03/2022 15:49  1
266750  User        153263    26276            d        05/03/2022 15:49  1
266753  Agent       14        26276            e        05/03/2022 15:51
267003  Agent       1         26276            f        06/03/2022 12:01
268900  User        153263    26276            g        06/03/2022 17:01  1
268904  Agent       1         26276            h        07/03/2022 12:00
271141  Agent       1         26276            i        09/03/2022 12:00
271725  User        153263    26276            j        09/03/2022 13:01  1
271728  User        153263    26276            k        09/03/2022 13:01  1
271727  Agent       10        26277            l        09/03/2022 13:01
272085  Agent       1         26276            m        10/03/2022 12:01
Any ideas on how to calculate this?
Update:
The resulting output should have an 'srt' column holding this value per conversation. (In my mock-up that column was called "Average Response Time (in minutes)"; replace that name with "srt" in the dataframe. Ignore the word "Average" in that name, since it is not an average, and the "Date" column is not needed.)
Best regards,
Milan Passchier
Update 04.11.2022
If you had a unique ID for each event, this would be easier. One more note: 'X = the first row where owner_type == "User" and is_interaction == 1' is not really the first such row, but the last one before 'Y = the first row after X where owner_type == "Agent" and owner_id != 1'.
I offer two options. In both cases, the created_at column is converted to the desired format using pd.to_datetime, and a 'srt' column is created with empty values. Explicit loc indexing is used.
In the first one, the main logic is in list comprehensions (they are many times faster than a loop).
In more detail:
First, a list bbb is created, in which the condition is checked at each iteration:
if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1
If it is met, the iteration number is recorded and the my_func function is called with the iteration number and the 'conversation_id'. The function slices the dataframe from i to the end, finds the rows where owner_type is 'Agent', owner_id is not 1 and conversation_id matches, and takes the first such row:
m = aaa.index[0]
If there are no such rows, the function returns -1.
This gives the list bbb, in which each pair holds the User index on the left and the matching Agent index on the right.
The fff list keeps only the last User row for each Agent index, i.e. the rows where the Agent index stops repeating.
Finally, a loop uses the selected index pairs to fill the required rows with loc.
The code with list comprehensions:
import numpy as np
import pandas as pd

df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan

def my_func(i, id):
    m = -1
    aaa = df[i:]
    aaa = aaa[(df.loc[i:, 'conversation_id'] == id) & (df.loc[i:, 'owner_type'] == 'Agent')
              & (df.loc[i:, 'owner_id'] != 1)]
    if len(aaa) > 0:
        m = aaa.index[0]
    return m

bbb = np.array([[i, my_func(i, df.loc[i, 'conversation_id'])]
                for i in range(len(df)) if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1])

fff = [bbb[i] for i in range(len(bbb) - 1) if (bbb[i, 1] != bbb[i + 1:, 1]).all() == True and bbb[i, 1] != -1]
if len(bbb) > 1 and bbb[-1, 1] != -1:
    fff.append(bbb[-1])
fff = np.array(fff)

for i in fff:
    df.loc[i[0], 'srt'] = (df.loc[i[1], 'created_at'] - df.loc[i[0], 'created_at']) / np.timedelta64(1, 'm')

print(df)
The second solution, with all the logic in a loop:
import numpy as np
import pandas as pd

df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan

first_time_user = 0
cid = 0
ind = 0

for i in range(len(df)):
    if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1:
        first_time_user = df.loc[i, 'created_at']
        cid = df.loc[i, 'conversation_id']
        ind = i
    if first_time_user != 0 and df.loc[i, 'conversation_id'] == cid and df.loc[i, 'owner_type'] == 'Agent' and df.loc[i, 'owner_id'] != 1:
        df.loc[ind, 'srt'] = (df.loc[i, 'created_at'] - first_time_user) / np.timedelta64(1, 'm')
        first_time_user = 0
        ind = 0
        cid = 0

print(df)
Output
id owner_type owner_id conversation_id message created_at \
0 260943 Agent 1 26276 a 2022-01-03 15:00:00
1 265544 Agent 1 26276 b 2022-05-03 12:01:00
2 266749 User 153263 26276 c 2022-05-03 15:49:00
3 266750 User 153263 26276 d 2022-05-03 15:49:00
4 266753 Agent 14 26276 e 2022-05-03 15:51:00
5 267003 Agent 1 26276 f 2022-06-03 12:01:00
6 268900 User 153263 26276 g 2022-06-03 17:01:00
7 268904 Agent 1 26276 h 2022-07-03 12:00:00
8 271141 Agent 1 26276 i 2022-09-03 12:00:00
9 271725 User 153263 26276 j 2022-09-03 13:01:00
10 271728 User 153263 26276 k 2022-09-03 13:01:00
11 271727 Agent 10 26277 l 2022-09-03 13:01:00
12 272085 Agent 1 26276 m 2022-10-03 12:01:00
is_interaction srt
0 NaN NaN
1 NaN NaN
2 1.0 NaN
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 1.0 NaN
7 NaN NaN
8 NaN NaN
9 1.0 NaN
10 1.0 NaN
11 NaN NaN
12 NaN NaN
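If you then need the average per conversation_id, as asked originally, one way is a groupby mean over the filled column. A minimal sketch, assuming the 'srt' column produced above:

avg_srt = df.groupby('conversation_id')['srt'].mean().reset_index()  # average response time per conversation
print(avg_srt)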
Related
I have a dataframe df_corp:
ID arrival_date leaving_date
1 01/02/20 05/02/20
2 01/03/20 07/03/20
1 12/02/20 20/02/20
1 07/03/20 10/03/20
2 10/03/20 15/03/20
I would like to find the difference between leaving_date of a row and arrival date of the next entry with respect to ID. Basically I want to know how long before they book again.
So it'll look something like this.
ID arrival_date leaving_date time_between
1 01/02/20 05/02/20 NaN
2 01/03/20 07/03/20 NaN
1 12/02/20 20/02/20 7
1 07/03/20 10/03/20 15
2 10/03/20 15/03/20 3
I've tried grouping by ID to do the sum but I'm seriously lost on how to get the value from the next row and a different column in one.
You need to convert to datetime and perform a GroupBy.shift to get the previous departure date per ID:
# arrival
a = pd.to_datetime(df_corp['arrival_date'], dayfirst=True)
# previous departure per ID
l = pd.to_datetime(df_corp['leaving_date'], dayfirst=True).groupby(df_corp['ID']).shift()
# difference in days
df_corp['time_between'] = (a-l).dt.days
output:
ID arrival_date leaving_date time_between
0 1 01/02/20 05/02/20 NaN
1 2 01/03/20 07/03/20 NaN
2 1 12/02/20 20/02/20 7.0
3 1 07/03/20 10/03/20 16.0
4 2 10/03/20 15/03/20 3.0
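For reference, a self-contained sketch that rebuilds df_corp from the sample rows above and applies the same steps end to end:

import pandas as pd

# reconstruct the sample df_corp shown above
df_corp = pd.DataFrame({
    'ID': [1, 2, 1, 1, 2],
    'arrival_date': ['01/02/20', '01/03/20', '12/02/20', '07/03/20', '10/03/20'],
    'leaving_date': ['05/02/20', '07/03/20', '20/02/20', '10/03/20', '15/03/20'],
})

# arrival dates and previous departure per ID, then the difference in days
a = pd.to_datetime(df_corp['arrival_date'], dayfirst=True)
l = pd.to_datetime(df_corp['leaving_date'], dayfirst=True).groupby(df_corp['ID']).shift()
df_corp['time_between'] = (a - l).dt.days
print(df_corp)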
I'm wondering how to optimize a part of code to remove a loop which takes forever since I have around 350 000 IDs.
Here is the current code, which is not optimal and takes quite a while.
I'm trying to get it working better and if possible removing a loop.
The dataset has 4 columns: ID, start_date, end_date and value. There can be multiple rows with the same ID but different values. The main issue is that in some rows the dates are missing. In that case we have to take the earliest start_date and the latest end_date for that ID and fill them into the row where they are missing.
ID start_date end_date value
ABC 12/10/2010 12/12/2020 8
ABC 01/01/2020 01/04/2021 9
ABC 43
BCD 14/02/2020 14/03/2020 8
So on the third row we should have 12/10/2010 as the start_date and 01/04/2021 as the end_date. You can't see it in this example, but note that BCD's start_date could be earlier than ABC's; you still use 12/10/2010 because it is linked to that ID.
for x in df['ID'].unique():
    tmp = df.loc[df['ID'] == x].reset_index()
    df.loc[(df['ID'] == x) & (df['start_date'].isna()), 'start_date'] = tmp['start_date'].min()
    df.loc[(df['ID'] == x) & (df['end_date'].isna()), 'end_date'] = tmp['end_date'].max()
I suppose the code is quite clear about what I am trying to do.
But if you have any questions, don't hesitate to post them; I'll do my best to answer.
set up the job
import pandas as pd

data = {'ID': ['ABC', 'ABC', 'ABC', 'BCD'],
        'start_date': ['12/10/2010', '01/01/2020', None, '14/02/2020'],
        'end_date': ['12/12/2020', '01/01/2021', None, '14/03/2020'],
        'value': [8, 9, 43, 8]}
df = pd.DataFrame(data)
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
we get this result
ID start_date end_date value
0 ABC 2010-12-10 2020-12-12 8
1 ABC 2020-01-01 2021-01-01 9
2 ABC NaT NaT 43
3 BCD 2020-02-14 2020-03-14 8
do the work
df.start_date = df.groupby('ID')['start_date'].apply(lambda x: x.fillna(x.min()))
df.end_date = df.groupby('ID')['end_date'].apply(lambda x: x.fillna(x.max()))
we get this result
ID start_date end_date value
0 ABC 2010-12-10 2020-12-12 8
1 ABC 2020-01-01 2021-01-01 9
2 ABC 2010-12-10 2021-01-01 43
3 BCD 2020-02-14 2020-03-14 8
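Since the original concern was speed over ~350,000 IDs, a transform-based variant is also worth trying. This is just a sketch (not benchmarked here), but fillna with a per-ID transform avoids the per-group lambda:

# fill missing dates with the per-ID earliest start and latest end via transform
df['start_date'] = df['start_date'].fillna(df.groupby('ID')['start_date'].transform('min'))
df['end_date'] = df['end_date'].fillna(df.groupby('ID')['end_date'].transform('max'))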
Please, suggest a more suitable title for this question
I have a two-level indexed DF (created via groupby):
clicks yield
country report_date
AD 2016-08-06 1 31
2016-12-01 1 0
AE 2016-10-11 1 0
2016-10-13 2 0
I need to take the data country by country, process it and put it back:
for country in set(DF.get_level_values(0)):
    DF_country = process(DF.loc[country])
    DF[country] = DF_country
where process adds new rows to DF_country.
The problem is in the last line:
ValueError: Wrong number of items passed 2, placement implies 1
I just modified your code, changing process to add. Based on my understanding, process is a self-defined function, right?
for country in set(DF.index.get_level_values(0)):  # change here
    DF_country = DF.loc[country].add(1)
    DF.loc[country] = DF_country.values  # and here
DF
Out[886]:
clicks yield
country report_date
AD 2016-08-06 2 32
2016-12-01 2 1
AE 2016-10-11 2 1
2016-10-13 3 1
EDIT :
l = []
for country in set(DF.index.get_level_values(0)):
    DF1 = DF.loc[country]
    DF1.loc['2016-01-01'] = [1, 2]  # adding a row here
    l.append(DF1)
pd.concat(l, axis=0, keys=set(DF.index.get_level_values(0)))
Out[923]:
clicks yield
report_date
AE 2016-10-11 1 0
2016-10-13 2 0
2016-01-01 1 2
AD 2016-08-06 1 31
2016-12-01 1 0
2016-01-01 1 2
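If you want to reproduce the snippets above end to end, the two-level sample frame can be rebuilt like this (a sketch of the sample data from the question, with the report_date level kept as strings):

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('AD', '2016-08-06'), ('AD', '2016-12-01'),
     ('AE', '2016-10-11'), ('AE', '2016-10-13')],
    names=['country', 'report_date'])
DF = pd.DataFrame({'clicks': [1, 1, 1, 2], 'yield': [31, 0, 0, 0]}, index=idx)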
I could use some more help with a project. I am trying to analyze 4.5 million rows of data. I have read the data into a dataframe, have organized the data and now have 3 columns: 1) date as datetime 2) unique identifier 3) price
I need to calculate the year over year change in prices per item but the dates are not uniform and not consistent per item. For example:
date item price
12/31/15 A 110
12/31/15 B 120
12/31/14 A 100
6/24/13 B 100
What I would like is to find as a result is:
date item price previousdate % change
12/31/15 A 110 12/31/14 10%
12/31/15 B 120 6/24/13 20%
12/31/14 A 100
6/24/13 B 100
EDIT - Better example of data
date item price
6/1/2016 A 276.3457646
6/1/2016 B 5.044165645
4/27/2016 B 4.91300186
4/27/2016 A 276.4329163
4/20/2016 A 276.9991265
4/20/2016 B 4.801263717
4/13/2016 A 276.1950213
4/13/2016 B 5.582923328
4/6/2016 B 5.017863509
4/6/2016 A 276.218649
3/30/2016 B 4.64274783
3/30/2016 A 276.554653
3/23/2016 B 5.576438253
3/23/2016 A 276.3135836
3/16/2016 B 5.394435443
3/16/2016 A 276.4222986
3/9/2016 A 276.8929462
3/9/2016 B 4.999951262
3/2/2016 B 4.731349423
3/2/2016 A 276.3972068
1/27/2016 A 276.8458971
1/27/2016 B 4.993033132
1/20/2016 B 5.250379701
1/20/2016 A 276.2899864
1/13/2016 B 5.146639666
1/13/2016 A 276.7041978
1/6/2016 B 5.328296958
1/6/2016 A 276.9465891
12/30/2015 B 5.312301356
12/30/2015 A 256.259668
12/23/2015 B 5.279105491
12/23/2015 A 255.8411198
12/16/2015 B 5.150798234
12/16/2015 A 255.8360529
12/9/2015 A 255.4915183
12/9/2015 B 4.722876886
12/2/2015 A 256.267146
12/2/2015 B 5.083626167
10/28/2015 B 4.876177757
10/28/2015 A 255.6464653
10/21/2015 B 4.551439655
10/21/2015 A 256.1735769
10/14/2015 A 255.9752668
10/14/2015 B 4.693967392
10/7/2015 B 4.911797443
10/7/2015 A 256.2556707
9/30/2015 B 4.262994526
9/30/2015 A 255.8068691
7/1/2015 A 255.7312385
4/22/2015 A 234.6210132
4/15/2015 A 235.3902076
4/15/2015 B 4.154926102
4/1/2015 A 234.4713827
2/25/2015 A 235.1391496
2/18/2015 A 235.1223471
What I have done (with some help from other users) hasn't worked but is below. Thanks for any help you guys can provide or pointing me in the right direction!
import pandas as pd
import datetime as dt
import numpy as np

df = pd.read_csv('...python test file5.csv', parse_dates=['As of Date'])
df = df[['item', 'price', 'As of Date']]

def get_prev_year_price(x, df):
    try:
        return df.loc[x['prev_year_date'], 'price']
        #return np.abs(df.time - x)
    except Exception as e:
        return x['price']

# Function to determine the closest date from a given date and a list of all dates
def nearest(items, pivot):
    return min(items, key=lambda x: abs(x - pivot))

df['As of Date'] = pd.to_datetime(df['As of Date'], format='%m/%d/%Y')
df = df.rename(columns={df.columns[2]: 'date'})

# list of dates
dtlst = [item for item in df['date']]

data = []
data2 = []
for item in df['item'].unique():
    item_df = df[df['item'] == item]  # select based on item
    select_dates = item_df['date'].unique()
    item_df.set_index('date', inplace=True)  # set date as the index
    item_df = item_df.resample('D').mean().reset_index()  # fill in missing dates
    item_df['price'] = item_df['price'].interpolate('nearest')  # fill in price with the nearest available price
    # use max(item_df['date'] where item_df['date'] < item_df['date'] - pd.DateOffset(years=1, days=1))
    #possible_date = item_df['date'] - pd.DateOffset(years=1)
    #item_df['prev_year_date'] = max(df[df['date'] <= possible_date])
    item_df['prev_year_date'] = item_df['date'] - pd.DateOffset(years=1)  # date one year ago
    date_df = item_df[item_df.date.isin(select_dates)]  # select dates with useful data
    item_df.set_index('date', inplace=True)
    date_df['prev_year_price'] = date_df.apply(lambda x: get_prev_year_price(x, item_df), axis=1)
    #date_df['prev_year_price'] = date_df.apply(lambda x: nearest(dtlst, x), axis=1)
    date_df['change'] = date_df['price'] / date_df['prev_year_price'] - 1
    date_df['item'] = item
    data.append(date_df)
    data2.append(item_df)

summary = pd.concat(data).sort_values('date', ascending=False)
#print(summary)

# save the output to a CSV file to see how the data looks after being handled
filename = '...python_test_file_save4.csv'
summary.to_csv(filename, index=True, encoding='utf-8')
With the current assumptions, this works for this specific use case:
In [2459]: def change(grp):
      ...:     grp['% change'] = grp.price.pct_change() * 100  # percent change from the previous row for this item
      ...:     grp['previousdate'] = grp.date.shift(1)
      ...:     return grp
Sort on date then groupby and apply the change function, then sort the index back.
In [2460]: df.sort_values('date').groupby('item').apply(change).sort_index()
Out[2460]:
date item price % change previousdate
0 2015-12-31 A 110 10.0 2014-12-31
1 2015-12-31 B 120 20.0 2013-06-24
2 2014-12-31 A 100 NaN NaT
3 2013-06-24 B 100 NaN NaT
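For reference, a sketch that rebuilds the four-row example from the question so the snippet above can be run as-is:

import pandas as pd

# reconstruct the small sample shown in the question
df = pd.DataFrame({
    'date': pd.to_datetime(['12/31/15', '12/31/15', '12/31/14', '6/24/13']),
    'item': ['A', 'B', 'A', 'B'],
    'price': [110, 120, 100, 100],
})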
This is a good situation for merge_asof, which, for each row of the left dataframe, finds the last row of the right dataframe whose key is less than or equal to the left key. We need to add a year to the right dataframe's dates first, since the requirement is a difference of one year or more between dates.
Here is some sample data that you brought up in your comment.
date item price
12/31/15 A 110
12/31/15 B 120
12/31/14 A 100
6/24/13 B 100
12/31/15 C 100
1/31/15 C 80
11/14/14 C 130
11/19/13 C 110
11/14/13 C 200
The dates need to be sorted for merge_asof to work. The result also keeps only the left dataframe's join column, so we put a copy of the date back into the right dataframe as previousdate.
Setup dataframes
df = df.sort_values('date')
df_copy = df.copy()
df_copy['previousdate'] = df_copy['date']
df_copy['date'] += pd.DateOffset(years=1)
Use merge_asof
df_final = pd.merge_asof(df, df_copy,
on='date',
by='item',
suffixes=['current', 'previous'])
df_final['% change'] = (df_final['pricecurrent'] - df_final['priceprevious']) / df_final['priceprevious']
df_final
date item pricecurrent priceprevious previousdate % change
0 2013-06-24 B 100 NaN NaT NaN
1 2013-11-14 C 200 NaN NaT NaN
2 2013-11-19 C 110 NaN NaT NaN
3 2014-11-14 C 130 200.0 2013-11-14 -0.350000
4 2014-12-31 A 100 NaN NaT NaN
5 2015-01-31 C 80 110.0 2013-11-19 -0.272727
6 2015-12-31 A 110 100.0 2014-12-31 0.100000
7 2015-12-31 B 120 100.0 2013-06-24 0.200000
8 2015-12-31 C 100 130.0 2014-11-14 -0.230769
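For completeness, a sketch that rebuilds that 9-row sample so the merge_asof steps above can be run directly:

import pandas as pd
from io import StringIO

# reconstruct the sample data listed above
raw = """date item price
12/31/15 A 110
12/31/15 B 120
12/31/14 A 100
6/24/13 B 100
12/31/15 C 100
1/31/15 C 80
11/14/14 C 130
11/19/13 C 110
11/14/13 C 200"""
df = pd.read_csv(StringIO(raw), sep=' ', parse_dates=['date'])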
I have a dataframe (df) (orginally from a excel file) and the first 9 rows are like this:
Control Recd_Date/Due_Date Action Signature/Requester
0 2000-1703 2000-01-31 00:00:00 OC/OER/OPA/PMS/ M WEBB
1 NaN 2000-02-29 00:00:00 NaN DATA CORP
2 2000-1776 2000-01-02 00:00:00 OC/ORA/OE/DCP/ G KAN
3 NaN 2000-01-03 00:00:00 OC/ORA/ORO/PNC/ PALM POST
4 NaN NaN FDA/OGROP/ORA/SE-FO/FLA- NaN
5 NaN NaN DO/FLA-CB/ NaN
6 2000-1983 2000-02-02 00:00:00 FDA/OGROP/ORA/CE-FO/CHI- M EGAN
7 NaN 2000-02-03 00:00:00 DO/CHI-CB/ BERNSTEIN LIEBHARD &
8 NaN NaN NaN LONDON LLP
type(df['Control'][1]) = float
type(df['Recd_Date/Due_Date'][1]) = datetime.datetime
type(df['Action_Office'][1]) = float
type(df['Signature/Requester'][1]) = unicode
I want to transform this dataframe (e.g. first 9 rows) to this:
Control Recd_Date/Due_Date Action Signature/Requester
0 2000-1703 2000-01-31 00:00:00,2000-02-29 00:00:00 OC/OER/OPA/PMS/ M WEBB,DATA CORP
1 2000-1776 2000-01-02 00:00:00,2000-01-03 00:00:00 OC/ORA/OE/DCP/OC/ORA/ORO/PNC/FDA/OGROP/ORA/SE-FO/FLA-DO/FLA-CB/ G KAN,PALM POST
2 2000-1983 2000-02-02 00:00:00,2000-02-03 00:00:00 FDA/OGROP/ORA/CE-FO/CHI-DO/CHI-CB/ M EGAN,BERNSTEIN LIEBHARD & LONDON LLP
So basically:
Every time pd.isnull(row['Control']) is true (this should be the only if condition), merge that row into the previous row (whose 'Control' value is not null).
And for 'Recd_Date/Due_Date' and 'Signature/Requester', add ',' (or '/') between each two values (from two merged rows) (e.g. '2000-01-31 00:00:00,2000-02-29 00:00:00' and 'G KAN,PALM POST')
For 'Action', simply merge them without any punctuations added (e.g. FDA/OGROP/ORA/CE-FO/CHI-DO/CHI-CB/)
Can anyone help me out, please? This is the code I am trying to get to work:
for i, row in df.iterrows():
    if pd.isnull(df.ix[i]['Control_#']):
        df.ix[i-1]['Recd_Date/Due_Date'] = str(df.ix[i-1]['Recd_Date/Due_Date'])+'/'+str(df.ix[i]['Recd_Date/Due_Date'])
        df.ix[i-1]['Subject'] = str(df.ix[i-1]['Subject'])+' '+str(df.ix[i]['Subject'])
        if str(df.ix[i-1]['Action_Office'])[-1] == '-':
            df.ix[i-1]['Action_Office'] = str(df.ix[i-1]['Action_Office'])+str(df.ix[i]['Action_Office'])
        else:
            df.ix[i-1]['Action_Office'] = str(df.ix[i-1]['Action_Office'])+','+str(df.ix[i]['Action_Office'])
        if pd.isnull(df.ix[i-1]['Signature/Requester']):
            df.ix[i-1]['Signature/Requester'] = str(df.ix[i-1]['Signature/Requester'])+str(df.ix[i]['Signature/Requester'])
        elif str(df.ix[i-1]['Signature/Requester'])[-1] == '&':
            df.ix[i-1]['Signature/Requester'] = str(df.ix[i-1]['Signature/Requester'])+' '+str(df.ix[i]['Signature/Requester'])
        else:
            df.ix[i-1]['Signature/Requester'] = str(df.ix[i-1]['Signature/Requester'])+','+str(df.ix[i]['Signature/Requester'])
        df.drop(df.index[i])
How come the drop() doesn't work? I am trying to drop the current row (if its ['Control_#'] is null) so that the next row (whose ['Control_#'] is also null) can be added to the previous row (whose ['Control_#'] is NOT null) iteratively.
Much appreciated!!
I think you need to group the rows together and then join up the column values. The tricky part is finding a way to group together the rows in the way you want. Here is my solution...
1) Grouping Together the Rows: Static variables
Since your groups depend on a sequence in your rows, I used a static variable on a function to label every row with a specific group:
def rolling_group(val):
    if pd.notnull(val): rolling_group.group += 1  # pd.notnull is the signal to switch group
    return rolling_group.group
rolling_group.group = 0  # static variable
This method is applied along the Control series to sort indexes into groups, which is then used to split up the dataframe to allow you to merge rows
#groups = df.groupby(df['Control'].apply(rolling_group),as_index=False)
That is really the only tricky part; after that, you can merge the rows by applying a function to each group that produces your desired output.
Full Solution Code
import re
import StringIO

import pandas as pd

def rolling_group(val):
    if pd.notnull(val): rolling_group.group += 1  # pd.notnull is the signal to switch group
    return rolling_group.group
rolling_group.group = 0  # static variable

def joinFunc(g, column):
    col = g[column]
    joiner = "/" if column == "Action" else ","
    s = joiner.join([str(each) for each in col if pd.notnull(each)])  # str(each) converts values to strings
    s = re.sub("(?<=&)"+joiner, " ", s)  # joiner -> " " after '&'
    s = re.sub("(?<=-)"+joiner, "", s)   # joiner -> "" after '-'
    s = re.sub(joiner*2, joiner, s)      # fixes the double-joiner condition
    return s
if __name__ == "__main__":
    df = """ Control  Recd_Date/Due_Date   Action                    Signature/Requester
0  2000-1703  2000-01-31 00:00:00  OC/OER/OPA/PMS/           M WEBB
1  NaN        2000-02-29 00:00:00  NaN                       DATA CORP
2  2000-1776  2000-01-02 00:00:00  OC/ORA/OE/DCP/            G KAN
3  NaN        2000-01-03 00:00:00  OC/ORA/ORO/PNC/           PALM POST
4  NaN        NaN                  FDA/OGROP/ORA/SE-FO/FLA-  NaN
5  NaN        NaN                  DO/FLA-CB/                NaN
6  2000-1983  2000-02-02 00:00:00  FDA/OGROP/ORA/CE-FO/CHI-  M EGAN
7  NaN        2000-02-03 00:00:00  DO/CHI-CB/                BERNSTEIN LIEBHARD &
8  NaN        NaN                  NaN                       LONDON LLP"""
    df = pd.read_csv(StringIO.StringIO(df), sep="\s\s+", engine='python')
    groups = df.groupby(df['Control'].apply(rolling_group), as_index=False)
    groupFunct = lambda g: pd.Series([joinFunc(g, col) for col in g.columns], index=g.columns)
    print groups.apply(groupFunct)
output
Control Recd_Date/Due_Date \
0 2000-1703 2000-01-31 00:00:00,2000-02-29 00:00:00
1 2000-1776 2000-01-02 00:00:00,2000-01-03 00:00:00
2 2000-1983 2000-02-02 00:00:00,2000-02-03 00:00:00
Action \
0 OC/OER/OPA/PMS/
1 OC/ORA/OE/DCP/OC/ORA/ORO/PNC/FDA/OGROP/ORA/SE-...
2 FDA/OGROP/ORA/CE-FO/CHI-DO/CHI-CB/
Signature/Requester
0 M WEBB,DATA CORP
1 G KAN,PALM POST
2 M EGAN,BERNSTEIN LIEBHARD & LONDON LLP
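On newer pandas versions (where .ix is gone), the same grouping idea can be expressed without the static-variable trick by forward-filling Control and aggregating. A rough sketch only, assuming string-typed columns; note it does not reproduce the '&' / '-' join cleanup done by the regexes above:

import pandas as pd

# label each row with the last non-null Control value, then join the other columns per group
key = df['Control'].ffill()
out = (df.groupby(key, sort=False)
         .agg({'Recd_Date/Due_Date': lambda s: ','.join(s.dropna().astype(str)),
               'Action': lambda s: ''.join(s.dropna().astype(str)),
               'Signature/Requester': lambda s: ','.join(s.dropna().astype(str))})
         .reset_index())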