I have got a start date ('2019-11-18') and an end date ('2021-02-19'). I am trying to create a list of all the weeks of each month that exist between the start and end date. My expected result should be something like this:
list = ['2019.Nov.3','2019.Nov.4', '2019.Nov.5' .... '2021.Feb.2','2021.Feb.3']
If the first or last date of a month lands on a Wednesday, I will assume that the week belongs to this month (as 3 out of the 5 working days will belong to this month).
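That rule can be expressed directly in code: a week's owning month is the month of its Wednesday. A minimal sketch (owning_month is a hypothetical helper, not part of the final solution):

```python
from datetime import date, timedelta

def owning_month(any_day: date) -> str:
    """Return the year.month that owns the week containing any_day,
    based on that week's Wednesday (3 of the 5 working days fall there)."""
    wednesday = any_day + timedelta(days=2 - any_day.weekday())  # Mon=0 .. Wed=2
    return wednesday.strftime('%Y.%b')

# 2019-11-29 is a Friday; its week's Wednesday (Nov 27) is still in November
print(owning_month(date(2019, 11, 29)))  # 2019.Nov
# 2019-12-02 is a Monday; that week's Wednesday (Dec 4) is in December
print(owning_month(date(2019, 12, 2)))   # 2019.Dec
```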
I was actually successful in creating a dataframe with all the weeks of the year that exist between the start and end date using the following code:
import pandas as pd
from datetime import datetime

date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')

timeline = pd.DataFrame(columns=['Year', 'Week'])

def create_list(df):
    start_year = int(first_date.isocalendar()[0])
    start_week = int(first_date.isocalendar()[1])
    end_year = int(last_date.isocalendar()[0])
    end_week = int(last_date.isocalendar()[1])
    while start_year < (end_year + 1):
        if start_year == end_year:
            while start_week < (end_week + 1):
                if len(str(start_week)) == 1:
                    week = f'{start_year}' + '.0' + f'{start_week}'
                else:
                    week = f'{start_year}' + '.' + f'{start_week}'
                df = df.append({'Year': start_year, 'Week': week}, ignore_index=True)
                start_week += 1
        else:
            while start_week < 53:
                if len(str(start_week)) == 1:
                    week = f'{start_year}' + '.0' + f'{start_week}'
                else:
                    week = f'{start_year}' + '.' + f'{start_week}'
                df = df.append({'Year': start_year, 'Week': week}, ignore_index=True)
                start_week += 1
        start_year += 1
        start_week = 1
    return df

timeline = create_list(timeline)
I was able to use this as the x axis for my line graph. However, the axis is a bit hard to read and it's very difficult to know which week belongs to which month.
I would really appreciate it if someone could give me a hand with this!
Edit:
So here is the solution, with the guidance of @Serge Ballesta. I hope it helps anyone who might need something similar in the future!
import pandas as pd
import dateutil.relativedelta
from datetime import datetime
def year_week(date):
    if len(str(date.isocalendar()[1])) == 1:
        return f'{date.isocalendar()[0]}' + '.0' + f'{date.isocalendar()[1]}'
    else:
        return f'{date.isocalendar()[0]}' + '.' + f'{date.isocalendar()[1]}'
date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')
set_first_date = str((first_date - dateutil.relativedelta.relativedelta(months=1)).date())
set_last_date = str((last_date + dateutil.relativedelta.relativedelta(months=1)).date())
s = pd.date_range(set_first_date, set_last_date, freq='W-WED'
).to_series(name='wed').reset_index(drop=True)
df = s.to_frame()
df['week'] = df.apply(lambda x: year_week(x['wed']), axis=1)
df = df.assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
).cumcount() + 1)
df = df[(s >= pd.Timestamp('2019-11-18'))
& (s <= pd.Timestamp('2021-02-19'))]
df['month_week'] = (df['wed'].dt.strftime('%Y.%b.') + df['week_of_month'].astype(str)).tolist()
df = df.drop(['wed', 'week_of_month'], axis=1)
print(df)
Printed df:
week month_week
4 2019.47 2019.Nov.3
5 2019.48 2019.Nov.4
6 2019.49 2019.Dec.1
7 2019.50 2019.Dec.2
8 2019.51 2019.Dec.3
.. ... ...
65 2021.03 2021.Jan.3
66 2021.04 2021.Jan.4
67 2021.05 2021.Feb.1
68 2021.06 2021.Feb.2
69 2021.07 2021.Feb.3
I would build a Series of timestamps with a frequency of W-WED to have consistently Wednesday as day of week. That way, we immediately get the correct month for the week.
To have the number of the week in the month, I would start one month before the required start, and use a cumcount on year-month + 1. Then it would be enough to filter only the expected range and properly format the values:
# produce a series of wednesdays starting in 2019-10-01
s = pd.date_range('2019-10-01', '2021-03-31', freq='W-WED'
).to_series(name='wed').reset_index(drop=True)
# compute the week number in the month
df = s.to_frame().assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
).cumcount() + 1)
# filter the required range
df = df[(s >= pd.Timestamp('2019-11-18'))
& (s <= pd.Timestamp('2021-02-19'))]
# here is the expected list
lst = (df['wed'].dt.strftime('%Y.%b.')+df['week_of_month'].astype(str)).tolist()
lst is as expected:
['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4',
'2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Jan.5', '2020.Feb.1',
'2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3',
'2020.Mar.4', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.Apr.5',
'2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.Jun.1', '2020.Jun.2',
'2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4',
'2020.Jul.5', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Sep.1',
'2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Sep.5', '2020.Oct.1', '2020.Oct.2',
'2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4',
'2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2020.Dec.5', '2021.Jan.1',
'2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Feb.1', '2021.Feb.2', '2021.Feb.3']
This may not give you exactly what you need (because of the "3 out of 5 days in the last week" condition), but maybe you can get an idea of how to tweak it to get your desired result.
You can export column res to a list with df['res'].to_list().
df = pd.DataFrame({'date': pd.date_range('2019-11-18','2021-02-19', freq=pd.offsets.Week(weekday=0))})
df['year_wk']= df.date.apply(lambda x: x.strftime("%W")).astype(int)
df['mon_beg_wk']= df.date.dt.to_period('M').dt.to_timestamp().dt.strftime("%W").astype(int)
df['mon_wk']= df['year_wk']-df['mon_beg_wk']
df['res']= df['date'].dt.strftime("%Y.%b")+'.'+df['mon_wk'].astype(str)
df
Output
date year_wk mon_beg_wk mon_wk res
0 2019-11-18 46 43 3 2019.Nov.3
1 2019-11-25 47 43 4 2019.Nov.4
2 2019-12-02 48 47 1 2019.Dec.1
3 2019-12-09 49 47 2 2019.Dec.2
4 2019-12-16 50 47 3 2019.Dec.3
5 2019-12-23 51 47 4 2019.Dec.4
6 2019-12-30 52 47 5 2019.Dec.5
7 2020-01-06 1 0 1 2020.Jan.1
8 2020-01-13 2 0 2 2020.Jan.2
9 2020-01-20 3 0 3 2020.Jan.3
10 2020-01-27 4 0 4 2020.Jan.4
11 2020-02-03 5 4 1 2020.Feb.1
12 2020-02-10 6 4 2 2020.Feb.2
13 2020-02-17 7 4 3 2020.Feb.3
14 2020-02-24 8 4 4 2020.Feb.4
15 2020-03-02 9 8 1 2020.Mar.1
16 2020-03-09 10 8 2 2020.Mar.2
17 2020-03-16 11 8 3 2020.Mar.3
18 2020-03-23 12 8 4 2020.Mar.4
19 2020-03-30 13 8 5 2020.Mar.5
20 2020-04-06 14 13 1 2020.Apr.1
21 2020-04-13 15 13 2 2020.Apr.2
22 2020-04-20 16 13 3 2020.Apr.3
23 2020-04-27 17 13 4 2020.Apr.4
24 2020-05-04 18 17 1 2020.May.1
25 2020-05-11 19 17 2 2020.May.2
26 2020-05-18 20 17 3 2020.May.3
27 2020-05-25 21 17 4 2020.May.4
28 2020-06-01 22 22 0 2020.Jun.0
29 2020-06-08 23 22 1 2020.Jun.1
... ... ... ... ... ...
36 2020-07-27 30 26 4 2020.Jul.4
37 2020-08-03 31 30 1 2020.Aug.1
38 2020-08-10 32 30 2 2020.Aug.2
39 2020-08-17 33 30 3 2020.Aug.3
40 2020-08-24 34 30 4 2020.Aug.4
41 2020-08-31 35 30 5 2020.Aug.5
42 2020-09-07 36 35 1 2020.Sep.1
43 2020-09-14 37 35 2 2020.Sep.2
44 2020-09-21 38 35 3 2020.Sep.3
45 2020-09-28 39 35 4 2020.Sep.4
46 2020-10-05 40 39 1 2020.Oct.1
47 2020-10-12 41 39 2 2020.Oct.2
48 2020-10-19 42 39 3 2020.Oct.3
49 2020-10-26 43 39 4 2020.Oct.4
50 2020-11-02 44 43 1 2020.Nov.1
51 2020-11-09 45 43 2 2020.Nov.2
52 2020-11-16 46 43 3 2020.Nov.3
53 2020-11-23 47 43 4 2020.Nov.4
54 2020-11-30 48 43 5 2020.Nov.5
55 2020-12-07 49 48 1 2020.Dec.1
56 2020-12-14 50 48 2 2020.Dec.2
57 2020-12-21 51 48 3 2020.Dec.3
58 2020-12-28 52 48 4 2020.Dec.4
59 2021-01-04 1 0 1 2021.Jan.1
60 2021-01-11 2 0 2 2021.Jan.2
61 2021-01-18 3 0 3 2021.Jan.3
62 2021-01-25 4 0 4 2021.Jan.4
63 2021-02-01 5 5 0 2021.Feb.0
64 2021-02-08 6 5 1 2021.Feb.1
65 2021-02-15 7 5 2 2021.Feb.2
I used datetime.timedelta to do this. It is supposed to work for all start and end dates.
import datetime
import math
date_1 = '18-11-19'
first_date = datetime.datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.datetime.strptime(date_2, '%d-%m-%y')
start_week_m = math.ceil(int(first_date.strftime("%d")) / 7)  # week number within the first month
daysTill_nextWeek = 7 - int(first_date.strftime("%w"))  # number of days to the next Sunday
date_template = '%Y.%b.'
tempdate = first_date
weeks = ['%s%d' % (tempdate.strftime(date_template), start_week_m)]
tempdate = tempdate + datetime.timedelta(days=daysTill_nextWeek)  # tempdate becomes the next Sunday

while tempdate < last_date:
    temp_month = int(tempdate.strftime("%m"))
    weeks.append('%s%d' % (tempdate.strftime(date_template), start_week_m + 1))
    start_week_m += 1
    tempdate = tempdate + datetime.timedelta(days=7)
    if temp_month != int(tempdate.strftime("%m")):
        start_week_m = 0
print(weeks)
prints
['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4', '2019.Dec.5', '2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Feb.1', '2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3', '2020.Mar.4', '2020.Mar.5', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.May.5', '2020.Jun.1', '2020.Jun.2', '2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Aug.5', '2020.Sep.1', '2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Oct.1', '2020.Oct.2', '2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4', '2020.Nov.5', '2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2021.Jan.1', '2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Jan.5', '2021.Feb.1', '2021.Feb.2']
Related
I am having issues finding a solution for the cumulative sum for MTD and YTD.
I need help to get this result.
Use groupby.cumsum combined with periods using to_period:
# ensure datetime
s = pd.to_datetime(df['date'], dayfirst=False)
# group by year
df['ytd'] = df.groupby(s.dt.to_period('Y'))['count'].cumsum()
# group by month
df['mtd'] = df.groupby(s.dt.to_period('M'))['count'].cumsum()
Example (with dummy data):
date count ytd mtd
0 2022-08-26 6 6 6
1 2022-08-27 1 7 7
2 2022-08-28 4 11 11
3 2022-08-29 4 15 15
4 2022-08-30 8 23 23
5 2022-08-31 4 27 27
6 2022-09-01 6 33 6
7 2022-09-02 3 36 9
8 2022-09-03 5 41 14
9 2022-09-04 8 49 22
10 2022-09-05 7 56 29
11 2022-09-06 9 65 38
12 2022-09-07 9 74 47
Considering a df structured like this
Time X
01-01-18 1
01-02-18 20
01-03-18 34
01-04-18 67
01-01-18 89
01-02-18 45
01-03-18 22
01-04-18 1
01-01-19 11
01-02-19 6
01-03-19 78
01-04-19 5
01-01-20 23
01-02-20 6
01-03-20 9
01-04-20 56
01-01-21 78
01-02-21 33
01-03-21 2
01-04-21 67
I want to de-trend the time series from February to April for each year and append it to a new column Y.
So far I thought something like this
from datetime import date, timedelta
import numpy as np
import pandas as pd
from scipy import signal
df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['Y'] = np.nan
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = df.date(2018, 1, 2)
end_date = df.date(2018, 1, 4)

for date in daterange(start_date, end_date):
    df['Y'] = signal.detrend(df['X'])
My concern is that it would iterate over single observations and not over the trend of the selected period. Any way to fix it?
Another issue is how to iterate it over all the years without changing start/end dates each time
When converting strings to the datetime format, you can specify the format directly. infer_datetime_format can mix up day and month.
df['date'] = pd.to_datetime(df['Time'], format='%d-%m-%y')
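A quick illustration of why the explicit format matters (the date string here is made up for the example):

```python
import pandas as pd

# '01-02-18' is ambiguous: 1 February 2018 or January 2nd, 2018?
print(pd.to_datetime('01-02-18', format='%d-%m-%y').date())  # 2018-02-01
print(pd.to_datetime('01-02-18', format='%m-%d-%y').date())  # 2018-01-02
```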
from scipy.signal import detrend
IIUC, here are two ways to achieve what you want:
1.
I would prefer this way - using .apply():
def f(df):
    result = df['X'].copy()
    months = df['date'].dt.month
    mask = (months >= 2) & (months <= 4)
    result[mask] = detrend(result[mask])
    return result
df['new'] = df.groupby(df['date'].dt.year, group_keys=False).apply(f)
2.
Another way - using .transform():
ser = df['X'].copy()
ser.index = df['date']
def f(s):
    result = s.copy()
    months = s.index.month
    mask = (months >= 2) & (months <= 4)
    result[mask] = detrend(result[mask])
    return result
new = ser.groupby(ser.index.year).transform(f)
new.index = df.index
df['new'] = new
Result:
date X new
0 2018-01-01 1 1.000000
1 2018-02-01 20 -22.428571
2 2018-03-01 34 -4.057143
3 2018-04-01 67 33.314286
4 2018-01-01 89 89.000000
5 2018-02-01 45 15.685714
6 2018-03-01 22 -2.942857
7 2018-04-01 1 -19.571429
8 2019-01-01 11 11.000000
9 2019-02-01 6 -24.166667
10 2019-03-01 78 48.333333
11 2019-04-01 5 -24.166667
12 2020-01-01 23 23.000000
13 2020-02-01 6 7.333333
14 2020-03-01 9 -14.666667
15 2020-04-01 56 7.333333
16 2021-01-01 78 78.000000
17 2021-02-01 33 16.000000
18 2021-03-01 2 -32.000000
19 2021-04-01 67 16.000000
So I have a data frame that is something like this
Resource 2020-06-01 2020-06-02 2020-06-03
Name1 8 7 8
Name2 7 9 9
Name3 10 10 10
Imagine that the header is literally all the days of the month, and that there are far more names than just three.
I need to reduce the columns to five: the first column covers the days from 2020-06-01 through 2020-06-05; after that, each column covers Saturday through Friday of the same week, or through the last day of the month if it comes before Friday. So for June the weeks would be:
week 1: 2020-06-01 to 2020-06-05
week 2: 2020-06-06 to 2020-06-12
week 3: 2020-06-13 to 2020-06-19
week 4: 2020-06-20 to 2020-06-26
week 5: 2020-06-27 to 2020-06-30
I have no problem defining these weeks. The problem is grouping the columns based on them.
I couldn't come up with anything.
Does someone have any ideas about this?
I had to use this code to generate your dataframe:
import numpy as np
import pandas as pd

dates = pd.date_range(start='2020-06-01', end='2020-06-30')
df = pd.DataFrame({
    'Name1': np.random.randint(1, 10, size=len(dates)),
    'Name2': np.random.randint(1, 10, size=len(dates)),
    'Name3': np.random.randint(1, 10, size=len(dates)),
})
df = df.set_index(dates).transpose().reset_index().rename(columns={'index': 'Resource'})
Then, the solution starts from here.
# Set the first column as index
df = df.set_index(df['Resource'])
# Remove the unused column
df = df.drop(columns=['Resource'])
# Transpose the dataframe
df = df.transpose()
# Output:
Resource Name1 Name2 Name3
2020-06-01 00:00:00 3 2 7
2020-06-02 00:00:00 5 6 8
2020-06-03 00:00:00 2 3 6
...
# Bring "Resource" from index to column
df = df.reset_index()
df = df.rename(columns={'index': 'Resource'})
# Add a column "week of year"
df['week_no'] = df['Resource'].dt.weekofyear
# You can simply group by the week no column
df.groupby('week_no').sum().reset_index()
# Output:
Resource week_no Name1 Name2 Name3
0 23 38 42 41
1 24 37 30 43
2 25 38 29 23
3 26 29 40 42
4 27 2 8 3
I don't know what you want to do next. If you want your original form, just transpose() it back.
EDIT: OP clarified that the week should start on Saturday and end on Friday
# 0: Monday
# 1: Tuesday
# 2: Wednesday
# 3: Thursday
# 4: Friday
# 5: Saturday
# 6: Sunday
df['weekday'] = df['Resource'].dt.weekday.apply(lambda day: 0 if day <= 4 else 1)
df['customised_weekno'] = df['week_no'] + df['weekday']
Output:
Resource Resource Name1 Name2 Name3 week_no weekday customised_weekno
0 2020-06-01 4 7 7 23 0 23
1 2020-06-02 8 6 7 23 0 23
2 2020-06-03 5 9 5 23 0 23
3 2020-06-04 7 6 5 23 0 23
4 2020-06-05 6 3 7 23 0 23
5 2020-06-06 3 7 6 23 1 24
6 2020-06-07 5 4 4 23 1 24
7 2020-06-08 8 1 5 24 0 24
8 2020-06-09 2 7 9 24 0 24
9 2020-06-10 4 2 7 24 0 24
10 2020-06-11 6 4 4 24 0 24
11 2020-06-12 9 5 7 24 0 24
12 2020-06-13 2 4 6 24 1 25
13 2020-06-14 6 7 5 24 1 25
14 2020-06-15 8 7 7 25 0 25
15 2020-06-16 4 3 3 25 0 25
16 2020-06-17 6 4 5 25 0 25
17 2020-06-18 6 8 2 25 0 25
18 2020-06-19 3 1 2 25 0 25
So, you can use customised_weekno for grouping.
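As an alternative to a customised week number, the question's five ranges can also be turned into explicit bins and the date columns grouped with pd.cut. A sketch with made-up data (the frame, edges, and labels are all assumptions):

```python
import numpy as np
import pandas as pd

# hypothetical data in the question's shape: rows are names, columns are the days of June 2020
dates = pd.date_range('2020-06-01', '2020-06-30')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(3, len(dates))),
                  index=['Name1', 'Name2', 'Name3'], columns=dates)

# the question's five bins; each edge is the day *before* a week starts,
# because pd.cut intervals are open on the left and closed on the right
edges = pd.to_datetime(['2020-05-31', '2020-06-05', '2020-06-12',
                        '2020-06-19', '2020-06-26', '2020-06-30'])
labels = ['week 1', 'week 2', 'week 3', 'week 4', 'week 5']
bins = pd.cut(df.columns, bins=edges, labels=labels)

# group the date columns by their bin and sum within each week
weekly = df.T.groupby(bins, observed=True).sum().T
print(weekly)
```

The result has one column per week and one row per name, with each cell the sum over that week's days.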
I have a dataframe with columns date, day and week of the year.
I need to use week of the year to create a new column with values from 1 to 5.
Let's say I'm on week 35: all the rows with week 35 should have 1, the rows with week 36 should have 2, and so on.
Once it reaches week 40 and the number 5, the numbers in the new column need to restart from 1 at week 41 and continue in this pattern for however long the date range is.
def date_table(start='2019-08-26', end='2019-10-27'):
    df = pd.DataFrame({"Date": pd.date_range(start, end)})
    df["Day"] = df.Date.dt.weekday_name
    df["Week"] = df.Date.dt.weekofyear
    return df
Calculate the index using modulo and the weeknumber:
import pandas as pd
start='2019-08-26'
end='2019-10-27'
df = pd.DataFrame({"Date": pd.date_range(start, end)})
df["Day"] = df.Date.dt.weekday_name
df["Week"] = df.Date.dt.weekofyear
df["idx"] = df["Week"] % 5 + 1  # n % 5 == 0..4, plus 1 == 1..5
print(df)
Output:
0 2019-08-26 Monday 35 1
[...]
6 2019-09-01 Sunday 35 1
7 2019-09-02 Monday 36 2
[...]
13 2019-09-08 Sunday 36 2
14 2019-09-09 Monday 37 3
[...]
20 2019-09-15 Sunday 37 3
21 2019-09-16 Monday 38 4
[...]
27 2019-09-22 Sunday 38 4
28 2019-09-23 Monday 39 5
[...]
34 2019-09-29 Sunday 39 5
35 2019-09-30 Monday 40 1
[...]
[63 rows x 4 columns]
If you want it to start on a week number that is not divisible by 5, you can do that too by subtracting the first week's modulo-5 value from all week numbers:
# start week number:
startweekmod = df["Week"][0] % 5
# offset by the initial week's mod
df["idx"] = (df["Week"] - startweekmod) % 5 + 1
I have a data frame with index members which looks like this (A,B,C,... are the company names):
df_members
Date 1 2 3 4
0 2016-01-01 A B C D
1 2016-01-02 B C D E
2 2016-01-03 C D E F
3 2016-01-04 F A B C
4 2016-01-05 B C D E
5 2016-01-06 A B C D
and I have a second table including e.g. prices:
df_prices
Date A B C D E F
0 2015-12-30 1 2 3 4 5 6
1 2015-12-31 7 8 9 10 11 12
2 2016-01-01 13 14 15 16 17 18
3 2016-01-02 20 21 22 23 24 25
4 2016-01-03 27 28 29 30 31 32
5 2016-01-04 34 35 36 37 38 39
6 2016-01-05 41 42 43 44 45 46
7 2016-01-06 48 49 50 51 52 53
The goal is to replace all company names in df_members with the prices from df_prices, resulting in df_result:
df_result
Date 1 2 3 4
0 2016-01-01 13 14 15 16
1 2016-01-02 21 22 23 24
2 2016-01-03 29 30 31 32
3 2016-01-04 39 34 35 36
4 2016-01-05 42 43 44 45
5 2016-01-06 48 49 50 51
I already have a solution where I iterate through all cells in df_members, look for the values in df_prices and write them in a new data frame df_result. The problem is that my data frames are very large and this process takes around 7 hours.
I already tried to use the merge/join, map, and lookup functions, but they could not solve the problem.
My approach is the following:
# Create new dataframes
df_result = pd.DataFrame(columns=df_members.columns, index=unique_dates_list)
# Load prices
df_prices = prices
# Search ticker & write values in new dataframe
for i in range(0, len(df_members)):
    for j in range(0, len(df_members.columns)):
        if str(df_members.iloc[i, j]) != 'nan' and df_members.iloc[i, j] in df_prices.columns:
            df_result.iloc[i, j] = df_prices.iloc[i, df_prices.columns.get_loc(df_members.iloc[i, j])]
Question: Is there a way to map the values more efficiently?
DataFrame.lookup() will do what you need:
Code:
df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
    df_result[column] = df_prices.lookup(
        df_members.index, df_members[column])
Test Code:
import pandas as pd
from io import StringIO

df_members = pd.read_fwf(StringIO(
u"""
Date 1 2 3 4
2016-01-01 A B C D
2016-01-02 B C D E
2016-01-03 C D E F
2016-01-04 F A B C
2016-01-05 B C D E
2016-01-06 A B C D"""
), header=1).set_index('Date')
df_prices = pd.read_fwf(StringIO(
u"""
Date A B C D E F
2015-12-30 1 2 3 4 5 6
2015-12-31 7 8 9 10 11 12
2016-01-01 13 14 15 16 17 18
2016-01-02 20 21 22 23 24 25
2016-01-03 27 28 29 30 31 32
2016-01-04 34 35 36 37 38 39
2016-01-05 41 42 43 44 45 46
2016-01-06 48 49 50 51 52 53"""
), header=1).set_index('Date')
df_result = pd.DataFrame(columns=[], index=df_members.index)
for column in df_members.columns:
    df_result[column] = df_prices.lookup(
        df_members.index, df_members[column])
print(df_result)
Results:
1 2 3 4
Date
2016-01-01 13 14 15 16
2016-01-02 21 22 23 24
2016-01-03 29 30 31 32
2016-01-04 39 34 35 36
2016-01-05 42 43 44 45
2016-01-06 48 49 50 51
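Worth noting: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. On recent versions the same per-column mapping can be reproduced with NumPy fancy indexing. A sketch using toy frames with the same shapes as above (the frame construction is made up for the example):

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03',
                        '2016-01-04', '2016-01-05', '2016-01-06'])
df_members = pd.DataFrame({1: list('ABCFBA'), 2: list('BCDACB'),
                           3: list('CDEBDC'), 4: list('DEFCED')}, index=dates)
# same price pattern as the example data: row i, column j holds 13 + 7*i + j
df_prices = pd.DataFrame(13 + 7 * np.arange(6)[:, None] + np.arange(6),
                         index=dates, columns=list('ABCDEF'))

# align price rows to member rows, then index by (row, column) pairs
aligned = df_prices.reindex(df_members.index)
values = aligned.to_numpy()
rows = np.arange(len(df_members))
df_result = df_members.copy()
for col in df_members.columns:
    cols = aligned.columns.get_indexer(df_members[col])
    df_result[col] = values[rows, cols]
print(df_result)
```

The reindex step also handles the case where df_prices has extra dates not present in df_members.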