Nested for loop in Python pandas not functioning as desired

Code to generate a random DataFrame for the question (minimal reproducible example):
import numpy as np
import pandas as pd

df_random = pd.DataFrame(np.random.random((2000, 3)))
df_random['order_date'] = pd.date_range(start='1/1/2015',
                                        periods=len(df_random), freq='D')
df_random['customer_id'] = np.random.randint(1, 20, df_random.shape[0])
df_random
Output of df_random:
             0         1         2 order_date  customer_id
0     0.018473  0.970257  0.605428 2015-01-01           12
...        ...       ...       ...        ...          ...
1999  0.800139  0.746605  0.551530 2020-06-22           11
Code to extract the mean number of days between transactions, month- and year-wise:
for y in (2015,2019):
    for x in (1,13):
        df2 = df_random[(df_random['order_date'].dt.month == x) & (df_random['order_date'].dt.year == y)]
        df2.sort_values(['customer_id','order_date'], inplace=True)
        df2["days"] = df2.groupby("customer_id")["order_date"].apply(lambda x: (x - x.shift()) / np.timedelta64(1, "D"))
        df_mean = round(df2['days'].mean(), 2)
        data2 = data.append(pd.DataFrame({'Mean': df_mean, 'Month': x, 'Year': y}, index=[0]), ignore_index=True)
print(data2)
Expected output:
    Mean  Month  Year
0   5.00      1  2015
...
11  6.62     12  2015
... (mean days between transactions in order_date for 2016 and 2017, Jan to Dec) ...
36  6.03      1  2018
...
47  6.76     12  2018
48  8.40      1  2019
...
59  8.40     12  2019
Basically I want a single dataframe covering January 2015 through December 2019.
Instead of the expected output, I am getting a dataframe from Jan 2015 to Dec 2018, then Jan 2015 again, and then the entire dataset repeats from 2015 to 2018 many more times.
Please help.

The loop iterates over the two-element tuples (2015,2019) and (1,13) rather than ranges, and each pass appends to data instead of the accumulating data2, which is why the rows repeat. Try this:
data2 = pd.DataFrame([])
for y in range(2015, 2020):
    for x in range(1, 13):
        df2 = df_random[(df_random['order_date'].dt.month == x) & (df_random['order_date'].dt.year == y)]
        df_mean = df2.groupby("customer_id")["order_date"].apply(lambda x: (x - x.shift()) / np.timedelta64(1, "D")).mean().round(2)
        data2 = data2.append(pd.DataFrame({'Mean': df_mean, 'Month': x, 'Year': y}, index=[0]), ignore_index=True)
print(data2)
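Note that DataFrame.append was removed in pandas 2.0, so on current pandas the same loop can collect plain dicts and concatenate once at the end; a minimal sketch assuming the df_random frame built above:
rows = []
for y in range(2015, 2020):
    for x in range(1, 13):
        df2 = df_random[(df_random['order_date'].dt.month == x) & (df_random['order_date'].dt.year == y)]
        # days between consecutive orders per customer, then the overall mean
        gaps = (df2.sort_values(['customer_id', 'order_date'])
                   .groupby('customer_id')['order_date']
                   .apply(lambda s: (s - s.shift()) / np.timedelta64(1, 'D')))
        rows.append({'Mean': round(gaps.mean(), 2), 'Month': x, 'Year': y})
data2 = pd.DataFrame(rows)
print(data2)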
Try this:
df_random.order_date = pd.to_datetime(df_random.order_date)
df_random = df_random.set_index(pd.DatetimeIndex(df_random['order_date']))
output = df_random.groupby(pd.Grouper(freq="M"))[[0,1,2]].agg(np.mean).reset_index()
output['month'] = output.order_date.dt.month
output['year'] = output.order_date.dt.year
output = output.drop('order_date', axis=1)
output
Output
0 1 2 month year
0 0.494818 0.476514 0.496059 1 2015
1 0.451611 0.437638 0.536607 2 2015
2 0.476262 0.567519 0.528129 3 2015
3 0.519229 0.475887 0.612433 4 2015
4 0.464781 0.430593 0.445455 5 2015
... ... ... ... ... ...
61 0.416540 0.564928 0.444234 2 2020
62 0.553787 0.423576 0.422580 3 2020
63 0.524872 0.470346 0.560194 4 2020
64 0.530440 0.469957 0.566077 5 2020
65 0.584474 0.487195 0.557567 6 2020

Avoid any looping and simply include year and month in the groupby calculation:
np.random.seed(1022020)
...
# ASSIGN MONTH AND YEAR COLUMNS, THEN SORT ROWS
df_random = (df_random.assign(month = lambda x: x['order_date'].dt.month,
                              year = lambda x: x['order_date'].dt.year)
                      .sort_values(['customer_id', 'order_date']))
# GROUP BY CALCULATION
df_random["days"] = (df_random.groupby(["customer_id", "year", "month"])["order_date"]
                              .apply(lambda x: (x - x.shift()) / np.timedelta64(1, "D")))
# FINAL MEAN AGGREGATION BY YEAR AND MONTH
final_df = (df_random.groupby(["year", "month"], as_index=False)["days"].mean().round(2)
                     .rename(columns={"days":"mean"}))
print(final_df.head())
# year month mean
# 0 2015 1 8.43
# 1 2015 2 5.87
# 2 2015 3 4.88
# 3 2015 4 10.43
# 4 2015 5 8.12
print(final_df.tail())
# year month mean
# 61 2020 2 8.27
# 62 2020 3 8.41
# 63 2020 4 8.81
# 64 2020 5 9.12
# 65 2020 6 7.00
For multiple aggregates, replace the single groupby.mean() with groupby.agg():
final_df = (df_random.groupby(["year", "month"])["days"]
                     .agg(['count', 'min', 'mean', 'median', 'max'])
                     .round(2))
print(final_df.head())
# count min mean median max
# year month
# 2015 1 14 1.0 8.43 5.0 25.0
# 2 15 1.0 5.87 5.0 17.0
# 3 16 1.0 4.88 5.0 9.0
# 4 14 1.0 10.43 7.5 23.0
# 5 17 2.0 8.12 8.0 17.0
print(final_df.tail())
# count min mean median max
# year month
# 2020 2 15 1.0 8.27 6.0 21.0
# 3 17 1.0 8.41 7.0 16.0
# 4 16 1.0 8.81 7.0 20.0
# 5 16 1.0 9.12 7.0 22.0
# 6 7 2.0 7.00 7.0 17.0
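On pandas 0.25+, named aggregation is an alternative sketch that names the output columns directly and, with a reset_index(), restores year and month as ordinary columns (the output column names here are illustrative):
# a sketch assuming the df_random/"days" setup above
final_df = (df_random.groupby(["year", "month"])["days"]
                     .agg(n="count", mean_days="mean", median_days="median")
                     .round(2)
                     .reset_index())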


Pandas Method Chaining: getting KeyError on calculated column

I’m scraping web data to get US college football poll top 25 information that I store in a Pandas dataframe. The data has multiple years of poll information, with preseason and final polls for each year. Each poll ranks teams from 1 to 25. Team ranks are determined by the voting points each team received; the team with most points is ranked 1, etc. Both rank and points are included in the dataset. Here's the head of the raw data df:
cols = ['Year','Type', 'Team (FPV)', 'Rank', 'Pts']
all_wks_raw[cols].head()
The dataframe has columns for Rank and Pts (Points). The Rank column (dtype object) contains numeric ranks of 1-25 plus “RV” for teams that received points but did not rank in the top 25. The Pts column is dtype int64. Since Pts for teams that did not make the top 25 are included in the data, I’m able to re-rank the teams based on Pts and thus extend rankings beyond the top 25. The resulting revrank column ranks teams from 1 to between 37 and 61, depending on how many teams received points in that poll. Revrank is the first new column I create.
The revrank column should equal the Rank column for the first 25 teams, but before I can test it I need to create a new column that converts Rank to numeric. The result is rank_int, which is my second created column. Then I try to create a third column that calculates the difference between the two created columns, and this is where I get the KeyError. Here's the chain:
all_wks_clean = (all_wks_raw
    #create new column that converts Rank to numeric - this works
    .assign(rank_int = pd.to_numeric(all_wks_raw['Rank'], errors='coerce').fillna(0))
    #create new column that re-ranks teams based on Points: extends rankings beyond original 25 - this works
    .assign(gprank = all_wks_raw.reset_index(drop=True).groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'))
    #create new column that takes the difference between gprank and rank_int columns created above - this fails with KeyError: 'gprank'
    .assign(ck_rank = all_wks_raw['gprank'] - all_wks_raw['rank_int'])
)
Are the results of the first two assignments not being passed to the third? Am I missing something in the syntax? Thanks for the help.
Edited 7/20/2022 to add complete code; note that this code scrapes data from the College Poll Archive web site:
dict = {1119: [2016, '2016 Final AP Football Poll', 'Final'], 1120: [2017, '2017 Preseason AP Football Poll', 'Preseason'],
        1135: [2017, '2017 Final AP Football Poll', 'Final'], 1136: [2018, '2018 Preseason AP Football Poll', 'Preseason'],
        1151: [2018, '2018 Final AP Football Poll', 'Final'], 1152: [2019, '2019 Preseason AP Football Poll', 'Preseason']}

#get one week of poll data from College Poll Archive ID parameter
def getdata(id):
    coldefs = {'ID':key, 'Year': value[0], 'Title': value[1], 'Type':value[2]} #define dictionary of scalar columns to add to dataframe
    urlseg = 'https://www.collegepollarchive.com/football/ap/seasons.cfm?appollid='
    url = urlseg + str(id)
    dfs = pd.read_html(url)
    df = dfs[0].assign(**coldefs)
    return df

all_wks_raw = pd.DataFrame()
for key, value in dict.items():
    print(key, value[0], value[2])
    onewk = getdata(key)
    all_wks_raw = all_wks_raw.append(onewk)
all_wks_clean = (all_wks_raw
    #create new column that converts Rank to numeric - this works
    .assign(rank_int = pd.to_numeric(all_wks_raw['Rank'], errors='coerce').fillna(0))
    #create new column that re-ranks teams based on Points: extends rankings beyond original 25 - this works
    .assign(gprank = all_wks_raw.reset_index(drop=True).groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'))
    #create new column that takes the difference between gprank and rank_int columns created above - this fails with KeyError: 'gprank'
    .assign(ck_rank = all_wks_raw['gprank'] - all_wks_raw['rank_int'])
)
If accessing a column that doesn't yet exist, that must be done through a lambda:
dfs = pd.read_html('https://www.collegepollarchive.com/football/ap/seasons.cfm?seasonid=2019')
df = dfs[0][['Team (FPV)', 'Rank', 'Pts']].copy()
df['Year'] = 2016
df['Type'] = 'final'
df = df.assign(rank_int = pd.to_numeric(df['Rank'], errors='coerce').fillna(0).astype(int),
               gprank = df.groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'),
               ck_rank = lambda x: x['gprank'].sub(x['rank_int']))
print(df)
Output:
Team (FPV) Rank Pts Year Type rank_int gprank ck_rank
0 LSU (62) 1 1550 2016 final 1 1.0 0.0
1 Clemson 2 1487 2016 final 2 2.0 0.0
2 Ohio State 3 1426 2016 final 3 3.0 0.0
3 Georgia 4 1336 2016 final 4 4.0 0.0
4 Oregon 5 1249 2016 final 5 5.0 0.0
5 Florida 6 1211 2016 final 6 6.0 0.0
6 Oklahoma 7 1179 2016 final 7 7.0 0.0
7 Alabama 8 1159 2016 final 8 8.0 0.0
8 Penn State 9 1038 2016 final 9 9.0 0.0
9 Minnesota 10 952 2016 final 10 10.0 0.0
10 Wisconsin 11 883 2016 final 11 11.0 0.0
11 Notre Dame 12 879 2016 final 12 12.0 0.0
12 Baylor 13 827 2016 final 13 13.0 0.0
13 Auburn 14 726 2016 final 14 14.0 0.0
14 Iowa 15 699 2016 final 15 15.0 0.0
15 Utah 16 543 2016 final 16 16.0 0.0
16 Memphis 17 528 2016 final 17 17.0 0.0
17 Michigan 18 468 2016 final 18 18.0 0.0
18 Appalachian State 19 466 2016 final 19 19.0 0.0
19 Navy 20 415 2016 final 20 20.0 0.0
20 Cincinnati 21 343 2016 final 21 21.0 0.0
21 Air Force 22 209 2016 final 22 22.0 0.0
22 Boise State 23 188 2016 final 23 23.0 0.0
23 UCF 24 78 2016 final 24 24.0 0.0
24 Texas 25 69 2016 final 25 25.0 0.0
25 Texas A&M RV 54 2016 final 0 26.0 26.0
26 Florida Atlantic RV 46 2016 final 0 27.0 27.0
27 Washington RV 39 2016 final 0 28.0 28.0
28 Virginia RV 28 2016 final 0 29.0 29.0
29 USC RV 16 2016 final 0 30.0 30.0
30 San Diego State RV 13 2016 final 0 31.0 31.0
31 Arizona State RV 12 2016 final 0 32.0 32.0
32 SMU RV 10 2016 final 0 33.0 33.0
33 Tennessee RV 8 2016 final 0 34.0 34.0
34 California RV 6 2016 final 0 35.0 35.0
35 Kansas State RV 2 2016 final 0 36.0 36.0
36 Kentucky RV 2 2016 final 0 36.0 36.0
37 Louisiana RV 2 2016 final 0 36.0 36.0
38 Louisiana Tech RV 2 2016 final 0 36.0 36.0
39 North Dakota State RV 2 2016 final 0 36.0 36.0
40 Hawaii NR 0 2016 final 0 41.0 41.0
41 Louisville NR 0 2016 final 0 41.0 41.0
42 Oklahoma State NR 0 2016 final 0 41.0 41.0
Adding to BeRT2me's answer: when chaining, lambdas are pretty much always the way to go. When you use the original dataframe name, pandas looks at the dataframe as it was before the statement was executed. To avoid confusion, go with:
df = df.assign(rank_int = lambda x: pd.to_numeric(x['Rank'], errors='coerce').fillna(0).astype(int),
               gprank = lambda x: x.groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'),
               ck_rank = lambda x: x['gprank'].sub(x['rank_int']))
The x you define is the dataframe at that state in the chain.
This helps especially when your chains get longer. E.g., if you filter out some rows or aggregate, you can get different results (or an error) depending on what you're trying to do.
For example, if you were just looking at the relative rank of 3 teams:
df = pd.DataFrame({
    'Team (FPV)': list('abcde'),
    'Rank': list(range(5)),
    'Pts': list(range(5)),
})
df['Year'] = 2016
df['Type'] = 'final'
df = (df
    .loc[lambda x: x['Team (FPV)'].isin(["b", "c", "d"])]
    .assign(bcd_rank = lambda x: x.groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'))
)
print(df)
gives:
Team (FPV) Rank Pts Year Type bcd_rank
1 b 1 1 2016 final 3.0
2 c 2 2 2016 final 2.0
3 d 3 3 2016 final 1.0
Whereas:
df = pd.DataFrame({
    'Team (FPV)': list('abcde'),
    'Rank': list(range(5)),
    'Pts': list(range(5)),
})
df['Year'] = 2016
df['Type'] = 'final'
df = (df
    .loc[lambda x: x['Team (FPV)'].isin(["b", "c", "d"])]
    .assign(bcd_rank = df.groupby(['Year','Type'])['Pts'].rank(ascending=0,method='min'))
)
print(df)
gives a different ranking:
Team (FPV) Rank Pts Year Type bcd_rank
1 b 1 1 2016 final 4.0
2 c 2 2 2016 final 3.0
3 d 3 3 2016 final 2.0
If you want to go deeper, I'd recommend https://tomaugspurger.github.io/method-chaining.html to go on your reading list.

How do I replace the values of a specific timeframe with a weighting of the same timeframe from previous years?

I have a dataframe that has data for 4 years which kind of look like this:
Year  Week  Value
2018     1     25
2018     2     28
2018     3     26
2019     1     24
2019     2     34
2019     3     30
2020     1     27
2020     2     33
2020     3     32
2021     1     39
2021     2     43
2021     3     41
What I want to do is to replace the values in 2021 with a weighting of the previous 3 years values in the same time frame. So in this example replace only the values from weeks 1 to 3 in 2021 (there could be other weeks to be left alone) with say: 45%*2020 + 30%*2019 + 25%*2018
Which would give us the following for 2021:
Year  Week  Value
2021     1  25.60
2021     2  32.05
2021     3  29.90
And we got the value for 2021 week 3 by doing:
0.45*32 + 0.3*30 + 0.25*26 = 14.4 + 9 + 6.5 = 29.9
Also, I want to be able to skip years if I want to, 2021 can be based off of 2020, 2019, and 2016 for example.
You can create a custom function as it sounds like you need customizable parameters. There isn't a specific pandas method that can do this:
def f(df=df, years=[], weeks=[], weights=[], current_year=2021):
    df = df[df['Week'].isin(weeks)]
    series_weights = df['Year'].map({year: weight for year, weight in zip(years, weights)})
    df['Value'] = df['Value'] * series_weights
    df = df.assign(Year=current_year).groupby(['Year', 'Week'], as_index=False)['Value'].sum()
    return df
f(years=[2018,2019,2020], weeks=[1,2,3], weights=[0.25,0.3,0.45])
Out[1]:
Year Week Value
0 2021 1 25.60
1 2021 2 32.05
2 2021 3 29.90
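Because the years, weeks and weights are all parameters, skipping a year only changes the argument lists; for example, to base 2021 on 2016, 2019 and 2020 (the weights here are illustrative):
# hypothetical weighting: 45% of 2020, 30% of 2019, 25% of 2016
f(years=[2016, 2019, 2020], weeks=[1, 2, 3], weights=[0.25, 0.3, 0.45])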

Based on a condition, convert a week date to day on Pandas

I have this dataset, which have year, month, week and sales numbers:
df = pd.DataFrame()
df['year'] = [2011,2011,2011,2012,2012,2012]
df['month'] = [12,12,12,1,1,1]
df['week'] = [51,52,53,1,2,3]
df['sales'] = [10000,12000,11000,5000,12000,11000]
df['date_ix'] = df['year'] * 1000 + (df['week']-1) * 10 + 1
df['date_week'] = pd.to_datetime(df['date_ix'], format='%Y%W%w')
df
year month week sales date_ix date_week
0 2011 12 51 10000 2011501 2011-12-12
1 2011 12 52 12000 2011511 2011-12-19
2 2011 12 53 11000 2011521 2011-12-26
3 2012 1 1 5000 2012001 2011-12-26
4 2012 1 2 12000 2012011 2012-01-02
5 2012 1 3 11000 2012021 2012-01-09
Now date_week is the beginning day of the week (Monday). I want to keep date_week as the start date, except for the first week of the year, where I want to isolate the actual first day (in this case 2012-01-01, which was a Sunday). I have tried this, but something's wrong.
df['date_start'] = np.where((df['year']==2012) & (df['week']==1),
                            pd.to_datetime(str(20120101), format='%Y%m%d'),
                            pd.to_datetime(df['date_ix'], format='%Y%W%w'))
year month week sales date_ix date_week date_start
0 2011 12 51 10000 2011501 2011-12-12 1323648000000000000
1 2011 12 52 12000 2011511 2011-12-19 1324252800000000000
2 2011 12 53 11000 2011521 2011-12-26 1324857600000000000
3 2012 1 1 5000 2012001 2011-12-26 2012-01-01 00:00:00
4 2012 1 2 12000 2012011 2012-01-02 1325462400000000000
5 2012 1 3 11000 2012021 2012-01-09 1326067200000000000
The expected result should be:
year month week sales date_ix date_week date_start
0 2011 12 51 10000 2011501 2011-12-12 2011-12-12
1 2011 12 52 12000 2011511 2011-12-19 2011-12-19
2 2011 12 53 11000 2011521 2011-12-26 2011-12-26
3 2012 1 1 5000 2012001 2011-12-26 2012-01-01
4 2012 1 2 12000 2012011 2012-01-02 2012-01-02
5 2012 1 3 11000 2012021 2012-01-09 2012-01-09
Please, any help will be greatly appreciated.
The large integers in date_start appear because np.where is mixing a scalar Timestamp with a datetime Series, so the Series values fall back to their raw integer-nanosecond representation. Keep both branches as Series: change pd.to_datetime(str(20120101), format='%Y%m%d') to pd.to_datetime(df['year'], format='%Y') (which is January 1 of each year) and reuse the already-computed df['date_week'] for the other branch:
df['date_start'] = np.where((df['year']==2012) & (df['week']==1),
                            pd.to_datetime(df['year'], format='%Y'),
                            df['date_week'])
print(df)
year month week sales date_ix date_week date_start
0 2011 12 51 10000 2011501 2011-12-12 2011-12-12
1 2011 12 52 12000 2011511 2011-12-19 2011-12-19
2 2011 12 53 11000 2011521 2011-12-26 2011-12-26
3 2012 1 1 5000 2012001 2011-12-26 2012-01-01
4 2012 1 2 12000 2012011 2012-01-02 2012-01-02
5 2012 1 3 11000 2012021 2012-01-09 2012-01-09
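A minimal alternative sketch that sidesteps np.where entirely: Series.where preserves the datetime64 dtype even when the replacement is a scalar Timestamp (assuming the df built above):
mask = (df['year'] == 2012) & (df['week'] == 1)
# keep date_week everywhere except the first week of 2012
df['date_start'] = df['date_week'].where(~mask, pd.Timestamp('2012-01-01'))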
What about this?
df['date_start'] = pd.to_datetime(df.week.astype(str) +
                                  df.year.astype(str).add('-1'), format='%V%G-%u')
This will give date_start as the date of the Monday of the week of interest.
(Note that there is a shift relative to your current date_start; you might want to add a 1-week timedelta to compensate for it.)

converting date to string and splitting day,month and year

Below is my table, with the Date column stored as object dtype.
My goal is to convert it to string and split the date into day, month and year columns.
I tried many ways but no luck.
Can someone help with this?
Date x y z a b
09.05.2013 4 31 12472 199.0 1.0
25.12.2013 11 26 1856 1699.0 1.0
18.11.2014 22 25 15263 699.0 1.0
05.03.2015 26 28 5037 2599.0 1.0
14.10.2015 33 6 17270 199.0 1.0
If you are using a pandas DataFrame, you could do as follows.
df['Date'] = df['Date'].astype(str)
df['Day'] = df['Date'].str[0:2]
df['Month'] = df['Date'].str[3:5]
df['Year'] = df['Date'].str[6:]
df
which gives you the following output.
Date x y z a b Day Month Year
0 09.05.2013 4 31 12472 199.0 1.0 09 05 2013
1 25.12.2013 11 26 1856 1699.0 1.0 25 12 2013
2 18.11.2014 22 25 15263 699.0 1.0 18 11 2014
3 05.03.2015 26 28 5037 2599.0 1.0 05 03 2015
4 14.10.2015 33 6 17270 199.0 1.0 14 10 2015
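As an aside, a more robust sketch parses the column with an explicit format and reads the parts back through the .dt accessor, so malformed dates raise errors instead of being silently mis-sliced (note this yields integers rather than zero-padded strings):
# assumes the dd.mm.yyyy format shown in the question
dates = pd.to_datetime(df['Date'], format='%d.%m.%Y')
df['Day'] = dates.dt.day
df['Month'] = dates.dt.month
df['Year'] = dates.dt.year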

Filling missing gaps with "NaN"

I have a few years of data, but some of the values are missing. I would like to fill these rows with NaN.
Here is some example data:
year month day min
2011 1 1 -2.3
2011 1 2 -9.1
2011 1 3 -4.7
2011 1 4 -3.5
2011 1 6 -1.4
2011 1 7 0.1
2011 1 9 -6.3
2011 1 10 -9.4
2011 1 11 -13.3
2011 1 12 -17.9
2011 1 14 -11.8
2011 1 15 -11.2
2011 1 16 -7.1
2011 1 17 -7.6
2011 1 18 -9.9
2011 1 20 -6.9
2011 1 21 -8.8
2011 1 22 -11.3
2011 1 24 -3.1
2011 1 25 -0.7
2011 1 26 0.8
2011 1 27 -0.9
2011 1 28 -6.9
2011 1 29 -3.2
2011 1 30 -2.3
2011 1 31 -7
As you can see, in the first month of 2011 many values are missing, and I need to insert rows for these values and then fill them. Is there any way to do it?
You can reindex by a MultiIndex.from_arrays created from a date_range:
start = '2011-01-01'
end = '2011-01-31'
rng = pd.date_range(start, end)
mux = pd.MultiIndex.from_arrays([rng.year, rng.month, rng.day], names=('year','month','day'))
df = df.set_index(['year','month','day'])
print (df.reindex(mux).reset_index())
year month day min
0 2011 1 1 -2.3
1 2011 1 2 -9.1
2 2011 1 3 -4.7
3 2011 1 4 -3.5
4 2011 1 5 NaN
5 2011 1 6 -1.4
6 2011 1 7 0.1
7 2011 1 8 NaN
8 2011 1 9 -6.3
9 2011 1 10 -9.4
10 2011 1 11 -13.3
11 2011 1 12 -17.9
12 2011 1 13 NaN
13 2011 1 14 -11.8
14 2011 1 15 -11.2
15 2011 1 16 -7.1
16 2011 1 17 -7.6
17 2011 1 18 -9.9
18 2011 1 19 NaN
19 2011 1 20 -6.9
20 2011 1 21 -8.8
21 2011 1 22 -11.3
22 2011 1 23 NaN
23 2011 1 24 -3.1
24 2011 1 25 -0.7
25 2011 1 26 0.8
26 2011 1 27 -0.9
27 2011 1 28 -6.9
28 2011 1 29 -3.2
29 2011 1 30 -2.3
30 2011 1 31 -7.0
Convert the DataFrame to a timeseries with a datetime index, and then change the frequency of the index to daily ('D') using asfreq:
import pandas as pd

raw = """2011 1 1 -2.3
2011 1 2 -9.1
2011 1 3 -4.7
2011 1 4 -3.5
2011 1 6 -1.4"""

# Parse the rows into dates and values
new_rows = []
for row in raw.split('\n'):
    fields = row.split()
    date = pd.to_datetime('/'.join(fields[:3]))   # year/month/day
    value = float(fields[3])                      # row[-1] would grab only the last character
    new_rows.append({'date': date, 'value': value})

timeseries = pd.DataFrame(new_rows).set_index('date')
timeseries.asfreq('D')
I think df.replace() does the job:
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [-0.532681, 'foo', 0],
    [1.490752, 'bar', 1],
    [-1.387326, 'foo', 2],
    [0.814772, 'baz', ' '],
    [-0.222552, ' ', 4],
    [-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))
print(df.replace(r'\s+', np.nan, regex=True))
Produces:
A B C
2000-01-01 -0.532681 foo 0
2000-01-02 1.490752 bar 1
2000-01-03 -1.387326 foo 2
2000-01-04 0.814772 baz NaN
2000-01-05 -0.222552 NaN 4
2000-01-06 -1.176781 qux NaN
Yeah, use pandas:
Create a dataframe with your date as index.
Use asfreq.
Hope this helps; see http://pandas.pydata.org/pandas-docs/stable/timeseries.html for more information :)
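A minimal sketch of that recipe, assuming a df with the year/month/day/min columns from the question:
import pandas as pd

# assemble real dates from the year/month/day columns
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
# reindex to daily frequency; missing days appear as NaN rows
filled = df.set_index('date').asfreq('D')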
