Iterating through dates over a specific period of time in Python

I am currently working with a large dataset that has lists of daily inventory. I want to compare the inventory across 2 days to see what has changed, and continue that process for an entire month. For example, for the month of January, I want to see the change between January 1 and 2, then January 2 and 3, and so on until I reach January 31st.
I was able to write code to compare inventory between 2 dates. But how do I iterate that process so the code keeps running for the next set of days? I am new to programming and would appreciate any help.
In the code below, I created 2 subsets: the 1st for the inventory on October 14 and the 2nd for the inventory on October 15. In the 3rd statement, I calculate what has changed between the 2 days using the unique identifier in the dataset (image).
cars_date_1 = cars_extract_drop[(cars_extract_drop['as_of_date'] > '2015-10-14') &
                                (cars_extract_drop['as_of_date'] < '2015-10-15')]
cars_date_2 = cars_extract_drop[(cars_extract_drop['as_of_date'] > '2015-10-15') &
                                (cars_extract_drop['as_of_date'] < '2015-10-16')]
cars_sold = cars_date_1[~cars_date_1['image'].isin(cars_date_2['image'])]

You can use pandas' pd.date_range() function and iterate through consecutive elements:
import pandas as pd

rng = pd.date_range('1/1/2011', periods=365, freq='D')
for i in range(len(rng) - 1):
    day_1 = rng[i]
    day_2 = rng[i + 1]
    difference_function(day_1, day_2)  # your comparison of the two days
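Applied to the question's data, the same loop could look roughly like this. This is only a sketch: it assumes the cars_extract_drop frame with the as_of_date and image columns from the question, with as_of_date already parsed as datetimes, and the January 2016 range is just an example:
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-01-31', freq='D')
for day_1, day_2 in zip(rng[:-1], rng[1:]):
    # select each day's inventory by calendar date
    inv_1 = cars_extract_drop[cars_extract_drop['as_of_date'].dt.normalize() == day_1]
    inv_2 = cars_extract_drop[cars_extract_drop['as_of_date'].dt.normalize() == day_2]
    # rows present on day_1 but gone on day_2, as in the question
    cars_sold = inv_1[~inv_1['image'].isin(inv_2['image'])]
    print(day_1.date(), len(cars_sold))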

Related

Convert days data into years data in a list

I want to make a time series with temperature data from 1850 to 2014. I have an issue: when I plot the time series, the x-axis starts at 0, which corresponds to January 1, 1850, and ends at day 60,230, which is December 31, 2014.
I tried to write a loop that creates a new list with the time in months/years, but it didn't succeed, and then to create the plot with this new list and my initial temperature list.
This is the kind of loop that I tested:
days = list(range(1, 365 + 1))
years = []
y = 1850
years.append(y)
while y < 2015:
    for i in days:
        years.append(y + i)
    y = y + 1
del years[-1]
dsetyears = Dataset(years)
I also tried the datetime module, but that didn't work either (maybe this tool is better, because it would take leap years into account...):
day_number = "0"
year = "1850"
res = datetime.strptime(year + "-" + day_number, "%Y-%j").strftime("%m-%d-%Y")
If anyone has a clue or a lead I can look into, I'm interested.
Thanks in advance!
You can achieve that using the datetime module. Let's declare the starting and ending dates.
import datetime

dates = []
starting_date = datetime.datetime(1850, 1, 1)
ending_date = datetime.datetime(2014, 12, 31)
Then we can create a while loop that runs while the starting date is less than or equal to the ending date, adding 1 day with timedelta on each iteration. Before incrementing, we append the formatted date as a string to the dates list.
while starting_date <= ending_date:
    dates.append(starting_date.strftime("%m-%d-%Y"))
    starting_date += datetime.timedelta(days=1)
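For the plotting part, it may be easier to keep the datetime objects themselves rather than formatted strings, since matplotlib formats date axes automatically. A small sketch, assuming temperatures is your list with one value per day starting on 1850-01-01:
import datetime
import matplotlib.pyplot as plt

start = datetime.datetime(1850, 1, 1)
dates = [start + datetime.timedelta(days=i) for i in range(len(temperatures))]

plt.plot(dates, temperatures)  # the x-axis now shows real calendar dates
plt.xlabel("Year")
plt.ylabel("Temperature")
plt.show()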

Return Earliest Date based on value within dataset

I am working with REIGN data that documents elections and leaders in countries around the world (https://www.oneearthfuture.org/datasets/reign)
In the dataset there is a boolean election anticipation variable that turns from 0 to 1 to denote that an election is anticipated in at least the next 6 months, possibly sooner.
Excel sheet of data in question
I want to create a new column that returns the earliest date at which anticipation (column N) turns to 1 (i.e. when the election was first anticipated).
So for example, with Afghanistan we have an election in 2014 and one in 2017.
In column N we see it turn from 0 to 1 in Oct 2014 (election anticipated), then go back to 0 in July 2014 (election concluded), until it goes back to 1 in Jan 2019 (election anticipated) and then turns back to 0 in Oct 2019.
So if successful, I would capture Oct 2014 (election anticipated) and Jan 2019 (election anticipated) as election announcement dates in a new column, along with any other dates on which an election was anticipated.
Currently I have the following:
#bringing in Reign CSV
regin = pd.read_csv('REIGN_2021_7(1).csv')
#shows us the first 5 rows to make sure they look good
print(regin.head())
#show us the rows and columns in the file
regin.shape
#Getting our index
print(regin.columns)
#adding in a date column that concatenates year and month
regin['date'] = pd.to_datetime(regin[['year', 'month']].assign(DAY=1))
regin.head
#def conditions(s):
if (s['anticipation'] == 1):
return (s['date'])
else:
return 0
regin['announced_date'] = regin.apply(conditions, axis=1)
print(regin.head)
The biggest issue for me is that while this returns the date wherever a 1 appears, it does not give the earliest such date. How can I loop through the anticipation column and return the minimum date, but do so multiple times, since a country will have many elections over the years and there are therefore multiple instances in column N for one country of the anticipation turning on (1) and off (0)?
Thanks in advance for any assistance! Let me know if anything is unclear.
If you can loop over your dates, you will probably want to use the datetime module (assuming all dates have the same format):
from datetime import datetime
[...]
earliest_date = datetime.today()
[... loop over data, by country ...]
    date1 = datetime.strptime(input_date_string1, date_string_format)
    if date1 < earliest_date:
        earliest_date = date1
[...]
This module supports (among other things):
parsing a datetime object from a string (datetime.strptime(in_str, format))
comparison of datetime objects (date1 > date2)
a datetime object for the current date and time (datetime.today())
a datetime object for an arbitrary date (datetime(year, month, day))
docs: https://docs.python.org/3/library/datetime.html
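Since your data is already in a pandas DataFrame, a vectorized alternative is to flag the rows where anticipation flips from 0 to 1 within each country and keep their dates. A rough sketch, assuming the dataset's country column together with the date and anticipation columns used above:
import pandas as pd

# sort so that shift() compares each row with the previous month of the same country
regin = regin.sort_values(['country', 'date'])

# anticipation value of the previous row within each country (0 for the first row)
prev = regin.groupby('country')['anticipation'].shift(fill_value=0)

# True exactly where anticipation flips from 0 to 1, i.e. where an election is first anticipated
starts = (regin['anticipation'] == 1) & (prev == 0)

announcement_dates = regin.loc[starts, ['country', 'date']]
print(announcement_dates.head())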

Find first day of the month previous to a random date with Pandas pd.DateOffset

I want to find the first day of a given month an average 90 days previous to a random date. For instance:
December 15 -- returns August 30
December 30 -- returns August 30
December 1st -- returns August 30
I know this can be done with Pandas pd.DateOffset:
print(pd.Timestamp("2019-12-15") - pd.DateOffset(days=90))
but then I'll get something like September 15th.
I know I can count minus 90 days, select the month, subtract 1 and then select last day of the obtained month, but I was wondering if this can be easily done in one line of code, efficiently.
Assume that the date in question is:
dat = pd.Timestamp('2019-12-15')
To compute the date 90 days before, run:
dat2 = dat - pd.DateOffset(days=90)
getting 2019-09-16.
And finally, to get the start of this month, run:
dat2 - pd.offsets.MonthBegin(1)
getting 2019-09-01.
To put the whole thing short, run just:
dat - pd.DateOffset(days=90) - pd.offsets.MonthBegin(1)
A subtle difference becomes visible if you start from a date which, turned 90 days back, gives exactly the first day of a month. E.g.
dat = pd.Timestamp('2019-11-30')
dat2 = dat - pd.DateOffset(days=90)
gives 2019-09-01.
Then dat2 - pd.offsets.MonthBegin(1) steps back a full month, giving 2019-08-01, the start of the previous month.
If in this case you want to keep the first day of that same month, run:
pd.offsets.MonthBegin(1).rollback(dat2)
which only moves the date when it is not already a month start, so here it returns 2019-09-01 unchanged.
So choose the variant which suits your needs.
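If this needs to be applied to a whole column, the same idea can also be written without offset objects by truncating to a monthly period; a small sketch on a hypothetical datetime column (this variant keeps the month that the shifted date falls in, like the rollback form above):
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2019-12-15', '2019-12-30', '2019-11-30'])})

# go back 90 days, then truncate to the start of that month via a monthly period
df['month_start'] = (df['date'] - pd.Timedelta(days=90)).dt.to_period('M').dt.to_timestamp()
print(df)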

How to add hours to pandas datetime skipping non-business hours

I would like to create a running dataframe of trading data for the next four hours from the current time while skipping non-trading hours (5-6pm weekdays, Saturday-6pm Sunday). For example, at 4pm on Friday, I'd like a dataframe that runs from 4pm to 5pm on Friday and then 6pm-9pm on Sunday.
Currently, I am using the following:
time_parameter = pd.Timedelta(hours=4)  # set the time difference to four hours
df = df.set_index(['Time'])
for current_time, row in df.iterrows():  # df is the entire trading-data dataframe
    future_time = current_time + time_parameter
    temp_df = df.loc[current_time : future_time]
This obviously doesn't skip non-trading hours so I am trying to find an efficient way to do that.
One method I can use is creating a set of non-trading hours, checking if the current time bounds (current_time:future_time) include any, and adding an additional hour for each.
However, since the dataset has about 3.5 million rows and would need this check for each row, I want to ask whether anyone knows of a faster approach?
In short, looking for a method to add 4 business hours (Sun-Fri 6pm-5pm) to current time. Thanks!
Input Data: This shows the first 19 rows of the trading data
Expected Output Data: This shows the first and last 3 rows from a four hour period starting at 18:00:30 on January 8th, 2017
Solution
Based on the answer by Code Different below, I used the following:
import pandas as pd
from datetime import time as time_2

def last_trading_hour(start_time, time_parameter, periods_parameter):
    start_series = pd.date_range(start_time, freq='H', periods=periods_parameter)
    mask = (((start_series.dayofweek == 6) & (time_2(18) <= start_series.time))    # Sunday: after 6pm
            | ((start_series.dayofweek == 4) & (start_series.time < time_2(17)))   # Friday: before 5pm
            | ((start_series.dayofweek < 4) & (start_series.time < time_2(17)))    # Mon-Thu: before 5pm
            | ((start_series.dayofweek < 4) & (time_2(18) <= start_series.time))   # Mon-Thu: after 6pm
            )
    return start_series[mask][time_parameter]

start_time = pd.Timestamp('2019-08-16 13:00:10')
time_parameter = 4                       # adding 4 trading hours to the start time
periods_parameter = 49 + time_parameter  # max 49 straight hours with no trading (Fri 5pm - Sun 6pm)
last_trading_hour(start_time, time_parameter, periods_parameter)
Results:
Timestamp('2019-08-18 18:00:10')
If you need the entire series, follow Code Different's method for indexing.
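The returned timestamp can then be plugged back into the original slicing loop; hypothetical usage, assuming df is the trading dataframe indexed by Time as above:
end_time = last_trading_hour(current_time, time_parameter, periods_parameter)
temp_df = df.loc[current_time : end_time]  # four trading hours of data, with closed hours skipped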
Generate a sufficiently long series of hours then filter for the first 4 that are trading hours:
import pandas as pd
from datetime import time

start_time = pd.Timestamp('2019-08-16 16:00')
s = pd.date_range(start_time, freq='H', periods=72)

is_trading_hour = (
    ((s.weekday == 6) & (time(18) <= s.time))   # Sunday after 6pm
    | ((s.weekday == 4) & (s.time < time(17)))  # Friday before 5pm
    | (s.weekday < 4)                           # Monday through Thursday
)
s[is_trading_hour][:4]
Result:
DatetimeIndex(['2019-08-16 16:00:00', '2019-08-18 18:00:00',
'2019-08-18 19:00:00', '2019-08-18 20:00:00'],
dtype='datetime64[ns]', freq=None)
It's hard to tell from so little information. However, it seems that you're working on hour boundaries. If so, it should be straightforward to set up a look-up table (dict) keyed by each day and hour, perhaps: (0,0) for midnight Sun/Mon, (2, 13) for 1pm Wed, and so on. Then provide simple entries for the end of the 4-hour period
(0, 0):  Timedelta(hours=4),   # 0:00 Mon; normal span, regular trading hours
(0, 16): Timedelta(hours=5),   # 16:00 Mon; 1 hour of down-time
(4, 16): Timedelta(hours=53),  # 16:00 Fri; 1 hour trade, 49 hrs down, 3 hrs trade
(5, 16): Timedelta(hours=30),  # 16:00 Sat; 26 hours down, 4 hours trade
Add the indicated Timedelta to your start time; that gives you the end time of the period. You can write a few loops and if statements to compute these times for you, or just hard-code all 168; they're rather repetitive.
Checking your database rows remains up to you, since you didn't specify their format or semantics in your posting.
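Generating that table with loops rather than hard-coding it could look roughly like the sketch below. Everything in it is illustrative: is_trading_hour() and end_of_period are hypothetical names, the trading week is the one described in the question (Sunday 18:00 to Friday 17:00, with a 17:00-18:00 break Monday-Thursday), and weekday 0 is Monday:
import pandas as pd

def is_trading_hour(weekday, hour):
    if weekday == 5:       # Saturday: closed all day
        return False
    if weekday == 6:       # Sunday: opens at 18:00
        return hour >= 18
    if weekday == 4:       # Friday: closes for the weekend at 17:00
        return hour < 17
    return hour != 17      # Mon-Thu: closed only 17:00-18:00

# (weekday, hour) -> Timedelta from the start of that hour to the end of a
# period containing 4 full trading hours
end_of_period = {}
for weekday in range(7):
    for hour in range(24):
        count = 1 if is_trading_hour(weekday, hour) else 0
        step = 0
        while count < 4:
            step += 1
            w = (weekday + (hour + step) // 24) % 7
            h = (hour + step) % 24
            if is_trading_hour(w, h):
                count += 1
        end_of_period[(weekday, hour)] = pd.Timedelta(hours=step + 1)

print(end_of_period[(0, 0)])   # 4 hours (Monday midnight, all trading)
print(end_of_period[(4, 16)])  # 53 hours (Friday 16:00, weekend in between)

start = pd.Timestamp('2019-08-16 16:00')  # a Friday
print(start + end_of_period[(start.weekday(), start.hour)])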

Calculate first day of the week, given year and week number of the year

In Python, how can we calculate the first day of the week, given a year and a particular week number of that year?
Note that the date should be in the format YYYY-MM-DD. The year and the week number are given as ints.
I am making the following assumptions about what your question means. If they are off, it should not be hard to adjust the code.
1) The first day of the week is Sunday. (so the answer is always a Sunday)
2) The week in which January 1 falls is week 1 (not 0).
Then the work breaks down into two parts.
a) Figure out the first day of the first week.
b) Add the right number of days onto that.
In Python, it looks as follows:
import datetime

def firstDayOfWeek1(y):
    # takes a year and returns the date of the Sunday starting the week in which January 1 falls
    janDay = datetime.date(y, 1, 1)
    while janDay.weekday() != 6:  # back up until Sunday; change this if you hold that Sunday is not the first day of the week
        janDay = janDay - datetime.timedelta(days=1)
    return janDay

def firstDayOfWeekN(y, n):
    # takes a year and a week number and gives the date of the first Sunday of that week
    return firstDayOfWeek1(y) + datetime.timedelta(weeks=(n - 1))

def formattedFirstDayOfWeekN(y, n):
    # the same, formatted as YYYY-MM-DD
    return firstDayOfWeekN(y, n).isoformat()

# example
print(formattedFirstDayOfWeekN(2018, 2))  # 2018-01-07, the first day of the second week of January 2018
I am using an algorithm which starts with a close-by date and then simply loops down till it finds the desired result. I am sacrificing some CPU cycles for ease of readability since the cost is not significant. I have done some limited testing but I hope the general idea is clear. Let me know your thoughts.
from datetime import date, timedelta

# This is the input in integer format
input_year = 2018
input_week = 29

# The general idea is that we will go down day by day from a reference date
# till we get the desired result.
# The loop is not computationally intensive since it will
# loop at most around 365 times.
# The program uses Python's ISO calendar functions, which consider Monday
# the start of the week.

ref_date = date(input_year + 1, 1, 7)  # approximation for the starting point
# Reasoning behind the arguments: move to January of the next year. Using 7 as the day
# ensures that the ISO calendar year has also moved on to the next year,
# because per the ISO standard the first week is the week containing the first Thursday.
isoyear, isoweek, isoday = ref_date.isocalendar()

output_date = ref_date  # initialize for the loop
while True:
    outisoyear, outisoweek, outisoday = output_date.isocalendar()
    if outisoyear == input_year and outisoweek == input_week and outisoday == 1:
        break
    output_date = output_date + timedelta(days=-1)

print(output_date)
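If the ISO convention itself is acceptable (weeks start on Monday and week 1 is the week containing the first Thursday), newer Python versions can skip the loop entirely. A small sketch, assuming Python 3.8+ for date.fromisocalendar():
from datetime import date

def first_day_of_iso_week(year: int, week: int) -> str:
    # day 1 is Monday in the ISO calendar; fromisocalendar() is available from Python 3.8
    return date.fromisocalendar(year, week, 1).isoformat()

print(first_day_of_iso_week(2018, 29))  # 2018-07-16, matching the loop above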
