Hi, I created a daily price list of a stock and now I need to insert another column for the 14-day return. Currently I have the following:
data["2W-Chg"]=stock_store['Adj Close'] - stock_store['Adj Close'].shift(14)
but when checking the shift, it actually shifted back more than 14 calendar days (e.g. today: 6/23/2021, shifted to: 6/2/2021). The shifted date I'm looking for should be 6/09/2021.
Is there any way to shift based on the trading date instead of by row?
Thanks
JC
shift has a parameter called freq that, when set, shifts the index of the data according to the given frequency instead of ignoring the time relation. So you can first set your data's index to the date column and then shift by your desired frequency:
# or you can set it when reading CSV, as you mentioned in the comments
stock_store = stock_store.set_index("Date")
# using `D` as the daily frequency (as you mentioned in comments)
data["2W-Chg"] = stock_store["Adj Close"] - stock_store["Adj Close"].shift(14, freq="D")
I'm trying to pull some data from yfinance in Python for different funds from different exchanges. To pull my data I just set up the start and end dates through:
start = '2002-01-01'
end = '2022-06-30'
and pulling it through:
assets = ['GOVT', 'IDNA.L', 'IMEU.L', 'EMMUSA.SW', 'EEM', 'IJPD.L', 'VCIT',
'LQD', 'JNK', 'JNKE.L', 'IEF', 'IEI', 'SHY', 'TLH', 'IGIB',
'IHYG.L', 'TIP', 'TLT']
assets.sort()
data = yf.download(assets, start = start, end = end)
As you may have noticed, the assets (ETFs) come from different exchanges, indicated by suffixes such as ".L" or ".SW".
The result comes back with a date-and-time index, so prices from different exchanges land on different timestamps for the same day.
It seems to me that there is no overlap for a single instrument (i.e. two prices for the same day). So I don't think the data will be disturbed if any scrubbing or clean-up is done.
So my goal is to harmonize or consolidate the prices onto a date index rather than a date-and-time index, so that each price for each instrument sits firmly side by side for a particular date.
Thanks!
If you want the daily last closing price from the Yahoo Finance API, you could use the interval argument:
yf.download(assets, start=start, end=end, interval="1d")
Solution with Pandas:
Transforming the Index
You have an index where each row is a string representing the datetime. You first want to transform those strings into an actual DatetimeIndex, where each row is of type datetime64. This makes it easy to work with dates in your dataset using functions from the datetime library. Finally, you pick the date from each datetime64:
data.index = pd.to_datetime(data.index).date
Groupby
Now that you have an index of dates, you can group by the index. First, you want to deal with NaN values. If you want the closing price to be used only to fill values within its own date, apply:
data = data.groupby(data.index).ffill()
Otherwise, if you think that the closing price of (e.g.) the 1st of October can be used to fill NaN values not only on the 1st of October but also on the 2nd and 3rd of October, simply apply ffill() without the groupby:
data = data.ffill()
Lastly, take the last observed record per date (the index). Note that you can apply any function you want here, even a custom lambda:
data = data.groupby(data.index).last()
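Putting the pieces together, here is a minimal sketch on made-up multi-exchange closes (the column names and timestamps are illustrative, not the exact yfinance output):
import pandas as pd

# Made-up closes: two tickers whose quotes land on different timestamps
# within the same day.
data = pd.DataFrame(
    {"GOVT": [None, 10.0, None, 10.2], "IDNA.L": [5.0, None, 5.1, None]},
    index=["2022-06-01 16:30:00", "2022-06-01 21:00:00",
           "2022-06-02 16:30:00", "2022-06-02 21:00:00"],
)

data.index = pd.to_datetime(data.index).date   # keep only the date part
data = data.groupby(data.index).ffill()        # fill NaNs within each date
data = data.groupby(data.index).last()         # one consolidated row per date
print(data)  # GOVT and IDNA.L now sit side by side per date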
I have a dataset where the index is based on 30-minute data from Monday to Friday. There might be some missing dates (possibly because of holidays). I would like to find the highest value from column high and the lowest value from column low over the previous week. For example, when calculating for today, the previous week's high and low are the values marked in yellow in the attached image.
I tried using rolling and resampling, but somehow it's not working. Can anyone help?
You really should add sample data to your question (by that I mean a piece of code/text that can easily be used to create a dataframe for illustrating how the proposed solution works).
Here's a suggestion. With df your dataframe, and column datetime containing datetimes (and not strings):
df["week"] = (
df["datetime"].dt.isocalendar().year.astype(str)
+ df["datetime"].dt.isocalendar().week.astype(str)
)
mask = df["high"] == df.groupby("week")["high"].transform("max")
df = df.merge(
    df[mask].rename(columns={"low": "high_low"})
        .groupby("week").agg({"high_low": "min"}).shift(),
    on="week", how="left"
).drop(columns="week")
Add a week column to df (year + zero-padded week number, so the week labels sort correctly) for grouping along weeks.
Extract the rows with the weekly maximum highs by mask (there could be more than one for a week).
Build a corresponding dataframe with the weekly minimum of the lows corresponding to the weekly maximum highs (column named high_low), shift it once to get the value from the previous week, and .merge it to df.
If column datetime doesn't contain datetimes:
df["datetime"] = pd.to_datetime(df["datetime"])
If I have understood correctly, the solution should be:
get the week number from the date
group by the week number and fetch the max and min values
group by the week and fetch the max date, to get the last date for each week
now merge all the dataframes into one based on the date key
Once the steps are done, you could do any formatting as required; a sketch of these steps follows.
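A minimal sketch of those steps (assuming columns datetime, high and low as in the question; unlike the first answer, this takes the plain weekly minimum of low):
import pandas as pd

def previous_week_levels(df: pd.DataFrame) -> pd.DataFrame:
    """Attach the previous week's max high and min low to each row."""
    out = df.copy()
    iso = out["datetime"].dt.isocalendar()
    out["week"] = iso["year"].astype(str) + iso["week"].astype(str).str.zfill(2)

    # Per-week extremes, shifted one week so each row sees the previous week.
    weekly = out.groupby("week").agg(prev_high=("high", "max"),
                                     prev_low=("low", "min")).shift()
    return out.merge(weekly, on="week", how="left").drop(columns="week")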
I would like to get a list of dates going back from an end date, spaced by a given number of months. For example, my end date is "20251220" and I have a parameter freq which steers the backward time step: a value of 1 indicates yearly steps and a value of 12 indicates monthly steps.
For freq 1 I would like to get "20211220", "20221220", "20231220", "20241220", and for freq 2 "20211220", "20220620", "20221220", "20230620", "20231220", "20240620", "20241220", "20250620". However, if the end date was "20250220", for freq 1 I only need "20220220", "20230220", "20240220", as February has already passed this year. I tried a simple loop myself (see below), where I would then check at the end whether the first date is in the past. But I think there must be a built-in function to do this, via pandas or dateutil etc. See the similar question without the freq parameter.
You can do it with date_range, setting the end date to now and the frequency as a negative DateOffset:
pd.date_range('20250220', 'now', freq=-pd.DateOffset(months=12//freq))[1:]
Output:
DatetimeIndex(['2024-02-20', '2023-02-20', '2022-02-20'], dtype='datetime64[ns]', freq='<-1 * DateOffset: months=12>')
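For instance, assuming the same negative-DateOffset trick, the freq = 2 case from the question would look like this (the exact list depends on today's date):
import pandas as pd

freq = 2  # 12 // 2 = 6-month steps back from the end date
dates = pd.date_range("20251220", "now", freq=-pd.DateOffset(months=12 // freq))[1:]
print(dates.strftime("%Y%m%d").tolist())  # descending, e.g. ['20250620', ...]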
Hi, I have created a dataframe with Actual Close, High, Low, and now I have to calculate the Day-Change, 3-Day-Change and 2-Week-Change for each row.
With the code below, I can see the Day-Change field with a blank/NaN value (the 10/27/2009 D-Chg field). How can I get Python to automatically pick the last trading date's (10/23/2009) Adj Close price for the calculation when the shifted date doesn't exist?
data["D-Chg"]=stock_store['Adj Close'] - stock_store['Adj Close'].shift(1, freq='B')
Thanks with Regards
Format your first column to datetime:
data['Mycol'] = pd.to_datetime(data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
Get the max value:
last_date = data['date'].max()
Get the most up-to-date row:
is_last = data['date'] == last_date
data[is_last]
This may be done in one step if you pass your desired column to max().
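Back to the original NaN question: one alternative (my assumption, not part of the answer above) is to shift by rows rather than by freq='B', since the previous row is by construction the last trading date:
import pandas as pd

# Toy prices with a gap: Fri 10/23/2009 is followed by Tue 10/27/2009.
idx = pd.to_datetime(["2009-10-22", "2009-10-23", "2009-10-27"])
stock_store = pd.DataFrame({"Adj Close": [10.0, 10.5, 11.0]}, index=idx)

# Row-based shift: 10/27 is compared against 10/23, the last trading date,
# so no NaN appears when the business-day-shifted date is missing.
stock_store["D-Chg"] = stock_store["Adj Close"] - stock_store["Adj Close"].shift(1)
print(stock_store)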
I have a file with one row per EMID per Effective Date. I need to find the maximum Effective date per EMID that occurred before a specific date. For instance, if EMID =1 has 4 rows, one for 1/1/16, one for 10/1/16, one for 12/1/16, and one for 12/2/17, and I choose the date 1/1/17 as my specific date, I'd want to know that 12/1/16 is the maximum date for EMID=1 that occurred before 1/1/17.
I know how to find the maximum date overall by EMID (groupby.max()). I also can filter the file to just dates before 1/1/17 and find the max of the remaining rows. However, ultimately I need the last row before 1/1/17, and then all the rows following 1/1/17, so filtering out the rows that occur after the date isn't optimal, because then I have to do complicated joins to get them back in.
# Create dummy data
import datetime
import random
import numpy as np
import pandas as pd

dummy = pd.DataFrame(columns=['EmID', 'EffectiveDate'])
dummy['EmID'] = [random.randint(1, 10000) for x in range(49999)]
dummy['EffectiveDate'] = [np.random.choice(pd.date_range(datetime.datetime(2016, 1, 1), datetime.datetime(2018, 1, 3))) for i in range(49999)]
#Create group by
g = dummy.groupby('EmID')['EffectiveDate']
# This doesn't work, but effectively shows what I'm trying to do
dummy['max_prestart'] = max(dt for dt in g if dt < datetime(2017,1,1))
I expect that output to be an additional column in my dataframe that has the maximum date that occurred before the specified date.
Using map after selecting the rows before the cutoff date:
s = dummy.loc[dummy.EffectiveDate < '2017-01-01'].groupby('EmID').EffectiveDate.max()
dummy['new'] = dummy.EmID.map(s)
Alternatively, using transform, and filling the rows on or after the cutoff with their own EffectiveDate:
dummy['new'] = dummy.loc[dummy.EffectiveDate < '2017-01-01'].groupby('EmID').EffectiveDate.transform('max')
dummy['new'] = dummy['new'].fillna(dummy.EffectiveDate)
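To sanity-check with the example from the question (EMID 1 with four dates, cutoff 1/1/17):
import pandas as pd

dummy = pd.DataFrame({
    "EmID": [1, 1, 1, 1],
    "EffectiveDate": pd.to_datetime(["2016-01-01", "2016-10-01",
                                     "2016-12-01", "2017-12-02"]),
})
s = dummy.loc[dummy.EffectiveDate < "2017-01-01"].groupby("EmID").EffectiveDate.max()
dummy["max_prestart"] = dummy.EmID.map(s)
print(dummy)  # every row for EmID 1 gets 2016-12-01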