Pandas not appending after first column - python

I have two dataframes. One has a column containing the earnings dates for a stock. The other contains all the prices for the stock; keep in mind that its index is the date. I want to get the prices of the stock N days before and after each earnings date and store them column-wise in a new dataframe. This is what I have so far:
earningsPrices = pd.DataFrame()
for date in dates:
    earningsPrices[date] = prices[date - pd.Timedelta(days=N):date + pd.Timedelta(days=N)]
print(earningsPrices)
and this is the output
The problem is that it only writes the prices for the first date, and not the rest.

You should maybe take this approach:
earningsPrices = pd.DataFrame(index=dates, columns=['price1', 'price2', 'price3'])
for date in dates:
    start_date = date - pd.Timedelta(days=N)
    end_date = date + pd.Timedelta(days=N)
    selected_rows = prices.loc[prices['date_column'].between(start_date, end_date)]
    earningsPrices.loc[date, 'price1'] = selected_rows['price1'].values
    earningsPrices.loc[date, 'price2'] = selected_rows['price2'].values
    earningsPrices.loc[date, 'price3'] = selected_rows['price3'].values
print(earningsPrices)

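Use concat, and reset the index of each window so the columns line up by position rather than by date:
earningsPrices = pd.DataFrame()
for date in dates:
    earningsPeriod = prices[date - pd.Timedelta(days=N):date + pd.Timedelta(days=N)].reset_index(drop=True)
    earningsPrices = pd.concat([earningsPrices, earningsPeriod], axis=1)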
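The original loop most likely fails because of index alignment: the first assignment gives the empty frame the dates of the first window as its index, and every later window is aligned against those dates, so windows that do not overlap come out as NaN. Below is a minimal sketch of the concat approach on made-up data. The price series, the two earnings dates and N = 2 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical daily prices indexed by date (values 0..59 for easy checking).
prices = pd.Series(
    range(60),
    index=pd.date_range("2022-01-01", periods=60, freq="D"),
    dtype=float,
)
# Hypothetical earnings dates and window size.
dates = [pd.Timestamp("2022-01-10"), pd.Timestamp("2022-02-10")]
N = 2

windows = {}
for date in dates:
    window = prices.loc[date - pd.Timedelta(days=N): date + pd.Timedelta(days=N)]
    # Drop the date index so every window is indexed 0..2N and lines up by position.
    windows[date] = window.reset_index(drop=True)

# One column per earnings date; the dict keys become the column labels.
earnings_prices = pd.concat(windows, axis=1)
print(earnings_prices)
```

Collecting the windows in a dict and concatenating once also keeps the earnings date as the column label, which the loop-and-concat version loses.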

Related

In column dataframe, how do I find the date just before a given date

I have the following DF :
Date
01/07/2022
10/07/2022
20/07/2022
The date x is
12/07/2022
So basically the function should return
10/07/2022
I am trying to avoid looping over the whole column but I don't know how to specify that I want the max date before a given date.
max(DF['Date'])  # returns 20/07/2022
Try this:
d = '12/07/2022'
f = '%d/%m/%Y'
(pd.to_datetime(df['Date'], format=f)
   .where(lambda x: x.lt(pd.to_datetime(d, format=f)))
   .max())
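You can filter the dates with a boolean mask. This assumes df.Date is already datetime; the format is passed explicitly because '12/07/2022' would otherwise be parsed month-first, as 7 December:
df[df.Date < pd.to_datetime('12/07/2022', format='%d/%m/%Y')]
Then find the max:
max(df[df.Date < pd.to_datetime('12/07/2022', format='%d/%m/%Y')].Date)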
# Setting some stuff up
Date = ["01/07/2022", "10/07/2022", "20/07/2022"]
df = pd.DataFrame({"Date":Date})
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
target_date = pd.to_datetime("12/07/2022", format='%d/%m/%Y')
df = df.sort_values(by=["Date"]) # Sort by date
# Find all dates that are before target date, then choose the last one (i.e. the most recent one)
df.Date[df.Date < target_date][-1:].dt.date.values[0]
Output:
datetime.date(2022, 7, 10)
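If the column is sorted, a binary search avoids scanning or masking the whole column. The following is only a sketch of that alternative using Series.searchsorted on the sample data from the question; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01/07/2022", "10/07/2022", "20/07/2022"]})

# Parse day-first strings and make sure the values are sorted.
dates = (
    pd.to_datetime(df["Date"], format="%d/%m/%Y")
    .sort_values()
    .reset_index(drop=True)
)
target = pd.to_datetime("12/07/2022", format="%d/%m/%Y")

# side="left" gives the position of the first date >= target,
# so the element just before it is the latest date strictly before target.
pos = dates.searchsorted(target, side="left")
previous_date = dates.iloc[pos - 1] if pos > 0 else pd.NaT
print(previous_date)
```

The pos > 0 guard covers the case where no date in the column precedes the target.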

Python increment rows starting from a month on month

How can I obtain the result below?
Current Month is the column to be calculated. It should increment by one month per row, starting from Jan-18, for every account id.
For every account, the first row starts at Jan-18, the second row is Feb-18, and so on, up to the last observation for that account id.
The result shown above is for a sample account; the same has to be applied to multiple account ids.
You could achieve what you are looking for as follows:
import pandas as pd
from datetime import date
acct_id = "123456789"
loan_start_date = date(2018, 1, 31)
current_date = date.today()
# One label per month-end from the start date up to today: "Jan-18", "Feb-18", ...
dates = pd.date_range(loan_start_date, current_date, freq='M').strftime("%b-%y")
df = pd.DataFrame()
df["current_month"] = dates
# Scalars are broadcast to every row
df["acct_id"] = acct_id
df["loan_start_date"] = loan_start_date
df = df[["acct_id", "loan_start_date", "current_month"]]
print(df.head())
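For several accounts at once, one possible sketch is to number the rows within each account with groupby().cumcount() and add that number of months to Jan-18. The account ids and row counts below are made up for illustration, and the rows are assumed to already be in chronological order within each account.

```python
import pandas as pd

# Hypothetical input: one row per observation, several rows per account.
df = pd.DataFrame({"acct_id": ["A", "A", "A", "B", "B"]})

start = pd.Period("2018-01", freq="M")

# 0, 1, 2, ... within each account, in the existing row order.
month_offset = df.groupby("acct_id").cumcount()

# Period + int moves forward by that many months.
df["current_month"] = [(start + int(k)).strftime("%b-%y") for k in month_offset]
print(df)
```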

extracting the last day of the quarter from dataframes

I have a daily time series and I want to pick out the data for the last day of the quarter. I tried doing this by generating a series for the last day of the quarter and merging it with the other dataframe, but to no avail.
My Python code is here:
import pandas as pd
import numpy as np
s1 = pd.read_csv(r"C:\Users\Tim Peterson\Documents\Tom\Rocky\DJIA.csv", index_col=0,parse_dates=True)
ds1 = pd.DataFrame(s1, columns=[ 'DJIA'])
date1 = "2014-10-10" # input start date
date2 = "2016-01-07" # input end date
month_list = [i.strftime("%b-%y") for i in pd.date_range(start=date1, end=date2, freq='MS')]
ds2 =pd.date_range(date1, date2, freq='BQ')
eom = pd.DataFrame(ds2 )
mergedDf = ds1.merge(eom, left_index=True, right_index=True)
print(mergedDf)
when I run this I get
Empty DataFrame
Columns: [DJIA, 0]
Index: []
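IIUC, the merge comes back empty because eom has a default integer index (0, 1, 2, ...) while ds1 is indexed by date, so no index values match. Filter ds1 by the quarter-end dates instead: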
dates = pd.date_range(date1, date2, freq='BQ')
out = ds1[ds1.index.isin(dates)]
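If the calendar quarter-end can fall on a day with no data (a market holiday, for instance), an alternative is to take the last available row within each quarter. This is a sketch on synthetic business-day data, not the DJIA file from the question; note that it also returns the last row of a still-incomplete final quarter.

```python
import pandas as pd

# Synthetic stand-in for the DJIA frame: one row per business day.
idx = pd.bdate_range("2014-10-10", "2016-01-07")
ds1 = pd.DataFrame({"DJIA": range(len(idx))}, index=idx)

# Group rows by calendar quarter and keep the last observation of each group.
quarter_ends = ds1.groupby(ds1.index.to_period("Q")).tail(1)
print(quarter_ends)
```

Dropping the final row afterwards is one way to exclude the partial quarter if that is not wanted.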

Expand values with dates in a pandas dataframe

I have a dataframe with name values and a date range (start/end). I need to expand/replace the dates with the ones generated by the from/to index. How can I do this?
Name             date_range
NameOne_%Y%m-%d  [-2,1]
NameTwo_%y%m%d   [-3,1]
Desired result (Assuming that today's date is 2021-03-09 - 9 of march 2021):
Name
NameOne_202103-10
NameOne_202103-09
NameOne_202103-08
NameOne_202103-07
NameTwo_210310
NameTwo_210309
NameTwo_210308
NameTwo_210307
NameTwo_210306
I've been trying to iterate over the dataframe and generate the dates, but I still can't make it work:
for index, row in self.config_df.iterrows():
    print(row['source'], row['date_range'])
    days_sub=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[0].strip())
    days_add=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[1].strip())
    start_date = date.today() + timedelta(days=days_sub)
    end_date = date.today() + timedelta(days=days_add)
    date_range_df=pd.date_range(start=start_date, end=end_date)
    date_range_df["source"]=row['source']
Any help is appreciated. Thanks!
Convert your date_range from str to list with the ast module:
import ast
df = df.assign(date_range=df["date_range"].apply(ast.literal_eval))
Use date_range to create the list of dates for each row, and explode to chain the lists:
today = pd.Timestamp.today().normalize()
offset = pd.tseries.offsets.Day  # shortcut
names = pd.Series([pd.date_range(today + offset(end),
                                 today + offset(start),
                                 freq="-1D").strftime(name)
                   for name, (start, end) in df.values]).explode(ignore_index=True)
>>> names
0 NameOne_202103-10
1 NameOne_202103-09
2 NameOne_202103-08
3 NameOne_202103-07
4 NameTwo_210310
5 NameTwo_210309
6 NameTwo_210308
7 NameTwo_210307
8 NameTwo_210306
dtype: object
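To check the idea against the desired result from the question, here is a small self-contained sketch with today's date pinned to 2021-03-09 (an assumption made only so the output is reproducible). It uses a plain loop over the two columns instead of explode, so it is a variation on the answer above rather than the same code.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "Name": ["NameOne_%Y%m-%d", "NameTwo_%y%m%d"],
    "date_range": ["[-2,1]", "[-3,1]"],
})

# Assumption: pinned reference date instead of pd.Timestamp.today().
today = pd.Timestamp("2021-03-09")

rows = []
for pattern, raw in zip(df["Name"], df["date_range"]):
    start, end = ast.literal_eval(raw)  # "[-2,1]" -> [-2, 1]
    days = pd.date_range(today + pd.Timedelta(days=start),
                         today + pd.Timedelta(days=end))
    # Newest date first, formatted with the row's own strftime pattern.
    rows.extend(days[::-1].strftime(pattern))

names = pd.Series(rows, name="Name")
print(names)
```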
Alright. From your question I understand you have a starting data frame like so:
config_df = pd.DataFrame({
'name': ['NameOne_%Y-%m-%d', 'NameTwo_%y%m%d'],
'str_date_range': ['[-2,1]', '[-3,1]']})
Resulting in this:
name str_date_range
0 NameOne_%Y-%m-%d [-2,1]
1 NameTwo_%y%m%d [-3,1]
To achieve your goal without iterating over rows, which is best avoided in pandas, you can use groupby().apply() like so:
def expand(row):
    # Get the start and end offsets from the group by splitting
    # the string and taking the first and second value respectively.
    # .min() is required because row.str_date_range is a pd.Series
    start_date = row.str_date_range.str.strip('[]').str.split(',').str[0].astype(int).min()
    end_date = row.str_date_range.str.strip('[]').str.split(',').str[1].astype(int).min()
    # Create a range from start_date to end_date.
    # Note that range() does not include the end value, therefore add 1
    day_range = range(start_date, end_date+1)
    # Create a Timedelta series from the day_range
    days_diff = pd.to_timedelta(pd.Series(day_range), unit='days')
    # Create an equally sized Series of today Timestamps
    todays = pd.Series(pd.Timestamp.today()).repeat(len(day_range)).reset_index(drop=True)
    df = todays.to_frame(name='date')
    # Add days_diff to date column
    df['date'] = df.date + days_diff
    # Inside groupby().apply(), row.name is the group key, i.e. the name pattern
    df['name'] = row.name
    # Extract the date format from the name
    date_format = row.name.split('_')[1]
    # Add a column with the formatted date using the date_format string
    df['date_str'] = df.date.dt.strftime(date_format=date_format)
    df['name'] = df.name.str.split('_').str[0] + '_' + df.date_str
    # Optional: drop columns
    return df.drop(columns=['date'])
config_df.groupby('name').apply(expand).reset_index(drop=True)
returning (with today being 2021-03-09):
name date_str
0 NameOne_2021-03-07 2021-03-07
1 NameOne_2021-03-08 2021-03-08
2 NameOne_2021-03-09 2021-03-09
3 NameOne_2021-03-10 2021-03-10
4 NameTwo_210306 210306
5 NameTwo_210307 210307
6 NameTwo_210308 210308
7 NameTwo_210309 210309
8 NameTwo_210310 210310

Pandas: Convert Month Name to Int + Concat to Column and Convert to Date time

Convert Month Name (ex.October) to int value.
Append this column "Month" to the beginning of another column "FY"
Convert new column "Month FY" to date
I've tried to use pandas, calendar, and datetime, but have been unsuccessful. I would like to accomplish this without having to create a dict.
End Goal:
Month FY
10/1/2018
What I've tried:
var2.Month = var2.Month.astype('|S')
pd.to_datetime(var2.Month)
var2['Month'] = var2['Month'].apply(lambda x: cal.month_abbr[x])
var2.Month = datetime.datetime.strptime(var2.Month, "%m")
pd.to_datetime(var2.Month, format='%b').dt.month
df['yy'] = df['fy'].map(lambda x: x.lstrip('FY '))
df['yy'] = pd.to_datetime(df.yy,format='%Y').dt.year
df['month'] = pd.to_datetime(df.month, format='%B').dt.month
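Putting the pieces together, one possible sketch is to join the month name and the year into a single string and parse it in one pass. The sample rows and the "FY 2018" layout of the FY column are assumptions (inferred from the lstrip('FY ') call above), and the year is used as a calendar year with no fiscal-year shift.

```python
import pandas as pd

# Hypothetical sample data; the real FY column format may differ.
df = pd.DataFrame({
    "Month": ["October", "January"],
    "FY": ["FY 2018", "FY 2019"],
})

# Pull the four-digit year out of the FY text.
year = df["FY"].str.extract(r"(\d{4})")[0]

# "%B" parses full English month names; the day defaults to the 1st.
df["Month FY"] = pd.to_datetime(df["Month"] + " " + year, format="%B %Y")
print(df)
```

If only the month number is needed, df["Month FY"].dt.month gives it without a lookup dict.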
