How to tell if a pandas date time difference is null? - python

I need to fill in missing dates in a pandas data frame. The dataframe consists of weekly sales data for multiple items. I am looping through each item to see if there are missing weeks of dates with the intention of filling in those dates with a '0' for sales and all other information copied down.
I use the following code to find the missing dates:
pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date'])
While I can print the missing dates and search manually for the few items that are missing sales weeks, I have not found a way to do this programmatically.
I tried
for item in df['ord_base7'].unique():
    df_ = df[df['ord_base7'] == item]
    if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).isnan() == True:
        pass
    else:
        print(item, pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']))
That yielded the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_55320/2582723605.py in <module>
1 for item in df['ord_base7'].unique():
2 df_ = df[df['ord_base7'] == item]
----> 3 if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).isnan() == True:
4 pass
5 else:
AttributeError: 'DatetimeIndex' object has no attribute 'isnan'
How can I program a way to see if there are no dates missing so those items can be passed over?

Looping over a pandas DataFrame is generally inefficient. If the gaps are already present as NaN values, use .fillna() and pass in the value that should replace NaN:
df['week_date'].fillna(0)

Never mind... I just tried the following and it worked.
for item in df['ord_base7'].unique():
    df_ = df[df['ord_base7'] == item]
    if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).empty:
        pass
    else:
        print(item, pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']))
The .empty attribute is how to check whether a DatetimeIndex has no entries, which here means no dates are missing.
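For the original goal of filling the missing weeks with 0 sales, here is a minimal sketch of one possible approach. It runs on made-up toy data: the `sales` column name, the values, and the shortened date range are assumptions for illustration, and it assumes each (item, week) pair occurs at most once. It first collects the items whose difference is not empty, then reindexes against the full item/week grid.

```python
import pandas as pd

# Toy data (hypothetical): item "B" is missing the week of 2017-01-21.
df = pd.DataFrame({
    "ord_base7": ["A", "A", "A", "B", "B"],
    "week_date": pd.to_datetime(
        ["2017-01-14", "2017-01-21", "2017-01-28", "2017-01-14", "2017-01-28"]
    ),
    "sales": [5, 3, 4, 7, 2],
})

all_weeks = pd.date_range(start="2017-01-13", end="2017-01-28", freq="W-SAT")

# 1) Collect only the items that are actually missing weeks.
missing_by_item = {}
for item, group in df.groupby("ord_base7"):
    missing = all_weeks.difference(group["week_date"])
    if not missing.empty:
        missing_by_item[item] = missing

# 2) Build the full (item, week) grid and reindex; absent rows become NaN.
full_index = pd.MultiIndex.from_product(
    [df["ord_base7"].unique(), all_weeks], names=["ord_base7", "week_date"]
)
filled = (
    df.set_index(["ord_base7", "week_date"])
      .reindex(full_index)
      .reset_index()
)

# 3) Missing weeks get 0 sales.
filled["sales"] = filled["sales"].fillna(0)

print(missing_by_item)
print(filled)
```

Other columns that should be "copied down" could be forward-filled per item after the reindex, for example with `filled.groupby("ord_base7").ffill()`, depending on what those columns contain.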

Related

extracting values matching timestamps by a new set of timestamps

sample table here
I am trying to look up the corresponding commodity prices from the columns (CU00.SHF, AU00.SHF, SC00.SHF, I8888.DCE, C00.DCE) using a new set of timestamps whose dates are 32 days later than the dates in the column 'history_date'.
I tried .loc and .at in a loop to extract the matching values with the functions below:
latest_day = data.iloc[data.shape[0] - 1, 0].date()
def next_trade_day(x):
    x = pd.to_datetime(x).date() # the imported is_workday function requires datetime type
    while True:
        if is_workday(x + timedelta(32)) != False:
            break
            return (pd.Timestamp((x + timedelta(32))))
        if is_workday(x + timedelta(32)) == False:
            x = x + timedelta(1)
    return pd.Timestamp(x + timedelta(32))
def end_price(x):
    x = pd.Timestamp(x)
    if x <= latest_day:
        return data.at[x,'CU00.SHF']
    if x > latest_day:
        return 'None'
    return data.at[x,'CU00.SHF']
But it always gives
KeyError: Timestamp('2023-02-03 00:00:00')
Any idea how I should achieve this?
Thanks in advance!
If you want to work with datetimes:
1. Convert the column to datetime.
2. Check that the dates were converted, then filter.
df['your column'] = pd.to_datetime(df['your column'], errors='coerce')
df.loc[df['your column'] > 'your-date']
If both steps work, then check the rest of your code.
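A KeyError like the one in the question usually means the requested timestamp is not a label in the index (for example a holiday, a weekend, or a date past the end of the table). A minimal sketch of one way to avoid the explicit loop is `Series.reindex` with `method="bfill"`, which picks the next date that exists in the table. The toy prices below are made up, and "next date present in the index" is an assumption that stands in for the `is_workday` logic from the question.

```python
import pandas as pd

# Toy price table (hypothetical values), indexed by trading date.
data = pd.DataFrame(
    {"CU00.SHF": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-10", "2023-02-03", "2023-02-06"]
    ),
)

history_date = pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-10"])
target = history_date + pd.Timedelta(days=32)  # 02-03, 02-04 (Saturday), 02-11

# reindex never raises KeyError: a target that is not in the index takes the
# next available row (bfill), or NaN if there is no later row at all.
end_prices = data["CU00.SHF"].sort_index().reindex(target, method="bfill")
print(end_prices)
```

Here 2023-02-04 is not in the table, so it takes the price from 2023-02-06, and 2023-02-11 lies past the last row, so it comes back as NaN instead of raising.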

How to create new columns with the last 5 sale prices in a dataframe

I have a pandas DataFrame of sneaker sales, which looks like this,
I added columns last1, ..., last5 indicating the last 5 sale prices of the sneakers and made them all None. I'm trying to update the values of these new columns using the 'Sale Price' column. This is my attempt to do so,
for index, row in df.iterrows():
    if (index==0):
        continue
    for i in range(index-1, -1, -1):
        if df['Sneaker Name'][index] == df['Sneaker Name'][i]:
            df['last5'][index] = df['last4'][i]
            df['last4'][index] = df['last3'][i]
            df['last3'][index] = df['last2'][i]
            df['last2'][index] = df['last1'][i]
            df['last1'][index] = df['Sale Price'][i]
            continue
    if (index == 100):
        break
When I ran this, I got a warning,
A value is trying to be set on a copy of a slice from a DataFrame
and the result is also wrong.
Does anyone know what I did wrong?
Also, this is the expected output,
Use this instead of the for loop if your rows are sorted (note that a plain shift does not distinguish between sneaker names):
df['last1'] = df['Sale Price'].shift(1)
df['last2'] = df['last1'].shift(1)
df['last3'] = df['last2'].shift(1)
df['last4'] = df['last3'].shift(1)
df['last5'] = df['last4'].shift(1)
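If the frame holds several sneakers, the shift can be done per sneaker with `groupby`. This is a minimal sketch on made-up data: the column names follow the question, the values are invented, and it assumes the rows are already in chronological order.

```python
import pandas as pd

# Toy data (hypothetical values), already sorted by sale time.
df = pd.DataFrame({
    "Sneaker Name": ["A", "B", "A", "A", "B"],
    "Sale Price": [100, 200, 110, 120, 210],
})

# shift(n) within each sneaker gives the price from n sales earlier of the
# same sneaker; rows without that much history get NaN.
for n in range(1, 6):
    df[f"last{n}"] = df.groupby("Sneaker Name")["Sale Price"].shift(n)

print(df)
```

Assigning whole columns this way also avoids the chained indexing (`df['last1'][index] = ...`) that triggers the "set on a copy of a slice" warning.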

How do I filter by a certain date and hour using Pandas dataframe in python

I am trying to find, in a CSV of a price chart, the price values at a given point in time. I have converted the Datetime column to datetime data with the pd.to_datetime function, but I cannot find a method that lets me filter the rows by date, hour, and minute separately. A typical row looks something like this.
Datetime 2021-10-15 19:55:00-04:00
Open 40.15
High 40.2
Low 40.14
Close 40.15
Volume 0
Dividends 0
Stock Splits 0
Name: 939, dtype: object
Empty DataFrame
Columns: [Datetime, Open, High, Low, Close, Volume, Dividends, Stock Splits]
Index: []
So far here is my code
import pandas as pd
data = pd.read_csv("Data\\09-16-21 AMC-5min", parse_dates=["Datetime"])
data["Datetime"] = pd.to_datetime(data['Datetime'])
newData = data[(data.Datetime.day == data.Datetime.day.max()) & data.Datetime.hour == 9 & data.Datetime.minute == 30]
print(newData)
In this example I am trying to find the data point at 9:30 on the most recent day. When I try to run this, I get the following error:
Traceback (most recent call last):
File "C:\Users\Zach\PycharmProjects\Algotrading\Test.py", line 7, in <module>
newData = data[(data.Datetime.day == data.Datetime.day.max()) & data.Datetime.hour == 9 & data.Datetime.minute == 30]
File "C:\Users\Zach\PycharmProjects\Algotrading\venv\lib\site-packages\pandas\core\generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'day'
I'm not sure how to access the separate values for day, hour, and minute. Any advice would be appreciated.
You need the .dt accessor, and parentheses around the second and third conditions:
newData = data[(data.Datetime.dt.day == data.Datetime.dt.day.max()) &
               (data.Datetime.dt.hour == 9) &
               (data.Datetime.dt.minute == 30)]
To extract the day only once:
s = data.Datetime.dt.day
newData = data[(s == s.max()) &
               (data.Datetime.dt.hour == 9) &
               (data.Datetime.dt.minute == 30)]
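One caveat: `.dt.day` is the day of the month, so `.dt.day.max()` is only the most recent day when the data stays within a single month. A minimal sketch of a variant that compares full calendar dates with `.dt.normalize()` follows. The toy rows are made up and assume all timestamps share the same UTC offset, as in the sample row.

```python
import pandas as pd

# Toy data (hypothetical values) spanning two months.
data = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-09-30 09:30:00-04:00",
        "2021-10-15 09:30:00-04:00",
        "2021-10-15 09:35:00-04:00",
    ]),
    "Close": [39.0, 40.1, 40.15],
})

ts = data["Datetime"]
day = ts.dt.normalize()      # midnight of each row's calendar date
latest_day = day.max()       # most recent date, not the largest day-of-month

new_data = data[(day == latest_day) & (ts.dt.hour == 9) & (ts.dt.minute == 30)]
print(new_data)
```

With `.dt.day.max()` the filter would have selected the row from September 30, because 30 is larger than 15.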

Fill pandas dataframe with a for loop

I have 4 dataframes for 4 newspapers (newspaper1, newspaper2, newspaper3, newspaper4), each of which has a single column for the author name.
Now I'd like to merge these 4 dataframes into one with 5 columns: author, plus newspaper1, newspaper2, newspaper3, newspaper4, which contain a 1/0 value (1 if the author writes for that newspaper).
import pandas as pd
listOfMedia =[newspaper1,newspaper2,newspaper3,newspaper4]
merged = pd.DataFrame(columns=['author','newspaper1','newspaper2', 'newspaper3', 'newspaper4'])
While this loop does what I intended (fills the merged df author column with the names):
for item in listOfMedia:
    merged.author = item.author
I can't figure out how to fill the newspaper columns with the 1/0 values...
for item in listOfMedia:
    if item == newspaper1:
        merged['newspaper1'] = '1'
    elif item == newspaper2:
        merged['newspaper2'] = '1'
    elif item == newspaper3:
        merged['newspaper3'] = '1'
    else:
        merged['newspaper4'] = '1'
I keep getting the error
During handling of the above exception, another exception occurred:
TypeError: attrib() got an unexpected keyword argument 'convert'
Tried to google that error but didn't help me identify what the problem is.
What am I missing here? I also think there must be smarter way to fill the newspaper/author matrix, however don't seem to be able to figure out even this simple way. I am using jupyter notebook.
Actually, you are setting all rows to 1, so use:
for col in merged.columns:
    merged[col].values[:] = 1
I've taken a guess at what I think your dataframes look like.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})
Firstly we will copy the dataframes so we don't affect the originals:
newspaper1_temp = newspaper1.copy()
newspaper2_temp = newspaper2.copy()
newspaper3_temp = newspaper3.copy()
newspaper4_temp = newspaper4.copy()
Next we replace the index of each dataframe with the author name:
newspaper1_temp.index = newspaper1['author']
newspaper2_temp.index = newspaper2['author']
newspaper3_temp.index = newspaper3['author']
newspaper4_temp.index = newspaper4['author']
Then we concatenate these dataframes (matching them together by the index we set):
merged = pd.concat([newspaper1_temp, newspaper2_temp, newspaper3_temp, newspaper4_temp], axis =1)
merged.columns = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']
And finally we replace NaN's with 0 and then non-zero entries (they will still have the author names in them) as 1:
merged = merged.fillna(0)
merged[merged != 0] = 1
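As a more compact alternative to the steps above, the same author/newspaper matrix could be built with `pd.crosstab`. This is a minimal sketch that reuses the guessed example frames from this answer; it is one possible approach, not the only one.

```python
import pandas as pd

# Same guessed example data as above.
newspapers = {
    "newspaper1": pd.DataFrame({"author": ["author1", "author2", "author3"]}),
    "newspaper2": pd.DataFrame({"author": ["author1", "author2", "author4"]}),
    "newspaper3": pd.DataFrame({"author": ["author1", "author2", "author5"]}),
    "newspaper4": pd.DataFrame({"author": ["author1", "author2", "author6"]}),
}

# Stack everything into one long table with a column naming the newspaper.
long = pd.concat(
    [frame.assign(newspaper=name) for name, frame in newspapers.items()],
    ignore_index=True,
)

# Count author/newspaper pairs; clip so repeated authors still give 1.
matrix = pd.crosstab(long["author"], long["newspaper"]).clip(upper=1)

merged = matrix.reset_index()  # 'author' becomes a regular column
print(merged)
```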

Continuing during an exception in a try/except statement

I have read numerous StackOverflow threads about looping during try/except statements, using else and finally, if/else statements, and while statements, but none of them address what I want. That or I don't know how to utilise that information to get what I want done.
Basically, I am trying to get adjusted closing stock prices for various companies on a given date. I pasted some dummy data in the code block below to demonstrate (NOTE: you'll have to install pandas and pandas_datareader to get the dummy code to run). The get_stock_adj_close function returns the adj_close price given a ticker and date. The dummy_dataframe contains 4 companies with their tickers and random dates. And the add_days function takes a date and adds any number of days. I would like to append the adjusted close stock prices for each company in the dataframe on the listed date into the stock_prices list.
Because the yahoo stock price database isn't that reliable for older entries and because some dates fall on days when the market is closed, whenever a price isn't available it raises a KeyError: 'Date'. Thus, what I would like to do is keep adding days indefinitely until it finds a date where a price does exist. The problem is it only adds the day once and then raises the same KeyError. I want it to keep adding days until it finds a day where the database has a stock price available and then return back to the dataframe and keep going with the next row. Right now the whole thing breaks on the first GM date (fourth row), which raises the KeyError and the fifth row/second GM date is ignored. Any help is appreciated!
Dummy data:
from datetime import datetime, date, timedelta
import pandas as pd
import pandas_datareader as pdr
from dateutil.relativedelta import relativedelta
def add_days(d, num_days):
    return d + timedelta(days=num_days)
def get_stock_adj_close(ticker, chosen_date):
    stock_df = pdr.get_data_yahoo(ticker, start = chosen_date, end = chosen_date)
    return stock_df.iloc[0]['Adj Close']
d = {'TICKER': ['AMD','AMD','CHTR','GM'], 'DATE': [datetime(2020,2,4), datetime(2019,2,8),datetime(2019,1,31), datetime(2010,4,7)]}
dummy_dataframe = pd.DataFrame(data=d)
stock_prices = []
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    try:
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        print(stock_price)
        stock_prices.append(stock_price)
    except KeyError:
        given_date = add_days(given_date,1)
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        stock_prices.append(stock_price)
print(stock_prices)
I think a while loop will help you. For example:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    stock_price_found = False
    while not stock_price_found:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            stock_price_found = True  # stop retrying once a price is found
        except KeyError:
            given_date = add_days(given_date,1)
Or you can use while True together with break:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    while True:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            break
        except KeyError:
            given_date = add_days(given_date,1)
Make sure you don't get stuck in an infinite loop: it helps to give the while loop another exit condition, for example giving up after 10 failures.
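The exit condition mentioned above could look roughly like this. It is a minimal sketch: `first_available_price`, `fake_fetch`, and the price table are hypothetical names and values used so the example runs without network access. In the question's code, `get_stock_adj_close` would be passed in place of `fake_fetch`.

```python
from datetime import date, timedelta

def first_available_price(fetch_price, ticker, start_date, max_tries=10):
    """Return (date, price) for the first date on or after start_date with a price.

    fetch_price(ticker, date) is expected to raise KeyError when no price exists.
    """
    current = start_date
    for _ in range(max_tries):
        try:
            return current, fetch_price(ticker, current)
        except KeyError:
            current += timedelta(days=1)  # try the next calendar day
    raise LookupError(
        f"no price for {ticker} within {max_tries} days of {start_date}"
    )

# Hypothetical stand-in for get_stock_adj_close: a dict lookup that raises
# KeyError for dates without a price.
fake_prices = {("GM", date(2010, 4, 9)): 12.5}

def fake_fetch(ticker, d):
    return fake_prices[(ticker, d)]

found_date, price = first_available_price(fake_fetch, "GM", date(2010, 4, 7))
print(found_date, price)
```

Raising `LookupError` after `max_tries` is one design choice; appending `None` to `stock_prices` and moving on to the next row would be another.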
