How to omit only weekends from my data frame? - python

I am working on a project to create an algorithmic trader. I want to remove the weekends from my data frame because they ruin the data. I have tried some things I found on StackOverflow, but I get an error saying the type is Timestamp, so I can't use that technique. The dates also aren't a column in the data frame. I'm new to Python, so I'm not sure, but I think they are the index, since hist.index shows me the date and time. I'm sorry if these are stupid questions, but I am new to Python and pandas.
Here is my code:
#import all the libraries
import nsetools as ns
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
print(hist)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
EDIT: On the suggestion of a user, I tried modifying my code to this:
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
refinedlist = hist[hist.index.dayofweek<5]
print(refinedlist)
I graphed that, but the graph still includes the weekends on the x-axis.

To begin with, there is no stock market data for weekends and public holidays because the market is closed. And because your data is sampled by time (15-minute bars), there is also no data between the market close and the next day's open.
For example, I graphed the first 50 results. (The x-axis doesn't seem to be correct.)
plt.plot(hist['Close'][:50], label=a)
As one example, if you reindex the data so that weekends, public holidays, and the hours when the market is closed are included as missing values, you get the following graph.
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# plt.style.use('fivethirtyeight')
# a = input("Enter the ticker name you wish to apply strategy to")
a = 'AAPL'
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
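Filtering with hist.index.dayofweek < 5 does remove the weekend rows, but matplotlib still lays a datetime axis out in real time, so the empty stretches remain on the x-axis. One common workaround, sketched here on synthetic 15-minute bars instead of a live yfinance download, is to plot against the row position and use the timestamps only as tick labels. The helper name positional_axis, the sample data, and the 09:30 to 16:00 trading hours are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def positional_axis(index, n_ticks=6):
    """Return evenly spaced row positions and the timestamp label for each."""
    positions = np.linspace(0, len(index) - 1, n_ticks).astype(int)
    labels = [index[p].strftime("%Y-%m-%d %H:%M") for p in positions]
    return positions, labels

# Synthetic stand-in for ticker.history(): 15-minute bars over one week,
# including a weekend and the overnight hours.
idx = pd.date_range("2020-11-13 00:00", "2020-11-19 23:45", freq="15min")
hist = pd.DataFrame({"Close": np.linspace(100.0, 110.0, len(idx))}, index=idx)

# Keep Monday-Friday bars inside the assumed trading hours.
trading = hist[hist.index.dayofweek < 5].between_time("09:30", "15:45")

positions, labels = positional_axis(trading.index)

# Plotting sketch (requires matplotlib):
#   fig, ax = plt.subplots()
#   ax.plot(np.arange(len(trading)), trading["Close"])
#   ax.set_xticks(positions)
#   ax.set_xticklabels(labels, rotation=45)
```

Because the x values are consecutive integers, nights, weekends, and holidays take up no horizontal space; the trade-off is that distances on the axis no longer represent elapsed time.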

Related

Time series data visualization issue

I have time series data in which each record is identified by year and week, running from week 1 of 2014 to week 52 of 2015.
Below is the line plot of that data.
As you can see, the x-axis labelling is not what I was trying to achieve: the point after 201453 should be 201501, there should be no straight connecting line, and the axis should not run up to 201499. How can I rescale the x-axis to follow the Due_date column exactly? Below is the code:
rand_products = np.random.choice(Op_2['Sp_number'].unique(), 3)
selected_products = Op_2[Op_2['Sp_number'].isin(rand_products)][['Due_date', 'Sp_number', 'Billing']]
plt.figure(figsize=(20,10))
plt.grid(True)
g = sns.lineplot(data=selected_products, x='Due_date', y='Billing', hue='Sp_number', ci=False, legend='full', palette='Set1');
The issue is that 201401 and so on are read as numbers, which is why the line chart has that gap. To fix it, you need to convert the numbers to dates and plot those.
As the full data is not available, below is a two-column dataframe that has Due_date as an integer in YYYYWW form; the Billing column is a set of random numbers. Use the method here to convert the integers to dates and plot them. The gap will be removed.
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
Due_date = list(np.arange(201401,201454)) #Year 2014
Due_date.extend(np.arange(201501,201553)) #Year 2015
Billing = random.sample(range(500, 1000), 105) #billing numbers
df = pd.DataFrame({'Due_date': Due_date, 'Billing': Billing})
df.Due_date = df.Due_date.astype(str)
df.Due_date = pd.to_datetime(df['Due_date']+ '-1',format="%Y%W-%w") #Convert to date
plt.figure(figsize=(20,10))
plt.grid(True)
ax = sns.lineplot(data=df, x='Due_date', y='Billing', ci=False, legend='full', palette='Set1')
Output graph
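To make the cause of the gap concrete, here is a small standard-library sketch that compares the distance between the last week of 2014 and the first week of 2015 on a numeric axis and on a date axis. It uses the same "%Y%W-%w" format as the answer; the helper name yearweek_to_date is made up for this illustration.

```python
from datetime import datetime

def yearweek_to_date(yyyyww):
    """Parse an integer such as 201452 as the Monday of that %W week."""
    return datetime.strptime(f"{yyyyww}-1", "%Y%W-%w")

last_2014 = yearweek_to_date(201452)
first_2015 = yearweek_to_date(201501)

int_gap = 201501 - 201452                 # distance on a numeric axis
date_gap = (first_2015 - last_2014).days  # distance on a date axis
print(int_gap, date_gap)                  # prints: 49 7
```

On the integer axis the two consecutive weeks are 49 units apart, which is the straight line in the original plot; as dates they are one week apart.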

How to create a State vs. Death Bar Graph

File: https://docs.google.com/spreadsheets/d/1JNrPnC2YRg78ceblt1eeBN_Iz6rG2psE/edit?usp=sharing&ouid=105308566456636539364&rtpof=true&sd=true
I am looking to create a bar graph comparing state vs. deaths from COVID-19 data (the data is attached). I have already filtered out the states and dates I want using the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
df = pd.read_excel("Project.xlsx")
start = datetime.date(2020,10,31).strftime('%Y%m%d')
end = datetime.date(2020,12,1).strftime('%Y%m%d')
dfnew=df.query(f"{start} < date < {end}")
dfnew = dfnew.fillna(0)  # numeric zero, so numeric columns stay numeric
# isin() replaces the long chain of == comparisons joined with |
states = ['GA', 'IL', 'CA', 'NY', 'NC', 'MI', 'OH', 'FL', 'PA', 'TX']
dflatest = dfnew[dfnew['state'].isin(states)]
dflatest
However, I am looking to get the average deaths (adding up the deaths per day) in the month of November by state, and to create a bar graph with X: State and Y: Average deaths in November. I am not sure how to write this code, and any help would be appreciated.
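A possible approach, sketched on a tiny synthetic frame because the spreadsheet's real column names are not shown: filter to November and the chosen states, then group by state and take the mean of the daily counts. The column name "deaths", the integer YYYYMMDD dates, and the sample values are all assumptions.

```python
import pandas as pd

# Tiny synthetic stand-in for Project.xlsx. The column name "deaths" is an
# assumption; the question does not show the real column names.
df = pd.DataFrame({
    "date":   [20201101, 20201102, 20201101, 20201102, 20201015],
    "state":  ["GA", "GA", "NY", "NY", "GA"],
    "deaths": [10, 20, 30, 50, 999],
})

states = ["GA", "NY"]
in_november = df["date"].between(20201101, 20201130)
november = df[in_november & df["state"].isin(states)]

# Mean of the daily death counts per state, ready for a bar chart.
avg_deaths = november.groupby("state")["deaths"].mean()
print(avg_deaths)

# Plotting sketch (requires matplotlib):
#   ax = avg_deaths.plot.bar()
#   ax.set_xlabel("State")
#   ax.set_ylabel("Average deaths in November")
```

If the monthly total is wanted instead of the daily average, .sum() can be used in place of .mean().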

Python: How to access a subordinate Column?

I'm Sven, and I should say up front that I am an absolute beginner with Python. I read the books "Beginning with Python" and "Python for Data Analysis" to get at least a basic understanding of what I'm doing. My goal with the code below is to show the volume of the S&P 500 together with a rolling mean over the last 250 days, that is, to combine a bar chart (seaborn) with a line chart (matplotlib.pyplot).
The problem arises when plotting the S&P 500 volume with seaborn as a bar chart, because I cannot access the subordinate column "Date". I have an idea, but I'm not quite sure how to start. Does anybody have an idea? Thanks a lot.
My approach is somewhere between indexing, hierarchical indexing, and grouping.
Open High Low Close Adj Close Volume
Date
1993-02-01 438.78 442.52 438.78 442.52 442.52 238570000
1993-02-02 442.52 442.87 440.76 442.55 442.55 271560000
1993-02-03 442.56 447.35 442.56 447.20 447.20 345410000
1993-02-04 447.20 449.86 447.20 449.56 449.56 351140000
1993-02-05 449.56 449.56 446.95 448.93 448.93 324710000
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
yesterday = datetime.now()-timedelta(1)
datetime.strftime(yesterday, "%Y-%m-%d")
SP500 = yf.download('^GSPC', start='1993-02-01', end=yesterday)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
SP500f = SP500.head()
SP500f.groupby
#Stats_Vol = SP500["Volume"]
#Date = SP500["Date"]
#print(Stats_Vol)
#print(Stats_Vol.describe())
#sns.barplot(data=SP500, y="Volume")
#print(Stats_Vol.rolling(250).mean().plot())
plt.show()
Primarily, you need to access Date, which is the index.
You could use reset_index() to make it a column.
There are too many dates to plot, so I resampled and then created a new column holding the display format for the x-axis.
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
yesterday = datetime.now()-timedelta(1)
fig, ax = plt.subplots()
SP500 = yf.download('^GSPC', start='1993-02-01', end=yesterday)
# too many days, resample
# do a display format for date (which is the index)
sns.barplot(data=SP500.loc[:,"Volume"]\
.resample("Y").mean().to_frame()\
.assign(GDate=lambda dfa: dfa.index.strftime("%Y")),
x="GDate", y="Volume", ax=ax)
# rotate the labels
l = ax.set_xticklabels(ax.get_xticklabels(), rotation = 90)
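The answer above covers the bar chart; the 250-day rolling mean from the question can be added in the same spirit. The following is a minimal sketch on synthetic daily volumes (a stand-in for the yf.download result, so the values are made up) showing that reset_index() exposes Date as an ordinary column and that rolling(250).mean() yields the moving average.

```python
import numpy as np
import pandas as pd

# Synthetic daily volume indexed by business date (stand-in for yf.download).
dates = pd.date_range("1993-02-01", periods=500, freq="B", name="Date")
sp500 = pd.DataFrame({"Volume": np.arange(500, dtype=float)}, index=dates)

# The dates live in the index; reset_index() turns them into a column.
flat = sp500.reset_index()

# 250-day rolling mean of volume; the first 249 rows have no full window.
flat["Vol_MA250"] = flat["Volume"].rolling(250).mean()
print(flat.tail(3))
```

The Vol_MA250 column can then be drawn as a line on the same axes as the volume bars.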

How can I adjust the bounds of the x tick values that are automatically chosen by matplotlib?

I have a graph that shows the closing price of a stock throughout a day at each five minute interval. The x axis shows the time and the range of x values is from 9:30 to 4:00 (16:00).
The problem is that the automatic bounds for the x axis go from 9:37 to 16:07 and I really just want it from 9:30 to 16:00.
The code I am currently running is this:
stk = yf.Ticker(ticker)
his = stk.history(interval="5m", start=start, end=end).values.tolist() #open - high - low - close - volume
x = []
y = []
count = 0
five_minutes = datetime.timedelta(minutes = 5)
for bar in his:
x.append((start + five_minutes * count))#.strftime("%H:%M"))
count = count + 1
y.append(bar[3])
plt.clf()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.gca().xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
plt.plot(x, y)
plt.gcf().autofmt_xdate()
plt.show()
And it produces this plot (currently a link because I am on a new user account):
I thought I was supposed to use the axis.set_data_interval function, so I did so by providing datetime objects representing 9:30 and 16:00 as the min and the max. This gave me the error:
TypeError: '<' not supported between instances of 'float' and 'datetime.datetime'
Is there another way for me to adjust the first xtick and still have the rest filled in automatically?
This problem can be fixed by adjusting the way you use the mdates tick locator. Here is an example based on the one shared by r-beginners to make it comparable. Note that I use the pandas plotting function for convenience. The x_compat=True argument is needed for it to work with mdates:
import pandas as pd # 1.1.3
import yfinance as yf # 0.1.54
import matplotlib.dates as mdates # 3.3.2
# Import data
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D', interval='5m')
# Create pandas plot with appropriately formatted x-axis ticks
ax = his.plot(y='Close', x_compat=True, figsize=(10,5))
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M', tz=his.index.tz))
ax.legend(frameon=False)
ax.figure.autofmt_xdate(rotation=0, ha='center')
The sample data was created by obtaining Apple's stock price from Yahoo Finance. The desired five-minute interval labels are a list of strings, built with pd.date_range from the start and end times at five-minute intervals.
Based on this, the closing price is plotted against the number of five-minute intervals on the x-axis, and the tick labels can be set at any interval by slicing.
import yfinance as yf
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D',interval="5m")
his.reset_index(inplace=True)
time_rng = pd.date_range('09:30','15:55', freq='5min')
labels = ['{:02}:{:02}'.format(t.hour,t.minute) for t in time_rng]
fig, ax = plt.subplots()
x = np.arange(len(his))
y = his.Close
ax.plot(x,y)
ax.set_xticks(x[::3])
ax.set_xticklabels(labels[::3], rotation=45)
plt.show()
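The label list used in this positional approach can also be built with the standard library alone. The sketch below assumes a full regular session with one bar every five minutes from 09:30 to 15:55 and no missing bars; the function name session_labels is hypothetical. Slicing with [::6] rather than [::3] puts the ticks on half-hour marks, matching the 30-minute spacing the question asks for.

```python
from datetime import datetime, timedelta

def session_labels(start="09:30", end="15:55", step_minutes=5):
    """Return 'HH:MM' strings from start to end inclusive."""
    t = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    labels = []
    while t <= stop:
        labels.append(t.strftime("%H:%M"))
        t += timedelta(minutes=step_minutes)
    return labels

labels = session_labels()
tick_labels = labels[::6]  # every sixth five-minute bar is a half-hour mark
print(len(labels), tick_labels[0], tick_labels[-1])
```

If the downloaded data has fewer bars than labels (for example on a shortened trading day), the labels should be derived from the data's own timestamps instead.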

Error ValueError: day is out of range for month

I was looking at a sample table of student information and wanted to see what days were the most popular for students to enroll on a course. The script worked fine the first day I ran it and I left it. A few days later I returned to take another look at it but started getting the ValueError message.
Why did it stop working? No new information was added to the dataset since it worked the first time. The code now fails at:
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
colors = sns.cubehelix_palette(28, rot=-0.4)
df = pd.read_csv("data.csv")
#print(df.dtypes)
# To change the format of the data type from object to datetime. Has to be run at start of script or the format returns to object.
#df['Enrolment Date'] = pd.to_datetime(df['Enrolment Date'])
print(df.EnrolmentDate.str.slice(0, 10))
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
print(df.head())
print(df.dtypes)
#Tells us what day of the week the enrolment date was. Can also use .dayofyear. Google Pandas API Reference, search for .dt., datetime properties
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is the replacement
print(df.EnrolmentDate.dt.day_name())
#Shows the latest and earliest enrolment dates
print(df.EnrolmentDate.max())
print(df.EnrolmentDate.min())
print(df.EnrolmentDate.max()-df.EnrolmentDate.min())
df["EnrolmentDay"] = df.EnrolmentDate.dt.day_name()
print(df.head())
print(df.EnrolmentDay.value_counts())
print(df.EnrolmentDay.value_counts().plot())
#print(df.Day.value_counts().sort_index())
#df.EnrolmentDay.value_counts().sort_index().plot()
# naming the x axis
plt.xlabel('Day')
# naming the y axis
plt.ylabel('No. of Enrolments')
plt.show()
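The question does not show the data, so the actual cause cannot be determined from here. One way to narrow it down, sketched on made-up sample values, is to parse with an explicit format and errors="coerce": dates that cannot be parsed become NaT instead of raising, so the offending rows can be listed. The sample dates and the "%Y-%m-%d" format are assumptions; the format string has to match the real file.

```python
import pandas as pd

# Hypothetical sample; the real data.csv is not shown in the question.
df = pd.DataFrame({"EnrolmentDate": ["2019-01-15", "2019-02-30", "2019-03-01"]})

# With an explicit format and errors="coerce", unparseable dates become NaT
# instead of raising, so the offending rows can be listed.
parsed = pd.to_datetime(df["EnrolmentDate"], format="%Y-%m-%d", errors="coerce")
bad_rows = df[parsed.isna()]
print(bad_rows)

# Day names for the rows that did parse.
day_names = parsed.dropna().dt.day_name()
print(day_names.value_counts())
```

If the file stores dates day-first (for example 31/01/2019), a format such as "%d/%m/%Y" or the dayfirst=True argument may be what is needed; stating the format explicitly also keeps the result from depending on how ambiguous strings happen to be guessed.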
