Python: How to access a subordinate Column?

I'm Sven, and I should say up front that I am an absolute beginner with Python. I read the books "Beginning with Python" and "Python for Data Analysis" to get at least a basic understanding of what I'm doing. My goal with the code below is to show the volume of the S&P 500 together with a rolling mean over the last 250 days, that is, to combine a bar chart (seaborn) with a line chart (matplotlib.pyplot).
The problem arises when plotting the S&P 500 volume with seaborn as a bar chart, because I cannot access the subordinate column "Date". I have an idea but I'm not quite sure how to start. Does anybody have an idea? Thanks a lot.
My approach is somewhere between indexing, hierarchical indexing, and grouping.
Open High Low Close Adj Close Volume
Date
1993-02-01 438.78 442.52 438.78 442.52 442.52 238570000
1993-02-02 442.52 442.87 440.76 442.55 442.55 271560000
1993-02-03 442.56 447.35 442.56 447.20 447.20 345410000
1993-02-04 447.20 449.86 447.20 449.56 449.56 351140000
1993-02-05 449.56 449.56 446.95 448.93 448.93 324710000
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
yesterday = datetime.now()-timedelta(1)
datetime.strftime(yesterday, "%Y-%m-%d")
SP500 = yf.download('^GSPC', start='1993-02-01', end=yesterday)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
SP500f = SP500.head()
SP500f.groupby
#Stats_Vol = SP500["Volume"]
#Date = SP500["Date"]
#print(Stats_Vol)
#print(Stats_Vol.describe())
#sns.barplot(data=SP500, y="Volume")
#print(Stats_Vol.rolling(250).mean().plot())
plt.show()

Primarily you need to access the Date, which is the index, not a column.
You could use reset_index() to make it a column.
There are too many dates to plot, so I resampled to yearly means and then created a new column holding the display format for the x-axis.
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
yesterday = datetime.now()-timedelta(1)
fig, ax = plt.subplots()
SP500 = yf.download('^GSPC', start='1993-02-01', end=yesterday)
# too many days, resample
# do a display format for date (which is the index)
sns.barplot(data=SP500.loc[:,"Volume"]\
.resample("Y").mean().to_frame()\
.assign(GDate=lambda dfa: dfa.index.strftime("%Y")),
x="GDate", y="Volume", ax=ax)
# rotate the labels
l = ax.set_xticklabels(ax.get_xticklabels(), rotation = 90)
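A minimal sketch of the same idea on synthetic data, so it runs without a download. The frame `df`, its random volumes, and the names `flat`, `Vol_250d`, and `Year` are assumptions for illustration; only `reset_index()`, `rolling()`, and `groupby()` are the pandas calls being demonstrated.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame: business days as a "Date" index.
dates = pd.date_range("1993-02-01", periods=1000, freq="B", name="Date")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"Volume": rng.integers(200_000_000, 400_000_000, size=len(dates))},
    index=dates,
)

# "Date" is the index, not a column, so df["Date"] would raise a KeyError.
# reset_index() turns it into an ordinary column.
flat = df.reset_index()

# Rolling mean over the last 250 rows (trading days); the first 249 are NaN.
flat["Vol_250d"] = flat["Volume"].rolling(250).mean()

# One value per year: group on the year of the date, then average.
yearly = (
    flat.groupby(flat["Date"].dt.year)["Volume"]
    .mean()
    .rename_axis("Year")
    .reset_index()
)
print(yearly)
```

With `Year` and `Volume` as plain columns, `yearly` can be handed to a bar plot by column name, and `flat["Vol_250d"]` holds the values for the line.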

Related

Adding arrows to mpf finance plots

I am trying to add an arrow on a given date and price to mpf plot. To do this i have the following code:
import pandas as pd
import yfinance as yf
import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd, mplfinance as mpf, matplotlib.pyplot as plt
db = yf.download(tickers='goog', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now(), interval="5m")
db = db.dropna()
a = db['Close'][31:32]
test = mpf.make_addplot(a, type='scatter', markersize=200, marker='^')
mpf.plot(db, type='candle', style= 'charles', addplot=test)
But it is producing the following error:
ValueError: x and y must be the same size
Could you please advise how I can resolve this?
The data passed into mpf.make_addplot() must be the same length as the dataframe passed into mpf.plot(). To plot only some points, the remaining points must be filled with nan values (float('nan'), or np.nan).
You can see this clearly in the documentation at cell **In [7]** (and used in the following cells). See there where the signal data is generated as follows:
def percentB_belowzero(percentB,price):
    import numpy as np
    signal = []
    previous = -1.0
    # .items() replaces .iteritems(), which was removed in pandas 2.0
    for date,value in percentB.items():
        if value < 0 and previous >= 0:
            signal.append(price[date]*0.99)
        else:
            signal.append(np.nan) # <- Make `nan` where no marker needed.
        previous = value
    return signal
Note: alternatively the signal data can be generated by first initializing to all nan values, and then replacing those nans where you want your arrows:
signal = [float('nan')]*len(db)
signal[31] = db['Close'].iloc[31]  # a single value, not a one-row slice
test = mpf.make_addplot(signal, type='scatter', markersize=200, marker='^')
...
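The NaN-padding idea can be checked without mplfinance or a download. This is only a sketch: the frame `db` below is synthetic (hypothetical prices on a 5-minute index), and position 31 is taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 5-minute price frame (hypothetical values).
idx = pd.date_range("2021-01-04 09:30", periods=60, freq="5min")
db = pd.DataFrame({"Close": np.linspace(100.0, 105.9, 60)}, index=idx)

# Start with all NaN, aligned to db's index, then fill only the marker row.
signal = pd.Series(np.nan, index=db.index)
signal.iloc[31] = db["Close"].iloc[31]

# Same length as db, with exactly one non-NaN point.
print(len(signal) == len(db), int(signal.notna().sum()))
```

A series built this way has the same length as the plotted frame, which is the condition the error message is complaining about.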
If your ultimate goal is to add an arrow, as the title of the question says, you can do it the way shown in Daniel Goldfarb's answer to "How to add value of hlines in y axis using mplfinance python". I used that answer to write code that meets the end goal. As the answer shows, you get the axis from the figure and then add an annotation to that axis, where 31 is the position of the date/time in the index and a[0] is the closing price.
import pandas as pd
import yfinance as yf
import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd
import mplfinance as mpf
import matplotlib.pyplot as plt
db = yf.download(tickers='goog', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now(), interval="5m")
db = db.dropna()
a = db['Close'][31:32]
#test = mpf.make_addplot(a, type='scatter', markersize=200, marker='^')
fig, axlist = mpf.plot(db, type='candle', style= 'charles', returnfig=True)#addplot=test
axlist[0].annotate('X', (31, a[0]), fontsize=20, xytext=(34, a[0]+20),
color='r',
arrowprops=dict(
arrowstyle='->',
facecolor='r',
edgecolor='r'))
mpf.show()

how to change xy axis with matplot in python

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
plt.rc('font', family='Malgun Gothic')
corona_data["확진일"].plot(title="확진일 별 확진자 추이")
plt.show()
This plot shows plain numbers on the x-axis and dates on the y-axis, but I want dates on the x-axis and numbers on the y-axis. How can I solve this?
If your data is in a dataframe, I recommend using Seaborn to visualize it. It has a great API that allows you to plot elements of your dataframe by referencing column names. Here is a toy example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load data
df = pd.read_csv(...)
# Plot scatter plot
sns.scatterplot(x='col_1', y='col_2', data=df)
plt.show()
Check out the Seaborn documentation for more.
The problem seems to be that your dataframe only contains one series, the dates. You could add a column that contains the row numbers and then choose what goes on the x- and y-axis by passing the column names to the plot function:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
# now add the row numbers to the dataset
corona_data["numbers"] = list(range(len(confirmed_dates)))
plt.rc('font', family='Malgun Gothic')
# and tell the plot function that you want "확진일" as the x-axis and "numbers" as the y-axis
corona_data.plot("확진일","numbers",title="확진일 별 확진자 추이")
plt.show()
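If each row of the file is one confirmed case (an assumption, since the file itself is not shown), a plot of dates against counts needs the number of rows per date rather than the row number. A sketch with made-up data and a hypothetical column `confirmed_date` standing in for "확진일":

```python
import pandas as pd

# Made-up data: one row per confirmed case.
cases = pd.DataFrame(
    {"confirmed_date": ["2020-03-01", "2020-03-01", "2020-03-02",
                        "2020-03-04", "2020-03-04", "2020-03-04"]}
)
cases["confirmed_date"] = pd.to_datetime(cases["confirmed_date"], format="%Y-%m-%d")

# Rows per date, sorted chronologically: the dates become the index
# (x-axis) and the counts become the values (y-axis).
per_day = cases["confirmed_date"].value_counts().sort_index()
print(per_day)
```

Calling `per_day.plot()` on the result would then put the dates on the x-axis and the counts on the y-axis.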

How to omit only weekends from my data frame?

I am working on a project to create an algorithmic trader. However, I want to remove the weekends from my data frame because they ruin the plot. I have tried some things I found on StackOverflow, but I get an error saying that the type is Timestamp, so I can't use that technique. The date also isn't a column in the data frame. I'm new to Python so I'm not sure, but I think it's the index, since .index shows me the date and time. I'm sorry if these are stupid questions, but I am new to Python and pandas.
Here is my code:
#import all the libraries
import nsetools as ns
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
print(hist)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 too 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
EDIT: On the suggestion of a user, I tried to modify my code to this
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
refinedlist = hist[hist.index.dayofweek<5]
print (refinedlist)
And graphed that, but the graph still includes the weekends on the x axis.
To begin with, there is no stock market data for weekends and national holidays, because the market is closed on those days. And because your data is sampled by time (15-minute intervals), there is also no data between the time the market closes and the time it opens the next day.
For example, I graphed the first 50 results. (The x-axis doesn't seem to be correct.)
plt.plot(hist['Close'][:50], label=a)
As one example, if you include holidays and national holidays and draw a graph with missing values for the times when the market is not open, you get the following.
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# plt.style.use('fivethirtyeight')
# a = input("Enter the ticker name you wish to apply strategy to")
a = 'AAPL'
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 too 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
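A small sketch of why filtering on `dayofweek` changes nothing here, using two hypothetical trading sessions instead of downloaded data (the dates, times, and prices are made up). There are no weekend rows to remove; the gap only becomes visible as rows once the index is expanded to a full 15-minute grid.

```python
import numpy as np
import pandas as pd

# Two hypothetical sessions: a Friday and the following Monday,
# 15-minute bars from 09:30 to 15:45.
fri = pd.date_range("2020-12-11 09:30", "2020-12-11 15:45", freq="15min")
mon = pd.date_range("2020-12-14 09:30", "2020-12-14 15:45", freq="15min")
idx = fri.append(mon)
hist = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# The data never contained weekend rows, so the filter is a no-op.
weekend_rows = int((hist.index.dayofweek >= 5).sum())
filtered = hist[hist.index.dayofweek < 5]

# Reindexing to a full grid makes the closed-market periods explicit as NaN.
full = pd.date_range(hist.index[0], hist.index[-1], freq="15min")
gaps = hist.reindex(full)
print(weekend_rows, len(filtered) == len(hist), int(gaps["Close"].isna().sum()))
```

The weekend still appears on the x-axis because a datetime axis is spaced by elapsed time, not by row.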

mpl_finance remove empty dates on Candlestick Python

I'm working with a DataFrame. My data is used for a candlestick chart.
The problem is that I can't remove the weekend dates. My code produces a chart with gaps at the weekends, and I'm looking for a chart without those gaps.
Here is my code:
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_finance import candlestick_ohlc
df = pd.read_csv('AAPL.csv')
df['Date'] = pd.to_datetime(df['Date'])
df["Date"] = df["Date"].apply(mdates.date2num)
dates = df['Date'].tolist()
ohlc = df[['Date', 'Open', 'High', 'Low','Close']]
f1, ax = plt.subplots(figsize = (12,6))
candlestick_ohlc(ax, ohlc.values, width=.5, colorup='green', colordown='red')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1.0))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.setp(ax.get_xticklabels(), rotation=70, fontsize=7)
close = df['Close'].values
plt.plot(dates,close, marker='o')
plt.show()
Dataframe:
Date,Open,High,Low,Close,Adj Close,Volume
2019-02-04,167.410004,171.660004,167.279999,171.250000,170.518677,31495500
2019-02-05,172.860001,175.080002,172.350006,174.179993,173.436157,36101600
2019-02-06,174.649994,175.570007,172.850006,174.240005,173.495911,28239600
2019-02-07,172.399994,173.940002,170.339996,170.940002,170.210007,31741700
2019-02-08,168.990005,170.660004,168.419998,170.410004,170.410004,23820000
2019-02-11,171.050003,171.210007,169.250000,169.429993,169.429993,20993400
2019-02-12,170.100006,171.000000,169.699997,170.889999,170.889999,22283500
2019-02-13,171.389999,172.479996,169.919998,170.179993,170.179993,22490200
2019-02-14,169.710007,171.259995,169.380005,170.800003,170.800003,21835700
2019-02-15,171.250000,171.699997,169.750000,170.419998,170.419998,24626800
2019-02-19,169.710007,171.440002,169.490005,170.929993,170.929993,18972800
2019-02-20,171.190002,173.320007,170.990005,172.029999,172.029999,26114400
2019-02-21,171.800003,172.369995,170.300003,171.059998,171.059998,17249700
2019-02-22,171.580002,173.000000,171.380005,172.970001,172.970001,18913200
2019-02-25,174.160004,175.869995,173.949997,174.229996,174.229996,21873400
2019-02-26,173.710007,175.300003,173.169998,174.330002,174.330002,17070200
2019-02-27,173.210007,175.000000,172.729996,174.869995,174.869995,27835400
2019-02-28,174.320007,174.910004,172.919998,173.149994,173.149994,28215400
This is not a complete solution, but I can suggest something.
Just use
import mplfinance as mpf
mpf.plot(df, type='candle')
This ignores non-trading days automatically in the plot. It made me a little happier, though I wasn't fully satisfied with it. I hope this helps.
Check this out.
https://github.com/matplotlib/mplfinance#basic-usage
You can slice the non-trading days out of the dataframe before processing it.
Please check this link: Remove non-business days rows from pandas dataframe
Do not use the date/time as your index; use a candle number as the index instead.
Your data then becomes continuous, with no interruptions in the series.
So use the candle number as the index and plot against it rather than against the date/time.
If you want to plot against the date/time, you need a column holding the timestamp of each candle and must plot against that, but then you will have gaps again.
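A rough sketch of the candle-number idea, using four rows from the data above that straddle a weekend and a holiday. The column name `candle` and the helper `label_for` are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2019-02-14", "2019-02-15", "2019-02-19", "2019-02-20"]),
        "Close": [170.80, 170.42, 170.93, 172.03],
    }
)

# Candle number = row position; consecutive integers leave no gap on the x-axis.
df["candle"] = range(len(df))

def label_for(pos):
    """Return the date string for a candle number, or '' outside the data."""
    pos = int(round(pos))
    if 0 <= pos < len(df):
        return df["Date"].iloc[pos].strftime("%Y-%m-%d")
    return ""

print([label_for(p) for p in df["candle"]])
```

Plotting against `df["candle"]` keeps the candles evenly spaced. To still show dates as tick labels, a function like this could be wrapped for matplotlib, for example `FuncFormatter(lambda x, _: label_for(x))`.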
Try to filter your dataframe.
df = df[df.Open.notnull()]
Add this to your plot.
show_nontrading=False

How do I plot a graph with x_lims between time h1:m1 and h2:m2

I'm working on a project with loads of temperature data and I'm currently processing and plotting all of my data. However, I keep falling foul when I try to set x_lims on my plots between a time1 (9:00) and time2 (21:00)
Data background:
The sensor has collected data every second for two weeks and I've split the main data file into smaller daily files (e.g. dayX). Each day contains a timestamp (column = 'timeStamp') and a mean temperature (column = 'meanT').
The data for each day has been presliced just slightly over the window I want to plot (i.e. dayX contains data from 8:55:00 - 21:05:00). The dataset contains NaN values at some points as the sensors were not worn and data needed to be discarded.
Goal:
What I want to do is to be able to plot the dayX data between a set time interval (x_lim = 9:00 - 21:00). As I have many days of data, I eventually want to plot each day using the same x axis (I want them as separate figures however, not subplots), but each day has different gaps in the main data set, so I want to set constant x lims. As I have many different days of data, I'd rather not have to specify the date as well as the time.
Example data:
dayX =
timeStamp meanT
2018-05-10 08:55:00 NaN
. .
. .
. .
2018-05-10 18:20:00 32.4
. .
. .
. .
2018-05-10 21:05:00 32.0
What I've tried:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = pd.read_csv('path/to/file/dayX.csv')
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y %m %d %H:%M:%S.%f')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
fig.autofmt_xdate()
plt.show()
Which gives:
If I remove the limit line however, the data plots okay, but the limits are automatically selected
# Get rid of this line:
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
# Get this:
I'm really not sure why this is going wrong or what else I should be trying.
Your timeStamp column holds datetime objects. All you have to do is pass datetime objects, including the date, as the limits.
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = df
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y-%m-%d %H:%M:%S')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
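# also reset minutes and seconds, otherwise 08:55 becomes 09:55 instead of 09:00
ax1.set_xlim(dayX['timeStamp'].min().replace(hour=9, minute=0, second=0), dayX['timeStamp'].min().replace(hour=21, minute=0, second=0))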
fig.autofmt_xdate()
plt.show()
Output:
You probably need to construct a full timestamp object, since pd.Timestamp('9:00') defaults to today's date, which has no data in your case. The following snippet should replace the ax1.set_xlim line in your code. It should also work for multi-day time ranges that start and end at specific hours of your choosing.
min_h = 9 # hours
max_h = 21 # hours
start = dayX['timeStamp'].min()
end = dayX['timeStamp'].max()
xmin = pd.Timestamp(year=start.year, month=start.month, day=start.day, hour=min_h)
xmax = pd.Timestamp(year=end.year, month=end.month, day=end.day, hour=max_h)
ax1.set_xlim(xmin, xmax)
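An equivalent way to get the limits, sketched on the three sample rows from the question: `normalize()` resets a timestamp to midnight, and adding a `Timedelta` then gives the exact hour. The variable name `day` is an assumption for illustration.

```python
import pandas as pd

stamps = pd.to_datetime(
    ["2018-05-10 08:55:00", "2018-05-10 18:20:00", "2018-05-10 21:05:00"]
)
dayX = pd.DataFrame({"timeStamp": stamps, "meanT": [float("nan"), 32.4, 32.0]})

# Midnight of the day the data starts on, then offset by whole hours.
day = dayX["timeStamp"].min().normalize()
xmin = day + pd.Timedelta(hours=9)
xmax = day + pd.Timedelta(hours=21)
print(xmin, xmax)
```

Because the date comes from the data itself, the same lines work for every daily file without typing the date by hand.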
