I am taking my first steps with pandas.
After a few successful steps I got stuck on the following task: displaying data as OHLC bars.
I downloaded data for Apple stock from Google Finance and stored it in a *.csv file.
After a lot of searching I wrote the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
from matplotlib.finance import candlestick_ohlc
#read stored data
#First two lines of csv:
#Date,Open,High,Low,Close
#2010-01-04,30.49,30.64,30.34,30.57
data = pd.read_csv("AAPL.csv")
#graph settings
fig, ax = plt.subplots()
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("AAPL")
#convert date to float format
data['Date2'] = data['Date'].map(lambda d: mdates.date2num(dt.datetime.strptime(d, "%Y-%m-%d")))
candlestick_ohlc(ax, (data['Date2'], data['Open'], data['High'], data['Low'], data['Close']))
plt.show()
But it displays an empty graph.
What is wrong with this code?
Thanks.
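You need to change the last line so that it passes one tuple per day: candlestick_ohlc expects a sequence of (date, open, high, low, close) tuples, not a tuple of columns. The following code: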
start = dt.datetime(2015, 7, 1)
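# note: pd.io.data has since been removed from pandas; DataReader now lives in the separate pandas-datareader package
data = pd.io.data.DataReader('AAPL', 'yahoo', start)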
data = data.reset_index()
data['Date2'] = data['Date'].apply(lambda d: mdates.date2num(d.to_pydatetime()))
tuples = [tuple(x) for x in data[['Date2','Open','High','Low','Close']].values]
fig, ax = plt.subplots()
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
plt.xticks(rotation=45)
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("AAPL")
candlestick_ohlc(ax, tuples, width=.6, colorup='g', alpha =.4);
This produces a candlestick chart, which you can tinker with further.
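The reshaping step can be checked without drawing anything. Below is a minimal sketch, assuming a CSV laid out like the one in the question. The second data row is made up for illustration, and quotes is just a local variable name, not part of any API.

```python
import io

import matplotlib.dates as mdates
import pandas as pd

# First data row is from the question; the second row is invented for illustration.
csv_text = """Date,Open,High,Low,Close
2010-01-04,30.49,30.64,30.34,30.57
2010-01-05,30.66,30.80,30.46,30.63
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Convert the parsed dates to matplotlib's float day numbers.
data["Date2"] = mdates.date2num(data["Date"].to_numpy())

# One (date, open, high, low, close) tuple per row, i.e. per trading day.
quotes = list(
    data[["Date2", "Open", "High", "Low", "Close"]].itertuples(index=False, name=None)
)
```

The resulting list is what would be passed as the second argument in candlestick_ohlc(ax, quotes, ...).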
I am trying to add a list of dates to Matplotlib xticks and when I do that the actual plot disappears keeping only xticks.
For example, I have the following code:
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import (DateFormatter, rrulewrapper, RRuleLocator, YEARLY)
# Generate random data and dates
data = np.random.randn(10000)
start = dt.datetime.strptime("2019-03-14", "%Y-%m-%d")
end = dt.datetime.strptime("2046-07-30", "%Y-%m-%d")
date = [start + dt.timedelta(days=x) for x in range(0, (end-start).days)]
rule = rrulewrapper(YEARLY, byeaster=1, interval=2)
loc = RRuleLocator(rule)
formatter = DateFormatter('%d/%m/%y')
fig, ax = plt.subplots()
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
plt.plot(data)
# ax.set_xlim(min(date), max(date))
plt.show()
This code plots the data which looks like this:
Now if I uncomment ax.set_xlim(min(date), max(date)) and rerun the code I get:
You can see that I only get the dates, formatted correctly, but not the plot. I am not sure what the problem is here. Any help would be appreciated.
Update
If I change data = np.random.randn(10000) to data = np.random.randn(1000000), then I am able to see the plot, which is not what I want.
Most likely your data is plotted, but not at the correct location. If you go along that example you would need to add something like fig.autofmt_xdate() to your code.
The way to do this is to pass the date array along with the data array to the plot method. With the given example it will be:
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import (DateFormatter, rrulewrapper, RRuleLocator, YEARLY)
# Generate random data and dates
data = np.random.randn(10000)
start = dt.datetime.strptime("2019-03-14", "%Y-%m-%d")
end = dt.datetime.strptime("2046-07-30", "%Y-%m-%d")
date = [start + dt.timedelta(days=x) for x in range(0, (end-start).days)]
rule = rrulewrapper(YEARLY, byeaster=1, interval=2)
loc = RRuleLocator(rule)
formatter = DateFormatter('%d/%m/%y')
fig, ax = plt.subplots()
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
plt.plot(date, data)
ax.set_xlim(min(date), max(date))
plt.show()
Then you'll get:
See matplotlib.pyplot.plot() for more information.
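A likely explanation for the empty plot in the question can be sketched numerically: plt.plot(data) places the points at x = 0 ... len(data) - 1, while set_xlim with datetime objects sets the limits in matplotlib's date numbers (float days since its epoch). The helper name limits_overlap_index is hypothetical and only used for this illustration.

```python
import datetime as dt

import matplotlib.dates as mdates

start = dt.datetime(2019, 3, 14)
end = dt.datetime(2046, 7, 30)

# Axis limits in matplotlib's internal date units (float days).
x_lo, x_hi = mdates.date2num(start), mdates.date2num(end)


def limits_overlap_index(n_points):
    """Hypothetical helper: do x = 0 .. n_points - 1 reach into [x_lo, x_hi]?"""
    return (n_points - 1) >= x_lo and 0 <= x_hi


print(limits_overlap_index(10000))    # data plotted against its index
print(limits_overlap_index(1000000))  # the case from the question's update
```

With 10000 points the index range ends before the first date number, so the visible window contains no data. With 1000000 points the indices reach into the date range, which matches the behaviour described in the update.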
I am new to Python and playing around with a CSV file. I would like to find a way to plot my graph for a specific date range, for example 2013-03-20:2014-03-04.
Code below:
import pandas as pd
import matplotlib.pyplot as plt
prc=pd.read_csv("csv",parse_dates=True, nrows=150, usecols=["Close"])
prc_ma=prc.rolling(5).mean()
plt.plot(prc, color="blue", label="Price")
plt.plot(prc_ma, color="red", label="Moving Average")
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Moving Average")
plt.grid()
I currently work with the parameter nrows.
Thank you
Simply filter for the dates with .loc, assuming datetimes are the index of the dataframe:
prc = pd.read_csv("csv", parse_dates=True, nrows=150, usecols=["Close"])
prc_sub = prc.loc['2013-03-20':'2014-03-04']
To demonstrate with random data subsetted out of all days of 2013 and 2014:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.width', 1000)
np.random.seed(1001)
prc = pd.DataFrame({'PRICE': abs(np.random.randn(730))},
                   index=pd.date_range("2013-01-01", "2014-12-31", freq="D"))
# SUBSETTED DATAFRAME
prc_sub = prc.loc['2013-03-20':'2014-03-04']
prc_ma = prc_sub.rolling(5).mean()
plt.plot(prc_sub, color="blue", label="Price")
plt.plot(prc_ma, color="red", label="Moving Average")
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Moving Average")
plt.grid()
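For a CSV file rather than random data, the date column has to be loaded and set as the index before .loc can slice by date; reading only the "Close" column does not do that. The sketch below assumes the date column is named "Date" (the question does not show the file), and the prices are made up.

```python
import io

import pandas as pd

# Invented sample rows; the column name "Date" is an assumption.
csv_text = """Date,Close
2013-03-19,10.0
2013-03-20,10.5
2013-12-31,11.0
2014-03-04,11.5
2014-03-05,12.0
"""

# index_col + parse_dates turn the date column into a DatetimeIndex.
prc = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Label-based slicing on a sorted DatetimeIndex includes both end points.
prc_sub = prc.loc["2013-03-20":"2014-03-04"]
```

Here prc_sub keeps the three rows from 2013-03-20 through 2014-03-04 and drops the rows just outside the range.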
I'm trying to adjust the formatting of the date tick labels of the x-axis so that it only shows the Year and Month values. From what I've found online, I have to use mdates.DateFormatter, but it's not taking effect at all with my current code as is. Anyone see where the issue is? (the dates are the index of the pandas Dataframe)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
basicDF['some_column'].plot(ax=ax, kind='bar', rot=75)
ax.xaxis_date()
Reproducible scenario code:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
blah.plot(ax=ax, kind='bar')
ax.xaxis_date()
Still can't get just the year and month to show up.
If I set the format after .plot, I get an error like this:
ValueError: DateFormatter found a value of x=0, which is an illegal date. This usually occurs because you have not informed the axis that it is plotting dates, e.g., with ax.xaxis_date().
It's the same whether I put it before or after ax.xaxis_date().
pandas just doesn't work well with custom date-time formats.
You need to just use raw matplotlib in cases like this.
import numpy
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas
N = 20
numpy.random.seed(N)
dates = pandas.date_range('1/1/2014', periods=N, freq='m')
df = pandas.DataFrame(
    data=numpy.random.randn(N),
    index=dates,
    columns=['A']
)
fig, ax = plt.subplots(figsize=(10, 6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.bar(df.index, df['A'], width=25, align='center')
And that gives me:
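One way to see why the formatter fails on the pandas bar chart is to look at where pandas puts the bars. The sketch below is an illustration under assumptions (month-start dates, five made-up values, the non-interactive Agg backend); it only inspects the tick positions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd

# Month-start dates are an assumption made to keep the frequency alias simple.
dates = pd.date_range("2014-01-01", periods=5, freq="MS")
df = pd.DataFrame({"A": np.arange(5.0)}, index=dates)

ax = df.plot(kind="bar")

# Bars sit at plain integer positions, not at date numbers.
tick_positions = [int(t) for t in ax.get_xticks()]
print(tick_positions)
```

Because the positions are integers starting at 0, a DateFormatter attached to that axis is asked to format x=0 as a date, which is consistent with the error message quoted in the question. Calling ax.bar with the dates directly keeps the x values as dates.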
Solution with pandas only
You can create nicely formatted ticks by using the DatetimeIndex and taking advantage of the datetime properties of the timestamps. Tick locators and formatters from matplotlib.dates are not necessary for a case like this unless you would want dynamic ticks when using the interactive interface of matplotlib for zooming in and out (more relevant for time ranges longer than in this example).
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
# Create sample time series with month end frequency, plot it with a pandas bar chart
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('1/1/2014', periods=20, freq='m')
df = pd.DataFrame(data=rng.normal(size=dti.size), index=dti)
ax = df.plot.bar(figsize=(10,4), legend=None)
# Set major ticks and tick labels
ax.set_xticks(range(df.index.size))
ax.set_xticklabels([ts.strftime('%b\n%Y') if ts.year != df.index[idx-1].year
                    else ts.strftime('%b') for idx, ts in enumerate(df.index)])
ax.figure.autofmt_xdate(rotation=0, ha='center');
The accepted answer claims that "pandas won't work well with custom date-time formats", but you can make use of pandas' to_datetime() function to use your existing datetime Series in the dataframe:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=pd.to_datetime(rng))
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(DateFormatter('%m-%Y'))
ax.bar(blah.index, blah[0], width=25, align='center')
Will result in:
You can see the different available formats here.
I ran into the same problem and used a workaround that transforms the index from datetime format into the desired string format:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
# transform index to strings
blah_test = blah.copy()
str_index = []
for s_year,s_month in zip(blah.index.year.values,blah.index.month.values):
    # build string according to format "%Y-%m"
    string_day = '{}-{:02d}'.format(s_year,s_month)
    str_index.append(string_day)
blah_test.index = str_index
blah_test.plot(ax=ax, kind='bar', rot=45)
plt.show()
which results in the following figure:
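The same string index can probably be built without the explicit loop, since a DatetimeIndex has its own strftime method. A minimal sketch, assuming month-start dates instead of the month-end dates used above:

```python
import numpy as np
import pandas as pd

# Month-start dates are an assumption here; the code above used month-end dates.
rng = pd.date_range("2014-01-01", periods=20, freq="MS")
blah = pd.DataFrame(data=np.random.randn(len(rng)), index=rng)

# Vectorised equivalent of the loop: format every timestamp as "%Y-%m".
blah_test = blah.copy()
blah_test.index = blah.index.strftime("%Y-%m")
```

blah_test can then be plotted with blah_test.plot(kind='bar', rot=45) as in the workaround.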
I have just started learning matplotlib. I am trying to plot stock data from the Yahoo chart API. I tried this program but it is not working. Here is my program:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph():
    date, closep, highp, lowp, openp, valuep = np.loadtxt('/home/najeeb/Desktop/table.csv', delimiter=',', unpack=True,
                                                          converters={0: mdates.strpdate2num('%Y-%m-%d')})
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1, axisbg='w')
    plt.plot_date(x=date, y=value, fmt='-')
    plt.title('title')
    plt.ylabel('value')
    plt.xlabel('date')
    plt.show()
graph()
Here is the CSV file.
Please guide me on how to solve this problem, and tell me whether there is another way to plot a stock graph. Thank you.
The CSV file looked like this:
Date,Open,High,Low,Close,Volume,Adj Close
2014-10-17,97.50,99.00,96.81,97.67,68032200,97.67
2014-10-16,95.55,97.72,95.41,96.26,72110700,96.26
2014-10-15,97.97,99.15,95.18,97.54,100875400,97.54
Your np.loadtxt() call was trying to parse the header 'Date' as a date, which failed because that string is not a valid date value, so I used skiprows=1 to skip the header.
The other problem was that the CSV has 7 columns, but your tuple unpacked only 6 values.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import urllib2
url = 'http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=9&e=14&f=2008&g=d&a=8&b=7&c=1984&ignore=.csv'
def graph():
    date, open, high, low, close, volume, adj_close = np.loadtxt(urllib2.urlopen(url), skiprows=1, delimiter=',', unpack=True, converters={0: mdates.strpdate2num('%Y-%m-%d')})
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1, axisbg='w')
    plt.plot_date(x=date, y=adj_close, fmt='-')
    plt.title('Apple, 1984 to 2008')
    plt.ylabel('Adjusted close')
    plt.xlabel('Date')
    plt.show()
graph()
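As another way to plot the same kind of file, pandas can handle the header and the date parsing in one call. This is only a sketch under assumptions: the three sample rows shown above are read from an in-memory string instead of a download, and the Agg backend is used so that it runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
import pandas as pd

# The three sample rows of the CSV, held in memory instead of downloaded.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2014-10-17,97.50,99.00,96.81,97.67,68032200,97.67
2014-10-16,95.55,97.72,95.41,96.26,72110700,96.26
2014-10-15,97.97,99.15,95.18,97.54,100875400,97.54
"""

# The header row becomes the column names; dates are parsed into the index.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = df.sort_index()  # the file lists the newest day first

fig, ax = plt.subplots()
ax.plot(df.index, df["Adj Close"], "-")
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted close")
```

With an interactive backend, plt.show() would display the figure; for a real file, the path would be passed to read_csv in place of the StringIO object.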
My data looks as follows:
2012021305, 65217
2012021306, 82418
2012021307, 71316
2012021308, 66833
2012021309, 69406
2012021310, 76422
2012021311, 94188
2012021312, 111817
2012021313, 127002
2012021314, 141099
2012021315, 147830
2012021316, 136330
2012021317, 122252
2012021318, 118619
2012021319, 115763
2012021320, 121393
2012021321, 130022
2012021322, 137658
2012021323, 139363
Where the first column is the date in YYYYMMDDHH format. I'm trying to graph the data using matplotlib's csv2rec function. I can get the data to graph, but the x axis and labels are not showing up the way that I expect them to.
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
output_image_name='plot1.png'
input_filename="data.log"
input = open(input_filename, 'r')
input.close()
data = csv2rec(input_filename, names=['time', 'count'])
rcParams['figure.figsize'] = 10, 5
rcParams['font.size'] = 8
fig = plt.figure()
plt.plot(data['time'], data['count'])
ax = fig.add_subplot(111)
ax.plot(data['time'], data['count'])
hours = mdates.HourLocator()
fmt = mdates.DateFormatter('%Y%M%D%H')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(fmt)
ax.grid()
plt.ylabel("Count")
plt.title("Count Log Per Hour")
fig.autofmt_xdate(bottom=0.2, rotation=90, ha='left')
plt.savefig(output_image_name)
I assume this has something to do with the date format. Any suggestions?
You need to convert the x-values to datetime objects.
Something like:
from datetime import datetime
time_vec = [datetime.strptime(str(x), '%Y%m%d%H') for x in data['time']]
plot(time_vec, data['count'])
Currently, you are telling Python to format integers (2012021305) as dates, which it does not know how to do, so it returns an empty string (although I suspect that you are getting errors raised someplace).
You should also check your format string markup: in strftime syntax %m is the month and %d the day of the month, whereas %M is the minute.
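The conversion can be tried on its own with the first few values from the question. A minimal sketch using only the standard library; the variable names are chosen here for illustration:

```python
from datetime import datetime

# First three rows of the question's data, timestamps in YYYYMMDDHH form.
raw_times = [2012021305, 2012021306, 2012021307]
counts = [65217, 82418, 71316]

# Parse each integer via its string form into a datetime object.
time_vec = [datetime.strptime(str(x), "%Y%m%d%H") for x in raw_times]

# Formatting back with the same directives reproduces the original strings.
round_trip = [t.strftime("%Y%m%d%H") for t in time_vec]
```

time_vec can then be used as the x values in ax.plot(time_vec, counts), and the same directive string can be given to mdates.DateFormatter.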