pyplot xticklabel (date/time) sometimes off by a minute - python

I have a program that uses matplotlib.pyplot to produce a couple of graphs. On the x-axis I want to label the major ticks with a 4-hour time interval (so: 00:00, 04:00, 08:00 etc.)
When I plot the graph, the first few labels are okay but the rest are not.
The code that I'm using (not showing the ax3 being set-up and loading of the data as this is IMHO off-topic):
import matplotlib as mpl
mpl.use("Agg") # activate Anti-Grain Geometry library
import matplotlib.pyplot as plt
import numpy as nmp
:
:
hours = mpl.dates.HourLocator()
fourhours = 4. / 24.
# [DAY]
major_ticks = nmp.arange(nmp.ceil(DY[1, 0]/fourhours)*fourhours, DY[-1, 0], fourhours)
ax3.set_xlabel('past day')
ax3.grid(True)
ax3.set_ylim([Ymin, Ymax])
ax3.set_xlim([DY[1, 0], DY[-1, 0]])
#
t = nmp.array(DY[:, 0]) # date/time
ax3.set_xticklabels(t, size='small')
ax3.set_yticklabels([])
ax3.set_xticks(major_ticks)
ax3.xaxis.set_major_formatter(mpl.dates.DateFormatter('%R'))
ax3.grid(which='major', alpha=0.5)
ax3.xaxis.set_minor_locator(hours)
ax3.grid(which='minor', alpha=0.2)
#
s = nmp.array(DY[:, 2]) # averages
slo = nmp.array(DY[:, 1]) # minima
shi = nmp.array(DY[:, 3]) # maxima
line, = ax3.plot(t, s, marker='.', linestyle='', color='red', lw=2)
ax3.fill_between(t, slo, shi, interpolate=True, color='red', alpha=0.2)
DY[1,0] contains the value 736364.444444
DY[-1,0] is 736365.458333
and major_ticks then becomes:
[ 736364.5 736364.66666667 736364.83333333 736365. 736365.16666667 736365.33333333]
This all looks fine to me but the resulting graph doesn't:
Any suggestions on how to fix this are welcome.
@j-p-petersen proposed using linspace:
I replaced the line that calculates major_ticks = ... with this code:
intervals = int((DY[-1, 0] - DY[1, 0]) / fourhours) + 1
major_ticks = nmp.linspace(nmp.ceil(DY[1, 0]/fourhours)*fourhours, nmp.floor(DY[-1, 0]/fourhours)*fourhours, intervals)
This reduced the problem, but some ticks still show as one minute before the hour instead of on the hour.

I think this is due to a rounding error in the floating-point arithmetic here:
fourhours = 4. / 24.
Instead of using arange you could use linspace; this way it will not undershoot the target.
Alternatively you could do the calculation using datetime and timedelta, and afterwards convert the result to Matplotlib's date numbers with matplotlib.dates.date2num.
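The datetime route can be sketched as follows. This is a minimal example, not the asker's code: four_hour_ticks is a hypothetical helper, and the start and end times are made up to resemble the range in the question. It steps with exact timedelta arithmetic and converts to Matplotlib date numbers only once at the end, so no error accumulates from repeatedly adding 4/24.

```python
from datetime import datetime, timedelta

from matplotlib.dates import date2num


def four_hour_ticks(start, end, step_hours=4):
    """Date numbers for every step_hours boundary between start and end.

    Hypothetical helper; assumes step_hours divides 24 evenly.
    """
    # Floor the start time to a multiple of step_hours, then move forward
    # one step if that landed before the start of the data.
    tick = start.replace(hour=(start.hour // step_hours) * step_hours,
                         minute=0, second=0, microsecond=0)
    if tick < start:
        tick += timedelta(hours=step_hours)

    ticks = []
    while tick <= end:
        ticks.append(tick)
        tick += timedelta(hours=step_hours)  # exact, no float rounding
    # Convert to Matplotlib date numbers once, at the very end
    return date2num(ticks)


# Made-up range, similar to the one in the question
major_ticks = four_hour_ticks(datetime(2017, 2, 5, 10, 40),
                              datetime(2017, 2, 6, 11, 0))
```

The result can be passed to ax3.set_xticks(major_ticks). If the ticks only need to sit on fixed hours of the day, setting matplotlib.dates.HourLocator(byhour=range(0, 24, 4)) as the major locator is another option that avoids computing tick positions by hand.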

Related

Plotting in a zooming in matplotlib subplot

This question is from this tutorial found here:
I want my plot to look like the one below, but with time series data, and with the zoomed data coming from a different source rather than from x_lim/y_lim limits on the same data.
So in the plot above I would like intraday data from a different source, and the plot below would be daily data for some stock. Because the two have different sources, I cannot use axis limits to zoom. For this I will be using the yahoo datareader for daily data and yfinance for intraday data.
The code:
import pandas as pd
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
df = web.DataReader('goog', 'yahoo')
df.Close = pd.to_numeric(df['Close'], errors='coerce')
fig = plt.figure(figsize=(6, 5))
plt.subplots_adjust(bottom = 0., left = 0, top = 1., right = 1)
sub1 = fig.add_subplot(2,2,1)
sub1 = df.Close.plot()
sub2 = fig.add_subplot(2,1,2) # two rows, one column, second cell
df.Close.pct_change().plot(ax =sub2)
sub2.plot(theta, y, color = 'orange')
con1 = ConnectionPatch(xyA=(df[1:2].index, df[2:3].Close), coordsA=sub1.transData,
xyB=(df[4:5].index, df[5:6].Close), coordsB=sub2.transData, color = 'green')
fig.add_artist(con1)
I am having trouble with the xy coordinates. With the code above I am getting:
TypeError: Cannot cast array data from dtype('O') to dtype('float64')
according to the rule 'safe'
xyA=(df[1:2].index, df[2:3].Close)
What I have done here is use the date df[1:2].index as my x value and the price df[2:3].Close as my y value.
Is converting the df to an array and then plotting my only option here? If there is any other way to get the ConnectionPatch to work, please advise.
df.dtypes
High float64
Low float64
Open float64
Close float64
Volume int64
Adj Close float64
dtype: object
Matplotlib plots dates by converting them to floats that count days, starting with 0 on 1970-01-01 (the default epoch since Matplotlib 3.3), i.e. the POSIX timestamp zero. The number differs from a POSIX timestamp only in resolution: "1" is one day instead of one second.
There are three ways to compute that number:
either use matplotlib.dates.date2num,
or use .toordinal(), which gives you the right resolution, and subtract the offset corresponding to 1970-01-01,
or get the POSIX timestamp and divide by the number of seconds in a day:
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
con1 = ConnectionPatch(xyA=(df.index[0].toordinal() - pd.Timestamp(0).toordinal(), df['Close'].iloc[0]), coordsA=sub1.transData,
xyB=(df.index[1].toordinal() - pd.Timestamp(0).toordinal(), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(df.index[-1].timestamp() / 86_400, df['Close'].iloc[-1]), coordsA=sub1.transData,
xyB=(df.index[-1].timestamp() / 86_400, df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
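As a quick check that the three computations agree, here is a small sketch on a made-up timestamp. To stay independent of the configured epoch, it measures the date2num result relative to date2num of 1970-01-01; with the default epoch of recent Matplotlib versions that offset is simply zero.

```python
from datetime import datetime

import pandas as pd
from matplotlib.dates import date2num

ts = pd.Timestamp('2021-03-15 12:00')  # made-up example timestamp

# 1. date2num, measured from 1970-01-01 so the configured epoch does not matter
via_date2num = date2num(ts.to_pydatetime()) - date2num(datetime(1970, 1, 1))

# 2. toordinal() counts whole days only, so the time of day is dropped
via_ordinal = ts.toordinal() - pd.Timestamp(0).toordinal()

# 3. POSIX timestamp (seconds) divided by the number of seconds in a day
via_posix = ts.timestamp() / 86_400

print(via_date2num, via_ordinal, via_posix)
```

For daily data stamped at midnight, as in the question, all three give the same value; for intraday data only date2num and the POSIX route keep the fractional day.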
You also need to make sure that you’re using values that are in range for the targeted axes, in your example you use Close values on sub2 which contains pct_change’d values.
Of course if you want the bottom of the boxes as in your example it’s easier to express the coordinates using the axes transform instead of the data transform:
from matplotlib.dates import date2num
con1 = ConnectionPatch(xyA=(0, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[1]), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(1, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[-1]), df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
To plot your candlesticks, I’d recommend using the mplfinance (previously matplotlib.finance) package:
import mplfinance as mpf
sub3 = fig.add_subplot(2, 2, 2)
mpf.plot(df.iloc[30:70], type='candle', ax=sub3)
Putting all this together in a single script, it could look like this:
import pandas as pd, mplfinance as mpf, matplotlib.pyplot as plt
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
from matplotlib.dates import date2num, ConciseDateFormatter, AutoDateLocator
from matplotlib.ticker import PercentFormatter
# Get / compute data
df = web.DataReader('goog', 'yahoo')
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
# Pick zoom range
zoom_start = df.index[30]
zoom_end = df.index[30 + 8 * 5] # 8 weeks ~ 2 months
# Create figures / axes
fig = plt.figure(figsize=(18, 12))
top_left = fig.add_subplot(2, 2, 1)
top_right = fig.add_subplot(2, 2, 2)
bottom = fig.add_subplot(2, 1, 2)
fig.subplots_adjust(hspace=.35)
# Plot all 3 data
df['Close'].plot(ax=bottom, linewidth=1, rot=0, title='Daily closing value', color='purple')
bottom.set_ylim(0)
df.loc[zoom_start:zoom_end, 'Change'].plot(ax=top_left, linewidth=1, rot=0, title='Daily Change, zoomed')
top_left.yaxis.set_major_formatter(PercentFormatter(xmax=1))  # pct_change() returns fractions, so 1.0 means 100%
# Here instead of df.loc[...] use your intra-day data
mpf.plot(df.loc[zoom_start:zoom_end], type='candle', ax=top_right, xrotation=0, show_nontrading=True)
top_right.set_title('Last day OHLC')
# Put ConciseDateFormatters on all x-axes for fancy date display
for ax in fig.axes:
    locator = AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(ConciseDateFormatter(locator))
# Add the connection patches
fig.add_artist(ConnectionPatch(
xyA=(0, 0), coordsA=top_left.transAxes,
xyB=(date2num(zoom_start), df.loc[zoom_start, 'Close']), coordsB=bottom.transData,
color='green'
))
fig.add_artist(ConnectionPatch(
xyA=(1, 0), coordsA=top_left.transAxes,
xyB=(date2num(zoom_end), df.loc[zoom_end, 'Close']), coordsB=bottom.transData,
color='green'
))
plt.show()

How do I make my histogram of unequal bins show properly?

My data consists of the following:
Majority numbers < 60, and then a few outliers that are in the 2000s.
I want to display it in a histogram with the following bin ranges:
0-1, 1-2, 2-3, 3-4, ..., 59-60, 60-max
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
b = list(range(61)) + [2000] # will make [0, 1, ..., 60, 2000]
plt.hist(b, bins=b, edgecolor='black')
plt.xticks(b)
plt.show()
This shows the following:
Essentially what you see is all the numbers 0 .. 60 squished together on the left, and the 2000 on the right. This is not what I want.
So I remove the [2000] and get something like what I am looking for:
As you can see now it is better, but I still have the following problems:
How do I fix this such that the graph doesn't have any white space around (there's a big gap before 0 and after 60).
How do I fix this such that after 60, there is a 2000 tick that shows at the very end, while still keeping roughly the same spacing (not like the first?)
Here is one hacky solution using some random data. I still don't quite understand your second question, but I tried to do something based on your wording:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
fig, ax = plt.subplots(figsize=(12, 6))
data= np.random.normal(10, 5, 5000)
upper = 31
outlier = 2000
data = np.append(data, 100*[upper])
b = list(range(upper)) + [upper]
plt.hist(data, bins=b, edgecolor='black')
plt.xticks(b)
b[-1] = outlier
ax.set_xticklabels(b)
plt.xlim(0, upper)
plt.show()
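A variation on the same idea, sketched under the assumption that every value above the last regular edge should land in a single overflow bin: clip the data before binning, so the outliers are counted without stretching the x-axis. The helper name overflow_histogram and the sample values are made up for illustration.

```python
import numpy as np


def overflow_histogram(data, upper):
    """Count data in unit-wide bins 0..upper plus one overflow bin.

    Hypothetical helper: values above `upper` are clipped into the last
    bin [upper, upper + 1]; values below 0 are clipped into the first bin.
    """
    edges = np.arange(upper + 2)              # 0, 1, ..., upper, upper + 1
    clipped = np.clip(data, 0, upper + 0.5)   # outliers land inside the overflow bin
    counts, _ = np.histogram(clipped, bins=edges)
    return counts, edges


sample = np.array([0.5, 1.5, 59.2, 2000.0, 2500.0])  # made-up values
counts, edges = overflow_histogram(sample, upper=60)
```

The counts can then be drawn with plt.bar(edges[:-1], counts, width=1, align='edge'), and the last tick relabelled with the outlier value in the same way as in the answer above.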

Normalizing a histogram with matplotlib

I want to plot a histogram with Matplotlib, but I'd like the bins' values to represent the percentage of the total observations. A MWE would be like this:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = numpy.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
top_left = plt.subplot(121)
top_left.imshow(luminance)
bottom_left = plt.subplot(122)
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True})
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
The CDF here is OK (range: [0, 1]), but the resulting histogram doesn't match my expectations:
Why are the histogram's results in the range [0, 4]? Is there any way to fix this?
What you think you want
Here's how to plot the histogram such that the bins sum to 1:
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy as np
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
# get the histogram values
heights,edges = np.histogram(luminance.flat, bins=30)
binCenters = (edges[:-1] + edges[1:])/2
# norm the heights
heights = heights/heights.sum()
# get the cdf
cdf = heights.cumsum()
left = plt.subplot(121)
left.imshow(luminance)
right = plt.subplot(122)
right.plot(binCenters, cdf, binCenters, heights)
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
# confirm that the hist vals sum to 1
print('heights sum: %.2f' % heights.sum())
output:
heights sum: 1.00
The actual answer
This one is actually super easy. Just do
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}, norm_hist=True)
Here's what I get when I run your script with the above modification:
Surprise twist!
So it turns out that your histogram was normalized all along, as per the formal identity:
In plain(er) English, the general practice is to norm continuously valued histograms (i.e. those whose observations can be expressed as floating point numbers) in terms of their density. So in this case the sum of the bin widths times the bin heights will be 1.0, as you can see by running this simplified version of your script:
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np
imagen2 = plt.figure(1, figsize=(4,3))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
heights,edges,patches = plt.hist(luminance.ravel(), density=True, bins=30)
widths = edges[1:] - edges[:-1]
totalWeight = (heights*widths).sum()
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
print(totalWeight)
And the totalWeight will indeed be exactly equal to 1.0, give or take a smidge of rounding error.
tel's answer is great! I just want to provide an alternative that gives you the histogram you want with fewer lines. The key idea is to use the weights argument of the matplotlib hist function to normalize the counts. You can replace your sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}) with the following three lines of code:
lf = luminance.flatten()
sns.kdeplot(lf, cumulative=True)
sns.distplot(lf, kde=False,
hist_kws={'weights': numpy.full(len(lf), 1/len(lf))})
If you want to see the histogram on a second y-axis (better visual), add ax=bottom_left.twinx() to the sns.distplot call.
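The relationship between the two normalizations can be checked with plain NumPy, without seaborn. This is a small sketch on made-up random data: weights of 1/N make the bar heights sum to 1, while density=True makes the bar areas sum to 1, which is why heights above 1 appear whenever the bins are narrower than 1.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random(10_000)          # made-up data in [0, 1), like the luminance

# Heights are fractions of all observations: they sum to 1
weights = np.full(values.size, 1.0 / values.size)
fractions, edges = np.histogram(values, bins=30, weights=weights)

# Heights are densities: height * width sums to 1
density, _ = np.histogram(values, bins=30, density=True)
widths = np.diff(edges)

print(fractions.sum(), (density * widths).sum())
```

Each fraction equals the corresponding density multiplied by its bin width, so the two histograms have the same shape and differ only in the scale of the y-axis.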

Matplotlib log scale tick label number formatting

With matplotlib when a log scale is specified for an axis, the default method of labeling that axis is with numbers that are 10 to a power eg. 10^6. Is there an easy way to change all of these labels to be their full numerical representation? eg. 1, 10, 100, etc.
Note that I do not know what the range of powers will be and want to support an arbitrary range (negatives included).
Sure, just change the formatter.
For example, if we have this plot:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.axis([1, 10000, 1, 100000])
ax.loglog()
plt.show()
You could set the tick labels manually, but then the tick locations and labels would be fixed when you zoom/pan/etc. Therefore, it's best to change the formatter. By default, a logarithmic scale uses a LogFormatter, which will format the values in scientific notation. To change the formatter to the default for linear axes (ScalarFormatter) use e.g.
from matplotlib.ticker import ScalarFormatter
for axis in [ax.xaxis, ax.yaxis]:
    axis.set_major_formatter(ScalarFormatter())
I've found that using ScalarFormatter is great if all your tick values are greater than or equal to 1. However, if you have a tick at a number <1, the ScalarFormatter prints the tick label as 0.
We can use a FuncFormatter from the matplotlib ticker module to fix this issue. The simplest way to do this is with a lambda function and the g format specifier (thanks to @lenz in comments).
import matplotlib.ticker as ticker
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y, _: '{:g}'.format(y)))
Note in my original answer I didn't use the g format, instead I came up with this lambda function with FuncFormatter to set numbers >= 1 to their integer value, and numbers <1 to their decimal value, with the minimum number of decimal places required (i.e. 0.1, 0.01, 0.001, etc). It assumes that you are only setting ticks on the base10 values.
import matplotlib.ticker as ticker
import numpy as np
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y)))
For clarity, here's that lambda function written out in a more verbose, but also more understandable, way:
def myLogFormat(y,pos):
    # Find the number of decimal places required
    decimalplaces = int(np.maximum(-np.log10(y),0)) # =0 for numbers >=1
    # Insert that number into a format string
    formatstring = '{{:.{:1d}f}}'.format(decimalplaces)
    # Return the formatted tick label
    return formatstring.format(y)
ax.yaxis.set_major_formatter(ticker.FuncFormatter(myLogFormat))
I found Joe's and Tom's answers very helpful, but there are a lot of useful details in the comments on those answers. Here's a summary of the two scenarios:
Ranges above 1
Here's the example code like Joe's, but with a higher range:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.axis([1, 10000, 1, 1000000])
ax.loglog()
plt.show()
That shows a plot like this, using scientific notation:
As in Joe's answer, I use a ScalarFormatter, but I also call set_scientific(False). That's necessary when the scale goes up to 1000000 or above.
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
fig, ax = plt.subplots()
ax.axis([1, 10000, 1, 1000000])
ax.loglog()
for axis in [ax.xaxis, ax.yaxis]:
    formatter = ScalarFormatter()
    formatter.set_scientific(False)
    axis.set_major_formatter(formatter)
plt.show()
Ranges below 1
As in Tom's answer, here's what happens when the range goes below 1:
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
fig, ax = plt.subplots()
ax.axis([0.01, 10000, 1, 1000000])
ax.loglog()
for axis in [ax.xaxis, ax.yaxis]:
    formatter = ScalarFormatter()
    formatter.set_scientific(False)
    axis.set_major_formatter(formatter)
plt.show()
That displays the first two ticks on the x axis as zeroes.
Switching to a FuncFormatter handles that. Again, I had problems with numbers 1000000 or higher, but adding a precision to the format string solved it.
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
fig, ax = plt.subplots()
ax.axis([0.01, 10000, 1, 1000000])
ax.loglog()
for axis in [ax.xaxis, ax.yaxis]:
    formatter = FuncFormatter(lambda y, _: '{:.16g}'.format(y))
    axis.set_major_formatter(formatter)
plt.show()
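The effect of the precision in the format string can be checked without drawing anything. In this sketch, plain_label is just a named version of the lambda above, and the sample values are arbitrary.

```python
from matplotlib.ticker import FuncFormatter


def plain_label(value, _pos=None):
    """Format a tick value with up to 16 significant digits."""
    return '{:.16g}'.format(value)


formatter = FuncFormatter(plain_label)

for value in (0.01, 1, 1000, 1000000):
    # The default 'g' precision is 6 significant digits, so with plain
    # '{:g}' the value 1000000 is rendered in exponent notation.
    print('{:g}'.format(value), '->', plain_label(value))
```

Values below 0.0001 still switch to exponent notation with either format string, because the g presentation type does so for exponents below -4.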
Regarding these questions from the comments:
What if I wanted to change the numbers to, 1, 5, 10, 20?
– aloha Jul 10 '15 at 13:26
I would like to add ticks in between, like 50,200, etc.., How can I do
that? I tried, set_xticks[50.0,200.0] but that doesn't seem to work!
– ThePredator Aug 3 '15 at 12:54
But with ax.axis([1, 100, 1, 100]), ScalarFormatter gives 1.0, 10.0, ... which is not what I desire. I want it to give integers...
– CPBL Dec 7 '15 at 20:22
You can solve those issues like this with the minor formatter:
ax.yaxis.set_minor_formatter(matplotlib.ticker.ScalarFormatter())
ax.yaxis.set_minor_formatter(matplotlib.ticker.FormatStrFormatter("%.8f"))
ax.set_yticks([0.00000025, 0.00000015, 0.00000035])
In my application I'm using this format scheme, which I think solves most issues related to log scale formatting; the same could be done for data > 1.0 or for x-axis formatting:
plt.ylabel('LOGARITHMIC PRICE SCALE')
plt.yscale('log')
ax.yaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.yaxis.set_major_formatter(matplotlib.ticker.FormatStrFormatter("%.8f"))
ax.yaxis.set_minor_formatter(matplotlib.ticker.ScalarFormatter())
ax.yaxis.set_minor_formatter(matplotlib.ticker.FormatStrFormatter("%.8f"))
#####################################################
#force 'autoscale'
#####################################################
yd = [] #matrix of y values from all lines on plot
for n in range(len(plt.gca().get_lines())):
    line = plt.gca().get_lines()[n]
    yd.append((line.get_ydata()).tolist())
yd = [item for sublist in yd for item in sublist]
ymin, ymax = np.min(yd), np.max(yd)
ax.set_ylim([0.9*ymin, 1.1*ymax])
#####################################################
z = []
for i in [0.0000001, 0.00000015, 0.00000025, 0.00000035,
          0.000001, 0.0000015, 0.0000025, 0.0000035,
          0.00001, 0.000015, 0.000025, 0.000035,
          0.0001, 0.00015, 0.00025, 0.00035,
          0.001, 0.0015, 0.0025, 0.0035,
          0.01, 0.015, 0.025, 0.035,
          0.1, 0.15, 0.25, 0.35]:
    if ymin<i<ymax:
        z.append(i)
ax.set_yticks(z)
For comments on "force autoscale" see: Python matplotlib logarithmic autoscale
which yields:
Then, to create a general-purpose version:
# user controls
#####################################################
sub_ticks = [10,11,12,14,16,18,22,25,35,45] # fill these midpoints
sub_range = [-8,8] # from 100000000 to 0.000000001
format = "%.8f" # standard float string formatting
# set scalar and string format floats
#####################################################
ax.yaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.yaxis.set_major_formatter(matplotlib.ticker.FormatStrFormatter(format))
ax.yaxis.set_minor_formatter(matplotlib.ticker.ScalarFormatter())
ax.yaxis.set_minor_formatter(matplotlib.ticker.FormatStrFormatter(format))
#force 'autoscale'
#####################################################
yd = [] #matrix of y values from all lines on plot
for n in range(len(plt.gca().get_lines())):
    line = plt.gca().get_lines()[n]
    yd.append((line.get_ydata()).tolist())
yd = [item for sublist in yd for item in sublist]
ymin, ymax = np.min(yd), np.max(yd)
ax.set_ylim([0.9*ymin, 1.1*ymax])
# add sub minor ticks
#####################################################
set_sub_formatter=[]
for i in sub_ticks:
    for j in range(sub_range[0],sub_range[1]):
        set_sub_formatter.append(i*10**j)
k = []
for l in set_sub_formatter:
    if ymin<l<ymax:
        k.append(l)
ax.set_yticks(k)
#####################################################
yields:
The machinery outlined in the accepted answer works great, but sometimes a simple manual override is easier. To get ticks at 1, 10, 100, 1000, for example, you could say:
ticks = 10**np.arange(4)
plt.xticks(ticks, ticks)
Note that it is critical to specify both the locations and the labels, otherwise matplotlib will ignore you.
This mechanism can be used to obtain arbitrary formatting. For instance:
plt.xticks(ticks, [ f"{x:.0f}" for x in ticks ])
or
plt.xticks(ticks, [ f"10^{int(np.log10(x))}" for x in ticks ])
or
plt.xticks(ticks, [ romannumerals(x) for x in ticks ])
(where romannumerals is an imagined function that converts its argument into Roman numerals).
As an aside, this technique also works if you want ticks at arbitrary intervals, e.g.,
ticks = [1, 2, 5, 10, 20, 50, 100]
etc.
import matplotlib.pyplot as plt
plt.rcParams['axes.formatter.min_exponent'] = 2
plt.xlim(1e-5, 1e5)
plt.loglog()
plt.show()
This will become default for all plots in a session.
See also: LogFormatter tickmarks scientific format limits

Matplotlib Pcolormesh - in what format should I give the data?

I'm trying to use matplotlib's pcolormesh function to draw a diagram that shows dots in 2d coordinates, and the color of the dots would be defined by a number.
I have three arrays, one of which has the x-coordinates, another one with the y-coordinates, and the third one has the numbers which should represent colors.
xdata = [ 695422. 695423. 695424. 695425. 695426. 695426.]
ydata = [ 0. -15.4 -15.3 -15.7 -15.5 -19. ]
colordata = [ 0. 121. 74. 42. 8. 0.]
Now, apparently pcolormesh wants its data as three 2d arrays.
In some examples I've seen something like this being done:
newxdata, newydata = np.meshgrid(xdata,ydata)
Okay, but how do I get colordata into a similar format? I tried to do it this way:
newcolordata, zz = np.meshgrid(colordata, xdata)
But I'm not exactly sure if it's right. Now, if I try to draw the diagram:
ax.pcolormesh(newxdata, newydata, newcolordata)
I get something that looks like this.
No errors, so I guess that's good. The picture it returns obviously doesn't look like what I want it to. Can someone point me in the right direction with this? Is the data array still in the wrong format?
This should be all of the important code:
newxdata, newydata = np.meshgrid(xdata,ydata)
newcolordata, zz = np.meshgrid(colordata, xdata)
print(newxdata)
print(newydata)
print(newcolordata)
diagram = plt.figure()
ax = diagram.add_subplot(111)
xformat = DateFormatter('%d/%m/%Y')
ax.xaxis_date()
plot1 = ax.pcolormesh(newxdata, newydata, newcolordata)
ax.set_title("A butterfly diagram of sunspots between dates %s and %s" % (date1, date2))
ax.autoscale(enable=False)
ax.xaxis.set_major_formatter(xformat)
diagram.autofmt_xdate()
if command == "save":
    diagram.savefig('diagrams//'+name+'.png')
Edit: I noticed that the colors do correspond to the number. Now I just have to turn those equally sized bars into dots.
If you want dots, use scatter. pcolormesh draws a grid. scatter draws markers colored and/or scaled by size.
For example:
import matplotlib.pyplot as plt
xdata = [695422.,695423.,695424.,695425.,695426.,695426.]
ydata = [0.,-15.4,-15.3,-15.7,-15.5,-19.]
colordata = [0.,121.,74.,42.,8.,0.]
fig, ax = plt.subplots()
ax.scatter(xdata, ydata, c=colordata, marker='o', s=200)
ax.xaxis_date()
fig.autofmt_xdate()
plt.show()
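For comparison, this is roughly the input pcolormesh is designed for: a grid of cell corners plus one color value per cell. The sketch below uses made-up numbers and the Agg backend so that it runs without a display; it is meant to show the expected array shapes, not to reproduce the sunspot plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

x_edges = np.arange(5)                 # 5 edges -> 4 columns of cells
y_edges = np.arange(4)                 # 4 edges -> 3 rows of cells
X, Y = np.meshgrid(x_edges, y_edges)   # both have shape (4, 5)

# One value per cell, indexed as [row, column]
C = np.arange(12.0).reshape(3, 4)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, C)
fig.colorbar(mesh)
```

Scattered points such as the xdata, ydata and colordata in the question do not form such a grid, which is why scatter is the better fit for them.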
Edit:
It sounds like you want to bin your data and sum the areas inside each bin.
If so, you can just use hist2d to do this. If you specify the areas of the sunspots as the weights to the histogram, the areas inside each bin will be summed.
Here's an example (data from here: http://solarscience.msfc.nasa.gov/greenwch.shtml, specifically, this file, formatted as described here). Most of this is reading the data. Notice that I'm specifying the vmin and then using im.cmap.set_under('none') to display anything under that value as transparent.
It's entirely possible that I'm completely misunderstanding the data here. The units may be completely incorrect (the "raw" areas given are in millionths of the sun's surface area, I think).
from glob import glob
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def main():
    files = sorted(glob('sunspot_data/*.txt'))
    df = pd.concat([read_file(name) for name in files])
    date = mdates.date2num(df.date)
    fig, ax = plt.subplots(figsize=(10, 4))
    # Weighting each observation by its area sums the areas inside each bin
    data, xbins, ybins, im = ax.hist2d(date, df.latitude, weights=df.area/1e4,
                                       bins=(1000, 50), vmin=1e-6)
    ax.xaxis_date()
    im.cmap.set_under('none')
    cbar = fig.colorbar(im)
    ax.set(xlabel='Date', ylabel='Solar Latitude', title='Butterfly Plot')
    cbar.set_label("Percentage of the Sun's surface")
    fig.tight_layout()
    plt.show()
def read_file(filename):
    """This data happens to be in a rather annoying format..."""
    def parse_date(year, month, day, time):
        year, month, day = [int(item) for item in [year, month, day]]
        time = 24 * float(time)
        hour = int(time)
        minute_frac = 60 * (time % 1)
        minute = int(minute_frac)
        second = int(60 * (minute_frac % 1))
        return dt.datetime(year, month, day, hour, minute, second)
    cols = dict(year=(0, 4), month=(4, 6), day=(6, 8), time=(8, 12),
                area=(41, 44), latitude=(63, 68), longitude=(57, 62))
    # list(...) because read_fwf expects a list or tuple, not a dict view (Python 3)
    df = pd.read_fwf(filename, colspecs=list(cols.values()), header=None,
                     names=list(cols.keys()), date_parser=parse_date,
                     parse_dates={'date':['year', 'month', 'day', 'time']})
    return df
main()
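The binning step on its own can be tried on a handful of made-up points. This sketch uses np.histogram2d, which ax.hist2d calls internally, to show that passing weights sums the weights in each bin instead of counting points.

```python
import numpy as np

# Made-up observations: a position (x, y) and an area for each one
x = np.array([0.2, 0.4, 1.5, 1.6])
y = np.array([0.1, 0.3, 0.2, 1.7])
area = np.array([10.0, 5.0, 2.0, 1.0])

edges = [0, 1, 2]
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])
summed, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=area)

print(counts)   # number of points per bin
print(summed)   # total area per bin
```

The first two points share a bin, so that bin holds a count of 2 without weights and the sum of their areas with weights.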
