Manually setting xticks with xaxis_date() in Python/matplotlib - python

I've been looking into how to make plots against time on the x axis and have it pretty much sorted, with one strange quirk that makes me wonder whether I've run into a bug or (admittedly much more likely) am doing something I don't really understand.
Simply put, below is a simplified version of my program. If I put this in a .py file and execute it from an interpreter (ipython) I get a figure with an x axis with the year only, "2012", repeated a number of times, like this.
However, if I comment out the line (40) that sets the xticks manually, namely 'plt.xticks(tk)' and then run that exact command in the interpreter immediately after executing the script, it works great and my figure looks like this.
Similarly it also works if I just move that line to be after the savefig command in the script, that's to say to put it at the very end of the file. Of course in both cases only the figure drawn on screen will have the desired axis, and not the saved file. Why can't I set my x axis earlier?
Grateful for any insights, thanks in advance!
import datetime
import matplotlib.pyplot as plt
from matplotlib.dates import date2num
# define arrays for x, y and errors
x=[16.7,16.8,17.1,17.4]
y=[15,17,14,16]
e=[0.8,1.2,1.1,0.9]
xtn=[]
# convert x (decimal hours) to matplotlib date numbers
for t in x:
    hours=int(t)
    mins=int((t-int(t))*60)
    secs=int(((t-hours)*60-mins)*60)
    dt=datetime.datetime(2012,1,1,hours,mins,secs)
    xtn.append(date2num(dt))
# set up plot
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
# plot
ax.errorbar(xtn,y,yerr=e,fmt='+',elinewidth=2,capsize=0,color='k',ecolor='k')
# set x axis range
ax.xaxis_date()
t0=date2num(datetime.datetime(2012,1,1,16,35)) # x axis startpoint
t1=date2num(datetime.datetime(2012,1,1,17,35)) # x axis endpoint
plt.xlim(t0,t1)
# manually set xtick values
tk=[]
tk.append(date2num(datetime.datetime(2012,1,1,16,40)))
tk.append(date2num(datetime.datetime(2012,1,1,16,50)))
tk.append(date2num(datetime.datetime(2012,1,1,17,0)))
tk.append(date2num(datetime.datetime(2012,1,1,17,10)))
tk.append(date2num(datetime.datetime(2012,1,1,17,20)))
tk.append(date2num(datetime.datetime(2012,1,1,17,30)))
plt.xticks(tk)
plt.show()
# save to file
plt.savefig('savefile.png')

I don't think you need that call to xaxis_date(), since you are already providing the x-axis data in a format that matplotlib knows how to deal with. I also think there's something slightly wrong with your secs formula.
We can use matplotlib's built-in formatters and locators to set the major xticks to a regular interval (minutes, hours, days, etc.) and to customize the display using a strftime formatting string.
It appears that if a formatter is not specified, the default is to display the year, which is what you were seeing.
Try this out:
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MinuteLocator
x = [16.7,16.8,17.1,17.4]
y = [15,17,14,16]
e = [0.8,1.2,1.1,0.9]
xtn = []
for t in x:
    h = int(t)
    m = int((t-int(t))*60)
    xtn.append(dt.datetime.combine(dt.date(2012,1,1), dt.time(h,m)))
def larger_alim( alim ):
    ''' simple utility function to expand axis limits a bit '''
    amin,amax = alim
    arng = amax-amin
    nmin = amin - 0.1 * arng
    nmax = amax + 0.1 * arng
    return nmin,nmax
plt.errorbar(xtn,y,yerr=e,fmt='+',elinewidth=2,capsize=0,color='k',ecolor='k')
plt.gca().xaxis.set_major_locator( MinuteLocator(byminute=range(0,60,10)) )
plt.gca().xaxis.set_major_formatter( DateFormatter('%H:%M:%S') )
plt.gca().set_xlim( larger_alim( plt.gca().get_xlim() ) )
plt.show()
Result:
FWIW the utility function larger_alim was originally written for this other question: Is there a way to tell matplotlib to loosen the zoom on the plotted data?
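If you do want to pick the tick positions by hand, as in the question, here is a minimal sketch that combines explicit ticks with an explicit DateFormatter. The "%H:%M" format, the Agg backend and the in-memory buffer are my own assumptions to keep the example self-contained; a filename such as 'savefile.png' works the same way with savefig. Note that the figure is written out before any call to plt.show().

```python
import datetime as dt
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, date2num

# Sample data from the question: decimal hours on 1 Jan 2012.
hours = [16.7, 16.8, 17.1, 17.4]
y = [15, 17, 14, 16]
err = [0.8, 1.2, 1.1, 0.9]

day = dt.datetime(2012, 1, 1)
times = [day + dt.timedelta(hours=h) for h in hours]

fig, ax = plt.subplots()
ax.errorbar(times, y, yerr=err, fmt="+", color="k", ecolor="k", capsize=0)

# Explicit tick positions every 10 minutes from 16:40 to 17:30.
first_tick = dt.datetime(2012, 1, 1, 16, 40)
ticks = [date2num(first_tick + dt.timedelta(minutes=10 * i)) for i in range(6)]
ax.set_xticks(ticks)

# An explicit formatter, so the labels do not depend on the automatic one.
ax.xaxis.set_major_formatter(DateFormatter("%H:%M"))
ax.set_xlim(dt.datetime(2012, 1, 1, 16, 35), dt.datetime(2012, 1, 1, 17, 35))

# Render once so the tick labels are populated, then save.
fig.canvas.draw()
tick_labels = [label.get_text() for label in ax.get_xticklabels()]

buf = io.BytesIO()
fig.savefig(buf, format="png")
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() after saving.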

Related

How can I plot arrows on an existing mplsoccer pitch?

I tried to follow McKay Johns's tutorial on YouTube (the Jupyter notebook with the data is at https://github.com/mckayjohns/passmap/blob/main/Pass%20map%20tutorial.ipynb).
I understood everything, but I wanted to make one small change: replace plt.plot(...) with:
plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x], df['endY'][x]-df['y'][x],
shape='full', color='green')
The problem is that I still can't see the arrows. I tried several changes without success, so I'd like to ask the group.
Below you can see the code.
## Read in the data
df = pd.read_csv('...\Codes\Plotting_Passes\messibetis.csv')
#convert the data to match the mplsoccer statsbomb pitch
#to see how to create the pitch, watch the video here: https://www.youtube.com/watch?v=55k1mCRyd2k
df['x'] = df['x']*1.2
df['y'] = df['y']*.8
df['endX'] = df['endX']*1.2
df['endY'] = df['endY']*.8
# Set Base
fig ,ax = plt.subplots(figsize=(13.5,8))
# Change background color of base
fig.set_facecolor('#22312b')
# Change color of base inside
ax.patch.set_facecolor('#22312b')
#this is how we create the pitch
pitch = Pitch(pitch_type='statsbomb',
pitch_color='#22312b', line_color='#c7d5cc')
# Set the axes to our Base
pitch.draw(ax=ax)
# x axis runs from 0 to 120
# y axis runs from 80 to 0
# Solution: invert the y axis:
plt.gca().invert_yaxis()
#use a for loop to plot each pass
for x in range(len(df['x'])):
    if df['outcome'][x] == 'Successful':
        #plt.plot((df['x'][x],df['endX'][x]),(df['y'][x],df['endY'][x]),color='green')
        plt.scatter(df['x'][x],df['y'][x],color='green')
        # Here is the problem!
        plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x], df['endY'][x]-df['y'][x],
                  shape='full', color='green')
    if df['outcome'][x] == 'Unsuccessful':
        plt.plot((df['x'][x],df['endX'][x]),(df['y'][x],df['endY'][x]),color='red')
        plt.scatter(df['x'][x],df['y'][x],color='red')
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
It always shows:
The problem is that plt.arrow has default values for head_width and head_length that are too small for your figure. In other words, it is drawing the arrows, but the arrow heads are far too tiny to see, even if you zoom. Try something like the following:
import pandas as pd
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
df = pd.read_csv('https://raw.githubusercontent.com/mckayjohns/passmap/main/messibetis.csv')
...
# create a dict for the colors to avoid repetitive code
colors = {'Successful':'green', 'Unsuccessful':'red'}
for x in range(len(df['x'])):
    plt.scatter(df['x'][x],df['y'][x],color=colors[df.outcome[x]], marker=".")
    plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x],
              df['endY'][x]-df['y'][x], color=colors[df.outcome[x]],
              head_width=1, head_length=1, length_includes_head=True)
    # setting `length_includes_head` to `True` ensures that the arrow head is
    # *part* of the line, not added on top
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
Result:
Note that you can also use plt.annotate for this, passing specific props to the parameter arrowprops. E.g.:
import pandas as pd
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
df = pd.read_csv('https://raw.githubusercontent.com/mckayjohns/passmap/main/messibetis.csv')
...
# create a dict for the colors to avoid repetitive code
colors = {'Successful':'green', 'Unsuccessful':'red'}
for x in range(len(df['x'])):
    plt.scatter(df['x'][x],df['y'][x],color=colors[df.outcome[x]], marker=".")
    props= {'arrowstyle': '-|>,head_width=0.25,head_length=0.5',
            'color': colors[df.outcome[x]]}
    plt.annotate("", xy=(df['endX'][x],df['endY'][x]),
                 xytext=(df['x'][x],df['y'][x]), arrowprops=props)
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
The result looks a bit sharper, if you ask me, although some tweaking of the plt.arrow parameters may achieve the same.
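For anyone who wants to try the head_width / head_length fix without the CSV or mplsoccer, here is a small sketch on a plain Axes. The three passes are made-up values in the same 120 x 80 coordinate range, and the Agg backend is only there so it runs headless; neither comes from the original data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrow

# Hypothetical passes: (x, y, end_x, end_y, outcome) on a 120 x 80 pitch.
passes = [
    (30.0, 40.0, 55.0, 30.0, "Successful"),
    (60.0, 20.0, 80.0, 50.0, "Unsuccessful"),
    (75.0, 60.0, 100.0, 45.0, "Successful"),
]
colors = {"Successful": "green", "Unsuccessful": "red"}

fig, ax = plt.subplots(figsize=(13.5, 8))
ax.set_xlim(0, 120)
ax.set_ylim(80, 0)  # inverted y axis, as in the question

for x, y, end_x, end_y, outcome in passes:
    ax.scatter(x, y, color=colors[outcome], marker=".")
    # Head size is given in data units, so 1 is clearly visible on a 120 x 80 pitch.
    ax.arrow(x, y, end_x - x, end_y - y, color=colors[outcome],
             head_width=1, head_length=1, length_includes_head=True)

# Collect the arrow patches that were added to the axes.
arrows = [p for p in ax.patches if isinstance(p, FancyArrow)]
```

Because the head size is in data units, it needs to be chosen relative to the axis range; a value that works here would be huge on a 0 to 1 axis.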

Using python and matplotlib, fill between two lines not giving expected output

I am trying to plot a linear line with associated error.
I calculated values for slope (a) and intercepts (b). In addition, I calculated the error associated with these values. So I drew the line given by the typical formula below.
y=ax+b
However, in addition to the line, I also want to draw the associated error. I came up with the idea to draw the lines associated with these formulas and color the space between the lines gray.
y=(a+a_sd)x+(b+b_sd)
y=(a-a_sd)x+(b-b_sd)
Using the following piece of code, I am able to color part of the surface between the lines, but not the whole span (see included output).
I think this may be due to the fact that "distance" is not sorted, and fill_between is using distance[0] and distance[-1] as begin and end for the span, respectively.
As always, any help would be highly appreciated!
import matplotlib.pyplot as plt
distance=[0.35645334340084989, 0.55406894241607718, 0.10201413273193734, 0.13401365724625941, 0.71918808865838735, 0.14151335417722818]
time=[2.4004984846346171, 2.4909766335028447, 1.9852064018125195, 1.9083156734132103, 2.6380396934372863, 1.9114505780323543]
time_SD=[0.062393810960652669, 0.056945715242838917, 0.073960838867327183, 0.084111239062664475, 0.026912957190265499, 0.08595664694840538]
distance_SD=[0.035160608598240162, 0.032976715460514235, 0.02782911002465227, 0.035465701695038584, 0.043009444687382707, 0.038387585107200854]
a=1.17887019041
b=1.83339229489
a_sd=0.159771527859
b_sd=0.0762509747218
plt.errorbar(distance,time,yerr=time_SD, xerr=distance_SD, linestyle="None")
abline_values = [(a)*i + (b) for i in distance]
abline_values_plus = [(a+a_sd)*i + (b+b_sd) for i in distance]
abline_values_minus = [(a-a_sd)*i + (b-b_sd) for i in distance]
plt.plot(distance, abline_values,"r")
plt.fill_between(distance,abline_values_minus,abline_values_plus,facecolor='lightgrey', interpolate=True, edgecolors="None")
leg = plt.legend(loc="lower right", frameon=False, handlelength=0, handletextpad=0)
for item in leg.legendHandles:
    item.set_visible(False)
plt.show()
To use pyplot.fill_between(), the list of horizontal coordinates should be sorted. An unsorted list of x values is accepted, but can lead to undesired results.
A list can be sorted with sorted(list).
import matplotlib.pyplot as plt
distance=[0.35645334340084989, 0.55406894241607718, 0.10201413273193734, 0.13401365724625941, 0.71918808865838735, 0.14151335417722818]
time=[2.4004984846346171, 2.4909766335028447, 1.9852064018125195, 1.9083156734132103, 2.6380396934372863, 1.9114505780323543]
time_SD=[0.062393810960652669, 0.056945715242838917, 0.073960838867327183, 0.084111239062664475, 0.026912957190265499, 0.08595664694840538]
distance_SD=[0.035160608598240162, 0.032976715460514235, 0.02782911002465227, 0.035465701695038584, 0.043009444687382707, 0.038387585107200854]
a=1.17887019041
b=1.83339229489
a_sd=0.159771527859
b_sd=0.0762509747218
distance_sorted = sorted(distance)
plt.errorbar(distance,time,yerr=time_SD, xerr=distance_SD, linestyle="None")
abline_values = [(a)*i + (b) for i in distance_sorted]
abline_values_plus = [(a+a_sd)*i + (b+b_sd) for i in distance_sorted]
abline_values_minus = [(a-a_sd)*i + (b-b_sd) for i in distance_sorted]
plt.plot(distance_sorted, abline_values,"r")
plt.fill_between(distance_sorted,abline_values_minus,abline_values_plus, facecolor='lightgrey', edgecolors="None")
plt.show()
The documentation does not mention the requirement of x values being sorted. The reason is probably that fill_between actually works even with unsorted lists, just not the way one might expect. Maybe the following animation gives a more intuitive understanding of the issue:
You are right, fill_between seems to expect the values to be sorted. The documentation is not clear about this behaviour, though. The following example shows the same effect:
import matplotlib.pyplot as plt
from numpy import random, array
#x = random.randn(20) #does not work
x = array(sorted(random.randn(20))) #works
a = 2
d = .5
y_h = x*(a+d)
y_l = x*(a-d)
plt.fill_between(x,y_h, y_l)
plt.show()
As a workaround just sort your values before calculating your errorlines using sorted.
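If the data lives in NumPy arrays, the same workaround can be sketched with np.argsort, which also gives you an index order for any other per-point arrays. The values below are rounded from the question; the array names and the Agg backend are my own choices for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Values rounded from the question's data.
distance = np.array([0.356, 0.554, 0.102, 0.134, 0.719, 0.142])
time = np.array([2.400, 2.491, 1.985, 1.908, 2.638, 1.911])
a, b = 1.17887019041, 1.83339229489
a_sd, b_sd = 0.159771527859, 0.0762509747218

# One index order, reused for every array that belongs to the same points.
order = np.argsort(distance)
x = distance[order]
time_sorted = time[order]

# Error band evaluated on the sorted x values.
upper = (a + a_sd) * x + (b + b_sd)
lower = (a - a_sd) * x + (b - b_sd)

fig, ax = plt.subplots()
ax.plot(x, a * x + b, "r")
ax.fill_between(x, lower, upper, facecolor="lightgrey", edgecolor="none")
ax.plot(x, time_sorted, "k+")
```

Since the band is computed from x alone, np.sort(distance) would be enough for the fill; the argsort order only matters once measured values such as time have to stay paired with their x.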

Matplotlib linestyle inconsistent dashes

I am plotting just a simple scatterplot with MPL 1.4.0. I want to control the number of dashes on the figures I am plotting because currently even though I set a linestyle, the dashes are too close to each other so it doesn't look like a properly dashed line.
#load cdeax,cdeay,gsix,gsiy,reich all are arrays of shape (380,)
figfit = plt.figure(); axfit = figfit.gca()
axfit.plot(cdeax,np.log(cdeay),'ko', alpha=.5); axfit.plot(gsix,np.log(gsiy), 'kx')
axfit.plot(cdeax,cdeafit,'k-'); axfit.plot(gsix,gsifit,'k:')
longevityregplot[1].plot(gsix,np.log(reich_l),'k-.')
#load cdeax,cdeay,gsix,gsiy,reich all are arrays of shape (380,)
figfit = plt.figure(); axfit = figfit.gca()
axfit.plot(cdeax,np.log(cdeay),'ko', alpha=.5); axfit.plot(gsix,np.log(gsiy), 'kx')
axfit.plot(cdeax,cdeafit,'k-',dashes = [10,10]); axfit.plot(gsix,gsifit,'k:',dashes=[10,10])
longevityregplot[1].plot(gsix,np.log(reich_l),'k-.')
However the above is what I get. Rather than a uniformly-dashed line, the lines get dashed at the ends to varying degrees but no matter what values I use for dashes, the dashing is never uniform.
I'm afraid I really don't know what the problem is here... Any ideas?
I have pasted the arrays I am using here: http://pastebin.com/rJ5Jjfmm
You should be able to just copy/paste them to your IDE for the above code to run.
Cheers!
EDIT:
Just with the single line plotted:
axfit.plot(cdeax,cdeafit,'k-',dashes = [10,10]);
EDIT2: pastebin link changed to include all data
EDIT3: Histogram of point density along the x axis:
I think what @cphlewis said is correct: you may have some x-axis backtracking. If I sort everything it looks OK to me (I did my own fitting, since I still don't see the fits on pastebin).
# import your data here
import math
import numpy as np
import matplotlib.pyplot as plt
figfit = plt.figure(); axfit = figfit.gca()
cdea = zip(cdeax,cdeay)
cdea = np.array(sorted(cdea, key = lambda x: x[0]))
gsi = zip(gsix,gsiy)
gsi = np.array(sorted(gsi, key = lambda x: x[0]))
cdeafit2 = np.polyfit(cdea[:,0],cdea[:,1],1)
gsifit2 = np.polyfit([x[0] for x in gsi],[math.log(x[1]) for x in gsi],1)
cdeafit = [x*cdeafit2[0] + cdeafit2[1] for x in cdea[:,0]]
gsifit = [math.exp(y) for y in [x*gsifit2[0] + gsifit2[1] for x in gsi[:,0]]]
axfit.plot(cdea[:,0],cdea[:,1],'ko', alpha=.5); axfit.plot(gsi[:,0],gsi[:,1], 'kx')
axfit.plot(cdea[:,0],cdeafit,'k-',dashes = [10,10]); axfit.plot(gsi[:,0],gsifit,'k:',dashes=[10,10])
#longevityregplot[1].plot(gsix,np.log(reich_l),'k-.') # not sure what this is
axfit.set_yscale('log')
plt.show()
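To check whether backtracking is what is going on before re-fitting anything, a quick test along these lines may help. The helper names backtracks and sort_by_x are mine, and the sample values are made up; the idea is only that a line is drawn in the order the points are given, so any decrease in x means the dashed line doubles back over itself.

```python
import numpy as np


def backtracks(x):
    """Return True if x ever decreases from one point to the next."""
    return bool(np.any(np.diff(np.asarray(x, dtype=float)) < 0))


def sort_by_x(x, y):
    """Return copies of x and y reordered so that x is non-decreasing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    return x[order], y[order]


# Hypothetical unsorted data standing in for cdeax / cdeafit.
x_raw = [3.0, 1.0, 2.0, 5.0, 4.0]
y_raw = [30.0, 10.0, 20.0, 50.0, 40.0]

xs, ys = sort_by_x(x_raw, y_raw)
# The sorted arrays can then be passed to the plot call, e.g.
# axfit.plot(xs, ys, 'k-', dashes=[10, 10])
```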

How to chart times (mm:ss) in Matplotlib (formatting output values)

I'm plotting line graphs in Python Matplotlib of times which I get in mm:ss.tttt format.
I've already converted the values to ten-thousandths of a second and I can create a nice plot. But that means the Y axis shows a value of "832323" instead of the easier-to-read "1:23.2323".
Is there some way I can format the output values appropriately?
I worked this out myself shortly after I wrote this. Use the axis's set_major_formatter() function in Matplotlib.
I wrote a quick formatting function that takes a value in ten-thousandths of a second and turns it back into mm:ss.tttt, and then passed this formatter to the axis.
Import the 'ticker' module along with the plotting stuff:
import matplotlib.pyplot as plt
from matplotlib import ticker
Create your own value formatting function:
def format_10Kth_time(time, pos=None):
    mins = time // (10000 * 60)
    secs = (time - (mins * 10000 * 60)) // (10000)
    fracsecs = time % 10000
    # zero-pad the fraction so that e.g. 23 is shown as .0023, not .23
    return "%d:%02d.%04d" % (mins, secs, fracsecs)
Then in my plot code I did this to alter the Y axis formatting:
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(format_10Kth_time))
plt.plot(...)
plt.show()
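A quick way to sanity-check a formatter like this is to call it directly on a few values before wiring it into the axis. The variant below is a sketch of the same idea using divmod; the name format_10kth_time and the sample values are mine, and it assumes non-negative inputs.

```python
def format_10kth_time(value, pos=None):
    """Format a duration in ten-thousandths of a second as m:ss.tttt.

    Assumes value is non-negative; pos is unused but matches the
    (value, pos) signature that ticker.FuncFormatter expects.
    """
    total = int(round(value))
    seconds_total, frac = divmod(total, 10000)
    minutes, seconds = divmod(seconds_total, 60)
    return "%d:%02d.%04d" % (minutes, seconds, frac)


samples = [832323, 600023, 9999]
formatted = [format_10kth_time(v) for v in samples]
```

It can be passed to ticker.FuncFormatter in the same way as the function above.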

matplotlib; fractional powers of ten; scientific notation

I deal with simulation data and have been using matplotlib a lot lately and have been encountering something (a bug?) that's annoying.
I have been allowing matplotlib to automatically set the tick labels and their type (scientific, etc) and with some data I get weird scientific ticker labels.
In searching for a resolution to this I found that you can call set_powerlimits((n,m)) to set the limits of data that will be displayed using scientific notation. But I have encountered this problem (if I remember correctly) with data spanning several orders of magnitude. My data is also all over the place, so I need a programmatic solution of some sort, not a hard-coded one.
see: http://matplotlib.org/api/ticker_api.html
Below I have included example data, code, and a screenshot.
#! /usr/bin/env python
from matplotlib import pyplot as plt
data = [
[1.83186088e-08,0.03275],
[1.07139009e-07,0.03275],
[2.06376627e-07,0.03275],
[3.03918517e-07,0.03275],
[4.06032883e-07,0.03275],
[5.01194017e-07,0.03275],
[6.02195723e-07,0.03275],
[7.03536925e-07,0.03275],
[8.04625154e-07,0.03275],
[9.06401951e-07,0.03275],
[1.00041895e-06,0.03275],
[1.10230745e-06,0.03275],
[1.2042525e-06,0.03275],
[1.30647822e-06,0.03275],
[1.40109887e-06,0.03275],
[1.50380097e-06,0.03275],
[1.60683242e-06,0.03275],
[1.70208505e-06,0.03275],
[1.80545692e-06,0.03275],
[1.90090648e-06,0.03275],
[2.00453092e-06,0.03275],
[2.10018627e-06,0.03275],
[2.20401747e-06,0.03275],
[2.30009359e-06,0.03275],
[2.4043033e-06,0.03275],
[2.50066449e-06,0.03275],
[2.60513728e-06,0.03275],
[2.70165405e-06,0.03275],
[2.80635938e-06,0.03275],
[2.90331342e-06,0.03275],
[3.00021199e-06,0.03275],
[3.10546819e-06,0.03275],
[3.20257899e-06,0.03275],
[3.30032923e-06,0.0327499999],
[3.40612833e-06,0.0327499999],
[3.50401732e-06,0.0327499997],
[3.60153069e-06,0.0327499996],
[3.70700708e-06,0.0327499993],
[3.80456907e-06,0.0327499988],
[3.90259984e-06,0.0327499982],
[4.00084149e-06,0.0327499973],
[4.10700266e-06,0.0327499959],
[4.2047462e-06,0.0327499942],
[4.30209468e-06,0.0327499918],
[4.40018204e-06,0.0327499886],
[4.50712875e-06,0.032749984],
[4.60630591e-06,0.0327499785],
[4.70519881e-06,0.0327499715],
[4.80398305e-06,0.0327499628],
[4.90251297e-06,0.0327499521],
[5.00182752e-06,0.032749939],
[5.10157551e-06,0.0327499232],
[5.20157575e-06,0.0327499043],
[5.30145192e-06,0.0327498822],
[5.40127044e-06,0.0327498565],
[5.500537e-06,0.0327498272],
[5.60773155e-06,0.0327497911],
[5.70660709e-06,0.0327497534],
[5.80610521e-06,0.0327497112],
[5.90651786e-06,0.0327496642],
[6.00749437e-06,0.0327496124],
[6.10822094e-06,0.0327495566],
[6.20042255e-06,0.0327495018],
[6.30049028e-06,0.0327494386],
[6.40035803e-06,0.0327493715],
[6.50035477e-06,0.0327493004],
[6.60056805e-06,0.0327492251],
[6.70029936e-06,0.0327491461],
[6.80054193e-06,0.0327490625],
[6.90130872e-06,0.0327489743],
[7.00202598e-06,0.0327488818],
[7.10217348e-06,0.0327487855],
[7.20243015e-06,0.0327486847],
[7.30199609e-06,0.0327485801],
[7.40193254e-06,0.0327484707],
[7.50188319e-06,0.0327483567],
[7.60306205e-06,0.0327482367],
[7.70357184e-06,0.0327481129],
[7.80343389e-06,0.0327479853],
[7.90330165e-06,0.0327478532],
[8.00348513e-06,0.0327477162],
[8.10167039e-06,0.0327475777],
[8.206328e-06,0.0327474253],
[8.3020567e-06,0.0327472819],
[8.40527826e-06,0.0327471228],
[8.50095898e-06,0.0327469714],
[8.60536828e-06,0.0327468019],
[8.70106059e-06,0.0327466426],
[8.80396558e-06,0.032746467],
[8.90727378e-06,0.0327462865],
[9.00225164e-06,0.0327461166],
[9.10359892e-06,0.0327459311],
[9.20470894e-06,0.0327457418],
[9.30582982e-06,0.0327455481],
[9.40750123e-06,0.0327453488],
[9.50134495e-06,0.0327451608],
[9.60358199e-06,0.0327449513],
[9.70705637e-06,0.0327447344],
[9.80377546e-06,0.0327445269],
[9.90091941e-06,0.032744314],
]
times=[]
vals=[]
for elem in data:
    times.append(elem[0])
    vals.append(elem[1])
plt.plot(times,vals)
plt.show()
screen_shot
You might try using the Engineering Formatter:
import matplotlib.ticker
times=[]
vals=[]
for elem in data:
    times.append(elem[0])
    vals.append(elem[1])
plt.plot(times,vals)
# set the formatter on the current axes before showing the figure
formatter = matplotlib.ticker.EngFormatter(unit='S', places=3)
formatter.ENG_PREFIXES[-6] = 'u'
plt.gca().yaxis.set_major_formatter(formatter)
plt.show()
Which will look like this:
This is a known problem. You'd be better to analyse the data manually for its limits, like you have done in the screen shot, and use ax.set_ylim(min, max) yourself after plotting. You can also turn off the offset with:
import matplotlib.ticker as mticker
# plot some stuff
# ...
y_formatter = mticker.ScalarFormatter(useOffset=False)
ax.yaxis.set_major_formatter(y_formatter)
I think your best option is to use a logarithmic axis, but if you need a linear axis, you must set the power limits yourself. You can compute the power limits using math.log10:
import math
from matplotlib import ticker
# Compute the span of the data
pow_min = math.floor(math.log10(min(vals)))
pow_max = math.ceil(math.log10(max(vals)))
# Create a scalar formatter without offset, in order to have
# the right exponent over the yaxis
fmt = ticker.ScalarFormatter(useOffset=False)
fmt.set_powerlimits((pow_min, pow_max))
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(times, vals)
ax1.yaxis.set_major_formatter(fmt) # Set the formatter
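Since the question asks for something programmatic, the limit computation can be wrapped in a small helper. This is only a sketch: power_limits is a hypothetical name, the data is a made-up stand-in for the question's narrow-band values, and skipping zeros and taking absolute values is my own addition so that math.log10 does not fail on them.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker


def power_limits(values):
    """Return (floor, ceil) of log10 over the nonzero magnitudes in values."""
    mags = [abs(v) for v in values if v != 0]
    if not mags:
        return (0, 0)
    return (math.floor(math.log10(min(mags))),
            math.ceil(math.log10(max(mags))))


# Hypothetical stand-in for the question's data: y values in a very narrow band.
times = [i * 1e-7 for i in range(100)]
vals = [0.03275 - 6e-6 * (i / 99) ** 3 for i in range(100)]

limits = power_limits(vals)

# Scalar formatter without an offset, using the computed power limits.
fmt = ticker.ScalarFormatter(useOffset=False)
fmt.set_powerlimits(limits)

fig, ax = plt.subplots()
ax.plot(times, vals)
ax.yaxis.set_major_formatter(fmt)
fig.canvas.draw()
```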
