How can I force pyplot to show axis limits - python

My x and y axis normally range from 0 to 300 and 0 to 60, respectively.
I want to show only values from 5 <= x <= 300, however, so I do
ax.set_xlim(left=5)
after which the graph does indeed start at 5, but there is nothing to indicate that. My first tick on the x-axis is at 50, and then 100, 150... the y-axis has ticks labeled 0, 20, 40, 60, which will easily mislead the viewer into thinking that the lower limit of 0 for the y-axis also represents the lower limit of 0 for the x-axis.
How can I force pyplot to display an extra tick at x=5 so that the viewer is told explicitly that both axes do not have the same lower bound of 0?

You can use xticks to set the ticks of the x axis.
For example:
import random
import numpy as np
import matplotlib.pyplot as plt
l = [random.randint(0, 10) for i in range(300)]
plt.plot(l)
plt.xlim(5, 300)  # show only 5 <= x <= 300; no tick label is drawn at 5 yet
plt.xticks(np.arange(5, 301, 50))  # this makes the leftmost x tick 5
# note that the stop value is 301: arange excludes its stop value, so with 300
# you would never get a tick at 300, even with an appropriate step
Note that now there is no label at the right end of the x-axis: the last label is 255 (the same problem you had on the left side). You can get a label at 300 by choosing the step so that (max - min) / step is an integer, or very close to one (the number of intervals between ticks); here (300 - 5) / 29.5 = 10.
This works (although the decimal tick labels are ugly):
plt.xticks(np.arange(5, 301, 29.5))
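Instead of tuning the step of arange, np.linspace takes the number of ticks and always includes both endpoints, so the first tick is exactly 5 and the last exactly 300. A minimal sketch; the seeded random data and the choice of 11 ticks are illustrative assumptions:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.integers(0, 11, 300)  # random integers 0..10, like the session above

fig, ax = plt.subplots()
ax.plot(data)
ax.set_xlim(5, 300)

# linspace includes both endpoints: 11 ticks means 10 intervals of (300 - 5) / 10 = 29.5
ticks = np.linspace(5, 300, 11)
ax.set_xticks(ticks)
```

This gives the same ticks as arange(5, 301, 29.5), but you choose the tick count rather than the step.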

Related

Making a plot that has an x-axis that has neg. values representing hours prior to the start of the event, then pos. values representing hours after

I'm not sure if my question makes sense, so apologies on that.
Basically, I am plotting some data that is ~100 hours long. On the x-axis, I want the range to go from -50 to 50, with -1 to -50 representing the 50 hours before the event, 0 in the middle representing the start of the event, and 1 to 50 representing the 50 hours after the start of the event. There are 107 hours' worth of data, and I want to split the hours between the two sides of 0.
I initially tried using the plt.xlim() function, but that just shifts all the data to one side of the plot.
I've tried using plt.xticks and labeling the x ticks "-50", "-25", "0", "25", and "50", and while that somewhat works, it still does not look great. I'll add the original plot and an example figure made this way to clarify what I'm trying to do:
Original plot:
Goal:
Edit: here's my code for plotting it:
fig_1 = plt.figure(figsize=(30,20))
file.plot(x='start',y='value')
plt.xlabel('hour')
plt.ylabel('value')
plt.xticks([0,25,50,75,100],["-50","-25","0","25","50"])
You can shift the x values to zero mean with df.sub(df.mean()) or np.mean().
Alternative 1:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate sample data: 54 rising and 53 falling hourly values
left = np.linspace(10, 60, 54)
right = np.linspace(60, 10, 53)
noise_left = np.random.normal(0, 1, 54)
noise_right = np.random.normal(0, 1, 53)
values = np.append(left + noise_left, right + noise_right)  # avoid shadowing the built-in all()
file = pd.DataFrame({'start': np.linspace(1, 107, 107), 'value': values})
# subtract the mean so the x values are centred on 0
file['start'] = file['start'].sub(file['start'].mean())
fig_1, ax = plt.subplots(figsize=(30, 20))
file.plot(x='start', y='value', ax=ax)  # draw on the figure created above
plt.xlabel('hour')
plt.ylabel('value')
plt.show()
Output:
Alternative 2:
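# assumes file['start'] still holds the original hours 1..107 (not mean-subtracted)
# subtract the mean from start to obtain zero-mean tick labels
ticks = file['start'] - np.mean(file['start'])
# put a tick at every 10th data point (every 10 hours), labelled with the shifted value
plt.xticks(file['start'][::10], ticks[::10], rotation=45)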
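Another option, not from the original answer: leave the data in raw hours and only change how the tick labels are printed, using matplotlib.ticker.FuncFormatter. In this sketch, event_hour = 54 (roughly the middle of the 107 hours) and the sine data are placeholders:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

hours = np.arange(1, 108)       # 107 hourly samples, as in the question
values = np.sin(hours / 20.0)   # placeholder data
event_hour = 54                 # hypothetical: the hour at which the event starts

fig, ax = plt.subplots()
ax.plot(hours, values)

# Place ticks at -50, -25, 0, 25, 50 hours relative to the event
ax.set_xticks(event_hour + np.arange(-50, 51, 25))

def relative_hour(x, pos):
    """Tick label: offset of tick position x from the event hour."""
    return f"{x - event_hour:g}"

ax.xaxis.set_major_formatter(FuncFormatter(relative_hour))
ax.set_xlabel("hours relative to event start")
```

Because only the labels change, the plotted data and any other code that uses the raw hour values stay untouched.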

Pyplot - show x-axis labels according to y-axis value

I have a 1 min 20 s video recorded at 23.813 FPS. More precisely, I have 1923 frames in which I've been scanning for certain features. I've detected a specific behavior with a neural network and, using a chosen metric, calculated a value for each frame.
So now I have X-Y values to plot a graph:
X: time (each step is 0.041993869 s)
Y: a value measured by the neural network
In the default state, the plot looks like this:
So I tried to limit the number of bins, hoping the bins would be spread over all my values. But they are not. As you can see, only the first fifteen x-values are rendered:
pyplot.locator_params(axis='x', nbins=15)
Neither of these is what I want. In the desired state, only the x positions whose y-value is higher than some threshold, e.g. 1.2, get labels. So it should look like this:
Is it possible to achieve this?
Code:
# draw plot
from pandas import read_csv
from matplotlib import pyplot
test_video_fps = 23.813
df = read_csv('/path/to/csv/file/file.csv', header=None)
df.columns = ['anomaly']
df['time'] = [round((i + 1) / test_video_fps, 2) for i in range(df.shape[0])]
axes = df.plot.bar(x='time', y='anomaly', rot=0)
# pyplot.locator_params(axis='x', nbins=15)
# axes.get_xaxis().set_visible(False)
fig = pyplot.gcf()
fig.set_size_inches(16, 10)
fig.savefig('/path/to/output/plot.png', dpi=100)
# pyplot.show()
Example:
Simple example with a subset of original data.
0.379799
0.383786
0.345488
0.433286
0.469474
0.431993
0.474253
0.418843
0.491070
0.447778
0.384890
0.410994
0.898229
1.872756
2.907009
3.691382
4.685749
4.599612
3.738768
8.043357
7.660785
2.311198
1.956096
2.877326
3.467511
3.896339
4.250552
6.485533
7.452986
7.103761
2.684189
2.516134
1.512196
1.435303
0.852047
0.842551
0.957888
0.983085
0.990608
1.046679
1.082040
1.119655
0.962391
1.263255
1.371034
1.652812
2.160451
2.646674
1.460051
1.163745
0.938030
0.862976
0.734119
0.567076
0.417270
Desired plot:
Your question has become a two-part problem, but it is interesting enough that I will answer both.
I will answer this in Matplotlib object-oriented notation with numpy data rather than pandas. This makes things easier to explain, and can easily be generalized to pandas.
I will assume that you have the following two data arrays:
import numpy as np
dt = 0.041993869
x = np.arange(15) * dt  # 15 time steps; avoids float round-off changing arange's length
y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
Part 1: Identifying the locations where you want labels
The data can be masked to get the locations of the peaks:
mask = y > 1.2
Runs of consecutive points above the threshold can be reduced to one location each by computing the diff. np.diff of a boolean mask is True at the locations where the mask changes sense. You then take every other element to get the locations where it goes from False to True. The following code handles the corner cases where you start with a peak or end in the middle of a peak (it does assume the mask changes at least once; otherwise d is empty and d[0] raises an IndexError):
d = np.flatnonzero(np.diff(mask))
if mask[d[0]]:  # First diff is end of peak: True to False
    d = np.concatenate(([0], d[1::2] + 1))
else:
    d = d[::2] + 1
d is now an array of indices into x and y that represent the first element of each run of peaks. To get the last element of each run instead, use d[::2] in the if branch and d[1::2] in the else branch, without the + 1 and without prepending 0; if the mask ends in the middle of a peak, also append len(mask) - 1.
The locations of the labels are now simply x[d].
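A variant that avoids special-casing the first change, offered as an alternative to the diff approach above: compare the mask with a copy of itself shifted by one position and padded with False. A run starts wherever the mask is True and the previous element is False (and ends wherever the next element is False). It also returns an empty array instead of failing when the mask never changes. The helper names run_starts and run_ends are hypothetical:

```python
import numpy as np

def run_starts(mask):
    """Indices where a run of True values begins."""
    mask = np.asarray(mask, dtype=bool)
    prev = np.concatenate(([False], mask[:-1]))  # previous element, False before the start
    return np.flatnonzero(mask & ~prev)

def run_ends(mask):
    """Indices of the last True value in each run."""
    mask = np.asarray(mask, dtype=bool)
    nxt = np.concatenate((mask[1:], [False]))  # next element, False after the end
    return np.flatnonzero(mask & ~nxt)

y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
print(run_starts(y > 1.2))  # [2 9]
print(run_ends(y > 1.2))    # indices 4 and 12
```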
Part 2: Locating and formatting the labels
For this part, you will need to access Matplotlib's object-oriented API via the Axes object you are plotting on. The pandas call already returns one (axes = df.plot.bar(...)), so the transfer is easy. Here is a sample in raw Matplotlib:
import matplotlib.pyplot as plt
fig, axes = plt.subplots()
axes.plot(x, y)
Now use the ticker API to set the locations and labels. You set the locations directly (not with a Locator) since you have a fixed list of ticks:
from matplotlib import ticker
axes.set_xticks(x[d])
axes.xaxis.set_major_formatter(ticker.StrMethodFormatter('{x:0.01g}s'))
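Since the question plots with pandas, here is a sketch of the same idea applied to df.plot.bar. One difference from the raw-Matplotlib version: a pandas bar plot puts the bars at integer positions 0, 1, 2, ..., so the tick positions are bar indices and the times go in as tick labels. The 20 anomaly values are consecutive values from the sample data in the question; their times are recomputed from their position in this slice, and the run-start test uses the shifted-mask trick rather than the diff code above:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

test_video_fps = 23.813
anomaly = [2.516134, 1.512196, 1.435303, 0.852047, 0.842551, 0.957888, 0.983085,
           0.990608, 1.046679, 1.082040, 1.119655, 0.962391, 1.263255, 1.371034,
           1.652812, 2.160451, 2.646674, 1.460051, 1.163745, 0.938030]
df = pd.DataFrame({'anomaly': anomaly})
df['time'] = [round((i + 1) / test_video_fps, 2) for i in range(len(df))]

ax = df.plot.bar(x='time', y='anomaly', rot=0)

# First bar of each run of values above the threshold
mask = df['anomaly'].to_numpy() > 1.2
prev = np.concatenate(([False], mask[:-1]))
starts = np.flatnonzero(mask & ~prev)

# Bars sit at positions 0..n-1, so ticks are bar indices labelled with their times
labels = [f"{t:.2f}s" for t in df['time'].iloc[starts]]
ax.set_xticks(starts)
ax.set_xticklabels(labels)
```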
For the sample data shown here, you get:

How to plot x axis with month?

I want to plot a dataframe df1. The x-axis contains months and the y-axis counts. My x-axis is just a black bar because there are too many values. I tried a lot but nothing works. Is there a simple way to plot, for example, just every 5th date?
I think the problem is that the months are dates and I can't compute a minimum and maximum?

import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('hello.csv')
plt.plot(df1['a'], df1['b'])
plt.show()
My data frame df1 is:
a b
2006-06,211.0
2006-07,212.41176470588235
2006-08,238.26315789473685
2006-09,239.9375
2006-10,266.1111111111111
2006-11,265.22222222222223
2006-12,283.3333333333333
2007-01,290.0
2007-02,307.5
2007-03,325.0
2007-04,343.05882352941177
2007-05,340.42105263157896
2007-06,353.75
2007-07,348.5
2007-08,359.6111111111111
2007-09,346.5625
2007-10,365.57894736842104
2007-11,358.7647058823529
2007-12,372.8333333333333
2008-01,381.8888888888889
2008-02,396.25
2008-03,422.94117647058823
2008-04,428.6666666666667
2008-05,418.5882352941176
2008-06,433.0
2008-07,440.4736842105263
2008-08,470.375
2008-09,481.3529411764706
2008-10,489.44444444444446
2008-11,485.125
2008-12,514.5714285714286
2009-01,515.375
2009-02,535.3125
2009-03,555.0555555555555
2009-04,557.7222222222222
2009-05,533.375
2009-06,567.7222222222222
2009-07,575.1111111111111
2009-08,582.5294117647059
2009-09,569.1666666666666
2009-10,611.1176470588235
2009-11,591.6470588235294
2009-12,634.6428571428571
2010-01,647.9375
2010-02,655.375
2010-03,672.7368421052631
2010-04,678.5882352941177
2010-05,667.8235294117648
2010-06,689.5
2010-07,657.4117647058823
2010-08,679.1111111111111
2010-09,661.2222222222222
2010-10,685.75
2010-11,676.5555555555555
2010-12,692.3571428571429
2011-01,691.9411764705883
2011-02,697.4375
2011-03,720.5263157894736
2011-04,723.5
2011-05,694.7222222222222
2011-06,705.7222222222222
2011-07,677.9375
2011-08,693.7368421052631
2011-09,671.2352941176471
2011-10,685.1176470588235
2011-11,669.9444444444445
2011-12,708.3076923076923
2012-01,674.9444444444445
2012-04,748.0
2012-05,811.0526315789474
2012-06,863.6875
2012-07,843.1666666666666
2012-08,885.5
2012-09,857.75
2012-10,876.8421052631579
2012-11,863.1764705882352
2012-12,917.6666666666666
2013-01,933.4444444444445
2013-03,975.0625
2013-04,994.0
2013-05,1019.6666666666666
2013-06,1063.625
2013-07,1057.8947368421052
2013-08,1102.1764705882354
2013-09,1046.4117647058824
2013-10,1153.1052631578948
2013-11,1107.25
2013-12,1155.3076923076924
2014-01,1191.3529411764705
2014-02,1240.5
2014-03,1272.764705882353
2014-04,1316.9444444444443
2014-05,1310.3529411764705
2014-06,1349.4117647058824
2014-07,1403.8947368421052
2014-08,1412.375
2014-09,1409.0555555555557
2014-10,1472.9444444444443
2014-11,1421.8125
2014-12,1473.2142857142858
2015-01,1476.9375
2015-02,1495.75
2015-03,1546.111111111111
2015-04,1563.7777777777778
2015-05,1499.0
2015-06,1583.111111111111
2015-07,1594.2222222222222
2015-08,1618.1176470588234
2015-09,1595.8333333333333
2015-10,1706.3529411764705
2015-11,1652.8823529411766
2015-12,1691.0714285714287
2016-01,1717.125
2016-02,1746.7058823529412
2016-03,1945.4736842105262
2016-04,2329.375
2016-05,2408.4444444444443
2016-06,2404.222222222222
2016-07,2184.4375
2016-08,2160.6315789473683
2016-09,2402.176470588235
2016-10,2481.823529411765
2016-11,2372.0
2016-12,2153.0
2017-01,2145.777777777778
2017-02,2213.5625
2017-03,2309.6111111111113
2017-04,2295.8125
2017-05,2116.7894736842104
2017-06,2093.8823529411766
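To show only every nth value, set the x tick positions yourself. The months are read from the CSV as strings, so matplotlib places them at positions 0, 1, 2, ..., and a step of 5 gives a tick at every 5th month:
import numpy as np
x = df1['a']
plt.xticks(np.arange(0, len(x), 5))  # replace 5 with any step interval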
Alternatively, to further improve readability while keeping every label, rotate the x tick labels with the rotation argument of xticks:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 6]
labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']
plt.plot(x, y)
# You can specify a rotation for the tick labels in degrees or with keywords.
plt.xticks(x, labels, rotation='vertical') # You can input an integer too.
# Pad margins so that markers don't get clipped by the axes
plt.margins(0.2)
# Tweak spacing to prevent clipping of tick-labels
plt.subplots_adjust(bottom=0.15)
plt.show()
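The question suspects the dates are the problem. Read from CSV, column a holds plain strings, so matplotlib treats every month as a separate category. A sketch of the date-axis alternative (not part of the answer above), assuming hello.csv has an a,b header line; only the first eight rows of the data are inlined here:

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# First rows of hello.csv from the question, with an assumed "a,b" header
data = io.StringIO(
    "a,b\n"
    "2006-06,211.0\n"
    "2006-07,212.41176470588235\n"
    "2006-08,238.26315789473685\n"
    "2006-09,239.9375\n"
    "2006-10,266.1111111111111\n"
    "2006-11,265.22222222222223\n"
    "2006-12,283.3333333333333\n"
    "2007-01,290.0\n"
)
df1 = pd.read_csv(data)
# Parse "YYYY-MM" strings into real datetimes so the x-axis becomes a date axis
df1['a'] = pd.to_datetime(df1['a'], format='%Y-%m')

fig, ax = plt.subplots()
ax.plot(df1['a'], df1['b'])
# One tick every 3 months, labelled as year-month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

For the full 2006 to 2017 range, mdates.YearLocator() or MonthLocator(interval=12) would probably give a more readable axis.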

How can I make a signum-like plot in matplotlib?

I'm trying to make a signum-like plot with matplotlib based on this:
The x axis would be an interval in seconds: 0-60
The plot would be 1 if the x is between the starts and the stops.
Elsewhere it should be 0.
label sec1 sec2
start 5.063 8.293
time 0.184 1.033
stop 5.247 9.326
So, depending on X:
0 < X < 5.063 --> 0
5.063 <= X <= 5.247 --> 1
5.247 < X < 8.293 --> 0
8.293 <= X <= 9.326 --> 1
9.326 < X < 60 --> 0
There would be more sections, not just two, and the line should be continuous.
Maybe it's an easy question, but I'm fairly new to Python and matplotlib.
I tried to google it, but all the answers are about sine plots instead of sign plots. I'm not even sure what to search for to find the right answer.
Any suggestions?
Matplotlib plots points, not functions. You have to provide the correct y points. You could do it like this:
import numpy as np
import matplotlib.pyplot as plt
starts = np.arange(1, 55, 4)
stops = starts + 1
x = np.linspace(0, 60, 1000)
y = np.zeros_like(x)
for start, stop in zip(starts, stops):
    mask = np.logical_and(x > start, x <= stop)
    y[mask] = 1
plt.plot(x, y)
plt.ylim(0, 1.1)
plt.show()
Result:
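The loop above can also be written with NumPy broadcasting, testing every x against every interval at once. This is a variant rather than the answer's code: it uses inclusive bounds (start <= x <= stop) as in the question, and the two intervals are taken from the question's table. Like the first solution, it still samples x, so the edges are steep rather than vertical:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

# start/stop times from the table in the question
starts = np.array([5.063, 8.293])
stops = np.array([5.247, 9.326])

x = np.linspace(0, 60, 6001)  # 0.01 s resolution
# Compare x against every interval: boolean array of shape (len(x), len(starts))
inside = (x[:, None] >= starts) & (x[:, None] <= stops)
# A point is 1 if it lies inside any interval, otherwise 0
y = inside.any(axis=1).astype(float)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_ylim(0, 1.1)
```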
Edit: second solution with real rectangular pulses and fewer points
This is a better solution assuming the start and stops do not overlap:
import numpy as np
import matplotlib.pyplot as plt
starts = np.arange(1, 55, 4)
stops = starts + 1
x = np.repeat(np.sort(np.append(starts, stops)), 2)
y = np.zeros_like(x)
y[1::4] = 1
y[2::4] = 1
plt.plot(x, y)
For the x values we join starts and stops together with np.append, sort them to get them in chronological order with np.sort and repeat each value twice with np.repeat.
Then we set the correct values to 1: within each group of four points the pattern is (0, 1, 1, 0), so we set every fourth value starting from the second value, and every fourth value starting from the third value, to 1.
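A small worked example of this construction with two pulses, [1, 2] and [5, 6]:

```python
import numpy as np

starts = np.array([1, 5])
stops = starts + 1

# every boundary twice, in time order
x = np.repeat(np.sort(np.append(starts, stops)), 2)
y = np.zeros_like(x)
y[1::4] = 1  # second point of each (0, 1, 1, 0) group
y[2::4] = 1  # third point of each group
print(x.tolist())  # [1, 1, 2, 2, 5, 5, 6, 6]
print(y.tolist())  # [0, 1, 1, 0, 0, 1, 1, 0]
```

Each repeated x value gets both a 0 and a 1, which is what makes the rising and falling edges vertical.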
The solution by @MaxNoe is very instructive and meaningful (and I suggest using it, if only because it handles overlapping intervals properly). I just want to add that, strictly speaking, that solution doesn't give you rectangular pulses but a series of line segments that are very steep (but not vertical) at the transitions.
So, for the sake of completeness, one way to generate your rectangular pulses (assuming that 1. your start and end times are stored in the arrays starts and stops, respectively, and 2. the intervals don't overlap!) is:
x,y=zip(*[(0,0)]+[item for start,stop in zip(starts,stops) for item in [(start,0),(start,1),(stop,1),(stop,0)]]+[(60,0)])
This will take every start-stop pair, duplicate them with a corresponding value of 1 or 0 in order to obtain rectangular pulses like (start,0) -- (start,1) -- (stop,1) -- (stop,0), then adds starting and concluding data points, then assigns the constructed set of points to two arrays x and y. Plotting is done as usual, using plt.plot(x,y).
Edit: here's a slightly more verbose implementation of the same algorithm:
tmplist = []
for start, stop in zip(starts, stops):
    tmplist.extend([(start, 0), (start, 1), (stop, 1), (stop, 0)])
tmplist = [(0, 0)] + tmplist + [(60, 0)]
x, y = zip(*tmplist)
plt.plot(x, y)

making binned boxplot in matplotlib with numpy and scipy in Python

I have a 2-d array containing pairs of values and I'd like to make a boxplot of the y-values by different bins of the x-values. I.e. if the array is:
my_array = array([[1, 40.5], [4.5, 60], ...])
then I'd like to bin my_array[:, 0] and then, for each bin, produce a boxplot of the corresponding my_array[:, 1] values that fall into it. So in the end I want the plot to contain one box plot per bin.
I tried the following:
min_x = min(my_array[:, 0])
max_x = max(my_array[:, 1])
num_bins = 3
bins = linspace(min_x, max_x, num_bins)
elts_to_bins = digitize(my_array[:, 0], bins)
However, this gives me values in elts_to_bins that range from 1 to 3. I thought I should get 0-based indices for the bins, and I only wanted 3 bins. I'm assuming this is due to some trickiness in how bins are represented in linspace vs. digitize.
What is the easiest way to achieve this? I want num_bins-many equally spaced bins, with the first bin containing the lower half of the data and the upper bin containing the upper half... i.e., I want each data point to fall into some bin, so that I can make a boxplot.
thanks.
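You're getting the 3rd bin for the maximum value in the array (I'm assuming you have a typo there, and max_x should be "max(my_array[:,0])" instead of "max(my_array[:,1])"). np.digitize puts a value equal to the last edge into an extra bin past the end; you can avoid this by adding 1 (or any positive number) to the last bin edge. Note also that linspace(min_x, max_x, num_bins) gives num_bins edges, i.e. num_bins - 1 bins, so use num_bins + 1 edges to get num_bins bins.
Also, if I'm understanding you correctly, you want to bin one variable by another, so my example below shows that. If you're using recarrays (which are much slower), older versions of matplotlib.mlab also had several functions (e.g. mlab.rec_groupby) that do this sort of thing; they have since been removed.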
Anyway, in the end, you might have something like this (to bin x by the values in y, assuming x and y are the same length)
import numpy as np

def bin_by(x, y, nbins=30):
    """
    Bin x by y.
    Returns the binned "x" values and the left edges of the bins
    """
    bins = np.linspace(y.min(), y.max(), nbins + 1)
    # To avoid an extra bin for the max value
    bins[-1] += 1
    indices = np.digitize(y, bins)
    output = []
    for i in range(1, len(bins)):
        output.append(x[indices == i])
    # Just return the left edges of the bins
    bins = bins[:-1]
    return output, bins
As a quick example:
In [3]: x = np.random.random((100, 2))
In [4]: binned_values, bins = bin_by(x[:,0], x[:,1], 2)
In [5]: binned_values
Out[5]:
[array([ 0.59649575, 0.07082605, 0.7191498 , 0.4026375 , 0.06611863,
0.01473529, 0.45487203, 0.39942696, 0.02342408, 0.04669615,
0.58294003, 0.59510434, 0.76255006, 0.76685052, 0.26108928,
0.7640156 , 0.01771553, 0.38212975, 0.74417014, 0.38217517,
0.73909022, 0.21068663, 0.9103707 , 0.83556636, 0.34277006,
0.38007865, 0.18697416, 0.64370535, 0.68292336, 0.26142583,
0.50457354, 0.63071319, 0.87525221, 0.86509534, 0.96382375,
0.57556343, 0.55860405, 0.36392931, 0.93638048, 0.66889756,
0.46140831, 0.01675165, 0.15401495, 0.10813141, 0.03876953,
0.65967335, 0.86803192, 0.94835281, 0.44950182]),
array([ 0.9249993 , 0.02682873, 0.89439141, 0.26415792, 0.42771144,
0.12292614, 0.44790357, 0.64692616, 0.14871052, 0.55611472,
0.72340179, 0.55335053, 0.07967047, 0.95725514, 0.49737279,
0.99213794, 0.7604765 , 0.56719713, 0.77828727, 0.77046566,
0.15060196, 0.39199123, 0.78904624, 0.59974575, 0.6965413 ,
0.52664095, 0.28629324, 0.21838664, 0.47305751, 0.3544522 ,
0.57704906, 0.1023201 , 0.76861237, 0.88862359, 0.29310836,
0.22079126, 0.84966201, 0.9376939 , 0.95449215, 0.10856864,
0.86655289, 0.57835533, 0.32831162, 0.1673871 , 0.55742108,
0.02436965, 0.45261232, 0.31552715, 0.56666458, 0.24757898,
0.8674747 ])]
Hope that helps a bit!
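The answer stops at the binning step; the question's final goal, one box plot per bin, could then be drawn with ax.boxplot. A sketch, assuming random placeholder data in the question's layout (column 0 = x, column 1 = y) and repeating bin_by so the block is self-contained:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

def bin_by(x, y, nbins=30):
    """Bin x by y; return the list of binned x arrays and the left bin edges."""
    bins = np.linspace(y.min(), y.max(), nbins + 1)
    bins[-1] += 1  # keep the maximum value inside the last bin
    indices = np.digitize(y, bins)
    output = [x[indices == i] for i in range(1, len(bins))]
    return output, bins[:-1]

rng = np.random.default_rng(0)
my_array = rng.random((100, 2)) * [10, 100]  # placeholder: column 0 = x, column 1 = y

# Group the y values (column 1) by bins of the x values (column 0)
binned, left_edges = bin_by(my_array[:, 1], my_array[:, 0], nbins=3)

fig, ax = plt.subplots()
ax.boxplot(binned)  # one box per bin, at positions 1, 2, 3
ax.set_xticks(range(1, len(left_edges) + 1))
ax.set_xticklabels([f"{e:.1f}" for e in left_edges])
ax.set_xlabel("left edge of x bin")
ax.set_ylabel("y")
```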
Numpy has a dedicated function for creating histograms the way you need to:
histogram(a, bins=10, range=None, density=None, weights=None)
which you can use like:
(hist_data, bin_edges) = histogram(my_array[:,0], weights=my_array[:,1])
The key point here is to use the weights argument: each value a[i] will contribute weights[i] to the histogram. Example:
a = [0, 1]
weights = [10, 2]
describes 10 points at x = 0 and 2 points at x = 1.
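To check the weights semantics on exactly this example (bins=2 is chosen here so each value gets its own bin):

```python
import numpy as np

a = [0, 1]
weights = [10, 2]
hist_data, bin_edges = np.histogram(a, bins=2, weights=weights)
# 0 falls in [0, 0.5) and contributes 10; 1 falls in [0.5, 1] and contributes 2
print(hist_data.tolist())
print(bin_edges.tolist())  # [0.0, 0.5, 1.0]
```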
You can set the number of bins, or the bin limits, with the bins argument (see the official documentation for more details).
The histogram can then be plotted with something like:
bar(bin_edges[:-1], hist_data, width=np.diff(bin_edges), align='edge')
If you only need to do a histogram plot, the similar hist() function can directly plot the histogram:
hist(my_array[:,0], weights=my_array[:,1])
