I'm not sure if my question makes sense, so apologies in advance.
I am plotting some data that is ~100 hours long. I want the x-axis to run from -50 to 50, with -50 to -1 representing the 50 hours before the event, 0 in the middle marking the start of the event, and 1 to 50 representing the 50 hours after the start of the event. There are actually 107 hours' worth of data, and I want to split the hours evenly on either side of 0.
I initially tried plt.xlim(), but that just shifts all the data to one side of the plot.
I've also tried plt.xticks, labeling the x ticks with "-50", "-25", "0", "25", and "50". That somewhat works, but it still does not look great. Here is an example figure made this way to clarify what I'm trying to do, along with the original plot:
Original plot:
Goal:
Edit: here's my code for plotting it:
fig_1 = plt.figure(figsize=(30,20))
file.plot(x='start',y='value')
plt.xlabel('hour')
plt.ylabel('value')
plt.xticks([0,25,50,75,100],["-50","-25","0","25","50"])
You could center the ticks on zero by subtracting the mean, using df.sub(df.mean()) or np.mean().
Alternative 1:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate data
left = np.linspace(10,60, 54)
right = np.linspace(60,10, 53)
noise_left = np.random.normal(0, 1, 54)
noise_right = np.random.normal(0, 1, 53)
values = np.append(left + noise_left, right + noise_right)
file = pd.DataFrame({'start':np.linspace(1,107,107),'value':values})
# subtract mean
file['start'] = file['start'].sub(file['start'].mean())
fig_1 = plt.figure(figsize=(30,20))
file.plot(x='start',y='value')
plt.xlabel('hour')
plt.ylabel('value')
Output:
Alternative 2:
# subtract the mean from start to obtain zero mean ticks
ticks = file['start'] - np.mean(file['start'])
# set distance between each tick to 10
plt.xticks(file['start'][::10], ticks[::10],rotation=45)
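If the event does not sit exactly at the mean of the time axis, the same idea works with an explicit offset. Here is a minimal sketch, assuming a hypothetical `event_hour` (the row index at which the event starts, which the question does not give) and random placeholder values standing in for the real data:

```python
import numpy as np
import matplotlib.pyplot as plt

n_hours = 107                      # length of the series, as in the question
event_hour = 53                    # hypothetical: index where the event starts
values = np.random.default_rng(0).normal(size=n_hours)  # placeholder data

# Hours relative to the event: negative before, 0 at the start, positive after
rel_hours = np.arange(n_hours) - event_hour

fig, ax = plt.subplots()
ax.plot(rel_hours, values)
ax.set_xlim(-50, 50)               # limits now refer to the shifted axis
ax.set_xticks([-50, -25, 0, 25, 50])
ax.set_xlabel("hours relative to event start")
ax.set_ylabel("value")
```

Because the x data themselves are shifted, the tick labels are real coordinates rather than relabeled positions. Note that `set_xlim(-50, 50)` crops the few hours that fall outside that range; drop that line to show all 107 hours.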
I have a matrix of floats shaped (3000, 9).
Each row is one "simulation".
The columns of a given row hold the contents of that "simulation".
For each simulation, I want the first 8 columns to be normalized by the sum of those 8 columns.
That is, each entry in the first 8 columns of a row should become its previous value divided by the sum of the first 8 columns of that same row.
It is a trivial task, but I go from a nice, correct graph (non-normalized) to something totally unphysical when plotting with plt.scatter.
The last column of each row is the x value against which the first 8 columns (the y values) are plotted.
So one row represents 8 data points for 1 fixed value of x.
The non-normalized graph:
https://ibb.co/Msr8RVB
The normalized graph:
https://ibb.co/tJp7bZn
The datasets:
non-normalized: https://easyupload.io/oat9kq
My code:
import numpy as np
from matplotlib import pyplot as plt
non_norm = np.loadtxt("integration_results_3000samples_10_20_10_25_Wcm2_BenSimulationFromSlack.txt")
plt.figure()
for i in range(non_norm.shape[1]-1):
    plt.scatter(non_norm[:, -1], non_norm[:, i], label="c_{}".format(i+47))
plt.xscale("log")
plt.savefig("non-norm_Ben3000samples.pdf", bbox_inches='tight')
norm = np.empty( (non_norm.shape[0], non_norm.shape[1]) )
norm[:, -1] = non_norm[:, -1]
for i in range(norm.shape[1]-1):
    for j in range(norm.shape[0]):
        norm[j, i] = np.true_divide(non_norm[j, i] , np.sum(non_norm[j, :-1]))
plt.figure()
for i in range(norm.shape[1]-1):
    plt.scatter(norm[:, -1], norm[:, i], label="c_{}".format(i+47))
plt.xscale("log")
plt.savefig("norm_Ben3000samples.pdf", bbox_inches='tight')
Do you see what went wrong?
Thank you
When you normalise a row that has just one non-zero value and 7 zeroes, that value becomes 1 and the rest of the row stays 0. This is likely why your plot looks wrong.
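A small sketch of that effect, using made-up rows rather than the linked dataset. It also shows a vectorized form of the row normalization, which should be equivalent to the double loop in the question:

```python
import numpy as np

# Made-up data: 8 value columns plus a final x column, as in the question
non_norm = np.array([
    [1e-9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1e10],  # one tiny non-zero value
    [2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1e15],
])

norm = non_norm.copy()
row_sums = non_norm[:, :-1].sum(axis=1, keepdims=True)
norm[:, :-1] = non_norm[:, :-1] / row_sums  # divide each row by its own sum

print(norm[0, :-1])  # the lone non-zero entry becomes 1, however small it was
print(norm[1, :-1])
```

In the first row a value that was negligible before normalization jumps to the top of the plot, which is the kind of point that makes the normalized scatter look unphysical.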
For example, the plot for the first column looks like this before and after normalization:
I have a 1 min 20 s long video recording at 23.813 FPS. More precisely, I have 1923 frames which I've scanned for the desired features. I've detected some specific behavior with a neural network and, using a chosen metric, calculated a value for each frame.
So now I have X-Y values to plot a graph:
X: time (each step is 0.041993869 s)
Y: a value measured by the neural network
In the default state, the plot looks like this:
So I tried to limit the number of bins, in the hope that the remaining tick labels would be spread over all my values. But they are not. As you can see, only the first fifteen x-values are rendered:
pyplot.locator_params(axis='x', nbins=15)
Neither is what I want. The plot should show x-axis labels only for bars whose y-value is higher than some threshold, e.g. 1.2. It should look like this:
Is it possible to achieve such a result?
Code:
# draw plot
from pandas import read_csv
from matplotlib import pyplot
test_video_fps = 23.813
df = read_csv('/path/to/csv/file/file.csv', header=None)
df.columns = ['anomaly']
df['time'] = [round((i + 1) / test_video_fps, 2) for i in range(df.shape[0])]
axes = df.plot.bar(x='time', y='anomaly', rot=0)
# pyplot.locator_params(axis='x', nbins=15)
# axes.get_xaxis().set_visible(False)
fig = pyplot.gcf()
fig.set_size_inches(16, 10)
fig.savefig('/path/to/output/plot.png', dpi=100)
# pyplot.show()
Example:
Simple example with a subset of original data.
0.379799
0.383786
0.345488
0.433286
0.469474
0.431993
0.474253
0.418843
0.491070
0.447778
0.384890
0.410994
0.898229
1.872756
2.907009
3.691382
4.685749
4.599612
3.738768
8.043357
7.660785
2.311198
1.956096
2.877326
3.467511
3.896339
4.250552
6.485533
7.452986
7.103761
2.684189
2.516134
1.512196
1.435303
0.852047
0.842551
0.957888
0.983085
0.990608
1.046679
1.082040
1.119655
0.962391
1.263255
1.371034
1.652812
2.160451
2.646674
1.460051
1.163745
0.938030
0.862976
0.734119
0.567076
0.417270
Desired plot:
Your question has become a two-part problem, but it is interesting enough that I will answer both.
I will answer this in Matplotlib's object-oriented notation with NumPy data rather than pandas. This makes things easier to explain, and can be easily generalized to pandas.
I will assume that you have the following two data arrays:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
dt = 0.041993869
x = np.arange(0.0, 15 * dt, dt)
y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
Part 1: Identifying the locations where you want labels
The data can be masked to get the locations of the peaks:
mask = y > 1.2
Consecutive peaks can be easily eliminated by computing the diff. A diff of a boolean mask will be True at the locations where the mask changes value. You then take every other element to get the locations where it goes from False to True. The following code handles the corner cases where you start with a peak or end in the middle of a peak (it assumes the mask changes value at least once, since d[0] is read unconditionally):
d = np.flatnonzero(np.diff(mask))
if mask[d[0]]: # First diff is end of peak: True to False
    d = np.concatenate(([0], d[1::2] + 1))
else:
    d = d[::2] + 1
d is now an array of indices into x and y that represent the first element of each run of peaks. You can get the last element by swapping the indices [1::2] and [::2] in the if-else statement, and removing the + 1 in both cases.
The locations of the labels are now simply x[d].
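Putting Part 1 together, here is a self-contained sketch. The helper name `run_starts` is made up for illustration, and it uses a slightly different formulation (padding the mask with a leading False) so that a mask with no transitions needs no special handling:

```python
import numpy as np

def run_starts(mask):
    """Return the index of the first element of each run of True values."""
    mask = np.asarray(mask, dtype=bool)
    # Prepend False so a run starting at index 0 also shows up as a rising edge
    padded = np.concatenate(([False], mask))
    # Rising edge: previous value is False and current value is True
    return np.flatnonzero(~padded[:-1] & padded[1:])

dt = 0.041993869
y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
x = np.arange(len(y)) * dt

d = run_starts(y > 1.2)
print(d)      # indices of the first sample in each peak
print(x[d])   # corresponding tick locations
```

For the sample data this should give the same indices as the diff-based code, namely the first sample of each of the two peaks.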
Part 2: Locating and formatting the labels
For this part, you will need to access Matplotlib's object-oriented API via the Axes object you are plotting on. You already have this in the pandas form, making the transfer easy. Here is a sample in raw Matplotlib:
fig, axes = plt.subplots()
axes.plot(x, y)
Now use the ticker API to easily set the locations and labels. You actually set the locations directly (not with a Locator) since you have a very fixed list of ticks:
axes.set_xticks(x[d])
axes.xaxis.set_major_formatter(ticker.StrMethodFormatter('{x:0.01g}s'))
For the sample data shown here, you get:
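Carrying this over to the pandas bar plot from the question needs one adjustment: pandas draws bar i at integer position i, so the ticks go at the indices themselves and the times become the labels. A sketch under that assumption, with a made-up short series in place of the CSV file:

```python
import numpy as np
import pandas as pd

fps = 23.813
# Made-up values standing in for the 'anomaly' column read from the CSV
anomaly = np.array([0.4, 0.4, 0.9, 1.9, 2.9, 3.7, 1.0, 0.8, 1.3, 1.6, 0.9])
df = pd.DataFrame({'anomaly': anomaly})
df['time'] = np.round((np.arange(len(df)) + 1) / fps, 2)

# First index of each run above the threshold (rising edges of the mask)
mask = (df['anomaly'] > 1.2).to_numpy()
padded = np.concatenate(([False], mask))
d = np.flatnonzero(~padded[:-1] & padded[1:])

axes = df.plot.bar(x='time', y='anomaly', rot=0)
# Bars sit at positions 0..n-1, so tick at the indices and label with the times
labels = [f"{t:.2f}s" for t in df['time'].iloc[d]]
axes.set_xticks(d)
axes.set_xticklabels(labels)
```

The threshold of 1.2 is the one mentioned in the question; labeling every bar above it instead of only the start of each run would just mean using `np.flatnonzero(mask)` for `d`.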
I have this graph displaying the following:
plt.plot(valueX, scoreList)
plt.xlabel("Score number") # Text for X-Axis
plt.ylabel("Score") # Text for Y-Axis
plt.title("Scores for the topic "+progressDisplay.topicName)
plt.show()
valueX = [1, 2, 3, 4] and
scoreList = [5, 0, 0, 2]
I want the scale to go up in 1s, no matter what values are in 'scoreList'. Currently my x-axis goes up in steps of 0.5 instead of 1.
How do I set it so that it goes up only in 1s?
Just set the xticks yourself.
plt.xticks([1,2,3,4])
or
plt.xticks(valueX)
Since the range function works with integers, you could use that instead:
plt.xticks(range(1, 5))
Or be even more dynamic and calculate it from the data:
plt.xticks(range(min(valueX), max(valueX)+1))
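Another option, if you would rather not list the ticks by hand, is a tick locator. A minimal sketch using `matplotlib.ticker.MultipleLocator`, which places ticks at multiples of a base value; the data are the ones from the question:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

valueX = [1, 2, 3, 4]
scoreList = [5, 0, 0, 2]

fig, ax = plt.subplots()
ax.plot(valueX, scoreList)
ax.xaxis.set_major_locator(MultipleLocator(1))  # one tick per whole number
ax.set_xlabel("Score number")
ax.set_ylabel("Score")
```

Unlike a fixed list of ticks, the locator keeps the spacing at 1 if more scores are added or the view is zoomed.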
Below is my favorite way to set the limits of the axes:
plt.xlim(-0.02, 0.05)
plt.ylim(-0.04, 0.04)
Hey, it looks like you need to set the x-axis scale.
Try
plt.gca().set_xscale('linear')
See the Matplotlib documentation for Axes.set_xscale.
I have a 2-d array containing pairs of values and I'd like to make a boxplot of the y-values by different bins of the x-values. I.e. if the array is:
my_array = array([[1, 40.5], [4.5, 60], ...])
then I'd like to bin my_array[:, 0] and then for each of the bins, produce a boxplot of the corresponding my_array[:, 1] values that fall into each box. So in the end I want the plot to contain number of bins-many box plots.
I tried the following:
min_x = min(my_array[:, 0])
max_x = max(my_array[:, 1])
num_bins = 3
bins = linspace(min_x, max_x, num_bins)
elts_to_bins = digitize(my_array[:, 0], bins)
However, this gives me values in elts_to_bins that range from 1 to 3. I thought I should get 0-based indices for the bins, and I only wanted 3 bins. I'm assuming this is due to some trickiness in how bins are represented in linspace vs. digitize.
What is the easiest way to achieve this? I want num_bins-many equally spaced bins, with the first bin containing the lower half of the data and the upper bin containing the upper half... i.e., I want each data point to fall into some bin, so that I can make a boxplot.
thanks.
You're getting the 3rd bin for the maximum value in the array (I'm assuming you have a typo there, and max_x should be "max(my_array[:,0])" instead of "max(my_array[:,1])"). You can avoid this by adding 1 (or any positive number) to the last bin.
Also, if I'm understanding you correctly, you want to bin one variable by another, so my example below shows that. If you're using recarrays (which are much slower), older Matplotlib releases also shipped several functions in matplotlib.mlab (e.g. mlab.rec_groupby, etc.) that do this sort of thing; they have since been removed.
Anyway, in the end, you might have something like this (to bin x by the values in y, assuming x and y are the same length):
import numpy as np

def bin_by(x, y, nbins=30):
    """
    Bin x by y.
    Returns the binned "x" values and the left edges of the bins
    """
    bins = np.linspace(y.min(), y.max(), nbins+1)
    # To avoid extra bin for the max value
    bins[-1] += 1
    indices = np.digitize(y, bins)
    output = []
    for i in range(1, len(bins)):
        output.append(x[indices==i])
    # Just return the left edges of the bins
    bins = bins[:-1]
    return output, bins
As a quick example:
In [3]: x = np.random.random((100, 2))
In [4]: binned_values, bins = bin_by(x[:,0], x[:,1], 2)
In [5]: binned_values
Out[5]:
[array([ 0.59649575, 0.07082605, 0.7191498 , 0.4026375 , 0.06611863,
0.01473529, 0.45487203, 0.39942696, 0.02342408, 0.04669615,
0.58294003, 0.59510434, 0.76255006, 0.76685052, 0.26108928,
0.7640156 , 0.01771553, 0.38212975, 0.74417014, 0.38217517,
0.73909022, 0.21068663, 0.9103707 , 0.83556636, 0.34277006,
0.38007865, 0.18697416, 0.64370535, 0.68292336, 0.26142583,
0.50457354, 0.63071319, 0.87525221, 0.86509534, 0.96382375,
0.57556343, 0.55860405, 0.36392931, 0.93638048, 0.66889756,
0.46140831, 0.01675165, 0.15401495, 0.10813141, 0.03876953,
0.65967335, 0.86803192, 0.94835281, 0.44950182]),
array([ 0.9249993 , 0.02682873, 0.89439141, 0.26415792, 0.42771144,
0.12292614, 0.44790357, 0.64692616, 0.14871052, 0.55611472,
0.72340179, 0.55335053, 0.07967047, 0.95725514, 0.49737279,
0.99213794, 0.7604765 , 0.56719713, 0.77828727, 0.77046566,
0.15060196, 0.39199123, 0.78904624, 0.59974575, 0.6965413 ,
0.52664095, 0.28629324, 0.21838664, 0.47305751, 0.3544522 ,
0.57704906, 0.1023201 , 0.76861237, 0.88862359, 0.29310836,
0.22079126, 0.84966201, 0.9376939 , 0.95449215, 0.10856864,
0.86655289, 0.57835533, 0.32831162, 0.1673871 , 0.55742108,
0.02436965, 0.45261232, 0.31552715, 0.56666458, 0.24757898,
0.8674747 ])]
Hope that helps a bit!
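To get from binned values to the boxplots asked about in the question, a list of arrays can be passed straight to Matplotlib's `boxplot`. Here is a sketch with random stand-in data; it calls `np.digitize` with the interior bin edges only, which is an alternative to nudging the last edge:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
my_array = rng.random((100, 2))   # random stand-in: column 0 = x, column 1 = y
num_bins = 3

xs, ys = my_array[:, 0], my_array[:, 1]
edges = np.linspace(xs.min(), xs.max(), num_bins + 1)
# With only the interior edges, digitize returns indices 0 .. num_bins-1,
# and the maximum x lands in the last bin instead of an extra one
idx = np.digitize(xs, edges[1:-1])
groups = [ys[idx == i] for i in range(num_bins)]

fig, ax = plt.subplots()
ax.boxplot(groups)                # one box per bin, drawn at positions 1..num_bins
ax.set_xticklabels([f"{lo:.2f} to {hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])])
ax.set_xlabel("x bin")
ax.set_ylabel("y")
```

Every point falls into exactly one of the `num_bins` groups, so the group sizes add up to the number of rows.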
NumPy has a dedicated function for creating histograms that does what you need:
histogram(a, bins=10, range=None, density=None, weights=None)
which you can use like:
(hist_data, bin_edges) = histogram(my_array[:,0], weights=my_array[:,1])
The key point here is to use the weights argument: each value a[i] will contribute weights[i] to the histogram. Example:
a = [0, 1]
weights = [10, 2]
describes 10 points at x = 0 and 2 points at x = 1.
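A quick check of this two-point example, as a minimal sketch. Note what it computes: the sum of the weights falling into each bin.

```python
import numpy as np

a = [0, 1]
weights = [10, 2]

# Two bins over [0, 1]: the first collects the weight at 0, the second the weight at 1
hist_data, bin_edges = np.histogram(a, bins=2, weights=weights)
print(hist_data)   # per-bin sum of weights
print(bin_edges)   # num_bins + 1 edges
```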
You can set the number of bins, or the bin limits, with the bins argument (see the official documentation for more details).
The histogram can then be plotted with something like:
bar(bin_edges[:-1], hist_data, width=diff(bin_edges), align='edge')
If you only need to do a histogram plot, the similar hist() function can directly plot the histogram:
hist(my_array[:,0], weights=my_array[:,1])