I want to create a frequency plot of a sample whose values lie between -1 and 1.
Creating the histogram with NumPy works fine:
freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
But creating a plot with the same bins gives me an error (see title):
plt.hist(freq, range=bins)
In addition to this, how is it possible to adjust the x-labels such that the correct bin-values are shown?
Minimal working example:
import matplotlib.pyplot as plt
import numpy as np
if __name__ == "__main__":
    sample = np.random.uniform(-1,1,100)
    freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
    plt.figure()
    plt.hist(freq, range=bins)
    plt.show()
It's not necessary to preprocess the data with NumPy; just pass the data directly to matplotlib's hist function:
sample = np.random.uniform(-1,1,10000)
#freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
plt.figure()
plt.hist(sample, bins=100)
plt.show()
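The error in the question most likely comes from the range argument: plt.hist expects range to be a (min, max) pair, while an array of bin edges belongs in bins. Here is a minimal sketch of two ways to reuse the same edges, either by histogramming the raw sample or by drawing counts that np.histogram already produced. The names edges and counts, the switch to np.linspace (np.arange excludes its stop value, so the original edges end at 0.95), and the choice to label every fourth edge are my own assumptions, not part of the original posts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.uniform(-1, 1, 100)

# 41 edges give 40 bins of width 0.05, including the right-most edge at 1
edges = np.linspace(-1, 1, 41)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: let matplotlib bin the raw sample using the explicit edges
counts, _, _ = ax1.hist(sample, bins=edges)

# Option 2: draw counts that were already computed with np.histogram
freq, _ = np.histogram(sample, bins=edges)
ax2.bar(edges[:-1], freq, width=np.diff(edges), align="edge")

# Put a tick label on every fourth bin edge so the labels match the bins
for ax in (ax1, ax2):
    ax.set_xticks(edges[::4])
```

Both panels should show the same bars, since hist and np.histogram bin the data the same way when given identical edges.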
I am struggling to make a histogram in which the percentages of events sum to 100%. In this particular example they sum to approximately 3% instead. How can I make the percentages sum to 100% for any input array?
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
data = np.array([0,9,78,6,44,23,88,77,12,29])
length_of_data = len(data) # Length of data
bins = int(np.sqrt(length_of_data)) # Choose number of bins
y = data
plt.title('Histogram')
plt.ylabel('Percentage Of Events')
plt.xlabel('bins')
plt.hist(y,bins=bins, density = True)
plt.show()
print(bins)
One way is to take the bin heights that plt.hist returns and reset each patch's height to the normalized value you want. It's not complicated once you know the steps, though it's not ideal. Here's your case:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
plt.gca().yaxis.set_major_formatter(PercentFormatter(100)) # <-- changed here
data = np.array([0,9,78,6,44,23,88,77,12,29])
length_of_data = len(data) # Length of data
bins = int(np.sqrt(length_of_data)) # Choose number of bins
y = data
plt.title('Histogram')
plt.ylabel('Percentage Of Events')
plt.xlabel('bins')
#### Setting new heights
n, bins, patches = plt.hist(y, bins=bins, density = True, edgecolor='k')
scaled_n = n / n.sum() * 100
for new_height, patch in zip(scaled_n, patches):
    patch.set_height(new_height)
####
# Setting cumulative sum as verification
plt.plot((bins[1:] + bins[:-1])/2, scaled_n.cumsum())
# If you want the cumsum to start from 0, uncomment the line below
#plt.plot(np.concatenate([[0], (bins[1:] + bins[:-1])/2]), np.concatenate([[0], scaled_n.cumsum()]))
plt.ylim(top=110)
plt.show()
This is the resulting picture:
As others said, you can use seaborn. Here's how to reproduce my code above. You'd still need to add all the labels and styling you want.
import seaborn as sns
sns.histplot(data, bins=int(np.sqrt(length_of_data)), stat='percent')
sns.histplot(data, bins=int(np.sqrt(length_of_data)), stat='percent', cumulative=True, element='poly', fill=False, color='C1')
This is the resulting picture:
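For what it's worth, the roughly 3% in the question comes from density=True normalizing the area under the bars rather than their heights: with three bins about 29 units wide, the heights add up to about 1/29. A lighter alternative to resetting patch heights is to give every observation a weight of 1/N. This is only a sketch; the names weights, heights and n_bins are mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

data = np.array([0, 9, 78, 6, 44, 23, 88, 77, 12, 29])
n_bins = int(np.sqrt(len(data)))

# Each observation contributes 1/N, so the bar heights are fractions summing to 1
weights = np.ones(len(data)) / len(data)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(data, bins=n_bins, weights=weights, edgecolor="k")

# Heights are fractions of 1, so tell the formatter that 1 means 100%
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_ylabel("Percentage Of Events")
ax.set_xlabel("bins")
```

No density argument is needed here, because the weights already do the normalization.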
I want to plot the mean local binary patterns histograms of a set of images. Here is what I did:
#calculates the lbp
lbp = feature.local_binary_pattern(image, 24, 8, method="uniform")
#Now I calculate the histogram of LBP Patterns
(hist, _) = np.histogram(lbp.ravel(), bins=np.arange(0, 27))
After that I simply sum up all the LBP histograms and take the mean of them. These are the values found, which are saved in a txt file:
2.962000000000000000e+03
1.476000000000000000e+03
1.128000000000000000e+03
1.164000000000000000e+03
1.282000000000000000e+03
1.661000000000000000e+03
2.253000000000000000e+03
3.378000000000000000e+03
4.490000000000000000e+03
5.010000000000000000e+03
4.337000000000000000e+03
3.222000000000000000e+03
2.460000000000000000e+03
2.495000000000000000e+03
2.599000000000000000e+03
2.934000000000000000e+03
2.526000000000000000e+03
1.971000000000000000e+03
1.303000000000000000e+03
9.900000000000000000e+02
7.980000000000000000e+02
8.680000000000000000e+02
1.119000000000000000e+03
1.479000000000000000e+03
4.355000000000000000e+03
3.112600000000000000e+04
I am trying to simply plot these values (don't need to calculate the histogram, because the values are already from a histogram). Here is what I've tried:
import matplotlib
matplotlib.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
#load data
data=np.loadtxt('original_dataset1.txt')
#convert to float
data=data.astype('float32')
#define number of Bins
n_bins = data.max() + 1
plt.style.use("ggplot")
(fig, ax) = plt.subplots()
fig.suptitle("Local Binary Patterns")
plt.ylabel("Frequency")
plt.xlabel("LBP value")
plt.bar(n_bins, data)
fig.savefig('lbp_histogram.png')
However, look at the figure these commands produce:
I still don't understand what is happening. I would like to make a figure like the one I produced in Excel from the same data, as follows:
I must confess that I am quite a rookie with matplotlib. So, what was my mistake?
Try this. Here, array holds your mean values, one per bin.
import numpy as np
import matplotlib.pyplot as plt
array = [2962,1476,1128,1164,1282,1661,2253]
fig, ax = plt.subplots(nrows=1, ncols=1)
# Place one bar per value at x = 1, 2, 3, ...
ax.bar(np.arange(len(array)) + 1, array, color='orangered')
ax.grid(axis='y')
# Write each value just above its bar
for i, v in enumerate(array):
    ax.text(i+1, v, str(v), color='black', fontweight='bold',
            verticalalignment='bottom', horizontalalignment='center')
plt.savefig('savefig.png', dpi=150)
The plot looks like this.
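As for the mistake in the question: n_bins there is a single number (data.max() + 1), so plt.bar(n_bins, data) draws every bar at the same x position. bar needs one x position per value. Below is a minimal sketch using only the first seven values from the question; the names positions and bars are mine, and in practice you would load the full file with np.loadtxt as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# First seven mean LBP counts from the question (stand-in for the full file)
data = np.array([2962, 1476, 1128, 1164, 1282, 1661, 2253], dtype=float)

# One x position per histogram value: 0, 1, 2, ...
positions = np.arange(len(data))

fig, ax = plt.subplots()
bars = ax.bar(positions, data, width=0.8)
ax.set_xticks(positions)
ax.set_xlabel("LBP value")
ax.set_ylabel("Frequency")
```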
[The resolution is described below.]
I'm trying to create a PairGrid. The x-axes cover at least two very different value ranges, and even when 'cvar' below is plotted by itself, its x-axis tick labels overlap.
My question: is there a way to rotate the x-axis labels to be vertical, or to show fewer x-axis labels, so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
    # .ix was removed in pandas 1.0; .loc is the label-based replacement
    df.loc[val] = item
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
                    x_vars=['bvar', 'cvar'],
                    palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found in the Matplotlib documentation.
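One caveat, as far as I can tell: plt.xticks acts on the current axes only, so with several columns in the grid it may rotate just the last panel. Here is a sketch that loops over every axes in the grid and also limits the number of ticks, which covers the "fewer labels" part of the question. The name grid and the choice of four tick intervals are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.ticker import MaxNLocator

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["avar", "bvar"])
df["cvar"] = [400, 450, 43567, 23000, 19030, 35607, 38900, 30202, 24332, 22322]

grid = sns.PairGrid(df, y_vars=["avar"], x_vars=["bvar", "cvar"])
grid.map(plt.scatter, s=40, edgecolor="white")

# grid.axes is a 2D array of matplotlib axes; treat every panel the same way
for ax in grid.axes.flat:
    ax.tick_params(axis="x", labelrotation=-45)       # tilt the tick labels
    ax.xaxis.set_major_locator(MaxNLocator(nbins=4))  # at most 4 tick intervals
```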
Plotting Differences between bar and hist
Given some data in a pandas.Series, rv, there is a difference between:
Calling hist directly on the data to plot
Calculating the histogram results (with numpy.histogram) then plotting with bar
Example Data Generation
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
# (normed= was removed from np.histogram; density=True is the replacement)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges to 50 bin centers
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
hist() Plotting
ax = pdf.plot(lw=2, label='PDF', legend=True)
rv.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)
bar() Plotting
ax = pdf.plot(lw=2, label='PDF', legend=True)
hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)
How can the bar plot be made to look like the hist plot?
The use case for this is needing to save only the histogrammed data to use and plot later (it is typically smaller in size than the original data).
Bar plotting differences
Obtaining a bar plot that looks like the hist plot requires some manipulating of default behavior for bar.
Force bar to use actual x data for plotting range by passing both x (hist.index) and y (hist.values). The default bar behavior is to plot the y data against an arbitrary range and put the x data as the label.
Set the width parameter to be related to actual step size of x data (The default is 0.8)
Set the align parameter to 'center'.
Manually set the axis legend.
These changes need to be made via matplotlib's bar() called on the axis (ax) instead of pandas's bar() called on the data (hist).
Example Plotting
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
# (normed= was removed from np.histogram; density=True is the replacement)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges to 50 bin centers
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
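# Bin width is the spacing between neighbouring centers (taking abs() of each center first gives a negative value left of zero)
w = hist.index[1] - hist.index[0]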
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])
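If you keep the bin edges instead of the bin centers, newer Matplotlib versions can draw the stored histogram directly, without computing a width or alignment. A minimal sketch, assuming Matplotlib 3.4 or newer for Axes.stairs; the names density and edges are mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
rv = rng.standard_normal(50_000)

# These two small arrays are all that needs to be stored for later plotting
density, edges = np.histogram(rv, bins=50, density=True)

fig, ax = plt.subplots()
# stairs() takes N values and N+1 edges, so bins of unequal width also work
ax.stairs(density, edges, fill=True, alpha=0.5, label="Random Samples")
ax.legend()
```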
Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().
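That is, after retrieving bins and counts from the stored data, do
import numpy as np
import matplotlib.pyplot as plt
fake = np.array([])
for i in range(len(counts)):
    a, b = bins[i], bins[i+1]
    # Spread counts[i] uniform random points across bin i (counts must be whole numbers)
    sample = a + (b-a)*np.random.rand(int(counts[i]))
    fake = np.append(fake, sample)
plt.hist(fake, bins=bins)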
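A variation that avoids random numbers, and also works for normalized (non-integer) heights: pass one pseudo-sample per bin and use the stored counts as weights. This is a sketch under the assumption that bins holds the N+1 edges and counts the N bin values; the small arrays below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stored histogram: 5 counts and their 6 bin edges
counts = np.array([3, 7, 12, 7, 3])
bins = np.linspace(-2.5, 2.5, 6)

# One pseudo-sample at the center of each bin, weighted by that bin's count
centers = (bins[:-1] + bins[1:]) / 2

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(centers, bins=bins, weights=counts)
```

Because every pseudo-sample falls in exactly one bin, hist() reproduces the stored counts as bar heights.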
I am trying to do a Kernel Density Estimation (KDE) plot with seaborn and locate the median. The code looks something like this:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
sns.kdeplot(data, shade=True)
# x_median, y_median = magic_function()
# plt.vlines(x_median, 0, y_median)
plt.show()
As you can see I need a magic_function() to fetch the median x and y values from the kdeplot. Then I would like to plot them with e.g. vlines. However, I can't figure out how to do that. The result should look something like this (obviously the black median bar is wrong here):
I guess my question is not strictly related to seaborn and also applies to other kinds of matplotlib plots. Any ideas are greatly appreciated.
You need to:
Extract the data of the kde line
Integrate it to calculate the cumulative distribution function (CDF)
Find the value that makes CDF equal 1/2, that is the median
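import numpy as np
from scipy import integrate
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
# Plot the KDE as a line only, so the curve can be read back with get_lines()
p = sns.kdeplot(data)
x, y = p.get_lines()[0].get_data()
# Shade under the curve by hand (shade= is deprecated in recent seaborn, and a filled kdeplot may not add a line to the axes)
p.fill_between(x, y, alpha=0.25)
# Mind the argument order: y comes first, then x.
# initial=0 prepends a 0 so that the result has the same length as x.
# cumulative_trapezoid is the current name of cumtrapz (SciPy 1.6 or newer).
cdf = integrate.cumulative_trapezoid(y, x, initial=0)
nearest_05 = np.abs(cdf-0.5).argmin()
x_median = x[nearest_05]
y_median = y[nearest_05]
plt.vlines(x_median, 0, y_median)
plt.show()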
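If you would rather not depend on what seaborn leaves on the axes, the same three steps can be done with SciPy's own KDE. This is a sketch, not what kdeplot does internally: scipy.stats.gaussian_kde with its default bandwidth may differ slightly from seaborn's curve, and the padding of three standard deviations is my own choice. Note that the result is the median of the smoothed density, which need not equal np.median(data).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)
data = rng.standard_normal(30)

# Step 1: evaluate the KDE on a grid that extends past the data on both sides
kde = stats.gaussian_kde(data)
pad = 3 * data.std()
x = np.linspace(data.min() - pad, data.max() + pad, 1000)
y = kde(x)

# Step 2: integrate the density to get the CDF (SciPy 1.6 or newer)
cdf = integrate.cumulative_trapezoid(y, x, initial=0)

# Step 3: the median is where the CDF is closest to 1/2
idx = np.abs(cdf - 0.5).argmin()
x_median, y_median = x[idx], y[idx]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.fill_between(x, y, alpha=0.25)
ax.vlines(x_median, 0, y_median, color="k")
```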