Matplotlib plot already binned data

I want to plot the mean local binary pattern (LBP) histogram of a set of images. Here is what I did:
# compute the LBP image
lbp = feature.local_binary_pattern(image, 24, 8, method="uniform")
# compute the histogram of the LBP codes
(hist, _) = np.histogram(lbp.ravel(), bins=np.arange(0, 27))
After that I sum all the LBP histograms and take their mean. These are the resulting values, which are saved in a txt file:
2.962000000000000000e+03
1.476000000000000000e+03
1.128000000000000000e+03
1.164000000000000000e+03
1.282000000000000000e+03
1.661000000000000000e+03
2.253000000000000000e+03
3.378000000000000000e+03
4.490000000000000000e+03
5.010000000000000000e+03
4.337000000000000000e+03
3.222000000000000000e+03
2.460000000000000000e+03
2.495000000000000000e+03
2.599000000000000000e+03
2.934000000000000000e+03
2.526000000000000000e+03
1.971000000000000000e+03
1.303000000000000000e+03
9.900000000000000000e+02
7.980000000000000000e+02
8.680000000000000000e+02
1.119000000000000000e+03
1.479000000000000000e+03
4.355000000000000000e+03
3.112600000000000000e+04
I am trying to simply plot these values (I don't need to compute a histogram, because the values already come from one). Here is what I've tried:
import matplotlib
matplotlib.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
#load data
data=np.loadtxt('original_dataset1.txt')
#convert to float
data=data.astype('float32')
#define number of Bins
n_bins = data.max() + 1
plt.style.use("ggplot")
(fig, ax) = plt.subplots()
fig.suptitle("Local Binary Patterns")
plt.ylabel("Frequency")
plt.xlabel("LBP value")
plt.bar(n_bins, data)
fig.savefig('lbp_histogram.png')
However, look at the figure these commands produce:
I still don't understand what is happening. I would like to make a figure like the one I produced in Excel using the same data, as follows:
I must confess that I am quite new to matplotlib. So, what was my mistake?

Try this. Here, array holds the mean values of your bins.
import numpy as np
import matplotlib.pyplot as plt
# mean bin values (only the first seven are shown here)
array = [2962, 1476, 1128, 1164, 1282, 1661, 2253]
fig, ax = plt.subplots(nrows=1, ncols=1)
# one bar per bin, positioned at x = 1, 2, ..., len(array)
ax.bar(np.arange(len(array)) + 1, array, color='orangered')
ax.grid(axis='y')
# label each bar with its value
for i, v in enumerate(array):
    ax.text(i + 1, v, str(v), color='black', fontweight='bold',
            verticalalignment='bottom', horizontalalignment='center')
plt.savefig('savefig.png', dpi=150)
The plot looks like this.
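As for the mistake in the question, as far as I can tell: the first argument of plt.bar is the x position of each bar, but n_bins = data.max() + 1 is a single number (31127 for the data shown), so all bars end up at the same x position. Below is a minimal sketch, assuming the 26 values listed in the question, that gives every bin its own position. The variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

# The 26 mean bin counts listed in the question.
data = np.array([2962, 1476, 1128, 1164, 1282, 1661, 2253, 3378, 4490,
                 5010, 4337, 3222, 2460, 2495, 2599, 2934, 2526, 1971,
                 1303, 990, 798, 868, 1119, 1479, 4355, 31126], dtype=float)

# One x position per bin: 0, 1, ..., 25 (one per LBP code).
positions = np.arange(len(data))

plt.style.use("ggplot")
fig, ax = plt.subplots()
# width=1.0 makes neighbouring bars touch, like a histogram.
bars = ax.bar(positions, data, width=1.0, edgecolor="black")
fig.suptitle("Local Binary Patterns")
ax.set_xlabel("LBP value")
ax.set_ylabel("Frequency")
```

The figure can then be written to disk with fig.savefig, exactly as in the question.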

Related

Plotly: How to make a 3D stacked histogram?

I have several histograms that I succeeded in plotting with plotly like this:
fig.add_trace(go.Histogram(x=np.array(data[key]), name=self.labels[i]))
I would like to create something like this 3D stacked histogram, but with the difference that each 2D histogram inside it is a true histogram and not just a hardcoded line (my data is of the form [0.5 0.4 0.5 0.7 0.4], so using Histogram directly is very convenient).
Note that what I am asking is not similar to this and therefore also not the same as this. In the matplotlib example, the data is given directly as a 2D array, so the histogram is the third dimension. In my case, I want to feed a function with many already computed histograms.

The snippet below takes care of both the binning and the formatting of the figure, so that it appears as a stacked 3D chart built from multiple go.Scatter3d traces and np.histogram.
The input is a dataframe of random numbers generated with np.random.normal(50, 5, size=(300, 4)).
We can talk more about the other details if this is something you can use:
Plot 1: Angle 1
Plot 2: Angle 2
Complete code:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'browser'
# data
np.random.seed(123)
df = pd.DataFrame(np.random.normal(50, 5, size=(300, 4)), columns=list('ABCD'))
# plotly setup
fig=go.Figure()
# data binning and traces
for i, col in enumerate(df.columns):
    a0=np.histogram(df[col], bins=10, density=False)[0].tolist()
    a0=np.repeat(a0,2).tolist()
    a0.insert(0,0)
    a0.pop()
    a1=np.histogram(df[col], bins=10, density=False)[1].tolist()
    a1=np.repeat(a1,2)
    fig.add_traces(go.Scatter3d(x=[i]*len(a0), y=a1, z=a0,
                                mode='lines',
                                name=col
                                )
                   )
fig.show()
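One detail that may be worth checking in the loop above: with bins=10, the repeated edge array a1 has 22 entries while the count list a0 has 20, so the outline may stop short of the last bin edge (I have not verified how plotly treats the mismatch). The sketch below shows one way to build a closed step outline with matching lengths. step_outline is a hypothetical helper, not part of numpy or plotly.

```python
import numpy as np

def step_outline(values, bins=10):
    """Return (edges, heights) tracing a histogram outline as a closed step line."""
    counts, edges = np.histogram(values, bins=bins)
    # Each edge appears twice: once where the line rises and once where it falls.
    y = np.repeat(edges, 2)
    # Heights start at 0, hold each count across its bin, and return to 0.
    z = np.concatenate(([0], np.repeat(counts, 2), [0]))
    return y, z

rng = np.random.default_rng(123)
sample = rng.normal(50, 5, size=300)
y, z = step_outline(sample, bins=10)
```

The returned arrays could then be passed as y and z to go.Scatter3d, with x=[i]*len(z) for the i-th column.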

Unfortunately you can't use go.Histogram in a 3D space, so you need an alternative. I used go.Scatter3d and wanted to use its option to fill the area under the line (surfaceaxis), but that option has an evident bug, so it is commented out in the code:
import numpy as np
import plotly.graph_objs as go
# random mat
m = 6
n = 5
mat = np.random.uniform(size=(m,n)).round(1)
# we want to have the number repeated
mat = mat.repeat(2).reshape(m, n*2)
# and finally plot
x = np.arange(2*n)
y = np.ones(2*n)
fig = go.Figure()
for i in range(m):
    fig.add_trace(go.Scatter3d(x=x,
                               y=y*i,
                               z=mat[i,:],
                               mode="lines",
                               # surfaceaxis=1 # bug
                               )
                  )
fig.show()

Generating a smooth line with Pandas dataframe and Matplotlib

I am trying to generate a smooth line using a dataset that contains time (measured as number of days) and a set of numbers that represent a socioeconomic variable.
Here is a sample of my data:
date, data
726,1.2414
727,1.2414
728,1.2414
729,1.2414
730,1.2414
731,1.2414
732,1.2414
733,1.2414
734,1.2414
735,1.2414
736,1.2414
737,1.804597701
738,1.804597701
739,1.804597701
740,1.804597701
741,1.804597701
742,1.804597701
743,1.804597701
744,1.804597701
745,1.804597701
746,1.804597701
747,1.804597701
748,1.804597701
749,1.804597701
750,1.804597701
751,1.804597701
752,1.793103448
753,1.793103448
754,1.793103448
755,1.793103448
756,1.793103448
757,1.793103448
758,1.793103448
759,1.793103448
760,1.793103448
761,1.793103448
762,1.793103448
763,1.793103448
764,1
765,1
This is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
out_file = "path_to_file/file.csv"
df = pd.read_csv(out_file)
time = df['date']
data = df['data']
ax1 = plt.subplot2grid((4,3),(0,0), colspan = 2, rowspan = 2) # Will be adding other plots
plt.plot(time, data)
plt.yticks(np.arange(1,5,1)) # Include classes 1-4 showing only 1 step changes
plt.gca().invert_yaxis() # Reverse y axis
plt.ylabel('Trend', fontsize = 8, labelpad = 10)
This generates the following plot:
Test plot
I have seen posts that answer similar questions (like the ones below), but can't seem to get my code to work. Can anyone suggest an elegant solution?
Generating smooth line graph using matplotlib
Python Matplotlib - Smooth plot line for x-axis with date values

Ridgeline/Joyplot across a moving range

(Using Python 3.0) In increments of 0.25, I want to calculate and plot PDFs for the given data across specified ranges for easy visualization.
Calculating the individual plot has been done thanks to the SO community, but I cannot quite get the algorithm right to iterate properly across the range of values.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized toy data that looks like a shotgun blast with one of the target areas isolated between the black lines with an increment of 0.25:
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt
import seaborn as sns
Data=pd.read_csv("Data.csv")
g = sns.jointplot(x="x", y="y", data=Data)
bottom_lim = 0
top_lim = 0.25
temp = Data.loc[(Data.y>=bottom_lim)&(Data.y<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the kde
# might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(temp.x, shade=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)
And now what I want to do is make a ridgeline/joyplot of this data across each 0.25 band of data.
I tried a few techniques from the various Seaborn examples out there, but nothing really accounts for the band or range of values as the y-axis. I'm struggling to translate my written algorithm into working code as a result.

I don't know if this is exactly what you are looking for, but hopefully this gets you in the ballpark. I also know very little about python, so here is some R:
library(tidyverse)
library(ggridges)
data = read_csv("https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=1")
data2 = data %>%
  mutate(breaks = cut(x, breaks = seq(-1,7,.5), labels = FALSE))
data2 %>%
  ggplot(aes(x=x,y=breaks)) +
  geom_density_ridges() +
  facet_grid(~breaks, scales = "free")
data2 %>%
  ggplot(aes(x=x,y=y)) +
  geom_point() +
  geom_density() +
  facet_grid(~breaks, scales = "free")
And please forgive the poorly formatted axis.
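For readers who want to stay in Python, the banding step could be sketched as follows. This is only an illustration of grouping the x values by 0.25-wide bands of y, as described in the question. split_into_bands is a hypothetical helper, and the sample arrays are made up rather than taken from Data.csv.

```python
import numpy as np

def split_into_bands(x, y, band_width=0.25):
    """Group x values by the y band [lower, lower + band_width) they fall in."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Integer index of the band each point belongs to.
    band_index = np.floor(y / band_width).astype(int)
    bands = {}
    for k in np.unique(band_index):
        lower = float(k) * band_width
        bands[(lower, lower + band_width)] = x[band_index == k]
    return bands

# Made-up points, only to show the grouping.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.1, 0.25, 0.3, 0.6]
bands = split_into_bands(x, y)
```

Each array in bands could then be passed to sns.kdeplot on its own row of axes, stacked from the lowest band to the highest, to approximate a ridgeline plot.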

matplotlib hist gives me "too many values to unpack"

I want to create a frequency plot of a sample whose values lie between -1 and 1.
Creating the histogram using numpy works just fine:
freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
but creating a plot using the same bins gives me an error (see title):
plt.hist(freq, range=bins)
In addition, how can I adjust the x-labels so that the correct bin values are shown?
Minimal working example:
import matplotlib.pyplot as plt
import numpy as np
if __name__ == "__main__":
    sample = np.random.uniform(-1,1,100)
    freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
    plt.figure()
    plt.hist(freq, range=bins)
    plt.show()

It's not necessary to preprocess the data using numpy; just pass the data directly to matplotlib's hist function:
import matplotlib.pyplot as plt
import numpy as np
sample = np.random.uniform(-1,1,10000)
#freq, bins = np.histogram(sample, bins=np.arange(-1,1,0.05) )
plt.figure()
plt.hist(sample, bins=100)
plt.show()
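Regarding the error itself: the range argument of plt.hist expects a (min, max) pair, so passing the whole array of bin edges is the likely cause of "too many values to unpack". In addition, plt.hist(freq, ...) would histogram the counts themselves rather than draw them. If the counts are already computed, a bar chart can draw them directly. The following is a sketch under the question's setup, where the seed and tick spacing are arbitrary choices of mine:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.uniform(-1, 1, 100)
# np.arange normally excludes the stop value, so samples close to 1
# may fall outside the last bin.
freq, bins = np.histogram(sample, bins=np.arange(-1, 1, 0.05))

# Draw one bar per bin, starting at its left edge and spanning its width.
widths = np.diff(bins)
fig, ax = plt.subplots()
bars = ax.bar(bins[:-1], freq, width=widths, align="edge", edgecolor="black")
# Label every fourth bin edge so the x-axis shows the actual bin values.
ax.set_xticks(bins[::4])
ax.set_xlabel("value")
ax.set_ylabel("frequency")
```

Passing the same edges as plt.hist(sample, bins=bins) should give an equivalent picture when the raw sample is still available.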

How to locate the median in a (seaborn) KDE plot?

I am trying to do a Kernel Density Estimation (KDE) plot with seaborn and locate the median. The code looks something like this:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
sns.kdeplot(data, shade=True)
# x_median, y_median = magic_function()
# plt.vlines(x_median, 0, y_median)
plt.show()
As you can see I need a magic_function() to fetch the median x and y values from the kdeplot. Then I would like to plot them with e.g. vlines. However, I can't figure out how to do that. The result should look something like this (obviously the black median bar is wrong here):
I guess my question is not strictly related to seaborn and also applies to other kinds of matplotlib plots. Any ideas are greatly appreciated.

You need to:
Extract the data of the kde line
Integrate it to calculate the cumulative distribution function (CDF)
Find the value that makes CDF equal 1/2, that is the median
import numpy as np
from scipy.integrate import cumulative_trapezoid  # named cumtrapz in older SciPy releases
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
p = sns.kdeplot(data, shade=True)
# extract the (x, y) points of the KDE curve
x, y = p.get_lines()[0].get_data()
# careful with the argument order: y comes first
# initial=0 prepends a 0 so the result has the same length as x
cdf = cumulative_trapezoid(y, x, initial=0)
# index of the point where the CDF is closest to 0.5
nearest_05 = np.abs(cdf - 0.5).argmin()
x_median = x[nearest_05]
y_median = y[nearest_05]
plt.vlines(x_median, 0, y_median)
plt.show()
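The integration step can also be written with plain numpy, which may help if the curve comes from somewhere other than a seaborn line (I believe recent seaborn versions may not add a line artist for filled KDEs, but I have not checked every release). The sketch below uses a hypothetical helper, median_from_density. It normalises the CDF and interpolates the crossing point, two small additions that are not part of the answer above.

```python
import numpy as np

def median_from_density(x, y):
    """Return (x_median, y_median) for a density curve y sampled at points x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cumulative trapezoidal integral, with a leading 0 so cdf matches x in length.
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    # Normalise in case the curve does not integrate to exactly 1 on this grid.
    cdf /= cdf[-1]
    # Interpolate the x position where the CDF crosses 0.5.
    x_median = np.interp(0.5, cdf, x)
    y_median = np.interp(x_median, x, y)
    return x_median, y_median

# Check on a standard normal density, whose median is 0.
grid = np.linspace(-5, 5, 1001)
pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
x_med, y_med = median_from_density(grid, pdf)
```

The resulting pair can be passed to plt.vlines(x_med, 0, y_med) in the same way as x_median and y_median above.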
