matplotlib convert real to categorical - python

If I have a set of data:
x_data = array([-0.5597064565292805, -0.6044992007582148, 0.22877491676881043,
-1.2332817779977419, 0.42077626119484773, 1.825509016838052,
0.3476645527864688, -0.35439666443655543, 0.8783711637081933,
-0.438777582274935], dtype=object)
I can't get matplotlib to draw a bar chart with x as categorical values. No matter what I do, it forces a conversion to real numbers. Any ideas on how to make each number a category?

You can plot your y data as a function of the numbers 0,1,2,3, etc and then set the ticklabels as the strings from the x_data.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
x_data = [-0.5597064565292805, -0.6044992007582148, 0.22877491676881043,
-1.2332817779977419, 0.42077626119484773, 1.825509016838052,
0.3476645527864688, -0.35439666443655543, 0.8783711637081933,
-0.438777582274935]
y_data = np.random.rand(len(x_data))
x_strings=["{:.3f}".format(x) for x in x_data]
plt.figure(figsize=(5,5), dpi=64./5)
plt.bar(range(len(x_data)), y_data )
plt.xticks(range(len(x_data)), x_strings)
plt.savefig(__file__+".png", dpi="figure")
plt.show()
Rendered as a 64x64 pixel image, the result has no readable tick labels, so plotting at that size may not make much sense.
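As a possible shortcut, recent matplotlib versions (2.1 and later, as far as I know) accept strings directly as x values and treat them as categories, placed at positions 0, 1, 2, ... in order of first appearance. The sketch below assumes that behaviour; the y values are made up for illustration, and the Agg backend is only chosen so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x_data = [-0.5597064565292805, -0.6044992007582148, 0.22877491676881043]
y_data = [0.3, 0.7, 0.5]  # made-up bar heights, for illustration only

# Strings are treated as categories and placed at positions 0, 1, 2, ...
x_strings = ["{:.3f}".format(x) for x in x_data]

fig, ax = plt.subplots()
bars = ax.bar(x_strings, y_data)

# Centre of each bar on the underlying numeric axis
centers = [bar.get_x() + bar.get_width() / 2 for bar in bars]
```

Note that identical strings would share one category, so this only works as intended when the formatted labels are unique.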

Related

Seaborn violin plot over time given numpy ndarray

I have a distribution that changes over time for which I would like to plot a violin plot for each time step side-by-side using seaborn. My initial attempt failed as violinplot cannot handle a np.ndarray for the y argument:
import numpy as np
import seaborn as sns
time = np.arange(0, 10)
samples = np.random.randn(10, 200)
ax = sns.violinplot(x=time, y=samples) # Exception: Data must be 1-dimensional
The seaborn documentation has an example for a vertical violinplot grouped by a categorical variable. However, it uses a DataFrame in long format.
Do I need to convert my time series into a DataFrame as well? If so, how do I achieve this?
A closer look at the documentation made me realize that omitting the x and y argument altogether leads to the data argument being interpreted in wide-form:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
samples = np.random.randn(20, 10)
ax = sns.violinplot(data=samples)
plt.show()
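Wide-form input draws one violin per column, so the array from the question, which has one row per time step, would need transposing first. A minimal sketch of that step, assuming the (10, 200) layout from the question; the names wide and n_violins are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((10, 200))  # 10 time steps, 200 draws each

# Wide-form input draws one violin per column, so put the time steps on the columns
wide = samples.T

n_violins = wide.shape[1]
# ax = sns.violinplot(data=wide)  # plotting call as in the answer above, needs seaborn
```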
The violin plot documentation says that the x and y inputs do not have to come from a DataFrame, but they must have the same dimension. The y array you created has 10 rows and 200 columns, so it is not one-dimensional, and that is what causes the dimension error when plotting.
I tested the code below and it runs without errors.
import numpy as np
import seaborn as sns
import pandas as pd
time = np.arange(0, 200)
samples = np.random.randn(10, 200)
for sample in samples:
    ax = sns.violinplot(x=time, y=sample)
You can then group the resulting graphs using this link:
https://python-graph-gallery.com/199-matplotlib-style-sheets/
You can also convert your data into a DataFrame using pandas.
Example:
import pandas as pd
x = [1,2,3,4]
df = pd.DataFrame(x)
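For the array in the question, a long-form DataFrame could be built roughly as follows. This is a sketch, not the only way to do it: the column names "time" and "value" are made up, and the plotting call is left as a comment because it needs seaborn.

```python
import numpy as np
import pandas as pd

time = np.arange(0, 10)
rng = np.random.default_rng(0)
samples = rng.standard_normal((10, 200))  # one row per time step

# Repeat each time step once per draw so both columns are 1-dimensional
long_df = pd.DataFrame({
    "time": np.repeat(time, samples.shape[1]),
    "value": samples.ravel(),  # row-major order matches the repeated time values
})
# ax = sns.violinplot(data=long_df, x="time", y="value")  # requires seaborn
```

Each of the 2000 rows then holds one draw together with the time step it belongs to, which is the layout the grouped violin plot example in the seaborn documentation uses.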

How to make sure matplotlib line graph is correct?

I need to plot an accurate line graph through matplotlib but I only get a y=x graph. And the y-axis tick values are jumbled up.
import numpy as np
import matplotlib.pyplot as plt
title = "Number of Flats Constructed"
data = np.genfromtxt('C:\data/flats-constructed-by-housing-and-development-board-annual.csv',
skip_header=1,
dtype=[('year','i8'),('flats_constructed','U50')], delimiter=",",
missing_values=['na','-'],filling_values=[0])
x = data['year']
y = data['flats_constructed']
plt.title('No. of Flats Constructed over the Years')
#plt.plot(data['year'], data['flats_constructed'])
plt.plot(x, y)
plt.show()
I received a y=x graph instead of a jagged graph reflecting the values.
[Figures omitted: actual output and sample of expected output]
Your mistake is the ('flats_constructed','U50') entry in dtype: U50 reads the column as strings.
Use ('flats_constructed','i8') instead, so that the values are read as integers.
from io import StringIO
import numpy as np
import matplotlib.pyplot as plt
# inline sample standing in for the CSV file
s = StringIO(u"1977,30498\n1978,264946\n1979,54666\n1980,54666")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','i8')], delimiter=",",skip_header=0)
print(data)
plt.plot(data['myint'],data['myfloat'])
plt.show()
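The straight line is consistent with how matplotlib handles strings: it treats them as categories and places them at 0, 1, 2, ... in order of first appearance, so distinct strings plotted against increasing years climb one step at a time. If the column has already been read as strings, it can be converted before plotting. A minimal sketch, where the array name y_str is made up:

```python
import numpy as np

# What a U50 column looks like after loading: numbers stored as text
y_str = np.array(["30498", "264946", "54666", "54666"])

# Convert the strings to 64-bit integers before plotting
y = y_str.astype("i8")
```

Entries such as na or - cannot be converted this way and would need to be replaced first.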

Matplotlib plot already binned data

I want to plot the mean local binary patterns histograms of a set of images. Here is what I did:
#calculates the lbp
lbp = feature.local_binary_pattern(image, 24, 8, method="uniform")
#Now I calculate the histogram of LBP Patterns
(hist, _) = np.histogram(lbp.ravel(), bins=np.arange(0, 27))
After that I simply sum up all the LBP histograms and take the mean of them. These are the values found, which are saved in a txt file:
2.962000000000000000e+03
1.476000000000000000e+03
1.128000000000000000e+03
1.164000000000000000e+03
1.282000000000000000e+03
1.661000000000000000e+03
2.253000000000000000e+03
3.378000000000000000e+03
4.490000000000000000e+03
5.010000000000000000e+03
4.337000000000000000e+03
3.222000000000000000e+03
2.460000000000000000e+03
2.495000000000000000e+03
2.599000000000000000e+03
2.934000000000000000e+03
2.526000000000000000e+03
1.971000000000000000e+03
1.303000000000000000e+03
9.900000000000000000e+02
7.980000000000000000e+02
8.680000000000000000e+02
1.119000000000000000e+03
1.479000000000000000e+03
4.355000000000000000e+03
3.112600000000000000e+04
I am trying to simply plot these values (don't need to calculate the histogram, because the values are already from a histogram). Here is what I've tried:
import matplotlib
matplotlib.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
#load data
data=np.loadtxt('original_dataset1.txt')
#convert to float
data=data.astype('float32')
#define number of Bins
n_bins = data.max() + 1
plt.style.use("ggplot")
(fig, ax) = plt.subplots()
fig.suptitle("Local Binary Patterns")
plt.ylabel("Frequency")
plt.xlabel("LBP value")
plt.bar(n_bins, data)
fig.savefig('lbp_histogram.png')
However, the figure these commands produce is not what I expected.
I still don't understand what is happening. I would like to make a figure like the one I produced in Excel using the same data.
I must confess that I am quite a rookie with matplotlib. So, what was my mistake?
Try this. Here, array holds your mean values from the bins.
import numpy as np
import matplotlib.pyplot as plt
array = [2962,1476,1128,1164,1282,1661,2253]
fig,ax = plt.subplots(nrows=1, ncols=1,)
# one bar per bin, at positions 1..len(array)
ax.bar(np.array(range(len(array)))+1,array,color='orangered')
ax.grid(axis='y')
# write each value above its bar
for i, v in enumerate(array):
    ax.text(i+1, v, str(v),color='black',fontweight='bold',
            verticalalignment='bottom',horizontalalignment='center')
plt.savefig('savefig.png',dpi=150)
The plot looks like this.
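As for the mistake itself: in the question, n_bins is a single number (data.max() + 1), so plt.bar(n_bins, data) draws every bar at the same x position. The first argument of bar should hold one position per bin. A minimal sketch using only the first few values from the question; the names positions and lefts are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # same non-interactive backend as in the question
import numpy as np
import matplotlib.pyplot as plt

# First few mean histogram values from the question
data = np.array([2962.0, 1476.0, 1128.0, 1164.0, 1282.0])

# One x position per bin: 0, 1, 2, ... (not a single scalar such as data.max() + 1)
positions = np.arange(len(data))

fig, ax = plt.subplots()
bars = ax.bar(positions, data)
ax.set_xlabel("LBP value")
ax.set_ylabel("Frequency")

# Left edge of each bar, to confirm the bars no longer overlap
lefts = [bar.get_x() for bar in bars]
```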

Difference between specified and measured colours, matplotlib colormap

I'm having trouble replicating an old colormap I've used in matplotlib. It seems as if it was the default colormap because in the original code, no colormap was specified.
So looking at the old figure I made I've measured the colours from the colorbar using gpick. I've inputted these into a custom colormap as follows:
blue_red1 = LinearSegmentedColormap.from_list('mycmap', [
(0, '#6666de'),
(0.1428, '#668cff'),
(0.2856, '#66d9ff'),
(0.4284, '#92ffce'),
(0.5712, '#d0ff90'),
(0.714, '#ffe366'),
(0.8568, '#ff9b66'),
(1, '#db6666')])
CS = plt.contourf(H, temps, diff_list, cmap=blue_red1)
plt.savefig('out.png')
Yet when I measure the output colours with gpick again they have different hex values (and I can tell they're different).
What could be causing this?
The original I'm trying to replicate, and the output from the custom colour map are linked below:
You may get much closer to the desired result using the following.
The logic is that each colour shown in the colorbar is the colormap value at the midpoint of its interval, so the measured colours are placed at those midpoints.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
X,Y=np.meshgrid(np.linspace(0,1),np.linspace(0,1) )
Z = X+Y
blue_red1 = LinearSegmentedColormap.from_list('mycmap', [
(0.0000, '#6666de'),
(0.0625, '#6666de'),
(0.1875, '#668cff'),
(0.3125, '#66d9ff'),
(0.4375, '#92ffce'),
(0.5625, '#d0ff90'),
(0.6875, '#ffe366'),
(0.8125, '#ff9b66'),
(0.9375, '#db6666'),
(1.0000, '#db6666')])
CS = plt.contourf(X,Y,Z, cmap=blue_red1)
plt.colorbar()
plt.show()
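The anchor positions 0.0625, 0.1875, ... used above are the midpoints of eight equal-width intervals on [0, 1]. A short sketch of how they can be computed, assuming eight measured colours; the variable names are only illustrative:

```python
import numpy as np

n_colors = 8  # number of colours measured from the colorbar

# Midpoint of each of the 8 equal-width intervals on [0, 1]
midpoints = (np.arange(n_colors) + 0.5) / n_colors
```

The extra entries at 0 and 1 in the colormap definition keep the first and last colour constant out to the ends of the range.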
The other option is to use a ListedColormap. This gives the accurate colors.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
X,Y=np.meshgrid(np.linspace(0,1),np.linspace(0,1) )
Z = X+Y
blue_red1 = ListedColormap(['#6666de','#668cff','#66d9ff','#92ffce','#d0ff90',
'#ffe366','#ff9b66','#db6666'],'mycmap')
CS = plt.contourf(X,Y,Z, cmap=blue_red1)
plt.colorbar()
plt.show()

How to locate the median in a (seaborn) KDE plot?

I am trying to do a Kernel Density Estimation (KDE) plot with seaborn and locate the median. The code looks something like this:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
sns.kdeplot(data, shade=True)
# x_median, y_median = magic_function()
# plt.vlines(x_median, 0, y_median)
plt.show()
As you can see I need a magic_function() to fetch the median x and y values from the kdeplot. Then I would like to plot them with e.g. vlines. However, I can't figure out how to do that. The result should look something like this (obviously the black median bar is wrong here):
I guess my question is not strictly related to seaborn and also applies to other kinds of matplotlib plots. Any ideas are greatly appreciated.
You need to:
1. Extract the data of the KDE line.
2. Integrate it to calculate the cumulative distribution function (CDF).
3. Find the value at which the CDF equals 1/2; that is the median.
import numpy as np
from scipy.integrate import cumulative_trapezoid
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
p=sns.kdeplot(data, shade=True)
x,y = p.get_lines()[0].get_data()
# note the argument order: y comes first
# initial=0 prepends a 0 so the result has the same length as x
# (cumulative_trapezoid was called cumtrapz in older SciPy releases)
cdf = cumulative_trapezoid(y, x, initial=0)
# index of the grid point where the CDF is closest to 0.5
nearest_05 = np.abs(cdf-0.5).argmin()
x_median = x[nearest_05]
y_median = y[nearest_05]
plt.vlines(x_median, 0, y_median)
plt.show()
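Picking the nearest grid point limits the accuracy to the spacing of the curve. One possible refinement is to interpolate where the CDF crosses 0.5. The sketch below uses NumPy only, with a synthetic normal density centred on 1.0 standing in for the x and y extracted from the KDE line; that density is an assumption made purely for illustration.

```python
import numpy as np

# Synthetic density standing in for the (x, y) extracted from the KDE line:
# a normal curve centred on 1.0 (an assumption, for illustration only)
x = np.linspace(-5.0, 7.0, 1201)
y = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)

# Cumulative trapezoidal integral, with a leading 0 so it matches len(x)
steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
cdf = np.concatenate(([0.0], np.cumsum(steps)))
cdf /= cdf[-1]  # renormalise so the curve ends at exactly 1

# Interpolate the x at which the CDF crosses 0.5, and the density there
x_median = np.interp(0.5, cdf, x)
y_median = np.interp(x_median, x, y)
```

With real KDE output, x and y would come from the plotted line as in the answer above, and the renormalisation absorbs any small part of the density that falls outside the plotted range.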
