Horizontal bar chart that does not start at zero / displaying ranges - python

I am trying to visualize a set of frequency ranges for around 20 samples. What I want is a horizontal bar chart where each row represents one sample, with the sample name on the left and an x-axis from 0 to 150 kHz on the right.
The ranges I have look something like (70.5, 95.5). Can I do this with a horizontal bar chart, or do I need a different type of chart?
Sorry that I can't provide an example; I have nothing so far, because a plain bar chart just doesn't do what I want.
Edit: I basically want something like this example, but without the actual bars and with the ability to enter my own data for the error bars. As far as I know, error bars can only be specified relative to the "main data".

If I understand you correctly, you can do this with a simple errorbar plot (though it's a bit of a hack):
import numpy as np
import matplotlib.pyplot as plt
# 20 random samples
nsamples = 20
xmin, xmax = 0, 150
samples = np.random.random_sample((nsamples,2)) * (xmax-xmin) + xmin
samples.sort(axis=1)
means = np.mean(samples, axis=1)
# Find the length of the errorbar each side of the mean
half_range = samples[:,1] - means
# Plot without markers and customize the errorbar
_, caps, _ = plt.errorbar(means, np.arange(nsamples)+1, xerr=half_range, ls='',
                          elinewidth=3, capsize=5)
for cap in caps:
    cap.set_markeredgewidth(3)
# Set the y-range so we can see all the errorbars clearly
plt.ylim(0, nsamples+1)
plt.show()
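Since the question asks specifically for a horizontal bar chart that doesn't start at zero, it may help to note that Matplotlib's barh accepts a left argument giving the start of each bar, so a range (start, end) can be drawn as a bar of width end - start beginning at start. A minimal sketch, assuming the ranges are stored as an (n, 2) array of (start, end) pairs; the sample names and random ranges are placeholders, and only (70.5, 95.5) comes from the question:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: 20 sample names and one (start, end) range in kHz per sample
names = [f"sample {i}" for i in range(1, 21)]
rng = np.random.default_rng(0)
ranges = np.sort(rng.uniform(0, 150, size=(len(names), 2)), axis=1)
ranges[0] = (70.5, 95.5)  # the example range from the question

y = np.arange(len(names))
starts = ranges[:, 0]
widths = ranges[:, 1] - ranges[:, 0]  # barh draws each bar from `left` to `left + width`

fig, ax = plt.subplots(figsize=(6, 6))
bars = ax.barh(y, widths, left=starts, height=0.6)
ax.set_yticks(y)
ax.set_yticklabels(names)  # sample names on the left
ax.set_xlim(0, 150)
ax.set_xlabel("Frequency (kHz)")
ax.invert_yaxis()  # first sample at the top
plt.show()
```

If thin lines are preferred over bars, ax.hlines(y, ranges[:, 0], ranges[:, 1], linewidth=6) draws the same ranges without the bar patches.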

Related

Python Scatter plot with matrix input. Having trouble getting number of columns showing on x axis, then a dot for each value in each column

I'm making a bar chart and a scatter plot. The bar chart takes a vector as input. I plotted the values on the x-axis and the number of times they repeat on the y-axis. This was done by converting the vector to a list and using .count(). That worked great and was relatively straightforward.
As for the scatter plot, the input is going to be a matrix of any dimensions. The idea is to have the column numbers of the matrix appear on the x-axis (1, 2, 3, 4, etc., depending on how many columns the input matrix has). Each column contains many different numbers, and I would like all of them displayed as dots or stars above the relevant column index. For example, if column 3 consists of the values 6, 2, 8, 5, 9, 5 from top to bottom, I would like a dot for each of them at the corresponding y-value, directly above the number 3 on the x-axis. I have tried different approaches: sometimes the dots show up but in the wrong places, and other times the x-axis is completely off, even though I used len(vector[0,:]), which prints the correct number of columns but doesn't chart it.
My latest attempt, which now doesn't even show the dots or stars:
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],[7,7,12,4,0o2,4,12],[12,-3,4,10,12,4,-3],[10,12,4,0o3,7,10,12]])
x = len(vector[0,:])
print(x)#vector[0,:]
y = vector[:,0]
plt.plot(x, y, "r.") # Scatter plot with red dots
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurrences of values for each column") # Set the y-axis label
plt.xlim([1,len(vector[0,:])]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
plt.show(vector)
The matrix shown at the top is just one I made up for testing; the code should work for any matrix that is imported.
The code above is the closest I have gotten: it prints the correct number of columns, but nothing shows up on the plot. I haven't yet managed to plot the points above their columns; in a previous version they appeared, but in completely wrong positions.
One way is to loop over the columns and plot every value of column i at x = i:
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],
                   [7,7,12,4,0o2,4,12],
                   [12,-3,4,10,12,4,-3],
                   [10,12,4,0o3,7,10,12]])
rows, columns = vector.shape
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurrences of values for each column") # Set the y-axis label
plt.xlim([0.5, columns + 0.5]) # Pad the x-axis so the first and last columns aren't on the edges
plt.ylim([-5,15]) # Set the limits of the y-axis
for i in range(1, columns+1):
    y = vector[:,i-1] # all values of column i (1-based)
    x = [i] * rows    # same x position for every value in that column
    plt.plot(x, y, "r.")
plt.show()
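The loop can also be replaced by a vectorized version: flatten the matrix with ravel() and build a matching array of 1-based column numbers. A sketch using plt.scatter with star markers, since the question mentions dots or stars; the 0o2/0o3 literals from the question are written as plain 2 and 3 here, which are the same values:

```python
import numpy as np
import matplotlib.pyplot as plt

vector = np.array([[-3, 7, 12, 4, 2, 7, -3],
                   [7, 7, 12, 4, 2, 4, 12],
                   [12, -3, 4, 10, 12, 4, -3],
                   [10, 12, 4, 3, 7, 10, 12]])  # the test matrix from the question
rows, columns = vector.shape

# ravel() walks the matrix row by row, so the column index repeats every `columns` items;
# np.tile builds exactly that pattern of 1-based column numbers.
x = np.tile(np.arange(1, columns + 1), rows)
y = vector.ravel()

fig, ax = plt.subplots()
ax.scatter(x, y, marker="*", color="red")
ax.set_xticks(np.arange(1, columns + 1))  # one tick per column
ax.set_xlim(0.5, columns + 0.5)           # keep the outer columns away from the edges
ax.set_xlabel("Column #")
ax.set_ylabel("Value")
plt.show()
```

Because x and y are built in the same row-major order, every value ends up directly above its own column number, whatever the shape of the matrix.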

Set log xticks in matplotlib for a linear plot

Consider
import numpy as np
import matplotlib.pyplot as plt
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
Is there any way to display the xticks of the first plot (plt.hist) like the yticks of the second plot? For good reasons I want to histogram np.log10(xdata) rather than xdata itself, but I'd like the minor ticks to appear as they usually do on a log scale (even though the exponent axis itself is linear).
In other words, I want the x-axis of the first plot to look like the y-axis of the second plot, without changing the spacing between the major ticks (e.g., adding log-scale minor ticks between 5.5 and 6.0 without altering these values).
Proper histogram plot with a logarithmic x-axis:
Explanation:
- Cut off negative values
  - The randomly generated example data likely still contains some negative values (uncomment the lines at the beginning of the code to see them).
  - The logarithm isn't defined for values <= 0.
  - The 2nd plot only log-scales the y-axis, so negative values simply fall out of range; the 1st plot, however, doesn't work with negative values in the bin range.
  - Real-world data probably won't be <= 0; otherwise, keep this in mind.
- Align the bins to the log scale as well
  - Otherwise the bin widths look uneven on the log axis.
  - Swap which plt.hist( statement is commented out in the 1st plot section to see the effect.
- Plot xdata (not np.log10(xdata)) in the histogram
  - The workaround of plotting np.log10(xdata) was probably the root cause of the misunderstanding in the comments.
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
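The point about log-aligned bins can be checked numerically: edges from np.geomspace (equivalent to np.logspace on the log10 limits) have a constant width in log10 space, so every bin looks equally wide on the log axis. A sketch, assuming positive data is selected by boolean indexing instead of the masked array used above; note also that n_bins + 1 edges are needed for n_bins bins:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, 10_000)
xdata = xdata[xdata > 0]  # keep only values the logarithm can handle

n_bins = 100
# n_bins + 1 edges, evenly spaced in log10 space between the data min and max
edges = np.geomspace(xdata.min(), xdata.max(), n_bins + 1)
log_widths = np.diff(np.log10(edges))  # constant: equal bin widths on a log axis

fig, ax = plt.subplots()
counts, _, _ = ax.hist(xdata, bins=edges)
ax.set_xscale("log")
plt.show()
```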
The following part is kept only for clarification and will be deleted once the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned, this isn't the expected way to change axis ticks:
the x-axis ticks and labels don't represent the plotted data.
You should always state this alongside such a plot.
From seeing the result, I kind of understand where the idea for this special plot comes from; still, a preferred approach (e.g. converting the data in advance) should be used for such a plot instead of 'faking' the axis.
How the axis-transfer plot is done:
- The original x-axis is hidden.
- A twiny axis is added (its y-axis is hidden by default, so it doesn't need handling).
- The twiny x-axis is set to log scale, and the 2nd plot's y-axis limits are transferred to it (subplots are used so the limits can be transferred directly; use variables if you need to keep two separate plots).
- The twiny x-axis is moved from the top (its default position) to the bottom, where the original x-axis was.
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()
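If the goal is only the minor ticks described in the question (log-style marks between the unchanged major ticks of the log10 histogram), a lighter option than the twin axis may be to place minor ticks directly at log10(k * 10**n) for k = 2..9, which is where a log axis draws its minor ticks. This is a sketch of that alternative, not the answer's method; it uses np.random.default_rng instead of np.random.seed, so the random numbers differ from the answer's:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, 10_000)
logx = np.log10(xdata[xdata > 0])  # the histogram stays on the linear log10 axis

fig, ax = plt.subplots()
ax.hist(logx, bins=100)
lo, hi = ax.get_xlim()

# A log axis puts minor ticks at k * 10**n (k = 2..9); on a log10 axis they sit at log10(k * 10**n)
decades = np.arange(np.floor(lo), np.ceil(hi) + 1)
minor = np.log10(np.outer(10.0 ** decades, np.arange(2, 10)).ravel())
minor = minor[(minor >= lo) & (minor <= hi)]
ax.set_xticks(minor, minor=True)
ax.set_xlim(lo, hi)  # major ticks (e.g. 5.5, 6.0) are still chosen by the default locator
ax.set_xlabel("log10(xdata)")
plt.show()
```

Because only the minor locator is replaced, the major ticks and their spacing stay exactly as before, and the tick labels still show the log10 values that were actually plotted.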

How to plot small values in python chart?

I am using matplotlib.pyplot to draw a bar graph from csv files. The graph is drawn successfully.
I am plotting some values on X axis. However, when I have for example this data:
A = 10
B = 2,000,000
Bar A doesn't appear on the graph because its value is too small. I need bar A to be visible even though it is so small; what should I do?
Which method or setting should I change?
I used the following:
plt.minorticks_on()
plt.grid(axis='x')
plt.grid(which='minor',axis='x',linestyle=':',linewidth=0.6)
I looked at the previous question
How to draw bar charts for very small values in python or matplotlib?
but I cannot use:
plt.xscale("log")
because I want the x-axis to show time in milliseconds.
The resulting graph shows no visible bar for A because its value is too small.
This is the version with a log scale:
import numpy as np
import matplotlib.pyplot as plt
a = 2e6
b = 1e1
c = 2e3
data = [a,b,c]
y = np.arange(len(data))
fig, ax = plt.subplots()
ax.barh(y,data)
ax.set_xlabel('This is time in ms, still ms, regardless of log scale')
ax.xaxis.grid()
ax.set_yticks(y)
ax.set_yticklabels(['A', 'B', 'C'])
for i in range(len(data)):
    # write each value next to its bar
    ax.text(data[i], y[i]-0.1, f'{int(data[i])}', rotation=90)
ax.set_xscale('log')
Comment out the last line (ax.set_xscale('log')) to get a linear scale.
Without a log scale you will never be able to see the other two bars.
Forget 10; even 2000 will not be visible. 2e3:2e6 is 1:1000, like 1 cm in 10 m, and even 1 cm in 1 m would be barely visible.
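To put the 1:1000 argument into numbers, one can measure how many screen pixels each bar actually occupies on a linear axis by mapping its ends through ax.transData. A small sketch, assuming a 6 x 3 inch figure at 100 dpi (the figure size is an arbitrary choice here):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([2e6, 1e1, 2e3])  # a, b, c from the answer, in ms
fig, ax = plt.subplots(figsize=(6, 3), dpi=100)
bars = ax.barh(np.arange(len(data)), data)  # linear x-axis
fig.canvas.draw()  # finalize the limits so transData is up to date

widths_px = []
for rect, value in zip(bars, data):
    left, right = rect.get_x(), rect.get_x() + rect.get_width()
    # Map both bar ends from data coordinates to display (pixel) coordinates
    (px_left, _), (px_right, _) = ax.transData.transform([(left, 0), (right, 0)])
    widths_px.append(px_right - px_left)
    print(f"{value:>9.0f} ms -> {px_right - px_left:.4f} px wide")
```

On such a figure the 10 ms bar should come out at a tiny fraction of a pixel and the 2000 ms bar at under one pixel, which is why neither can be seen on a linear scale.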

setting spacing between grouped bar plots in matplotlib

I'm trying to make a grouped bar plot in matplotlib, following the example in the gallery. I use the following:
import matplotlib.pyplot as plt
from numpy import arange
plt.figure(figsize=(7,7), dpi=300)
xticks = [0.1, 1.1]
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
ind = arange(num_items)
width = 0.1
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    group_len = len(vals)
    gene_rects = plt.bar(ind, vals, width,
                         align="center")
    ind = ind + width
num_groups = len(group_labels)
# Make label centered with respect to group of bars
# Is there a less complicated way?
offset = (num_groups / 2.) * width
xticks = arange(num_groups) + offset
s.set_xticks(xticks)
print("xticks: ", xticks)
plt.xlim([0 - width, max(xticks) + (num_groups * width)])
s.set_xticklabels(group_labels)
My questions are:
How can I control the space between the groups of bars? Right now the spacing is huge and it looks silly. Note that I do not want to make the bars wider - I want them to have the same width, but be closer together.
How can I get the labels to be centered below the groups of bars? I tried to come up with some arithmetic calculations to position the xlabels in the right place (see code above) but it's still slightly off... it feels a bit like writing a plotting library rather than using one. How can this be fixed? (Is there a wrapper or built in utility for matplotlib where this is default behavior?)
EDIT: Reply to @mlgill: thank you for your answer. Your code is certainly much more elegant, but it still has the same issue: the width of the bars and the spacing between the groups are not controlled separately. Your graph looks correct, but the bars are far too wide (it looks like an Excel graph), and I wanted to make the bars thinner.
Width and margin are now linked, so if I try:
margin = 0.60
width = (1.-2.*margin)/num_items
It makes the bars skinnier, but pushes the groups far apart, so the plot again does not look right.
How can I make a grouped bar plot function that takes two parameters: the width of each bar, and the spacing between the bar groups, and plots it correctly like your code did, i.e. with the x-axis labels centered below the groups?
I think that since the user has to compute specific low-level layout quantities like margin and width, we are still basically writing a plotting library :)
Actually I think this problem is best solved by adjusting figsize and width; here is my output with figsize=(2,7) and width=0.3:
By the way, this type of thing becomes a lot simpler if you use the pandas wrappers (I've also imported seaborn; it's not necessary for the solution, but in my opinion it makes the plot a lot prettier and more modern looking):
import matplotlib.pyplot as plt
import pandas as pd
import seaborn
seaborn.set()
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
df = pd.DataFrame(groups, index=group_labels)
df.plot(kind='bar', legend=False, width=0.8, figsize=(2,5))
plt.show()
The trick to both of your questions is understanding that, when the group positions come from np.arange, consecutive groups are 1.0 apart on the x-axis, so each group (G1, G2) has a total width of 1.0 to fill, counting the margins on either side. Thus, it's probably easiest to set the margins first and then calculate the width of each bar from the number of bars per group. In your case, there are two bars per group.
If you left-align each bar (align="edge"; since Matplotlib 2.0 the default is "center") instead of center-aligning them as you had done, the groups span 0.0 to 1.0, 1.0 to 2.0, and so forth on the x-axis. Thus, the exact center of each group, which is where you want your labels to appear, is at 0.5, 1.5, etc.
I've cleaned up your code, as there were a lot of extraneous variables. See the comments within.
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
# This needs to be a numpy range for the xdata calculations
# to work.
ind = np.arange(num_items)
# With np.arange positions, each group has a total width of 1.0.
# Thus, you should make the sum of the two margins
# plus the sum of the widths of all bars in a group equal 1.0.
# One way of doing that is shown below. You can make
# the margins smaller if they're still too big.
margin = 0.05
width = (1.-2.*margin)/num_items
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    # The position of the xdata must be calculated for each of the two data series
    xdata = ind+margin+(num*width)
    # align="edge" left-aligns the bars (the default since Matplotlib 2.0 is "center"),
    # which is what this method of calculating positions assumes
    gene_rects = plt.bar(xdata, vals, width, align="edge")
# You should no longer need to manually set the plot limits since everything
# is scaled to one.
# Also the ticks are much simpler now that each group of bars extends from
# 0.0 to 1.0, 1.0 to 2.0, and so forth and, thus, is centered at 0.5, 1.5, etc.
s.set_xticks(ind+0.5)
s.set_xticklabels(group_labels)
I read an answer that Paul Ivanov posted on Nabble that might solve this problem with less complexity. Just set the index as below. This will increase the spacing between grouped columns.
ind = np.arange(0,12,2)
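The question's follow-up asks for a function that takes the bar width and the group spacing as two independent parameters. A possible sketch of that, not taken from either answer: the helper name grouped_bar and its parameters bar_width and group_gap are hypothetical. It places groups one "group span plus gap" apart and left-aligns each bar with align="edge", so each group's label position is simply its start plus half its span:

```python
import numpy as np
import matplotlib.pyplot as plt


def grouped_bar(ax, groups, group_labels, bar_width=0.1, group_gap=0.2):
    """Hypothetical helper: bars of fixed width, groups separated by a fixed gap.

    groups[s][g] is the value of series s in group g (the layout used in the question).
    Returns the x position of each group's center.
    """
    n_series = len(groups)
    n_groups = len(group_labels)
    group_span = n_series * bar_width        # width occupied by one group's bars
    step = group_span + group_gap            # distance between neighbouring groups
    starts = np.arange(n_groups) * step      # left edge of each group
    for s, vals in enumerate(groups):
        ax.bar(starts + s * bar_width, vals, width=bar_width, align="edge")
    centers = starts + group_span / 2
    ax.set_xticks(centers)
    ax.set_xticklabels(group_labels)         # labels centered below each group
    return centers


groups = [[1.04, 0.96], [1.69, 4.02]]
fig, ax = plt.subplots(figsize=(4, 4))
centers = grouped_bar(ax, groups, ["G1", "G2"], bar_width=0.1, group_gap=0.15)
plt.show()
```

Changing bar_width only makes the bars thinner or wider, and changing group_gap only moves the groups closer together or further apart; the labels stay centered in both cases.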

blank space in the top of the plot matplotlib django

I have a question about matplotlib bar charts.
I've already made some bar charts, but for some reason this one leaves a huge blank space at the top.
The code is similar to other graphs I've made, and they don't have this problem.
If anyone has any idea, I'd appreciate the help.
import numpy as np
x = np.arange(0, max(total))
ind = np.arange(len(age_list))
ax.barh(ind, total)
ax.set_yticks(ind)
ax.set_yticklabels(age_list)
By "blank space in the top" do you mean that the y-limits are set too large?
By default, older versions of matplotlib chose the x and y axis limits so that they were rounded to the nearest "round" number (e.g. 1, 2, 12, 5, 50, -0.5, etc.); since matplotlib 2.0 the default is instead a small (5%) margin around the data.
If you want the axis limits to be "tight" around the plot (i.e. the min and max of the data), use ax.axis('tight') (or equivalently plt.axis('tight'), which uses the current axes).
Another very useful method is plt.margins(...)/ax.margins(). It acts similarly to axis('tight'), but leaves a bit of padding around the limits.
As an example of your problem:
import numpy as np
import matplotlib.pyplot as plt
# Make some data...
age_list = range(10,31)
total = np.random.random(len(age_list))
ind = np.arange(len(age_list))
plt.barh(ind, total)
# Since matplotlib 2.0, barh centers each bar on its y position by default
# (align="center"), so the ticks can go directly at `ind`.
# (With the old default align="edge" and height 0.8, you had to add 0.4.)
plt.yticks(ind, age_list)
plt.show()
If I wanted the limits to be tighter, I could call plt.axis('tight') after the call to plt.barh.
However, you might not want things to be too tight, so you could use plt.margins(0.05) to add 5% padding in all directions. You can then set the left-hand limit back to 0 with plt.xlim(left=0):
import numpy as np
import matplotlib.pyplot as plt
# Make some data...
age_list = range(10,31)
total = np.random.random(len(age_list))
ind = np.arange(len(age_list))
height = 0.8
plt.barh(ind, total, height=height)
plt.yticks(ind, age_list)  # bars are centered on ind by default
plt.margins(0.05)
plt.xlim(left=0)
plt.show()
This produces a somewhat nicer plot.
Hopefully that points you in the right direction, at any rate!
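With the current default of center-aligned bars, each bar of height 0.8 spans ind - 0.4 to ind + 0.4, so setting the y-limits explicitly to half a unit beyond the first and last index removes the blank space while leaving 0.1 of padding. A sketch using the object-oriented API; the data is random placeholder data as in the answer:

```python
import numpy as np
import matplotlib.pyplot as plt

age_list = list(range(10, 31))
total = np.random.default_rng(0).random(len(age_list))  # placeholder data
ind = np.arange(len(age_list))

fig, ax = plt.subplots()
ax.barh(ind, total, height=0.8)  # bars are centered on ind (default align="center")
ax.set_yticks(ind)
ax.set_yticklabels([str(a) for a in age_list])
# Each bar spans ind - 0.4 .. ind + 0.4, so these limits leave exactly 0.1 of padding
ax.set_ylim(ind[0] - 0.5, ind[-1] + 0.5)
ax.set_xlim(left=0)
plt.show()
```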
