I am a newbie to Python but slowly getting there. I am having a problem trying to increase the number of grid lines on a graph. Basically, the graph is labelled 0-24 (hours), but the x axis only generates a label every 5 hours (0, 5, 10, 15, 20), with a grid line at each of those major ticks. Ideally, I would like a grid line every hour, as I am collecting real-time data.
Most of this code has been lifted from various sources, but the one thing that has stumped me is how to configure the grid.
Edit: as requested, my simplified code is below.
import numpy as np
import matplotlib.pyplot as plt
import time
timedata=[0.01,1.1,2.2,3.3,4.4,5.55,6.6,7.7,8.8,9.1,10.2,11.2,12.2,13.2,14.1,15.2,16.1,17.2,18.1,19.2,20.1,21.1,22.2,23.1]
#timedata is in decimal hours
bxdata=[10,10,20,20,20,30,30,30,40,40,40,30,30,30,20,20,30,30,20,20,40,50,30,24]
bydata=[20,10,20,30,20,30,30,30,5,40,40,30,5,30,20,20,30,35,20,20,5,50,30,24]
#draw the graph
fig, ax = plt.subplots(sharex=True, figsize=(12, 6))
x=np.arange(0,24,1)
ax.plot(timedata,bxdata, color='red', label='Bx',lw=1)
ax.plot (timedata, bydata, color='blue', label = 'By',lw=1)
ax.set_xlim(0,24)
ax.set_ylim(-250,250)
plt.ion()
plt.xlabel("Time (Hours)")
plt.ylabel("nT")
plt.grid(True, which='both')
plt.legend()
plt.show()
image = "test.png"
time.sleep(2)
plt.savefig(image)
plt.close('all')
and this is the graph that I get.
The idea is to attach a locator to the minor ticks of the x axis. The locator you need is MultipleLocator, and we also use it to fix the spacing of the major ticks (for hours, 6 is better than 5, isn't it?).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
y = np.random.rand(25)
plt.plot(y)
plt.gca().xaxis.set_major_locator(MultipleLocator(6))
plt.gca().xaxis.set_minor_locator(MultipleLocator(1))
plt.grid()
plt.grid(True, 'minor', color='#ddddee') # use a lighter color
plt.show()
If you set the x-axis ticks to any desired interval, the grid lines are drawn at those ticks automatically. Your code mixes the object-oriented interface (ax.…) with the pyplot interface (plt.…), so I have rewritten it to use the object-oriented interface throughout.
import numpy as np
import matplotlib.pyplot as plt
import time
timedata=[0.01,1.1,2.2,3.3,4.4,5.55,6.6,7.7,8.8,9.1,10.2,11.2,12.2,13.2,14.1,15.2,16.1,17.2,18.1,19.2,20.1,21.1,22.2,23.1]
#timedata is in decimal hours
bxdata=[10,10,20,20,20,30,30,30,40,40,40,30,30,30,20,20,30,30,20,20,40,50,30,24]
bydata=[20,10,20,30,20,30,30,30,5,40,40,30,5,30,20,20,30,35,20,20,5,50,30,24]
#draw the graph
fig, ax = plt.subplots(sharex=True, figsize=(12, 6))
x=np.arange(0,24,1)
ax.plot(timedata,bxdata, color='red', label='Bx',lw=1)
ax.plot(timedata, bydata, color='blue', label='By',lw=1)
ax.set_xlim(0,24)
ax.set_ylim(-250,250)
# plt.ion()
ax.set_xticks(np.arange(0,25,1))  # stop at 25 so the last tick lands on 24
ax.set_xlabel("Time (Hours)")
ax.set_ylabel("nT")
ax.grid(True, which='both')
ax.legend()
# image = "test.png"
# time.sleep(2)
# plt.savefig(image)
# plt.close('all')
plt.show()
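The two answers above can be combined: a minimal sketch, assuming you want labels only every 6 hours but a grid line every hour, written with the object-oriented interface. The `hours` and `bx` values are made-up stand-ins for the real-time data, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Hypothetical readings in decimal hours, standing in for the real data
hours = [0.5, 3.2, 7.8, 12.1, 16.4, 20.9, 23.5]
bx = [10, 20, 30, 40, 30, 20, 24]

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(hours, bx, color="red", label="Bx", lw=1)
ax.set_xlim(0, 24)

# Labelled major ticks every 6 hours, unlabelled minor ticks every hour
ax.xaxis.set_major_locator(MultipleLocator(6))
ax.xaxis.set_minor_locator(MultipleLocator(1))

# Draw both grids, with the hourly (minor) one in a lighter colour
ax.grid(True, which="major")
ax.grid(True, which="minor", color="#ddddee")
ax.set_xlabel("Time (Hours)")
ax.legend()

# Tick positions the two locators produce (may extend slightly past the view limits)
major_ticks = ax.xaxis.get_majorticklocs()
minor_ticks = ax.xaxis.get_minorticklocs()
# Call plt.show() or fig.savefig("test.png") here to view the result
```

Keeping the labels sparse and the grid dense tends to stay readable on a 24-hour axis, but the spacing values are a matter of taste.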
I'm learning Python using Jupyter and I'm struggling to put my graphs into one figure. Here's what I have so far...
Code for my graphs (I have three graphs; they differ only in color and in lines vs. dots):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
one = plt.figure()
plt.plot(x_v, y_v, '#008000') #change color using hex strings
plt.xlabel('x')
plt.ylabel('y')
plt.show()
two = plt.figure()
plt.plot(x_v, y_v, linestyle='none', marker='o', markersize=0.5)
plt.show()
three = plt.figure()
plt.plot(x_v, y_v, linestyle='none', marker='o', markersize=0.5, color = 'yellow')
plt.show()
Here's the code that I have so far to make it one figure... I was wondering if I should put it in an np.arange and plot it, but I can't seem to get it to work...
def f(x):
    return one
def g(x):
    return two
def h(x):
    return three
If anyone can help, it'll be of great use! Thank you!
You can use plt.subplots:
fig, (ax1, ax2, ax3) = plt.subplots(figsize=(15, 5), ncols=3)
ax1.plot(x_v, y_v, '#008000')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax2.plot(x_v, y_v, linestyle='none', marker='o', markersize=0.5)
ax3.plot(x_v, y_v, linestyle='none', marker='o', markersize=0.5, color = 'yellow')
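Since the question does not show how `x_v` and `y_v` are built, here is a self-contained sketch of the same idea with made-up data (the sine curve is purely an assumption for illustration). The `Agg` backend line just lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for x_v and y_v, which the question does not define
x_v = np.linspace(0, 10, 200)
y_v = np.sin(x_v)

# One figure, three axes side by side
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 5))
ax1.plot(x_v, y_v, "#008000")
ax2.plot(x_v, y_v, linestyle="none", marker="o", markersize=0.5)
ax3.plot(x_v, y_v, linestyle="none", marker="o", markersize=0.5, color="yellow")

# Label every panel, not only the first one
for ax in (ax1, ax2, ax3):
    ax.set_xlabel("x")
    ax.set_ylabel("y")

fig.tight_layout()
```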
Here is one way to approach multiple plots with plt.subplots. I think it is very easy to follow and also gives a lot of control over individual plots:
import numpy as np
import matplotlib.pyplot as plt
#generating test data
x = np.arange(0,9)
y = np.arange(1,10)
#defining figure layout (i.e. rows, columns, size, horizontal and vertical space between subplots)
fig,ax = plt.subplots(nrows=2,ncols=2,figsize=(15,7))
plt.subplots_adjust(hspace=0.4,wspace=0.2)
#first subplot (numbering can be read as 1st plot in a grid of 2x2)
plt.subplot(2,2,1)
plt.plot(x,y)
#second subplot in a grid of 2x2
plt.subplot(2,2,2)
plt.plot(x,y,ls='--')
#third subplot in a grid of 2x2
plt.subplot(2,2,3)
plt.scatter(x,y)
#fourth subplot in a grid of 2x2
plt.subplot(2,2,4)
plt.plot(x,y)
plt.tight_layout()
plt.show()
Output:
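The same 2x2 layout can also be driven through the array of axes that plt.subplots returns, instead of calling plt.subplot again for each panel. This is only a sketch of that variant, using the same test data; the name `axes` is my choice, and the `Agg` line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Same test data as above
x = np.arange(0, 9)
y = np.arange(1, 10)

# axes is a 2x2 NumPy array of Axes objects, indexed as [row, column]
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 7))
axes[0, 0].plot(x, y)
axes[0, 1].plot(x, y, ls="--")
axes[1, 0].scatter(x, y)
axes[1, 1].plot(x, y)

fig.tight_layout()
```

Indexing the array keeps each call tied to an explicit panel, which may be easier to follow than relying on the "current axes" that plt.subplot switches.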
I want to use a scatter plot to describe the relationship between X, Y and Z. Z is p-value so it is better to denote it as log values.
Following the instructions here, I can plot a logarithmic scatter plot, but the color bar seems wrong. The color bar is almost totally blue, but there should be some red! Below are the figure and my code.
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import LogNorm
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.set_title("P-value")
Z1 = pos_spearmanr['pval']
X = pos_spearmanr['X']
Y = pos_spearmanr['Y']
im = ax1.scatter(X,
Y,
edgecolors=None,
c=Z1,
norm=LogNorm(),
cmap=plt.get_cmap('bwr'), alpha=0.2)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)
cbar = fig.colorbar(im,ax=ax1)
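One thing that may help when debugging a plot like this is to give LogNorm explicit limits and evaluate it directly, to see where a given p-value lands on the color scale. This is a sketch with made-up data, not a confirmed diagnosis of the problem: `pos_spearmanr` is replaced by random values, and the limits 1e-6 and 1 are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Hypothetical data standing in for pos_spearmanr
rng = np.random.default_rng(0)
X = rng.random(200)
Y = rng.random(200)
Z1 = 10.0 ** rng.uniform(-6, 0, 200)  # fake p-values between 1e-6 and 1

# Explicit limits instead of letting LogNorm autoscale to the data
norm = LogNorm(vmin=1e-6, vmax=1.0)

fig, ax1 = plt.subplots()
im = ax1.scatter(X, Y, c=Z1, norm=norm, cmap="bwr", alpha=0.2)
cbar = fig.colorbar(im, ax=ax1)

# A norm maps data values onto [0, 1]; 0 is the blue end of 'bwr', 1 the red end
mid = float(norm(1e-3))
```

With these limits, 1e-3 sits halfway along the scale because it is halfway between 1e-6 and 1 in log terms. Note also that the low `alpha` is applied to the scatter mapping, which can make the color bar look washed out.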
I would like to plot two lists of values, say x = [0, 10, 20, 50, 100] and y = [1, 2, 3, 10, 100], using pylab. I want to keep the x axis on a log scale, but I want the ticks to sit at the values of x, i.e. at 10, 20, 50 and 100, and to be printed as plain numbers, not in the form 10e1 or 10e2. I am doing it as follows:
import matplotlib.pylab as plt
plt.xscale('log')
plt.plot(x, y)
plt.xticks(x)
plt.grid()
But it keeps the values in the form of 10e1, 10e2.
Could you please help me out?
I think what you want is to change the major_formatter of the x axis?
import matplotlib.pylab as plt
import numpy as np
from matplotlib.ticker import ScalarFormatter
x = [0, 10,20,50,100]
y=[1,2,3,10,100]
plt.plot(x, y)
plt.xscale('log')
plt.grid()
ax = plt.gca()
ax.set_xticks(x[1:]) # note that with a log axis, you can't have x = 0 so that value isn't plotted.
ax.xaxis.set_major_formatter(ScalarFormatter())
plt.show()
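Here is the same approach as a sketch in the object-oriented interface. The NullFormatter line is my addition, not part of the answer above: it hides any minor tick labels that would otherwise still appear in power-of-ten notation. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter, ScalarFormatter

# x = 0 is dropped because it has no position on a log axis
x = [10, 20, 50, 100]
y = [2, 3, 10, 100]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xscale("log")

# Put major ticks exactly at the data values and print them as plain numbers
ax.set_xticks(x)
ax.xaxis.set_major_formatter(ScalarFormatter())
ax.xaxis.set_minor_formatter(NullFormatter())  # assumption: minor labels are unwanted
ax.grid(True)

# Force a draw so the tick label text is populated
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Setting the scale before the formatter matters, because changing the scale installs its own default formatters.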
The following
import matplotlib.pyplot as plt
x = [0,10,20,50,100]
y = [1,2,3,10,100]
f,ax = plt.subplots()
ax.plot(x,y)
ax.set_xscale('log')
ax.set_xticks(x)
ax.set_xticklabels(x)
ax.set_xlim([0,100])
will produce
I created a histogram plot using data from a file with no problem. Now I want to superpose data from another file on the same histogram, so I do something like this:
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that, for each interval, only the bar with the highest value appears and the other is hidden. How can I plot both histograms at the same time with different colors?
Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
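Applied to the `ax.hist` calls from the question, the same idea might look like the sketch below. The random samples stand in for the two files, and computing the bin edges with `np.histogram_bin_edges` over both samples is my assumption about how to pick a common range. The `Agg` backend line is only there so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the data read from the two files
rng = np.random.default_rng(seed=1)
mydata1 = rng.normal(3, 1, 400)
mydata2 = rng.normal(4, 2, 400)

# Compute the bin edges once, over both samples, so the bars line up
bins = np.histogram_bin_edges(np.concatenate([mydata1, mydata2]), bins=50)

fig, ax = plt.subplots()
# alpha < 1 keeps the histogram drawn first visible behind the second one
n1, _, _ = ax.hist(mydata1, bins=bins, alpha=0.5, color="tab:blue", label="mydata1")
n2, _, _ = ax.hist(mydata2, bins=bins, alpha=0.5, color="tab:orange", label="mydata2")
ax.legend(loc="upper right")
```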
The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars side by side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
# Note: on matplotlib >= 3.6 this style is named 'seaborn-v0_8-deep'
plt.style.use('seaborn-deep')
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by #stochastic_zeitgeist
In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
n, bins, patches = ax1.hist([y1, y2], color=colors)  # only the counts and bin edges are needed
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
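The difference between the two normalisations can also be checked numerically. In this sketch (seeded data and bin edges spanning both samples are my assumptions, so that no point falls outside the bins), the weighted version makes the bar heights of each data set sum to 1, while density=True makes the area of each data set equal to 1. The `Agg` backend line is only there so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
x = rng.normal(1, 2, 5000)
y = rng.normal(-1, 3, 2000)

# Bin edges covering both samples, so every value is counted
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=30)

# Each value contributes 1/N, so the bar heights of a data set sum to 1
weights = [np.full(x.shape, 1 / x.size), np.full(y.shape, 1 / y.size)]

fig, (ax_w, ax_d) = plt.subplots(ncols=2, figsize=(10, 4))
n_w, _, _ = ax_w.hist([x, y], bins, weights=weights, label=["x", "y"])
n_d, _, _ = ax_d.hist([x, y], bins, density=True, label=["x", "y"])
ax_w.set_title("weights = 1/N")
ax_d.set_title("density=True")

# Width of each bin, needed to turn density heights into areas
bin_width = np.diff(bins)
```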
You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
# density replaces the normed argument, which was removed in matplotlib 3.1
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
import matplotlib.pyplot as plt

def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g', 'r'], alpha=0.8, bins=50)
    plt.show()
Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.
It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.
There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
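For a 2-d array, the transpose does the same job as np.swapaxes. A small sketch, with seeded random data as an assumption and the `Agg` backend line only there so it runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it for interactive use
import matplotlib.pyplot as plt

# Two data sets stored as rows: shape (2, 300)
data = np.random.default_rng(0).normal(size=(2, 300))

# hist treats each COLUMN of a 2-d array as one data set, so transpose first
swapped = data.T  # shape (300, 2)

fig, ax = plt.subplots()
n, bins, _ = ax.hist(swapped, bins=30, label=["x", "y"])
ax.legend()
```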
Another option, quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:
Just in case you have pandas (import pandas as pd) or are ok with using it:
import random
import matplotlib.pyplot as plt
import pandas as pd
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()
This question has been answered before, but wanted to add another quick/easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.
Inspired by Solomon's answer, but sticking with the question, which is about histograms, a clean solution is:
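# Note: distplot is deprecated since seaborn 0.11; sns.histplot(..., kde=True) is the suggested replacement
sns.distplot(bar)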
sns.distplot(foo)
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.