I want to make a bar chart with values binned every 10 units along x. Here is my bins array: bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]. I do not want to use a histogram because I have specific y values that I want to use, not just the frequency of the binned values. I have a pandas dataframe with two columns: yardline_100 (these are the values being binned; they always fall between 0 and 100) and epa. I want yardline_100 on the x axis and epa on the y axis. How do I do this? plt.hist() only takes one argument for data, and I can't figure out how to make plt.bar() work with binned values. Advice?
IIUC, do you want something like this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'yardline_100':np.random.randint(0,100,200), 'epa':np.random.random(200)})
df['bin'] = pd.cut(df['yardline_100'], bins=range(0,101,10), labels=[f'{l}-{l+10}' for l in range(0,91,10)])
fig,ax = plt.subplots(2,2, figsize=(15,8))
ax=ax.flatten()
sns.stripplot(x='bin', y='epa', data=df, ax=ax[0])
sns.violinplot(x='bin', y='epa', data=df, ax=ax[1])
sns.boxplot(x='bin', y='epa', data=df, ax=ax[2])
sns.barplot(x='bin', y='epa', data=df, ax=ax[3])
Output:
Change bar width
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'yardline_100':np.random.randint(0,100,200), 'epa':np.random.random(200)})
df['bin'] = pd.cut(df['yardline_100'], bins=range(0,101,10), labels=[f'{l}-{l+10}' for l in range(0,91,10)])
fig,ax = plt.subplots(figsize=(15,8))
sns.barplot(x='bin', y='epa', data=df, ax=ax)
def change_width(ax, new_value):
    for patch in ax.patches:
        current_width = patch.get_width()
        diff = current_width - new_value
        # we change the bar width
        patch.set_width(new_value)
        # we recenter the bar
        patch.set_x(patch.get_x() + diff * .5)

change_width(ax, 1.)
Output:
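If you would rather stay with plain matplotlib, one possible sketch is to aggregate epa per bin yourself and pass the result to a bar call. This assumes the mean is the statistic you want per bin (swap in sum or median as needed); the random data below is only a stand-in for the real dataframe.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data (assumption): same column names as in the question
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "yardline_100": rng.integers(0, 100, 200),
    "epa": rng.random(200),
})

bins = np.arange(0, 101, 10)
# include_lowest=True keeps a value of exactly 0 in the first bin
df["bin"] = pd.cut(df["yardline_100"], bins=bins, include_lowest=True)
# One aggregated y value per bin; observed=False keeps empty bins in the result
mean_epa = df.groupby("bin", observed=False)["epa"].mean()

fig, ax = plt.subplots(figsize=(10, 5))
# align="edge" with width=10 makes each bar span its whole bin
ax.bar(bins[:-1], mean_epa.to_numpy(), width=10, align="edge", edgecolor="black")
ax.set_xticks(bins)
ax.set_xlabel("yardline_100")
ax.set_ylabel("mean epa")
```

Outside a notebook, finish with plt.show() to display the figure.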
I want to create a heatmap out of three one-dimensional arrays. Something that looks like this:
Up to this point, I was only able to create a scatter plot where the markers have a different color and marker size depending on the third value:
My code:
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5*np.random.rand(1000)
ms1 = (zf).astype('int')
from matplotlib.colors import LinearSegmentedColormap
# Remove the middle 40% of the RdBu_r colormap
interval = np.hstack([np.linspace(0, 0.4), np.linspace(0.6, 1)])
colors = plt.cm.RdBu_r(interval)
cmap = LinearSegmentedColormap.from_list('name', colors)
col = cmap(np.linspace(0,1,len(ms1)))
#for i in range(len(ms1)):
plt.scatter(xf, yf, c=zf, s=5*ms1/1e4, cmap=cmap,alpha=0.8)#, norm =matplotlib.colors.LogNorm())
ax1 =plt.colorbar(pad=0.01)
is giving me this result:
Any idea how I could make it look like the first figure?
Essentially what I want to do is find the average of the z value for groups of the x and y arrays
I think the functionality you are looking for is provided by scipy.stats.binned_statistic_2d. You can use it to organize values of xf and yf arrays into 2-dimensional bins, and compute the mean of zf values in each bin:
import numpy as np
from scipy import stats
np.random.seed(0)
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5 * np.random.rand(1000)
means = stats.binned_statistic_2d(xf,
yf,
values=zf,
statistic='mean',
bins=(5, 5))[0]
Then you can use e.g. seaborn to plot a heatmap of the array of mean values:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(means,
cmap="Reds_r",
annot=True,
annot_kws={"fontsize": 16},
cbar=True,
linewidth=2,
square=True)
plt.show()
This gives:
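If SciPy is not available, roughly the same bin means can be sketched with NumPy alone: histogram the points once unweighted (counts) and once weighted by zf (sums), then divide. This is an alternative technique, not what binned_statistic_2d does internally, and the 5x5 bin count is just carried over from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
xf = rng.random(1000)
yf = rng.random(1000)
zf = 1e5 * rng.random(1000)

# Number of points falling in each (x, y) bin
counts, xedges, yedges = np.histogram2d(xf, yf, bins=(5, 5))
# Sum of zf in each bin, reusing the same edges so the bins line up
sums, _, _ = np.histogram2d(xf, yf, bins=(xedges, yedges), weights=zf)

# Mean per bin; bins with no points become NaN instead of raising a warning
with np.errstate(invalid="ignore", divide="ignore"):
    means = sums / counts
```

The resulting means array is indexed as means[x_bin, y_bin], so it can be passed to sns.heatmap in the same way as before.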
I want to arrange 5 histograms in a grid. Here is my code and the result:
I was able to create the graphs, but I am having difficulty arranging them in a grid. I used GridSpec to set up the grid, but I still need to link each graph to its place in it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
Openness = df['O']
Conscientiousness = df['C']
Extraversion = df['E']
Areeableness = df['A']
Neurocitism = df['N']
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
# Plot 1
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df['O'], bins = 100)
plt.title("Openness to experience")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 2
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df['C'], bins = 100)
plt.title("Conscientiousness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 3
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df['E'], bins = 100)
plt.title("Extraversion")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 4
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df['A'], bins = 100)
plt.title("Agreeableness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 5
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df['N'], bins = 100)
plt.title("Neuroticism")
plt.xlabel("Value")
plt.ylabel("Frequency")
Results merge everything into one chart
But it should look like this
Could you guys please help me out?
You can use plt.subplots:
fig, axes = plt.subplots(nrows=2, ncols=2)
This creates a 2x2 grid. You can access individual positions by indexing the axes object:
top left:
ax = axes[0,0]
ax.hist(df['C'], bins = 100)
ax.set_title("Conscientiousness")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
and so on.
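For all five histograms, a loop over a 2x3 grid avoids repeating the same four lines per plot. The sketch below uses random placeholder data because the original df is not shown; the column names O, C, E, A, N are taken from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data (assumption): the real df with columns O, C, E, A, N is not shown
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(3, 1, size=(500, 5)), columns=list("OCEAN"))

titles = {
    "O": "Openness to experience",
    "C": "Conscientiousness",
    "E": "Extraversion",
    "A": "Agreeableness",
    "N": "Neuroticism",
}

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 6))
# axes.flat walks the grid row by row, so each column gets the next free slot
for ax, (column, title) in zip(axes.flat, titles.items()):
    ax.hist(df[column], bins=100)
    ax.set_title(title)
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")

# Five plots in six slots: hide the unused bottom-right axes
axes.flat[-1].set_visible(False)
fig.tight_layout()
```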
You can also continue to use GridSpec. See https://matplotlib.org/stable/tutorials/intermediate/gridspec.html
For example:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig2 = plt.figure(constrained_layout=True)
spec2 = gridspec.GridSpec(ncols=2, nrows=3, figure=fig2)
f2_ax1 = fig2.add_subplot(spec2[0, 0])
f2_ax2 = fig2.add_subplot(spec2[0, 1])
f2_ax3 = fig2.add_subplot(spec2[1, 0])
f2_ax4 = fig2.add_subplot(spec2[1, 1])
f2_ax5 = fig2.add_subplot(spec2[2, 1])
# Plot 1
f2_ax1.hist(df['O'])
f2_ax1.set_title("Openness to experience")
f2_ax1.set_xlabel("Value")
f2_ax1.set_ylabel("Frequency")
plt.show()
I have two values:
test1 = 0.75565
test2 = 0.77615
I am trying to plot a bar chart (using matplotlib in a Jupyter notebook) with the two test names on the x-axis and their values on the y-axis, but I keep getting a crazy plot with just one big box.
here is the code I've tried:
plt.bar(test1, 1, width = 2, label = 'test1')
plt.bar(test2, 1, width = 2, label = 'test2')
You should define X and Y in two separate arrays, so you can do it like this:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(2)
y = [0.75565,0.77615]
fig, ax = plt.subplots()
plt.bar(x, y)
# set your labels for the x axis here :
plt.xticks(x, ('test1', 'test2'))
plt.show()
the final plot would be like :
UPDATE
If you want to draw each bar with a different color, one option is to call the bar method once per bar. Each call then takes the next color from the default color cycle:
import matplotlib.pyplot as plt
import numpy as np
number_of_points = 2
x = np.arange(number_of_points)
y = [0.75565,0.77615]
fig, ax = plt.subplots()
for i in range(number_of_points):
    plt.bar(x[i], y[i])
# set your labels for the x axis here :
plt.xticks(x, ('test1', 'test2'))
plt.show()
Or, even better, you can choose the colors yourself:
import matplotlib.pyplot as plt
import numpy as np
number_of_points = 2
x = np.arange(number_of_points)
y = [0.75565,0.77615]
# choosing the colors and keeping them in a list
colors = ['g','b']
fig, ax = plt.subplots()
for i in range(number_of_points):
    plt.bar(x[i], y[i], color=colors[i])
# set your labels for the x axis here :
plt.xticks(x, ('test1', 'test2'))
plt.show()
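The loop is not strictly required: bar also accepts a sequence of colors, one per bar, so a single call can do the same job. A minimal sketch using the same values and the Axes interface (tick_label is used here in place of a separate xticks call):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(2)
y = [0.75565, 0.77615]
colors = ["g", "b"]

fig, ax = plt.subplots()
# color takes one entry per bar; tick_label sets the x-axis labels in the same call
bars = ax.bar(x, y, color=colors, tick_label=["test1", "test2"])
```

Outside a notebook, finish with plt.show() to display the figure.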
The main reason your plot is showing one large value is because you are setting a width for the columns that is greater than the distance between the explicit x values that you have set. Reduce the width to see the individual columns. The only advantage to doing it this way is if you need to set the x values (and y values) explicitly for some reason on a bar chart. Otherwise, the other answer is what you need for a "traditional bar chart".
import matplotlib.pyplot as plt
test1 = 0.75565
test2 = 0.77615
plt.bar(test1, 1, width = 0.01, label = 'test1')
plt.bar(test2, 1, width = 0.01, label = 'test2')
I am plotting a scatter chart with the code below:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
dates = ['2015-12-20','2015-09-12','2015-08-12','2015-06-12']
PM_25 = [68, 66, 55, 46]
dates = [pd.to_datetime(d) for d in dates]
plt.scatter(dates, PM_25, s =50, c = 'red')
plt.show()
I want to label each point in the scatter plot with its date. So I made these changes:
fig, ax = plt.subplots()
ax.scatter(dates, PM_25)
for i, txt in enumerate(dates):
    ax.annotate(txt, i)
It doesn't work.
What's the right way to label them? Thank you.
You need both x and y when you annotate.
for i, txt in enumerate(dates):
    ax.annotate(txt, (dates[i], PM_25[i]))
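Put together as a complete script, this might look like the sketch below. Formatting each label with strftime and nudging it a few points away from the marker are optional additions, not part of the fix itself.

```python
import matplotlib.pyplot as plt
import pandas as pd

date_strings = ['2015-12-20', '2015-09-12', '2015-08-12', '2015-06-12']
PM_25 = [68, 66, 55, 46]
dates = [pd.to_datetime(d) for d in date_strings]

fig, ax = plt.subplots()
ax.scatter(dates, PM_25, s=50, c='red')

for date, value in zip(dates, PM_25):
    ax.annotate(
        date.strftime('%Y-%m-%d'),   # label text: date only, without the time part
        (date, value),               # xy: the data point being labelled
        textcoords='offset points',  # interpret xytext as an offset in points
        xytext=(5, 5),               # shift the label up and to the right of the marker
    )
```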
I need to display values of my matrix using matshow.
However, with the code I have now I just get two matrices - one with values and other colored.
How do I overlay them? Thanks :)
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
min_val, max_val = 0, 15
for i in range(15):
    for j in range(15):
        c = intersection_matrix[i][j]
        ax.text(i+0.5, j+0.5, str(c), va='center', ha='center')
plt.matshow(intersection_matrix, cmap=plt.cm.Blues)
ax.set_xlim(min_val, max_val)
ax.set_ylim(min_val, max_val)
ax.set_xticks(np.arange(max_val))
ax.set_yticks(np.arange(max_val))
ax.grid()
Output:
You need to use ax.matshow not plt.matshow to make sure they both appear on the same axes.
If you do that, you also don't need to set the axes limits or ticks.
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
min_val, max_val = 0, 15
intersection_matrix = np.random.randint(0, 10, size=(max_val, max_val))
ax.matshow(intersection_matrix, cmap=plt.cm.Blues)
for i in range(15):
    for j in range(15):
        c = intersection_matrix[j,i]
        ax.text(i, j, str(c), va='center', ha='center')
Here I have created some random data as I don't have your matrix. Note that I had to change the ordering of the index for the text label to [j,i] rather than [i][j] to align the labels correctly.
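One way to avoid mixing up the index order is to iterate with np.ndenumerate, which yields each (row, column) index together with its value. This is a sketch of the same idea on a smaller random matrix; the 5x5 size is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
matrix = rng.integers(0, 10, size=(5, 5))  # random stand-in for intersection_matrix

fig, ax = plt.subplots()
ax.matshow(matrix, cmap=plt.cm.Blues)

# matshow draws matrix[row, col] at x=col, y=row, so the text position is (col, row)
for (row, col), value in np.ndenumerate(matrix):
    ax.text(col, row, str(value), va='center', ha='center')
```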
In Jupyter notebooks this is also possible with DataFrames and Seaborn:
import numpy as np
import seaborn as sns
import pandas as pd
min_val, max_val = 0, 15
intersection_matrix = np.random.randint(0, 10, size=(max_val, max_val))
cm = sns.light_palette("blue", as_cmap=True)
x=pd.DataFrame(intersection_matrix)
x=x.style.background_gradient(cmap=cm)
display(x)