How to plot a histogram with matplotlib using Python [duplicate] - python

This question already has answers here:
Histogram Matplotlib
(7 answers)
Closed 7 years ago.
I have 2 lists:
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Y = [0.5717, 0.699, 0.7243, 0.5939, 0.5383, 0.5093, 0.7001, 0.589, 0.6486, 0.7152, 0.6805, 0.5688, 0.6133, 0.6041, 0.5676]
import matplotlib.pyplot as plt
plt.xlabel('X')
plt.ylabel('Y')
plt.title("Histogram")
# bin edges 0, 1, ..., 14 (the value 15 falls outside every bin)
xbins = [x for x in range(len(X))]
# hist() counts how many X values fall into each bin; Y is never used
plt.hist(X, xbins, color='green', alpha=0.6)
plt.show()
plt.close()
When I do it like this, I don't get the correct plot. How can I plot a histogram of this data in Python?

I'm not sure if I understand your question, but I'll give it a shot:
import matplotlib.pyplot as plt
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Y = [0.5717, 0.699, 0.7243, 0.5939, 0.5383, 0.5093, 0.7001, 0.589, 0.6486, 0.7152, 0.6805, 0.5688, 0.6133, 0.6041, 0.5676]
plt.bar(X, Y, color='green', alpha=0.6, align='center')
plt.xlabel('X')
plt.ylabel('Y')
plt.title("Histogram")
plt.show()
Is that the plot you are looking for? If not, please provide more details.
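For comparison, a minimal sketch of why plt.hist did not give this: hist() counts how many X values land in each bin, so Y only enters if it is passed as weights. Giving each X value its own bin (the half-integer edges are a choice made for this sketch) and weighting by Y reproduces the bar heights.

```python
import numpy as np
import matplotlib.pyplot as plt

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Y = [0.5717, 0.699, 0.7243, 0.5939, 0.5383, 0.5093, 0.7001, 0.589,
     0.6486, 0.7152, 0.6805, 0.5688, 0.6133, 0.6041, 0.5676]

# One bin per X value: edges at 0.5, 1.5, ..., 15.5
edges = np.arange(min(X) - 0.5, max(X) + 1.5)
# Each bin holds exactly one X value, so its weighted count equals that Y
counts, bin_edges, patches = plt.hist(X, bins=edges, weights=Y,
                                      color='green', alpha=0.6)
plt.xlabel('X')
plt.ylabel('Y')
plt.title("Histogram")
plt.show()
```

Here counts holds the 15 bar heights, which are simply the Y values; for data like this, plt.bar(X, Y) is the more direct call.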

Related

Matplotlib not displaying text

I am trying to add text to a chart in matplotlib in Python.
It does not show up, and no error is raised.
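import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# df, offset and timestamp are defined earlier in the script
fig = plt.figure(figsize=(12, 8))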
ax = fig.add_subplot()
dtFmt = mdates.DateFormatter('%d.%m - %H:%M')
plt.gca().xaxis.set_major_formatter(dtFmt)
plt.grid()
ax.set_ylim(bottom=0, top=df.threadswaiting.max()+10)
ax.plot(df.threadsrunning, label='Running Threads', color='blue')
ax.plot(df.threadswaiting, label='Waiting Threads', color='red')
ax.set_title(f'Thread load\nLast {offset}\ncreated: {timestamp}')
ax.set_ylabel('Threads')
ax.set_xlabel('Time')
# This test string does not show up:
ax.text(1, 1, 'Teststring')
plt.legend()
plt.show()
Can anyone help?
Thank you!
The dataframe 'df' consists of data in this form:
timestamp,memtotal,memfree,memused,mempercentage,threadsmax,threadsrunning,threadswaiting
2023-02-06T10:34:03.691266, 17179.869184, 13149.334632, 3950.842776, 23.0, 5, 7, 38
2023-02-06T10:34:34.450291, 17179.869184, 15950.03556, 2583.789464, 15.04, 5, 4, 0
2023-02-06T10:35:04.722786, 17179.869184, 13645.117544, 3530.557336, 20.55, 5, 9, 43
2023-02-06T10:35:35.068253, 17179.869184, 12564.56868, 4615.300504, 26.86, 5, 3, 0
2023-02-06T10:36:05.355758, 17179.869184, 12443.191912, 4732.482968, 27.55, 5, 6, 41
2023-02-06T10:36:35.638119, 17179.869184, 14418.945808, 2664.454384, 15.51, 5, 5, 38
2023-02-06T10:37:05.915987, 17179.869184, 10899.06084, 6217.893784, 36.19, 5, 5, 64
2023-02-06T10:37:36.195419, 17179.869184, 14730.19836, 2252.538536, 13.11, 5, 5, 63
2023-02-06T10:38:06.476530, 17179.869184, 13079.04248, 3819.808336, 22.23, 5, 8, 65
2023-02-06T10:38:36.753379, 17179.869184, 13479.695576, 3700.173608, 21.54, 5, 3, 0
2023-02-06T10:39:07.034731, 17179.869184, 12682.653384, 4484.632888, 26.1, 5, 8, 35
2023-02-06T10:39:37.326827, 17179.869184, 14345.964728, 2829.710152, 16.47, 5, 2, 0
2023-02-06T10:40:07.617135, 17179.869184, 11636.444344, 5535.036232, 32.22, 5, 6, 47
2023-02-06T10:40:37.912047, 17179.869184, 14231.936984, 2947.9322, 17.16, 5, 3, 0
2023-02-06T10:41:08.192663, 17179.869184, 9836.306392, 7339.368488, 42.72, 5, 4, 0
2023-02-06T10:41:38.470597, 17179.869184, 13915.799128, 3264.070056, 19.0, 5, 2, 0
It works now.
I still don't understand why, though.
ax.text(0.1, 0.876,
f'Running Mean: {round(df.threadsrunning.mean(), 2)}\n'
f'Waiting Mean: {round(df.threadswaiting.mean(), 2)}\n'
f'Running Max: {round(df.threadsrunning.max(), 2)}\n'
f'Waiting Max: {round(df.threadswaiting.max(), 2)}',
bbox=dict(facecolor='white', alpha=0.5), fontsize=12, transform=ax.transAxes)
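The difference is the transform argument. By default, ax.text(x, y, ...) interprets x and y in data coordinates; with transform=ax.transAxes they are fractions of the axes box, (0, 0) lower left to (1, 1) upper right. The sketch below assumes the x axis is a date axis (the DateFormatter suggests so) and uses a few made-up timestamps: x=1 then means one day after matplotlib's date epoch, far outside the plotted range, so the first text lands off-screen.

```python
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

# Hypothetical sample times, 30 s apart like the CSV in the question
times = [datetime(2023, 2, 6, 10, 34) + timedelta(seconds=30 * i) for i in range(5)]
fig, ax = plt.subplots()
ax.plot(times, [7, 4, 9, 3, 6])

# Default transform is ax.transData: x=1 is a date number near the epoch,
# nowhere near the 2023 x limits, so this text is placed outside the axes
t_data = ax.text(1, 1, 'Teststring (data coords)')
# ax.transAxes: position given as a fraction of the axes width/height
t_axes = ax.text(0.1, 0.9, 'Teststring (axes coords)', transform=ax.transAxes)

xmin, xmax = ax.get_xlim()
print(f"x limits {xmin:.1f} to {xmax:.1f}; x=1 inside: {xmin <= 1 <= xmax}")
plt.show()
```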

What is plotted when string data is passed to the matplotlib API?

# first, some imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Let's say I want to make a scatter plot, using this data:
np.random.seed(42)
x=np.arange(0,50)
y=np.random.normal(loc=3000,scale=1,size=50)
Plot via:
plt.scatter(x,y)
I get this plot:
Ok, let's create a dataframe first:
df=pd.DataFrame.from_dict({'x':x,'y':y.astype(str)})
(I am aware that I am storing y as str - this is a reproducible example, and I do this to reflect the real use case.)
Then, if I do:
plt.scatter(df.x,df.y)
I get:
What am I seeing in this second plot? I expected it to show the x column plotted against the y column converted to float, but that is clearly not the case.
Matplotlib doesn't automatically convert str values to numbers, so your y values are treated as categories. As far as Matplotlib is concerned, the gap from '1.0' to '0.9' is no different from the gap from '1.0' to '100.0'.
So the y positions on the plot are simply range(len(y)) (assuming every string is distinct, since neighbouring categories are always spaced equally), with the tick labels taken from the string values.
Since your x is range(50) and your y positions are now range(50) as well, the plot is the line x = y, with each y tick labelled by the corresponding str value.
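To get the plot the question expected, convert the column back to numbers before plotting. A minimal sketch using pd.to_numeric (df['y'].astype(float) works as well when every entry is a valid number):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.arange(0, 50)
y = np.random.normal(loc=3000, scale=1, size=50)
df = pd.DataFrame({'x': x, 'y': y.astype(str)})  # y stored as text, as in the question

# Convert the text column back to floats before plotting.
# errors='coerce' would turn unparsable entries into NaN instead of raising.
df['y'] = pd.to_numeric(df['y'])

fig, ax = plt.subplots()
sc = ax.scatter(df['x'], df['y'])  # numeric y axis again
plt.show()
```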
As the excellent answer by dm2 explains, when you pass y as strings, the values are treated as arbitrary labels and placed one after another in the order in which they appear. To demonstrate, here's an even simpler example.
from matplotlib import pyplot as plt
x = [1, 2, 3, 4]
y = [5, 25, 10, 1] # these are ints
plt.scatter(x, y)
So far so good. Now, different string y values.
y = list("abcd")
plt.scatter(x, y)
You can see how it takes the y labels and simply places them on the axis one after another.
Finally,
y = ["5", "25", "10", "1"]
plt.scatter(x, y)
Compare this with the previous results and now it should become obvious what's going on.
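The same comparison can be checked numerically. In this sketch, the scatter collection's get_offsets() shows the y positions matplotlib actually used: the string version gets the category indices 0 to 3 in order of appearance, while the float version keeps the values 5, 25, 10 and 1.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y_str = ["5", "25", "10", "1"]

fig, (ax_str, ax_num) = plt.subplots(1, 2, figsize=(8, 3))
sc_str = ax_str.scatter(x, y_str)                      # strings -> categorical axis
sc_num = ax_num.scatter(x, [float(v) for v in y_str])  # floats -> numeric axis

# get_offsets() holds the positions after unit conversion
print(np.asarray(sc_str.get_offsets())[:, 1])
print(np.asarray(sc_num.get_offsets())[:, 1])
plt.show()
```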
Extracting the tick labels and locations makes it more obvious: the API plots the strings as labels, and the axis locations are 0-indexed integers, one per distinct category.
.get_xticks() and .get_yticks() return the numeric tick locations.
.get_xticklabels() and .get_yticklabels() return a list of matplotlib.text.Text objects, shown as Text(x, y, text).
There are fewer numbers in the list for the y axis because there were duplicate values as a result of rounding.
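A small sketch of the duplicate effect, with made-up values: a repeated string maps to the same category, so there are fewer tick locations than points. The canvas is drawn first so that the tick labels are populated.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# "5" appears twice, so there are only three distinct categories
ax.scatter([1, 2, 3, 4], ["5", "25", "5", "1"])
fig.canvas.draw()  # make sure the tick labels have been computed

locs = list(ax.get_yticks())
labels = [t.get_text() for t in ax.get_yticklabels()]
print(locs, labels)
```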
This applies to any API that uses matplotlib as its backend, such as seaborn or pandas.
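# each call plots the str column 'y' as categories
# (assumes `import seaborn as sns`; df and ax1 as in the full example)
sns.scatterplot(data=df, x='x_num', y='y', ax=ax1)
ax1.scatter(data=df, x='x_num', y='y')
ax1.plot('x_num', 'y', 'o', data=df)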
Labels, Locs, and Text
print(x_nums_loc)
print(y_nums_loc)
print(x_lets_loc)
print(y_lets_loc)
print(x_lets_labels)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[Text(0, 0, 'A'), Text(1, 0, 'B'), Text(2, 0, 'C'), Text(3, 0, 'D'), Text(4, 0, 'E'),
Text(5, 0, 'F'), Text(6, 0, 'G'), Text(7, 0, 'H'), Text(8, 0, 'I'), Text(9, 0, 'J'),
Text(10, 0, 'K'), Text(11, 0, 'L'), Text(12, 0, 'M'), Text(13, 0, 'N'), Text(14, 0, 'O'),
Text(15, 0, 'P'), Text(16, 0, 'Q'), Text(17, 0, 'R'), Text(18, 0, 'S'), Text(19, 0, 'T'),
Text(20, 0, 'U'), Text(21, 0, 'V'), Text(22, 0, 'W'), Text(23, 0, 'X'), Text(24, 0, 'Y'),
Text(25, 0, 'Z')]
Imports, Data, and Plotting
import numpy as np
import string
import pandas as pd
import matplotlib.pyplot as plt
# sample data
np.random.seed(45)
x_numbers = np.arange(100, 126)
x_letters = list(string.ascii_uppercase)
y = np.random.normal(loc=3000, scale=1, size=26).round(2)
df = pd.DataFrame.from_dict({'x_num': x_numbers, 'x_let': x_letters, 'y': y}).astype(str)
# plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3.5))
df.plot(kind='scatter', x='x_num', y='y', ax=ax1, title='X Numbers', rot=90)
df.plot(kind='scatter', x='x_let', y='y', ax=ax2, title='X Letters')
x_nums_loc = ax1.get_xticks()
y_nums_loc = ax1.get_yticks()
x_lets_loc = ax2.get_xticks()
y_lets_loc = ax2.get_yticks()
x_lets_labels = ax2.get_xticklabels()
fig.tight_layout()
plt.show()

Bar graph df.plot() vs ax.bar() structure matplotlib

I am trying to graph a table as a bar graph.
I get my desired outcome using df.plot(kind='bar') structure. But for certain reasons, I now need to graph it using the ax.bar() structure.
Please refer to the example screenshot. I would like the x axis to show categorical labels, as df.plot(kind='bar') does, rather than a continuous scale, but I need to learn how to do the same with ax.bar().
Make the index categorical by converting it to str; ax.bar() then places the bars at evenly spaced category positions, like df.plot(kind='bar').
import pandas as pd
import matplotlib.pyplot as plt
data = {'SA': [11, 12, 13, 16, 17, 159, 209, 216],
'ET': [36, 45, 11, 15, 16, 4, 11, 10],
'UT': [11, 26, 10, 11, 16, 7, 2, 2],
'CT': [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]}
df = pd.DataFrame(data)
df['SA'] = df['SA'].astype('str')
df.set_index('SA', inplace=True)
fig, ax = plt.subplots(figsize=(12, 8))
p1 = ax.bar(df.index, df.ET, color='b', label='ET')
p2 = ax.bar(df.index, df.UT, bottom=df.ET, color='g', label='UT')
p3 = ax.bar(df.index, df.CT, bottom=df.ET+df.UT, color='r', label='CT')
plt.legend()
plt.show()
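If more columns need stacking, the bottom offsets can be accumulated in a loop instead of written out by hand. A sketch with the same data (the color mapping is carried over from the answer):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {'SA': [11, 12, 13, 16, 17, 159, 209, 216],
        'ET': [36, 45, 11, 15, 16, 4, 11, 10],
        'UT': [11, 26, 10, 11, 16, 7, 2, 2],
        'CT': [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]}
df = pd.DataFrame(data)
df['SA'] = df['SA'].astype(str)  # str index -> categorical x axis
df = df.set_index('SA')

colors = {'ET': 'b', 'UT': 'g', 'CT': 'r'}
fig, ax = plt.subplots(figsize=(12, 8))
bottom = np.zeros(len(df))  # running top of each stack
for col in ['ET', 'UT', 'CT']:
    bars = ax.bar(df.index, df[col], bottom=bottom, color=colors[col], label=col)
    bottom = bottom + df[col].to_numpy()  # next segment starts on top of this one
ax.legend()
plt.show()
```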

Attempting to make a multi-column graph

I am trying to make a column graph where the y-axis is the mean grain size, the x-axis is the distance along the transect, and each series is a date and/or number value (it doesn't really matter).
I have tried a few different methods in Excel 2010, but I cannot figure it out. My hope is that at the first location, 9, there will be three columns, and at 12 there will be two columns. If it matters at all, let's say the total distance is 50. This data should produce 7 sets of columns along the transect/x-axis.
I have tried to do this in Python, but my coding knowledge is close to nil. Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371, 1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
If someone knows of code I could use, it would be very helpful. A recommendation for how to do this in Excel would be great too.
There's a plotting library called seaborn, built on top of matplotlib, that does this in one line. Your example:
import numpy as np
import seaborn as sns
from matplotlib.pyplot import show
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371,
1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
ax = sns.barplot(x=distance, y=grainsize, hue=series, palette='muted')
ax.set_xlabel('distance')
ax.set_ylabel('grainsize')
show()
You will be able to do a lot even as a total newbie by editing the many examples in the seaborn gallery. Use them as training wheels: edit only one thing at a time and think about what changes.
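For a matplotlib-only alternative that keeps the real distances on the x axis (seaborn's barplot spaces the distances evenly as categories), the columns at each distance can be offset side by side by hand. This is a sketch; the column width of 0.6 and the colors are arbitrary choices.

```python
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371,
             1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]

# Collect the (series, grainsize) pairs measured at each distance
groups = defaultdict(list)
for d, s, g in zip(distance, series, grainsize):
    groups[d].append((s, g))

width = 0.6  # column width in distance units (arbitrary)
colors = {1: 'tab:blue', 2: 'tab:orange', 3: 'tab:green', 4: 'tab:red'}

fig, ax = plt.subplots(figsize=(10, 4))
for d, items in groups.items():
    n = len(items)
    # Offsets that centre the n columns of this group on distance d
    offsets = (np.arange(n) - (n - 1) / 2) * width
    for off, (s, g) in zip(offsets, items):
        ax.bar(d + off, g, width=width, color=colors[s])

ax.set_xticks(sorted(groups))
ax.set_xlim(0, 50)  # total transect length from the question
ax.legend(handles=[Patch(color=c, label=str(s)) for s, c in colors.items()],
          title='series')
ax.set_xlabel('distance')
ax.set_ylabel('grainsize')
plt.show()
```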

How to place lines below markers in Python?

I have to plot multiple lines and their curve-fit lines on a single plot. All of them are plotted in a for loop, so the fit line from each iteration is drawn over the markers from the previous iterations, as shown in the figure.
The reproducible code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg=1)
    # fitted straight line, plotted against x
    plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-')
    plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20)
plot.set_ylabel('Y', labelpad=6)
plot.set_xlabel('X', labelpad=6)
plt.show()
I can fix this using another loop as:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
# first pass: all fit lines
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg=1)
    plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-')
# second pass: all markers, drawn after (and so over) every line
for i in range(m):
    plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20)
plot.set_ylabel('Y', labelpad=6)
plot.set_xlabel('X', labelpad=6)
plt.show()
which gives me all the fit lines below the markers.
But is there any built-in function in Python/matplotlib to do this without using two loops?
Update
The m = 2 rows are only an example; m can be greater than 2, i.e. the loop may run many times.
Update 2 after answer
Can I do this within a single plot call too? For example:
plt.plot(x[i, :], y[i, :], linestyle = ':', marker = 'o', markersize = 20)
Can I give the linestyle a zorder = 1 and the markers a zorder = 3?
Editing just your plotting lines:
plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-',
         zorder=-1)
plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20,
         zorder=3)
Now all the markers are in front of all the lines, although within the marker group (and within the line group) the stacking still follows plotting order.
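The whole fix as one runnable loop, as a sketch: zorder=1 for the fit lines and zorder=3 for the markers (the answer's zorder=-1 works the same way here, since only the relative order matters). Matplotlib draws artists sorted by zorder, so every marker set ends up above every line regardless of loop order.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])

fig, ax = plt.subplots(figsize=(5.15, 5.15))
for i in range(x.shape[0]):
    slope, intercept = np.polyfit(x[i], y[i], deg=1)
    # all fit lines share the lower layer...
    ax.plot(x[i], slope * x[i] + intercept, linestyle='-', zorder=1)
    # ...and all marker sets the upper one, whatever the loop order
    ax.plot(x[i], y[i], linestyle='', marker='o', markersize=20, zorder=3)
ax.set_xlabel('X', labelpad=6)
ax.set_ylabel('Y', labelpad=6)
plt.show()
```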
Update answer
No. One call to plot, one zorder argument.
If you want the markers and the line from each pass through the loop to share a color and style, create an iterator (or generator) over colors, take the next color on each pass, and pass it as the color argument to each plot call in that pass.
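A sketch of that suggestion, also covering Update 2: the dotted data line and its markers are split into two plot calls with different zorders, and one color per pass is taken from an itertools.cycle over the default color cycle (reading it from rcParams is this sketch's choice, not something the answer specifies).

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])

# Endless iterator over the colors of the active property cycle
colors = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])

fig, ax = plt.subplots(figsize=(5.15, 5.15))
for i in range(x.shape[0]):
    color = next(colors)  # one color per pass, reused by all three calls
    slope, intercept = np.polyfit(x[i], y[i], deg=1)
    # dotted data line and fit line in the lower layer
    ax.plot(x[i], y[i], linestyle=':', color=color, zorder=1)
    ax.plot(x[i], slope * x[i] + intercept, linestyle='-', color=color, zorder=1)
    # markers alone in the upper layer
    ax.plot(x[i], y[i], linestyle='', marker='o', markersize=20,
            color=color, zorder=3)
plt.show()
```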
