I have 60 numbers divided into 8 intervals:
[[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
The count in each interval is then divided by 6:
[0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5]
How do I create a histogram whose bar heights correspond to these values, with the x-axis labeled according to my intervals? The result should look something like this
(I do not have enough reputation to post images.)
Running F Blanchet's code generates the following graph in my IPython console:
That doesn't really look like your image. I think you're looking for something more like this, where the x-ticks are between the bars:
This is the code I used to generate the above plot:
import matplotlib.pyplot as plt
# Include one more value for final x-tick.
intervals = list(range(534, 583, 6))
# Include one more bar height that == 0.
bar_height = [0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5, 0]
plt.bar(intervals,
        bar_height,
        width=[6] * 8 + [0],    # Set width of 0 bar to 0.
        align="edge",           # Align ticks at edge of bars.
        tick_label=intervals)   # Make tick labels explicit.
You can use matplotlib:
import matplotlib.pyplot as plt
data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
x = [element[0]+3 for element in data]
y = [element[2]/6 for element in data]
width = 6
plt.bar(x, y, width, color="blue")
plt.show()
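If the interval boundaries should appear as x-ticks, the two answers above can be combined. The following is only a sketch: the helper names bars_from_intervals and plot_interval_bars are made up for illustration, and it assumes every entry has the form [left, right, count] with contiguous intervals.

```python
def bars_from_intervals(data, divisor):
    """Split [left, right, count] triples into bar geometry.

    Returns (lefts, widths, heights, edges); `edges` holds every interval
    boundary, including the final right edge, for use as x-ticks.
    """
    lefts = [left for left, _right, _count in data]
    widths = [right - left for left, right, _count in data]
    heights = [count / divisor for _left, _right, count in data]
    edges = lefts + [data[-1][1]]
    return lefts, widths, heights, edges


def plot_interval_bars(data, divisor):
    """Draw the bars with ticks on the interval boundaries (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    lefts, widths, heights, edges = bars_from_intervals(data, divisor)
    fig, ax = plt.subplots()
    # align="edge" puts the left side of each bar on its x value.
    ax.bar(lefts, heights, width=widths, align="edge", edgecolor="black")
    ax.set_xticks(edges)
    return fig, ax


data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8],
        [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
lefts, widths, heights, edges = bars_from_intervals(data, 6)
```

Calling plot_interval_bars(data, 6) followed by plt.show() should then draw eight bars of width 6 with ticks at 534, 540, ..., 582.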
I have a dataframe that looks like this:
data = {'QA Score': [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7],
'Scopus': [0,0,0,0,0,1,0,0,1,3,2,3,6,4,2],
'ResearchGate': [0,0,0,0,0,0,1,1,2,3,2,1,0,2,1],
'Taylor&Francis': [0,0,0,0,0,0,0,0,0,1,0,0,2,0,0],
'ACM': [0,0,0,0,1,1,3,3,0,4,2,5,0,0,0]
}
df = pd.DataFrame(data)
I would like to create a stacked bar plot with the values of the 'QA Score' column as x-ticks.
Right now I have this plot:
The x-axis runs from 0 to 14, but it should run from 0 to 7 in steps of 0.5 (as in the 'QA Score' column).
I tried resetting the index with df.set_index(['QA Score']) and also tried plt.xticks(), but neither worked.
Does anyone have an idea how to do this? Thanks in advance!
What about:
df.set_index('QA Score').plot.bar(stacked=True)
output:
If we look at this code and x,y data,
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],[1, 2, 4, 10, 5, 9, 1,4, 9, 9],edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
this displays the following graph
I cannot understand how the x values go beyond 1 and even take negative values. Also, why do the bars have different widths?
The problem is that your x-values are spaced 0.1 apart while the default bar width is 0.8, so the bars overlap. That overlap is also why they appear to have different widths: each bar is partly hidden by the next one. The solution is to set the bar width explicitly. In your case, any width smaller than 0.1 works fine. For instance, you can use width=0.05 and you will get the following graph.
Why negative?: By default each bar is centered on its x value (align='center'). So the first bar in your question was centered at 0 with a width of 0.8, which is why it spanned from -0.4 to +0.4. In the same way, the last bar at x=1 extended to 1.4.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
If you don't want bars at x<0: Pass the argument align='edge', which aligns the left edge of each bar with its x value so that the bar extends to the right.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, align='edge',
edgecolor='black')
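To see where the negative x values and the overlaps come from, the extent of each bar can be computed by hand. This is a sketch only: bar_spans and any_overlap are hypothetical helpers written for this illustration, and they assume the documented plt.bar defaults (width=0.8, align='center').

```python
def bar_spans(xs, width=0.8, align="center"):
    """Return the (left, right) x-extent of each bar.

    Mirrors how plt.bar places bars: centred on x for align="center",
    starting at x for align="edge". 0.8 is matplotlib's default width.
    """
    if align == "center":
        return [(x - width / 2, x + width / 2) for x in xs]
    if align == "edge":
        return [(x, x + width) for x in xs]
    raise ValueError("align must be 'center' or 'edge'")


def any_overlap(spans):
    """True if any bar starts before the previous bar (sorted by left edge) ends."""
    ordered = sorted(spans)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


xs = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1]
default_spans = bar_spans(xs)                         # default width of 0.8
narrow_spans = bar_spans(xs, width=0.05)              # width used in the answer
edge_spans = bar_spans(xs, width=0.05, align="edge")  # bars start at their x value
```

With the default width the first bar reaches below zero and neighbouring bars overlap; with width=0.05 they no longer overlap, and with align="edge" no bar starts below 0.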
A few data points were obtained from an experiment, but they are not in order, so the lines drawn between the points are not correct.
I need to plot them in, say, increasing order along the x-axis.
C=[0.5,4,2,1,3,8,6,10]
D=[20,2,2,10,0.3,2.5,0.8,1]
%matplotlib inline
import matplotlib.pyplot as plt
#plot obtained from given data points
plt.plot(C,D)
## required plot
A=[0.5, 1, 2, 3, 4, 6, 8, 10]
B=[20, 10, 2, 0.5, 2, 0.8, 2.5, 1]
plt.plot(A,B)
Solution using pandas. I recommend using DataFrames in future for plotting tasks.
from matplotlib import pyplot as plt
import pandas as pd
C= [0.5, 4, 2, 1, 3, 8, 6, 10]
D= [20, 2, 2, 10, 0.3, 2.5, 0.8, 1]
xy = pd.DataFrame({'x': C, 'y': D})
xy.sort_values('x', inplace=True)
plt.plot(xy['x'], xy['y'])
plt.show()
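If pandas is not available, the same reordering can be sketched in plain Python by sorting the (x, y) pairs together. The names pairs, xs and ys are chosen here for illustration only.

```python
C = [0.5, 4, 2, 1, 3, 8, 6, 10]
D = [20, 2, 2, 10, 0.3, 2.5, 0.8, 1]

# Pair each x with its y, sort the pairs by x, then split them again.
pairs = sorted(zip(C, D), key=lambda pair: pair[0])
xs = [x for x, _y in pairs]
ys = [y for _x, y in pairs]
```

The sorted lists can then be passed to plt.plot(xs, ys) in place of C and D.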
Your C is not sorted, so the points are joined by a continuous line in their original order, which is why the output of plot(C,D) looks like a mess. I would personally use the np.argsort function to get the indices that sort C and use them to reorder both C and D as follows (showing only the relevant added lines):
import numpy as np
C = np.array([0.5,4,2,1,3,8,6,10])
D = np.array([20,2,2,10,0.3,2.5,0.8,1])
plt.plot(sorted(C), D[np.argsort(C)], 'b')
Output
When plotting 2 columns from a dataframe into a line plot, is it possible to, instead of a consistently increasing scale, have fixed values on your y axis (and keep the distances between the numbers on the axis constant)? For example, instead of 0, 100, 200, 300, ... to have 0, 21, 53, 124, 287, depending on the values from your dataset? So basically to have on the axis all your possible values fixed instead of an increasing scale?
Yes, you can use: ax.set_yticks()
Example:
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6]], columns = ['A','B'])
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yticks(y)
plt.show()
Or, if the values are very far apart from each other, you can use ax.set_yscale('log').
Example:
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6], [20, 300]], columns = ['A','B'])
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yscale('log', base=2)  # 'base' needs Matplotlib >= 3.3; older versions used basey=2 for the y-axis
ax.yaxis.set_ticks(y)
ax.yaxis.set_ticklabels(y)
plt.show()
What you need to do is:
get all distinct y values and sort them
set their y position on the plot according to their place on the ordered list
set the y labels according to distinct ordered values
The code below does this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame([[13, 1], [14, 1.8], [16, 2], [15, 1.5], [17, 2], [18, 3 ],
[19, 200],[20, 3.6], ], columns = ['A','B'])
x = df['A']
y = df['B']
y_keys = np.sort(y.unique())
y_values = range(len(y_keys))
y_dict = dict(zip(y_keys,y_values))
fig, ax = plt.subplots()
ax.plot(x,[y_dict[k] for k in y],'o-')
ax.set_yticks(y_values)
ax.set_yticklabels(y_keys)
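The mapping step can also be isolated into a small helper that does not depend on numpy or pandas. This is a minimal sketch; rank_positions is a hypothetical name, and it assumes the y values are hashable and sortable numbers.

```python
def rank_positions(values):
    """Map each value to its rank among the distinct sorted values.

    Returns (positions, labels): positions[i] is the tick index for
    values[i]; labels are the distinct values in ascending order.
    """
    labels = sorted(set(values))
    index_of = {value: i for i, value in enumerate(labels)}
    return [index_of[v] for v in values], labels


# Same y values as column 'B' in the example above.
y = [1, 1.8, 2, 1.5, 2, 3, 200, 3.6]
positions, labels = rank_positions(y)
```

Here positions would be plotted instead of y, with ax.set_yticks(range(len(labels))) and ax.set_yticklabels(labels) restoring the original values as labels.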
I guess I just didn't use the right keywords, because this has probably been asked before, but I didn't find a solution. Anyway, I have a problem where the bars of a histogram do not line up with the xticks. I want the bars to be centred over the xticks they correspond to, but they get placed between ticks to fill the space in between evenly.
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, rwidth = .3)
plt.xticks(bins)
plt.show()
Note that what you are plotting here is not a histogram. A histogram would be
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, edgecolor="k", alpha=1)
plt.xticks(bins)
plt.show()
Here, the bars range between the bins as expected. E.g. you have 3 values in the interval 1 <= x < 1.5.
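The bin counts can be checked by hand with a short sketch. It assumes the usual histogram convention that every bin is half-open, [left, right), except the last one, which also includes its right edge; bin_counts is a hypothetical helper, not a matplotlib function.

```python
from bisect import bisect_right


def bin_counts(data, edges):
    """Count the values falling into each bin defined by sorted `edges`."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        if value == edges[-1]:
            counts[-1] += 1  # the last bin also includes its right edge
        elif edges[0] <= value < edges[-1]:
            # bisect_right finds the first edge greater than value;
            # the bin index is one less than that position.
            counts[bisect_right(edges, value) - 1] += 1
    return counts


data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]
bins = [x + n for n in range(1, 10) for x in [0.0, 0.5]] + [10.0]
counts = bin_counts(data, bins)
```

The first entry of counts is the number of values in 1 <= x < 1.5, and the entries sum to the number of data points.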
Conceptually what you want to do here is get a bar plot of the counts of data values. This would not require any bins at all and could be done as follows:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
u, inv = np.unique(data, return_inverse=True)
counts = np.bincount(inv)
plt.bar(u, counts, width=0.3)
plt.xticks(np.arange(1,10,0.5))
plt.show()
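The counting step itself does not need numpy. As a sketch, collections.Counter from the standard library gives the same distinct values and counts; the names values and heights are chosen here for illustration.

```python
from collections import Counter

data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]

counts = Counter(data)                  # value -> number of occurrences
values = sorted(counts)                 # distinct values, ascending
heights = [counts[v] for v in values]   # bar height for each distinct value
```

These lists can be passed to plt.bar(values, heights, width=0.3) in place of u and counts.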
Of course you can "misuse" a histogram plot to get a similar result. This requires moving the center of each bar to the left bin edge with plt.hist(.., align="left").
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, align="left", rwidth = .6)
plt.xticks(bins)
plt.show()
This results in the same plot as above.