I am struggling with plotting a histogram from two lists using the pylab module (which I am required to use).
The first list, totalTime, is populated with 7 float values calculated within the program.
The second list, raceTrack, is populated with 7 string values that represent the name of a race track.
totalTime[0] is the time taken on raceTrack[0], totalTime[3] is the time taken on raceTrack[3], etc.
I sorted the list and rounded the values to 2 decimal places:
totalTimes.sort()
myFormattedTotalTimes = ['%.2f' % elem for elem in totalTimes]
The output of myFormattedTotalTimes (when the value entered is 100) is:
['68.17', '71.43', '71.53', '84.23', '84.55', '87.20', '102.85']
I need to use the values in the list to create a histogram where the x-axis shows the name of the race track and the y-axis shows the time on that particular track. I've quickly made an Excel histogram to help illustrate this.
I have attempted this, but to no avail:
for i in range(7):
    pylab.hist([myFormattedTotalTimes[i]], 7, [0, 120])
pylab.show()
Any help would be very much appreciated; I am quite lost on this one.
As John Doe states, I think you want a bar chart rather than a histogram. Adapted from the matplotlib examples, the following does what you want:
import matplotlib.pyplot as plt
import numpy as np
myFormattedTotalTimes = ['68.17', '71.43', '71.53', '84.23', '84.55', '87.20', '102.85']
# Set up track names
raceTrack = ["track " + str(i+1) for i in range(7)]
# Convert the formatted strings back to floats
racetime = [float(i) for i in myFormattedTotalTimes]
# Plot a bar chart (not a histogram)
width = 0.35  # the width of the bars
ind = np.arange(7)  # bar positions
fig, ax = plt.subplots(1, 1)
ax.bar(ind, racetime, width)  # bars are centred on ind by default (matplotlib 2.0+)
ax.set_xticks(ind)  # one tick under the centre of each bar
ax.set_xticklabels(raceTrack)
plt.show()
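Since the question says pylab is required, here is a minimal sketch of the same bar chart written against pylab. It also sorts each time together with its track, because sorting totalTimes on its own breaks the totalTime[i] / raceTrack[i] correspondence. The track names and unrounded times below are made-up placeholders, not the asker's data.

```python
import pylab

# Hypothetical placeholder data: totalTime[i] is the time taken on raceTrack[i]
raceTrack = ["Track A", "Track B", "Track C", "Track D", "Track E", "Track F", "Track G"]
totalTime = [84.551, 68.172, 102.849, 71.431, 87.204, 71.529, 84.226]

# Sort (time, track) pairs together so each time stays attached to its track
pairs = sorted(zip(totalTime, raceTrack))
sortedTimes = [round(t, 2) for t, _ in pairs]  # keep floats: bar() needs numbers, not strings
sortedTracks = [name for _, name in pairs]

positions = range(len(sortedTimes))
pylab.bar(positions, sortedTimes, width=0.6)  # bars are centred on each position (matplotlib 2.0+)
pylab.xticks(positions, sortedTracks, rotation=45)  # track names under the bars
pylab.ylabel("Total time")
pylab.tight_layout()
pylab.show()
```

Keeping the values as floats (rather than '%.2f' strings) matters: matplotlib would otherwise treat the strings as category labels instead of bar heights.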
Which looks like this: [figure: the resulting bar chart]
I am having an issue when trying to plot date values in a matplotlib side-by-side bar graph.
I first define my Series x = new_df['month'], which contains the following values:
0,2021-01-01
1,2021-02-01
2,2021-03-01
3,2021-04-01
4,2021-05-01
5,2021-06-01
6,2021-07-01
7,2021-08-01
8,2021-09-01
9,2021-10-01
10,2021-11-01
11,2021-12-01
12,2022-01-01
13,2022-02-01
14,2022-03-01
15,2022-04-01
16,2022-05-01
17,2022-06-01
18,2022-07-01
19,2022-08-01
20,2022-09-01
21,2022-10-01
22,2022-11-01
After this I define the function to plot my graph:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import numpy as np
from statistics import mean  # assumption: mean() is not imported in the original snippet
def side_by_side_bar_chart(x, y, labels, file_name):
    width = 0.25  # set bar width
    ind = np.arange(len(x))  # get the number of x labels
    fig, ax = plt.subplots(figsize=(10, 8))
    # Get average number in order to set labels formatting
    ymax = int(max([mean(values) for values in y]))
    plt.xticks(ind, x)  # sets x labels with values in x list (months)
    # These two lines format ax labels
    dtFmt = mdates.DateFormatter('%b-%y')  # define the formatting
    plt.gca().xaxis.set_major_formatter(dtFmt)
    plt.savefig("charts/" + file_name + ".png", dpi=300)
However, my x values are plotted as Jan-70 for all x ticks:
[Figure: wrongly labeled x ticks]
I suspect this has something to do with formatting. The same problem causes similar issues in another part of the script, where I use twinx() for a side-by-side chart with a trendline on top, and my values are plotted wrongly in the graph:
[Figure: wrongly plotted graph]
Does anybody have an idea how to fix these bugs? Thank you for your help in advance!
What I want: pass the dates in the x array and have all values plotted correspondingly in the graphs.
The thing is that your x is not a date. It is obviously a string, so the formatter can't interpret it correctly.
Let's try to reproduce your problem (this is the kind of minimal reproducible example I was mentioning earlier):
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np # just to generate something to plot
# Generate a dummy set of 20 dates, 31 days apart, starting from Mar 1 2020
dt=datetime.timedelta(days=31)
x0=[datetime.date(2020,3,1) + k*dt for k in range(20)]
x=[d.strftime("%Y-%m-%d") for d in x0] # This looks like your x: 20 strings
# And some y to have something to plot
y=np.cumsum(np.random.normal(0,1,20)) # Don't overthink it, it is just 20 numbers :)
# Plot y vs x (x being the strings)
plt.plot(x,y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
Result: [figure: result of the code above]
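Why Jan-70 specifically? When x holds strings, matplotlib treats them as categories and places them at positions 0, 1, 2, ... The DateFormatter then reads each of those positions as a date number, i.e. a count of days since matplotlib's date epoch, which defaults to 1970-01-01 in matplotlib 3.3 and later (older versions used a different epoch). A quick check, assuming a recent matplotlib:

```python
import matplotlib.dates as mdates

fmt = mdates.DateFormatter('%b-%y')

# Categorical positions 0..19 are interpreted as 0..19 days after the epoch,
# so with the default 1970-01-01 epoch they all fall in January 1970
labels = [fmt(pos) for pos in range(20)]
print(set(labels))
```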
Now, the solution is very simple: x must contain dates, not strings.
In my example, I could simply use plt.plot(x0,y) instead of x, since x0 is the list of dates from which I computed x. But if, as it appears, you only have the strings available, you can parse them, for example with [datetime.date.fromisoformat(d) for d in x] (Python 3.7+).
Or, since you already have pandas at hand, use pd.to_datetime(x) (it does not give exactly the same datetime type, but both are understood by matplotlib):
import pandas as pd
xx=pd.to_datetime(x)
plt.plot(xx,y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
Note that, without any action on my part, it also stops printing every label. That is because in the first case, matplotlib wasn't aware of any logical progression of the x values. From its point of view, they were all just labels. And you can't, a priori, skip a label, since the reader could not guess what lies between two labels separated by a gap. (It seems obvious to us, since we know they are dates, but matplotlib doesn't know that. It is just as if x contained ['red', 'green', 'yellow', 'purple', 'black', 'blue', ...]: you would not expect every other label to be arbitrarily skipped.)
Whereas now that we pass real dates to matplotlib, it is as if x were numerical: there is a logical progression of its values. Matplotlib knows it and, more importantly, knows that we know it. So it is acceptable to skip some labels to make the figure more readable: everybody knows what lies between "Mar-20" and "May-20".
So, short answer: convert your strings to dates.
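For a side-by-side bar chart like the one in the question, another option is to keep the integer positions (ind) for the bars and give the ticks pre-formatted date strings, dropping the DateFormatter entirely. A sketch, assuming the month column holds ISO date strings; the DataFrame and the two value columns (sales_a, sales_b) are hypothetical stand-ins for new_df and y:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_df: 'month' holds ISO date strings, as in the question
new_df = pd.DataFrame({
    "month": pd.date_range("2021-01-01", "2022-11-01", freq="MS").strftime("%Y-%m-%d"),
})
rng = np.random.default_rng(0)
new_df["sales_a"] = rng.integers(10, 50, len(new_df))  # hypothetical values
new_df["sales_b"] = rng.integers(10, 50, len(new_df))  # hypothetical values

# Parse the strings into real datetimes, then format them for the tick labels
months = pd.to_datetime(new_df["month"])
tick_labels = months.dt.strftime("%b-%y")

width = 0.4
ind = np.arange(len(new_df))  # one integer slot per month

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(ind - width / 2, new_df["sales_a"], width, label="sales_a")
ax.bar(ind + width / 2, new_df["sales_b"], width, label="sales_b")

# Ticks stay at the integer positions; no DateFormatter is attached,
# because the x data here are positions, not matplotlib date numbers
ax.set_xticks(ind)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
ax.legend()
fig.tight_layout()
plt.show()
```

The design choice here is to format the dates yourself: with integer bar positions, a DateFormatter would read 0, 1, 2, ... as days after the epoch, which is exactly what produced the Jan-70 labels.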
I'm very new to data science, and I've been trying to do this for about 2 weeks without getting any closer to figuring it out.
I have 4 lists of lists with dates (each date represents a sale):
a = [[2012-6], [2013-5], [2014-5]]
b = [[2015-5], [2017-4], [2019-5]]
etc.
I'm trying to plot the number of occurrences of each date across the x axis on a line plot, with each list represented by a different colour line.
I've tried converting them to NumPy arrays, DataFrames, datetime objects, etc., but I have to admit that I'm finally stuck and not getting any closer.
This was the closest I got:
fig_3 = pd.Series(sale1).value_counts().plot.line()
fig_4 = pd.Series(sale2).value_counts().plot.line()
fig_5 = pd.Series(sale3).value_counts().plot.line()
fig_6 = pd.Series(sale4).value_counts().plot.line()
But when I do this, it draws them on separate plots, and when I do get them on the same plot, I can't figure out how to set up the x-axis (I tried using xticks, xlabel, etc.).
Other times the dates end up on the y-axis, and I don't know how to switch that either.
If anyone can help I would greatly appreciate it!
Thanks.
You can make a few changes to your code, like this:
import matplotlib.pyplot as plt  # <- this row
import pandas as pd
fig, ax = plt.subplots()  # <- this row
pd.Series(sale1).value_counts().plot.line(ax=ax)  # <- and this parameter on each of your lines
pd.Series(sale2).value_counts().plot.line(ax=ax)
plt.show()
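One caveat with plotting each value_counts() result directly: value_counts() orders its result by frequency, not by date, and (as far as I understand pandas' line plotting) a string index is drawn at positions 0, 1, 2, ..., so two Series with different dates can end up sharing positions without sharing dates. A sketch of an alternative, assuming the dates are "year-month" strings nested one per inner list as in the question (the lists below are made up): flatten each list, parse the dates, count them, line the counts up in one DataFrame indexed by date, and plot every column on a single axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data: one list of single-date lists per sales source
sales = {
    "sale1": [["2012-6"], ["2013-5"], ["2013-5"], ["2014-5"]],
    "sale2": [["2012-6"], ["2015-5"], ["2017-4"], ["2017-4"]],
    "sale3": [["2013-5"], ["2014-5"], ["2019-5"]],
    "sale4": [["2015-5"], ["2017-4"], ["2019-5"], ["2019-5"]],
}

def parse_year_month(s):
    # "2012-6" -> Timestamp('2012-06-01'); the day is fixed to the 1st
    year, month = s.split("-")
    return pd.Timestamp(year=int(year), month=int(month), day=1)

counts = {}
for name, nested in sales.items():
    # Flatten [[d1], [d2], ...] into [d1, d2, ...] and parse each entry
    dates = [parse_year_month(d) for inner in nested for d in inner]
    counts[name] = pd.Series(dates).value_counts()

# One column per list, one row per date; dates missing from a list count as 0
df = pd.DataFrame(counts).fillna(0).sort_index()

fig, ax = plt.subplots()
df.plot.line(ax=ax, marker="o")  # one coloured line per column, dates on the x-axis
ax.set_xlabel("Date")
ax.set_ylabel("Number of sales")
plt.show()
```

Because every column shares the same sorted date index, the lines are aligned and the dates stay on the x-axis.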
I'm making a bar chart and a scatter plot. The bar chart takes a vector as input: I plotted the values on the x-axis and the number of times each one repeats on the y-axis. This was done by converting the vector to a list and using .count(). That worked well and was relatively straightforward.
As for the scatter plot, the input will be a matrix of any dimensions. The idea is for the x-axis to show the column numbers 1, 2, 3, 4, etc., depending on how many columns the input matrix has. Each column contains many different numbers, and I would like all of them displayed as dots or stars above the relevant column index. For example, if column 3 contains the values 6, 2, 8, 5, 9, 5 going down, I would like a dot for each of them, at its y value, directly above the 3 on the x-axis.
I have tried different approaches: sometimes the dots show up but in the wrong places, other times the x-axis is completely off, even though len(vector[0,:]) prints the correct number of columns. It just doesn't plot them.
My latest attempt, which now doesn't even show the dots or stars:
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],[7,7,12,4,0o2,4,12],[12,-3,4,10,12,4,-3],[10,12,4,0o3,7,10,12]])
x = len(vector[0,:])
print(x)#vector[0,:]
y = vector[:,0]
plt.plot(x, y, "r.") # Plot with red dots
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurrences of values for each column") # Set the y-axis label
plt.xlim([1,len(vector[0,:])]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
plt.show(vector)
The matrix shown at the top is just one I made up for testing; the idea is that it should work for any imported matrix.
I tried the code above, which is the closest I have gotten: it prints the number of columns, but doesn't show them on the plot. I haven't yet got it to plot the points above the columns; in a previous version they appeared in completely wrong positions.
Loop over the columns, and for each column plot all of its values at x = column number:
import numpy as np  # Import NumPy
import matplotlib.pyplot as plt  # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],
                   [7,7,12,4,0o2,4,12],
                   [12,-3,4,10,12,4,-3],
                   [10,12,4,0o3,7,10,12]])
rows, columns = vector.shape
plt.title("Scatter plot")  # Set the title of the graph
plt.xlabel("Column #")  # Set the x-axis label
plt.ylabel("Occurrences of values for each column")  # Set the y-axis label
plt.xlim([1, columns])  # Set the limits of the x-axis
plt.ylim([-5, 15])  # Set the limits of the y-axis
for i in range(1, columns+1):
    y = vector[:, i-1]  # all values in column i
    x = [i] * rows  # the same x position (the column number) for each of them
    plt.plot(x, y, "r.")
plt.show()
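The loop can also be replaced by a single scatter() call: repeat the column numbers once per row and flatten the matrix row by row, so that each value lines up with its column number. A sketch with the same test matrix (0o2 and 0o3 written as plain 2 and 3; the 0.5 padding on the x-limits is an arbitrary choice so the outer columns are not cut off at the plot edge):

```python
import numpy as np
import matplotlib.pyplot as plt

vector = np.array([[-3, 7, 12, 4, 2, 7, -3],
                   [7, 7, 12, 4, 2, 4, 12],
                   [12, -3, 4, 10, 12, 4, -3],
                   [10, 12, 4, 3, 7, 10, 12]])
rows, columns = vector.shape

# Column numbers 1..columns, repeated for every row: [1, ..., 7, 1, ..., 7, ...]
x = np.tile(np.arange(1, columns + 1), rows)
# ravel() flattens row by row (C order), matching the order of x above
y = vector.ravel()

plt.scatter(x, y, c="r", marker="*")
plt.title("Scatter plot")
plt.xlabel("Column #")
plt.ylabel("Values in each column")
plt.xticks(np.arange(1, columns + 1))  # one tick per column
plt.xlim(0.5, columns + 0.5)  # pad so the first and last columns are fully visible
plt.show()
```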
I want to build a bar chart that shows the utilization of some resources, say, the characters in a text:
from collections import Counter
import matplotlib.pyplot as plt
raw_data = 'data to make example bar chart'
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
plt.bar(keys, values);
This produces the following chart, with absolute counts of characters:
If I transform the values before plotting, for example with
values = [v/len(raw_data) * 100.0 for v in values]
I get exactly the same graph, but the value for a would be 20.0 (%).
The question is: could I somehow show both values on the y-axis?
I have seen recipes showing two different functions of the same x, with one scale on the left and one on the right, but here I have a single function, just in different units of measurement. Could I show two scales without plotting two bar charts?
https://matplotlib.org/gallery/api/two_scales.html
You could create a right y-axis via ax.twinx(), give it exactly the same limits as the left y-axis and format the ticks as percentages. The PercentFormatter() gets a parameter telling which value corresponds to 100%. In this case, 100% would be all the data in raw_data.
from collections import Counter
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
raw_data = np.random.choice([*'abcdefghijklmnopqrst'], 200)
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
fig, ax = plt.subplots()
ax.bar(keys, values, color='turquoise')
ax.margins(x=0.02) # less wasted space left and right
ax.grid(axis='y')
ax2 = ax.twinx()
ax2.set_ylim(*ax.get_ylim())
ax2.yaxis.set_major_formatter(PercentFormatter(len(raw_data)))
plt.show()
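On matplotlib 3.1 or later, Axes.secondary_yaxis() is another way to get the same effect: it takes a pair of conversion functions (forward and inverse) and keeps the right-hand scale in sync with the left one, even if the left limits change later. A sketch using the example string from the question; the helper names count_to_percent and percent_to_count are illustrative:

```python
from collections import Counter

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

raw_data = 'data to make example bar chart'
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
total = len(raw_data)

def count_to_percent(c):
    # Left-axis units (absolute counts) -> right-axis units (percent of all characters)
    return c / total * 100.0

def percent_to_count(p):
    # Inverse mapping, required by secondary_yaxis
    return p * total / 100.0

fig, ax = plt.subplots()
ax.bar(keys, values)
ax.set_ylabel("Count")

secax = ax.secondary_yaxis('right', functions=(count_to_percent, percent_to_count))
secax.yaxis.set_major_formatter(PercentFormatter())  # default xmax=100, so 20 -> "20%"
secax.set_ylabel("Share of characters")
plt.show()
```

Compared with twinx(), there is no second set of limits to copy by hand; the trade-off is that it needs a newer matplotlib.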
I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
Plotted against time, the function looks like this: [figure: data plotted against time]
I need the same x-axis evolution when plotting against the real x values. Unfortunately, since the x values before and after the turning point are the same, both parts get mixed together. This gives me the wrong data plot:
In this example, I need the x-axis to start at 2.4, decrease to 1.0, and then increase to 2.4 again. I'm sure I found before that this is possible, but unfortunately I can't find any trace of it again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be better to keep the original numbers and only change the ticks and tick labels on the axis. E.g. you could use a FuncFormatter to map the original numbers to np.abs(x-tp)+tp, where tp is the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
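Applied to data like the question's, where x first decreases to a turning point and then increases again, the same idea needs one extra step: map the decreasing branch to 2*tp - x so that the plotted coordinate grows monotonically, then let the formatter turn the tick positions back into real x values. A sketch with synthetic data; the arrays t, x, y are made up for illustration, and the turning point is assumed to be the minimum of x:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker

# Synthetic example: x goes 2.4 -> 1.0 -> 2.4 over time, y is some measured signal
t = np.linspace(0, 1, 201)
x = 1.0 + 1.4 * np.abs(2 * t - 1)  # 2.4 at t=0, 1.0 at t=0.5, 2.4 at t=1
y = np.sin(6 * t) + t

tp = x.min()  # turning point (here 1.0)
i_tp = np.argmin(x)  # index where x stops decreasing

# Unfold x into a monotonic coordinate: mirror the decreasing branch about tp
u = np.where(np.arange(len(x)) < i_tp, 2 * tp - x, x)

fig, ax = plt.subplots()
ax.plot(u, y)

# Label each tick with the real x value it corresponds to
fmt = lambda val, pos: "{:g}".format(np.abs(val - tp) + tp)
ax.xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
ax.set_xlabel("x")
plt.show()
```

For a point on the decreasing branch, u = 2*tp - x, so the formatter gives |u - tp| + tp = x; on the increasing branch u = x already. The axis therefore reads 2.4, down to 1, then back up to 2.4.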
One option would be to use two axes and plot your two time spans separately, one on each.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between them, plot each time span independently, and finally adjust the limits of one of the x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
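ax1.set_xlim((2.4,1))  # reversed: first time span, x decreasing
ax2.set_xlim((1,2.4))  # normal order: second time span, x increasing
plt.show()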