Plotting values versus strings in matplotlib? - python

I am trying to create a plot in matplotlib where the x-values are integers and the y-values are strings. Is it possible to plot data of this type in matplotlib? I examined the documentation and the gallery for matplotlib and could not find any examples of this type.
I have many lists bound to a variable called mylists. The structure looks like this:
mylists = [[765340, 765371, 765310, 'MA011'],
           [65310, 'MA015'],
           [765422, 765422, 24920205, 24920161, 'MA125'],
           [765422, 'MA105'],
           [765371, 12345, 'MA004']]
In each list, all items except the last item are x-values. The last item in each list is a string, which is the single y-value.
How can I plot this in matplotlib? Here is my attempt:
import matplotlib.pyplot as plt
for sub_list in mylists:
    x_value = sub_list[:-1]  # all items except the last are x-values
    y_value = sub_list[-1]   # the last item is the string y-value
    plt.plot(x_value, y_value, "ro")
plt.show()
The code above raises this error:
ValueError: could not convert string to float: MA011
How can integers versus strings be plotted?

You could do something like this: give each y "string" a unique index value and plot against that index instead. You may have to adjust the spacing, e.g. use i*2 instead of i, to make things look nice. Then set the tick label at each of those indices to its corresponding string.
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(9, 7))
ax1 = fig.add_subplot(111)
mylists = [[765340, 765371, 765310, 'MA011'], [65310, 'MA015'],
           [765422, 765422, 24920205, 24920161, 'MA125'],
           [765422, 'MA105'], [765371, 12345, 'MA004']]
x = []
y = []
y_labels = []
y_ticks = []
for i, sub_list in enumerate(mylists):
    y_labels.append(sub_list[-1])  # the string becomes the tick label
    y_ticks.append(i)              # the index becomes the tick position
    for v in sub_list[:-1]:
        x.append(v)
        y.append(i)                # plot every x-value of this sub_list at height i
ax1.set_yticks(y_ticks)
ax1.set_yticklabels(y_labels)
ax1.plot(x, y, "ro")
plt.show()
EDIT:
Sorry, I forgot to include the enumerate call in the for loop. It sets i to the index of the current sub_list. You then use that index, instead of the string, as the y-value, and finally replace the tick label at each of those y-values with the actual string.
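Since matplotlib 2.1, string values can also be passed to plot directly: matplotlib treats them as categories and puts each distinct string at its own integer position, in order of first appearance (older versions raise the error from the question instead). A minimal sketch under that assumption; the Agg backend is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

mylists = [[765340, 765371, 765310, 'MA011'], [65310, 'MA015'],
           [765422, 765422, 24920205, 24920161, 'MA125'],
           [765422, 'MA105'], [765371, 12345, 'MA004']]

fig, ax = plt.subplots(figsize=(9, 7))
for sub_list in mylists:
    xs = sub_list[:-1]             # every item except the last is an x-value
    ys = [sub_list[-1]] * len(xs)  # repeat the string so x and y have the same length
    ax.plot(xs, ys, "ro")          # strings on y are treated as categories (matplotlib >= 2.1)

fig.canvas.draw()  # make sure the tick labels are populated
labels = [t.get_text() for t in ax.get_yticklabels()]
```

This avoids the manual index bookkeeping, but the manual approach above still gives you control over the spacing between rows.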

Related

How to plot x int date values from array matplotlib correctly?

I am having trouble plotting date values in a matplotlib side-by-side bar chart.
I first define my Series x = new_df['month'], which contains the following values:
0,2021-01-01
1,2021-02-01
2,2021-03-01
3,2021-04-01
4,2021-05-01
5,2021-06-01
6,2021-07-01
7,2021-08-01
8,2021-09-01
9,2021-10-01
10,2021-11-01
11,2021-12-01
12,2022-01-01
13,2022-02-01
14,2022-03-01
15,2022-04-01
16,2022-05-01
17,2022-06-01
18,2022-07-01
19,2022-08-01
20,2022-09-01
21,2022-10-01
22,2022-11-01
After this I define the function to plot my graph:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import numpy as np
from statistics import mean  # assumed source of mean(); its import is not shown in the original
def side_by_side_bar_chart(x, y, labels, file_name):
    width = 0.25  # set bar width
    ind = np.arange(len(x))  # get the number of x labels
    fig, ax = plt.subplots(figsize=(10, 8))
    # Get average number in order to set labels formatting
    ymax = int(max([mean(x) for x in y]))
    plt.xticks(ind, x)  # sets x labels with values in x list (months)
    # These two lines format ax labels
    dtFmt = mdates.DateFormatter('%b-%y')  # define the formatting
    plt.gca().xaxis.set_major_formatter(dtFmt)
    plt.savefig("charts/" + file_name + ".png", dpi=300)
However, all my x ticks are labelled Jan 70:
[Figure: wrongly labelled x ticks]
I suspect that this has something to do with the formatting. The same thing causes similar issues in another part of the script, where I use twinx() to draw a trendline on top of a side-by-side chart, and there my values are plotted incorrectly:
[Figure: wrongly plotted graph]
Does anybody have an idea how to fix these bugs? Thank you for your help in advance!
What I expect: the dates in the x array are used for the ticks, and all values are plotted at the corresponding dates in the graphs.
The problem is that your x is not a date: it is obviously a string, so the formatter can't interpret it correctly.
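One way to see where "Jan 70" comes from, assuming matplotlib 3.3 or later (whose default date epoch is 1970-01-01): plt.xticks(ind, x) places the ticks at the integer positions 0, 1, ..., 22, and DateFormatter then reads each of those positions as a number of days since the epoch, so every tick falls in January 1970. A minimal sketch of that reading:

```python
import matplotlib.dates as mdates
import numpy as np

ind = np.arange(23)  # the tick positions that plt.xticks(ind, x) creates for 23 months
fmt = mdates.DateFormatter('%b-%y')
# DateFormatter treats each tick position as a date number (days since the epoch)
labels = [fmt(pos) for pos in ind]
```

Position 22 is only 22 days after the epoch, which is why even the last tick still shows January 1970.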
Let's try to reproduce your problem (this is the kind of minimal reproducible example I was mentioning earlier):
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np  # just to generate something to plot
# Generate a dummy set of 20 dates, starting from Mar 1 2020, 31 days apart
dt = datetime.timedelta(days=31)
x0 = [datetime.date(2020, 3, 1) + k*dt for k in range(20)]
x = [d.strftime("%Y-%m-%d") for d in x0]  # This looks like your x: 20 strings
# And some y to have something to plot
y = np.cumsum(np.random.normal(0, 1, 20))  # Don't overthink it, it is just 20 numbers :)
# Plot y vs x (x being the strings)
plt.plot(x, y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
[Figure: result of plotting y against the date strings]
Now, the solution is very simple: x must contain dates, not strings.
From my example, I could just use plt.plot(x0, y) instead of x, since x0 is the list of dates from which I computed x. But if, as it appears, you only have the strings available, you can parse them, for example with [datetime.date.fromisoformat(d) for d in x].
Or, since you already have pandas at hand: pd.to_datetime(x). It does not give exactly the same type (pandas Timestamps rather than datetime.date objects), but matplotlib understands both:
import pandas as pd  # already available in your script
xx = pd.to_datetime(x)
plt.plot(xx, y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
Note that, without any action from me, it also stops printing every label. That is because in the first case, matplotlib wasn't aware of any logical progression of the x values. From its point of view, they were all just labels, and you can't, a priori, skip a label, since the reader could not guess what lies between two labels separated by a gap. (It seems obvious to us, since we know they are dates, but matplotlib doesn't know that. It is just as if x contained ['red', 'green', 'yellow', 'purple', 'black', 'blue', ...]: you would not expect every other label to be arbitrarily skipped.)
Whereas now that we pass real dates to matplotlib, it is as if x were numerical: there is a logical progression of its values. Matplotlib knows it and, more importantly, knows that we know it. So it is acceptable to skip some labels to make the figure more readable: everybody knows what lies between "Mar-20" and "May-20".
So, short answer: convert your strings to dates.
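For the side-by-side bar chart in particular, a different route may be simpler: keep the integer positions from np.arange (so each series can be offset by width) and turn the dates into label strings yourself, without any DateFormatter. A sketch under that assumption; series_a and series_b are hypothetical placeholder data, not the question's values:

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# The 23 months from the question, 2021-01 .. 2022-11, as datetime.date objects
months = [datetime.date(2021 + m // 12, m % 12 + 1, 1) for m in range(23)]
# Hypothetical values for two side-by-side series (placeholders only)
series_a = np.arange(23)
series_b = np.arange(23)[::-1]

width = 0.25                  # bar width, as in the question
ind = np.arange(len(months))  # one integer slot per month
fig, ax = plt.subplots(figsize=(10, 8))
ax.bar(ind - width / 2, series_a, width, label="series_a")
ax.bar(ind + width / 2, series_b, width, label="series_b")
# Label the integer slots with formatted dates instead of using a DateFormatter
ax.set_xticks(ind)
ax.set_xticklabels([d.strftime("%b-%y") for d in months], rotation=90)
ax.legend()

fig.canvas.draw()
tick_text = [t.get_text() for t in ax.get_xticklabels()]
```

If the months are still strings in a pandas Series, pd.to_datetime(x).dt.strftime('%b-%y') produces the same kind of labels.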

Plotting output of a function iterated over a list in matplotlib

I have a list of values, to which I am applying a function.
I want to be able to plot the results of each iteration separately on a scatterplot.
To complicate things somewhat, the results list is not the same length for each iteration.
I've tried playing around with colourmap, but it's not even printing a blank chart.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
cmap = cm.get_cmap('Set1')
def scatter_plot(list):
    x = []
    y = []
    for i in list:
        x.append(i[0])
        y.append(i[1])
        c = cmap(i[2])
    plt.figure(figsize=(8,8))
    plt.scatter(x,y, color=c)
    plt.show()
In the function funky_function I have:
return(my_list, a_value)
my_list contains the x and y values for the plot; a_value is the value by which I want each result to get a separate colour. The scatter_plot function picks out the x and y values fine for a single value.
To produce the results:
pointlist = funky_function(a_value)
value_list = [1,2,3,4]
for a_value in value_list:
    funky_function(a_value)
    scatter_plot(pointlist)
It's printing the results fine, but not plotting them. I want it to be able to just add new results to the plot if I add new items to the value list, hence trying to set the colour to be a dynamic input rather than plot1=color1, plot2=color2.
I had a look at Add colour scale to plot as 3rd variable, but I need the colour to match to a specific item in the list. (I agree with that poster that the info available on colormap isn't very clear.)
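A minimal sketch of one way to get this, assuming funky_function returns (list_of_(x, y)_pairs, a_value) as described; the stand-in funky_function below is hypothetical. Two things in the code above get in the way: pointlist is computed once, before the loop, and scatter_plot opens a new figure on every call. The sketch calls the function inside the loop, draws every result on one axes, and picks each result's colour from the Set1 colormap by its position in value_list, so adding items to value_list just adds series. It uses plt.get_cmap because matplotlib.cm.get_cmap has been deprecated and then removed in recent matplotlib releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def funky_function(a_value):
    """Hypothetical stand-in: returns (points, a_value) with a varying number of points."""
    points = [(k, k * a_value) for k in range(a_value + 1)]
    return points, a_value

value_list = [1, 2, 3, 4]
cmap = plt.get_cmap("Set1")  # qualitative colormap with cmap.N distinct colours
fig, ax = plt.subplots(figsize=(8, 8))
for idx, a_value in enumerate(value_list):
    points, value = funky_function(a_value)  # call once per value and keep the result
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # one colour per result, cycling if there are more results than colours
    ax.scatter(xs, ys, color=cmap(idx % cmap.N), label=str(value))
ax.legend(title="a_value")
```

Call plt.show() once after the loop if you want the figure on screen.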

Formatting string ticklabels matplotlib

I have a set of ticklabels that are strings on my x axis, and I want to be able to get -> modify -> set them. Say for example I have a plot that looks like this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(1,6), range(5))
plt.xticks(range(1,6), ['a','b','c','d','e'])
and I want to change the labels on the x axis to ['(a)','(b)','(c)','(d)','(e)']
What is the simplest/best way to do this? I've tried things like:
labels = ['(%s)' % l for l in ax.xaxis.get_ticklabels()]
ax.xaxis.set_ticklabels(labels)
but ax.xaxis.get_ticklabels() returns matplotlib Text objects rather than a list of strings, and I'm not sure how to go about modifying them. I also tried using matplotlib.ticker.FuncFormatter, but could only get hold of the numeric positions, not the labels themselves. Any help would be appreciated.
One more layer to unpeel:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(1,6), range(5))
plt.xticks(range(1,6), ['a','b','c','d','e'])
labels = ['(%s)' % l.get_text() for l in ax.xaxis.get_ticklabels()]
ax.xaxis.set_ticklabels(labels)
This is your code, but with l.get_text() in the list comprehension where you had l.
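A slightly more defensive version of the same idea, as a sketch: in some matplotlib versions, labels that come from a locator/formatter (rather than from plt.xticks) stay empty in the Text objects until the figure has been drawn, so drawing the canvas first makes get_text() return the displayed strings. Setting the tick positions explicitly also keeps the new labels tied to the same ticks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(1, 6), range(5))
ax.set_xticks(range(1, 6))
ax.set_xticklabels(['a', 'b', 'c', 'd', 'e'])

fig.canvas.draw()  # make sure the Text objects hold their final strings
old = [t.get_text() for t in ax.get_xticklabels()]
ax.set_xticklabels(['(%s)' % s for s in old])  # get -> modify -> set

fig.canvas.draw()
new = [t.get_text() for t in ax.get_xticklabels()]
```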

Is there a way to return same length arrays in numpy.hist?

I'm trying to create a histogram plot in Python, normalizing the y-axis values with some custom values. I was thinking of doing it like this:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('foo.bar')
fig = plt.figure()
ax = fig.add_subplot(111)
hist=np.histogram(data, bins=(1.0, 1.5 ,2.0,2.5,3.0))
x=[hist[0]*5,hist[1]]
ax.plot(x[0], x[1], 'o')
but of course, the last line gives:
ValueError: x and y must have same first dimension
Is there a way to force np.histogram to give the same number of elements for the x[0] and x[1] arrays, for example by deleting the first or last element of one of them?
hist[1] contains the bin edges of the histogram, so it always has one more element than hist[0]. I guess you probably want the centers of those intervals, something like:
x = [hist[0], 0.5*(hist[1][1:]+hist[1][:-1])]
and then the plot should be ok, right?
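A self-contained version of that suggestion, as a sketch: the data array below is made up in place of np.loadtxt('foo.bar'), and the plot puts the bin centers on the x axis and the scaled counts on the y axis (the question passed the counts first, which would put them on the x axis instead).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.array([1.1, 1.2, 1.6, 2.1, 2.2, 2.3, 2.9])  # made-up stand-in for np.loadtxt('foo.bar')
counts, edges = np.histogram(data, bins=(1.0, 1.5, 2.0, 2.5, 3.0))
centers = 0.5 * (edges[1:] + edges[:-1])  # one center per bin: same length as counts
scaled = counts * 5                        # the custom scaling from the question

fig, ax = plt.subplots()
ax.plot(centers, scaled, 'o')
```

Here counts has 4 elements and edges has 5; averaging neighbouring edges brings the x values down to 4 as well.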
I would imagine it depends on your data source.
Try loading the data as a numpy array, and selecting the range of elements yourself before passing to the histogram function.
e.g.
dataForHistogram = data[0:100, 0:100]  # Assuming data is a 2-D NumPy array: first 100 rows and columns

Histogram in Python

I have a list of numbers. The list is like [0,0,1,0,1 .... ]. Presently it has binary digits only, but later on it can have decimal values as well. I want to plot a histogram of this sequence.
When I use the standard hist function of the matplotlib library, I get only two bars: it counts all the zeros and all the ones and shows a histogram with two bars. But I want to plot it differently.
I want the number of bars to equal the length of the list, and the height of each bar to equal the value in the list at that bar's position.
Here is the code:
def plot_histogram(self, li_input):
    binseq = numpy.arange(len(li_input))
    tupl = matplotlib.pyplot.hist(li_input, bins=binseq)
    matplotlib.pyplot.show()
li_input is the list discussed above.
I can do it in a nasty way, like this:
li_input_mod = []
for x in range(len(li_input)):
    li_input_mod += [x]*li_input[x]
and then plot that, but I want something better.
The behavior you describe is the way a histogram works; it shows you the distribution of values. It sounds to me like you want to create a bar chart:
import matplotlib.pyplot as plt
x = [0,0,1,0,1,1,0,1,1,0,0,0,1]
plt.bar(range(len(x)), x, align='center')  # one bar per position, height = value
plt.show()
which would produce a bar chart with one bar per list element, each bar as tall as that element's value.
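If you would rather keep using hist (for example, to reuse its styling), a sketch of an alternative to the repeated-index workaround is to pass the positions as the data and the list values as weights, with one unit-wide bin centred on each position. Unlike [x]*li_input[x], which fails with a TypeError as soon as a value is not an integer, this also works once the list contains decimal values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

li_input = [0, 0, 1, 0, 1, 0.5, 1.5]  # decimal values are fine here
positions = np.arange(len(li_input))
edges = np.arange(len(li_input) + 1) - 0.5  # one unit-wide bin centred on each position
# each position falls in its own bin and contributes its list value as weight
heights, _, _ = plt.hist(positions, bins=edges, weights=li_input)
```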
