I am struggling to modify my code to define a specific range of the secondary x-axis. Below is a snippet of the relevant code for creating 2 x-axes, and the output it generates:
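import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
# mpl_toolkits.axes_grid is gone from recent Matplotlib; SubplotHost is importable from axisartist
from mpl_toolkits.axisartist.parasite_axes import SubplotHost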
...
x = np.arange(1, len(metric1)+1) # the label locations
width = 0.3 # the width of the bars
fig1 = plt.figure()
ax1 = SubplotHost(fig1, 111)
fig1.add_subplot(ax1)
ax1.axis((0, 14, 0, 20))
ax1.bar(x, [t[2] for t in metric1], width, label='metric1')
ax1.bar(x + width, [t[2] for t in metric2], width, label='metric2')
ax1.bar(x + 2*width, [t[2] for t in metric3], width, label='metric3')
ax1.set_xticks(x+width)
ax1.set_xticklabels(['BN', 'B', 'DO', 'N', 'BN', 'B', 'DO', 'N', 'BN', 'B', 'DO', 'N', 'BN', 'B', 'DO', 'N'])
ax1.axis["bottom"].major_ticks.set_ticksize(0)
ax2 = ax1.twiny()
offset = 0, -25 # Position of the second axis
new_axisline = ax2.get_grid_helper().new_fixed_axis
ax2.axis["bottom"] = new_axisline(loc="bottom", axes=ax2, offset=offset)
ax2.axis["top"].set_visible(False)
ax2.axis["bottom"].minor_ticks.set_ticksize(0)
ax2.axis["bottom"].major_ticks.set_ticksize(15)
ax2.set_xticks([0.058, 0.3434, 0.63, 0.915])
ax2.xaxis.set_major_formatter(ticker.NullFormatter())
ax2.xaxis.set_minor_locator(ticker.FixedLocator([0.20125, 0.48825, 0.776]))
ax2.xaxis.set_minor_formatter(ticker.FixedFormatter(['foo', 'bar', 'foo2']))
...
This is the current output:
What I would like is for the secondary x-axis line (foo, bar, foo2) not to extend beyond its first and last tick, as follows (I edited this in MS Paint 😅):
Any help appreciated.
As there have been no other answers, I can suggest an inelegant way of doing what you need.
You can hide the axis line and "manually" create one line yourself:
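import matplotlib.lines as lines
# hide the full-width axis line drawn by axisartist
ax2.axis["bottom"].line.set_visible(False)
# display (pixel) coordinates of the hidden line's end points
p1 = ax2.axis["bottom"].line.get_extents().get_points()
# map the first and last tick (ax2's x-limits are 0..1) to pixels
x1 = 0.058 * (p1[1][0] - p1[0][0]) + p1[0][0]
x2 = 0.915 * (p1[1][0] - p1[0][0]) + p1[0][0]
# transform=None: coordinates are pixels, so the line won't follow a resize or a different dpi
newL = lines.Line2D([x1, x2], [p1[0][1], p1[1][1]], transform=None, color="k", linewidth=0.5)
# ax2.lines.extend(...) fails in recent Matplotlib (Axes.lines is read-only there)
ax2.figure.add_artist(newL)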
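A variant of the manual line that should keep working when the figure is resized or saved at a different dpi: instead of fixed pixel coordinates, the line uses a blended transform (x in ax2's data coordinates, y in ax2's axes coordinates) plus a 25-point downward shift. This is a sketch on a plain subplot rather than the SubplotHost setup; it assumes, as in the question, that ax2's x-limits are 0 to 1 and that the offset (0, -25) passed to new_fixed_axis is measured in points.

```python
import matplotlib.pyplot as plt
import matplotlib.lines as lines
import matplotlib.transforms as mtransforms

fig, ax2 = plt.subplots()
ax2.set_xlim(0, 1)  # assumed: same x-limits as the twin axis in the question

# x in ax2 data coordinates, y in ax2 axes coordinates (0 = bottom edge) ...
trans = mtransforms.blended_transform_factory(ax2.transData, ax2.transAxes)
# ... then shifted 25 points down, matching the assumed offset=(0, -25)
trans = trans + mtransforms.ScaledTranslation(0, -25 / 72, fig.dpi_scale_trans)

# Line running only from the first to the last major tick
newL = lines.Line2D([0.058, 0.915], [0, 0], transform=trans,
                    color="k", linewidth=0.5, clip_on=False)
# Added to the figure so it does not affect ax2's data limits
fig.add_artist(newL)
plt.show()
```

Because the transform is evaluated at draw time, the end points stay attached to the tick positions 0.058 and 0.915 whatever the figure size.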
Which gives, in a simple example, something like this:
As opposed to:
Alternative
One alternative for creating multiple axes is to use spines (no parasite axes):
https://matplotlib.org/stable/gallery/ticks_and_spines/multiple_yaxis_with_spines.html
In this case, it is possible to do what you need simply by changing the bounds of the spines. For instance, by adding the following line to the code in the link
par2.spines["right"].set_bounds(10,30)
we get this:
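Applied to the layout from the question, the spine-based approach might look like the sketch below. It uses a plain twiny() instead of SubplotHost, moves the twin axis to the bottom, and limits its spine with set_bounds; the bar heights are hypothetical, while the tick positions and labels are copied from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax1 = plt.subplots()
x = np.arange(1, 5)
ax1.bar(x, [3, 5, 2, 4], 0.3)  # hypothetical bar heights

ax2 = ax1.twiny()
ax2.set_xlim(0, 1)
# Put the twin x-axis at the bottom, 25 points below the main one
ax2.xaxis.set_ticks_position("bottom")
ax2.spines["bottom"].set_position(("outward", 25))
ax2.spines["top"].set_visible(False)
# Draw the spine only between the first and last major tick (data coordinates)
ax2.spines["bottom"].set_bounds(0.058, 0.915)

# Tick setup from the question: long unlabeled major ticks as separators,
# invisible minor ticks carrying the group labels
ax2.set_xticks([0.058, 0.3434, 0.63, 0.915])
ax2.xaxis.set_major_formatter(ticker.NullFormatter())
ax2.xaxis.set_minor_locator(ticker.FixedLocator([0.20125, 0.48825, 0.776]))
ax2.xaxis.set_minor_formatter(ticker.FixedFormatter(["foo", "bar", "foo2"]))
ax2.tick_params(axis="x", which="major", length=15)
ax2.tick_params(axis="x", which="minor", length=0)
plt.show()
```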
Obviously, this does not strictly answer the title of your question, and unfortunately I do not know a proper way of doing this for new_fixed_axis the way it can be done for spines. I hope the "manually" created line solves your issue, in case nobody else comes up with a better solution.
Related
I'm trying to plot data with different colors depending on their classification. The data is in an nx3 array, with the first column the x position, the second column the y position, and the third column an integer defining their categorical value. I can do this by running a for loop over the entire array and plotting each point individually, but I have found that doing so massively slows down everything.
So, this works.
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=",")
colors = ['r', 'g', 'b']
fig = plt.figure()
for i in data:
    # one scatter call per point: slow for large arrays
    plt.scatter(i[0], i[1], color=colors[int(i[2]) % 3])
plt.show()
This does not work, but I want it to, since something along these lines would avoid the for loop.
data = np.loadtxt('data.csv', delimiter=",")
colors = ['r', 'g', 'b']
fig = plt.figure()
# fails: int() cannot convert a multi-element array to a single integer
plt.scatter(data[:,0], data[:,1], color=colors[int(data[:,2]) % 3])
plt.show()
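Your code doesn't work because your x and y values are arrays taken from the data, while color is a single value (and int(data[:,2]) cannot turn a whole array into one integer). You have to pass one color per point, i.e. an array. Take a look at the matplotlib gallery page: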
https://matplotlib.org/stable/gallery/shapes_and_collections/scatter.html They have this example there:
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
Here x and y play the same role as in your code, and you probably won't need s. The key point is that the color argument c is an array with one entry per point. You can do something like this:
colors = ['r', 'g', 'b']
colors_list = [colors[int(i) % 3] for i in data[:,2]]
plt.scatter(data[:,0], data[:,1], c = colors_list)
Note that since I don't have your data to test it, you may need to tweak the code.
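The list comprehension can also be replaced by NumPy fancy indexing, which builds the whole color array without a Python-level loop. This is a sketch with a randomly generated stand-in for data.csv (assumed layout: x, y, integer category per row).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for np.loadtxt('data.csv', delimiter=","):
# columns are x, y and an integer category
rng = np.random.default_rng(0)
data = np.column_stack([rng.random(300), rng.random(300), rng.integers(0, 5, 300)])

colors = np.array(['r', 'g', 'b'])
# Vectorized lookup: one color string per row, no Python loop
point_colors = colors[data[:, 2].astype(int) % 3]

fig, ax = plt.subplots()
sc = ax.scatter(data[:, 0], data[:, 1], c=point_colors)
plt.show()
```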
Without seeing your array it's hard to know exactly what your data look like (my answer doesn't include the % 3, but that's easy enough to insert depending on what data[:,2] looks like). This still has a for loop, but it only runs 3 times, so it will be fast.
for ind, col in enumerate(colors):
    plt.scatter(data[:,0][data[:,2]==ind], data[:,1][data[:,2]==ind], c=col)
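With the % 3 inserted and a label per category, the same per-category loop also gives a legend for free. A sketch, again using randomly generated stand-in data (assumed columns: x, y, integer category; the "class N" labels are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the CSV: columns x, y, integer category (0..5 here)
rng = np.random.default_rng(1)
data = np.column_stack([rng.random(200), rng.random(200), rng.integers(0, 6, 200)])

colors = ['r', 'g', 'b']
category = data[:, 2].astype(int) % 3   # the "% 3" from the question

fig, ax = plt.subplots()
for ind, col in enumerate(colors):
    mask = category == ind              # boolean mask selecting one category
    ax.scatter(data[mask, 0], data[mask, 1], c=col, label=f"class {ind}")
ax.legend()
plt.show()
```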
I have an issue plotting a big CSV file with Y-axis values ranging from 1 up to 20+ million. There are two problems I am facing right now.
The Y-axis does not show all the values it is supposed to. With the original data it shows values up to 6 million, instead of all the data up to 20 million. With the sample data (smaller data) below, it only shows the first Y-axis value and no others.
In the legend, since I am using hue and style = name, "name" appears both as the legend title and as an item inside it.
Questions:
Could anyone give me an example or help me figure out how I can show all the Y-axis values? How can I fix it so that all the Y-values show up?
How can I get rid of "name" under label section without getting rid of shapes and colors for the scatter points?
(Please let me know if any sources exist, or if this question was answered in another post, without marking it as a duplicate. Please also let me know if there are any grammar or spelling issues I need to fix. Thank you!)
Below you can find the function I am using to plot the graph and the sample data.
def test_graph (file_name):
    data_file = pd.read_csv(file_name, header=None, error_bad_lines=False, delimiter="|", index_col = False, dtype='unicode')
    data_file.rename(columns={0: 'name',
                              1: 'date',
                              2: 'name3',
                              3: 'name4',
                              4: 'name5',
                              5: 'ID',
                              6: 'counter'}, inplace=True)
    data_file.date = pd.to_datetime(data_file['date'], unit='s')
    norm = plt.Normalize(1,4)
    cmap = plt.cm.tab10
    df = pd.DataFrame(data_file)
    # Below creates and returns a dictionary of category-point combinations,
    # by cycling over the marker points specified.
    points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
    mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
    markers = {key:value for (key, value)
               in zip(df['name'], points * mult)}
    sc = sns.scatterplot(data = df, x=df['date'], y=df['counter'], hue = df['name'], style = df['name'], markers = markers, s=50)
    ax.set_autoscaley_on(True)
    ax.set_title("TEST", size = 12, zorder=0)
    plt.legend(title="Names", loc='center left', shadow=True, edgecolor = 'grey', handletextpad = 0.1, bbox_to_anchor=(1, 0.5))
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
    plt.xlabel("Dates", fontsize = 12, labelpad = 7)
    plt.ylabel("Counter", fontsize = 12)
    plt.grid(axis='y', color='0.95')
    fig.autofmt_xdate(rotation = 30)

fig = plt.figure(figsize=(20,15),dpi=100)
ax = fig.add_subplot(1,1,1)
test_graph(file_name)
plt.savefig(graph_results + "/Test.png", dpi=100)
# Prevents the bottom labels from being cut off (manually) => makes the bottom part bigger
plt.gcf().subplots_adjust(bottom=0.15)
plt.show()
Sample data
namet1|1582334815|ai1|ai1||150|101
namet1|1582392415|ai2|ai2||142|105
namet2|1582882105|pc1|pc1||1|106
namet2|1582594106|pc1|pc1||1|123
namet2|1580592505|pc1|pc1||1|141
namet2|1580909305|pc1|pc1||1|144
namet3|1581974872|ai3|ai3||140|169
namet1|1581211616|ai4|ai4||134|173
namet2|1582550907|pc1|pc1||1|179
namet2|1582608505|pc1|pc1||1|185
namet4|1581355640|ai5|ai5|bcu|180|298466
namet4|1582651641|pc2|pc2||233|298670
namet5|1582406860|ai6|ai6|bcu|179|298977
namet5|1580563661|pc2|pc2||233|299406
namet6|1581283626|qe1|q0/1|Link to btse1/3|51|299990
namet7|1581643672|ai5|ai5|bcu|180|300046
namet4|1581758842|ai6|ai6|bcu|179|300061
namet6|1581298027|qe2|q0/2|Link to btse|52|300064
namet1|1582680415|pc2|pc2||233|300461
namet6|1581744427|pc3|p90|Link to btsi3a4|55|6215663
namet6|1581730026|pc3|p90|Link to btsi3a4|55|6573348
namet6|1582190826|qe2|q0/2|Link to btse|52|6706378
namet6|1582190826|qe1|q0/1|Link to btse1/3|51|6788568
namet1|1581974815|pc2|pc2||233|6895836
namet4|1581974841|pc2|pc2||233|7874504
namet6|1582176427|qe1|q0/1|Link to btse1/3|51|9497687
namet6|1582176427|qe2|q0/2|Link to btse|52|9529133
namet7|1581974872|pc2|pc2||233|9573450
namet6|1582162027|pc3|p90|Link to btsi3a4|55|9819491
namet6|1582190826|pc3|p90|Link to btsi3a4|55|13494946
namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820
Results I am getting:
Big data:
Small data:
Updated Graph
First of all, some improvements to your post: you are missing the import statements
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
The line
df = pd.DataFrame(data_file)
is not necessary, since data_file is already a DataFrame. The lines
points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
markers = {key:value for (key, value)
in zip(df['name'], points * mult)}
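do not cycle through points as you might expect: the dictionary is built with one entry per row rather than per unique name, so each name ends up with the marker of its last row, and different names can get the same marker. Maybe use itertools as suggested here. Also, setting yticks like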
ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
puts a tick every 100 units, which is far too many when your data span values from 0 to 20 million; consider replacing 100 with, say, 1000000.
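A small sketch of the marker issue, using a hypothetical name column in row order: the question's comprehension keys on every row, while zipping the unique names with itertools.cycle assigns markers in turn, one per name. The resulting dict could then be passed as markers= to sns.scatterplot.

```python
import itertools

points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
# Hypothetical name column, in row order (with repeats, as in the sample data)
names = ['namet1', 'namet1', 'namet2', 'namet2', 'namet3', 'namet1', 'namet2']

# Question's approach: one dict entry per *row*; later rows overwrite earlier ones
mult = len(names) // len(points) + (len(names) % len(points) > 0)
per_row = {key: value for (key, value) in zip(names, points * mult)}
print(per_row)   # {'namet1': '8', 'namet2': 's', 'namet3': '>'}

# One marker per *unique* name, cycling through points if names outnumber markers
unique_names = list(dict.fromkeys(names))   # unique, in order of first appearance
per_name = dict(zip(unique_names, itertools.cycle(points)))
print(per_name)  # {'namet1': 'o', 'namet2': 'v', 'namet3': '^'}
```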
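I was able to reproduce your first problem. Using df.dtypes I found that the column counter was stored as type object: because of dtype='unicode', every column is read as strings, so the counter values are presumably treated as categories rather than as numbers. Adding the line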
df['counter']=df['counter'].astype(int)
resolved your first problem for me. I couldn't reproduce your second issue, though. Here is what the resulting plot looks like for me:
Have you tried updating all your packages to the latest version?
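The dtype effect can be checked in isolation with a few rows of the sample data. This sketch reads them with dtype=str (which, like the question's dtype='unicode', keeps every column as text) and shows that the text column compares lexicographically until it is converted; pd.to_numeric is used here as an alternative to astype(int).

```python
import io
import pandas as pd

# Three rows copied from the sample data in the question
sample = """namet1|1582334815|ai1|ai1||150|101
namet4|1581355640|ai5|ai5|bcu|180|298466
namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820
"""

df = pd.read_csv(io.StringIO(sample), header=None, delimiter="|",
                 index_col=False, dtype=str)
df = df.rename(columns={0: "name", 1: "date", 6: "counter"})

# The numbers are still text (object dtype, or a string dtype in newer pandas),
# so comparisons are lexicographic
text_max = df["counter"].max()
print(text_max)             # 298466, not 19026820

df["counter"] = pd.to_numeric(df["counter"])
print(df["counter"].max())  # 19026820
```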
EDIT: as a follow-up to your comment, you can also adjust the number of x-ticks in your plot by replacing 1 in
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
with a higher number, say 10 (on a date axis the unit is one day, so 10 gives a tick every 10 days). Incorporating all my suggestions and deleting the seemingly unnecessary function definition, my version of your code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
import itertools
fig = plt.figure()
ax = fig.add_subplot()
df = pd.read_csv(
    'data.csv',
    header = None,
    on_bad_lines = 'skip',  # pandas >= 1.3; older versions used error_bad_lines = False
    delimiter = "|",
    index_col = False,
    dtype = 'unicode')
df.rename(columns={0: 'name',
                   1: 'date',
                   2: 'name3',
                   3: 'name4',
                   4: 'name5',
                   5: 'ID',
                   6: 'counter'}, inplace=True)
df.date = pd.to_datetime(df['date'], unit='s')
df['counter'] = df['counter'].astype(int)  # strings -> numbers
points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
# one marker per unique name, cycling through points if necessary
markers = itertools.cycle(points)
markers = list(itertools.islice(markers, len(df['name'].unique())))
sc = sns.scatterplot(
    data = df,
    x = 'date',
    y = 'counter',
    hue = 'name',
    style = 'name',
    markers = markers,
    s = 50,
    ax = ax)
ax.set_title("TEST", size = 12, zorder=0)
ax.legend(
    title = "Names",
    loc = 'center left',
    shadow = True,
    edgecolor = 'grey',
    handletextpad = 0.1,
    bbox_to_anchor = (1, 0.5))
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1000000))
ax.minorticks_off()
ax.set_xlabel("Dates", fontsize = 12, labelpad = 7)
ax.set_ylabel("Counter", fontsize = 12)
ax.grid(axis='y', color='0.95')
fig.autofmt_xdate(rotation = 30)
fig.subplots_adjust(bottom=0.15)
plt.show()
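For the second issue (the extra "name" item in the legend), which this answer could not reproduce: older seaborn versions appear to add the variable name as a legend entry of its own. A hedged workaround is to rebuild the legend from the existing handles while skipping that label. The helper below (drop_legend_entries is a hypothetical name) is demonstrated with plain matplotlib, faking such an entry with an invisible line labelled "name".

```python
import matplotlib.pyplot as plt

def drop_legend_entries(ax, unwanted, **legend_kwargs):
    """Rebuild the legend of `ax` without the entries whose label is in `unwanted`."""
    handles, labels = ax.get_legend_handles_labels()
    kept = [(h, lab) for h, lab in zip(handles, labels) if lab not in unwanted]
    handles = [h for h, _ in kept]
    labels = [lab for _, lab in kept]
    return ax.legend(handles, labels, **legend_kwargs)

fig, ax = plt.subplots()
# Stand-in for the unwanted entry: an invisible artist labelled "name"
ax.plot([], [], linestyle="none", label="name")
ax.scatter([1, 2], [3, 4], marker="o", label="namet1")
ax.scatter([1, 2], [1, 2], marker="v", label="namet2")

leg = drop_legend_entries(ax, {"name"}, title="Names", loc="center left",
                          bbox_to_anchor=(1, 0.5))
print([t.get_text() for t in leg.get_texts()])  # ['namet1', 'namet2']
plt.show()
```

With seaborn, the idea would be to call drop_legend_entries(ax, {"name"}, title="Names", ...) after sns.scatterplot(..., ax=ax), in place of the ax.legend(...) call.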
I want to reorder x-axis tick labels such that the data also changes appropriately.
Example
y = [5,8,9,10]
x = ['a', 'b', 'c', 'd']
plt.plot(x, y)
This is what I want the plot to look like, obtained by modifying the location of the axis ticks:
Please note that I don't want to achieve this by modifying the order of my data.
My Try
# attempt 1
fig, ax = plt.subplots()
plt.plot(x, y)
ax.set_xticklabels(['b', 'c', 'a', 'd'])
# this just overwrites the labels, not what we intended
# attempt 2
fig, ax = plt.subplots()
plt.plot(x, y)
locs, labels = plt.xticks()
plt.xticks((1, 2, 0, 3))  # This essentially just sets the locations
# of the labels to display, irrespective of the order of the tuple.
Edit:
Based on comments here are some further clarifications.
Take the first point, (a, 5), in fig. 1. If I change my x-axis definition so that a is now at the third position, this should be reflected in the plot as well: the value 5 on the y-axis moves with a, as shown in fig. 2. One way to achieve this would be to reorder the data. However, I would like to see whether it is possible to achieve it by changing the axis locations instead. To summarize, the data should be plotted according to how the custom axis is defined, without reordering the original data.
Edit2:
Based on the discussion in the comments, it's not possible to do this just by modifying the axis labels; any approach involves modifying the data. My example was an oversimplification of the original problem I was facing. In the end, using dictionary-based labels in a pandas DataFrame helped me sort the axis values in a specific order while making sure that their respective values moved accordingly.
Toggling between two different orders of the x-axis categories could look as follows:
import numpy as np
import matplotlib.pyplot as plt
x = ['a', 'b', 'c', 'd']
y = [5,8,9,10]
order1 = ['a', 'b', 'c', 'd']
order2 = ['b', 'c', 'a', 'd']
fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="o")
def toggle(order):
    _, ind1 = np.unique(x, return_index=True)
    _, inv2 = np.unique(order, return_inverse=True)
    y_new = np.array(y)[ind1][inv2]
    line.set_ydata(y_new)
    line.axes.set_xticks(range(len(order)))
    line.axes.set_xticklabels(order)
    fig.canvas.draw_idle()
curr = [0]
orders = [order1, order2]
def onclick(evt):
    curr[0] = (curr[0] + 1) % 2
    toggle(orders[curr[0]])
fig.canvas.mpl_connect("button_press_event", onclick)
plt.show()
Click anywhere on the plot to toggle between order1 and order2.
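Without the interactive part, the dictionary idea from Edit2 can be sketched as follows: a dict maps each label to its position on the custom axis, and only a sorted copy of the (position, value) pairs is plotted, so x and y themselves stay untouched. The variable names pos and pts are illustrative.

```python
import matplotlib.pyplot as plt

x = ['a', 'b', 'c', 'd']
y = [5, 8, 9, 10]
order = ['b', 'c', 'a', 'd']  # desired left-to-right order of the categories

# Map each label to its position along the custom axis
pos = {label: i for i, label in enumerate(order)}
# Sort a copy of the (position, value) pairs so the line is drawn left to right
pts = sorted((pos[label], value) for label, value in zip(x, y))
xs, ys = zip(*pts)

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xticks(range(len(order)))
ax.set_xticklabels(order)
plt.show()
```

Here a ends up at the third position together with its value 5, so ys is (8, 9, 5, 10).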
The code below produces subplots with uniquely colored boxes, but all subplots share a common set of x- and y-axes. I would like each subplot to have independent axes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
    by="models",layout=(2,2),figsize=(6,4),
    return_type='both',
    patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
# Series.iteritems() was removed in pandas 2.0; items() does the same
for row_key, (ax,row) in bp_dict.items():
    ax.set_xlabel('')
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
Here is the output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots beforehand and pass the axes to df.boxplot() via its ax argument. This also means you can remove the argument layout=(2,2), and figsize belongs in the plt.subplots() call:
fig, axes = plt.subplots(2,2,sharex=False,sharey=False,figsize=(6,4))
Then use:
bp_dict = df.boxplot(
    by="models", ax=axes,
    return_type='both',
    patch_artist = True,
    )
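Putting the pieces together, a complete version of the question's code with the axes created up front might look like the sketch below. figsize is given to plt.subplots because, when ax is passed, pandas does not create the figure itself and appears to ignore figsize.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1', 'model2', 'model3', 'model4',
                                    'model5', 'model6', 'model7'], 20))

# Create the grid first; sharex/sharey=False gives every subplot its own axes
fig, axes = plt.subplots(2, 2, sharex=False, sharey=False, figsize=(6, 4))

bp_dict = df.boxplot(
    by="models", ax=axes,
    return_type='both',
    patch_artist=True,
)

colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k']
for row_key, (ax, row) in bp_dict.items():
    ax.set_xlabel('')
    for i, box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
```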
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent, though; they are still bound to each other. But it seems you are asking about the visibility rather than the shared behaviour here.
If you really think it is necessary to un-share the axes after the boxplot array has been created, you can do it, but you have to do everything 'by hand'. After searching Stack Overflow for a while and looking at the matplotlib documentation, I came up with the following solution to un-share the y-axes of the Axes instances; for the x-axes you would proceed analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
    np.asarray([
        np.random.rand(140),
        2*np.random.rand(140),
        4*np.random.rand(140),
        8*np.random.rand(140),
    ]).T,
    columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
    'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
    by="models",layout = (2,2),figsize=(6,8),
    return_type='both',
    patch_artist = True,
    rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
    ax.set_xlabel('')
    ##removing shared axes (recent Matplotlib versions return a read-only
    ##view here, so .remove() may not be available):
    grouper = ax.get_shared_y_axes()
    shared_ys = [a for a in grouper]
    for ax_list in shared_ys:
        for ax2 in ax_list:
            grouper.remove(ax2)
    ##setting limits:
    ax.axis('auto')
    ax.relim() #<-- maybe not necessary
    ##adjusting tick positions:
    ax.yaxis.set_major_locator(AutoLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ##making tick labels visible:
    plt.setp(ax.get_yticklabels(), visible=True)
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its y-axis with any other Axes instance. This post pointed me in the right direction: Axes.get_shared_y_axes() returns a Grouper object that holds references to all other Axes instances with which the current Axes should share its y-axis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, @DavidG's answer is, in my opinion, the better approach.
I am able to build the histogram I need. However, the bars overlap one another.
As you can see, I changed the width of the bars to 0.2, but they still overlap. What mistake am I making?
from matplotlib import pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties
from random import randrange
color = ['r', 'b', 'g','c','m','y','k','darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred']
label = ['2','6','10','14','18','22','26','30','34','38','42','46']
file_names = ['a','b','c']
diff = [[randrange(10) for a in range(0, len(label))] for a in range(0, len(file_names))]
print(diff)
x = diff
name = file_names
y = zip(*x)
pos = np.arange(len(x))
width = 1. / (1 + len(x))
fig, ax = plt.subplots()
for idx, (serie, color,label) in enumerate(zip(y, color,label)):
    ax.bar(pos + idx * width, serie, width, color=color, label=label)
ax.set_xticks(pos + width)
plt.xlabel('foo')
plt.ylabel('bar')
ax.set_xticklabels(name)
ax.legend()
plt.savefig("final" + '.eps', bbox_inches='tight', pad_inches=0.5,dpi=100,format="eps")
plt.clf()
Here is the graph:
As you can see in the example below, you can easily get non-overlapping bars using a heavily simplified version of your plotting code. I'd suggest taking a closer look at whether x and y really are what you expect them to be (and simplifying your code as much as possible when you are looking for an error).
Also have a look at how you compute the width of the bars. You appear to use the number of subjects for this, when it should be the number of bars per subject.
Have a look at this example:
import numpy as np
import matplotlib.pyplot as plt
subjects = ('Tom', 'Dick', 'Harry', 'Sally', 'Sue')
# number of bars per subject
n = 5
# y-data: one row per bar series, one column per subject
y = np.random.rand(n, len(subjects))
# x-positions for the bars
x = np.arange(len(subjects))
# plot bars
width = 1./(1+n) # <-- n.b., use number of bars, not number of subjects
for i, yi in enumerate(y):
    plt.bar(x+i*width, yi, width)
# add labels; bars are centered on their x by default, so the middle of a
# group of n bars lies (n-1)/2 bar widths to the right of x
plt.xticks(x+(n-1)/2.*width, subjects)
plt.show()
This is the result image:
For reference:
http://matplotlib.org/examples/api/barchart_demo.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar
The problem is that the width of your bars is calculated from the three subjects, not the twelve bars per subject. That means you're placing multiple bars at each x-position. Try swapping in these lines where appropriate to fix that:
n = len(x[0]) # New variable with the right length to calculate bar width
width = 1. / (1 + n)
ax.set_xticks(pos + (n - 1) / 2. * width)  # middle of each group of n centered bars
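With those lines swapped in, the question's plot might look like the following Python 3 sketch. The values are random, as in the question; the color list is trimmed to the 12 labels, and the output is shown on screen instead of saved to EPS.

```python
import numpy as np
import matplotlib.pyplot as plt
from random import randrange

colors = ['r', 'b', 'g', 'c', 'm', 'y', 'k', 'darkgreen', 'darkkhaki',
          'darkmagenta', 'darkolivegreen', 'darkorange']
labels = ['2', '6', '10', '14', '18', '22', '26', '30', '34', '38', '42', '46']
file_names = ['a', 'b', 'c']

# diff[i][j]: value for file i and label j (random, as in the question)
diff = [[randrange(10) for _ in labels] for _ in file_names]

series = list(zip(*diff))          # one series per label, len(file_names) values each
pos = np.arange(len(file_names))   # one group of bars per file
n = len(series)                    # bars per group
width = 1. / (1 + n)               # leaves a one-bar gap between groups

fig, ax = plt.subplots()
for idx, (serie, color, label) in enumerate(zip(series, colors, labels)):
    ax.bar(pos + idx * width, serie, width, color=color, label=label)
# Bars are centered on their x, so the group's middle is (n - 1) / 2 widths in
ax.set_xticks(pos + (n - 1) / 2 * width)
ax.set_xticklabels(file_names)
ax.set_xlabel('foo')
ax.set_ylabel('bar')
ax.legend()
plt.show()
```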