Scale negative xticks to different scale than positive x ticks - python

Is this possible?
I have a bar chart representing the difference between two dataframe columns, divided by the original dataframe column:
difference = (df - full_df)/full_df
I then plot the difference:
difference.plot(kind='barh',color = ['r' if x > 0 else 'b' for x in difference.values]).\
set_yticklabels([str(tick)[:45] for tick in difference.index])
plt.xticks(fontsize=20)
plt.gca().set_title('Selected minus full feature set averages divided by full', fontsize=30)
axs[1].yaxis.tick_right()
axs[1].yaxis.grid(color='gray', linestyle='dashed')
axs[1].xaxis.grid(color='gray', linestyle='dashed')
plt.yticks(fontsize=23)
plt.tight_layout()
Most of the positive x values will be in the range 0 < x < 10, and all of the negative values should be in the range -1 < x < 0. Is there a way to set the xtick interval below zero to 0.1 (or something like that) and the xtick interval above zero to 1, so the x axis would look like:
[-1,-.9,-.8,-.7,-.6,-.5,-.4,-.3,-.2,-.1,0,1,2,3,4,5,6,7,8,9, to inf] ?

I may not have understood your question correctly, but I think you are looking for a plot with negative x and positive x on different scales, each taking up the same amount of space. I think the easiest solution would be to scale your negative x data so that it spans a range comparable to the positive x data, and then relabel the ticks with their original values.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# original data
data_x = x = [-1,-.9,-.8,-.7,-.6,-.5,-.4,-.3,-.2,-.1,0,1,2,3,4,5,6,7,8,9,]
data_y = y = range(len(data_x))
# scale negative x values
x_mod =[i*10 if i < 0 else i for i in x]
# draw plots
with sns.axes_style('whitegrid'):
    fig, ax = plt.subplots(2,1,figsize=(5,8))
# unscaled
ax1 = ax[0]
ax2 = ax[1]
ax1.plot(data_x, data_y)
ax1.vlines(0, 0, 20, color='black') # mark x = 0
ax1.set_title('unscaled data')
# scaled
ax2.plot(x_mod, data_y)
# fix the xticks and their labeling
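# labels show the original values; locations use the scaled values
# (rounding avoids labels such as -0.6000000000000001)
neg_ticks = np.round(np.arange(-1, 0, 0.2), 1)
xticks = list(np.concatenate([neg_ticks, np.arange(0,11,2)]))
xtick_locs = list(np.concatenate([neg_ticks * 10, np.arange(0,11,2)]))
ax2.set(xticks=xtick_locs, xticklabels = xticks)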
ax2.vlines(0, 0, 20, color='black')
ax2.set_title('scaled data')
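As an alternative to rescaling the data by hand, here is a minimal sketch that assumes Matplotlib 3.1 or newer, where the 'function' axis scale is available. The names NEG_STRETCH, forward and inverse are hypothetical and not part of the answer above; the stretch factor of 10 mirrors the factor used in x_mod.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

NEG_STRETCH = 10.0  # hypothetical factor: negative x values get 10x the space


def forward(x):
    """Map data coordinates to the stretched axis coordinates."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, x * NEG_STRETCH, x)


def inverse(x):
    """Map stretched axis coordinates back to data coordinates."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, x / NEG_STRETCH, x)


data_x = np.array([-1, -0.5, -0.1, 0, 1, 5, 9])
data_y = np.arange(len(data_x))

fig, ax = plt.subplots()
ax.plot(data_x, data_y)
# the data stays untouched; only the axis transform changes
ax.set_xscale("function", functions=(forward, inverse))
ax.set_xticks(np.concatenate([np.round(np.arange(-1, 0, 0.2), 1), np.arange(0, 11, 2)]))
fig.canvas.draw()
```

With this approach the tick labels are the real data values, so no manual relabeling is needed.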

Related

How to plot a histogram with the yaxis as the "total of y values corresponding to each x bin" and x axis as n bins of x in python?

Say I have two arrays:
x=np.random.uniform(0,10,100)
y=np.random.uniform(0,1,100)
I want to make n bins between xmin=0 and xmax=10. For each y, there is a corresponding x which belongs to one of these n bins. Say the value corresponding to each bin is initially zero. What I wish to do is add each value of y to its corresponding x's bin and plot a histogram with the x-axis as xmin to xmax with n bins and the y axis as the total of all y values added to corresponding x's bins. How can one do this in python?
First of all, I think it's easier to use a bar plot instead of a histogram.
import matplotlib.pyplot as plt
import numpy as np
xmax = 6600
xmin = 6400
x = np.random.uniform(xmin, xmax, 10000)
y = np.random.uniform(0, 1, 10000)
number_of_bins = 100
bins = [xmin]
step = (xmax - xmin)/number_of_bins
print("step = ", step)
for i in range(1, number_of_bins+1):
    bins.append(xmin + step*i)
print("bins = ", bins)
# time to create the sums for each bin
sums = [0] * number_of_bins
for i in range(len(x)):
    sums[int((x[i]-xmin)/step)] += y[i]
print(sums)
# now create the x axis values (bin centers) for the bar plot
xbar = []
for i in range(number_of_bins):
    xbar.append(step/2 + xmin + step*i)
plt.bar(xbar, sums, width=step, label="Bars")
plt.xlabel("X axis label")
plt.ylabel("Y axis label")
plt.legend()
plt.show()
Here is the new result
If anything is unclear, please tell me and I will explain it.
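The per-bin sums can also be computed without an explicit loop. The following is a minimal sketch, assuming the x and y arrays from the question; it uses the weights argument of np.histogram, and the seed and the choice of n_bins = 5 are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only to make the sketch repeatable
x = rng.uniform(0, 10, 100)
y = rng.uniform(0, 1, 100)

n_bins = 5  # illustrative value
# with weights=y, each bin holds the sum of the y values whose x falls in it
sums, edges = np.histogram(x, bins=n_bins, range=(0, 10), weights=y)
centers = (edges[:-1] + edges[1:]) / 2  # x positions for a bar plot

print(sums)
```

The result can then be drawn with plt.bar(centers, sums, width=edges[1] - edges[0]).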

Generating specific y-axis in python

I would like to generate a y-axis like the one shown in the plot below in Python. I guess matplotlib should help, but I can't figure out the code needed for that.
You need a logarithmic scale, but a log scale spreads out the values near zero, whereas here the detail is near 1 (100%). So the trick is to plot (1 - y) instead of y on an inverted log axis. Then you set the ticks and their labels. My suggestion (the values are < 1, but you can easily scale to 100):
import numpy as np
import matplotlib.pyplot as plt
# Some data
x = np.array([1, 2, 3, 4, 5])
y = np.array([0.99, 0.999, 0.9923, 0.995, 0.997])
fig, ax = plt.subplots()
# Plot the inverted data with log scale
ax.plot(x, 1 - y)
ax.set_ylim(0.1, 0.001)
ax.set_yscale("log")
# Now set what ticks (in transformed y) and what labels to use
ticks = np.array([0.0001, 0.001, 0.01, 0.1])
tick_labels = (1 - ticks) * 100
ax.set_yticks(ticks)
ax.set_yticklabels(tick_labels)
ax.set_ylabel("Some value in %")
# And you're done :-)
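To make the tick relabeling concrete, here is a small sketch of the arithmetic only. It assumes the same tick positions in the transformed (1 - y) space; the "g" format is my own choice to keep floating-point noise out of the labels.

```python
# tick positions in the transformed (1 - y) space
ticks = [0.1, 0.01, 0.001, 0.0001]

# the label shown at each position is the original value, as a percentage
labels = [f"{(1 - t) * 100:g}" for t in ticks]

print(labels)
```

Passing such strings to ax.set_yticklabels avoids labels with long decimal tails.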
Let's say, for example, that you have a list of y values:
y = [1,2,3,4]
You can relabel the ticks at those positions like this:
plt.yticks(y, [90.0,99.0,99.9,99.99])
This changes the y axis tick labels.

Color-coding a histogram

I have a set of N objects with two properties: x and y.
I would like to depict the distribution of x with a histogram in MATPLOTLIB using hist(). Easy enough. Now, I would like to color-code EACH bar of the histogram with a color that represents the average value of y in that set with a colormap. Is there an easy way to do this? Here, x and y are both N-d numpy arrays. Thanks!
fig = plt.figure()
n, bins, patches = plt.hist(x, 100, density=True, histtype='stepfilled')
plt.setp(patches, 'facecolor', 'g', 'alpha', 0.1)
plt.xlabel('x')
plt.ylabel('Normalized frequency')
plt.show()
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
# set up the bins
Nbins = 10
bins = np.linspace(0, 1, Nbins +1, endpoint=True)
# get some fake data
x = np.random.rand(300)
y = np.arange(300)
# figure out which bin each x goes into
bin_num = np.digitize(x, bins, right=True) - 1
# compute the counts per bin
hist_vals = np.bincount(bin_num, minlength=Nbins)
# sum the y values by bin (bincount accumulates repeated bin indices,
# which means[bin_num] += y would not do)
means = np.bincount(bin_num, weights=y, minlength=Nbins)
# take the average
means /= hist_vals
# make the figure/axes objects
fig, ax = plt.subplots(1,1)
# get a color map
my_cmap = plt.get_cmap('jet')
# get normalize function (takes data in range [vmin, vmax] -> [0, 1])
my_norm = Normalize()
# use bar plot, aligning each bar with the left edge of its bin
ax.bar(bins[:-1], hist_vals, color=my_cmap(my_norm(means)), width=np.diff(bins), align='edge')
# make sure the figure updates
plt.draw()
plt.show()
related: vary the color of each bar in bargraph using particular value
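One detail worth checking when summing y per bin: in-place addition through an index array (a[idx] += y) does not accumulate repeated indices in NumPy, so it gives the wrong per-bin sums whenever a bin holds more than one point. A minimal sketch with made-up data, comparing it against np.bincount with weights:

```python
import numpy as np

# made-up data: six points falling into three bins
bin_num = np.array([0, 0, 1, 2, 2, 2])
y = np.array([1.0, 3.0, 5.0, 2.0, 4.0, 6.0])

# fancy-index "+=" writes each index once, so repeated bins are not summed
buffered = np.zeros(3)
buffered[bin_num] += y

# bincount with weights accumulates every occurrence of a bin index
sums = np.bincount(bin_num, weights=y, minlength=3)
counts = np.bincount(bin_num, minlength=3)
means = sums / counts

print(buffered, sums, means)
```

np.add.at(buffered, bin_num, y) is another way to get the accumulated sums.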

Stacked histogram will not stack

I am trying to run the following code :
variable_values = #numpy vector, one dimension, 5053 values between 1 and 0.
label_values = #numpy vector, one dimension, 5053 values, discrete value of either 1 OR 0.
x = variable_values[variable_values != '?'].astype(float)
y = label_values[variable_values != '?'].astype(float)
print np.max(x) #prints 0.90101
print np.max(y) #prints 1.0
N = 5053
ind = np.arange(N) # the x locations for the groups
width = 0.45 # the width of the bars: can also be len(x) sequence
n, bins, patches = plt.hist(x, 5, stacked=True, normed = True)
#Stack the data
plt.figure()
plt.hist(x, bins, stacked=True, normed = True)
plt.hist(y, bins, stacked=True, normed = True)
plt.show()
What I want to achieve is the following graph :
With the colour on each bar split according to whether its value for label is 1 or 0.
Unfortunately my output currently is :
There are two things incorrect with this - it isn't stacked appropriately first of all. Second of all, the values on the Y axis go up to 1.6, but I believe the Y axis should hold the number of pieces of data that fall into each subgroup (so if all pieces of data had a value of 0-0.25 the only bar that would show data would be the first).
You're trying to use the same bins for x and y. The bins are computed from x, which probably lies strictly between 0 and 1, so the y values (exactly 0 or 1) fall outside the range of the bins you're plotting.
It's 1.6 because you have chosen to normalize the plot. Set that parameter to False to get the real counts.
This should fix most of these problems:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.random(5053)
y = np.random.randint(0, 2, 5053)  # random 0/1 labels
# x = variable_values[variable_values != '?'].astype(float)
# y = label_values[variable_values != '?'].astype(float)
print(np.max(x))
print(np.max(y))
N = 5053
ind = np.arange(N) # the x locations for the groups
width = 0.45 # the width of the bars: can also be len(x) sequence
# normalized histogram, used here only to get the bin edges
n, bins, patches = plt.hist(x, 5, stacked=True, density=True)
bins[0] = 0
bins[-1] = 1
#Stack the data
plt.figure()
# density=False (the default) plots the raw counts
plt.hist(y, bins, stacked=True, density=False)
plt.hist(x, bins, stacked=True, density=False)
plt.show()
May I suggest a simpler solution:
variable_values=np.random.random(size=5053)
label_values=np.random.randint(0,2, size=5053)
plt.hist(variable_values, label='1')
plt.hist(variable_values[label_values==0], label='0')
plt.legend(loc='upper right')
plt.savefig('temp.png')
Actually, since label_values is either 1 or 0, you don't even need to stack the histogram. Just make a histogram of all the values (1's and 0's together) and then superimpose a histogram of the 0's on top.
To use a stacked histogram instead (although I prefer to use one only when there are many different classes):
plt.hist([variable_values[label_values==1],variable_values[label_values==0]], stacked=True, label=['1', '0'])
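To check what a stacked histogram should show, the per-class counts can be computed directly. This is a sketch with random stand-in data (the real variable_values and label_values are not available here) and five equal-width bins on [0, 1], which is an assumption of mine.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, stand-in data only
variable_values = rng.random(5053)
label_values = rng.integers(0, 2, size=5053)

edges = np.linspace(0, 1, 6)  # five bins shared by both classes
counts_1, _ = np.histogram(variable_values[label_values == 1], bins=edges)
counts_0, _ = np.histogram(variable_values[label_values == 0], bins=edges)
counts_all, _ = np.histogram(variable_values, bins=edges)

# the stacked bar height in each bin is the sum of the per-class counts
print(counts_1 + counts_0, counts_all)
```

Because both classes share the same bin edges, the stacked heights add up to the total count in each bin.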

Changing the tick frequency on the x or y axis

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x axis' ticks are plotted in intervals of 5. Is there a way to make it show intervals of 1?
You could explicitly set where you want the tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()
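Note that the default limits include a margin, so np.arange(start, end, stepsize) usually starts at a non-round value. If you want ticks on round multiples of the step instead, a small helper can snap them. This is a sketch; aligned_ticks is a hypothetical name, not a Matplotlib function, and the limits in the example are illustrative.

```python
import numpy as np


def aligned_ticks(lo, hi, step):
    """Return the multiples of `step` that lie inside [lo, hi]."""
    first = np.ceil(lo / step) * step   # first multiple at or above lo
    last = np.floor(hi / step) * step   # last multiple at or below hi
    n = int(round((last - first) / step)) + 1
    return first + step * np.arange(n)


# example with limits like those Matplotlib might pick for x in [0, 15]
print(aligned_ticks(-0.75, 15.75, 5))
```

The result can be passed to ax.xaxis.set_ticks in place of the np.arange call.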
Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()
I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution gives you explicit control of the tick spacing via the number given to ticker.MultipleLocator(), allows automatic limit determination, and is easy to read later.
In case anyone is interested in a general one-liner, simply get the current ticks and use it to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])
If you just want to set the spacing, here is a simple one-liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
It also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
It is a bit of a mouthful, but pretty compact.
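Major and minor locators are most useful with different spacings. A minimal sketch using the data from the question, with major ticks every 5 and minor ticks every 1 (both spacings are my own choice for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

x = [0, 5, 9, 10, 15]
y = [0, 1, 2, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))  # labeled ticks every 5
ax.xaxis.set_minor_locator(ticker.MultipleLocator(1))  # unlabeled ticks every 1
fig.canvas.draw()

major = ax.get_xticks()
minor = ax.get_xticks(minor=True)
print(major, minor)
```

This keeps the labels sparse while still marking every unit on the axis.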
This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
edit: note that sometimes matplotlib sets labels == '', so it might look like a label is not present, when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)
This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
    """
    Send in an axis and I fix the resolution as desired.
    """
    if xres:
        start, stop = ax.get_xlim()
        ticks = np.arange(start, stop + xres, xres)
        ax.set_xticks(ticks)
    if yres:
        start, stop = ax.get_ylim()
        ticks = np.arange(start, stop + yres, yres)
        ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that you no longer get the interactive, automatic updating of the axis limits after adding a line. In that case, do
pp.gca().set_ylim(top=new_top) # for example
and run the resadjust function again.
I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show tick labels only for 'feb' and 'jun':
xlabelsnew = []
for i in xlabels:
    if i not in ['feb','jun']:
        i = ' '
        xlabelsnew.append(i)
    else:
        xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plot the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()
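The loop that builds the masked label list can also be written as a list comprehension. A minimal sketch, where keep is a hypothetical name for the set of labels to show and an empty string is used for the hidden ones:

```python
xlabels = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']
keep = {'feb', 'jun'}  # hypothetical name: the labels that stay visible

# blank out every label that is not in the keep set
xlabelsnew = [label if label in keep else '' for label in xlabels]

print(xlabelsnew)
```

The resulting list can be passed to plt.xticks in the same way as before.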
Pure Python Implementation
Below's a pure python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows for the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
    """
    Computes domain with given step encompassing series x
    # params
    x - Required - A list-like object of integers or floats
    step - Optional - Tick frequency
    """
    xMax, xMin = math.ceil(max(x)), math.floor(min(x))
    dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
    return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).
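The same tick range can be expressed more directly by rounding the minimum down and the maximum up to multiples of the step. This is a sketch under the assumption of a positive integer step; compute_ticks_simple is a hypothetical name, and it reproduces the sample outputs listed above rather than claiming to match computeTicks in every case.

```python
import math


def compute_ticks_simple(x, step=5):
    """Multiples of `step` covering the series x (positive integer step assumed)."""
    d_min = math.floor(min(x) / step) * step  # round the minimum down
    d_max = math.ceil(max(x) / step) * step   # round the maximum up
    return range(d_min, d_max + step, step)   # + step makes the upper end inclusive


print(list(compute_ticks_simple([-2, 18, 24, 29, 43])))
print(list(compute_ticks_simple([118.3, 293.2, 768.1], step=100)))
```

As with computeTicks, the result can be passed straight to plt.xticks.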
Generalisable one liner, with only Numpy imported:
ax.set_xticks(np.arange(min(x),max(x),1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x),max(x),1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x),max(x),1) gives an array with step 1 from the min of x up to, but not including, the max of x (use max(x)+1 as the stop value to include it). These are the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.
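A quick check of the endpoint behavior of np.arange, using the x values from the question. The variable names are illustrative only.

```python
import numpy as np

x = [0, 5, 9, 10, 15]

# the stop value is excluded, so the last tick is max(x) - 1
without_end = np.arange(min(x), max(x), 1)

# adding one step to the stop value includes max(x) itself
with_end = np.arange(min(x), max(x) + 1, 1)

print(without_end[-1], with_end[-1])
```

Use the second form if the last data point should get its own tick.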
This worked for me:
xmarks=[i for i in range(1,length+1,1)]
plt.xticks(xmarks)
If you want ticks between [1,5] (1 and 5 inclusive), then set
length = 5
Since None of the above solutions worked for my usecase, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the usecase, one can adapt the above code simply by changing show_every and using that for sampling tick values for X or Y or both the axes.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.
You can loop through labels and show or hide those you want:
for i, label in enumerate(ax.get_xticklabels()):
    if i % interval != 0:
        label.set_visible(False)
