How to plot for frequency only? - python

Question
How can I plot the following scenario, just like shown in the attached image? This is for the purpose of visualising frequency allocation in a network
Scenario
I have a range of frequency values in a list-tuple like so, where the 1st value is the centre frequency, 2nd is total width, 3rd is guard band:
frequencies = [('195.71250000', '59.00000000', '2.50000000'), ('195.78750000', '59.00000000', '2.50000000'), ('195.86250000', '59.00000000', '2.50000000')]
and the range of these values are:
range = [('191.32500000', '196.12500000')]
Note: These are dummy values, the actual data is much larger but follows the same general structure

There are several ways to create this plot. One way is to use ax.vlines to plot the dashed lines for the frequencies and to use ax.bar for the rectangles representing the frequency ranges.
Here is an example where the frequencies are occupied at regular intervals within the range you have given (boundaries included) but with widths of randomly varying size. No guards are computed seeing as they should be automatically apparent thanks to the position of the frequencies and the widths, as far as I understand.
Also, the widths are much smaller compared to the sample data you have provided, else the bars will be very wide and will all overlap with one another, which would look very different from the image you have shared.
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
frequencies = np.arange(191.325, 196.125, step=0.3)
widths = rng.uniform(0.05, 0.25, size=frequencies.size)
# Create figure with single Axes and loop through frequencies and widths to plot
# vertical dashed lines for the frequencies and bars for the widths
fig, ax = plt.subplots(figsize=(10,3))
for freq, width in zip(frequencies, widths):
ax.vlines(x=freq, ymin=0, ymax=10, colors='tab:blue', linestyle='--', zorder=1)
ax.bar(x=freq, height=6, width=width, color='tab:blue', zorder=2)
# Additional formatting
ax.set_xlabel('Frequency (THZ)', labelpad=15, size=12)
ax.set_xticks(frequencies[::2])
ax.yaxis.set_visible(False)
for spine in ['top', 'left', 'right']:
ax.spines[spine].set_visible(False)
plt.show()

Related

Set log xticks in matplotlib for a linear plot

Consider
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
is there any way to display xticks of the first plot (plt.hist) as in the second plot's yticks? For good reasons I want to histogram the np.log10(xdata) of xdata but I'd like to set minor ticks to display as usual in a log scale (even considering that the exponent is linear...)
In other words, I want the x_axis of this plot:
to be like the y_axis
of the 2nd plot, without changing the spacing between major ticks (e.g., adding log marks between 5.5 and 6.0, without altering these values)
Proper histogram plot with logarithmic x-axis:
Explanation:
Cut off negative values
The randomly generated example data likely contains still some negative values
activate the commented code lines at the beginning to see the effect
logarithmic function isn't defined for values <= 0
while the 2nd plot just deals with y-axis log scaling (negative values are just out of range), the 1st plot doesn't work with negative values in the BINs range
probably real world working data won't be <= 0, otherwise keep that in mind
BINs should be aligned to log scale as well
otherwise the 'BINs widths' distribution looks off
switch # on the plt.hist( statements in the 1st plot section to see the effect)
xdata (not np.log10(xdata)) to be plotted in the histogram
that 'workaround' with plotting np.log10(xdata) probably was the root cause for the misunderstanding in the comments
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
Just kept for now for clarification purpose. Will be deleted when the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned that isn't an expected way of changing axis ticks.
x axis ticks and labels don't represent the plotted data
You should at least always provide that information along with such a plot.
The plot
From seeing the result I kinda understand where that special plot idea is coming from - still there should be a preferred way (e.g. conversion of the data in advance) to do such a plot instead of 'faking' the axis.
Explanation how that special axis transfer plot is done:
original x-axis is hidden
a twiny axis is added
note that its y-axis is hidden by default, so that doesn't need handling
twiny x-axis is set to log and the 2nd plot y-axis limits are transferred
subplots used to directly transfer the 2nd plot y-axis limits
use variables if you need to stick with your two plots
twiny x-axis is moved from top (twiny default position) to bottom (where the original x-axis was)
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.figure()
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].axes.xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()

Seaborn heatmap with variating cell sizes

I have a heatmap with ticks which have non equal deltas between themselves:
For example, in the attached image, the deltas are between 0.015 to 0.13. The current scale doesn't show the real scenario, since all cell sizes are equal.
Is there a way to place the ticks in their realistic positions, such that cell sizes would also change accordingly?
Alternatively, is there another method to generate this figure such that it would provide a realistic representation of the tick values?
As mentioned in the comments, a Seaborn heatmap uses categorical labels. However, the underlying structure is a pcolormesh, which can have different sizes for each cell.
Also mentioned in the comments, is that updating the private attributes of the pcolormesh isn't recommended. Moreover, the heatmap can be directly created calling pcolormesh.
Note that if there are N cells, there will be N+1 boundaries. The example code below supposes you have x-positions for the centers of the cells. It then calculates boundaries in the middle between successive cells. The first and the last distance is repeated.
The ticks and tick labels for x and y axis can be set from the given x-values. The example code supposes the original values indicate the centers of the cells.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
N = 10
xs = np.random.uniform(0.015, 0.13, 10).cumsum().round(3) # some random x values
values = np.random.rand(N, N) # a random matrix
# set bounds in the middle of successive cells, add extra bounds at start and end
bounds = (xs[:-1] + xs[1:]) / 2
bounds = np.concatenate([[2 * bounds[0] - bounds[1]], bounds, [2 * bounds[-1] - bounds[-2]]])
fig, ax = plt.subplots()
ax.pcolormesh(bounds, bounds, values)
ax.set_xticks(xs)
ax.set_xticklabels(xs, rotation=90)
ax.set_yticks(xs)
ax.set_yticklabels(xs, rotation=0)
plt.tight_layout()
plt.show()
PS: In case the ticks are mean to be the boundaries, the code can be simplified. One extra boundary is needed, for example a zero at the start.`
bounds = np.concatenate([[0], xs])
ax.tick_params(bottom=True, left=True)

Plotting a cumulative histogram with exported data in Python

I am trying to plot a cumulative histogram similar to the one shown below. It shows the number of occurrences (y-axis) of the French pronoun “vous” in a text corpus (x-axis) represented from word 0 to 92,633. It’s been created using a corpus analysis application named TXM. TXM’s plots, however, are not adapted to the specific requirements of my publisher. I would like to produce my own plots exporting the data to python. The problem is that the data exported by TXM is a bit puzzling, and I am wondering how I it can be used to make plots:
it’s a one-column txt file with integers.
Each one of them indicates the position of “vous” in the text corpus. Word 2620 is one “vous,”
3376, another one, etc. One of my attempts with Matplotlib :
from matplotlib import pyplot as plt
pos = [2620,3367,3756,4522,4546,9914,9972,9979,9987,10013,10047,10087,10114,13635,13645,13646,13758,13771,13783,13796,23410,23420,28179,28265,28274,28297,28344,34579,34590,34612,40280,40449,40570,40932,40938,40969,40983,41006,41040,41069,41096,41120,41214,41474,41478,42524,42533,42534,45569,45587,45598,56450,57574,57587]
plt.bar(pos, 1)
plt.show()
But this doesn't come close.
What steps should I follow to complete the plot?
Desired plot:
With matplotlib, you could create the step plot as follows. where='post' means the value changes at every x-position and stays so until the next x-position.
The x-values are the positions in the text, a zero is prepended to let the graph start with zero occurrences. The text-length is appended at the end. The y-values are the numbers 0, 1, 2, ..., where the last value is repeated to draw the last step in full.
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator, StrMethodFormatter
import numpy as np
pos = [2620,3367,3756,4522,4546,9914,9972,9979,9987,10013,10047,10087,10114,13635,13645,13646,13758,13771,13783,13796,23410,23420,28179,28265,28274,28297,28344,34579,34590,34612,40280,40449,40570,40932,40938,40969,40983,41006,41040,41069,41096,41120,41214,41474,41478,42524,42533,42534,45569,45587,45598,56450,57574,57587]
text_len = 92633
cum = np.arange(0, len(pos) + 1)
fig, ax = plt.subplots(figsize=(12, 3))
ax.step([0] + pos + [text_len], np.pad(cum, (0, 1), 'edge'), where='post', label=f'vous {len(pos)}')
ax.xaxis.set_major_locator(MultipleLocator(5000)) # x-ticks every 5000
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}')) # use the thousands separator
ax.yaxis.set_major_locator(MultipleLocator(5)) # have a y-tick every 5
ax.grid(b=True, ls=':') # show a grid with dotted lines
ax.autoscale(enable=True, axis='x', tight=True) # disable padding x-direction
ax.set_xlabel(f'T={text_len:,d}')
ax.set_ylabel('Occurrences')
ax.set_title("Progression of 'vous' in TCN")
plt.legend() # add a legend (uses the label of ax.step)
plt.tight_layout()
plt.show()

Excluding a certain range of bins in a matplotlib histogram?

I'm using matplotlib to look at how wins are distributed based on betting odds for the MLB. The issue is that because betting odds are either >= 100 or <= -100, there's a big gap in the middle of my histogram.
Is there any way to exclude certain bins (specifically anything between -100 and 100) so that the bars of the chart flow more smoothly?
Link to current histogram
Here's the code I have right now:
num_bins = 20
fig, ax = plt.subplots()
n, bins, patches = ax.hist(winner_odds_df['WinnerOdds'], num_bins,
range=range_of_winner_odds)
ax.set_xlabel('Betting Odds')
ax.set_ylabel('Win Frequency')
ax.set_title('Histogram of Favorite Win Frequency Based on Betting Odds (2018)')
fig.tight_layout()
plt.show()
You could break your chart's x-axis as explained here, by plotting on two different axes that are made to visually look like one plot. The essential part, rewritten to apply to the x-axis instead of the y-axis, is:
f, (axl, axr) = plt.subplots(1, 2, sharey=True)
# plot the same data on both axes
axl.hist(winner_odds_df['WinnerOdds'], num_bins)
axr.hist(winner_odds_df['WinnerOdds'], num_bins)
# zoom-in / limit the view to different portions of the data
axl.set_xlim(-500, -100) # outliers only
axr.set_xlim(100, 500) # most of the data
# hide the spines between axl and axr
axl.spines['right'].set_visible(False)
axr.spines['left'].set_visible(False)
axr.yaxis.tick_right()
# How much space to leave between plots
plt.subplots_adjust(wspace=0.15)
See the linked document for how to polish this by adding diagonal break lines. The basic version produced by the code above then looks like this:

pyplot - How to set a specific range in x axis

I have a CDF plot with data of wifi usage in MB. For better understanding I would like to present the usage starting in KB and finishing in TB. I would like to know how to set a specific range for x axis to replace the produce by plt.plot() and show the axis x, per example, as [1KB 10KB 1MB 10MB 1TB 10TB], even the space between bins not representing the real values.
My code for now:
wifi = np.sort(matrix[matrix['wifi_total_mb']>0]['wifi_total_mb'].values)
g = sns.distplot(wifi, kde_kws=dict(cumulative=True))
plt.show()
Thanks
EDIT 1
I know that I can use plt.xticks and i already tried it: plt.xticks([0.00098, 0.00977, 1, 10, 1024, 10240, 1048576, 10485760, 24117248]). These are values in MB that represents the sample range I specified before. But the plot is still wrong.
The result expected
In excel it is pretty easy makes what I want to. Look the image, with the same range I get the plot I wanted.
Thanks
It may be better to calculate the data to plot manually, instead of relying on some seaborn helper function like distplot. This also makes it easier to understand the underlying issue of histogramming with very unequal bin sizes.
Calculating histogram
The histogram of the data can be calculated by using np.histogram(). It can take the desired bins as argument.
In order to get the cummulative histogram, np.cumsum does the job.
Now there are two options here: (a) plotting the real data or (b) plotting the data enumerated by bin.
(a) Plotting the real data:
Because the bin sizes are pretty unequal, a logarithmic scaling seems adequate, which can be done by semilogx(x,y). The bin edges can be shown as xticks using set_xticks (and since the semilogx plot will not automatically set the labels correctly, we also need to set them to the bin edges' values).
(b) Plotting data enumerated by bin:
The second option is to plot the histogram values bin by bin, independent of the actual bin size. Is is very close to the Excel solution from the question. In this case the x values of the plot are simply values from 0 to number of bins and the xticklabels are the bin edges.
Here is the complete code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#use the bin from the question
bins = [0, 0.00098, 0.00977, 1, 10, 1024, 10240, 1048576, 10485760, 24117248]
# invent some data
data = np.random.lognormal(2,4,10000)
# calculate histogram of the data into the given bins
hist, _bins = np.histogram(data, bins=bins)
# make histogram cumulative
cum_hist=np.cumsum(hist)
# normalize data to 1
norm_cum_hist = cum_hist/float(cum_hist.max())
fig, (ax, ax2) = plt.subplots(nrows=2)
plt.subplots_adjust(hspace=0.5, bottom=0.17)
# First option plots the actual data, i.e. the bin width is reflected
# by the spacing between values on x-axis.
ax.set_title("Plotting actual data")
ax.semilogx(bins[1:],norm_cum_hist, marker="s")
ax.set_xticks(bins[1:])
ax.set_xticklabels(bins[1:] ,rotation=45, horizontalalignment="right")
# Second option plots the data bin by bin, i.e. every bin has the same width,
# independent of it's actual value.
ax2.set_title("Plotting bin by bin")
ax2.plot(range(len(bins[1:])),norm_cum_hist, marker="s")
ax2.set_xticks(range(len(bins[1:])))
ax2.set_xticklabels(bins[1:] ,rotation=45, horizontalalignment="right")
for axes in [ax, ax2]:
axes.set_ylim([0,1.05])
plt.show()

Categories

Resources