Plotting a probability distribution using matplotlib - python

I would like to plot the softmax probabilities for a neural network classification task, similar to the plot below.
However, most of the code I've found on SO and in the matplotlib documentation uses histograms.
Examples:
plotting histograms whose bar heights sum to 1 in matplotlib
Python: matplotlib - probability mass function as histogram
http://matplotlib.org/gallery.html
But none of them matches the plot I'm trying to achieve. Code and a sample figure would be highly appreciated.

I guess you are just looking for a different plot type. Adapted from here:
# Imports
import numpy as np
import matplotlib.pyplot as plt

# Generate normally distributed random data
data = np.random.randn(10000)

# Bin the data
heights, bins = np.histogram(data, bins=50)

# Normalize so the heights sum to 1
heights = heights / float(sum(heights))

# Plot a line through the bin midpoints
bin_mids = bins[:-1] + np.diff(bins) / 2.
plt.plot(bin_mids, heights)
Which produces something like this:
Hope that is what you are looking for.
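If it is specifically the per-class softmax output you want to plot, a simple bar chart may be even closer to what the question describes. A minimal sketch (the logits and class labels here are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical network output (logits) for one sample over 5 classes
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])

# Numerically stable softmax
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

# One bar per class, heights sum to 1
plt.bar(range(len(probs)), probs)
plt.xticks(range(len(probs)), [f"class {i}" for i in range(len(probs))])
plt.ylabel("softmax probability")
plt.show()
```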

Related

How to create stacked histogram using matplotlib

I am very new to Python and started learning matplotlib recently. I have a dataset with 5 independent variables and 1 dependent variable. I want to create stacked histograms showing the distribution of the dependent variable within each independent variable.
Here is my raw data-
Country, age, new_use, source and total_pages_visited are the independent variables; converted is the dependent variable. I want to create a separate stacked histogram for each independent variable, showing the distribution of that variable with the categories of 'converted' marked in different colors.
I think what you want is a stacked bar plot, and you can use pandas to achieve it.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

df = pd.DataFrame(np.asarray([[1, 2], [3, 4], [5, 6]]),
                  index=['A', 'B', 'C'],
                  columns=['Converted-Yes', 'Converted-No'])
df.plot.bar(stacked=True)
plt.show()
The above code generates the plot:
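If the counts first have to be built from raw records, pandas' crosstab can produce the same kind of table before plotting. A sketch assuming a layout like the one described (the column names and values are guesses, not your actual data):

```python
import pandas as pd
from matplotlib import pyplot as plt

# Toy raw data mimicking the described layout (hypothetical values)
df = pd.DataFrame({
    "country": ["US", "US", "UK", "UK", "DE", "DE"],
    "converted": [1, 0, 0, 0, 1, 1],
})

# Cross-tabulate one independent variable against the dependent one
counts = pd.crosstab(df["country"], df["converted"])

# Each bar is a country, split by converted = 0 / 1
counts.plot.bar(stacked=True)
plt.show()
```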

Reproduce statsmodels calculation

I am running an OLS regression with statsmodels, using clustered standard errors (https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.get_robustcov_results.html):
import statsmodels.api as sm
import statsmodels.formula.api as smf
model = smf.ols(model_string, data=df1).fit(cov_type = 'cluster', cov_kwds={'groups': df1['correctedSE']})
I run a loop with different specifications and would like to visualise the coefficients of all iterations.
I extract the coefficient and standard error in every iteration with:
coe = model.params["variable_of_interest"]
se = model.bse["variable_of_interest"]
List_coefficients.append(coe)
List_standard_errors.append(se)
I create a simple dataframe out of the lists and then want to visualise the coefficients in an errorbar plot from matplotlib:
import matplotlib
import matplotlib.pyplot as plt
ax = plt.errorbar(df["loop_rounds"], df["Coefficient"], yerr=df["Standard_e"])
However, when doing so, the confidence intervals are somehow calculated differently, and some coefficients are suddenly significant in the graph.
What would be the best way to correct for the calculation difference (e.g. extract different parameters from statsmodel, manually adjust standard errors, change in errorbar)?
If manual editing is the solution, where can I find the formula for the clustered standard errors?
Thank you
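One likely source of the discrepancy: `plt.errorbar` draws `yerr` exactly as given, i.e. as ±1 standard error, while a 95% confidence interval requires the standard error scaled by the critical value. A minimal sketch using the 1.96 normal approximation (the coefficient and SE values are made up; statsmodels' `conf_int()` on the fitted model would give the exact interval from its summary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical coefficients and clustered standard errors from the loop
coefs = np.array([0.10, -0.05, 0.20])
ses = np.array([0.04, 0.06, 0.05])

# Scale SEs to 95% CI half-widths before passing them to errorbar
ci95 = 1.96 * ses
plt.errorbar(range(len(coefs)), coefs, yerr=ci95, fmt="o", capsize=3)
plt.axhline(0, color="grey", linewidth=0.8)
plt.show()
```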

Python Heatmaps (Basic and Complex)

What's the best way to do a heatmap in python (2.7)? I've found the heatmap.py module, and I was wondering if people have any advice on using it, or if there are other packages that do a good job.
I'm dealing with pretty basic data, like xy = np.random.rand(1000,2) superimposed on an image.
Although there's another thing I want to try, which is doing a heatmap that's scaled to a different heatmap. E.g., I have
attempts = np.random.rand(5000,2)
successes = np.random.rand(500,2)
And I want a heatmap of the successes relative to the density of the attempts. Is this possible?
Seaborn is a pretty widely-used library for making nice-looking plots, and has a heatmap function. Seaborn uses matplotlib under the hood.
import numpy as np
import seaborn as sns
xy = np.random.rand(1000,2)
sns.heatmap(xy, yticklabels=100)
Regarding your second question, I'm not sure what you mean. But my advice would be to create a numpy array or pandas dataframe of "successes [scaled] relative to the density of the attempts", however you mean that, and then pass that scaled array or dataframe to sns.heatmap
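As one concrete reading of "successes relative to the density of the attempts": bin both point sets on the same grid and take the per-cell ratio. A sketch using `np.histogram2d` (the grid resolution is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
attempts = rng.random((5000, 2))
successes = rng.random((500, 2))

# Bin both point sets on the same grid
bins = [np.linspace(0, 1, 21)] * 2
h_att, _, _ = np.histogram2d(attempts[:, 0], attempts[:, 1], bins=bins)
h_suc, _, _ = np.histogram2d(successes[:, 0], successes[:, 1], bins=bins)

# Success rate per cell; avoid division by zero in empty cells
rate = np.divide(h_suc, h_att, out=np.zeros_like(h_suc), where=h_att > 0)

plt.imshow(rate.T, origin="lower", extent=[0, 1, 0, 1])
plt.colorbar(label="successes / attempts")
plt.show()
```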
You can plot very complex heatmaps using the Python package PyComplexHeatmap: https://github.com/DingWB/PyComplexHeatmap
https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb
The most basic heatmap you can get is an image plot:
import matplotlib.pyplot as plt
import numpy as np
xy = np.random.rand(100,2)
plt.imshow(xy, aspect="auto")
plt.colorbar()
plt.show()
Note that plotting more data points than there are pixels available to show them may not make much sense.
There are of course other ways to draw heatmaps; you can browse the matplotlib example gallery and see which plot appeals most to you.

Multiple histograms with logarithmic x scale

This is a combination of this thread on multiple histograms and this thread on logarithmic scales.
I am trying to have two (or more) histograms in a plot with a logarithmic x-scale, using this code: (with some external lists)
import numpy as np
import matplotlib.pyplot as plt
plt.hist([capacity_list, capacity_list2], bins=np.logspace(-1, 4, 11))
plt.gca().set_xscale("log")
plt.show()
It works in principle; my only problem is that the logarithmic scale also seems to affect the bar widths of the histograms, so one of them always gets narrower bars, which doesn't look nice:
Does anybody know how to fix that?
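With the default `histtype='bar'`, the side-by-side bars inside each bin are given equal widths on a linear scale, which looks uneven once the x-axis is logarithmic. One workaround is `histtype='step'`, which draws each histogram as an outline spanning the full bin width. A sketch with made-up data standing in for the external lists:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for capacity_list and capacity_list2
rng = np.random.default_rng(1)
capacity_list = rng.lognormal(1.0, 1.0, 500)
capacity_list2 = rng.lognormal(1.5, 0.8, 500)

# Step outlines avoid the uneven sub-bin bar widths on a log axis
bins = np.logspace(-1, 4, 11)
plt.hist([capacity_list, capacity_list2], bins=bins, histtype="step")
plt.gca().set_xscale("log")
plt.show()
```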

pyplot for correlation matrix visualization using Python for a huge matrix (700 x 700)

I calculated the correlation matrix for 900 features and removed 200 highly correlated ones, so the reduced data now has about 700 features.
I used the classical method to plot the correlation matrix:
from matplotlib import pyplot as plt
hm = plt.pcolor(cor_mat)
plt.show()
The heatmap generated by this method is extremely dense and the visualization is very poor. How can I improve the image of such a huge correlation matrix for my publication? Thanks.
For heatmaps and correlation/statistical-analysis plots, I have had more luck with Seaborn than with matplotlib. Maybe it's worth checking out.
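As a sketch of that suggestion, seaborn's `heatmap` with the tick labels suppressed stays readable even at ~700×700 (random data below stands in for the real correlation matrix; the colormap choice is just a suggestion):

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random stand-in for a 700x700 correlation matrix
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 700))
cor_mat = np.corrcoef(x, rowvar=False)

# Suppress the 700 tick labels per axis; pin the color scale to [-1, 1]
plt.figure(figsize=(10, 8))
sns.heatmap(cor_mat, cmap="coolwarm", vmin=-1, vmax=1,
            xticklabels=False, yticklabels=False)
plt.show()
```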
