I am very new to Python and started learning matplotlib recently. I have a dataset with 5 independent variables and 1 dependent variable. I want to create a stacked histogram that shows how the dependent variable is distributed within each independent variable.
Here is my raw data:
Country, age, new_use, source and total_pages_visited are the independent variables; converted is the dependent variable. I want to create a separate stacked histogram for each independent variable. Each histogram should show the distribution of that variable, with the different categories of 'converted' marked in different colors.
I think what you want is a stacked bar plot, and you can use pandas to create it.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.DataFrame(np.asarray([[1,2],[3,4],[5,6]]),index=['A','B','C'], columns=['Converted-Yes', 'Converted-No'])
df.plot.bar(stacked=True)
plt.show()
The above code generates the plot:
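To go from raw rows to a table like the one above, something along these lines might work. This is only a sketch: the sample rows, the column values and the bin count are made up for illustration, and it assumes 'converted' is coded as 0/1. pd.crosstab counts the rows per category, and for a numeric column such as age, Axes.hist accepts one array per class together with stacked=True.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the raw data in the question.
df = pd.DataFrame({
    "country": ["US", "US", "UK", "UK", "China", "China", "US", "UK"],
    "age": [23, 35, 41, 29, 52, 33, 27, 46],
    "converted": [0, 1, 0, 0, 0, 1, 1, 0],
})

# Categorical variable: count rows for every (country, converted) pair.
# The result has one row per country and one column per 'converted' value.
counts = pd.crosstab(df["country"], df["converted"])
ax = counts.plot.bar(stacked=True)
ax.set_ylabel("number of rows")

# Numeric variable: pass one array per 'converted' class to hist.
groups = [df.loc[df["converted"] == c, "age"].to_numpy() for c in (0, 1)]
fig, ax2 = plt.subplots()
ax2.hist(groups, bins=5, stacked=True, label=["converted=0", "converted=1"])
ax2.set_xlabel("age")
ax2.legend()
```

Call plt.show() afterwards (or run this in a notebook) to display the figures. The same two patterns can be repeated for the other columns.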
I have a list of photometric redshifts and spectroscopic redshifts, and I need to make a scatterplot of these numbers to compare them. The problem is that I don't know how to make a scatterplot in python. How do you graph a scatterplot in python?
Simple Approach
First import pandas and matplotlib
Then call plot.scatter on the DataFrame (a pandas method that draws with matplotlib) to create the scatterplot
import pandas as pd
import matplotlib.pyplot as plt
# In a Jupyter notebook, run "%matplotlib inline" on its own line to show plots inline
data = pd.read_csv('your_dataset')  # path to your CSV file
scatterplot = data.plot.scatter(x='select_your_x_axis', y='select_your_y_axis')
plt.show()
Hope this helps :)
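If the redshifts are already in plain lists or arrays rather than a CSV file, matplotlib's own scatter can be used directly. A minimal sketch, where the numbers are invented placeholders and the dashed one-to-one line is just an optional aid for comparing the two measurements:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical redshift values; replace them with the real lists.
spec_z = np.array([0.10, 0.25, 0.40, 0.55, 0.80])
phot_z = np.array([0.12, 0.22, 0.43, 0.50, 0.85])

fig, ax = plt.subplots()
ax.scatter(spec_z, phot_z)

# Dashed one-to-one line: points on it have identical redshifts.
lims = [0, max(spec_z.max(), phot_z.max())]
ax.plot(lims, lims, linestyle="--", color="gray")

ax.set_xlabel("spectroscopic redshift")
ax.set_ylabel("photometric redshift")
```

Call plt.show() afterwards to display the figure when running this as a script.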
What's the best way to do a heatmap in python (2.7)? I've found the heatmap.py module, and I was wondering if people have any advice on using it, or if there are other packages that do a good job.
I'm dealing with pretty basic data, like xy = np.random.rand(1000,2) superimposed on an image.
Although there's another thing I want to try, which is doing a heatmap that's scaled to a different heatmap. E.g., I have
attempts = np.random.rand(5000,2)
successes = np.random.rand(500,2)
And I want a heatmap of the successes relative to the density of the attempts. Is this possible?
Seaborn is a pretty widely-used library for making nice-looking plots, and has a heatmap function. Seaborn uses matplotlib under the hood.
import numpy as np
import seaborn as sns
xy = np.random.rand(1000,2)
sns.heatmap(xy, yticklabels=100)
Regarding your second question, I'm not sure what you mean. But my advice would be to create a numpy array or pandas dataframe of "successes [scaled] relative to the density of the attempts", however you mean that, and then pass that scaled array or dataframe to sns.heatmap
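One possible reading of "successes relative to the density of the attempts" is the ratio of success counts to attempt counts per grid cell. Here is a rough sketch under that assumption, using np.histogram2d on a shared grid; the 10x10 grid and the NaN for empty cells are arbitrary choices, not something from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
attempts = rng.random((5000, 2))
successes = rng.random((500, 2))

# Bin both point sets on the same grid so the cells line up.
edges = np.linspace(0, 1, 11)  # 10 x 10 cells, an arbitrary choice
att_counts, _, _ = np.histogram2d(attempts[:, 0], attempts[:, 1], bins=[edges, edges])
suc_counts, _, _ = np.histogram2d(successes[:, 0], successes[:, 1], bins=[edges, edges])

# Ratio per cell; cells without any attempts stay NaN instead of dividing by zero.
rate = np.full_like(att_counts, np.nan)
np.divide(suc_counts, att_counts, out=rate, where=att_counts > 0)

# histogram2d puts x on the first axis, so transpose for imshow (rows = y).
fig, ax = plt.subplots()
im = ax.imshow(rate.T, origin="lower", extent=[0, 1, 0, 1], aspect="auto")
fig.colorbar(im, ax=ax, label="successes / attempts")
```

The rate array could also be passed to sns.heatmap instead of imshow. Call plt.show() afterwards to display the figure.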
You can plot very complex heatmaps using the Python package PyComplexHeatmap: https://github.com/DingWB/PyComplexHeatmap
https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb
The most basic heatmap you can get is an image plot:
import matplotlib.pyplot as plt
import numpy as np
xy = np.random.rand(100,2)
plt.imshow(xy, aspect="auto")
plt.colorbar()
plt.show()
Note that using more points than you have pixels to show the heatmap might not make too much sense.
There are of course other ways to draw a heatmap, and you may go through the matplotlib example gallery to see which plot appeals most to you.
I would like to plot the softmax probabilities for a neural network classification task, similar to the plot below
However, most of the code I've found on SO and in the matplotlib doc pages uses histograms.
Examples:
plotting histograms whose bar heights sum to 1 in matplotlib
Python: matplotlib - probability mass function as histogram
http://matplotlib.org/gallery.html
But none of them match what I'm trying to achieve in that plot. Code and sample figure are highly appreciated.
I guess you are just looking for a different plot type. Adapted from here:
# Import
import numpy as np
import matplotlib.pyplot as plt
# Generate random normally distributed data
data=np.random.randn(10000)
# Histogram
heights,bins = np.histogram(data,bins=50)
# Normalize
heights = heights/float(sum(heights))
binMids=bins[:-1]+np.diff(bins)/2.
plt.plot(binMids,heights)
plt.show()
Which produces something like this:
Hope that is what you are looking for.
As part of a project I'm working on, I need to add data to a histogram in a loop. One of the requirements of the project is that I don't use arrays to store data. Here's the pseudocode of what I'm trying to do:
import matplotlib.pyplot as plt # could be numpy if that works better
plt.hist(define histogram with n bins)
for i in range(bignumber):
    MCMC to find datapoint
    add point to histogram
plt.plot()
The part I'm having trouble with is how to predefine a histogram with no data and then append data to it as it's generated.
As a bit of self-advertisement (disclaimer!): for updatable histograms, you can use my library called physt: https://github.com/janpipek/physt. After you collect all the data, you can plot the results in a way similar to matplotlib (in fact, it uses matplotlib behind the scenes).
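Without any extra library, one way to do this is to fix the bin edges up front and keep only a counter per bin, so no data point is ever stored. The sketch below is not physt's API; the range, the number of bins and random.gauss as a stand-in for the MCMC step are all assumptions. Whether a short list of counters is allowed under the "no arrays" requirement is something only the project rules can answer.

```python
import random
import matplotlib.pyplot as plt

# Assumed histogram layout: 20 equal-width bins covering [-4, 4).
n_bins, lo, hi = 20, -4.0, 4.0
width = (hi - lo) / n_bins
counts = [0] * n_bins  # one counter per bin; the samples are never stored

random.seed(0)
n_samples = 10000
for _ in range(n_samples):
    x = random.gauss(0.0, 1.0)  # stand-in for the MCMC step
    if lo <= x < hi:  # points outside the range are dropped
        # Clamp the index to guard against floating-point rounding at the top edge.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1

# Draw the accumulated counts as bars, one per bin.
left_edges = [lo + i * width for i in range(n_bins)]
fig, ax = plt.subplots()
ax.bar(left_edges, counts, width=width, align="edge")
```

Call plt.show() afterwards to display the figure. The bin range has to be chosen before the loop, which is the price of not keeping the data around.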
I want to create a stacked bar plot with a different number of stacks for each bar. The general example for stacked bars works fine if my data are all homogeneous, but I want something that looks more like the shown example.
This turned out to be a whole other level in Matplotlib (while still easy with an Excel-like tool, as you can see). Is there a convenient way of creating this kind of plot in Matplotlib? Thanks.
I guess you are working directly in matplotlib, but these days plotting data, especially for a quick view, can easily be done with pandas. Following your example, we get:
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use("ggplot")
import pandas as pd
import numpy as np
df = pd.DataFrame([pd.Series([10,20,40,10,np.nan]), pd.Series([20,10,30,10,10]), pd.Series([30,40, np.nan, np.nan, np.nan])], index=["Bar1", "Bar2", "Bar3"])
df.plot.bar(stacked=True)
plt.show()