How to do a histogram from 2 datasets (Bin problem)

I am trying to do a histogram like the one below but I am struggling with the bins. This is my code:
plt.subplots(figsize=(2, 1), dpi=400)
width = 0.005
plt.xticks(((density_1.index.unique()) | set(density_2.index.unique())), rotation=90, fontsize=1.5)
plt.yticks(list(set(density_1.unique()) | set(density_2.unique())), fontsize=2)
plt.hist(density_1.index, density_1, width, color='Green', label=condition_1,alpha=0.5)
plt.hist(density_2.index, density_2, width, color='Red', label=condition_2,alpha=0.5,bins=my_beans1)
plt.legend(loc="upper right", fontsize=2)
plt.show()
These are my pandas data:
1st Data sample:
Xticks Yticks
0.27 0.068182
0.58 0.045455
0.32 0.045455
0.47 0.045455
0.75 0.045455
0.17 0.045455
0.43 0.022727
0.66 0.022727
0.11 0.022727
0.68 0.022727
0.59 0.022727
2nd Data sample:
Xticks Yticks
0.94 0.058442
0.86 0.058442
0.74 0.045455
0.93 0.045455
0.99 0.045455
0.71 0.038961
0.63 0.019481
0.97 0.019481
0.87 0.019481
0.84 0.019481
0.75 0.019481
0.89 0.019481
0.80 0.012987
I made this picture with plt.bar(), but I need to do it with plt.hist(). The picture is for the full dataset; I am providing only a sample of my dataframe to keep the question short.
I have looked at several forums and websites about plt.hist() and the use of bins, but I always get errors.
I tried something like this:
my_bins1=density_2.unique()
my_bins2=10

Assuming that you want to use a histogram to count the frequency of y-ticks, something like this might be what you are looking for:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# "file1" and "file2" stand for the paths to your two tab-separated files
data1 = pd.read_csv(r"file1", sep='\t')
data2 = pd.read_csv(r"file2", sep='\t')
data1 = data1.set_index('x_ticks')
data2 = data2.set_index('x_ticks')
plt.figure()
# use the same bin edges for both datasets so that the bars line up
bins = np.linspace(0, 0.1, num=100)
n, bins, rectangles = plt.hist(data1, bins, color='green', alpha=0.5, label='dataset 1')
plt.hist(data2, bins, color='red', alpha=0.5, label='dataset 2')
plt.legend(loc='upper right')
plt.title('frequency of y-ticks')
plt.show()
Output looks like this:
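If the goal is instead to reproduce the bar picture from the question with plt.hist, one possible approach is to pass the precomputed densities as weights and give both calls the same bin edges. The sketch below uses the sample values from the question; the choice of 10 bins of width 0.1 and the labels "condition 1" and "condition 2" are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample values copied from the question: x positions and their densities
x1 = np.array([0.27, 0.58, 0.32, 0.47, 0.75, 0.17, 0.43, 0.66, 0.11, 0.68, 0.59])
d1 = np.array([0.068182, 0.045455, 0.045455, 0.045455, 0.045455, 0.045455,
               0.022727, 0.022727, 0.022727, 0.022727, 0.022727])
x2 = np.array([0.94, 0.86, 0.74, 0.93, 0.99, 0.71, 0.63, 0.97, 0.87, 0.84,
               0.75, 0.89, 0.80])
d2 = np.array([0.058442, 0.058442, 0.045455, 0.045455, 0.045455, 0.038961,
               0.019481, 0.019481, 0.019481, 0.019481, 0.019481, 0.019481,
               0.012987])

# Shared bin edges (assumed: 10 bins of width 0.1) so both histograms line up
bins = np.linspace(0.0, 1.0, 11)

fig, ax = plt.subplots()
# weights= makes each x value contribute its density instead of a count of 1
h1, _, _ = ax.hist(x1, bins=bins, weights=d1, color="green", alpha=0.5,
                   label="condition 1")
h2, _, _ = ax.hist(x2, bins=bins, weights=d2, color="red", alpha=0.5,
                   label="condition 2")
ax.legend(loc="upper right")
plt.show()
```

Each bar height is then the sum of the densities that fall into that bin, so the heights of one dataset add up to the total of its density column.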

Related

Get a colormap of different shades to multiple matplotlib plots

I am plotting multiple plots generated from a for loop. I want each plot to be in a different shade of the same color so that it can be easily identified. I don't know how many plots, because it depends on a user selection (There can be 5 plots or 7 or 8 or etc.).
In this MWE, I used matplotlib blues to plot the graphs in different shads of blue. As you can see, the shades are not very different from each other. What parameter do I have to change so that the shades are noticeably different from each other?
(Now, I can "not" use colors at all and let the matplotlib use its default colors by removing color = blues(xaxis[j]). But I am looking for shades, not totally different colors.)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
plt.close("all")
blues = cm.get_cmap('Blues', 10)
fig, ax = plt.subplots()
l1 = np.array([1,2,3,4,5,6,7,8,9,10])
j=0
xaxis = np.linspace(0.5,1, 1000)
for i in range(0,10):
    l0 = l1*i
    print(l0)
    ax.plot(l1, l0,color = blues(xaxis[j]),label='plot')
    j+=1
ax.legend()

By specifying cm.get_cmap('Blues', 10) you are creating a colormap of 10 distinct shades of blue. You can access each specific shade like so:
from matplotlib import cm
blues = cm.get_cmap("Blues", 10)
for i in range(blues.N):
    print(f"RGBA {i}:", *(f"{v:.2f}" for v in blues(i)))
# RGBA 0: 0.97 0.98 1.00 1.00
# RGBA 1: 0.88 0.93 0.97 1.00
# RGBA 2: 0.80 0.87 0.94 1.00
# RGBA 3: 0.67 0.81 0.90 1.00
# RGBA 4: 0.51 0.73 0.86 1.00
# RGBA 5: 0.35 0.63 0.81 1.00
# RGBA 6: 0.22 0.53 0.75 1.00
# RGBA 7: 0.11 0.42 0.69 1.00
# RGBA 8: 0.03 0.30 0.59 1.00
# RGBA 9: 0.03 0.19 0.42 1.00
The issue is that consecutive values of your xaxis variable are too close together: np.linspace(0.5, 1, 1000) advances by only about 0.0005 per step, so the first ten values all fall into the same one of the 10 discrete shades.
import numpy as np
from matplotlib import cm
blues = cm.get_cmap("Blues", 10)
xaxis = np.linspace(.5, 1, 1000)
for i in range(10):
    print(f"RGBA {xaxis[i]:.3f}:", *(f"{v:.2f}" for v in blues(xaxis[i])))
# RGBA 0.500: 0.35 0.63 0.81 1.00
# RGBA 0.501: 0.35 0.63 0.81 1.00
# RGBA 0.501: 0.35 0.63 0.81 1.00
# RGBA 0.502: 0.35 0.63 0.81 1.00
# RGBA 0.502: 0.35 0.63 0.81 1.00
# RGBA 0.503: 0.35 0.63 0.81 1.00
# RGBA 0.503: 0.35 0.63 0.81 1.00
# RGBA 0.504: 0.35 0.63 0.81 1.00
# RGBA 0.504: 0.35 0.63 0.81 1.00
# RGBA 0.505: 0.35 0.63 0.81 1.00
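The collapse in the listing above follows from how a discretized colormap treats its argument: an integer is used directly as an index, while a float in [0, 1] is scaled by the number of shades and truncated. The short check below illustrates this. It uses plt.get_cmap rather than cm.get_cmap, on the assumption that a recent Matplotlib is installed (cm.get_cmap has been removed in newer releases).

```python
import matplotlib.pyplot as plt

# "Blues" resampled to 10 discrete shades, as in the question
blues = plt.get_cmap("Blues", 10)

# An integer argument is used directly as an index into the 10 shades
by_index = blues(5)

# A float argument in [0, 1] is scaled by the number of shades and truncated,
# so 0.5 and 0.505 both land on index 5
by_float_a = blues(0.5)
by_float_b = blues(0.505)

# A float that scales to 6.5 lands on index 6, the next shade
next_shade = blues(0.65)

print(by_index == by_float_a == by_float_b)
print(next_shade == blues(6))
```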
Therefore, your xaxis variable should not mediate the color selection. Instead, you can index the colormap directly and simplify your code like so:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
blues = cm.get_cmap("Blues", 10)
fig, ax = plt.subplots()
l1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for i in range(blues.N):
    ax.plot(l1, l1 * i, color=blues(i), label=f"line {i}")
ax.legend()
plt.show()
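Since the question says the number of plots is not known in advance, a possible variation is to sample the continuous colormap at as many evenly spaced points as there are lines. This is only a sketch: the helper name blue_shades, the lower bound of 0.3 (chosen so the lightest shade stays visible on a white background), and n_lines = 7 are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def blue_shades(n, low=0.3, high=1.0, cmap_name="Blues"):
    """Return n RGBA colors evenly spaced between low and high on the colormap."""
    cmap = plt.get_cmap(cmap_name)          # continuous (256-level) colormap
    return cmap(np.linspace(low, high, n))  # array of shape (n, 4)


n_lines = 7  # hypothetical user selection
x = np.arange(1, 11)

fig, ax = plt.subplots()
for i, color in enumerate(blue_shades(n_lines)):
    ax.plot(x, x * i, color=color, label=f"line {i}")
ax.legend()
plt.show()
```

Because the sample points are spread over the whole chosen range, the contrast between neighbouring lines adapts to the number of lines instead of being fixed by the size of the colormap.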

How to visualise means with Seaborn?

I have a Pandas data frame with the following structure:
alpha beta gamma mse
0 0.00 0.00 0.00 0.000000
1 0.05 0.05 0.90 0.025411
2 0.05 0.10 0.85 0.025794
3 0.05 0.15 0.80 0.026289
4 0.05 0.20 0.75 0.025320
.. ... ... ... ...
148 0.75 0.05 0.20 0.026816
149 0.75 0.10 0.15 0.025817
150 0.75 0.15 0.10 0.025702
151 0.80 0.05 0.15 0.027104
152 0.80 0.10 0.10 0.025936
I would like to visualise the data frame with a heatmap where alpha is represented on the x-axis, beta is represented on the y-axis, and for each square of the lattice, the mean MSE over all gammas is computed. Is there an easy way to do this by using Seaborn?
Thanks in advance.

For the data you showed, yes. pivot_table averages rows that share the same (beta, alpha) pair by default, so you can do:
sns.heatmap(df.pivot_table(index='beta', columns='alpha', values='mse'))
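To see what the pivot step produces before it reaches the heatmap, here is a small sketch on made-up numbers. The frame below is hypothetical (it is not the question's data) and holds two gamma values for each (alpha, beta) pair, so the averaging is visible.

```python
import pandas as pd

# Hypothetical miniature of the question's frame:
# two gamma values for each (alpha, beta) pair
df = pd.DataFrame({
    "alpha": [0.05, 0.05, 0.05, 0.05, 0.10, 0.10],
    "beta":  [0.05, 0.05, 0.10, 0.10, 0.05, 0.05],
    "gamma": [0.10, 0.20, 0.10, 0.20, 0.10, 0.20],
    "mse":   [0.02, 0.04, 0.03, 0.05, 0.01, 0.03],
})

# Rows become beta (y-axis), columns become alpha (x-axis);
# each cell is the mean mse over all gammas for that pair
grid = df.pivot_table(index="beta", columns="alpha", values="mse", aggfunc="mean")
print(grid)
```

The cell for beta = 0.05 and alpha = 0.05 is the mean of 0.02 and 0.04, and the pair (beta = 0.10, alpha = 0.10), which has no rows, comes out as NaN. The resulting grid can be passed to sns.heatmap as in the line above.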

All of the calculation should be done in your DataFrame.
Once you have the data, you can pivot the DataFrame and use the result to build the heatmap:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Assuming that you have the df variable with your data
# pivot the data: alpha on the x-axis (columns), beta on the y-axis (index);
# pivot_table averages the mse of rows that share the same (alpha, beta) pair
pivoted = df.pivot_table(index='beta', columns='alpha', values='mse', aggfunc='mean')
# plot the heatmap
sns.heatmap(pivoted, annot=True)
plt.show()
More information in the official documentation: https://seaborn.pydata.org/generated/seaborn.heatmap.html

Why are bars missing in my stacked bar chart -- Python w/matplotlib

Hi all,
I am trying to create a stacked bar chart from time series data. My issue: if I plot my data as a time series (using lines), everything works and I get a (messy) time series graph with the correct dates. However, if I instead try to plot it as a stacked bar chart, my dates disappear and none of my bars appear.
I have tried adjusting the indexing, height, and width of the bars, with no luck.
Here is my code:
import pylab
import pandas as pd
import matplotlib.pyplot as plt
df1= pd.read_excel('pathway/filename.xls')
df1.set_index('TIME', inplace=True)
ax = df1.plot(kind="Bar", stacked=True)
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
df1.sum(axis=1).plot( ax=ax, color="k", title='Historical Decomposition -- 1 year -- One-Quarter Revision')
plt.axhline(y=0, color='r', linestyle='-')
plt.show()
If I change
ax = df1.plot(kind="Bar", stacked=True)
to
ax = df1.plot(kind="line", stacked=False)
I get:
If I instead use ax = df1.plot(kind="Bar", stacked=True)
I get:
Any thoughts here?

Without knowing what the data looks like, I'd try something like this:
# Import data here and generate the DataFrame
print(df.head(5))
#                A     B     C     D
# DATE
# 2020-01-01 -0.01  0.06  0.40  0.45
# 2020-01-02 -0.02  0.05  0.39  0.42
# 2020-01-03 -0.03  0.04  0.38  0.39
# 2020-01-04 -0.04  0.03  0.37  0.36
# 2020-01-05 -0.05  0.02  0.36  0.33
f, ax = plt.subplots()
# label each series so that ax.legend() has entries to show
ax.bar(df.index, df['A'], label='A')
ax.bar(df.index, df['B'], label='B')
# stack C on top of B
ax.bar(df.index, df['C'], bottom=df['B'], label='C')
ax.plot(df.index, df['D'], color='black', linewidth=2, label='D')
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.legend()
plt.show()
Edit: OK, I found a way by looking at this post:
Plot Pandas DataFrame as Bar and Line on the same one chart
Try resetting the index so that it is a separate column. In my example, it is called 'DATE'. Then try:
ax = df[['DATE','D']].plot(x='DATE',color='black')
df[['DATE','A','B','C']].plot(x='DATE', kind='bar',stacked=True,ax=ax)
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.legend()
plt.show()
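For more than a few columns, computing each bottom= by hand gets tedious. A possible generalisation, sketched below with pure matplotlib, keeps two running totals so that positive values stack upward and negative values stack downward, and leaves the dates on the x-axis. The data are hypothetical values shaped like the example frame above, and the bar width of 0.8 days is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data shaped like the example frame above
dates = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"A": [-0.01, -0.02, -0.03, -0.04, -0.05],
     "B": [0.06, 0.05, 0.04, 0.03, 0.02],
     "C": [0.40, 0.39, 0.38, 0.37, 0.36]},
    index=dates,
)

fig, ax = plt.subplots()
pos_bottom = np.zeros(len(df))  # running top of the positive stack
neg_bottom = np.zeros(len(df))  # running bottom of the negative stack
for col in df.columns:
    vals = df[col].to_numpy()
    # Positive values start at pos_bottom, negative values at neg_bottom
    bottom = np.where(vals >= 0, pos_bottom, neg_bottom)
    ax.bar(df.index, vals, bottom=bottom, width=0.8, label=col)
    pos_bottom += np.clip(vals, 0, None)
    neg_bottom += np.clip(vals, None, 0)

# Overlay the row totals as a line on the same date axis
total = df.sum(axis=1)
ax.plot(df.index, total, color="black", linewidth=2, label="total")
ax.axhline(y=0, color="r")
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
ax.legend()
plt.show()
```

Because both the bars and the line are drawn by matplotlib against the same datetime index, they share one x-axis and the date ticks are kept.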

Visualising 10 dimensional data with matplotlib

I have this kind of data:
ID x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 -0.18 5 -0.40 -0.26 0.53 -0.66 0.10 2 -0.20 1
2 -0.58 5 -0.52 -1.66 0.65 -0.15 0.08 3 3.03 -2
3 -0.62 5 -0.09 -0.38 0.65 0.22 0.44 4 1.49 1
4 -0.22 -3 1.64 -1.38 0.08 0.42 1.24 5 -0.34 0
5 0.00 5 1.76 -1.16 0.78 0.46 0.32 5 -0.51 -2
What is the best method for visualizing this data? I am using matplotlib to visualize it, and I read it from a CSV file using pandas.
Thanks.

Visualising data in a high-dimensional space is always a difficult problem. One solution that is commonly used (and is now available in pandas) is to inspect all of the 1D and 2D projections of the data. It doesn't give you all of the information about the data, but that's impossible to visualise unless you can see in 10D! Here's an example of how to do this with pandas (scatter_matrix has been available since version 0.7.3; the pandas.plotting import path used below requires pandas 0.20 or later):
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
# first make some fake data with the same layout as yours
data = pd.DataFrame(np.random.randn(100, 10),
                    columns=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10'])
# now plot using pandas
scatter_matrix(data, alpha=0.2, figsize=(6, 6), diagonal='kde')
This generates a plot with all of the 2D projections as scatter plots, and KDE histograms of the 1D projections:
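The same call can be applied to data read from a file rather than to random numbers. The sketch below parses the five sample rows from the question from an in-memory string, as a stand-in for the real CSV file. The whitespace separator and the use of ID as the index are assumptions about that file, and diagonal="hist" is used here instead of "kde" so that SciPy is not required.

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# The five sample rows from the question, standing in for the real CSV file
raw = """ID x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 -0.18 5 -0.40 -0.26 0.53 -0.66 0.10 2 -0.20 1
2 -0.58 5 -0.52 -1.66 0.65 -0.15 0.08 3 3.03 -2
3 -0.62 5 -0.09 -0.38 0.65 0.22 0.44 4 1.49 1
4 -0.22 -3 1.64 -1.38 0.08 0.42 1.24 5 -0.34 0
5 0.00 5 1.76 -1.16 0.78 0.46 0.32 5 -0.51 -2
"""

# Assumed layout: whitespace-separated columns, ID used as the row index
data = pd.read_csv(StringIO(raw), sep=r"\s+", index_col="ID")

# One panel per pair of columns; histograms on the diagonal
axes = scatter_matrix(data, alpha=0.8, figsize=(10, 10), diagonal="hist")
plt.show()
```

With ten columns this yields a 10 by 10 grid of panels, so a larger figsize than in the example above helps keep the panels readable.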
I also have a pure matplotlib approach to this on my github page, which produces a very similar type of plot (it is designed for MCMC output, but is also appropriate here). Here's how you'd use it here:
import corner_plot as cp
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same array
cp.corner_plot(data.to_numpy(), axis_labels=data.columns, nbins=10,
               figsize=(7, 7), scatter=True, fontsize=10, tickfontsize=7)

You could change the plot over time, so that at each instant you plot a different "dimension" of the dataframe.
Here is an example of a plot that changes over time; you can adapt it to your purposes:
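import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.arange(-3, 3, 0.01)
for n in range(15):
    # np.sinc(t) computes sin(pi*t) / (pi*t) and handles t == 0 safely
    y = np.sinc(x * n)
    # clear the axes before each redraw; this replaces plt.hold(False),
    # which was removed in Matplotlib 3.0
    ax.clear()
    ax.grid(True)
    line, = ax.plot(x, y)
    plt.draw()
    plt.pause(0.5)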

Histogram with breaking axis and interlaced colorbar

I have data like this:
a b c d e
alpha 5.51 0.60 -0.12 26.90 76284.53
beta 3.39 0.94 -0.17 -0.20 -0.20
gamma 7.98 3.34 -1.41 7.74 28394.93
delta 2.29 1.24 0.40 0.29 0.28
I want to make a nice, publishable histogram like this one,
but with a break in the y-axis, so that the variation of a, b, c, d and e is visible and the data are not squashed by the extreme values in column e, as they are in this one. I would like to use an interlaced-colorbar histogram:
I would like to do this in Python (matplotlib, pandas, numpy/scipy) or in Mathematica, or in any other free and open high-level language (R, Scilab, ...). Thanks for your help.
Edit: plotting with matplotlib through pandas lets you adjust the space between the two subplots with the "hspace" option, available from the button at the bottom left of the figure window.

Have you seen this example? It's for a broken y-axis plot in matplotlib.
Hope this helps.
Combined with pandas, this gives:
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO  # Python 3 location; the separate StringIO module is Python 2 only
data = """\
a b c d e
alpha 5.51 0.60 -0.12 26.90 76284.53
beta 3.39 0.94 -0.17 -0.20 -0.20
gamma 7.98 3.34 -1.41 7.74 28394.93
delta 2.29 1.24 0.40 0.29 0.28
"""
df = pd.read_csv(StringIO(data), sep=r'\s+')
f, axis = plt.subplots(2, 1, sharex=True)
df.plot(kind='bar', ax=axis[0])
df.plot(kind='bar', ax=axis[1])
axis[0].set_ylim(20000, 80000)
axis[1].set_ylim(-2, 30)
axis[1].legend().set_visible(False)
axis[0].spines['bottom'].set_visible(False)
axis[1].spines['top'].set_visible(False)
axis[0].xaxis.tick_top()
axis[0].tick_params(labeltop=False)  # hide the tick labels at the top
axis[1].xaxis.tick_bottom()
d = .015
kwargs = dict(transform=axis[0].transAxes, color='k', clip_on=False)
axis[0].plot((-d,+d),(-d,+d), **kwargs)
axis[0].plot((1-d,1+d),(-d,+d), **kwargs)
kwargs.update(transform=axis[1].transAxes)
axis[1].plot((-d,+d),(1-d,1+d), **kwargs)
axis[1].plot((1-d,1+d),(1-d,1+d), **kwargs)
plt.show()
