Plotly: How to make a 3D stacked histogram? - python

I have several histograms that I succeeded in plotting using plotly like this:
fig.add_trace(go.Histogram(x=np.array(data[key]), name=self.labels[i]))
I would like to create something like this 3D stacked histogram but with the difference that each 2D histogram inside is a true histogram and not just a hardcoded line (my data is of the form [0.5 0.4 0.5 0.7 0.4] so using Histogram directly is very convenient)
Note that what I am asking is not similar to this and therefore also not the same as this. In the matplotlib example, the data is presented directly in a 2D array so the histogram is the 3rd dimension. In my case, I wanted to feed a function with many already computed histograms.

The snippet below takes care of both binning and formatting of the figure so that it appears as a stacked 3D chart using multiple traces of go.Scatter3d and np.histogram.
The input is a dataframe with random numbers using np.random.normal(50, 5, size=(300, 4))
We can talk more about the other details if this is something you can use:
Plot 1: Angle 1
Plot 2: Angle 2
Complete code:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'browser'
# data
np.random.seed(123)
df = pd.DataFrame(np.random.normal(50, 5, size=(300, 4)), columns=list('ABCD'))
# plotly setup
fig=go.Figure()
# data binning and traces
for i, col in enumerate(df.columns):
    # bin once; counts has 10 entries, edges has 11
    counts, edges = np.histogram(df[col], bins=10, density=False)
    # repeat each count and pad with a zero at both ends so the outline
    # has the same length as the repeated edges and closes on the floor
    a0 = np.repeat(counts, 2).tolist()
    a0.insert(0, 0)
    a0.append(0)
    a1 = np.repeat(edges, 2)
    fig.add_trace(go.Scatter3d(x=[i]*len(a0), y=a1, z=a0,
                               mode='lines',
                               name=col
                               )
                  )
fig.show()

Unfortunately, you can't use go.Histogram in a 3D space, so you need an alternative. I used go.Scatter3d and wanted to use its option to fill the area under the line (surfaceaxis, see the docs), but it has an evident bug, so it is commented out below:
import numpy as np
import plotly.graph_objs as go
# random mat
m = 6
n = 5
mat = np.random.uniform(size=(m,n)).round(1)
# we want to have the number repeated
mat = mat.repeat(2).reshape(m, n*2)
# and finally plot
x = np.arange(2*n)
y = np.ones(2*n)
fig = go.Figure()
for i in range(m):
    fig.add_trace(go.Scatter3d(x=x,
                               y=y*i,
                               z=mat[i,:],
                               mode="lines",
                               # surfaceaxis=1 # bug
                               )
                  )
fig.show()
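Both answers rely on the same trick: turning bin counts into a step-shaped outline that a line trace can draw. As a minimal sketch (the helper names step_outline and stacked_histograms are hypothetical, not taken from either answer), the binning can be isolated so that raw samples such as [0.5 0.4 0.5 0.7 0.4] are fed in directly:

```python
import numpy as np


def step_outline(values, bins=10, value_range=None):
    """Return (positions, heights) tracing the outline of a histogram of `values`."""
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    # Each edge appears twice so the line can move vertically at bin boundaries.
    positions = np.repeat(edges, 2)
    # Each count appears twice (left and right side of its bar), with a zero
    # at both ends so the outline starts and ends on the floor.
    heights = np.concatenate(([0], np.repeat(counts, 2), [0]))
    return positions, heights


def stacked_histograms(datasets, bins=10):
    """Build one Scatter3d line per dataset; `datasets` maps a name to raw samples."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting

    fig = go.Figure()
    for i, (name, values) in enumerate(datasets.items()):
        y, z = step_outline(values, bins=bins)
        fig.add_trace(go.Scatter3d(x=[i] * len(y), y=y, z=z, mode="lines", name=name))
    return fig


# Small check with integer data so the bin edges are easy to read.
positions, heights = step_outline([1, 1, 2, 3, 3, 3], bins=3, value_range=(1, 4))
print(positions.tolist())
print(heights.tolist())
```

Note that in this sketch each dataset gets its own bin edges; passing the same value_range to every call would be one way to make the histograms directly comparable.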

Related

How to make animated 3D scatter plot in plotly

My goal is to create an animation with my 3D data in plotly.
I have 3 variables x,y,z for simplicity and I plot the 4th value depending on these x,y,z.
I create a 3D scatter plot where the 4th dimension, so to speak, is the color, like this:
from numpy import genfromtxt
import numpy as np
import plotly.io as pio
import plotly.express as px
pio.renderers.default = 'notebook'
import plotly.graph_objects as go
import math
import pandas as pd
data = pd.read_csv("paramtp_1e-05_big.txt")
data.head()
data = data.iloc[::10, :]
color_data = data['gopt'].astype(float).round(decimals=2)
color_data[color_data>= 10] = 10
color_data_nopt = data['nopt'].astype(float).round(decimals=3)
color_data_mc = data['mc'].astype(float).round(decimals=3)
color_data_P= data['P']
color_data_P[color_data_P >= 1] = 1
data= data.replace(np.nan, '', regex=True)
data.tail()
fig = px.scatter_3d(data, x='NpN0', y='s', z='mu',log_x=True, log_z=True,
opacity = 0.5,
color=color_data,color_continuous_scale=px.colors.sequential.Viridis)
fig.add_trace(
go.Scatter(
mode='markers',
marker=dict(
size=1,
opacity=0.5,
),
)
)
fig.show()
Similarly to this wonderful animation: https://plotly.com/python/visualizing-mri-volume-slices/
I would like to slice up my data to isosurfaces with respect to any x,y,z coordinates.
Since the example uses images, I could not wrap my head around how to create the same thing with my raw data.
Thank you in advance.
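One way to approach the slicing, sketched under assumptions: bin the rows along one coordinate and treat each bin as one animation frame. The column names NpN0, s, mu and gopt come from the question; the helper add_slices, the column name slice and the synthetic data are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd


def add_slices(data, axis_col, n_slices):
    """Return a copy of `data` with an integer 'slice' column binning `axis_col`."""
    out = data.copy()
    # labels=False yields integer bin codes 0..n_slices-1 (equal-width bins).
    out["slice"] = pd.cut(out[axis_col], bins=n_slices, labels=False)
    # Sort so that the slices appear in order along the chosen axis.
    return out.sort_values("slice").reset_index(drop=True)


# Synthetic stand-in for the question's file, which is not available here.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "NpN0": rng.uniform(1, 100, size=200),
    "s": rng.uniform(0, 1, size=200),
    "mu": rng.uniform(1e-3, 1, size=200),
    "gopt": rng.uniform(0, 10, size=200),
})
sliced = add_slices(demo, axis_col="s", n_slices=5)
print(sliced["slice"].value_counts().sort_index())
```

The resulting frame could then be passed to px.scatter_3d with animation_frame='slice', ideally with range_x, range_y and range_z fixed so the view does not jump between frames. This produces slabs of scattered points rather than true isosurfaces.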

Holoviews scatter plot color by categorical data

I've been trying to understand how to accomplish this very simple task of plotting two datasets, each with a different color, but nothing I found online seems to do it. Here is some sample code:
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
ds1x = np.random.randn(1000)
ds1y = np.random.randn(1000)
ds2x = np.random.randn(1000) * 1.5
ds2y = np.random.randn(1000) + 1
ds1 = pd.DataFrame({'dsx' : ds1x, 'dsy' : ds1y})
ds2 = pd.DataFrame({'dsx' : ds2x, 'dsy' : ds2y})
ds1['source'] = ['ds1'] * len(ds1.index)
ds2['source'] = ['ds2'] * len(ds2.index)
ds = pd.concat([ds1, ds2])
The goal is to produce two datasets in a single frame, with a categorical column keeping track of the source. Then I try plotting a scatter plot.
scatter = hv.Scatter(ds, 'dsx', 'dsy')
scatter
And that works as expected. But I cannot seem to understand how to color the two datasets differently based on the source column. I tried the following:
scatter = hv.Scatter(ds, 'dsx', 'dsy', color='source')
scatter = hv.Scatter(ds, 'dsx', 'dsy', cmap='source')
Both throw warnings and produce no color. I tried this:
scatter = hv.Scatter(ds, 'dsx', 'dsy')
scatter.opts(color='source')
Which throws an error. I tried converting the thing to a Holoviews dataset, same type of thing.
Why is something that is supposed to be so simple so obscure?
P.S. Yes, I know I can split the data and overlay two scatter plots and that will give different colors. But surely there has to be a way to accomplish this based on categorical data.
You can create a scatterplot in Holoviews with different colors per category as follows. They are all elegant one-liners:
1) By simply using .hvplot() on your dataframe to do this for you.
import hvplot
import hvplot.pandas
df.hvplot(kind='scatter', x='col1', y='col2', by='category_col')
# If you are using bokeh as a backend you can also just use 'color' parameter.
# I like this one more because it creates a hv.Scatter() instead of hv.NdOverlay()
# 'category_col' is here just an extra vdim, which is used for colors
df.hvplot(kind='scatter', x='col1', y='col2', color='category_col')
2) By creating an NdOverlay scatter plot as follows:
import holoviews as hv
hv.Dataset(df).to(hv.Scatter, 'col1', 'col2').overlay('category_col')
3) Or doppler's answer slightly adjusted, which sets 'category_col' as an extra vdim and is then used for the colors:
hv.Scatter(
data=df, kdims=['col1'], vdims=['col2', 'category_col'],
).opts(color='category_col', cmap=['blue', 'orange'])
Resulting plot:
You need the following sample data if you want to use my example directly:
import numpy as np
import pandas as pd
# create sample dataframe
df = pd.DataFrame({
'col1': np.random.normal(size=30),
'col2': np.random.normal(size=30),
'category_col': np.random.choice(['category_1', 'category_2'], size=30),
})
As an extra:
I find it interesting that there are basically 2 solutions to the problem.
You can create a hv.Scatter() with the category_col as an extra vdim which provides the colors or alternatively 2 separate scatterplots which are put together by hv.NdOverlay().
In the backend the hv.Scatter() solution will look like this:
:Scatter [col1] (col2,category_col)
And the hv.NdOverlay() backend looks like this:
:NdOverlay [category_col] :Scatter [col1] (col2)
This may help: http://holoviews.org/user_guide/Style_Mapping.html
Concretely, you cannot use a dim transform on a dimension that is not declared, not obscure at all :)
scatter = hv.Scatter(ds, 'dsx', ['dsy', 'source']
).opts(color=hv.dim('source').categorize({'ds1': 'blue', 'ds2': 'orange'}))
should get you there (haven't tested it myself).
Related:
Holoviews color per category
Overlay NdOverlays while keeping color / changing marker

Generating a smooth line with Pandas dataframe and Matplotlib

I am trying to generate a smooth line using a dataset that contains time (measured as number of days) and a set of numbers that represent a socioeconomic variable.
Here is a sample of my data:
date, data
726,1.2414
727,1.2414
728,1.2414
729,1.2414
730,1.2414
731,1.2414
732,1.2414
733,1.2414
734,1.2414
735,1.2414
736,1.2414
737,1.804597701
738,1.804597701
739,1.804597701
740,1.804597701
741,1.804597701
742,1.804597701
743,1.804597701
744,1.804597701
745,1.804597701
746,1.804597701
747,1.804597701
748,1.804597701
749,1.804597701
750,1.804597701
751,1.804597701
752,1.793103448
753,1.793103448
754,1.793103448
755,1.793103448
756,1.793103448
757,1.793103448
758,1.793103448
759,1.793103448
760,1.793103448
761,1.793103448
762,1.793103448
763,1.793103448
764,1
765,1
This is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
out_file = "path_to_file/file.csv"
df = pd.read_csv(out_file)
time = df['date']
data = df['data']
ax1 = plt.subplot2grid((4,3),(0,0), colspan = 2, rowspan = 2) # Will be adding other plots
plt.plot(time, data)
plt.yticks(np.arange(1,5,1)) # Include classes 1-4 showing only 1 step changes
plt.gca().invert_yaxis() # Reverse y axis
plt.ylabel('Trend', fontsize = 8, labelpad = 10)
This generates the following plot:
Test plot
I have seen posts that answer similar questions (like the ones below), but can't seem to get my code to work. Can anyone suggest an elegant solution?
Generating smooth line graph using matplotlib
Python Matplotlib - Smooth plot line for x-axis with date values
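No answer is shown for this question here. As one possible approach (an assumption, not something taken from the linked posts), a centered rolling mean smooths the step changes while keeping the original day index. The helper name smooth and the window size are hypothetical choices.

```python
import pandas as pd


def smooth(series, window=7):
    """Centered rolling mean; min_periods=1 keeps values at both ends."""
    return series.rolling(window=window, center=True, min_periods=1).mean()


# A short stand-in for the sample data above: a single step from 1 to 4.
df = pd.DataFrame({"date": [726, 727, 728, 729, 730, 731],
                   "data": [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]})
df["smoothed"] = smooth(df["data"], window=3)
print(df)
```

Plotting df['smoothed'] against df['date'] with the existing plt.plot call would then draw the smoothed curve. A larger window gives a smoother line but also flattens real changes in the data.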

Plotting data with categorical x and y axes in python

I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dotplot/scatterplot in the way that both of the x and y axis to be categorical and presence/absence to be coded by different shapes. Something like following:
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to @DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
    df_gene = df[[gene, 'level']]
    cList = ['blue' if x == True else 'red' for x in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
Something like this might work
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
Look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work. Something like df.astype(int)
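None of the answers above uses different marker shapes, which is what the question asked for. A minimal sketch of that idea, assuming matplotlib's 'x' and '_' markers stand in for present and absent; the helper names marker_coords and plot_presence are hypothetical.

```python
import pandas as pd


def marker_coords(df):
    """Split a boolean frame into (x, y) grid positions for True and False cells."""
    present, absent = [], []
    for y, row_label in enumerate(df.index):
        for x, col_label in enumerate(df.columns):
            target = present if bool(df.loc[row_label, col_label]) else absent
            target.append((x, y))
    return present, absent


def plot_presence(df):
    """Draw 'x' for present and '_' for absent on categorical axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    present, absent = marker_coords(df)
    fig, ax = plt.subplots()
    if present:
        ax.scatter(*zip(*present), marker="x", s=80, color="black", label="Present")
    if absent:
        ax.scatter(*zip(*absent), marker="_", s=80, color="black", label="Absent")
    # Integer grid positions get the category names as tick labels.
    ax.set_xticks(range(len(df.columns)))
    ax.set_xticklabels(df.columns)
    ax.set_yticks(range(len(df.index)))
    ax.set_yticklabels(df.index)
    ax.invert_yaxis()  # first row (Patient) on top, as in the question's sketch
    ax.margins(0.3)
    ax.legend()
    return fig, ax


df = pd.DataFrame({"Patient": [True, True, False],
                   "Control": [False, True, False]}).transpose()
df.columns = ["GeneA", "GeneB", "GeneC"]
present, absent = marker_coords(df)
print(present)
print(absent)
```

Calling plot_presence(df) followed by plt.show() would display the grid; the coordinate helper is kept separate so the mapping from cells to positions can be checked without drawing anything.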

Set Seaborn PairGrid x-axis with 2 different value ranges

[The resolution is described below.]
I'm trying to create a PairGrid. The X-axis has at least 2 different value ranges, although even when 'cvar' below is plotted by itself the x-axis overwrites itself.
My question: is there a way to tilt the x-axis labels to be vertical or have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
    df.loc[val] = item  # df.ix was removed in pandas 1.0; .loc is the label-based replacement
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found on the Matplotlib site here.
