Plot for sankey diagram is empty - python

I'm new to python/spyder, and I'm trying to make a sankey diagram, but no data is included in the plot, it's just empty. I found out that I need to convert from dataframe to lists, but this hasn't helped, and the plot is still empty. I kept it very simple, and pretty much copied it straight from a guide from plotly, and just imported my own data.
Here is my code, where I just removed the filepath.
Can anyone tell me what my mistake is?
edit to add image of xlsx file : xlsx file
edit 2 image of new Plot
import plotly.graph_objects as go
import pandas as pd
# go.renderers.default = "browser"
df = pd.read_excel (r'C:\filepath\data.xlsx')
labels=pd.DataFrame(df, columns= ['Label'])
label_list=labels.values.tolist()
sources=pd.DataFrame(df, columns= ['Source'])
source_list=sources.values.tolist()
targets=pd.DataFrame(df, columns= ['Target'])
target_list=targets.values.tolist()
values=pd.DataFrame(df, columns= ['Value'])
value_list=values.values.tolist()
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        label = label_list,
        color = "blue"
    ),
    link = dict(
        source = source_list,
        target = target_list,
        value = value_list
    ))])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()

Please check the snippet below, which uses random data.
When you read from Excel, pandas already gives you a DataFrame, so you don't need to wrap it in a DataFrame again.
You don't need to convert it to lists either. You can pass DataFrame columns directly to label, source, target and value.
import plotly.graph_objects as go
import pandas as pd
df = pd.read_excel (r'data.xlsx')
print(df)
"""
Label Source Target Value
0 A1 0 2 8
1 A2 1 3 4
2 B1 0 3 2
3 B2 2 4 8
4 C1 3 4 4
5 C2 3 5 2
"""
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        label = df['Label'],
        color = "blue"
    ),
    link = dict(
        source = df['Source'],
        target = df['Target'],
        value = df['Value']
    ))])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()
Tips
Your label_list, source_list, target_list and value_list are not flat lists. Each one is a nested list (a list of one-element lists).
If you want to store your DataFrame columns in lists, you can do it like this:
labels_list=df['Label'].tolist()
source_list=df['Source'].tolist()
target_list=df['Target'].tolist()
value_list=df['Value'].tolist()
For more details, you can refer to the Plotly Sankey diagram documentation.
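To make the nested-list point concrete, here is a minimal sketch. The three-row frame is made up for illustration and is not the asker's spreadsheet; it only shows what each conversion returns.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet (assumption, not the asker's data).
df = pd.DataFrame({"Label": ["A1", "A2", "B1"], "Source": [0, 1, 0]})

# Converting a one-column DataFrame yields one inner list per row.
nested = df[["Label"]].values.tolist()

# Converting a single column (a Series) yields a flat list.
flat = df["Label"].tolist()

print(nested)
print(flat)
```

The flat form is the shape that label, source, target and value are given in the answer above.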

Related

Sankey Plot not Showing in Jupyter Notebook

I'm pretty sure my code is fine, but I can't generate a plot of a simple Sankey chart. Maybe something is off with the code, I'm not sure. Here's what I have now. Can anyone see a problem with this?
import pandas as pd
import holoviews as hv
import plotly.graph_objects as go
import plotly.express as pex
hv.extension('bokeh')
data = [['TMD','TMD Create','Sub-Section 1',17],['TMD','TMD Create','Sub-Section 1',17],['C4C','Customer Tab','Sub-Section 1',10],['C4C','Customer Tab','Sub-Section 1',10],['C4C','Customer Tab','Sub-Section 1',17]]
df = pd.DataFrame(data, columns=['Source','Target','Attribute','Value'])
df
source = df["Source"].values.tolist()
target = df["Target"].values.tolist()
value = df["Value"].values.tolist()
labels = df["Attribute"].values.tolist()
import plotly.graph_objs as go
#create links
link = dict(source=source, target=target, value=value,
color=["turquoise","tomato"] * len(source))
#create nodes
node = dict(label=labels, pad=15, thickness=5)
#create a sankey object
chart = go.Sankey(link=link, node=node, arrangement="snap")
#build a figure
fig = go.Figure(chart)
fig.show()
I am trying to follow the basic example shown in the link below.
https://python.plainenglish.io/create-a-sankey-diagram-in-python-e09e23cb1a75
You mention two different packages, and each needs a different solution. I don't know which you prefer, so I explain both.
Data
import pandas as pd
df = pd.DataFrame({
'Source':['a','a','b','b'],
'Target':['c','d','c','d'],
'Value': [1,2,3,4]
})
>>> df
Source Target Value
0 a c 1
1 a d 2
2 b c 3
3 b d 4
This is a very basic DataFrame with only 4 transitions.
Holoviews/Bokeh
With holoviews it is very easy to plot a Sankey diagram, because it takes the DataFrame as it is and derives the labels from the letters in the Source and Target columns.
import holoviews as hv
hv.extension('bokeh')
sankey = hv.Sankey(df)
sankey.opts(width=600, height=400)
This is created with holoviews 1.15.4 and bokeh 2.4.3.
Plotly
For plotly it is not as easy, because plotly wants integer node indices instead of letters in the Source and Target columns. Therefore we have to manipulate the DataFrame before we can create the figure.
Here I collect all distinct labels and replace each one with a unique number. The labels are sorted so that the numbering is reproducible (a plain set has no guaranteed order).
unique_labels = sorted(set(df['Source']) | set(df['Target']))
mapper = {v: i for i, v in enumerate(unique_labels)}
df['Source'] = df['Source'].map(mapper)
df['Target'] = df['Target'].map(mapper)
>>> df
Source Target Value
0 0 2 1
1 0 3 2
2 1 2 3
3 1 3 4
Afterwards I can create the dicts that plotly takes. I have to set the labels by hand, in the same order as the numbers assigned above, and the lengths of the arrays have to match.
import plotly.graph_objects as go
source = df["Source"].values.tolist()
target = df["Target"].values.tolist()
value = df["Value"].values.tolist()
#create links
link = dict(source=source, target=target, value=value, color=["turquoise","tomato"] * 2)
#create nodes
node = dict(label=['a', 'b', 'c', 'd'], pad=15, thickness=5)
#create a sankey object
chart = go.Sankey(link=link, node=node, arrangement="snap")
#build a figure
fig = go.Figure(chart)
fig.show()
I used plotly 5.13.0.
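If you would rather not type the labels by hand, the mapping step can be wrapped in a small helper. This is only a sketch: encode_nodes is a hypothetical name of my own, not part of plotly, and it numbers the labels in order of first appearance.

```python
def encode_nodes(sources, targets):
    """Return (labels, source_idx, target_idx) for string-named nodes.

    Hypothetical helper, not a plotly function.
    """
    # dict.fromkeys keeps the first occurrence of each name, in order.
    labels = list(dict.fromkeys(list(sources) + list(targets)))
    index = {name: i for i, name in enumerate(labels)}
    return labels, [index[s] for s in sources], [index[t] for t in targets]


labels, source_idx, target_idx = encode_nodes(
    ["a", "a", "b", "b"], ["c", "d", "c", "d"]
)
print(labels, source_idx, target_idx)
```

The results can then go into node=dict(label=labels, ...) and link=dict(source=source_idx, target=target_idx, value=...), so labels and indices cannot drift apart.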

3d animated line plot with plotly in python

I saw this 3D plot. It was animated and added a new value every day. I have not found an example showing how to recreate it with plotly in Python.
The plot should start with the value from the first row (100). The start value should remain fixed (no rolling values). The plot should be animated so that each row's value is added one after the other and the x-axis expands. The following data frame (df_stocks) contains the values and dates to plot. Assigning colors would be a great addition: the more positive the value, the deeper the green; the more negative, the darker the red.
import yfinance as yf
import pandas as pd
stocks = ["AAPL", "MSFT"]
df_stocks = pd.DataFrame()
for stock in stocks:
    df = yf.download(stock, start="2022-01-01", end="2022-07-01", group_by='ticker')
    df['perct'] = df['Close'].pct_change()
    df_stocks[stock] = df['perct']
df_stocks.iloc[0] = 0
df_stocks += 1
df_stocks = df_stocks.cumprod()*100
df_stocks -= 100
You can use a list of go.Frame objects as shown in this example. Since you want the line plot to continually extend outward, each frame needs to include data that's one row longer than the previous frame, so we can use a list comprehension like:
frames = [
    go.Frame(data=[
        # ...traces built from df_stocks.iloc[:i]
    ])
    for i in range(len(df_stocks))
]
To add colors to your lines depending on their value, you can use binning and labels (as in this answer) to create new columns called AAPL_color and MSFT_color that contain the string of the css color (like 'darkorange' or 'green'). Then you can pass the information from these columns using the argument line=dict(color=...) in each go.Scatter3d object.
import yfinance as yf
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
stocks = ["AAPL", "MSFT"]
df_stocks = pd.DataFrame()
for stock in stocks:
    df = yf.download(stock, start="2022-01-01", end="2022-07-01", group_by='ticker')
    df['perct'] = df['Close'].pct_change()
    df_stocks[stock] = df['perct']
df_stocks.iloc[0] = 0
df_stocks += 1
df_stocks = df_stocks.cumprod()*100
df_stocks -= 100
df_min = df_stocks[['AAPL','MSFT']].min().min() - 1
df_max = df_stocks[['AAPL','MSFT']].max().max() + 1
labels = ['firebrick','darkorange','peachpuff','palegoldenrod','palegreen','green']
bins = np.linspace(df_min,df_max,len(labels)+1)
df_stocks['AAPL_color'] = pd.cut(df_stocks['AAPL'], bins=bins, labels=labels).astype(str)
df_stocks['MSFT_color'] = pd.cut(df_stocks['MSFT'], bins=bins, labels=labels).astype(str)
frames = [go.Frame(
data=[
go.Scatter3d(
y=df_stocks.iloc[:i].index,
z=df_stocks.iloc[:i].AAPL.values,
x=['AAPL']*i,
name='AAPL',
mode='lines',
line=dict(
color=df_stocks.iloc[:i].AAPL_color.values, width=3,
)
),
go.Scatter3d(
y=df_stocks.iloc[:i].index,
z=df_stocks.iloc[:i].MSFT.values,
x=['MSFT']*i,
name='MSFT',
mode='lines',
line=dict(
color=df_stocks.iloc[:i].MSFT_color.values, width=3,
)
)]
)
for i in range(1, len(df_stocks) + 1)]  # start at 1 so no frame is empty; +1 so the last row is included
fig = go.Figure(
data=list(frames[1]['data']),
frames=frames,
layout=go.Layout(
# xaxis=dict(range=[0, 5], autorange=False),
# yaxis=dict(range=[0, 5], autorange=False),
# zaxis=dict(range=[0, 5], autorange=False),
template='plotly_dark',
legend = dict(bgcolor = 'grey'),
updatemenus=[dict(
type="buttons",
font=dict(color='black'),
buttons=[dict(label="Play",
method="animate",
args=[None])])]
),
)
fig.show()
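The binning step can be tried on its own before wiring it into the animation. A minimal sketch with made-up percentage values and a shortened three-color palette (both are assumptions for illustration, not the stock data above):

```python
import numpy as np
import pandas as pd

# Made-up cumulative returns in percent (assumption for illustration).
returns = pd.Series([-10.0, -1.0, 0.0, 5.0, 12.0])
palette = ["firebrick", "palegoldenrod", "green"]

# One more bin edge than colors, padded by 1 so min and max fall inside.
bins = np.linspace(returns.min() - 1, returns.max() + 1, len(palette) + 1)

# pd.cut assigns each value the label of the interval it falls into.
colors = pd.cut(returns, bins=bins, labels=palette).astype(str).tolist()
print(colors)
```

Intervals are closed on the right by default, so a value that sits exactly on an edge gets the color of the lower bin.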

Plotly: How to display a regression line for one variable against multiple other time series?

With a dataset such as time series for various stocks, how can you easily display a regression line for one variable against all others and quickly define a few aesthetic elements such as:
which variable to plot against the others,
theme color for the figure,
colorscale for the traces
type of trendline; linear or non-linear?
Data:
date GOOG AAPL AMZN FB NFLX MSFT
100 2019-12-02 1.216280 1.546914 1.425061 1.075997 1.463641 1.720717
101 2019-12-09 1.222821 1.572286 1.432660 1.038855 1.421496 1.752239
102 2019-12-16 1.224418 1.596800 1.453455 1.104094 1.604362 1.784896
103 2019-12-23 1.226504 1.656000 1.521226 1.113728 1.567170 1.802472
104 2019-12-30 1.213014 1.678000 1.503360 1.098475 1.540883 1.788185
Reproducible through:
import pandas as pd
import plotly.express as px
df = px.data.stocks()
The essence:
target = 'GOOG'
fig = px.scatter(df, x = target,
y = [c for c in df.columns if c != target],
color_discrete_sequence = px.colors.qualitative.T10,
template = 'plotly_dark', trendline = 'ols',
title = 'Google vs. the world')
The details:
With recent versions of plotly.express (px), px.scatter makes this easy, straightforward and flexible. The snippet below does what the question asks.
First, define target = 'GOOG' from the dataframe columns. Then, using px.scatter(), you can:
Plot the rest of the columns against the target using y = [c for c in df.columns if c != target]
Select a theme through template = 'plotly_dark', or find another using pio.templates.
Select a color scheme for the traces through color_discrete_sequence = px.colors.qualitative.T10, or find another using dir(px.colors.qualitative).
Define the trend estimation method through trendline = 'ols' or trendline = 'lowess'.
(The following plot is made with a data source in wide format. With some very slight amendments, px.scatter() will handle data in long format just as easily.)
Plot
Complete code:
# imports
import pandas as pd
import plotly.express as px
import plotly.io as pio
# data
df = px.data.stocks()
df = df.drop(['date'], axis = 1)
# your choices
target = 'GOOG'
colors = px.colors.qualitative.T10
# plotly
fig = px.scatter(df,
x = target,
y = [c for c in df.columns if c != target],
template = 'plotly_dark',
color_discrete_sequence = colors,
trendline = 'ols',
title = 'Google vs. the world')
fig.show()
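For the long-format case mentioned above, one possible reshaping is a pandas melt that keeps the target column as the identifier. This is a sketch with a tiny made-up frame standing in for px.data.stocks(); the column names "variable" and "value" are my own choice.

```python
import pandas as pd

# Tiny made-up wide table (assumption; stands in for px.data.stocks()).
wide = pd.DataFrame({
    "GOOG": [1.00, 1.10, 1.20],
    "AAPL": [1.00, 1.30, 1.50],
    "MSFT": [1.00, 1.20, 1.40],
})
target = "GOOG"

# Keep the target as its own column and stack every other column under it.
long_df = wide.melt(id_vars=[target], var_name="variable", value_name="value")
print(long_df)
```

The long frame could then be passed as px.scatter(long_df, x=target, y="value", color="variable", trendline="ols"). Note that trendline="ols" needs the statsmodels package to be installed.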

Create a stacked graph or bar graph using plotly in python

I have data like this :
[ ('2018-04-09', '10:18:11',['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11',['s4',8],['s2',20],['s1',10]),
('2018-04-10', '10:30:11',['s4',10],['s5',6],['s6',3]) ]
I would like to plot this data, preferably as a stacked graph.
The x-axis will be time,
and it should look like this:
I created this image in Paint just to illustrate.
The x-axis will show time like a normal graph does (10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
Just to hard-code and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
    graph_data = []
    l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
    l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
    l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
    trace_temp = go.Bar(
        x='2018-04-09 10:18:11',
        y=l1[0],
        name = 'top',
    )
    graph_data.append(trace_temp)
    plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I get is TypeError: 'tuple' object is not callable.
Can someone please suggest some code showing how I can plot such data?
If needed, I can store the data in some other form that makes plotting easier.
Edit :
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig = df.iplot(kind='bar', barmode='stack', asFigure=True)
plotly.offline.plot(fig, filename="stack1.html")
However, I ran into one problem: when the time intervals are very close together, the bars overlap on the graph.
Is there a way to overcome this?
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to build the table of category/value pairs, which you have to generate anyway.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
Output:
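As a side note on the "tuple object is not callable" error in the question: it most likely comes from the missing comma between two tuples in the l1, l2 and l3 literals, because Python reads (...)(...) as a call on the first tuple. A small sketch, reusing the question's names:

```python
pair = ('com.p2', 2)

try:
    # Same as writing ('com.p2', 2)('com.p3', 3) without a comma:
    # the first tuple is "called" with the second one's items.
    pair('com.p3', 3)
except TypeError as exc:
    message = str(exc)

print(message)

# With the comma restored, the literal is an ordinary list of tuples.
l1 = [('com.p1', 1), ('com.p2', 2), ('com.p3', 3)]
names = [name for name, _ in l1]
counts = [count for _, count in l1]
print(names, counts)
```

Splitting the pairs into names and counts like this gives plain lists, which is the shape a bar trace expects for its values.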

How to make multiline graph with matplotlib subplots and pandas?

I'm fairly new to coding (completely self-taught) and have started using it at my job as a research assistant in a cancer lab. I need some help setting up a few line graphs in matplotlib.
I have a dataset that includes next-generation sequencing data for about 80 patients. For each patient, we have different timepoints of analysis, different genes detected (out of 40), and the associated % mutation for each gene.
My goal is to write two scripts. The first will generate a "by patient" plot: a line graph with % mutation on the y-axis and time of measurement on the x-axis, with a different colored line for each of the patient's genes. The second will generate a "by gene" plot, where one plot contains a different colored line for each patient's x/y values for that specific gene.
Here is an example dataframe for 1 genenumber for the above script:
gene yaxis xaxis pt# gene#
ASXL1-3 34 1 3 1
ASXL1-3 0 98 3 1
IDH1-3 24 1 3 11
IDH1-3 0 98 3 11
RUNX1-3 38 1 3 21
RUNX1-3 0 98 3 21
U2AF1-3 33 1 3 26
U2AF1-3 0 98 3 26
I have setup a groupby script that when I iterate over it, gives me a dataframe for every gene-timepoint for each patient.
grouped = df.groupby('pt #')
for groupObject in grouped:
    group = groupObject[1]
For patient 1, this gives the following output:
y x gene patientnumber patientgene genenumber dxtotransplant \
0 40.0 1712 ASXL1 1 ASXL1-1 1 1857
1 26.0 1835 ASXL1 1 ASXL1-1 1 1857
302 7.0 1835 RUNX1 1 RUNX1-1 21 1857
I need help writing a script that will create either of the plots described above. using the bypatient example, my general idea is that I need to create a different subplot for every gene a patient has, where each subplot is the line graph represented by that one gene.
Using matplotlib this is about as far as I have gotten:
plt.figure()
grouped = df.groupby('patient number')
for groupObject in grouped:
    group = groupObject[1]
    df = group #may need to remove this
    for element in range(len(group)):
        xs = np.array(df[df.columns[1]]) #"x" column
        ys= np.array(df[df.columns[0]]) #"y" column
        gene = np.array(df[df.columns[2]])[element] #"gene" column
        plt.subplot(1,1,1)
        plt.scatter(xs,ys, label=gene)
        plt.plot(xs,ys, label=gene)
        plt.legend()
    plt.show()
This produces the following output:
In this output, the circled line is not supposed to be connected to the other 2 points. In this case, this is patient 1, who has the following datapoint:
x y gene
1712 40 ASXL1
1835 26 ASXL1
1835 7 RUNX1
Using seaborn I have gotten close to my desired graph using this code:
grouped = df.groupby(['patientnumber'])
for groupObject in grouped:
    group = groupObject[1]
    g = sns.FacetGrid(group, col="patientgene", col_wrap=4, size=4, ylim=(0,100))
    g = g.map(plt.scatter, "x", "y", alpha=0.5)
    g = g.map(plt.plot, "x", "y", alpha=0.5)
    plt.title= "gene:%s"%element
Using this code I get the following:
If I adjust the line:
g = sns.FacetGrid(group, col="patientnumber", col_wrap=4, size=4, ylim=(0,100))
I get the following result:
As you can see in the second example, the plot treats every point as if it belonged to the same line (but they are actually 4 separate lines).
How can I tweak my iterations so that each patient-gene combination is treated as a separate line on the same graph?
I wrote a subplot function that may give you a hand. I modified the data a tad to help illustrate the plotting functionality.
gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
RUNX1-3,38,1,3,21
RUNX1-3,2,98,3,21
U2AF1-3,33,1,3,26
U2AF1-3,0,98,3,26
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
RUNX1-3,42,1,4,21
RUNX1-3,3,62,4,21
RUNX1-3,1,119,4,21
U2AF1-3,16,1,4,26
U2AF1-3,1,62,4,26
U2AF1-3,0,119,4,26
This is the subplotting function...with some extra bells and whistles :)
def plotByGroup(df, group, xCol, yCol, title = "", xLabel = "", yLabel = "", lineColors = ["red", "orange", "yellow", "green", "blue", "purple"], lineWidth = 2, lineOpacity = 0.7, plotStyle = 'ggplot', showLegend = False):
    """
    Plot multiple lines from a Pandas Data Frame for each group using DataFrame.groupby() and MatPlotLib PyPlot.
    #params
    df - Required - Data Frame - Pandas Data Frame
    group - Required - String - Column name to group on
    xCol - Required - String - Column name for X axis data
    yCol - Required - String - Column name for y axis data
    title - Optional - String - Plot Title
    xLabel - Optional - String - X axis label
    yLabel - Optional - String - Y axis label
    lineColors - Optional - List - Colors to plot multiple lines
    lineWidth - Optional - Integer - Width of lines to plot
    lineOpacity - Optional - Float - Alpha of lines to plot
    plotStyle - Optional - String - MatPlotLib plot style
    showLegend - Optional - Boolean - Show legend
    #return
    MatPlotLib PyPlot module (for saving, showing, etc.)
    """
    # Import MatPlotLib Plotting Function & Set Style
    from matplotlib import pyplot as plt
    plt.style.use(plotStyle) # pyplot exposes the style module, so no separate matplotlib import is needed
    figure = plt.figure() # Initialize Figure
    grouped = df.groupby(group) # Set Group
    i = 0 # Set iteration to determine line color indexing
    for idx, grp in grouped:
        colorIndex = i % len(lineColors) # Define line color index
        lineLabel = grp[group].values[0] # Get a group label from first position
        xValues = grp[xCol] # Get x vector
        yValues = grp[yCol] # Get y vector
        plt.subplot(1,1,1) # Initialize subplot and plot (on next line)
        plt.plot(xValues, yValues, label = lineLabel, color = lineColors[colorIndex], lw = lineWidth, alpha = lineOpacity)
        # Plot legend
        if showLegend:
            plt.legend()
        i += 1
    # Set title & Labels
    axis = plt.gca() # Current axes, i.e. the one the lines were drawn on
    axis.set_title(title)
    axis.set_xlabel(xLabel)
    axis.set_ylabel(yLabel)
    # Return plot for saving, showing, etc.
    return plt
And to use it...
import pandas
# Load the Data into Pandas
df = pandas.read_csv('data.csv')
#
# Plotting - by Patient
#
# Create Patient Grouping
patientGroup = df.groupby('pt #')
# Iterate Over Groups
for idx, patientDf in patientGroup:
    # Let's give them specific titles
    plotTitle = "Gene Frequency over Time by Gene (Patient %s)" % str(patientDf['pt #'].values[0])
    # Call the subplot function
    plot = plotByGroup(patientDf, 'gene', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Gene Frequency")
    # Add Vertical Lines at Assay Timepoints
    timepoints = set(patientDf.xaxis.values)
    for timepoint in timepoints:
        plot.axvline(x = timepoint, linewidth = 1, linestyle = "dashed", color='gray', alpha = 0.4)
    # Let's see it
    plot.show()
And of course, we can do the same by gene.
#
# Plotting - by Gene
#
# Create Gene Grouping
geneGroup = df.groupby('gene')
# Generate Plots for Groups
for idx, geneDf in geneGroup:
    plotTitle = "%s Gene Frequency over Time by Patient" % str(geneDf['gene'].values[0])
    plot = plotByGroup(geneDf, 'pt #', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Frequency")
    plot.show()
If this isn't what you're looking for, provide a clarification and I'll take another crack at it.
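If you want every patient-gene combination as its own line on a single graph, one option is to group on both columns at once. A minimal sketch with a few made-up rows (assumption, loosely modelled on the tables above) that only collects the x/y vectors per line:

```python
import pandas as pd

# A few made-up rows (assumption for illustration).
df = pd.DataFrame({
    "gene":  ["ASXL1", "ASXL1", "RUNX1", "ASXL1", "RUNX1"],
    "pt #":  [1, 1, 1, 2, 2],
    "xaxis": [1712, 1835, 1835, 1, 98],
    "yaxis": [40, 26, 7, 39, 8],
})

# Grouping on two columns yields one sub-frame per (patient, gene) pair,
# so points from different genes are never joined into the same line.
lines = {}
for (patient, gene), grp in df.groupby(["pt #", "gene"]):
    grp = grp.sort_values("xaxis")
    lines[(patient, gene)] = (grp["xaxis"].tolist(), grp["yaxis"].tolist())

print(lines)
```

Each entry can then be drawn with a separate plt.plot(xs, ys, label=...) call, which keeps the lines apart.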
