I am trying to use go.Scatter with conditional statements.
'A' and 'T_or_NonT' are columns in my dataframe, df.
If a row in 'A' is less than or equal to 200, the column df['T_or_NonT'] should show 'Non-T'; otherwise it should show 'T'.
I want to plot them using go.Scatter, with 'T' and 'Non-T' shown in different colors. Here is my code:
import plotly.graph_objects as go
fig = go.Figure()
for i in range (0, length):
    if A[i] <= 200:
        df['T_or_NonT'].iloc[i] = 'Non-T'
        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x = df['Date'],
            y = df['A'],
            mode ='markers',
            name='Non-T',
            marker=dict(color ='red')))
        fig.show()
    else:
        df['T_or_NonT'].iloc[i] = 'T'
        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x = df['Date'],
            y = df['A'],
            mode ='markers',
            name='T',
            marker=dict(color ='green')))
        fig.show()
This should be the output:
Date   A    T or Non-T
07/21  201  T
08/21  255  T
09/21  198  Non-T
I then want to plot Date (monthly) vs. Rainfall (the A column), with the Ts marked in red and the Non-Ts marked in green.
I can't make it work, and I want to know the right way to code this. By the way, I am a Python beginner.
PS. You can also suggest another workaround if there is one.
There are many ways to do this, but I think the easiest is to add a column holding the result of the condition and a matching column of colors, and then draw one scatter trace for each subset of the data frame selected by that result.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
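# 36 month-start dates; 'MS' is used because the month-end alias differs between pandas versions
df = pd.DataFrame({'Date': pd.date_range('2018-01-01', periods=36, freq='MS'),
                   'A': np.random.randint(150, 250, 36)})
# The question defines A <= 200 as 'Non-T', so 'T' is strictly greater than 200
df['T_or_NonT'] = np.where(df['A'] > 200, 'T', 'Non-T')
df['color'] = np.where(df['A'] > 200, 'red', 'green')
fig = go.Figure()
# Both columns come from the same condition, so their unique values pair up in order
for t, c in zip(df['T_or_NonT'].unique(), df['color'].unique()):
    dfs = df[df['T_or_NonT'] == t]
    fig.add_trace(go.Scatter(
        x = dfs['Date'],
        y = dfs['A'],
        mode = 'markers',
        name = t,
        marker = dict(
            color = c
        )
    ))
fig.show()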
I have a simple dataframe containing dates and a few headers. I need to remove specific dates from the plot.
fig1 = px.line(df, x="Date", y="Header1")
fig1.show()
I want to remove values from the chart itself (not from the dataframe), for example 15/01/2022 and 22/02/2022.
I would most likely do this with the dataset used to build your figure rather than in the figure itself, but this suggestion should do exactly what you're asking for. How you find the outliers is entirely up to you. Given some thresholds toolow and toohigh, the snippet below will turn Plot 1 into Plot 2:
fig.for_each_trace(lambda t: highOutliers.extend([t.x[i] for i, val in enumerate(t.y) if val > toohigh]))
fig.for_each_trace(lambda t: lowOutliers.extend([t.x[i] for i, val in enumerate(t.y) if val < toolow]))
fig.update_xaxes(
rangebreaks=[dict(values=highOutliers+lowOutliers)]
)
fig.update_traces(connectgaps=True)
Plot 1:
Plot 2:
Complete code:
from numpy import random
import datetime
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
# Some sample data
y = np.random.normal(50, 5, 15)
datelist = pd.to_datetime(pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=len(y)).tolist())
df = pd.DataFrame({'date':datelist, 'y':y})
# Introduce some outliers
df.loc[5,'y'] = 120
df.loc[10,'y'] = 2
# build figure
fig = px.line(df, x = 'date', y = 'y')
# containers and thresholds for outliers
highOutliers = []
lowOutliers = []
toohigh = 100
toolow = 20
# find outliers
fig.for_each_trace(lambda t: highOutliers.extend([t.x[i] for i, val in enumerate(t.y) if val > toohigh]))
fig.for_each_trace(lambda t: lowOutliers.extend([t.x[i] for i, val in enumerate(t.y) if val < toolow]))
# define outliers as rangebreaks
fig.update_xaxes(
rangebreaks=[dict(values=highOutliers+lowOutliers)]
)
# connect gaps in the line
fig.update_traces(connectgaps=True)
fig.show()
I have a plotly chart to which I am trying to add tweets as hover information.
The dataframe itself contains 7000+ rows (hourly crypto readings) and 139 tweets, stored in a column labeled content. The remaining ~6861 rows of content are NaN.
The code that I have below
fig = px.line(total_data, x = total_data.date,
y = total_data.doge_close)
fig.add_trace(
go.Scatter(
x=total_data[total_data.has_tweet==1].date,
y=total_data[total_data.has_tweet == 1]['doge_close'],
mode = 'markers',
hovertemplate =
'<i>tweet:</i>'+ '<br>' +
'<i>%{text}</i>',
text = [t for t in total_data['content']],
name = 'has_tweets'))
fig.show()
produces this plot:
Where it says NaN, I'd like the actual content of the tweets at that time.
The "content" column can be loosely reproduced with this code below:
df = px.data.stocks().set_index('date')[['GOOG']].rename(columns={'GOOG':'values'})
df['tweet'] = random.choices(['A tweet','Longer tweet', 'emoji','NaN'], weights=(5,10,5,80), k=len(df))
df['has_tweet'] = df['tweet'].apply(lambda x: 0 if x != x else 1)
and can be generically reproduced with the code below:
import plotly.express as px
import plotly.graph_objects as go
import random
fig = px.line(df, x=df.index, y = 'values')
fig.add_trace(go.Scatter(x=df[df.has_tweet==1].index,
y = df[df.has_tweet==1]['values'],
mode = 'markers',
hovertemplate =
'<i>tweet:</i>'+ '<br>' +
'<i>%{text}</i>',
text = [t for t in df['tweet']],
name = 'has_tweets'))
fig.show()
Is there a way to filter out the 'NaN's from the dataframe in order to input the actual tweet content?
EDIT WITH SOLUTION
Thanks to a very kind commenter, I have figured out the solution and attached it below for anyone who needs it in the future.
fig = px.line(total_data, x = total_data.date, y = total_data.doge_close)
fig.add_trace(go.Scatter(x=total_data[total_data.has_tweet==1].date,
y=total_data[total_data.has_tweet==1]['doge_close'],
mode = 'markers',
hovertemplate =
'<i>tweet:</i>'+ '<br>' +
'<i>%{text}</i>',
text = [t for t in total_data.loc[total_data['has_tweet']==1, 'content']],
name = 'has_tweets'))
fig.show()
which produces:
Based on your comment, instead of randomly assigning 0 or 1 to the "has_tweet" column, it should be 0 or 1 based on whether the "tweet" column is NaN or not. Also instead of the string "NaN" I am using np.nan, but this may need to be modified depending on what your actual data looks like.
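The switch from the string "NaN" to np.nan matters because the x != x test in the has_tweet lambda only detects real floating-point NaN values. A minimal illustration in plain Python, where has_tweet is a hypothetical helper that mirrors the lambda:

```python
def has_tweet(value):
    """Return 1 if value holds a tweet, 0 if it is a float NaN."""
    # NaN is the only value that is not equal to itself
    return 0 if value != value else 1


print(has_tweet(float('nan')))  # a real missing value
print(has_tweet('NaN'))         # a string that merely spells "NaN"
print(has_tweet('A tweet'))
```

The first call returns 0 and the other two return 1, so a column filled with the string 'NaN' would be flagged as containing a tweet in every row.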
We can create some data similar to yours like this:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import random
random.seed(42)
df = px.data.stocks().set_index('date')[['GOOG']].rename(columns={'GOOG':'values'})
df['tweet'] = random.choices(['A tweet','Longer tweet', 'emoji',np.nan], weights=(5,10,5,80), k=len(df))
df['has_tweet'] = df['tweet'].apply(lambda x: 0 if x != x else 1)
Then I believe the only change we need to make is pass just the rows with tweets to the text argument:
fig = px.line(df, x=df.index, y = 'values')
fig.add_trace(go.Scatter(x=df[df.has_tweet==1].index,
y = df.loc[df.has_tweet==1]['values'],
mode = 'markers',
hovertemplate =
'<i>tweet:</i>'+ '<br>' +
'<i>%{text}</i>',
text = [t for t in df.loc[df.has_tweet==1, 'tweet']],
name = 'has_tweets'))
fig.show()
I have the following piece of code
import plotly.express as px
import pandas as pd
import numpy as np
x = [1,2,3,4,5,6]
df = pd.DataFrame(
{
'x': x*3,
'y': list(np.array(x)) + list(np.array(x)**2) + list(np.array(x)**.5),
'color': list(np.array(x)*0) + list(np.array(x)*0+1) + list(np.array(x)*0+2),
}
)
for plotting_function in [px.scatter, px.line]:
    fig = plotting_function(
        df,
        x = 'x',
        y = 'y',
        color = 'color',
        title = f'Using {plotting_function.__name__}',
    )
    fig.show()
which produces the following two plots:
For some reason px.line is not producing the continuous color scale that I want, and in the documentation for px.scatter I cannot find how to join the points with lines. How can I produce a plot with a continuous color scale and lines joining the points for each trace?
This is the plot I want to produce:
I am not sure this is possible using only plotly.express. If you use px.line, then you can pass the argument markers=True as described in this answer, but from the px.line documentation it doesn't look like continuous color scales are supported.
UPDATED ANSWER: to get a legend that groups the lines and markers together, it's probably simplest to use go.Scatter with the argument mode='lines+markers'. You'll need to add the traces one at a time (plotting the subset of the data for each unique color value separately) so that each line+marker group can be controlled from the legend.
When plotting these traces, you will need some functions to retrieve the line colors from the continuous color scale, because go.Scatter won't know what color each line is supposed to be unless you specify it. Thankfully, that has been answered here.
You also won't get a colorbar when adding the markers one color at a time. To add one, plot all of the markers at once in an extra go.Scatter trace with the argument marker=dict(size=0, color="rgba(0,0,0,0)", colorscale='Plasma', colorbar=dict(thickness=40)): this displays a colorbar while keeping the duplicate markers invisible.
Putting all of this together:
# import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
x = [1,2,3,4,5,6]
df = pd.DataFrame(
{
'x': x*3,
'y': list(np.array(x)) + list(np.array(x)**2) + list(np.array(x)**.5),
'color': list(np.array(x)*0) + list(np.array(x)*0+1) + list(np.array(x)*0+2),
}
)
# This function allows you to retrieve colors from a continuous color scale
# by providing the name of the color scale, and the normalized location between 0 and 1
# Reference: https://stackoverflow.com/questions/62710057/access-color-from-plotly-color-scale
def get_color(colorscale_name, loc):
    from _plotly_utils.basevalidators import ColorscaleValidator
    # first parameter: Name of the property being validated
    # second parameter: a string, doesn't really matter in our use case
    cv = ColorscaleValidator("colorscale", "")
    # colorscale will be a list of lists: [[loc1, "rgb1"], [loc2, "rgb2"], ...]
    colorscale = cv.validate_coerce(colorscale_name)
    if hasattr(loc, "__iter__"):
        return [get_continuous_color(colorscale, x) for x in loc]
    return get_continuous_color(colorscale, loc)

# Identical to Adam's answer
import plotly.colors
from PIL import ImageColor

def get_continuous_color(colorscale, intermed):
    """
    Plotly continuous colorscales assign colors to the range [0, 1]. This function computes the intermediate
    color for any value in that range.
    Plotly doesn't make the colorscales directly accessible in a common format.
    Some are ready to use:
        colorscale = plotly.colors.PLOTLY_SCALES["Greens"]
    Others are just swatches that need to be constructed into a colorscale:
        viridis_colors, scale = plotly.colors.convert_colors_to_same_type(plotly.colors.sequential.Viridis)
        colorscale = plotly.colors.make_colorscale(viridis_colors, scale=scale)
    :param colorscale: A plotly continuous colorscale defined with RGB string colors.
    :param intermed: value in the range [0, 1]
    :return: color in rgb string format
    :rtype: str
    """
    if len(colorscale) < 1:
        raise ValueError("colorscale must have at least one color")
    hex_to_rgb = lambda c: "rgb" + str(ImageColor.getcolor(c, "RGB"))
    if intermed <= 0 or len(colorscale) == 1:
        c = colorscale[0][1]
        return c if c[0] != "#" else hex_to_rgb(c)
    if intermed >= 1:
        c = colorscale[-1][1]
        return c if c[0] != "#" else hex_to_rgb(c)
    for cutoff, color in colorscale:
        if intermed > cutoff:
            low_cutoff, low_color = cutoff, color
        else:
            high_cutoff, high_color = cutoff, color
            break
    if (low_color[0] == "#") or (high_color[0] == "#"):
        # some color scale names (such as cividis) returns:
        # [[loc1, "hex1"], [loc2, "hex2"], ...]
        low_color = hex_to_rgb(low_color)
        high_color = hex_to_rgb(high_color)
    return plotly.colors.find_intermediate_color(
        lowcolor=low_color,
        highcolor=high_color,
        intermed=((intermed - low_cutoff) / (high_cutoff - low_cutoff)),
        colortype="rgb",
    )

fig = go.Figure()
## add the lines+markers
for color_val in df.color.unique():
    color_val_normalized = (color_val - min(df.color)) / (max(df.color) - min(df.color))
    # print(f"color_val={color_val}, color_val_normalized={color_val_normalized}")
    df_subset = df[df['color'] == color_val]
    fig.add_trace(go.Scatter(
        x=df_subset['x'],
        y=df_subset['y'],
        mode='lines+markers',
        marker=dict(color=get_color('Plasma', color_val_normalized)),
        name=f"line+marker {color_val}",
        legendgroup=f"line+marker {color_val}"
    ))
## add invisible markers to display the colorbar without displaying the markers
fig.add_trace(go.Scatter(
x=df['x'],
y=df['y'],
mode='markers',
marker=dict(
size=0,
color="rgba(0,0,0,0)",
colorscale='Plasma',
cmin=min(df.color),
cmax=max(df.color),
colorbar=dict(thickness=40)
),
showlegend=False
))
fig.update_layout(
legend=dict(
yanchor="top",
y=0.99,
xanchor="left",
x=0.01),
yaxis_range=[min(df.y)-2,max(df.y)+2]
)
fig.show()
You can achieve this using only 2 more parameters in px.line:
markers=True
color_discrete_sequence=my_plotly_continuous_sequence
The complete code would look something like this (Note the list slicing [::4] so that the colors are well spaced):
import plotly.express as px
import pandas as pd
import numpy as np
x = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame(
{
'x': x * 3,
'y': list(np.array(x)) + list(np.array(x) ** 2) + list(np.array(x) ** .5),
'color': list(np.array(x) * 0) + list(np.array(x) * 0 + 1) + list(np.array(x) * 0 + 2),
}
)
fig = px.line(
df,
x='x',
y='y',
color='color',
color_discrete_sequence=px.colors.sequential.Plasma[::4],
markers=True,
template='plotly'
)
fig.show()
This produces the following output.
In case you have more lines than the colors present in the colormap, you can construct a custom colorscale so that you get one complete sequence instead of a cycling sequence:
rgb = px.colors.convert_colors_to_same_type(px.colors.sequential.RdBu)[0]
colorscale = []
n_steps = 4 # Control the number of colors in the final colorscale
for i in range(len(rgb) - 1):
    for step in np.linspace(0, 1, n_steps):
        colorscale.append(px.colors.find_intermediate_color(rgb[i], rgb[i + 1], step, colortype='rgb'))
fig = px.line(df_e, x='temperature', y='probability', color='year', color_discrete_sequence=colorscale, height=900)
fig.show()
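One detail of the loop above: np.linspace(0, 1, n_steps) includes both endpoints, so I believe each interior color of the original palette ends up in the list twice (once as the end of one segment and once as the start of the next). The following plain-Python sketch shows the same interpolation idea without the repeats. The helpers lerp_rgb and expand_palette and the three-color palette are hypothetical names and values introduced here for illustration; they are not plotly functions.

```python
def lerp_rgb(low, high, t):
    """Linearly interpolate between two (r, g, b) tuples; t is in [0, 1]."""
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low, high))


def expand_palette(colors, n_steps):
    """Insert n_steps - 1 interpolated colors between each neighboring pair.

    Each segment contributes its start color plus the intermediate steps,
    and the final color is appended once at the end, so no color repeats.
    """
    expanded = []
    for start, end in zip(colors, colors[1:]):
        for step in range(n_steps):
            expanded.append(lerp_rgb(start, end, step / n_steps))
    expanded.append(tuple(colors[-1]))
    return [f"rgb({r}, {g}, {b})" for r, g, b in expanded]


# Hypothetical three-color palette, expanded to five evenly spaced colors
palette = expand_palette([(0, 0, 0), (100, 100, 100), (200, 200, 200)], n_steps=2)
print(palette)
```

The resulting list of rgb strings can be passed as color_discrete_sequence in the same way as colorscale above.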
I have the following dataframe 'df_percentages':
df_percentages
                 Percentages_center  Percentages zone2  Percentages total
Sleeping                  77.496214          87.551742          12.202591
Low activity              21.339391          12.286724          81.511021
Middle activity            0.969207           0.124516           5.226317
High activity              0.158169           0.000000           1.009591
I am trying to create a vertically stacked bar chart with 3 separate bars on the x-axis: one for 'Percentages_center', one for 'Percentages zone2' and one for 'Percentages total'. Each bar should show the percentages of sleeping, low activity, middle activity and high activity.
I've tried this using the following code, but I can't figure out how to make the bar chart:
x = ['Center', 'Zone2', 'Total']
plot = px.Figure(data=[go.Bar(
name = 'Sleeping (0-150 MP)',
x = x,
y = df_percentages['Percentages center']
),
go.Bar(
name = 'Low activity (151-2000 MP)',
x = x,
y = df_percentages['Percentages zone2']
),
go.Bar(
name = 'Middle activity (2001-6000 MP)',
x = x,
y = df_percentages['Percentages center']
),
go.Bar(
name = 'High activity (6000-10000)',
x = x,
y = df_percentages['Percentages zone2']
)
])
plot.update_layout(barmode='stack')
plot.show()
If you're open to plotly.express, I would suggest using
df = df.T # to get your df in the right shape
fig = px.bar(df, x = df.index, y = df.columns)
Plot:
Complete code:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
df = pd.DataFrame({'Percentages_center': {'Sleeping': 77.496214,
'Low_activity': 21.339391,
'Middle_activity': 0.9692069999999999,
'High_activity': 0.158169},
'Percentages_zone2': {'Sleeping': 87.551742,
'Low_activity': 12.286724000000001,
'Middle_activity': 0.124516,
'High_activity': 0.0},
'Percentages_total': {'Sleeping': 12.202591,
'Low_activity': 81.511021,
'Middle_activity': 5.226317,
'High_activity': 1.009591}})
df = df.T
fig = px.bar(df, x = df.index, y = df.columns)
fig.show()
I have a number of charts, made with matplotlib and seaborn, that look like the example below.
I show how certain quantities evolve over time on a lineplot
The x-axis labels are not numbers but strings (e.g. 'Q1' or '2018 first half' etc)
I need to "extend" the x-axis to the right, with an empty period. The chart must show from Q1 to Q4, but there is no data for Q4 (the Q4 column is full of nans)
I need this because I need the charts to be side-by-side with others which do have data for Q4
matplotlib doesn't display the column full of nans
If the x-axis were numeric, it would be easy to extend the range of the plot; since it's not numeric, I don't know which x_range each tick corresponds to
I have found the solution below. It works, but it's not elegant: I use integers for the x-axis, add 1, then set the labels back to the strings. Is there a more elegant way?
This is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
from matplotlib.ticker import FuncFormatter
import seaborn as sns
df =pd.DataFrame()
df['period'] = ['Q1','Q2','Q3','Q4']
df['a'] = [3,4,5,np.nan]
df['b'] = [4,4,6,np.nan]
df = df.set_index( 'period')
fig, ax = plt.subplots(1,2)
sns.lineplot( data = df, ax =ax[0])
df_idx = df.index
df2 = df.set_index( np.arange(1, len(df_idx) + 1 ))
sns.lineplot(data = df2, ax = ax[1])
ax[1].set_xlim(1,4)
ax[1].set_xticks(df2.index)  # pin the ticks to 1..4 before relabelling them
ax[1].set_xticklabels(df.index)
You can add these lines of code for ax[0]
left_buffer,right_buffer = 3,2
labels = ['Q1','Q2','Q3','Q4']
extended_labels = ['']*left_buffer + labels + ['']*right_buffer
left_range = list(range(-left_buffer,0))
right_range = list(range(len(labels),len(labels)+right_buffer))
ticks_range = left_range + list(range(len(labels))) + right_range
aux_range = list(range(len(extended_labels)))
ax[0].set_xticks(ticks_range)
ax[0].set_xticklabels(extended_labels)
xticks = ax[0].xaxis.get_major_ticks()
for ind in aux_range[0:left_buffer]: xticks[ind].tick1line.set_visible(False)
for ind in aux_range[len(labels)+left_buffer:len(labels)+left_buffer+right_buffer]: xticks[ind].tick1line.set_visible(False)
in which left_buffer and right_buffer are margins you want to add to the left and to the right, respectively. Running the code, you will get
I may have actually found a simpler solution: I can draw a transparent line (alpha = 0) by plotting x = the index of the dataframe, i.e. all the labels, including those for which all values are NaN, and y = the average value of the dataframe, so as to be sure it's within the range:
sns.lineplot(x = df.index, y = np.ones(df.shape[0]) * df.mean().mean() , ax = ax[0], alpha =0 )
This assumes the scale of the y-axis has not been changed manually; a safer way is to use the centre of the current y-axis limits instead:
y_centre = np.mean([ax[0].get_ylim()])
sns.lineplot(x = df.index, y = np.ones(df.shape[0]) * y_centre , ax = ax[0], alpha =0 )
Drawing a transparent line forces matplotlib to extend the axes so as to show all the x values, even those for which all the other values are nans.
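For comparison, here is a sketch of the same idea with plain matplotlib rather than seaborn. It relies on matplotlib mapping string x-values to the positions 0, 1, 2, ... in order of appearance, so the axis limits can be set in those positions directly. I have not checked how this interacts with seaborn's own handling of NaN rows, so treat it as an assumption-laden alternative rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

periods = ['Q1', 'Q2', 'Q3', 'Q4']
a = [3, 4, 5, np.nan]
b = [4, 4, 6, np.nan]

fig, ax = plt.subplots()
# Plotting string x-values registers every label as a category at 0, 1, 2, ...
ax.plot(periods, a, label='a')
ax.plot(periods, b, label='b')

# Set the limits in category positions so the empty 'Q4' slot stays in view
ax.set_xlim(-0.5, len(periods) - 0.5)
ax.legend()
fig.canvas.draw()
```

Because the limits are set explicitly, no dummy line is needed and the y-axis is left untouched.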