I have a series of scatterplots (one example below), but I want to modify it so that the colors of the points in the plot become more red (or "hot") when they are clustered more closely with other points, while points that are spread out further are colored more blue (or "cold"). Is it possible to do this?
Currently, my code is pretty basic in its setup.
import plotly.express as px
fig = px.scatter(data, x='A', y='B', trendline='ols')
Using scipy.stats.gaussian_kde you can calculate the density and then use this to color the plot:
import pandas as pd
import plotly.express as px
from scipy import stats
df = pd.DataFrame({
'x':[0,0,1,1,2,2,2.25,2.5,2.5,3,3,4,2,4,8,2,2.75,3.5,2.5],
'y':[0,2,3,2,1,2,2.75,2.5,3,3,4,1,5,4,8,4,2.75,1.5,3.25]
})
kernel = stats.gaussian_kde([df.x, df.y])
df['z'] = kernel([df.x, df.y])
fig = px.scatter(df, x='x', y='y', color='z', trendline='ols', color_continuous_scale=px.colors.sequential.Bluered)
I am moving a visualization from the seaborn library to the plotly library.
I want a scatterplot with marginal histograms for my x and y variables.
I want to show a vertical and horizontal line for the average of my x and y variables.
To show my problem I created a dummy dataframe of random X and Y values
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('XY'))
I want to replicate the seaborn library output on plotly.
g = sns.JointGrid(data=df, x="X", y="Y")
g.plot(sns.scatterplot, sns.histplot)
g.refline(x=df.X.mean(), y=df.Y.mean())
plt.show()
When I do something similar on plotly using add_vline and add_hline I get the following:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('XY'))
fig = px.scatter(df, x='X', y='Y',
marginal_y="histogram", marginal_x="histogram",
width=800, height=600)
fig.add_hline(y=df['Y'].mean(), line_dash='dash', annotation_text=f"{df['Y'].mean():.0f}")
fig.add_vline(x=df['X'].mean(), line_dash='dash', annotation_text=f"{df['X'].mean():.0f}")
fig.show()
My issue is that the horizontal line is also plotted on the marginal x plot and the vertical line is also plotted on the marginal y plot. Is there a way to prevent the horizontal line from being plotted on the marginal x plot and the vertical line from being plotted on the marginal y plot?
You can specify the row and column of the subplot grid when adding the hline/vline. Note that the horizontal line should sit at the mean of Y and the vertical line at the mean of X:
fig.add_hline(y=df['Y'].mean(), line_dash='dash', row=1, annotation_text=f"{df['Y'].mean():.0f}")
fig.add_vline(x=df['X'].mean(), line_dash='dash', col=1, annotation_text=f"{df['X'].mean():.0f}")
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
wine = pd.read_csv('red wine quality.csv')
wine = wine.dropna()
plt.figure()
wine.plot.scatter(x = 'pH', y = 'alcohol', c = 'quality', alpha = 0.4,\
cmap = plt.get_cmap('jet'), colorbar = True)
plt.savefig('scatter plot.png')
plt.tight_layout()
plt.show()
Here is the plot that I get:
I get a scatter plot with the y-axis labeled as 'alcohol' ranging from 9-15 and the color bar labeled as 'quality' ranging from 3-8. I thought I had designated in my code that the x-axis would show up labeled as 'pH', but I get nothing. I have tried adjusting my figsize down to [8, 8], setting dpi to 100, and labeling the axes, but nothing will make the x-axis show up. What am I doing wrong?
Here is an MCVE to play with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])
plt.figure()
df.plot.scatter(x = 'A', y = 'B', c = 'C', alpha = 0.4,\
cmap = plt.get_cmap('jet'), colorbar = True)
plt.tight_layout()
plt.show()
The x-axis label and minor tick labels not showing on a pandas scatter plot is a known and open issue with pandas (see "BUG: Scatterplot x-axis label disappears with colorscale when using matplotlib backend", Issue #36064).
This bug occurs with Jupyter notebooks displaying Pandas scatterplots that have a colormap while using Matplotlib as the plotting backend. The simplest workaround is passing sharex=False to pandas.DataFrame.plot.scatter.
See Make pandas plot() show xlabel and xvalues
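A minimal sketch of the `sharex=False` workaround applied to the MCVE above (the random seed is my own addition so the example is reproducible):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 4)), columns=["A", "B", "C", "D"])

# sharex=False keeps pandas from hiding the x-axis label and tick labels
# when a colorbar axes is added to the figure.
ax = df.plot.scatter(x="A", y="B", c="C", alpha=0.4,
                     cmap="jet", colorbar=True, sharex=False)
ax.figure.tight_layout()

print(ax.get_xlabel(), ax.xaxis.get_label().get_visible())
# plt.show()  # uncomment to display the figure
```

Creating the axes yourself with `fig, ax = plt.subplots()` and passing `ax=ax` should have the same effect, since pandas documents `sharex` as defaulting to False when an `ax` is supplied.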
I'm trying to visualize correlations using a heatmap in matplotlib (1.4.3), which works fine. I'd like to highlight specific cells/points in the heatmap, and my first guess was to overlay a second plot that creates the highlights. As imshow creates a new window, this does not work as intended, though. A condensed version of my code is below. Is there another way to render something matrix-like on top of an existing figure?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(4, 4), columns=list('ABCD'))
corrmatrix = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corrmatrix, cmap='afmhot', interpolation='none')
plt.colorbar(im)
ax.set_xticks(np.arange(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(np.arange(len(df.columns)))
ax.set_yticklabels(df.columns)
relevant_cells = df > 0.9
rel_ax = ax.imshow(relevant_cells, cmap='YlOrBr', interpolation='none')
plt.show()
Emphasis can be achieved by drawing the second heatmap on top of the first on the same axes and making it semi-transparent with alpha. The color map has been intentionally changed for clarity. In this example, the cells (C, C) and (A, C) are True:
rel_ax = ax.imshow(relevant_cells, cmap='Blues', interpolation='none', alpha=0.7)
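Put together, a runnable sketch could look like the following. One assumption on my part: I threshold the correlation matrix (`corrmatrix > 0.9`) rather than the raw data as in the question, since highlighting strong correlations seems to be the intent. The seed is also mine.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=list("ABCD"))
corrmatrix = df.corr()

fig, ax = plt.subplots()
im = ax.imshow(corrmatrix, cmap="afmhot", interpolation="none")
fig.colorbar(im)
ax.set_xticks(np.arange(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(np.arange(len(df.columns)))
ax.set_yticklabels(df.columns)

# Assumption: highlight cells of the correlation matrix above 0.9.
# Converted to floats (1.0 = highlighted, 0.0 = not) for imshow.
relevant_cells = (corrmatrix > 0.9).to_numpy().astype(float)

# Second imshow on the SAME axes; alpha lets the first heatmap show through.
overlay = ax.imshow(relevant_cells, cmap="Blues", interpolation="none", alpha=0.7)
# plt.show()  # uncomment to display the figure
```

Both images end up on the same axes, so no second window is created. The diagonal is always highlighted here because each variable correlates perfectly with itself.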
I am trying to overlay two sets of latitude and longitude points so that the first set is plotted in one color and the second set in a different color on the same map. I have tried to share the same axis (ax), but it keeps plotting the points on two maps instead of one single map with both sets of colored points. My code looks like this:
from sys import exit
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame as gdf
from shapely.geometry import Point, LineString
import pandas as pd
import matplotlib.pyplot as plt
dfp = pd.read_csv("\\\porfiler03\\gtdshare\\Long_Lats_90p.csv", delimiter=',', skiprows=0,
low_memory=False)
geometry = [Point(xy) for xy in zip(dfp['Longitude'], dfp['Latitude'])]
gdf = gpd.GeoDataFrame(dfp, geometry=geometry)
#this is a simple map that goes with geopandas
fig, ax = plt.subplots()
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
#world = world[(world.name=="Spain")]
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);
dfn = pd.read_csv("\\\porfiler03\\gtdshare\\Long_Lats_90n.csv",
delimiter=',', skiprows=0,
low_memory=False)
geometry = [Point(xy) for xy in zip(dfn['Longitude'], dfn['Latitude'])]
gdf = gpd.GeoDataFrame(dfn, geometry=geometry)
#this is a simple map that goes with geopandas
#world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='yellow',
markersize=15);
My first plot looks like the second plot below but with red points in USA and Spain:
My second plot looks like this:
Thank you for helping me overlay these two different sets of points and colors into one map.
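In your case, you want to plot three GeoDataFrames (world, gdf1, and gdf2) on a single set of axes. Each call to world.plot(figsize=(10, 6)) without an ax argument creates a new figure, which is why you end up with two maps. After you create the figure and axes, you must reuse the same axes (say, ax1) for every plot. Here is a summary of the important steps: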
Create figure/axes
fig, ax1 = plt.subplots(figsize=(5, 3.5))
Plot base map
world.plot(ax=ax1)
Plot a layer
gdf1.plot(ax=ax1)
Plot more layers
gdf2.plot(ax=ax1)
Hope this helps.
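The steps above can be sketched end to end as follows. To keep it self-contained, the base map is a hypothetical stand-in (a single rectangle covering the globe) and the point coordinates are made up; with real data you would use your own world GeoDataFrame and the two CSV-derived GeoDataFrames instead.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical stand-ins for the base map and the two point sets.
base = gpd.GeoDataFrame(geometry=[box(-180, -90, 180, 90)])
gdf1 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([-100.0, -3.7], [40.0, 40.4]))
gdf2 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([2.3, 139.7], [48.9, 35.7]))

# Create the figure and axes once...
fig, ax1 = plt.subplots(figsize=(10, 6))

# ...and pass the same axes to every layer.
base.plot(ax=ax1, color="lightgray", edgecolor="black")
gdf1.plot(ax=ax1, marker="o", color="red", markersize=15)
gdf2.plot(ax=ax1, marker="o", color="yellow", markersize=15)
# plt.show()  # uncomment to display the figure
```

Layers are drawn in call order, so the base map should be plotted first and the points afterwards.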
I have seaborn heatmap and I would like to plot a lineplot on top of it while using the same x and y axis that the heatmap is using.
I expected the line to behave like in this post and take up most of the space of the heatmap, but instead the output I got was the following plot where it only occupied a small section of the heatmap. How can I make the line take up most of the space in the heatmap?
Below is the minimal working example that produced the plot I linked above.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
num = 11
a = np.eye(num)
x = np.round(np.linspace(0, 1, num=num), 1)
y = np.round(np.linspace(0, 1, num=num), 1)
df = pd.DataFrame(a, columns=x, index=y)
f, ax = plt.subplots()
ax = sns.heatmap(df, cbar=False)
ax.axes.invert_yaxis()
sns.lineplot(x=x, y=y)
plt.show()
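The heatmap draws its cells at integer positions from 0 to num along each axis, regardless of the tick labels, while your line uses the data values from 0 to 1. Perhaps just a simple fix here is to scale the line to the heatmap's coordinates: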
sns.lineplot(x=x*num, y=y*num)