I'm plotting a scatter plot from a Pandas dataframe in Matplotlib. Here is what the dataframe looks like:
X Y R
0 1 945 1236.334519
0 1 950 212.809352
0 1 950 290.663847
0 1 961 158.156856
And here is how I'm plotting the DataFrame:
ax1.scatter(myDF.X, myDF.Y, s=20, c='red', marker='s', alpha=0.5)
My problem is that I want to change how each marker is drawn depending on how high or low its value of R is.
Example: if R is higher than 1000 (as it is in the first row of my example), color should be yellow instead of red and alpha should be 0.8 instead of 0.5. If R is lower than 1000, color should be blue and alpha should be 0.4 and so on.
Is there any way to do that, or do I have to split the data into separate DataFrames? Thanks in advance!
You can build a custom list of RGBA colors, one per point (yellow with alpha 0.8 above 1000, blue with alpha 0.4 otherwise):
colors = [(1,1,0,0.8) if x>1000 else (0,0,1,0.4) for x in myDF.R]
plt.scatter(myDF.X, myDF.Y, c=colors)
Output: (plot image not included)
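The question also says "and so on", i.e. more than two ranges. A minimal sketch that generalizes the list comprehension to several bins with np.digitize, assuming a hypothetical extra cut-off at 200 and a hypothetical grey color for the lowest bin (only the 1000 cut-off and the yellow/blue colors come from the question):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question
myDF = pd.DataFrame({"X": [1, 1, 1, 1],
                     "Y": [945, 950, 950, 961],
                     "R": [1236.334519, 212.809352, 290.663847, 158.156856]})

# Bin edges: 1000 is from the question, 200 is a hypothetical extra cut-off.
bins = [200, 1000]
# One RGBA row per bin, in the same order as the bins.
palette = np.array([
    (0.5, 0.5, 0.5, 0.3),  # R <= 200: grey (hypothetical)
    (0.0, 0.0, 1.0, 0.4),  # 200 < R <= 1000: blue, alpha 0.4
    (1.0, 1.0, 0.0, 0.8),  # R > 1000: yellow, alpha 0.8
])

# right=True puts a value equal to a cut-off into the lower bin,
# so only R strictly above 1000 becomes yellow.
bin_index = np.digitize(myDF["R"], bins, right=True)
colors = palette[bin_index]  # shape (n_points, 4)

fig, ax1 = plt.subplots()
# Do not pass alpha= here: a scalar alpha would override the per-point alphas.
ax1.scatter(myDF.X, myDF.Y, s=20, c=colors, marker="s")
plt.show()
```

Adding a range only means adding one edge to `bins` and one row to `palette`.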
I currently have a dataframe, df:
In [1]: df
Out [1]:
one two
1.5 11.22
2 15.36
2.5 11
3.3 12.5
3.5 14.78
5 9
6.2 26.14
I used this code to get a heat map:
In [2]:
plt.figure(figsize=(30, 7))
plt.title('Test')
ax = sns.heatmap(data=df, annot=True,)
plt.xlabel('Test')
ax.invert_yaxis()
value = 6
index = np.abs(df.index - value).argmin()
ax.axhline(index + .5, ls='--')
print(index)
Out [2]: (heatmap image not included)
Instead, I want the y-axis to scale automatically and plot the df[2] values at their respective positions on a full numeric axis. For example, there should be a clear empty space between 3.5 and 5.0, since there aren't any values there; I want the in-between positions on the y-axis to show a value of 0.
This can be easily achieved with a bar plot instead:
plt.bar(df['one'], df['two'], color=list('rgb'), width=0.2, alpha=0.4)
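If you want to keep a heatmap rather than a bar plot, one option is to reindex the data onto a regular grid and fill the missing positions with 0, as the question asks. A sketch, assuming a 0.1 step is fine enough for the values in column `one`, and using matplotlib's `imshow` (instead of seaborn) so the y-axis stays numeric:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"one": [1.5, 2, 2.5, 3.3, 3.5, 5, 6.2],
                   "two": [11.22, 15.36, 11, 12.5, 14.78, 9, 26.14]})

# Assumption: a 0.1 step. Building the grid from integers avoids the
# floating-point drift you can get from np.arange(1.5, 6.3, 0.1).
grid = np.arange(15, 63) / 10  # 1.5, 1.6, ..., 6.2

series = df.set_index("one")["two"]
# Round the index so its values compare equal to the grid values.
series.index = np.round(series.index.to_numpy(), 1)
filled = series.reindex(grid, fill_value=0)  # missing positions become 0

fig, ax = plt.subplots(figsize=(3, 7))
# One-column image; extent maps each row onto its real y value.
im = ax.imshow(filled.to_numpy().reshape(-1, 1),
               aspect="auto",
               origin="lower",
               extent=(0, 1, grid[0] - 0.05, grid[-1] + 0.05))
ax.set_xticks([])
ax.set_ylabel("one")
fig.colorbar(im, ax=ax, label="two")
plt.show()
```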
I need to build a custom seaborn heatmap-like plot according to these requirements:
import pandas as pd
df = pd.DataFrame({"A": [0.3, 0.8, 1.3],
"B": [4, 9, 15],
"C": [650, 780, 900]})
df_info = pd.DataFrame({"id": ["min", "max"],
"A": [0.5, 0.9],
"B": [6, 10],
"C": [850, 880]})
df_info = df_info.set_index('id')
df
A B C
0 0.3 4 650
1 0.8 9 780
2 1.3 15 900
df_info
id A B C
min 0.5 6 850
max 0.9 10 880
Each value within df is supposed to be within a range defined in df_info.
For example, the values in column A are considered normal if they are between 0.5 and 0.9. Values outside the range should be colorized using a custom heatmap.
In particular:
Values that fall within the range defined for their column should not be colorized: plain black text on a white cell background.
Values lower than the min for their column should be colorized, for example in blue. The further a value is below the min, the darker the shade of blue.
Values higher than the max for their column should be colorized, for example in red. The further a value is above the max, the darker the shade of red.
Q: I wouldn't know how to approach this with a standard heatmap, and I'm not even sure it can be done with a heatmap plot. Any suggestions?
As far as I know, a heatmap can only have one scale of values. I would suggest normalizing the data you have in the df dataframe so the values in every column follow:
between 0 and 1 if the value is between df_info's min max
below 0 if the value is below df_info's min
above 1 if the value is above df_info's max
To normalize your dataframe, use:
for col in df:
    df[col] = (df[col] - df_info[col]['min']) / (df_info[col]['max'] - df_info[col]['min'])
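The loop can also be written as one vectorized expression, because subtracting a Series from a DataFrame aligns on column labels. A sketch using the example frames from the question; unlike the loop, it builds a new frame (here called `normalized`) instead of overwriting `df`:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.3, 0.8, 1.3],
                   "B": [4, 9, 15],
                   "C": [650, 780, 900]})
df_info = pd.DataFrame({"id": ["min", "max"],
                        "A": [0.5, 0.9],
                        "B": [6, 10],
                        "C": [850, 880]}).set_index("id")

col_min = df_info.loc["min"]  # Series indexed by column name: A, B, C
col_max = df_info.loc["max"]

# Each column is scaled by its own min/max: in range -> [0, 1],
# below min -> negative, above max -> greater than 1.
normalized = (df - col_min) / (col_max - col_min)
print(normalized)
```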
Finally, to create the color-coded heatmap, use:
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
vmin = df.min().min()
vmax = df.max().max()
colors = [[0, 'darkblue'],
[- vmin / (vmax - vmin), 'white'],
[(1 - vmin)/ (vmax - vmin), 'white'],
[1, 'darkred']]
cmap = LinearSegmentedColormap.from_list('', colors)
sns.heatmap(df, cmap=cmap, vmin=vmin, vmax=vmax)
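The additional calculations with vmin and vmax place the two white stops exactly where the normalized values 0 and 1 fall on the color scale, so the colormap scales dynamically with how far the data extend below the minimums and above the maximums.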
Using your input dataframe we have the following heatmap:
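One caveat with these stops: if no value is below its min (so vmin > 0) or none is above its max (so vmax < 1), the white stop positions fall outside [0, 1] and matplotlib raises a ValueError when the colormap is used. A sketch with a clamp that guards against this, using the normalized example values; the clamp and the colormap name "out_of_range" are assumptions, not part of the answer above:

```python
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Normalized example data: (value - min) / (max - min) per column
normalized = pd.DataFrame({"A": [-0.5, 0.75, 2.0],
                           "B": [-0.5, 0.75, 2.25],
                           "C": [-200 / 30, -70 / 30, 50 / 30]})

# Clamp (assumption) so that 0 and 1 always lie inside [vmin, vmax]
# and every stop position stays within [0, 1].
vmin = min(normalized.min().min(), 0.0)
vmax = max(normalized.max().max(), 1.0)
span = vmax - vmin

stops = [(0.0, "darkblue"),
         (-vmin / span, "white"),       # where normalized value 0 lands
         ((1 - vmin) / span, "white"),  # where normalized value 1 lands
         (1.0, "darkred")]
cmap = LinearSegmentedColormap.from_list("out_of_range", stops)
norm = Normalize(vmin=vmin, vmax=vmax)

# In-range values map to white, out-of-range values to blue or red.
for value in (-3.0, 0.5, 2.0):
    print(value, cmap(norm(value)))
```

The resulting `cmap`, `vmin` and `vmax` can be passed to `sns.heatmap` exactly as in the answer.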
Given the following DF of user RFM activity:
uid R F M
0 1 10 1 5
1 1 2 2 10
2 1 4 3 1
3 1 5 4 10
4 2 10 1 3
5 2 1 2 10
6 2 1 3 4
Recency: The time between the last purchase and today, represented by the distance between the rightmost circle and the vertical dotted line that's labeled Now.
Frequency: The time between purchases, represented by the distance between the circles on a single line.
Monetary: The amount of money spent on each purchase, represented by the size of the circle. This amount could be the average order value or the quantity of products that the customer ordered.
I would like to plot something like the figure below:
Where the size of the circle is the M value and the distance is the R. Any help would be appreciated.
Update
As suggested by Diziet Asahi I've tried the following:
import matplotlib.pyplot as plt
import pandas as pd

def plot_users(df):
    fig, ax = plt.subplots()
    ax.axis('off')
    ax.scatter(x=df['M'],y=df['uid'],s=30*df['R'], marker='o', color='grey')
    ax.invert_xaxis()
    ax.axvline(0, ls='--', color='black', zorder=-1)
    for y in df['uid'].unique():
        ax.axhline(y, color='grey', zorder=-1)

tmp = pd.DataFrame({'uid':[1,1,1,1,2,2,2],'R':[10,2,4,5,10,1,1],'F':[1,2,3,4,1,3,4],'M':[5,10,1,10,3,10,4]})
plot_users(tmp)
And I get the following:
So I think there is a bug, since the first user has 4 records and the sizes don't match either.
You can use matplotlib's scatter() with the s= argument to draw markers with an area proportional to the value in M. The rest is just tweaking the appearance of the plot.
import matplotlib.pyplot as plt

c = 'xkcd:dark grey'
fig, ax = plt.subplots()
ax.axis('off')
# x position = R, marker area = 60 * M
ax.scatter(x=df['R'],y=df['uid'],s=60*df['M'], marker='o', color=c)
ax.invert_xaxis()
ax.axvline(0, ls='--', color=c, zorder=-1)
for y in df['uid'].unique():
    ax.axhline(y, color=c, zorder=-1)
ax.set_ymargin(1)
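To check that the marker sizes really follow M (the update says the sizes don't match), you can add a size legend. A sketch using `PathCollection.legend_elements(prop="sizes", ...)` on the `tmp` data from the update, with x = R and s = 60 * M as in the answer; the `func` argument undoes the 60x scaling so the legend shows M values:

```python
import pandas as pd
import matplotlib.pyplot as plt

# tmp data from the question's update
df = pd.DataFrame({"uid": [1, 1, 1, 1, 2, 2, 2],
                   "R": [10, 2, 4, 5, 10, 1, 1],
                   "F": [1, 2, 3, 4, 1, 3, 4],
                   "M": [5, 10, 1, 10, 3, 10, 4]})

fig, ax = plt.subplots()
ax.axis("off")
# x = R, marker area s = 60 * M (s is in points squared)
sc = ax.scatter(x=df["R"], y=df["uid"], s=60 * df["M"],
                marker="o", color="xkcd:dark grey")
ax.invert_xaxis()

# One legend entry per distinct marker size, labelled with the M value.
handles, labels = sc.legend_elements(prop="sizes", func=lambda s: s / 60)
ax.legend(handles, labels, title="M", loc="upper left")
plt.show()
```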
Data in this form:
x1    x2  y
2104  3   399900
1600  3   329900
2400  3   369000
1416  2   232000
3000  4   539900
1985  4   299900
I want to make a scatter plot with two x features (x1 and x2) and a single y,
but when I try
y=data.loc[:'y']
px=data.loc[:,['x1','x2']]
plt.scatter(px,y)
I get:
'ValueError: x and y must be the same size'.
So I tried this:
data=pd.read_csv('ex1data2.txt',names=['x1','x2','y'])
px=data.loc[:,['x1','x2']]
x1=px['x1']
x2=px['x2']
y=data.loc[:'y']
plt.scatter(x1,x2,y)
This time I got a blank graph completely filled with blue.
I would be grateful for some guidance.
A scatter call takes a single x array, but several series can share the same y. You could plot the second x feature on a twin x-axis created with twiny():
fig, ax = plt.subplots()
ay = ax.twiny()
ax.scatter(df['x1'], df['y'])
ay.scatter(df['x2'], df['y'], color='r')
plt.show()
Output: (plot image not included)
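With twin x-axes it helps to make clear which scale belongs to which feature. A sketch that colors each x-axis to match its points, using the data from the question (the colors and labels are illustrative choices):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x1": [2104, 1600, 2400, 1416, 3000, 1985],
                   "x2": [3, 3, 3, 2, 4, 4],
                   "y": [399900, 329900, 369000, 232000, 539900, 299900]})

fig, ax = plt.subplots()
ay = ax.twiny()  # second x-axis on top, sharing the y-axis with ax
ax.scatter(df["x1"], df["y"], color="tab:blue")
ay.scatter(df["x2"], df["y"], color="tab:red")

# Color each x-axis label and its ticks like the points drawn against it.
ax.set_xlabel("x1", color="tab:blue")
ax.tick_params(axis="x", colors="tab:blue")
ay.set_xlabel("x2", color="tab:red")
ay.tick_params(axis="x", colors="tab:red")
ax.set_ylabel("y")
plt.show()
```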
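You can also check pandas' built-in functions for plotting DataFrame content; they are very powerful.
But if you want to use matplotlib, check the documentation (https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html): x and y must be array-like and of the same size. In your first attempt, px is a two-column DataFrame, so it holds two values per row while y holds one. Also, data.loc[:'y'] slices rows; to select the y column use data['y']. In your second attempt, the third positional argument of scatter() is s (the marker size), so your y values were used as marker sizes, which is why the plot was filled with blue.
So the working code looks like this: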
data = pd.read_csv("test.txt", header=None)
data
0 1 2
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900
5 1985 4 299900
data.columns = ["x1", "x2", "y"]
data
x1 x2 y
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900
5 1985 4 299900
# If you call scatter many times and then plt.show() a single image is created
plt.scatter(data["x1"], data["y"])
plt.scatter(data["x2"], data["y"])
plt.show()
Note that if you want to have data in an array format you can do data["x1"].values and it will return an ndarray.
You could use seaborn with a melted dataframe. seaborn.scatterplot has a hue argument, which lets you include multiple data series.
import seaborn as sns
ax = sns.scatterplot(x='value', hue='series', y='y',
data=data.melt(value_vars=['x1', 'x2'],
id_vars='y',
var_name='series'))
However, since your x values are on such different scales, you might want to use twin axes, as in Quang Hoang's answer.
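To see what the melt step produces, and to get the same kind of plot without seaborn, here is a sketch that melts the question's data into long format and draws one scatter series per original column with a pandas groupby (the name `long_df` is illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({"x1": [2104, 1600, 2400, 1416, 3000, 1985],
                     "x2": [3, 3, 3, 2, 4, 4],
                     "y": [399900, 329900, 369000, 232000, 539900, 299900]})

# Long format: one row per (y, feature) pair, with columns y, series, value.
long_df = data.melt(id_vars="y", value_vars=["x1", "x2"], var_name="series")

fig, ax = plt.subplots()
# One scatter call (and one legend entry) per original column.
for name, group in long_df.groupby("series"):
    ax.scatter(group["value"], group["y"], label=name)
ax.legend(title="series")
plt.show()
```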
I have a DataFrame that looks like this:
S.No Date A
0 12/07/03 76
1 12/07/13 1
2 12/07/23 32
3 12/08/03 12
4 12/08/04 22
5 12/08/05 11
I want a plot where the y-axis is A and the x-axis is Date, and the problem is with the colors: I want all occurrences of 76 in red, 32 in blue, and all other values of A in green. Is this possible?
Yes, you can do so:
import numpy as np
import matplotlib.pyplot as plt

# define the color according to the values of df['A']
colors = np.select([df['A'].eq(76), df['A'].eq(32)], ['r', 'b'], 'g')
# pass the colors to plt.scatter
plt.scatter(x=df['Date'], y=df['A'], c=colors)
Output: (plot image not included)
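An alternative to np.select is a plain dictionary lookup, which is easy to extend with more special values. A sketch using the question's data; the color names are illustrative, and the Date strings are left unparsed because their format (yy/mm/dd or mm/dd/yy) is not stated, so matplotlib treats them as categories:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Date": ["12/07/03", "12/07/13", "12/07/23",
                            "12/08/03", "12/08/04", "12/08/05"],
                   "A": [76, 1, 32, 12, 22, 11]})

# Values with a dedicated color; everything else falls back to green.
special = {76: "red", 32: "blue"}
colors = [special.get(a, "green") for a in df["A"]]

fig, ax = plt.subplots()
ax.scatter(df["Date"], df["A"], c=colors)
plt.show()
```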