How to label these points on the scatter plot - python

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
plt.show()
Got the plot but not able to label these points

Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
In the above example it will give the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
EDIT
For non numerical values, you can simply define the string as the coordinate. This can be done like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:

Related

Place ellipsis on seaborn catplot

I have a seaborn.catplot that looks like this:
What I am trying to do is highlight differences in the graph with the following rules:
If A-B > 4, color it green
If A-B < -1, color it red
If A-B = <2= and >=0, color it blue
I am looking to produce something akin to the below image:
I have an MRE here:
# Stack Overflow Example
import numpy as np, pandas as pd, seaborn as sns
from random import choice
from string import ascii_lowercase, digits
chars = ascii_lowercase + digits
lst = [''.join(choice(chars) for _ in range(2)) for _ in range(100)]
np.random.seed(8)
t = pd.DataFrame(
{
'Key': [''.join(choice(chars) for _ in range(2)) for _ in range(5)]*2,
'Value': np.random.uniform(low=1, high=10, size=(10,)),
'Type': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
}
)
ax = sns.catplot(data=t, x='Value', y='Key', hue='Type', palette="dark").set(title="Stack Overflow Help Me")
plt.show()
I believe an ellipsis will need to be plotted around the points of interest, and I have looked into some questions:
Creating a Confidence Ellipses in a sccatterplot using matplotlib
plot ellipse in a seaborn scatter plot
But none seem to be doing this with catplot in particular, or with customizing their color and with rules.
How can I achieve the desired result with my toy example?
You could create ellipses around the midpoint of A and B, using the distance between A and B, increased by some padding, as width. The height should be a bit smaller than 1.
To get a full outline and transparent inner color, to_rgba() can be used. Setting the zorder to a low number puts the ellips behind the scatter points.
sns.scatterplot is an axes-level equivalent for sns.catplot, and is easier to work with when there is only one subplot.
Making the Key column of type pd.Categorical gives a fixed relation between y-position and label.
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.colors import to_rgba
import seaborn as sns
import pandas as pd
import numpy as np
from string import ascii_lowercase, digits
chars = ascii_lowercase + digits
num = 9
df = pd.DataFrame({'Key': [''.join(np.random.choice([*chars], 2)) for _ in range(num)] * 2,
'Value': np.random.uniform(low=1, high=10, size=2 * num),
'Type': np.repeat(['A', 'B'], num)})
df['Key'] = pd.Categorical(df['Key']) # make the key categorical for a consistent ordering
sns.set_style('white')
ax = sns.scatterplot(data=df, x='Value', y='Key', hue='Type', palette="dark")
df_grouped = df.groupby(['Key', 'Type'])['Value'].mean().unstack()
for y_pos, y_label in enumerate(df['Key'].cat.categories):
A = df_grouped.loc[y_label, 'A']
B = df_grouped.loc[y_label, 'B']
dif = A - B
color = 'limegreen' if dif > 4 else 'crimson' if dif < -1 else 'dodgerblue' if 0 <= dif < 2 else None
if color is not None:
ell = Ellipse(xy=((A + B) / 2, y_pos), width=abs(dif) + 0.8, height=0.8,
fc=to_rgba(color, 0.1), lw=1, ec=color, zorder=0)
ax.add_patch(ell)
plt.tight_layout()
plt.show()

Line color as a function of column values in pandas dataframe

I am trying to plot two columns of a pandas dataframe against each other, grouped by a values in a third column. The color of each line should be determined by that third column, i.e. one color per group.
For example:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3]})
df.groupby('colors').plot('x','y',ax=ax)
If I do it this way, I end up with three different lines plotting x against y, with each line a different color. I now want to determine the color by the values in 'colors'. How do I do this using a gradient colormap?
Looks like seaborn is applying the color intensity automatically based on the value in hue..
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2,3,4,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3,1.5,1.5,1.5]})
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors')
Gives:
you can change the colors by adding palette argument as below:
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
#more combinations : viridis, mako, flare, etc.
gives:
Edit (for colormap):
based on answers at Make seaborn show a colorbar instead of a legend when using hue in a bar plot?
import seaborn as sns
fig = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
norm = plt.Normalize(vmin = df['colors'].min(), vmax = df['colors'].max())
sm = plt.cm.ScalarMappable(cmap="mako", norm = norm)
fig.figure.colorbar(sm)
fig.get_legend().remove()
plt.show()
gives..
Hope that helps..
Complementing to Prateek's very good answer, once you have assigned the colors based on the intensity of the palette you choose (for example Mako):
plots = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors',palette='mako')
You can add a colorbar with matplotlib's function plt.colorbar() and assign the palette you used:
sm = plt.cm.ScalarMappable(cmap='mako')
plt.colorbar(sm)
After plt.show(), we get the combined output:

How to change this multiple line graph into a bar graph?

I have written a python code to generate multiple line graph. I want to change it to a bar graph, such that for every point(Time,pktCount) I get a bar depicting that value on that time.
code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('C:\\Users\\Hp\\Documents\\XYZ.csv')
fig, ax = plt.subplots()
for i, group in df.groupby('Source'):
group.plot(x='Time', y='PktCount', ax=ax,label=group["Source"].iloc[0])
ax.set_title("PktCount Sent by nodes")
ax.set_ylabel("PktCount")
ax.set_xlabel("Time (milliSeconds)")
#optionally only set ticks at years present in the years column
plt.legend(title="Source Nodes", loc=0, fontsize='medium', fancybox=True)
plt.show()
This is my csv file :
Source,Destination,Bits,Energy,PktCount,Time
1,3,320,9.999983999999154773195,1,0
3,1,96,9.999979199797145566758,1,1082
3,4,320,9.999963199267886912408,2,1773
4,3,96,9.999974399702292006927,1,2842
1,3,320,9.999947199998309546390,2,7832
3,1,96,9.999937599065032479166,3,8965
3,4,320,9.999921598535773824816,4,10421
4,3,96,9.999948799404584013854,2,11822
2,3,384,9.999907199998736846248,1,13796
3,2,96,9.999892798283143074166,5,14990
1,3,320,9.999886399997464319585,3,18137
3,4,384,9.999873597648032688946,6,18488
3,4,384,9.999854397012922303726,7,25385
4,3,96,9.999919999106876020781,3,26453
1,3,320,9.999831999996619092780,4,27220
3,1,96,9.999828796810067870484,8,28366
2,3,384,9.999823999997473692496,2,31677
3,2,96,9.999804796557437119834,9,32873
1,3,320,9.999787199995773865975,5,34239
3,1,96,9.999783996354582686592,10,35370
1,3,320,9.999766399994928639170,6,41536
3,1,96,9.999763196151728253350,11,42667
1,3,320,9.999745599994083412365,7,49060
3,1,96,9.999742395948873820108,12,50192
2,3,384,9.999742399996210538744,3,50720
3,2,96,9.999718395696243069458,13,51925
You can explicitly collect values for x and y axis in two lists, then plot them separately:-
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('C:\\Users\\Hp\\Documents\\XYZ.csv')
fig, ax = plt.subplots()
x1, y1 = [], []
for i, group in df.groupby('Source'):
#collecting all values in these lists
x1.append(group['Time'].values.tolist())
y1.append(group['PktCount'].values.tolist())
ax.set_title("PktCount Sent by nodes")
ax.set_ylabel("PktCount")
ax.set_xlabel("Time (milliSeconds)")
color_l = ['r', 'y', 'g', 'b']
i = 0
for a, b in zip(x1, y1):
ax.bar(a, b, width = 400, color = color_l[i])
i += 1
plt.legend(('1', '2', '3', '4'))
plt.show()

Matplotlib - plotting grouped values with a for loop

I'm trying to plot a graph grouped by column values using a for loop without knowing the number of unique values in that column.
You can see sample code below (without a for loop) and the desired output.
I would like that each plot will have different color and marker (as seen below).
This is the code:
import pandas as pd
from numpy import random
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
'10/03/2018 12:30:00','10/04/2018 12:05:00',
'10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
print(df.head())
fig, ax = plt.subplots()
df[df['Entity']=="C201"].plot(x="W",y="Y",label='C201',ax=ax,marker='x')
df[df['Entity']=="C202"].plot(x="W",y="Y",label='C202',ax=ax, marker='o')
This is the output:
You can first find out the unique values of your df['Entity'] and then loop over them. To generate new markers automatically for each Entity, you can define an order of some markers (let's say 5 in the answer below) which will repeat via marker=next(marker).
Complete minimal answer
import itertools
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
marker = itertools.cycle(('+', 'o', '*', '^', 's'))
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
'10/03/2018 12:30:00','10/04/2018 12:05:00',
'10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['Entity'] = ['C201','C201','C201','C202','C202']
fig, ax = plt.subplots()
for idy in np.unique(df['Entity'].values):
df[df['Entity']==idy].plot(x="W",y="Y", label=idy, ax=ax, marker=next(marker))
plt.legend()
plt.show()

Independent axis for each subplot in pandas boxplot

The below code helps in obtaining subplots with unique colored boxes. But all subplots share a common set of x and y axis. I was looking forward to having independent axis for each sub-plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
by="models",layout=(2,2),figsize=(6,4),
return_type='both',
patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
for row_key, (ax,row) in bp_dict.iteritems():
ax.set_xlabel('')
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots before hand and pass this in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
by="models", ax=axes, figsize=(6,4),
return_type='both',
patch_artist = True,
)
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent though, they are still bound to each other, but it seems like you are asking about the visibilty, rather than the shared behaviour here.
If you really think it is necessary to un-share the axes after the creation of the boxplot array, you can do this, but you have to do everything 'by hand'. Searching a while through stackoverflow and looking at the matplotlib documentation pages I came up with the following solution to un-share the yaxes of the Axes instances, for the xaxes, you would have to go analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
np.asarray([
np.random.rand(140),
2*np.random.rand(140),
4*np.random.rand(140),
8*np.random.rand(140),
]).T,
columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
by="models",layout = (2,2),figsize=(6,8),
return_type='both',
patch_artist = True,
rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
ax.set_xlabel('')
##removing shared axes:
grouper = ax.get_shared_y_axes()
shared_ys = [a for a in grouper]
for ax_list in shared_ys:
for ax2 in ax_list:
grouper.remove(ax2)
##setting limits:
ax.axis('auto')
ax.relim() #<-- maybe not necessary
##adjusting tick positions:
ax.yaxis.set_major_locator(AutoLocator())
ax.yaxis.set_minor_locator(AutoMinorLocator())
##making tick labels visible:
plt.setp(ax.get_yticklabels(), visible=True)
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axis instance. This post got me into the direction of how to do this -- Axes.get_shared_y_axes() returns a Grouper object, that holds references to all other Axes instances with which the current Axes should share its xaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, #DavidG's answer is in my opinion the better approach.

Categories

Resources