Plotting (x, y) point-to-point connections with Python

I am trying to draw a point-to-point line plot in Python.
My data is in a pandas DataFrame, as below:
df = pd.DataFrame({
'x_coordinate': [0, 0, 0, 0, 1, 1,-1,-1,-2,0],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
print(df)
   x_coordinate  y_coordinate
0             0             0
1             0             2
2             0             1
3             0             3
4             1             3
5             1             1
6            -1             1
7            -1            -2
8            -2             2
9             0            -1
When I plot this, it joins the points in the order in which they appear in the df.
df.plot('x_coordinate','y_coordinate')
But is there a way to plot an order number next to each connection, i.e. the order in which the line travels? Say 1 for the first connection from (0,0) to (0,2), 2 for the connection from (0,2) to (0,1), and so on?

The plot is OK. If you want to check the order in which the vertices are plotted, you need to modify the data. Here is the modified data (only x changes) and the plot.
df = pd.DataFrame({
'x_coordinate': [0.1, 0.2, 0.3, 0.4, 1.5, 1.6,-1.7,-1.8,-2.9,0.1],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
Edit
For your new request, the code is modified as follows (full runnable code).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({
'x_coordinate': [0.1, 0.2, 0.3, 0.4, 1.5, 1.6,-1.7,-1.8,-2.9,0.1],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
fig = plt.figure(figsize=(6,5))
ax1 = fig.add_subplot(1, 1, 1)
df.plot('x_coordinate','y_coordinate', legend=False, ax=ax1)
for ea in zip(np.array((range(len(df)))), df.x_coordinate.values, df.y_coordinate.values):
    text, x, y = "P"+str(ea[0]), ea[1], ea[2]
    ax1.annotate(text, (x,y))

I found an easier way to do it and thought I'd share:
fig, ax = plt.subplots()
df.plot('x_coordinate','y_coordinate',ax=ax)
for k, v in df[['x_coordinate','y_coordinate']].iterrows():
    ax.annotate('p'+str(k+1), v)
plt.show()
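Both answers above label the vertices. If the goal is to number the connections themselves, as the question asks, one option is to put each number at the midpoint of its segment. The following is only a minimal sketch under that assumption: the `mid` frame, the red label color and the output file name are illustrative choices that do not appear in the answers, and the Agg backend line is there only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'x_coordinate': [0, 0, 0, 0, 1, 1, -1, -1, -2, 0],
    'y_coordinate': [0, 2, 1, 3, 3, 1, 1, -2, 2, -1],
})

fig, ax = plt.subplots()
df.plot('x_coordinate', 'y_coordinate', legend=False, ax=ax)

# Midpoint of each segment: average of every point with the point that follows it.
# The last row has no successor, so it becomes NaN and is dropped.
mid = (df + df.shift(-1)).dropna() / 2

# Number the segments 1, 2, 3, ... in the order they are drawn.
for order, (mx, my) in enumerate(zip(mid.x_coordinate, mid.y_coordinate), start=1):
    ax.annotate(str(order), (mx, my), color='red')

fig.savefig('connections.png')  # hypothetical output file name
```

With the original data several segments lie on top of each other along x = 0, so the numbers there sit on the same line; the shifted x values from the first answer make them easier to tell apart.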

Related

If y is a pandas Series of 0s and 1s, what do y.values==0,1 and y.values==0,0 mean?

y= pd.Series([0,1,0,1,1,0])
The code below uses this, and I am stuck on this point. What does y.values==0,0 mean, and how do the other combinations differ from one another?
plt.figure(dpi=120)
plt.scatter(pca[y.values==0,0], pca[y.values==0,1], alpha=0.5, label='Edible', s=2)
plt.scatter(pca[y.values==1,0], pca[y.values==1,1], alpha=0.5, label='Poisonous', s=2)
plt.legend()
Suppose the following numpy array pca and Series y:
import pandas as pd
import numpy as np
pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])
# pca
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
# y
0 0
1 1
2 0
3 1
4 1
5 0
dtype: int64
To get elements from a 2D array, you have to pass the coordinates of rows and columns you want to get:
# Get rows from pca where y==0 and get the first column (0)
>>> pca[y.values==0, 0] # or pca[y==0, 0]
array([ 0, 4, 10])
# Get rows from pca where y==0 and get the second column (1)
>>> pca[y.values==0, 1] # or pca[y==0, 1]
array([ 1, 5, 11])
# This is the same for other scatter line.
Instead of passing the selected rows explicitly, here you are using a boolean mask, y==0. It returns another Series of the same length as y, holding boolean values:
>>> y == 0 # Original
0 True # 0
1 False # 1
2 True # 0
3 False # 1
4 False # 1
5 True # 0
dtype: bool
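To make the link between the mask and the selected rows concrete, here is a small sketch that uses the same pca and y as above. The names `mask`, `rows`, `first_col_masked` and `first_col_by_index` are only illustrative; the point is that indexing with the boolean mask gives the same result as indexing with the row numbers where the mask is True.

```python
import numpy as np
import pandas as pd

pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])

# Boolean mask: True wherever y equals 0
mask = y.values == 0

# The row numbers where the mask is True (rows 0, 2 and 5 here)
rows = np.flatnonzero(mask)

# Both expressions select the first column of those rows
first_col_masked = pca[mask, 0]
first_col_by_index = pca[rows, 0]

print(first_col_masked)
print(first_col_by_index)
```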

Bin data by x and y columns, and output the mean of a third column

I have three columns like so:
x y Value
0.5 0.5 3
2.3 1.2 5
2.7 1.6 10
3.3 4.1 4
3.5 4.2 6
3.8 4.6 8
I want to bin columns x and y and find the mean of column Value, i.e. the average would be taken over the points that fall between (3,3) and (4,4), which would equal (4+6+8)/3 = 6. So the output should look like this:
x_bin  y_bin  mean_value
0, 1   0, 1   3
0, 1   1, 2   0
0, 1   2, 3   0
0, 1   3, 4   0
1, 2   0, 1   0
1, 2   1, 2   0
1, 2   2, 3   0
1, 2   3, 4   0
2, 3   0, 1   0
2, 3   1, 2   7.5
2, 3   2, 3   0
2, 3   3, 4   0
3, 4   0, 1   0
3, 4   1, 2   0
3, 4   2, 3   0
3, 4   3, 4   6
Ideally, I would like the output in a format where I could plot this as a heatmap grid.
Thanks in advance.
np.histogram2d is numpy's function to bin 2D data. By default, the result counts the points into each bin. With the weights= parameter, the points are weighted, and those weights are summed. Dividing the summed weights by the counts gives the means.
Seaborn's sns.heatmap can display these means and automatically annotate the cell values.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = [0.5, 2.3, 2.7, 3.3, 3.5, 3.8]
y = [0.5, 1.2, 1.6, 4.1, 4.2, 4.6]
value = [3, 5, 10, 4, 6, 8]
bins = (np.arange(6), np.arange(6))
sums, _, _ = np.histogram2d(x, y, bins=bins, weights=value)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'): # divide 0 by 0 results in NaN
    means = sums / counts
sns.set_style('white')
ax = sns.heatmap(means, annot=True, fmt='.1f', cmap='turbo', vmin=0, vmax=10, square=True,
cbar=True, cbar_kws={'ticks': np.arange(11)})
ax.set_xticks(range(len(bins[0])))
ax.set_xticklabels(bins[0])
ax.set_yticks(range(len(bins[1])))
ax.set_yticklabels(bins[1])
ax.tick_params(labelrotation=0)
ax.grid(axis='both', color='0.3', clip_on=False)
ax.set_axisbelow(False)
plt.tight_layout()
plt.show()
The image shows the plots for:
bins = (np.arange(6), np.arange(6)),
bins = ([0, 2, 4, 6], [0, 2, 4, 6]) and
bins = (np.arange(0, 5.0001, 0.5), np.arange(0, 5.0001, 0.5))
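If a long table with x_bin, y_bin and mean_value columns is also wanted, as in the question, one possible route is pd.cut plus groupby. This is a sketch rather than part of the answer above: the integer edges 0 to 5, the half-open bins (right=False) and the choice to fill empty bins with 0 are assumptions made to match the requested layout. Note that the sample points with y between 4.1 and 4.6 fall into the [4, 5) bin.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [0.5, 2.3, 2.7, 3.3, 3.5, 3.8],
    'y': [0.5, 1.2, 1.6, 4.1, 4.2, 4.6],
    'Value': [3, 5, 10, 4, 6, 8],
})

edges = np.arange(6)  # assumed bin edges 0, 1, ..., 5

# Assign every point to a half-open bin [left, right)
df['x_bin'] = pd.cut(df['x'], bins=edges, right=False)
df['y_bin'] = pd.cut(df['y'], bins=edges, right=False)

# observed=False keeps the empty bin combinations; their mean is NaN, filled with 0 here
means = (df.groupby(['x_bin', 'y_bin'], observed=False)['Value']
           .mean()
           .fillna(0)
           .reset_index(name='mean_value'))

# Wide grid (rows: y bins, columns: x bins), a convenient shape for a heatmap
grid = means.pivot(index='y_bin', columns='x_bin', values='mean_value')

print(means)
print(grid)
```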

Annotate values for stacked horizontal bar plot

I'm trying to annotate the values for a stacked horizontal bar graph created using pandas. Current code is below
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
for p in ax.patches:
    ax.annotate(str(p.get_x()), xy=(p.get_x(), p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False)
The problem is the annotation method I used gives the x starting points and not the values of each segment. I'd like to be able to annotate values of each segment in the center of each segment for each of the bars.
edit: for clarity, what I would like to achieve is something like this where the values are centered horizontally (and vertically) for each segment:
You can use the patches bbox to get the information you want.
ax = df.plot.barh(stacked=True, figsize=(10, 12))
for p in ax.patches:
    left, bottom, width, height = p.get_bbox().bounds
    ax.annotate(str(width), xy=(left+width/2, bottom+height/2),
                ha='center', va='center')
Another possible solution is to flatten df.values into a 1-D array via values = df.values.flatten("F"):
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
values = df.values.flatten("F")
for i, p in enumerate(ax.patches):
    ax.annotate(str(values[i]), xy=(p.get_x()+ values[i]/2, p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False);
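The flatten("F") approach relies on the order of ax.patches matching column-major order of the DataFrame, i.e. all bars of group 1 first, then group 2, then group 3. The short check below is a sketch that compares the patch widths with the flattened values for the data above; the `widths` name and the Agg backend line are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
     'group 2': [5, 6, 1, 8, 2, 6, 2],
     'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)

ax = df.plot.barh(stacked=True, figsize=(10, 12))

# "F" (Fortran) order walks down each column before moving to the next one
values = df.values.flatten("F")

# Width of every bar segment, in the order matplotlib stores the patches
widths = [p.get_width() for p in ax.patches]

print(np.allclose(widths, values))
```

If the check prints True, values[i] is the width of ax.patches[i], which is what the annotation loop assumes.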
From matplotlib 3.4.0 use matplotlib.pyplot.bar_label
The labels parameter can be used to customize annotations, but it's not required.
See this answer for additional details and examples.
Each group of containers must be iterated through to add labels.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
Horizontal Stacked
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
# add tot to sort the bars
df['tot'] = df.sum(axis=1)
# sort
df = df.sort_values('tot')
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=True, figsize=(10, 12))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c ]
    ax.bar_label(c, labels=labels, label_type='center')
Horizontal Grouped
A grouped (not stacked) plot is a better presentation of the data, because it is easier to compare bar lengths visually.
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=False, figsize=(8, 9))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c ]
    ax.bar_label(c, labels=labels, label_type='center')
df view
   group 1  group 2  group 3  tot
2        5        1        2    8
1        2        6        2   10
4        4        2        4   10
6       10        2        4   16
0        1        5       12   18
3        7        8        4   19
5        5        6        8   19

Pyplot contourf don't fill in "0" level

I'm plotting precipitation data from weather model output. I'm contouring the data I have, using contourf. However, I don't want it to fill in the "0" level with color (only the values >0). Is there a good way to do this? I've tried messing around with the levels.
Here's the code I'm using to plot:
m = Basemap(projection='stere', lon_0=centlon, lat_0=centlat,
lat_ts=centlat, width=width, height=height)
m.drawcoastlines()
m.drawstates()
m.drawcountries()
parallels = np.arange(0., 90, 10.)
m.drawparallels(parallels, labels=[1, 0, 0, 0], fontsize=10)
meridians = np.arange(180., 360, 10.)
m.drawmeridians(meridians, labels=[0, 0, 0, 1], fontsize=10)
lons, lats = m.makegrid(nx, ny)
x, y = m(lons, lats)
cs = m.contourf(x, y, snowfall)
cbar = plt.colorbar(cs)
cbar.ax.set_ylabel("Accumulated Snow (kg/m^2)")
plt.show()
And here's the image I'm getting.
An example snowfall dataset would look something like:
0 0 0 0 0 0
0 0 1 1 1 0
0 1 2 2 1 0
0 2 3 2 1 0
0 1 0 1 2 0
0 0 0 0 0 0
This can also be achieved by passing a locator to contourf, using MaxNLocator(prune='lower') from the matplotlib.ticker module. See the docs.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, locator = ticker.MaxNLocator(prune = 'lower'))
fig.colorbar(p)
plt.show()
Image of output
The nbins parameter of MaxNLocator can be used to control the number of intervals (levels):
p = ax.contourf(a, locator=ticker.MaxNLocator(nbins=5, prune='lower'))
If you don't include 0 in your levels, you won't plot a contour at the 0 level.
For example:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, levels=np.linspace(0.5, 3.0, 11))
fig.colorbar(p)
plt.show()
yields:
An alternative is to mask any datapoints which are 0:
p = ax.contourf(np.ma.masked_array(a, mask=(a==0)),
levels=np.linspace(0.0, 3.0, 13))
fig.colorbar(p)
Which looks like:
I suppose it's up to you which of those best matches your desired plot.
I was able to figure things out myself. I found two ways of solving this problem:
1. Mask out all data < 0.01 from the data set using
np.ma.masked_less(snowfall, 0.01)
2. Set the levels of the plot to run from 0.01 to whatever the maximum value is:
levels = np.linspace(0.01, 10, 100)
cs = m.contourf(x, y, snowfall, levels)
I found that option 1 worked best for me.
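Since the snippets in this answer depend on Basemap and the full model output, here is a stand-alone sketch of the masking option using plain matplotlib and the sample snowfall array from the question. The 0.01 threshold comes from the answer; the `masked` name, the output file name and the Agg backend line are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

snowfall = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 2, 3, 2, 1, 0],
    [0, 1, 0, 1, 2, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

# Hide every value below the threshold so contourf leaves those cells unfilled
masked = np.ma.masked_less(snowfall, 0.01)

fig, ax = plt.subplots()
cs = ax.contourf(masked)
fig.colorbar(cs)
fig.savefig('snowfall_masked.png')  # hypothetical output file name
```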

Plot labelled and unlabeled data matplotlib

I have three lists, X, Y and Z:
X = [[0.67910803031180977, 0.1443997264255876], [0.57, 0.87], [0.545, 0.854], [0.645, 0.1254], [0.645, 0.1354], [0.62, 0.83], [0.6945, 0.144], [0.9945, 0.45244], [0.235, 0.7754], [0.7, 0.85]]
Y = [0, 1, -1, -1, -1, 1, -1, -1, -1, 1]
Z = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
Where:
X is the dataset,
Y is the label set, where 0 means "Normal", 1 means "LL" and -1 means "Unlabelled",
Z is the output set, in which the labels from Y have been propagated to the unlabelled points.
Now I am trying to plot a figure in which one subplot shows the dataset clustered by the Y label each point belongs to, and another subplot shows the dataset with respect to Z.
I tried the code from this example, but I was not able to get it working.
Please help.
I'm guessing at what you want, but here's an example of plotting the X values with colors determined by the Y and Z lists respectively. It relies on a lot of default behavior (the scalar color values get mapped onto the default colormap, iirc), but you could write a more elaborate function and pass a list of (rgb) or (rgba) values instead.
import matplotlib.pyplot as plt
from numpy import array
X = array([[0.67910803031180977, 0.1443997264255876], [0.57, 0.87],
[0.545, 0.854], [0.645, 0.1254], [0.645, 0.1354], [0.62, 0.83],
[0.6945, 0.144], [0.9945, 0.45244], [0.235, 0.7754], [0.7, 0.85]])
Y = [0, 1, -1, -1, -1, 1, -1, -1, -1, 1]
Z = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
# for readability mostly
Xx = X.T[0]
Xy = X.T[1]
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.scatter(Xx, Xy, c=[0.3 * c + 0.5 for c in Y], s=50, alpha=0.75)
ax1.set_xlabel('Y labels')
ax2 = fig.add_subplot(122)
ax2.scatter(Xx, Xy, c=[0.3 * c + 0.5 for c in Z], s=50, alpha=0.75)
ax2.set_xlabel('Z labels')
plt.show()
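As a possible variation on the idea of passing explicit colors, the sketch below draws one scatter call per label so that each label gets a fixed color and a legend entry. The `colors` and `names` dictionaries, the specific color choices and the output file name are assumptions made for illustration; only the label meanings (0 = Normal, 1 = LL, -1 = Unlabelled) come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

X = np.array([[0.67910803031180977, 0.1443997264255876], [0.57, 0.87],
              [0.545, 0.854], [0.645, 0.1254], [0.645, 0.1354], [0.62, 0.83],
              [0.6945, 0.144], [0.9945, 0.45244], [0.235, 0.7754], [0.7, 0.85]])
Y = [0, 1, -1, -1, -1, 1, -1, -1, -1, 1]
Z = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]

# Hypothetical color and name for each label value
colors = {0: 'tab:blue', 1: 'tab:orange', -1: 'lightgray'}
names = {0: 'Normal', 1: 'LL', -1: 'Unlabelled'}

fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(9, 4))

for ax, labels, title in ((ax1, Y, 'Y labels'), (ax2, Z, 'Z labels')):
    labels = np.asarray(labels)
    # One scatter call per label value, so every label gets its own legend entry
    for value in np.unique(labels):
        sel = labels == value
        ax.scatter(X[sel, 0], X[sel, 1], color=colors[int(value)],
                   label=names[int(value)], s=50, alpha=0.75)
    ax.set_xlabel(title)
    ax.legend()

fig.savefig('labels.png')  # hypothetical output file name
```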
