Mapping Column Data to Graph Properties - python

I have a dataframe called df that looks like this:
Qname X Y Magnitude
Bob 5 19 10
Tom 6 20 20
Jim 3 30 30
I would like to make a visual text plot of the data: each Qname drawn on a figure at its (X, Y) coordinates, with the text size set by Magnitude (s=Size).
I have tried:
fig = plt.figure()
ax = fig.add_axes((0,0,1,1))
X = df.X
Y = df.Y
S = df.magnitude
Name = df.Qname
ax.text(X, Y, Name, size=S, color='red', rotation=0, alpha=1.0, ha='center', va='center')
fig.show()
However nothing is showing up on my plot. Any help is greatly appreciated.

This should get you started. Matplotlib does not handle the text placement for you so you will probably need to play around with this.
import pandas as pd
import matplotlib.pyplot as plt
# replace this with your existing code to read the dataframe
df = pd.read_clipboard()
plt.scatter(df.X, df.Y, s=df.Magnitude)
# annotate the plot
# unfortunately you have to iterate over your points
for idx, row in df.iterrows():
    # see http://stackoverflow.com/q/5147112/553404
    # for better annotation options
    plt.annotate(row['Qname'], xy=(row['X'], row['Y']))
plt.show()
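If the goal is text only, with no scatter markers, a minimal sketch is to call ax.text once per row, because ax.text places a single string at a single point and does not accept whole columns. The DataFrame below is retyped from the question; the padding value and the output file name are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "Qname": ["Bob", "Tom", "Jim"],
    "X": [5, 6, 3],
    "Y": [19, 20, 30],
    "Magnitude": [10, 20, 30],
})

fig, ax = plt.subplots()
labels = []
for row in df.itertuples(index=False):
    # one text artist per row, font size taken from the Magnitude column
    labels.append(ax.text(row.X, row.Y, row.Qname, size=row.Magnitude,
                          color="red", ha="center", va="center"))

# text artists do not autoscale the axes, so set the limits by hand
pad = 2  # arbitrary padding, an assumption
ax.set_xlim(df["X"].min() - pad, df["X"].max() + pad)
ax.set_ylim(df["Y"].min() - pad, df["Y"].max() + pad)
fig.savefig("names.png")  # hypothetical file name
```

Without the explicit limits the axes stay at the default 0 to 1 range and the names fall outside the visible area.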

Related

How to draw a figure by seaborn pairplot in several rows?

I have a dataset with 76 features and 1 dependent variable (y). I use seaborn to draw pairplot between features and y in Jupyter notebook. Since the No. of features is high, size of plot for every feature is very small, as can be seen below:
I am looking for a way to draw pairplot in several rows. Also, I don't want to copy and paste pairplot code in several cells in notebook. I am looking for a way to make this figure automatically.
The code I am using (I cannot share dataset, so I use a sample dataset):
from sklearn.datasets import load_boston
import math
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
X, y = load_boston(return_X_y=True)
X = pd.DataFrame(X)
y = pd.DataFrame(y)
data = pd.concat([X, y], axis=1)
y_name = 'y'
features_names = [f'feature_{i}' for i in range(1, X.shape[1]+1)]
column_names = features_names + [y_name]
data.columns = column_names
plot_size=7
num_plots_x=5 # No. of plots in every row
num_plots_y = math.ceil(len(features_names)/num_plots_x) # No. of plots in y direction
fig = plt.figure(figsize=(plot_size*num_plots_y, plot_size*num_plots_x), facecolor='white')
axes = [fig.add_subplot(num_plots_y,1,i+1) for i in range(num_plots_y)]
for i, ax in enumerate(axes):
    start_index = i * num_plots_x
    end_index = (i+1) * num_plots_x
    if end_index > len(features_names): end_index = len(features_names)
    sns.pairplot(x_vars=features_names[start_index:end_index], y_vars=y_name, data = data)
plt.savefig('figure.png')
The above code has two problems. It shows empty box at the top of the figure and then it shows the pairplots. Following is part of the figure that I get.
Second problem is that it only saves the last row as png file, not the whole figure.
If you have any idea to solve this, please let me know. Thank you.
When I run it directly (python script.py), every row opens in a separate window, so each row is treated as a separate figure and only the last one is saved to the file.
The other problem is that sns.pairplot doesn't use your fig and axes: it creates its own figure, so it can't draw into your subplots to put everything on one image. When I remove fig and axes, the first window with the empty box no longer appears.
I found that FacetGrid has col_wrap to wrap plots into several rows. Someone has suggested adding col_wrap to pairplot (Add parameter col_wrap to pairplot #2121), and that issue also has an example of using FacetGrid with scatterplot instead of pairplot, which can then use col_wrap.
Here is code which uses FacetGrid with col_wrap:
from sklearn.datasets import load_boston
import math
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
X, y = load_boston(return_X_y=True)
X = pd.DataFrame(X)
y = pd.DataFrame(y)
data = pd.concat([X, y], axis=1)
y_name = 'y'
features_names = [f'feature_{i}' for i in range(1, X.shape[1]+1)]
column_names = features_names + [y_name]
data.columns = column_names
plot_size=7
num_plots_x=5 # No. of plots in every row
num_plots_y = math.ceil(len(features_names)/num_plots_x) # No. of plots in y direction
'''
for i in range(num_plots_y):
    start = i * num_plots_x
    end = start + num_plots_x
    sns.pairplot(x_vars=features_names[start:end], y_vars=y_name, data=data)
'''
g = sns.FacetGrid(pd.DataFrame(features_names), col=0, col_wrap=4, sharex=False)
for ax, x_var in zip(g.axes, features_names):
    sns.scatterplot(data=data, x=x_var, y=y_name, ax=ax)
g.tight_layout()
plt.savefig('figure.png')
plt.show()
Result ('figure.png'):
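An alternative that avoids FacetGrid is to build the grid with plt.subplots and fill one axes per feature. This is only a sketch on random stand-in data (the real dataset is not available); the panel size and the 5-per-row layout are assumptions. It uses ax.scatter to stay dependency-free, and sns.scatterplot(data=data, x=name, y='y', ax=ax) could be used in its place.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data: 13 features and one target column named 'y'
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(1, 14)]
data = pd.DataFrame(rng.random((50, 13)), columns=feature_names)
data["y"] = rng.random(50)

ncols = 5                                        # plots per row
nrows = math.ceil(len(feature_names) / ncols)    # rows needed
fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows),
                         sharey=True, squeeze=False)
flat_axes = axes.ravel()

# one feature per axes
for ax, name in zip(flat_axes, feature_names):
    ax.scatter(data[name], data["y"], s=10)
    ax.set_xlabel(name)
    ax.set_ylabel("y")

# hide the unused axes in the last row
for ax in flat_axes[len(feature_names):]:
    ax.set_visible(False)

fig.tight_layout()
fig.savefig("figure.png")  # the whole grid is one figure, so one file holds every row
```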

Extracting data from an existing plot in pandas

I'm trying to extract data from existing plots. The code below works perfectly, but the original data are not integers, so I end up with a lot of float values which I don't need. I tried the round() function, but then I get repeated values, which is not the required output. I'm not sure whether it's possible, but I was wondering if there's a way to extract the values from the plot directly as integers. Below is a small sample of what I am trying to achieve.
Any help is much appreciated, thanks!
This is the code:
from IPython.display import Image
ax = Image(r'Desktop\comp.png')
ax = plt.gca()
line = ax.lines[0]
x = line.get_xydata()
dataframe=pd.DataFrame(x, columns=['a','b'])
This is the image:
This is what I get as a result:
However, I'd like to get something similar to this result:
Assuming you have the plot as matplotlib.axes object, you can extract the data with ax.lines methods get_xdata() and get_ydata()
line = ax.lines[0]
data_x, data_y = line.get_xdata(), line.get_ydata()
Then, create integer values for the new axis with
import math
new_x = range(math.ceil(min(data_x)), math.floor(max(data_x))+1)
And interpolate values with interp1d to get the corresponding y-values:
f = interp1d(data_x, data_y, kind='linear', bounds_error=False, fill_value=np.nan)
new_y = f(new_x)
The output as pandas DataFrame would look like this:
In [3]: pd.DataFrame(dict(a=new_x, b=new_y))
Out[3]:
a b
0 1 1.022186
1 2 4.899643
2 3 9.032727
3 4 16.073667
4 5 25.066514
5 6 36.888971
6 7 49.033702
7 8 64.018056
and as a plot like this:
Full example code
Full example code would look something like this:
import math
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Create data for example
data_x = np.array(range(9)) + np.random.rand(9)
data_y = data_x**2
# Create the plot
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.plot(data_x, data_y, marker='s', label='original')
# Extract data from plot (your starting point)
line = ax.lines[0]
data_x, data_y = line.get_xdata(), line.get_ydata()
# Get the x-axis data as integer values
new_x = range(math.ceil(min(data_x)), math.floor(max(data_x))+1)
# Get the y-axis data at these points (interpolate)
f = interp1d(data_x, data_y, kind='linear', bounds_error=False, fill_value=np.nan)
new_y = f(new_x)
plt.plot(new_x, new_y, ls='', marker='o', label='new')
plt.grid()
plt.legend()
plt.show()
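If SciPy is not available, the same linear interpolation can be sketched with np.interp. The example line below uses made-up values, and np.interp assumes the x data are increasing.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up example line with non-integer x values (y = x**2)
fig, ax = plt.subplots()
ax.plot([0.5, 1.5, 2.5, 3.5], [0.25, 2.25, 6.25, 12.25])

# Extract the data from the existing plot
line = ax.lines[0]
data_x = np.asarray(line.get_xdata(), dtype=float)
data_y = np.asarray(line.get_ydata(), dtype=float)

# Integer x values that lie inside the range of the data
new_x = np.arange(math.ceil(data_x.min()), math.floor(data_x.max()) + 1)

# Linear interpolation; np.interp expects data_x to be increasing
new_y = np.interp(new_x, data_x, data_y)

result = pd.DataFrame({"a": new_x, "b": new_y})
print(result)
```

Because new_x is restricted to the data range, no extrapolation happens and no fill value is needed.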

Proper Matplotlib axes construction / reuse

I am currently building a set of scatter plot charts using pandas plot.scatter. The construction builds off of two base axes.
My current construction looks akin to
ax1 = pandas.scatter.plot()
ax2 = pandas.scatter.plot(ax=ax1)
for dataframe in list:
    output_ax = pandas.scatter.plot(ax2)
    output_ax.get_figure().save("outputfile.png")
total_output_ax = total_list.scatter.plot(ax2)
total_output_ax.get_figure().save("total_output.png")
This seems inefficient. For 1...N permutations I want to reuse a base axes that has 50% of the data already plotted. What I am trying to do is:
Add base data to scatter plot
For item x in y: (save data to base scatter and save image)
Add all data to scatter plot and save image
Here's one way to do it with plt.scatter.
I plot column 0 on x-axis, and all other columns on y axis, one at a time.
Notice that there is only 1 ax object, and I don't replot all points, I just add points using the same axes with a for loop.
Each time I get a corresponding png image.
import numpy as np
import pandas as pd
np.random.seed(2)
testdf = pd.DataFrame(np.random.rand(20,4))
testdf.head(5) looks like this
0 1 2 3
0 0.435995 0.025926 0.549662 0.435322
1 0.420368 0.330335 0.204649 0.619271
2 0.299655 0.266827 0.621134 0.529142
3 0.134580 0.513578 0.184440 0.785335
4 0.853975 0.494237 0.846561 0.079645
#I put the first axis out of a loop, that can be in the loop as well
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(testdf[0],testdf[1], color='red')
fig.legend()
fig.savefig('fig_1.png')
colors = ['pink', 'green', 'black', 'blue']
for i in range(2,4):
    ax.scatter(testdf[0], testdf[i], color=colors[i])
    fig.legend()
    fig.savefig('full_' + str(i) + '.png')
Then you get these 3 images (fig_1.png, full_2.png, full_3.png)
Axes objects cannot be simply copied or transferred. However, it is possible to set artists to visible/invisible in a plot. Given your ambiguous question, it is not fully clear how your data are stored but it seems to be a list of dataframes. In any case, the concept can easily be adapted to different input data.
import matplotlib.pyplot as plt
#test data generation
import pandas as pd
import numpy as np
rng = np.random.default_rng(123456)
df_list = [pd.DataFrame(rng.integers(0, 100, (7, 2))) for _ in range(3)]
#plot all dataframes into an axis object to ensure
#that all plots have the same scaling
fig, ax = plt.subplots()
patch_collections = []
for i, df in enumerate(df_list):
    pc = ax.scatter(x=df[0], y=df[1], label=str(i))
    pc.set_visible(False)
    patch_collections.append(pc)
#store individual plots
for i, pc in enumerate(patch_collections):
    pc.set_visible(True)
    ax.set_title(f"Dataframe {i}")
    fig.savefig(f"outputfile{i}.png")
    pc.set_visible(False)
#store summary plot
for pc in patch_collections:
    pc.set_visible(True)
ax.set_title("All dataframes")
ax.legend()
fig.savefig(f"outputfile_0_{i}.png")
plt.show()
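A related sketch, closer to the "base data plus one item at a time" workflow in the question: plot the base data once, then add each extra dataset, save, and remove that artist again with its remove() method. The data, the fixed axis limits, and the output directory are assumptions made for illustration.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one base dataframe plus three extra dataframes
rng = np.random.default_rng(0)
base = pd.DataFrame(rng.integers(0, 100, (20, 2)), columns=["x", "y"])
extras = [pd.DataFrame(rng.integers(0, 100, (5, 2)), columns=["x", "y"])
          for _ in range(3)]

out_dir = tempfile.mkdtemp()  # hypothetical output location

fig, ax = plt.subplots()
ax.scatter(base["x"], base["y"], color="grey", label="base")  # drawn only once
# fix the limits so every saved image has the same scaling
ax.set_xlim(-5, 105)
ax.set_ylim(-5, 105)

saved = []
for i, extra in enumerate(extras):
    pc = ax.scatter(extra["x"], extra["y"], color="red", label=f"extra {i}")
    path = os.path.join(out_dir, f"outputfile{i}.png")
    fig.savefig(path)
    saved.append(path)
    pc.remove()  # take this dataset off the axes; the base data stay
```

After the loop the axes hold only the base scatter again, so a final "all data" image can be produced by plotting every extra dataframe without removing it.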

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my Pandas DataFrame for each row (based on row index), with a different color per row.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("D:\SOF10.csv" , header=None)
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering if I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside the plot. Is it feasible (like in the following picture) to calculate the mean and display it in the legend, or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
       ax=ax, kind='scatter',
       c='level_0', colormap='tab20',
       colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) dataframe into a series with a two-level index (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the label of x-axis to what you want.
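To see what stack() and reset_index() do to the sliced data, here is a small sketch on a made-up 2x6 frame with integer column labels, as read_csv(header=None) would produce.

```python
import numpy as np
import pandas as pd

# Made-up frame: 2 rows, 6 columns labelled 0..5
df = pd.DataFrame(np.arange(12).reshape(2, 6))

sliced = df.iloc[:, 2::3]            # keeps columns 2 and 5
long = sliced.stack().reset_index()  # one line per (row, column) pair
long.columns = ['level_0', 'level_1', 'Temperature']
print(long)
```

Each original row index is repeated once per selected column in level_0, which is why plotting level_0 against Temperature puts all values of a row at the same x position.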
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be an approximate answer. The c= and cmap= arguments of scatter() can be used for the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

Matplotlib Name points on plots

I have searched and found that using annotate in matplotlib for jupyter, we can name the x and y of a point.
I have retried doing as you suggested.
import matplotlib.pyplot as plt
import pandas as pd
def fit_data():
    fig = plt.figure(1,figsize=(20,6))
    plt.subplot(111)
    data1 = pd.DataFrame({"ID" : list(range(11)),
                          "R" : list(range(11)),
                          "Theta" : list(range(11))})
    plt.scatter(data1['R'], data1['Theta'], marker='o', color='b', s=15)
    for i, row in data1.iterrows():
        plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
    plt.xlabel('R',size=20)
    plt.ylabel('Theta',size=20)
    plt.show()
    plt.close()
fit_data()
It still doesn't take the ID from my data. It is still plotting an arbitrary plot.
this is the image after using the revised code
My data is as follows
1 19.177 24.642
2 9.398 12.774
3 9.077 12.373
4 15.287 19.448
5 4.129 5.41
6 2.25 3.416
7 11.674 15.16
8 10.962 14.469
9 1.924 3.628
10 2.087 3.891
11 9.706 13.186
I suppose the confusion comes from the fact that scatter can plot all points at once, while an annotation is a single object. You would hence need one annotation per row in the dataframe.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"ID" : list(range(6)), # Do not copy this part.
"R" : [5,4,1,2,3,4], # Use your own data
"Theta" : [20,15,40,60,51,71]}) # instead.
fig = plt.figure(1,figsize=(20,6))
plt.subplot(111)
plt.scatter(df['R'], df['Theta'], marker='o', color='b', s=15)
for i, row in df.iterrows():
    plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
plt.xlabel('R',size=20)
plt.ylabel('Theta',size=20)
plt.show()
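To use the real data instead of the placeholder frame, one possible sketch reads the whitespace-separated rows into a DataFrame first. It assumes the three columns are ID, R and Theta in that order, which the question does not state; only the first three rows are retyped here. It also iterates with itertuples, because iterrows returns each row as one Series and would turn the integer IDs into floats such as 1.0.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the data from the question; column order ID, R, Theta is an assumption
raw = """1 19.177 24.642
2 9.398 12.774
3 9.077 12.373
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["ID", "R", "Theta"])

fig, ax = plt.subplots(figsize=(20, 6))
ax.scatter(df["R"], df["Theta"], marker="o", color="b", s=15)

# one annotation per row; itertuples keeps ID as an integer
annotations = [ax.annotate(str(row.ID), xy=(row.R, row.Theta))
               for row in df.itertuples(index=False)]

ax.set_xlabel("R", size=20)
ax.set_ylabel("Theta", size=20)
fig.savefig("annotated.png")  # hypothetical file name
```

For a file on disk, the io.StringIO(raw) argument would be replaced by the file path.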
