Seaborn scatterplot - label data points [duplicate]


This question already has answers here:
Adding labels in x y scatter plot with seaborn
(6 answers)
Closed 4 years ago.
I have a Seaborn scatterplot using data from a dataframe. I would like to add data labels to the plot, using other values in the df associated with that observation (row). Please see below: is there a way to add at least one of the column values (A or B) to the plot? Even better, is there a way to add two labels (in this case, the values in both columns A and B)?
I have tried to use a for loop using functions like the below per my searches, but have not had success with this scatterplot.
Thank you for your help.
df_so = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
scatter_so=sns.lmplot(x='C', y='D', data=df_so,
fit_reg=False,y_jitter=0, scatter_kws={'alpha':0.2})
fig, ax = plt.subplots() #stuff like this does not work

Use:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df_so = pd.DataFrame(np.random.randint(0,100,size=(20, 4)), columns=list('ABCD'))
scatter_so=sns.lmplot(x='C', y='D', data=df_so,
                      fit_reg=False,y_jitter=0, scatter_kws={'alpha':0.2})
def label_point(x, y, val, ax):
    # combine the coordinates and label text, then write one label per row
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x']+.02, point['y'], str(point['val']))
# label each point with "(A, B)" on the axes that lmplot drew
label_point(df_so['C'], df_so['D'], '('+df_so['A'].astype(str)+', '+df_so['B'].astype(str)+')', plt.gca())
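As a variation on the above, here is a minimal sketch that assumes you are fine with sns.scatterplot instead of lmplot. scatterplot returns the matplotlib Axes it drew on, so the labels can be attached with ax.annotate directly. The fixed seed and the 5-point text offset are my own choices for illustration, not part of the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # fixed seed, for illustration only
df_so = pd.DataFrame(rng.integers(0, 100, size=(20, 4)), columns=list("ABCD"))

# scatterplot returns the Axes it drew on, so plt.gca() is not needed
ax = sns.scatterplot(data=df_so, x="C", y="D", alpha=0.2)

# annotate every point with "(A, B)", shifted 5 points to the right
for row in df_so.itertuples(index=False):
    ax.annotate(f"({row.A}, {row.B})", xy=(row.C, row.D),
                xytext=(5, 0), textcoords="offset points")
```

When running this as a script rather than in a notebook, add import matplotlib.pyplot as plt and call plt.show() at the end. The offset is given in points rather than data units, so the gap between marker and label stays the same whatever the axis range is.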

Related

Making a barplot to compare multiple categories at once [duplicate]

This question already has answers here:
Plotting two columns of dataFrame in seaborn
(1 answer)
Seaborn multiple barplots
(2 answers)
Closed last month.
sample data:
list1 = ['C','C1,C2','A9','GV5','A6','A3']
arr1 = np.random.default_rng().uniform(low=5,high=10,size=[6,3])
df = pd.DataFrame(arr1,index = list1, columns=["A","B","C"])
I can make a barplot to compare a single category of data, but I'm not sure how to display multiple categories at once, with the values of columns A, B, and C shown side by side for each index value.
I tried adding more into the y designation, but it was fruitless.
ax = sns.barplot(data = df, x = df.index, y='A') ##chart of only one category
ax = sns.barplot(data = df, x = df.index, y=('A','B','C')) ##doesn't work
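One possible approach, sketched here under the assumption that a grouped bar chart is what is wanted: reshape the frame from wide to long with melt, then let the hue parameter of barplot split each x position into one bar per column. The names long_df, label, category and value are placeholders I made up for this sketch.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# sample data from the question
list1 = ['C', 'C1,C2', 'A9', 'GV5', 'A6', 'A3']
arr1 = np.random.default_rng().uniform(low=5, high=10, size=[6, 3])
df = pd.DataFrame(arr1, index=list1, columns=["A", "B", "C"])

# wide -> long: one row per (index label, column name, value)
long_df = (df.rename_axis("label")
             .reset_index()
             .melt(id_vars="label", var_name="category", value_name="value"))

# hue draws one bar per category at each x position
ax = sns.barplot(data=long_df, x="label", y="value", hue="category")
```

If seaborn is not a requirement, df.plot.bar() draws the same grouped layout straight from the wide frame.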

How to create a scatter plot where x and y values are the column and row names

I have a question about plotting a scatter plot from a dataframe.
The data I would like to plot looks like this:
I would like a scatter plot where the x axis shows the years and the y axis shows the city names. The size of each marker should be based on the data value.
The desired visualization of the data:
I searched the documentation examples of different libraries and also Stack Overflow, but unfortunately I didn't find a suitable answer.
I would appreciate any help; either an Excel or a Python solution would be fine.
Thanks

Something like this should work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# assuming your example data is in a dataframe called df
# rename columns so that we can apply 'wide_to_long'
df.rename(columns={1990: 'A1990', 1991: 'A1991', 2019: 'A2019', 2020: 'A2020'}, inplace=True)
# reshape data using 'wide_to_long' to get it into the right format for scatter plot
df = pd.wide_to_long(df, "A", i="City", j="year")
df.reset_index(inplace=True)
df["A"] = df["A"].astype(int)
# OPTIONAL: scale the "bubble size" variable in column A to make graph easier to interpret
df["A"] = (df["A"] + 0.5) * 100
# map years onto integers so we can only plot the years that we want
year_dict = {1990: 1, 1991: 2, 2019: 3, 2020: 4}
df['year_num'] = df['year'].map(year_dict)
# plot the data
fig, ax = plt.subplots()
plt.scatter(df['year_num'], df['City'], s=df['A'], alpha=0.5)
# label the years corresponding to 'year_num' values on the x-axis
plt.xticks(np.arange(1, 5, 1.0))
labels = [1990, 1991, 2019, 2020]
ax.set_xticklabels(labels)
plt.show()
You can play around with the colors/formatting options in matplotlib to get the look you want, but the above should accomplish the basic idea.
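For comparison, here is a shorter sketch of the same idea that uses melt and plots the years as category labels, so no year-to-integer mapping is needed. The city names and values below are made up, since the original data was only posted as an image, and the scale factor of 100 is arbitrary.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical stand-in for the question's data:
# one row per city, one column per year
df = pd.DataFrame({
    "City": ["Berlin", "Paris", "Rome"],
    1990: [1, 3, 2],
    1991: [2, 1, 4],
    2019: [5, 2, 1],
    2020: [3, 4, 2],
})

# wide -> long: one row per (city, year) pair
long_df = df.melt(id_vars="City", var_name="year", value_name="value")

# use the years as category labels so only the years present get a tick
long_df["year"] = long_df["year"].astype(str)

fig, ax = plt.subplots()
ax.scatter(long_df["year"], long_df["City"],
           s=long_df["value"] * 100,  # arbitrary scale factor for visibility
           alpha=0.5)
ax.set_xlabel("Year")
ax.set_ylabel("City")
```

Because the years are strings here, matplotlib spaces them evenly (1991 and 2019 end up next to each other), which matches what the integer mapping above achieves.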

plot each chart on a different scale with Seaborn Distplot [duplicate]

This question already has answers here:
Seaborn displot facetgrid do not share y axis
(1 answer)
Prevent Sharing of Y Axes in a relplot
(1 answer)
Closed 6 months ago.
I have some data with widely different scales. I want to create a displot showing all the features in one graphic. I thought facet_kws={'sharex': False} was the relevant parameter, but it doesn't appear to be working. What am I doing wrong?
import numpy as np
import pandas as pd
import seaborn as sns
import random
# Sample Dataframe
df = pd.DataFrame(np.random.randint(0,200,size=(200, 3)), columns=list('ABC'))
D= np.random.randint(0,10,size=(200, 1))
df['D']= D
# reshape dataframe
df2 = df.stack().reset_index(level=1).reset_index(drop=True).\
rename(columns={'level_1': 'Name', 0: 'Value'})
# plot
g = sns.displot(data=df2,
x='Value', col='Name',
col_wrap=3, kde=True,
facet_kws={'sharex': False})

The author of Seaborn (mwaskom) already answered in a comment on the question, but I'll answer in more detail.
Use the common_bins option as well, as in the following. That option is documented in the histplot() section.
g = sns.displot(data=df2,
x='Value', col='Name',
col_wrap=3, kde=True,
common_bins=False,
facet_kws={'sharex': False, 'sharey': False})
Also, for other readers: make sure the reshaping step ends with reset_index(drop=True), as in the following. Without a unique index, displot raises a ValueError on duplicate labels.
df2 = df.stack().reset_index(level=1).reset_index(drop=True).\
rename(columns={'level_1': 'Name', 0: 'Value'})

How to plot a line graph of multiple rows in a Pandas DataFrame [duplicate]

This question already has answers here:
Remove prefix (or suffix) substring from column headers in pandas
(7 answers)
How to convert column names of a DataFrame from string to integers
(1 answer)
Rotate pandas DataFrame 90 degrees
(1 answer)
matplotlib large set of colors for plots
(1 answer)
How to plot multiple pandas columns
(3 answers)
Closed 7 months ago.
I have a Pandas DataFrame of measurements:
,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
3,2.6838,2.394591,2.493416,0.874906,2.113343,1.812258,1.667047,1.779347,1.515663,1.620196,1.539494,1.63528,1.555373,1.471318,1.610067,1.507087,1.467174,1.458346,1.681998,1.14625
4,0.368415,0.435004,0.155035,0.161064,0.180133,0.202117,0.142981,0.138321,0.122557,0.099213,0.098213,0.062174,0.123664,0.2051,0.167415,0.185133,0.127677,0.037875,0.156252,0.015579
5,0.213577,0.187244,0.274151,0.173572,0.296122,0.308341,0.164578,0.159559,0.318383,0.181329,0.260223,0.257395,0.241779,0.292731,0.244476,0.187523,0.247331,0.293338,0.323894,0.179478
6,0.096093,0.140454,0.067185,6.441058,0.016797,0.141757,0.181792,0.13692,0.204091,0.180182,0.149626,0.220342,0.179286,0.276316,0.104531,0.20343,0.045161,-0.004546,0.045833,0.193849
7,0.286467,0.086673,-0.106538,-0.261802,0.16964,0.182858,0.062774,0.20471,0.040105,0.086975,0.211068,0.182423,0.098721,0.077085,0.102986,0.129935,0.130571,0.176024,0.154079,0.102391
8,0.480631,0.714554,0.858241,0.746666,0.555411,0.452689,0.337912,0.333942,0.269359,0.221312,0.09818,0.226218,0.287361,0.209858,0.222951,0.207584,0.258397,0.026713,0.162048,0.149924
9,1.055405,0.638777,0.468793,0.41544,0.559187,0.471218,0.493805,0.544716,0.412903,0.412182,0.51041,0.383991,0.351397,0.383201,0.368308,0.237954,0.330242,0.262648,0.425204,0.434928
10,1.116658,0.737544,0.854376,-0.004434,0.419419,0.35921,0.377095,0.273815,0.258913,0.290614,0.271843,0.321572,0.234764,0.298931,0.206039,0.192746,0.200727,0.132419,0.229914,0.159857
11,-0.004305,0.052289,0.275035,-0.849414,0.104146,0.185819,0.128376,0.136433,0.091787,0.149753,0.107246,0.081407,0.118816,0.117434,0.169153,0.108273,0.205751,0.145238,0.153086,0.114278
12,0.836223,0.323901,0.269564,0.364082,0.343695,0.386785,0.24881,0.307267,0.222634,0.214189,0.12167,0.251107,0.134083,0.284545,0.175479,0.221877,0.184749,0.225089,0.205388,0.214972
where each row is the flux measurements at the frequencies in the header (76, 84, 92, 99... MHz). I'm trying to plot a line graph of the flux measurements for a row. Since the frequencies in the header are not linear, I've tried this:
f = np.array([76,84,92,99,107,115,122,130,143,151,158,166,174,181,189,197,204,212,220,227])
y1 = [0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147]
y2 = [-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467]
y3 = [0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189]
fig, ax = plt.subplots()
ax.scatter(f, y1, label = r'$\alpha = -0.37$')
ax.plot(f, y1)
ax.scatter(f, y2, label = r'$\alpha = NaN$')
ax.plot(f, y2)
ax.scatter(f, y3, label = r'$\alpha = -0.75$')
ax.plot(f, y3)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Flux (Jy/beam)')
ax.grid(which = 'both', axis = 'both')
which is just copy-pasting the first three rows of data, to produce:
That's basically what I want, but what's a better way to do it?

There are many ways to solve this problem, but the simplest way (that I can think of) is to pivot (transpose) your dataframe and then use seaborn to plot all the columns:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# convert your sample data (data_ is assumed to hold the CSV text from the question as a string)
data = [[e for e in row.split(',') if e] for row in data_.split("\n") if row]
columns = data[0]
# create the `x` axis
columns = [int(col.replace('Fp','')) for col in columns]
columns = ['index'] + columns
data = data[1:]
df = pd.DataFrame(data=data, columns=columns)
df = df.drop(columns=['index'])
df = df.astype('float')
If your dataframe still has the original Fp headers, you can convert them to integers the same way:
df.columns = [int(col.replace('Fp','')) for col in df.columns]
Once this is done, you can pivot (transpose) the data:
# transpose so that each frequency becomes a row and each measurement a column
df_ = df.T
# plot your data
plt.figure(figsize=(15,8))
sns.lineplot(data=df_)
plt.title('Example of timeseries plot')
plt.xlabel('Frequency(MHz)')
plt.ylabel('Flux (Jy/beam)')
You can adjust the plot to your liking, but this would be the simplest way. (Tip: try to leverage the seaborn or pandas plotting methods as much as possible for these aggregated plots.)
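If the measurements are available as CSV text, a shorter route is to let pandas parse them and then plot the transposed frame with the built-in pandas plotting. This is only a sketch: data_ below holds just the first three rows and five columns of the question's data to keep it short.

```python
import io
import pandas as pd

# first three rows and five columns of the question's data, shortened
data_ = """,Fp076,Fp084,Fp092,Fp099,Fp107
0,0.531743,0.512256,0.427771,0.444216,0.332228
1,-0.032964,0.047469,0.128079,0.142839,0.253755
2,0.459728,0.541346,0.830889,0.368135,0.407241
"""

# the unnamed first column holds the row labels
df = pd.read_csv(io.StringIO(data_), index_col=0)

# "Fp076" -> 76, so the x axis uses the real frequency spacing
df.columns = [int(col.replace("Fp", "")) for col in df.columns]

# transpose: frequencies become the index, each original row becomes a line
ax = df.T.plot(marker="o")
ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
ax.grid(True)
```

Because the index of the transposed frame is numeric, the points are placed at their actual frequencies rather than at equal intervals.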

Plot dates in a time series on a x axis [duplicate]

This question already has answers here:
Plotting a time series?
(2 answers)
How to draw vertical lines on a given plot
(6 answers)
Closed 1 year ago.
I am trying to plot dates (N=50) on a 5 year time series chart and I'm having trouble figuring out how to iterate over them with a for loop. Below is an example of what I'm trying to plot the dates on.
Visual of what I'm plotting dates on
Currently, I am trying:
for date in dataframe_with_dates.DATE:
    plt.axvline(x = date, color = 'g')
plt.show()
and I'm receiving an error of:
Failed to convert value(s) to axis units: 'DATE'
I'm not sure if this has something to do with the dtype being datetime, or if I need to try another approach, but any advice/guidance is greatly appreciated!
Thank you!
This is what I am trying to accomplish: Example image
EDIT: Code to produce the plot
def plot_df(df_1, x, y, title = '', xlabel = 'DATE', ylabel = 'VALUE', dpi = 100):
    plt.figure(figsize = (25,5), dpi = dpi)
    plt.plot(x, y, color = 'tab:red')
    plt.gca().set(title = title, xlabel = xlabel, ylabel = ylabel)
    plt.show()
plot_df(df_VIX, x = df_VIX.DATE, y = df_VIX.AVG_VALUE, title = 'Daily VIX since 1990')
data_test = [['2016-01-04', 22.48, 23.36, 20.67, 20.70, 21.8025],
['2016-01-05', 20.75, 21.06, 19.25, 19.34, 20.1],
['2016-01-06', 21.67, 21.86, 19.8, 20.59, 20.98],
['2016-01-07', 23.22, 25.86, 22.4, 24.99, 24.1175],
['2016-01-08', 22.96, 25.86, 22.40, 24.99, 24.89]]
df_test = pd.DataFrame(data_test, columns = ['DATE','OPEN','HIGH','LOW','CLOSE', 'AVG_VALUE'])
df_test['DATE'] = pd.to_datetime(df_test['DATE'])
This will reproduce a sample of the exact data I'm using.

I think this is what you want:
df_test.plot(x='DATE', y='OPEN')
Or replace y='OPEN' with another column to plot. The x-axis will be formatted automatically by pandas to be similar to what you showed in the figure.
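To also mark individual dates with vertical lines, a sketch along these lines may help. It assumes the dates to mark are already datetime values; the two dates in marked are invented for illustration and stand in for dataframe_with_dates.DATE. The lines are drawn on the same Axes as the series, before anything is shown. The quoted error mentions the string 'DATE', which could mean that a column name rather than a date value reached axvline (iterating over a DataFrame yields its column names), but that is a guess.

```python
import pandas as pd
import matplotlib.pyplot as plt

# sample data from the question
data_test = [['2016-01-04', 22.48, 23.36, 20.67, 20.70, 21.8025],
             ['2016-01-05', 20.75, 21.06, 19.25, 19.34, 20.1],
             ['2016-01-06', 21.67, 21.86, 19.8, 20.59, 20.98],
             ['2016-01-07', 23.22, 25.86, 22.4, 24.99, 24.1175],
             ['2016-01-08', 22.96, 25.86, 22.40, 24.99, 24.89]]
df_test = pd.DataFrame(data_test,
                       columns=['DATE', 'OPEN', 'HIGH', 'LOW', 'CLOSE', 'AVG_VALUE'])
df_test['DATE'] = pd.to_datetime(df_test['DATE'])

# hypothetical dates to mark, standing in for dataframe_with_dates.DATE
marked = pd.to_datetime(pd.Series(['2016-01-05', '2016-01-07']))

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df_test['DATE'], df_test['AVG_VALUE'], color='tab:red')

# one vertical line per date, on the same Axes as the time series
for date in marked:
    ax.axvline(x=date, color='g')
```

Call plt.show() once after the loop when running this as a script. In the question's plot_df, plt.show() runs inside the function, so any axvline calls made afterwards may end up on a new, empty figure.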
