Thank you in advance for your help!
I am trying to create a boxplot in matplotlib and I get an error when trying to add the labels.
This is the code that raises the error:
df_selected_station_D.boxplot(column='20 cm', by='Month',figsize=(15,5),grid=True, xlabel = 'x data');
This is the error it causes:
TypeError: boxplot() got an unexpected keyword argument 'xlabel'
What does this error mean and why am I getting it? (Complete code and images below)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
df_selected_station = df_all_stations[df_all_stations['Station'] == 'Minot']
df_selected_station.fillna(method = 'ffill', inplace=True);
df_selected_station_D=df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index
df_selected_station_D['Month'] = df_selected_station_D.index.month
df_selected_station_D.head()
df_selected_station_D.boxplot(column='20 cm', by='Month',figsize=(15,5),grid=True);
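The error means that boxplot() does not accept an xlabel keyword argument. Set the labels on the Axes instead. The data below is not the same as yours, but adding labels and modifying titles can be accomplished with the following code: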
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create the figure at the desired size; boxplot() then draws on the Axes passed via ax=
fig, ax1 = plt.subplots(figsize=(15, 5))
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4), columns=['Col1', 'Col2', 'Col3', 'Col4'])
df.boxplot(column=['Col1', 'Col2', 'Col3'], grid=True, ax=ax1)
ax1.set_title('test title')
ax1.set_xlabel('x data')
ax1.set_ylabel('y data')
plt.show()
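The same idea can be sketched for a plot grouped with by=, as in the question. The frame below is synthetic stand-in data (an assumption, not the real soil-temperature file). As far as I can tell, keyword arguments that pandas does not recognise are forwarded to matplotlib's boxplot, which has no xlabel parameter, hence the TypeError. Passing an existing Axes through ax= and labelling it afterwards avoids the problem.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily soil-temperature frame (assumption, not the real data)
rng = np.random.default_rng(1234)
index = pd.date_range("2020-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({"20 cm": rng.normal(10, 5, len(index))}, index=index)
df["Month"] = df.index.month

# Create the Axes first, let pandas draw on it, then label it with matplotlib
fig, ax = plt.subplots(figsize=(15, 5))
df.boxplot(column="20 cm", by="Month", grid=True, ax=ax)
ax.set_xlabel("x data")
ax.set_ylabel("y data")
ax.set_title("test title")
fig.suptitle("")  # clear the automatic "Boxplot grouped by Month" heading
```

Call plt.show() (or fig.savefig) at the end when running this interactively.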
Related
I'm trying to create a bar plot from a DataFrame with Datetime Index.
This is an example working code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
index = pd.date_range('2012-01-01', periods=48, freq='M')
data = np.random.randint(100, size = (len(index),1))
df = pd.DataFrame(index=index, data=data, columns=['numbers'])
fig, ax = plt.subplots()
ax.bar(df.index, df['numbers'])
The result is:
As you can see, the white bars are hard to distinguish from the background (why?).
I tried using instead:
df['numbers'].plot(kind='bar')
import matplotlib.ticker as ticker
ticklabels = df.index.strftime('%Y-%m')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
with this result:
But this way I lose the automatic 6-month spacing of the xtick labels (and the grid).
Any idea?
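You can just change the style back to matplotlib's default, which undoes the styling applied by sns.set():
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
index = pd.date_range('2012-01-01', periods=48, freq='M')
data = np.random.randint(100, size = (len(index),1))
df = pd.DataFrame(index=index, data=data, columns=['numbers'])
# select the style before creating the figure so that it applies to everything drawn
plt.style.use('default')
plt.figure(figsize=(12, 5))
plt.bar(df.index,df['numbers'],color="red")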
You do not actually use seaborn to draw the plot. Replace ax.bar(df.index, df['numbers'])
with
sns.barplot(x=df.index, y=df['numbers'], ax=ax)
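Another option, sketched here as an assumption rather than taken from the answers above, is to keep ax.bar and widen the bars. On a date axis the bar width is measured in days, so the default of 0.8 gives very thin bars for monthly data, which is one likely reason they look washed out under the seaborn style. The data is random, and the locator and formatter choices are only one way to get six-month tick spacing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2012-01-01", periods=48, freq="MS")  # month starts
df = pd.DataFrame({"numbers": rng.integers(0, 100, len(index))}, index=index)

fig, ax = plt.subplots(figsize=(12, 5))
# On a date axis, bar widths are in days, so give each monthly bar about 20 days
ax.bar(df.index, df["numbers"], width=20, align="edge")
# Ticks every six months (January and July), labelled as year-month
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()
```

Because the x axis stays a real date axis, the grid lines follow the six-month ticks as well.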
I would like to create a plot that displays the last value of each line, but I cannot get the last value to show on the chart. Do you have any idea how to solve my problem? Thank you!
Input: the DataFrame and the current plot (images).
Desired output: a cross marking the last value in each column (image).
# import eikon as ek
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import os
import seaborn as sns; sns.set()
import pylab
from scipy import *
from pylab import *
fichier = "P:/GESTION_RPSE/GES - Gestion Epargne Salariale/Dvp Python/Florian/Absolute Performance/PLOT.csv"
df = pd.read_csv(fichier)
df = df.drop(columns=['Unnamed: 0'])
# sns.set()
plt.figure(figsize=(16, 10))
df = df.melt('Date', var_name='Company', value_name='Value')
#palette = sns.color_palette("husl",12)
ax = sns.lineplot(x="Date", y="Value", hue='Company', data=df).set_title("LaLaLa")
plt.show()
Do you just want to put an 'X' at the end of your lines?
If so, you could pass markevery=[-1] to the call to lineplot(). However, there are a few caveats:
You have to use style= instead of hue=, otherwise no markers are drawn.
Filled markers work better than unfilled markers (like "x"). You can use markers=True for the default markers, or pass a list such as markers=['s', 'd', 'o', ...].
Code:
import matplotlib.pyplot as plt
import seaborn as sns
fmri = sns.load_dataset("fmri")
fig, ax = plt.subplots()
# ci=None turns off the confidence band (newer seaborn versions prefer errorbar=None)
ax = sns.lineplot(x="timepoint", y="signal", style="event", data=fmri,
                  ci=None, markers=True, markevery=[-1], markersize=10, ax=ax)
plt.show()
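If you would rather keep one colour per company, a plain matplotlib loop is another way to mark and label the last value. This is only a sketch on made-up data: the frame `wide`, the column names A, B and C, and the two-decimal label format are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical wide frame: one column per company, indexed by date
rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-01", periods=30, freq="D")
wide = pd.DataFrame(rng.normal(0, 1, (30, 3)).cumsum(axis=0),
                    index=dates, columns=["A", "B", "C"])

fig, ax = plt.subplots(figsize=(16, 10))
for name in wide.columns:
    series = wide[name].dropna()
    (line,) = ax.plot(series.index, series.values, label=name)
    last_x, last_y = series.index[-1], series.iloc[-1]
    # Cross on the final observation, in the same colour as its line
    ax.plot([last_x], [last_y], marker="X", markersize=10, color=line.get_color())
    # Print the value just to the right of the cross
    ax.annotate(f"{last_y:.2f}", xy=(last_x, last_y),
                xytext=(8, 0), textcoords="offset points", va="center")
ax.set_title("LaLaLa")
ax.legend()
```

With the melted (long) frame from the question, the same loop can run over df.groupby('Company') instead of over columns.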
I would like to depict the value of my variables found in a dataset in the form of a boxplot. The dataset is the following:
https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
So far my code is the following:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
df=pd.read_csv(file,names=['id', 'clump_thickness','unif_cell_size',
'unif_cell_shape', 'marg_adhesion', 'single_epith_cell_size',
'bare_nuclei', 'bland_chromatin', 'normal_nucleoli','mitoses','Class'])
#boxplot
plt.figure(figsize=(15,10))
names=list(df.columns)
names=names[:-1]
min_max_scaler=preprocessing.MinMaxScaler()
X = df.drop(["Class"],axis=1)
columnsN=list(X.columns)
x_scaled=min_max_scaler.fit_transform(X) #normalization
X[columnsN]=x_scaled
y = df['Class']
sns.set_context('notebook', font_scale=1.5)
sns.boxplot(x=X['unif_cell_size'],y=y,data=df.iloc[:, :-1],orient="h")
My boxplot returns the following figure:
but I would like to display my information like the following graph:
I know that is from a different dataset, but I can see that they have displayed the diagnosis, at the same time, for each feature with their values. I have tried to do it in different ways, but I am not able to do that graph.
I have tried the following:
data_st = pd.concat([y,X],axis=1)
data_st = pd.melt(data_st,id_vars=columnsN,
var_name="X",
value_name='value')
sns.boxplot(x='value', y="X", data=data_st,hue=y,palette='Set1')
plt.legend(loc='best')
but still no results. Any help?
Thanks
Reshape the data into long form with pandas.DataFrame.melt, then plot it with hue='Class':
Most of the benign (class 2) boxplots sit at 0 (scaled) or 1 (unscaled), as they should.
To understand the counts behind each box, run print(df_scaled_melted.groupby(['Class', 'Attributes', 'Values'])['Values'].count().unstack()) after the melt.
MinMaxScaler has been used, but it is unnecessary in this case because every attribute is already on the same 1 to 10 scale. If you plot the data without scaling, the plot looks the same, except that the y-axis runs from 1 to 10 instead of 0 to 1.
Scaling is really only needed when attributes have widely different ranges, so that one attribute would otherwise have too much influence on some ML algorithm.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# path to file
p = Path(r'c:\some_path_to_file\breast-cancer-wisconsin.data')
# create dataframe
df = pd.read_csv(p, names=['id', 'clump_thickness','unif_cell_size',
'unif_cell_shape', 'marg_adhesion', 'single_epith_cell_size',
'bare_nuclei', 'bland_chromatin', 'normal_nucleoli','mitoses','Class'])
# replace ? with np.nan (the np.NaN alias was removed in NumPy 2.0)
df.replace('?', np.nan, inplace=True)
# scale the data
min_max_scaler = MinMaxScaler()
df_scaled = pd.DataFrame(min_max_scaler.fit_transform(df.iloc[:, 1:-1]))
df_scaled.columns = df.columns[1:-1]
df_scaled['Class'] = df['Class']
# melt the dataframe (df_scaled has no id column, so no leading column needs to be dropped)
df_scaled_melted = df_scaled.melt(id_vars='Class', var_name='Attributes', value_name='Values')
# plot the data
plt.figure(figsize=(12, 8))
g = sns.boxplot(x='Attributes', y='Values', hue='Class', data=df_scaled_melted)
for item in g.get_xticklabels():
    item.set_rotation(90)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
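For reference, the scaling step amounts to the min-max formula (x - min) / (max - min) applied per column, which is what MinMaxScaler computes with its default feature range of 0 to 1. A minimal sketch in plain pandas, using made-up values on the dataset's 1 to 10 scale:

```python
import pandas as pd

# Toy attribute columns on the dataset's 1-10 scale (made-up values)
df = pd.DataFrame({"clump_thickness": [1, 5, 10], "mitoses": [1, 1, 10]})

# Min-max scaling: subtract each column's minimum, then divide by its range
scaled = (df - df.min()) / (df.max() - df.min())
print(scaled)
```

Since every column already spans the same range, scaling only relabels the axis, which is why the two plots look alike.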
Without scaling:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import numpy as np
p = Path.cwd() / r'data\breast_cancer\breast-cancer-wisconsin.data'
df = pd.read_csv(p, names=['id', 'clump_thickness','unif_cell_size',
'unif_cell_shape', 'marg_adhesion', 'single_epith_cell_size',
'bare_nuclei', 'bland_chromatin', 'normal_nucleoli','mitoses','Class'])
df.replace('?', np.nan, inplace=True)
df.dropna(inplace=True)
df = df.astype('int')
df_melted = df.iloc[:, 1:].melt(id_vars='Class', var_name='Attributes', value_name='Values')
plt.figure(figsize=(12, 8))
g = sns.boxplot(x='Attributes', y='Values', hue='Class', data=df_melted)
for item in g.get_xticklabels():
    item.set_rotation(90)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
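To see what melt does to the frame, here is a tiny sketch with three made-up rows. The column names follow the dataset, but the values are invented for illustration.

```python
import pandas as pd

# Three made-up rows in wide form: one column per attribute
wide = pd.DataFrame({
    "Class": [2, 4, 2],
    "clump_thickness": [1, 8, 3],
    "mitoses": [1, 7, 1],
})

# Long form: one row per (Class, attribute, value) triple
long = wide.melt(id_vars="Class", var_name="Attributes", value_name="Values")
print(long)

# Number of observations per class and attribute
counts = long.groupby(["Class", "Attributes"])["Values"].count().unstack()
print(counts)
```

The long frame has one row per original cell, which is the shape sns.boxplot needs in order to use x='Attributes', y='Values' and hue='Class'.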
I was learning the pandas piping method with seaborn plots:
Most things are easily chained in a one-liner, but I was having
difficulty piping the xticklabel rotation.
How to do so?
Code:
import numpy as np
import pandas as pd
# plotting
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
names = ['mpg','cylinders', 'displacement','horsepower','weight',
'acceleration','model_year', 'origin', 'car_name']
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url, sep=r'\s+', names=names)
Plot
g = ( df.pipe((sns.factorplot, 'data'), x='model_year', y='mpg')
)
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
Required skeleton:
( df.pipe((sns.factorplot, 'data'), x='model_year', y='mpg')
.set(xlim=(0,90), ylim=(0,80))
.set (xticklabel_rotation = 45)
)
Is this possible?
Required Image:
But I am getting:
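You were almost there. Instead of .set(xticklabel_rotation = 45), you want .set_xticklabels(rotation=45), which is a method of the FacetGrid that factorplot returns. (In seaborn 0.9 and later, factorplot is named catplot; pass kind='point' to get the same plot type.)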
import pandas as pd
import seaborn as sns
names = ['mpg','cylinders', 'displacement','horsepower','weight',
'acceleration','model_year', 'origin', 'car_name']
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url, sep=r'\s+', names=names)
(df.pipe((sns.factorplot, 'data'), x='model_year', y='mpg')
.set_xticklabels(rotation=45)
)
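On newer seaborn versions the same chain can be sketched with catplot. The data here is random stand-in data rather than the auto-mpg file, and the ylim value is just the one from the question's skeleton.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in for the auto-mpg data: 13 model years, 5 cars each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "model_year": np.repeat(np.arange(70, 83), 5),
    "mpg": rng.uniform(10, 45, 65),
})

# Each FacetGrid method returns the grid itself, so the calls chain
g = (
    df.pipe((sns.catplot, "data"), x="model_year", y="mpg", kind="point")
      .set(ylim=(0, 80))
      .set_xticklabels(rotation=45)
)
```

Both .set() and .set_xticklabels() return the FacetGrid, which is what makes the one-liner possible.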
This gave me:
My Code:
import matplotlib.pyplot as plt
import pandas as pd
import os, glob
path = r'C:/Users/New folder'
all_files = glob.glob(os.path.join(path, "*.txt"))
df = pd.DataFrame()
for file_ in all_files:
    file_df = pd.read_csv(file_,sep=',', parse_dates=[0], infer_datetime_format=True,header=None, usecols=[0,1,2,3,4,5,6], names=['Date','Time','open', 'high', 'low', 'close','volume','tradingsymbol'])
df = df[['Date','Time','close','volume','tradingsymbol']]
df["Time"] = pd.to_datetime(df['Time'])
df.set_index('Time', inplace=True)
print(df)
fig, axes = plt.subplots(nrows=2, ncols=1)
################### Volume ###########################
df.groupby('tradingsymbol')['volume'].plot(legend=True, rot=0, grid=True, ax=axes[0])
################### PRICE ###########################
df.groupby('tradingsymbol')['close'].plot(legend=True, rot=0, grid=True, ax=axes[1])
plt.show()
My Current Output is like:
I need to add text annotations to the matplotlib plot. My desired output is similar to the image below:
It's hard to answer this question without access to your dataset, or a simpler example. However, I'll try my best.
Let's begin by setting up a dataframe which may or may not resemble your data:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 3)),
columns=['a', 'b', 'c'])
With the dataset we'll now proceed to plot it with
fig, ax = plt.subplots(1, 1)
df.plot(legend=True, ax=ax)
Finally, we'll loop over the columns and annotate each data point:
for col in df.columns:
    for i, val in enumerate(df[col]):
        ax.text(i, val, str(val))
This gave me the following plot, which resembles your desired figure.
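For data shaped more like the question's (one line per trading symbol, volume and price in separate subplots), a variation might annotate only the last point of each line. Everything below is made up for illustration: the symbols AAA and BBB, the timestamps, and the label format are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up intraday data for two hypothetical symbols
rng = np.random.default_rng(7)
times = pd.date_range("2021-01-04 09:15", periods=20, freq="15min")
df = pd.concat([
    pd.DataFrame({"tradingsymbol": sym,
                  "close": 100 + rng.normal(0, 1, len(times)).cumsum(),
                  "volume": rng.integers(100, 1000, len(times))}, index=times)
    for sym in ["AAA", "BBB"]
])

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
for ax, column in zip(axes, ["volume", "close"]):
    for symbol, group in df.groupby("tradingsymbol"):
        ax.plot(group.index, group[column], label=symbol)
        # Label the final point of each line with the symbol and its value
        ax.annotate(f"{symbol}: {group[column].iloc[-1]:.0f}",
                    xy=(group.index[-1], group[column].iloc[-1]),
                    xytext=(5, 0), textcoords="offset points")
    ax.set_ylabel(column)
    ax.grid(True)
    ax.legend()
```

Using ax.annotate with textcoords="offset points" keeps the label a fixed distance from the point regardless of the axis scale.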