Plotting a DataFrame using matplotlib [duplicate] - python

This question already has answers here:
Line plot with data points in pandas
(2 answers)
Closed 1 year ago.
Hi, I am trying to get a line plot for a DataFrame:
i = [0.01,0.02,0.03,....,0.98,0.99,1.00]
values= [76,98,22,.....,32,98,100]
The DataFrame also has an index running from 0 to 99, and when I plot it, a line for the index gets plotted as well. How do I keep the index from being plotted? I used the following code:
plt.plot(df,color= 'blue', label= 'values')
plt.title('values for corresponding i')
plt.legend(loc= 'upper right')
plt.xlabel("i")
plt.ylabel("values")
plt.show()

You could call plot.line directly on the pandas DataFrame. It is a wrapper around matplotlib, and it lets you name the x and y columns explicitly, so the index is not plotted.
Example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random DataFrame (i runs from 0.01 to 1.00, as in the question)
i = np.arange(1, 101) / 100
values = np.random.randint(1, 100, 100)
df = pd.DataFrame({"i": i, "values": values})
# Plot "values" against "i"; the index is not drawn
df.plot.line(x="i", y="values", color="blue", label="values")
plt.title("values for corresponding i")
plt.legend(loc="upper right")
plt.xlabel("i")
plt.ylabel("values")
plt.show()
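For context, plt.plot(df) draws every column of the DataFrame against the row index, which is likely why an extra line appears. If you would rather stay with plain matplotlib, a minimal sketch is to pass the two columns explicitly. This assumes the DataFrame has columns named i and values; the data below is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data only: i runs from 0.01 to 1.00, values are random integers
rng = np.random.default_rng(0)
df = pd.DataFrame({"i": np.arange(1, 101) / 100,
                   "values": rng.integers(1, 100, size=100)})

fig, ax = plt.subplots()
# x and y are given explicitly, so the 0..99 index is never drawn
(line,) = ax.plot(df["i"], df["values"], color="blue", label="values")
ax.set_title("values for corresponding i")
ax.set_xlabel("i")
ax.set_ylabel("values")
ax.legend(loc="upper right")
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.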

Related

draw multiple box plots on a single graph [duplicate]

This question already has answers here:
Import multiple CSV files into pandas and concatenate into one DataFrame
(20 answers)
dataframe to long format
(2 answers)
seaborn boxplot and stripplot points aren't aligned over the x-axis by hue
(1 answer)
Closed 6 months ago.
For a given dataset I am plotting a box plot of size of object at 10 different points as below:
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import numpy as np
import pandas as pd
import matplotlib as mpl
font_prop = font_manager.FontProperties(size=18)
def plot(path, name=""):
    df = pd.read_csv(path, index_col=0)
    df = df.dropna()
    # Position number (1, 2, ...) for every value, column by column
    Position = [1 + i // df.shape[0] for i in range(df.size)]
    df_n = [df[col] for col in df.columns]
    df_t = pd.concat(df_n).tolist()
    # Collect the values that belong to each position
    groups = [[] for i in range(max(Position))]
    for i in range(len(df_t)):
        groups[Position[i] - 1].append(df_t[i])
    plt.figure(figsize=(12, 5))
    plt.scatter(Position, df_t, color='g')
    b = plt.boxplot(groups, patch_artist=False)
    for median in b['medians']:
        median.set(color='r', linewidth=2)
A typical graph would be like this:
I have 4 different datasets and I would like to present a single graph where, at each position on the x-axis, there are 4 box plots side by side (one per dataset). How would I modify my code to do that?
Here is the sample dataset:
https://github.com/aebk2015/multipleboxplot.git
,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,Class
1,7.6,1.0,1.0,1.0,1.0,6.0,49.0,1.0,1.0,40.0,L
2,9.7,2.7,5.6,1.0,1.0,1.0,34.0,1.0,1.0,1.0,L
3,1.0,6.0,1.0,1.0,1.0,3.0,39.0,1.0,28.0,1.0,L
4,8.0,25.5,1.0,1.0,1.0,1.0,24.0,1.0,1.0,1.0,L
5,1.0,29.0,1.0,1.0,1.0,1.0,38.0,29.0,20.0,1.0,L
6,4.0,34.0,1.0,1.0,1.0,39.0,14.0,1.0,12.0,1.0,L
7,1.0,17.0,1.0,1.0,1.0,1.0,20.8,1.0,14.6,1.0,L
8,1.0,1.0,1.0,1.0,1.0,1.0,19.0,17.5,1.0,1.0,L
9,1.0,30.0,1.0,1.0,1.0,3.0,23.0,1.0,1.0,1.0,L
10,1.0,5.0,25.0,1.0,1.0,17.0,6.3,1.0,17.0,1.0,L
1,11.8,19.0,1.0,1.0,1.0,11.3,2.0,4.0,5.0,1.0,C
2,12.0,17.0,20.0,9.0,1.0,23.0,4.0,7.0,1.0,1.0,C
3,14.0,30.0,8.0,1.0,11.0,24.0,38.0,1.0,3.5,1.0,C
4,10.5,10.4,11.5,20.5,1.0,22.0,3.0,15.0,5.6,3.7,C
5,1.0,13.5,8.0,6.6,1.0,37.0,1.0,1.0,1.0,4.0,C
6,12.4,22.0,1.0,1.0,1.0,29.0,17.0,11.0,1.0,1.0,C
7,1.0,43.0,1.0,1.0,1.0,10.0,18.0,8.6,1.0,1.0,C
8,15.0,12.0,1.0,35.0,1.0,1.0,1.0,10.0,3.0,1.0,C
9,1.0,24.0,8.0,1.0,1.0,1.0,4.0,1.0,1.0,1.0,C
10,4.6,2.0,7.4,1.0,1.0,22.0,5.6,1.0,25.0,1.0,C
1,1.0,39.0,11.0,13.0,1.0,1.0,28.0,7.0,1.0,7.0,W
2,8.0,52.0,22.0,10.0,1.0,1.0,33.0,13.0,1.0,4.8,W
3,1.0,28.0,1.0,10.0,1.0,1.0,24.0,3.0,1.0,4.0,W
4,8.8,11.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,W
5,1.0,42.0,1.0,1.0,1.0,69.0,1.0,31.0,1.0,49.0,W
6,9.0,36.0,11.0,14.0,24.0,1.0,8.0,1.0,1.0,15.8,W
7,13.0,33.0,12.7,8.7,1.0,1.0,7.8,38.0,1.0,1.0,W
8,1.0,36.0,12.0,1.0,1.0,12.0,1.0,1.0,1.0,1.0,W
9,1.0,10.0,12.0,1.0,1.0,1.0,64.0,13.0,1.0,14.0,W
10,8.0,31.0,19.0,1.0,24.0,1.0,48.0,1.0,1.0,1.0,W
1,1.0,9.7,6.8,53.0,1.0,57.0,1.0,9.5,1.0,1.0,B
2,5.8,16.3,1.0,10.8,1.0,58.0,1.0,1.0,1.0,1.0,B
3,1.0,38.0,17.0,34.0,1.0,55.0,1.0,8.0,1.0,1.0,B
4,1.0,42.0,1.0,26.0,1.0,1.0,65.0,44.0,1.0,1.0,B
5,41.0,43.0,16.0,9.7,1.0,36.0,61.0,1.0,1.0,1.0,B
6,47.0,20.0,1.0,1.0,1.0,1.0,28.0,7.7,1.0,1.0,B
7,22.0,92.0,1.0,1.0,1.0,20.0,15.0,1.0,1.0,1.0,B
8,31.0,72.0,1.0,1.0,1.0,1.0,20.0,1.0,1.0,1.0,B
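One possible way to get several box plots per position, sketched under assumptions: the datasets are told apart by the Class column, and each class's boxes are shifted sideways around the integer position using the positions argument of boxplot. The data is a small excerpt of the sample above (three positions, two classes), and names such as width, shift and centers are illustrative only.

```python
import io
import matplotlib.pyplot as plt
import pandas as pd

# Excerpt of the sample dataset above (first three positions, two classes)
csv_text = """,P1,P2,P3,Class
1,7.6,1.0,1.0,L
2,9.7,2.7,5.6,L
3,1.0,6.0,1.0,L
1,11.8,19.0,1.0,C
2,12.0,17.0,20.0,C
3,14.0,30.0,8.0,C
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

positions = [c for c in df.columns if c != "Class"]
classes = list(df["Class"].unique())  # order of first appearance
width = 0.8 / len(classes)            # each group of boxes spans 0.8 x-units

fig, ax = plt.subplots(figsize=(12, 5))
for k, cls in enumerate(classes):
    sub = df.loc[df["Class"] == cls, positions]
    color = f"C{k}"
    # Shift this class's boxes so the group stays centred on the position
    shift = (k - (len(classes) - 1) / 2) * width
    centers = [p + 1 + shift for p in range(len(positions))]
    ax.boxplot([sub[c].to_numpy() for c in positions], positions=centers,
               widths=0.9 * width, manage_ticks=False,
               boxprops={"color": color}, medianprops={"color": color})
    ax.plot([], [], color=color, label=cls)  # empty line, used only for the legend

ax.set_xticks(list(range(1, len(positions) + 1)))
ax.set_xticklabels(positions)
ax.set_xlabel("Position")
ax.legend(title="Class")
```

With the full dataset, the same loop would draw four boxes (L, C, W, B) at each of the ten positions.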

How to draw the smooth lineplot and display the dates on the x-axis with python? [duplicate]

This question already has answers here:
Passing datetime-like object to seaborn.lmplot
(2 answers)
format x-axis (dates) in sns.lmplot()
(1 answer)
How to plot int to datetime on x axis using seaborn?
(1 answer)
Closed 10 months ago.
I would really appreciate it if you could point me to where to look. I have been trying for 3 days and still can't find the right approach. I need to draw a chart that looks like the one in the first picture, with the dates displayed on the x-axis as in the second chart. I am a complete beginner with seaborn and Python. I first used lineplot, which met only one criterion: it displays the dates on the x-axis, but the lines are sharp, as in the second picture, rather than smooth, as in the first. I kept digging and found lmplot. With that I could get the chart design I wanted (a smoothed chart), but when I tried to display the dates on the x-axis it didn't work. I got the error could not convert string to float: '2022-07-27T13:31:00Z'.
Here is the code for lmplot. It gives the plot design I want, but the dates can't be displayed on the x-axis:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
If I use numbers instead of dates, the output is this, exactly as I need:
Here is the code with which all the data gets displayed correctly. But, the plot design is not smoothed.
import seaborn as sns
import numpy as np
import scipy
import matplotlib.pyplot as plt
import pandas as pd
from pandas.core.apply import frame_apply
years = ["2022-03-22T13:30:00Z",
"2022-03-23T13:31:00Z",
"2022-04-24T19:27:00Z",
"2022-05-25T13:31:00Z",
"2022-06-26T13:31:00Z",
"2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",
]
feature_1 =[0,
6,
1,
5,
9,
15,
21,
4,
1,
]
data_preproc = pd.DataFrame({
'Period': years,
# 'Feature 1': feature_1,
# 'Feature 2': feature_2,
# 'Feature 3': feature_3,
# 'Feature 4': feature_4,
"Feature 1" :feature_1
})
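# The format must match the full timestamp string, otherwise recent pandas versions coerce it to NaT
data_preproc['Period'] = pd.to_datetime(data_preproc['Period'],
format="%Y-%m-%dT%H:%M:%SZ",errors='coerce')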
data_preproc['Period'] = data_preproc['Period'].dt.strftime('%b')
# aiAlertPlot =sns.lineplot(x='Period', y='value', hue='variable',ci=None,
# data=pd.melt(data_preproc, ['Period']))
sns.lineplot(x="Period",y="Feature 1",data=data_preproc)
# plt.xticks(np.linspace(start=0, stop=21, num=52))
plt.xticks(rotation=90)
plt.legend(title="features")
plt.ylabel("Alerts")
plt.legend(loc='upper right')
plt.show()
The output is this. Correct data, wrong chart design.
lmplot is a model-based method (it fits a regression), so it requires a numeric x. If you can treat the date values as evenly spaced, you can create another, numeric variable (range in the code), run lmplot on that variable, and then replace the xtick labels with the dates.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
df['range'] = np.arange(df.shape[0])
sns.lmplot(x='range', y='power', data=df, ci=None, order=4, truncate=False)
plt.xticks(df['range'], df['T'], rotation = 45);
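If the dates are not evenly spaced, a different route (not lmplot, and only a sketch) is to convert the timestamps to a number of elapsed days, fit a polynomial with np.polyfit, and draw the fitted curve on a real date axis. The degree of 2 and the names x_days, x_fine and t_fine are assumptions made for this illustration.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Parse the ISO timestamps and drop the UTC marker to get naive datetimes
T = pd.to_datetime(["2022-07-27T13:31:00Z",
                    "2022-08-28T13:31:00Z",
                    "2022-09-29T13:31:00Z",
                    "2022-10-30T13:31:00Z"]).tz_localize(None)
power = np.array([10, 25, 60, 42])

# Numeric x for the fit: days elapsed since the first timestamp
x_days = np.asarray((T - T[0]).total_seconds()) / 86400.0
coeffs = np.polyfit(x_days, power, deg=2)  # degree chosen for illustration

# Evaluate the fitted polynomial on a dense grid and map it back to dates
x_fine = np.linspace(x_days.min(), x_days.max(), 200)
y_fine = np.polyval(coeffs, x_fine)
t_fine = T[0] + pd.to_timedelta(x_fine, unit="D")

fig, ax = plt.subplots()
ax.plot(np.asarray(T), power, "o")   # original points
ax.plot(np.asarray(t_fine), y_fine)  # smooth fitted curve
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
```

Because the x-axis holds real dates here, the tick labels follow the actual spacing of the timestamps rather than their row order.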

Adding y values to a plot using matplotlib [duplicate]

This question already has answers here:
How to plot and annotate a grouped bar chart
(1 answer)
Python matplotlib multiple bars
(7 answers)
How to annotate grouped bar plot with percent by hue/legend group
(1 answer)
How to plot grouped bars in the correct order
(1 answer)
How to get a grouped bar plot of categorical data
(1 answer)
Closed 1 year ago.
I would like to add the y values to my plot for each year to make the graph easily readable but not sure how to do it. I have tried using the enumerate function but it does not return the desired output. Any guidance on this would be helpful.
import numpy as np
import pandas as pd
from matplotlib import pyplot
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
laikipia = pd.DataFrame({
"2020":[5.21, 20.91, 17.05],
"2021":[20.91, 19.91, 19.76],
},
index=["Cropland", "Forestland", "Shrubland"]
)
c = ['#FFDB5C', '#358221', '#EECFA8']
laikipia.plot(kind="bar", color=c)
plt.title("")
plt.xlabel("Laikipia LULC")
plt.ylabel("Area Loss (ha)")
plt.legend(laikipia, bbox_to_anchor=(1, 1))
plt.xticks(rotation=0)
#plt.yticks(round [plotdata], 0)
x = laikipia
y = [70, 60, 50, 40, 30, 20, 10, 0]
max_y_lim = max(y)
min_y_lim = min(y)
plt.ylim(min_y_lim, max_y_lim)
for i, v in enumerate(y):
    plt.text(0, i, y[i], str(v), ha="center", va = "bottom")
plt.show()
plt.tight_layout()
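Note that plt.text expects an x position, a y position and a single string, so the call in the loop above passes one argument too many. A possible approach, sketched here under the assumption that matplotlib 3.4 or newer is installed, is to let Axes.bar_label write the height above each bar. The two-colour list and the "%.2f" format are illustrative choices, not taken from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

laikipia = pd.DataFrame(
    {"2020": [5.21, 20.91, 17.05],
     "2021": [20.91, 19.91, 19.76]},
    index=["Cropland", "Forestland", "Shrubland"],
)

# One colour per column (year); pandas draws one bar container per column
ax = laikipia.plot(kind="bar", color=["#FFDB5C", "#358221"], rot=0)

labels = []
for container in ax.containers:
    # bar_label places the bar height as text just above each bar
    labels.extend(ax.bar_label(container, fmt="%.2f", padding=2))

ax.set_xlabel("Laikipia LULC")
ax.set_ylabel("Area Loss (ha)")
ax.legend(bbox_to_anchor=(1, 1))
ax.get_figure().tight_layout()
```

Calling tight_layout before plt.show() (rather than after, as in the question) makes sure the layout adjustment is applied to the displayed figure.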

How can you change the color and line type of an individual line in a line plot on Jupyter Notebook when plotting the entire Dataframe at once? [duplicate]

This question already has answers here:
Python pandas, Plotting options for multiple lines
(3 answers)
Closed 1 year ago.
I am trying to figure out how to change the line color and line type of an individual line after plotting an entire dataframe at once.
This is a snapshot of my dataframe:
Then I used this to plot it:
df_month.plot(figsize=(15,10), linewidth = 3.5)
plt.xlabel('Months', fontsize=18)
plt.ylabel('Average Precipitation (mm/d)', fontsize =19)
plt.title('Precipitation near Cape Scott Wind Farm', fontsize=22)
plt.savefig('CapeScott_precip.png')
And it resulted in this:
So I am wondering how I can change just the 'WRF-GFS' line to a dashed black line? Any guidance will be helpful, thank you!
Reset the index first (the code below assumes the index is named 'Datetime', so that it becomes a column with that name):
df_plot = df_month.reset_index()
Loop through the columns to plot them, with different parameters for the 'WRF-GFS' column:
for col_name in df_plot.drop(columns='Datetime').columns:
    if col_name == 'WRF-GFS':
        plt.plot(df_plot['Datetime'], df_plot[col_name], color='black', linestyle='dashed', linewidth=3.5, label=col_name)
    else:
        plt.plot(df_plot['Datetime'], df_plot[col_name], linewidth=3.5, label=col_name)
plt.legend(loc='best')
Running this with some random data:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(42)
df_plot = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df_plot['Datetime'] = list(range(10))
for col_name in df_plot.drop(columns='Datetime').columns.tolist():
    if col_name == 'B':
        plt.plot(df_plot['Datetime'], df_plot[col_name], color='black', linestyle='dashed', linewidth=3.5, label=col_name)
    else:
        plt.plot(df_plot['Datetime'], df_plot[col_name], linewidth=3.5, label=col_name)
plt.legend(loc='best')
plt.show()
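If you prefer to keep plotting the whole DataFrame at once, another option (a sketch, using random data and column 'B' as a stand-in for 'WRF-GFS') is to take the Axes returned by DataFrame.plot and restyle the one line afterwards:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data; column "B" plays the role of "WRF-GFS"
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 10, size=(10, 4)), columns=list("ABCD"))

# Plot everything in one call, then look up the line by its label
ax = df.plot(figsize=(15, 10), linewidth=3.5)
for line in ax.get_lines():
    if line.get_label() == "B":
        line.set_color("black")
        line.set_linestyle("--")

# Rebuild the legend so that it picks up the new style
ax.legend(loc="best")
```

The legend is recreated at the end because the one drawn by the initial plot call still shows the original colour.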

How to adjust space between Matplotlib/Seaborn subplots for multi-plot layouts [duplicate]

This question already has answers here:
Improve subplot size/spacing with many subplots
(8 answers)
Closed 5 months ago.
The following figure shows the standard Seaborn/Matplotlib Boxplots in a 2 X 2 grid layout:
It is pretty much what I want, except that I would like to put some more space between the first row of plots and the second row. The distance between the x-axis labels of the first-row plots and the titles of the second-row plots is almost non-existent. I have been playing with the parameters as explained in this thread:
StackOverflow Thread
Here is my relevant code:
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from PyPDF2 import PdfFileMerger
import seaborn as sns
num_cols = 2
num_rows = int(math.ceil(tot_plots / float(num_cols)))
fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(16, 16))
x_var = df_orig['hra']
for idx, ax in enumerate(axes.flat):
    data_var = current_cols[idx]
    y_var = df_orig[data_var]
    title_str = ''
    sns.boxplot(x=x_var, y=y_var, ax=ax,
                order=order, palette=color, showfliers=False)
    ax.set_title(data_var + title_str)
    ax.xaxis.label.set_visible(False)
    ax.yaxis.label.set_visible(False)
    ax.xaxis.set_tick_params(labelsize=8)
    ax.yaxis.set_tick_params(labelsize=8)
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
fig.suptitle("Sampling BoxPlots", x=0.5, y=0.93, fontsize=14, fontweight="bold")
plt.tight_layout()
plt.subplots_adjust(top=0.8)
pdf_pages = PdfPages(file_name)
pdf_pages.savefig()
pdf_pages.close()
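Have you tried adjusting hspace instead, for example hspace=0.8? According to matplotlib's reference, hspace is the parameter that controls the height of the padding between subplots (as a fraction of the average Axes height), whereas top only sets the position of the top edge of the subplots.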
plt.subplots_adjust(hspace = 0.8)
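A self-contained sketch of how this could fit into a 2 x 2 layout is shown here. The box plot data and panel titles are placeholders, and top=0.9 is an arbitrary value chosen to leave room for the suptitle. Calling subplots_adjust after tight_layout matters, because tight_layout would otherwise overwrite the spacing.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(16, 16))

for idx, ax in enumerate(axes.flat):
    # Placeholder data: three random samples per panel
    ax.boxplot([rng.normal(size=50) for _ in range(3)])
    ax.set_title(f"panel {idx}")
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)

fig.suptitle("Sampling BoxPlots", fontsize=14, fontweight="bold")
fig.tight_layout()
# Applied last so that tight_layout does not override the row spacing
fig.subplots_adjust(top=0.9, hspace=0.8)
```

Larger hspace values push the rows further apart; something between 0.3 and 0.8 is often enough to clear rotated tick labels, depending on the figure size.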
