How to control width of graph line in matplotlib? - python

I am trying to plot line graphs in matplotlib with the following data. The x, y points belonging to the same id form one line, so there are 3 lines in the df below.
id x y
0 1 0.50 0.0
1 1 1.00 0.3
2 1 1.50 0.5
4 1 2.00 0.7
5 2 0.20 0.0
6 2 1.00 0.8
7 2 1.50 1.0
8 2 2.00 1.2
9 2 3.50 2.0
10 3 0.10 0.0
11 3 1.10 0.5
12 3 3.55 2.2
It can be simply plotted with following code:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib notebook
fig, ax = plt.subplots(figsize=(12,8))
cmap = plt.cm.get_cmap("viridis")
groups = df.groupby("id")
ngroups = len(groups)
for i1, (key, grp) in enumerate(groups):
    grp.plot(linestyle="solid", x = "x", y = "y", ax = ax, label = key)
plt.show()
But I have another data frame, df2, where the weight of each id is given, and I am hoping to find a way to control the thickness of each line according to its weight: the larger the weight, the thicker the line. How can I do this? Also, what relation should be used between the weight and the width of the line?
id weight
0 1 5
1 2 15
2 3 2
Please let me know if anything is unclear.

Based on the comments, you need to know a few things:
How to set the line width?
That's simple: linewidth=number. See https://matplotlib.org/examples/pylab_examples/set_and_get.html
How to take the weight and make it a significant width?
This depends on the range of your weight. If it's consistently between 2 and 15, I'd recommend simply dividing it by 2, i.e.:
linewidth=weight/2
If you find this aesthetically unpleasing, divide by a bigger number, though that would obviously reduce the number of linewidths you get.
How to get the weight out of df2?
Given the df2 you described and the code you showed, key is the id in df2. Filtering df2 returns a one-element Series, so take its first value to get a plain number:
df2.loc[df2['id'] == key, 'weight'].iloc[0]
Putting it all together:
Replace your grp.plot line with the following:
grp.plot(linestyle="solid",
         linewidth=df2.loc[df2['id'] == key, 'weight'].iloc[0] / 2.0,
         x = "x", y = "y", ax = ax, label = key)
(This is your original line with the linewidth argument added.)
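On the question of which relation to use between weight and width: one option, sketched below, is a linear rescaling of the weights onto a fixed range of line widths. The bounds MIN_LW and MAX_LW are assumptions chosen for illustration, not values from the question, and the data frames are rebuilt from the tables above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Data rebuilt from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    "x": [0.5, 1.0, 1.5, 2.0, 0.2, 1.0, 1.5, 2.0, 3.5, 0.1, 1.1, 3.55],
    "y": [0.0, 0.3, 0.5, 0.7, 0.0, 0.8, 1.0, 1.2, 2.0, 0.0, 0.5, 2.2],
})
df2 = pd.DataFrame({"id": [1, 2, 3], "weight": [5, 15, 2]})

# Assumed width range in points; adjust to taste
MIN_LW, MAX_LW = 1.0, 6.0

# Look up weights by id, then map [min weight, max weight] onto [MIN_LW, MAX_LW]
weights = df2.set_index("id")["weight"].astype(float)
span = weights.max() - weights.min()
if span == 0:
    # All weights equal: use the middle of the width range
    widths = weights * 0 + (MIN_LW + MAX_LW) / 2
else:
    widths = MIN_LW + (weights - weights.min()) / span * (MAX_LW - MIN_LW)

fig, ax = plt.subplots(figsize=(12, 8))
for key, grp in df.groupby("id"):
    ax.plot(grp["x"], grp["y"], linestyle="solid",
            linewidth=float(widths[key]), label=key)
ax.legend(title="id")
```

With this mapping the smallest weight always gets MIN_LW and the largest gets MAX_LW, so the plot stays readable even if the weights later span a much wider range than 2 to 15.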

Related

Loop through rows of dataframe at specific row values

My dataframe contains three different replications for each treatment. I want to loop through both, so I want to loop through each treatment, and for each treatment calculate a model for each replication. I managed to loop through the treatments, but I need to also loop through the replications of each treatment. Ideally, the output should be saved into a new dataframe that contains 'treatment' and 'replication'. Any suggestion?
The dataframe (df) looks like this:
treatment replication time y
**8 1 1 0.1**
8 1 2 0.1
8 1 3 0.1
**8 2 1 0.1**
8 2 2 0.1
8 2 3 0.1
**10 1 1 0.1**
10 1 2 0.1
10 1 3 0.1
**10 2 1 0.1**
10 2 2 0.1
10 2 3 0.1
for i, g in df.groupby('treatment'):
    k = g.iloc[0].y
    popt, pcov = curve_fit(model, x, y)
    fit_m = popt
I now apply iterrows, but then I can no longer use the index of NPQ [0] to get the initial value. Any idea how to solve this? The error message reads as:
for index, row in HL.iterrows():
    g = (index, row['filename'], row['hr'], row['time'], row['NPQ'])
    k = g.iloc[0]['NPQ']
AttributeError: 'tuple' object has no attribute 'iloc'
Thank you in advance
grouped_df = HL.groupby(["hr", "filename"])
for key, g in grouped_df:
    k = g.iloc[0].y
    popt, pcov = curve_fit(model, x, y)
    fit_m = popt
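A fuller sketch of the same idea, grouping on both keys and collecting one result row per (treatment, replication) pair. The straight-line model and the y values are assumptions made up for illustration; the asker's actual model and data are not shown in the question.

```python
import pandas as pd
from scipy.optimize import curve_fit


def model(t, a, b):
    """Hypothetical stand-in model: a straight line a*t + b."""
    return a * t + b


# Synthetic data in the layout of the question (y values are invented)
df = pd.DataFrame({
    "treatment":   [8] * 6 + [10] * 6,
    "replication": [1, 1, 1, 2, 2, 2] * 2,
    "time":        [1, 2, 3] * 4,
    "y": [0.1, 0.2, 0.3, 0.1, 0.3, 0.5, 0.2, 0.2, 0.2, 0.0, 0.5, 1.0],
})

rows = []
# Grouping on a list of columns yields one sub-frame per key combination
for (treatment, replication), g in df.groupby(["treatment", "replication"]):
    k = g["y"].iloc[0]  # initial value within this group
    popt, pcov = curve_fit(
        model,
        g["time"].to_numpy(dtype=float),
        g["y"].to_numpy(dtype=float),
    )
    rows.append({
        "treatment": treatment,
        "replication": replication,
        "initial_y": k,
        "a": popt[0],
        "b": popt[1],
    })

# One row per (treatment, replication) with the fitted parameters
results = pd.DataFrame(rows)
```

Each g is a regular DataFrame, so g.iloc[0] and column access work inside the loop, unlike the tuple built from iterrows in the question.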

Matplotlib - Plot uneven steps from DataFrame

I have this DataFrame with the x-axis values as column labels. However, columns for x values with no data were omitted, so the steps are uneven. For instance:
0.1 0.2 0.5 ...
0 1 4 7 ...
1 2 5 8 ...
2 3 6 9 ...
I want to plot each of those rows against the x-axis np.arange(0, max(df.columns), step=0.1), and also a combined plot of all of them. Is there an easy way to achieve this with matplotlib.pyplot?
plt.plot(np.arange(0, max(df.columns), step=0.1), new_data)
Any help would be appreciated.
If I understood you correctly, your final dataframe is supposed to look like this:
0.0 0.1 0.2 0.3 0.4 0.5
0 0.0 1 4 0.0 0.0 7
1 0.0 2 5 0.0 0.0 8
2 0.0 3 6 0.0 0.0 9
which can be generated (and then also plotted) like this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({0.1:[1,2,3],0.2:[4,5,6],0.5:[7,8,9]})
## make sure to actually include the maximum value (add one step)
# or alternatively rather use np.linspace() with appropriate number of points
xs = np.arange(0, max(df.columns) +0.1, step=0.1)
df = df.reindex(columns=xs, fill_value=0.0)
plt.plot(df.T)
plt.show()
which yields a plot with one line per row of the DataFrame.
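One caveat with this approach: np.arange with a float step produces values such as 0.30000000000000004, which would not match a column labelled 0.3, and reindex would then silently fill that column with zeros. A minimal sketch of one way around this is to round both the labels and the grid to the same number of decimals. STEP and DECIMALS are assumed names, and the example columns differ from the question's so that the mismatch would actually occur.

```python
import numpy as np
import pandas as pd

STEP = 0.1    # assumed grid spacing
DECIMALS = 1  # decimals needed to write STEP exactly

# Example frame with a 0.3 column, which np.arange alone would miss
df = pd.DataFrame({0.1: [1, 2, 3], 0.3: [4, 5, 6], 0.7: [7, 8, 9]})

# Round the existing labels and the target grid the same way so they compare equal
df.columns = np.round(df.columns.to_numpy(dtype=float), DECIMALS)
xs = np.round(np.arange(0, df.columns.max() + STEP / 2, STEP), DECIMALS)

# Missing x positions become zero-filled columns; existing ones are kept
df_full = df.reindex(columns=xs, fill_value=0.0)
```

Adding half a step to the upper bound keeps the maximum label inside the grid without risking an extra point past it.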

minimum and maximum length to delimit my line

Sorry for the stupid question, but I have been on it for nearly an hour already. Here is a sample of my dataframe:
SEASON Total
0 2004-2005 4
1 2005-2006 4
2 2006-2007 1
3 2007-2008 7
4 2008-2009 7
5 2009-2010 4
6 2010-2011 4
7 2012-2013 4
8 2013-2014 1
9 2014-2015 2
10 2015-2016 3
11 2016-2017 13
12 2017-2018 18
13 2018-2019 8
I have done this:
plt.figure(figsize=(13,6))
plt.plot(per_year.index, per_year['Total'])
plt.xticks(per_year.index, per_year['SEASON'].unique());
plt.title('AVg assist PER YEAR')
plt.axvline(x=10,color='red', linestyle='--')
plt.axhline(y=3.8,color='orange', xmax=10)
plt.axhline(y=11.75, xmax=10)
plt.tight_layout()
All I want is to be able to give a maximum length to my first horizontal line (where it has to stop) and a minimum to my second horizontal line (where it has to start). I am pretty sure I can do it if I change the axis to proper numbers, but I want to keep it as it is.
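From the docs: the xmin and xmax arguments of axhline are fractions of the axes width, so they must be between 0 and 1.
Calculate the fraction from the number of x items (this is approximate, because it ignores the default axis margins):
xmax = 10/len(per_year.index)
Or use the hlines method of the axes, which takes xmin and xmax in data coordinates: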
ax = plt.gca()
ax.hlines(y=3.8,xmin=0, xmax=10, color='r')
ax.hlines(y=11.75,xmin=0, xmax=10, color='g')
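If the goal is for the first line to stop at the dashed vertical line and the second to start there, the hlines approach can be sketched as follows. This reading of the question is an assumption, as is the name split; the data is rebuilt from the sample above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

per_year = pd.DataFrame({
    "SEASON": ["2004-2005", "2005-2006", "2006-2007", "2007-2008", "2008-2009",
               "2009-2010", "2010-2011", "2012-2013", "2013-2014", "2014-2015",
               "2015-2016", "2016-2017", "2017-2018", "2018-2019"],
    "Total": [4, 4, 1, 7, 7, 4, 4, 4, 1, 2, 3, 13, 18, 8],
})

split = 10                  # x position of the dashed vertical line
first = per_year.index[0]
last = per_year.index[-1]

fig, ax = plt.subplots(figsize=(13, 6))
ax.plot(per_year.index, per_year["Total"])
ax.set_xticks(per_year.index)
ax.set_xticklabels(per_year["SEASON"], rotation=45)
ax.set_title("Avg assist per year")
ax.axvline(x=split, color="red", linestyle="--")

# hlines takes data coordinates, so the season labels on the axis can stay as they are
left = ax.hlines(y=3.8, xmin=first, xmax=split, color="orange")
right = ax.hlines(y=11.75, xmin=split, xmax=last, color="green")
fig.tight_layout()
```

Because the x positions are the integer index values, the tick labels do not need to change for the line limits to work.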

How to create n subplots (box plots) automatically?

I need to show n (e.g. 5) box plots. How can I do it?
df =
col1 col2 col3 col4 col5 result
1 3 1 1 4 0
1 2 2 4 9 1
1 2 1 3 7 1
This is my current code. But it does not display the data inside plots. Also, plots are very thin if n is for example 10 (is it possible to go to a new line automatically?).
n=5
columns = df.columns
i = 0
fig, axes = plt.subplots(1, n, figsize=(20,5))
for ax in axes:
    df.boxplot(by="result", column = [columns[i]], vert=False, grid=True)
    i = i + 1
display(fig)
This example is for Azure Databricks, but I appreciate just a matplotlib solution as well if it's applicable.
I am not sure I got what you are trying to do, but the following code will show you the plots. You can control the figure sizes by changing the values of (10,10)
Code:
df.boxplot(by="result",figsize=(10,10));
To draw the boxes horizontally and show the grid:
df.boxplot(by="result",figsize=(10,10),vert=False, grid=True);
I solved it myself as follows:
df.boxplot(by="result", column = columns[0:4], vert=False, grid=True, figsize=(30,10), layout = (3, 5))
If you want additional rows to be generated while keeping the number of columns fixed, adjust the layout as follows:
In [41]: ncol = 2
In [42]: df
Out[42]:
v0 v1 v2 v3 v4 v5 v6
0 0 3 6 9 12 15 18
1 1 4 7 10 13 16 19
2 2 5 8 11 14 17 20
In [43]: df.boxplot(by='v6', layout=(df.shape[1] // ncol + 1, ncol)) # use floor division to determine how many rows are required
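For a plain matplotlib version of the original loop, a likely reason the subplots stayed empty is that df.boxplot was never told which axes to draw on. A minimal sketch that passes ax=ax and wraps onto new rows, with ncols as an assumed setting and the data rebuilt from the question:

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1], "col2": [3, 2, 2], "col3": [1, 2, 1],
    "col4": [1, 4, 3], "col5": [4, 9, 7], "result": [0, 1, 1],
})

value_cols = [c for c in df.columns if c != "result"]
ncols = 3                                      # assumed number of plots per row
nrows = math.ceil(len(value_cols) / ncols)     # rows needed to fit every column

# squeeze=False always returns a 2-D array of axes, even for a single row
fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
flat_axes = axes.ravel()

for ax, col in zip(flat_axes, value_cols):
    # ax=ax draws this column's box plot into the given subplot
    df.boxplot(column=col, by="result", ax=ax, grid=True)

# Hide the unused cells of the grid
for ax in flat_axes[len(value_cols):]:
    ax.set_visible(False)

fig.tight_layout()
```

Changing ncols is enough to reflow the grid, since the number of rows is derived from it.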

Plot mean of subset of a pandas DataFrame

Assume a big set of data like
Height (m) My data
0 18 5.0
1 25 6.0
2 10 1.0
3 13 1.5
4 32 8.0
5 26 6.7
6 23 5.0
7 5 2.0
8 7 2.0
And I want to plot the average (and, if possible, the standard deviation) of "My data" as a function of height, separated in the range [0,5),[5,10),[10,15) and so on.
Any idea? I've tried different approaches and none of them work
If I understand you correctly:
# Precompute bins for pd.cut
bins = list(range(0, df['Height (m)'].max() + 5, 5))
# Cut Height into intervals which exclude the right endpoint,
# with bin edges at multiples of 5
df['HeightBin'] = pd.cut(df['Height (m)'], bins=bins, right=False)
# Within each bin, get mean, stdev (normalized by N-1 by default),
# and also show sample size to explain why some std values are NaN
df.groupby('HeightBin')['My data'].agg(['mean', 'std', 'count'])
           mean       std  count
HeightBin
[0, 5)      NaN       NaN      0
[5, 10)    2.00  0.000000      2
[10, 15)   1.25  0.353553      2
[15, 20)   5.00       NaN      1
[20, 25)   5.00       NaN      1
[25, 30)   6.35  0.494975      2
[30, 35)   8.00       NaN      1
If I understand correctly, this is what you would like to do:
import pandas as pd
import numpy as np
bins = np.arange(0, 30, 5) # adjust as desired
df_stats = pd.DataFrame(columns=['mean', 'st_dev']) # DataFrame for the results
df_stats['mean'] = df.groupby(pd.cut(df['Height (m)'], bins, right=False)).mean()['My data']
df_stats['st_dev'] = df.groupby(pd.cut(df['Height (m)'], bins, right=False)).std()['My data']
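Neither answer shows the plotting step the question asks for. A minimal sketch, assuming the bin midpoint is an acceptable x position for each bin and that a missing standard deviation (a bin with a single sample) can be drawn as a zero-length error bar:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Height (m)": [18, 25, 10, 13, 32, 26, 23, 5, 7],
    "My data": [5.0, 6.0, 1.0, 1.5, 8.0, 6.7, 5.0, 2.0, 2.0],
})

BIN_WIDTH = 5
# The extra +1 keeps the maximum height inside a half-open bin
# even when it is an exact multiple of BIN_WIDTH
edges = np.arange(0, df["Height (m)"].max() + BIN_WIDTH + 1, BIN_WIDTH)
height_bin = pd.cut(df["Height (m)"], bins=edges, right=False)

# Mean and sample standard deviation per bin; empty bins are dropped
stats = (
    df.groupby(height_bin, observed=False)["My data"]
      .agg(["mean", "std"])
      .dropna(subset=["mean"])
)
mids = [interval.mid for interval in stats.index]

fig, ax = plt.subplots()
ax.errorbar(mids, stats["mean"], yerr=stats["std"].fillna(0.0),
            fmt="o-", capsize=4)
ax.set_xlabel("Height (m), bin midpoint")
ax.set_ylabel("Mean of My data")
```

The error bars here show one standard deviation either side of the mean, normalized by N-1 as in the first answer.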
