print multiple separate histograms in one loop - python

I want this to print out two histograms (of the first two columns), but this instead stacks the histograms within the same plot. How do I get it to output two separate histograms?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataobj = pd.DataFrame([[1,2,3],[3,4,5],[6,7,8]])
for i in [0,1]:
    a = np.array(dataobj.iloc[:,i])
    plt.hist(a,bins = np.linspace(0,10,11))
Even better would be a solution where I can save the plots into an array which I could later call to display them.
Working in Jupyter

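dataobj = pd.DataFrame([[1, 2, 3], [3, 4, 5], [6, 7, 8]])
# pass figsize here: changing plt.rcParams after the figure exists does not resize it
fig, axes = plt.subplots(3, 1, figsize=(12, 12))
for i in range(3):
    a = np.array(dataobj.iloc[:, i])
    axes[i].hist(a, bins=np.linspace(0, 10, 11))
plt.show()
You need to draw each histogram on its own Axes object, as returned by plt.subplots().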

Just add plt.show() inside the for loop; there is no need for subplots and axes. Like this:
dataobj = pd.DataFrame([[1,2,3],[3,4,5],[6,7,8]])
for i in [0,1]:
    a = np.array(dataobj.iloc[:,i])
    plt.hist(a,bins = np.linspace(0,10,11))
    plt.show()
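Neither answer covers the second part of the question (keeping the plots in a collection to display later). A minimal sketch of one way to do that, assuming one Figure per column is acceptable; the names `figures` and `bins` are illustrative and not from the original post:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dataobj = pd.DataFrame([[1, 2, 3], [3, 4, 5], [6, 7, 8]])
bins = np.linspace(0, 10, 11)

figures = []  # one Figure object per plotted column
for i in [0, 1]:
    fig, ax = plt.subplots()  # a fresh figure, so histograms never stack
    ax.hist(dataobj.iloc[:, i].to_numpy(), bins=bins)
    ax.set_title(f"Column {i}")
    figures.append(fig)
    plt.close(fig)  # unregister from pyplot so it is not drawn right away
```

In Jupyter, evaluating `figures[0]` as the last expression of a cell should render the stored figure, and `figures[0].savefig("hist0.png")` writes it to disk.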

Related

Plot subplots inside subplots matplotlib

Context: I'd like to plot multiple subplots (separated by legend) based on patterns from the columns of a dataframe inside a subplot; however, I'm not able to separate each subplot into another set of subplots.
This is what I have:
import matplotlib.pyplot as plt
col_patterns = ['pattern1','pattern2']
# define subplot grid
fig, axs = plt.subplots(nrows=len(col_patterns), ncols=1, figsize=(30, 80))
plt.subplots_adjust()
fig.suptitle("Title", fontsize=18, y=0.95)
for col_pat,ax in zip(col_patterns,axs.ravel()):
    col_pat_columns = [col for col in df.columns if col_pat in col]
    df[col_pat_columns].plot(x='Week',ax=ax)
    # chart formatting
    ax.set_title(col_pat.upper())
    ax.set_xlabel("")
Which results in something like this:
How could I make each of those subplots turn into another 6 subplots, all laid out horizontally? (i.e. each figure legend entry would be its own subplot)
Thank you!
In your example, you're defining a 2x1 subplot and only looping through two axes objects that get created. In each of the two loops, when you call df[col_pat_columns].plot(x='Week',ax=ax), since col_pat_columns is a list and you're passing it to df, you're just plotting multiple columns from your dataframe. That's why it's multiple series on a single plot.
@fdireito is correct: you just need to set the ncols argument of plt.subplots() to the number of columns you need, and then adjust your loops to match.
If you want to stay in matplotlib, then here's a basic example. I had to take some guesses as to how your dataframe was structured and so on.
# import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# create some fake data
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a':[1, 1, 1, 1, 1], # horizontal line
    'b':[3, 6, 9, 6, 3], # pyramid
    'c':[4, 8, 12, 16, 20], # steep line
    'd':[1, 10, 3, 13, 5] # zig-zag
})
# a list of lists, where each inner list is a set of
# columns we want in the same row of subplots
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
The following is a simplified example of what your code ends up doing.
fig, axes = plt.subplots(len(col_patterns), 1)
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
[Image: 2x1 subplot (what you have right now)]
I use enumerate() with col_patterns to iterate through the subplot rows, and then use enumerate() with each column name in a given pattern to iterate through the subplot columns.
# the following will size your subplots according to
# - number of different column patterns you want matched (rows)
# - largest number of columns in a given column pattern (columns)
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col])
[Image: correctly sized subplot]
Here's all the code, with a couple additions I omitted from the code above for simplicity's sake.
import matplotlib.pyplot as plt
import pandas as pd
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a':[1, 1, 1, 1, 1], # horizontal line
    'b':[3, 6, 9, 6, 3], # pyramid
    'c':[4, 8, 12, 16, 20], # steep line
    'd':[1, 10, 3, 13, 5] # zig-zag
})
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
# what you have now
fig, axes = plt.subplots(len(col_patterns), 1, figsize=(12, 8))
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
    ax.legend(pat, loc='upper left')
# what I think you want
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols, figsize=(16, 8), sharex=True, sharey=True, tight_layout=True)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col], label=col)
        axes[nrow][ncol].legend(loc='upper left')
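One caveat with indexing as axes[nrow][ncol]: plt.subplots() returns a 1-D array (or a single Axes) when there is only one row or one column, and patterns of unequal length leave empty panels. A hedged sketch that handles both cases, assuming an uneven col_patterns list made up for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],
    'b': [3, 6, 9, 6, 3],
    'c': [4, 8, 12, 16, 20],
    'd': [1, 10, 3, 13, 5],
})
# hypothetical patterns of unequal length
col_patterns = [['a', 'b', 'c'], ['d']]

n_rows = len(col_patterns)
n_cols = max(len(pat) for pat in col_patterns)
# squeeze=False always gives a 2-D array of Axes, even for a single row or column
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False, figsize=(12, 6))
for nrow, pat in enumerate(col_patterns):
    for ncol in range(n_cols):
        ax = axes[nrow, ncol]
        if ncol < len(pat):
            ax.plot(x, df[pat[ncol]], label=pat[ncol])
            ax.legend(loc='upper left')
        else:
            ax.set_visible(False)  # hide panels that have no column to show
```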
Another option you can consider is ditching matplotlib and using Seaborn relplots. There are several examples on that page that should help. If you have your dataframe set up correctly (long or "tidy" format), then to achieve the same as above, your one-liner would look something like this:
# import seaborn as sns
sns.relplot(data=df, kind='line', x=x_vals, y=y_vals, row=col_pattern, col=num_weeks_rolling)

Create multiple boxplots from statistics in one graph

I am having trouble finding a solution to plot multiple boxplots created from statistics into one graph.
From another application, I get a Dataframe that contains the different metrics needed to draw boxplots (median, quantile 1, ...). While I am able to plot a single boxplot from these statistics with the following code:
data = pd.read_excel("data.xlsx")
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 6), sharey=True)
row = data.iloc[:, 0]
stats = [{
    "label": i, # not required
    "mean": row["sharpeRatio"], # not required
    "med": row["sharpeRatio_med"],
    "q1": row["sharpeRatio_q1"],
    "q3": row["sharpeRatio_q3"],
    # "cilo": 5.3 # not required
    # "cihi": 5.7 # not required
    "whislo": row["sharpeRatio_min"], # required
    "whishi": row["sharpeRatio_max"], # required
    "fliers": [] # required if showfliers=True
}]
axes.bxp(stats)
plt.show()
I am struggling to create a graph containing boxplots from all the rows in the dataframe. Do you have an idea how to achieve this?
You can pass a list of dictionaries to the bxp method. The easiest way to get such a list from your existing code is to put the dictionary construction inside a function and call it for each row of the dataframe.
Note that data.iloc[:, 0] would be the first column, not the first row.
import matplotlib.pyplot as plt
import pandas as pd
def stats(row):
    return {"med": row["sharpeRatio_med"],
            "q1": row["sharpeRatio_q1"],
            "q3": row["sharpeRatio_q3"],
            "whislo": row["sharpeRatio_min"],
            "whishi": row["sharpeRatio_max"]}
data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]})
fig, axes = plt.subplots()
axes.bxp([stats(data.iloc[i, :]) for i in range(len(data))],
         showfliers=False)
plt.show()
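If the tick labels from the question's "label" entry are also wanted, the same idea can carry a label per row. This is only a sketch: the helper name row_to_stats and the "row N" label text are made up for illustration, and the data is the same toy frame as above.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]})

def row_to_stats(label, row):
    """Build one bxp statistics dict from one dataframe row (hypothetical helper)."""
    return {"label": f"row {label}",  # optional: used as the tick label
            "med": row["sharpeRatio_med"],
            "q1": row["sharpeRatio_q1"],
            "q3": row["sharpeRatio_q3"],
            "whislo": row["sharpeRatio_min"],
            "whishi": row["sharpeRatio_max"]}

# iterrows() yields (index, row) pairs, so each row becomes one box
stats_list = [row_to_stats(idx, row) for idx, row in data.iterrows()]

fig, ax = plt.subplots()
artists = ax.bxp(stats_list, showfliers=False)
```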

how to bond coordinate points after changing values by using a loop in matplotlib

I need to change the coordinate values and then connect the points with a line.
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
data = np.array([
[0, 2],
[0, 3],
[0, 6],
[0, 7],
[7, 9],
[7, 8],
])
list =data.tolist()
x, y = data.T
for x,y in list:
    x+=1
    plt.scatter(x,y,color='red',)
I used a loop to change the values, and everything worked properly up to this point.
plt.plot(x,y,)
I don't know why, but this does not work properly: I couldn't connect the points after changing their coordinates in a loop (it works without the loop).
plt.show(x,y)
This is the graph without using:
for x,y in list:
    x+=1
    plt.scatter(x,y,color='red')
https://i.stack.imgur.com/KUQro.png
I am trying to make this graph with the new coordinate values.
Matplotlib and NumPy are designed to go hand in hand, and arithmetic on arrays is much simpler than on lists, so there is no need to convert to lists.
data = np.array([ # shape of data is (6, 2) - six rows, two columns
[0, 2],
[0, 3],
[0, 6],
[0, 7],
[7, 9],
[7, 8],
])
x, y = data.T # x, y are two Numpy arrays, their shape (6,) because data.T is (2, 6)
# create a figure with two subplots in a single row, "1,2"
# stretch the h-size to accommodate two subplots,
# use exactly the same x limits and ticks on both subplots
fig, (ax0, ax1) = plt.subplots(1,2, figsize=(8,3), sharex=True)
# in one subplot the original data, in the other the modified x
ax0.plot(x, y)
ax1.plot(x+1, y) # simply add 1 to the array x
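As for why the original attempt drew no line: `for x,y in list:` rebinds x and y to single numbers on every pass, so after the loop plt.plot(x, y) receives only the last point. A small sketch that keeps the red markers from the question and connects the shifted points (the name x_shifted is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([
    [0, 2],
    [0, 3],
    [0, 6],
    [0, 7],
    [7, 9],
    [7, 8],
])
x, y = data.T
x_shifted = x + 1  # shift every x coordinate at once, no loop needed

fig, ax = plt.subplots()
ax.scatter(x_shifted, y, color='red')  # red markers, as in the question
ax.plot(x_shifted, y)                  # line joining the shifted points in order
```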

Show all lines in matplotlib line plot

How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))
If the plotted data overlaps, one way to see both series is to increase the linewidth of one line and make it partly transparent, as shown:
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
Using subplots is another good option.
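For the subplot route, pandas can split the columns into separate panels by itself. A minimal sketch, assuming a made-up two-column frame in place of plot_yield_df:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical stand-in for plot_yield_df
df = pd.DataFrame({"A": rng.random(50) * 0.9, "B": rng.random(50) + 1})

# subplots=True draws each column on its own Axes, so no line can hide another
axes = df.plot(subplots=True, sharex=True, figsize=(8, 6))
```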
Problem
The lines are plotted in the order their columns appear in the dataframe. So for example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order
df[df.columns[::-1]].plot()
That has also changed the order in the legend and the color coding.
Solution 2: Reverse z-order
So if that is not desired, you can instead play with the zorder.
ax = df.plot()
lines = ax.get_lines()
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)

pandas boxplot, groupby different ylim in each subplot

I have a dataframe and I would like to plot it as:
>>> X = pd.DataFrame(np.random.normal(0, 1, (100, 3)))
>>> X['NCP'] = np.random.randint(0, 5, 100)
>>> X[X['NCP'] == 0] += 100
>>> X.groupby('NCP').boxplot()
The result is what I want, but all the subplots have the same ylim. This makes it impossible to visualize the result properly. How can I set a different ylim for each subplot?
What you asked for was to set the y axis separately for each axes. I believe that should be ax.set_ylim([a, b]). But every time I ran it for each axes it updated for all.
Because I couldn't figure out how to answer your question directly, I'm providing a workaround.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
X = pd.DataFrame(np.random.normal(0, 1, (100, 3)))
X['NCP'] = np.random.randint(0, 5, 100)
X[X['NCP'] == 0] += 100
groups = X.groupby('NCP')
print(groups.groups.keys())
# This gets a number of subplots equal to the number of groups in a single
# column. you can adjust this yourself if you need.
fig, axes = plt.subplots(len(groups.groups), 1, figsize=[10, 12])
# Loop through each group and plot boxplot to appropriate axis
for i, k in enumerate(groups.groups.keys()):
    group = groups.get_group(k)
    group.boxplot(ax=axes[i], return_type='axes')
plt.show()
See the matplotlib subplots documentation.
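A likely reason set_ylim appeared to update every panel is that the grouped boxplot shares the y axis across subplots by default. Assuming a pandas version whose DataFrameGroupBy.boxplot accepts sharey, the following sketch turns sharing off; the column names "a", "b", "c" and the seeded generator are illustrative choices, not from the original post:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(0, 1, (100, 3)), columns=["a", "b", "c"])
X["NCP"] = rng.integers(0, 5, 100)
# push group 0 far away from the others, as in the question
X.loc[X["NCP"] == 0, ["a", "b", "c"]] += 100

# sharey=False lets every group's subplot pick its own y limits
axes = X.groupby("NCP").boxplot(sharey=False, figsize=(10, 8))
```

With independent axes, a call such as ax.set_ylim(a, b) on one panel should then affect only that panel.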
