How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))
If the plotted data overlaps, one way to see both series is to increase the linewidth of the line underneath and make it semi-transparent, as shown:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
plt.show()
Subplots are another good way.
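For a DataFrame such as plot_yield_df, pandas can put each column in its own subplot via `subplots=True`, so overlapping lines never hide each other. A minimal sketch; the DataFrame below is a hypothetical stand-in for plot_yield_df with two identical columns:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical stand-in for plot_yield_df: two columns with identical values
df = pd.DataFrame({'Original': [5, 8, 6, 9, 4],
                   'Predicted': [5, 8, 6, 9, 4]}, index=np.arange(5))

# subplots=True draws every column on its own Axes; sharex aligns their x axes
axes = df.plot(subplots=True, sharex=True, figsize=(8, 6))
plt.show()
```

The trade-off is that the series are no longer directly superimposed, so small differences between them are harder to spot than with the transparency approach.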
Problem
The lines are plotted in the order their columns appear in the dataframe, so later columns are drawn on top of earlier ones. For example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order:
df[df.columns[::-1]].plot()
This also changes the order of the legend entries and the color assigned to each column.
Solution 2: Reverse z-order
If that is not desired, you can instead adjust the zorder:
ax = df.plot()
lines = ax.get_lines()
# give the first column the highest zorder so it is drawn on top
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)
plt.show()
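If you only want to bring one particular column to the front, you can look its line up by label, since pandas labels each line with its column name. A sketch under that assumption, reusing the "A"/"B" data from above; the legend order and colors stay unchanged:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

a = np.random.rand(400) * 0.9
b = np.random.rand(400) + 1
df = pd.DataFrame({"A": np.c_[a, -a].flatten(), "B": np.c_[b, -b].flatten()})

ax = df.plot()
# find the highest zorder currently in use among the plotted lines
top = max(line.get_zorder() for line in ax.get_lines())
for line in ax.get_lines():
    if line.get_label() == "A":
        line.set_zorder(top + 1)  # draw column "A" above every other line
plt.show()
```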
Related
Context: I'd like to plot multiple subplots (separated by legend entry) based on patterns in the column names of a dataframe, inside a subplot; however, I can't manage to split each subplot into another set of subplots.
This is what I have:
import matplotlib.pyplot as plt
col_patterns = ['pattern1','pattern2']
# define subplot grid
fig, axs = plt.subplots(nrows=len(col_patterns), ncols=1, figsize=(30, 80))
plt.subplots_adjust()
fig.suptitle("Title", fontsize=18, y=0.95)
for col_pat,ax in zip(col_patterns,axs.ravel()):
    col_pat_columns = [col for col in df.columns if col_pat in col]
    df[col_pat_columns].plot(x='Week',ax=ax)
    # chart formatting
    ax.set_title(col_pat.upper())
    ax.set_xlabel("")
Which results in something like this:
How could I make each of those subplots turn into another 6 subplots, all laid out horizontally? (i.e. each legend entry would be its own subplot)
Thank you!
In your example, you're defining a 2x1 grid of subplots and looping only through the two axes objects it creates. In each of the two iterations you call df[col_pat_columns].plot(x='Week',ax=ax); since col_pat_columns is a list, you're plotting multiple columns from your dataframe on the same axes. That's why you get multiple series on a single plot.
@fdireito is correct: you just need to set the ncols argument of plt.subplots() to the number you need, but you'd also need to adjust your loops accordingly.
If you want to stay in matplotlib, then here's a basic example. I had to take some guesses as to how your dataframe was structured and so on.
# import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# create some fake data
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
'a':[1, 1, 1, 1, 1], # horizontal line
'b':[3, 6, 9, 6, 3], # pyramid
'c':[4, 8, 12, 16, 20], # steep line
'd':[1, 10, 3, 13, 5] # zig-zag
})
# a list of lists, where each inner list is a set of
# columns we want in the same row of subplots
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
The following is a simplified example of what your code ends up doing.
fig, axes = plt.subplots(len(col_patterns), 1)
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
2x1 subplot (what you have right now)
I use enumerate() with col_patterns to iterate through the subplot rows, and then use enumerate() with each column name in a given pattern to iterate through the subplot columns.
# the following will size your subplots according to
# - number of different column patterns you want matched (rows)
# - largest number of columns in a given column pattern (columns)
subplot_rows = len(col_patterns)
subplot_cols = max(len(p) for p in col_patterns)
fig, axes = plt.subplots(subplot_rows, subplot_cols)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col])
Correctly sized subplot
Here's all the code, with a couple additions I omitted from the code above for simplicity's sake.
import matplotlib.pyplot as plt
import pandas as pd
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a':[1, 1, 1, 1, 1], # horizontal line
    'b':[3, 6, 9, 6, 3], # pyramid
    'c':[4, 8, 12, 16, 20], # steep line
    'd':[1, 10, 3, 13, 5] # zig-zag
})
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
# what you have now
fig, axes = plt.subplots(len(col_patterns), 1, figsize=(12, 8))
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
    ax.legend(pat, loc='upper left')
# what I think you want
subplot_rows = len(col_patterns)
subplot_cols = max(len(p) for p in col_patterns)
fig, axes = plt.subplots(subplot_rows, subplot_cols, figsize=(16, 8), sharex=True, sharey=True, tight_layout=True)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col], label=col)
        axes[nrow][ncol].legend(loc='upper left')
plt.show()
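If the column lists come from substring patterns, as in the question, the same idea can be sketched as below. The column names here are hypothetical; `squeeze=False` keeps `axes` two-dimensional even with a single pattern, and leftover axes are hidden when a pattern matches fewer columns than the widest row:

```python
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data: a 'Week' column plus columns whose names contain the patterns
df = pd.DataFrame({
    'Week': [1, 2, 3, 4, 5],
    'pattern1_a': [1, 1, 1, 1, 1],
    'pattern1_b': [3, 6, 9, 6, 3],
    'pattern1_c': [4, 8, 12, 16, 20],
    'pattern2_a': [1, 10, 3, 13, 5],
    'pattern2_b': [2, 4, 6, 8, 10],
})
col_patterns = ['pattern1', 'pattern2']

# same substring matching as in the question: one list of columns per pattern
groups = [[c for c in df.columns if pat in c] for pat in col_patterns]
n_rows = len(groups)
n_cols = max(len(g) for g in groups)

# squeeze=False keeps `axes` 2-D even when there is only one row or column
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False, sharex=True,
                         figsize=(4 * n_cols, 3 * n_rows), tight_layout=True)
for r, (pat, cols) in enumerate(zip(col_patterns, groups)):
    for c, col in enumerate(cols):
        axes[r, c].plot(df['Week'], df[col], label=col)  # one series per subplot
        axes[r, c].legend(loc='upper left')
    axes[r, 0].set_ylabel(pat.upper())
    # hide leftover axes when this pattern matches fewer columns than n_cols
    for c in range(len(cols), n_cols):
        axes[r, c].set_visible(False)
plt.show()
```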
Another option you can consider is ditching matplotlib and using Seaborn's relplot. The relplot documentation has several examples that should help. If your dataframe is set up correctly (long or "tidy" format), then to achieve the same as above, your one-liner would look something like this:
import seaborn as sns
# 'x_vals', 'y_vals', 'col_pattern' and 'num_weeks_rolling' are placeholder column names in a long-format df
sns.relplot(data=df, kind='line', x='x_vals', y='y_vals', row='col_pattern', col='num_weeks_rolling')
I am trying to plot multiple figures on a single pane using matplotlib.pyplot's subplot. Here is my current code.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"col1": [1,2], "col2": [3,4], "col3": [5,6], "col4": [7,8], "target": [9,10]})
f, axs = plt.subplots(nrows = 2, ncols = 2, sharey = True)
# for ax in axs.flat:
#     ax.label_outer()
for k, col in enumerate(df.columns):
    if col != "target":
        idx = np.unravel_index(k, (2,2))
        axs[idx].scatter(df[col], df.target)
        axs[idx].set_xlabel(col)
As it stands, with those two lines commented out, all the xticks are shown, but only the bottom two plots show their xlabels.
If I uncomment those two lines, all the xlabels appear, but the xticks on the top row disappear. I think this is because the space has been 'freed up' by the label_outer() function.
I can't see how to have both on the top row. If I print out all the xlabels, they are indeed all set.
Any help would be most appreciated!
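You just need to call plt.tight_layout() after your loop; it recomputes the spacing between subplots so the top row's xlabels are no longer covered by the bottom row. See the Tight Layout guide in the matplotlib documentation for more about its options and capabilities.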
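Put together with the question's own code, a runnable version might look like this. It is a sketch of the suggested fix only; iterating over the non-target columns directly is a small simplification of the original `if col != "target"` check:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6],
                   "col4": [7, 8], "target": [9, 10]})

f, axs = plt.subplots(nrows=2, ncols=2, sharey=True)
features = [c for c in df.columns if c != "target"]
for k, col in enumerate(features):
    idx = np.unravel_index(k, (2, 2))  # flat index -> (row, column) in the grid
    axs[idx].scatter(df[col], df.target)
    axs[idx].set_xlabel(col)

# make room for every xlabel and tick label, including those on the top row
plt.tight_layout()
plt.show()
```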
I want this to print out two histograms (of the first two columns), but this instead stacks the histograms within the same plot. How do I get it to output two separate histograms?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataobj = pd.DataFrame([[1,2,3],[3,4,5],[6,7,8]])
for i in [0,1]:
    a = np.array(dataobj.iloc[:,i])
    plt.hist(a,bins = np.linspace(0,10,11))
Even better would be a solution where I can save the plots into an array which I could later call to display them.
Working in Jupyter
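import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataobj = pd.DataFrame([[1, 2, 3], [3, 4, 5], [6, 7, 8]])
# pass figsize here: changing rcParams after the figure exists has no effect on it
fig, axes = plt.subplots(3, 1, figsize=(12, 12))
for i in range(3):
    a = np.array(dataobj.iloc[:, i])
    axes[i].hist(a, bins=np.linspace(0, 10, 11))
plt.show()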
You need to use axes: create one Axes per histogram and call hist on each of them.
Alternatively, just call plt.show() inside the for loop; you don't need subplots or axes. Like this:
dataobj = pd.DataFrame([[1,2,3],[3,4,5],[6,7,8]])
for i in [0,1]:
    a = np.array(dataobj.iloc[:,i])
    plt.hist(a,bins = np.linspace(0,10,11))
    plt.show()
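The question also asks for a way to keep the plots in an array and display them later. One possible sketch, assuming a separate Figure per column is acceptable: collect the Figure objects in a list and close them so pyplot does not show them immediately. In Jupyter, a closed Figure can still be rendered later with IPython's display():

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dataobj = pd.DataFrame([[1, 2, 3], [3, 4, 5], [6, 7, 8]])
bins = np.linspace(0, 10, 11)

# one independent Figure per column, kept in a list for later use
figures = []
for i in [0, 1]:
    fig, ax = plt.subplots()
    ax.hist(dataobj.iloc[:, i], bins=bins)
    ax.set_title(f"column {i}")
    figures.append(fig)
    plt.close(fig)  # keep pyplot from showing it right away

# later, e.g. in a Jupyter cell:
# from IPython.display import display
# display(figures[1])
```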
I am learning Altair to add interactivity to my plots. I am trying to recreate a plot I made in matplotlib, but Altair is adding noise to my curves.
This is my dataset, df1, linked here from GitHub: https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv
This is the code:
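import matplotlib.pyplot as plt
import pandas as pd
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
fig, ax = plt.subplots(figsize=(8, 6))
# group by the column name (not a one-element list) so each key is a plain label
for key, grp in df1.groupby('Name'):
    y = grp.logabsID
    x = grp.VG
    ax.plot(x, y, label=key)
plt.legend(loc='best')
plt.show()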
#doing it directly from link
df1='https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv'
import altair as alt
alt.Chart(df1).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N'
)
Here is the image of the plots I am generating:
matplotlib vs altair plot
How do I remove the noise from altair?
Altair sorts each line's points by their x value before drawing, so if one group contains multiple lines, the result will often look like "noise", as you call it. This is not noise, but rather an accurate representation of all the points in your dataset shown in the default sort order. Here is a simple example:
import numpy as np
import pandas as pd
import altair as alt
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q'
)
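To see where the zig-zag comes from without Altair, the sketch below imitates the default x sort with pandas and draws both orders with matplotlib. It only approximates Altair's behaviour; a stable sort is assumed for tied x values, and Vega-Lite's tie-breaking may differ:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})

# sorting every row by x (stable, so tied rows keep their order) interleaves the groups
by_x = df.sort_values('x', kind='stable')
print(by_x['y'].tolist())  # [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]

fig, (ax_rows, ax_sorted) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
ax_rows.plot(df['x'], df['y'])        # row order: what matplotlib draws
ax_rows.set_title('row order')
ax_sorted.plot(by_x['x'], by_x['y'])  # x order: roughly what Altair draws
ax_sorted.set_title('sorted by x')
plt.show()
```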
The best way to fix this is to set the detail encoding to a column that distinguishes between the different lines that you would like to be drawn individually:
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q',
detail='group:N'
)
If it is not the grouping that is important, but rather the order of the points, you can specify that by instead providing an order channel:
alt.Chart(df.reset_index()).mark_line().encode(
x='x:Q',
y='y:Q',
order='index:Q'
)
Notice that the two lines are connected on the right end. This is effectively what matplotlib does by default: it maintains the index order even if there is repeated data. Using the order channel for your data produces the result you're looking for:
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
alt.Chart(df1.reset_index()).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N',
order='index:Q'
)
The multiple lines in each group are drawn in order connected at the ends, just as they are in matplotlib.
I have a pandas dataframe and I want to create a plot of it:
import pandas as pd
from matplotlib.ticker import MultipleLocator, FormatStrFormatter, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
df.plot(kind='barh')
Nice, everything works as expected:
Now I want to hide some of the ticks on the y axis. Looking at the docs, I thought I could achieve it with:
MaxNLocator: Finds up to a max number of intervals with ticks at nice locations.
MultipleLocator: Ticks and range are a multiple of base; either integer or float.
But neither of them plots what I expected (the values on the y axis do not show the correct numbers):
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MaxNLocator(3))
What do I do wrong?
Problem
The problem occurs because pandas bar plots are categorical. Each bar is positioned at a successive integer value starting at 0; only the labels are adjusted to show the actual dataframe index. So here you have a FixedLocator with values 0, 1, 2, 3, ... and a FixedFormatter with values -5, -4, -3, .... Changing the locator alone does not change the formatter, hence you get the numbers -5, -4, -3, ... but at different locations (one tick lies outside the visible range, hence the labels start at -4 here).
A. Pandas solution
In addition to setting the locator you would need to set a formatter, which returns the correct values as function of the location. In the case of a dataframe index with successive integers as used here, this can be done by adding the minimum index to the location using a FuncFormatter. For other cases, the function for the FuncFormatter may become more complicated.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import (MultipleLocator, MaxNLocator,
FuncFormatter, ScalarFormatter)
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
sf = ScalarFormatter()
sf.create_dummy_axis()
sf.set_locs((df.index.max(), df.index.min()))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x,p: sf(x+df.index[0])))
plt.show()
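For an index that is not a run of successive integers, one possible FuncFormatter simply looks the label up by bar position. A sketch, assuming a hypothetical letter index; `index_label` is an illustrative helper name, and positions that fall outside the data get an empty label:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# hypothetical non-numeric index, where "add the minimum index" cannot work
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1],
                  index=list('abcdefghijk'))

def index_label(x, pos):
    """Map a bar position (0, 1, 2, ...) back to its dataframe index label."""
    i = int(round(x))
    return str(df.index[i]) if 0 <= i < len(df.index) else ''

ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax.yaxis.set_major_formatter(FuncFormatter(index_label))
plt.show()
```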
B. Matplotlib solution
Using matplotlib, the solution is potentially easier. Since matplotlib bar plots are numeric in nature, they position the bars at the locations given to the first argument. Here, setting a locator alone is sufficient.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MultipleLocator, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
fig, ax = plt.subplots()
ax.barh(df.index, df.values[:,0])
ax.yaxis.set_major_locator(MultipleLocator(2))
plt.show()
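The same numeric approach works with the MaxNLocator from the question. A sketch; `integer=True` is an optional MaxNLocator argument used here so that ticks only land on whole-number positions:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator

df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))

fig, ax = plt.subplots()
ax.barh(df.index, df.values[:, 0])  # bars sit at the real index values -5 ... 5
# at most 3 intervals; integer=True keeps ticks on whole-number positions
ax.yaxis.set_major_locator(MaxNLocator(3, integer=True))
plt.show()
```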