How to plot multiple lines in a subplot using Python and Matplotlib

I've been following the solutions provided by Merge matplotlib subplots with shared x-axis. See solution 35. In each subplot there is one line, but I would like to have multiple lines in each subplot. For example, the top plot would show the price of IBM and a 30-day moving average, and the bottom plot would show a 180-day and a 30-day variance.
To plot multiple lines in my other Python programs I used (data).plot(figsize=(10, 7)), where data is a dataframe indexed by date. In the author's solution, however, he uses line0, = ax0.plot(x, y, color='r') to assign the data series (x, y) to the plot. In the case of multiple lines in solution 35, how does one assign a dataframe with multiple columns to the plot?

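To work with pandas plotting, pass the target axes explicitly: (data).plot(ax=ax0) draws every column of the dataframe as its own line on ax0.
To collect the entries of both subplots in a single legend you can use: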
handles0, labels0 = ax0.get_legend_handles_labels()
handles1, labels1 = ax1.get_legend_handles_labels()
ax0.legend(handles=handles0 + handles1, labels=labels0 + labels1)
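As a rough sketch of how this fits together, the example below builds two stacked subplots with a shared x-axis and draws one two-column dataframe on each. The price series is random placeholder data, the column names are made up for illustration, and plt.subplots is used here instead of the gridspec layout from the linked solution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: a random walk standing in for a real price series.
dates = pd.date_range("2020-01-01", periods=400, freq="D")
rng = np.random.default_rng(0)
price = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# One dataframe per subplot; every column becomes one line.
top = pd.DataFrame({"price": price, "30d MA": price.rolling(30).mean()})
bottom = pd.DataFrame({
    "180d var": price.rolling(180).var(),
    "30d var": price.rolling(30).var(),
})

fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True, figsize=(10, 7))
top.plot(ax=ax0)     # two lines on the top axes
bottom.plot(ax=ax1)  # two lines on the bottom axes
```

pandas adds a legend to each axes on its own here; the handle-merging snippet above is only needed if all entries should appear in one legend.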

Related

Creating a single tidy seaborn plot in a 'for' loop

I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1) # Extracts relevant columns from larger dataframe
comp_relflux=comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1) # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure() # A new figure per column, which is why the plots come out separate
    sns.scatterplot(x=(bjd)%1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To combine the x-axis labels and have just one instead of one per row, you can use sharex. By calling plt.subplots() with as many rows as you have columns, you also get a single figure with all the subplots in it. As there is no data available, I used random numbers below to demonstrate this. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
comp_relflux = pd.DataFrame(np.random.rand(100, 4)) #Random data - 4 columns
bjd=np.linspace(0,1,100) # Series of 100 points - 0 to 1
rows=len(comp_relflux.columns) # Use this to get column length = subplot length
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12,6)) # The subplots... sharex is assigned here and I move the size in here from your rcParam as well
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=(bjd)%1, y=comp_relflux[column], color='b', marker='.', ax=ax[i]) # x and y passed by keyword, as newer seaborn versions require
1 output plot with 4 subplots
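The question also asked for no vertical gap between the rows and for a single saved image. A minimal sketch of those two steps follows, assuming the same placeholder data as above. It uses plain Matplotlib ax.scatter in place of sns.scatterplot to keep the dependencies small, and the output file name is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # placeholder data, 4 columns
bjd = np.linspace(0, 1, 100)

rows = len(comp_relflux.columns)
# squeeze=False keeps `axes` two-dimensional even when there is only one column.
fig, axes = plt.subplots(rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.5 * rows))
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")

fig.subplots_adjust(hspace=0)  # remove the vertical space between rows

# Hypothetical output location; one image containing every row.
out_path = os.path.join(tempfile.gettempdir(), "comp_relflux.png")
fig.savefig(out_path, bbox_inches="tight")
```

Scaling the figure height with the number of rows keeps each panel readable whether there is 1 column or 30.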

Matplotlib plotting data that doesn't exist

I am trying to plot three lines on one figure. I have data for three years for three sites and I am simply trying to plot them with the same x-axis and the same y-axis. The first two lines span all three years of data, while the third dataset is usually more sparse. Using the object-oriented axes Matplotlib interface, when I plot my third set of data I get points at the end of the graph that are outside the range of that dataset. My third dataset is structured as tuples of dates and values, such as:
data=
[('2019-07-15', 30.6),
('2019-07-16', 20.88),
('2019-07-17', 16.94),
('2019-07-18', 11.99),
('2019-07-19', 13.76),
('2019-07-20', 16.97),
('2019-07-21', 19.9),
('2019-07-22', 25.56),
('2019-07-23', 18.59),
...
('2020-08-11', 8.33),
('2020-08-12', 10.06),
('2020-08-13', 12.21),
('2020-08-15', 6.94),
('2020-08-16', 5.51),
('2020-08-17', 6.98),
('2020-08-18', 6.17)]
The data ends in August 2020, yet the graph includes points at the end of 2020. This is happening with all my sites; the first two datasets stay the same (knowndf['DATE'] with knowndf['Value1'] and knowndf['Value2'] below).
Here is the problematic graph.
And here is what I have for the plotting:
fig, ax=plt.subplots(1,1,figsize=(15,12))
fig.tight_layout(pad=6)
ax.plot(knowndf['DATE'], knowndf['Value1'],'b',alpha=0.7)
ax.plot(knowndf['DATE'], knowndf['Value2'],color='red',alpha=0.7)
ax.plot(*zip(*data), 'g*', markersize=8) #when i plot this set of data i get nonexistent points
ax.tick_params(axis='x', rotation=45) #rotating for aesthetic
ax.set_xticks(ax.get_xticks()[::30]) #only want every 30th tick instead of every daily tick
I've tried ax.twinx(), but that gives me two y-axes, which doesn't help since I want to use the same x-axis and y-axis for all three sites. I've tried not using the axes approach, but there are things that come with axes that I need for plotting. Please help!
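No answer is recorded for this question. One likely cause, assuming knowndf['DATE'] and the tuples hold date strings rather than datetimes (the "every daily tick" comment suggests so), is that Matplotlib treats strings as categories. Any string not already on the axis is appended at the right-hand end instead of being placed by date. The sketch below converts both sources to datetimes first. The knowndf values are random placeholders, the 2018 to 2020 range is an assumption, and only a few of the tuples from the question are used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder for the two dense datasets (assumed daily, stored as strings).
rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", "2020-12-31", freq="D")
knowndf = pd.DataFrame({
    "DATE": days.strftime("%Y-%m-%d"),
    "Value1": rng.random(len(days)) * 30,
    "Value2": rng.random(len(days)) * 30,
})
# A few of the sparse tuples from the question.
data = [("2019-07-15", 30.6), ("2019-07-16", 20.88),
        ("2020-08-17", 6.98), ("2020-08-18", 6.17)]

# Convert every date to a real datetime so positions come from the calendar.
knowndf["DATE"] = pd.to_datetime(knowndf["DATE"])
sparse = pd.DataFrame(data, columns=["DATE", "Value"])
sparse["DATE"] = pd.to_datetime(sparse["DATE"])

fig, ax = plt.subplots(figsize=(15, 12))
ax.plot(knowndf["DATE"], knowndf["Value1"], "b", alpha=0.7)
ax.plot(knowndf["DATE"], knowndf["Value2"], color="red", alpha=0.7)
ax.plot(sparse["DATE"], sparse["Value"], "g*", markersize=8)

# With a date axis, a locator replaces the ax.get_xticks()[::30] slicing.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", rotation=45)
```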

Plot specific markers on plot in Python

I have a plot with time on the x-axis and values in percentages on the y-axis. The plot is drawn from a dataframe. As I need to review many plots, it would be good to insert some pointers in a different color.
For example, each graph starts the timeline at 08:00 and finishes at 20:00. I would need a red marker at 12:00.
I have tried the following.
graph_df is a dataframe that contains two columns: one with time and one with percentage data.
df = graph_df.loc[graph_df['time'] == "12:00"]
graph_df.plot(x="time", y="percentage", linewidth=1, kind='line')
plt.plot(df['time'], df['percentage'], 'o-', color='red')
plt.show()
plt.savefig(graph_name)
With this code I get the marker at the correct percentage for 12:00, but always at the start of the timeline. In my case, the red dot is drawn at 08:00, but with the right percentage.
Any idea why it's not correctly marked?
Thank you.
Converting the strings to datetime objects should work.
Replacing your first line with
graph_df["time"] = pd.to_datetime(graph_df["time"]).dt.time
df = graph_df[graph_df["time"].apply(lambda time: time.strftime("%H:%M"))=='12:00']
should do the job
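An alternative that leaves the time column as strings is sketched below. It assumes the misplacement comes from the two plotting calls numbering the strings independently: pandas places them at positions 0, 1, 2, ..., while the separate plt.plot call starts a fresh category list in which "12:00" is the first entry. Drawing both the line and the marker with Matplotlib on the same axes lets them share one category mapping. The hourly data and the output path are hypothetical placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: hourly strings from 08:00 to 20:00 with random percentages.
rng = np.random.default_rng(0)
times = [f"{hour:02d}:00" for hour in range(8, 21)]
graph_df = pd.DataFrame({"time": times,
                         "percentage": rng.random(len(times)) * 100})

noon = graph_df.loc[graph_df["time"] == "12:00"]

fig, ax = plt.subplots()
# Both calls go through the same axes, so "12:00" maps to the same x position.
ax.plot(graph_df["time"], graph_df["percentage"], linewidth=1)
(marker,) = ax.plot(noon["time"], noon["percentage"], "o", color="red")

# Hypothetical output path; saving before any plt.show() avoids an empty file.
graph_name = os.path.join(tempfile.gettempdir(), "graph.png")
fig.savefig(graph_name)
```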

python plot how to adjust a lengthy legend [duplicate]

I have a data file which consists of 131 columns and 4 rows. I am plotting it in Python as follows:
df = pd.read_csv('data.csv')
df.plot(figsize = (15,10))
Once it is plotted, all 131 legend entries stack up like a huge tower over the line plots.
Please see the image I got here:
Link to Image (I have clipped it after v82 for better understanding)
I have found some solutions on Stack Overflow (SO) for moving the legend anywhere in the plot, but I could not find any solution for breaking this legend tower into several smaller pieces and stacking them side by side.
I would like my plot to look something like this.
My desired plot:
Any help would be appreciated. Thank you.
You can specify the position of the legend in relative (axes) coordinates using loc, and use the ncol parameter to split the single legend column into multiple columns. To do so, you need the Axes object returned by df.plot:
df = pd.read_csv('data.csv')
ax = df.plot(figsize = (10,7))
ax.legend(loc=(1.01, 0.01), ncol=4)
plt.tight_layout()
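Since data.csv is not available, here is a self-contained sketch of the same idea with random placeholder data. The column names v1 to v131 and the choice of six legend columns are assumptions for illustration; bbox_to_anchor is used as one way to place the legend just outside the axes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder for data.csv: 4 rows, 131 columns with hypothetical names.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 131)),
                  columns=[f"v{i}" for i in range(1, 132)])

ax = df.plot(figsize=(15, 10))
# Anchor the legend's upper-left corner just right of the axes, in 6 columns.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.01, 1.0),
                   ncol=6, fontsize="small")

# bbox_inches="tight" makes the saved image include the outside legend.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png", bbox_inches="tight")
```

More columns make the legend shorter but wider, so ncol is worth tuning against the figure width.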

Set x axis locator at hour intervals on matplotlib subplot

I am trying to create a figure with four subplots using the Matplotlib object-oriented approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine and I get date/time on each x-axis. But what I want is to have just hour markers on each x-axis. When I add code in the loop to set the major locator, the x-axis labels are wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening, as each pass through the loop should be addressing a different axis object. Note that the x-axis time ranges are different, so it is not a simple matter of sharing the x-axis across the subplots.
fig, ax = plt.subplots(nrows=2, ncols=2)
i=0
hours = mdates.HourLocator(interval = 1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp,'k-',markersize = 0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i=i+1
I expected to get hourly markers on each of the subplots, but ended up with hourly markers on just the last plot.
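No answer is recorded for this question. A likely explanation is that a Matplotlib locator instance keeps a reference to a single axis, so the one shared hours object ends up bound only to the last axis it was assigned to. The sketch below creates a fresh HourLocator inside the loop instead. The four dataframes are random placeholders with different time ranges, and their start times are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: four dataframes covering different six-hour windows.
rng = np.random.default_rng(0)
starts = ["2021-03-01 00:00", "2021-03-02 06:00",
          "2021-03-03 12:00", "2021-03-04 18:00"]
dfs = [
    pd.DataFrame({"TS_comp": rng.random(37)},
                 index=pd.date_range(start, periods=37, freq="10min"))
    for start in starts
]

fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, df in zip(axes.flat, dfs):
    ax.plot(df.index, df["TS_comp"], "k-", linewidth=0.5)
    # A new locator and formatter per axes, rather than one shared instance.
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
```

Iterating over zip(axes.flat, dfs) also removes the manual i counter and avoids reusing the name ax for both the array of axes and the loop variable.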
