When I plot single plots with pandas dataframes, I get an x-axis with numbers.
However, when I make subplots with a shared x-axis, the way I would with NumPy arrays without pandas, there are no number labels.
I only want the numbers and the label to appear on the last plot, since all the plots share the same x-axis.
The data loaded and the plot produced can be found here:
https://drive.google.com/open?id=1hTmTSkIcYl-usv_CCxLl8U6bAoO6tMRh
This is for combining and plotting the data logged from two different logging devices which represent the same time period.
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('data1.csv', sep=',', header=0)
cols1 = list(df1.columns.values)
df2 = pd.read_csv('data2.dat', sep='\t', header=18)
cols2 = list(df2.columns.values)
start = 10000
stop = 30000
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 10))
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[1], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[2], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[3], ax=axes[2])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[4], ax=axes[2])
df2.iloc[start:stop].plot(x=cols2[0], y=cols2[3], ax=axes[3])
axes[-1].set_xlabel("Time [s]")  # ax3 was never defined; label the bottom subplot
plt.show()
I expect there to be numbers and a label on the x-axis, but instead it only shows the pandas label "#timestamp".
UPDATE: I have found something that hints at the problem. I think it is due to the two files not having identical time spacing: the first column of each file is time, and they are roughly 1 sample per second but not exactly. If I remove the x=cols[x] parts, numbers do show on the x-axis, but then there is a time shift between the two plots, as they are plotted against the dataframe index rather than against time.
I am currently trying to interpolate the data so that they share the same x-axis, but I would not have expected that to be necessary.
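A minimal sketch of plotting both loggers against their own time columns, assuming each file's first column holds numeric seconds (if it is a timestamp string, convert it with pd.to_datetime first). It calls Matplotlib's Axes.plot directly instead of the pandas DataFrame.plot wrapper; the two DataFrames, their column names and the jittered sample times are made up for illustration. Because the shared x-axis is in data units, samples taken at slightly different times still line up, without interpolating onto a common grid.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for data1.csv and data2.dat: both loggers cover the same
# period at roughly one sample per second, with slightly different jittered spacing.
rng = np.random.default_rng(0)
t1 = np.cumsum(rng.uniform(0.9, 1.1, 500))
t2 = np.cumsum(rng.uniform(0.95, 1.05, 480))
df1 = pd.DataFrame({"time": t1, "a": np.sin(t1 / 20), "b": np.cos(t1 / 20)})
df2 = pd.DataFrame({"time": t2, "c": np.sin(t2 / 30)})

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 6))

# Each series is drawn against its own time column, so the shared x-axis is in
# seconds rather than in row numbers; no resampling onto a common grid is needed.
axes[0].plot(df1["time"], df1["a"], label="a")
axes[0].plot(df1["time"], df1["b"], label="b")
axes[1].plot(df2["time"], df2["c"], label="c")
for ax in axes:
    ax.legend(loc="upper right")

# With sharex=True only the bottom subplot shows x tick labels, so label it alone.
axes[-1].set_xlabel("Time [s]")
fig.tight_layout()
```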
I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot(x=(bjd) % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To combine the x-axis labels and have just one instead of one for each row, you can use sharex. Also, by using plt.subplots() with as many rows as you have columns, you get a single figure with all the subplots in it. As there is no data available, I used random numbers below to demonstrate. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Use this to get column length = subplot length
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # The subplots... sharex is assigned here and I moved the size in here from your rcParams as well
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=(bjd) % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
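A sketch extending that answer to the 1 to 30 column case and removing the vertical gaps, using the same random stand-in data. It swaps Matplotlib's Axes.scatter in for sns.scatterplot to keep dependencies minimal (sns.scatterplot(x=..., y=..., ax=ax) should slot into the loop the same way). squeeze=False keeps the axes indexable as a 2-D array even when there is only one column, and subplots_adjust(hspace=0) closes the space between rows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in data, as in the answer above: 4 columns, 100 points.
comp_relflux = pd.DataFrame(np.random.rand(100, 4))
bjd = np.linspace(0, 1, 100)

rows = len(comp_relflux.columns)
# squeeze=False always returns a 2-D array of Axes, so the indexing below also
# works when only one column needs plotting (rows == 1).
fig, axes = plt.subplots(rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.0 * rows))
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")
    ax.set_ylabel(str(column))

# No vertical space between rows, so the panels touch each other.
fig.subplots_adjust(hspace=0)
axes[-1, 0].set_xlabel("bjd % 1")
```

fig.savefig("relflux.png") then writes all rows as a single image (the file name is arbitrary). Avoid calling tight_layout afterwards, since it recomputes hspace.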
Is there an easy way to align two subplots of a time series of different kinds (plot and barplot) in matplotlib? I use the pandas wrapper since I am dealing with pd.Series objects:
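import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# pd._testing.makeTimeSeries() is a private testing helper that may not exist in your pandas version;
# this builds an equivalent series (random values on business days)
series = pd.Series(np.random.randn(30), index=pd.date_range("2000-01-03", periods=30, freq="B"))
fig, axes = plt.subplots(2, 1)
series.head(3).plot(marker='o', ax=axes[0])
series.head(3).plot.bar(ax=axes[1])
plt.tight_layout()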
The result is not visually great. While keeping the code simple, it would be great to:
Vertically align the data points in the top plot with the bars in the bottom plot
Share the x-axis of the bar plot with the top plot and hide the top plot's x-axis labels altogether (but keep grids whenever present)
Based on the ideas thrown in the comments, I think that this is the simplest solution (giving up the pandas API), which is exactly what I needed:
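import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# stand-in for the private pd._testing.makeTimeSeries() helper
series = pd.Series(np.random.randn(30), index=pd.date_range("2000-01-03", periods=30, freq="B"))
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.head(3), marker='o')
axes[1].bar(series.head(3).index, series.head(3))
plt.tight_layout()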
Possibly with an extra fix on the xticks for cases with missing values, where the xticks are not placed daily (e.g. plt.xticks(series.head(3).index)).
Thanks for the help!
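If you would rather keep the pandas bar plot, a possible alternative is to place the line's points at the bar positions instead of at the dates. This sketch relies on pandas drawing bar plots at integer positions 0..n-1 (one bar per index entry), and again builds the series by hand rather than with the private makeTimeSeries helper.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for pd._testing.makeTimeSeries(): 30 random values on business days.
series = pd.Series(np.random.randn(30),
                   index=pd.date_range("2000-01-03", periods=30, freq="B"))
s = series.head(3)

fig, axes = plt.subplots(2, 1, sharex=True)

# pandas draws the bars at x = 0, 1, ..., n-1, so the line uses the same positions.
positions = np.arange(len(s))
axes[0].plot(positions, s.to_numpy(), marker="o")
axes[0].grid(True)  # the grid stays visible even though the top tick labels are hidden

s.plot.bar(ax=axes[1])
# Replace pandas' default timestamp labels with plain dates.
axes[1].set_xticklabels(s.index.strftime("%Y-%m-%d"), rotation=0)
fig.tight_layout()
```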
I am trying to create a figure with four subplots using the Matplotlib object-oriented approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code, the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine and I get date/time on each x-axis. But what I want is to have just hour markers on each of the x-axes. When I add code in the loop to set the major locator, the x-axis labels are wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening, as each pass through the loop should address a different axis object. Note that the x-axis time ranges are different, so it is not simply a matter of sharing the x-axis across the subplots.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=2, ncols=2)
i = 0
hours = mdates.HourLocator(interval=1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp, 'k-', markersize=0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i = i + 1
I expected to get hourly markers on each of the subplots, but ended up with hourly markers on just the last plot.
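A likely cause is that the single HourLocator instance is shared by all four axes: set_major_locator binds the locator to the axis it is set on, so after the loop it belongs to the last axis and computes ticks from that axis's time range, which lies outside the other subplots' ranges. A minimal sketch of the fix creates a new locator (and formatter) for every axis inside the loop; the four DataFrames here are synthetic stand-ins for dfs, each covering a different day.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the four CSV files: six hours of minute data,
# each on a different day, so the x ranges do not overlap.
dfs = []
for day in range(4):
    idx = pd.date_range(f"2021-06-0{day + 1} 00:00", periods=6 * 60, freq="min")
    dfs.append(pd.DataFrame({"TS_comp": np.random.randn(len(idx)).cumsum()}, index=idx))

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
for ax, df in zip(axs.flat, dfs):
    ax.plot(df.index, df["TS_comp"], "k-")
    # A locator attaches itself to one Axis, so each subplot gets its own
    # HourLocator instead of all four sharing a single instance.
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
fig.tight_layout()
```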
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
From each file, I build monthly datasets, which I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same for the next file.
I have this part figured out; however, I would like to add a visual cue to the plots indicating which data source they come from. As far as I understand it (and have tried it),
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to all 12 subplots. Looping through the plots won't work in this situation, since I want the ylabel only on the leftmost plots and the xlabels only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure, and that color is ignored when I save it to a PNG because it only applies to the screen output.
So is there a simple setthecolorofallbars='Red' option for plt.hist(…) or plt.savefig(…) that I am missing, or should I just copy and paste it into all twelve months?
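You can use mpl.rc("axes", prop_cycle=mpl.cycler(color=["red"])) to set the default color cycle for all your axes (older Matplotlib releases used the axes.color_cycle parameter, which has since been replaced by axes.prop_cycle).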
In this little toy example, I use the with mpl.rc_context block to limit the effects of mpl.rc to just the block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    mpl.rc("axes", prop_cycle=mpl.cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8, 8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as you can then access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig, ax = plt.subplots(2, 2)
# some data
data = numpy.random.randn(4, 1000)
# some titles
title = ['Jan', 'Feb', 'Mar', 'April']
xlabel = ['xlabel1', 'xlabel2']
ylabel = ['ylabel1', 'ylabel2']
for i in range(ax.size):
    a = ax[i // 2, i % 2]  # integer division: array indices must be ints in Python 3
    a.hist(data[i], facecolor='r', bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j, 0].set_ylabel(ylabel[j])
# write the xlabels on all axes on the bottom
for j in range(ax.shape[1]):
    ax[-1, j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) that are not constant can be put into lists and placed on the appropriate axis.
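Putting the two answers together, here is a sketch that chooses the bar color once per data source and reuses it for all twelve histograms, while labelling only the left column and the bottom row. The SOURCE_COLORS mapping, the source names, the plot_monthly function and the random data are all hypothetical; replace them with however you identify your four sources. The color is set on the bars themselves, so it also ends up in the saved PNG.

```python
import calendar
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical mapping from data source to bar color; pick one per file.
SOURCE_COLORS = {"source_a": "red", "source_b": "tab:blue",
                 "source_c": "tab:green", "source_d": "tab:orange"}

def plot_monthly(monthly_data, source):
    """Draw 12 monthly histograms in a 4x3 grid, all in the source's color."""
    color = SOURCE_COLORS[source]
    fig, axes = plt.subplots(4, 3, figsize=(20, 15))
    for month, (ax, data) in enumerate(zip(axes.flat, monthly_data), start=1):
        ax.hist(data, bins=20, facecolor=color)
        ax.set_title(calendar.month_name[month])
    for ax in axes[:, 0]:   # leftmost column only
        ax.set_ylabel("value count")
    for ax in axes[-1, :]:  # bottom row only
        ax.set_xlabel("value")
    fig.tight_layout()
    return fig, axes

rng = np.random.default_rng(1)
fig, axes = plot_monthly([rng.normal(size=200) for _ in range(12)], "source_a")
```

In the file loop, call plot_monthly with that file's twelve monthly datasets and its source, then fig.savefig(...) as before.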
I have a pandas dataframe al_df that contains the population of Alabama from a recent US census. I computed a cumulative distribution function that I plot using seaborn, resulting in this chart:
The code that relates to the plotting is this:
plt.figure(figsize=(20, 10))
plt.title('Cumulative Distribution Function for ALABAMA population')
plt.xlabel('City')
plt.ylabel('Percentage')
#sns.set_style("whitegrid", {"ytick.major.size": "0.1",})
plt.plot(al_df.pop_cum_perc)
My questions are:
1) How can I change the ticks so that the y-axis shows a grid line every 0.1 units instead of the default 0.2?
2) How can I change the x-axis to show the actual names of the cities, plotted vertically, instead of the "rank" of the city (from the pandas index)? (There are over 300 names, so they are not going to fit well horizontally.)
For question 1), add (with NumPy imported as np):
plt.yticks(np.arange(0,1+0.1,0.1))
Question 2), I found this in the matplotlib gallery:
ticks_and_spines example code
The matplotlib way would be to use MultipleLocator. The second one is also straightforward:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
plt.plot(range(10))
ax = plt.gca()
ax.yaxis.set_major_locator(MultipleLocator(0.5))
plt.xticks(range(10), list('ABCDEFGHIJ'), rotation=90)  # would be range(3xx), List_of_city_names, rotation=90
plt.savefig('temp.png')
After some research, and not being able to find a "native" Seaborn solution, I came up with the code below, partially based on Pablo Reyes's and CT Zhu's suggestions and using matplotlib functions:
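import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator
plt.figure(figsize=(20, 10))
plt.title('Cumulative Distribution Function for ALABAMA population')
plt.xlabel('City')
plt.ylabel('Percentage')
plt.plot(al_df.pop_cum_perc)
#set the tick size of y axis
ax = plt.gca()
ax.yaxis.set_major_locator(MultipleLocator(0.1))
#set the labels of x axis and text orientation
ax.xaxis.set_major_locator(MultipleLocator(10))
# name each tick after the city at that position (assumes al_df has the default 0..N-1 index);
# set_xticklabels(labels) would instead hand labels[0], labels[1], ... to ticks 0, 10, 20, ...
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: labels[int(round(x))] if 0 <= round(x) < len(labels) else ''))
ax.tick_params(axis='x', labelrotation=90)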
The solution introduced a new element "labels" which I had to specify before the plot, as a new Python list created from my Pandas dataframe:
labels = al_df.NAME.values[:]
Producing the following chart:
This requires some tweaking, since displaying every city in the pandas data frame, like this:
ax.xaxis.set_major_locator(MultipleLocator(1))
produces an unreadable chart (only the x-axis is displayed):
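One way to tame the 300+ labels is to derive the tick spacing from how many names fit, and to place each label at the position of the city it names. A sketch under the assumption that al_df has a default 0..N-1 index; the city names, max_labels and the cumulative values are made up. It sets the ticks explicitly with set_xticks so each label stays aligned with its tick, and also draws a y grid line every 0.1 via MultipleLocator.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Hypothetical stand-in for al_df: 320 made-up city names and a cumulative share.
n = 320
al_df = pd.DataFrame({"NAME": [f"City {i}" for i in range(n)],
                      "pop_cum_perc": np.linspace(0.01, 1.0, n)})

max_labels = 40                     # roughly how many names fit legibly (assumption)
step = max(1, -(-n // max_labels))  # ceiling division: label every step-th city

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(al_df.index, al_df.pop_cum_perc)
ax.yaxis.set_major_locator(MultipleLocator(0.1))  # y tick (and grid line) every 0.1
ax.grid(axis="y")

# Ticks at the positions of every step-th city, labelled with exactly those names.
positions = al_df.index[::step]
ax.set_xticks(positions)
ax.set_xticklabels(al_df.NAME.iloc[::step].tolist(), rotation=90, fontsize=8)
ax.set_xlabel("City")
ax.set_ylabel("Percentage")
fig.tight_layout()
```

Raising max_labels (or the figure width) shows more names at the cost of legibility.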