Pandas - formatting tick labels - python

I have a pandas dataframe with dates in column 0 and times in column 1. I wish to plot the data in columns 2, 3, 4, ..., n as a function of date and time. How do I format the tick labels in the code below so that both the date and the time are displayed in the plot? Thanks in advance. I'm new to Stack Overflow (and to Python, for that matter), so sorry, but I don't have enough reputation to attach the image that I get from my code below.
import pandas as pd
import matplotlib.pyplot as plt

df3 = pd.read_table('filename.txt',
                    sep=',',
                    skiprows=4,
                    na_values=r'N\A',
                    index_col=[0, 1])  # date and time are my indices
# .ix was removed in pandas 1.0; .loc is the label-based replacement
datedf = df3.loc[['01:07:2013'], ['AOT_1640', 'AOT_870']]
fig, axes = plt.subplots(nrows=2, ncols=1)
for i, c in enumerate(datedf.columns):
    print(i, c)
    datedf[c].plot(ax=axes[i], figsize=(12, 10), title=c)
plt.savefig('testing123.png', bbox_inches='tight')

You could combine columns 0 and 1 into a single date-and-time column and set that as your index; the pandas .plot method will then automatically use the index for the x-tick labels. It is hard to say how this will work with your data set as I can't see it, but the main point is that pandas uses the index for the x-tick labels unless you tell it not to. Be warned that this doesn't work well with hierarchical indexing (at least in my very limited experience).
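A minimal sketch of that idea, assuming the dates are written day:month:year (like the '01:07:2013' value in the question) and the times as HH:MM:SS. The inline sample data and the column names date and time are hypothetical stand-ins for the real file:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample standing in for 'filename.txt'; the real layout may differ.
raw = io.StringIO(
    "date,time,AOT_1640,AOT_870\n"
    "01:07:2013,00:00:00,0.10,0.20\n"
    "01:07:2013,06:00:00,0.12,0.22\n"
    "01:07:2013,12:00:00,0.15,0.25\n"
)
df = pd.read_csv(raw)

# Join the two text columns and parse them into one timestamp per row.
# The format string is an assumption about how the dates are written.
df.index = pd.to_datetime(df["date"] + " " + df["time"],
                          format="%d:%m:%Y %H:%M:%S")
df = df.drop(columns=["date", "time"])

# With a DatetimeIndex, pandas labels the x-axis with dates and times.
axes = df.plot(subplots=True, figsize=(12, 10))
```

Because the index is now a single DatetimeIndex rather than a two-level one, the tick labels come from one timestamp per row.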

Related

Creating a single tidy seaborn plot in a 'for' loop

I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To have a single set of x-axis labels instead of one per row, you can use sharex. Also, by calling plt.subplots() with as many rows as you have columns, you get just one figure containing all the subplots. As there is no data available, I used random numbers below to demonstrate. There are 4 columns of data in my dataframe, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplots
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # The subplots... sharex is assigned here and I move the size in here from your rcParam as well
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
(Output: one figure with 4 subplots.)
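The question also asks for no vertical spacing and for anywhere between 1 and 30 columns. A rough sketch of how that might be handled, using plain matplotlib scatter in place of seaborn and random placeholder data; the column names C1 to C3 and the per-row height of 1.5 inches are arbitrary assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data; the real comp_relflux and bjd come from the measurements.
rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 3)), columns=["C1", "C2", "C3"])
bjd = np.linspace(0, 1, 100, endpoint=False)

n = len(comp_relflux.columns)
fig, axes = plt.subplots(
    n, 1,
    sharex=True,                 # one shared x-axis for every row
    squeeze=False,               # always a 2-D array, even when n == 1
    figsize=(12, 1.5 * n),       # grow the figure with the number of rows
    gridspec_kw={"hspace": 0},   # remove the vertical gap between rows
)
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")
    ax.set_ylabel(column)
```

Passing squeeze=False avoids the indexing error that ax[i] would raise when there is only one column, and fig.savefig(...) then writes all rows to a single image.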

I have a large data set where the rows are a series of coordinates and need to plot specific rows

I have a very large dataset of coordinates that I need to plot, and I want to select specific rows in code instead of editing the raw Excel file.
The data is organized like so:
frames xsnout ysnout xMLA yMLA
0 532.732971 503.774200 617.231018 492.803711
1 532.472351 504.891632 617.638550 493.078583
2 532.453552 505.676300 615.956116 493.2839
3 532.356079 505.914642 616.226318 494.179047
4 532.360718 506.818054 615.836548 495.555298
The column "frames" is the specific video frame for each of these coordinates (xsnout,ysnout) (xMLA,yMLA). Below is my code which is able to plot all frames and all data points without specifying the row
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# import data
df = pd.read_excel("E:\\Clark\\Flow Tank\\Respirometry\\Cropped_videos\\F1\\MG\\F1_MG_4Hz_simplified.xlsx")
# different body points
ax1 = df.plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
How would I specify just a single row instead of plotting the whole dataset? And is there any way to connect the coordinates of a single row with a line?
Thank you; any help would be greatly appreciated.
How would I specify just a single row instead of plotting the whole dataset?
To do this you can slice your dataframe. There is a large variety of ways to do this, and the right one depends on exactly what you're trying to do. For instance, you can use df.iloc[] to select rows by their integer position (iloc stands for integer location). Note the square brackets! If you want to select rows by their index labels (and likewise for columns), use .loc[] instead. For example, with the original data you provided:
Slicing the dataframe with iloc:
ax1 = df.iloc[2:5, :].plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.iloc[2:5, :].plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
Gives you this:
If you specify something like this, you get only a single row:
df.iloc[1:2, :]
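To make the difference between position-based and label-based selection concrete, here is a small sketch using the five sample rows from the question. The variable names are only illustrative:

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame({
    "frames": [0, 1, 2, 3, 4],
    "xsnout": [532.732971, 532.472351, 532.453552, 532.356079, 532.360718],
    "ysnout": [503.774200, 504.891632, 505.676300, 505.914642, 506.818054],
    "xMLA": [617.231018, 617.638550, 615.956116, 616.226318, 615.836548],
    "yMLA": [492.803711, 493.078583, 493.2839, 494.179047, 495.555298],
})

by_position = df.iloc[2:5]        # positions 2, 3, 4 (end is exclusive)
by_label = df.loc[2:4]            # labels 2 through 4 (end is inclusive)
single_row = df.iloc[[3]]         # double brackets keep a one-row DataFrame
by_frame = df[df["frames"] == 3]  # select by the value in the frames column
```

With the default integer index the position and the label happen to coincide; they differ as soon as the dataframe is filtered, sorted, or indexed by another column.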
And is there any way to connect the coordinates of a single row with a line?
What exactly do you mean by this? Do you want to connect the points (xsnout, ysnout) and (xMLA, yMLA)? If so, you can do it with this:
plt.plot([df['xsnout'], df['xMLA']], [df['ysnout'], df['yMLA']])
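The call above draws one segment per row of the dataframe. For just one frame, a sketch along these lines may be closer to what was asked; it uses the frame 2 values from the question, and the styling choices are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A single row taken from the sample data in the question (frame 2).
df = pd.DataFrame({
    "frames": [2],
    "xsnout": [532.453552], "ysnout": [505.676300],
    "xMLA": [615.956116], "yMLA": [493.2839],
})

row = df.iloc[0]  # one row as a Series
fig, ax = plt.subplots()
# One segment from the snout point to the MLA point, with markers at both ends.
ax.plot([row["xsnout"], row["xMLA"]],
        [row["ysnout"], row["yMLA"]],
        color="k", marker="o")
```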

Plotting the hours of the day instead of time

The title probably does not make sense, but I will try to explain.
I am plotting chemical concentrations over time. The x-axis should show hours since midnight local time (i.e., 0, 4, 8, 12, 16, 20). However, when I do this, all of the xticks get smushed together to the left.
xticks = range(0,24,4)
ozoneest["mean"].plot(ax=ax, xticks=xticks,)
Results in:
xticks is only accepting arrays of datetime variables, which have values: 00:00, 04:00, 08:00, 12:00, 16:00, 20:00.
xticks = pd.date_range("2000/01/01", end="2000/01/02", freq="4H").time
ozoneest["mean"].plot(ax=ax, xticks=xticks,)
results in:
This is close to what I want, but I want just the number of the hour
Thanks!
I assume that your data is stored in a pandas dataframe with a DatetimeIndex that has an "Hour" frequency. I cannot exactly reproduce your problem seeing as you have not shared the code generating the ax object. Whether it is created with matplotlib or pandas, the problem is that the x-axis unit is based on the number of time periods (based on the DatetimeIndex frequency in pandas, days in matplotlib) that have passed since 1970-01-01. So the xticks = range(0,24,4) land far to the left relative to your datetimes. You can check the x-axis values of the default xticks with ax.get_xticks().
Here are two ways of formatting the xticks and labels as you want. I suggest that you do not create a new DatetimeIndex for the hours, as this makes the code harder to reuse; instead, use the DatetimeIndex of the dataframe, as shown in the second solution.
Create sample dataframe
import numpy as np # v 1.20.2
import pandas as pd # v 1.2.5
rng = np.random.default_rng(seed=123) # random number generator
time = pd.date_range(start="2000/01/01", end="2000/01/02", freq="H")[:-1]
mean = rng.normal(size=len(time))
ozoneest = pd.DataFrame(dict(mean=mean), index=time)
ozoneest.head()
Pandas plot with default xticks
ozoneest["mean"].plot()
Simple solution: do not use the DatetimeIndex as the x-axis
xticks = range(0,24,4)
ax = ozoneest["mean"].plot(use_index=False, xticks=xticks)
General solution: select xticks from DatetimeIndex and create labels with strftime
xticks = ozoneest.index[::4]
xticklabels = xticks.strftime("%H")
ax = ozoneest["mean"].plot()
ax.set_xticks(xticks)
ax.set_xticks([], minor=True)
ax.set_xticklabels(xticklabels)
This solution is more general because you do not need to manually adjust the xticks if the range of time of your dataset changes and the tick labels can be easily customized in many ways.
If you want to remove the leading zeros, you can use the following list comprehension:
xticklabels = [tick[1:] if tick[0] == "0" else tick for tick in xticks.strftime("%H")]
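As an alternative sketch for labels without leading zeros, the hour can be read directly from the DatetimeIndex as an integer instead of trimming strings. The index below is a hypothetical stand-in for ozoneest.index:

```python
import pandas as pd

# Hypothetical hourly index standing in for ozoneest.index.
index = pd.date_range(start="2000-01-01", periods=24,
                      freq=pd.Timedelta(hours=1))

xticks = index[::4]  # every fourth timestamp
# .hour gives integers, so there are no leading zeros to strip.
xticklabels = [str(hour) for hour in xticks.hour]
```

These two variables can then be passed to ax.set_xticks and ax.set_xticklabels exactly as in the general solution.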

Adjusting data in a dictionary and then plotting it

I have
x = collections.Counter(df.f.values.tolist())
if 'nan' in x:
    del x['nan']
plt.bar(range(len(x)), x.values(), align='center')
plt.xticks(range(len(x)), list(x.keys()))
plt.show()
My question is, how can I remove the NaNs from the dictionary that is created, and how can I change the order of the bar plot to go from 1 to 5? The first 3 NaNs are empty spots in the data (intentional, since it's from a poll), and the last one is the title of the column. I tried manually changing the range part of plt.bar to be 1-5, but it does not seem to work.
You can use .value_counts on a pandas.Series to simply get how many times each value occurs. This makes it simple to then make a barplot.
By default, value_counts will ignore the NaN values, so that takes care of that, and by using .sort_index() we can guarantee the values are plotted in order. It seems we need to use .to_frame() so that it only plots one color for the column (it chooses one color per row for a Series).
Sample Data
import pandas as pd
import numpy as np
# Get your plot settings
import seaborn as sns
sns.set()
np.random.seed(123)
df = pd.DataFrame({'f': np.random.randint(1, 6, 100)})
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df, pd.DataFrame({'f': np.repeat(np.nan, 1000)})])
Code
df.f.value_counts().to_frame().sort_index().plot(kind='bar', legend=False)
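If you would rather stay with collections.Counter as in the question, a minimal sketch could look like the following. It assumes the values are numeric with float NaN for the empty spots (not the string 'nan'), and the sample list is made up:

```python
import collections
import math

# Hypothetical poll answers; float("nan") stands in for the empty spots.
values = [3, 1, float("nan"), 5, 2, 4, 1, float("nan")]

# Skip NaN before counting: NaN never compares equal to itself, so trying
# to delete it by key afterwards is unreliable.
counts = collections.Counter(v for v in values if not math.isnan(v))

# Sort by key so the bars run from 1 to 5.
labels, heights = zip(*sorted(counts.items()))
```

The sorted labels and heights can then be passed to plt.bar(range(len(labels)), heights, align='center') and plt.xticks(range(len(labels)), labels).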

plot x-axis not displaying correctly for rolling mean

I'm obviously making a very basic mistake in adding a rolling mean plot to my figure.
The basic plot of close prices works fine, but as soon as I add the rolling mean to the plot, the x-axis dates get screwed up and I can't see what it's trying to do.
Here's the code:
import pandas as pd
import matplotlib.pyplot as plot
df = pd.read_csv('historical_price_data.csv')
df['Date'] = pd.to_datetime(df.Date, infer_datetime_format=True)
df.sort_index(inplace=True)
ax = df[['Date', 'Close']].plot(figsize=(14, 7), x='Date', color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
plot.show()
With this sample data set I am getting this figure:
Given the simplicity of this code, I'm obviously making a very basic mistake, I just can't see what it is.
EDIT: Interesting, although @AndreyPortnoy's suggestion to set the index to Date results in the odd error that Date is not in the index, when I use the built-ins per his suggestion, the figure is no longer a complete mess. But for some reason the x-axis is reversed, and the ticks are no longer dates but apparently ints (?), even though df.dtypes shows Date is datetime64[ns].
@Sandipan Dey: Here's what the dataset looks like. Per the code above I'm using pd.to_datetime() to convert to datetime64, and I have tried df[::-1] to fix the problem where it is reversed when the 2nd plot (mov_avg) is added to the figure (but not reversed when the figure only has the 1 plot).
The fact that your dates for the moving averages start at 1970 suggests that an integer range index is used. It was generated by default when you read in the csv file. Try inserting
df.set_index('Date', inplace=True)
before
df.sort_index(inplace=True)
Then you can do
ax = df['Close'].plot(figsize=(14, 7), color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
Note that I'm not passing x explicitly, letting pandas and matplotlib infer it.
You can simplify your code by using the builtin plotting facilities like so:
df['mov_avg'] = df['Close'].rolling(window=7).mean()
df[['Close', 'mov_avg']].plot(figsize=(14, 7))
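For a self-contained check of this approach, here is a sketch with synthetic prices in place of historical_price_data.csv. The newest-first row order is an assumption about how the file is laid out:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: 30 daily rows, newest first (assumption).
dates = pd.date_range("2021-01-01", periods=30, freq="D")
df = pd.DataFrame({"Date": dates[::-1], "Close": np.arange(30, dtype=float)})

# Use the dates as the index and put them in chronological order, so both
# series are plotted against the same datetime x-axis.
df = df.set_index("Date").sort_index()

df["mov_avg"] = df["Close"].rolling(window=7).mean()
ax = df[["Close", "mov_avg"]].plot(figsize=(14, 7))
```

The first six values of mov_avg are NaN because a 7-row window is not yet full, so the rolling-mean line starts six days after the price line.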
