Plot specific markers on a plot in Python

I have a plot with time on the x-axis and percentage values on the y-axis. The plot is drawn from a DataFrame. Because I need to review many plots, it would help to add some markers in a different color.
For example, each graph's timeline starts at 08:00 and ends at 20:00. I need a red marker at 12:00.
I have tried the following:
graph_df is a DataFrame with two columns: one with time and one with percentage data.
import matplotlib.pyplot as plt
df = graph_df.loc[graph_df['time'] == "12:00"]
graph_df.plot(x="time", y="percentage", linewidth=1, kind='line')
plt.plot(df['time'], df['percentage'], 'o-', color='red')
plt.savefig(graph_name)  # save before show(): once the window is closed, the figure is gone
plt.show()
If I use this code, the marker is drawn at the correct percentage for 12:00, but always at the start of the timeline: the red dot appears at 08:00, with the right percentage.
Any idea why it's not correctly marked?
Thank you.

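Converting the strings to time objects should work. While the "time" column holds plain strings, pandas draws the line against the row positions 0, 1, 2, ... and only uses the strings as tick labels. The later plt.plot call then treats "12:00" as a new category and puts it at position 0, which carries the label 08:00. With real time values, both calls share the same x scale.
Replace your first line with
graph_df["time"] = pd.to_datetime(graph_df["time"]).dt.time
df = graph_df[graph_df["time"].apply(lambda time: time.strftime("%H:%M"))=='12:00']
and it should do the job.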
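A self-contained sketch of the same fix, using invented hourly readings from 08:00 to 20:00 (the sample values are made up). As a small variation, it selects the 12:00 row by comparing with datetime.time(12, 0) directly instead of formatting every value back into a string, and it uses the Agg backend so it runs without a display:

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data: one reading per hour, 08:00 to 20:00.
graph_df = pd.DataFrame({
    "time": [f"{h:02d}:00" for h in range(8, 21)],
    "percentage": [10, 15, 20, 30, 45, 50, 55, 60, 58, 50, 40, 30, 20],
})

# Turn the "HH:MM" strings into datetime.time objects so pandas plots
# against real time values instead of row positions.
graph_df["time"] = pd.to_datetime(graph_df["time"], format="%H:%M").dt.time

# Pick the 12:00 row by comparing with a time object.
df = graph_df[graph_df["time"] == datetime.time(12, 0)]

ax = graph_df.plot(x="time", y="percentage", linewidth=1, kind="line")
ax.plot(df["time"], df["percentage"], "o", color="red")
```

The red marker now shares its x value with the 12:00 point of the line rather than with the first point.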

Related

How to plot multiple lines in subplot using python and matplotlib

I've been following the solutions in "Merge matplotlib subplots with shared x-axis". See solution 35. There, each subplot has one line, but I would like multiple lines in each subplot. For example, the top plot shows the price of IBM and a 30-day moving average; the bottom plot shows a 180-day and a 30-day variance.
To plot multiple lines in my other python programs I used (data).plot(figsize=(10, 7)) where data is a dataframe indexed by date, but in the author's solution he uses line0, = ax0.plot(x, y, color='r') to assign the data series (x,y) to the plot. In the case of multiple lines in solution 35, how does one assign a dataframe with multiple columns to the plot?
Pass the target axes to pandas: data.plot(ax=ax0) draws each column of data as its own line on ax0.
For the legend you can use:
handles0, labels0 = ax0.get_legend_handles_labels()
handles1, labels1 = ax1.get_legend_handles_labels()
ax0.legend(handles=handles0 + handles1, labels=labels0 + labels1)
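A minimal runnable version of this answer, with invented price data standing in for the IBM series (the column names IBM, MA30, Var180 and Var30 are illustrative only). Each DataFrame.plot(ax=...) call draws every column of its frame as a separate line on the given subplot, and the legend entries from both axes are then merged on the top axes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented daily prices; not real IBM data.
dates = pd.date_range("2020-01-01", periods=200, freq="D")
price = pd.Series(100 + 5 * np.sin(np.arange(200) / 10), index=dates)

top = pd.DataFrame({"IBM": price, "MA30": price.rolling(30).mean()})
bottom = pd.DataFrame({
    "Var180": price.rolling(180).var(),
    "Var30": price.rolling(30).var(),
})

fig, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(10, 7))

# ax= tells pandas which subplot to draw all columns of the frame on.
top.plot(ax=ax0)
bottom.plot(ax=ax1)

# Combine the handles/labels of both axes into one legend on ax0.
handles0, labels0 = ax0.get_legend_handles_labels()
handles1, labels1 = ax1.get_legend_handles_labels()
ax0.legend(handles=handles0 + handles1, labels=labels0 + labels1)
```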

Datetime plotting

Python beginner here :/!
The csv files can be found here (https://www.waterdatafortexas.org/groundwater/well/8739308)
I'm trying to subset my data and plot it by year or every 6 months, but I just can't make it work. This is my code so far:
import matplotlib.pyplot as plt
import pandas as pd
data=pd.read_csv('Water well.csv')
data["datetime"]=pd.to_datetime(data["datetime"])
data["datetime"]
fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])
ax.set_xticklabels(data["datetime"], rotation= 90)
This is my data (water levels from 2016 to 2021) and the output of the code. As you can see, only 2021 shows up on the time axis.
When you run your script, you get the following warning:
UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_xticklabels(data["datetime"], rotation= 90)
Your example demonstrates why they included this warning.
Comment out your line
#ax.set_xticklabels(data["datetime"], rotation= 90)
and you get the correct output.
Your code takes the nine automatically generated x-axis ticks, discards their correct labels, and labels them instead with the first nine entries of the dataframe. These labels are obviously wrong, which is why you get the warning: either let matplotlib do the labeling automatically, or set both a FixedLocator and a FixedFormatter so that tick positions and labels match.
For more information on Tick locators and formatters consult the matplotlib documentation.
P.S.: You also have to invert the y-axis because the data are in ft below land surface.
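A minimal sketch of letting matplotlib place the ticks, assuming invented daily readings in place of 'Water well.csv' (only the column name is taken from the question). MonthLocator(bymonth=(1, 7)) puts a tick on every 1 January and 1 July, i.e. every six months, DateFormatter labels those ticks, and invert_yaxis() takes care of the P.S.:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

col = "water_level(ft below land surface)"

# Invented stand-in for 'Water well.csv': one reading per day, 2016-2021.
data = pd.DataFrame({"datetime": pd.date_range("2016-01-01", "2021-12-31", freq="D")})
data[col] = 50 + 5 * np.sin(np.arange(len(data)) / 100)

fig, ax = plt.subplots()
ax.plot(data["datetime"], data[col])

# Let a date locator pick the tick positions and a formatter label them,
# instead of overwriting the labels with set_xticklabels.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))  # every 6 months
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", labelrotation=90)

# Larger values mean deeper water, so draw them lower on the plot.
ax.invert_yaxis()
```

For one tick per year, mdates.YearLocator() can be used in place of the MonthLocator.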
The problem is that you have too much data; you have to simplify it.
As a first attempt, you can try something like this:
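import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('Water well.csv')
data["datetime"] = pd.to_datetime(data["datetime"])
date = data["datetime"][0::1000][0:10]
temp = data["water_level(ft below land surface)"][0::1000][0:10]
fig, ax = plt.subplots()
ax.plot(date, temp)
ax.set_xticks(date)  # one tick per plotted date ...
ax.set_xticklabels(date, rotation=90)  # ... so each label sits on its own tick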
date = data["datetime"][0::1000][0:10]
This line means: take index 0, then 1000, then 2000, ...
That gives you a new series, and [0:10] then keeps only its first 10 entries.
It's a quick-and-dirty solution.
In my opinion, the best solution is to create a new dataset with the average water level for each day or each week, and then plot that.
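The averaging idea could be sketched with pandas' resample, assuming invented hourly readings (every reading on day n has the value n, so the averages are easy to check). resample("D").mean() gives one value per calendar day and resample("W").mean() one value per week ending on Sunday; the weekly series has far fewer points than the raw data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

col = "water_level(ft below land surface)"

# Invented stand-in for the CSV: hourly readings over four weeks,
# starting on a Monday; every reading on day n has the value n.
data = pd.DataFrame({
    "datetime": pd.date_range("2016-01-04", periods=24 * 28, freq=pd.Timedelta(hours=1)),
    col: np.repeat(np.arange(28, dtype=float), 24),
})

levels = data.set_index("datetime")[col]
daily = levels.resample("D").mean()   # one average per day
weekly = levels.resample("W").mean()  # one average per week (weeks end on Sunday)

fig, ax = plt.subplots()
ax.plot(weekly.index, weekly.to_numpy(), marker="o")
ax.invert_yaxis()  # depth below land surface
```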

How to plot the legend of a set of data with different color labels in Matplotlib

I have a 1:1 plot in which the dot colour depends on a condition (A-F) that comes from a single DataFrame column.
df is a DataFrame with one row per minute; df60 is a DataFrame with one row per hour.
import matplotlib.pyplot as plt
plt.figure()
colors = {'A':'green', 'B':'aqua', 'C':'blue','D':'black','E':'yellow','F':'red'}
x = df['Method1'].loc['2020-01-01 00:00':'2020-01-15 23:59'].resample('h').mean()
y = df['Method2'].loc['2020-01-01 00:00':'2020-01-15 23:59'].resample('h').mean()
plt.scatter(x, y, c=df60['Method1'].loc['2020-01-01 00:00':'2020-01-15 23:59'].map(colors))
plt.show()
I have tried to add a legend showing which colour corresponds to each condition A-F. However, since the data comes from a single column, it does not show what I expect. Is there a way to show the legend properly without splitting the column into several columns?
You can define the legend manually by, for instance:
from matplotlib.lines import Line2D
handles=[Line2D([0],[0],label=k,marker="o",markerfacecolor=v,markeredgecolor=v,linestyle="None") for k,v in colors.items()]
plt.legend(handles=handles)
I hope this helps. Not really sure if there is a more elegant solution, though...
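A self-contained sketch of the manual-legend approach, with six invented points, one per condition (the column names x, y and condition are hypothetical). Each Line2D is a proxy artist that only carries a marker style and a label; linestyle="None" keeps the legend entries as dots:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

colors = {"A": "green", "B": "aqua", "C": "blue", "D": "black", "E": "yellow", "F": "red"}

# Invented data: the category letter for each point lives in one column.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],
    "condition": ["A", "B", "C", "D", "E", "F"],
})

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=df["condition"].map(colors).tolist())

# One proxy artist per category, drawn as a marker without a line.
handles = [
    Line2D([0], [0], label=k, marker="o", markerfacecolor=v,
           markeredgecolor=v, linestyle="None")
    for k, v in colors.items()
]
ax.legend(handles=handles)
```

A different technique that avoids proxy artists is to loop over df.groupby("condition") and call scatter once per group with label=... and color=colors[key], after which ax.legend() picks up the labels automatically.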

Do not display missing values in matplotlib

I would like to remove the flat lines on my graph while keeping the x labels.
I have this code:
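import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
# x values: the index holds "YYYY-MM-DD HH:MM:SS" strings
dates = df_stock.loc[start_date:end_date].index.values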
x_values = np.array([datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates])
fig, ax = plt.subplots(figsize=(15,9))
# y values
y_values = np.array(df_stock.loc[start_date:end_date, 'Bid'])
# plotting
_ = ax.plot(x_values, y_values, label='Bid')
# formatting
formatter = mdates.DateFormatter('%m-%d %H:%M')
ax.xaxis.set_major_formatter(formatter)
The flat lines correspond to periods with no data. Is it possible not to display them while keeping the spacing of the x labels?
Thank you so much.
You want time on the x-axis, and time is equidistant whether or not you have data for it.
You now have several options:
1. don't use time on the x-axis, but the sample number (index)
2. do as in 1., but change the ticks and labels to show time again (this time not equidistantly)
3. make the value vector equidistant and use NaNs to fill the gaps
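Options 1 and 2 can be sketched as follows, assuming invented minute quotes with a gap (the real df_stock is not available here). The line is drawn against sample positions, so the gap disappears, and every second sample is then labelled with its timestamp via set_xticks plus set_xticklabels; the labels are therefore no longer equidistant in time:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented minute quotes with no data between 10:02 and 10:10.
idx = pd.to_datetime([
    "2021-03-01 10:00", "2021-03-01 10:01", "2021-03-01 10:02",
    "2021-03-01 10:10", "2021-03-01 10:11",
])
bid = pd.Series([1.10, 1.11, 1.12, 1.15, 1.14], index=idx, name="Bid")

# Option 1: plot against the sample number, so the time gap vanishes.
positions = np.arange(len(bid))
fig, ax = plt.subplots(figsize=(15, 9))
ax.plot(positions, bid.to_numpy(), label="Bid")

# Option 2: label every `step`-th sample with its timestamp.
# set_xticks fixes the positions, so the labels stay matched to them.
step = 2
ax.set_xticks(positions[::step])
ax.set_xticklabels(bid.index[::step].strftime("%m-%d %H:%M"), rotation=45)
```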
Why is this so?
By default, plot produces a line plot, which connects the points with lines in the order in which they are given. A scatter plot, in contrast, just draws the individual points and does not suggest any underlying order. You get the same result with a line plot that shows only markers and no connecting lines.
In general, you have 3-4 options:
1. use the plot command but only draw markers (add linestyle='' together with a marker, e.g. marker='o')
2. use the scatter command
3. fill the gaps with NaNs: plot cannot draw a NaN point, so it draws nothing there (and also won't connect the missing points with lines)
4. use a loop and plot each connected section as a separate line in the same axes
Options 1 and 2 are the easiest if you want to make almost no changes to your code. Option 3 is the most proper, and option 4 mimics its result.
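Option 3 might look like this, again on invented minute quotes: reindexing onto a regular one-minute grid turns the missing minutes into NaN, and matplotlib leaves a gap there instead of drawing a connecting line. The one-minute spacing is an assumption; use whatever interval the real data has:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Invented minute quotes with no data between 10:02 and 10:10.
idx = pd.to_datetime([
    "2021-03-01 10:00", "2021-03-01 10:01", "2021-03-01 10:02",
    "2021-03-01 10:10", "2021-03-01 10:11",
])
bid = pd.Series([1.10, 1.11, 1.12, 1.15, 1.14], index=idx, name="Bid")

# Put the series on a regular grid; minutes without data become NaN.
full_idx = pd.date_range(idx.min(), idx.max(), freq="1min")
bid_regular = bid.reindex(full_idx)

fig, ax = plt.subplots(figsize=(15, 9))
# NaN values break the line, so no segment is drawn across the gap.
ax.plot(bid_regular.index, bid_regular.to_numpy(), label="Bid")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))
```

The time axis keeps its true spacing, which is what the question asks for; only the empty stretch is left blank.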

Set x axis locator at hour intervals on matplotlib subplot

I am trying to create a figure with four subplots using Matplotlib's object-oriented approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code, the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine, and I get date/time on each x-axis. But what I want is just hour markers on each x-axis. When I add code in the loop to set the major locator, the x-axis labels are wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening, as each pass through the loop should address a different axis object. Note that the x-axis time ranges differ, so this is not simply a matter of sharing the x-axis across the subplots.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=2, ncols=2)
i = 0
hours = mdates.HourLocator(interval=1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp, 'k-', markersize=0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i = i + 1
Expected: hourly markers on each of the subplots. Actual: hourly markers on just the last plot.
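A likely cause, offered as a sketch under assumptions: the single hours locator is shared by all four axes. The matplotlib ticker documentation notes that the same locator should not be used across multiple axes, because a locator keeps a reference to the axis it was last attached to. After the loop, hours belongs to the last subplot, so the first three compute ticks for the last subplot's time range, which lie outside their own views. Creating a new HourLocator inside the loop avoids this. The four DataFrames below are invented stand-ins for the CSV data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-ins for the four CSV frames, each on a different day.
dfs = []
for k in range(4):
    idx = pd.date_range(f"2021-06-0{k + 1} 00:00", periods=72, freq="5min")
    dfs.append(pd.DataFrame({"TS_comp": np.arange(len(idx), dtype=float)}, index=idx))

fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, df in zip(axes.flat, dfs):
    ax.plot(df.index, df["TS_comp"], "k-")
    # A fresh locator per axis: a locator tracks only the axis it was
    # last attached to, so one instance must not be shared.
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
```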
