Matplotlib plotting data that doesn't exist - python

I am trying to plot three lines on one figure. I have data for three years for three sites, and I am simply trying to plot them with the same x-axis and the same y-axis. The first two lines span all three years of data, while the third dataset is usually more sparse. Using matplotlib's object-oriented axes interface, when I try to plot my third set of data, I get points at the end of the graph that are outside the range of my third set of data. My third dataset is structured as tuples of dates and values, such as:
data=
[('2019-07-15', 30.6),
('2019-07-16', 20.88),
('2019-07-17', 16.94),
('2019-07-18', 11.99),
('2019-07-19', 13.76),
('2019-07-20', 16.97),
('2019-07-21', 19.9),
('2019-07-22', 25.56),
('2019-07-23', 18.59),
...
('2020-08-11', 8.33),
('2020-08-12', 10.06),
('2020-08-13', 12.21),
('2020-08-15', 6.94),
('2020-08-16', 5.51),
('2020-08-17', 6.98),
('2020-08-18', 6.17)]
The data ends in August 2020, yet the graph includes points at the end of 2020. This is happening with all my sites; the first two datasets stay constant (knowndf['DATE'] with knowndf['Value1'] and knowndf['Value2'] below).
Here is the problematic graph.
And here is what I have for the plotting:
fig, ax=plt.subplots(1,1,figsize=(15,12))
fig.tight_layout(pad=6)
ax.plot(knowndf['DATE'], knowndf['Value1'],'b',alpha=0.7)
ax.plot(knowndf['DATE'], knowndf['Value2'],color='red',alpha=0.7)
ax.plot(*zip(*data), 'g*', markersize=8) #when i plot this set of data i get nonexistent points
ax.tick_params(axis='x', rotation=45) #rotating for aesthetic
ax.set_xticks(ax.get_xticks()[::30]) #only want every 30th tick instead of every daily tick
I've tried ax.twinx(), but that gives me two y-axes, which doesn't help since I want to use the same x-axis and y-axis for all three sites. I've also tried not using the axes approach, but there are things that come with axes that I need for plotting. Please help!
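One likely cause, assuming the dates are passed to matplotlib as strings: string x-values are treated as categories, and any category not seen before is appended at the right-hand end of the axis rather than placed by date. A minimal sketch of the usual workaround is to convert both date sources to real datetimes before plotting. The contents of knowndf and data below are hypothetical stand-ins for the question's variables.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the question's knowndf and data.
knowndf = pd.DataFrame({
    "DATE": pd.date_range("2019-01-01", "2020-12-31", freq="D").strftime("%Y-%m-%d"),
})
knowndf["Value1"] = range(len(knowndf))
data = [("2019-07-15", 30.6), ("2020-08-17", 6.98), ("2020-08-18", 6.17)]

# Convert the date strings to datetimes so the x-axis is a true time axis.
known_dates = pd.to_datetime(knowndf["DATE"])
sparse_dates, sparse_values = zip(*data)
sparse_dates = pd.to_datetime(list(sparse_dates))

fig, ax = plt.subplots(1, 1, figsize=(15, 12))
ax.plot(known_dates, knowndf["Value1"], "b", alpha=0.7)
ax.plot(sparse_dates, sparse_values, "g*", markersize=8)
ax.tick_params(axis="x", rotation=45)
```

With a datetime axis, matplotlib chooses a sensible number of date ticks on its own, so thinning the ticks with ax.get_xticks()[::30] should no longer be needed.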

Related

Datetime plotting

Python beginner here :/!
The csv files can be found here (https://www.waterdatafortexas.org/groundwater/well/8739308)
I'm trying to subset my data and plot it by year or every 6 months, but I just can't make it work. This is my code so far:
data=pd.read_csv('Water well.csv')
data["datetime"]=pd.to_datetime(data["datetime"])
data["datetime"]
fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])
ax.set_xticklabels(data["datetime"], rotation= 90)
This is my data (water levels from 2016 to 2021) and the output of the code. As you can see, the x-axis only shows 2021.
When you run your script, you get the following warning:
UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_xticklabels(data["datetime"], rotation= 90)
Your example demonstrates why this warning was included.
Comment out this line:
#ax.set_xticklabels(data["datetime"], rotation= 90)
and you have the following (correct) output:
Your code now takes the nine automatically generated x-axis ticks, removes the correct labels, and labels them instead with the first nine entries of the dataframe. Obviously, these labels are wrong, and this is why the warning exists: either let matplotlib do the automatic labeling, or set both tick positions and labels yourself using FixedLocator and FixedFormatter so that they match.
For more information on tick locators and formatters, consult the matplotlib documentation.
P.S.: You also have to invert the y-axis because the data are in ft below land surface.
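As a rough sketch of the advice above, the labels can be left to a date locator and formatter pair instead of set_xticklabels, so positions and labels cannot drift apart. The dataframe here is synthetic (the real 'Water well.csv' is not reproduced), and the choice of a tick every January and July is only an assumption for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the water well data (hypothetical values).
data = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", "2021-06-30", freq="D"),
})
data["water_level(ft below land surface)"] = 50 + 5 * np.sin(np.arange(len(data)) / 200)

fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])

# The locator decides where ticks go; the formatter derives each label from
# the tick position, so the two always agree.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", labelrotation=90)

# Depth below land surface grows downward, so flip the y-axis.
ax.invert_yaxis()
```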
The problem is that you have too much data, so you have to simplify it.
As a first step, you can try something like this:
data["datetime"]=pd.to_datetime(data["datetime"])
date = data["datetime"][0::1000][0:10]
temp = data["water_level(ft below land surface)"][0::1000][0:10]
fig, ax = plt.subplots()
ax.plot(date, temp)
ax.set_xticklabels(date, rotation= 90)
date = data["datetime"][0::1000][0:10]
This line means: take index 0, then 1000, then 2000, and so on.
That gives you a new array, and from this new array you take just the first 10 entries.
It's a dirty solution.
The best solution, in my opinion, is to create a new dataset with the average water level for each day or each week, and then plot the result.
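The averaging idea could look roughly like this with pandas resample. The data is synthetic, and the six-hour sampling interval is an assumption made only so the example has something to average.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the water well data (hypothetical values).
timestamps = pd.date_range("2016-01-01", "2021-06-30", freq=pd.Timedelta(hours=6))
data = pd.DataFrame({
    "datetime": timestamps,
    "water_level(ft below land surface)": 50 + 5 * np.sin(np.arange(len(timestamps)) / 500),
})

# Index by time, then average all readings that fall in the same week.
level = data.set_index("datetime")["water_level(ft below land surface)"]
weekly = level.resample("W").mean()

# Split the weekly series by calendar year, one sub-series per year.
by_year = {year: group for year, group in weekly.groupby(weekly.index.year)}

fig, ax = plt.subplots()
for year, group in by_year.items():
    ax.plot(group.index, group.values, label=str(year))
ax.invert_yaxis()
ax.legend()
```

Using "D" instead of "W" in resample gives daily averages; the per-year dictionary is one way to get the yearly subsets the question asks about.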

Do not display missing values matplotlib

I would like to remove the flat lines on my graph while keeping the x labels.
I have this code which gives me a picture
dates = df_stock.loc[start_date:end_date].index.values
x_values = np.array([datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates])
fig, ax = plt.subplots(figsize=(15,9))
# y values
y_values = np.array(df_stock.loc[start_date:end_date, 'Bid'])
# plotting
_ = ax.plot(x_values, y_values, label='Bid')
# formatting
formatter = mdates.DateFormatter('%m-%d %H:%M')
ax.xaxis.set_major_formatter(formatter)
The flat lines correspond to data that does not exist. I would like to know if it is possible not to display them while keeping the gap of the x labels.
Thank you so much.
You want to have time on the x-axis, and time is equidistant, regardless of whether you have data or not.
You now have several options:
1. Don't use time on the x-axis, but samples/index.
2. Do as in 1., but change the ticks and labels to show time again (this time not equidistantly).
3. Make the value vector equidistant and use NaNs to fill the gaps.
Why is this so?
By default, matplotlib produces a line plot, which connects the points with lines in the order in which they are given. In contrast, a scatter plot just draws the individual points without suggesting any underlying order. You get the same result from a line plot that draws markers only and no connecting line.
In general, you have 3-4 options:
1. Use the plot command but only plot markers (add linestyle='').
2. Use the scatter command.
3. Use NaNs: plot does not know what to draw there and draws nothing (and also won't connect the non-existing points with lines).
4. Use a loop and plot the connected sections as separate lines in the same axes.
Options 1 and 2 are the easiest if you want to make almost no changes to your code. Option 3 is the most proper, and option 4 mimics its result.
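Option 3 can be sketched as follows: reindex the series onto a regular time grid so that the missing timestamps become NaN, and matplotlib then breaks the line at the gap. The two trading sessions and the one-minute grid are made-up assumptions; df_stock from the question is not reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Two hypothetical sessions with no data in between.
one_minute = pd.Timedelta(minutes=1)
session1 = pd.date_range("2021-03-01 09:00", periods=5, freq=one_minute)
session2 = pd.date_range("2021-03-02 09:00", periods=5, freq=one_minute)
bid = pd.Series(np.linspace(1.0, 1.9, 10), index=session1.append(session2))

# Build an equidistant grid and reindex; timestamps without data become NaN.
full_index = pd.date_range(bid.index.min(), bid.index.max(), freq=one_minute)
bid_regular = bid.reindex(full_index)

fig, ax = plt.subplots(figsize=(15, 9))
# NaN values interrupt the line, so no flat segment is drawn across the gap.
ax.plot(bid_regular.index, bid_regular.values, label="Bid")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))
```

The x-axis still spans the gap in time; only the connecting line disappears. Dropping the gap from the axis as well would need option 1 or 2 from the first list.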

Set x axis locator at hour intervals on matplotlib subplot

I am trying to create a figure with four subplots using the Matplotlib object-based approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code, the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine and I get date/time on each x-axis. But what I want is to have just hour markers on each x-axis. When I add code in the loop to set the major locator, the x-axis labels end up wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening, as each pass through the loop should be addressing a different axes object. Note that the x-axis time ranges are different, so it is not a simple matter of sharing the x-axis across the subplots.
fig, ax = plt.subplots(nrows=2, ncols=2)
i=0
hours = mdates.HourLocator(interval = 1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp,'k-',markersize = 0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i=i+1
I expected to get hourly markers on each of the subplots, but ended up with hourly markers on just the last plot.
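A probable explanation: a matplotlib locator instance is bound to one axis at a time, so reusing the single hours object hands it over to each new subplot in turn and leaves it attached only to the last one. A minimal sketch of the fix is to create a fresh HourLocator inside the loop. The four dataframes are synthetic stand-ins with different start times.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four hypothetical dataframes covering different six-hour windows.
starts = ["2019-05-01 00:00", "2019-05-02 06:00", "2019-05-03 12:00", "2019-05-04 18:00"]
dfs = []
for start in starts:
    index = pd.date_range(start, periods=36, freq=pd.Timedelta(minutes=10))
    dfs.append(pd.DataFrame({"TS_comp": np.linspace(-60, -40, 36)}, index=index))

fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, df in zip(axes.flat, dfs):
    ax.plot(df.index, df["TS_comp"], "k-", linewidth=0.5)
    # New locator and formatter objects for every subplot; sharing one
    # instance would leave it bound to the last axis only.
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
```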

Change X axis labeling using Pandas/matplotlib in Python

I am plotting some columns of a csv using Pandas/Matplotlib. The index column is the time in seconds (a very large number).
For example:
401287629.8
401287630.8
401287631.7
401287632.8
401287633.8
401287634.8
I need this to be printed as my xticklabel when I plot, but the number format is changed as shown below:
plt.figure()
ax = dfPlot.plot()
legend = ax.legend(loc='center left', bbox_to_anchor=(1,0.5))
labels = ax.get_xticklabels()
for label in labels:
    label.set_rotation(45)
    label.set_fontsize(10)
I couldn't find a way for the xticklabel to print the exact value rather than shortened version of it.
This is essentially the same problem as "How to remove relative shift in matplotlib axis".
The solution is to tell the formatter not to use an offset:
ax.get_xaxis().get_major_formatter().set_useOffset(False)
Also related:
useOffset=False in config file?
https://github.com/matplotlib/matplotlib/issues/2400
https://github.com/matplotlib/matplotlib/pull/2401
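For context, a small self-contained sketch of the offset fix using the sample index values from the question. The signal column is made up, and ticklabel_format is used here as an equivalent way of reaching the same ScalarFormatter setting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Index values from the question; the "signal" column is hypothetical.
seconds = [401287629.8, 401287630.8, 401287631.7,
           401287632.8, 401287633.8, 401287634.8]
dfPlot = pd.DataFrame({"signal": [0.1, 0.4, 0.2, 0.5, 0.3, 0.6]}, index=seconds)

ax = dfPlot.plot()
# Turn off the additive offset and scientific notation so that each tick
# label shows the full value.
ax.ticklabel_format(axis="x", style="plain", useOffset=False)
ax.tick_params(axis="x", labelrotation=45, labelsize=10)
```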
If it's not rude of me to point out, you're asking for a great deal of precision from a single chart. Your sample data shows a five-second difference between two times that are both over twelve and a half years long.
You have to cut your cloth to your measure on this one. If you want to keep the years, you can't keep the seconds. If you want to keep the seconds, you can't have the years.

Xaxis Labels not matching Data Points - Pandas/Matplotlib

I have a TimeSeries in Pandas that I want to plot. I have 336 records in the TimeSeries. I only want to show the date/time (index of the TimeSeries) on the x-axis once per every 20 or so data points.
Here is how I am trying to do this:
stats.plot()
ax.set_xticklabels(stats.index, rotation=45 )
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(NullLocator())
ax.yaxis.set_major_locator(MultipleLocator(.075))
draw()
My x-axis shows the correct number of labels (18), but these are the first 18 in the series; they do not correspond to the data points in the plot.
The problem is that you are using set_xticklabels, which sets the tick labels independently of the data. The ticks are labeled sequentially from the list you pass in.
From this I can't really tell what you are trying to do, but the behavior you are seeing is the 'correct' behavior for the library (it's doing exactly what you told it to, but that isn't what you want it to do).
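If the goal is a label at every 20th data point, one possible approach (an assumption about the intent, since the question leaves it open) is to plot against integer positions and let a FuncFormatter look up the matching index entry for each tick position. The series below is synthetic, and label_for is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Synthetic stand-in for the 336-record time series.
index = pd.date_range("2013-01-01", periods=336, freq=pd.Timedelta(hours=1))
stats = pd.Series(np.random.default_rng(0).normal(size=336), index=index)

fig, ax = plt.subplots()
ax.plot(np.arange(len(stats)), stats.values)

def label_for(position, _tick_number):
    """Return the timestamp label for a tick at an integer data position."""
    i = int(round(position))
    if 0 <= i < len(stats):
        return stats.index[i].strftime("%m-%d %H:%M")
    return ""  # ticks outside the data get no label

# A tick every 20 data points, labeled from the tick's own position.
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_major_formatter(FuncFormatter(label_for))
ax.tick_params(axis="x", labelrotation=45)
```

Because each label is computed from the tick's position, the labels stay correct even if the locator or the view limits change.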
