Plotting with numpy and pylab - python

I have some data that I have loaded into numpy. I do not have a CSV or any other file containing the range of dates I need, but I do know the length of the array.
Currently I am just doing this to print up a simple graph:
t = numpy.arange(0.0, len(data), 1)
pylab.plot(t, data)
Would it be possible to replace t here so that I can specify a start and end date and have the plot show the actual dates? Say I have 365 days in my dataset: the plot would then show actual dates such as DD/MM/YYYY, 1/1/1999, 1/2/1999, ..., 12/31/1999.

You might want to take a look at plot_date()
and the matplotlib dates API.
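A minimal sketch of that idea, assuming daily data starting on 1 January 1999. The random data array, the start date, and the "%d/%m/%Y" tick format are placeholders, not part of the original question. Matplotlib can plot datetime.date objects directly, and the matplotlib.dates module supplies the tick locator and formatter:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Placeholder for the real array (assumption: one value per day)
data = np.random.default_rng(0).random(365)

# Build one date per data point from a known start date
start = dt.date(1999, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(len(data))]

fig, ax = plt.subplots()
ax.plot(dates, data)

# One tick per month, labelled as day/month/year
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Calling plt.show() afterwards displays the figure when an interactive backend is in use.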

Related

pandas/matplotlib graph on frequency of appearance

I am a pandas newbie and I want to make a graph from a CSV I have. This CSV contains dates, and I want to graph how frequently each date appears.
This is how it looks:
2022-01-12
2022-01-12
2022-01-12
2022-01-13
2022-01-13
2022-01-14
Here we can see that I have three records on the 12th of January, two records on the 13th, and only one record on the 14th. So we should see a decrease on the graph.
So I tried converting my CSV like this:
date,records
2022-01-12,3
2022-01-13,2
2022-01-14,1
And then make a graph with the date as the x axis and the records amount as the y axis.
But is there a way pandas (or matplotlib, I never understand which one to use) can make a graph based on the frequency of appearance, so that I don't have to convert the CSV first?
pandas has a function that counts how many times each value occurs.
First off, you'd need to read your csv file into a dataframe. Do this by using:
import pandas as pd
df = pd.read_csv("~csv file name~")
Using the unique() method in the pandas library, you can get all of the unique values. The syntax should look like:
uniqueVals = df["~column name~"].unique()
That returns an array of all the unique values. Then use the value_counts() method on the same column and put the value you are trying to count in square brackets after the parentheses. The syntax should look something like this:
totalOfVals = []
for date in uniqueVals:
    # value_counts() returns a Series indexed by value, so [date] looks up that date's count
    numDate = df["~column name~"].value_counts()[date]
    totalOfVals.append(numDate)
Then you can pass the two arrays, the unique dates and their counts, to matplotlib to create a graph.
You'll want to use the syntax:
import matplotlib.pyplot as mpl
mpl.plot(uniqueVals, totalOfVals, color = "~whatever colour you want the line to be~", marker = "~whatever you want the marker to look like~")
mpl.xlabel('Date')
mpl.ylabel('Number of occurrences')
mpl.title('Number of occurrences of dates')
mpl.grid(True)
mpl.show()
And that should display a graph with all the dates and their number of occurrences, with a grid behind it. If you don't want the grid, either set mpl.grid to False or remove that line.
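As a shorter variant of the steps above, value_counts() can do the counting in one call and the resulting Series can be plotted directly. This is only a sketch: it assumes the CSV has a single column with no header row, and the column name "date" is made up for the example. The sample data is the one from the question, read from a string instead of a file:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import pandas as pd

# Stand-in for the CSV file from the question (assumption: one date per line, no header)
raw = io.StringIO(
    "2022-01-12\n2022-01-12\n2022-01-12\n2022-01-13\n2022-01-13\n2022-01-14\n"
)
df = pd.read_csv(raw, header=None, names=["date"], parse_dates=["date"])

# Count how often each date appears, then put the dates in chronological order
counts = df["date"].value_counts().sort_index()

# A Series plots its index on the x axis and its values on the y axis
ax = counts.plot(marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Number of occurrences")
```

Sorting by index matters because value_counts() orders its result by frequency, not by date.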

How do I convert unusual time string into date time

I measured the seeing index and I need to plot it as a function of time, but the time I received from the measurement is a string in the format 02-09-2022_time_11-53-51,045. How can I convert it into something Python can read and that I can use in my plot?
Using pandas, I extracted the time and seeing_index columns from the txt file produced by the measurement. Python correctly plotted the seeing index values on the Y axis, but instead of plotting the time values on the X axis, it just assigned a number to each row and plotted the index against the row number. What can I do so that it plots the index against time?
You may try this:
df.time = pd.to_datetime(df.time, format='%d-%m-%Y_time_%H-%M-%S,%f')
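A small sketch of how that conversion fits into the plot. The two rows and the seeing_index values are invented for illustration; only the time format comes from the question. Once the column holds real datetimes, passing it as x makes pandas plot against time instead of the row number:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import pandas as pd

# Made-up sample rows in the format described in the question
df = pd.DataFrame(
    {
        "time": ["02-09-2022_time_11-53-51,045", "02-09-2022_time_11-53-52,045"],
        "seeing_index": [1.2, 1.4],
    }
)

# Parse day-month-year, the literal "_time_", then hours-minutes-seconds and fractional seconds
df["time"] = pd.to_datetime(df["time"], format="%d-%m-%Y_time_%H-%M-%S,%f")

# Use the parsed column as the x axis
ax = df.plot(x="time", y="seeing_index")
```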

how to combine 4D xarray data

I have a 4D xarray which contains time, lev, lat, and lon. The data is for a specific day, so the length of time is 1. My goal is to get a 4D xarray with the same attributes but covering a month of data, so that the time length will be 30.
I tried to google it but could not find useful information. I would appreciate it if anyone could provide some insights.
If you have multiple points in a time series, you can use xr.DataArray.resample to change the frequency of a datetime dimension. Once you have resampled, you'll get a DataArrayResample object, to which you can apply any of the methods listed in the DataArrayResample API docs.
If you only have a single point in time, you can't resample to a higher frequency. Your best bet is probably to simply select and drop the time dim altogether, then use expand_dims to expand the dimensions again to include the full time dim you want. Just be careful because this overwrites the time dimension's values with whatever you want, regardless of what was in there before:
target_dates = pd.date_range('2018-08-01', '2018-08-30', freq='D')
daily = (
    da
    .isel(time=0, drop=True)
    .expand_dims(time=target_dates)
)

Sanitizing Time Series whose plots shows erratic graph lines

I want to plot timelines, my dates are formatted as day/month/year.
When building the index, I take care of that:
# format Date
test['DATA'] = pd.to_datetime(test['DATA'], format='%d/%m/%Y')
test.set_index('DATA', inplace=True)
and with a double check I see months and days are correctly interpreted:
#the number of month reflect the month, not the day : correctly imported!
test['Year'] = test.index.year
test['Month'] = test.index.month
test['Weekday Name'] = test.index.day_name()
However, when I plot, I see datapoints get connected erratically (although their distribution seems to be correct, since I expect a seasonality):
# Start and end of the date range to extract
start, end = '2018-01', '2018-04'
# Plot daily, weekly resampled, and 7-day rolling mean time series together
fig, ax = plt.subplots()
ax.plot(test.loc['2018', 'TMIN °C'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
I suspect it may have to do with misinterpreted dates, or that dates are not put in the right sequence, but could not find a way to verify where an error may be.
Could you help validating how to import correctly my timeseries ?
Oh, it was super simple. I assumed a datetime index was sorted automatically, but one must sort it explicitly:
test.loc['2018-01':'2018-03'].sort_index().index #sorted
test.loc['2018-01':'2018-03'].index #not sorted
This question may be deleted or marked as a duplicate; I leave that to the moderators:
Pandas - Sorting a dataframe by using datetimeindex
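To check for this problem up front, the index can be tested for ordering and sorted before plotting. The sketch below uses three invented rows in deliberately wrong order; the column names DATA and TMIN °C follow the question, the values do not:

```python
import pandas as pd

# Made-up rows, intentionally out of chronological order
test = pd.DataFrame(
    {
        "DATA": ["03/01/2018", "01/01/2018", "02/01/2018"],
        "TMIN °C": [2.0, 0.5, 1.0],
    }
)
test["DATA"] = pd.to_datetime(test["DATA"], format="%d/%m/%Y")
test = test.set_index("DATA")

# Parsing dates does not reorder rows, so the index can still be unsorted
was_sorted = test.index.is_monotonic_increasing

# Sort once, right after building the index, so line plots connect points in time order
test = test.sort_index()
```

A line plot connects points in row order, which is why an unsorted index produces lines that jump back and forth.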

Looking to plot 16,000 data points vs. Time using MatPlotLib

I am trying to plot a drone's altitude vs time (Time on the X-axis and altitudes on the Y-axis). I converted my list of timestamps into a MatPlotLib-readable format using dates = matplotlib.dates.date2num(timestamps). The length of the altitudes list and the converted timestamps list is 16587 exactly, so there is no mismatch there. The graph came out absolutely horrendous and I would like to know how to make this readable with so much data. My full code is
timestamps = []
for stamp in times:  # convert list of timestamp Strings to Python timestamp objects
    stamp = date + " " + stamp
    stamp = stamp.replace('.', ':')  # We want the milliseconds to be behind a colon so it can be easily formatted to DateTime
    stamp = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S:%f')
    timestamps.append(stamp)
dates = matplotlib.dates.date2num(timestamps)
for alt in altitudes:
    alt = round(float(alt), 2)
plt.plot_date(dates, altitudes)
plt.show()
The graph is indeed unreadable, although it is not clear what you expect it to look like.
When plotting a huge number of points, I think it is better to also specify the alpha parameter to add some transparency, so you can "see through" clouds of overlapping points.
Then you can specify your xticks and yticks (maybe also with the rotation parameter) to show fewer of them, and add plt.grid(True).
These are just basic suggestions. Try to be more specific about what "make this readable" means.
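A rough sketch of those suggestions, using synthetic data in place of the drone log. The start time, sampling interval, and sine-shaped altitudes are all invented. One further point, offered as a guess about the original data: the loop `for alt in altitudes: alt = round(float(alt), 2)` only rebinds the loop variable and leaves the list unchanged, so if altitudes holds strings, Matplotlib will treat them as categories. The sketch converts them with a list comprehension instead:

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the drone log (assumption: strings, as if read from a file)
n = 16587
start = datetime(2020, 1, 1, 12, 0, 0)
timestamps = [start + timedelta(milliseconds=100 * i) for i in range(n)]
raw_altitudes = [f"{50 + 10 * np.sin(i / 500):.3f}" for i in range(n)]

# Build a new list of floats; assigning to the loop variable would not change the list
altitudes = [round(float(alt), 2) for alt in raw_altitudes]

fig, ax = plt.subplots(figsize=(10, 4))

# Small, semi-transparent markers so dense regions stay readable
ax.plot(timestamps, altitudes, linestyle="none", marker=".", markersize=2, alpha=0.3)

# Limit the number of x ticks, format them as times, and rotate the labels
ax.xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=8))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
ax.tick_params(axis="x", labelrotation=45)
ax.grid(True)
```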
