Changing y axis limit in a loop - python

I am working with a DataFrame. I have data for a week; since it's huge, I am splitting the DataFrame by day. I am plotting the temperature parameter against time. On most days the temperature stays between 20 and 30, but on some days it goes above 30. I need to write the code so that, for a given day, if the temperature stays within 20 to 30, the plot's y-axis limits are (20, 30); if it goes outside that range, the limits should be (0, 50). My current code looks like this:
import matplotlib.pyplot as plt
import pandas as pd
# split the dataframe into one dataframe per day
listofDF = [df_i for (_, df_i) in df.groupby(pd.Grouper(key="filename", freq="1D"))]
for day_df in listofDF:
    if len(day_df) != 0:
        y = day_df[' Temp']
        x = day_df['time']
        plt.plot(x, y)
        plt.ylim(20, 30)
Thanks in advance for your help. Someone may wonder why I have this requirement: I am analysing lots of data and want a standard scale, so I can keep flipping through the plots without having to check the y-axis to read the values.

You could use:
plt.ylim(y.min() - 5, y.max() + 5)
This scales each plot between its minimum and maximum temperature values (+/- 5 to leave some empty space above and below).
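The answer above adapts the limits to each day's data, whereas the question asks for two fixed scales. A minimal sketch of that rule follows, assuming the column names from the question (`' Temp'`, `'time'`) and that `'filename'` holds datetimes, since it is used for the daily grouping. The helpers `choose_ylim` and `plot_days` are hypothetical names, and the sample data is made up:

```python
import matplotlib.pyplot as plt
import pandas as pd


def choose_ylim(temps, normal=(20, 30), wide=(0, 50)):
    """Return `normal` if every temperature lies inside it, otherwise `wide`."""
    low, high = normal
    # Series.between is inclusive on both ends by default
    if temps.between(low, high).all():
        return normal
    return wide


def plot_days(df, temp_col=" Temp", time_col="time", date_col="filename"):
    """Draw one figure per calendar day with a fixed y-range (hypothetical helper)."""
    figures = []
    for _, day_df in df.groupby(pd.Grouper(key=date_col, freq="1D")):
        if day_df.empty:
            continue  # skip days without data (the daily grouper can produce empty groups)
        fig, ax = plt.subplots()
        ax.plot(day_df[time_col], day_df[temp_col])
        ax.set_ylim(*choose_ylim(day_df[temp_col]))
        figures.append(fig)
    return figures


# Made-up example: day 1 stays within 20-30, day 2 reaches 34 in its last hour
stamps = pd.date_range("2020-01-01", periods=48, freq="60min")
temps = [25.0] * 47 + [34.0]
sample = pd.DataFrame({"filename": stamps, "time": stamps, " Temp": temps})
figs = plot_days(sample)
```

Because the limits come from two fixed tuples rather than from the data, every "normal" day is drawn on exactly the same scale, which is what makes scanning through many plots quick.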

Related

Averaging a certain number of values from a data and plotting it wrt another set of data

I have run a simulation and got a set of data. It consists of three rows: row 1 contains time, row 2 contains energy values, and row 3 a specific wavelength.
For every wavelength value there are 10 energy values, and each energy value has a corresponding time.
Suppose I have 10 wavelengths, for which I have 10*10 = 100 energy values. I want to write code that first averages the energy values for each wavelength and then plots the average energy vs wavelength.
I have been stuck on this for almost a week; any help would be much appreciated.
I am not exactly sure if this is what you are looking for; if not, give an example of your data.
import matplotlib.pyplot as plt
# Dummy data
energy = list(range(0, 100))
wavelength = list(range(0, 10))
# Compute how many energy values there are for each wavelength
k = len(energy) // len(wavelength)
# Compute the average energy for each block of k consecutive values
energy_avg = [sum(energy[i:i+k]) / k for i in range(0, len(energy), k)]
# Plot
plt.plot(wavelength, energy_avg, '.')
plt.xlabel('wavelength')
plt.ylabel('average energy')
plt.show()
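If the wavelength row repeats each wavelength in every column that belongs to it (an assumption about the layout described in the question), the averages can also be computed by grouping on that row instead of on fixed blocks of k values. That version still works when the columns are unsorted or the groups have unequal sizes. A sketch with NumPy; the function name and the data are made up:

```python
import numpy as np


def average_energy_per_wavelength(data):
    """Average the energy row for each distinct wavelength.

    `data` is a 3 x N array whose rows are (time, energy, wavelength), with the
    wavelength repeated in every column that belongs to it.
    Returns (sorted unique wavelengths, mean energy per wavelength).
    """
    energy = data[1]
    wavelength = data[2]
    # `inverse[i]` is the position of column i's wavelength inside `wl`
    wl, inverse = np.unique(wavelength, return_inverse=True)
    sums = np.bincount(inverse, weights=energy)  # total energy per wavelength
    counts = np.bincount(inverse)                # number of values per wavelength
    return wl, sums / counts


# Made-up data: 10 wavelengths with 10 energy values each (100 columns)
rng = np.random.default_rng(0)
wavelength = np.repeat(np.arange(400, 500, 10), 10)
energy = rng.normal(loc=wavelength / 100, scale=0.1)
time = np.arange(wavelength.size, dtype=float)
data = np.vstack([time, energy, wavelength])

wl, avg = average_energy_per_wavelength(data)
```

`plt.plot(wl, avg, '.')` then plots the result in the same way as the answer above.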

How to loop to create plots for each hour of day of geospatial data?

I am trying to create an animated map (by generating multiple plots) of road traffic throughout a week, where the thickness of roads is represented by the volume of traffic at a specific time of day.
This is sort of what I'm looking for (but for each hour of each day):
The data has a structure that looks like this:
HMGNS_LNK_ID  geometry                  DOW  Hour  Normalised Value
2             MULTILINESTRING ((251...  1    0     0.233623
2             MULTILINESTRING ((251...  1    1     0.136391
2             MULTILINESTRING ((251...  1    2     0.108916
DOW stands for 'day of the week' (1 = Monday). For every Hour of each of the 7 days, I want to plot the map with each road's line thickness given by its Normalised Value.
I run into a problem when looping with this code:
for dow in df['DOW']:
    fig, ax = plt.subplots(1)
    day_df = df[df['DOW']==dow]
    for hour in day_df['Hour']:
        day_hour_df = day_df[day_df['Hour']==hour]
        day_hour_df.plot(ax=ax, linewidth=day_hour_df['Normalised Value'])
        plt.savefig("day{}_hour{}.png".format(dow, hour), dpi = 200, facecolor='#333333')
The problem is that figures are saved only for day 1: it goes up to day1_hour23, then comes back to day1_hour0 and overwrites the plot with something new. I can't figure out why it never moves on to DOW 2.
I'm not even sure if the data structure is correct. I would greatly appreciate any help with that. Please find the full code in my repo.
Cheers!
The problem is with the way you loop over and subset df. Let's go through the loop in detail. The first time through the outer loop, dow is 1 and day_df = df[df['DOW']==dow] selects all rows with 1 in the DOW column. The inner loop then goes through the selected rows and creates day1_hour0 to day1_hour23. Inner loop done, great.
The second time through the outer loop, dow is again 1, because for dow in df['DOW'] yields one value per row, not one per distinct day. day_df = df[df['DOW']==dow] again selects all rows with 1 in the DOW column, i.e. the same set of rows as the previous time. So it (re)writes day1_hour0 to day1_hour23 again.
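To see the difference between iterating over a column and iterating over groups, here is a tiny made-up frame (two days, three hours, no geometry column, so only pandas is needed):

```python
import pandas as pd

# Made-up miniature of the question's table: 2 days x 3 hours, no geometry
df = pd.DataFrame({
    "DOW": [1, 1, 1, 2, 2, 2],
    "Hour": [0, 1, 2, 0, 1, 2],
    "Normalised Value": [0.23, 0.14, 0.11, 0.30, 0.20, 0.10],
})

# Iterating over the column yields one value per ROW, so day 1 comes up 3 times
outer_values = [dow for dow in df["DOW"]]

# groupby yields each distinct day exactly once, together with its rows
group_keys = [dow for dow, _ in df.groupby("DOW")]

print(len(outer_values), len(group_keys))  # 6 2
```

With the real data, each day has one row per road link and hour, so the column-based loop repeats the same day thousands of times.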
I would suggest using (geo)pandas.groupby:
import matplotlib.pyplot as plt
for dow, day_gdf in df.groupby("DOW"):
    for hour, day_hour_gdf in day_gdf.groupby("Hour"):
        fig, ax = plt.subplots(1)
        print(f"Doing dow={dow}, hour={hour}")
        day_hour_gdf.plot(ax=ax, linewidth=day_hour_gdf['Normalised Value'])
        plt.savefig("day{}_hour{}.png".format(dow, hour), dpi=200, facecolor='#333333')
        plt.close()  # free the figure before moving on to the next hour
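If the per-day loop isn't needed for anything else, grouping on both columns at once gives exactly one iteration per (day, hour) pair. A pandas-only sketch on a made-up frame; the geometry column and the actual plotting are left out, so it does not need geopandas:

```python
import pandas as pd

# Made-up miniature of the question's table (geometry column omitted)
df = pd.DataFrame({
    "DOW": [1, 1, 2, 2, 1],
    "Hour": [0, 1, 0, 1, 0],
    "Normalised Value": [0.2, 0.1, 0.3, 0.4, 0.5],
})

filenames = []
for (dow, hour), day_hour_df in df.groupby(["DOW", "Hour"]):
    # In the real code you would create a figure here, call
    # day_hour_df.plot(ax=ax, linewidth=...), save it and close it.
    filenames.append(f"day{dow}_hour{hour}.png")

print(filenames)
# ['day1_hour0.png', 'day1_hour1.png', 'day2_hour0.png', 'day2_hour1.png']
```

The last row (DOW 1, Hour 0) joins the existing group for that pair, so no file name is produced twice.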
Bonus tip: check out pandas-bokeh if you want to generate interactive graphs with background tiles that can be saved as HTML files or embedded in Jupyter notebooks. The learning curve with bokeh can be a bit steep, but you can produce really nice interactive plots.
Cheers!

Cleaning up x-axis because there are too many datapoints

I have a data set that is like this
Date     Time     Cash
1/1/20   12:00pm  2
1/1/20   12:02pm  15
1/1/20   12:03pm  20
1/1/20   15:06pm  30
2/1/20   11:28am  5
...      ...      ...
3/1/20   15:00pm  3
I grouped the data by date and time (summing Cash) and plotted a FacetGrid with one panel per date, with time on the x-axis and cash on the y-axis:
df_new= df[:300]
g = sns.FacetGrid(df_new.groupby(['Date','Time']).Cash.sum().reset_index(), col="Date", col_wrap=3)
g = g.map(plt.plot, "Time", "Cash", marker=".")
g.set_xticklabels(rotation=45)
What I got back was hideous, with an x-axis too crowded to read. Is there any way to tidy up the x-axis, for example by showing only 5-10 time labels so the times are legible, or by enlarging the image?
Edit: I am plotting with seaborn. I want it to look like a plot where the x-axis has only a couple of labels.
Thanks for your inputs.
Have you tried using a moving average instead of the raw data? You can compute the moving average of any array with the following function:
import numpy as np
def moving_average(a, n=10):
    ret = np.cumsum(a, dtype=float)  # running sum
    ret[n:] = ret[n:] - ret[:-n]     # difference of running sums = sum of each window of n values
    return ret[n - 1:] / n
Set n to the window size you need; you can play around with that value. In your case, a is the Cash column as a NumPy array.
After that, replace the Cash values with the moving average and plot it; the curve will be smoother. Note that the result has n - 1 fewer values than the input, so it lines up with the time stamps from position n - 1 onward.
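Because the NumPy function returns len(a) - n + 1 values, it cannot be assigned straight back to the Cash column. pandas' `Series.rolling(window=n).mean()` computes the same trailing mean but keeps the original length, filling the first n - 1 entries with NaN. A quick check on made-up Cash values:

```python
import numpy as np
import pandas as pd


def moving_average(a, n=10):
    """Trailing mean over windows of n values (returns len(a) - n + 1 results)."""
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n


# Made-up Cash values
cash = pd.Series([2, 15, 20, 30, 5, 3, 8, 12], dtype=float)
n = 3

smoothed_np = moving_average(cash.to_numpy(), n)  # 6 values
smoothed_pd = cash.rolling(window=n).mean()       # 8 values, the first n - 1 are NaN

# The non-NaN part of the pandas result matches the NumPy version
assert np.allclose(smoothed_pd.dropna().to_numpy(), smoothed_np)
```

With the pandas version, `df["Cash"] = df["Cash"].rolling(window=n).mean()` keeps every row aligned with its time stamp.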
P.S. The plot of suicides you added in your edit is really hard to read, because the y-axis range is far larger than needed. In practice, try to avoid such plots.
Edit
At first I did not notice how you aggregate the data; you might want to work with date and time merged into one column. I don't know where you load the data from. If you load it from a CSV, you can pass parse_dates=[['Date', 'Time']] to read_csv (recent pandas versions deprecate this nested-list form). If not, you can build the column from the dataframe:
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
This creates a new datetime column that you can work with directly.
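The crowded axis most likely comes from Time being a text column: matplotlib treats strings as categories and draws one tick per distinct value. Once the values are real datetimes, a date locator can choose a handful of ticks. A sketch with matplotlib directly, assuming day-first dates (d/m/yy) and plain 24-hour times without the am/pm suffix; the extra sample rows are made up:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample in the question's layout. ASSUMPTIONS: dates are day-first
# (d/m/yy) and times are plain 24-hour values without the am/pm suffix.
df = pd.DataFrame({
    "Date": ["1/1/20", "1/1/20", "1/1/20", "1/1/20", "2/1/20", "2/1/20", "3/1/20", "3/1/20"],
    "Time": ["12:00", "12:02", "12:03", "15:06", "11:28", "16:40", "09:15", "15:00"],
    "Cash": [2, 15, 20, 30, 5, 12, 7, 3],
})
# Merge the two text columns into one real datetime column
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d/%m/%y %H:%M")

groups = list(df.groupby(df["datetime"].dt.date))  # one (date, rows) pair per day
fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 3), squeeze=False)
for ax, (day, day_df) in zip(axes.flat, groups):
    ax.plot(day_df["datetime"], day_df["Cash"], marker=".")
    # On a datetime axis matplotlib picks a few tick positions itself,
    # instead of one label per distinct time string.
    ax.xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=6))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    ax.tick_params(axis="x", labelrotation=45)
    ax.set_title(day.strftime("%d/%m/%y"))
```

With the seaborn FacetGrid from the question, the same locator and formatter calls should work when applied to each panel in `g.axes.flat`, once the Time column holds datetimes instead of strings.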

how to plot two time series that have different sample rates on the same graph with matplotlib

I have two sets of data that I would like to plot on the same graph. Both sets of data have 200 seconds worth of data. DatasetA (BLUE) is sampled at 25 Hz and DatasetB (Red) is sampled at 40Hz. Hence DatasetA has 25*200 = 5000 (time,value) samples... and DatasetB has 40*200 = 8000 (time,value) samples.
[Figure: datasets with different sample rates]
As you can see above, I have managed to plot these in matplotlib using the 'plot_date' function. As far as I can tell, the 'plot' function will not work because the number of (x,y) pairs is different for each dataset. The issue I have is the format of the x-axis: I would like the time to be a duration in seconds, rather than an exact time in hh:mm:ss format. Currently, the seconds value resets to zero at each new minute (as seen in the zoomed-out image below).
[Figure: zoomed-out full time scale]
How can I make the plot show the time increasing from 0-200 seconds rather than showing hours:min:sec ?
Is there a matplotlib.dates.DateFormatter that can do this (I have tried, but can't figure it out...)? Or do I somehow need to manipulate the datetime x-axis values to be a duration, rather than an exact time? (how to do this)?
FYI:
The code below is how I am converting the original csv list of float values (in seconds) into datetime objects, and again into matplotlib date-time objects -- to be used with the axes.plot_date() function.
from matplotlib import dates
import datetime
## arbitrary start date... we're dealing with milliseconds here.. so only showing time on the graph.
base_datetime = datetime.datetime(2018,1,1)
csvDateTime = map(lambda x: base_datetime + datetime.timedelta(seconds=x), csvTime)
csvMatTime = map(lambda x: dates.date2num(x), csvDateTime)
Thanks for your help/suggestions!
Well, thanks to ImportanceOfBeingErnst for pointing out that I was vastly over-complicating things...
It turns out that I really only need the ax.plot(x,y) function rather than the ax.plot_date(mdatetime, y) function. Plot can actually plot varied lengths of data as long as each individual trace has the same number of x and y values. Since the data is all given in seconds I can easily plot using 0 as my "reference time".
For anyone else struggling with plotting durations rather than exact times: you can simply manipulate the time (x) data with Python's map() function, or better yet a list comprehension, to "time shift" the data or convert it to a single unit of time (e.g. turn minutes into seconds by multiplying by 60).
"Time Shifting" might look like:
# build some sample 25 Hz time data (1000 samples, 0.04 s apart)
time = range(0, 1000, 1)
time = [x * .04 for x in time]
# "time shift" it by 5 seconds, since this data is recorded 5 seconds after the other signal
time = [x + 5 for x in time]
Here is my plotting code for any other matplotlib beginners like me :) (this will not run, since I have not converted my variables to generic data... but nevertheless it is a simple example of using matplotlib.)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.grid()
ax.set_title(plotTitle)
ax.set_xlabel("time (s)")
ax.set_ylabel("value")
# begin looping over the different sets of data.
for tup, record in enumerate(alldata):
    signals = record[1]
    outTime = signals.get("time")
    # each signal is time shifted 5 seconds later than the previous one.
    # in addition, each signal has a different sampling frequency,
    # so len(outTime) is different for almost every signal.
    outTime = [x + 5 * tup for x in outTime]
    for key in signals:
        if key not in channelSelection:
            ## if we don't want to plot that data, skip it.
            continue
        ## using a list comprehension to scale the y values.
        data = [100 * x for x in signals.get(key)]
        ax.plot(outTime, data, linestyle='solid', linewidth=1, marker='')
plt.show()
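The plotting code above does not run on its own, so here is a self-contained sketch of the same approach with generated data. Elapsed time in seconds is built directly from each sample rate (25 Hz and 40 Hz over 200 s, as in the question), and `ax.plot` is called once per trace, so the traces can have different lengths. The helper `sample_times` and the signal values are made up for illustration; its `offset_s` argument gives the 5-second time shift described in the answer:

```python
import matplotlib.pyplot as plt
import numpy as np

DURATION_S = 200  # both datasets cover 200 seconds


def sample_times(rate_hz, duration_s=DURATION_S, offset_s=0.0):
    """Elapsed time in seconds for each sample, starting at offset_s."""
    n_samples = int(rate_hz * duration_s)
    return offset_s + np.arange(n_samples) / rate_hz


t_a = sample_times(25)  # DatasetA: 25 Hz -> 5000 samples
t_b = sample_times(40)  # DatasetB: 40 Hz -> 8000 samples

# Made-up signal values, just so there is something to plot
y_a = np.sin(2 * np.pi * 0.05 * t_a)
y_b = 0.5 * np.cos(2 * np.pi * 0.05 * t_b)

fig, ax = plt.subplots()
ax.plot(t_a, y_a, color="blue", linewidth=1, label="DatasetA (25 Hz)")
ax.plot(t_b, y_b, color="red", linewidth=1, label="DatasetB (40 Hz)")
ax.set_xlabel("time (s)")
ax.set_ylabel("value")
ax.set_xlim(0, DURATION_S)
ax.legend()
```

Building the times as `np.arange(n) / rate` rather than with a float step in `np.arange` keeps the sample count exact (5000 and 8000).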

Time axis python

I have 2 questions. First, I'm plotting time series data with matplotlib; there is one data point per minute, and the time value is stored as a fraction of a day (minute/60/24, e.g. minute 1 = 0.00069). Could anyone give me a tip for easily showing the x-axis in H:M format? Second, how can I set a specific range on this axis? Thanks
Normally, you'll just plot it and then set the x-axis labels to the format you want. So if $t$ is your time variable, going from 0 (0:00) through 0.5 (12:00) up to just under 1 (23:59), format each x-axis label with
"%d:%02d" % (int(t*24), t*24*60 - int(t*24)*60)
Or
"%d:%02d" % (int(t*24) % 24, (t*24*60) % 60)

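Instead of building the labels by hand, matplotlib's `FuncFormatter` can apply a conversion like the one above to every tick, and `set_xlim` answers the second question about restricting the range. A sketch, assuming the time values are day fractions as described in the question; the function name `day_fraction_to_hm` and the data are made up. Rounding to the nearest minute avoids the truncation that `%d` applies to values such as 0.00069 * 1440 = 0.9936:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator


def day_fraction_to_hm(t, pos=None):
    """Format a time stored as a fraction of a day (minute / 60 / 24) as HH:MM.

    `pos` is the tick position argument that FuncFormatter passes; it is unused.
    """
    total_minutes = int(round(t * 24 * 60))  # round to the nearest whole minute
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours % 24:02d}:{minutes:02d}"


# Made-up data: one value per minute over a whole day, time in day fractions
t = np.arange(24 * 60) / (24 * 60)
values = np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, values)
ax.xaxis.set_major_locator(MultipleLocator(1 / 24))  # one tick per hour
ax.xaxis.set_major_formatter(FuncFormatter(day_fraction_to_hm))
# Second question: show only a specific range, here 08:00 to 12:00
ax.set_xlim(8 / 24, 12 / 24)
```

Placing ticks at multiples of 1/24 puts them on whole hours, so the formatted labels read 08:00, 09:00, and so on.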