I have to plot data that have DateTime and GPS coordinate information. As an example:
2013-03-01 19:55:00 45.4565 65.6783
2013-03-01 01:40:00 46.3121 -12.3456
2013-03-02 11:25:00 23.1234 -85.3456
2013-03-05 05:00:00 15.4565 32.1234
......
This is just a random example matching the type of data I have. The whole data set is for a week and the timestamps are rounded to the nearest 5 minutes.
What I would like to do in Python is to visualize this data for location patterns over each 24-hour period for the entire week, so the x-axis would show the time of day. I am struggling to think about how the location should be shown; maybe a 3D plot is needed.
It would show each day in a different color, plus one more series for the average of the entire week (i.e. the whole week averaged into a single 24-hour period).
Any idea how one would go about visualizing this using python and matplotlib?
Note that I cannot plot the locations on an actual map for now, just as (x, y) coordinates.
Try using Folium's HeatMapWithTime plugin.
First you define a function to generate the base map:

import folium

def generateBaseMap(default_location, default_zoom_start=12):
    base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)
    return base_map
Then you collect the latitude/longitude pairs in a list of lists, ordered by date:

date_list = []
for date in df.date.sort_values().unique():
    date_list.append(df.loc[df.date == date, ['lat', 'lng']].values.tolist())
Then you plot the heat map with time (here lat and longitude are the center coordinates you want the map to open on):

from folium.plugins import HeatMapWithTime

base_map = generateBaseMap(default_zoom_start=11, default_location=[lat, longitude])
HeatMapWithTime(date_list, radius=5, gradient={0.2: 'blue', 0.4: 'lime', 0.6: 'orange', 1: 'red'}, min_opacity=0.5, max_opacity=0.8, use_local_extrema=True).add_to(base_map)
base_map
Hope this helps.
Related
I am a type 1 diabetic and wear a continuous glucose monitor that measures my blood glucose levels every 5 minutes. The company that makes the CGM generates a report with a graph that looks like the figure at the bottom of this post. My goal is to learn how to recreate this graph for myself in a Jupyter notebook.
The data that I have, for example, looks like this:
Timestamp              Glucose Value (mg/dL)
2021-07-11 00:11:25    116.0
2021-07-11 00:16:25    118.0
2021-07-11 00:21:25    121.0
2021-07-11 00:26:24    123.0
2021-07-11 00:31:25    124.0
The graph is using data from a 30 day period and summarizing the distribution of values at each point in time. Is there a name for this type of graph, and how can I create it myself using Pandas/matplotlib/seaborn?
So far, I have tried creating a graph of the IQR split by day, which is rather easy using Plotly:
glucose['Day'] = glucose['Timestamp'].dt.day_name()
fig = px.box(glucose, x="Day", y="Glucose Value (mg/dL)",
             points="all", color='Day')
fig.show()
But now I am unsure how to easily calculate the IQR for specific time periods and average them.
Thank you so much for your help!
Answering my own question with help from the links that Joe provided in the comments:
I was able to group the dataframe by hour, then use .quantile to generate a new dataframe with rows as hours and columns as 10%, 25%, 50%, 75%, and 90%. From there it was a matter of simple formatting with matplotlib to copy the original one.
grouped = df.groupby([df['Timestamp'].dt.hour])
i = grouped['bgl'].quantile([.1, .25, .5, .75, .9]).unstack()
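To finish the formatting step, here is a minimal sketch of turning that quantile table into a banded plot with matplotlib. The synthetic data, the column name bgl, and the styling are all assumptions, not the original CGM export:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic 5-minute glucose readings over three days (stand-in for the real data)
rng = np.random.default_rng(0)
ts = pd.date_range("2021-07-11", periods=288 * 3, freq="5min")
df = pd.DataFrame({"Timestamp": ts, "bgl": 120 + rng.normal(0, 15, len(ts))})

# rows = hour of day, columns = the 10/25/50/75/90 percent quantiles
grouped = df.groupby(df["Timestamp"].dt.hour)
q = grouped["bgl"].quantile([.1, .25, .5, .75, .9]).unstack()

# shaded bands for the outer and inner quantile ranges, line for the median
fig, ax = plt.subplots()
ax.fill_between(q.index, q[0.1], q[0.9], alpha=0.2, label="10-90%")
ax.fill_between(q.index, q[0.25], q[0.75], alpha=0.4, label="25-75%")
ax.plot(q.index, q[0.5], label="median")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Glucose (mg/dL)")
ax.legend()
```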
Thanks a lot Joe!
I am trying to create an animated map (by generating multiple plots) of road traffic throughout a week, where the thickness of roads is represented by the volume of traffic at a specific time of day.
This is sort of what I'm looking for (but for each hour of each day):
The data has a structure that looks like this:
HMGNS_LNK_ID geometry DOW Hour Normalised Value
2 MULTILINESTRING ((251... 1 0 0.233623
2 MULTILINESTRING ((251... 1 1 0.136391
2 MULTILINESTRING ((251... 1 2 0.108916
DOW stands for 'day of the week' (1 = Monday), and so for every Hour of each of the 7 days I want to plot the map with the roads' thickness given by Normalised Value.
I encounter a problem that when trying to loop with this code:
for dow in df['DOW']:
    fig, ax = plt.subplots(1)
    day_df = df[df['DOW']==dow]
    for hour in day_df['Hour']:
        day_hour_df = day_df[day_df['Hour']==hour]
        day_hour_df.plot(ax=ax, linewidth=day_hour_df['Normalised Value'])
        plt.savefig("day{}_hour{}.png".format(dow, hour), dpi=200, facecolor='#333333')
The problem is that figures are only saved for day 1, up to day1_hour23; after that it comes back to day1_hour0 and overwrites the plots. I can't figure out why it never moves on to DOW 2.
I'm not even sure if the data structure is correct. I would greatly appreciate any help with that. Please find the full code in my repo.
Cheers!
The problem is with the way you loop and subset df. Let's go through the loop in detail. First time in the outer loop, dow will be 1 and day_df = df[df['DOW']==dow] will select all rows with 1 in the column DOW. Now the inner loop goes through the selected rows and creates day1_hour0 to day1_hour23. Inner loop done, great.
Now we come second time into the outer loop and dow is again 1. day_df = df[df['DOW']==dow] will select all rows with 1 in the column DOW, i.e., the same set of rows that it used the previous time through the outer loop. So, it (re)writes day1_hour0 to day1_hour23 again.
I would suggest using (geo)pandas.groupby:
for dow, day_gdf in df.groupby("DOW"):
    for hour, day_hour_gdf in day_gdf.groupby("Hour"):
        fig, ax = plt.subplots(1)
        print(f"Doing dow={dow}, hour={hour}")
        day_hour_gdf.plot(ax=ax, linewidth=day_hour_gdf['Normalised Value'])
        plt.savefig("day{}_hour{}.png".format(dow, hour), dpi=200, facecolor='#333333')
        plt.close()
Bonus tip: check out pandas-bokeh if you want to generate interactive graphs with background tiles that can be saved as HTML or embedded in Jupyter notebooks. The learning curve can be a bit steep with Bokeh, but you can produce really nice interactive plots.
Cheers!
I have a data set that is like this
Date Time Cash
1/1/20 12:00pm 2
1/1/20 12:02pm 15
1/1/20 12:03pm 20
1/1/20 15:06pm 30
2/1/20 11:28am 5
. .
. .
. .
3/1/20 15:00pm 3
I basically grouped all the data by date along the y-axis and time along the x-axis, and plotted a facetgrid as shown below:
df_new= df[:300]
g = sns.FacetGrid(df_new.groupby(['Date','Time']).Cash.sum().reset_index(), col="Date", col_wrap=3)
g = g.map(plt.plot, "Time", "Cash", marker=".")
g.set_xticklabels(rotation=45)
What I got back was hideous (as shown below). So I'm wondering: is there any way to tidy up the x-axis? Maybe keep only 5-10 time labels so they stay legible, or expand the image?
Edit: I am plotting using seaborn. I want it to look something like the plot below, where the x-axis has only a couple of labels:
Thanks for your inputs.
Have you tried using a moving average instead of the actual data? You can compute the moving average of any data with the following function:
import numpy as np

def moving_average(a, n=10):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
Set n to the window size you need; you can play around with that value. a is, in your case, the variable Cash represented as a NumPy array.
After that, set the Cash column to the moving average of the real values and plot it. The plotted curve will be smoother.
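A quick sketch of the function on toy numbers (the Cash values here are made up):

```python
import numpy as np

def moving_average(a, n=10):
    # cumulative sum, then difference of sums n apart = sum over each window
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

cash = np.array([2, 15, 20, 30, 5, 3], dtype=float)
smoothed = moving_average(cash, n=2)
# each output value is the mean of a sliding window of 2 samples
# -> [8.5, 17.5, 25.0, 17.5, 4.0]
```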
P.S. The suicide plot you added in the edit is really unreadable, as the range of the y-axis is far larger than needed. In practice, try to avoid such plots.
Edit
I did not notice at first how you aggregate the data; you might want to work with date and time merged into one column. I do not know where you load the data from; in case you load it from CSV you can pass parse_dates=[['Date', 'Time']] to read_csv. If not, you can build it in the dataframe:
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
This creates a new datetime column that you can work with directly.
I have two sets of data that I would like to plot on the same graph. Both sets of data have 200 seconds worth of data. DatasetA (BLUE) is sampled at 25 Hz and DatasetB (Red) is sampled at 40Hz. Hence DatasetA has 25*200 = 5000 (time,value) samples... and DatasetB has 40*200 = 8000 (time,value) samples.
datasets with different sample rates
As you can see above, I have managed to plot these in matplotlib using the 'plot_date' function. As far as I can tell, the 'plot' function will not work because the number of (x, y) pairs is different in each sample. The issue I have is the format of the x-axis: I would like the time to be a duration in seconds rather than an exact time in hh:mm:ss format. Currently, the seconds value resets back to zero at each minute (as seen in the zoomed-out image below).
zoomed out full time scale
How can I make the plot show the time increasing from 0-200 seconds rather than showing hours:min:sec?
Is there a matplotlib.dates.DateFormatter that can do this (I have tried, but can't figure it out)? Or do I somehow need to manipulate the datetime x-axis values to be a duration rather than an exact time, and if so, how?
FYI:
The code below is how I am converting the original csv list of float values (in seconds) into datetime objects, and again into matplotlib date-time objects -- to be used with the axes.plot_date() function.
from matplotlib import dates
import datetime
## arbitrary start date... we're dealing with milliseconds here.. so only showing time on the graph.
base_datetime = datetime.datetime(2018,1,1)
csvDateTime = map(lambda x: base_datetime + datetime.timedelta(seconds=x), csvTime)
csvMatTime = map(lambda x: dates.date2num(x), csvDateTime)
# note: in Python 3, map() returns an iterator, so wrap these in list() before plotting
Thanks for your help/suggestions!
Well, thanks to ImportanceOfBeingErnst for pointing out that I was vastly over-complicating things...
It turns out that I really only need the ax.plot(x,y) function rather than the ax.plot_date(mdatetime, y) function. Plot can actually plot varied lengths of data as long as each individual trace has the same number of x and y values. Since the data is all given in seconds I can easily plot using 0 as my "reference time".
For anyone else struggling with plotting duration rather than exact times, you can simply manipulate the "time" (x) data by using python's map() function, or better yet a list comprehension to "time shift" the data or convert to a single unit of time (e.g. simply turn minutes into seconds by dividing by 60).
"Time Shifting" might look like:
# build some sample 25 Hz time data
time = range(0, 1000, 1)
time = [x * .04 for x in time]
# "time shift" it by 5 seconds, since this data is recorded 5 seconds after the other signal
time = [x + 5 for x in time]
Here is my plotting code for any other matplotlib beginners like me :) (this will not run, since I have not converted my variables to generic data... but nevertheless it is a simple example of using matplotlib.)
fig, ax = plt.subplots()
ax.grid()
ax.set_title(plotTitle)
ax.set_xlabel("time (s)")
ax.set_ylabel("value")
# begin looping over the different sets of data.
tup = 0
while tup < len(alldata):
    outTime = alldata[tup][1].get("time")
    # each signal is time shifted 5 seconds later.
    # in addition, each signal has a different sampling frequency,
    # so len(outTime) is different for almost every signal.
    outTime = [x + (5 * tup) for x in outTime]
    for key in alldata[tup][1]:
        if key not in channelSelection:
            ## if we don't want to plot that data then skip it.
            continue
        else:
            data = alldata[tup][1].get(key)
            ## using a list comprehension to scale the y values.
            data = [100 * x for x in data]
            ax.plot(outTime, data, linestyle='solid', linewidth=1, marker='')
    tup += 1
plt.show()
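For anyone who wants a runnable end-to-end version, here is a minimal self-contained sketch of the same idea with synthetic signals; the sample rates match the question, but the signal shapes and labels are made up:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

duration = 200  # seconds of data, as in the question

# build the time axes directly in seconds, so no datetime conversion is needed
t_a = np.arange(25 * duration) / 25.0  # 25 Hz -> 5000 samples
t_b = np.arange(40 * duration) / 40.0  # 40 Hz -> 8000 samples
y_a = np.sin(2 * np.pi * 0.05 * t_a)   # placeholder signals
y_b = np.cos(2 * np.pi * 0.05 * t_b)

# ax.plot() is fine with traces of different lengths,
# as long as each trace's own x and y match each other
fig, ax = plt.subplots()
ax.plot(t_a, y_a, label="DatasetA (25 Hz)")
ax.plot(t_b, y_b, label="DatasetB (40 Hz)")
ax.set_xlabel("time (s)")
ax.set_ylabel("value")
ax.legend()
```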
I have two questions. First, I'm plotting time-series data with matplotlib; the time data is one sample per minute, and its value is minute/60/24 (for example, minute 1 = 0.00069). Could anyone give me a tip for easily plotting in H:M format on the x-axis? Second, how could I set a specific range on this axis? Thanks.
Normally, you'll just plot it, and then set the x-axis labels to the format you want. So if t is your time variable that goes from 0 = 0:00 through 0.5 = 12:00 to 1 = 23:59 (or whatever), then set up your x-axis labels with the format (note %02d to zero-pad the minutes)
"%d:%02d" % (int(t*24), t*24*60 - int(t*24)*60)
or, wrapping around past midnight,
"%d:%02d" % (int(t*24) % 24, (t*24*60) % 60)
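A sketch of wiring that into matplotlib with a FuncFormatter, which also shows one way to set a specific axis range via set_xlim (the data here is made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np

def frac_day_to_hm(t, pos=None):
    # convert a fraction of a day (0..1) to an "H:MM" tick label
    minutes = int(round(t * 24 * 60))
    return "%d:%02d" % ((minutes // 60) % 24, minutes % 60)

t = np.linspace(0, 1, 24 * 60)  # one reading per minute
y = np.sin(2 * np.pi * t)       # placeholder signal

fig, ax = plt.subplots()
ax.plot(t, y)
ax.xaxis.set_major_formatter(FuncFormatter(frac_day_to_hm))
ax.set_xlim(0.25, 0.75)  # restrict the axis to 6:00 - 18:00
```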