How to convert timedelta for plotly histogram? - python

I want to plot histograms of timedelta64 values (example: Timedelta('0 days 00:00:44.749500')).
But in both cases, plotly's histogram() does not recognize the values as times and instead displays axis labels such as 50T and 60T (see image).
How do I have to convert the datetime/timedelta so that plotly's histogram() shows a correct time axis? Thanks
fig = px.histogram(x=df_TT_redux["T_delta"],color=df_TT_redux["event_source"],log_y=True)
EDIT:
Thanks to LittlePanic404:
converting to ISO format gives some interesting results. I guess I still have to tweak that a bit.
Using
import isodate
df_TT_redux["T_delta3"]=[isodate.duration_isoformat(x) for x in df_TT_redux["T_delta"]]
fig = px.histogram(x=df_TT_redux["T_delta3"],color=df_TT_redux["event_source"],color_discrete_map=color_discrete_map,log_y=False,log_x=False,nbins=100)
However, another way of solving this could be to divide by a unit Timedelta, which turns the durations into plain numbers:
df_TT_redux["T_delta2"]=df_TT_redux["T_delta"]/pd.Timedelta("1 hour")
or
.../pd.Timedelta("1 minute"), depending on your case.
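The division approach above can be sketched as follows. This is a minimal example with made-up sample data standing in for df_TT_redux; the column names T_delta_min and T_delta_s and the helper make_histogram are hypothetical, and the plotly call is only wrapped in a function so the conversion can be checked without drawing anything.

```python
import pandas as pd

# Hypothetical sample data standing in for df_TT_redux.
df = pd.DataFrame({
    "T_delta": pd.to_timedelta([
        "0 days 00:00:44.749500",
        "0 days 00:01:30",
        "0 days 01:00:00",
    ]),
    "event_source": ["a", "b", "a"],
})

# Dividing a timedelta column by a unit Timedelta yields plain floats,
# which a histogram can bin like any other number.
df["T_delta_min"] = df["T_delta"] / pd.Timedelta("1 minute")

# Equivalent idea in seconds, using the .dt accessor.
df["T_delta_s"] = df["T_delta"].dt.total_seconds()


def make_histogram(frame):
    """Build the histogram on the numeric column (requires plotly)."""
    import plotly.express as px

    return px.histogram(
        frame,
        x="T_delta_min",
        color="event_source",
        log_y=True,
        labels={"T_delta_min": "duration (minutes)"},
    )
```

Calling make_histogram(df).show() would then display an x axis in minutes; the unit has to be stated in the axis label because the numbers themselves no longer carry it.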

Related

Datetime to Time/HH:MM format – investigating events on multiple dates by the time of day

I have a pandas dataframe with a column "Datetime" which has values in pd.Timestamp / np.datetime64 format. How should I extract the hours and minutes while keeping the status of this "HH:MM" as "continuous plottable values?"
I want to plot a histogram of the dataframe column (pd.Series) based on the frequency in "HH:MM sense" in which case the x-axis would range from 00:00 to 23:59 etc.
import pandas as pd
# ...
new_df["Datetime"][0]
> Timestamp('2022-08-08 16:58:00')
I saw examples of extracting the time as a string. Not good enough. I could also use groupby hour and then e.g. plot a bar chart by count but that's not exactly what I was looking for, either...
...or I could convert each row to a string and then immediately back to pd.Timestamp with the same date. It's not ideal, but works. Any better ideas?
I battled with this a bit longer and got it working decently. Is this really the most straightforward way of doing it? The lambda stuff feels always a bit far-fetched, and this one still keeps the full date which isn't a problem per se but not necessary, either (and requires extra formatting on the xaxis).
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
# pd.Timestamp converts the date automatically to "today" if YYYYMMDD is not specified
new_df["Datetime"].apply(lambda t:pd.Timestamp(f'{t.hour:02d}:{t.minute:02d}')).hist(ax=ax)
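One possible way to avoid the lambda is to subtract midnight of each row's own date, which leaves the time of day as a timedelta. The sketch below uses made-up timestamps; the names time_of_day, hours and same_day are hypothetical, and 1970-01-01 is an arbitrary reference date chosen only so that all rows share one day.

```python
import pandas as pd

# Hypothetical sample standing in for new_df["Datetime"].
s = pd.Series(
    pd.to_datetime([
        "2022-08-08 16:58:00",
        "2022-08-09 09:15:00",
        "2022-08-10 23:59:00",
    ]),
    name="Datetime",
)

# normalize() sets the time part to 00:00, so the difference is the
# elapsed time since midnight on each row's own date.
time_of_day = s - s.dt.normalize()

# Option 1: continuous hours as floats (0.0 up to just under 24.0).
hours = time_of_day / pd.Timedelta(hours=1)

# Option 2: put every row on one fixed date so a date formatter such as
# mdates.DateFormatter('%H:%M') can still label the axis.
same_day = pd.Timestamp("1970-01-01") + time_of_day
```

hours.hist() gives a numeric axis directly, while same_day.hist(ax=ax) keeps the '%H:%M' formatter from the snippet above usable.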

How to regulate number of ticks plot?

I have a dataframe with shape (2000, 2). There are two columns: value and date. I want to plot it with date on x axis. But since there are 2000 days there, I want to keep only 10 ticks on x axis. I tried this:
plt.plot(data["Date"], data["Value"])
plt.locator_params(axis='x', nbins=10)
plt.show()
But plot looks like this:
How to fix it?
From your plot, I'm going to assume your problem is that the values in your "Date" column are strings, not datetimes (or pandas Timestamps), so matplotlib treats them as categories. If the column were datetime-like, matplotlib would automatically select a reasonably suitable tick spacing:
You would need to convert those strings back to datetimes, for example with dateutil.parser
from dateutil import parser
data['Date_dt'] = data['Date'].apply(parser.parse)
or via strptime (the formatting string in args could change depending on your date format)
from datetime import datetime
data['Date_dt'] = data['Date'].apply(datetime.strptime, args=('%Y-%m-%d %H:%M:%S',))
If for some obscure reason, you really just want EXACTLY 10 ticks, you could do something along the lines of:
plt.xticks(pd.date_range(data['Date_dt'].min(), data['Date_dt'].max(), periods=10))
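As a rough sketch of the conversion and the ten evenly spaced tick positions, the example below builds a made-up frame of 2000 daily string dates. It uses pd.to_datetime as a vectorized alternative to the per-row parsers above; the data and the '%Y-%m-%d' format are assumptions, not the asker's actual values.

```python
import pandas as pd

# Hypothetical data: 2000 consecutive days stored as strings.
dates_as_text = pd.date_range("2015-01-01", periods=2000, freq="D").strftime("%Y-%m-%d")
data = pd.DataFrame({"Date": dates_as_text, "Value": range(2000)})

# Parse the whole column at once; the format string is an assumption
# and has to match how the dates are actually written.
data["Date_dt"] = pd.to_datetime(data["Date"], format="%Y-%m-%d")

# Exactly 10 tick positions, evenly spaced between first and last date.
ticks = pd.date_range(data["Date_dt"].min(), data["Date_dt"].max(), periods=10)
```

Plotting data["Date_dt"] instead of data["Date"] and passing ticks to plt.xticks would then place ten labels on the axis.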

Building a plot and correcting visualisation python, pandas, matplotlib

I am building a plot. I have two types of data. The timestamp column stores dates and the favorite_count column stores the number of likes. I want to visualise the favorite count over the time since the tweet was posted.
I believe your timestamps are strings. Convert them to datetime type and matplotlib/pandas will give you a nicer x-axis:
df['timestamp'] = pd.to_datetime(df['timestamp'])
# plot; figsize is passed to df.plot because pandas creates its own figure
df.plot(x='timestamp', y='favorite_count', figsize=(20, 5))

Convert date in pandas

I know this has been asked like 100 times but I still don't get it and the given solutions don't get me anywhere.
I'm trying to convert time into a comparable format with Pandas/Python. I use database entries as data and currently I have trouble using times like this:
52 2017-08-04 12:26:56.348698
53 2017-08-04 12:28:22.961560
54 2017-08-04 12:34:20.299041
the goal is to use it as year1 and year2 to make a graph like:
def sns_compare(year1,year2):
    f, (ax1) = plt.subplots(1, figsize=LARGE_FIGSIZE)
    for yr in range(int(year1),int(year2)):
        sns.distplot(tag2[str(yr)].dropna(), hist=False, kde=True, rug=False, bins=25)
sns_compare(year1,year2)
When I try to do it like this I get ValueError: invalid literal for int() with base 10: '2017-08-04 12:34:20.299041'.
So currently I am thinking about using regex to manipulate the time fields, but I can't imagine this is the way to go. I tried all kinds of suggestions from SO/GitHub but nothing really worked. I also don't know what the "optimal" time structure should look like. Is it 20170804123420299041 or something like 2017-08-04-12-34-20-299041? I hope somebody can make this clear to me.
This is your data:
from matplotlib import pyplot as plt
from datetime import datetime
import pandas as pd
df = pd.DataFrame([("2017-08-04 12:26",56.348698),("2017-08-04 12:28",22.961560),("2017-08-04 12:34",20.299041)])
df.columns = ["date", "val"]
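First, we convert the column to datetime, then we subtract January 1 of year1 and express the difference in days (year1 and year2 are assumed to be integer years defined beforehand).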
df['date'] = pd.to_datetime(df["date"])
df["days"]=(df['date'] -datetime(year1,1,1)).dt.total_seconds()/86400.0
Then plot the data and display only the days between year1 and year2:
plt.scatter(df["days"],df["val"])
plt.xlim((0,(year2-year1)*365))
plt.show()
Have you looked at pd.to_datetime? Pandas and Seaborn should be able to handle dates fine, and you don't have to convert them to integers.
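To illustrate that suggestion, here is a small sketch that parses timestamp strings and selects rows by year without any regex or int() conversion. The third timestamp and the values of year1 and year2 are made up for the example; the half-open range mirrors range(int(year1), int(year2)) from the question.

```python
import pandas as pd

# Two timestamps from the question plus one hypothetical 2018 entry.
raw = pd.Series([
    "2017-08-04 12:26:56.348698",
    "2017-08-04 12:28:22.961560",
    "2018-01-15 08:00:00.000000",
])

# to_datetime parses the strings; no manual reformatting is needed.
stamps = pd.to_datetime(raw)

# The .dt accessor exposes the year as an integer column.
years = stamps.dt.year

# Hypothetical bounds; year2 is excluded, like range(year1, year2).
year1, year2 = 2017, 2018
selected = stamps[(years >= year1) & (years < year2)]
```

Datetime values also compare directly, so a filter such as stamps >= pd.Timestamp("2017-01-01") works without extracting the year first.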

Pandas Time Series: How to plot only times of day (no dates) against other values?

As I am preparing to do some regressions on a rather big dataset I would like to visualize the data at first.
The data we are talking about is data about the New York subway (hourly entries, rain, weather and such) for May, 2011.
When creating the dataframe I converted hours and time to pandas datetime format.
Now I realize that what I want to do does not make much sense from a logical point of view for the example at hand. However, I would still like to plot the exact time of day against the hourly entries. Which, as I said, is not very meaningful since ENTRIESn_hourly is aggregated. But let's for the sake of the argument assume the ENTRIESn_hourly would be explicitly related to the exact timestamp.
Now how would I go about taking only the times and ignoring the dates and then plot that out?
Please find the jupyter notebook here: https://github.com/FBosler/Udacity/blob/master/Example.ipynb
Thx alot!
IIUC you can do it this way:
In [9]: weather_turnstile.plot.line(x=weather_turnstile.Date_Time.dt.time, y='ENTRIESn_hourly', marker='o', alpha=0.3)
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0xc2a63c8>
.dt accessor gives you access to the following attributes:
In [10]: weather_turnstile.Date_Time.dt.
weather_turnstile.Date_Time.dt.ceil weather_turnstile.Date_Time.dt.is_quarter_end weather_turnstile.Date_Time.dt.strftime
weather_turnstile.Date_Time.dt.date weather_turnstile.Date_Time.dt.is_quarter_start weather_turnstile.Date_Time.dt.time
weather_turnstile.Date_Time.dt.day weather_turnstile.Date_Time.dt.is_year_end weather_turnstile.Date_Time.dt.to_period
weather_turnstile.Date_Time.dt.dayofweek weather_turnstile.Date_Time.dt.is_year_start weather_turnstile.Date_Time.dt.to_pydatetime
weather_turnstile.Date_Time.dt.dayofyear weather_turnstile.Date_Time.dt.microsecond weather_turnstile.Date_Time.dt.tz
weather_turnstile.Date_Time.dt.days_in_month weather_turnstile.Date_Time.dt.minute weather_turnstile.Date_Time.dt.tz_convert
weather_turnstile.Date_Time.dt.daysinmonth weather_turnstile.Date_Time.dt.month weather_turnstile.Date_Time.dt.tz_localize
weather_turnstile.Date_Time.dt.floor weather_turnstile.Date_Time.dt.nanosecond weather_turnstile.Date_Time.dt.week
weather_turnstile.Date_Time.dt.freq weather_turnstile.Date_Time.dt.normalize weather_turnstile.Date_Time.dt.weekday
weather_turnstile.Date_Time.dt.hour weather_turnstile.Date_Time.dt.quarter weather_turnstile.Date_Time.dt.weekday_name
weather_turnstile.Date_Time.dt.is_month_end weather_turnstile.Date_Time.dt.round weather_turnstile.Date_Time.dt.weekofyear
weather_turnstile.Date_Time.dt.is_month_start weather_turnstile.Date_Time.dt.second weather_turnstile.Date_Time.dt.year
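Note that .dt.time returns Python datetime.time objects in an object-dtype column, which not every plotting backend treats as a continuous axis. A possible alternative, sketched here with made-up timestamps in place of weather_turnstile.Date_Time, is to combine the numeric .dt components into decimal hours; the name hour_of_day is hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for weather_turnstile.Date_Time.
date_time = pd.Series(pd.to_datetime([
    "2011-05-01 04:30:00",
    "2011-05-02 12:00:00",
    "2011-05-03 20:45:00",
]))

# .dt.time keeps only the clock time, but as Python objects.
clock_times = date_time.dt.time

# Decimal hours since midnight, built from the integer components.
hour_of_day = (
    date_time.dt.hour
    + date_time.dt.minute / 60
    + date_time.dt.second / 3600
)
```

A scatter of hour_of_day against ENTRIESn_hourly would then put all days on one 0 to 24 axis.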
