Plotting CSV data using matplotlib and pandas in Python


This is what I currently have. I need to plot time on the x-axis and turbidity on the y-axis. Before plotting, the volts read from the CSV file need to go through the equation Turbidity = (0.07642 * volts) + (-15.122), and the result is what gets graphed. I am also getting a date error. The columns are shown below; how can I get it to ignore the logger time and the LoggerID? I just need the date/time on x and the raw sensor value converted to turbidity on y.
Date/Time (UTC)    Logger Time (unix timestamp)    Raw Sensor (mV)    LoggerID
6/27/2018 18:45    1530125111                      4.61               Mill Creek B
7/3/2018 18:30     1530642609                      92.14              Mill Creek B
7/3/2018 18:45     1530643509                      92.03              Mill Creek B
7/3/2018 20:00     1530648013                      91.24              Mill Creek B
...
import pandas as pd
from datetime import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Raw Sensor','Date','Time']
df = pd.read_csv('turbiditydata.csv',names=headers)
print (df)
df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%d/%m/%y %H:%M'))
x = df['Date']
y = df['Turbidity']
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.title('Turbidity Over Time')
plt.show()
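
One possible approach, given only as a minimal sketch under stated assumptions: the file is taken to be comma-separated with the header row shown above, the dates are taken to be month/day/year, and the inline sample stands in for turbiditydata.csv. usecols leaves out the logger time and LoggerID columns, and the turbidity formula is the one from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for turbiditydata.csv (assumed comma-separated, with the header row shown above)
sample_csv = io.StringIO(
    "Date/Time (UTC),Logger Time (unix timestamp),Raw Sensor (mV),LoggerID\n"
    "6/27/2018 18:45,1530125111,4.61,Mill Creek B\n"
    "7/3/2018 18:30,1530642609,92.14,Mill Creek B\n"
    "7/3/2018 18:45,1530643509,92.03,Mill Creek B\n"
    "7/3/2018 20:00,1530648013,91.24,Mill Creek B\n"
)

# usecols keeps only the two needed columns, so logger time and LoggerID are ignored
df = pd.read_csv(sample_csv, usecols=["Date/Time (UTC)", "Raw Sensor (mV)"])

# The dates are month/day/four-digit year, hence %m/%d/%Y rather than %d/%m/%y
df["Date/Time (UTC)"] = pd.to_datetime(df["Date/Time (UTC)"], format="%m/%d/%Y %H:%M")

# Conversion from the question: Turbidity = (0.07642 * volts) + (-15.122)
df["Turbidity"] = 0.07642 * df["Raw Sensor (mV)"] - 15.122

fig, ax = plt.subplots()
ax.plot(df["Date/Time (UTC)"], df["Turbidity"])
fig.autofmt_xdate()
ax.set_title("Turbidity Over Time")
# plt.show()  # uncomment when running interactively
```

With the real file, the io.StringIO object would be replaced by the path 'turbiditydata.csv'.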

Related

seaborn : plotting time on x-axis

I'm working with a dataset that only contains datetime objects and I have retrieved the day of the week and reformatted the time in a separate column like this (conversion functions included below):
datetime day_of_week time_of_day
0 2021-06-13 12:56:16 Sunday 20:00:00
5 2021-06-13 12:56:54 Sunday 20:00:00
6 2021-06-13 12:57:27 Sunday 20:00:00
7 2021-07-16 18:55:42 Friday 20:00:00
8 2021-07-16 18:56:03 Friday 20:00:00
9 2021-06-04 18:42:06 Friday 20:00:00
10 2021-06-04 18:49:05 Friday 20:00:00
11 2021-06-04 18:58:22 Friday 20:00:00
What I would like to do is create a kde plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis to be the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.
Here's a sample of the data and my code. Any help would be appreciated:
df = pd.DataFrame([
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:54',
'2021-06-13 12:56:54',
'2021-06-13 12:57:27',
'2021-07-16 18:55:42',
'2021-07-16 18:56:03',
'2021-06-04 18:42:06',
'2021-06-04 18:49:05',
'2021-06-04 18:58:22',
'2021-06-08 21:31:44',
'2021-06-09 02:14:30',
'2021-06-09 02:20:19',
'2021-06-12 18:05:47',
'2021-06-15 23:46:41',
'2021-06-15 23:47:18',
'2021-06-16 14:19:08',
'2021-06-17 19:08:17',
'2021-06-17 22:37:27',
'2021-06-21 23:31:32',
'2021-06-23 20:32:09',
'2021-06-24 16:04:21',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-08-31 21:38:07',
'2020-08-31 21:38:22',
'2020-08-31 21:38:42',
'2020-08-31 21:39:03',
], columns=['datetime'])
def convert_date(date):
    return calendar.day_name[date.weekday()]
def convert_hour(time):
    return time[:2]+':00:00'
df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)

Let's try the following steps:
1. Convert the datetime column with to_datetime.
2. Create a Categorical column from the day_of_week codes (so that categorical ordering works correctly).
3. Normalize time_of_day to a single day (so that comparisons work correctly). This makes it look as if all events occurred on the same day, which makes the plotting logic much simpler.
4. Plot the kdeplot.
5. Set the x-axis formatter to display only HH:MM:SS.
import calendar
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates
# df = pd.DataFrame({...})
# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
    df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                   df['datetime'].dt.time.astype(str))
# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')
# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
             pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
plt.tight_layout()
plt.show()
Note that the sample size is small here.
If looking for count on y then maybe histplot is better:
ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')
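If a plain table of counts per hour and weekday is wanted rather than a plot, a cross-tabulation is one option. This is only a small sketch: the four rows are taken from the question's sample, and the names "hour" and "day" are labels chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"datetime": [
    "2021-06-13 12:56:16",
    "2021-06-13 12:56:54",
    "2021-07-16 18:55:42",
    "2021-06-04 18:42:06",
]})
ts = pd.to_datetime(df["datetime"])

# Rows are hours of the day, columns are weekday names, cells are event counts
counts = pd.crosstab(ts.dt.hour.rename("hour"), ts.dt.day_name().rename("day"))
print(counts)
```

The resulting frame can be passed to counts.plot.bar() if a chart of the raw counts is preferred.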

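I would use pandas Timestamp straight away. By the way, your convert_hour function seems to be wrong: it returns a time_of_day of 20:00:00 for every row, because time[:2] takes the first two characters of the full datetime string (the "20" of the year) rather than the hour.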
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')
df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)
# days: the weekday names present in the data (assumed definition)
days = df['day_of_week'].unique()
plt.figure(figsize=(8, 4))
# Draw one density curve per weekday
for day in days:
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)
plt.legend()
plt.show()
The KDE for Wednesday looks a bit strange because the times vary between 2 and 20, hence the long tails from -20 to 40 in the plot.
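The convert_hour problem can be reproduced on two of the question's timestamps. The sketch below contrasts the string slice with taking the hour from the parsed timestamp; the variable names buggy and fixed are only illustrative.

```python
import pandas as pd

s = pd.Series(["2021-06-13 12:56:16", "2021-07-16 18:55:42"])

# Slicing the full string: the first two characters are the "20" of the year
buggy = s.apply(lambda t: t[:2] + ":00:00")

# Taking the hour from the parsed timestamp instead
fixed = pd.to_datetime(s).dt.strftime("%H:00:00")

print(buggy.tolist())
print(fixed.tolist())
```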

Here is a simpler version using df.plot.kde.
I added more data so that each day_of_week has multiple values for the KDE to plot, and simplified the code by removing the functions.
df1 = pd.DataFrame([
'2020-09-01 16:39:03',
'2020-09-02 16:39:03',
'2020-09-03 16:39:03',
'2020-09-04 16:39:03',
'2020-09-05 16:39:03',
'2020-09-06 16:39:03',
'2020-09-07 16:39:03',
'2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = df['datetime'].str.split(expand=True)[1].str.split(':',expand=True)[0].astype(int)
df.pivot(columns='day_of_week').time_of_day.plot.kde()

How to plot time only of pandas datetime64[ns] attribute

I have a dataframe covering a long time range, with a datetime64[ns] column and an int value.
Data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The goal is to plot all the data over a single 00:00 - 24:00 range on the x-axis, without dates.
When I try to plot it, every tick on the time axis reads 00:00. How can I fix this?
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I tried converting the timestamps to dt.time first and plotting those:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
The plot does not allow that:
TypeError: float() argument must be a string or a number, not 'datetime.time'

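Based on your requirement, I guess you need neither the date nor the seconds field of your timestamp, so a little preprocessing is needed first.
Formatting the column as HH:MM strings drops both the date and the seconds in one step (note the .dt accessor; a Series has no strftime method of its own):
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.strftime("%H:%M")
If you want datetime.time objects instead of strings, take them from the original datetime column rather than from the formatted strings:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.time
Keep in mind that matplotlib cannot plot datetime.time values directly (that is the TypeError shown in the question), so for plotting use the string form.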

This seems to work now. I had not realised that the plot was still split up by date. As a workaround, I had to replace all the dates with the same date and plot the result, hiding the date with a DateFormatter.
import matplotlib.dates as mdates
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
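The same replacement can probably also be done without a per-row lambda. The following is a sketch on four timestamps from the question's sample; the reference date 2020-01-01 is arbitrary, and the names since_midnight and same_day are only illustrative.

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2018-01-01 05:09:00",
    "2018-01-01 06:12:00",
    "2020-09-30 23:43:00",
    "2020-10-01 00:18:00",
]))

# Time elapsed since midnight of each row's own day, as a Timedelta
since_midnight = s - s.dt.normalize()

# Re-attach every time of day to one arbitrary reference date
same_day = pd.Timestamp("2020-01-01") + since_midnight
print(same_day)
```

For a line plot the rows would likely also need sorting by the new column, otherwise the line jumps back and forth across the day.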

How to plot data, time on x-axis not datetime

id timestamp energy
0 a 2012-03-18 10:00:00 0.034
1 b 2012-03-20 10:30:00 0.052
2 c 2013-05-29 11:00:00 0.055
3 d 2014-06-20 01:00:00 0.028
4 a 2015-02-10 12:00:00 0.069
I want to plot this data as shown below, with just the time on the x-axis, not the date or the full datetime, because I want to see the values for each hour.
https://i.stack.imgur.com/u73eJ.png
But this code plots it like this:
plt.plot(df['timestamp'], df['energy'])
https://i.stack.imgur.com/yd6NL.png
I tried some code, but it only formats the x data to hide the date part and still plots like the second graph.
Note: df['timestamp'] is of datetime type.
What should I do? Thanks.

You can convert your datetime values into times. If df["timestamp"] is already in datetime format, then:
df["time"] = df["timestamp"].map(lambda x: x.time())
plt.plot(df['time'], df['energy'])
If df["timestamp"] is of type string, add one more line in front: df["timestamp"] = pd.to_datetime(df["timestamp"])
Update: it looks like matplotlib does not accept time types, so just convert to string:
df["time"] = df["timestamp"].map(lambda x: x.strftime("%H:%M"))
plt.scatter(df['time'], df['energy'])

First check whether df["timestamp"] is already of datetime type.
If not, convert it:
import pandas as pd
dates = pd.to_datetime(df["timestamp"])
print(dates.dtype)
Then plot:
import matplotlib.pyplot as plt
values = df['energy']
plt.plot_date(dates, values)
plt.xticks(rotation=45)
plt.show()
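Another option, sketched here with the sample rows from the question, is to turn the time of day into a plain number (hours since midnight) so that no date formatter is involved at all. hour_of_day and by_time are hypothetical names introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-03-18 10:00:00",
        "2012-03-20 10:30:00",
        "2013-05-29 11:00:00",
        "2014-06-20 01:00:00",
        "2015-02-10 12:00:00",
    ]),
    "energy": [0.034, 0.052, 0.055, 0.028, 0.069],
})

# Hours since midnight as a float, e.g. 10:30 becomes 10.5
df["hour_of_day"] = df["timestamp"].dt.hour + df["timestamp"].dt.minute / 60

# Sort so that a line plot runs left to right across the day
by_time = df.sort_values("hour_of_day")
print(by_time[["hour_of_day", "energy"]])
```

The sorted columns can then be plotted with plt.plot(by_time["hour_of_day"], by_time["energy"]).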

How to draw a line chart from datetime only data

There is a CSV file containing only the date and time of each access, as below:
2018-09-01 13:23:14 UTC
2018-09-01 13:23:29 UTC
2018-09-01 13:23:32 UTC
2018-09-01 13:23:34 UTC
...
2018-10-21 20:04:16 UTC
2018-10-21 20:04:18 UTC
2018-10-21 20:04:20 UTC
2018-10-21 20:04:21 UTC
2018-10-21 20:04:24 UTC
2018-10-21 20:04:26 UTC
2018-10-21 20:04:27 UTC
I would like to find out in which time periods access is heavy, using a line chart with per-minute resolution.
I tried it like this, but it does not work.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import csv
with open('./access.csv', 'r', encoding='utf-8-sig') as f:
    i = 0
    header = next(f)
    time = []
    count = []
    for row in f:
        time.append(row)
        count.append(1)
df = pd.DataFrame({
    'time': pd.to_datetime(time),
    'count': count
})
df = df.set_index('time')
plt.show()
How can it work?

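You can load the CSV with pandas and take its single column as a Series like this:
s = pd.read_csv('./access.csv').squeeze('columns')
From there you can convert the values to datetime, then plot the counts for each minute value as a line plot with matplotlib (sort_index puts the minutes in order, since value_counts on its own sorts by count):
s = pd.to_datetime(s)
min_counts = s.dt.minute.value_counts().sort_index()
plt.plot(min_counts.index, min_counts)
plt.show()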
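Note that dt.minute is the minute within the hour (0 to 59). If the goal is instead the number of accesses in each calendar minute, resampling is one way to get it. A minimal sketch, using a few made-up rows in the format shown in the question; per_minute is a name chosen here for illustration.

```python
import pandas as pd

# A few rows in the format shown in the question (values are illustrative)
times = pd.to_datetime(pd.Series([
    "2018-09-01 13:23:14 UTC",
    "2018-09-01 13:23:29 UTC",
    "2018-09-01 13:23:32 UTC",
    "2018-09-01 13:25:01 UTC",
]), utc=True)

# One row per access, indexed by time, then summed within one-minute bins
per_minute = pd.Series(1, index=pd.DatetimeIndex(times)).resample("1min").sum()
print(per_minute)
```

Minutes without any access appear with a count of 0, which keeps a line chart drawn with per_minute.plot() continuous.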

Wrong labels when plotting a time series pandas dataframe with matplotlib

I am working with a dataframe containing data of 1 week.
y
ds
2017-08-31 10:15:00 1.000000
2017-08-31 10:20:00 1.049107
2017-08-31 10:25:00 1.098214
...
2017-09-07 10:05:00 99.901786
2017-09-07 10:10:00 99.950893
2017-09-07 10:15:00 100.000000
I create a new index by combining the weekday and time i.e.
y
dayIndex
4 - 10:15 1.000000
4 - 10:20 1.049107
4 - 10:25 1.098214
...
4 - 10:05 99.901786
4 - 10:10 99.950893
4 - 10:15 100.000000
The plot of this data is the following:
The plot is correct as the labels reflect the data in the dataframe. However, when zooming in, the labels do not seem correct as they no longer correspond to their original values:
What is causing this behavior?
Here is the code to reproduce this:
import datetime
import numpy as np
import pandas as pd
dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow , dtnow + datetime.timedelta(7), freq='5T')
data = np.linspace(1,100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
df = df.resample('5T').mean()
df['dayIndex'] = df.index.strftime('%w - %H:%M')
df= df.set_index('dayIndex')
df.plot()

"What is causing this behavior?"
The formatter of the axes of a pandas dates plot is a matplotlib.ticker.FixedFormatter (see e.g. print(plt.gca().xaxis.get_major_formatter())). "Fixed" means that it formats the ith tick (if shown) with some constant string.
When zooming or panning, you shift the tick locations, but not the format strings.
In short: A pandas date plot may not be the best choice for interactive plots.
Solution
A solution is usually to use matplotlib formatters directly. This requires the dates to be datetime objects (which can be ensured using df.index.to_pydatetime()).
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow , dtnow + datetime.timedelta(7), freq='110min')
data = np.linspace(1,100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
# Plot with matplotlib directly so the x values are real datetime objects
plt.plot(df.index.to_pydatetime(), df['y'], marker="o")
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%w - %H:%M'))
plt.show()
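The difference between the two formatter types can be seen without drawing anything. The sketch below calls each formatter by hand: the fixed one returns its label by tick position regardless of the tick value, while the date formatter derives the label from the value itself. The label strings and the sample date are taken from the question; the variable names are illustrative.

```python
import datetime

import matplotlib.dates as mdates
from matplotlib.ticker import FixedFormatter

# FixedFormatter: the label depends only on the tick position, not on the value
fixed_fmt = FixedFormatter(["4 - 10:15", "4 - 10:20"])
print(fixed_fmt(123.0, 0), "|", fixed_fmt(999.0, 0))

# DateFormatter: the label is computed from the tick value
date_fmt = mdates.DateFormatter("%w - %H:%M")
x = mdates.date2num(datetime.datetime(2017, 8, 31, 10, 15))
print(date_fmt(x), "|", date_fmt(x + 1))  # x + 1 is one day later
```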
