I am reading some data from a file, and then I have to plot this data for visual representation.
The data in the file is in the following format:
16:08:45,64,31.2
16:08:46,60,29.3
16:08:47,60,29.3
16:08:48,60,29.3
16:08:49,60,29.3
.
.
This data is stored in a file named after the current date.
The file consists of the time (Hour:Minute:Second), ADC counts, and temperature value.
I used the following code to read the data from the file using pandas.
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
year = 2018 # For 2018
'''
month = input ('Enter Month : ')
date = input ('Enter Date : ')
month = int(month)
date = int(date)
'''
# Hardcoded Values for Testing Purpose
month = 1
date = 20
headers = ['Time', 'ADC', 'Temperature']
filename = '%.2d-%.2d-%.2d.txt' % (month, date, year-2000)
print (filename)
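try:
    # Pass the separator by keyword; newer pandas versions no longer accept it positionally.
    df = pd.read_table(filename, sep=',', names=headers,
                       engine='python', header=None)
except FileNotFoundError:
    print ('No Such File in Database')
    print ('Exiting Program')
    exit()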
FMT = '%H:%M:%S'
df['Time'] = df['Time'].map(lambda x: datetime.strptime(str(x), FMT))
df['Time'] = df['Time'].map(lambda x: x.replace(day=date, month=month, year=year))
plt.plot( df['Time'], df['ADC'])
plt.ylim( [0,200])
#plt.gcf().autofmt_xdate()
plt.show()
I don't understand why the x-axis doesn't show the correct values.
Is it because the samples are too close together (1 second apart)?
I only want time information on the x-axis.
Please suggest how I can get that.
Thanks in advance.
Update:
From the comments, I was able to find the reason why this is happening.
Pandas treats my date and time as a Timestamp object, while for plotting with Matplotlib it should be a datetime object.
My problem would be solved if I could convert df['Time'] from Timestamp to datetime.
I searched online and found that pd.to_datetime should do this for me, but unfortunately it doesn't work.
The following commands show what I have done.
>>> x = pd.Timestamp( "2018-01-20")
>>> x
Timestamp('2018-01-20 00:00:00')
>>> pd.to_datetime( x, format="%Y-%m-%d")
Timestamp('2018-01-20 00:00:00')
As you can see above, I still get a Timestamp object.
I searched again to check why it is not working, and found that the following will work.
>>> x = pd.Timestamp( "2018-01-20")
>>> y = pd.to_datetime( x, format="%Y-%m-%d")
>>> y.to_datetime()
Warning (from warnings module):
File "D:\Program Files\Python36\lib\idlelib\run.py", line 457
exec(code, self.locals)
FutureWarning: to_datetime is deprecated. Use self.to_pydatetime()
datetime.datetime(2018, 1, 20, 0, 0)
>>>
To remove this warning, the following commands can be used.
>>> x = pd.Timestamp( "2018-01-20")
>>> y = pd.to_datetime( x, format="%Y-%m-%d")
>>> y.to_pydatetime()
datetime.datetime(2018, 1, 20, 0, 0)
Now I tested these commands in my project to see whether this works with my dataframe; I took just one value for testing.
>>> x = df['Time'][0].to_pydatetime()
>>> x
datetime.datetime(2018, 1, 20, 16, 8, 45)
So yes, it is working. Now I have to apply this to the complete column of the dataframe, and I used the following command.
>>> df['New'] = df['Time'].apply(lambda x: x.to_pydatetime())
>>> df['New'][0]
Timestamp('2018-01-20 16:08:45')
But it's still the same.
I am a newbie, so I am not able to understand what I am doing wrong.
First we need a minimal, verifiable example:
u = u"""16:08:45,64,31.2
16:08:46,54,29.3
16:08:47,36,34.3
16:08:48,67,36.3
16:08:49,87,29.3"""
import io
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
year = 2018
month = 1
date = 20
headers = ['Time', 'ADC', 'Temperature']
df = pd.read_table(io.StringIO(u), sep=',', names=headers,
                   engine='python', header=None)
FMT = '%H:%M:%S'
df['Time'] = df['Time'].map(lambda x: datetime.strptime(str(x), FMT))
df['Time'] = df['Time'].map(lambda x: x.replace(day=date, month=month, year=year))
plt.plot( df['Time'], df['ADC'])
plt.ylim( [20,100])
plt.gcf().autofmt_xdate()
plt.show()
Running this code with pandas 0.20.1 and matplotlib 2.1 produces the following plot, which looks as desired:
The reason this works, even though the dates are pandas timestamps, is that matplotlib uses the pandas converters internally if they are available.
If not, one may first try to load them manually,
import pandas.plotting._converter as pandacnv
pandacnv.register()
If this also fails, one may indeed try to convert the timestamps to datetime objects.
dt = [x.to_pydatetime() for x in df['Time']]
plt.plot( dt, df['ADC'])
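As a rough sketch of the whole approach in one place (not necessarily what the asker ended up using): the date 2018-01-20 is the hardcoded test date from the question, and the variable names `raw`, `times`, `fig` and `ax` are just illustrative. The times are combined with the date, converted to plain datetime objects, and the axis is given a formatter so that only the time of day is shown. In newer pandas versions the converters can, as far as I know, also be registered through the public function pandas.plotting.register_matplotlib_converters() instead of the private module shown above.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

raw = """16:08:45,64,31.2
16:08:46,54,29.3
16:08:47,36,34.3
16:08:48,67,36.3
16:08:49,87,29.3"""

df = pd.read_csv(io.StringIO(raw), names=["Time", "ADC", "Temperature"])

# Prepend the (assumed) date so every row becomes a full timestamp.
df["Time"] = pd.to_datetime("2018-01-20 " + df["Time"],
                            format="%Y-%m-%d %H:%M:%S")

# The column itself is stored as datetime64, so single elements come back
# as pandas Timestamps; a plain list keeps real datetime.datetime objects.
times = [t.to_pydatetime() for t in df["Time"]]

fig, ax = plt.subplots()
ax.plot(times, df["ADC"])
ax.set_ylim(20, 100)

# Show only hours, minutes and seconds on the x-axis tick labels.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate()
# plt.show()  # uncomment to display the figure interactively
```

This probably also explains the df['New'] observation in the question: assigning the converted values back into a dataframe column makes pandas store them as datetime64 again, so looking up a single element returns a Timestamp once more.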
Related
My problem is that when I plot the users joining by day, a future year appears on the axis; it should not show the year 2023. I searched my CSV file and there is no row holding a value of 2023.
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
plt.title('Users Joining in a Day',pad=20, fontdict={'fontsize':24})
plt.show()
the output:
column in my csv used below:
This is because the x-axis range is generated automatically. Instead, you can explicitly limit the x range using plt.xlim(), as follows:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
# import datetime, and use this code to set a period as you want.
plt.xlim([datetime.date(2021, 1, 1), datetime.date(2022, 12, 31)])
plt.title('Users Joining in a Day', pad=20, fontdict={'fontsize':24})
plt.show()
I am new to Python/pandas, coming from an R background. I am having trouble understanding how I can manipulate an existing column to create a new column based on multiple conditions of the existing column. There are 10 different conditions that need to be met, but for simplicity I will use a 2-case scenario.
In R:
install.packages("lubridate")
library(lubridate)
df <- data.frame("Date" = c("2020-07-01", "2020-07-15"))
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df$Fiscal <- ifelse(day(df$Date) > 14,
paste0(year(df$Date),"-",month(df$Date) + 1,"-01"),
paste0(year(df$Date),"-",month(df$Date),"-01")
)
df$Fiscal <- as.Date(df$Fiscal, format = "%Y-%m-%d")
In Python I have:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
df.loc[df['Date'].dt.day > 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month + 1),"01"])
df.loc[df['Date'].dt.day <= 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month),"01"])
If I don't convert the 'Date' field it says that it expects a string, however if I do convert the date field, I still get an error as it seems it is applying to a 'Series' object.
TypeError: descriptor 'strftime' for 'datetime.date' objects doesn't apply to a 'Series' object
I understand I may have some terminology or concepts incorrect and apologize, however the answers I have seen dealing with creating a new column with multiple conditions do not seem to be manipulating the existing column they are checking the condition on, and simply taking on an assigned value. I can only imagine there is a more efficient way of doing this that is less 'R-ey' but I am not sure where to start.
This isn't intended as a full answer, just as an illustration how strftime works: strftime is a method of a date(time) object that takes a format-string as argument:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
s = [dt.date(df['Date'][i].year, df['Date'][i].month + 1, 1).strftime('%Y-%m-%d')
for i in df['Date'].index]
print(s)
Result:
['2020-08-01', '2020-08-01']
Again: No full answer, just a hint.
EDIT: You can vectorise this, for example by:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
df['Fiscal'] = df['Date'].apply(lambda d: dt.date(d.year, d.month, 1)
if d.day < 15 else
dt.date(d.year, d.month + 1, 1))
print(df)
Result:
        Date      Fiscal
0 2020-07-01  2020-07-01
1 2020-07-15  2020-08-01
Here I'm using an on-the-fly lambda function. You could also do it with an externally defined function:
def to_fiscal(date):
    if date.day < 15:
        return dt.date(date.year, date.month, 1)
    return dt.date(date.year, date.month + 1, 1)

df['Fiscal'] = df['Date'].apply(to_fiscal)
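In general, vectorisation is better than looping over rows because the looping is done at a lower level, which is much more efficient. Note that apply still calls the Python function once per row, so it is more concise than an explicit loop but not vectorised in the strict sense.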
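For comparison, here is a sketch that works on whole columns at once. It is only one possible way to do it: the names `month_start` and `next_month_start` are made up for illustration, and the third sample date is an extra row I added to show the year rollover, since `d.month + 1` in the versions above raises a ValueError for December dates.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-07-01", "2020-07-15", "2020-12-20"])}
)

# First day of each date's own month.
month_start = df["Date"].dt.to_period("M").dt.to_timestamp()

# First day of the following month; the offset handles the December rollover.
next_month_start = month_start + pd.offsets.MonthBegin(1)

# Keep the current month's start where day < 15, otherwise take the next one.
df["Fiscal"] = month_start.where(df["Date"].dt.day < 15, next_month_start)

print(df)
```

With more than two conditions, the same idea could be extended with something like numpy.select, but that depends on what the ten conditions actually look like.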
Until someone tells me otherwise I will do it this way. If there's a way to do it vectorized (or just a better way in general) I would greatly appreciate it
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
test_list = list()
for i in df['Date'].index:
    mth = df['Date'][i].month
    yr = df['Date'][i].year
    dy = df['Date'][i].day
    if(dy > 14):
        new_date = dt.date(yr, mth + 1, 1)
    else:
        new_date = dt.date(yr, mth, 1)
    test_list.append(new_date)
df['New_Date'] = test_list
I have the following date: 2019-11-20, which corresponds to week 47 of the calendar year. This is also what my Excel document says. However, when I compute it in Python I get week 46 instead. I will post my code, but I do not see what's wrong with it. I tried splitting the column into separate date and time columns, but I still get the same problem. It is very odd; I do not know what's wrong, and the local time on my laptop is fine. Thanks in advance for your help!
Here is my code:
import pandas as pd
from datetime import datetime
import numpy as np
import re
df = pd.read_csv (r'C:\Users\user\document.csv')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['startedAt'] = df['startedAt'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M:%S'))
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['endedAt'] = pd.to_datetime(df['endedAt'], format='%Y-%m-%d')
df['startedAt'] = pd.to_datetime(df['startedAt'])
df['Date_started'] = df['startedAt'].dt.strftime('%d/%m/%Y')
df['Time_started'] = df['startedAt'].dt.strftime('%H:%M:%S')
df['Date_started'] = pd.to_datetime(df['Date_started'], errors='coerce')
df['week'] = df['Date_started'].dt.strftime('%U')
print(df)
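A likely explanation, offered as a hedged reading of the code above rather than a confirmed diagnosis: the `%U` directive counts weeks starting on Sunday and puts the days before the first Sunday of the year into week 0, whereas the week 47 in the question matches the ISO 8601 week number. The small standard-library sketch below compares the two for the date in question; the variable names are only illustrative.

```python
from datetime import date

d = date(2019, 11, 20)

# %U: week of the year with Sunday as the first day of the week;
# days before the first Sunday of the year fall in week 0.
sunday_based_week = d.strftime("%U")

# ISO 8601 week number: weeks start on Monday, and week 1 is the week
# that contains the first Thursday of the year.
iso_week = d.isocalendar()[1]

print(sunday_based_week, iso_week)  # prints: 46 47
```

On a datetime column, reasonably recent pandas versions (1.1 and later, as far as I know) offer the same ISO numbering through `df['Date_started'].dt.isocalendar().week`.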
I want to plot a line graph of ECG in mV against time in HH:MM:SS:MMM. It's a 10-second ECG strip.
image of ECG CSV file with two ECG values and time
I have extracted the Time column, and now I want to convert it to datetime in a pandas dataframe and then plot it on a graph,
but when I apply the to_datetime() function it gives me the following error:
to assemble mappings requires at least that [year, month, day] be
specified: [day,month,year] is missing
Screenshot of error i get
Please help me resolve this error. I only want to use %H:%M:%S.%f because I do not have the year, month, and day.
As commented you can add a date to those times. The date can be arbitrary. Then you can convert to datetime and use them to plot your graph.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = {'Time':['11:20:15.333','12:00:00.444', '13:46:00.100'],
'A':[1,3,2],'B':[5,5,4]}
df = pd.DataFrame(data=data)
df["Date"] = "2019-09-09"
df['Datetime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'])
df = df[["Datetime", "A", "B"]].set_index("Datetime")
ax = df.plot(x_compat=True)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S.%f"))
plt.show()
See to_timedelta
Data :
df1 = {'Time':['11:20:15.333','10:00:00.444'],'P1':['102','102'],'P2':['240','247']}
df1 = pd.DataFrame(data=df1)
df1
Code :
df1['Time'] = pd.to_timedelta(df1['Time'])
df1
Result:
Time P1 P2
0 11:20:15.333000 102 240
1 10:00:00.444000 102 247
Reference : https://stackoverflow.com/a/46801500/1855988
You need to specify the format you want to convert the time to. You can find out more information here about what each symbol means.
# before it is object
df['column_name'] = pd.to_datetime(df['column_name'], format="%H:%M:%S,%f")
df = df.astype('datetime64')
df['column_name'] = pd.to_datetime(df['column_name'], format='%H:%M:%S', errors='coerce').dt.time
# coming back to object
print(df.head())
# print(df.info())
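To make the difference concrete, here is a small sketch with made-up sample times (the column name `Time` and the values are assumptions, not the asker's data). One way the "to assemble mappings" message can arise is when a whole DataFrame is passed to to_datetime, because pandas then looks for year, month and day columns; passing the single column as a Series together with a format avoids that.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({"Time": ["11:20:15.333", "12:00:00.444", "13:46:00.100"]})

# Passing a DataFrame makes pandas try to assemble dates from
# year/month/day columns, which do not exist here.
try:
    pd.to_datetime(df[["Time"]])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

# Passing the column as a Series with an explicit format parses the times;
# pandas fills in 1900-01-01 as the date because none was given.
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S.%f")

# Keep only the time-of-day part as datetime.time objects (object dtype).
time_only = parsed.dt.time

print(error_message)
print(parsed)
print(time_only)
```

The `parsed` column is usually the more convenient one for plotting, since matplotlib handles full datetimes better than bare time objects.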
I have 2 numpy arrays (time and date), and a third with rain. In the end I would like to plot all the info in an xy-plot with matplotlib!
This is what I have so far:
import os
import time
from datetime import datetime
import time
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
date = np.array(["01.06.2015", "01.06.2015", "01.06.2015"], dtype=object)
time = np.array(["12:23:00", "14:54:00", "14:56:00"], dtype=object)
# Rain
rain = np.array([2.544, 1.072, 1.735])
# Calculations to make one array of time and date,
# called timestamp
A = np.vstack((date, time))
A_transp = A.transpose()
A_transp.shape
A_transp.dtype
So, as mentioned, in the end I would like to have an (x, y) plot with timestamps (time and date combined as an array of floating-point numbers) on one axis and the rain on the other axis.
Thank you for your help
Markus
Thank you for your help, but I have not reached a solution yet!
Further steps I took:
# Get a new .out file, to get a time tuple
# see strptime.
# Finally I would like to make a floating point number out of the
# timetuple, to plot the whole thing!
#
mydata = np.savetxt('A_transp.out', A_transp
,fmt="%s")
# Dateconv
dateconv = lambda s: datetime.strptime(s, '%d.%m.%Y %H:%M:%S')
# ColNames
col_names = ["Timestamp"]
# DataTypes
dtypes = ["object"]
# Read in the new file
mydata_next = np.genfromtxt('A_transp.out', delimiter=None,
names=col_names, dtype=dtypes, converters={"Timestamp":dateconv})
So after the np.genfromtxt following error message appears
Traceback (most recent call last):
  File "parsivel.py", line 155, in <module>
    names=col_names, dtype=dtypes, converters={"Timestamp":dateconv})
  File "/home/unix/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1867, in genfromtxt
    output = np.array(data, dtype)
ValueError: Setting void-array with object members using buffer.
What I would try after that would be the following.
#B = mdates.strpdate2num(mydata_next) # fail
#B = time.mktime(mydata_next) # fail
#B = plt.dates.date2num(mydata_next) # fail
And finally I would like to plot the following
# Plot
# Fail
#plt.plot_date(mydata_next, rain)
#plt.show()
But at the moment all the plots fail because I cannot make a time tuple out of A_transp! Maybe the strptime function is not the right tool here, or is there another way that avoids the detour via np.savetxt and the attempt at rearranging A_transp?
Starting from your original date and time arrays, you can obtain a date-time string representation in a single array just by adding them:
In[61]: date_time = date + time
In[62]: date_time
Out[62]: array(['01.06.201512:23:00', '01.06.201514:54:00', '01.06.201514:56:00'], dtype=object)
Now you can convert the date-time strings into datetime format. For example:
In[63]: date_time2 = [datetime.strptime(d, '%d.%m.%Y%H:%M:%S') for d in date_time]
In[64]: date_time2
Out[64]:
[datetime.datetime(2015, 6, 1, 12, 23),
datetime.datetime(2015, 6, 1, 14, 54),
datetime.datetime(2015, 6, 1, 14, 56)]
And that's all you need to plot your data with:
plt.plot_date(date_time2, rain)
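Put together as one self-contained sketch (a variation on the steps above, not the only way to do it): here the date and time strings are joined with a space before parsing, which is my own choice to keep the format string readable, and `as_floats` is an illustrative name for the floating-point representation the question asked about. A plain `ax.plot` call is used, since matplotlib accepts datetime objects directly.

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

date = np.array(["01.06.2015", "01.06.2015", "01.06.2015"], dtype=object)
time = np.array(["12:23:00", "14:54:00", "14:56:00"], dtype=object)
rain = np.array([2.544, 1.072, 1.735])

# Combine each date with its time and parse the result into a datetime.
date_time = [
    datetime.strptime(f"{d} {t}", "%d.%m.%Y %H:%M:%S")
    for d, t in zip(date, time)
]

# Matplotlib's float representation of the dates (days since its epoch),
# in case the numbers themselves are needed.
as_floats = mdates.date2num(date_time)

fig, ax = plt.subplots()
ax.plot(date_time, rain, marker="o")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m.%Y %H:%M"))
fig.autofmt_xdate()
# plt.show()  # uncomment to display the figure interactively
```

No detour through np.savetxt and np.genfromtxt is needed for this; the two string arrays can be parsed in memory.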