I have a problem with the following code. I get the error "strptime() argument 1 must be str, not Timestamp".
I guess I should convert the date from a Timestamp to a string, but I do not know how.
class TweetAnalyzer:
    def tweets_to_data_frame(self, ElonMuskTweets):
        df = pd.DataFrame(data=[tweet.text for tweet in ElonMuskTweets], columns=['Tweets'])
        df['Text length'] = np.array([len(tweet.text) for tweet in ElonMuskTweets])
        df['Date and time of creation'] = np.array([tweet.created_at for tweet in ElonMuskTweets])
        df['Likes'] = np.array([tweet.favorite_count for tweet in ElonMuskTweets])
        df['Retweets'] = np.array([tweet.retweet_count for tweet in ElonMuskTweets])
        list_of_dates = []
        list_of_times = []
        for date in df['Date and time of creation']:
            date_time_obj = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
            list_of_dates.append(date_time_obj.date())
            list_of_times.append(date_time_obj.time())
        df['Date'] = list_of_dates
        df['Time'] = list_of_times
        df['Date'] = pd.to_datetime(df['Date'])
        start_date = '2018-04-13'
        end_date = '2019-04-13'
        mask1 = (df['Date'] >= start_date) & (df['Date'] <= end_date)
        MuskTweets18_19 = df.loc[mask1]
        return MuskTweets18_19.to_csv('elonmusk_tweets.csv', index=False)
I get the error in
date_time_obj = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
How can I solve this problem?
Thank you in advance
Can you coerce the data type to a string to perform this calculation?
date_time_obj = datetime.strptime(str(date), '%Y-%m-%d %H:%M:%S')
If the error says "strptime() argument 1 must be str, not Timestamp", you most likely already have a pandas.Timestamp object: the value is not a string but an already-parsed date and time, just in pandas' type rather than Python's datetime. To convert it, use this:
date_time_obj = date.to_pydatetime()
instead of date_time_obj = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
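To illustrate the point above, here is a minimal sketch, assuming the column already holds parsed datetimes. The two sample values are made up and only stand in for tweet.created_at. It shows that each element is a pandas Timestamp, and that the .dt accessor can split out the date and time without calling strptime at all:

```python
import pandas as pd

# Hypothetical sample data standing in for the tweet.created_at values.
df = pd.DataFrame({
    'Date and time of creation': pd.to_datetime(
        ['2018-04-13 09:15:00', '2019-01-02 23:59:59']
    )
})

# Each element of a datetime column is a pandas Timestamp, not a str.
first = df['Date and time of creation'].iloc[0]
print(type(first).__name__)      # Timestamp
print(first.to_pydatetime())     # plain datetime.datetime equivalent

# Vectorized split into date and time parts, no loop needed.
df['Date'] = df['Date and time of creation'].dt.date
df['Time'] = df['Date and time of creation'].dt.time
print(df[['Date', 'Time']])
```

If the real column is timezone-aware or has fractional seconds, str(date) no longer matches '%Y-%m-%d %H:%M:%S', which is one reason to prefer to_pydatetime() or the .dt accessor over converting to a string and re-parsing.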
If the object is a pandas Timestamp, you can convert it to a string with str():
from pandas import Timestamp
timestamp = Timestamp('2017-11-12 00:00:00')
str_timestamp = str(timestamp)
import pandas as pd
import datetime
base = pd.to_datetime("2022-10-10")
date_list = [datetime.datetime.strftime(pd.to_datetime(base - datetime.timedelta(days=x)),"%Y-%m-%d") for x in range(7)]
print(date_list)
output will be
['2022-10-10',
'2022-10-09',
'2022-10-08',
'2022-10-07',
'2022-10-06',
'2022-10-05',
'2022-10-04']
Just adding to the above answers, as I ran into the following problem when using the solutions provided:
AttributeError: module 'datetime' has no attribute 'strptime'
Based on the answer found here, you need to either call strptime through the class inside the module, datetime.datetime (still coercing the timestamp into a string), like this:
date_time_obj = datetime.datetime.strptime(str(date), '%Y-%m-%d %H:%M:%S')
Or make sure to import the class and not just the module like this:
from datetime import datetime
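A small sketch of the difference between the two import styles, using only the standard library. The alias datetime_module and the sample string are my own choices for illustration:

```python
import datetime as datetime_module   # the module
from datetime import datetime        # the class that lives inside the module

text = '2018-04-13 09:15:00'
fmt = '%Y-%m-%d %H:%M:%S'

# The module itself has no strptime, which is what the AttributeError reports.
module_has_strptime = hasattr(datetime_module, 'strptime')
print(module_has_strptime)           # False

# Both spellings reach the same classmethod and give the same result.
via_module = datetime_module.datetime.strptime(text, fmt)
via_class = datetime.strptime(text, fmt)
print(via_module == via_class)       # True
```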
Related
I have a dataframe with timestamps in two different formats, one like 05-28-2022 14:05:30 and one like 06-04-2022 03:04:13.002. I want to convert both to ISO format. How can I do that?
input output
05-28-2022 14:05:30 -> 2022-05-28T14:05:30.000+0000
06-04-2022 03:04:13.002 -> 2022-06-04T03:04:13.002+0000
You can use strptime() + strftime(). Here is an example:
from datetime import datetime
import pytz
# parse str to instance
first = datetime.strptime('05-28-2022 14:05:30', '%m-%d-%Y %H:%M:%S')
first = first.replace(tzinfo=pytz.UTC)
print(first.strftime('%Y-%m-%dT%H:%M:%S.%f%z'))
print(f'{first.isoformat()}')
second = datetime.strptime('06-04-2022 03:04:13.002', '%m-%d-%Y %H:%M:%S.%f')
second = second.replace(tzinfo=pytz.UTC)
print(second.strftime('%Y-%m-%dT%H:%M:%S.%f%z'))
print(second.isoformat())
# 2022-05-28T14:05:30.000000+0000
# 2022-05-28T14:05:30+00:00
# 2022-06-04T03:04:13.002000+0000
# 2022-06-04T03:04:13.002000+00:00
See the datetime docs. You can also use other packages for date processing and formatting:
iso8601
pendulum
dateutil
arrow
Example with dataframe:
import pandas as pd
import pytz
from datetime import datetime
df = pd.DataFrame({'date': ['05-28-2022 14:05:30', '06-04-2022 03:04:13.002']})
def convert_date(x):
    dt_format = '%m-%d-%Y %H:%M:%S.%f' if x.rfind('.', 1) > -1 else '%m-%d-%Y %H:%M:%S'
    dt = datetime.strptime(x, dt_format).replace(tzinfo=pytz.UTC)
    return dt.strftime('%Y-%m-%dT%H:%M:%S.%f%z')
df['new_date'] = df['date'].apply(convert_date)
print(df)
date new_date
0 05-28-2022 14:05:30 2022-05-28T14:05:30.000000+0000
1 06-04-2022 03:04:13.002 2022-06-04T03:04:13.002000+0000
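Note that %f always prints six digits, while the question asked for three (milliseconds). One possible way to get exactly the requested shape is sketched below. The helper name to_iso_millis is hypothetical, and I use the standard library's timezone.utc instead of pytz, assuming the inputs are meant to be UTC:

```python
from datetime import datetime, timezone

def to_iso_millis(text):
    """Parse 'MM-DD-YYYY HH:MM:SS[.fff]' and format with millisecond precision."""
    fmt = '%m-%d-%Y %H:%M:%S.%f' if '.' in text else '%m-%d-%Y %H:%M:%S'
    # Assumption: the naive input represents UTC.
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    millis = dt.microsecond // 1000   # truncate microseconds to milliseconds
    return f"{dt.strftime('%Y-%m-%dT%H:%M:%S')}.{millis:03d}{dt.strftime('%z')}"

print(to_iso_millis('05-28-2022 14:05:30'))      # 2022-05-28T14:05:30.000+0000
print(to_iso_millis('06-04-2022 03:04:13.002'))  # 2022-06-04T03:04:13.002+0000
```

With a dataframe this can be applied the same way as convert_date, for example df['date'].apply(to_iso_millis).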
I am using Django with Python. I want to convert the following time string into an hours, minutes and am/pm format.
string_time = '2022-09-13 11:00:00.996795+00'
expected output:
11:00 am
actual output is :
ValueError: time data '2022-09-13 11:00:00.996795+00' does not match format '%m/%d/%y %H:%M:%S'
my code :
def time_slots(self,string_time='2022-09-13 11:00:00.996795+00'):
    print(datetime.strptime(string_time, '%m/%d/%y %H:%M:%S'),type(start_time))
    start_time = datetime.strptime(string_time, '%m/%d/%y %H:%M:%S')
    return formated_start_time
When you remove the last three chars ('+00') and replace the space with T you can use datetime.datetime.fromisoformat(str) to get a datetime object.
from datetime import datetime
timestr = '2022-09-13 11:00:00.996795+00'
timestr = timestr[:-3].replace(' ', 'T')  # slice off '+00'; rstrip() strips a set of characters, not a suffix
date = datetime.fromisoformat(timestr)
From there you can use date.hour and date.minute to get the values you want.
e.g.:
hour = date.hour % 12 or 12  # 12-hour clock: hours 0 and 12 are both shown as 12
minute = date.minute
addition = 'pm' if date.hour >= 12 else 'am'  # noon counts as pm
print(f'{hour}:{minute:02d} {addition}')  # zero-pad the minutes
I'm not sure if the last string +00 is useful.
If not, the following implementation can help you.
from datetime import datetime
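def time_slots(string_time='2022-09-13 11:00:00.996795+00'):
    # Drop the trailing '+00' offset before parsing.
    date = datetime.strptime(string_time[:-3], '%Y-%m-%d %H:%M:%S.%f')
    # %I is the 12-hour clock; %H would give e.g. "13:00 PM" for 1 pm.
    return date.strftime("%I:%M %p")

output = time_slots()
print(output) # the output is: 11:00 AM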
You can use the parse function provided by dateutil:
from dateutil.parser import parse
string_time = '2022-09-13 11:00:00.996795+00'
dt = parse(string_time)
print(dt.strftime("%I:%M %p"))
Result: 11:00 AM
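If you would rather keep the offset than throw it away, a standard-library sketch is below. It assumes the offset is always a sign plus two hour digits (like '+00'), so padding it with '00' gives the ±HHMM form that %z accepts. The helper name to_12_hour is hypothetical, and %p is locale-dependent, so the lowercase am/pm shown assumes a C or English locale:

```python
from datetime import datetime

def to_12_hour(string_time):
    # Assumption: the string ends in a two-digit hour offset such as '+00'.
    padded = string_time + '00'          # '+00' -> '+0000', which %z can parse
    dt = datetime.strptime(padded, '%Y-%m-%d %H:%M:%S.%f%z')
    # %I = 12-hour clock, %p = AM/PM marker; lower() matches the requested 'am'.
    return dt.strftime('%I:%M %p').lower()

print(to_12_hour('2022-09-13 11:00:00.996795+00'))   # 11:00 am
print(to_12_hour('2022-09-13 13:05:00.000000+00'))   # 01:05 pm
```

The result is still in the offset of the input string. In a Django project you may want to convert it to the local time zone before formatting.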
I am inexperienced with Python and am trying to parse all timestamps of the following csv as datetime objects so that I can then perform operations on them (e.g. find timestamp differences).
I can parse single lines, but not the whole timestamp column. I get "KeyError: '2010-12-30 14:32:00'" for the first date of the timestamp column when reaching the line below my 'not working' comment.
Thanks in advance.
from datetime import datetime, timedelta
import pandas as pd
from dateutil.parser import parse
csvFile = pd.read_csv('runningComplete.csv')
column = csvFile['timestamp']
column = column.str.slice(0, 19, 1)
print(column)
dt1 = datetime.strptime(column[1], '%Y-%m-%d %H:%M:%S')
print(dt1)
dt2 = datetime.strptime(column[2], '%Y-%m-%d %H:%M:%S')
print(dt2)
dt3 = dt1 - dt2
print(dt3)
for row in column:
    print(row)
Not working:
for row in column:
    timestamp = datetime.strptime(column[row], '%Y-%m-%d %H:%M:%S')
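A likely explanation for the KeyError: "for row in column" already yields the values of the Series, so column[row] then tries to use a date string as an index label, and no such label exists. The sketch below reproduces this with a made-up three-row Series standing in for csvFile['timestamp'], and shows pd.to_datetime as one way to parse the whole column at once:

```python
import pandas as pd

# Made-up sample standing in for csvFile['timestamp'] after slicing.
column = pd.Series([
    '2010-12-30 14:32:00',
    '2010-12-30 15:06:00',
    '2010-12-30 16:34:00',
])

# Iterating a Series yields its values, so using a value as a label fails.
try:
    column['2010-12-30 14:32:00']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)   # True

# Parse the whole column in one call instead of looping.
timestamps = pd.to_datetime(column, format='%Y-%m-%d %H:%M:%S')

# Differences between consecutive rows (the first row has no predecessor).
differences = timestamps.diff()
print(differences)
```

Inside the original loop, datetime.strptime(row, '%Y-%m-%d %H:%M:%S') would also work, since row is already the string.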
Can anyone please tell me how to save my parsed datetime objects to a list? Please see code after the last comment where the problem comes up - Why do I get the AttributeError: 'datetime.datetime' object has no attribute 'toList'? Thanks!
from datetime import datetime, timedelta
import pandas as pd
from dateutil.parser import parse
csvFile = pd.read_csv('myFile.csv')
column = csvFile['timestamp']
column = column.str.slice(0, 19, 1)
dt1 = datetime.strptime(column[1], '%Y-%m-%d %H:%M:%S')
print("dt1", dt1) #output: dt1 2010-12-30 15:06:00
dt2 = datetime.strptime(column[2], '%Y-%m-%d %H:%M:%S')
print("dt2", dt2) #output: dt2 2010-12-30 16:34:00
dt3 = dt1 - dt2
print("dt3", dt3) #output: dt3 -1 day, 22:32:00
#works:
for row in range(len(column)):
    timestamp = datetime.strptime(column[row], '%Y-%m-%d %H:%M:%S')
    print("timestamp", timestamp) #output (excerpt): timestamp 2010-12-30 14:32:00 timestamp 2010-12-30 15:06:00
#trying to save all parsed timestamps in list, NOT WORKING
myNewList = timestamp.toList()
print(myNewList)
You should create the list before the for loop, and then append each element to it inside the loop, like so:
myNewList = []
#works:
for row in range(len(column)):
    timestamp = datetime.strptime(column[row], '%Y-%m-%d %H:%M:%S')
    print("timestamp", timestamp)
    myNewList.append(timestamp)
print(myNewList)
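The same idea can be written as a list comprehension. This is only a sketch: raw_values is a made-up list standing in for the sliced timestamp column, and the gaps list is my own addition to show the kind of difference calculation mentioned in the question:

```python
from datetime import datetime

# Made-up values standing in for the sliced 'timestamp' column.
raw_values = [
    '2010-12-30 14:32:00',
    '2010-12-30 15:06:00',
    '2010-12-30 16:34:00',
]

# Build the list of datetime objects in one expression.
my_new_list = [datetime.strptime(value, '%Y-%m-%d %H:%M:%S') for value in raw_values]

# Differences between consecutive timestamps, as timedelta objects.
gaps = [later - earlier for earlier, later in zip(my_new_list, my_new_list[1:])]
print(gaps)
```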
I am trying to implement the following: a function that returns the third-latest date, given an array of date objects.
This is the solution I came up with, and I get the following error:
TypeError: must be string, not date
import datetime
def third_latest():
    timestamps = ['2011-06-2', '2011-08-05', '2011-02-04', '2010-1-14', '2010-12-13', '2010-1-12', '2010-2-11', '2010-2-07', '2010-12-02', '2011-11-30', '2010-11-26', '2010-11-23', '2010-11-22', '2010-11-16']
    dates = [datetime.datetime.strptime(ts, "%Y-%m-%d") for ts in timestamps]
    dates.sort()
    sorteddates = [datetime.datetime.strftime(ts, "%Y-%m-%d") for ts in dates]
    return str(sorteddates[3])
def main():
    third_latest()
if __name__ == "__main__":
    main()
strftime converts datetime objects into strings according to formats.
Replace :
sorteddates = [datetime.datetime.strftime(ts, "%Y-%m-%d") for ts in dates]
with
sorteddates = [ts.strftime("%Y-%m-%d") for ts in dates]
Check out the documentation for strftime.
Also, you can check out the working version of your code here.
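Separately from the strftime fix, note that sorteddates[3] on an ascending sort picks the fourth-earliest date, not the third-latest. A possible sketch of the intended function is below. It takes the list as a parameter (my change, not in the question) and assumes duplicates should count as separate entries:

```python
from datetime import datetime

def third_latest(timestamps):
    """Return the third-latest date from a list of 'YYYY-M-D' strings."""
    # Sort newest first, so index 2 is the third-latest entry.
    dates = sorted(
        (datetime.strptime(ts, '%Y-%m-%d') for ts in timestamps),
        reverse=True,
    )
    return dates[2].strftime('%Y-%m-%d')

timestamps = ['2011-06-2', '2011-08-05', '2011-02-04', '2010-1-14',
              '2010-12-13', '2010-1-12', '2010-2-11', '2010-2-07',
              '2010-12-02', '2011-11-30', '2010-11-26', '2010-11-23',
              '2010-11-22', '2010-11-16']

print(third_latest(timestamps))   # 2011-06-02
```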