AttributeError: 'DataFrame' object has no attribute 'to_datetime' - python

I want to convert all the items in the 'Time' column of my pandas dataframe from UTC to Eastern time. However, when I follow the answer in this Stack Overflow post, some of the keywords are not recognized in pandas 0.20.3. Overall, how should I do this task?
tweets_df = pd.read_csv('valid_tweets.csv')
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
error is:
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
items from the Time column look like this:
2016-10-20 03:43:11+00:00
Update:
using
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
tweets_df.index = tweets_df.index.tz_localize('UTC').tz_convert('US/Eastern')
did not convert the times. Any idea what needs to be fixed?
Update 2:
So the following code does not do an in-place conversion, meaning that when I print row['Time'] using iterrows() it still shows the original values. Do you know how to do the conversion in place?
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
for index, row in tweets_df.iterrows():
    row['Time'].tz_localize('UTC').tz_convert('US/Eastern')
for index, row in tweets_df.iterrows():
    print(row['Time'])

to_datetime is a function defined in pandas, not a method on a DataFrame. Try:
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
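For the updates: the iterrows() loop computes a converted Timestamp for each row but never writes it back, which is why the printed values look unchanged. A minimal sketch of a vectorized, whole-column conversion instead (assuming the valid_tweets.csv layout from the question):
import pandas as pd

tweets_df = pd.read_csv('valid_tweets.csv')

# The values already carry a +00:00 offset, so utc=True produces a
# tz-aware UTC column that can be converted in one step, no loop needed.
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'], utc=True)
tweets_df['Time'] = tweets_df['Time'].dt.tz_convert('US/Eastern')
tweets_df.set_index('Time', drop=False, inplace=True)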

Related

Getting attribute error and key error in pandas dataframe

I am running
import pandas as pd
df= pd.read_csv("RELIANCE.csv",parse_dates=['Date'], index_col=['Date'])
df.head(2)
It gives output below
Open High Low Close Adj Close Volume
Date
2019-08-19 1281.050049 1296.800049 1280.000000 1292.599976 1287.764648 7459859.0
2019-08-20 1289.800049 1292.599976 1272.599976 1275.949951 1271.176880 6843460.0
but type(df.Date[0]) throws AttributeError: 'DataFrame' object has no attribute 'Date' and df['2019-08-19'] throws KeyError: '2019-08-19'
Can anybody tell me how to resolve this error?
I think you can use .loc
df.loc['2019-08-19']
The AttributeError is probably because the index is not exposed as an attribute of the DataFrame, so you can't address it directly as df.Date. Instead, you can do something like type(df.index[0]) or df.index.dtype to check the index type.
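A short sketch of the distinction, assuming the RELIANCE.csv layout from the question: 'Date' was made the index, so it is reached through df.index or df.loc rather than as an attribute, and reset_index() turns it back into a regular column.
import pandas as pd

df = pd.read_csv("RELIANCE.csv", parse_dates=['Date'], index_col=['Date'])

print(type(df.index[0]))     # pandas Timestamp, because 'Date' is the index
print(df.loc['2019-08-19'])  # row lookup by index label

df = df.reset_index()        # move 'Date' back into a normal column
print(type(df.Date[0]))      # now df.Date resolves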

Timestamp object has no attribute dt

I am trying to convert a new column in a dataframe through a function based on the values in the date column, but get an error indicating "Timestamp object has no attribute dt." However, if I run this outside of a function, the dt attribute works fine.
Any guidance would be appreciated.
This code runs with no issues:
sample = {'Date': ['2015-07-02 11:47:00', '2015-08-02 11:30:00']}
dftest = pd.DataFrame.from_dict(sample)
dftest['Date'] = pd.to_datetime(dftest['Date'])
display(dftest.info())
dftest['year'] = dftest['Date'].dt.year
dftest['month'] = dftest['Date'].dt.month
This code gives me the error message:
sample = {'Date': ['2015-07-02 11:47:00', '2015-08-02 11:30:00']}
dftest = pd.DataFrame.from_dict(sample)
dftest['Date'] = pd.to_datetime(dftest['Date'])
def CALLYMD(dftest):
    if dftest['Date'].dt.month > 9:
        return str(dftest['Date'].dt.year) + '1231'
    elif dftest['Date'].dt.month > 6:
        return str(dftest['Date'].dt.year) + '0930'
    elif dftest['Date'].dt.month > 3:
        return str(dftest['Date'].dt.year) + '0630'
    else:
        return str(dftest['Date'].dt.year) + '0331'

dftest['CALLYMD'] = dftest.apply(CALLYMD, axis=1)
Lastly, I'm open to any suggestions on how to make this code better as I'm still learning.
I'm guessing you should remove .dt in the second case. When you use apply with axis=1, the function is applied to one row at a time, so dftest['Date'] is a single Timestamp rather than a Series. The .dt accessor is only needed on a Series of datetimes; calling it on a single Timestamp raises
AttributeError: 'Timestamp' object has no attribute 'dt'
reference: https://stackoverflow.com/a/48967889/13720936
After looking at the timestamp documentation, I found removing the .dt and just doing .year and .month works. However, I'm still confused as to why it works in the first code but does not work in the second code.
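A minimal sketch of the corrected function, keeping the question's thresholds: with apply(..., axis=1) each call receives one row, so row['Date'] is a single Timestamp and .year/.month are used directly.
def CALLYMD(row):
    # 'row' is a Series holding one row; row['Date'] is a single Timestamp
    if row['Date'].month > 9:
        return str(row['Date'].year) + '1231'
    elif row['Date'].month > 6:
        return str(row['Date'].year) + '0930'
    elif row['Date'].month > 3:
        return str(row['Date'].year) + '0630'
    else:
        return str(row['Date'].year) + '0331'

dftest['CALLYMD'] = dftest.apply(CALLYMD, axis=1)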
Here is how to create a year-month bucket using the year and month:
for key, item in df.iterrows():
    year = pd.to_datetime(item['Date']).year
    month = str(pd.to_datetime(item['Date']).month)
    df.loc[key, 'YearMonth'] = "{:.0f}{}".format(year, month.zfill(2))
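For reference, the same year-month bucket can also be built without the loop (a sketch, assuming 'Date' is parseable as a date):
df['YearMonth'] = pd.to_datetime(df['Date']).dt.strftime('%Y%m')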

str object has no attribute strftime

AttributeError: 'str' object has no attribute 'strftime'
if __name__ == "__main__":
    df = pd.read_excel("abhi.xlsx")
    #print(df)
    today = datetime.datetime.now().strftime("%d-%m")
    yearNow = datetime.datetime.now().strftime("%Y")
    #print(type(today))
    writeInd = []
    for index, item in df.iterrows():
        print(index, item['Birthday'])
        pr_bday = item['Birthday'].strftime("%d-%m")
        print(pr_bday)
Your column 'Birthday' is not a datetime type; it is actually a string.
You can use df.dtypes to check the type of each column in df.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html
Or you can just use type(item['Birthday']) to get the type directly.
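A minimal sketch of the fix under that assumption: parse 'Birthday' to datetime first, then strftime works on each Timestamp (unparseable cells become NaT and are skipped here).
import pandas as pd

df = pd.read_excel("abhi.xlsx")
df['Birthday'] = pd.to_datetime(df['Birthday'], errors='coerce')

for index, item in df.iterrows():
    if pd.notna(item['Birthday']):
        print(index, item['Birthday'].strftime("%d-%m"))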

Attribute error while creating list from string values

I have imported an Excel file with some data and removed the missing values.
df = pd.read_excel (r'file.xlsx', na_values = missing_values)
I'm trying to split the string values into lists for later actions.
df['GENRE'] = df['GENRE'].map(lambda x: x.split(','))
df['ACTORS'] = df['ACTORS'].map(lambda x: x.split(',')[:3])
df['DIRECTOR'] = df['DIRECTOR'].map(lambda x: x.split(','))
But it gives me the following error - AttributeError: 'list' object has no attribute 'split'
I've done the same with CSV and it worked, so could it be because it's Excel?
I'm sure it's simple, but I can't get my head around it. (example of my dataframe)
Try using str.split, the Pandas way:
df['GENRE'] = df['GENRE'].str.split(',')
df['ACTORS'] = df['ACTORS'].str.split(',').str[:3]
df['DIRECTOR'] = df['DIRECTOR'].str.split(',')
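A small self-contained demo of the .str accessor approach; the column names mirror the question, but the rows here are made up purely for illustration.
import pandas as pd

df = pd.DataFrame({
    'GENRE': ['Action,Drama', 'Comedy'],
    'ACTORS': ['A,B,C,D', 'E,F'],
    'DIRECTOR': ['X', 'Y,Z'],
})

df['GENRE'] = df['GENRE'].str.split(',')            # each cell becomes a list
df['ACTORS'] = df['ACTORS'].str.split(',').str[:3]  # keep at most three names
df['DIRECTOR'] = df['DIRECTOR'].str.split(',')
print(df)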

AttributeError: 'DataFrame' object has no attribute 'timestamp'

I want to select only those rows that have a timestamp that belongs to last 36 hours. My PySpark DataFrame df has a column unix_timestamp that is a timestamp in seconds.
This is my current code, but it fails with the error AttributeError: 'DataFrame' object has no attribute 'timestamp'. I tried to change it to unix_timestamp, but it fails all the time.
import datetime
hours_36 = (datetime.datetime.now() - datetime.timedelta(hours = 36)).strftime("%Y-%m-%d %H:%M:%S")
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")).filter(df.timestamp > hours_36)
The timestamp column doesn't exist yet on df when you try to refer to it in the filter. You can use pyspark.sql.functions.col to refer to it dynamically, without specifying which DataFrame object the column belongs to:
import pyspark.sql.functions as F
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")).filter(F.col("unix_timestamp") > hours_36)
Or without creating the intermediate column:
df.filter(df.unix_timestamp.cast("timestamp") > hours_36)
The API Doc tells me that you can also use a String notation for filtering:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.filter
import pyspark.sql.functions as F
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")) \
       .filter("unix_timestamp > '%s'" % hours_36)
Maybe it's not as efficient, though.
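Putting it together, a minimal end-to-end sketch under the question's assumption that unix_timestamp holds epoch seconds; functions.col keeps the filter pointing at the freshly cast column.
import datetime
import pyspark.sql.functions as F

hours_36 = (datetime.datetime.now() - datetime.timedelta(hours=36)).strftime("%Y-%m-%d %H:%M:%S")

# Cast the numeric seconds to a timestamp, then compare against the cutoff
# string; Spark casts the string literal to a timestamp for the comparison.
df = (df.withColumn("unix_timestamp", F.col("unix_timestamp").cast("timestamp"))
        .filter(F.col("unix_timestamp") > hours_36))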
