Getting AttributeError and KeyError in pandas DataFrame - Python

I am running:
import pandas as pd
df = pd.read_csv("RELIANCE.csv", parse_dates=['Date'], index_col=['Date'])
df.head(2)
It gives the output below:
Open High Low Close Adj Close Volume
Date
2019-08-19 1281.050049 1296.800049 1280.000000 1292.599976 1287.764648 7459859.0
2019-08-20 1289.800049 1292.599976 1272.599976 1275.949951 1271.176880 6843460.0
but type(df.Date[0]) throws AttributeError: 'DataFrame' object has no attribute 'Date', and df['2019-08-19'] throws KeyError: '2019-08-19'.
Can anybody tell me how to resolve these errors?

I think you can use .loc:
df.loc['2019-08-19']
The AttributeError is raised because you passed index_col=['Date'], so Date is now the index, not a column, and the index is not accessible as an attribute. You can use type(df.index[0]) or df.index.dtype to inspect the index type instead. The KeyError comes from df['2019-08-19']: bracket indexing with a single string looks for a column of that name, while row lookup by a date label needs .loc.
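A minimal sketch of the distinction, using a tiny frame shaped like the question's data:

import pandas as pd

# Mimic read_csv(..., parse_dates=['Date'], index_col=['Date']) with a small frame.
df = pd.DataFrame(
    {'Close': [1292.599976, 1275.949951]},
    index=pd.DatetimeIndex(['2019-08-19', '2019-08-20'], name='Date'),
)

print(type(df.index[0]))     # pandas Timestamp, since 'Date' is the index
print(df.index.dtype)        # datetime64[ns]
print(df.loc['2019-08-19'])  # row lookup by date label works via .loc
# df['2019-08-19']           # KeyError: bracket lookup searches column names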

Related

'str' object has no attribute 'columns' when I try to use the function to_csv

I'm trying to use the to_csv function to export a dataset, but it raises the error "'str' object has no attribute 'columns'". This is my script:
import pandas as pd
data=pd.read_csv('Documents/Pos/ETLSIM/Dados/ETLSIM.DORES_MG_2019_t.csv', low_memory="false")
data2 = pd.read_csv('Documents/Pos/ETLSIM/ETLSIM.DORES_MG_2018_t.csv', low_memory="false")
df_concat = pd.concat([data,data2], sort = False)
df_concat.to_csv('concatenado.csv')
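No answer is quoted in this excerpt, but one thing worth flagging: low_memory expects a boolean, and the string "false" is truthy, so it does not do what it looks like it does. A hedged sketch of the intended concat-and-export flow (the file paths are the asker's):

import pandas as pd

# low_memory takes a bool; the string "false" is truthy, which is not the intent
data = pd.read_csv('Documents/Pos/ETLSIM/Dados/ETLSIM.DORES_MG_2019_t.csv', low_memory=False)
data2 = pd.read_csv('Documents/Pos/ETLSIM/ETLSIM.DORES_MG_2018_t.csv', low_memory=False)

df_concat = pd.concat([data, data2], sort=False)
df_concat.to_csv('concatenado.csv', index=False)  # index=False skips writing the row index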

AttributeError: Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects

[Data overview screenshot]
Hello everyone
I need to get the two platforms with the most visits per day, over a full year. So:
Group the data by day
Extract the two platforms with the most visits for each day
I tried this code:
df.groupby(pd.Grouper(key="Datum", freq="1D")).nlargest(2, 'Visits')
and got this error:
AttributeError: Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects, try using the 'apply' method
Thanks a lot for your help! :)
Why not just use apply, as the error message states:
import pandas as pd

# example dataframe
d = {'Platform': ['location', 'office', 'station'],
     'Date': ['01.08.2019', '01.08.2019', '01.08.2019'],
     'Visits': [4372, 48176, 2012]}
df = pd.DataFrame(data=d)

# within each date group, keep the two rows with the largest Visits
df.groupby(pd.Grouper(key="Date")).apply(lambda grp: grp.nlargest(2, 'Visits'))
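Note that groupby(...).apply(...) returns a result indexed by the group key plus the original row index. If a flat frame is wanted, a sketch like this drops the group level afterwards:

top2 = df.groupby(pd.Grouper(key="Date")).apply(lambda grp: grp.nlargest(2, 'Visits'))
top2 = top2.droplevel(0)  # drop the outer 'Date' group level, keeping the original row index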

Dask compute gives AttributeError: 'Series' object has no attribute 'encode'

I would like to apply a function to each row of a dask dataframe.
Executing the operation with ddf.compute() gives me an error:
AttributeError: 'Series' object has no attribute 'encode'
This is my code:
def polar(data):
    data = scale(sid.polarity_scores(data.tweet)['compound'])
    return data
t_data['sentiment'] = t_data.map_partitions(polar, meta=('sentiment', int))
Using t_data.head() also results in the same error.
I have found the answer: you have to apply the function within each partition.
t_data['sentiment'] = t_data.map_partitions(lambda df: df.apply(polar, axis=1))
You can use the following:
t_data.apply(polar, axis=1)
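A hedged variant that combines both suggestions: apply polar row-wise inside each partition while declaring the output via meta, which spares dask an inference pass (the float dtype is an assumption, since compound polarity scores are floats):

t_data['sentiment'] = t_data.map_partitions(
    lambda df: df.apply(polar, axis=1),
    meta=('sentiment', 'f8'),  # assumed dtype for the resulting column
)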

AttributeError: 'DataFrame' object has no attribute 'to_datetime'

I want to convert all the items in the 'Time' column of my pandas dataframe from UTC to Eastern time. However, following the answer in this Stack Overflow post, some of the keyword arguments are not recognized in pandas 0.20.3. Overall, how should I do this task?
tweets_df = pd.read_csv('valid_tweets.csv')
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
The error is:
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
Items from the Time column look like this:
2016-10-20 03:43:11+00:00
Update:
using
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
tweets_df.index = tweets_df.index.tz_localize('UTC').tz_convert('US/Eastern')
performed no time conversion. Any idea what needs to be fixed?
Update 2:
The following code does not convert in place: when I print row['Time'] using iterrows(), it still shows the original values. Do you know how to do the conversion in place?
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
for index, row in tweets_df.iterrows():
    row['Time'].tz_localize('UTC').tz_convert('US/Eastern')
for index, row in tweets_df.iterrows():
    print(row['Time'])
to_datetime is a function defined at the top level of pandas, not a method on a DataFrame. Try:
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
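For the follow-up updates: the iterrows() loop changes nothing because tz_convert returns a new Timestamp that is never assigned back. A vectorized sketch instead, using the asker's column name; utc=True fits here because the strings already carry a +00:00 offset:

import pandas as pd

# parse as timezone-aware UTC, then convert the whole column at once
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'], utc=True)
tweets_df['Time'] = tweets_df['Time'].dt.tz_convert('US/Eastern')
tweets_df.set_index('Time', drop=False, inplace=True)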

AttributeError: 'DataFrame' object has no attribute 'timestamp'

I want to select only those rows that have a timestamp that belongs to last 36 hours. My PySpark DataFrame df has a column unix_timestamp that is a timestamp in seconds.
This is my current code, but it fails with the error AttributeError: 'DataFrame' object has no attribute 'timestamp'. I tried changing the reference to unix_timestamp, but it still fails.
import datetime
hours_36 = (datetime.datetime.now() - datetime.timedelta(hours = 36)).strftime("%Y-%m-%d %H:%M:%S")
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")).filter(df.timestamp > hours_36)
There is no timestamp column to refer to: withColumn names the new column unix_timestamp, and df inside the filter call still refers to the original DataFrame anyway. You can use pyspark.sql.functions.col to refer to the column dynamically, without specifying which DataFrame object it belongs to:
import pyspark.sql.functions as F
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")).filter(F.col("unix_timestamp") > hours_36)
Or without creating the intermediate column:
df.filter(df.unix_timestamp.cast("timestamp") > hours_36)
The API doc shows that you can also use a string expression for filtering:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.filter
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")) \
       .filter("unix_timestamp > '%s'" % hours_36)
It may be less efficient, though.
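An alternative sketch, not from the answers above: compute the 36-hour cutoff inside Spark itself (assuming Spark's INTERVAL expression syntax), which avoids formatting the cutoff as a Python string:

import pyspark.sql.functions as F

# compare the casted timestamp against "now minus 36 hours", evaluated by Spark
recent = df.filter(
    df.unix_timestamp.cast("timestamp")
    > F.current_timestamp() - F.expr("INTERVAL 36 HOURS")
)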
