I have a dataframe and I want to make a deep copy of it so I can modify the copy and use it in further processing.
I am working in Azure Databricks.
My dataframe is called "a" and I tried the following command:
b = a.copy(deep=True)
When I run it, I encounter the following error:
'DataFrame' object has no attribute 'copy'
I also tried using the 'iloc' and 'loc' functions to create a new dataframe with only the columns that I need, but I get a similar error ('DataFrame' object has no attribute 'lit').
Any ideas why this is happening?
Assuming you're working in Python, first check whether you have a Spark DataFrame or a pandas DataFrame. If it's a pandas one, I couldn't tell you what's going on without more information; if it's a Spark one, you should use
newDataFrame = oldDataFrame.select('*')
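As a quick way to tell the two apart, here is a minimal pandas-only sketch (the data is made up; the Spark equivalent is shown only in comments, since it needs a SparkSession):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
print(type(a))  # <class 'pandas.core.frame.DataFrame'> -> pandas, so .copy() works

b = a.copy(deep=True)   # deep copy: b has its own data
b.loc[0, "x"] = 99      # modifying the copy leaves the original intact

# For a Spark DataFrame, type(a) would show pyspark.sql.dataframe.DataFrame,
# there is no .copy(), and the equivalent "copy" is: b = a.select('*')
```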
I have created a dataframe in databricks as a combination of multiple dataframes. I am now trying to upload that df to a table in my database and I have used this code many times before with no problem, but now it is not working.
My code is
df.write.saveAsTable("dashboardco.AccountList")
I am getting the error:
AttributeError: 'DataFrame' object has no attribute 'write'
Thanks for any help!
Most probably your DataFrame is a pandas DataFrame object, not a Spark DataFrame object.
try:
spark.createDataFrame(df).write.saveAsTable("dashboardco.AccountList")
You are a life saver. Thank you so much!
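A quick pandas-only check that supports this diagnosis, since the attribute simply does not exist on the pandas object:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
# A pandas DataFrame has no .write attribute; only Spark DataFrames do,
# which is exactly why the AttributeError appears.
print(hasattr(df, "write"))  # False
```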
I have a large dataframe and I am trying to use pandas group by in combination with mean()
For example:
df.groupby(['id_column'])['weight'].mean()
If the 'id_column' is a string I get the following error:
AttributeError: 'StringDtype' object has no attribute 'storage'
If I convert the 'id_column' to a float I don't get the error.
If I subset the dataframe to be smaller I don't get the error e.g. only select one day of data or only select data from one id.
I am using
Pandas Version: 1.2.5
Version: 1.22.3
I encountered the same error message
AttributeError: 'StringDtype' object has no attribute 'storage'
while I was performing set_index().
Since the dataframe had been loaded from a pickle, I too suspected an incompatibility with an older pandas version. Adding df = df.copy() on the line just before the error fixed it.
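For reference, this is roughly the setup from the question with made-up data; on a pandas/numpy combination not affected by the bug it runs cleanly, and the df = df.copy() line mirrors the workaround above:

```python
import pandas as pd

df = pd.DataFrame({
    "id_column": pd.array(["a", "a", "b"], dtype="string"),  # StringDtype column
    "weight": [1.0, 3.0, 5.0],
})

df = df.copy()  # workaround: rebuilds the internal blocks, sidestepping the stale StringDtype
result = df.groupby("id_column")["weight"].mean()
print(result)
```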
I am new to python, I have imported a file into jupyter as follows:
df = pd.read_csv(r"C:\Users\shalotte1\Documents\EBQS_INTEGRATEDQUOTEDOCUMENT\groceries.csv")
I am using the following code to determine the number of rows and columns in the data
df.shape()
However I am getting the following error:
TypeError: 'tuple' object is not callable
You want df.shape without parentheses: it is an attribute that holds a tuple of (n_rows, n_cols). By adding parentheses you are trying to call that tuple as though it were a function.
As you are new to Python, I would recommend reading this page. It covers the other causes of this error too, so you can solve it again if it appears in the future.
https://careerkarma.com/blog/python-typeerror-tuple-object-is-not-callable/
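A minimal illustration of the attribute-versus-method distinction, with toy data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = df.shape  # .shape is an attribute holding a tuple, not a method
print(rows, cols)      # 3 2

# df.shape() would raise TypeError: 'tuple' object is not callable,
# because it tries to call the tuple (3, 2) like a function.
```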
I have this csv file from https://www.data.gov.au/dataset/airport-traffic-data/resource/f0fbdc3d-1a82-4671-956f-7fee3bf9d7f2
I'm trying to aggregate with
airportdata = Airports.groupby(['Year_Ended_December'])('Dom_Pax_in','Dom_Pax_Out')
airportdata.sum()
However, I keep getting 'DataFrameGroupBy' object is not callable, and it won't print the data I want.
How can I fix this?
You need to execute the sum aggregation before extracting the columns:
airportdata_agg = Airports.groupby(['Year_Ended_December']).sum()[['Dom_Pax_in','Dom_Pax_Out']]
Alternatively, if you'd like to ensure you're not aggregating columns you are not going to use:
airportdata_agg = Airports[['Dom_Pax_in','Dom_Pax_Out', 'Year_Ended_December']].groupby(['Year_Ended_December']).sum()
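To see the corrected pipeline end to end, a small sketch with made-up numbers standing in for the CSV (column names taken from the question):

```python
import pandas as pd

# Toy stand-in for the airport traffic CSV
Airports = pd.DataFrame({
    "Year_Ended_December": [2019, 2019, 2020],
    "Dom_Pax_in": [100, 50, 30],
    "Dom_Pax_Out": [90, 60, 20],
})

# Select only the columns we need, then group and aggregate
airportdata_agg = (
    Airports[["Dom_Pax_in", "Dom_Pax_Out", "Year_Ended_December"]]
    .groupby("Year_Ended_December")
    .sum()
)
print(airportdata_agg)
```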
I have a data frame "df" with a column called "column1". By running the below code:
df.column1.value_counts()
I get output containing each value in column1 and its frequency. I want this result in Excel. When I try to do this by running the code below:
df.column1.value_counts().to_excel("result.xlsx",index=None)
I get the below error:
AttributeError: 'Series' object has no attribute 'to_excel'
How can I accomplish the above task?
You are using index=None, but you need the index: it holds the names of the values.
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
If you go through the documentation, Series has no to_excel method; it applies only to DataFrame.
So you can convert the Series to a frame first and create the Excel file as:
a = df.column1.value_counts().to_frame()
a.to_excel("result.xlsx")
Or look at Merlin's comment; I think it is the best way:
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
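Putting it together, a minimal sketch with toy data (the to_excel call itself is commented out because writing .xlsx files needs an engine such as openpyxl installed):

```python
import pandas as pd

df = pd.DataFrame({"column1": ["x", "y", "x", "x"]})

counts = df.column1.value_counts()  # a Series mapping value -> frequency
out = pd.DataFrame(counts)          # wrap in a DataFrame so to_excel is available
# out.to_excel("result.xlsx")       # keep the index: it holds the values themselves
```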