Simple code like this no longer works in my Python shell:
import pandas as pd
df=pd.read_csv("K:/01. Personal/04. Models/10. Location/output.csv",index_col=None)
df.sample(3000)
The error I get is:
AttributeError: 'DataFrame' object has no attribute 'sample'
DataFrames definitely have a sample function, and this used to work.
I recently had some trouble installing and then uninstalling another distribution of python. I don't know if this could be related.
I've previously had a similar problem when trying to execute a script which had the same name as a module I was importing, but that is not the case here, and pandas.read_csv is actually working.
What could cause this?
As given in the documentation of DataFrame.sample -
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Returns a random sample of items from an axis of object.
New in version 0.16.1.
(Emphasis mine).
DataFrame.sample was added in 0.16.1, so you can either -
Upgrade your pandas version to the latest release; you can use pip for that. Example -
pip install pandas --upgrade
Or, if you don't want to upgrade and just want to sample a few rows from the dataframe, you can also use random.sample(). Example -
import random
num = 100  # number of samples
sampleddata = df.loc[random.sample(list(df.index), num)]  # select `num` randomly chosen index labels
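If you are not sure which pandas installation your shell is actually picking up (for example after installing and uninstalling another distribution), a quick check like the following can confirm it; the version in the comment is only illustrative -
import pandas as pd

print(pd.__version__)                    # anything below 0.16.1 predates DataFrame.sample
print(hasattr(pd.DataFrame, "sample"))   # False means the installed pandas is too old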
Related
I was going through a cheat sheet of NumPy and found the median() function. As I was trying it, an error was thrown. Here, you can find the cheat sheet. The code is:
import numpy as np
check = np.array([1, 2, 3])
check.median()
The error message is:
'numpy.ndarray' object has no attribute 'median'
In the official NumPy doc, I found that I should use median() as np.median(check). So I was wondering whether check.median() is an obsolete method or whether I was doing something wrong.
The NumPy version I was using was 1.19.5. Then I upgraded it to the latest version (1.21.2). Still, the same error persists.
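For reference, median is exposed as a module-level NumPy function rather than an ndarray method (ndarray has mean() but no median()), so the call has to go through the numpy namespace regardless of version -
import numpy as np

check = np.array([1, 2, 3])
print(np.median(check))   # 2.0 - the module-level function works on any NumPy version
# check.median() raises AttributeError: 'numpy.ndarray' object has no attribute 'median'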
When I use the pyLDAvis.gensim function to construct the visualization at Google Colab, it shows this error:
a result has failed to un-serialize. please ensure that the objects
returned by the function are always picklable.
My codes are:
!pip install pyldavis
import pyLDAvis
import pyLDAvis.gensim_models as pg
pyLDAvis.enable_notebook()
vis = pg.prepare(lda, corpus, dictionary, sort_topics=False) # construct visualization
Upgrading pandas to version 1.2.0 worked for me.
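In Colab that might look like the sketch below; the pinned version is just the one mentioned above, and the runtime usually has to be restarted after the upgrade before re-importing pandas -
!pip install pandas==1.2.0
# restart the Colab runtime, then rebuild the visualization as before
import pyLDAvis
import pyLDAvis.gensim_models as pg
pyLDAvis.enable_notebook()
vis = pg.prepare(lda, corpus, dictionary, sort_topics=False)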
I'm trying to convert a pandas dataframe into an R dataframe using the guide here. The code I have so far is:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
pd_df = pd.DataFrame({'int_values': [1, 2, 3],
                      'str_values': ['abc', 'def', 'ghi']})
with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_df = ro.conversion.py2rpy(pd_df)
However, this is giving me the following error:
Traceback (most recent call last):
File <my_file_ref>, line 13, in <module>
r_from_pd_df = ro.conversion.py2rpy(pd_df)
AttributeError: module 'rpy2.robjects.conversion' has no attribute 'py2rpy'
I have found this relevant question where the OP refers to function names being changed, but doesn't explain what the changes are. I've tried looking at the module, but I'm not quite advanced enough to be able to make sense of it.
The accepted answer refers to checking versions which I've done and I'm definitely using rpy2 v3.3.3 which is the same as the guide I'm following.
Has anyone encountered this error and found a solution?
The section of the rpy2 documentation you are pointing out is built by running the code example. This means that the example did work with the corresponding version of rpy2.
Are you sure you are using that version of rpy2 at runtime? For example, add print(rpy2.__version__) to check that this is the case.
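Something like this at the top of your script makes the runtime version explicit -
import rpy2
print(rpy2.__version__)   # should match the version the guide was built against (3.3.x or later)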
For what it's worth, the latest release in the rpy2 3.3.x series is 3.3.6, and there is probably no good reason to stay with 3.3.3. Otherwise, rpy2 3.4.0 was just released; if you are using R 4.0 or greater, or the latest releases of the R packages dplyr or ggplot2 together with their rpy2 wrapper, I would recommend using that release.
[PS: I just tried your example with rpy2-3.4.0 and it runs without error]
I'm not looking for a solution here as I found a workaround; mostly I'd just like to understand why my original approach didn't work given that the work around did.
I have a dataframe of 2803 rows with the default numeric key. I want to replace that with the values in column 0, namely TKR.
So I use f.set_index('TKR') and get
f.set_index('TKR')
Traceback (most recent call last):
File "<ipython-input-4-39232ca70c3d>", line 1, in <module>
f.set_index('TKR')
TypeError: 'str' object is not callable
So I think maybe there's some noise in my TKR column, and rather than scrolling through 2803 rows I try f.head().set_index('TKR')
When that works I try f.head(100).set_index('TKR') which also works. I continue with parameters of 500, 1000, and 1500 all of which work. So do 2800, 2801, 2802 and 2803. Finally I settle on
f.head(len(f)).set_index('TKR')
which works and will handle a different size dataframe next month. I would just like to understand why this works and the original, simpler, and (I thought) by the book method doesn't.
I'm using Python 3.6 (64 bit) and Pandas 0.18.1 on a Windows 10 machine
You might have accidentally assigned a value to pd.DataFrame.set_index, overwriting the method.
An example of this mistake: f.set_index = 'intended_col_name'
As a result, for the rest of your code .set_index is a str, which is not callable, resulting in this error.
Try restarting your notebook, remove the offending line, and use f.set_index('TKR') instead.
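A minimal reproduction of that failure mode (the column names here are only illustrative):
import pandas as pd

# hypothetical frame standing in for the real data
f = pd.DataFrame({'TKR': ['AAA', 'BBB'], 'price': [1.0, 2.0]})

f.set_index = 'TKR'         # accidental assignment shadows the method on this instance
print(type(f.set_index))    # <class 'str'> - calling it now raises "'str' object is not callable"

# a fresh object (or a restarted kernel) still has the real method
f2 = pd.DataFrame({'TKR': ['AAA', 'BBB'], 'price': [1.0, 2.0]})
f2 = f2.set_index('TKR')    # works as expected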
I know it's been a long while, but I think some people may need the answer in the future.
What you do with f.set_index('TKR') is totally right as long as 'TKR' is a column of DataFrame f.
That is to say, this is an error you are not supposed to get. It almost always happens because you redefined some built-in method or function of Python in an earlier step (possibly 'set_index'). So, the way to fix it is to review your code and find out which part did that.
If you are using a Jupyter notebook, restarting it and running only this block can fix the problem.
I believe I have a solution for you.
I ran into the same problem and I was constructing my dataframes from a dictionary, like this:
df_beta = df['Beta']
df_returns = df['Returns']
Then, trying to do df_beta.set_index('Date') would fail. My workaround was:
df_beta = df['Beta'].copy()
df_returns = df['Returns'].copy()
So apparently, if you build your dataframes as a "view" of another existing dataframe, you can't set the index, and it will raise a 'Series' object is not callable error. If instead you create an explicit new object by copying the original dataframes, then you can call set_index, which is roughly what you do when you compute the head.
Hope this helps, 2 years later :)
I have the same problem here.
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts.set_token('*************************************')
tspro = ts.pro_api()
gjyx = tspro.daily(ts_code='000516.SZ', start_date='20190101')
# this doesn't work
# out:'str' object is not callable
gjyx = gjyx.set_index('trade_date')
# this works
gjyx = gjyx.head(len(gjyx)).set_index('trade_date')
Jupyter Notebook 6.1.6, Python 3.9.1, Miniconda3, Windows 10
But when I upload this ipynb to Ubuntu on AWS, it works.
I once had this same issue.
This simple line of code kept throwing a TypeError: 'Series' object is not callable error again and again.
df = df.set_index('Date')
I had to shutdown my kernel and restart the jupyter notebook to fix it.
I am learning Spark now. When I tried to load a json file, as follows:
people=sqlContext.jsonFile("C:\wdchentxt\CustomerData.json")
I got the following error:
AttributeError: 'SQLContext' object has no attribute 'jsonFile'
I am running this on Windows 7 PC, with spark-2.1.0-bin-hadoop2.7, and Python 2.7.13 (Dec 17, 2016).
Thank you for any suggestions that you may have.
You probably forgot to import the implicits. This is what my solution looks like in Scala:
def loadJson(filename: String, sqlContext: SQLContext): Dataset[Row] = {
  // bring the context's implicits into scope before reading
  import sqlContext._
  import sqlContext.implicits._

  val df = sqlContext.read.json(filename)
  df
}
First, the more recent versions of Spark (like the one you are using) use .read.json(..) instead of the deprecated .jsonFile(..).
Second, you need to be sure that your SQLContext is set up right, as mentioned here: pyspark: NameError: name 'spark' is not defined. In my case, it's set up like this:
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
myObjects = sqlContext.read.json('file:///home/cloudera/Downloads/json_files/firehose-1-2018-08-24-17-27-47-7066324b')
Note that they have version-specific quick-start tutorials that can help with getting some of the basic operations right, as mentioned here: name spark is not defined
So, my point is: always check that, with whatever library or language you are using (and this applies in general across all technologies), you are following the documentation that matches the version you are running, because it is very common for breaking changes to create a lot of confusion when there is a version mismatch. In cases where the technology you are trying to use is not well documented for the version you are running, that's when you need to evaluate whether to upgrade to a more recent version or create a support ticket with the project's maintainers so that you can help them to better support their users.
You can find a guide on all of the version-specific changes of Spark here: https://spark.apache.org/docs/latest/sql-programming-guide.html#upgrading-from-spark-sql-16-to-20
You can also find version-specific documentation on Spark and PySpark here (e.g. for version 1.6.1): https://spark.apache.org/docs/1.6.1/sql-programming-guide.html
As mentioned before, .jsonFile(...) has been deprecated [1]; use this instead:
people = sqlContext.read.json("C:\wdchentxt\CustomerData.json").rdd
Source:
[1]: https://docs.databricks.com/spark/latest/data-sources/read-json.html