I have the following code:
df(df.Sex=='male')
I get an error stating that the DataFrame object is not callable.
How can I solve this?
This is called boolean indexing, and it needs square brackets [] rather than parentheses:
df[df.Sex=='male']
Or:
df.query("Sex =='male'")
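A minimal sketch of both forms, with a toy DataFrame assumed since the original data isn't shown:

```python
import pandas as pd

# Toy data standing in for the original DataFrame (assumption)
df = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cal'],
                   'Sex': ['female', 'male', 'male']})

males = df[df.Sex == 'male']       # boolean indexing with []
same = df.query("Sex == 'male'")   # equivalent query() form

print(males)
```

Both return the same filtered rows; `df(...)` fails because a DataFrame is an object you index, not a function you call.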
Related
I have a Koalas / Pandas-on-Spark dataframe named df.
When I try the expression below, I get a TypeError: 'str' object is not callable
df[~(df.time.eq('00:00:00').groupby(df.vehicle_id).transform('sum')>=2)]
When I check the datatypes of both columns I get:
print(df.time.dtype)
<U0
print(df.vehicle_id.dtype)
<U0
Could that have something to do with it?
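For reference, the same expression runs fine in plain pandas. Here is a sketch with toy data (an assumption, since the real dataframe isn't shown) that may help isolate whether the failure is specific to the pandas-on-Spark layer rather than the logic itself:

```python
import pandas as pd

# Toy data (assumption): vehicle 1 has two '00:00:00' rows, vehicle 2 has one
df = pd.DataFrame({
    'vehicle_id': [1, 1, 1, 2, 2],
    'time': ['00:00:00', '00:00:00', '08:15:00', '00:00:00', '09:30:00'],
})

# Per vehicle, count '00:00:00' rows; drop vehicles with two or more
mask = df.time.eq('00:00:00').groupby(df.vehicle_id).transform('sum') >= 2
result = df[~mask]
print(result)
```

In plain pandas this keeps only vehicle 2's rows; if the identical expression fails on the pandas-on-Spark dataframe, the dtype or the distributed `transform` support is the likelier culprit.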
My goal is to create a pandas dataframe with a datetimeindex from a django model. I am using the django-pandas package for this purpose, specifically, the 'to_timeseries()' method.
First, I used the .values() method on my qs. This still returns a qs, but it contains dictionaries. I then used to_timeseries() to create my dataframe. Everything here worked as expected: the pivot, the values, etc. But my index is just a list of strings. I don't know why.
I have been able to find a great many manipulations in the pandas documentation, including how to turn a column or series into datetime objects. However, my index is not a Series, it is an Index, and none of these methods work. How do I make this happen? Thank you.
df = mclv.to_timeseries(index='day',
                        pivot_columns='medicine',
                        values='takentoday',
                        storage='wide')
df = df['day'].astype(Timestamp)
raise TypeError(f"dtype '{dtype}' not understood")
TypeError: dtype '<class 'pandas._libs.tslibs.timestamps.Timestamp'>' not understood
AttributeError: 'DataFrame' object has no attribute 'DateTimeIndex'
df = pd.DatetimeIndex(df, inplace=True)
TypeError: __new__() got an unexpected keyword argument 'inplace'
TypeError: Cannot cast DatetimeIndex to dtype
etc...
Correction & update: django-pandas did work as its authors expected. The problem was my misunderstanding of what it was doing, and how.
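For anyone who lands here with the same symptom: an Index of date strings can be converted by passing it through pd.to_datetime and reassigning, since Index objects are immutable. A sketch with toy data (the actual django-pandas output is assumed):

```python
import pandas as pd

# Toy wide frame whose index is date strings, mimicking to_timeseries() output (assumption)
df = pd.DataFrame({'medicineA': [1, 0, 2]},
                  index=['2023-01-01', '2023-01-02', '2023-01-03'])

# Convert the string Index to a DatetimeIndex; reassign rather than mutate in place
df.index = pd.to_datetime(df.index)
print(type(df.index))
```

There is no `inplace` argument here because you are replacing the index object, not modifying it.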
I tried to do this:
get_sent_score_neut_df = pandas.Series(get_sent_score_neut).to_frame(name='sentiment-neutral').reset_index().apply(lambda x: float(x))
And when I want to merge/join it with another DataFrame I created the same way the error I get is:
AttributeError: 'Series' object has no attribute '_join_compat'
Is there a way to fix that?
That's the line of code I used to merge/join them:
sentMerge = pandas.DataFrame.join(get_sent_score_pos_df, get_sent_score_neut_df)
Besides: I have tried to rename the index with `.reset_index(name='xyz')`
(assigning column names to a pandas series), which causes my IDE to respond with "unexpected argument".
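One sketch of a fix, under the assumption that each score object is a list of floats: keep each result as a one-column DataFrame via to_frame() (without the trailing apply, which turns it back into something join can't handle), then call join on the instance so both operands align on their shared index. The variable names below mirror the question but the data is made up:

```python
import pandas as pd

# Toy score lists standing in for the sentiment results (assumption)
get_sent_score_pos = [0.1, 0.4, 0.2]
get_sent_score_neut = [0.7, 0.5, 0.6]

# to_frame() yields a DataFrame, so .join() is available on the instance
pos_df = pd.Series(get_sent_score_pos, dtype=float).to_frame(name='sentiment-positive')
neut_df = pd.Series(get_sent_score_neut, dtype=float).to_frame(name='sentiment-neutral')

sent_merge = pos_df.join(neut_df)  # aligns rows on the shared RangeIndex
print(sent_merge)
```

Calling `pandas.DataFrame.join(a, b)` as an unbound method fails when `a` is actually a Series, which is what the `_join_compat` error suggests happened.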
I am converting some code from pandas to PySpark. In pandas, let's imagine I have the following mock dataframe, df:
And in pandas, I define a certain variable the following way:
value = df.groupby(["Age", "Siblings"]).size()
And the output is a series as follows:
However, when trying to convert this to pyspark, an error comes up: AttributeError: 'GroupedData' object has no attribute 'size'. Can anyone help me solve this?
The equivalent of size in pyspark is count:
df.groupby(["Age", "Siblings"]).count()
You can also use the agg method, which is more flexible as it allows you to set column alias or add other types of aggregations:
import pyspark.sql.functions as F
df.groupby('Age', 'Siblings').agg(F.count('*').alias('count'))
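For comparison, here is what the original pandas size() call produces, using mock data (an assumption, since the question's dataframe isn't shown); the PySpark count() above returns the same per-group row counts, just as a DataFrame column instead of a Series:

```python
import pandas as pd

# Mock data (assumption), since the question's dataframe isn't shown
df = pd.DataFrame({'Age': [20, 20, 25],
                   'Siblings': [1, 1, 0]})

# pandas: size() counts rows per group and returns a Series
value = df.groupby(['Age', 'Siblings']).size()
print(value)
```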
I am new to python, I have imported a file into jupyter as follows:
df = pd.read_csv(r"C:\Users\shalotte1\Documents\EBQS_INTEGRATEDQUOTEDOCUMENT\groceries.csv")
I am using the following code to determine the number of rows and columns in the data
df.shape()
However I am getting the following error:
TypeError: 'tuple' object is not callable
You want df.shape - this will return a tuple as in (n_rows, n_cols). You are then trying to call this tuple as though it were a function.
As you are new to python, I would recommend reading this page. It will make you aware of the other causes of this error, so you can solve it yourself if it appears again in the future.
https://careerkarma.com/blog/python-typeerror-tuple-object-is-not-callable/
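A minimal sketch of the distinction, with a toy DataFrame assumed:

```python
import pandas as pd

# Toy data (assumption)
df = pd.DataFrame({'item': ['bread', 'milk'], 'qty': [2, 1]})

print(df.shape)   # attribute access: a (n_rows, n_cols) tuple

try:
    df.shape()    # calling the tuple is what raises the error
except TypeError as e:
    print(e)      # 'tuple' object is not callable
```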