Pandas Get_Value throwing error: '[xxxx]' is an invalid key - python

I am trying to use DataFrame.Get_Value(Index, ColumnName) to get the value of a column, and it keeps throwing the following error:
"'[10004]' is an invalid key", where 10004 is the index value.
This is how the DataFrame looks:
I have successfully used get_value before. I don't know what's wrong with this DataFrame.

First, pandas.DataFrame.get_value is deprecated and was removed in pandas 1.0 (and the name is get_value, as opposed to Get_Value). It's better to use a non-deprecated accessor such as .loc or .at instead:
df.loc[10004, 'Column_Name']
# Or:
df.at[10004, 'Column_Name']
Your issue might be that 10004 is stored as a string instead of an integer. Try putting the index value in quotes (df.loc['10004', 'Column_Name']). You can check this easily by looking at df.index.dtype and seeing whether it returns dtype('O').
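A minimal sketch of that check, using a made-up two-row frame (the column name and values are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical frame whose index labels are strings, not integers.
df = pd.DataFrame({"Column_Name": [1.5, 2.5]}, index=["10004", "10005"])

# An object dtype (or a string dtype on newer pandas) means the labels
# are not integers.
print(df.index.dtype)

# The string label works; the integer 10004 would raise KeyError here.
value = df.at["10004", "Column_Name"]

# Alternatively, convert the index once and use integer labels afterwards.
df_int = df.copy()
df_int.index = df_int.index.astype(int)
value_int = df_int.at[10004, "Column_Name"]
```

Converting the index is probably the more convenient option if the rest of the code already uses integer labels.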

Related

.strip() with in-place solution not working

I'm trying to find a solution for stripping blank spaces from some strings in my DataFrame. I found this solution, where someone said this:
I agree with the other answers that there's no inplace parameter for
the strip function, as seen in the
documentation
for str.strip.
To add to that: I've found the str functions for pandas Series
usually used when selecting specific rows. Like
df[df['Name'].str.contains('69')]. I'd say this is a possible reason
that it doesn't have an inplace parameter -- it's not meant to be
completely "stand-alone" like rename or drop.
Also to add! I think a more pythonic solution is to use negative
indices instead:
data['Name'] = data['Name'].str.strip().str[-5:]
This way, we don't have to assume that there are 18 characters, and/or
we'll consistently get "last 5 characters" instead!
So, I have a list of DataFrames called 'dataframes'. In the first DataFrame (dataframes[0]), I have a column named 'cnj' with string values, some of them with a blank space at the end. For example:
Input:
dataframes[0]['cnj'][9]
Output:
'0100758-73.2019.5.01.0064 '
So, following the comment above, I did this:
Input:
dataframes[0]['cnj'] = dataframes[0]['cnj'].strip()
Then I get the following error:
AttributeError: 'Series' object has no attribute 'strip'
Since the solution given in the other topic worked, what am I doing wrong to get this error? It seemed to me it shouldn't work because it's a Series, but it should give the same result as the one mentioned above (data['Name'] = data['Name'].str.strip().str[-5:]), right?
Use
dataframes[0]['cnj']=dataframes[0]['cnj'].str.strip()
or better yet, store the dataframe in a variable first:
df0=dataframes[0]
df0['cnj']=df0['cnj'].str.strip()
The code in the solution you posted uses .str. :
data['Name'] = data['Name'].str.strip().str[-5:]
A pandas Series has no string or date manipulation methods of its own. These are exposed through the Series.str and Series.dt accessor objects.
The result of Series.str.strip() is a new Series, not a string. That's why .str[-5:] is needed to take the last 5 characters of each element, and that in turn results in a new Series. The expression is equivalent to:
temp_series=data['Name'].str.strip()
data['Name'] = temp_series.str[-5:]
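To see the difference on a small example, here is a sketch with a hypothetical two-element Series modelled on the value from the question:

```python
import pandas as pd

# Hypothetical data: the first value has a trailing space.
s = pd.Series(["0100758-73.2019.5.01.0064 ", "0100759-58.2019.5.01.0064"])

# s.strip() would raise AttributeError; .str applies strip to each element.
stripped = s.str.strip()

# .str[-5:] slices each element and again returns a new Series.
last_five = stripped.str[-5:]
```

Neither call modifies s in place, so the result has to be assigned back to the column if the change should persist.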
You could just apply a transformation function on the column values like this.
data["Name"] = data["Name"].apply(lambda x: str(x).strip()[-5:])
As I understand your question, you want the strings in a Series without the spaces on the right. Use str.rstrip(), which is available through the .str accessor on a Series (a DataFrame has no .str accessor, so apply it column by column).
Note: strip() is a method of plain Python strings, not of a Series, so the error you are getting is expected.
Refer to the pandas documentation for str.rstrip() and try it.
For str.strip(), refer to its documentation as well; it works for me.
In your case, assuming the dataframe column to be s, you can use the below code:
df[s].str.strip()

Getting an error when converting to float to get top 10 largest values

I am trying to use the nlargest function to return the top 10 values, using the code below:
df['roi'].astype(float).nlargest(3, 'roi')
But get an error of
ValueError: keep must be either "first", "last" or "all"
The roi column has object dtype, which is why I use astype(float), but I am still getting an error.
When I pass keep='all', keep='first' or keep='last' to nlargest, I get TypeError: nlargest() got multiple values for argument 'keep'
Thanks!
To use the method as you want, you must change your code to:
df.astype(float).nlargest(3, 'roi')
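This syntax, with the column name as the second argument, works only for a pandas.DataFrame, and df.astype(float) converts every column, so all columns must be convertible to float. If you select the column by its key, as in a dictionary, you are working with a pandas.Series, and the correct syntax is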
df['roi'].astype(float).nlargest(3)
The docs for both methods are here, for DataFrames, and here, for Series
For a one-liner you'll need to convert "roi" to a float type first, and then perform nlargest:
Passing a dictionary to .astype allows us to return the entire DataFrame making selective changes to specific columns' dtypes, and then we can perform .nlargest on that returned DataFrame (instead of just having a Series).
df.astype({"roi": float}).nlargest(3, columns="roi")
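Both forms can be compared side by side. This is a sketch on a hypothetical frame where "roi" holds numbers stored as strings and "name" is an extra text column added for illustration:

```python
import pandas as pd

# Hypothetical frame: "roi" has object dtype because its values are strings.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "roi": ["1.5", "10.0", "3.2", "7.1"]}
)

# Series form: no column argument, since a Series has only one column.
top_series = df["roi"].astype(float).nlargest(3)

# DataFrame form: convert only "roi", then rank whole rows by that column.
top_frame = df.astype({"roi": float}).nlargest(3, columns="roi")
```

The Series form returns only the roi values, while the DataFrame form keeps the other columns of the selected rows.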

Taking part of string in python error

I got an error Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')
For this code
newestdata = newestdf.assign(
idobject=newestdf.index.str.split('/').str[1].str.replace("-", "").str.extract('(\d+)', expand=False).astype(int))
I use it to take a part of a value like this:
OOOO-ASAS/INTEL-64646/OOOO-15445/PPPO-9
The error happens in one Python script, but in another script the same code works fine. Do you have any idea what the problem is?
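The problem is that the index is not all strings: some (or all) of its values are numeric, so the .str accessor is not available.
Cast the index to string as the first step:
newestdata = newestdf.assign(
idobject=newestdf.index.astype(str).str.split('/').str[1].str.replace("-", "").str.extract(r'(\d+)', expand=False).astype(int))
The change is the added .astype(str) right after newestdf.index (the regex is also written as a raw string).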
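The effect of the cast can be reproduced in isolation. The sketch below uses two hypothetical indexes, one purely numeric and one built from the sample value in the question:

```python
import pandas as pd

# A purely numeric index has no .str accessor.
numeric_index = pd.Index([64646, 15445])
try:
    numeric_index.str.split("/")
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# Casting to str first makes the accessor available.
digits = numeric_index.astype(str).str.extract(r"(\d+)", expand=False).astype(int)

# The full chain from the answer, applied to the sample value.
path_index = pd.Index(["OOOO-ASAS/INTEL-64646/OOOO-15445/PPPO-9"])
ids = (
    path_index.astype(str)
    .str.split("/")
    .str[1]
    .str.replace("-", "")
    .str.extract(r"(\d+)", expand=False)
    .astype(int)
)
```

Note that labels without a second "/"-separated part would produce missing values in this chain, and the final .astype(int) would then fail.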

Error passing Pandas Dataframe to Scikit Learn

I get the following error when passing a pandas dataframe to scikitlearn algorithms:
invalid literal for float(): 2.,3
How do I find the row or column with the problem in order to fix or eliminate it? Is there something like df[df.isnull().any(axis=1)] for a specific value (in my case I guess 2.,3)?
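If you know which column it is, you can use
df[df.your_column == '2.,3']
to get all rows where the specified column has a value of 2.,3.
The quotes are needed: 2.,3 is not a valid number, so it can only be stored as a string, and without the quotes Python would parse 2.,3 as the two items 2. and 3 rather than as one value.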
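If the offending value is not known in advance, one possible approach (not part of the answer above) is to let pd.to_numeric mark everything it cannot parse. A sketch, assuming a hypothetical column named your_column:

```python
import pandas as pd

# Hypothetical column with one malformed entry, as in the question.
df = pd.DataFrame({"your_column": ["1.0", "2.,3", "4.5"]})

# Values that cannot be parsed become NaN instead of raising an error.
parsed = pd.to_numeric(df["your_column"], errors="coerce")

# Rows that failed to parse but were not missing to begin with.
bad_rows = df[parsed.isna() & df["your_column"].notna()]
```

bad_rows then shows both the index labels and the raw text of the problem entries, which can be fixed or dropped before passing the data to scikit-learn.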

Pandas apply function issue

I have a dataframe (data) of numeric variables and I want to analyse the distribution of each column by using the Shapiro test from scipy.
from scipy import stats
data.apply(stats.shapiro, axis=0)
But I keep getting the following error message:
ValueError: ('could not convert string to float: M', u'occurred at index 0')
I've checked the documentation and it says the first argument of the apply function should be a function, which stats.shapiro is (as far as I'm aware).
What am I doing wrong, and how can I fix it?
Found the problem. There was a column of type object, which caused the error message above. Applying the function only to the numeric columns solved the issue.
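One way to restrict apply to the numeric columns is select_dtypes. The sketch below uses a hypothetical frame and a simple stand-in function (column_range) in place of stats.shapiro so that it runs without scipy:

```python
import pandas as pd

# Hypothetical data: "sex" is an object column that a numeric test cannot handle.
data = pd.DataFrame(
    {"height": [1.0, 2.0, 3.0], "sex": ["M", "F", "M"], "weight": [10, 20, 30]}
)

# Keep only the numeric columns.
numeric = data.select_dtypes(include="number")


def column_range(col):
    """Stand-in for stats.shapiro: any function that takes one column."""
    return col.max() - col.min()


# Applied column by column (axis=0), as in the question.
result = numeric.apply(column_range, axis=0)
```

With scipy installed, passing stats.shapiro instead of column_range should work the same way, since only numeric columns reach it.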
