I'm trying to sum an entire column by country, but when I use
my_df.groupby('COUNTRY').VALUES.mean()
It throws a
DataError: No numeric types to aggregate
And when I use
my_df.groupby('COUNTRY').VALUES.sum()
It produces implausibly large values that are far from realistic (perhaps by concatenating the values together as strings?)
Could it be that it interprets the values in the column as strings, or am I using the function the wrong way?
I'm trying to accomplish exactly what this guy is doing at 1:45 https://www.youtube.com/watch?v=qy0fDqoMJx8
i.e. the VALUES column contains integers that I want to sum per country.
The values in the column were interpreted as strings; this question explains how to convert the datatype:
Change data type of columns in Pandas
I understand you are trying to achieve a count by country, but it is not clear whether you want to count the countries themselves or sum based on another variable:
Try:
my_df['COUNTRY'].value_counts()
to count within the same column, or, if the sum is based on another variable:
my_df[['COUNTRY','other_variable']].groupby(['COUNTRY']).sum()
Your question is not clear; you should show your dataframe.
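A toy illustration of the count case (data made up), showing that value_counts gives per-country occurrence counts:

```python
import pandas as pd

# hypothetical frame with repeated countries
my_df = pd.DataFrame({'COUNTRY': ['SE', 'NO', 'SE', 'SE']})

# occurrences of each country within the column
counts = my_df['COUNTRY'].value_counts()
print(counts['SE'])  # 3
```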
You need to convert your VALUES series to a numeric type before performing any computations. For example, for conversion to integers:
# convert to integers, non-convertible values will become NaN
my_df['VALUES'] = pd.to_numeric(my_df['VALUES'], downcast='integer', errors='coerce')
# perform groupby as normal
grouped = my_df.groupby('COUNTRY')['VALUES'].mean()
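A minimal sketch of the problem and the fix (toy data, made-up country codes):

```python
import pandas as pd

# toy frame where VALUES was read in as strings
my_df = pd.DataFrame({
    'COUNTRY': ['SE', 'SE', 'NO'],
    'VALUES': ['10', '20', '5'],
})

# summing the raw strings would concatenate them ('1020' for SE)
# instead of adding; convert first, then aggregate
my_df['VALUES'] = pd.to_numeric(my_df['VALUES'], errors='coerce')
totals = my_df.groupby('COUNTRY')['VALUES'].sum()
print(totals['SE'])  # 30
```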
Related
So, I have this column
And, I wanted to convert into this column
I want to replace the zeros with numbers. How can I do that using pandas in Python?
Try with groupby
out = df.groupby(['col1','col2','col3'],as_index=False).max()
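A toy illustration of what that groupby does, assuming duplicate key rows where one copy holds 0 and another the real value (column names col1/col2/col3 taken from the answer, the data made up):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': ['x', 'x', 'y', 'y'],
    'col3': [1, 1, 2, 2],
    'val':  [0, 7, 0, 9],
})

# collapsing duplicate keys with max keeps the non-zero value per group
out = df.groupby(['col1', 'col2', 'col3'], as_index=False).max()
print(out['val'].tolist())  # [7, 9]
```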
I want to count how many different values exist in a pandas dataframe column and return the result as an INTEGER. I need to store that number in a variable so I can use it later.
I tried this:
count = pd.Series(table.column.nunique())
I get the expected result, but as pandas series not as integer, so I can't use it later in my function.
I've also looked for something that could convert it into a numeric type, but I haven't found anything.
Does this solve your issue?
table['column'].nunique()
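A quick sketch (toy data) showing that nunique on the Series already returns a plain int, so no pd.Series wrapper is needed:

```python
import pandas as pd

table = pd.DataFrame({'column': ['a', 'b', 'a', 'c']})

# nunique on the Series returns a plain int
count = table['column'].nunique()
print(count)  # 3
```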
I want to strip and separate values from one column into another column of a pandas dataframe. The current values look like
df['column']
14.535.00
14.535.00
14.535.00
I want to remove the 00 after the second dot (.) and store the results in another column,
df['new_column'], as int values so that I can perform arithmetic operations on them.
Edit 1: It seems apply is generally discouraged; a more accepted solution is to use a list comprehension.
df['new_column'] = [str(x).split('.')[-1] for x in df.iloc[:,0]]
DON'T DO WHAT'S BELOW
I think this is a good instance for using apply. You might not need the str call.
What this is doing is taking the values in your column (aka a Series) and applying a function to them. The function takes each item, makes it a string, splits on the period, and grabs the last value. We then store the results of all this into a new column.
df['new_column'] = df['column'].apply(lambda x: str(x).split('.')[-1])
should result in something like what you want
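For completeness, the same split can also be done with pandas' vectorized string methods, avoiding both apply and an explicit Python loop (column names as in the question, toy data):

```python
import pandas as pd

df = pd.DataFrame({'column': ['14.535.00', '14.535.00', '14.535.00']})

# vectorized: split each value on '.' and take the last piece
df['new_column'] = df['column'].astype(str).str.split('.').str[-1]
print(df['new_column'].tolist())  # ['00', '00', '00']
```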
When I want to retrieve the (i+1)th value from a column of a pandas DataFrame, I can write: df["column_name"].ix[i]
When I check the type of the above code, I get:
type(df["column_name"].ix[i]) #str
I want to write less lengthy code through subsetting by the index. So I write:
df[[j]].ix[i]
However, when I check the type, I get: pandas.core.series.Series
How do I rewrite this so the indexical subsetting produces a str?
The double subscripting does something different from what you seem to expect: it returns a DataFrame of the corresponding columns.
As far as I know, the shortest way to do what you're asking using column-row ordering is
df.iloc[:, j].ix[i]
(There's the shorter
df.icol(j).ix[i]
but it's deprecated.)
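Note that .ix itself has since been removed from pandas entirely; a sketch of the purely positional equivalent in current versions (toy data):

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y'], 'b': ['p', 'q']})

# purely positional: row i, column j, returns the scalar itself
i, j = 1, 0
val = df.iloc[i, j]
print(val, type(val))  # y <class 'str'>
```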
One way to do this is like so:
df.ix[i][j]
This is kind of funky though, because the first index is the row and the second is the column, which is the reverse of the usual pandas convention. It is more like matrix indexing than pandas indexing.
I have a huge dataframe, and I index it like so:
df.ix[<integer>]
Depending on the index, sometimes this will have only one row of values. Pandas automatically converts this to a Series, which, quite frankly, is annoying because I can't operate on it the same way I can a df.
How do I either:
1) Stop pandas from converting and keep it as a dataframe ?
OR
2) easily convert the resulting series back to a dataframe ?
pd.DataFrame(df.ix[<integer>]) does not work because it doesn't keep the original columns. It treats the <integer> as the column, and the columns as indices. Much appreciated.
You can do df.ix[[n]] to get a one-row dataframe of row n.
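The same list-selector trick works with the modern indexers too (.ix is gone from current pandas); a sketch with toy data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# a list selector keeps the result two-dimensional
row_df = df.iloc[[1]]  # one-row DataFrame, original columns intact
row_s = df.iloc[1]     # plain positional lookup gives a Series
print(type(row_df).__name__, list(row_df.columns))  # DataFrame ['a', 'b']
```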