Finding values index-wise - python

I have to compute the RMSE between two columns that have non-NaN values at different positions.
I have found the indices of the non-NaN values in the first column, and now I need to filter the values of the second column by those indices.
This is the code I used to find the indices:
b = np.argwhere(y.notnull().values).tolist()
Here y is the first column, and b stores the indices of its non-NaN values.
I have another column x, and I need to select the values of x at the indices in b and store them in another column.

If you're using pandas DataFrames, you can use iloc:
df[x].iloc[b]
You can get the underlying values with the values attribute:
df[x].iloc[b].values

Or, if you want a plain list:
print(df[column].iloc[b].values.tolist())
Note that np.argwhere returns a column vector, so b as built above is a list of one-element lists; flatten it first (e.g. np.argwhere(...).flatten()) before passing it to iloc.
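Putting the pieces together, here is a minimal sketch of the whole pipeline with made-up data (the column names y and x and the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns with NaNs at different positions.
df = pd.DataFrame({
    "y": [1.0, np.nan, 3.0, 4.0, np.nan],
    "x": [1.5, 2.0, np.nan, 3.5, 5.0],
})

# Indices of non-NaN values in y, flattened to a 1-D list.
b = np.argwhere(df["y"].notnull().values).flatten().tolist()

# Select the matching rows of x and y by position.
x_matched = df["x"].iloc[b]
y_matched = df["y"].iloc[b]

# RMSE over the positions where both columns are non-NaN.
mask = x_matched.notnull()
rmse = np.sqrt(((x_matched[mask] - y_matched[mask]) ** 2).mean())
print(rmse)  # 0.5 for this toy data
```

Here b is [0, 2, 3]; position 2 is then dropped again because x is NaN there, leaving two residuals of 0.5 each.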

Related

How to find the list of indices of NaN values of a given column of a dataframe in python [duplicate]

I want to find the indices of all rows with null values in a particular column of a pandas DataFrame. If A is one of the entries in df.columns, then I need to find the index of each row with a null value in A.
Supposing you need the indices as a list, one option would be:
df[df['A'].isnull()].index.tolist()
np.where(df['column_name'].isnull())[0]
np.where(Series_object) returns the indices of True occurrences in the column. So, you will be getting the indices where isnull() returned True.
The [0] is needed because np.where returns a tuple and you need to access the first element of the tuple to get the array of indices.
Similarly, if you want to get the indices of all non-null values in the column, you can run
np.where(df['column_name'].notnull())[0]
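Both approaches can be seen side by side on a small invented frame; note that .index.tolist() returns index labels while np.where returns integer positions, which coincide only under the default RangeIndex:

```python
import numpy as np
import pandas as pd

# Toy frame with NaNs in column A (hypothetical data).
df = pd.DataFrame({"A": [1.0, np.nan, 3.0, np.nan]})

null_idx = df[df["A"].isnull()].index.tolist()   # index labels -> [1, 3]
null_pos = np.where(df["A"].isnull())[0]         # positions    -> array([1, 3])
notnull_pos = np.where(df["A"].notnull())[0]     # positions    -> array([0, 2])
```

With a non-default index (e.g. string labels), only the np.where variants keep returning integer positions.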

Pandas: Sorting values using two column elements

How do I sort a DataFrame by two columns: first by the values of the 1st column, and only where those values are tied, by the values of the 2nd column?
Use sort_values() on the DataFrame:
df.sort_values(by=['col1', 'col2'])
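A quick sketch with invented data shows the tie-breaking behavior: rows with equal col1 are ordered by col2.

```python
import pandas as pd

# Hypothetical frame: ties in col1 are broken by col2.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [9, 4, 3, 8]})

out = df.sort_values(by=["col1", "col2"])
# col1: 1, 1, 2, 2  /  col2: 4, 8, 3, 9
```

Each column can also get its own direction via the ascending parameter, e.g. ascending=[True, False].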

Extracting values from pandas DataFrame using a pandas Series

I have a pandas Series that contains key-value pairs, where the key is the name of a column in my pandas DataFrame and the value is an index in that column of the DataFrame.
For example, the Series maps column names to row indices (e.g. 'A' -> 12), and the DataFrame contains those columns. (Images of the Series and the DataFrame were attached in the original post.)
Therefore, from my DataFrame I want to extract the value at index 12 for 'A', which is 435.81. I want to put all these values into another Series, something like {'A': 435.81, 'AAP': 468.97, ...}.
I think this indexing is what you're looking for.
pd.Series(np.diag(df.loc[ser,ser.axes[0]]), index=df.columns)
df.loc allows you to index by label. The rows come from the values of ser (the first positional argument to df.loc), and the columns come from the labels of ser (ser.axes[0], or equivalently ser.index). The values you want lie along the main diagonal of the resulting square frame, so take just the diagonal and associate it with the column labels.
The indexing above only works if your DataFrame uses integer row indices, or if the dtype of your Series values matches the DataFrame's row index. If your DataFrame has non-integer row indices but you still want to select rows by integer position, use the following (note, however, that every position in your Series must be within the DataFrame's range, which is not the case with 'AAL' being 1758 when there are only 12 rows, for example):
pd.Series(np.diag(df.iloc[ser,:].loc[:,ser.axes[0]]), index=df.columns)
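The diagonal trick can be sketched on a tiny invented example (the column names and values here are hypothetical, not the poster's data):

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: column name -> row index to read from.
ser = pd.Series({"A": 2, "B": 0})
df = pd.DataFrame({"A": [10.0, 11.0, 12.0], "B": [20.0, 21.0, 22.0]})

# df.loc[ser, ser.index] builds a len(ser) x len(ser) frame;
# the wanted (row, column) pairs sit on its main diagonal.
out = pd.Series(np.diag(df.loc[ser, ser.index]), index=ser.index)
# out -> A: 12.0 (row 2 of A), B: 20.0 (row 0 of B)
```

An equivalent, loop-based alternative is pd.Series({c: df.loc[i, c] for c, i in ser.items()}), which avoids building the intermediate square frame.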


pandas unique values multiple columns different dtypes

Similar to pandas unique values multiple columns I want to count the number of unique values per column. However, as the dtypes differ I get the following error:
The data frame looks like
A small[['TARGET', 'title']].apply(pd.Series.describe) gives me a result, but only for the category-type columns, and I am unsure how to filter the output for only the row with the unique count per column.
Use apply and np.unique to grab the unique values in each column and take its size:
small[['TARGET','title']].apply(lambda x: np.unique(x).size)
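A minimal sketch with invented data, assuming one categorical and one object column as in the question (pandas also offers df.nunique() as a built-in for the same count):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes: categorical and object.
small = pd.DataFrame({
    "TARGET": pd.Categorical(["a", "b", "a", "a"]),
    "title": ["x", "x", "y", "z"],
})

# apply passes each column as a Series; np.unique works per column
# even though the dtypes differ.
counts = small[["TARGET", "title"]].apply(lambda x: np.unique(x).size)
# counts -> TARGET: 2, title: 3
```

The same numbers come from small[["TARGET", "title"]].nunique(), which additionally skips NaN by default.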
