I have a DataFrame with multiple columns and multiple rows. I am trying to find the column that contains the entry 'some_string'. I managed to do this with
col = df.columns[df.isin(['some_string']).any()]
I would like to have col as a string, but instead it is of the following type:
print(col)
Index(['col_N'], dtype='object')
So how can I get just 'col_N' returned? I just can't find an answer to that! Thanks.
You can treat your output as a list. If you have only one match, you can ask for the first element:
print(col[0])
If you have one or more matches and you want to print them all, you can convert it to a list:
print(list(col))
or you can unpack the values of col directly into print:
print(*col)
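A minimal self-contained sketch of the options above, assuming a small made-up DataFrame (the column names col_A and col_N are hypothetical). It shows that the boolean mask selects an Index, and that indexing or list conversion gives plain strings:

```python
import pandas as pd

# Hypothetical sample data: only 'col_N' contains 'some_string'
df = pd.DataFrame({
    "col_A": ["x", "y"],
    "col_N": ["some_string", "z"],
})

# Boolean Series indexed by column name: True where the column has a match
mask = df.isin(["some_string"]).any()
col = df.columns[mask]      # still an Index, e.g. Index(['col_N'], ...)

first_match = col[0]        # a plain str when exactly one column matches
all_matches = list(col)     # a list of str for one or more matches
print(*col)                 # prints the matching names separated by spaces
```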
I think converting to a list (type casting) will help:
list_of_columns = list(df.columns)
Related
df.dropna(subset=[column_name], inplace=True)
I have a DataFrame with some missing values in a column (column_name). For example, one value is the empty string, ''. But the code above doesn't drop the rows with such empty values.
Isn't this the right way to do it?
The code does not drop those rows because the values are not NaN; they are empty strings. For those you would have to do something like:
rows_to_drop = df[df[column_name]==''].index
df.drop(rows_to_drop, inplace=True)
Alternatively, something like this would also work:
df = df.loc[df[column_name]!='',:]
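A different route, sketched here as an assumption rather than what the answer above uses: convert empty strings to real NaN first, so that the original dropna call handles both kinds of missing value in one pass. The column name "name" and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd

column_name = "name"  # hypothetical column name
df = pd.DataFrame({"name": ["a", "", None, "b"], "value": [1, 2, 3, 4]})

# dropna() only removes real missing values (NaN/None), not '',
# so turn empty strings into NaN first, then drop.
df[column_name] = df[column_name].replace("", np.nan)
df = df.dropna(subset=[column_name])
```

This keeps a single drop step, which can be convenient when a column mixes None, NaN and '' values.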
I have a pandas DataFrame, and when I try to access its columns (like df[["a"]]) it is not possible because
the columns are defined as an "Index" object (pandas.core.indexes.base.Index), i.e. Index(['col2','col2'], dtype='object').
I tried to convert it with something like df.columns = df.columns.tolist() and also df.columns = [str(col) for col in df.columns],
but the columns remained an Index object.
What I want is for df.columns to return a list object.
What can I do?
columns is an attribute, not a method, so it is not callable; you need to remove the parentheses ():
df.columns will give you the column names as an Index object.
list(df.columns) will give you the column names as a list.
In your example, list(ss.columns) will return a list of column names.
Try this:
df.columns.values.tolist()
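Since you were trying to convert it using this approach: df.columns.tolist() also works on its own, but assigning the result back to df.columns turns it into an Index again, so use the list directly instead of reassigning it.
You have to wrap it in the list constructor to use it like a list, i.e. list(ss.columns):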
list(ss.columns)
Hope this works!
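A small sketch of why the reassignment attempts in the question don't help, assuming a toy two-column DataFrame: pandas stores whatever you assign to df.columns as an Index, so the conversions have to produce a separate list instead:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Assigning a plain list does not change the type: pandas stores it as an Index again
df.columns = ["a", "b"]
print(type(df.columns))

# To get a real Python list, convert the labels into a separate variable
cols_list = list(df.columns)               # list constructor
cols_tolist = df.columns.tolist()          # Index.tolist()
cols_values = df.columns.values.tolist()   # via the underlying array

# Selecting a column subset works directly, no conversion needed
subset = df[["a"]]
```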
I'm aware that you can use
df1 = df1[df1['Computer Name'] != 'someNameToBeDropped']
to drop the rows containing a given string.
What if I wanted to do it the other way around? Say, dropping everything except the values I have in a list of strings.
Is there a simple hack I haven't noticed?
Try this to keep only the rows whose value in column is in the given list:
df = df[df[column].isin(list_of_strings)]
Additionally, to exclude what's in the list:
df = df[~df[column].isin(list_of_strings)]
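A runnable sketch of both filters, assuming hypothetical computer names; the only difference between keeping and dropping the listed values is the ~ (logical NOT) applied to the isin() mask:

```python
import pandas as pd

df1 = pd.DataFrame({"Computer Name": ["pc1", "pc2", "pc3", "pc4"]})
list_of_strings = ["pc1", "pc3"]  # hypothetical names

# Keep only the rows whose name is in the list
kept = df1[df1["Computer Name"].isin(list_of_strings)]

# Drop the rows whose name is in the list (negate the mask with ~)
dropped = df1[~df1["Computer Name"].isin(list_of_strings)]
```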
I have a DataFrame 'df' with a string column. I was trying to remove a list of special values from this column.
For example, if the value in column 'number' is onE1, I want it changed to 1; if it is FOur4, I want it changed to 4.
I used the following code:
for i in ['onE','TwO','ThRee', 'FOur']:
    print(i)
    df['new_number'] = df['number'].str.replace(i,'')
Although print(i) shows that i goes through the list of strings, the column 'new_number' only has 'FOur' removed from column 'number'; the other strings 'onE', 'TwO' and 'ThRee' are still in 'new_number'. That is, onE1 is still onE1, but FOur4 did change to 4 in 'new_number'.
So what is wrong with this piece of code?
To get the numbers from the strings in the DataFrame, you can use this:
df['new_number'] = df['number'].apply(lambda s: ''.join(ch for ch in s if ch.isdigit()))
I found a similar post to this question
pandas replace (erase) different characters from strings
We can use a regex to solve this issue:
df['new_number'] = df['number'].str.replace('onE|TwO|ThRee|FOur', '', regex=True)
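The loop in the question fails because every iteration starts again from the untouched df['number'], so each assignment overwrites the previous one and only the last replacement ('FOur') survives. A sketch of the fixed loop next to the regex version, assuming hypothetical sample values; re.escape is an extra safeguard for words that might contain regex metacharacters:

```python
import re

import pandas as pd

df = pd.DataFrame({"number": ["onE1", "TwO2", "ThRee3", "FOur4"]})
words = ["onE", "TwO", "ThRee", "FOur"]

# Fixed loop: copy the column once, then keep replacing on the accumulated result
df["new_number"] = df["number"]
for w in words:
    df["new_number"] = df["new_number"].str.replace(w, "", regex=False)

# Single regex alternation, as in the answer above
pattern = "|".join(map(re.escape, words))
df["new_number_regex"] = df["number"].str.replace(pattern, "", regex=True)
```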
I have a small problem: I have a column in my DataFrame with multiple rows, and each row holds one or more values, each starting with the letter 'M' followed by 3 digits. If there is more than one value, they are separated by a comma.
I would like to print out a view of the DataFrame featuring only the rows where that column holds values I specify (e.g. any item from the list ['M111', 'M222']).
I have started to build my boolean mask in the following way:
df[df['Column'].apply(lambda x: x.split(', ').isin(['M111', 'M222']))]
In my mind, .apply() with .split() inside it first converts the 'Column' value in each row into a list of one or more values, and then the .isin() method checks whether any of the items in each row's list are in the list of specified values ['M111', 'M222'].
In practice, however, instead of getting the desired view of the DataFrame, I get the error
TypeError: unhashable type: 'list'
What am I doing wrong?
Kind regards,
Greem
I think you need:
df2 = df[df['Column'].str.contains('|'.join(['M111', 'M222']))]
isin() is a method of pandas objects (such as Series), but split() returns a plain Python list. Wrapping split() in a Series will work:
import pandas as pd

# sample data
data = {'Column': ['M111, M000', 'M333, M444']}
df = pd.DataFrame(data)
print(df)
       Column
0  M111, M000
1  M333, M444
Now wrap split() in a Series.
Note that isin() will return a boolean Series, one value for each element coming out of split(). You want to know "whether or not any of item in list...are in the list of specified values", so add any() to your apply function.
df[df['Column'].apply(lambda x: pd.Series(x.split(', ')).isin(['M111', 'M222']).any())]
Output:
       Column
0  M111, M000
As others have pointed out, there are simpler ways to go about achieving your end goal. But this is how to resolve the specific issue you're encountering with isin().
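Two other ways to get the same mask, sketched under the assumption that values are always separated by ", " (the sample data and the wanted set are hypothetical). The first tests each row's tokens against a Python set; the second splits into lists, explodes to one token per row, checks membership, and collapses back to one boolean per original row:

```python
import pandas as pd

df = pd.DataFrame({"Column": ["M111, M000", "M333, M444", "M222"]})
wanted = {"M111", "M222"}

# Per row: True if any token overlaps with the wanted set
mask = df["Column"].apply(lambda s: not wanted.isdisjoint(s.split(", ")))
result = df[mask]

# Vectorised: explode keeps the original index, so groupby(level=0).any()
# yields one boolean per original row
mask2 = (
    df["Column"].str.split(", ").explode().isin(list(wanted)).groupby(level=0).any()
)
result2 = df[mask2]
```

Unlike str.contains, both approaches compare whole tokens, so a value such as 'M1110' would not count as a match for 'M111'.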