Finding only one word in a pandas column [duplicate] - python

I want to filter a pandas data frame based on exact match of a string.
I have a data frame as below
df1 = pd.DataFrame({'vals': [1, 2, 3, 4, 5], 'ids': ['aball', 'bball', 'cnut', 'fball', 'aballl']})
I want to filter out all the rows except the row that has 'aball'. As you can see, I also have an entry with ids == 'aballl', and I want that filtered out as well. Hence the code below does not work:
df1[df1['ids'].str.contains("aball")]
Even str.match does not work:
df1[df1['ids'].str.match("aball")]
Any help would be greatly appreciated.

Keeping it simple, this should work:
df1[df1['ids'] == "aball"]

You can try this:
df1[~(df1['ids'] == "aball")]
Essentially it finds all entries matching "aball" and then negates the mask, keeping every other row.
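To make the difference concrete, here is a minimal sketch (reusing the df1 from the question) comparing the substring test with the exact-equality and anchored-regex variants:

```python
import pandas as pd

df1 = pd.DataFrame({'vals': [1, 2, 3, 4, 5],
                    'ids': ['aball', 'bball', 'cnut', 'fball', 'aballl']})

# Substring test: 'aballl' also contains 'aball', so two rows survive.
contains = df1[df1['ids'].str.contains('aball')]

# Exact equality: only the row whose whole value equals 'aball'.
exact = df1[df1['ids'] == 'aball']

# A regex anchored at both ends (pandas >= 1.1) behaves the same way.
anchored = df1[df1['ids'].str.fullmatch('aball')]

print(len(contains), len(exact), len(anchored))  # 2 1 1
```

Note that str.match only anchors at the start of the string, which is why it also kept 'aballl' in the question.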

Related

Get column value of row with condition based on another column

I keep writing code like this when I want a specific column of a row, where I select the row based on another column.
my_col = "Value I am looking for"
df.loc[df["Primary Key"] == "blablablublablablalblalblallaaaabblablalblabla"].iloc[0][my_col]
I don't know why, but it seems weird. Is there a more beautiful solution to this?
It would be helpful to have a complete minimal working example, since it is not clear what your data structure looks like. You could use the example given here:
import pandas as pd
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
If you are then trying to e.g. select the viper-row based on its max_speed, and then obtain its shield-value like so:
my_col = "shield"
df.loc[df["max_speed"] == 4].iloc[0][my_col]
then I guess that is the way to do that - not a lot of fat in that command.
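As a possible alternative (a sketch, assuming exactly one row matches the condition), passing the boolean mask and the column label to a single .loc call and then calling .squeeze() avoids the separate .iloc[0] step:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Boolean mask and column label in one .loc call return a Series of
# matching shield values; .squeeze() reduces it to a scalar when
# exactly one row matches.
value = df.loc[df['max_speed'] == 4, 'shield'].squeeze()
print(value)  # 5
```

If more than one row can match, .squeeze() returns a Series instead of a scalar, so this only shortens the code when the key is unique.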

Changing column values for a value in an adjacent column in the same dataframe using Python

I am quite new to Python programming.
I am working with the following dataframe:
[screenshot: dataframe before replacement]
Note that in column "FBgn" there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with the FBgn values provided in the adjacent column called "## FlyBase_FBgn", but keep the existing FBgn values in column "FBgn". Keep in mind that I am showing only a portion of the dataframe (it actually has 1432 rows). How would I do that? I tried the replace() method from Pandas, but it did not work.
This is actually what I would like to have:
[screenshot: dataframe after replacement]
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
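A minimal sketch of that one-liner on made-up data (the column names follow the question; the FBtr/FBgn identifier values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'FBgn': ['FBtr0089345', 'FBgn0031208', 'FBtr0300689'],
    '## FlyBase_FBgn': ['FBgn0000001', 'FBgn0031208', 'FBgn0000002'],
})

# Rows whose "FBgn" value is actually an FBtr identifier are overwritten
# with the value from the adjacent column; genuine FBgn rows are untouched.
mask = df['FBgn'].str.contains('FBtr')
df.loc[mask, 'FBgn'] = df['## FlyBase_FBgn']

print(df['FBgn'].tolist())  # ['FBgn0000001', 'FBgn0031208', 'FBgn0000002']
```

The assignment aligns by index, so only the masked rows on the left-hand side receive values from the right-hand column.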
Welcome to Stack Overflow. Next time, please provide more info, including your code; it is always helpful.
Please see the code below; I think you need something similar:
import pandas as pd

# dict1 just recreates a dataframe shaped like yours
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'],
         "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe
print(df)

# function to replace the values
def replace_values(df):
    for i in range(len(df)):
        if 'tr' in df['FBgn'][i]:
            # .loc avoids pandas' chained-assignment warning
            df.loc[i, 'FBgn'] = df['FBtr'][i]
    return df

df = replace_values(df)
# print the new df
print(df)

Python.pandas: how to select rows where objects start with letters 'PL'

I have a specific problem with pandas: I need to select rows in a dataframe which start with specific letters.
Details: I've imported my data into a dataframe and selected the columns that I need. I've also narrowed it down to the row index I need. Now I also need to select rows in another column where the values START with the letters 'pl'.
Is there any solution to select row only based on first two characters in it?
I was thinking about
pl = df['Code'] == pl*
but it won't work due to row indexing. Advice appreciated!
Use startswith for this:
df = df[df['Code'].str.startswith('pl')]
Fully reproducible example for those who want to try it.
import pandas as pd
df = pd.DataFrame([["plusieurs", 1], ["toi", 2], ["plutot", 3]])
df.columns = ["Code", "number"]
df = df[df.Code.str.startswith("pl")] # alternative is df = df[df["Code"].str.startswith("pl")]
If you use a string method on the Series that should return you a true/false result. You can then use that as a filter combined with .loc to create your data subset.
new_df = df.loc[df['Code'].str.startswith('pl')].copy()
The condition is just a filter that you then apply to the dataframe. As the filter, you may use the method Series.str.startswith and do:
df_pl = df[df['Code'].str.startswith('pl')]
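Since the title mentions 'PL' but the body uses 'pl', a case-insensitive variant may also be useful. A sketch with invented data; startswith itself has no case flag, so either normalise the case first or use str.match with case=False:

```python
import pandas as pd

df = pd.DataFrame({'Code': ['PL123', 'pl456', 'XX789'], 'number': [1, 2, 3]})

# Option 1: normalise case before testing the prefix.
lowered = df[df['Code'].str.lower().str.startswith('pl')]

# Option 2: str.match is a regex anchored at the start and takes case=False.
either = df[df['Code'].str.match('pl', case=False)]

print(len(lowered), len(either))  # 2 2
```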

Pass a list to str.contains - Pandas

I have a pandas-related question: I need to filter a column (approx. 40k entries) based on substrings included (or not) in the column. Each of the entries in the column is basically a very long list of attributes (text) which I need to be able to filter individually. This line of code works, but it is not scalable (I have hundreds of attributes I have to filter for):
df[df['Product Lev 1'].str.contains('W1 Rough wood', na=False) & df['Product Lev 1'].str.contains('W1.2', na=False)]
Is there a possibility to put all the items I have to filter on into a list and pass that? Or any similar solution?
THANK YOU!
Like this:
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['aaaaDB', 'bbbbbbCB', 'cccccEB', 'ddddddUB']}
df = pd.DataFrame.from_dict(data)
lst = ['DB', 'CB']  # replace with your list
rstr = '|'.join(lst)  # build a regex that matches any item in the list
df[df['col_2'].str.upper().str.contains(rstr)]
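Note that '|'.join gives OR semantics (a row matches if any substring occurs). The question's original chained & requires every substring, which can be generalised by combining the masks in a loop; a sketch, using invented attribute strings, with re.escape guarding regex metacharacters like the dot in 'W1.2':

```python
import pandas as pd
import re

df = pd.DataFrame({'Product Lev 1': ['W1 Rough wood; W1.2 planks',
                                     'W1 Rough wood only',
                                     'W2 metal; W1.2 planks']})

substrings = ['W1 Rough wood', 'W1.2']

# AND semantics: keep rows containing every substring in the list.
mask = pd.Series(True, index=df.index)
for s in substrings:
    mask &= df['Product Lev 1'].str.contains(re.escape(s), na=False)

print(df[mask].index.tolist())  # [0]
```

Only the first row contains both substrings, so it is the only one kept.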

