I have a column in a dataframe whose values are either one hashmark followed by a string or two hashmarks followed by a string. I want to eliminate the rows that contain only one hashmark.
df[df["column name"].str.contains("#") == False]
I've tried using the code above but it erased the entire column. I hoped that it would erase only the rows including only one hashmark. I do not know what to do.
Can you try this:
df['len'] = df['column name'].str.count('#')  # number of '#' characters in each cell
df = df[df['len'] > 1]
# or in one line
df = df[df['column name'].str.count('#') > 1]
If every value has at least one '#', so each one is either ## or #:
df[df["column name"].str.contains("##") == False]
The code above keeps the rows with a single #.
df[df["column name"].str.contains("##") == True]
The code above drops the single-# rows and keeps the ## ones.
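To make the two suggestions above concrete, here is a minimal sketch on made-up data. The column name "tag" and the sample values are assumptions for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; "tag" is an assumed column name.
df = pd.DataFrame({"tag": ["#alpha", "##beta", "#gamma", "##delta"]})

# Count the '#' characters in each cell and keep rows with more than one.
two_hash = df[df["tag"].str.count("#") > 1]

# Equivalent filter for this data: keep rows containing the literal "##".
two_hash_alt = df[df["tag"].str.contains("##", regex=False)]

print(two_hash["tag"].tolist())
```

Both filters return the same rows here because every value has either one or two hashmarks. The original attempt with contains("#") == False returns nothing because every row contains at least one '#'.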
I need to have my column be only integers, but unfortunately the CSV I need to use has the letters T and Z interspersed.
For example:
2022-09-24T00:30:49Z
How would I go about removing these letters for only this one column? I have tried looking elsewhere, but I can't find anything specifically relating to this.
df["column"]= df["column"].replace(r"T|Z", "", regex = True)
def custom_func(x):
    # Strip the letters T and Z from strings; leave other values untouched.
    if isinstance(x, str):
        return x.translate(str.maketrans('', '', 'TZ'))
    return x
df["col"] = df["col"].apply(custom_func)
You can use the applymap method (renamed to DataFrame.map in pandas 2.1) to apply it to every individual cell of the dataframe if you wish.
This question should at least get you started in the right direction: Remove cell in a CSV file if a certain value is there
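For the timestamp question above, a small sketch of the regex approach might look like this. The second sample value is made up, and the digits-only step is an assumption about what "only integers" means in the question.

```python
import pandas as pd

# The first value is from the question; the second is invented for illustration.
df = pd.DataFrame({"column": ["2022-09-24T00:30:49Z", "2022-09-25T11:05:00Z"]})

# Remove every 'T' and 'Z' from the strings in this one column.
stripped = df["column"].str.replace(r"[TZ]", "", regex=True)

# Assumption: if the column must hold integers, drop every non-digit
# character as well and then convert.
as_int = df["column"].str.replace(r"\D", "", regex=True).astype("int64")

print(stripped.tolist())
print(as_int.tolist())
```

Removing only T and Z still leaves the dashes and colons in place, so the result is a string, not an integer. Whether stripping all non-digits is appropriate depends on how the column is used afterwards.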
How do I replace the whole cell value in a column with something else if the cell contains a number, or contains a specific thing like a comma?
Say, for example, a cell has a comma, meaning it holds more than one thing; I want it to be replaced by text like "ENM".
For a cell with a number value, I want to replace it with 'UNM'.
As you have not provided examples of what your expected and current output look like, I'm making some assumptions below. What it seems like you're trying to do is iterate through every value in a column and if the value meets certain conditions, change it to something else.
Just a general pointer. Iterating through dataframes requires some important considerations for larger sizes. Read through this answer for more insight.
Start by defining a function you want to use to check the value:
def has_comma(value):
    # Non-string values (e.g. numbers) cannot contain a comma.
    return isinstance(value, str) and ',' in value
Then use the pandas.DataFrame.replace method to make the change.
for i in df['column_name']:
    if has_comma(i):
        df['column_name'] = df['column_name'].replace([i], 'ENM')
    else:
        df['column_name'] = df['column_name'].replace([i], 'UNM')
Say you have a column, i.e. pandas Series called col
The following code can be used to map values with comma to "ENM" as per your example
col.mask(col.str.contains(','), "ENM")
You can overwrite your original column with this result if that's what you want to do. This approach will be much faster than looping through each element.
For mapping floats to "UNM" as per your example the following would work
col.mask(col.apply(isinstance, args=(float,)), "UNM")
Hopefully you get the idea.
See https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html for more info on masking
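Putting the two mask calls together, a rough sketch on made-up data could look like the following. The sample values are assumptions, and na=False is added so that non-string cells count as "no comma".

```python
import pandas as pd

# Hypothetical column mixing strings and a float.
col = pd.Series(["a,b", "c", 3.5, "d,e,f"])

# True where the cell is a string containing a comma.
has_comma = col.str.contains(",", regex=False, na=False)

# True where the cell is a Python float.
is_float = col.map(lambda value: isinstance(value, float))

# Replace the flagged cells; everything else is left unchanged.
result = col.mask(has_comma, "ENM").mask(is_float, "UNM")

print(result.tolist())
```

Cells that match neither condition, like "c" here, keep their original value.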
in a given dataframe in pandas, is there a way to see all the Booleans present in filt in the code below:
filt = dataframe['tag1'] =='ABC'
filt
TLDR
It's possible. I think you should use indexing, it's extensively described here. To be more specific you can use boolean indexing.
Code should look like this
filt = df[df.loc[:,"tag1"] == 'ABC']
Now what actually happens here
df.loc[:,"tag1"] returns all rows (the : character) but limits the columns to just "tag1". Next, df.loc[:,"tag1"] == 'ABC' compares the returned rows with the value "ABC"; as a result, a Series of True/False values is created, where True means the row was equal to "ABC". Now the grand finale. Whenever you pass such a set of logical values to a dataframe, they are treated as indicators of whether or not to include each row in the result. So if the value at position 0 is True, row 0 is included in the result.
I understand it's hard to wrap one's head around this concept, but once you get it, it's super useful. The best way is to just play around with the iloc[] and loc[] indexers.
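As a small illustration of the explanation above, here is a sketch on made-up data. The "value" column and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical data; only "tag1" comes from the question.
df = pd.DataFrame({"tag1": ["ABC", "XYZ", "ABC"], "value": [1, 2, 3]})

# The comparison yields one boolean per row.
filt = df["tag1"] == "ABC"

# To see every boolean, convert the Series to a plain list
# (or use filt.to_string() for a long Series).
print(filt.tolist())

# Passing the boolean Series back to the dataframe keeps the True rows.
selected = df[filt]
print(selected)
```

Here filt itself is the set of booleans the question asks about, and df[filt] is the filtered dataframe.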
I am trying to make a new column depending on different criteria. I want to add characters to the string dependent on the starting characters of the column.
An example of the data:
RH~111~header~120~~~~~~~ball
RL~111~detailed~12~~~~~hat
RA~111~account~13~~~~~~~~~car
I want to change those starting with RH and RL, but not the ones starting with RA. So I want it to look like:
RH~111~header~120~~1~~~~~ball
RL~111~detailed~12~~cancel~~~ball
RA~111~account~12~~~~~~~~~ball
I have attempted to use str split, but it doesn't seem to actually be splitting the string up
(np.where(~df['1'].str.startswith('RH'),
df['1'].str.split('~').str[5],
df['1']))
This is referencing the correct column but not splitting it where I thought it would, and I can't seem to get further than this. I feel like I am not really going about this the right way.
Define a function to replace the element at position pos in the list arr:
def repl(arr, pos):
    arr[pos] = '1' if arr[0] == 'RH' else 'cancel'
    return '~'.join(arr)
Then perform the substitution:
df[0] = df[0].mask(df[0].str.match('^R[HL]'),
                   df[0].str.split('~').apply(repl, pos=5))
Details:
str.match ensures that only the matching elements (those starting with RH or RL) are substituted.
df[0].str.split('~') splits the column of strings into a column of lists (one list per string).
apply(repl, pos=5) computes the value to substitute.
I assumed that you have a DataFrame with a single column, so its column
name is 0 (an integer), instead of '1' (a string).
If this is not the case, change the column name in the code above.
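Run against the three sample strings from the question, the approach above can be sketched end to end as follows. The single integer column name 0 is the same assumption the answer makes.

```python
import pandas as pd

# The three sample rows from the question, in a single column named 0.
df = pd.DataFrame({0: [
    "RH~111~header~120~~~~~~~ball",
    "RL~111~detailed~12~~~~~hat",
    "RA~111~account~13~~~~~~~~~car",
]})

def repl(arr, pos):
    # Overwrite the field at index pos depending on the record type.
    arr[pos] = "1" if arr[0] == "RH" else "cancel"
    return "~".join(arr)

# Only rows starting with RH or RL are replaced; RA rows keep their value.
is_target = df[0].str.match("R[HL]")
df[0] = df[0].mask(is_target, df[0].str.split("~").apply(repl, pos=5))

print(df[0].tolist())
```

Note that repl is computed for every row, including RA rows, but mask only takes the new value where is_target is True.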
I want to extract last two field values from a variable of varying length. For example, consider the three values below:
fe80::e590:1001:7d11:1c7e
ff02::1:ff1f:fb6
fe80::7cbe:e61:f5ab:e62 ff02::1:ff1f:fb6
These three lines are of variable lengths. I want to extract only the last two field values if I split each line by the delimiter :
That is, from the three lines, I want:
7d11, 1c7e
ff1f, fb6
ff1f, fb6
Can this be done using split()? I am not getting any ideas.
If s is the string containing the IPv6 address, use
s.split(":")[-2:]
to get the last two components. The split() method will return a list of all components, and the [-2:] will slice this list to return only the last two elements.
You can use str.rsplit() to split from the right:
>>> ipaddress = 'fe80::e590:1001:7d11:1c7e'
>>> ipaddress.rsplit(':', 2) # splits at most 2 times from the right
['fe80::e590:1001', '7d11', '1c7e']
This avoids the unnecessary splitting of the first part of the address.
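The two answers can be compared side by side with a short sketch. Note that rsplit(':', 2) returns three items, so the last two still need to be sliced off.

```python
# Two of the sample addresses from the question.
addresses = [
    "fe80::e590:1001:7d11:1c7e",
    "ff02::1:ff1f:fb6",
]

# Split on every colon, then keep the last two components.
via_split = [address.split(":")[-2:] for address in addresses]

# Split at most twice from the right, then drop the untouched prefix.
via_rsplit = [address.rsplit(":", 2)[-2:] for address in addresses]

print(via_split)
print(via_rsplit)
```

Both approaches give the same pairs; rsplit simply does less work on the leading part of each address.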