Unable to remove certain symbols from a string? - python

import re
import pandas as pd
# Loading of the .csv data from the Happy Planet Index and WHO, defining and clearing the data for RDF conversion.
hsi_data = pd.read_csv("HPI_Main.csv", sep=';')
hsi_data = hsi_data.replace(to_replace=[" ", "%"], value="", regex=True)
hsi_data = hsi_data.replace(to_replace=",", value=".", regex=True)
hsi_data = hsi_data.fillna("unknown")
pd.set_option("display.max_rows", None, "display.max_columns", None)
hsi_data.columns = hsi_data.columns.str.replace(' ', '_')
for x in hsi_data["GDP/capita"]:
    re.sub(r'$', ' ', x)
    print(x)
The whole point is to remove the $ sign from the data in GDP/capita and convert it into an integer. However, nothing seems to remove the symbol: not re.sub, replace, or remove. It's as if it isn't being detected?

I looked at the Happy Planet data. The problem with your re replacement is that $ is a special character in a regular expression (it matches the end of the string), so it must be escaped. Note also that re.sub returns a new string, so the result has to be assigned. This works:
x = " $67,646 "
z = re.sub("\\$", " ", x)
print(z)
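To see why the unescaped pattern appears to do nothing, here is a minimal sketch using only the standard library. The sample value is the one from the snippet above; the variable names are just illustrative.

```python
import re

x = " $67,646 "

# Unescaped: '$' is an anchor that matches the empty position at the end of
# the string, so no dollar sign is removed and the string comes back unchanged.
unescaped = re.sub(r'$', '', x)

# Escaped, either by hand or with re.escape: the literal dollar sign is matched.
escaped = re.sub(r'\$', '', x)
also_escaped = re.sub(re.escape('$'), '', x)

# For a fixed character, plain str.replace needs no escaping at all.
replaced = x.replace('$', '')

print(repr(unescaped))  # ' $67,646 '
print(repr(escaped))    # ' 67,646 '
print(repr(replaced))   # ' 67,646 '
```

In every case the result is a new string, so it has to be stored somewhere to have any effect.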

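If you want to remove the $ sign and convert the value to an integer, you could replace the last 3 lines of the above code with the following. Passing regex=False makes pandas treat $ as a literal character, and astype(int) assumes the remaining values are plain digit strings:
hsi_data['GDP/capita'] = hsi_data['GDP/capita'].str.replace('$', '', regex=False).astype(int)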
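For a single value, the clean-up and conversion can be sketched in plain Python. The helper name gdp_to_int is hypothetical, and the sketch assumes the comma is a thousands separator, as in the sample value " $67,646 ".

```python
def gdp_to_int(raw):
    """Turn a string such as ' $67,646 ' into the integer 67646."""
    # Drop surrounding whitespace, the dollar sign, and thousands separators
    # (assumption: ',' never acts as a decimal mark in this column).
    cleaned = raw.strip().replace('$', '').replace(',', '')
    return int(cleaned)

print(gdp_to_int(" $67,646 "))  # 67646
```

If the commas have already been turned into dots, as the loading code above does, int() would fail on a value like '67.646', so the separator has to be removed before converting.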

Related

Split a string if character is present else don't split

I have a string like below in python
testing_abc
I want to split the string on _ and extract the second element.
I have done it like below
split_string = string.split('_')[1]
I am getting the correct output as expected
abc
Now I want this to work for below strings
1) xyz
When I use
split_string = string.split('_')[1]
I get below error
list index out of range
expected output I want is xyz
2) testing_abc_bbc
When I use
split_string = string.split('_')[1]
I get abc as output
expected output I want is abc_bbc
Basically, what I want is:
1) If the string contains `_`, then store everything after the first `_` in the variable
2) If the string doesn't contain `_`, then store the whole string in the variable
How can I achieve this?
Set the maxsplit argument of split to 1 and then take the last element of the resulting list.
>>> "testing_abc".split("_", 1)[-1]
'abc'
>>> "xyz".split("_", 1)[-1]
'xyz'
>>> "testing_abc_bbc".split("_", 1)[-1]
'abc_bbc'
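The same idea can be wrapped in a small function. This is only a sketch, and the name after_first_underscore is made up for illustration:

```python
def after_first_underscore(text, delimiter="_"):
    # maxsplit=1 yields at most two pieces. When the delimiter is absent the
    # list has a single element, so [-1] is simply the whole string.
    return text.split(delimiter, 1)[-1]

for sample in ("testing_abc", "xyz", "testing_abc_bbc"):
    print(sample, "->", after_first_underscore(sample))
```

Taking [-1] instead of [1] is what avoids the "list index out of range" error for strings without an underscore.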
You can use list slicing and str.join in case _ is in the string, and you can just get the first element of the split (which is the only element) in the other case:
sp = string.split('_')
result = '_'.join(sp[1:]) if len(sp) > 1 else sp[0]
All of the above approaches work, but here is another simple way, using str.index.
Try:
s = 'aaabbnkkjbg_gghjkk_ttty'
try:
    ans = s[s.index('_')+1:]
except ValueError:  # raised by index() when '_' is not in s
    ans = s
Your error is expected: you are using '_' as the delimiter and the string doesn't contain it, so the list returned by split has only one element.
See How to check a string for specific characters? for character checking.
If you want to split only if the string contains a '_', and only on the first one:
input_string = "blah_end"
delimiter = '_'
if delimiter in input_string:
    result = input_string.split(delimiter, 1)[1]  # The ", 1" says only split once
else:
    # Do whatever here. If you want a space, " ", to be a delimiter too, you can try that.
    result = input_string
This code will solve your problem:
txt = "apple_banana_cherry_orange"
# setting the maxsplit parameter to 1, will return a list with 2 elements!
x = txt.split("_", 1)
print(x[-1])

How to add text within two string delimiters

I want to add some text within two delimiters in a string.
Previous string:
'ABC [123]'
New string needs to be like this:
'ABC [123 sometext]'
How do I do that?
Slightly more versatile, I'd say, without using replace:
s = 'ABC [123]'
insert = 'sometext'
insert_after = '123'
delimiter = ' '
ix = s.find(insert_after)  # find() returns -1 if the substring is missing; index() would raise ValueError
if ix != -1:
    s = s[:ix+len(insert_after)] + delimiter + insert + s[ix+len(insert_after):]
    # or with an f-string:
    # s = f"{s[:ix+len(insert_after)]}{delimiter}{insert}{s[ix+len(insert_after):]}"
print(s)
# ABC [123 sometext]
If the insert patterns get more complex, I'd also suggest taking a look at regex. If the pattern is simple, however, not using regex should be the more efficient solution.
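For the regex route, a minimal sketch could look like this. It assumes the text should always go just before the closing bracket of a [...] group; the pattern is my own choice, not something from the question.

```python
import re

s = 'ABC [123]'
insert = 'sometext'

# Capture whatever sits between the brackets and rebuild the bracketed part
# with the extra text appended. A function is used as the replacement so that
# any backslashes in `insert` are not interpreted as escape sequences.
result = re.sub(r'\[([^\]]*)\]',
                lambda m: '[{} {}]'.format(m.group(1), insert),
                s)
print(result)  # ABC [123 sometext]
```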
Most of these kinds of changes depend on prior knowledge of the string's pattern.
In your case, a simple str.replace would do the trick. Strings are immutable, so assign the result back:
varstr = 'ABC [123]'
varstr = varstr.replace(']', ' sometext]')
You can learn a lot from the str documentation and from diving into regex.
All the above answers are correct, but if you are trying to add the contents of a variable:
variable_string = 'ABC [123]'
sometext = "The text you want to add"
variable_string = variable_string.replace("]", " " + sometext + "]")
print(variable_string)

Text stripping issue

Apologies in advance if this turns out to be a PEBKAC issue, but I can't see what I'm doing wrong.
Python 3.5.1 (FWIW)
I've pulled data from an online source, each line of the page is .strip() 'ed of \r\n, etc. and converted to a utf-8 string. The lines I'm looking for are reduced further below.
I want to take two strings, join them and strip out all the non-alphanumerics.
> x = "ABC"
> y = "Some-text as an example."
> z = x+y.lower()
> type(z)
<class 'str'>
So here's the problem.
> z = z.strip("'-. ")
> print(z)
Why is the result:
ABCsome-text as an example.
and not, as I would like:
ABCsometextasanexample
I can get it to work with four .replace() commands, but strip really doesn't want to work here. I've also tried separate strip commands:
> y = y.strip("-")
> print(y)
some-text as an example.
Whereas
> y = y.replace("-", '')
> print(y)
sometext as an example.
Any thoughts on what I might be doing wrong with .strip()?
Since you wish to remove all the non-alphanumeric characters, let's make it more generic using a regex (note that \W leaves underscores in place):
import re
x = "ABC"
y = "Some-text as an example."
z = x+y.lower()
z = re.sub(r'\W+', '', z)
Strip doesn't strip all characters, it only removes characters from the ends of strings.
From the official documentation:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
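A small sketch of the difference, using a made-up sample string so that both ends contain characters from the set:

```python
chars = "'-. "
text = "--some-text as an example.--"

# strip() only trims characters from both ends; on each side it stops at the
# first character that is not in `chars`.
stripped = text.strip(chars)

# Removing the characters everywhere needs replace() (or a regex).
removed = text
for ch in chars:
    removed = removed.replace(ch, "")

print(stripped)  # some-text as an example
print(removed)   # sometextasanexample
```

The hyphen and the spaces in the middle survive strip() because they are not at either end of the string.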
Another solution would be using Python's filter(). In Python 3, filter() returns an iterator, so join the result back into a string:
x = "ABC"
y = "Some-text as an example."
z = x+y.lower()
z = ''.join(filter(str.isalnum, z))
As others have pointed out, the problem with strip() is that it only operates on characters at the beginning and end of strings—so using replace() multiple times would be the way to accomplish what you want using just string methods.
Although it's not the question you asked, here's how to do it with a single call to the re.sub() function in the re regular-expression module. The arbitrary characters to be replaced are defined by the contents of the string variable named chars.
import re
x = "ABC"
y = "Some-text as an example."
z = x + y.lower()
print('before: {!r}'.format(z)) # -> before: 'ABCsome-text as an example.'
chars = "'-. " # Characters to be replaced.
z = re.sub('(' + '|'.join(re.escape(ch) for ch in chars) + ')', '', z)
print('after: {!r}'.format(z)) # -> after: 'ABCsometextasanexample'

How to replace " [ ] ' " with blank space from a string ['flag = no'] with python

I'm trying to replace the brackets and single quotes in ['flag = no'] with a blank using re.sub, but it throws an error.
import re
import subprocess
#string to search text
lst = r'(flask) C:\Users\user1\AppData\Local\Programs\Python\Python35-enter code heretion>python secureassistchk.py flag = no'
#search flag = no within string & return ['flag = no']
dat = re.findall('flag.*', lst)
print("Print FLAG:", dat)
# replace [' with blank space , this doesn't work
#dat3 = re.sub('[\(\)\{\}<>]', '', dat)
#dat3 = re.sub('\b[]\b','', dat)
dat3 = re.sub('[ ]','',dat)
print("Print FLAG:", dat3)
The error is caused by the fact that dat is a list, not a string.
Try:
dat = re.findall('flag.*', lst)[0]
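A short sketch of what is going on, with a shortened stand-in for the original command line (the lst value here is an assumption, not the asker's full string):

```python
import re

lst = "python secureassistchk.py flag = no"  # shortened stand-in for the original line

matches = re.findall('flag.*', lst)    # findall always returns a list of strings
first = matches[0] if matches else ''  # a plain string (empty if nothing matched)

try:
    re.sub('[ ]', '', matches)         # passing the list itself raises TypeError
except TypeError as exc:
    print("list input fails:", exc)

print(re.sub('[ ]', '', first))        # flag=no
```

Guarding the [0] lookup avoids an IndexError when the pattern does not match at all.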
Here, I fixed it for you:
Code:
dat3 = re.sub(r'\[|\]', '', str(dat))
print("Print FLAG:", dat3)
Result:
"'flag = no'"
Edit:
Ok, I missed the part about quotes. This is the corrected regex:
dat3 = re.sub(r"\[|\]|'", '', str(dat))
The first problem in your initial query was explained by Maciek:
dat is not a string object.
The second problem with your query was that the characters you want to replace must be escaped with a \ if they are special characters in a regex. You must also chain the alternatives with a pipe (the '|' character).
For example, if you want to add white spaces to your list of replaced characters, the regex will be changed to:
dat3 = re.sub(r"\[|\]|'| ", '', str(dat))
You should notice the additional pipe and space character.
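As an alternative to chaining with pipes, the same set of characters can be written as one character class. This is a sketch of that variant, not the form used in the answers above:

```python
import re

dat = ['flag = no']

# One character class listing "[", "]", "'" and the space character.
# "]" must be escaped inside a class; "[" is escaped too, which keeps the
# pattern unambiguous and avoids a "possible nested set" warning.
cleaned = re.sub(r"[\[\]' ]", '', str(dat))
print(cleaned)  # flag=no
```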

lexical analysis python

Recently I made the following observation:
>>> x= "\'"
>>> x
"'"
>>> y="'"
>>> y
"'"
>>> print x
'
>>> print y
'
Can anyone please explain why this is so? I am using Python 2.7.x, and I know about escape sequences.
I want to do the following:
I have a string with single quotes in it and I have to enter it in a database so I need to replace the instance of single quote(') with a backslash followed by a single quote(\'). How can I achieve this.
Inside a pair of "", you don't need to escape the ' character. You can, of course, but as you've seen it's unnecessary and has no effect whatsoever.
It'd be necessary to escape if you were to write a ' inside a pair of '' or a " inside a pair of "":
x = '\''
y = "\""
EDIT :
Regarding the last part in the question, added after the edit:
I have a string with single quotes in it and I have to enter it in a database so I need to replace the instance of single quote(') with a backslash followed by a single quote(\'). How can I achieve this
Any of the following will work. Notice how the raw strings avoid the need to escape the backslash:
v = "\\'"
w = '\\\''
x = r'\''
y = r"\'"
print v, w, x, y
> \' \' \' \'
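To apply this to an existing string, str.replace can do the substitution. A minimal sketch with a made-up sample string:

```python
s = "it's a 'quoted' word"

# "\\'" is a two-character string: a backslash followed by a single quote.
escaped = s.replace("'", "\\'")
print(escaped)  # it\'s a \'quoted\' word
```

Printing shows the real contents, while the interactive prompt shows the repr, which is why the same string can look different in the two places.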
