Deleting of 'd' and 'n' character in strip in python - python

dataframe
string1
Data%2Fxxx
Data%2Ffrance
Data%2Fdenmark
Data%2Fnorway
Code
df['string1'] = [x.strip('Data%2F') for x in df.string1]
output
string1
xxx
france
enmark
orway
So the strip function is removing the first character when it is 'd' or 'n'. Does anyone know why? How can I stop it from removing them? Is this related to '\d' and '\n'?
python version - 3.7.4

The strip() method returns a copy of the string with both leading and trailing characters stripped. According to https://docs.python.org/3/library/stdtypes.html#str.strip, "The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped." Examples from the documentation:
>>> ' spacious '.strip()
'spacious'
>>> 'www.example.com'.strip('cmowz.')
'example'
In other words, x.strip('Data%2F') is directing Python to strip any a's, t's, D's etc. from the beginning and end of the string. This is why "Data%2Faloha".strip("Data%2F") would actually return 'loh' unless you have, say, a space at the end, which is not part of the chars argument in your example. This is my best guess as to what's happening for you.
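A minimal sketch of the point above, showing that the argument to strip() acts as a set of characters rather than as a prefix. The 'Data%2Faloha' value comes from this answer; 'Data%2Ftaiwan' is a made-up value added only for illustration.

```python
# str.strip(chars) removes any leading/trailing character found in chars.
chars = 'Data%2F'

stripped = 'Data%2Faloha'.strip(chars)
print(stripped)  # loh

# The order of the characters in the argument makes no difference.
reordered = 'Data%2Faloha'.strip('F2%taD')
print(reordered == stripped)  # True

# Hypothetical value: the leading 't' and 'a' of 'taiwan' are in the set too.
taiwan = 'Data%2Ftaiwan'.strip(chars)
print(taiwan)  # iwan
```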

str.replace() should work perfectly for you.
>>> x.replace('Data%2F', '')

The correct way to proceed is with str.replace():
df['string1'] = [x.replace('Data%2F', '') for x in df.string1]
The str.strip() method returns a copy of the string in which all of the given characters have been stripped from the beginning and the end of the string.
When I tested, it gave me a different result, but one that was still incorrect.
str.strip() is mostly used to remove whitespace from the start and end of a string, for example.
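Note that replace() removes every occurrence of the substring, not only a leading one. If only the prefix should go, one possible sketch that also works on Python 3.7 is a small helper built on startswith(). The name remove_prefix is hypothetical, and a plain list stands in for the DataFrame column here.

```python
def remove_prefix(text, prefix):
    """Return text without the leading prefix; leave it unchanged otherwise."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text

# Values taken from the question; a list is used instead of df.string1.
values = ['Data%2Fxxx', 'Data%2Ffrance', 'Data%2Fdenmark', 'Data%2Fnorway']
cleaned = [remove_prefix(v, 'Data%2F') for v in values]
print(cleaned)  # ['xxx', 'france', 'denmark', 'norway']
```

With the DataFrame from the question, the same list comprehension pattern would apply: df['string1'] = [remove_prefix(x, 'Data%2F') for x in df.string1].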

It should be because of \n if it happens with t as well. You should rather use replace because it won't get rid of whitespaces.
string.replace("Data%2F","")

Related

splitlines of quote splits '\n' in sub-quote

Given I have a quote that contains a double sub-quote with a '\n',
If one performs a splitlines on the parent quote, the child quote is split too.
double_quote_in_simple_quote = 'v\n"x\ny"\nz'
print(double_quote_in_simple_quote.splitlines())
Resulting output
['v', '"x', 'y"', 'z']
I would have expected the following:
['v', '"x\ny"', 'z']
Because the '\n' is in the scope of the sub-quote.
I was hoping to get an explanation why it behaves as such and if you have any alternative to 'splitlines' at the level of the main quote only?
Thank you
The split function doesn't care about additional levels of quoting; it simply splits on every occurrence of the character you split on. (There isn't really a concept of nested quoting; a string is a string, and may or may not contain literal quotes, which are treated the same as any other character.)
If you want to implement quoting inside of strings, you have to do it yourself.
Perhaps use a regular expression:
import re
tokens = re.findall(r'"[^"]*"|[^"]*', double_quote_in_simple_quote)
splitresult = [
    x if x.startswith('"') else x.split('\n')
    for x in tokens]
Demo: https://ideone.com/lAgJTb
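As a variation on the regular-expression idea, here is a minimal sketch that returns a flat list split only on newlines outside double quotes. The function name split_lines_outside_quotes is made up for this example. It assumes the quotes are balanced, and it drops empty lines, so treat it as a starting point rather than a full parser.

```python
import re

def split_lines_outside_quotes(text):
    # Each match is a run of double-quoted chunks (which may contain
    # newlines) and characters that are neither a quote nor a newline.
    return re.findall(r'(?:"[^"]*"|[^"\n])+', text)

double_quote_in_simple_quote = 'v\n"x\ny"\nz'
parts = split_lines_outside_quotes(double_quote_in_simple_quote)
print(parts)  # ['v', '"x\ny"', 'z']
```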
It is due to the nature of escape sequences in Python.
\n in Python means a newline character. Wherever this sequence appears in a string literal, Python treats it as a line break. The splitlines() method splits a string into a list, and the splitting is done at line breaks. That's why you get a list without newline characters.
Note that passing keepends=True does not prevent the split; it only keeps the line break at the end of each piece:
print(double_quote_in_simple_quote.splitlines(keepends=True))
>>> ['v\n', '"x\n', 'y"\n', 'z']
I came up with a crude workaround that can get you by while you look for another method that splits quotes without this behavior.
double_quote_in_simple_quote = '"x\ny"'
double_quote_in_simple_quote = double_quote_in_simple_quote.replace("\n", "$n")
splitted_quote = double_quote_in_simple_quote.splitlines()
print(splitted_quote)
splitted_quote_decoded = [quote.replace('$n', '\n') for quote in splitted_quote]
print(splitted_quote_decoded)
The idea is to replace the \n by something not meaningful yet not used, and then reverse it. I used your example, but I'm sure you will be able to tune it to fit your needs. My output was:
['"x$ny"']
['"x\ny"']
If you double-quote a string in Python, that doesn't mean there are nested strings, per se. Whatever the outermost quotes are, Python will start and end the string object according to those. Any internal quote characters are treated as ordinary characters.
>>> print('dog')
dog
>>> print('"dog"')
"dog"
Note how in the second line, the quotes are also printed, because those actual quote-characters are a part of the string. No nesting happening.

rstrip on python behaving awkwardly: 'HelloWorld'.rstrip('World') deletes everything after 'He'

I started learning Python recently, and I have encountered an awkward behavior with rstrip.
My understanding is that str.rstrip() will delete all characters as soon as it encounters one of the characters it took as an argument.
>>> a = 'HelloWorld'
>>> a.rstrip('l')
'HelloWorld'
>>> a = 'HelloWorld'
>>> a.rstrip('World')
'He'
I'm totally confused by the result I got.
Shouldn't the first snippet produce 'He' instead?
str.rstrip() removes characters from the end of the string that are present in the set of characters supplied to rstrip() (if passed, and not None). rstrip() stops as soon as a character in the string is found that is not in the set of stripping characters.
So, for the first example, the final character in string a is 'd' which is not in the set of stripping characters passed to rstrip(). Therefore nothing is stripped from the original string.
In the second example, any of the characters in the set 'World' can be removed from the end of the string. You will notice that the substring 'lloWorld' comprises characters that are present in the stripping characters, so all of those characters are removed, leaving 'He' as the final result.
However, consider the case that there is a character at the end of the string that is not in the set 'World', e.g. a full stop.
>>> 'HelloWorld'.rstrip('World')
'He'
>>> 'HelloWorld.'.rstrip('World')
'HelloWorld.'
nothing has been stripped because '.' is not in 'World'.
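The behavior described above can be checked with a short sketch. The remove_suffix helper is a hypothetical name, included only to contrast character-set stripping with removing one exact suffix.

```python
a = 'HelloWorld'

# The last character 'd' is not in the set {'l'}, so nothing is removed.
only_l = a.rstrip('l')
print(only_l)  # HelloWorld

# Every trailing character is in the set {'W', 'o', 'r', 'l', 'd'} until 'e'.
as_set = a.rstrip('World')
print(as_set)  # He

# The order of the characters in the argument is irrelevant.
shuffled = a.rstrip('dlroW')
print(shuffled)  # He

def remove_suffix(text, suffix):
    """Remove one exact trailing suffix, if present (hypothetical helper)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

exact = remove_suffix(a, 'World')
print(exact)  # Hello
```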
The rstrip() method in Python takes a set of characters as its argument.
If no argument is provided, whitespace is stripped by default.
It strips characters from the right for as long as the rightmost character of the input string is in your character set.
Which explains why l wasn't stripped in the first example.
In the second example, World matches the character set provided in the argument. So World is stripped, as well as the double l and the o in Hello, because World has an l and an o in it.
Which leaves you with He, which is the correct outcome.
The [rl]strip methods strip away any char that is in the string that you pass as argument. Since l and o are in 'World', they get stripped as well.
From the rstrip docs:
the characters in the string will be stripped from the end of the string this method is called on.
... not the string passed itself is stripped, but its characters!
It seems to start from the right and stop as soon as it encounters a character that is not in the set you passed as argument.
Take a look at the documentation.
rstrip removes all of the characters in its argument that are in the right side of the string you modify. So since "lloWorld" are all made up of characters in "World", those get removed; e isn't, so it stops there.
On the other hand, in the first call, l isn't actually on the right side (rstrip) of the string, so nothing is removed.
str.rstrip returns a copy of the string with trailing characters removed.
So a.rstrip('l') won't change anything, as 'l' isn't at the end.
But a.rstrip('World') will: every trailing character that appears in 'World' is removed from the text, not just the substring 'World' itself.
Check this for a further read : http://python-reference.readthedocs.io/en/latest/docs/str/rstrip.html

Getting only one '\' character when joining a list

I want to have a string with the '\w' character after each letter.
For example:
my_string = 'asdfg'
What I want:
my_string = 'a\ws\wd\wf\wg\w'
Now, how I approached this is first storing each letter into a list:
list = []
for i in my_string:
    list.append(i)
And then joining it with a \w character in between to form my new string. However, I ran into some problems.
'\w'.join(list)
I'm getting a double backslash character instead of one:
'q\\ww\\we\\wr\\wt\\wy\\wu\\wy\\wt\\wr\\we\\ws\\wd\\wf\\wt\\wy\\wu\\wi\\wo\\wk\\wn\\wn'
I'd greatly appreciate any help in fixing this. Thanks.
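\w is not a recognized escape sequence. You might be thinking of another escape character, but since \w doesn't exist, Python keeps the backslash as a literal character, and the repr of the string shows it escaped as '\\w'. The string itself still contains only a single backslash; print() it to see.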
Oh, you might also want to replace your for loop with simply list(my_string) or tuple(my_string) - or even the entire thing with '[whatever character you actually wanted]'.join(my_string) - it's simpler and does the same thing. To get your expected result, you'll also need to add the character to the end of the string, as in '[x]'.join(my_string) + '[x]'. As it stands now, you won't get the character after the very last letter.
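A short sketch of the suggestion above, assuming the goal really is a literal backslash followed by w after every letter. The backslash is written as '\\w' to make the intent explicit; the doubled backslash appears only in the repr, not in the string itself.

```python
my_string = 'asdfg'

# Append the two-character sequence backslash + 'w' after every letter.
result = ''.join(ch + '\\w' for ch in my_string)

print(result)        # a\ws\wd\wf\wg\w
print(repr(result))  # 'a\\ws\\wd\\wf\\wg\\w'
print(len('\\w'))    # 2
```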

remove this unidentified character "\" from string python

I want to remove the character "\" from a string, but I can't, because the backslash is needed for escapes such as "\t" or "\n". I then tried """"\"""", but Python still doesn't do anything. It seems Python does not recognize this string. This is the code:
remove = string.replace(""""\"""", " ")
I want "\workspace\file.txt" to become "workspace file.txt".
Anyone have any idea? Thanks in advance
You're trying to replace a backslash, but since Python uses backslashes as escape characters, you actually have to escape the backslash itself.
remove = string.replace("\\", " ")
DEMO:
In [1]: r"\workspace\file.txt".replace("\\", " ")
Out[1]: ' workspace file.txt'
Note the leading space. You may want to call str.strip on the end result.
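To see why the raw string matters in the demo, here is a small sketch comparing a plain literal with a raw one. The "\file.txt" value is a shortened, illustrative version of the path from the question.

```python
# In a plain literal, \f is the form feed escape (one character).
plain = "\file.txt"
# In a raw literal, the backslash is kept as a real character.
raw = r"\file.txt"

print(len(plain), len(raw))  # 8 9
print(repr(plain))           # '\x0cile.txt'

# Replace real backslashes, then trim the leading space.
cleaned = r"\workspace\file.txt".replace("\\", " ").strip()
print(cleaned)  # workspace file.txt
```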
You have to escape the backslash, as it has special meaning in strings. Even in your original string, if you leave it like that, it'll come out...not as you expect:
"\workspace\file.txt" --> '\\workspace\x0cile.txt'
Here's something that will break the string up by a backslash, join them together with a space where the backslash was, and trim the whitespace before and after the string. It also contains the correctly escaped string you need.
" ".join("\\workspace\\file.txt".split("\\")).strip()
You can achieve it this way:
>>> s = r"\workspace\file.txt"
>>> s.replace("\\", " ").strip()
'workspace file.txt'

convert vim regex to python for re.sub

I have a working regex under vim: /^ \{-}\a.*$\n
I implement a global search and replace as :%s/^ \{-}\a.*$\n//
This works great -- removes all lines that start with any number of spaces (matched non-greedily), followed by a letter and anything else to the end of the line including the newline.
I cannot (to save my soul) figure out the analogous regex in Python. Here's what makes sense to me:
x = re.sub("^ *?\a.$\n","",y)
But this doesn't do anything.
Many thanks for your sagacious replies.
\a means the bell character (0x07) in Python, and $\n is redundant. Also, ^ only matches at the start of every line when the re.MULTILINE flag is set, so:
x = re.sub(r"^ *[A-Za-z].*\n", "", y, flags=re.MULTILINE)
Also, there's no reason to write ' *?' instead of ' *' here, as it's always going to be followed by a non-space if it's matching.
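If you want to match any amount of whitespace, you can also use the \s sequence (note that \s matches newlines as well, so preceding blank lines are removed too).
Any letter will be matched by the [a-zA-Z] character class. You also don't need both the $ and the \n; the \n alone is enough.
Suggest the following, with re.MULTILINE so that ^ matches at the start of every line:
x = re.sub(r"^\s*[a-zA-Z].*(\r|\n)", "", y, flags=re.MULTILINE)
If you want at least one whitespace character, use \s+ instead of \s*.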
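A minimal sketch of how the line-anchored pattern behaves with and without re.MULTILINE. The sample text is invented for illustration. Note that a final line without a trailing newline would not be removed by this pattern.

```python
import re

# Invented sample: two lines start with optional spaces and a letter.
text = "  alpha line\n42 stays\n   7 also stays\ngamma\n"

pattern = r"^ *[A-Za-z].*\n"

# Without MULTILINE, ^ matches only at the very start of the string.
first_only = re.sub(pattern, "", text)

# With MULTILINE, ^ matches at the start of every line, like in vim.
all_lines = re.sub(pattern, "", text, flags=re.MULTILINE)

print(repr(first_only))  # '42 stays\n   7 also stays\ngamma\n'
print(repr(all_lines))   # '42 stays\n   7 also stays\n'
```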
