Convert string to unicode view - python

I have a string
s = "Санкт-Петербург"
I want to convert the string to
\u0421\u0430\u043D\u043A\u0442-\u041F\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
My code
x = "Санкт-Петербург"
y = str(x.encode('unicode-escape')) # I want y to be string
print(y) # b'\\u0421\\u0430\\u043d\\u043a\\u0442-\\u041f\\u0435\\u0442\\u0435\\u0440\\u0431\\u0443\\u0440\\u0433'
What is the best way to get rid of b' and \\ ?

Slice with [2:-1] to drop the leading b' and the trailing ', then replace each doubled backslash with a single one. Here is how:
x = "Санкт-Петербург"
y = str(x.encode('unicode-escape'))
print(y[2:-1].replace('\\\\', '\\'))
Output:
\u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
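A variation that avoids the repr of the bytes object altogether: decoding the escaped bytes back to ASCII gives a plain str with single backslashes, so no slicing or replacing is needed. The sketch below also shows one way to get the uppercase hex digits shown in the question. The helper names are made up for illustration, and the uppercase version assumes every non-ASCII character is below U+10000, so that four hex digits are enough.

```python
def to_unicode_escape(text):
    """Escape non-ASCII characters the way Python's codec does (lowercase hex)."""
    # encode() yields bytes; decoding them as ASCII gives an ordinary str
    # without the b'...' wrapper and without doubled backslashes.
    return text.encode('unicode-escape').decode('ascii')


def to_upper_unicode_escape(text):
    """Hypothetical helper: same idea, but with uppercase hex digits.

    Assumes all non-ASCII characters are below U+10000.
    """
    return ''.join(
        ch if ord(ch) < 128 else '\\u{:04X}'.format(ord(ch))
        for ch in text
    )


s = "Санкт-Петербург"
print(to_unicode_escape(s))
print(to_upper_unicode_escape(s))
```

The first function relies only on the built-in codec; the second formats each code point by hand, which is why it needs the stated assumption.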

All you need to do is replace each doubled backslash with a single one using the replace() method (in source code these are written '\\\\' and '\\'). After that, you can strip away the b' at the start and the ' at the end using the strip() method.
Here's how you can proceed:
y = y.replace('\\\\', '\\') # Replace backslashes
y = y.strip("b'") # Strip the unnecessary parts
print(y) # Print out the result
Output
\u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433
PS: strip("b'") also strips the single quote from the end of the string.
EDIT
As pointed out by @Aplet123, strip("b'") can cause problems with any string that begins or ends with an ASCII b or '. A string slice can be used instead of strip(). All you need to do is replace the line y = y.strip("b'") with:
y = y[2:][:-1]
Here, [2:] removes the b' at the start and [:-1] removes the trailing '.
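To see why the slice is safer than strip("b'"), compare the two on a string that itself begins and ends with b. This is only an illustrative sketch; the sample value "bob" is not from the question.

```python
x = "bob"
y = str(x.encode('unicode-escape'))   # the text b'bob', as a str

stripped = y.strip("b'")   # removes every leading/trailing b and '
sliced = y[2:-1]           # removes exactly two characters in front, one at the end

print(stripped)  # o
print(sliced)    # bob
```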

Related

Rstrip not removing correct backslashes or giving position

So, I have a string that looks like \uisfhb\dfjn
It will vary in length. I'm struggling to get my head around rsplit and the fact that the backslash is an escape character. I only want "dfjn".
I currently have
more = "\\\\uisfhb\dfjn"
more = more.replace(r'"\\\\', r"\\")
sharename = more.rsplit(r'\\', 2)
print(sharename)
and I'm getting back
['', 'uisfhb\dfjn']
If you want to partition a string on a literal backslash, you need to escape the backslash with another backslash in the separator.
>>> more.split('\\')
['', '', 'uisfhb', 'dfjn']
>>> more.rsplit('\\', 1)
['\\\\uisfhb', 'dfjn']
>>> more.rpartition('\\')
('\\\\uisfhb', '\\', 'dfjn')
Once the string has been split, the last element can be accessed using the index -1:
>>> sharename = more.rsplit('\\', 1)[-1]
>>> sharename
'dfjn'
or using sequence-unpacking syntax (the * operator)
>>> *_, sharename = more.rpartition('\\')
>>> sharename
'dfjn'
I think this is an issue with raw strings. Try this:
more = "\\\\uisfhb\\dfjn"  # every literal backslash is doubled
more = more.replace("\\\\", "\\")
sharename = more.split("\\")[2] # using split and not rsplit
print(sharename)
If sharename is the last node in the tree, this will get it:
>>> more = "\\\\uisfhb\\dfjn"
>>> sharename = more.split('\\')[-1]
>>> sharename
'dfjn'
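The answers above all depend on how the backslashes are written in the source. A small sketch, using a raw string literal so that each backslash in the source is one backslash in the value (the function name last_component is made up for this example):

```python
def last_component(path, sep='\\'):
    """Return whatever follows the final separator in path."""
    # rpartition splits once, from the right, into (head, sep, tail).
    return path.rpartition(sep)[2]


more = r"\\uisfhb\dfjn"             # raw string: two backslashes, then one
assert more == "\\\\uisfhb\\dfjn"   # the same value written with escapes

print(len(more))             # 13
print(last_component(more))  # dfjn
```

If the separator does not occur at all, rpartition() puts the whole string in the last slot, so the function returns its input unchanged.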

How do I trim a string after a certain character appears more than once in Python?

I am trying to scan a string, and once a certain character has been read 3 times, I would like to cut off the rest of the string.
For example:
The string "C:\Temp\Test\Documents\Test.doc" would turn into "C:\Temp\Test\"
Once the string hits "\" 3 times, it should be trimmed.
Here is the code I am working on:
prefix = ["" for x in range(size)]
num = 0
...
...
for char in os.path.realpath(src):
    for x in prefix:
        x = char
        if x =='\': # I get an error here
            num = num + 1
        if num == 3:
            break
print (num)
print(prefix)
...
...
os.path.realpath(src) is the string with the file path. The "prefix" variable is the string array in which I want to store the trimmed string.
Please let me know what I need to fix, or whether there is a simpler way to do this.
Split on the backslash, slice the list to keep the parts you need, and join them back together:
s = r'C:\Temp\Test\Documents\Test.doc'  # raw string, so each backslash is literal
print('\\'.join(s.split('\\')[:3]) + '\\')
# C:\Temp\Test\
Note that \ (backslash) is the escape character. To get a literal backslash, put another backslash in front of it (\\), which removes its special meaning.
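A quick way to check what a string literal actually contains is to look at its length. The values below are illustrative only:

```python
newline = "\n"      # one character: a line feed
backslash = "\\"    # one character: a literal backslash
raw = r"\n"         # two characters: a backslash followed by n

print(len(newline), len(backslash), len(raw))  # 1 1 2

# A raw string and an escaped string can spell the same path.
assert r"C:\Temp\Test" == "C:\\Temp\\Test"
```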
In Python, the backslash is used as an escape character: \n is a newline and \t is a tab. There are many others, such as \", which puts a double quote inside a double-quoted string. If you want a literal backslash, write "\\".
Try:
s = "C:\\Temp\\Test\\Documents\\Test.doc"
answer = '\\'.join(s.split('\\', 3)[:3]) + '\\'  # re-append the trailing backslash
Something like this would do:
x = r"C:\Temp\Test\Documents\Test.doc"  # raw string, so each backslash is literal
print('\\'.join(x.split("\\")[:3])+"\\")
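The split-and-join idea can be wrapped in a small function that also copes with strings containing fewer separators than expected. This is a sketch rather than a definitive implementation; the name keep_through_nth and its parameters are invented here.

```python
def keep_through_nth(text, sep, n):
    """Return text up to and including the n-th occurrence of sep.

    If sep occurs fewer than n times, text is returned unchanged.
    """
    parts = text.split(sep, n)      # at most n splits, so at most n + 1 parts
    if len(parts) <= n:             # fewer than n separators were found
        return text
    return sep.join(parts[:n]) + sep


path = r"C:\Temp\Test\Documents\Test.doc"
print(keep_through_nth(path, "\\", 3))  # prints C:\Temp\Test\ (with the trailing backslash)
```

Passing n as the maxsplit argument keeps split() from scanning the rest of the string once the n-th separator has been found.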

Text stripping issue

Apologies in advance if this turns out to be a PEBKAC issue, but I can't see what I'm doing wrong.
Python 3.5.1 (FWIW)
I've pulled data from an online source, each line of the page is .strip() 'ed of \r\n, etc. and converted to a utf-8 string. The lines I'm looking for are reduced further below.
I want to take two strings, join them and strip out all the non-alphanumerics.
> x = "ABC"
> y = "Some-text as an example."
> z = x+y.lower()
> type(z)
<class 'str'>
So here's the problem.
> z = z.strip("'-. ")
> print(z)
Why is the result:
ABCsome-text as an example.
and not, as I would like:
ABCsometextasanexample
I can get it to work with four .replace() commands, but strip really doesn't want to work here. I've also tried separate strip commands:
> y = y.strip("-")
> print(y)
some-text as an example.
Whereas
> y = y.replace("-", '')
> print(y)
sometext as an example.
Any thoughts on what I might be doing wrong with .strip()?
Since you want to remove all the non-alphanumeric characters, let's make it more generic:
import re
x = "ABC"
y = "Some-text as an example."
z = x+y.lower()
z = re.sub(r'\W+', '', z)
strip() doesn't remove every occurrence of the characters; it only removes them from the ends of the string.
From the official documentation:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
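The two points in that passage (only the ends are touched, and chars is treated as a set of characters) can be checked directly. The sample strings are made up for illustration:

```python
text = "--some-text--"

# Only the leading and trailing hyphens go; the one in the middle stays.
print(text.strip("-"))                  # some-text

# chars is a set of characters, not a prefix or suffix: any mix of
# '.', '-' and ' ' is removed from both ends, in any order.
print(". -some-text- .".strip("-. "))   # some-text
```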
Another solution would be using Python's filter(). In Python 3 it returns an iterator, so join the result back into a string:
x = "ABC"
y = "Some-text as an example."
z = x+y.lower()
z = ''.join(filter(lambda c: c.isalnum(), z))
As others have pointed out, the problem with strip() is that it only operates on characters at the beginning and end of strings—so using replace() multiple times would be the way to accomplish what you want using just string methods.
Although it's not the question you asked, here's how to do it with a single call to the re.sub() function in the re regular-expression module. The characters to be removed are defined by the contents of the string variable named chars.
import re
x = "ABC"
y = "Some-text as an example."
z = x + y.lower()
print('before: {!r}'.format(z)) # -> before: 'ABCsome-text as an example.'
chars = "'-. " # Characters to be replaced.
z = re.sub('(' + '|'.join(re.escape(ch) for ch in chars) + ')', '', z)
print('after: {!r}'.format(z)) # -> after: 'ABCsometextasanexample'
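Another option that stays within the string methods is str.translate() with a table built by str.maketrans(), whose third argument lists the characters to delete. A minimal Python 3 sketch, reusing the chars string from the answer above:

```python
x = "ABC"
y = "Some-text as an example."
z = x + y.lower()

chars = "'-. "                          # characters to delete
table = str.maketrans('', '', chars)    # maps each of them to None
cleaned = z.translate(table)

print(cleaned)  # ABCsometextasanexample
```

This removes exactly the listed characters in one pass, without building a regular expression.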

Python string strip function

Why can the following code remove '+':
a = '+'
a.strip('+')
#output: ''
a = '1+'
a.strip('+')
#output: '1'
a = '+66'
a.strip('+')
#output: '66'
But the following can't:
a = '1+2'
a.strip('+')
#output: '1+2'
Why?
The strip() function only removes leading and trailing characters - on the outside of the string. Since in your last example the + is in the middle, it doesn't remove it. Maybe try using replace() instead:
my_str = "1+2"
new_str = my_str.replace("+", "")
strip only removes the specified leading and trailing characters of the string, not those in the middle.
Similarly, rstrip only removes the trailing ones.
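A side-by-side comparison on one made-up string shows the difference between the three strip variants and replace():

```python
s = "+1+2+"

print(s.strip("+"))        # 1+2   (both ends)
print(s.lstrip("+"))       # 1+2+  (leading only)
print(s.rstrip("+"))       # +1+2  (trailing only)
print(s.replace("+", ""))  # 12    (every occurrence)
```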

lexical analysis python

Recently I made the following observation:
>>> x= "\'"
>>> x
"'"
>>> y="'"
>>> y
"'"
>>> print x
'
>>> print y
'
Can anyone please explain why this is so? I am using Python 2.7.x, and I am familiar with escape sequences.
I want to do the following:
I have a string with single quotes in it and I have to enter it in a database, so I need to replace each single quote (') with a backslash followed by a single quote (\'). How can I achieve this?
Inside a pair of "", you don't need to escape the ' character. You can, of course, but as you've seen it's unnecessary and has no effect whatsoever.
It'd be necessary to escape if you were to write a ' inside a pair of '' or a " inside a pair of "":
x = '\''
y = "\""
EDIT :
Regarding the last part in the question, added after the edit:
I have a string with single quotes in it and I have to enter it in a database so I need to replace the instance of single quote(') with a backslash followed by a single quote(\'). How can I achieve this
Any of the following will work. Notice the use of raw strings in the last two, which avoids the need to double the backslash:
v = "\\'"
w = '\\\''
x = r'\''
y = r"\'"
print v, w, x, y
> \' \' \' \'
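To apply this to an existing string rather than a literal, str.replace() can insert the backslash before each single quote. This is a sketch of the replacement the question describes; the sample text is made up, and print() is called with a single argument so the snippet behaves the same on Python 2.7 and Python 3.

```python
text = "it's Tom's"

# "\\'" is two characters: a backslash and a single quote.
escaped = text.replace("'", "\\'")

print(escaped)                    # it\'s Tom\'s
print(len(escaped) - len(text))   # 2
print("\'" == "'")                # True: inside "", \' is just '
```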
