Python: string is escaped when creating a tuple [duplicate] - python

This question already has an answer here:
Why does printing a tuple (list, dict, etc.) in Python double the backslashes?
(1 answer)
Closed 7 months ago.
I have the following code:
string = "ad\23e\4x{\s"
data = (string,)
When I print data, the string inside the tuple shows an extra backslash for each backslash, for a total of 6 backslashes.
How can I avoid the extra back slashes?

The object data is a tuple. When you print a tuple, Python calls repr() on each element. If you want to format it another way, you have to do the conversion yourself.
>>> s = "ad\23e\4x{\s"
>>> d = (s,)
>>> print d
('ad\x13e\x04x{\\s',)
>>> print '(%s,)' % (', '.join('"%s"' % _ for _ in d))
("adex{\s",)

Those extra backslashes aren't actually in your string; they are just how Python represents strings (the idea being that you could paste that back into a program and it would work). This happens because converting a tuple to a string calls repr() on each item. If you print string or print data[0], you will see what's actually in the string.
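A minimal sketch of the difference, written for Python 3 (the sessions in this thread are Python 2). It assumes the backslashes are meant literally, so it uses a raw string; the names as_tuple and as_text are only illustrative.

```python
# Raw string: every backslash is a literal character (assumed intent).
s = r"ad\23e\4x{\s"
data = (s,)

# Converting a tuple to text uses repr() on each element,
# so each backslash is displayed doubled.
as_tuple = str(data)

# Joining the elements yourself uses the strings as they are.
as_text = "(" + ", ".join(data) + ")"

print(as_tuple)   # backslashes shown doubled
print(as_text)    # backslashes shown once
print(len(s))     # the string itself holds single backslashes
```

The length check is the quickest way to confirm that the doubled backslashes exist only in the displayed representation.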

You mean something like this?
In [11]: string = r'ad\23e\4x{\s'
In [12]: string
Out[12]: 'ad\\23e\\4x{\\s'
In [13]: print string
ad\23e\4x{\s
In [14]: data=(string,)
In [15]: data
Out[15]: ('ad\\23e\\4x{\\s',)
In [16]: print data
('ad\\23e\\4x{\\s',)
In [17]: print data[0]
ad\23e\4x{\s

Related

What is the point of using concatenation (+) in Python when commas (,) do the job?

print("This is a string" + 123)
Concatenating throws an error, but using a comma instead does the job.
As you have already been told, your code raises an error because you can only concatenate two strings. In your case, one of the operands is an integer.
print("This is a string" + str(123))
But your question is really about "plus vs. comma": why should one ever use + when , works?
Well, the comma works for print arguments, but there are other scenarios in which you need concatenation, for example in an assignment:
A = "This is a string" + str(123)
Using comma, in this case, would lead to a different (and probably unexpected) result. It would generate a tuple and not a concatenation.
Here you are trying to concatenate a string and an integer, which raises a TypeError.
You can try something like
print("This is a string"+str(123))
Commas (,) do not actually concatenate the values; print just outputs them in a way that looks like concatenation.
Concatenation, on the other hand, actually joins two strings.
That's one case of print(). However if you do need a string, concatenation is the way:
x = "This is a string, "+str(123)
gets you "This is a string, 123"
Should you write
x = "This is a string", 123
you would get the tuple ("This is a string",123). That's not a string but an entirely different type.
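The three cases discussed above can be put side by side in a short Python 3 sketch; the variable names joined and pair are only illustrative.

```python
# '+' builds a new string, so the int has to be converted first.
joined = "This is a string, " + str(123)

# A comma in an assignment builds a tuple, not a string.
pair = "This is a string", 123

# In print(), commas separate arguments; print joins them with
# sep, which defaults to a single space.
print("This is a string", 123)
print("This is a string", 123, sep="")

print(type(joined).__name__, type(pair).__name__)
```

Passing sep="" is the way to get print's comma form without the inserted space.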
If you have your int value in a variable, you can print it with str.format() (or with an f-string).
format() accepts multiple inputs, like print("one text number {num1} and another text number {num2}".format(num1=variable1, num2=variable2))
x = 123
print("This is a string {x}".format(x=x))
The above code outputs:
This is a string 123
You can read more about it here:
python-f-strings
# You can print strings and int variables together by separating them with commas; however, print will silently insert a space between the values, whereas '+' will not. Also, '+' used with mixed types will give unexpected results or raise an error.
>>> start = "Jaime Resendiz is"
>>> middle = 21
>>> end = "years old!"
>>> print(start, middle, end)
Jaime Resendiz is 21 years old!
It's simple: 123 is an int, and you cannot concatenate an int with a str.
>>> s = 123
>>> type(s)
<class 'int'>
>>>
>>> w = "Hello"+ s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
>>>
>>>
>>> w = "Hello" + str(s)
>>>
>>> w
'Hello123'
>>>
You can see the error, so you can convert the variable s, whose value is 123, to a string using the str() function. But for situations like this, where you want to concatenate strings with other types, I think you should use f-strings.
Example
>>> boolean = True
>>> fl = 1.2
>>> integer = 100
>>>
>>> sentence = f"Hello variables! {boolean} {fl} {integer}"
>>> sentence
'Hello variables! True 1.2 100'

Create raw unicode character from hex string representation/enter single backslash [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I want to create a raw unicode character from a string hex representation. That is, I have a string s = '\u0222' which will be the 'Ȣ' character.
Now, this works if I do
>>> s = '\u0222'
>>> print(s)
Ȣ
but, if I try to do concatenation, it comes out as
>>> h = '0222'
>>> s = r'\u' + h
>>> print(s)
\u0222
>>> s
'\\u0222'
because, as can be seen, what's actually in the string is '\\u', not '\u'. How can I create the unicode character from hex strings, or how can I enter a true single backslash?
This was a lot harder to solve than I initially expected:
code = '0222'
uni_code = r'\u' + code
s = uni_code.encode().decode('unicode_escape')
print(s)
Or
code = b'0222'
uni_code = rb'\u' + code  # raw bytes literal avoids the invalid-escape warning
s = uni_code.decode('unicode_escape')
print(s)
Entering \u0222 is only for string constants and the Python interpreter generates a single Unicode code point for that syntax. It's not meant to be constructed manually. The chr() function is used to generate Unicode code points. The following works for strings or integers:
>>> chr(int('0222',16)) # convert string to int base 16
'Ȣ'
>>> chr(0x222) # or just pass an integer.
'Ȣ'
And FYI ord() is the complementary function:
>>> hex(ord('Ȣ'))
'0x222'
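The chr()/ord() round trip above can be wrapped in two small helpers. This is only a sketch for Python 3; the function names hex_to_char and char_to_hex are hypothetical, not part of any library.

```python
def hex_to_char(hex_digits: str) -> str:
    """Return the character whose code point is given as a hex string."""
    return chr(int(hex_digits, 16))


def char_to_hex(ch: str) -> str:
    """Return the code point of a single character as zero-padded hex."""
    return format(ord(ch), "04x")


print(hex_to_char("0222"))
print(char_to_hex("\u0222"))
```

Unlike the unicode_escape approach, this never builds a backslash sequence at all, so it also works for code points above U+FFFF.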

Remove unicode characters python [duplicate]

This question already has answers here:
Replace non-ASCII characters with a single space
(12 answers)
Closed 6 years ago.
I am pulling tweets in python using tweepy.
It gives the entire data in type unicode.
Eg: print type(data) gives me <type 'unicode'>
It contains unicode characters in it.
Eg: hello\u2026 im am fine\u2019s
I want to remove all of these unicode characters. Is there any regular expression I can use?
str.replace isn't a viable option as unicode characters can be any values, from smileys to unicode apostrophes.
In [10]: from unicodedata import normalize
In [11]: out_text = normalize('NFKD', input_text).encode('ascii','ignore')
Try this.
Edit
normalize(form, unistr) returns the normal form form for the Unicode string unistr. Valid values for form are 'NFC', 'NFKC', 'NFD', and 'NFKD'. If you want to know more about NFKD, see the reference at the end of this answer.
In [12]: u = unichr(40960) + u'abcd' + unichr(1972)
In [13]: u.encode('utf-8')
Out[13]: '\xea\x80\x80abcd\xde\xb4'
In [14]: u
Out[14]: u'\ua000abcd\u07b4'
In [16]: u.encode('ascii', 'ignore')
Out[16]: 'abcd'
From the above code you will get what encode('ascii','ignore') does.
Ref : https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
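The session above is Python 2 (unichr, u'' literals). A rough Python 3 equivalent might look like the following; the helper name to_ascii is hypothetical, and the final decode step is added so that the result is a str rather than bytes.

```python
from unicodedata import normalize


def to_ascii(text: str) -> str:
    # NFKD decomposes characters where possible (e.g. an accented
    # letter becomes base letter + combining mark); encoding with
    # errors="ignore" then drops everything that is not ASCII.
    return normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


print(to_ascii("caf\u00e9"))
print(to_ascii("\ua000abcd\u07b4"))
```

Note that this silently discards characters with no ASCII decomposition, so it is lossy by design.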

Select last chars of string until whitespace in Python [duplicate]

This question already has answers here:
Python: Cut off the last word of a sentence?
(10 answers)
Closed 8 years ago.
Is there any efficient way to select the last characters of a string until there's a whitespace in Python?
For example I have the following string:
str = 'Hello my name is John'
I want to return 'John'. But if the str was:
str = 'Hello my name is Sally'
I want to return 'Sally'.
Just split the string on whitespace, and get the last element of the array. Or use rsplit() to start splitting from end:
>>> st = 'Hello my name is John'
>>> st.rsplit(' ', 1)
['Hello my name is', 'John']
>>>
>>> st.rsplit(' ', 1)[1]
'John'
The 2nd argument specifies the maximum number of splits to do. Since you just want the last element, we only need to split once.
As specified in comments, you can just pass None as 1st argument, in which case the default delimiter which is whitespace will be used:
>>> st.rsplit(None, 1)[-1]
'John'
Using -1 as index is safe, in case there is no whitespace in your string.
It really depends what you mean by efficient, but the simplest (efficient use of programmer time) way I can think of is:
str.split()[-1]
This fails for empty strings, so you'll want to check that.
I think this is what you want:
str[str.rfind(' ')+1:]
This creates a substring of str starting at the character after the right-most space and running up to the last character.
This works for all strings - empty or otherwise (unless it's not a string object, e.g. a None object would throw an error)
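The approaches above differ mainly on edge cases (empty strings, trailing whitespace). A small sketch that handles both, using a hypothetical helper named last_word:

```python
def last_word(text: str) -> str:
    """Return the last whitespace-delimited word, or '' if there is none."""
    # rsplit(None, 1) splits once from the right on any whitespace
    # and ignores trailing whitespace; it returns [] for an empty string.
    parts = text.rsplit(None, 1)
    return parts[-1] if parts else ""


print(last_word("Hello my name is John"))
print(last_word("John"))
print(repr(last_word("")))

# The rfind() slice returns '' when the string ends with a space.
trailing = "Hello John  "
print(repr(trailing[trailing.rfind(" ") + 1:]))
print(repr(last_word(trailing)))
```

Which behavior is preferable for trailing whitespace depends on the input; the sketch only makes the difference visible.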

str.strip() strange behavior [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 28 days ago.
>>> t1 = "abcd.org.gz"
>>> t1
'abcd.org.gz'
>>> t1.strip("g")
'abcd.org.gz'
>>> t1.strip("gz")
'abcd.org.'
>>> t1.strip(".gz")
'abcd.or'
Why is the 'g' of '.org' gone?
strip(".gz") removes any of the characters ., g and z from the beginning and end of the string.
x.strip(y) will remove all characters that appear in y from the beginning and end of x.
That means
'foo42'.strip('1234567890') == 'foo'
because '4' and '2' both appear in '1234567890'.
Use os.path.splitext if you want to remove the file extension.
>>> import os.path
>>> t1 = "abcd.org.gz"
>>> os.path.splitext(t1)
('abcd.org', '.gz')
In Python 3.9, there are two new string methods, .removeprefix() and .removesuffix(), to remove the beginning or end of a string, respectively. Thankfully, this time the method names make it amply clear what these methods are supposed to do.
>>> import sys
>>> print(sys.version.split()[0])
3.9.0
>>> t1 = "abcd.org.gz"
>>> t1.removesuffix('gz')
'abcd.org.'
>>> t1
'abcd.org.gz'
>>> t1.removesuffix('gz').removesuffix('.gz')
'abcd.org.' # No unexpected effect from last removesuffix call
The argument given to strip is a set of characters to be removed, not a substring. From the docs:
The chars argument is a string specifying the set of characters to be removed.
As far as I know, strip removes from the beginning or end of a string only. If you want to remove from the whole string, use replace.
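To make the contrast concrete, here is a sketch comparing strip() with suffix removal. The helper remove_suffix is hypothetical and mimics str.removesuffix so that the example also runs on Python versions before 3.9.

```python
def remove_suffix(text: str, suffix: str) -> str:
    """Drop suffix from the end of text if present, otherwise return text."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


t1 = "abcd.org.gz"

# strip() treats its argument as a set of characters: '.', 'g', 'z'.
print(t1.strip(".gz"))

# Suffix removal treats its argument as one substring.
print(remove_suffix(t1, ".gz"))

# replace() removes every occurrence, anywhere in the string.
print(t1.replace(".gz", ""))
```

On Python 3.9 or later, t1.removesuffix(".gz") can be used in place of the helper.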
