Internal working of strings with null characters

I just tried replacing a character in a Python string with a null ('\0') character and with an empty string (''). Some weird things are happening. Can someone please explain why all this is happening?
>>> a = "SampleText"
>>> a
'SampleText'
>>> a.replace('a','\0')
'S\x00mpleText'
>>> len(a)
10
>>> a.replace('\0','a')
'SampleText'
>>> len(a)
10
>>> a.replace('a','')
'SmpleText'
>>> len(a)
10
>>> a.replace('','a')
'aSaaamapalaeaTaeaxata'
>>> len(a)
10

The replace function returns a new string, so you need to assign the result to a variable again. If you write a = a.replace('a','\0'), it will work as you expect.
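To make this concrete, here is a small sketch (the names replaced, removed and padded are only illustrative) showing that replace() leaves the original string untouched, that '\0' is a real one-character string while '' is the empty string, and why replacing '' inserts the replacement around every character.

```python
# Strings are immutable: replace() builds a new string and leaves `a` alone.
a = "SampleText"
replaced = a.replace("a", "\0")  # "\0" is one real character (NUL), not "nothing"

print(repr(a))         # the original is unchanged
print(repr(replaced))  # 'S\x00mpleText'
print(len(replaced))   # 10: the NUL character counts like any other

# '' is the empty string, not a null character. Removing 'a' shortens the result.
removed = a.replace("a", "")
print(removed, len(removed))  # SmpleText 9

# The empty string matches before each character and once at the end,
# so replace('', 'a') inserts 'a' at all 11 of those positions.
padded = a.replace("", "a")
print(padded, len(padded))    # aSaaamapalaeaTaeaxata 21
```

In the session from the question, len(a) is always 10 because a itself is never reassigned.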

Related

Convert bytes -> string -> back to bytes, and get original value

I checked all the Stack Overflow questions on this matter and none of them answers my problem. I need to convert \\ to \.
Edited:
This is what I am trying:
>>> a = b'\xe5jb\x8c?Q$\xf3\x1d\x97^\xfa3O\xa6U.txt'
>>> b = str(a)
>>> b
"b'\\xe5jb\\x8c?Q$\\xf3\\x1d\\x97^\\xfa3O\\xa6U.txt'"
>>> b = b.replace('b\'','')
>>> b = b[:len(b)-1]
>>> b
'\\xe5jb\\x8c?Q$\\xf3\\x1d\\x97^\\xfa3O\\xa6U.txt'
>>> c = bytes(b,'utf8')
>>> c
b'\\xe5jb\\x8c?Q$\\xf3\\x1d\\x97^\\xfa3O\\xa6U.txt'
>>> a == c
False
How do I make a==c True? I tried
.replace("\\\\","\\")
but this doesn't help. The string remains the same. I need to store the bytes in variable 'a' to a file as text and read them back. Python 3.8, Windows 10.

You can convert c to a string with the decode method, and then use ast.literal_eval to evaluate it as a bytes literal after wrapping it with b'...':
from ast import literal_eval
a = b'\xe5jb\x8c?Q$\xf3\x1d\x97^\xfa3O\xa6U.txt'
c = b'\\xe5jb\\x8c?Q$\\xf3\\x1d\\x97^\\xfa3O\\xa6U.txt'
c = literal_eval("b'%s'" % c.decode())
print(a == c)
This outputs: True
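If the real goal is only to store the bytes as text and read them back, two other routes may be simpler. This is a sketch, not the answerer's method: the first option parses the unmodified repr text with literal_eval, the second stores a hex string using bytes.hex() and bytes.fromhex(). The variable names are illustrative.

```python
from ast import literal_eval

a = b'\xe5jb\x8c?Q$\xf3\x1d\x97^\xfa3O\xa6U.txt'

# Option 1: keep the repr text as it is and parse it back as a bytes literal.
as_text = repr(a)                  # the same text that str(a) produces
restored = literal_eval(as_text)

# Option 2: store a hex string, which is plain ASCII and has no escapes to undo.
as_hex = a.hex()
restored_from_hex = bytes.fromhex(as_hex)

print(restored == a)
print(restored_from_hex == a)
```

Both as_text and as_hex are ordinary str objects, so either can be written to a text file and read back later.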

Use the .replace() function on the string.

counting the number of vowels in a word in a string

I am a beginner, and I am trying to find out the number of vowels in each word in a string. So for instance, if I had "Hello there WORLD", I want to get an output of [2, 2, 1].
Oh, and I am using Python.
I have this so far
[S.count(x) in (S.split()) if x is 'AEIOUaeiou']
where S="Hello there WORLD"
but it keeps saying error. Any hints?

x is 'AEIOUaeiou'
This tests whether x is precisely the same object as 'AEIOUaeiou'. This is almost never what you want when you compare objects. e.g. the following could be False:
>>> a = 'Nikki'
>>> b = 'Nikki'
>>> a is b
False
Although, it may be True as sometimes Python will optimise identical strings to actually use the same object.
>>> a == b
True
This will always be True as the values are compared rather than the identity of the objects.
What you probably want is:
x in 'AEIOUaeiou'

Obviously, S in S.count and S in S.split cannot be the same S. I suggest using more semantic names.
>>> phrase = 'Hello there WORLD'
>>> [sum(letter.casefold() in 'aeiou' for letter in word) for word in phrase.split()]
[2, 2, 1]
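The expression in the question is also a syntax error on its own, because a list comprehension needs a for clause. As a sketch, the same count can be wrapped in a small function; vowels_per_word and VOWELS are hypothetical names chosen for illustration, and only a, e, i, o, u are treated as vowels.

```python
VOWELS = set("aeiou")

def vowels_per_word(phrase):
    """Return a list with the number of vowels in each whitespace-separated word."""
    return [sum(ch in VOWELS for ch in word.lower()) for word in phrase.split()]

print(vowels_per_word("Hello there WORLD"))  # [2, 2, 1]
```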

Why does my IDE suggest to rewrite != 0 to is not 0

By default, my Python IDE PyCharm suggests changing the following line of Python:
if variable != 0:
to
if variable is not 0:
Why does it suggest this? Does it matter at all for the execution (i.e. does this behave differently in any edge cases)?

It's a bug. You should not test integers by identity. Although it may work for small integers, that is just an implementation detail.
If you were checking variable is False, that would be OK. Perhaps the IDE is tripped up by the semantics.

The != operator checks for inequality of value. The is operator checks for identity. CPython caches small integers, so for a plain int the two expressions happen to give the same result for 0, but that is an implementation detail, and they differ for other types: 0.0 != 0 is False while 0.0 is not 0 is True. The is not 0 form reads more like English, which is probably why the IDE is suggesting it (although I wouldn't accept the recommendation).
I did try some analysis. I dumped the bytecode for both expressions, and the only difference is the comparison operand: one has COMPARE_OP 3 (!=) and the other has COMPARE_OP 9 (is not). Everything else is the same. I then tried some performance runs and found that the time taken is negligibly higher for !=.
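A short sketch of the edge cases where the two tests disagree. ZERO and compare_with_zero are hypothetical names; the comparison goes through a name rather than a literal so that it also runs without a SyntaxWarning on newer Python versions.

```python
ZERO = 0  # compare against a name rather than a literal

def compare_with_zero(value):
    """Return (value != 0, value is not 0) for the given value."""
    return (value != ZERO, value is not ZERO)

for value in (0, 0.0, False, 1):
    print(repr(value), compare_with_zero(value))
```

For 0.0 and False the value test says "equal to zero" while the identity test says "not the same object", so swapping one form for the other can change which branch runs.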

is not should be preferred only if you are matching an object's identity, not its equality.
See these examples:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]  # both are equal
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a != b:
...     print('hello')  # prints nothing because both have the same value
...
>>> a = 100000
>>> b = 100000
>>> a is b
False
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a != b:
...     print('something')  # prints nothing as != compares value, not identity
...
But if the values stored in a and b are small integers or short strings, a is not b may be False even though the values were created separately, because Python caches such values and both names point to the same object.
>>> a=2
>>> b=2
>>> a is b
True
>>> a='wow'
>>> b='wow'
>>> a is b
True
>>> a=9999999
>>> b=9999999
>>> a is b
False

The operator "is not" checks for object identity and the operator != checks for object equality. I do not think you should do this in your case, but maybe your IDE suggests it for the general case?

function print in python shell

Can anyone explain the difference in the Python shell between outputting a variable with "print" and just typing the variable name?
>>> a = 5
>>> a
5
>>> print a
5
>>> b = 'some text'
>>> b
'some text'
>>> print b
some text
When I do this with text I understand the difference, but with an int or float I don't see one.

Just entering an expression (such as a variable name) will actually output the representation of the result as returned by the repr() function, whereas print will convert the result to a string using the str() function.
>>> s = "abc"
Printing repr() will give the same result as entering the expression directly:
>>> "abc"
'abc'
>>> print repr("abc")
'abc'

The Python shell always echoes the value of the last expression evaluated. When a is 5, it evaluates to 5, so that is what you see. When you call print, it outputs the value (without quotes) and returns nothing, so nothing else is shown after print is done. Thus, evaluating b results in 'some text' and printing it just results in some text.
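A small sketch of why numbers look the same either way while strings do not. It is written for Python 3, where print is a function (the transcripts above use the Python 2 print statement), and the names values and views are illustrative.

```python
values = [5, 2.5, "some text", "line one\nline two"]

# The interactive shell echoes repr(value); print() uses str(value).
views = [(repr(value), str(value)) for value in values]

for shell_form, printed_form in views:
    print("shell:", shell_form)
    print("print:", printed_form)
```

For int and float the two forms are the same text, so there is nothing to notice. For strings, repr adds quotes and shows escapes such as \n literally.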

Strange Python behavior from inappropriate usage of 'is not' comparison?

I (incorrectly?) used 'is not' in a comparison and found this curious behavior:
>>> a = 256
>>> b = int('256')
>>> c = 300
>>> d = int('300')
>>>
>>> a is not b
False
>>> c is not d
True
Obviously I should have used:
>>> a != b
False
>>> c != d
False
But it worked for a long time due to small-valued test cases, until I happened to use a number of 495.
If this is invalid syntax, then why? And shouldn't I at least get a warning?

"is" is not a check of equality of value, but a check that two variables point to the same instance of an object.
Ints and strings are confusing here, because is and == can happen to give the same result due to how the internals of the language work.

For small numbers, Python is reusing the object instances, but for larger numbers, it creates new instances for them.
See this:
>>> a=256
>>> b=int('256')
>>> c=300
>>> d=int('300')
>>> id(a)
158013588
>>> id(b)
158013588
>>> id(c)
158151472
>>> id(d)
158151436
which is exactly why a is b, but c isn't d.

Don't use is [not] to compare integers; use == and != instead. Even though is works in current CPython for small numbers due to an optimization, it's unreliable and semantically wrong. The syntax itself is valid, but the benefits of a warning (which would have to be checked on every use of is and could be problematic with subclasses of int) are presumably not worth the trouble.
This is covered elsewhere on SO, but I didn't find it just now.
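As a side note on the warning question: since Python 3.8 the compiler does emit a SyntaxWarning when is or is not is used with a literal, but not when both operands are names, so the code in the question would still compile silently. The sketch below assumes Python 3.8 or newer; warnings_for is a hypothetical helper.

```python
import warnings

def warnings_for(source):
    """Compile `source` and return the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [w.category for w in caught]

print(warnings_for("x is not 0"))  # literal operand
print(warnings_for("x is not y"))  # two names
```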

An int is an object in Python, and CPython caches the small integers in the range [-5, 256] by default, so any two ints with the same value in that range are the same object.
a = 256
b = 256
a is b # True
If you create two integers outside [-5, 256], for example on separate lines in the interactive shell, Python will usually create two distinct objects (though they have the same value).
a = 257
b = 257
a is b # False
In your case, using != instead to compare the value is the right way.
a = 257
b = 257
a != b # False
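A sketch that separates the two questions, value and identity, for the numbers from the question. The integers are built with int() at run time so the compiler cannot merge equal constants. The identity results are deliberately not annotated, since they depend on the interpreter's cache; describe is a hypothetical helper.

```python
def describe(x, y):
    """Return (equal by value, same object) for two values."""
    return (x == y, x is y)

# Build the integers at run time so equal constants are not merged at compile time.
small = describe(int("256"), int("256"))
large = describe(int("300"), int("300"))

print("256:", small)  # second element depends on the interpreter's cache
print("300:", large)
```

The first element of each tuple is always True, which is the only part a program should rely on.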

To understand why this occurs, take a look at the small_ints array (Python-2.6.5/Objects/intobject.c:78) and the _PyInt_Init function (Python-2.6.5/Objects/intobject.c:1292) in the Python sources.
A similar thing occurs with lists:
>>> a = [12]
>>> id_a = id(a)
>>> del(a)
>>> id([1,2,34]) == id_a
True
>>>
The memory of a deleted list can be reused for a new one, which is why the ids match here.
