While going through the mutable and immutable topic in depth, I found that the address reported by id() is different when you simply refer to "s" versus when you refer to a slice of it. Why is this so?
s = "hello!"
print(id(s))
print(id(s[0:5]))
s[0:5] creates a new, temporary string. That, of course, has a different ID. One "normal" case would be to assign this expression to a variable; the separate object is ready for that assignment.
Also note that a common way to copy a sequence is with a whole-sequence slice, such as
s_copy = s[:]
It's creating a new string, as Python strings are immutable (they cannot be changed once created).
However, many implementations of Python will reuse the same object if they find they're creating the same string, which makes for a good example. Here's what happens in a CPython 3.9.1 shell (which I suspect is one of the most common at the time of writing):
>>> s = "hello!"
>>> print(id(s))
4512419376
>>> print(id(s[0:5])) # new string from slice of original
4511329712
>>> print(id(s[0:5])) # another new string; the earlier temporary was freed, so its address is reused
4511329712
>>> print(id(s[0:6])) # references original string
4512419376
>>> print(id(s[:])) # trivial slice also refers to original
4512419376
In general you should not rely on this behavior and only consider it an example.
Check for string equality with ==!
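A minimal sketch of that advice, assuming nothing beyond the example above (the names part and literal are only illustrative). The identity result is deliberately left without an expected value, since it depends on the interpreter:

```python
s = "hello!"
part = s[0:5]        # slicing builds a string from the first five characters
literal = "hello"

# Equality compares contents and is what the language guarantees.
print(part == literal)   # True

# Identity only asks whether both names refer to one object. The answer
# depends on the interpreter's caching, so treat it as a curiosity.
print(part is literal)
```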
See also Why does comparing strings using either '==' or 'is' sometimes produce a different result?
Strings are immutable, so slicing cannot modify the old string in place. Instead it creates a new substring object, and that new string has its own address in memory.
From what I read in the documentation,
the is operator is used to check whether two values are located in the same part of memory.
So I compared two empty lists and, as I expected, I got False as a result.
print([] is []) # False
But why is it different for strings?
print('' is '') # True
TL;DR: Differences in optimization.
The difference in behavior between the is operator for strings and lists is due to how Python handles small strings and small lists in memory.
Python has an optimization called "interning" for small strings that are often used, such as empty strings or single-letter strings. When Python encounters such strings, it will store them in a special pool in memory and reuse the same objects for every instance of that string.
On the other hand, lists do not have this optimization, and every time you create a new list, Python will allocate a new memory location for it, even if it's an empty list.
Therefore, when you compare two empty strings using the is operator, they will refer to the same object in memory and return True, while two empty lists will refer to different objects in memory and return False.
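As a rough illustration of the difference, here is a small sketch using sys.intern from the standard library. The strings are built at run time so that the result does not hinge on how literals are compiled; the variable names are only illustrative:

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
a = "".join(["he", "llo"])
b = "".join(["he", "llo"])

print(a == b)                          # True: same contents
print(sys.intern(a) is sys.intern(b))  # True: interning hands back one shared object

# Lists are mutable, so each [] creates a distinct object.
x = []
y = []
print(x == y)   # True: both are empty
print(x is y)   # False: two separate lists
```

Whether a is b holds before interning is an implementation detail, which is why the sketch does not print it.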
a = "haha"
b = "haha"
print a is b # this is True
The above code prints True. I've read that one of the reasons for this is that strings are immutable, so one copy in memory will be enough. But in the case of a tuple:
a = (1, 2, 3)
b = (1, 2, 3)
print a is b # this is False
This will print False despite the fact that tuples are also immutable in python. After doing some more research, I discovered that tuples can contain mutable elements, so I guess it makes sense to have multiple copies of tuples in memory if it's too expensive to figure out whether a tuple contains mutable objects or not. But when I tried it on frozenset
a = frozenset([1,2])
b = frozenset([1,2])
print a is b # False
This will also print false. As far as I know frozenset are themselves immutable and can only contain immutable objects (I tried to create a frozenset which contains a tuple which contains a mutable list but it's not allowed), and that we can use == to check if two frozensets are identical in value, so why does python create two copies of them in memory?
Your sentence "I've read that one of the reasons for this is because strings are immutable, so one copy in memory will be enough." is correct, but it does not hold in every case.
For example, if you do the same with the string
"dgjudfigur89tyur9egjr9ivr89egre8frejf9reimfkldsmgoifsgjurt89igjkmrt0ivmkrt8g,rt89gjtrt"
it won't be the same object (at least on my version of Python).
The same phenomenon can be reproduced with integers, where 256 will be the same object but 257 won't.
It has to do with the way Python caches objects: it keeps "simple" objects around. Each type has its own criteria: for strings, containing only certain characters; for integers, falling within a certain range.
It's because of the way Python bytecode is compiled. When your program is run for the first time, the code is compiled into bytecode. When the compiler sees string (or some integer) literals in the code, it creates one object and uses a reference to that object wherever you typed that literal. But in the case of a tuple it's difficult (in some cases impossible) to determine that the tuples are the same, so it doesn't take the extra time to perform this optimization. It is for this reason that you should not generally use is for comparing objects.
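A small sketch of the integer case, assuming CPython's behaviour of keeping a limited range of small integers preallocated. The values are built with int() at run time so that compile-time sharing of literals plays no part, and the identity line is printed without an expected result because it is an implementation detail:

```python
# Integers built at run time, so no compile-time constant sharing is involved.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("1000000"), int("1000000")

# Equality is guaranteed by the language.
print(small_a == small_b, big_a == big_b)   # True True

# Identity depends on the interpreter's cache, so the two results may differ.
print(small_a is small_b, big_a is big_b)
```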
I come from a C background and am a beginner in Python. I want to know how strings are actually stored in memory in Python.
I did something like
s = "foo"
id(s)     # 140542718184424
id(s[0])  # 140542719027040
id(s[1])  # 140542718832152
id(s[2])  # 140542718832152
I did not understand how each character is stored in memory, why the id of s is not equal to the id of s[0] (as it would be in C), and why the ids of s[1] and s[2] are the same.
Python has no character type. Indexing into a string creates a new string, which (like every other object) promptly vanishes if you don't keep a reference to it. So the id()s in your example can't be compared with each other: an object's id is only unique for as long as the object lives. In particular, id(s[0]) != id(s) because the former is a new (temporary) object, and id(s[1]) == id(s[2]) because after the first expression is evaluated, the first temporary string is destroyed and the second temporary string is allocated in the memory that was just freed. The latter is an implementation detail and a coincidence, and cannot be relied on.
Reasoning about string memory is further complicated by implementation details like small strings (along with integers, some tuples, and more) being interned, so some_str is other_str may be true for equal strings that come from different sources (e.g. from indexing into a string with different indices).
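To make the lifetime point concrete, here is a minimal sketch (the names first and second are illustrative). The first print uses temporaries, so no particular output is claimed for it; the second part keeps both objects alive, which is the only situation in which comparing ids is meaningful:

```python
s = "foo"

# Temporaries: each slice may be freed before the next one is created,
# so the two ids can coincide even though the strings differ.
print(id(s[0:2]), id(s[1:3]))

# Keep references so both objects are alive at the same time.
first = s[0:2]
second = s[1:3]
print(first, second)             # fo oo
print(id(first) == id(second))   # False: two live, distinct objects never share an id
```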
This article is a good reading which explains how strings are stored. Briefly:
When working with empty strings or ASCII strings of one character Python uses string interning. Interned strings act as singletons, that is, if you have two identical strings that are interned, there is only one copy of them in the memory.
Python does not use UTF-8 internally, so that it can provide constant-time access to any position in a string:
s = 'hello world'
s[0]
s[7]
Neither lookup requires scanning the string from the initial char (or, more correctly, the first substring of length 1) to the i-th position.
This is why Python uses the three kinds of internal representations for Unicode strings with 1, 2 or 4 byte(s) per char (Latin-1, UCS-2, UCS-4 encoding) and does not use the space-optimised UTF-8.
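One rough way to observe the three widths, assuming CPython 3.3 or later, is to measure how much sys.getsizeof grows when a string gains one character. The helper name bytes_per_char is made up for this sketch:

```python
import sys

def bytes_per_char(ch):
    """Estimate storage per character by growing a string by one character."""
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)

print(bytes_per_char("a"))           # ASCII, fits the 1-byte representation
print(bytes_per_char("\u0394"))      # Greek capital delta, needs the 2-byte form
print(bytes_per_char("\U0001F600"))  # emoji outside the BMP, needs the 4-byte form
```

On such a CPython build this is expected to print 1, 2 and 4; other implementations are free to store strings differently.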
This is implementation dependent, but some implementations (not only of Python, other languages too) may keep a moderate-size set of constant values around for expected frequent use. In Python's case those might be values like True, None, 'o', 1, 2, etc. This way, when one of those common values is needed, there is no overhead to create it--just refer to the existing value.
While I know that there is the possibility:
>>> a = "abc"
>>> result = a[-1]
>>> a = a[:-1]
Now I also know that strings are immutable and therefore something like this:
>>> a.pop()
c
is not possible.
But is this really the preferred way?
Strings are "immutable" for good reason: it really saves a lot of headaches, more often than you'd think. It also allows Python to be very smart about optimizing their use. If you want to process your string in increments, you can pull out part of it with split() or separate it into two parts using indices:
a = "abc"
a, result = a[:-1], a[-1]
This shows that you're splitting your string in two. If you'll be examining every character of the string, you can iterate over it (in reverse, if you wish):
for result in reversed(a):
    ...
I should add this seems a little contrived: Your string is more likely to have some separator, and then you'll use split:
ans = "foo,blah,etc."
for a in ans.split(","):
    ...
Not only is it the preferred way, it's the only reasonable way. Because strings are immutable, in order to "remove" a char from a string you have to create a new string whenever you want a different string value.
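If the pattern comes up often, it can be wrapped in a small helper. This is only a sketch; pop_last is a hypothetical name, not a built-in, and the caller has to rebind its own variable because the original string cannot change:

```python
def pop_last(text):
    """Return (remaining, last_char) without modifying the original string."""
    if not text:
        raise IndexError("pop from empty string")
    return text[:-1], text[-1]

a = "abc"
a, result = pop_last(a)   # rebinding a is what "removes" the character
print(a, result)          # ab c
```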
You may be wondering why strings are immutable, given that you have to make a whole new string every time you change a character. After all, C strings are just arrays of characters and are thus mutable, and some languages that support strings more cleanly than C allow mutable strings as well. There are two reasons to have immutable strings: security/safety and performance.
Security is probably the most important reason for strings to be immutable. When strings are immutable, you can't pass a string into some library and then have that string change from under your feet when you don't expect it. You may wonder which library would change string parameters, but if you're shipping code to clients you can't control their versions of the standard library, and malicious clients may change out their standard libraries in order to break your program and find out more about its internals. Immutable objects are also easier to reason about, which is really important when you try to prove that your system is secure against particular threats. This ease of reasoning is especially important for thread safety, since immutable objects are automatically thread-safe.
Performance is surprisingly often better for immutable strings. Because a string can never change, the runtime can safely share one object among many references, cache its hash, and intern common values, all without defensive copying. Note that in CPython a slice such as s[1:3] still copies its characters into a new string; it is not a view over the original.
Eric Lippert explains more about the rationale behind the immutability of strings (in C#, not Python) here.
The precise wording of the question makes me think it's impossible.
return to me means you have a function, which you have passed a string as a parameter.
You cannot change this parameter. Assigning to it will only change the value of the parameter within the function, not the passed in string. E.g.
>>> def removeAndReturnLastCharacter(a):
...     c = a[-1]
...     a = a[:-1]
...     return c
...
>>> b = "Hello, Gaukler!"
>>> removeAndReturnLastCharacter(b)
'!'
>>> b # b has not been changed
'Hello, Gaukler!'
Yes, python strings are immutable and any modification will result in creating a new string. This is how it's mostly done.
So, go ahead with it.
I decided to go with a for loop and just avoid the item in question, is it an acceptable alternative?
new = ''
for item in str:
    if item == str[n]:
        continue
    else:
        new += item
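One thing to watch for with that loop: it compares by value, so it skips every character equal to the one at position n, not just that one position. A sketch of the difference, with hypothetical helper names and assuming a non-negative index:

```python
def remove_at(text, n):
    """Return a copy of text without the character at non-negative index n."""
    return text[:n] + text[n + 1:]

def remove_equal_to(text, n):
    """What the loop does: drop every character equal to text[n]."""
    return "".join(ch for ch in text if ch != text[n])

print(remove_at("banana", 1))        # bnana
print(remove_equal_to("banana", 1))  # bnn
```

Slicing around the index also avoids naming a variable str, which would shadow the built-in type.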
Why does "hello" is "hello" produce True in Python?
I read the following here:
If two string literals are equal, they have been put to same
memory location. A string is an immutable entity. No harm can
be done.
So there is one and only one place in memory for every Python string? Sounds pretty strange. What's going on here?
Python (like Java, C, C++, .NET) uses string pooling / interning. The interpreter realises that "hello" is the same as "hello", so it optimizes and uses the same location in memory.
Another goodie: "hell" + "o" is "hello" ==> True
So there is one and only one place in memory for every Python string?
No, only ones the interpreter has decided to optimise, which is a decision based on a policy that isn't part of the language specification and which may change in different CPython versions.
eg. on my install (2.6.2 Linux):
>>> 'X'*10 is 'X'*10
True
>>> 'X'*30 is 'X'*30
False
similarly for ints:
>>> 2**8 is 2**8
True
>>> 2**9 is 2**9
False
So don't rely on 'string' is 'string': even just looking at the C implementation it isn't safe.
Literal strings are probably grouped based on their hash or something similar. Two of the same literal strings will be stored in the same memory, and any references both refer to that.
Memory        Code
-------
|             myLine = "hello"
|           /
|hello    <
|           \
|             myLine = "hello"
-------
The is operator returns true if both arguments are the same object. Your result is a consequence of this, and the quoted bit.
In the case of string literals, these are interned, meaning they are compared to known strings. If an identical string is already known, the literal takes that value, instead of an alternative one. Thus, they become the same object, and the expression is true.
The Python interpreter/compiler parses the string literals, i.e. the quoted list of characters. When it does this, it can detect "I've seen this string before", and use the same representation as last time. It can do this since it knows that strings defined in this way cannot be changed.
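The "I've seen this string before" step can be observed with the built-in compile() and the co_consts attribute of the resulting code object. This is a sketch of CPython's behaviour, not something the language guarantees; the source text and the "<demo>" filename are made up for illustration:

```python
source = 'a = "hello"\nb = "hello"\n'
code = compile(source, "<demo>", "exec")

# The constants table of the compiled module holds the literal only once.
print(code.co_consts.count("hello"))

namespace = {}
exec(code, namespace)

# Both names were bound from the same constant, so they share one object.
print(namespace["a"] is namespace["b"])
```

On CPython this is expected to print 1 and True, since both assignments load the same entry from the constants table.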
Why is it strange? If the string is immutable, it makes a lot of sense to store it only once. .NET has the same behavior.
I think if any two variables (not just strings) contain the same value, the value will be stored only once not twice and both the variables will point to the same location. This saves memory.