Python: Do (explicit) string parameters hurt performance?

Suppose a function always receives a parameter s that it does not use:
def someFunc(s):
    # do something _not_ using s, for example
    a = 1
Now consider this call:
someFunc("the unused string")
which passes as its parameter a string that is not built at run time but compiled straight into the binary (I hope that's right).
The question is: when calling someFunc this way, say, several thousand times, the reference to "the unused string" is always passed, but does that slow the program down?
In my naive thinking, I'd say the reference to "the unused string" is 'constant' and available in O(1) when a call to someFunc occurs. So I'd say 'no, that does not hurt performance'.
Same question as before: "Am I right?"
Thanks for any help :-)

The string is passed (by reference) each time, but the overhead is way too tiny to really affect performance unless it's in a super-tight loop.
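Here is a rough sketch you can run to check this yourself. The function names are made up for illustration, the `co_consts` check relies on a CPython implementation detail, and the timings will vary between machines. The only point it makes is that the literal is stored once and each call pushes a reference to it.

```python
import timeit

def some_func(s):
    # Does something that does not use s
    a = 1
    return a

def some_func_no_arg():
    # Same work, but takes no parameter at all
    a = 1
    return a

def call_with_string():
    # The literal is compiled once into this function's code object (co_consts);
    # each call just pushes a reference to that already-existing object
    return some_func("the unused string")

def call_without_arg():
    return some_func_no_arg()

# CPython detail: the literal sits in the caller's constant table
print("the unused string" in call_with_string.__code__.co_consts)   # True

# Rough comparison only: numbers vary by machine and interpreter
for fn in (call_with_string, call_without_arg):
    print(fn.__name__, timeit.timeit(fn, number=200_000))
```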

This is an implementation detail of CPython and may not apply to other Python implementations, but yes: in many cases, in a compiled module, a constant string will reference the same object, which minimizes the overhead.
In general, even if it didn't, you really shouldn't worry about it, as it's probably imperceptibly tiny compared to other things going on.
However, here's a little interesting piece of code:
>>> def somefunc(x):
...     print(id(x))  # prints the memory address (in CPython) of the object x refers to
...
>>>
>>> def test():
...     somefunc("hello")
...
>>> test()
134900896
>>> test()
134900896 # Hooray, as expected, it's the same object id
>>> somefunc("h" + "ello")
134900896 # Whoa, how'd that work?
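What's happening here is that CPython keeps a global table of interned strings, and string constants that look like identifiers, such as "hello", are interned automatically. The expression "h" + "ello" consists only of constants, so the compiler folds it into the single constant "hello" before the code even runs. That is why you get the same object back. Strings concatenated at run time generally do not get this treatment.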
Note that this is an implementation detail, and you should NOT rely on it, as strings from any of: files, sockets, databases, string slicing, regex, or really any C module are not guaranteed to have this property. But it is interesting nonetheless.
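Here is a small sketch, assuming CPython, that contrasts a compile-time constant with a string built at run time. `sys.intern()` is the explicit, documented way to get one shared object for equal strings, so it is a safer thing to rely on than the coincidental sharing in the session above.

```python
import sys

# Built at run time from pieces, so it is not a compile-time constant
runtime = "".join(["hel", "lo"])
literal = "hello"

print(runtime == literal)                           # True: the values are equal
# Whether `runtime is literal` holds is an implementation detail; don't rely on it.
# sys.intern() makes the sharing explicit: equal strings map to one canonical object
print(sys.intern(runtime) is sys.intern(literal))   # True
```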

Related

Basic python question about assignment and changing variable

Extremely basic question that I don't quite get.
If I have this line:
my_string = "how now brown cow"
and I change it like so:
my_string.split()
is it acceptable coding practice to just write it like that to change it,
or should I instead change it like so:
my_string = my_string.split()
Don't both effectively do the same thing?
When would I use one over the other?
How does this ultimately affect my code?
Always try to avoid:
my_string = my_string.split()
Never, ever do something like that. The main problem is that it's going to introduce bugs in the future, especially for another maintainer of the code: the result of the split() operation is not a string anymore, it's a list. Assigning a result of that type to a variable named my_string is bound to cause problems in the end.
The first line doesn't actually change it - it calls the .split() method on the string, but since you're not doing anything with what that function call returns, the results are just discarded.
In the second case, you assign the returned values to my_string. That means your original string is discarded, and my_string now refers to the parts returned by .split().
Both calls to .split() do the same thing, but the lines of your program do something different.
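Here is a quick way to see both points: the original string stays the same, and split() gives back a list. The name `words` is just an illustrative choice for the result.

```python
my_string = "how now brown cow"

# Calling split() on its own computes a list and then throws it away;
# strings are immutable, so my_string itself is untouched
my_string.split()
print(my_string)        # how now brown cow

# Binding the result to a new, descriptive name keeps both values available
words = my_string.split()
print(words)            # ['how', 'now', 'brown', 'cow']
print(type(words))      # <class 'list'>
```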
You would only use the first example if you wanted to know whether a split would cause an error, for example:
try:
    my_string.split()
except AttributeError:  # e.g. my_string is None rather than a str
    print('That was unexpected...')
The second example is the typical use, although you could use the result directly in some other way, for example by passing it to a function:
print(my_string.split())
It's not a bad question though - you'll find that some libraries favour methods that change the contents of the object they are called on, while other libraries favour returning the processed result without touching the original. They are different programming paradigms and programmers can be very divided on the subject.
In most cases, Python itself (and its built-in functions and standard libraries) favours the more functional approach and will return the result of the operation, without changing the original, but there are exceptions.
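Here is a minimal illustration of the two styles using built-ins. `sorted()` returns a new list, while `list.sort()` is one of the exceptions: it works in place and returns None.

```python
nums = [3, 1, 2]

# Functional style: sorted() returns a new list and leaves the original alone
ordered = sorted(nums)
print(nums, ordered)   # [3, 1, 2] [1, 2, 3]

# In-place style: list.sort() mutates the list and returns None
result = nums.sort()
print(nums, result)    # [1, 2, 3] None
```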

What does 'passed by assignment' mean?

Here is my understanding of types and parameter passing in Java and Python:
In Java, there are primitive types and non-primitive types. The former are not objects; the latter are.
In Python, they are all objects.
In Java, arguments are passed by value because:
primitive types are copied and then passed, so they are certainly passed by value. Non-primitive types are passed by reference, but the reference (pointer) is also a value, so they are also passed by value.
In Python, the only difference is that 'primitive types' (for example, numbers) are not copied, but simply taken as objects.
According to the official docs, arguments are passed by assignment. What does 'passed by assignment' mean? Do objects in Java work the same way as in Python? What causes the difference (pass by value in Java versus pass by assignment in Python)?
And is anything in my understanding above wrong?
tl;dr: You're right that Python's semantics are essentially Java's semantics, without any primitive types.
"Passed by assignment" is actually making a different distinction than the one you're asking about.[1] The idea is that argument passing to functions (and other callables) works exactly the same way assignment works.
Consider:
def f(x):
    pass
a = 3
b = a
f(a)
b = a means that the target b, in this case a name in the global namespace, becomes a reference to whatever value a references.
f(a) means that the target x, in this case a name in the local namespace of the frame built to execute f, becomes a reference to whatever value a references.
The semantics are identical. Whenever a value gets assigned to a target (which isn't always a simple name—e.g., think lst[0] = a or spam.eggs = a), it follows the same set of assignment rules—whether it's an assignment statement, a function call, an as clause, or a loop iteration variable, there's just one set of rules.
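Here is a small sketch showing that a plain assignment, a function call, a loop variable and a subscription target all end up bound to the very same object. The names are just for illustration.

```python
def f(x):
    # Inside f, the parameter x is bound to the caller's object,
    # exactly as the assignment "b = a" binds b
    return x

a = [1, 2, 3]

b = a                       # assignment statement
print(b is a)               # True

print(f(a) is a)            # True: argument passing followed the same rule

for item in [a]:            # a loop variable is bound the same way
    print(item is a)        # True

lst = [None]
lst[0] = a                  # so is a subscription target
print(lst[0] is a)          # True
```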
But overall, your intuitive idea that Python is like Java but with only reference types is accurate: You always "pass a reference by value".
Arguing over whether that counts as "pass by reference" or "pass by value" is pointless. Trying to come up with a new unambiguous name for it that nobody will argue about is even more pointless. Liskov coined the term "call by sharing" for CLU back in the 1970s, and if that never caught on, anything someone comes up with today isn't likely to do any better.
You understand the actual semantics, and that's what matters.
And yes, this means there is no copying. In Java, only primitive values are copied, and Python doesn't have primitive values, so nothing is copied.
the only difference is that 'primitive types'(for example, numbers) are not copied, but simply taken as objects
It's much better to see this as "the only difference is that there are no 'primitive types' (not even simple numbers)", just as you said at the start.
It's also worth asking why Python has no primitive types, or why Java does.[2]
Making everything "boxed" can be very slow. Adding 2 + 3 in Python means dereferencing the 2 and 3 objects, getting the native values out of them, adding them together, and wrapping the result up in a new 5 object (or looking it up in a table because you already have an existing 5 object). That's a lot more work than just adding two ints.[3]
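Here is a small CPython-flavoured check that makes the "everything is boxed" point concrete. The exact size that `sys.getsizeof` reports depends on the build, so treat that number as illustrative.

```python
import sys

n = 2
print(type(n), isinstance(n, object))   # <class 'int'> True
# The + operator dispatches to int's addition; calling __add__ directly
# shows that it is an ordinary method on an ordinary object
print((2).__add__(3))                   # 5
# Size of the whole object (header plus digits), not just a raw machine int
print(sys.getsizeof(n))
```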
While a good JIT like HotSpot (or PyPy for Python) can often do those optimizations automatically, sometimes "often" isn't good enough. That's why Java has native types: to let you manually optimize things in those cases.
Python, instead, relies on third-party libraries like Numpy, which let you pay the boxing costs just once for a whole array, instead of once per element. That keeps the language simpler, but at the cost of needing Numpy.[4]
1. As far as I know, "passed by assignment" appears a couple times in the FAQs, but is not actually used in the reference docs or glossary. The reference docs already lean toward intuitive over rigorous, but the FAQ, like the tutorial, goes much further in that direction. So, asking what a term in the FAQ means, beyond the intuitive idea it's trying to get across, may not be a meaningful question in the first place.
2. I'm going to ignore the issue of Java's lack of operator overloading here. There's no reason they couldn't include special language rules for a handful of core classes, even if they didn't let you do the same thing with your own classes—e.g., Go does exactly that for things like range, and people rarely complain.
3. … or even than looping over two arrays of 30-bit digits, which is what Python actually does. The cost of working on unlimited-size "bigints" is tiny compared to the cost of boxing, so Python just always pays that extra, barely-noticeable cost. Python 2 did, like Java, have separate fixed and bigint types, but a couple decades of experience showed that it wasn't getting any performance benefits out of the extra complexity.
4. The implementation of Numpy is of course far from simple. But using it is pretty simple, and a lot more people need to use Numpy than need to write Numpy, so that turns out to be a pretty decent tradeoff.
Similar to passing reference types by value in C#.
Docs: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/passing-reference-type-parameters#passing-reference-types-by-value
Code demo:
# mutable object
l = [9, 8, 7]

def createNewList(l1: list):
    # l1 + [0] creates a new list object; rebinding the local name l1
    # to it does not affect the caller's variable l
    l1 = l1 + [0]

def changeList(l1: list):
    # append() mutates the list in place; l1 and l refer to the same object, so l changes too
    l1.append(0)

print(l)          # [9, 8, 7]
createNewList(l)
print(l)          # [9, 8, 7]
changeList(l)
print(l)          # [9, 8, 7, 0]

# immutable object
num = 9

def changeValue(val: int):
    # int is immutable; assigning to val only rebinds the local name to the object 8,
    # it does not change num
    val = 8

print(num)        # 9
changeValue(num)
print(num)        # 9

Returning a mutable object

I have a question about performance.
If I pass a mutable object to a function and make changes to that object, I know that in Python the value changes and all the pointers will point to the changed value.
But what is the right way to write code?
Should I use the return statement and assign all over again the pointer to the output of the function to show to the one that reads the code that it is being changed?
How much does it hurt performance to assign a pointer to the same memory it is already pointing to?
Thanks!
Should I use the return statement and assign all over again the pointer to the output of the function to show to the one that reads the code that it is being changed?
No, that would make it more confusing. If you’re going to change an object in place, you should make it obvious, and part of making it obvious is not returning the same object. Take existing Python APIs as inspiration:
>>> import random
>>> a = [1, 2, 3]
>>> random.shuffle(a)
random.shuffle didn’t return anything, so the only thing it could have done was shuffle the list in place.
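Here is a sketch of this convention using two hypothetical helpers. One mutates its argument and returns None, like `random.shuffle`; the other leaves its argument alone and returns a new object.

```python
import random

def add_defaults_in_place(config: dict) -> None:
    # Hypothetical helper: mutates its argument and returns None, like random.shuffle
    config.setdefault("debug", False)

def with_defaults(config: dict) -> dict:
    # Hypothetical helper: leaves its argument untouched and returns a new dict
    return {"debug": False, **config}

settings = {"verbose": True}
print(add_defaults_in_place(settings))   # None: the change happened in place
print(settings)                          # {'verbose': True, 'debug': False}

original = {"verbose": True}
updated = with_defaults(original)
print(original)                          # {'verbose': True}
print(updated)                           # {'debug': False, 'verbose': True}

a = [1, 2, 3]
print(random.shuffle(a))                 # None
```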
Performance is irrelevant here.
a) It has no significant effect on performance.
b) If your function takes an object and changes it, this is a side effect, and you should therefore not return that object. That would be confusing and misleading, as it implies that the input is different from the output.
c) If you think you should write code that is useless just to provide information to readers of your code, use comments instead.

Why are mutable strings slower than immutable strings?

Why are mutable strings slower than immutable strings?
EDIT:
import UserString  # Python 2 only; MutableString was removed in Python 3

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s[0] = 'a'

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print(t.timeit())
13.5236170292
import UserString

def test():
    s = UserString.MutableString('Python')
    s = 'abcd'
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print(t.timeit())
6.24725079536
import UserString

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print(t.timeit())
38.6385951042
I think it is obvious why I put s = UserString.MutableString('Python') in the second test.
In a hypothetical language that offers both mutable and immutable, otherwise equivalent, string types (I can't really think of one offhand -- e.g., Python and Java both have immutable strings only, and other ways to make one through mutation which add indirectness and therefore can of course slow things down a bit;-), there's no real reason for any performance difference -- for example, in C++, interchangeably using a std::string or a const std::string I would expect to cause no performance difference (admittedly a compiler might be able to optimize code using the latter better by counting on the immutability, but I don't know any real-world ones that do perform such theoretically possible optimizations;-).
Having immutable strings may and does in fact allow very substantial optimizations in Java and Python. For example, if the strings get hashed, the hash can be cached, and will never have to be recomputed (since the string can't change) -- that's especially important in Python, which uses hashed strings (for look-ups in sets and dictionaries) so lavishly and even "behind the scenes". Fresh copies never need to be made "just in case" the previous one has changed in the meantime -- references to a single copy can always be handed out systematically whenever that string is required. Python also copiously uses "interning" of (some) strings, potentially allowing constant-time comparisons and many other similarly fast operations -- think of it as one more way, a more advanced one to be sure, to take advantage of strings' immutability to cache more of the results of operations often performed on them.
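Here is one observable consequence of "references to a single copy can always be handed out", assuming CPython 3. The `copy` module treats str as immutable and returns the same object, whereas a mutable list has to be duplicated.

```python
import copy

s = "an immutable string"
lst = list(s)

# Since a str can never change, copying it just hands back the same object
print(copy.copy(s) is s)       # True
# A list is mutable, so a copy must be a new, independent object
print(copy.copy(lst) is lst)   # False
```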
That's not to say that a given compiler is going to take advantage of all possible optimizations, of course. For example, when a slice of a string is requested, there is no real need to make a new object and copy the data over -- the new slice might refer to the old one with an offset (and an independently stored length), potentially a great optimization for big strings out of which many slices are taken. Python doesn't do that because, unless particular care is taken in memory management, this might easily result in the "big" string being all kept in memory when only a small slice of it is actually needed -- but it's a tradeoff that a different implementation might definitely choose to perform (with that burden of extra memory management, to be sure -- more complex, harder-to-debug compiler and runtime code for the hypothetical language in question).
I'm just scratching the surface here -- and many of these advantages would be hard to keep if otherwise interchangeable string types could exist in both mutable and immutable versions (which I suspect is why, to the best of my current knowledge at least, C++ compilers actually don't bother with such optimizations, despite being generally very performance-conscious). But by offering only immutable strings as the primitive, fundamental data type (and thus implicitly accepting some disadvantage when you'd really need a mutable one;-), languages such as Java and Python can clearly gain all sorts of advantages -- performance issues being only one group of them (Python's choice to allow only immutable primitive types to be hashable, for example, is not a performance-centered design decision -- it's more about clarity and predictability of behavior for sets and dictionaries!-).
I don't know if they are really a lot slower, but they make reasoning about programs easier a lot of the time, because the state of the object/string can't change. To me, that's the most important property of immutability.
Furthermore, you might assume that immutable strings are faster because they have less state that can change, which might mean lower memory consumption and fewer CPU cycles.
I also found this interesting article while googling which I would like to quote:
knowing that a string is immutable makes it easy to lay it out at construction time -- fixed and unchanging storage requirements
With an immutable string, Python can intern it and refer to it internally by its address in memory. This means that to compare two strings, it only has to compare their addresses in memory (unless one of them isn't interned). Also, keep in mind that not all strings are interned. I've seen examples of constructed strings that are not interned.
With mutable strings, string comparison would involve comparing them character by character and would also require either storing identical strings in different locations (malloc is not free) or adding logic to keep track of how many times a given string is referred to and making a copy for every mutation if there were more than one referrer.
It seems like Python is optimized for string comparison. This makes sense because even string manipulation involves string comparison in most cases, so for most use cases, it's the lowest common denominator.
Another advantage of immutable strings is that it makes it possible for them to be hashable, which is a requirement for using them as dictionary keys. Imagine a scenario where they were mutable:
s = 'a'
d = {s : 1}
s = s + 'b'
d[s] = ?
I suppose Python could keep track of which dicts have which strings as keys and update all of their hash tables when a string was modified, but that's just adding more overhead to dict insertion. It's not too far off the mark to say that you can't do anything in Python without a dict insertion/lookup, so that would be very, very bad. It also adds overhead to string manipulation.
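In real Python, this scenario plays out differently, because `s + 'b'` builds a new string and rebinding `s` leaves the dict alone. The sketch below shows that, and then uses `bytearray` (the built-in mutable byte string) to show that Python simply refuses to hash a mutable sequence.

```python
d = {"a": 1}
key = "a"
key = key + "b"           # builds a new str "ab"; the "a" key stored in d is unaffected
print(d)                  # {'a': 1}
print(key in d)           # False

# bytearray is the built-in mutable counterpart of bytes, and it refuses to be hashed
buf = bytearray(b"a")
raised = False
try:
    d[buf] = 2
except TypeError:
    raised = True
print(raised)             # True

# An immutable bytes snapshot of the same data works fine as a key
d[bytes(buf)] = 2
print(d)                  # {'a': 1, b'a': 2}
```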
The obvious answer to your question is that normal strings are implemented in C, while MutableString is implemented in Python.
Not only does every operation on a mutable string have the overhead of going through one or more Python function calls, but the implementation is essentially a wrapper round an immutable string - when you modify the string it creates a new immutable string and throws the old one away. You can read the source in the UserString.py file in your Python lib directory.
To quote the Python docs:
Note: This UserString class from this module is available for backward compatibility only. If you are writing code that does not need to work with versions of Python earlier than Python 2.2, please consider subclassing directly from the built-in str type instead of using UserString (there is no built-in equivalent to MutableString).
This module defines a class that acts as a wrapper around string objects. It is a useful base class for your own string-like classes, which can inherit from them and override existing methods or add new ones. In this way one can add new behaviors to strings.
It should be noted that these classes are highly inefficient compared to real string or Unicode objects; this is especially the case for MutableString.
(Emphasis added).
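MutableString no longer exists in Python 3, so here is a hypothetical, minimal class in the same spirit as the wrapper this answer describes. It is not the actual UserString source. It shows why every "mutation" costs a Python-level method call plus a full rebuild of the underlying str.

```python
class ToyMutableString:
    """Hypothetical, minimal stand-in for Python 2's UserString.MutableString."""

    def __init__(self, data: str):
        self.data = data            # the real storage is an ordinary immutable str

    def __setitem__(self, index: int, value: str) -> None:
        if index < 0:
            index += len(self.data)
        if not 0 <= index < len(self.data):
            raise IndexError(index)
        # Every "mutation" slices and concatenates in Python code,
        # building a brand-new str and discarding the old one
        self.data = self.data[:index] + value + self.data[index + 1:]

    def __str__(self) -> str:
        return self.data

s = ToyMutableString("Python")
s[0] = "a"
print(s)                    # aython

# A genuinely in-place alternative for editing text is a list of characters
chars = list("Python")
chars[0] = "a"
print("".join(chars))       # aython
```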

Using explicit del in python on local variables

What are the best practices and recommendations for using the explicit del statement in Python? I understand that it is used to remove attributes or dictionary/list elements and so on, but sometimes I see it used on local variables in code like this:
def action(x):
    result = None
    something = produce_something(x)
    if something:
        qux = foo(something)
        result = bar(qux, something)
        del qux
    del something
    return result
Are there any serious reasons for writing code like this?
Edit: consider qux and something to be something "simple" without a __del__ method.
I don't remember when I last used del -- the need for it is rare indeed, and typically limited to such tasks as cleaning up a module's namespace after a needed import or the like.
In particular, it's not true, as another (now-deleted) answer claimed, that "using del is the only way to make sure an object's __del__ method is called",
and it's very important to understand this. To help, let's make a class with a __del__ and check when it is called:
>>> class visdel(object):
...     def __del__(self): print('del', id(self))
...
>>> d = visdel()
>>> a = list()
>>> a.append(d)
>>> del d
>>>
See? del doesn't "make sure" that __del__ gets called: del removes one reference, and only the removal of the last reference causes __del__ to be called. So, also:
>>> a.append(visdel())
>>> a[:]=[1, 2, 3]
del 550864
del 551184
when the last reference does go away (including in ways that don't involve del, such as a slice assignment as in this case, or other rebindings of names and other slots), then __del__ gets called -- whether del was ever involved in reducing the object's references, or not, makes absolutely no difference whatsoever.
So, unless you specifically need to clean up a namespace (typically a module's namespace, but conceivably that of a class or instance) for some specific reason, don't bother with del (it can be occasionally handy for removing an item from a container, but I've found that I'm often using the container's pop method or item or slice assignment even for that!-).
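Here is a deterministic version of the experiment above, as a sketch. It relies on CPython's reference counting, which runs `__del__` as soon as the last reference disappears; other implementations such as PyPy do not guarantee that timing.

```python
events = []

class VisDel:
    def __del__(self):
        events.append("finalized")

d = VisDel()
holder = [d]

del d                 # removes the name d, but holder still refers to the object
print(events)         # []

holder[:] = []        # slice assignment drops the last reference, no del involved
print(events)         # ['finalized'] (immediately, thanks to reference counting)
```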
No.
I'm sure someone will come up with some silly reason to do this, e.g. to make sure someone doesn't accidentally use the variable after it's no longer valid. But probably whoever wrote this code was just confused. You can remove them.
When you are running programs that handle really large amounts of data (in my experience, when the total memory consumption of the program approaches something like 1 GB), deleting some objects:
del largeObject1
del largeObject2
…
can give your program the necessary breathing room to function without running out of memory. This can be the easiest way to modify a given program, in case of a “MemoryError” runtime error.
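Here is a sketch of the effect using the standard `tracemalloc` module, assuming CPython, with a one-million-element list as a hypothetical stand-in for `largeObject1`. It only works because `del` removes the last reference. If anything else still referred to the list, nothing would be freed.

```python
import tracemalloc

tracemalloc.start()
large_object = [0] * 1_000_000   # hypothetical stand-in for a big intermediate result
before, _ = tracemalloc.get_traced_memory()

del large_object                 # the only reference goes away, so CPython frees it now
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"{before - after:,} bytes released")
```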
Actually, I just came across a use for this. If you use locals() to return a dictionary of local variables (useful when parsing things) then del is useful to get rid of a temporary that you don't want to return.
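Here is a sketch of that use with a made-up `parse_point` function that returns its locals as a dict. Deleting `parts` keeps the temporary out of the result, while the parameter `text` is still included.

```python
def parse_point(text: str) -> dict:
    # Hypothetical parser that returns its named results via locals()
    parts = text.split(",")      # temporary we do not want in the result
    x = int(parts[0])
    y = int(parts[1])
    del parts                    # remove the temporary before taking the snapshot
    return locals()

print(parse_point("3,4"))        # {'text': '3,4', 'x': 3, 'y': 4}
```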
