What happens internally when concatenating two lists in Python?


When concatenating two lists,
a = [0......, 10000000]
b = [0......, 10000000]
a = a + b
does the Python runtime allocate a bigger array and loop through both arrays and put the elements of a and b into the bigger array?
Or does it loop through the elements of b and append them to a and resize as necessary?
I am interested in the CPython implementation.

In CPython, two lists are concatenated by the function list_concat in Objects/listobject.c.
The function first allocates a new list with enough room for the items of both lists.
size = Py_SIZE(a) + Py_SIZE(b);
np = (PyListObject *) list_new_prealloc(size);
Then it copies the items from both lists to the new list.
for (i = 0; i < Py_SIZE(a); i++) {
...
}
...
for (i = 0; i < Py_SIZE(b); i++) {
...
}
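The two steps above can be modelled in pure Python. This is only a rough sketch: the name concat_model is made up for illustration, and the real function works in C on arrays of object pointers rather than on Python lists.

```python
def concat_model(a, b):
    """Pure-Python model of what list_concat does (illustrative only)."""
    size = len(a) + len(b)
    result = [None] * size          # allocate the full result once
    for i in range(len(a)):         # copy the references held by a
        result[i] = a[i]
    for i in range(len(b)):         # then the references held by b
        result[len(a) + i] = b[i]
    return result

a = [1, 2, 3]
b = [4, 5, 6]
c = concat_model(a, b)
print(c == a + b)   # True
print(c is a)       # False: the result is a new list
```

Note that a is never resized; the result is sized correctly up front, so no reallocation is needed while copying.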

You can find out by looking at the id of a before and after concatenating b:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025874463112
>>> a = a + b
>>> id(a)
140025874467144
Here, since the id is different, we see that the interpreter has created a new list and bound it to the name a. The old list is freed once nothing else refers to it (in CPython, as soon as its reference count drops to zero).
However, the behaviour can be different when using the augmented assignment operator +=:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025844068296
>>> a += b
>>> id(a)
140025844068296
Here, since the id is the same, we see that the interpreter has reused the same list object a and appended the values of b to it.
For more detailed information, see these questions:
Why does += behave unexpectedly on lists?
Does list concatenation with the `+` operator always return a new `list` instance?
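The practical consequence of the difference between + and += shows up when a second name refers to the same list. A minimal sketch, where alias is just an illustrative name:

```python
a = [1, 2, 3]
alias = a                        # second name for the same list

a += [4]                         # extends the existing list in place
print(alias)                     # [1, 2, 3, 4]
same_after_iadd = alias is a     # True

a = a + [5]                      # builds a new list and rebinds a
print(alias)                     # [1, 2, 3, 4]
print(a)                         # [1, 2, 3, 4, 5]
same_after_add = alias is a      # False
```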

You can see the implementation in listobject.c::list_concat. Python will get the size of a and b and create a new list object of that size. It will then loop through the values of a and b, which are C pointers to python objects, increment their ref counts and add those pointers to the new list.
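The reference-count increment can be observed from Python with sys.getrefcount. This is a sketch that assumes a standard CPython build; the exact numbers are an implementation detail.

```python
import sys

item = object()
a = [item]
b = [item]

before = sys.getrefcount(item)
c = a + b                      # c stores two more references to item
after = sys.getrefcount(item)

print(after - before)          # 2 on a standard CPython build
print(c[0] is item and c[1] is item)   # True: pointers are copied, not objects
```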

It will create a new list with a shallow copy of the items in the first list, followed by a shallow copy of the items in the second list. The + operator calls the object.__add__(self, other) method. For example, for the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called. You can read more in the documentation.
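To illustrate how + dispatches to __add__, here is a small sketch with a made-up class, Path2D, that wraps a list. The class and its attribute names are hypothetical and exist only for this example.

```python
class Path2D:
    """Hypothetical class used only to show how + dispatches to __add__."""

    def __init__(self, points):
        self.points = list(points)

    def __add__(self, other):
        if not isinstance(other, Path2D):
            return NotImplemented
        # Like list concatenation, build a new object; operands are untouched.
        return Path2D(self.points + other.points)

x = Path2D([(0, 0)])
y = Path2D([(1, 1)])
z = x + y                               # same as x.__add__(y)
print(z.points)                         # [(0, 0), (1, 1)]
print(z.points[0] is x.points[0])       # True: items are shared, not copied
```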

Related

Having problem with Shared Reference in case of Lists [duplicate]

This question already has answers here:
Is there a difference between "==" and "is"?
(13 answers)
Closed 2 years ago.
a=b=[1,2,3]
print (a is b) #True
But
a=[1,2,3]
print (a is [1,2,3]) #False
Why does the second part print False ?

Multiple assignment in Python creates two names that point to the same object. For example,
>>> a=b=[1,2,3]
>>> a[0] = 10
>>> b
[10, 2, 3]
is can be used to check whether two names (a and b) hold the reference to the same memory location (object). Therefore,
a=b=[1,2,3] # a and b hold the same reference
print (a is b) # True
Now in this example,
a = [1,2,3]
print (a is [1,2,3]) # False
the [1,2,3] inside the comparison creates a second list object, so a does not refer to it, even though a and [1, 2, 3] are lists with identical elements.
In case you want to compare whether two lists contain the same elements, you can use ==:
>>> a=b=[1, 2, 3]
>>> a == b
True
>>>
>>> a = [1, 2, 3]
>>> a == [1, 2, 3]
True

Your first one explicitly makes a and b references to the object created by the list display [1,2,3].
In your second code, both uses of the list display [1,2,3] necessarily create new list objects, because lists are mutable and you don't want to implicitly share references to them.
Consider a simpler example:
a = []
b = []
a.append(1)
Do you want b to be modified as well?
For immutable values, like ints, the language implementation may cause literals to reuse references to existing objects, but it's not something that can be relied on.
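A short sketch of the point about list displays: every evaluation builds a fresh list. The function name make_list is made up for illustration.

```python
def make_list():
    # Each call evaluates the list display again and builds a fresh list.
    return [1, 2, 3]

first = make_list()
second = make_list()
print(first == second)   # True: same contents
print(first is second)   # False: two separate objects

first.append(4)
print(second)            # [1, 2, 3]: unaffected by the change to first
```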

The problem is the operator you are using.
With is you are asking whether the two operands are the very same object, not whether they are equal (hold the same data).
In your second snippet, a refers to one list object and the [1,2,3] on the right creates another, so even though they are equal they are not the same object.
Why you get these results
When you write a=b=[1,2,3], a and b are bound to the same list, so they are identical to each other. In the second snippet, the [1,2,3] inside the comparison is a new list that merely has equal contents.
In summary
== - equal to (same value).
is - identical to (same object).
So if you want to check whether they are equal, use:
>>> a=[1,2,3]
>>> print (a == [1,2,3])
True
Similar question worth reading:
Is there a difference between "==" and "is"?
Hope this helps, Harry.

How to create a new data object in memory, rather than pointing to one? (in Python 3)

As an illustration of my question, say I want to swap two elements in an array:
# Array Integer Integer -> Array
# I want to swap the values at locations i1 and i2.
# I want to return the array with values swapped.
def swap(A, i1, i2):
    newA = A
    newA[i1] = A[i2]
    newA[i2] = A[i1]
    return newA
Run this code, and an array is returned with only one value changed:
> testArray = [1, 2, 3, 4]
> swap(testArray, 0, 1)
[2, 2, 3, 4]
Also, if I now check what testArray is (I want it to still be [1, 2, 3, 4]):
> testArray
[2, 2, 3, 4]
So my questions are:
I guess newA = A uses a pointer to A. I'm used to programming in a style where I return a new data structure each time. I'd like to create a whole new array, newA, which just has the same values as A. Then I can let garbage collection take care of newA later. Can I do this in python?
What is newA = A really doing?
Why would someone create a new variable (like newA) to point to the old one (A)? Why wouldn't they just mutate A directly?
And why does the syntax behave differently for atomic data?
i.e.
a = 1
b = a # this same syntax doesn't seem to be a pointer.
b = 2
> a
1

If it is a list of integers, you can do:
def swap(A, i1, i2):
    temp = A[i1]
    A[i1] = A[i2]
    A[i2] = temp
    return A
or, more Pythonically:
def swap(A, i1, i2):
    A[i1], A[i2] = A[i2], A[i1]
    return A
-
newA = A
This creates an "alias": both variables refer to the same list in memory. When you change a value through A, you see the change through newA too.
see visualization on PythonTutor.com (it is long link with Python code)
http://pythontutor.com/visualize.html#code=A+%3D+%5B1,+2,+3,+4%5D%0A%0AnewA+%3D+A&mode=display&origin=opt-frontend.js&cumulative=false&heapPrimitives=false&textReferences=false&py=2&rawInputLstJSON=%5B%5D&curInstr=2
-
To create copy you can use slicing
newA = A[:] # python 2 & 3
or
import copy
newA = copy.copy(A)
newA = copy.deepcopy(A)
or on Python 3
newA = A.copy()
-
In Python every variable holds a reference to an object, for integers and floats as well as for lists; assignment never copies the object itself. Passing a reference (to a function or class) is also cheaper than cloning all the data. The difference you see is that integers are immutable, so b = 2 can only rebind b, whereas a list can be changed in place through any name that refers to it.
a = 1
b = a # b refers to the same int object as a
a = [1,2,3] # a now refers to a list
b = a # b refers to the same list; no data is copied
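Putting the copy step together with the swap, here is a minimal sketch of a version that returns a new list and leaves its argument untouched, which is what the question asked for. The name swapped is made up for this example.

```python
def swapped(A, i1, i2):
    """Return a new list with the items at i1 and i2 exchanged."""
    new_A = A.copy()                             # shallow copy; A is not modified
    new_A[i1], new_A[i2] = new_A[i2], new_A[i1]
    return new_A

test_array = [1, 2, 3, 4]
result = swapped(test_array, 0, 1)
print(result)       # [2, 1, 3, 4]
print(test_array)   # [1, 2, 3, 4]
```

The copy is shallow, which is enough here because only the top-level slots are reassigned.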

How to create a new list?
Some ways and timings for large and small lists:
>>> from timeit import timeit
>>> for way in 'A[:]', 'list(A)', 'A.copy()':
...     print(way, timeit(way, 'A = list(range(100000))', number=10000))
A[:] 7.3193273699369
list(A) 7.248674272188737
A.copy() 7.366528860679182
>>> for way in 'A[:]', 'list(A)', 'A.copy()':
...     print(way, timeit(way, 'A = list(range(10))', number=10000000))
A[:] 4.324301856050852
list(A) 7.022488782549999
A.copy() 4.61609732160332
What is newA = A really doing?
Makes variable newA reference the same object A references.
Why would someone create a new variable (like newA) to point to the old one (A)? Why wouldn't they just mutate A directly?
Just an example:
if <something>:
    now = A
    ...
else:
    now = B
    ...
<modify now>
And why does the syntax behave differently for atomic data?
It doesn't. It does make the new variable reference the same object, also for ints. You just don't notice it, because ints can't be changed. But you can see it by testing with is:
>>> a = 1234
>>> b = a
>>> b is a
True <== See?
>>> b = 1234
>>> b is a
False <== And now it's a different 1234 object

Python: those variable dont point to the same values. Why?

I thought that if you assign a list to another variable, it's not copied; both names point to the same location. That's what deepcopy() is for. This doesn't seem to be true with Python 2.7: it looks like it's copied.
>>> a=[1,2,3]
>>> b=a
>>> b=b[1:]+b[:1]
>>> b
[2, 3, 1]
>>> a
[1, 2, 3]
>>>
>>> a=(1,2,3)
>>> b=a
>>> b=b[1:]+b[:1]
>>> a
(1, 2, 3)
>>> b
(2, 3, 1)
>>>
What am I missing?

This line changes what b points to:
b=b[1:]+b[:1]
List or tuple addition creates a new list or tuple, and the assignment operator makes b refer to that new list while leaving a referring to the original list or tuple.
Slicing a list or tuple also creates a new object, so that line creates three new objects - one for each slice, and then one for the sum. b = a + b would be a simpler example to demonstrate that addition creates a new object.
You will sometimes see c = b[:] as a way to shallow copy a list, making use of the fact that slicing creates a new object.
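For contrast, here is a small sketch of rebinding versus mutating in place. Slice assignment (b[:] = ...) changes the existing list, so the other name sees the rotation. The numbered variable names are only for illustration.

```python
# Rebinding: b1 ends up referring to a brand-new list.
a1 = [1, 2, 3]
b1 = a1
b1 = b1[1:] + b1[:1]
print(a1, b1)          # [1, 2, 3] [2, 3, 1]

# Mutating: slice assignment replaces the contents of the existing list.
a2 = [1, 2, 3]
b2 = a2
b2[:] = b2[1:] + b2[:1]
print(a2, b2)          # [2, 3, 1] [2, 3, 1]
print(a2 is b2)        # True
```

Slice assignment is only available for lists; tuples are immutable, so the tuple case can only rebind.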

When you do b=b[1:]+b[:1], you first create a new object from the two slices of b and then rebind b to that object. The same applies to both the list and the tuple case.

Understanding python's name binding

I am trying to clarify for myself Python's rules for 'assigning' values
to variables.
Is the following comparison between Python and C++ valid?
In C/C++ the statement int a=7 means, memory is allocated for an integer variable called a (the quantity on the LEFT of the = sign)
and only then the value 7 is stored in it.
In Python the statement a=7 means, a nameless integer object with value 7 (the quantity on the RIGHT side of the =) is created first and stored somewhere in memory. Then the name a is bound to this object.
The output of the following C++ and Python programs seem to bear this out, but I would like some feedback whether I am right.
C++ produces different memory locations for a and b
while a and b seem to refer to the same location in Python
(going by the output of the id() function)
C++ code
#include<iostream>
using namespace std;
int main(void)
{
int a = 7;
int b = a;
cout << &a << " " << &b << endl; // a and b point to different locations in memory
return 0;
}
Output: 0x7ffff843ecb8 0x7ffff843ecbc
Python: code
a = 7
b = a
print id(a), ' ' , id(b) # a and b seem to refer to the same location
Output: 23093448 23093448

Yes, you're basically correct. In Python, a variable name can be thought of as a binding to a value. This is one of those "a ha" moments people tend to experience when they truly start to grok (deeply understand) Python.
Assigning to a variable name in Python makes the name bind to a different value from what it currently was bound to (if indeed it was already bound), rather than changing the value it currently binds to:
a = 7 # Create 7, bind a to it.
# a -> 7
b = a # Bind b to the thing a is currently bound to.
# a
# \
# *-> 7
# /
# b
a = 42 # Create 42, bind a to it, b still bound to 7.
# a -> 42
# b -> 7
I say "create" but that's not necessarily so - if a value already exists somewhere, it may be re-used.
Where the underlying data is immutable (cannot be changed), that usually makes Python look as if it's behaving identically to the way other languages do (C and C++ come to mind). That's because the 7 (the actual object that the names are bound to) cannot be changed.
But, for mutable data (same as using pointers in C or references in C++), people can sometimes be surprised because they don't realise that the value behind it is shared:
>>> a = [1,2,3] # a -> [1,2,3]
>>> print(a)
[1, 2, 3]
>>> b = a # a,b -> [1,2,3]
>>> print(b)
[1, 2, 3]
>>> a[1] = 42 # a,b -> [1,42,3]
>>> print(a) ; print(b)
[1, 42, 3]
[1, 42, 3]
You need to understand that a[1] = 42 is different to a = [1, 42, 3]. The latter is an assignment, which would result in a being re-bound to a different object, and therefore independent of b.
The former is simply changing the mutable data that both a and b are bound to, which is why it affects both.
There are ways to get independent copies of a mutable value, with things such as:
b = a[:]
b = [item for item in a]
b = list(a)
These will work to one level (b = a can be thought of as working to zero levels) meaning if the a list contains other mutable things, those will still be shared between a and b:
>>> a = [1, [2, 3, 4], 5]
>>> b = a[:]
>>> a[0] = 8 # This is independent.
>>> a[1][1] = 9 # This is still shared.
>>> print(a) ; print(b) # Shared bit will 'leak' between a and b.
[8, [2, 9, 4], 5]
[1, [2, 9, 4], 5]
For a truly independent copy, you can use deepcopy, which will work down to as many levels as needed to separate the two objects.
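A minimal sketch of the same experiment with copy.deepcopy from the standard library, showing that the nested list is no longer shared:

```python
import copy

a = [1, [2, 3, 4], 5]
b = copy.deepcopy(a)   # copies the nested list as well

a[0] = 8               # independent, as before
a[1][1] = 9            # now independent too
print(a)               # [8, [2, 9, 4], 5]
print(b)               # [1, [2, 3, 4], 5]
```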

In your C++ example, int is a built-in type, so a and b are separate variables, each with its own storage, and b = a copies the value. In Python, = never copies an object; it only binds a name to an object, so several names can refer to the same object. Python's object model is similar to Java's in this respect: variables hold references to objects rather than the objects themselves.
You can also try this:
a = 7
b = 7
print id(a), ' ' , id(b)
It prints the same id twice, because CPython caches small integers (currently -5 through 256) and reuses the same object for both names. This is an implementation detail and should not be relied on.

How to pass a list element as reference?

I am passing a single element of a list to a function. I want to modify that element, and therefore, the list itself.
def ModList(element):
    element = 'TWO'
l = list();
l.append('one')
l.append('two')
l.append('three')
print l
ModList(l[1])
print l
But this method does not modify the list. It's like the element is passed by value. The output is:
['one','two','three']
['one','two','three']
I want that the second element of the list after the function call to be 'TWO':
['one','TWO','three']
Is this possible?

The explanations already here are correct. However, since I have wanted to abuse python in a similar fashion, I will submit this method as a workaround.
Indexing a list returns a reference to the object stored at that position, not a copy, but rebinding the name you assigned it to does not touch the list. Slicing returns a new list that holds references to the same objects (a shallow copy), so assigning into the slice does not affect the original either. Consider this example:
>>> a = [1, 2, 3, 4]
>>> b = a[2]
>>> b
3
>>> c = a[2:3]
>>> c
[3]
>>> b=5
>>> c[0]=6
>>> a
[1, 2, 3, 4]
Neither rebinding b nor assigning into c, a new list produced by slicing a, changes a. Item assignment on c replaces a slot in c only.
However, numpy arrays use a "raw-er" memory allocation and allow views of data to be returned. A view allows data to be represented in a different way while maintaining the association with the original data. A working example is therefore
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])
>>> b = a[2]
>>> b
3
>>> b=5
>>> a
array([1, 2, 3, 4])
>>> c = a[2:3]
>>> c
array([3])
>>> c[0]=6
>>> a
array([1, 2, 6, 4])
>>>
While extracting a single element still copies by value only, maintaining an array view of element 2 is referenced to the original element 2 of a (although it is now element 0 of c), and the change made to c's value changes a as well.
Numpy ndarrays have many different types, including a generic object type. This means that you can maintain this "by-reference" behavior for almost any type of data, not only numerical values.

Python doesn't do pass by reference. Just do it explicitly, with ModList changed to return the new value:
l[1] = ModList(l[1])
Also, since this only changes one element, I'd suggest that ModList is a confusing name.

Python passes object references by value, so assigning to the parameter inside ModList only rebinds the local name element. What you can do instead is pass the list and the index into ModList and modify the element through the list:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
ModList(l, 1)

In many cases you can also let the function both modify the list and return it. This can make the calling code more readable:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
    return theList
l = ModList(l, 1)
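The same idea can be generalised so the new value is not hard-coded. This is a sketch only: modify_at is a hypothetical helper, not something from the answers above, and it takes the transformation as a function argument.

```python
def modify_at(items, index, func):
    """Replace items[index] with func(items[index]), in place (hypothetical helper)."""
    items[index] = func(items[index])

l = ['one', 'two', 'three']
modify_at(l, 1, str.upper)
print(l)   # ['one', 'TWO', 'three']
```

Because the function receives the list itself, the item assignment is visible to the caller.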
