Shallow or deep copy in a list comprehension - python

If you have a list (Original) in Python like:
class CustomID (object):
    def __init__(self, *args):
        self.ID = ''
        self.manymore = float()
        self.visited = False
        self.isnoise = False
IDlist = ['1','2','3','4','5','6','7','8','9','10']
Original = list()
for IDs in IDlist:
    NewObject = CustomID()
    NewObject.ID = IDs
    Original.append(NewObject)
and if you do a comprehension for a new list and a function to use over the comprehension sublist:
def Func(InputList=list()):
    for objects in InputList:
        objects.visited = True
    return InputList
New_List = [member for member in Original if (int(member.ID)>5)]
ThirdList = Func(New_List)
Is New_List a shallow or a deep copy of the original list? This matters to me because the original list contains objects whose attributes can change in the code that follows the creation of New_List (see ThirdList): New_List is sent to a function that changes the attributes. The question arises when you try to reuse the original list for the same function with a different comprehension (let's say member.ID > 4):
New_List = [member for member in Original if (int(member.ID)>4)]
Actually:
print New_List[3].visited
gives True.

You are creating a shallow, filtered copy.
Your loop doesn't create copies of member, it references them directly.
Not that you would need to create copies: all objects in your original list are immutable integers. Moreover, CPython interns small integers, so creating copies would only result in the exact same objects being used for these.
To illustrate, try to create a copy of a list containing mutable objects instead. Here I used a dictionary:
>>> sample = [{'foo': 'bar'}]
>>> copy = [s for s in sample]
>>> copy[0]['spam'] = 'eggs'
>>> copy.append({'another': 'dictionary'})
>>> sample
[{'foo': 'bar', 'spam': 'eggs'}]
The copy list is a new list object containing a reference to the same dictionary contained in sample. Altering that dictionary is reflected in both copy and sample, but appending to copy doesn't alter the original list.
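A minimal sketch of the difference, assuming the same kind of list of dictionaries (the names `shared`, `per_item` and `deep` are illustrative only): a plain comprehension shares the elements, while copying each element inside the comprehension, or using the standard library's `copy.deepcopy`, gives independent ones.

```python
import copy

sample = [{'foo': 'bar'}]

# Plain comprehension: new list, same dictionary objects.
shared = [s for s in sample]

# Copy each element: new list, new (shallow-copied) dictionaries.
per_item = [dict(s) for s in sample]

# Deep copy: also copies anything nested inside the elements.
deep = copy.deepcopy(sample)

shared[0]['spam'] = 'eggs'   # visible through sample too

print(sample)     # [{'foo': 'bar', 'spam': 'eggs'}]
print(per_item)   # [{'foo': 'bar'}]
print(deep)       # [{'foo': 'bar'}]
print(shared[0] is sample[0], per_item[0] is sample[0])  # True False
```

The `is` checks are the quickest way to confirm whether two lists share an element or merely hold equal ones.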
As for your updated loop code, your sample produces a New_List list that still shares objects, and New_List[3].visited is in fact True:
>>> New_List[3].ID
'8'
>>> New_List[3].visited
True
because it is still the same object found in Original at index 7:
>>> New_List[3] is Original[7]
True
which is the same object still found in ThirdList at index 2:
>>> ThirdList[2] is New_List[3]
True

Another idea, which worked for me, is to implement a flagClear method in the class:
class CustomID (object):
    def __init__(self, *args):
        self.ID = ''
        self.manymore = float()
        self.visited = False
        self.isnoise = False
    def flagClear(self):
        self.visited = False
        return self
And then, every time I construct a new list, I simply use the method:
New_List = [member.flagClear() for member in Original if (int(member.ID)>4)]
If the only thing I modify in the CustomID is the .visited flag, then this works. Obviously, it will not be perfect. If someone needs a complete solution, the suggestion of Martijn Pieters will work best (implementing a copy method):
import copy
class CustomID (object):
    def __init__(self, *args):
        self.ID = ''
        self.manymore = float()
        self.visited = False
        self.isnoise = False
    def CustomCopy(self):
        return copy.deepcopy(self)
New_List = [member.CustomCopy() for member in Original if (int(member.ID)>4)]
Thank you Martijn, this was really a learning experience for me.
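For readers comparing the two approaches, here is a hedged sketch of a lighter variant. It assumes a simplified CustomID whose constructor takes the ID directly and a hypothetical helper `mark_visited` standing in for Func; neither is from the original code. When every attribute is immutable (strings, floats, booleans), the standard library's `copy.copy` on each member is enough to keep the original list untouched.

```python
import copy

class CustomID(object):
    # Simplified stand-in: the constructor argument is an assumption.
    def __init__(self, ID=''):
        self.ID = ID
        self.visited = False

def mark_visited(items):
    # Mutates every object it is given and returns the same list.
    for item in items:
        item.visited = True
    return items

original = [CustomID(str(n)) for n in range(1, 11)]

# copy.copy() duplicates each object; its attributes here are
# immutable (str, bool), so a shallow per-object copy is enough.
new_list = [copy.copy(m) for m in original if int(m.ID) > 5]
mark_visited(new_list)

print(any(m.visited for m in original))   # False
print(all(m.visited for m in new_list))   # True
```

If an attribute were itself a list or dict, the copies would still share it, and `copy.deepcopy` would be the safer choice.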

Related

Updating a dictionary privately?

I have a class object that receives some data. Based on a condition, I need that data to change, but only under that condition. The problem I'm running into is that when I call dict.update(), it updates the original variable too. So when a subsequent request comes in, that original variable is "tainted", so to speak, and uses overridden information that it shouldn't have.
Assuming a dictionary like this:
my_attributes = {"test": True}
And some logic like this:
class MyClass(object):
    def __init__(self, attributes):
        if my_condition():
            attributes.update({"test": False})
The end result:
>>> my_attributes
{'test': False}
So, the next time MyClass is used, those root attributes are still overridden.
I've seemingly gotten around this problem by re-defining attributes:
class MyClass(object):
    def __init__(self, attributes):
        if my_condition():
            attributes = {}
            attributes.update(my_attributes)
            attributes.update({"test": False})
This has seemed to get around the problem, but I'm not entirely sure this is a good, or even the right, solution to the issue.
Something like this:
class MyClass(object):
    @staticmethod
    def my_condition():
        return True
    def __init__(self, attributes):
        # {**attributes} builds a new dict, so the caller's dict is left alone
        self.attributes = {**attributes}
        if MyClass.my_condition():
            self.attributes["test"] = False
my_attributes = {"test": True}
cls_obj = MyClass(my_attributes)
print("my_attributes:", my_attributes, "class.attributes:", cls_obj.attributes)
Output:
my_attributes: {'test': True} class.attributes: {'test': False}
You pass a (mutable) dictionary reference to an object. Now, you have two owners of the reference: the caller of the constructor (the "external world" for the object) and the object itself. These two owners may modify the dictionary. Here is an illustration:
>>> d = {}
>>> def ctor(d): return [d] # just build a list with one element
>>> L = ctor(d)
>>> d[1] = 2
>>> L
[{1: 2}]
>>> L[0][3] = 4
>>> d
{1: 2, 3: 4}
How do you prevent this? Both owners want to protect themselves from wild mutation of their variables. If I were the external world, I would like to pass an immutable reference to the dict, but Python does not provide immutable references for dicts. A copy is the way to go:
>>> d = {}
>>> L = ctor(dict(d)) # I don't give you *my* d
>>> d[1] = 2
>>> L
[{}]
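As an aside on the point about immutable references: the standard library does offer a read-only view of a dict, `types.MappingProxyType`. The sketch below is only an illustration of that view (the variable names are made up). It is not a copy and not a truly immutable dict, since the owner can still change the underlying dictionary.

```python
from types import MappingProxyType

d = {"test": True}
view = MappingProxyType(d)   # read-only view, not a copy

try:
    view["test"] = False     # writing through the view is rejected
except TypeError as exc:
    print("refused:", exc)

d["other"] = 1               # the owner can still change the dict...
print(view["other"])         # ...and the view reflects it: 1
```

Because the view tracks later changes, it protects the caller from the callee but not the other way round; a copy is still needed for that.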
If I were the object, I would do a copy of the object before using it:
>>> d = {}
>>> def ctor2(d): return [dict(d)] # to be sure L[0] is *mine*!
>>> L = ctor2(dict(d)) # I don't give you *my* d
But now you have made two copies of the object just because everyone is scared to see its variables modified by the other. And the issue is still here if the dictionary contains (mutable) references.
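To make the last point concrete, here is a small sketch with a hypothetical dictionary holding a nested list: `dict(d)` copies only the top level, so the nested list stays shared, whereas `copy.deepcopy` from the standard library copies it too.

```python
import copy

d = {"tags": ["a"]}

shallow = dict(d)            # copies the top level only
deep = copy.deepcopy(d)      # copies nested containers as well

d["tags"].append("b")        # mutate the nested list in place

print(shallow["tags"])       # ['a', 'b']  (still shared with d)
print(deep["tags"])          # ['a']       (independent)
```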
The solution is to spell out the responsibilities of each one:
class MyClass(object):
    """Usage: MyClass(attributes).do_something() where attributes is a mapping.
    The mapping won't be modified"""
    ...
Note that this is the commonly expected behavior: unless specified otherwise, the arguments of a function or constructor are not modified. We avoid side effects when possible, but that's not always the case: see list.sort() vs sorted(...).
Hence I think your solution is good. But I prefer to avoid too much logic in the constructor:
class MyClass(object):
    @staticmethod
    def create_prod(attributes):
        attributes = dict(attributes)  # copy, so the caller's dict is untouched
        attributes.update({"test": False})
        return MyClass(attributes)
    @staticmethod
    def create_test(attributes):
        return MyClass(attributes)
    def __init__(self, attributes):
        self._attributes = attributes # MyClass won't modify attributes

Shallow copy list of objects

What would be the best way to transfer references of objects from one list to another (that is, move the objects from one list to the other)? For clarity, I need to remove the objects from d[1] after copying.
class MyObject:
    def __init__(self,v):
        self.value = v
d = {1: [MyObject("obj1"),MyObject("obj2")], 2: []}
#which one?
#d[2] = [obj for obj in d[1]]
#d[2] = d[1][:]
#d[2] = d[1].copy()
#clear d[1]
#d[1] = []
for i in range(len(d[1])):
    d[2].append(d[1].pop(0))
for o in d[2]:
    print (o.value)
Which approach is best depends a bit on the details of the surrounding code. Does any other variable or data structure contain a reference to either of your lists? If not, you can just rebind the references in the dict (which takes O(1) time since no copying happens):
d = {1: [MyObject("obj1"),MyObject("obj2")], 2: []}
d[2] = d[1]
d[1] = []
If other references to the existing lists might exist and you want them to continue referring to the correct values (e.g. an old reference to d[1] should still reference d[1] after the changes), then you want to do a slice assignment followed by a clear (this is O(N)):
d[2][:] = d[1] # copy data
d[1].clear()
I don't think there's a good reason to use any other approach unless you have some other logic to apply (for instance, if you only want to copy some of the values and not others).
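The difference between the two approaches only shows up when something else holds a reference to the lists. The sketch below uses plain strings instead of MyObject instances, and `alias_1` / `alias_2` are hypothetical outside references added for illustration.

```python
d = {1: ["obj1", "obj2"], 2: []}
alias_1, alias_2 = d[1], d[2]     # hypothetical outside references

# Slice assignment + clear: the list objects stay the same,
# only their contents change, so the aliases stay in sync.
d[2][:] = d[1]
d[1].clear()
print(alias_1, alias_2)           # [] ['obj1', 'obj2']
print(alias_2 is d[2])            # True

# Rebinding: the dict now points at different list objects,
# so the old aliases no longer track d[1] and d[2].
d = {1: ["obj1", "obj2"], 2: []}
alias_1, alias_2 = d[1], d[2]
d[2] = d[1]
d[1] = []
print(alias_1, alias_2)           # ['obj1', 'obj2'] []
print(alias_1 is d[2])            # True
```

In the rebinding case the old reference to d[1] now sees what has become d[2], which is exactly the situation the slice assignment avoids.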
Adding a dictionary obscures the use case. Based on your examples, it's not clear if you want a copy of the object or a list referencing the same objects.
Assuming the latter, consider the simplified case. It's really as simple as assigning the list to another variable:
>>> class MyObject(object):
...     def __init__(self, v):
...         self.value = v
...
>>> x = [MyObject(1), MyObject(2)]
>>> y = x
>>> x[1].value
2
Now x and y refer to the same list, and therefore to the same objects. If I change an object through one name, the change is visible through the other:
>>> y[1].value = 3
>>> x[1].value
3
In your use case (a dictionary with list values), this is quite simple:
d[2] = d[1]
You can then delete the 1 key if necessary:
del d[1]
Voila!

how to find objects which are in a list, but not in another list, comparing by property. in python2

I have 2 lists of objects in python2. They are of different types but have a common property ('name'). I control one list (my_list) and the other is sent to me (src_list).
I want to find new objects in src_list, that aren't in my_list, by comparing their 'name' property.
The pseudo code (and how I'd do this in C) is below, but I'm after a python way of doing it, probably list comprehensions and stuff but I couldn't figure it out.
new_list = []
for srco in src_list: # iterate everything in src list
    found = False
    for myo in my_list: # iterate everything in my list
        if(srco.name.lower() == myo.name.lower()): # compare names, break if true
            found = True
            break
    if not found: # add to new list if wasn't found
        new_list.append(srco)
Use a set for fast lookups.
my_list_names = {obj.name.lower() for obj in my_list}
new_list = [obj for obj in src_list if obj.name.lower() not in my_list_names]
Also, if you want to learn to be more pythonic, don't do the found pattern. Do this:
for myo in my_list:
    if(srco.name.lower() == myo.name.lower()):
        break
else: # executes if there was no break
    new_list.append(srco)
Yes, a list comprehension comes to mind, like you already said. The map builtin can then be used to collect the names from one of the lists into a set, built once so it is not recomputed for every element:
my_names = set(map(lambda x: x.name.lower(), my_list))
result = [obj for obj in src_list if obj.name.lower() not in my_names]
One easy way is to first generate set of names to exclude from my_list and then iterate over src_list keeping the items that can't be found from set:
exclude = {x.name.lower() for x in my_list}
new_list = [x for x in src_list if x.name.lower() not in exclude]
You could use sets to do that.
Create a set containing the my_list names and another set containing the src_list names, then simply subtract the two:
diff_set = src_list_set - my_list_set
And you can then go fetch the objects whose names appear in diff_set
my_list_set = {obj.name.lower() for obj in my_list}
src_list_set = {obj.name.lower() for obj in src_list}
diff_set = src_list_set - my_list_set
new_list = [obj for obj in src_list if obj.name.lower() in diff_set]
(The solution may not be especially short, but by replacing the minus with another set operation from the official docs, it adapts to many other situations at minimal cost.)
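As a sketch of swapping the set operation, the example below uses two hypothetical namedtuple types (`Mine` and `Src`, standing in for the two unrelated object types) and computes both the difference and the intersection of the lowercased names.

```python
from collections import namedtuple

# Hypothetical stand-ins for the two object types; only .name matters.
Mine = namedtuple("Mine", "name")
Src = namedtuple("Src", "name size")

my_list = [Mine("Alpha"), Mine("beta")]
src_list = [Src("ALPHA", 1), Src("Gamma", 2), Src("Beta", 3), Src("delta", 4)]

my_names = {o.name.lower() for o in my_list}
src_names = {o.name.lower() for o in src_list}

new_names = src_names - my_names      # only in src_list
common_names = src_names & my_names   # in both lists

new_list = [o for o in src_list if o.name.lower() in new_names]
common = [o for o in src_list if o.name.lower() in common_names]

print([o.name for o in new_list])   # ['Gamma', 'delta']
print([o.name for o in common])     # ['ALPHA', 'Beta']
```

Filtering src_list with a comprehension, rather than iterating the set, keeps the objects in their original order.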

Appending a list inside a function

I am still a bit confused about how arguments are passed in python.
I thought non-primitive types are passed by reference, but why does the following code not print [1] then?
def listTest(L):
    L = L + [1]
def main():
    l = []
    listTest(l)
    print l #prints []
and how could I make it work.
I guess I need to pass "a pointer to L" by reference
In listTest() you are rebinding L to a new list object; L + [1] creates a new object that you then assign to L. This leaves the original list object that L referenced before untouched.
You need to manipulate the list object referenced by L directly by calling methods on it, such as list.append():
def listTest(L):
    L.append(1)
or you could use list.extend():
def listTest(L):
    L.extend([1])
or you could use in-place assignment, which gives mutable types the opportunity to alter the object in-place:
def listTest(L):
    L += [1]
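A short side-by-side sketch of the two behaviours (the function names `rebind` and `mutate` are made up for illustration): `L + [1]` builds a new list that only the local name sees, while `L += [1]` extends the caller's list in place. Returning the new list is another way to make the first version useful.

```python
def rebind(L):
    L = L + [1]      # builds a new list; the caller's list is untouched
    return L         # hand the new list back instead

def mutate(L):
    L += [1]         # list.__iadd__ extends the same object in place

items = []
result = rebind(items)
print(items, result, result is items)   # [] [1] False

mutate(items)
print(items)                            # [1]
```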

"Deep copy" nested list without using the deepcopy function

I am trying to copy the nested list a, but do not know how to do it without using the copy.deepcopy function.
a = [[1, 2], [3, 4]]
I used:
b = a[:]
and
b = a[:][:]
But they all turn out to be shallow copy.
Any hints?
My entry to simulate copy.deepcopy:
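def deepcopy(obj):
    if isinstance(obj, dict):
        return {deepcopy(key): deepcopy(value) for key, value in obj.items()}
    # In Python 3, str and bytes also have __iter__; return them unchanged
    if isinstance(obj, (str, bytes)):
        return obj
    if hasattr(obj, '__iter__'):
        return type(obj)(deepcopy(item) for item in obj)
    return obj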
The strategy: iterate across each element of the passed-in object, recursively descending into elements that are also iterable and making new objects of their same type.
I make no claim whatsoever that this is comprehensive or without fault [1] (don't pass in an object that references itself!) but should get you started.
[1] Truly! The point here is to demonstrate, not cover every possible eventuality. The source to copy.deepcopy is 50 lines long and it doesn't handle everything.
You can use a list comprehension if there's only a single level of nesting:
b = [x[:] for x in a]
This is a complete cheat - but will work for lists of "primitives" - lists, dicts, strings, numbers:
def cheat_copy(nested_content):
    return eval(repr(nested_content))
There are strong security implications to consider for this - and it will not be particularly fast. Using json.dumps and loads will be more secure.
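A minimal sketch of the json round trip mentioned above, using only the standard library's `json.dumps` and `json.loads`. It works for data that JSON can represent, and the last line shows two caveats worth knowing: tuples come back as lists and dictionary keys come back as strings.

```python
import json

a = [[1, 2], [3, 4]]
b = json.loads(json.dumps(a))   # serialize, then rebuild new objects

b[0].append(99)
print(a)   # [[1, 2], [3, 4]]
print(b)   # [[1, 2, 99], [3, 4]]

# Caveat: JSON has no tuple type and only string keys.
print(json.loads(json.dumps({1: (2, 3)})))   # {'1': [2, 3]}
```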
I found a way to do it using recursion.
def deep_copy(nested_content):
    if not isinstance(nested_content,list):
        return nested_content
    else:
        holder = []
        for sub_content in nested_content:
            holder.append(deep_copy(sub_content))
        return holder
For the recursive version, you have to keep track of a secondary list and return each time.
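The recursive idea can be extended to cope with a list that contains itself, which the versions above cannot handle. The sketch below is one possible approach, not the thread's own code: it assumes only lists and dicts need copying and keeps a `memo` dictionary (a name chosen here) mapping the id of each original container to its copy, similar in spirit to what `copy.deepcopy` does.

```python
def deep_copy(obj, memo=None):
    """Recursively copy nested lists and dicts; other objects are shared."""
    if memo is None:
        memo = {}                      # id(original) -> copy already made
    if not isinstance(obj, (list, dict)):
        return obj                     # numbers, strings, etc.
    if id(obj) in memo:
        return memo[id(obj)]           # reuse the copy, which breaks cycles
    if isinstance(obj, list):
        result = []
        memo[id(obj)] = result         # register before recursing
        for item in obj:
            result.append(deep_copy(item, memo))
    else:
        result = {}
        memo[id(obj)] = result
        for key, value in obj.items():
            result[key] = deep_copy(value, memo)
    return result

a = [[1, 2], [3, 4]]
a.append(a)                            # a now contains itself
b = deep_copy(a)

print(b[0] == a[0], b[0] is a[0])      # True False
print(b[2] is b)                       # True: the cycle is preserved
```

Registering the new container in `memo` before descending into its items is what stops the recursion from looping forever on a self-reference.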
