Can I "detect" a slicing expression in a python class method? - python

I am developing an application where I have defined a "variable" object containing data in the form of a numpy array. These variables are linked to (netcdf) data files, and I would like to dynamically load the variable values when needed instead of loading all data from the sometimes huge files at the start.
The following snippet demonstrates the principle and works well, including access to data portions with slices. For example, you can write:
a = var() # empty variable
print(a.values[7]) # values have been automatically "loaded"
or even:
a = var()
a[7] = 0
However, this code still forces me to load the entire variable data at once. Netcdf (with the netCDF4 library) would allow me to directly access data slices from the file. Example:
f = netCDF4.Dataset(filename, "r")
print(f.variables["a"][7])
I cannot use the netcdf variable objects directly, because my application is tied to a web service which cannot remember the netcdf file handler, and also because the variable data don't always come from netcdf files, but may originate from other sources such as OGC web services.
Is there a way to "capture" the slicing expression in the property or setter methods and use them? The idea would be to write something like:
@property
def values(self):
    if self._values is None:
        self._values = np.arange(10.)[slice]  # load from file ...
    return self._values
instead of the code below.
Working demo:
import numpy as np

class var(object):
    def __init__(self, values=None, metadata=None):
        if values is None:
            self._values = None
        else:
            self._values = np.array(values)
        self.metadata = metadata  # just to demonstrate that var has more than just values

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)  # load from file ...
        return self._values

    @values.setter
    def values(self, values):
        self._values = values
First thought: Should I perhaps create values as a separate class and then use __getitem__? See In python, how do I create two index slicing for my own matrix class?

No, you cannot detect what will be done to the object after returning from .values. The result could be stored in a variable and only (much later on) be sliced, or sliced in different places, or used in its entirety, etc.
You should indeed return a wrapper object instead and hook into object.__getitem__; that lets you detect slicing and load data as needed. When slicing, Python passes in a slice() object.
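For illustration, here is a minimal sketch of that idea (the LazyValues name and the load_slice callable are made up for this example, not part of the question's code):

import numpy as np

class LazyValues(object):
    """Hypothetical wrapper: loads data only when it is actually indexed."""
    def __init__(self, load_slice):
        # load_slice is any callable that takes an index/slice and returns
        # the corresponding portion of the data (e.g. a netCDF read)
        self._load_slice = load_slice

    def __getitem__(self, index):
        # index is whatever was written between the brackets: an int,
        # a slice(start, stop, step) object, or a tuple of these
        if isinstance(index, slice):
            print("got a slice:", index.start, index.stop, index.step)
        return self._load_slice(index)

# stand-in for the file: an in-memory array
backing = np.arange(10.)
vals = LazyValues(lambda idx: backing[idx])
print(vals[7])      # __getitem__ receives the int 7
print(vals[2:5])    # __getitem__ receives slice(2, 5, None)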

Thanks to the guidance of Martijn Pieters and with a bit more reading, I came up with the following code as demonstration. Note that the Reader class uses a netcdf file and the netCDF4 library. If you want to try out this code yourself you will either need a netcdf file with variables "a" and "b", or replace Reader with something else that will return a data array or a slice from a data array.
This solution defines three classes: Reader does the actual file I/O handling, Values manages the data access part and invokes a Reader instance if no data have been stored in memory, and var is the final "variable" which in real life will contain a lot more metadata. The code contains a couple of extra print statements for educational purposes.
"""Implementation of a dynamic variable class which can read data from file when needed or
return the data values from memory if they were read already. This concepts supports
slicing for both memory and file access."""
import numpy as np
import netCDF4 as nc
FILENAME = r"C:\Users\m.schultz\Downloads\data\tmp\MACC_20141224_0001.nc"
VARNAME = "a"
class Reader(object):
"""Implements the actual data access to variable values. Here reading a
slice from a netcdf file.
"""
def __init__(self, filename, varname):
"""Final implementation will also have to take groups into account...
"""
self.filename = filename
self.varname = varname
def read(self, args=slice(None, None, None)):
"""Read a data slice. Args is a tuple of slice objects (e.g.
numpy.index_exp). The default corresponds to [:], i.e. all data
will be read.
"""
with nc.Dataset(self.filename, "r") as f:
values = f.variables[self.varname][args]
return values
class Values(object):
def __init__(self, values=None, reader=None):
"""Initialize Values. You can either pass numerical (or other) values,
preferrably as numpy array, or a reader instance which will read the
values on demand. The reader must have a read(args) method, where
args is a tuple of slices. If no args are given, all data should be
returned.
"""
if values is not None:
self._values = np.array(values)
self.reader = reader
def __getattr__(self, name):
"""This is only be called if attribute name is not present.
Here, the only attribute we care about is _values.
Self.reader should always be defined.
This method is necessary to allow access to variable.values without
a slicing index. If only __getitem__ were defined, one would always
have to write variable.values[:] in order to make sure that something
is returned.
"""
print ">>> in __getattr__, trying to access ", name
if name == "_values":
print ">>> calling reader and reading all values..."
self._values = self.reader.read()
return self._values
def __getitem__(self, args):
print "in __getitem__"
if not "_values" in self.__dict__:
values = self.reader.read(args)
print ">>> read from file. Shape = ", values.shape
if args == slice(None, None, None):
self._values = values # all data read, store in memory
return values
else:
print ">>> read from memory. Shape = ", self._values[args].shape
return self._values[args]
def __repr__(self):
return self._values.__repr__()
def __str__(self):
return self._values.__str__()
class var(object):
def __init__(self, name=VARNAME, filename=FILENAME, values=None):
self.name = name
self.values = Values(values, Reader(filename, name))
if __name__ == "__main__":
# define a variable and access all data first
# this will read the entire array and save it in memory, so that
# subsequent access with or without index returns data from memory
a = var("a", filename=FILENAME)
print "1: a.values = ", a.values
print "2: a.values[-1] = ", a.values[-1]
print "3: a.values = ", a.values
# define a second variable, where we access a data slice first
# In this case the Reader only reads the slice and no data are stored
# in memory. The second access indexes the complete array, so Reader
# will read everything and the data will be stored in memory.
# The last access will then use the data from memory.
b = var("b", filename=FILENAME)
print "4: b.values[0:3] = ", b.values[0:3]
print "5: b.values[:] = ", b.values[:]
print "6: b.values[5:8] = ",b.values[5:8]

Related

How does LazyFrames from OpenAI's baselines save memory?

OpenAI's baselines use the following code to return a LazyFrames object instead of a concatenated numpy array to save memory. The idea is to take advantage of the fact that a numpy array can be referenced from several lists at the same time, since lists only store references, not the objects themselves. However, the implementation of LazyFrames also caches the concatenated numpy array in self._out, so if every LazyFrames object has been accessed at least once, each one will hold a concatenated numpy array, which does not seem to save any memory at all. Then what's the point of LazyFrames? Or do I misunderstand anything?
import numpy as np
import gym
from gym import spaces
from collections import deque

class FrameStack(gym.Wrapper):
    def __init__(self, env, k):
        """Stack k last frames.
        Returns lazy array, which is much more memory efficient.
        See Also
        --------
        baselines.common.atari_wrappers.LazyFrames
        """
        gym.Wrapper.__init__(self, env)
        self.k = k
        self.frames = deque([], maxlen=k)
        shp = env.observation_space.shape
        self.observation_space = spaces.Box(low=0, high=255, shape=(shp[:-1] + (shp[-1] * k,)),
                                            dtype=env.observation_space.dtype)

    def reset(self):
        ob = self.env.reset()
        for _ in range(self.k):
            self.frames.append(ob)
        return self._get_ob()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.frames.append(ob)
        return self._get_ob(), reward, done, info

    def _get_ob(self):
        assert len(self.frames) == self.k
        return LazyFrames(list(self.frames))

class LazyFrames(object):
    def __init__(self, frames):
        """This object ensures that common frames between the observations are only stored once.
        It exists purely to optimize memory usage which can be huge for DQN's 1M frames replay
        buffers.
        This object should only be converted to numpy array before being passed to the model.
        You'd not believe how complex the previous solution was."""
        self._frames = frames
        self._out = None

    def _force(self):
        if self._out is None:
            self._out = np.concatenate(self._frames, axis=-1)
            self._frames = None
        return self._out

    def __array__(self, dtype=None):
        out = self._force()
        if dtype is not None:
            out = out.astype(dtype)
        return out

    def __len__(self):
        return len(self._force())

    def __getitem__(self, i):
        return self._force()[i]

    def count(self):
        frames = self._force()
        return frames.shape[frames.ndim - 1]

    def frame(self, i):
        return self._force()[..., i]
I actually came here trying to understand how this saved any memory at all! But you mention that the lists store references to the underlying data, while the numpy arrays store copies of that data, and I think you are correct about that.
To answer your question, you are right! When _force is called, it populates self._out with a numpy array, thereby expanding memory use. But until you call _force (which happens in any of LazyFrames' API methods), self._out is None. So the idea is not to call _force (and therefore not to call any of the LazyFrames methods) until you actually need the underlying data, hence the warning in its docstring: "This object should only be converted to numpy array before being passed to the model".
Note that when self._out gets populated with the array, self._frames is also cleared, so that duplicate information is not kept around (which would defeat the whole purpose of storing only as much as it needs).
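To see why keeping only references is cheap, here is a small illustration using plain numpy and sys.getsizeof; this is not the baselines code, just a rough demonstration of the reference-versus-copy difference:

import sys
import numpy as np

frame = np.zeros((84, 84, 1), dtype=np.uint8)   # one Atari-sized frame
k = 4

# what LazyFrames holds before _force: k references to the same array
lazy = [frame] * k
print(sys.getsizeof(lazy))            # tiny: just a list of pointers

# what _force produces: one concatenated copy, k times the frame data
forced = np.concatenate(lazy, axis=-1)
print(forced.nbytes)                  # 84 * 84 * 4 = 28224 bytes of pixel data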
Also, in the same file, you'll find the ScaledFloatFrame which carries this doc string:
Scales the observations by 255 after converting to float.
This will undo the memory optimization of LazyFrames,
so don't use it with huge replay buffers.

Is there a way to force Python function defined under a class to return something (as opposed to nothing) of specific data type?

I am aware that Python is not statically typed and does not support keywords specifying return types like void or int as Java and C do. I am also aware that we can use type hints to tell users what type they can expect in return from a function.
I am trying to implement a Python class which will read a config file (say, a JSON file) that dictates what data transformation methods should be applied on a pandas dataframe. The config file looks something like:
[
    {
        "input_folder_path": "./input/budget/",
        "input_file_name_or_pattern": "Global Budget Roll-up_9.16.19.xlsx",
        "sheet_name_of_excel_file": "Budget Roll-Up",
        "output_folder_path": "./output/budget/",
        "output_file_name_prefix": "transformed_budget_",

        "__comment__": "(Optional) File with Python class that houses data transformation functions, which will be imported and used in the transform process. If not provided, then the code will use default class in the 'transform_function.py' file.",
        "transform_functions_file": "./transform_functions/budget_transform_functions.py",

        "row_number_of_column_headers": 0,
        "row_number_where_data_starts": 1,
        "number_of_rows_to_skip_from_the_bottom_of_the_file": 0,

        "__comment__": "(Required) List of the functions and their parameters.",
        "__comment__": "These functions must be defined either in transform_functions.py or individual transformation file such as .\\transform_function\\budget_transform_functions.py",
        "functions_to_apply": [
            {
                "__function_comment__": "Drop empty columns in Budget roll up Excel file. No parameters required.",
                "function_name": "drop_unnamed_columns"
            },
            {
                "__function_comment__": "By the time we run this function, there should be only 13 columns total remaining in the raw data frame.",
                "function_name": "assert_number_of_columns_equals",
                "function_args": [13]
            },
            {
                "__function_comment__": "Map raw channel names 'Ecommerce' and 'ecommerce' to 'E-Commerce'.",
                "transform_function_name": "standardize_to_ecommerce",
                "transform_function_args": [["Ecommerce", "ecommerce"]]
            }
        ]
    }
]
In the main.py code, I have something like this:
if __name__ == '__main__':
    # 1. Process arguments passed into the program
    parser = argparse.ArgumentParser(description=transform_utils.DESC,
                                     formatter_class=argparse.RawTextHelpFormatter,
                                     usage=argparse.SUPPRESS)
    parser.add_argument('-c', required=True, type=str,
                        help=transform_utils.HELP)
    args = parser.parse_args()

    # 2. Load JSON configuration file
    if (not args.c) or (not os.path.exists(args.c)):
        raise transform_errors.ConfigFileError()

    # 3. Iterate through each transform procedure in config file
    for config in transform_utils.load_config(args.c):
        output_file_prefix = transform_utils.get_output_file_path_with_name_prefix(config)
        custom_transform_funcs_module = transform_utils.load_custom_functions(config)
        row_idx_where_data_starts = transform_utils.get_row_index_where_data_starts(config)
        footer_rows_to_skip = transform_utils.get_number_of_rows_to_skip_from_bottom(config)

        for input_file in transform_utils.get_input_files(config):
            print("Processing file:", input_file)
            col_headers_from_input_file = transform_utils.get_raw_column_headers(input_file, config)
            if transform_utils.is_excel(input_file):
                sheet = transform_utils.get_sheet(config)
                print("Skipping this many rows (including header row) from the top of the file:",
                      row_idx_where_data_starts)
                cur_df = pd.read_excel(input_file,
                                       sheet_name=sheet,
                                       skiprows=row_idx_where_data_starts,
                                       skipfooter=footer_rows_to_skip,
                                       header=None,
                                       names=col_headers_from_input_file
                                       )

            custom_funcs_instance = custom_transform_funcs_module.TaskSpecificTransformFunctions()
            for func_and_params in transform_utils.get_functions_to_apply(config):
                print("=>Invoking transform function:", func_and_params)
                func_args = transform_utils.get_transform_function_args(func_and_params)
                func_kwargs = transform_utils.get_transform_function_kwargs(func_and_params)
                cur_df = getattr(custom_funcs_instance,
                                 transform_utils.get_transform_function_name(
                                     func_and_params))(cur_df, *func_args, **func_kwargs)
In the budget_transform_functions.py file, I have:
class TaskSpecificTransformFunctions(TransformFunctions):
    def drop_unnamed_columns(self, df):
        """
        Drop columns that have 'Unnamed' as column header, which is a usual
        occurrence for some Excel/CSV raw data files with empty but hidden columns.
        Args:
            df: Raw dataframe to transform.
            params: We don't need any parameter for this function,
                    so it's defaulted to None.
        Returns:
            Dataframe whose 'Unnamed' columns are dropped.
        """
        return df.loc[:, ~df.columns.str.contains(r'Unnamed')]

    def assert_number_of_columns_equals(self, df, num_of_cols_expected):
        """
        Assert that the total number of columns in the dataframe
        is equal to num_of_cols (int).
        Args:
            df: Raw dataframe to transform.
            num_of_cols_expected: Number of columns expected (int).
        Returns:
            The original dataframe is returned if the assertion is successful.
        Raises:
            ColumnCountMismatchError: If the number of columns found
            does not equal to what is expected.
        """
        if df.shape[1] != num_of_cols_expected:
            raise transform_errors.ColumnCountError(
                ' '.join(["Expected column count of:", str(num_of_cols_expected),
                          "but found:", str(df.shape[1]), "in the current dataframe."])
            )
        else:
            print("Successfully checked that the current dataframe has:",
                  num_of_cols_expected, "columns.")
            return df
As you can see, I need future implementers of budget_transform_functions.py to be aware that the functions within TaskSpecificTransformFunctions must always return a pandas dataframe. I know that in Java you can create an interface, and whoever implements that interface has to abide by the return types of each method in that interface. I'm wondering if we have a similar construct (or a workaround to achieve a similar thing) in Python.
Hope this lengthy question makes sense; I'm hoping someone with a lot more Python experience than I have will be able to teach me something about this. Thank you very much in advance for your answers/suggestions!
One way to check the return type of a function, at least at run time, is to wrap the function in another function that checks the return type. To automate this for subclasses, there is __init_subclass__. It can be used in the following way (polishing and handling of special cases is still needed):
import pandas as pd

def wrapCheck(f):
    def checkedCall(*args, **kwargs):
        r = f(*args, **kwargs)
        if not isinstance(r, pd.DataFrame):
            raise Exception(f"Bad return value of {f.__name__}: {r!r}")
        return r
    return checkedCall

class TransformFunctions:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for k, v in cls.__dict__.items():
            if callable(v):
                setattr(cls, k, wrapCheck(v))

class TryTransform(TransformFunctions):
    def createDf(self):
        return pd.DataFrame(data={"a": [1, 2, 3], "b": [4, 5, 6]})

    def noDf(self, a, b):
        return a + b

tt = TryTransform()
print(tt.createDf())   # Works
print(tt.noDf(2, 2))   # Fails with exception
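If you also want the contract to be visible to readers and to static checkers (closer to a Java interface), you could combine an abstract base class with return-type annotations. Note that annotations are not enforced at runtime; a checker such as mypy has to verify them. This is only a sketch of that alternative, with a made-up transform method name:

import abc
import pandas as pd

class TransformFunctions(abc.ABC):
    @abc.abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        """Implementations must accept and return a pandas DataFrame."""
        ...

class DropUnnamedColumns(TransformFunctions):
    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # same logic as drop_unnamed_columns above
        return df.loc[:, ~df.columns.str.contains(r'Unnamed')]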

Python methods from csv

I am working on an assignment where I create "instances" of cities using rows in a .csv, then use these instances in methods to calculate distance and population change. Creating the instances works fine (using steps 1-4 below), until I try to call printDistance:
##Step 1. Open and read CityPop.csv
with open('CityPop.csv', 'r', newline='') as f:
    try:
        reader = csv.DictReader(f)
        ##Step 2. Create "City" class
        class City:
            ##Step 3. Use __init__ method to assign attribute values
            def __init__(self, row, header):
                self.__dict__ = dict(zip(header, row))
                ##Step 4. Create "Cities" list
                data = list(csv.reader(open('CityPop.csv')))
                instances = [City(i, data[0]) for i in data[1:]]
                ##Step 5. Create printDistance method within "Cities" class
                def printDistance(self, othercity, instances):
                    dist=math.acos((math.sin(math.radians(self.lat)))*(math.sin(math.radians(othercity.lat)))+(math.cos(math.radians(self.lat)))*(math.cos(math.radians(othercity.lat)))*(math.cos(math.radians(self.lon-othercity.lon)))) * 6300 (self.lat, self.lon, othercity.lat, othercity.lon)
When I enter instances[0].printDistance(instances[1]) in the shell, I get the error:
`NameError: name 'instances' is not defined`
Is this an indentation problem? Should I be calling the function from within the code, not the shell?
Nested functions must not take self as a parameter, because they are not member functions; the class cannot pass instance variables to them. You are in fact passing the same self from the parent function to the child function.
Also, you should not nest other definitions inside the constructor; it is only for initialization purposes. Create a separate method instead.
And create the instance variable inside the constructor; that is what __init__ is for:
self.instances = [self.getInstance(i, data[0]) for i in data[1:]]
Also, create a separate function for instantiation:
@classmethod
def getInstance(cls, d1, d2):
    return cls(d1, d2)
This is not so much an indentation problem, but more of a general code structure problem. You're nesting a lot:
All the actual work on an incredibly long line (with errors)
Inside of function (correctly) printDistance
Inside of a constructor __init__
Inside of a class definition (correctly) City
Inside of a try block
Inside of a with block
I think this is what you are trying to do:
create a class City, which can print the distance of itself to other cities
generate a list of these City objects from a .csv that somehow has both distances and population (you should probably provide an example of data)
do so in a fault-tolerant and clean way (hence the try and the with)
The reason your instances isn't working is that, contrary to what you think, it's probably not being created correctly, or at least not in the correct context. And it certainly won't be available to you on the CLI due to all of the nesting.
There's a number of blatant bugs in your code:
What's the (self.lat, self.lon, othercity.lat, othercity.lon) at the end of the last line?
Why are you opening the file for reading twice? You're not even using the first reader
You are bluntly assigning column headers from a .csv as object attributes, but are misspelling their use (lat instead of latitude and lon instead of longitude)
It looks a bit like a lot of code found in various places got pasted together into one clump - this is what it looks like when cleaned up:
import csv
import math

class City:
    def print_distance(self, other_city):
        print(f'{self.city} to {other_city.city}')
        # what a mess...
        print(math.acos(
            (math.sin(math.radians(float(self.latitude)))) * (math.sin(math.radians(float(other_city.latitude)))) + (
                math.cos(math.radians(float(self.latitude)))) * (math.cos(math.radians(float(other_city.latitude)))) * (
                math.cos(math.radians(float(self.longitude) - float(other_city.longitude))))) * 6300)

    def __init__(self, values, attribute_names):
        # this is *nasty* - much better to add the attributes explicitly, but left as original
        # also, note that you're reading strings and floats here, but they are all stored as str
        self.__dict__ = dict(zip(attribute_names, values))

with open('CityPop.csv', 'r', newline='') as f:
    try:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(row, header) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                city_1.print_distance(city_2)
    except Exception as e:
        print(f"Apparently we're doing something with this error: {e}")
Note how print_distance is now a method of City, which is called on each instance of City in cities (which is what I renamed instances to).
Now, if you are really trying, this makes more sense:
import csv
import math

class City:
    def print_distance(self, other_city):
        print(f'{self.name} to {other_city.name}')
        # not a lot better, but some at least
        print(
            math.acos(
                math.sin(math.radians(self.lat)) *
                math.sin(math.radians(other_city.lat))
                +
                math.cos(math.radians(self.lat)) *
                math.cos(math.radians(other_city.lat)) *
                math.cos(math.radians(self.lon - other_city.lon))
            ) * 6300
        )

    def __init__(self, lat, lon, name):
        self.lat = float(lat)
        self.lon = float(lon)
        self.name = str(name)

try:
    with open('CityPop.csv', 'r', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                city_1.print_distance(city_2)
except FileNotFoundError:
    print('Could not find the input file.')
Note the cleaned up computation, the catching of an error that could be expected to occur (with the with insides the try block) and a proper constructor that assigns what it needs with the correct type, while the reader decides which fields go where.
Finally, as a bonus: nobody should be writing distance calculations like this. Plenty of libraries exist that do a much better job of this, like GeoPy. All you need to do is pip install geopy to get it, and then you can use this:
import csv
import geopy.distance

class City:
    def calc_distance(self, other_city):
        return geopy.distance.geodesic(
            (self.lat, self.lon),
            (other_city.lat, other_city.lon)
        ).km

    def __init__(self, lat, lon, name):
        self.lat = float(lat)
        self.lon = float(lon)
        self.name = str(name)

try:
    with open('CityPop.csv', 'r', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                print(city_1.calc_distance(city_2))
except FileNotFoundError:
    print('Could not find the input file.')
Note that I moved the print out of the method as well, since it makes more sense to calculate in the object and print outside it. The nice thing about all this is that the calculation now uses a proper geodesic (WGS-84) to do the calculation and the odds of math errors are drastically reduced. If you must use a simple sphere, the library has functions for that as well.
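For the simple-sphere case mentioned at the end, geopy provides great_circle alongside geodesic; a tiny sketch (the coordinates are invented for the example):

import geopy.distance

paris = (48.8566, 2.3522)     # (lat, lon)
berlin = (52.5200, 13.4050)

print(geopy.distance.geodesic(paris, berlin).km)      # WGS-84 ellipsoid
print(geopy.distance.great_circle(paris, berlin).km)  # simple sphere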

Applying Class to Items in a List

What I'm trying to do is allow any number of attributes to be supplied to a function. This function will handle creating a class based on those attributes. Then, I've got another function that will handle importing data from a text file, applying the generated class to each item, and adding it to a list. Below is what I have.
def create_class(attributes):
    class classObject:
        def __init__(self, **attributes):
            for attr in attributes.keys():
                self.__dict__[attr] = attributes[attr]
    return classObject

def file_to_list(file, attributes):
    classObject = create_class(attributes)
    with open(file, "r") as f:
        var = []
        for line in f.readlines():
            var.append(classObject(line))
    return var

data = file_to_list("file.txt", ["propA", "propB"])
The issue is with how I'm trying to add the item to the list. Normally, I wouldn't have any issue, but I believe the way in which I'm creating the class is causing issues with how I usually do it.
File "file.py", line 17, in file_to_list
var.append(classObject(line))
TypeError: init() takes 1 positional argument but 2 were given
How do I loop through each of the attributes of the class, so that I can set the value for each and add it to the list?
UPDATE:
Below is an example of what file.txt looks like.
1A,1B
2A,2B
3A,3B
It looks like your class generation is wrong. You appear to want to be able to do:
Cls = create_class(["some", "attributes", "go", "here"])
and end up with a class object that looks like:
class Cls(object):
    def __init__(self, some, attributes, go, here):
        self.some = some
        self.attributes = attributes
        self.go = go
        self.here = here
but what you're actually doing is creating a class that takes a dictionary, and gives that dictionary dot-syntax.
>>> obj = Cls({"different": "attributes", "go": "here"})
>>> obj.different
"attributes"
>>> obj.go
"here"
You can implement the former with:
import typing

def create_class(attributes: typing.List[str]):
    class gen_class(object):
        def __init__(self, *args):
            if len(args) != len(attributes):
                # how do you handle the case where the caller specifies fewer or more
                # arguments than the generated class expects? I would throw a...
                raise ValueError(f"Wrong number of arguments (expected {len(attributes)}, got {len(args)}).")
            for attr, value in zip(attributes, args):
                setattr(self, attr, value)
    return gen_class
Then you should be able to use csv.reader to read in your file and instantiate those classes.
import csv

CSV_Cls = create_class(["propA", "propB"])

with open(file) as f:
    reader = csv.reader(f)
    data = [CSV_Cls(*row) for row in reader]
However, it does seem that writing your own code generator to make that class is the wrong choice here. Why not use a collections.namedtuple instead?
from collections import namedtuple

CSV_Cls = namedtuple("CSV_Cls", "propA propB")

with open(file) as f:
    reader = csv.reader(f)
    data = [CSV_Cls(*row) for row in reader]
This stdlib codegen is already written, known to work (and heavily tested), and won't accidentally introduce errors. The only reason to prefer a class is if you need to tightly couple some behavior to the data, or if you need a mutable data structure.
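If a mutable structure is what you need, the stdlib has a ready-made generator for that case too; a short sketch using dataclasses.make_dataclass with the same propA/propB fields (fields left untyped for brevity):

import csv
from dataclasses import make_dataclass

# a mutable generated class with an __init__(propA, propB)
CSV_Cls = make_dataclass("CSV_Cls", ["propA", "propB"])

with open("file.txt") as f:
    reader = csv.reader(f)
    data = [CSV_Cls(*row) for row in reader]

data[0].propA = "patched"   # unlike a namedtuple, instances can be modified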
First, why not use type for this instead? It's the default metaclass, i.e. a callable that creates class objects. The class dict will be the third argument, which makes it easy to create programmatically.
type(name, (), attributes)
(You probably don't need any base classes, but that's what the second argument is for.)
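A rough sketch of what that call does with a dict of attributes (purely illustrative; the Point name and its default values are made up):

# type(name, bases, namespace) builds a class object directly;
# the dict entries become class-level attributes
Point = type("Point", (), {"propA": None, "propB": None})

p = Point()
print(p.propA, p.propB)   # None None - class-level defaults, not per-instance data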
Second, your __init__ doesn't appear to accept a str, which is the only thing you can get from readlines(). It takes only self (implied) and keyword arguments.
You could perhaps convert the line str to a dict (but that depends on what's in it), and then use the dict as your kwargs, like classObject(**kwargs), but then there's probably no point in declaring it with stars in the __init__ method in the first place.
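For example, assuming classObject is the class returned by the original create_class and each line pairs up with the attribute names in order, the conversion could look roughly like this (hypothetical glue code):

attributes = ["propA", "propB"]
classObject = create_class(attributes)

line = "1A,1B\n"                                   # one line as returned by readlines()
kwargs = dict(zip(attributes, line.strip().split(",")))

obj = classObject(**kwargs)                        # matches the **attributes __init__
print(obj.propA, obj.propB)                        # 1A 1B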

What am I doing wrong? Python object instantiation keeping data from previous instantiation?

Can someone point out to me what I'm doing wrong or where my understanding is wrong?
To me, it seems like the code below which instantiates two objects should have separate data for each instantiation.
class Node:
    def __init__(self, data = []):
        self.data = data

def main():
    a = Node()
    a.data.append('a-data')  # only append data to the a instance
    b = Node()               # shouldn't this be empty?

    # a data is as expected
    print('number of items in a:', len(a.data))
    for item in a.data:
        print(item)

    # b data includes the data from a
    print('number of items in b:', len(b.data))
    for item in b.data:
        print(item)

if __name__ == '__main__':
    main()
However, the second object is created with the data from the first:
>>>
number of items in a: 1
a-data
number of items in b: 1
a-data
You can't use a mutable object as a default value like this. All instances will share the same mutable object.
Do this.
class Node:
    def __init__(self, data=None):
        self.data = data if data is not None else []
When the class definition is executed, Python creates the [] list object once. Every time you create an instance of the class without passing data, the same list object is used as the default value.
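You can verify that sharing directly with the original Node definition; the default list lives on the function object and is the very same list in both instances:

a = Node()
b = Node()
print(a.data is b.data)              # True: both point at the one default list
print(Node.__init__.__defaults__)    # the single shared list is stored here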
The problem is in this line:
def __init__(self, data = []):
When you write data = [] to set an empty list as the default value for that argument, Python only creates a list once and uses the same list for every time the method is called without an explicit data argument. In your case, that happens in the creation of both a and b, since you don't give an explicit list to either constructor, so both a and b are using the same object as their data list. Any changes you make to one will be reflected in the other.
To fix this, I'd suggest replacing the first line of the constructor with
def __init__(self, data=None):
    if data is None:
        data = []
When providing default values for a function or method, you generally want to provide immutable objects. If you provide an empty list or an empty dictionary, you'll end up with all calls to that function or method sharing the object.
A good workaround is:
def __init__(self, data=None):
    if data == None:
        self.data = []
    else:
        self.data = data
