I am trying to parallelise a sampling process, so I created a Sampler object. The Sampler depends on two large datasets (stored as numpy arrays) that are passed as arguments to its constructor. To avoid having duplicates in the object store, my idea has been to first add the arrays to the object store with ray.put and then initialise the Sampler objects with the corresponding ids.
Moreover, I don't want to add the decorators to the Sampler class itself. Instead, I created a subclass of Sampler, RemoteSampler, which wraps the methods of the superclass and modifies them by adding the .remote() call. However, I seem to be unable to initialise the superclass from the ActorClass; I get a type error:
TypeError: super() argument 1 must be type, not ActorClass(RemoteSampler).
Skeleton code below:
import ray
import numpy as np
class Sampler(object):
    def __init__(self, train_data, d_train_data, *others):
        # these can be big, so we want to have only one copy that
        # multiple actors share
        if isinstance(train_data, np.ndarray):
            self.train_data = train_data
        else:
            self.train_data = ray.get(train_data)
        if isinstance(d_train_data, np.ndarray):
            self.d_train_data = d_train_data
        else:
            self.d_train_data = ray.get(d_train_data)
        # Initialise the rest of the sampler state
        self.d1 = {}
        self.d2 = {}

    def __call__(self, features, n_samples):
        a, b, c = self._sampling_loop(features, n_samples)
        # process a, b, c and return something
        return a, b, c, features

    def build_lookups(self, X):
        self.d1 = {0: X[0]}
        self.d2 = {1: X[1]}
        return self.d1, self.d2

    def _sampling_loop(self, features, n_samples):
        # Use train_data, d_train_data and other attributes to return some data to the caller
        return 0, 0, 0
@ray.remote
class RemoteSampler(Sampler):
    def __init__(self, *args):
        super(RemoteSampler, self).__init__(*args)
        self.__call__ = ray.method(self.__call__, num_return_vals=4)
        self.build_lookups = ray.method(self.build_lookups, num_return_vals=2)

    def __call__(self, anchor, num_samples):
        return self.__call__(anchor, num_samples).remote()

    def build_lookups(self, X):
        a, b, c = self.build_lookups.remote(X)
        return a, b, c
def _fit_parallel(*args):
    # method of a class where the RemoteSampler objects are initialised
    # copy large objects to object store
    train_data, d_train_data, *others = args
    train_data_id = ray.put(train_data)
    d_train_data_id = ray.put(d_train_data)
    n_args = (train_data_id, d_train_data_id, *others)
    return [RemoteSampler.remote(*n_args) for _ in range(4)]
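For reference, a minimal sketch of the same idea without subclassing at all, assuming a Ray version where ray.remote can also be applied as a plain function call (RemoteSamplerActor and _fit_parallel_alt are made-up names for illustration):

# Sketch: wrap the undecorated Sampler programmatically instead of subclassing it.
# Sampler stays a plain Python class, so the super() issue never arises.
RemoteSamplerActor = ray.remote(Sampler)

def _fit_parallel_alt(train_data, d_train_data, *others):
    # put the large arrays in the object store once and share the ids
    train_data_id = ray.put(train_data)
    d_train_data_id = ray.put(d_train_data)
    return [RemoteSamplerActor.remote(train_data_id, d_train_data_id, *others)
            for _ in range(4)]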
I have training and test datasets as PyTorch Tensor objects and I want to change their values from another class; is this somehow possible? The code below does not change their values:
class BO:
    def __init__(self):
        self.data = DataCollector().df_subset
        self.risk = DataCollector().risk_subset.unique(dim=0)
        self.device = torch.device('cpu')
        self.X_init = None
        self.outs_init = torch.zeros(0)
        self.y_init = []
        if torch.cuda.is_available():
            self.device = torch.device('cuda:7')

    def create_init_X_Y(self):
        '''
        Creates the initial X and y values.
        '''
        model.TrainDataset().df = self.data
        _, loss = model.train()
I need to overwrite the value of model.TrainDataset().df with the value of self.data.
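As a side note, model.TrainDataset() builds a brand-new object on every call, so the assignment above lands on an instance that is thrown away immediately. A minimal sketch of the difference (how the dataset reaches model.train is an assumption here; the signature is hypothetical):

# The original line assigns to a brand-new, immediately discarded instance:
model.TrainDataset().df = self.data

# For the change to be visible, the object that model.train() reads has to be
# the one you modified, e.g. by passing it in explicitly (hypothetical signature):
dataset = model.TrainDataset()
dataset.df = self.data
_, loss = model.train(dataset)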
While autograd's hvp tool seems to work very well for functions, once a model becomes involved, Hessian-vector products seem to go to 0. Some code.
First, I define the world's simplest model:
class SimpleMLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, out_dim),
        )

    def forward(self, x):
        '''Forward pass'''
        return self.layers(x)
Then, a loss function:
def objective(x):
    return torch.sum(0.25 * torch.sum(x)**4)
We instantiate it:
Arows = 2
Acols = 2
mlp = SimpleMLP(Arows, Acols)
Finally, I'm going to define a "forward" function (distinct from the model's forward function) that will serve as the full model+loss that we want to analyze:
def forward(*params_list):
    for param_val, model_param in zip(params_list, mlp.parameters()):
        model_param.data = param_val
    x = torch.ones((Arows,))
    return objective(mlp(x))
This passes a ones vector through the single-layer "mlp" and feeds the result into our loss function.
Now, I attempt to compute:
v = torch.ones((6,))
v_tensors = []
idx = 0
# this code "reshapes" the v vector as needed
for i, param in enumerate(mlp.parameters()):
    numel = param.numel()
    v_tensors.append(torch.reshape(torch.tensor(v[idx:idx+numel]), param.shape))
    idx += numel
And finally:
param_tensors = tuple(mlp.parameters())
reshaped_v = tuple(v_tensors)
soln = torch.autograd.functional.hvp(forward, param_tensors, v=reshaped_v)
But, alas, the Hessian-Vector Product in soln is all 0's. What is happening?
What's happening is that strict is False by default in the hvp() function and a tensor of 0's is returned as the Hessian Vector Product instead of an error (source).
If you try with strict=True, you instead get the error RuntimeError: The output of the user-provided function is independent of input 0. This is not allowed in strict mode. Looking at the full traceback, I suspect this error comes from _check_requires_grad(jac, "jacobian", strict=strict), which indicates that the jacobian jac is None.
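For illustration, strict is just an extra keyword argument on the same call as in the question:

# With strict=True, hvp raises instead of silently returning zeros:
soln = torch.autograd.functional.hvp(forward, param_tensors, v=reshaped_v, strict=True)
# RuntimeError: The output of the user-provided function is independent of input 0. ...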
Update:
Following is a full working example:
import torch
from torch import nn

# your loss function
def objective(x):
    return torch.sum(0.25 * torch.sum(x)**4)

# Following are utilities to make nn.Module functional
# borrowed from the link I posted in comment
def del_attr(obj, names):
    if len(names) == 1:
        delattr(obj, names[0])
    else:
        del_attr(getattr(obj, names[0]), names[1:])

def set_attr(obj, names, val):
    if len(names) == 1:
        setattr(obj, names[0], val)
    else:
        set_attr(getattr(obj, names[0]), names[1:], val)

def make_functional(mod):
    orig_params = tuple(mod.parameters())
    # Remove all the parameters in the model
    names = []
    for name, p in list(mod.named_parameters()):
        del_attr(mod, name.split("."))
        names.append(name)
    return orig_params, names

def load_weights(mod, names, params):
    for name, p in zip(names, params):
        set_attr(mod, name.split("."), p)

# your forward function with update
def forward(*new_params):
    # this line replaces your for loop
    load_weights(mlp, names, new_params)
    x = torch.ones((Arows,))
    out = mlp(x)
    loss = objective(out)
    return loss

# your simple MLP model
class SimpleMLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, out_dim),
        )

    def forward(self, x):
        '''Forward pass'''
        return self.layers(x)

if __name__ == '__main__':
    # your model instantiation
    Arows = 2
    Acols = 2
    mlp = SimpleMLP(Arows, Acols)

    # your vector computation
    v = torch.ones((6,))
    v_tensors = []
    idx = 0
    # this code "reshapes" the v vector as needed
    for i, param in enumerate(mlp.parameters()):
        numel = param.numel()
        v_tensors.append(torch.reshape(torch.tensor(v[idx:idx+numel]), param.shape))
        idx += numel
    reshaped_v = tuple(v_tensors)

    # make model's parameters functional
    params, names = make_functional(mlp)
    params = tuple(p.detach().requires_grad_() for p in params)

    # compute the vector-Hessian product (vhp)
    soln = torch.autograd.functional.vhp(forward, params, reshaped_v, strict=True)
    print(soln)
Did you try it with doubles instead of floats? I did some tests on my own that showed fairly large error when backpropagating with 32-bit floats (on the order of 1e-5) compared to doubles.
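If you want to try that on the example above, a quick sketch (assuming the rest of the script is unchanged) is to cast the model and the probe vector to 64-bit floats:

# Run the same experiment in double precision.
mlp = SimpleMLP(Arows, Acols).double()       # cast all parameters to float64
v = torch.ones((6,), dtype=torch.float64)    # keep the probe vector in float64
# inside forward(): x = torch.ones((Arows,), dtype=torch.float64)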
I am trying to figure out why the following code does not work
from functools import lru_cache
import numpy as np
class Mask(dict):
    def __init__(self, shape, data=None):
        super().__init__(data)
        self.shape = shape

    @lru_cache(maxsize=1)
    def tomatrix(self):
        dense = np.zeros(self.shape[0] * self.shape[1])
        for uid, entries in self.items():
            dense[entries] = uid
        return np.reshape(dense, self.shape)
cells = {i : np.arange(i * 10, (i + 1) * 10) for i in range(10)}
mask = Mask((10, 10), cells)
r1 = mask.tomatrix()
r2 = mask.tomatrix()
The error says
TypeError: unhashable type: 'Mask'
as if lru_cache tries to cache self in self.tomatrix().
On the other hand, if I don't subclass from dict and instead have an internal member self.data that stores the actual data, the LRU wrapper does not complain.
Code that works:
from functools import lru_cache
import numpy as np
class Mask:
    def __init__(self, shape, data=None):
        self.data = data
        self.shape = shape

    @lru_cache(maxsize=1)
    def tomatrix(self):
        dense = np.zeros(self.shape[0] * self.shape[1])
        for uid, entries in self.data.items():
            dense[entries] = uid
        return np.reshape(dense, self.shape)
cells = {i : np.arange(i * 10, (i + 1) * 10) for i in range(10)}
mask = Mask((10, 10), cells)
r1 = mask.tomatrix()
r2 = mask.tomatrix()
Can anyone help me figure out this mystery? I'd like to keep subclassing dict, and I need the LRU caching.
The solution is to restore the default __hash__ that user-defined classes inherit from object. dict defines its own __eq__ but sets __hash__ to None, which is why instances of your dict subclass are unhashable.
Adding the line
__hash__ = object.__hash__
to Mask(dict) works.
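For completeness, the fixed class would look like this (the same code as in the question, with only that one line added):

from functools import lru_cache
import numpy as np

class Mask(dict):
    # dict sets __hash__ to None; restore object's default identity-based hash
    __hash__ = object.__hash__

    def __init__(self, shape, data=None):
        super().__init__(data)
        self.shape = shape

    @lru_cache(maxsize=1)
    def tomatrix(self):
        dense = np.zeros(self.shape[0] * self.shape[1])
        for uid, entries in self.items():
            dense[entries] = uid
        return np.reshape(dense, self.shape)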
Many thanks to this answer to another question!
Earlier this week I asked a generic question in a related SO community regarding constructing mathematical trees using OOP. The main takeaway was that the Composite and Interpreter patterns were the go-to patterns for this kind of application.
I then spent several days looking around online for resources on how these are constructed. I'm still convinced that I do not need to construct an entire interpreter and that a composite might be sufficient for my purposes.
From the other question I was trying to construct this tree:
Without using OOP, I'd probably do something like this:
import numpy as np

def root(B, A):
    return B + A

def A(x, y, z):
    return x*np.log(y) + y**z

def B(alpha, y):
    return alpha*y

def alpha(x, y, w):
    return x*y + w

if __name__ == '__main__':
    x, y, z, w = 1, 2, 3, 4
    result = root(B(alpha(x, y, w), y), A(x, y, z))
This would give a correct result of 20.693147180559947. I tried to use the composite pattern to do something similar:
class ChildElement:
    '''Class representing objects at the bottom of the hierarchy tree.'''
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "class ChildElement with value " + str(self.value)

    def component_method(self):
        return self.value

class CompositeElement:
    '''Class representing objects at any level of the hierarchy tree except for the bottom level.
    Maintains the child objects by adding and removing them from the tree structure.'''
    def __init__(self, func):
        self.func = func
        self.children = []

    def __repr__(self):
        return "class Composite element"

    def append_child(self, child):
        '''Adds the supplied child element to the list of children elements "children".'''
        self.children.append(child)

    def remove_child(self, child):
        '''Removes the supplied child element from the list of children elements "children".'''
        self.children.remove(child)

    def component_method(self):
        '''WHAT TO INCLUDE HERE?'''

if __name__ == '__main__':
    import numpy as np

    def e_func(A, B):
        return A + B

    def A_func(x, y, z):
        return x*np.log(y) + y**z

    def B_func(alpha, y):
        return alpha*y

    def alpha_func(x, y, w):
        return x*y + w

    x = ChildElement(1)
    y = ChildElement(2)
    z = ChildElement(3)
    w = ChildElement(4)

    e = CompositeElement(e_func)
    A = CompositeElement(A_func)
    B = CompositeElement(B_func)
    alpha = CompositeElement(alpha_func)

    e.children = [A, B]
    A.children = [x, y, z]
    B.children = [alpha, y]
    alpha.children = [x, y, w]

    e.component_method()
However, I got stuck at the last line. It seems that calling component_method on the composite instance e will not work, since the architecture does not yet define how to combine the values of its Child or Composite children.
How can I get this to work? What should the component_method of my CompositeElement class contain?
def component_method(self):
    values = [child.component_method() for child in self.children]
    return self.func(*values)
This will evaluate the child nodes and pass the values to the function of the node itself, returning the value.
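As a quick check, evaluating the tree built in the question then gives the same value as the non-OOP version:

result = e.component_method()   # alpha, B and A are evaluated recursively
print(result)                   # 20.693147180559947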
In Python, what is the idiomatic way to initialize instance variables:
class Test:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
Or
class Test2:
    def __init__(self, data):
        self.a = data[0]
        self.b = data[1]
        self.c = data[2]
        self.d = data[3]
UPDATE: I have around 20 instance variables for a class named Link:
self.street
self.anode
self.bnode
self.length
self.setbackA
self.setbackB
self.bearingA
self.bearingB
self.ltype
self.lanesAB
self.leftAB
self.rightAB
self.speedAB
self.fspdAB
self.capacityAB
self.lanesBA
self.leftBA
self.rightBA
self.speedBA
self.fspdBA
self.capacityBA
self.use
Each variable is related to the class Link. Is there a recommended way of refactoring this?
The former, since it's more explicit about what the parameters are and what the object requires.
If you do need to pass in your data as a tuple, there's a shortcut you can use. Instead of doing the latter, or something like:
test = Test(data[0], data[1], data[2], data[3])
You can instead unpack the list/tuple and do:
test = Test(*data)
If you need to pass in a bunch of data (more than 4-5 items), you should look into either using optional/keyword arguments, creating a custom object to hold some of the data, or using a dictionary:
config = Config(a, b, c, d)
test = Test(e, f, config, foo=13, bar=True)
I would probably refactor your Link class to look like this:
class Node(object):
    def __init__(self, node, setback, bearing):
        self.node = node
        self.setback = setback
        self.bearing = bearing

class Connection(object):
    def __init__(self, lanes, left, right, speed, fspd, capacity):
        self.lanes = lanes
        self.left = left
        self.right = right
        self.speed = speed
        self.fspd = fspd
        self.capacity = capacity

class Link(object):
    def __init__(self, street, length, ltype, use, a, b, ab, ba):
        self.street = street
        self.length = length
        self.ltype = ltype
        self.use = use
        self.a = a
        self.b = b
        self.ab = ab
        self.ba = ba
I saw that you had some duplicated data, so I pulled it out into separate objects. While this doesn't reduce the total number of fields, it does shrink the parameter lists you need to pass around.
Having a large number of fields isn't bad, but having a large number of parameters generally is. If you can write your methods so that they don't need a huge number of parameters, by bundling related data together, then it doesn't really matter how many fields you have.
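A hypothetical construction with these classes might look like this (all field values are made up for illustration):

a = Node(node=101, setback=5.0, bearing=90.0)
b = Node(node=202, setback=3.0, bearing=270.0)
ab = Connection(lanes=2, left=1, right=1, speed=50, fspd=55, capacity=1800)
ba = Connection(lanes=2, left=1, right=1, speed=50, fspd=55, capacity=1800)

link = Link(street="Main St", length=120.0, ltype=1, use=1, a=a, b=b, ab=ab, ba=ba)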
Unpacking an array into a bunch of named variables suggests you should have started with named variables in the first place - stick with the first one.
There is only one reason you might want the second one here - you have something that is inconsiderately producing lists rather than objects. If that does happen:
data = get_data_in_list_form()
actual_data = Test(*data)
Can you group some of your data? For example (a sketch of the two helper classes follows the listing below):
self.street
self.ltype
self.use
self.length
# .a and .b can be instances of NodeConnection
self.a.setback
self.a.bearing
self.b.setback
self.b.bearing
self.b.node
# .ab and .ba can be instances of a separate class, "UniDirectionalLink"
self.ab.lanes
self.ab.left
self.ab.right
self.ab.speed
self.ab.fspd
self.ab.capacity
self.ba.lanes
self.ba.left
self.ba.right
self.ba.speed
self.ba.fspd
self.ba.capacity
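A minimal sketch of those two helpers, using the names from the comments above (everything beyond the attribute names is illustrative):

class NodeConnection:
    def __init__(self, node, setback, bearing):
        self.node = node
        self.setback = setback
        self.bearing = bearing

class UniDirectionalLink:
    def __init__(self, lanes, left, right, speed, fspd, capacity):
        self.lanes = lanes
        self.left = left
        self.right = right
        self.speed = speed
        self.fspd = fspd
        self.capacity = capacity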
There's no need to do everything in a constructor here:
link = (
    Link(street=..., ltype=..., use=..., length=...)
    .starting_at(node_a, bearing=..., setback=...)
    .finishing_at(node_b, bearing=..., setback=...)
    .forward_route(lanes, left, right, speed, fspd, capacity)
    .reverse_route(lanes, left, right, speed, fspd, capacity)
)
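For that chaining to work, each builder method would just set the relevant attributes and return self; a sketch (method names mirror the call above, and the helper classes are the ones sketched earlier):

class Link:
    def __init__(self, street, ltype, use, length):
        self.street, self.ltype, self.use, self.length = street, ltype, use, length

    def starting_at(self, node, bearing, setback):
        self.a = NodeConnection(node, setback, bearing)
        return self   # returning self is what makes the chained calls possible

    def finishing_at(self, node, bearing, setback):
        self.b = NodeConnection(node, setback, bearing)
        return self

    def forward_route(self, lanes, left, right, speed, fspd, capacity):
        self.ab = UniDirectionalLink(lanes, left, right, speed, fspd, capacity)
        return self

    def reverse_route(self, lanes, left, right, speed, fspd, capacity):
        self.ba = UniDirectionalLink(lanes, left, right, speed, fspd, capacity)
        return self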