How to convert a function based data pipeline to OOP? - python

I am doing some data processing and have built several pipelines, each consisting of multiple functions that broadly modify dictionaries at each step. As the different pipelines operate on the same data and have similar functions, I've been trying to convert them into a more object-oriented structure. However, before getting started I've been tying myself up in knots slightly.
Take the following simplified example:
for f in foos:
    y = extract_y_info(f)
    z = extract_z_info(f)
    *some code that does something with y and z*

def extract_y_info(f):
    return *some code that extracts y info from f*

def extract_z_info(f):
    return *some code that extracts z info from f*
To me there seem to be a couple of ways I could approach moving this to an OOP structure. The first is quite similar to the function-by-function approach.
class foo():
    def __init__(self, x):
        self.x = x
    def extract_y_info(self):
        return *some code that extracts y info from self.x*
    def extract_z_info(self):
        return *some code that extracts z info from self.x*

for f in foo_instances:
    y = f.extract_y_info()
    z = f.extract_z_info()
    *some code that does something with y and z*
The other option is modifying the instances of the class:
class foo():
    def __init__(self, x):
        self.x = x
    def extract_y_info(self):
        self.y = *some code that extracts y info from self.x*
    def extract_z_info(self):
        self.z = *some code that extracts z info from self.x*

for f in foo_instances:
    f.extract_y_info()
    f.extract_z_info()
    *some code that does something with f.y and f.z*
Is either of these options better practice than the other? Is there a better third way?

It really depends on your overall design: what state you expect your instances to be in at any given time and what you do with them (in other words, whether the existence of the y attribute is itself meaningful). That said, the former seems generally safer to me. You call the method and get a value; you don't have to keep track of whether you have called the method and what state this or that attribute is in. Note, though, that you should really define instance attributes in the constructor; otherwise accessing them could be not just surprising but fatal (AttributeError).
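To illustrate the AttributeError risk, here is a minimal sketch. The class name Foo and the arithmetic inside the methods are placeholders standing in for the real extraction code; only the pattern matters.

```python
class Foo:
    def __init__(self, x):
        self.x = x
        self.z = None  # defined up front, so reading it is always safe

    def extract_y_info(self):
        # Placeholder logic; y only exists after this method has run.
        self.y = self.x * 2

    def extract_z_info(self):
        # Placeholder logic.
        self.z = self.x + 1


f = Foo(10)
print(f.z)  # None: defined in the constructor, just not computed yet

try:
    print(f.y)  # extract_y_info() has not been called yet
except AttributeError as exc:
    print(f"AttributeError: {exc}")

f.extract_y_info()
print(f.y)  # 20
```

Initialising every attribute in the constructor, as done for z here, turns a crash into a value you can check for.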
A neat solution that addresses some of the points above, and possibly fits what you seem to be doing here, is a property, which essentially lets you access the value returned by a method as if it were an instance attribute:
class foo():
    def __init__(self, x):
        self.x = x
    def extract_y_info(self):
        return #some code that extracts y info from self.x
    y = property(extract_y_info)

for f in foo_instances:
    print(f"value of f.y = {f.y}")
Or you can do the same using property as a method decorator:
    @property
    def y(self):
        return #some code that extracts y info from self.x
If getting y is expensive and its value does not change over the life of the instance, you can also use functools.cached_property, available from Python 3.8.
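As a rough sketch of the cached variant: the class name Foo, the calls counter and the use of sum() are illustrative assumptions, with sum() standing in for whatever expensive extraction the real pipeline does.

```python
from functools import cached_property  # available from Python 3.8


class Foo:
    def __init__(self, x):
        self.x = x
        self.calls = 0  # hypothetical counter, only here to show the caching

    @cached_property
    def y(self):
        self.calls += 1
        return sum(self.x)  # placeholder for an expensive extraction


f = Foo([1, 2, 3])
print(f.y, f.y)  # computed on first access, then read from the cache
print(f.calls)   # 1
```

Note that the cached value is not refreshed if self.x changes later, so this only fits values that stay fixed for the life of the instance.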

Related

in python, is it a bad practice to use a property which doesn't return an instance variable?

Ok, let's say that I want to create a Vector2d in Python to represent two-dimensional vectors. I want to have a property to get the magnitude of the vector, so I wrote the following class:
class Vector2d:
    def __init__(self, x, y):
        self._x = x
        self._y = y
    @property
    def magnitude(self):
        return (self._x**2 + self._y**2)**(1/2)
I know I could use a normal method for this, but I think a property makes my class simpler to use. So the question is: is this considered bad practice?

Changing variable for all objects of one class

I am new to Python and I ran into a problem.
To keep it simple, instead of sending you all my code I will generalize it a bit.
I want to create a class "Object" with an x and y coordinate.
class Object():
    def __init__(self, x, y):
        self.x = x
        self.y = y

object_1 = Object(0, 0)
object_2 = Object(20, 20)
object_3 = Object(100, 100)
I created three objects of class Object(), each with an individual x and y coordinate. Now I want to add e.g. 5 to all three x coordinates without typing it manually. Is there a smarter way to add 5 to all members of this class?
I am sorry in case my question was answered in another post, but I couldn't find anything that helps me. Thank you in advance.
To avoid doing it individually, do it as a group: you could add a static method to the class that adds a value to the x of all the given objects.
    @staticmethod
    def addX(value, *items):
        for item in items:
            item.x += value

# call
Object.addX(5, object_1, object_2, object_3)
Save all of your obstacles in a list when you create them, something like a group.
You can save them in a list or save them inside another class that they are related to.
Then in each desired move execute a loop over that list, something like this:
for obstacle in obstacles:
    obstacle.x += 5
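Put together, the list approach might look like the following sketch. The class name Obstacle and the starting coordinates are assumptions chosen to mirror the question's three objects.

```python
class Obstacle:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Keep every instance in one list as it is created.
obstacles = [Obstacle(0, 0), Obstacle(20, 20), Obstacle(100, 100)]

# Shift all of them in one pass.
for obstacle in obstacles:
    obstacle.x += 5

print([o.x for o in obstacles])  # [5, 25, 105]
```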

Python method/function chaining

In python, is it possible to chain together class methods and functions together? For example, if I want to instantiate a class object and call a method on it that affects an instance variable's state, could I do that? Here is an example:
class Test(object):
    def __init__(self):
        self.x = 'Hello'
    @classmethod
    def make_upper(y):
        y.x = y.x.upper()
What I'm wanting to do is this:
h = Test().make_upper()
I want to instantiate a class object and affect the state of a variable in one line of code, but I would also like to be able to chain together multiple functions that can affect state or do something else on the object. Is this possible in python like it is in jQuery?
Yes, sure. Just return self from the instance methods you are interested in:
class Test(object):
    def __init__(self):
        self.x = 'Hello'
    def make_upper(self):
        self.x = self.x.upper()
        return self
    def make_lower(self):
        self.x = self.x.lower()
        return self

h = Test().make_upper()
print(h.x)
Output:
HELLO
Yes and no. The chaining certainly works, but h is the return value of make_upper(), not the object returned by Test(). You need to write this as two lines.
h = Test()
h.make_upper()
However, PEP 572 added assignment expressions in Python 3.8, which means you can now write
(h := Test()).make_upper()
The return value of Test() is assigned to h in the current scope and used as the value of the := expression, which then invokes its make_upper method. I'm not sure I would recommend using := in this case, though; the two-line version is much more readable.
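A minimal runnable sketch of the := form, assuming Python 3.8 or later and a make_upper that mutates in place without returning self:

```python
class Test:
    def __init__(self):
        self.x = 'Hello'

    def make_upper(self):
        # Mutates in place and implicitly returns None.
        self.x = self.x.upper()


# The parentheses matter: h is bound to the new Test instance,
# not to the None returned by make_upper().
(h := Test()).make_upper()
print(h.x)  # HELLO
```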

Self in python Class - I can do it with out it...? [duplicate]

Consider this code:
class example(object):
    def __init__ (): # No self
        test() # No self
    def test(x,y): # No self
        return x+y
    def test1(x,y): # No self
        return x-y

print(example.test(10,5))
print(example.test1(10,5))
15
5
This works as expected. I believe I can write a whole program not using self. What am I missing? What is this self; why is it needed in some practical way?
I have read a lot about it - (stack, Python documentation), but I just don't understand why it's needed, since I can obviously create a program without it.
You can certainly write a program without it. But then you'd be missing one of the key features of classes. If you can do without self, I'd argue you can do without classes and just use plain functions :)
Classes allow you to create objects that have properties (attributes) associated with them, and self allows you to access those values. So say you have a square:
class Square(object):
    def __init__ (self, length, height):
        self.length = length # THIS square's length, not others
        self.height = height # THIS square's height, not other
    def print_length_and_height(self):
        print(self.length, self.height) # THIS square's length and height

square1 = Square(2,2)
square2 = Square(4,4)
square1.print_length_and_height() # 2 2
square2.print_length_and_height() # 4 4
Now, this example is quite silly, of course, but I think it shows what self specifically is for: it refers to the particular instance of an object.
By all means, if you don't see the point of it, do away with classes and just use functions; there's nothing wrong with that.
You haven't utilised a class or object properly. Cutting out the garbage code, your program reduces to:
def test(x,y): #No class
    return x+y
def test1(x,y): #No class
    return x-y

print(test(10,5))
print(test1(10,5))
Output:
15
5
Your "class" is no more useful than if you wrapped your program in the nested structures:
if True:
    for i in range(1):
        ...
A proper object will have attributes (data fields) and functions that operate on that data (see below). Your code has an empty object; hence, you have nothing on which to operate, no need for self, and no need for a class at all.
Rather, use a class when you need to encapsulate a data representation and associated operations. Below, I've reused some of your code to make example do some trivial complex number work. There are many extensions and improvements to make in this; I kept it relatively close to your original work.
class example(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        sign = ' + ' if self.b >= 0 else ' - '
        return str(self.a) + sign + str(abs(self.b)) + 'i'
    def add(self, x):
        self.a += x.a
        self.b += x.b
    def sub(self, x):
        self.a -= x.a
        self.b -= x.b

complex1 = example(10, 5)
complex2 = example(-3, 2)
complex1.add(complex2)
print(complex1)
complex2.sub(complex1)
print(complex2)
Output:
7 + 7i
-10 - 5i
Are you familiar with the object-oriented paradigm?
If not, you should look into it. Python is an object-oriented language, and self lets you define your object's properties.
An example:
You have a class named Vehicle. A vehicle could be a bike, a car, even a plane. So something you can include is a name and a type.
class Vehicle():
    def __init__(self, name, type): # Constructor
        self.name = name
        self.type = type
    def info(self):
        print("I'm a ")
        print(self.name)
That's all; now you have a vehicle with a name and a type. Every instance of Vehicle has a name and a type, which may or may not differ from those of other instances, and every instance can access its own variables. I'm sorry I can't explain it better. First of all, you need some knowledge of the object-oriented paradigm. Please comment on my answer if you have doubts and I'll answer you or give a link where it is explained better.

storing list/tuples in sqlite database with sqlalchemy

I have a class that holds the size and position of something I draw to the screen. I am using sqlalchemy with a sqlite database to persist these objects. However, the position is a 2D value (x and y) and I'd like to have a convenient way to access this as
MyObject.pos # preferred, simpler interface
# instead of:
MyObject.x
MyObject.y # inconvenient
I can use properties but this solution isn't optimal since I cannot query based on the properties
session.query(MyObject).filter(MyObject.pos==some_pos).all()
Is there some way to use collections or association proxies to get the behavior I want?
If you are using PostGIS (Geometry extended version of postgres), you can take advantage of that using GeoAlchemy, which allows you to define Column types in terms of geometric primitives available in PostGIS. One such data type is Point, which is just what it sounds like.
PostGIS is a bit more difficult to set up than vanilla PostgreSQL, but if you actually intend to do queries based on actual geometric terms, it's well worth the extra (mostly one time) trouble.
Another solution, using plain SQLAlchemy, is to define your own column types with the desired semantics and translate them at compile time to more primitive types supported by your database.
Actually, you could use a property, but not with the builtin property decorator. You'd have to work a little harder and create your own custom descriptor.
You probably want a point class. A decent option is actually to use a namedtuple, since you don't have to worry about proxying assignment of individual coordinates. The property gets assigned all or nothing.
Point = collections.namedtuple('Point', 'x y')
This at least lets us compare point values. The next step in writing the descriptor is to work through its methods. There are two methods to think about, __get__ and __set__, and with __get__ there are two situations: when it is called on an instance, you should handle actual point values; when it is called on the class, you should turn it into a column expression.
What to return in that last case is a bit tricky. What we want is something that, when compared to a point, returns a column expression that equates the individual columns with the individual coordinates. We'll make one more class for that.
class PointColumnProxy(object):
    def __init__(self, x, y):
        ''' these x and y's are the actual sqlalchemy columns '''
        self.x, self.y = x, y
    def __eq__(self, pos):
        return sqlalchemy.and_(self.x == pos.x,
                               self.y == pos.y)
All that's left is to define the actual descriptor class.
class PointProperty(object):
    def __init__(self, x, y):
        ''' x and y are the names of the coordinate attributes '''
        self.x = x
        self.y = y
    def __set__(self, instance, value):
        assert type(value) == Point
        setattr(instance, self.x, value.x)
        setattr(instance, self.y, value.y)
    def __get__(self, instance, owner):
        if instance is not None:
            return Point(x=getattr(instance, self.x),
                         y=getattr(instance, self.y))
        else: # called on the Class
            return PointColumnProxy(getattr(owner, self.x),
                                    getattr(owner, self.y))
which could be used thusly:
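Base = sqlalchemy.ext.declarative.declarative_base()

class MyObject(Base):
    __tablename__ = 'my_object'  # table name assumed for illustration
    id = Column(Integer, primary_key=True)  # assumed key; a mapped class needs one
    x = Column(Float)
    y = Column(Float)
    pos = PointProperty('x', 'y')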
Define your table with a PickleType column type. It will then automatically persist Python objects, as long as they are pickleable. A tuple is pickleable.
mytable = Table("mytable", metadata,
    Column('pos', PickleType(),
    ...
)
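PickleType relies on Python's pickle module for serialization. The sketch below uses only the standard-library pickle (it does not touch SQLAlchemy) to show the round trip a position tuple would go through; the coordinates are arbitrary example values.

```python
import pickle

pos = (12.5, 40.0)             # example coordinates, chosen for illustration
blob = pickle.dumps(pos)       # bytes: the serialized form of the tuple
restored = pickle.loads(blob)  # back to an equal tuple

print(type(blob).__name__)     # bytes
print(restored == pos)         # True
```

Because the stored value is an opaque blob, this suits cases where the position is only saved and loaded as a whole.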
