I'm trying to refactor a very repetitive section of code.
I have a class that has two instance variables that get updated:
class Alerter(object):
    'Sends email regarding information about unmapped positions and trades'

    def __init__(self, job):
        self.job = job
        self.unmappedPositions = None
        self.unmappedTrades = None
After my code goes through some methods, it creates a table and updates self.unmappedPositions and self.unmappedTrades:
def load_positions(self, filename):
    unmapped_positions_table = etl.fromcsv(filename)
    if 'positions' in filename:
        return self.add_to_unmapped_positions(unmapped_positions_table)
    else:
        return self.add_to_unmapped_trades(unmapped_positions_table)
So I have two functions that essentially do the same thing:
def add_to_unmapped_trades(self, table):
    if self.unmappedTrades:
        Logger.info("Adding to unmapped")
        self.unmappedTrades = self.unmappedTrades.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedTrades = table
    Logger.info("Data added to unmapped")
    return self.unmappedTrades
And:
def add_to_unmapped_positions(self, table):
    if self.unmappedPositions:
        Logger.info("Adding to unmapped")
        self.unmappedPositions = self.unmappedPositions.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedPositions = table
    Logger.info("Data added to unmapped")
    return self.unmappedPositions
I tried making it one method that takes a third argument and figures out what to update, the third argument being the initialized variable: either self.unmappedPositions or self.unmappedTrades. However, that doesn't seem to work. Any other suggestions?
It looks like you've had the key insight that you can write this function independent of any particular storage:
def add_to_unmapped(unmapped, table):
    if unmapped:
        Logger.info("Adding to unmapped")
        unmapped = unmapped.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        unmapped = table
    Logger.info("Data added to unmapped")
    return unmapped
This is actually good practice on its own: you can write unit tests for it, and since you have two tables, you only need to write the implementation once.
If you consider what, abstractly, your two add_to_unmapped_* functions do, they:
1. compute the new table;
2. save the new table in the object; and
3. return the new table.
We've now separated out step 1, and you can refactor the wrappers:
class Alerter:
    def add_to_unmapped_trades(self, table):
        self.unmappedTrades = add_to_unmapped(self.unmappedTrades, table)
        return self.unmappedTrades
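If you would still rather have a single method that takes a third argument, one workable variation (a sketch, not from the original answer; plain list concatenation stands in here for petl's cat/cache so the example stays runnable) is to pass the attribute *name* rather than its value, so the method can both read and rebind the attribute with getattr/setattr:

```python
import logging

Logger = logging.getLogger(__name__)

class Alerter(object):
    'Sends email regarding information about unmapped positions and trades'

    def __init__(self, job):
        self.job = job
        self.unmappedPositions = None
        self.unmappedTrades = None

    def add_to_unmapped(self, attr_name, table):
        # attr_name names the attribute to update:
        # 'unmappedPositions' or 'unmappedTrades'
        current = getattr(self, attr_name)
        if current:
            Logger.info("Adding to unmapped")
            # The original combines etl tables with .cat(table).cache();
            # list concatenation plays that role in this sketch.
            current = current + table
        else:
            Logger.info("Making new unmapped")
            current = table
        setattr(self, attr_name, current)
        Logger.info("Data added to unmapped")
        return current

alerter = Alerter(job=None)
alerter.add_to_unmapped('unmappedTrades', [1, 2])
```

The reason your original attempt didn't work is that passing self.unmappedTrades passes the value it currently refers to; reassigning the parameter inside the method never rebinds the attribute on the instance. Passing the name sidesteps that.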
I'm new to OOP in Python and am trying to understand how to handle instances. I have a method:
class Object:
    things = []

    def __init__(self, table):
        self.table = table
        self.things.append(table)

    ( ... )

    def thingy(self):
        return self.db.execute(f"select date, p1, p2 from {self.table}")

    def all_things(self):
        self.things.extend(
            map(lambda t: Object(thing=t + '_thing').thingy(), Constants.THINGS))
        return self.things
Now how would I call this object? My thing is driven by a list from Constants.THINGS, i.e. THINGS = ["table1", "table2" ... ], but in order to create the object to call the method all_things(), I must have a thing set, even though the method sets the thing on call...
This feels a little backward, so I would appreciate knowing what it is I am misunderstanding, as I think I need to change the constructor/object.
a = Object(end_date="2020-01-05",
           start_date="2020-01-01",
           thing=WHAT_TO_PUT_HERE).all_things()
If I add anything to this thing I get a double output
Any help is appreciated
UPDATE:
The desired output would be that thingy() fires based on the list provided by Constants.THINGS. If we input THINGS = ["table1", "table2"], we would expect thingy() to execute twice, with:
select date, p1, p2 from table1
select date, p1, p2 from table2
And this would be added to the things class variable, and then when all_things() finishes we should have the content of the two select statements in a list
However, Object.things will actually contain [WHAT_TO_PUT_HERE, table1, table2].
So according to your update, this is what I think you're attempting to do.
class Object:
    def __init__(self):
        # do some initialization
        pass

    def thingy(self, table):
        return self.db.execute(f"select date, p1, p2 from {table}")

    # call the method "thingy" on all Constants.THINGS
    def all_things(self):
        # map() is lazy in Python 3, so force it with list()
        # (and return the result, or nothing happens)
        return list(map(self.thingy, Constants.THINGS))
Then from outside the class you would call it like this.
my_instance = Object()
my_instance.all_things()
I'm assuming the class will also have some setup and teardown of your db connection, as well as some other things, but this is simply a minimalistic attempt at showing how it should work.
Okay, so rather than having a class variable, which @Axe319 informed me doesn't get reset with every instance the way self.table does, I altered the constructor to just have:
class Object:
    def __init__(self, table):
        self.table = table
        self.things = list()
Then when I call the particular method outside the class, all_things(), I can just pass None for the table, as the method builds that for me, i.e.:
a = Object(thing=None).all_things()
This might be an anti-pattern - again I'm new to OOP, but it's creating something that looks correct.
P.S. yes, I agree: things, thingy, and thing were a bad choice of names for this question...
Thanks
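For anyone else puzzled by the class-variable behaviour mentioned above, a minimal standalone demonstration (an addition, not from the original thread) of why a list defined at class level is shared across every instance, while one assigned in __init__ is not:

```python
class Shared:
    things = []  # class variable: one list shared by every instance

    def __init__(self, item):
        # looks like an instance operation, but mutates the shared list
        self.things.append(item)


class Unshared:
    def __init__(self, item):
        self.things = [item]  # instance variable: a fresh list per instance


a = Shared('a')
b = Shared('b')
c = Unshared('c')
d = Unshared('d')
```

After this runs, a.things and b.things are the very same list containing both items, while c.things and d.things each hold only their own item.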
I have a utility module that I use to provide data to other scripts. I can't get my head around the best way of utilising this while minimising the number of function calls (which are all, for the sake of argument, slow).
It looks something like this:
helper.py
dataset1 = slow_process1()
dataset2 = slow_process2()

def get_specific_data1():
    data = ...  # do stuff with dataset1
    return data

def get_specific_data2():
    data = ...  # do stuff with dataset1
    return data

def get_specific_data3():
    data = ...  # do stuff with dataset2
    return data
Now, say I need to run get_specific_data1 in a script. In the setup above, I'm importing the module, which means I call slow_process2 on import, unnecessarily.
If I nest the assignment of dataset1 and dataset2, but then need to call get_specific_data1 and get_specific_data2 in the same script, I run slow_process1 twice, which again is unnecessary.
If I create a Helper class with methods for the get_specific_data functions, which runs slow_process1 or slow_process2 only if required, stores the data, and then accesses it as needed when the methods are called, I can get around this. Is that appropriate?
Something like:
class Helper:
    def __init__(self):
        self.dataset1 = None
        self.dataset2 = None

    def run_dataset1(self):
        self.dataset1 = slow_process1()

    def run_dataset2(self):
        self.dataset2 = slow_process2()

    def get_specific_data1(self):
        if self.dataset1 is None:
            self.run_dataset1()
        data = ...  # do stuff with self.dataset1
        return data

    # etc.
Apologies if this is a stupid question, but I have limited experience with OOP and don't want to make mistakes up front.
Thanks
This is what I meant about using a class with properties, only in this case I've used a custom version of one named lazyproperty. It's considered "lazy" because it only gets computed when it's accessed, like a regular property; but unlike a regular property, the computed value is effectively cached, by being turned into an instance attribute, so it won't be re-computed every time.
Caveat: doing this assumes that the value would be the same no matter when it was calculated, and that any changes made to it after the first access should be visible to the other methods of the same instance, i.e. they won't see a freshly re-computed value.
Once this is done, the methods in the class can just reference self.dataset1 or self.dataset2 as though they were regular instance attributes. On first access, the data associated with the attribute is computed; after that, the previously computed value is simply returned. You can see this happening in the output produced (shown below).
# From the book "Python Cookbook" 3rd Edition.
class lazyproperty:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, cls):
        if instance is None:
            return self
        else:
            value = self.func(instance)
            setattr(instance, self.func.__name__, value)
            return value


def slow_process1():
    print('slow_process1() running')
    return 13

def slow_process2():
    print('slow_process2() running')
    return 21


class Helper:
    def __init__(self):
        """ Does nothing - so not really needed. """
        pass

    @lazyproperty
    def dataset1(self):
        return slow_process1()

    @lazyproperty
    def dataset2(self):
        return slow_process2()

    def process_data1(self):
        print('self.dataset1:', self.dataset1)  # doing stuff with dataset1
        return self.dataset1 * 2

    def process_data2(self):
        print('self.dataset2:', self.dataset2)  # doing stuff with dataset2
        return self.dataset2 * 2

    def process_data3(self):
        print('self.dataset2:', self.dataset2)  # also does stuff with dataset2
        return self.dataset2 * 3


if __name__ == '__main__':
    helper = Helper()
    print(helper.process_data1())  # Will cause slow_process1() to be called
    print(helper.process_data2())  # Will cause slow_process2() to be called
    print(helper.process_data3())  # Won't call slow_process2() again
Output:
slow_process1() running
self.dataset1: 13
26
slow_process2() running
self.dataset2: 21
42
self.dataset2: 21
63
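As an aside (an addition, not part of the original answer): if you're on Python 3.8 or later, the standard library's functools.cached_property does essentially what the lazyproperty recipe above does:

```python
from functools import cached_property

def slow_process1():
    print('slow_process1() running')
    return 13

class Helper:
    @cached_property
    def dataset1(self):
        # computed on first access, then stored on the instance,
        # so later accesses skip the slow call entirely
        return slow_process1()

helper = Helper()
```

Accessing helper.dataset1 twice prints 'slow_process1() running' only once; the second access reads the cached instance attribute.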
You might be able to solve this with a lazy loading technique:
dataset1 = None
dataset2 = None

def ensureDataset1():
    global dataset1
    if dataset1 is None:
        dataset1 = slow_process1()

def ensureDataset2():
    global dataset2
    if dataset2 is None:
        dataset2 = slow_process2()

def get_specific_data1():
    ensureDataset1()
    data = ...  # do stuff with dataset1
    return data

# etc.
The side effect here is that if you never get around to examining either of dataset1 or dataset2 they never load.
I have homework that I am stuck on. I have gone as far as I can; can someone point me in the right direction? I am getting stuck on making each data row a new object. Normally I would think I could just iterate over the rows, but that will only return the last row.
Question:
Modify the classFactory.py source code so that the DataRow class returned by the build_row function has another method:
    retrieve(self, curs, condition=None)
self is (as usual) the instance whose method is being called, curs is a database cursor on an existing database connection, and condition (if present) is a string of condition(s) which must be true of all received rows.
The retrieve method should be a generator, yielding successive rows of the result set until it is completely exhausted. Each row should be a new object of type DataRow.
This is what I have.

The test:
import unittest
from classFactory import build_row

class DBTest(unittest.TestCase):

    def setUp(self):
        C = build_row("user", "id name email")
        self.c = C([1, "Steve Holden", "steve@holdenweb.com"])

    def test_attributes(self):
        self.assertEqual(self.c.id, 1)
        self.assertEqual(self.c.name, "Steve Holden")
        self.assertEqual(self.c.email, "steve@holdenweb.com")

    def test_repr(self):
        self.assertEqual(repr(self.c),
                         "user_record(1, 'Steve Holden', 'steve@holdenweb.com')")

if __name__ == "__main__":
    unittest.main()
The script I am testing:
def build_row(table, cols):
    """Build a class that creates instances of specific rows"""
    class DataRow:
        """Generic data row class, specialized by surrounding function"""
        def __init__(self, data):
            """Uses data and column names to inject attributes"""
            assert len(data) == len(self.cols)
            for colname, dat in zip(self.cols, data):
                setattr(self, colname, dat)

        def __repr__(self):
            return "{0}_record({1})".format(
                self.table,
                ", ".join("{0!r}".format(getattr(self, c)) for c in self.cols))

    DataRow.table = table
    DataRow.cols = cols.split()
    return DataRow
It should roughly be something like the following:
def retrieve(self, curs, condition=None):
    query_ = "SELECT * FROM {0}".format(self.table)
    if condition is not None:
        query_ += " WHERE %s" % condition
    curs.execute(query_)
    for row in curs.fetchall():   # iterate over the retrieved results
        yield type(self)(row)     # wrap each row in a new DataRow and yield it
Iterate over the rows as normal, but use yield instead of return.
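Putting the pieces together, a runnable sketch (the sqlite3 in-memory database and the `people` table are purely illustrative, not part of the assignment): the retrieve method is defined inside build_row alongside __init__ and __repr__, so each yielded row is a fresh DataRow:

```python
import sqlite3

def build_row(table, cols):
    """Build a class that creates instances of specific rows."""
    class DataRow:
        def __init__(self, data):
            assert len(data) == len(self.cols)
            for colname, dat in zip(self.cols, data):
                setattr(self, colname, dat)

        def __repr__(self):
            return "{0}_record({1})".format(
                self.table,
                ", ".join("{0!r}".format(getattr(self, c)) for c in self.cols))

        def retrieve(self, curs, condition=None):
            # select the known columns so positions line up with self.cols
            query = "SELECT {0} FROM {1}".format(", ".join(self.cols), self.table)
            if condition is not None:
                query += " WHERE " + condition
            curs.execute(query)
            for row in curs.fetchall():
                yield type(self)(row)   # each result row becomes a new DataRow

    DataRow.table = table
    DataRow.cols = cols.split()
    return DataRow

# Demo against an in-memory database
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute("CREATE TABLE people (id INTEGER, name TEXT)")
curs.executemany("INSERT INTO people VALUES (?, ?)", [(1, 'Ann'), (2, 'Bob')])

People = build_row("people", "id name")
seed = People([0, 'seed'])          # any instance can run the query
rows = list(seed.retrieve(curs, condition="id > 1"))
```

Because retrieve is a generator, iterating it lazily yields one DataRow per result row instead of "only the last row".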
I'm using Python 3, but the question isn't really tied to the specific language.
I have class Table that implements a table with a primary key. An instance of that class contains the actual data (which is very large).
I want to allow users to create a sub-table by providing a filter for the rows of the Table. I don't want to copy the table, so I was planning to keep in the sub-table just the subset of the primary keys from the parent table.
Obviously, the sub-table is just a view into the parent table; it will change if the parent table changes, will become invalid if the parent table is destroyed, and will lose some of its rows if they are deleted from the parent table. [EDIT: to clarify, if parent table is changed, I don't care what happens to the sub-table; any behavior is fine.]
How should I connect the two classes? I was thinking of:
class Subtable(Table):
    def __init__(self, table, filter_function):
        # ...
My assumption was that Subtable keeps the interface of Table, slightly overriding the inherited methods just to check whether a row is included. Is this a good implementation?
The problem is, I'm not sure how to initialize the Subtable instance given that I don't want to copy the table object passed to it. Is it even possible?
Also, I was thinking of giving class Table an instance method that returns a Subtable instance; but that creates a dependency of Table on Subtable, which I guess is better to avoid?
I'm going to use the following (I omitted many methods such as sort, which work quite well in this arrangement; also omitted error handling):
class Table:
    def __init__(self, *columns, pkey=None):
        self.pkey = pkey
        # Single underscores rather than double: name mangling (__data)
        # would otherwise keep the subclass from filling in the
        # attributes the inherited methods read.
        self._columns = columns
        self._order = []
        self._data = {}

    def __contains__(self, key):
        return key in self._data

    def __iter__(self):
        for key in self._order:
            yield key

    def __len__(self):
        return len(self._data)

    def items(self):
        for key in self._order:
            yield key, self._data[key]

    def insert(self, *unnamed, **named):
        if len(unnamed) > 0:
            row_dict = {}
            for column_id, column in enumerate(self._columns):
                row_dict[column] = unnamed[column_id]
        else:
            row_dict = named
        key = row_dict[self.pkey]
        if key not in self._data:
            self._order.append(key)
        self._data[key] = row_dict


class Subtable(Table):
    def __init__(self, table, row_filter):
        self._order = []
        self._data = {}
        for key, row in table.items():
            if row_filter(row):
                self._order.append(key)
                self._data[key] = row
Essentially, I'm copying only the primary keys and creating references to the data tied to them. If a row in the parent table is destroyed, it will still exist in the sub-table. If a row is modified in the parent table, it is also modified in the sub-table. This is fine, since my requirement was "anything goes when the parent table is modified".
If you see any issues with this design, let me know please.
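An alternative that avoids inheritance entirely (a sketch of the keys-only view described earlier in the question, not the design settled on above; the minimal Table here is illustrative) is composition: the view holds a reference to the parent plus the matching keys, and row data is always looked up in the parent on demand:

```python
class Table:
    def __init__(self, pkey):
        self.pkey = pkey
        self._data = {}

    def insert(self, **row):
        self._data[row[self.pkey]] = row

    def __getitem__(self, key):
        return self._data[key]


class SubtableView:
    """Read-only view: stores only the parent's primary keys."""
    def __init__(self, parent, row_filter):
        self._parent = parent
        self._keys = [k for k, row in parent._data.items() if row_filter(row)]

    def __iter__(self):
        for key in self._keys:
            yield self._parent[key]   # row data always comes from the parent


t = Table(pkey='id')
t.insert(id=1, color='red')
t.insert(id=2, color='blue')
view = SubtableView(t, lambda row: row['color'] == 'red')
```

With this arrangement, edits to a parent row are automatically visible through the view, and the memory cost of a view is just its key list.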
I'm trying to figure out the best way to create a class that can modify and create new users all in one. This is what I'm thinking:
class User(object):
    def __init__(self, user_id):
        if user_id == -1:
            self.new_user = True
        else:
            self.new_user = False
            # fetch all records from db about user_id
            self._populate()

    def commit(self):
        if self.new_user:
            pass  # Do INSERTs
        else:
            pass  # Do UPDATEs

    def delete(self):
        if self.new_user == False:
            return False
        # Delete user code here

    def _populate(self):
        # Query self.user_id from database and
        # set all instance variables, e.g.
        # self.name = row['name']
        pass

    def getFullName(self):
        return self.name
# Create a new user
>>> u = User()
>>> u.name = 'Jason Martinez'
>>> u.password = 'linebreak'
>>> u.commit()
>>> print u.getFullName()
Jason Martinez

# Update existing user
>>> u = User(43)
>>> u.name = 'New Name Here'
>>> u.commit()
>>> print u.getFullName()
New Name Here
Is this a logical and clean way to do this? Is there a better way?
Thanks.
You can do this with metaclasses. Consider this (note that a metaclass must derive from type for the __call__ hook to fire on instantiation):

class MetaCity(type):
    def __call__(cls, name):
        """
        If it's in the database, retrieve it and return it.
        If it's not there, create it and return it.
        """
        theCity = database.get(name)  # your custom code to get the object from the db goes here
        if not theCity:
            # create a new one
            theCity = type.__call__(cls, name)
        return theCity


class City():
    __metaclass__ = MetaCity
    name = Field(Unicode(64))
Now you can do things like:

paris = City(name=u"Paris")        # this will create the Paris City in the database and return it.
paris_again = City(name=u"Paris")  # this will retrieve Paris from the database and return it.
from : http://yassinechaouche.thecoderblogs.com/2009/11/21/using-beaker-as-a-second-level-query-cache-for-sqlalchemy-in-pylons/
Off the top of my head, I would suggest the following:
1: Use a default argument None instead of -1 for user_id in the constructor:
def __init__(self, user_id=None):
    if user_id is None:
        ...
2: Skip the getFullName method - that's just your Java talking. Instead use normal attribute access; you can convert it into a property later if you need to.
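To illustrate that last point (an illustrative sketch with hypothetical first/last fields, not code from the question): callers use plain attribute syntax, and the attribute can later become a computed property without changing any calling code:

```python
class User(object):
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def full_name(self):
        # computed on access; callers write u.full_name with no parentheses,
        # exactly as if it were a plain attribute
        return '{0} {1}'.format(self.first, self.last)

u = User('Jason', 'Martinez')
```

Reading u.full_name here yields 'Jason Martinez', and the class could have started with full_name as a plain attribute with the same call sites.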
What you are trying to achieve is called the Active Record pattern. I suggest studying existing systems that provide this sort of thing, such as Elixir.
Small change to your initializer:
def __init__(self, user_id=None):
    if user_id is None: