I wanted to know whether the scenario below can be handled using the __new__ special method. I have a script, Listing.py, with a class Event that reads records from a file and converts them into queries. To be concise: the snippet first reads all the lines from the file and converts them into a list of lists. This list of lists is then passed to the loadlist classmethod of Event, which reads each list, unpacks it, and sets the values as instance attributes.
For instance, I have the three records below:
1|305|8|1851|Gotterdammerung|2008-01-25 14:30:00
2|306|8|2114|Boris Godunov|2008-10-15 20:00:00
3|302|8|1935|Salome|2008-04-19 14:30:00
Listing.py reads the above content and converts it into the queries given below:
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('2','306','8','2114','Boris Godunov','2008-10-15 20:00:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('3','302','8','1935','Salome','2008-04-19 14:30:00')
The whole program, Listing.py:
class Event:
    def __init__(self, eventid, venueid, catid, dateid, eventname, starttime):
        self.eventid = eventid
        self.venueid = venueid
        self.catid = catid
        self.dateid = dateid
        self.eventname = eventname
        self.starttime = starttime

    def __iter__(self):
        return (i for i in (self.eventid, self.venueid, self.catid, self.dateid, self.eventname, self.starttime))

    def __str__(self):
        return str(tuple(self))

    def __repr__(self):
        return "INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ({!r},{!r},{!r},{!r},{!r},{!r})".format(*self)

    @classmethod
    def loadlist(cls, records):
        return [cls(*record) for record in records]

if __name__ == '__main__':
    records = []
    with open('tickitdb/allevents_pipe.txt', 'r') as f:
        records = list(map(lambda s: s.rstrip('\n').split('|'), f.readlines()))
    events = Event.loadlist(records=records)
    with open('events.sql', 'w+') as f:
        print('writing file')
        for event in events:
            f.write(repr(event) + "\n")
When I ran the program, I came across the error below:
TypeError: __init__() missing 5 required positional arguments
I figured out the root cause: when the program reads the file and converts it into a list of records, there is a record which is empty. For instance:
1. ['1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00']
2. ['2','306','8','2114','Boris Godunov','2008-10-15 20:00:00']
3. ['3','302','8','1935','Salome','2008-04-19 14:30:00']
4. ['']
The 4th record has no values. To avoid such errors, I decided to make use of the __new__ special method. I could achieve the same functionality by putting in an if condition and checking whether the list is empty or not, but I was wondering how to make use of the __new__ special method to avoid such scenarios. With my limited knowledge of Python, I filled in the __new__ special method, but then I came across the error below:
RecursionError: maximum recursion depth exceeded while calling a Python object
def __new__(cls, *args, **kwargs):
    if len(args) != 0:
        instance = Event.__new__(cls, *args, **kwargs)
        return instance
Can we filter the records using the __new__ special method?
What you want to do is totally possible, but you will need to initialize the instance yourself once it returns from __new__.
I fixed your code as follows.
Given listing.txt
1|305|8|1851|Gotterdammerung|2008-01-25 14:30:00
2|306|8|2114|Boris Godunov|2008-10-15 20:00:00
3|302|8|1935|Salome|2008-04-19 14:30:00
4|302|8|1935|Salome|2008-04-19 14:30:00
class Event:
    def __new__(cls, *args, **kwargs):
        # args is expected to hold one record (a list of fields);
        # only allocate an instance for a non-empty record
        if args and len(args[0]) > 1:
            instance = object.__new__(cls)
            return instance
        else:
            return None

    def __init__(self, eventid, venueid, catid, dateid, eventname, starttime):
        self.eventid = eventid
        self.venueid = venueid
        self.catid = catid
        self.dateid = dateid
        self.eventname = eventname
        self.starttime = starttime

    def __iter__(self):
        return (i for i in (self.eventid, self.venueid, self.catid, self.dateid, self.eventname, self.starttime))

    def __str__(self):
        return str(tuple(self))

    def __repr__(self):
        return "INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ({!r},{!r},{!r},{!r},{!r},{!r})".format(*self)

    @classmethod
    def loadlist(cls, records):
        # build events via __new__ and initialize the survivors by hand
        events = []
        for record in records:
            e = cls.__new__(cls, record)
            if e:
                events.append(initialize(e, *record))
        return events

def initialize(e, eventid, venueid, catid, dateid, eventname, starttime):
    e.eventid = eventid
    e.venueid = venueid
    e.catid = catid
    e.dateid = dateid
    e.eventname = eventname
    e.starttime = starttime
    return e

if __name__ == '__main__':
    records = []
    events = []
    with open('listing.txt', 'r') as f:
        records = list(map(lambda s: s.rstrip('\n').split('|'), f.readlines()))
    for record in records:
        e = Event.__new__(Event, record)
        if e:
            events.append(initialize(e, *record))
    with open('events.sql', 'w+') as f:
        print('writing file')
        for event in events:
            f.write(repr(event) + "\n")
OUTPUT
events.sql
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('2','306','8','2114','Boris Godunov','2008-10-15 20:00:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('3','302','8','1935','Salome','2008-04-19 14:30:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('4','302','8','1935','Salome','2008-04-19 14:30:00')
So I would solve it like this:
class Event:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __new__(cls, *args, **kwargs):
        if len(args) != 0:
            return super(Event, cls).__new__(cls)
        else:
            return None

    def print(self):
        print("a " + str(self.a))
        print("b " + str(self.b))

c = Event(1, 2)
if c is None:
    print("do some stuff here if it is empty")
If you initialize Event with no parameters, it returns None because of the len(args) != 0 check; otherwise the instance is returned. Hope that helps.
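To see both paths side by side, a quick check might look like this (a minimal sketch using the Event class above):

c = Event(1, 2)    # args present -> a real Event instance, __init__ runs
c.print()          # prints "a 1" and "b 2"
empty = Event()    # no args -> __new__ returns None, so __init__ never runs
print(empty)       # None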
I need to be able to build my BuildObject using data extracted from csv file columns:
class BuildObject(ObjectID):
    def __init__(self, ObjectID, ObjectName, ObjectPrice, ObjectLocation, ObjectColour, ObjectAge, ObjectTag):
        self.ObjectID = ObjectID
        self.ObjectName = ObjectName

def main():
    with open(filename1, "r") as csv1, open(filename2, "r") as csv2:
        csvReader1 = csv.DictReader(csv1)
        csvReader2 = csv.DictReader(csv2)
        csvList = []
        for row1, row2 in zip(csvReader1, csvReader2):
            csvList.append((row2["ObjectName"], row1["ObjectId"], row1["ObjectPrice"]))
        return csvList
Comment: My concern with this answer is that it will work fine provided the csv files have exactly the same ObjectIDs in the same order - but what will happen if an ObjectID/Object is missing in only one of the csv files?
Therefore, you can't use zip(csvReader1, csvReader2); you
need random access to a Data_Record using the ObjectID as key/index.
As you mentioned large amounts of data, I would recommend going with SQL.
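For instance, keying an in-memory dict by ObjectID gives that random access (a minimal sketch; the column name 'ObjectId' is an assumption taken from the question's code):

import csv

def merge_on_object_id(fname1, fname2):
    # Merge two csv files on ObjectId instead of relying on zip()
    # keeping both readers in lockstep.
    merged = {}
    with open(fname1, newline='') as f1:
        for row in csv.DictReader(f1):
            merged[row['ObjectId']] = dict(row)
    with open(fname2, newline='') as f2:
        for row in csv.DictReader(f2):
            # setdefault tolerates ObjectIds present in only one file
            merged.setdefault(row['ObjectId'], {}).update(row)
    return merged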
If you want to do it using Python objects, change the following:

def __init__(self):
    self._data_store = {}

@data_store.setter
def data_store(self, data):
    ...
    self._data_store[record['ObjectID']] = record
Question: One remaining topic would be to create a BuildObject for every unique itemID, using the data from the csv files and the SQL query.
Checking your code, I got the following error:
class BuildObject(ObjectID):
NameError: name 'ObjectID' is not defined
Why do you inherit from ObjectID?
Where is that class defined?
Consider the following:
class Data_Record():
    """
    This class object holds all data for ONE record
    """
    def __init__(self, ObjectID, ObjectName):
        self.ObjectID = ObjectID
        self.ObjectName = ObjectName
        # ... (omitted for brevity)

class Data_Store():
    """
    This class object handles Data_Record objects,
    reading from csv or sql or anywhere
    """
    def __init__(self):
        # List to hold all Data_Record objects
        self._data_store = []

    # Access (read only) the Data_Record objects
    @property
    def data_store(self):
        return self._data_store

    # Add ONE Data_Record from either csv or sql or anywhere
    @data_store.setter
    def data_store(self, data):
        # Condition on type(data)
        if isinstance(data, dict):
            record = Data_Record(**data)
        elif isinstance(data, list):
            record = Data_Record(*data)
        else:
            raise ValueError("Data of type({}) are not supported!".format(type(data)))
        self._data_store.append(record)

    # Method to read from csv
    def read_csv(self, fname1, fname2):
        # ... (omitted for brevity)
        csvReader1, csvReader2 = ([], [])
        for csv1, csv2 in zip(csvReader1, csvReader2):
            self.data_store = [csv1["ObjectId"], csv2["ObjectName"]]

    # Method to read from sql
    def read_sql(self, sql, query):
        result = sql.query(query)
        for record in result:
            self.data_store = record
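Usage might look like this (a short sketch; the field values are made up):

ds = Data_Store()
ds.data_store = {'ObjectID': '1', 'ObjectName': 'Gadget'}   # dict path of the setter
ds.data_store = ['2', 'Widget']                             # list path of the setter
for record in ds.data_store:                                # read-only property
    print(record.ObjectID, record.ObjectName)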
Alternative: without @property/getter/setter.
Here the read_*(...) functions have to know how to add a new Data_Record object to self.data_store. Note: self.data_store is now a public attribute.
If you decide, later on, not to store in memory, you have to rewrite both read_*(...) functions.
class Data_Record():
    def __init__(self, data=None):
        # Condition on type(data)
        if isinstance(data, dict):
            self.ObjectID = data['ObjectID']
            self.ObjectName = data['ObjectName']
        elif isinstance(data, (list, tuple)):
            # List has to be in a predefined order,
            # e.g. ObjectID == index 0, ObjectName == index 1, etc.
            self.ObjectID = data[0]
            self.ObjectName = data[1]
        else:
            self.ObjectID = None
            self.ObjectName = None

class Data_Store():
    def __init__(self):
        self.data_store = []

    def read_csv(self, fname1, fname2):
        # ... (open fname1/fname2 and create csvReader1/csvReader2, omitted)
        for csv1, csv2 in zip(csvReader1, csvReader2):
            self.data_store.append(Data_Record([csv1["ObjectId"], csv2["ObjectName"]]))

    def read_sql(self, query):
        for record in SQL.query(query):
            self.data_store.append(Data_Record(record))
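Since data_store is a plain list here, records can also be appended directly (a short sketch with made-up values):

ds = Data_Store()
ds.data_store.append(Data_Record({'ObjectID': '1', 'ObjectName': 'Gadget'}))
ds.data_store.append(Data_Record(['2', 'Widget']))
print(ds.data_store[0].ObjectName)   # Gadget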
I'm trying to refactor a very repetitive section of code.
I have a class that has two instance variables that get updated:
class Alerter(object):
    'Sends email regarding information about unmapped positions and trades'
    def __init__(self, job):
        self.job = job
        self.unmappedPositions = None
        self.unmappedTrades = None
After my code goes through some methods, it creates a table and updates self.unmappedPositions and self.unmappedTrades:
def load_positions(self, filename):
    unmapped_positions_table = etl.fromcsv(filename)
    if 'positions' in filename:
        return self.add_to_unmapped_positions(unmapped_positions_table)
    else:
        return self.add_to_unmapped_trades(unmapped_positions_table)
So I have two functions that essentially do the same thing:
def add_to_unmapped_trades(self, table):
    if self.unmappedTrades:
        Logger.info("Adding to unmapped")
        self.unmappedTrades = self.unmappedTrades.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedTrades = table
    Logger.info("Data added to unmapped")
    return self.unmappedTrades
And:
def add_to_unmapped_positions(self, table):
    if self.unmappedPositions:
        Logger.info("Adding to unmapped")
        self.unmappedPositions = self.unmappedPositions.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedPositions = table
    Logger.info("Data added to unmapped")
    return self.unmappedPositions
I tried making it one method that passes in a third argument and then figures out what to update, the third argument being the initialized variable: either self.unmappedPositions or self.unmappedTrades. However, that doesn't seem to work. Any other suggestions?
It looks like you've had the key insight that you can write this function independent of any particular storage:
def add_to_unmapped(unmapped, table):
    if unmapped:
        Logger.info("Adding to unmapped")
        unmapped = unmapped.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        unmapped = table
    Logger.info("Data added to unmapped")
    return unmapped
This is actually good practice on its own. For instance, you can write unit tests for it, or if you have two tables (as you do) you can just write the implementation for it once.
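For example, a minimal unit test of the no-existing-table path might look like this (a sketch, assuming the Logger used above is importable where the test runs):

def test_add_to_unmapped_first_table():
    sentinel = object()   # stands in for an etl table
    # with no existing table, the new table is returned unchanged
    assert add_to_unmapped(None, sentinel) is sentinel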
If you consider what, abstractly, your two add_to_unmapped_* functions do, they:
Compute the new table;
Save the new table in the object; and
Return the new table.
We've now separated out step 1, and you can refactor the wrappers:
class Alerter:
    def add_to_unmapped_trades(self, table):
        self.unmappedTrades = add_to_unmapped(self.unmappedTrades, table)
        return self.unmappedTrades
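The positions wrapper follows the same pattern, shown here for completeness:

    # inside the same Alerter class
    def add_to_unmapped_positions(self, table):
        self.unmappedPositions = add_to_unmapped(self.unmappedPositions, table)
        return self.unmappedPositions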
Here are my moderately general class methods for creating a dictionary from the results of database queries:
def make_schema_dict(self):
    schema = [i[2] for i in self.cursor.tables()
              if i[2].startswith('tbl_') or i[2].startswith('vw_')]
    self.schema = {table: {'scheme': [row.column_name for row
                                      in self.cursor.columns(table)]}
                   for table in schema}

def last_table_query_as_dict(self, table):
    return {'data': [{col: getattr(row, col) for col in self.schema[table]['scheme']
                      if col != 'RowNum'} for row in self.cursor.fetchall()]}
Unfortunately, as you can see, there are many complications.
For example, when multiple tables are queried, some hackish lambdas are required to generate the resulting dictionary.
Can you think of some more general methods?
You should be able to use row.cursor_description to make this a lot simpler. This should get you a list of dictionaries for the results:
[{c[0]: v for (c, v) in zip(row.cursor_description, row)} for row in self.cursor.fetchall()]
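Wrapped in a small helper, that might look like this (a sketch, assuming a live pyodbc cursor that has already executed a query):

def rows_as_dicts(cursor):
    # Each pyodbc Row carries the column metadata of the cursor that
    # produced it; element 0 of each descriptor tuple is the column name.
    return [{c[0]: v for (c, v) in zip(row.cursor_description, row)}
            for row in cursor.fetchall()]

# usage sketch (table name is hypothetical):
# cursor.execute("SELECT * FROM tbl_example")
# for record in rows_as_dicts(cursor):
#     print(record)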
A neat solution can be found in this thread: https://groups.google.com/forum/?fromgroups#!topic/pyodbc/BVIZBYGXNsk
The root of the idea: subclass Connection to use a custom Cursor class, and have the Cursor class automatically construct dicts for you. I'd call this a fancy Pythonic solution. You could also just add a fetchonedict() function and extend the Cursor class rather than override it, so you retain the default behavior.
class ConnectionWrapper(object):
    def __init__(self, cnxn):
        self.cnxn = cnxn

    def __getattr__(self, attr):
        return getattr(self.cnxn, attr)

    def cursor(self):
        return CursorWrapper(self.cnxn.cursor())

class CursorWrapper(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, attr):
        return getattr(self.cursor, attr)

    def fetchone(self):
        row = self.cursor.fetchone()
        if not row:
            return None
        return dict((t[0], value) for t, value in zip(self.cursor.description, row))
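The fetchonedict() variant mentioned above could be added to CursorWrapper like this (a sketch that leaves the default fetchone() untouched):

    # inside CursorWrapper, as an extension rather than an override
    def fetchonedict(self):
        row = self.cursor.fetchone()
        if row is None:
            return None
        return {t[0]: value for t, value in zip(self.cursor.description, row)}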
Additionally, while not for PyODBC, check out this stackoverflow answer for links to DictCursor classes for MySQL and OurSQL if you need some inspiration for design.