Python generator - python

I have homework that I am stuck on. I have gone as far as I can; can someone point me in the right direction? I am getting stuck on making each data row a new object. Normally I would think I could just iterate over the rows, but that will only return the last row.
Question:
Modify the classFactory.py source code so that the DataRow class returned by the build_row function has another method:
retrieve(self, curs, condition=None)
self is (as usual) the instance whose method is being called, curs is a database cursor on an existing database connection, and condition (if present) is a string of condition(s) which must be true of all received rows.
The retrieve method should be a generator, yielding successive rows of the result set until it is completely exhausted. Each row should be a new object of type DataRow.
This is what I have so far.
the test:
import unittest
from classFactory import build_row

class DBTest(unittest.TestCase):
    def setUp(self):
        C = build_row("user", "id name email")
        self.c = C([1, "Steve Holden", "steve@holdenweb.com"])

    def test_attributes(self):
        self.assertEqual(self.c.id, 1)
        self.assertEqual(self.c.name, "Steve Holden")
        self.assertEqual(self.c.email, "steve@holdenweb.com")

    def test_repr(self):
        self.assertEqual(repr(self.c),
                         "user_record(1, 'Steve Holden', 'steve@holdenweb.com')")

if __name__ == "__main__":
    unittest.main()
the script I am testing
def build_row(table, cols):
    """Build a class that creates instances of specific rows"""
    class DataRow:
        """Generic data row class, specialized by surrounding function"""
        def __init__(self, data):
            """Uses data and column names to inject attributes"""
            assert len(data) == len(self.cols)
            for colname, dat in zip(self.cols, data):
                setattr(self, colname, dat)

        def __repr__(self):
            # note: no leading space inside the format spec, so the output
            # matches the "user_record(1, ...)" form the test expects
            return "{0}_record({1})".format(
                self.table,
                ", ".join("{0!r}".format(getattr(self, c)) for c in self.cols))

    DataRow.table = table
    DataRow.cols = cols.split()
    return DataRow

It should roughly be something like the following:
def retrieve(self, curs, condition=None):
    query = "SELECT * FROM {0}".format(self.table)
    if condition is not None:
        query += " WHERE {0}".format(condition)
    curs.execute(query)
    for row in curs.fetchall():       # iterate over the retrieved results
        yield self.__class__(row)     # wrap each row in a new DataRow object

Iterate over the rows as normal, but use yield instead of return; the only extra step is wrapping each raw row in a new DataRow object before yielding it, which is what the assignment asks for.
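As a rough usage sketch - assuming retrieve has been added to DataRow inside build_row as above, and using an sqlite3 in-memory database with a hypothetical user table whose column order matches cols:

import sqlite3
from classFactory import build_row

User = build_row("user", "id name email")

conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE user (id INTEGER, name TEXT, email TEXT)")
curs.execute("INSERT INTO user VALUES (1, 'Steve Holden', 'steve@holdenweb.com')")

proto = User([0, "", ""])            # any instance can drive the generator
for rec in proto.retrieve(curs, "id > 0"):
    print(repr(rec))                 # each rec is a fresh DataRow object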

Related

How do I assign data to a class which inherits from the "SqliteDict" class in Python?

In the code below, as you can see, I have a test class that inherits from the SqliteDict class. There is also a get_terms() method that returns the keys of the dictionary. In the main part, I first make an instance of the class, create a new SqliteDict file, and assign simple data to it through a context-manager block. Up to this point everything works, but when I try to read the data back from the same file through the second context-manager block, it seems the data was not saved in the file.
from collections import defaultdict
from sqlitedict import SqliteDict

class test(SqliteDict):
    def __init__(self, filename: str = "inverted_index.sqlite", new=False):
        super().__init__(filename, flag="n" if new else "c")
        self._index = defaultdict(list) if new else self

    def get_terms(self):
        """Returns all unique terms in the index."""
        return self._index.keys()

if __name__ == "__main__":
    with test("test.sqlite", new=True) as d:
        d._index["test"] = ["ok"]
        print("first attempt: ", [t for t in d.get_terms()])
        d.commit()
    with test("test.sqlite", new=False) as f:
        print("second attempt: ", [t for t in f.get_terms()])
and the result is:
first attempt: ['test']
second attempt: []
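No answer is recorded here, but the code itself suggests a likely cause: with new=True, self._index is bound to a plain in-memory defaultdict, so d._index["test"] = ["ok"] never touches the SqliteDict storage, and d.commit() persists an empty table. A minimal sketch of a fix, assuming the intent is to write through the SqliteDict itself (not tested against sqlitedict internals):

from sqlitedict import SqliteDict

class test(SqliteDict):
    def __init__(self, filename: str = "inverted_index.sqlite", new=False):
        super().__init__(filename, flag="n" if new else "c")
        # write through the SqliteDict itself so data reaches the file;
        # a separate defaultdict would only ever live in memory
        self._index = self

    def get_terms(self):
        """Returns all unique terms in the index."""
        return self._index.keys()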

Need help in implementing __new__ method for the below scenario

I wanted to know whether the below scenario can be achieved using the __new__ special method; if so, I would like to hear from stackoverflow. I have a class named Listing which reads records from a file and then converts them into queries. To be concise, the snippet initially reads all the lines from the file and converts them into a list of lists. This list of lists is then passed to the loadlist method of Event, which reads each list, unpacks it, and sets the values as instance attributes.
For instance, I have the below three records:
1|305|8|1851|Gotterdammerung|2008-01-25 14:30:00
2|306|8|2114|Boris Godunov|2008-10-15 20:00:00
3|302|8|1935|Salome|2008-04-19 14:30:0
Here, Listing.py reads the above content and converts it into the queries given below:
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('2','306','8','2114','Boris Godunov','2008-10-15 20:00:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('3','302','8','1935','Salome','2008-04-19 14:30:00')
The whole program of Listing.py:
class Event:
    def __init__(self, eventid, venueid, catid, dateid, eventname, starttime):
        self.eventid = eventid
        self.venueid = venueid
        self.catid = catid
        self.dateid = dateid
        self.eventname = eventname
        self.starttime = starttime

    def __iter__(self):
        return (i for i in (self.eventid, self.venueid, self.catid,
                            self.dateid, self.eventname, self.starttime))

    def __str__(self):
        return str(tuple(self))

    def __repr__(self):
        return ("INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) "
                "VALUES ({!r},{!r},{!r},{!r},{!r},{!r})".format(*self))

    @classmethod
    def loadlist(cls, records):
        return [cls(*record) for record in records]

if __name__ == '__main__':
    records = []
    with open('tickitdb/allevents_pipe.txt', 'r') as f:
        records = list(map(lambda s: s.rstrip('\n').split('|'), f.readlines()))
    events = Event.loadlist(records=records)
    with open('events.sql', 'w+') as f:
        print('writing file')
        for event in events:
            f.write(repr(event) + "\n")
When I ran the program, I came across the below error:
TypeError: __init__() missing 5 required positional arguments. I figured out the root cause: when the program reads the file and converts it into a list of records, there is a record which is empty, for instance
1.['1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00']
2.['2','306','8','2114','Boris Godunov','2008-10-15 20:00:00']
3.['3','302','8','1935','Salome','2008-04-19 14:30:0']
4.['']
For the 4th record, there are no values. So, to avoid such errors, I decided to make use of the __new__ special method. I could achieve the same functionality with an if condition that checks whether the list is empty, but I was wondering how to use __new__ to handle such scenarios. With little knowledge of Python, I filled in the __new__ method as below, but then I came across another error:
RecursionError: maximum recursion depth exceeded while calling a Python object
def __new__(cls, *args, **kwargs):
    if len(args) != 0:
        # calling Event.__new__ from inside Event.__new__ recurses forever;
        # this is what triggers the RecursionError above
        instance = Event.__new__(cls, *args, **kwargs)
        return instance
Can we filter the records using the __new__ special method?
What you want to do is totally possible, but you will need to initialize the instance yourself once it returns from __new__.
I fixed your code as shown below.
Given listing.txt
1|305|8|1851|Gotterdammerung|2008-01-25 14:30:00
2|306|8|2114|Boris Godunov|2008-10-15 20:00:00
3|302|8|1935|Salome|2008-04-19 14:30:0
4|302|8|1935|Salome|2008-04-19 14:30:0
class Event:
    def __new__(cls, *args, **kwargs):
        # args holds a single record list, so len(*args) is the number of
        # fields in the record; only allocate an instance for real data
        if len(*args) > 1:
            instance = object.__new__(cls)
            return instance
        else:
            return None

    def __init__(self, eventid, venueid, catid, dateid, eventname, starttime):
        self.eventid = eventid
        self.venueid = venueid
        self.catid = catid
        self.dateid = dateid
        self.eventname = eventname
        self.starttime = starttime

    def __iter__(self):
        return (i for i in (self.eventid, self.venueid, self.catid,
                            self.dateid, self.eventname, self.starttime))

    def __str__(self):
        return str(tuple(self))

    def __repr__(self):
        return ("INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) "
                "VALUES ({!r},{!r},{!r},{!r},{!r},{!r})".format(*self))

def initialize(e, eventid, venueid, catid, dateid, eventname, starttime):
    e.eventid = eventid
    e.venueid = venueid
    e.catid = catid
    e.dateid = dateid
    e.eventname = eventname
    e.starttime = starttime
    return e

if __name__ == '__main__':
    records = []
    events = []
    with open('listing.txt', 'r') as f:
        records = list(map(lambda s: s.rstrip('\n').split('|'), f.readlines()))
    for record in records:
        # __new__ is called directly, so __init__ is bypassed and surviving
        # instances are populated by hand via initialize() above
        e = Event.__new__(Event, record)
        if e:
            events.append(initialize(e, *record))
    with open('events.sql', 'w+') as f:
        print('writing file')
        for event in events:
            f.write(repr(event) + "\n")
OUTPUT
events.sql
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('1','305','8','1851','Gotterdammerung','2008-01-25 14:30:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('2','306','8','2114','Boris Godunov','2008-10-15 20:00:00')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('3','302','8','1935','Salome','2008-04-19 14:30:0')
INSERT INTO EVENT (EVENTID,VENUEID,CATID,DATEID,EVENTNAME,STARTTIME) VALUES ('4','302','8','1935','Salome','2008-04-19 14:30:0')
So I would solve it like this:
class Event:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __new__(cls, *args, **kwargs):
        if len(args) != 0:
            return super(Event, cls).__new__(cls)
        else:
            return None

    def print(self):
        print("a " + str(self.a))
        print("b " + str(self.b))

c = Event(1, 2)
if c is None:
    print("do some stuff here if it is empty")
If you initialize Event with no parameters, it returns None because of the len(args) != 0 check; otherwise the instance is returned. Hope that helps.
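A detail worth noting about why this version needs no manual initialization, unlike the first answer: standard Python behavior is that type.__call__ only invokes __init__ when __new__ returns an instance of the class. A tiny demo with a hypothetical Demo class:

class Demo:
    def __new__(cls, *args, **kwargs):
        # returning None (not a Demo instance) means __init__ is skipped
        return super().__new__(cls) if args else None

    def __init__(self, a, b):
        self.a = a
        self.b = b

print(Demo(1, 2))   # a Demo instance; __init__ ran with a=1, b=2
print(Demo())       # None; __init__ was never called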

Python Building Object from two sources

I need to be able to build my BuildObject using data extracted from the csv file columns:
class BuildObject(ObjectID):
    def __init__(self, ObjectID, ObjectName, ObjectPrice, ObjectLocation, ObjectColour, ObjectAge, ObjectTag):
        self.ObjectID = ObjectID
        self.ObjectName = ObjectName

def main():
    with open(filename1, "r") as csv1, open(filename2, "r") as csv2:
        csvReader1 = csv.DictReader(csv1)
        csvReader2 = csv.DictReader(csv2)
        csvList = []
        for row1, row2 in zip(csvReader1, csvReader2):
            csvList.append((row2["ObjectName"], row1["ObjectId"], row1["ObjectPrice"]))
        return csvList
Comment: My concern with this answer is that it will work fine provided the csv files have the exact same ObjectIDs in the same order - but what will happen if an ObjectID/Object is missing from only one of the csv files?
Therefore, you can't use zip(csvReader1, csvReader2); you
need random access to a Data_Record using the ObjectID as key/index.
As you mentioned large amounts of data, I would recommend going with SQL.
If you want to do it using Python objects, change the following:

def __init__(self):
    self._data_store = {}

@data_store.setter
def data_store(self, data):
    ...
    self._data_store[record['ObjectID']] = record
Question: The one open topic would be to create a BuildObject for every unique ObjectID using the data from the csv files and sql query.
Checking your code, I got the following error:

class BuildObject(ObjectID):
NameError: name 'ObjectID' is not defined

Why do you inherit from ObjectID? Where is this class defined?
Consider the following:
class Data_Record():
    """
    This class object holds all data for ONE record
    """
    def __init__(self, ObjectID, ObjectName):
        self.ObjectID = ObjectID
        self.ObjectName = ObjectName
        # ... (omitted for brevity)

class Data_Store():
    """
    This class object handles Data_Record objects, reading from csv or sql or anywhere
    """
    def __init__(self):
        # List to hold all Data_Record objects
        self._data_store = []

    # Access read only the Data_Record objects
    @property
    def data_store(self):
        return self._data_store

    # Add ONE Data_Record from either csv or sql or anywhere
    @data_store.setter
    def data_store(self, data):
        # Condition type(data)
        if isinstance(data, dict):
            record = Data_Record(**data)
        elif isinstance(data, (list, tuple)):
            record = Data_Record(*data)
        else:
            raise ValueError("Data of type({}) are not supported!".format(type(data)))
        self._data_store.append(record)

    # Method to read from csv
    def read_csv(self, fname1, fname2):
        # ... (omitted for brevity)
        csvReader1, csvReader2 = ([], [])
        for csv1, csv2 in zip(csvReader1, csvReader2):
            self.data_store = (csv1["ObjectId"], csv2["ObjectName"])

    # Method to read from sql
    def read_sql(self, sql, query):
        result = sql.query(query)
        for record in result:
            self.data_store = record
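A quick usage sketch of the setter-based design above (hypothetical values; each assignment to data_store routes through the setter and appends one Data_Record):

ds = Data_Store()
ds.data_store = {'ObjectID': 1, 'ObjectName': 'Widget'}   # dict path
ds.data_store = (2, 'Gadget')                             # list/tuple path
for rec in ds.data_store:          # the property getter returns the list
    print(rec.ObjectID, rec.ObjectName)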
Alternative: Without @property/getter/setter.
Here the read_...() functions have to know how to add a new Data_Record object to self.data_store. Note: self.data_store is now a public attribute.
If you decide, later on, to store not in memory, you have to rewrite both read_...() functions.
class Data_Record():
    def __init__(self, data=None):
        # Condition type(data)
        if isinstance(data, dict):
            self.ObjectID = data['ObjectID']
            self.ObjectName = data['ObjectName']
        elif isinstance(data, (list, tuple)):
            # List has to be in predefined order,
            # e.g. ObjectID == Index 0, ObjectName == Index 1, etc.
            self.ObjectID = data[0]
            self.ObjectName = data[1]
        else:
            self.ObjectID = None
            self.ObjectName = None

class Data_Store():
    def __init__(self):
        self.data_store = []

    def read_csv(self, fname1, fname2):
        # (csvReader1/csvReader2 set up as in the first version, omitted)
        for csv1, csv2 in zip(csvReader1, csvReader2):
            self.data_store.append(Data_Record([csv1["ObjectId"], csv2["ObjectName"]]))

    def read_sql(self, query):
        for record in SQL.query(query):
            self.data_store.append(Data_Record(record))

Passing in an instance variable as an argument in a class method?

I'm trying to refactor a very repetitive section of code.
I have a class that has two instance variables that get updated:
class Alerter(object):
    'Sends email regarding information about unmapped positions and trades'
    def __init__(self, job):
        self.job = job
        self.unmappedPositions = None
        self.unmappedTrades = None
After my code goes through some methods, it creates a table and updates self.unmappedPositions and self.unmappedTrades:
def load_positions(self, filename):
    unmapped_positions_table = etl.fromcsv(filename)
    if 'positions' in filename:
        return self.add_to_unmapped_positions(unmapped_positions_table)
    else:
        return self.add_to_unmapped_trades(unmapped_positions_table)
So I have two functions that essentially do the same thing:
def add_to_unmapped_trades(self, table):
    if self.unmappedTrades:
        Logger.info("Adding to unmapped")
        self.unmappedTrades = self.unmappedTrades.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedTrades = table
    Logger.info("Data added to unmapped")
    return self.unmappedTrades
And:
def add_to_unmapped_positions(self, table):
    if self.unmappedPositions:
        Logger.info("Adding to unmapped")
        self.unmappedPositions = self.unmappedPositions.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        self.unmappedPositions = table
    Logger.info("Data added to unmapped")
    return self.unmappedPositions
I tried making it one method that passes in a third argument and then figures out what to update, the third argument being the initialized variable - either self.unmappedPositions or self.unmappedTrades. However, that doesn't seem to work. Any other suggestions?
It looks like you've had the key insight that you can write this function independent of any particular storage:
def add_to_unmapped(unmapped, table):
    if unmapped:
        Logger.info("Adding to unmapped")
        unmapped = unmapped.cat(table).cache()
    else:
        Logger.info("Making new unmapped")
        unmapped = table
    Logger.info("Data added to unmapped")
    return unmapped
This is actually good practice on its own. For instance, you can write unit tests for it, or if you have two tables (as you do) you can just write the implementation for it once.
If you consider what, abstractly, your two add_to_unmapped_* functions do, they:
Compute the new table;
Save the new table in the object; and
Return the new table.
We've now separated out step 1, and you can refactor the wrappers:
class Alerter:
    def add_to_unmapped_trades(self, table):
        self.unmappedTrades = add_to_unmapped(self.unmappedTrades, table)
        return self.unmappedTrades
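If you would rather keep a single method and select the attribute at the call site - closer to what the question literally asks - a hedged alternative is to pass the attribute name as a string and use getattr/setattr (a sketch reusing the add_to_unmapped helper above; the method name is hypothetical):

class Alerter:
    def add_to_unmapped_attr(self, attr_name, table):
        # attr_name is 'unmappedTrades' or 'unmappedPositions'
        updated = add_to_unmapped(getattr(self, attr_name), table)
        setattr(self, attr_name, updated)
        return updated

# usage:
#   alerter.add_to_unmapped_attr('unmappedTrades', trades_table)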

More general way of generating PyODBC queries as a dict?

Here are my moderately general class methods for creating a dictionary from the result of database queries:
def make_schema_dict(self):
    schema = [i[2] for i in self.cursor.tables()
              if i[2].startswith('tbl_') or i[2].startswith('vw_')]
    self.schema = {table: {'scheme': [row.column_name for row
                                      in self.cursor.columns(table)]}
                   for table in schema}

def last_table_query_as_dict(self, table):
    return {'data': [{col: getattr(row, col) for col in self.schema[table]['scheme']
                      if col != 'RowNum'} for row in self.cursor.fetchall()]}
Unfortunately as you can see, there are many complications.
For example, when multiple tables are queried, some hackish lambdas are required to generate the resulting dictionary.
Can you think of some more general methods?
You should be able to use row.cursor_description to make this a lot simpler. This should get you a list of dictionaries for the results:
[{c[0]: v for (c, v) in zip(row.cursor_description, row)} for row in self.cursor.fetchall()]
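Wrapped as a small helper, that one-liner might look like the following (a sketch; pyodbc attaches the column description to each Row as cursor_description, and the helper name is hypothetical):

def fetchall_as_dicts(cursor):
    """Return the current result set as a list of dicts keyed by column name."""
    return [{col[0]: val for col, val in zip(row.cursor_description, row)}
            for row in cursor.fetchall()]

# usage (hypothetical table name):
#   cursor.execute("SELECT * FROM tbl_example")
#   records = fetchall_as_dicts(cursor)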
A neat solution can be found in this thread: https://groups.google.com/forum/?fromgroups#!topic/pyodbc/BVIZBYGXNsk
The root of the idea being, subclass Connection to use a custom Cursor class, have the Cursor class automatically construct dicts for you. I'd call this a fancy pythonic solution. You could also just have an additional function fetchonedict() and extend the Cursor class rather than override so you could retain default behavior.
class ConnectionWrapper(object):
    def __init__(self, cnxn):
        self.cnxn = cnxn

    def __getattr__(self, attr):
        return getattr(self.cnxn, attr)

    def cursor(self):
        return CursorWrapper(self.cnxn.cursor())

class CursorWrapper(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, attr):
        return getattr(self.cursor, attr)

    def fetchone(self):
        row = self.cursor.fetchone()
        if not row:
            return None
        return dict((t[0], value) for t, value in zip(self.cursor.description, row))
Additionally, while not for PyODBC, check out this stackoverflow answer for links to DictCursor classes for MySQL and OurSQL if you need some inspiration for design.
