Checkpointing in Python to capture the runtime state

I have a problem making my code more self-healing. E.g.: I execute method 1 to load data from a CSV file into the Vertica database. I have another method 2 that checks whether the number of rows in the database matches the number of lines in the CSV file. If the counts don't match, I was thinking of resuming method 1 from the point where it issued the query that loads the CSV into the database.
I was thinking of a checkpointing strategy for this problem: maintain some points in the code where errors usually occur, and jump back to them from other points.
I already tried the pickle module in Python, but came to know that pickle can only save objects, classes, variables, etc.; it can't save the point from which I can resume executing a method.
I have provided some demo code:
import pickle

class Fruits:
    def apple(self):
        filehandler = open("Fruits.obj", "wb")
        print "apple"
        pickle.dump(self, filehandler)
        print "mapple"
        filehandler.close()

    def mango(self):
        filehandler = open("Fruits.obj", "rb")
        print "mango"
        obj = pickle.load(filehandler)
        obj.apple()

general = Fruits()
general.apple()
general.mango()
The output of the above program is:
apple
mapple
mango
apple
mapple
I want my code to execute such that when the mango method calls the apple method, execution resumes from the print "mapple" statement only; it must not execute the whole method.
Please give me some insight into how to solve this problem.
Thanks in advance.

Note:
Your code doesn't work at all. filehandler in def mango(... IS NOT the same as filehandler in def apple(.... Therefore, the file opened in def mango(... is never closed.
Add an if condition to def apple; you don't need pickle at all.
def apple(self, mango=False):
    if not mango:
        filehandler = open("Fruits.obj", "wb")
        ...
    print "mapple"
    ...

def mango(self):
    filehandler = open("Fruits.obj", "rb")
    ...
    obj.apple(True)
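Putting the answer's suggestion together, here is a minimal runnable sketch (ported to Python 3; the skip_setup flag name is mine):

import pickle

class Fruits:
    def apple(self, skip_setup=False):
        # run the pickling phase only on a direct call,
        # not when re-entered from mango()
        if not skip_setup:
            with open("Fruits.obj", "wb") as filehandler:
                print("apple")
                pickle.dump(self, filehandler)
        print("mapple")

    def mango(self):
        with open("Fruits.obj", "rb") as filehandler:
            print("mango")
            obj = pickle.load(filehandler)
        obj.apple(skip_setup=True)

general = Fruits()
general.apple()
general.mango()
# prints: apple, mapple, mango, mapple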

Related

How can I write a unit test for a function that makes a network request, without changing its interface?

I read in Working Effectively with Legacy Code (the book): "Unit tests run fast. If they don't run fast, they aren't unit tests. A test is not a unit test if: 1. It talks to a database. 2. It communicates across a network. 3. It touches the file system. 4. You have to do special things to your environment (such as editing configuration files) to run it."
I have a function that downloads a zip from the internet and then converts it into Python objects of a particular class.
import typing as t

import requests

def get_book_objects(date: str) -> t.List[Book]:
    # download the zip with the date from the endpoint
    res = requests.get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book objects
    return books
Let's say I want to write a unit test for the function get_book_objects. How am I supposed to write a unit test without making a network request? I would prefer a file-system read over a network request, since it is much faster; and although a good unit test supposedly doesn't touch the file system either, I'd be fine with that.
So even if I want to write a unit test where I can provide a local zip file, I have to modify the existing function to open the file from the local file system, or add some additional parameter so I can pass a zip file path from the unit test function.
What will you do to write a good unit test in this kind of situation?
What will you do to write a good unit test in this kind of situation?
In the TDD world, the usual answer would be to delegate the work to a more easily tested component.
Consider:
def get_book_objects(date: str) -> t.List[Book]:
    # This is the piece that makes get_book_objects hard
    # to isolate
    http_get = requests.get

    # download the zip with the date from the endpoint
    res = http_get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book object
    return books
which might then become something like
def get_book_objects(date: str) -> t.List[Book]:
    # This is the piece that makes get_book_objects hard
    # to isolate
    http_get = requests.get
    return get_book_objects_v2(http_get, date)

def get_book_objects_v2(http_get, date: str) -> t.List[Book]:
    # download the zip with the date from the endpoint
    res = http_get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book object
    return books
get_book_objects is still hard to test, but it is also "so simple that there are obviously no deficiencies". On the other hand, get_book_objects_v2 is easy to test, because your test can control what callable is passed to the subject, and can use any reasonable substitute you like.
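To make that concrete, a test can hand get_book_objects_v2 a stub callable that returns a canned in-memory zip; no network, no file system. (FakeResponse, the CSV layout, and the assertions are illustrative; they assume the subject reads res.content and yields one Book per row.)

import io
import zipfile

def make_zip_bytes() -> bytes:
    # build an in-memory zip standing in for the downloaded archive
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("books.csv", "title\nDune\n")
    return buf.getvalue()

class FakeResponse:
    # mimics only the attribute the subject is assumed to read
    def __init__(self, content: bytes):
        self.content = content

def test_get_book_objects_v2():
    calls = []

    def fake_http_get(url):
        calls.append(url)
        return FakeResponse(make_zip_bytes())

    books = get_book_objects_v2(fake_http_get, "2021-01-01")

    assert calls == ["HTTP-URL-2021-01-01"]
    assert len(books) == 1  # one Book per CSV row, per our assumption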
What we've done is shift most of the complexity/risk into a "unit" that is easier to test. For the function that is still hard to test, we'll use other techniques.
When authors talk about tests "driving" the design, this is one example - we're treating "complicated code needs to be easy to test" as a constraint on our design.
You've already identified the correct reference (Working Effectively with Legacy Code). The material you want is the discussion of seams.
A seam is a place where you can alter behavior in your program without editing in that place.
(In my edition of the book, the discussion begins in Chapter 4).

How do I get stored dictionaries' values from a shelve-file back into the program

I started learning Python 2 weeks ago, and now I am trying to code a text adventure game. However, I've run into a problem, and so far I haven't found any solution on Google that helps.
I decided to store basically all relevant variables in dictionaries - feel free to tell me whether that's a clever idea or rather stupid of me; I actually don't know, I just thought it might be a solution that works.
Here's my problem: last thing I decided to insert into the program is a save_game() function. So I defined:
def save_game(data):
    import shelve
    savegame = shelve.open('./save/savegame')
    savegame['data'] = data
    savegame.close()
And of course, if I then call
save_game(save_game_data)
with save_game_data being the dictionary where I've put all the other dictionaries so I can handle saving with a single function call (I thought that might be better?), it actually works.
But of course a save_game() only makes sense if you can also reload the data into the program.
So I defined:
def load_game(data):
    import shelve, time
    savegame = shelve.open('./save/savegame')
    data = savegame['data']
    data = dict(data)  # This was inserted because I hoped it would solve my problem, but it doesn't
    savegame.close()
But the result of
load_game(save_game_data)
unfortunately is not an updated save_game_data dictionary with all its keys and values, and I just can't get my head around how to get the stored data back into the dictionaries. Maybe I'm on a totally wrong track altogether, or maybe I just don't know enough about Python yet to even see where I'm erring.
The save_game() and load_game() functions are in a different file from the main file, and are correctly imported if that is relevant.
It looks like you're trying to pass save_game_data to load_game() as if to mean "load data and put it into save_game_data" but this isn't what load_game() is doing. By doing this:
def load_game(data):
    import shelve, time
    savegame = shelve.open('./save/savegame')
    data = savegame['data']
You're replacing what data refers to, so save_game_data doesn't get changed.
Instead, you can drop the argument to load_game() and add:
return data
at the end of the function, and call it like this:
save_game_data = load_game()
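Put together, a minimal sketch of the corrected function:

def load_game():
    import shelve
    savegame = shelve.open('./save/savegame')
    data = savegame['data']
    savegame.close()
    return data

save_game_data = load_game()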

Python Win32com Excel. Issue with python threading and com object handling

For some days I've been breaking my head over some issues I'm having with a Win32Com Excel object in Python.
What I'm trying to do is very simple: get the sheet names from a workbook and write a value to the bottom of a column. The workbook is often opened by the users and needs to remain accessible/editable, hence I decided to use Win32Com. Any other suggestions to achieve the same are very welcome as well.
The base-code is quite straightforward:
import os
import win32com.client

class excelCOM():
    def __init__(self, file):
        self.xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
        self.wb = self.xl.Workbooks(os.path.basename(file))

    def getSheets(self):
        return [s.Name for s in self.wb.Sheets]

    def dataToCell(self, sheet, column, string):
        sht = self.wb.Sheets(sheet)
        maxLastCell = sht.Range('{}{}'.format(column, 1048576))
        lastCell = maxLastCell.End(win32com.client.constants.xlUp)
        lastCell.Offset(2, 1).Value = string
Issues occur when integrated with the rest of the software. I've gone through lots of documentation and trying different things but I've been unable to get a reliable result. I'll try my best to summarize what I've tried and what the observations were.
1. When I create a single instance of this class and run the methods, everything works as expected.
if __name__ == '__main__':
    xlWb = excelCOM(r"TestBook.xlsx")
    print(xlWb.getSheets())
    xlWb.dataToCell("Sheet1", 'B', 'All Good')
2. The information to write to the Excel file comes from a logfile that is being written to by another (external) program. My software monitors the file and writes to Excel whenever a new line is added. All of this is handled by the 'processor'. The formatting of the text that's written to Excel is user defined. Those formatting settings (and others) are imported into the processor from a pickled (settings) object, allowing the user to save settings (using a GUI) and run the processor without needing the GUI. As code (extremely simplified, just to get the idea across):
class processor():
    def __init__(self, settings_object):
        self.com_object = excelCOM(settings_object.excel_filepath)
        self.file_monitor(settings_object.input_filepath)
        self.file_monitor.start()

    def dispatch(self):
        # called whenever file_monitor registers a change to the file at input_filepath
        last_line = self.get_last_line(input_filepath)
        sheet, column, formatted_text = self.process(last_line)
        self.com_object.dataToCell(sheet, column, formatted_text)
3. Now, when text is supposed to be written to the Excel file by the processor, I get Exception in thread Thread-1: pywintypes.com_error: (-2147221008, 'CoInitialize has not been called.', None, None) at sht = self.wb.Sheets(sheet).
When I call CoInitialize() before sht = self.wb.Sheets(sheet), I get Exception in thread Thread-1: pywintypes.com_error: (-2147417842, 'The application called an interface that was marshalled for a different thread.', None, None). Calling CoInitialize() anywhere else in the code doesn't seem to do anything.
4. Adding the same code as in __init__ solves the issue from point 3, but this seems very wrong to me, mainly because I don't know exactly why it fixes the issue and what's going on in the background with COM objects in Windows.
def dataToCell(self, sheet, column, string):
    pythoncom.CoInitialize()  # point 3.
    self.xl = win32com.client.gencache.EnsureDispatch('Excel.Application')  # point 4.
    self.wb = self.xl.Workbooks(os.path.basename(self.file))  # point 4.
    sht = self.wb.Sheets(sheet)
    maxLastCell = sht.Range('{}{}'.format(column, 1048576))
    lastCell = maxLastCell.End(win32com.client.constants.xlUp)
    lastCell.Offset(2, 1).Value = string
5. Now switching to the GUI. As mentioned earlier, the settings file for the processor is created with help from a GUI. It basically gives the functionality to build the settings_object. In the GUI I also have a 'run' button which directly calls processor(settings_object) in a separate Thread. When I do this, I don't even need to add the additional lines described in points 3 and 4. It runs perfectly with just the base-code.
I've gone through dozens of pages of documentation, StackOverflow topics, tutorials, blogs hidden in the dark corners of the web, and Python Programming on Win32, but I simply can't wrap my head around what's going on. I consider myself a half-decent programmer for 'get the job done' applications, but I don't have any education in computer science (or even programming in general), which is what I suspect I'm lacking at the moment.
I'm hoping someone can point me in the right direction for understanding this behavior and maybe give some advice on the proper way of implementing this functionality for my scenario.
Please let me know if you require any more information.
Best regards and many thanks in advance,
RiVer

i have challenges implementing OOP in python

I am facing challenges implementing OOP in Python to enable me to call the functions whenever I want. So far I have no syntax errors, which makes this quite challenging for me. The first part of the code, which accepts data, runs, but the function part does not run.
I have tried different ways of calling the function by creating an instance of it.
print(list)

def tempcheck(self, newList):
    temp = newList[0]
    if temp == 27:
        print("Bedroom has ideal temperature")
    elif temp >= 28 or temp <= 26:
        print("Bedroom temperature is not ideal, either too low or too cold.")
        print("Please adjust the temperature to the optimum temperature, which is 27 degrees Celsius")

# now to initialize args
def __init__(self, temp, puri1, bedwashroom, newList):
    self.temp = temp
    self.puri1 = puri1
    self.bedwashroom = bedwashroom
    tempcheck(newList)

# now calling the functions
newvalue = tempcheck(list)
# where list contains the values from the input function
I expected the function to check the specific value at that location in the provided list (called list), and to return a string based on the if statements.
I got it right; I figured out an alternative to my bug. Thanks for the critique; any further additions are welcome.
The main goal was to create a function that takes input and passes it to a list to be used later. I guess this code is less cumbersome.
The link to the full code is pasted below.
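For reference, a minimal sketch of the structure the question seems to be aiming for, with both methods inside a class and called through an instance (the Room name and the sample readings are mine):

class Room:
    def __init__(self, temp, puri1, bedwashroom, newList):
        self.temp = temp
        self.puri1 = puri1
        self.bedwashroom = bedwashroom
        self.tempcheck(newList)

    def tempcheck(self, newList):
        temp = newList[0]
        if temp == 27:
            print("Bedroom has ideal temperature")
        else:
            print("Bedroom temperature is not ideal, either too low or too cold.")
            print("Please adjust the temperature to the optimum temperature, which is 27 degrees Celsius")

# values that would normally come from the input function
readings = [27, 0, 0]
room = Room(readings[0], None, None, readings)  # prints the ideal-temperature message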

Python object persistence

I'm seeking advice about methods of implementing object persistence in Python. To be more precise, I wish to be able to link a Python object to a file in such a way that any Python process that opens a representation of that file shares the same information, any process can change its object and the changes will propagate to the other processes, and even if all processes "storing" the object are closed, the file will remain and can be re-opened by another process.
I found three main candidates for this in my distribution of Python - anydbm, pickle, and shelve (dbm appeared to be perfect, but it is Unix-only, and I am on Windows). However, they all have flaws:
anydbm can only handle a dictionary of string values (I'm seeking to store a list of dictionaries, all of which have string keys and string values, though ideally I would seek a module with no type restrictions)
shelve requires that a file be re-opened before changes propagate - for instance, if two processes A and B load the same file (containing a shelved empty list), and A adds an item to the list and calls sync(), B will still see the list as being empty until it reloads the file.
pickle (the module I am currently using for my test implementation) has the same "reload requirement" as shelve, and also does not overwrite previous data - if process A dumps fifteen empty strings onto a file, and then the string 'hello', process B will have to load the file sixteen times in order to get the 'hello' string. I am currently dealing with this problem by preceding any write operation with repeated reads until end of file ("wiping the slate clean before writing on it"), and by making every read operation repeated until end of file, but I feel there must be a better way.
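To make the pickle flaw above concrete, a minimal sketch of the append-and-rescan pattern (file name illustrative):

import pickle

# "process A": successive dumps append pickles to the same file
with open('data.pkl', 'wb') as f:
    for s in ['', '', '']:
        pickle.dump(s, f)
    pickle.dump('hello', f)

# "process B": each load consumes one pickle, so reaching the latest
# value means reading until end of file
with open('data.pkl', 'rb') as f:
    while True:
        try:
            latest = pickle.load(f)
        except EOFError:
            break

print(latest)  # 'hello'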
My ideal module would behave as follows (with "A>>>" representing code executed by process A, and "B>>>" code executed by process B):
A>>> import imaginary_perfect_module as mod
B>>> import imaginary_perfect_module as mod
A>>> d = mod.load('a_file')
B>>> d = mod.load('a_file')
A>>> d
{}
B>>> d
{}
A>>> d[1] = 'this string is one'
A>>> d['ones'] = 1 #anydbm would sulk here
A>>> d['ones'] = 11
A>>> d['a dict'] = {'this dictionary' : 'is arbitrary', 42 : 'the answer'}
B>>> d['ones'] #shelve would raise a KeyError here, unless A had called d.sync() and B had reloaded d
11 #pickle (with different syntax) would have returned 1 here, and then 11 on next call
(etc. for B)
I could achieve this behaviour by creating my own module that uses pickle, and editing the dump and load behaviour so that they use the repeated reads I mentioned above - but I find it hard to believe that this problem has never occurred to, and been fixed by, more talented programmers before. Moreover, these repeated reads seem inefficient to me (though I must admit that my knowledge of operation complexity is limited, and it's possible that these repeated reads are going on "behind the scenes" in otherwise apparently smoother modules like shelve). Therefore, I conclude that I must be missing some code module that would solve the problem for me. I'd be grateful if anyone could point me in the right direction, or give advice about implementation.
Use the ZODB (the Zope Object Database) instead. Backed with ZEO it fulfills your requirements:
Transparent persistence for Python objects
ZODB uses pickles underneath so anything that is pickle-able can be stored in a ZODB object store.
Full ACID-compatible transaction support (including savepoints)
This means changes from one process propagate to all the other processes when they are good and ready, and each process has a consistent view on the data throughout a transaction.
ZODB has been around for over a decade now, so you are right in surmising this problem has already been solved before. :-)
The ZODB lets you plug in storages; the most common format is the FileStorage, which stores everything in one Data.fs file, with an optional blob storage for large objects.
Some ZODB storages are wrappers around others to add functionality; DemoStorage for example keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.
ZEO is such a plugin that introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process only.
The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.
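For a taste of the API, a minimal sketch of writing through a FileStorage-backed ZODB (the file name is illustrative; note that in-place changes to nested plain dicts are not tracked automatically, which is what PersistentMapping is for):

import ZODB, ZODB.FileStorage
import transaction

storage = ZODB.FileStorage.FileStorage('a_file.fs')
db = ZODB.DB(storage)
connection = db.open()
root = connection.root()

# reads and writes look like ordinary dictionary access
root['ones'] = 11
root['a dict'] = {'this dictionary': 'is arbitrary', 42: 'the answer'}
transaction.commit()  # publish the changes to other connections

db.close()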
For beginners, you can port your shelve databases to ZODB databases like this:
#!/usr/bin/env python
import shelve
import ZODB, ZODB.FileStorage
import transaction
from optparse import OptionParser
import os
import sys
import re

reload(sys)
sys.setdefaultencoding("utf-8")

parser = OptionParser()
parser.add_option("-o", "--output", dest="out_file", default=False, help="new ZODB database filename")
parser.add_option("-i", "--input", dest="in_file", default=False, help="original shelve database filename")
parser.set_defaults()
options, args = parser.parse_args()

if options.in_file == False or options.out_file == False:
    print "Need input and output database filenames"
    exit(1)

db = shelve.open(options.in_file, writeback=True)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.iteritems():
    print "Copying key: " + str(key)
    newdb[key] = value

transaction.commit()
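You would then invoke it along the lines of python shelve2zodb.py -i old_shelve.db -o new_zodb.fs (the script and file names here are illustrative). Note that the script is written for Python 2, as the print statements and reload(sys) indicate.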
I suggest using TinyDB; it's much better and simpler to use.
https://tinydb.readthedocs.io/en/stable/
