This is a question related to best practice.
Let's say I have the following code:
def clean_text(text, idx):
title = sqlalchemy.query("SELECT title FROM page_titles WHERE id = %s" % idx)
return {title: text}
def scrape_webpage(idx):
""" do some stuff here"""
response = requests.get(link)
return clean_text(response.text, idx)
def main():
for idx in range(10):
scrape_webpage(idx)
if __name__ == '__main__':
main()
The main problem I have with this code is there is some statefulness required (namely idx) which is just being passed through scrape_webpage. You can imagine if you have many functions before clean_text() is invoked, idx has to be passed through all of them without being used in any of them.
Is there a better way that clean_text can know the state of the loop without having it passed as an argument (and incidentally, through all of the functions which use it)? Perhaps generators or callbacks? Would appreciate an example.
Related
I'm trying to learn OOP but I'm getting very confused with how I'm supposed to run the methods or return values. In the following code I want to run read_chapters() first, then sendData() with some string content that comes from read_chapters(). Some of the solutions I found did not use __init__ but I want to use it (just to see/learn how i can use them).
How do I run them? Without using __init__, why do you only return 'self'?
import datetime
class PrinceMail:
def __init__(self):
self.date2 = datetime.date(2020, 2, 6)
self.date1 = datetime.date.today()
self.days = (self.date1 - self.date2).days
self.file = 'The_Name.txt'
self.chapter = '' # Not sure if it would be better if i initialize chapter here-
# or if i can just use a normal variable later
def read_chapters(self):
with open(self.file, 'r') as book:
content = book.readlines()
indexes = [x for x in range(len(content)) if 'CHAPTER' in content[x]]
indexes = indexes[self.days:]
heading = content[indexes[0]]
try:
for i in (content[indexes[0]:indexes[1]]):
self.chapter += i # can i use normal var and return that instead?
print(self.chapter)
except IndexError:
for i in (content[indexes[0]:]):
self.chapter += i
print(self.chapter)
return self????? # what am i supposed to return? i want to return chapter
# The print works here but returns nothing.
# sendData has to run after readChapters automatically
def sendData(self):
pass
#i want to get the chapter into this and do something with it
def run(self):
self.read_chapters().sendData()
# I tried this method but it doesn't work for sendData
# Is there anyother way to run the two methods?
obj = PrinceMail()
print(obj.run())
#This is kinda confusing as well
Chaining methods is just a way to shorten this code:
temp = self.read_chapters()
temp.sendData()
So, whatever is returned by read_chapters has to have the method sendData. You should put whatever you want to return in read_chapters in a field of the object itself (aka self) in order to use it after chaining.
First of all, __init__ has nothing to do with what you want to achieve here. You can consider it as a constructor for other languages, this is the first function that is called when you create an object of the class.
Now to answer your question, if I am correct you just want to use the output of read_chapters in sendData. One of the way you can do that is by making the read_chapters a private method (that is if you don't want it to use through the object) using __ in the starting of the name like __read_chapters then make a call to the function inside the sendData function.
Another point to consider here is, when you are using self and don't intend to use the function through the object you don't need to return anything. self assigns the value to the attribute of the current instance. So, you can leave the function read_chapters at self.chapter = i and access the same in sendData.
Ex -
def sendData(self):
print(self.chapter)
I'm not an expert but, the reason to return self is because it is the instance of the class you're working with and that's what allows you to chain methods.
For what you're trying to do, method chaining doesn't seem to be the best approach. You want to sendData() for each iteration of the loop in read_chapters()? (you have self.chapter = i which is always overwritten)
Instead, you can store the chapters in a list and send it after all the processing.
Also, and I don't know if this is a good practice but, you can have a getter to return the data if you want to do something different with (return self.chapter instead of self)
I'd change your code for:
import datetime
class PrinceMail:
def __init__(self):
self.date2 = datetime.date(2020, 2, 6)
self.date1 = datetime.date.today()
self.days = (self.date1 - self.date2).days
self.file = 'The_Name.txt'
self.chapter = []
def read_chapters(self):
with open(self.file, 'r') as book:
content = book.readlines()
indexes = [x for x in range(len(content)) if 'CHAPTER' in content[x]]
indexes = indexes[self.days:]
heading = content[indexes[0]]
try:
for i in (content[indexes[0]:indexes[1]]):
self.chapter.append(i)
except IndexError:
#not shure what you want to do here
for i in (content[indexes[0]:]):
self.chapter.append(i)
return self
# sendData has to run after readChapters automatically
def sendData(self):
pass
#do what ever with self.chapter
def get_raw_chapters(self):
return self.chapter
Also, check PEP 8 Style Guide for naming conventions (https://www.python.org/dev/peps/pep-0008/#function-and-variable-names)
More reading in
Method chaining - why is it a good practice, or not?
What __init__ and self do on Python?
Will use following example to explain.
Existing python file (a.py) contains one class:
class A:
def method1(self, par1, par2='e'):
# some code here
pass
def method2(self, parA):
# some code here
pass
def method3(self, a, b, c):
# lots of code here
pass
def anothermethod(self):
pass
if __name__ == '__main__':
A().anothermethod()
Now, there is a need to create another py file (b.py), which would contain subclass (class B) of class A.
And there is a need to have all the methods included (all inherited from parent class), but without
implementation in it. Result might look like:
class B(A):
def method1(self, par1, par2='e'):
# empty here; ready to override
pass
def method2(self, parA):
# empty here; ready to override
pass
def method3(self, a, b, c):
# empty here; ready to override
pass
def anothermethod(self):
# empty here; ready to override
pass
if __name__ == '__main__':
B().anothermethod()
Having described the example, the question is: how could one generate last mentioned (skeleton-like) py file? So that after generating you can just open generated file and start right away with filling specific implementation.
There must be a shorter way, 1-2 line solution. Maybe it is already solvable by some existing functionality within modules already provided by Python (Python 3)?
Edit (2018 Mar 14). Thank you https://stackoverflow.com/a/49152537/4958287 (though was looking for short and already existing solution here). Will have to settle with longer solution for now -- will include its rough version here, maybe it would be helpful to someone else:
import inspect
from a import A
def construct_skeleton_subclass_from_parent(subcl_name, parent_cl_obj):
"""
subcl_name : str
Name for subclass and
file to be generated.
parent_cl_obj : obj (of any class to create subclass for)
Object of parent class.
"""
lines = []
subcl_name = subcl_name.capitalize()
parent_cl_module_name = parent_cl_obj.__class__.__module__
parent_cl_name = parent_cl_obj.__class__.__name__
lines.append('from {} import {}'.format(parent_cl_module_name, parent_cl_name))
lines.append('')
lines.append('class {}({}):'.format(subcl_name, parent_cl_name))
for name, method in inspect.getmembers(parent_cl_obj, predicate=inspect.ismethod):
args = inspect.signature(method)
args_others = str(args).strip('()').strip()
if len(args_others) == 0:
lines.append(' def {}(self):'.format(name))
else:
lines.append(' def {}(self, {}):'.format(name, str(args).strip('()')))
lines.append(' pass')
lines.append('')
#...
#lines.append('if __name__ == \'__main__\':')
#lines.append(' ' + subcl_name + '().anothermethod()')
#...
with open(subcl_name.lower() + '.py', 'w') as f:
for c in lines:
f.write(c + '\n')
a_obj = A()
construct_skeleton_subclass_from_parent('B', a_obj)
Get the list of methods and each of their signatures using the inspect module:
import a
import inspect
for name, method in inspect.getmembers(a.A, predicate=inspect.ismethod):
args = inspect.signature(method)
print(" def {}({}):".format(name, args))
print(" pass")
print()
I am trying to change my code to a more object oriented format. In doing so I am lost on how to 'visualize' what is happening with multiprocessing and how to solve it. On the one hand, the class should track changes to local variables across functions, but on the other I believe multiprocessing creates a copy of the code which the original instance would not have access to. I need to figure out a way to manipulate classes, within a class, using multiprocessing, and have the parent class retain all manipulated values in the nested classes.
A simple version (OLD CODE):
function runMultProc():
...
dictReports = {}
listReports = ['reportName1.txt', 'reportName2.txt']
tasks = []
pool = multiprocessing.Pool()
for report in listReports:
if report not in dictReports:
dictReports[today][report] = {}
tasks.append(pool.apply_async(worker, args=([report, dictReports[today][report]])))
else:
continue
for task in tasks:
report, currentReportDict = task.get()
dictReports[report] = currentFileDict
function worker(report, currentReportDict):
<Manipulate_reports_dict>
return report, currentReportDict
NEW CODE:
class Transfer():
def __init__(self):
self.masterReportDictionary[<todays_date>] = [reportObj1, reportObj2]
def processReports(self):
self.pool = multiprocessing.Pool()
self.pool.map(processWorker, self.masterReportDictionary[<todays_date>])
self.pool.close()
self.pool.join()
def processWorker(self, report):
# **process and manipulate report, currently no return**
report.name = 'foo'
report.path = '/path/to/report'
class Report():
def init(self):
self.name = ''
self.path = ''
self.errors = {}
self.runTime = ''
self.timeProcessed = ''
self.hashes = {}
self.attempts = 0
I don't think this code does what I need it to do, which is to have it process the list of reports in parallel AND, as processWorker manipulates each report class object, store those results. As I am fairly new to this I was hoping someone could help.
The big difference between the two is that the first one build a dictionary and returned it. The second model shouldn't really be returning anything, I just need for the classes to finish being processed and they should have relevant information within them.
Thanks!
I'm creating a program that uses the Twisted module and callbacks.
However, I keep having problems because the asynchronous part goes wrecked.
I have learned (also from previous questions..) that the callbacks will be executed at a certain point, but this is unpredictable.
However, I have a certain program that goes like
j = calc(a)
i = calc2(b)
f = calc3(c)
if s:
combine(i, j, f)
Now the boolean s is set by a callback done by calc3. Obviously, this leads to an undefined error because the callback is not executed before the s is needed.
However, I'm unsure how you SHOULD do if statements with asynchronous programming using Twisted. I've been trying many different things, but can't find anything that works.
Is there some way to use conditionals that require callback values?
Also, I'm using VIFF for secure computations (which uses Twisted): VIFF
Maybe what you're looking for is twisted.internet.defer.gatherResults:
d = gatherResults([calc(a), calc2(b), calc3(c)])
def calculated((j, i, f)):
if s:
return combine(i, j, f)
d.addCallback(calculated)
However, this still has the problem that s is undefined. I can't quite tell how you expect s to be defined. If it is a local variable in calc3, then you need to return it so the caller can use it.
Perhaps calc3 looks something like this:
def calc3(argument):
s = bool(argument % 2)
return argument + 1
So, instead, consider making it look like this:
Calc3Result = namedtuple("Calc3Result", "condition value")
def calc3(argument):
s = bool(argument % 2)
return Calc3Result(s, argument + 1)
Now you can rewrite the calling code so it actually works:
It's sort of unclear what you're asking here. It sounds like you know what callbacks are, but if so then you should be able to arrive at this answer yourself:
d = gatherResults([calc(a), calc2(b), calc3(c)])
def calculated((j, i, calc3result)):
if calc3result.condition:
return combine(i, j, calc3result.value)
d.addCallback(calculated)
Or, based on your comment below, maybe calc3 looks more like this (this is the last guess I'm going to make, if it's wrong and you'd like more input, then please actually share the definition of calc3):
def _calc3Result(result, argument):
if result == "250":
# SMTP Success response, yay
return Calc3Result(True, argument)
# Anything else is bad
return Calc3Result(False, argument)
def calc3(argument):
d = emailObserver("The argument was %s" % (argument,))
d.addCallback(_calc3Result)
return d
Fortunately, this definition of calc3 will work just fine with the gatherResults / calculated code block immediately above.
You have to put if in the callback. You may use Deferred to structure your callback.
As stated in previous answer - the preocessing logic should be handled in callback chain, below is simple code demonstration how this could work. C{DelayedTask} is a dummy implementation of a task which happens in the future and fires supplied deferred.
So we first construct a special object - C{ConditionalTask} which takes care of storring the multiple results and servicing callbacks.
calc1, calc2 and calc3 returns the deferreds, which have their callbacks pointed to C{ConditionalTask}.x_callback.
Every C{ConditionalTask}.x_callback does a call to C{ConditionalTask}.process which checks if all of the results have been registered and fires on a full set.
Additionally - C{ConditionalTask}.c_callback sets a flag of wheather or not the data should be processed at all.
from twisted.internet import reactor, defer
class DelayedTask(object):
"""
Delayed async task dummy implementation
"""
def __init__(self,delay,deferred,retVal):
self.deferred = deferred
self.retVal = retVal
reactor.callLater(delay, self.on_completed)
def on_completed(self):
self.deferred.callback(self.retVal)
class ConditionalTask(object):
def __init__(self):
self.resultA=None
self.resultB=None
self.resultC=None
self.should_process=False
def a_callback(self,result):
self.resultA = result
self.process()
def b_callback(self,result):
self.resultB=result
self.process()
def c_callback(self,result):
self.resultC=result
"""
Here is an abstraction for your "s" boolean flag, obviously the logic
normally would go further than just setting the flag, you could
inspect the result variable and do other strange stuff
"""
self.should_process = True
self.process()
def process(self):
if None not in (self.resultA,self.resultB,self.resultC):
if self.should_process:
print 'We will now call the processor function and stop reactor'
reactor.stop()
def calc(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a)
return deferred
def calc2(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a*2)
return deferred
def calc3(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a*3)
return deferred
def main():
conditional_task = ConditionalTask()
dFA = calc(1)
dFB = calc2(2)
dFC = calc3(3)
dFA.addCallback(conditional_task.a_callback)
dFB.addCallback(conditional_task.b_callback)
dFC.addCallback(conditional_task.c_callback)
reactor.run()
I'm current writing a short bit of code that will compare an etag for a web server page in a saved document to the etag on the server. If they are different, the code will indicate this. My code is below:-
import httplib
def currentTag():
f = open('C:/Users/ME/Desktop/document.txt')
e = f.readline()
newTag(e)
def newTag(old_etag):
c = httplib.HTTPConnection('standards.ieee.org')
c.request('HEAD', '/develop/regauth/oui/oui.txt')
r = c.getresponse()
current_etag = r.getheader('etag').replace('"', '')
compareTag(old_etag, current_etag)
def compareTag(old_etag, current_etag):
if old_etag == current_etag:
print "the same"
else:
print "different"
if __name__ == '__main__':
currentTag()
Now, reviewing my code, there is actually no reason to pass 'etag' from the currentTag() method to the newTag() method given that the pre-existing etag is not processed in newTag(). Nonetheless, if I don't do this, how can I pass two different values to compareTag(). So for example, when defining compareTag(), how can I pass 'etag' from the currentTag() method and 'current_etag' from the newTag() method?
you shouldn't chain your function calls like that, have a main block of code that calls the functions serially, like so:
if __name__ == '__main__':
currtag = currentTag()
newtag = newTag()
compareTag(currtag,newtag)
adjust your functions to return the relevant data
the basic idea of a function is that it returns data, you usually use functions to do some processing and return a value and not for control flow.
change your main to:
if __name__ == '__main__':
compareTag(currentTag(), newTag())
and then have currentTag() return e and newTag() return current_etag
def checkTags():
c = httplib.HTTPConnection('standards.ieee.org')
c.request('HEAD', '/develop/regauth/oui/oui.txt')
r = c.getresponse()
with open('C:/Users/ME/Desktop/document.txt', 'r') as f:
if f.readline() == r.getheader('etag').replace('"', ''): print "the same"
else: print "different"
you could make the variables (i.e. etag) global