My question is basically the following: I have two classes:
Class 1 that does something in a loop (web scraping)
Class 2 that has different functions to automate things someone would otherwise do by hand in this loop and in similar loops in the future.
The idea is to abstract Class 1 as much as possible.
Example: Class 2 has a function that "pauses like a human" after certain iterations
(excerpt)
def pause_like_human(self):
    '''Pause long on every x-th iteration, otherwise short.'''
    # every 20th iteration - wait for 5-10 minutes before continuing
    if self.index_no % self.index_counter == 0:
        timestamp = time.localtime()
        print("Let's wait " + str(self.waiting_time_long) + " seconds! The time is: "
              + str(timestamp.tm_hour) + ":" + str(timestamp.tm_min) + ":" + str(timestamp.tm_sec))
        time.sleep(self.waiting_time_long)
    else:
        print("Let's continue! ... after " + str(self.waiting_time_short) + " seconds...")
        time.sleep(self.waiting_time_short)
Now my problem is that I would need to pass index_no from the loop in Class 1 into Class 2... over and over again.
(index_no is in this loop:)
for index_no, city in enumerate(self.df['column'][len(self.dic)-1:], start=len(self.dic)-1):
To me it sounds like this would be very inefficient - does someone have a better idea?
So the answer for me specifically seems to be that I simply abstract the code even more. This will turn my "complicated" loop into a very easy loop and my two classes into one class.
Class 1 turns to a longer list with links, but actually well prepared.
Class 2 turns into a more generalised class with most of the functionality of the former class 1 inherited already.
I wouldn't mind leaving this post open if someone comes up with general help or hints...
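For anyone landing here later, a sketch of one way to avoid sharing index_no between the classes: pass the iteration index into the helper as a plain argument, so the helper never needs to know about the loop's internals. All class and attribute names below are made up for illustration:

```python
import time

class Humanizer:
    """Pause helper that knows nothing about the loop it serves."""

    def __init__(self, long_every=20, short_wait=2, long_wait=300):
        self.long_every = long_every    # long pause every n-th iteration
        self.short_wait = short_wait    # seconds
        self.long_wait = long_wait      # seconds

    def wait_for(self, index_no):
        """Return how many seconds to wait on this iteration."""
        if index_no and index_no % self.long_every == 0:
            return self.long_wait
        return self.short_wait

    def pause_like_human(self, index_no):
        wait = self.wait_for(index_no)
        print("Waiting %d seconds, the time is %s"
              % (wait, time.strftime("%H:%M:%S")))
        time.sleep(wait)

# usage inside the scraping loop:
# humanizer = Humanizer()
# for index_no, city in enumerate(cities):
#     humanizer.pause_like_human(index_no)
```

With this shape, the helper class stays generic and can be reused by any loop that hands it an index.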
I was working on building a randomized character generator for Pathfinder 3.5 and got stuck.
I am using the Populate_Skills(Skill_String, Draw, Skill_List, Class_Skill) function to populate a randomized list of skills with their class-based point totals, class bonus, and point buy, modelling the action of a player picking skills for their character.
As an example below, Wizards.
I pick Knowledge_Arcana as a skill and spend one point from my skill point pool (calculated as my intelligence modifier +2) on it. So that skill now equals my intelligence modifier (+1 in this case), plus my class skill bonus as a wizard (+3), plus the point I spent (+1), for a total of 5.
The problem is that while the function prints the correct result of 5, the underlying variables are never updated with the final total. To continue the example: I run the function on Knowledge_Arcana, get +5, then check Knowledge_Arcana after the function call and get just +1. Conversely, if I write the logic out as a plain if statement, it works. An example is next to the function for comparison.
Does anyone know why I'm getting different results?
## Creating the lists and breaking into two separate sections
Int_Mod = 1
Skill_Ranks = 3
Rand_Class = 'Wizard'
Knowledge_Arcana = Int_Mod
Knowledge_Dungeoneering = Int_Mod
Wizard_Class_Top_Skills = ["Knowledge_Arcana"]
Wizard_Class_Less_Skills = ["Knowledge_Dungeoneering"]
Class_Skill = 3
Important_Skills_Weighted = .6
Less_Important_Skills_Weighted = .4
Important_Skills_Total_Weighted = round(Skill_Ranks*Important_Skills_Weighted)
Less_Skill_Total_Weighted = round(Skill_Ranks*Less_Important_Skills_Weighted)
Wiz_Draw =['Knowledge_Arcana', 'Knowledge_Dungeoneering']
def Populate_Skills(Skill_String, Draw, Skill_List, Class_Skill):
    if Skill_String in Draw:
        Skill_List = Skill_List + Class_Skill + Draw.count(Skill_String)
        print(Skill_String, Skill_List)
    else:
        print('Nuts!')
## Function Calls
Populate_Skills('Knowledge_Arcana', Wiz_Draw, Knowledge_Arcana, Class_Skill)
Populate_Skills('Knowledge_Dungeoneering', Wiz_Draw, Knowledge_Dungeoneering, Class_Skill)
print(Knowledge_Arcana,Knowledge_Dungeoneering)
Edited to be an MRE, I believe. Sorry folks, I'm new.
You are reassigning Skill_List inside the function, which creates a local variable; that local is lost when the function exits, so the caller's variable never changes. Note also that your skill values are plain integers, and integers are immutable: no function can modify them in place. If Skill_List were a mutable object such as a list, you could manipulate the very object the caller passed in:
def Populate_Skills(Skill_String, Draw, Skill_List, Class_Skill):
    if Skill_String in Draw:
        # only works if Skill_List is actually a list
        Skill_List.append(Class_Skill + Draw.count(Skill_String))
        print(Skill_String, Skill_List)
    else:
        print('Nuts!')
Alternatively (and since your skill values are integers, this is the approach you actually need), have the function return the new value, and require the caller to pick it up and assign it back to the variable:
def Populate_Skills(Skill_String, Draw, Skill_List, Class_Skill):
    if Skill_String in Draw:
        Skill_List = Skill_List + Class_Skill + Draw.count(Skill_String)
        print(Skill_String, Skill_List)
    else:
        print('Nuts!')
    return Skill_List

Knowledge_Arcana = Populate_Skills('Knowledge_Arcana', Wiz_Draw, Knowledge_Arcana, Class_Skill)
# etc
You should probably also rename your variables (CapWords is conventionally reserved for classes and ALL_CAPS for module-level constants; regular Python functions and variables use snake_case) and avoid global variables entirely. The whole program looks like a candidate for refactoring into objects, but that's far beyond the scope of what you are asking.
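As a rough sketch of that refactoring (the names below are illustrative, not from the original code): keeping the skills in a dict lets one function update any number of skills in place, with no globals and no reassignment tricks, because the dict itself is mutable:

```python
def populate_skills(skills, draw, class_skill_bonus=3):
    """Add the class bonus plus spent points to every drawn skill, in place.

    skills: dict mapping skill name -> current modifier
    draw: list of skill names the player picked (repeats = extra points)
    """
    for name in set(draw):
        if name in skills:
            skills[name] += class_skill_bonus + draw.count(name)
    return skills

int_mod = 1
skills = {"knowledge_arcana": int_mod, "knowledge_dungeoneering": int_mod}
wiz_draw = ["knowledge_arcana"]
populate_skills(skills, wiz_draw)
print(skills["knowledge_arcana"])  # 1 (int mod) + 3 (class) + 1 (spent) = 5
```

Because the caller and the function share the same dict object, the update is visible after the call without needing to capture a return value.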
The Scenario:
I am doing a question on Leetcode called Nth Ugly Number. The task is to find the nth number whose only prime factors are 2, 3, and 5.
I created a solution which was accepted and passed all the tests. Then, I wanted to memoize it for practice with memoization with python - however, something has gone wrong with the memoization. It works for my own personal tests, but Leetcode does not accept the answer.
The memoized code is detailed below:
class Solution:
    uglyNumbers = [1, 2, 3, 4, 5]
    latest2index = 2
    latest3index = 1
    latest5index = 1

    def nthUglyNumber(self, n: int) -> int:
        while len(self.uglyNumbers) <= n:
            guess2 = self.uglyNumbers[self.latest2index] * 2
            guess3 = self.uglyNumbers[self.latest3index] * 3
            guess5 = self.uglyNumbers[self.latest5index] * 5
            nextUgly = min(guess2, guess3, guess5)
            if nextUgly == guess2:
                self.latest2index += 1
            if nextUgly == guess3:
                self.latest3index += 1
            if nextUgly == guess5:
                self.latest5index += 1
            self.uglyNumbers.append(nextUgly)
        return self.uglyNumbers[n - 1]
The only change I made when memoizing was to make uglyNumbers, latest2index, etc. to be class members instead of local variables.
The Problem:
When I submit to LeetCode, it claims that the solution no longer works. Here is where it breaks:
Input 12 /// Output 6 /// Expected 16
However, when I test the code myself and provide it with input 12, it gives the expected output 16. It does this even if I call nthUglyNumber with a bunch of different inputs before and after 12, so I have no idea why the test case breaks upon being submitted to LeetCode
Here's the testing I performed to confirm that the algorithm appears to work as expected:
# This code goes inside class Solution
def nthUglyNumber(self, n: int) -> int:
    print("10th: " + str(self.nthUgliNumber(10)))
    print("11th: " + str(self.nthUgliNumber(11)))
    print("12th: " + str(self.nthUgliNumber(12)))
    print("9th: " + str(self.nthUgliNumber(9)))
    print("14th: " + str(self.nthUgliNumber(14)))
    print("10th: " + str(self.nthUgliNumber(10)))
    print("11th: " + str(self.nthUgliNumber(11)))
    print("12th: " + str(self.nthUgliNumber(12)))
    return self.nthUgliNumber(n)

def nthUgliNumber(self, n: int) -> int:
    # The regular definition of nthUglyNumber goes here
    ...
What I want to know
Is there some edge case in Python memoization that I am not seeing that's causing the code to trip up? Or is it fully Leetcode's fault? I know my algorithm works without memoization, but I want to understand what's going wrong so I gain a better understanding of Python and so that I can avoid similar mistakes in the future.
I appreciate the help!
I believe leetcode is probably running tests in parallel on multiple threads, using separate instances of the Solution class. Since you are storing uglyNumbers and the three index counters as class variables, they are shared by every instance, and different instances may be updating them in a conflicting manner.
From leetcode's perspective, each test is not expected to have side effects that would impact other tests. So, parallel execution in distinct instances is legitimate. Caching beyond the scope of the test case is likely undesirable as it would make performance measurements inconsistent and dependent on the order and content of the test cases.
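If you want to keep the cache but make it robust against shared state, one option (a sketch, not verified against LeetCode's harness) is to move the state into __init__, so each Solution instance gets its own private cache while the algorithm stays byte-for-byte the same:

```python
class Solution:
    def __init__(self):
        # per-instance state: nothing leaks between Solution instances
        self.uglyNumbers = [1, 2, 3, 4, 5]
        self.latest2index = 2
        self.latest3index = 1
        self.latest5index = 1

    def nthUglyNumber(self, n: int) -> int:
        # identical algorithm; repeated calls on one instance reuse the cache
        while len(self.uglyNumbers) <= n:
            guess2 = self.uglyNumbers[self.latest2index] * 2
            guess3 = self.uglyNumbers[self.latest3index] * 3
            guess5 = self.uglyNumbers[self.latest5index] * 5
            nextUgly = min(guess2, guess3, guess5)
            if nextUgly == guess2:
                self.latest2index += 1
            if nextUgly == guess3:
                self.latest3index += 1
            if nextUgly == guess5:
                self.latest5index += 1
            self.uglyNumbers.append(nextUgly)
        return self.uglyNumbers[n - 1]
```

The trade-off is that the cache now only lives as long as one instance, which is exactly the isolation the test harness expects.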
I have written an instance method which uses recursion to find a certain solution. It works perfectly fine except for what happens as I exit the if-elif block. I call the function itself inside the IF block, and I have only one return statement. The output of the method is hard for me to understand. Here is the code and the output:
def create_schedule(self):
    """
    Creates the day schedule for the crew based on the crew_dict passed.
    """
    sched_output = ScheduleOutput()
    assigned_assignements = []
    for i in self.crew_list:
        assigned_assignements.extend(i.list_of_patients)
    rest_of_items = []
    for item in self.job.list_of_patients:
        if item not in assigned_assignements:
            rest_of_items.append(item)
    print("Rest of the items are:", len(rest_of_items))
    if len(rest_of_items) != 0:
        assignment = sorted(rest_of_items, key=lambda x: x.window_open)[0]
        # print("\nNext assignment to be taken ", assignment)
        output = self.next_task_eligibility(assignment, self.crew_list)
        if len(output) != 0:
            output_sorted = sorted(output, key=itemgetter(2))
            crew_to_assign = output_sorted[0][1]
            assignment.eta = output_sorted[0][4]
            assignment.etd = int(assignment.eta) + int(assignment.care_duration)
            crew = next((x for x in self.crew_list if x.crew_number == crew_to_assign), None)
            self.crew_list.remove(crew)
            crew.list_of_patients.append(assignment)
            crew.time_spent = assignment.etd
            self.crew_list.append(crew)
            self.create_schedule()
        else:
            print("*" * 80, "\n", "*" * 80, "\nWe were not able to assign a task so stopped.\n", "*" * 80, "\n", "*" * 80)
            sched_output.crew_output = self.crew_list
            sched_output.patients_left = len(rest_of_items)
    elif not rest_of_items:
        print("Fully solved.")
        sched_output.crew_output = self.crew_list
        sched_output.patients_left = 0
    print("After completely solving coming here.")
    return sched_output
This was the output:
Rest of the items are: 10
Rest of the items are: 9
Rest of the items are: 8
Rest of the items are: 7
Rest of the items are: 6
Rest of the items are: 5
Rest of the items are: 4
Rest of the items are: 3
Rest of the items are: 2
Rest of the items are: 1
Rest of the items are: 0
Fully solved.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
After completely solving coming here.
What I don't understand is this: as soon as rest_of_items is empty, I assign data to sched_output and return it. However, the print statement is executed the same number of times as the recursion depth. How can I avoid this?
My output is perfectly fine. All I want to do is understand the cause of this behaviour and how to avoid it.
The reason it's printing out 11 times is that you always call print at the end of the function, and you're calling the function 11 times. (It's really the same reason you get Rest of the items are: … 11 times, which should be a lot more obvious.)
Often, the best solution is to redesign things so instead of doing "side effects" like print inside the function, you just return a value, and the caller can then do whatever side effects it wants with the result. In that case, it doesn't matter that you're calling print 11 times; the print will only happen once, in the caller.
If that isn't possible, you can change this so that you only print something when you're at the top of the stack. But in many recursive functions, there's no obvious way to figure that out without passing down more information:
def create_schedule(self, depth=0):
    # etc.
    self.create_schedule(depth + 1)
    # etc.
    if not depth:
        print('After completely solving come here.')
    return sched_output
The last resort is to just wrap the recursive function, like this:
def _create_schedule(self):
    # etc.
    self._create_schedule()
    # etc.
    # don't call print
    return sched_output

def create_schedule(self):
    result = self._create_schedule()
    print('After completely solving come here.')
    return result
That's usually only necessary when you need to do some one-time setup for the recursive process, but here you want to do some one-time post-processing instead, which is basically the same problem, so it can be solved the same way.
(Of course this is really just the first solution in disguise, but it's hidden inside the implementation of create_schedule, so you don't need to change the interface that the callers see.)
Because you call create_schedule within itself before the function finishes, each invocation that has reached the recursive call is still running, paused at that line, while the deeper calls proceed. Once the deepest call reaches the end and returns, each paused invocation resumes in turn, finishes its remaining work, and hits the "After completely solving coming here." line at the end of the function.
You have print("After completely solving coming here.") at the end of your recursive function. That line will be executed once for each recursion.
Consider this simple example, which recreates your issue:
def foo(x):
    print("x = {x}".format(x=x))
    if x > 1:
        foo(x-1)
    print("Done.")
Now call the function:
>>> foo(5)
x = 5
x = 4
x = 3
x = 2
x = 1
Done.
Done.
Done.
Done.
Done.
As you can see, the final call, foo(x=1), prints "Done." first. At that point, the function returns to the previous call, which also prints "Done.", and so on back up the stack.
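To make the toy example print "Done." only once, the wrapper trick from the earlier answer applies directly:

```python
def _foo(x):
    # the recursive worker: no one-time side effects in here
    print("x = {x}".format(x=x))
    if x > 1:
        _foo(x - 1)

def foo(x):
    _foo(x)         # do all the recursion
    print("Done.")  # runs exactly once, after the recursion finishes
```

Callers keep calling foo(x) as before; only the internal split changes.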
It's a program that suggests to the user a player's name if the user made a typo. It's extremely slow.
First it has to issue a GET request, then it checks whether the player's name is in the JSON data; if it is, it does nothing, else it appends every player's first and last name to names. Then it uses get_close_matches to find which names in the list closely resemble first_name and last_name. I knew from the start this would be very slow, but there has to be a faster way; I just couldn't come up with one. Any suggestions?
from difflib import get_close_matches

def suggestion(first_name, last_name):
    names = []
    my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")
    for n in my_request['activeplayers']['playerentry']:
        if last_name == n['player']['LastName'] and first_name == n['player']['FirstName']:
            pass
        else:
            names.append(n['player']['FirstName'] + " " + n['player']['LastName'])
    suggest = get_close_matches(first_name + " " + last_name, names)
    return "did you mean " + "".join(suggest) + "?"

print suggestion("mattthews ", "stafffford")  # should return Matthew Stafford
Well, since it turned out my suggestion in the comments worked out, I might as well post it as an answer with some other ideas included.
First, take your I/O operation out of the function so that you're not wasting time making the request every time your function is run. Instead, you will get your json and load it into local memory when you start the script. If at all possible, downloading the json data beforehand and instead opening a text file might be a faster option.
Second, you should get a set of unique candidates per loop because there is no need to compare them multiple times. When a name is discarded by get_close_matches(), we know that same name does not need to be compared again. (It would be a different story if the criteria with which the name is being discarded depends on the subsequent names, but I doubt that's the case here.)
Third, try to work with batches. Given that get_close_matches() is reasonably efficient, comparing to, say, 10 candidates at once shouldn't be any slower than to 1. But reducing the for loop from going over 1 million elements to over 100K elements is quite a significant boost.
Fourth, I assume that you're checking for last_name == ['LastName'] and first_name == ['FirstName'] because in that case there would have been no typo. So why not simply break out of the function?
Putting them all together, I can write a code that looks like this:
from difflib import get_close_matches

# I/O operation ONCE when the script is run
my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")

# Creating batches of 10 names; this also happens only once.
# As a result, the script might take longer to load but run faster.
# I'm sure there is a better way to create batches, but I don't know any.
batch = []  # This will contain 10 names.
names = []  # This will contain the batches.
for player in my_request['activeplayers']['playerentry']:
    name = player['player']['FirstName'] + " " + player['player']['LastName']
    batch.append(name)
    # Obviously, if the number of names is not a multiple of 10, this won't work!
    if len(batch) == 10:
        names.append(batch)
        batch = []

def suggest(first_name, last_name, names):
    desired_name = first_name + " " + last_name
    suggestions = []
    for batch in names:
        # Just return the name if there is no typo.
        # Alternatively, you could create a flat list of names outside the
        # function and check whether desired_name is in it to terminate
        # immediately. I'm not sure which method is faster; it's a quick
        # profiling task for you, though.
        if desired_name in batch:
            return desired_name
        # This way, we only match against new candidates, 10 at a time.
        best_matches = get_close_matches(desired_name, batch)
        suggestions.append(best_matches)
    # We need to flatten the list of suggestions to print.
    # Alternatively, you could use a for loop to append in the first place.
    suggestions = [name for batch in suggestions for name in batch]
    return "did you mean " + ", ".join(suggestions) + "?"

print suggest("mattthews", "stafffford", names)  # should return Matthew Stafford
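Worth noting: get_close_matches already accepts the entire candidate list in a single call, so depending on your profiling you may not need batching at all; the big win is doing the I/O once. A minimal variant (get_request is the asker's helper, so this sketch just takes the already-parsed data):

```python
from difflib import get_close_matches

def build_names(data):
    """Flatten the parsed API response into 'First Last' strings."""
    return [p['player']['FirstName'] + " " + p['player']['LastName']
            for p in data['activeplayers']['playerentry']]

def suggest(first_name, last_name, names):
    desired_name = (first_name + " " + last_name).strip()
    if desired_name in names:
        return desired_name  # exact match: no typo, nothing to suggest
    # one pass over the whole list; tune n and cutoff as needed
    matches = get_close_matches(desired_name, names, n=3, cutoff=0.6)
    return "did you mean " + ", ".join(matches) + "?"
```

Here names would be built once at startup with build_names(parsed_json), and suggest can then be called as often as needed without repeating the request.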
I'm pretty new to python, so I apologize if this question sounds silly. This is not just plain curiosity, I have to work with code that has a class like that.
Consider a following python code snippet:
class _Base(object):
    constant1 = 1
    constant2 = 2
    constant3 = 3

def main():
    a = _Base    # Referencing a class
    b = _Base()  # Instantiating

if __name__ == '__main__':
    main()
In this particular example, where the _Base class does not have an __init__() method, is there any downside, performance-wise or otherwise, to using the b approach as compared to a?
You'd normally put constants inside a module instead of a class. If you need them in subclasses, then inherit from the base and use them; otherwise, don't instantiate the class at all, since you're just using it as a kind of "namespace".
Name it something better than _Base and access the variables as (for example) MyConstants.constant1 instead...
Jon Clements gives the answer to how you should do this.
But to answer your actual question:
is there any downside, performance-wise or otherwise, in using b approach as compared to a?
More important than performance is readability. If you instantiate an object, readers will think you've done so for some reason, and get side-tracked trying to figure out what b is being used for and what a _Base instance represents in your object model and so on. It won't take too long to figure out that it's useless, but "obvious" is always better than "won't take too long to figure out".
But there is a performance downside as well. It will most likely never matter in any measurable way in any program you ever write, but it's there.
First, b is a newly-allocated object that takes a few bytes (maybe a couple dozen), while a is just a new name for an already-existing object (the class itself). So, it wastes memory.
Second, constructing b takes a bit of time. Besides allocating that memory, you also have to call the __new__ and __init__ slots on object.
You can test the performance difference for yourself with timeit, but I wouldn't bother. You'll most likely find out that b is 20 times slower than a or something like that, but a 20:1 improvement in something you do one time per run that takes under a microsecond is still meaningless.
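If you do want to measure it yourself, a minimal timeit sketch (the absolute numbers will vary by machine and Python version, so don't read much into them):

```python
import timeit

# The setup string recreates the _Base class in timeit's namespace,
# then we time referencing the class vs. instantiating it.
setup = "class _Base(object):\n    constant1 = 1"
ref_time = timeit.timeit("a = _Base", setup=setup, number=1000000)
inst_time = timeit.timeit("b = _Base()", setup=setup, number=1000000)
print("reference:   %.4f s per million" % ref_time)
print("instantiate: %.4f s per million" % inst_time)
```

Instantiation will come out measurably slower, and both will still be far too cheap to matter for a one-time operation.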
I say plan for future maintenance and give the class @property methods for all of these: they'll be indistinguishable from typical constants, and they allow your code to grow, since everything will be an instance of _Base wrapped in the handy property decorator.
class Base(object):
    """for now, just holds constants"""
    def __init__(self):
        pass

    @property
    def constant1(self):
        return 1

    @property
    def constant2(self):
        return 2

    @property
    def constant3(self):
        return 3
Of course, my next question is... why do you need to instantiate a class to do that? Why not constants defined at module level?
#! /usr/bin/env python
"""this module does stuff"""
CONSTANT_1 = 1
CONSTANT_2 = 2
CONSTANT_3 = 3
You can reference those in methods and classes without providing them as arguments. They work as pervasive sources of data, just like a constant should. This is typically the way to avoid magic numbers.
@Jon Clements' answer is pretty good, but if you want, you can stay with the class and turn all of your constants into static methods.
class MyConstants(object):
    @staticmethod
    def constant1():
        return 1
Then you can call it:
some_variable = MyConstants.constant1()
I feel that handling things like this is nicer in terms of maintainability---if you ever want to do anything other than return a constant, Jon's solution won't work, and you'll have to refactor your code. For example, you may want to change the definition of constant1 at some point:
def constant1():
    import time
    import math
    current_time = time.time()
    return math.ceil(current_time)
which returns the current time to the nearest second.
Anyway, sorry for the essay :)
So, given the comments here, I thought I'd see what the actual overhead is of doing things my way (with a factory of static methods) vs. declaring module-level constants vs. using class attributes.
time_test.py:
import time

CONSTANT_1 = 1000
CONSTANT_2 = 54
CONSTANT_3 = 42
CONSTANT_4 = 3.14

class Constants(object):
    constant_1 = 1000
    constant_2 = 54
    constant_3 = 42
    constant_4 = 3.14

class Factory(object):
    @staticmethod
    def constant_1():
        return 1000

    @staticmethod
    def constant_2():
        return 54

    @staticmethod
    def constant_3():
        return 42

    @staticmethod
    def constant_4():
        return 3.14

if __name__ == '__main__':
    loops = 10000000

    # module-level constants
    start = time.time()
    for i in range(loops):
        sum = CONSTANT_1
        sum += CONSTANT_2
        sum += CONSTANT_3
        sum += CONSTANT_4
    static_const_time = time.time() - start

    # as class attributes
    start = time.time()
    for i in range(loops):
        sum = Constants.constant_1
        sum += Constants.constant_2
        sum += Constants.constant_3
        sum += Constants.constant_4
    attributes_time = time.time() - start

    # via the factory's static methods
    start = time.time()
    for i in range(loops):
        sum = Factory.constant_1()
        sum += Factory.constant_2()
        sum += Factory.constant_3()
        sum += Factory.constant_4()
    factory_time = time.time() - start

    print static_const_time / loops
    print attributes_time / loops
    print factory_time / loops

    import pdb
    pdb.set_trace()
Results:
Bens-MacBook-Pro:~ ben$ python time_test.py
4.64897489548e-07
7.57454514503e-07
1.09821901321e-06
--Return--
> /Users/ben/time_test.py(71)<module>()->None
-> pdb.set_trace()
(Pdb)
So there you have it: a marginal difference (a few seconds per ten million loops) that's probably swamped by everything else in your code. We've established that all three approaches have similar performance unless you care about micro-optimizations at this scale (in which case you're probably better off working in C). All three are readable, maintainable, and can probably be found in version control at any software company that uses Python, so the choice comes down to aesthetics.
Anyway, I once lost 15% of points for a research paper in high school because my bibliography wasn't formatted correctly. The content was flawless, it just wasn't pretty enough for my teacher. I've found that one can spend their time learning rules or solving problems. I prefer solving problems.