Input superclass variables in Python

I'm building a web crawler with Python. I created a parent class to save the user and the password, which I'd like to be entered via keyboard.
The parent class looks like this:
import getpass

class ParentCrawler(object):

    def __init__(self):
        """Saves the user and the password"""
        self.user = input("Email: ")
        self.password = getpass.getpass("Password: ")
Then I created a subclass of that parent class, with the idea of running parallel instances of it to make the crawling faster. But every time I create a new object of the child class, I'm asked to input the user and password again, like in the pic below, and that's not what I want.
When a child object is created...
I know I could just hard-code my user and password into the parent class constructor, but I'd like to know how to input them manually every time the program runs.

The __init__ method of a class runs every time you create a new instance. Since these values are needed just once, and you don't need different values for each instance, it makes little sense for them to be requested inside the class initializer or any other method.
Moreover, if your classes have nothing to do with user interaction on the terminal, there is no reason to hard-code that interaction into the class. If you later modify your program to use the same class but take this information from a configuration file, or from a POSTed web form, for example, you won't be able to reuse the class as it stands.
There is nothing wrong with passing the credentials as mandatory values when instantiating the class. To keep developing and using your program interactively at the terminal, you can write a simple function that requests this input data and returns it:
import getpass

NUM_WORKERS = 4

def get_credentials():
    user = input("Email: ")
    password = getpass.getpass("Password: ")
    return user, password

def main():
    workers = []
    user, password = get_credentials()
    for i in range(NUM_WORKERS):
        worker = Crawler(user, password)
        workers.append(worker)
        worker.start()
    ...

class Crawler:
    def __init__(self, user, password):
        ...
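The worker.start() call above suggests each Crawler runs in its own thread. A minimal sketch of that idea, assuming a hypothetical crawl() method that does the actual work (not part of the original code):

import threading

class Crawler(threading.Thread):
    def __init__(self, user, password):
        super().__init__()
        self.user = user
        self.password = password

    def run(self):
        # start() invokes run() in a separate thread;
        # crawl() is a placeholder for the real crawling logic
        self.crawl()

    def crawl(self):
        ...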

Related

Change function to Signal login

I want to run this function only when the user logs in.
def purchased_today(self):
    check = Rating.objects.filter(user=self.user, book__collection=self.collection, score__lte=4).order_by('newest')[:50]
    a = check.count()
    today = datetime.now().strftime("%Y-%m-%d")
    if self.last_checked.date().strftime("%Y-%m-%d") != today:
        return a
I tried it:
@receiver(user_logged_in)
def purchased_today(sender, user, request, **kwargs):
    check = Rating.objects.filter(user=self.user, book__collection=self.collection, score__lte=4).order_by('newest')[:50]
    a = check.count()
    today = datetime.now().strftime("%Y-%m-%d")
    if self.last_checked.date().strftime("%Y-%m-%d") != today:
        return a
But it returns:
name 'self' is not defined.
How can I replace self?
You don't say what kind of class self refers to. But in this case, I would write code outside of that class that triggers on user_logged_in and instantiates the user class when that happens. Once the class is instantiated, you can call
purchased_today(self)
as before.
If you want to separate the functions that can run at any time from the ones that must only run when the user is logged in, I would make two separate classes and trigger instantiation of the second one when user_logged_in fires.
The control code for the callback should be outside of the class either way.
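A minimal sketch of that idea, assuming a hypothetical UserPurchases class that holds the user, collection and last_checked state the method needs (the class name and its constructor are illustrative, not part of the original code):

from django.contrib.auth.signals import user_logged_in
from django.dispatch import receiver

@receiver(user_logged_in)
def handle_login(sender, user, request, **kwargs):
    # instantiate the class outside of itself when the signal fires,
    # then call the bound method on that instance
    purchases = UserPurchases(user=user)
    purchases.purchased_today()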

Python3 "Class factory" - ex: API(token).MyClass()

I'm writing a python REST client for an API.
The API needs authentication and I would like to have many API client objects running on the same script.
My current code for the API is something like this:
class RestAPI:
    def __init__(self, id):
        self.id = id
        self.fetch()

    def fetch(self):
        requests.get(self.url + self.id, auth=self.apikey)

class Purchase(RestAPI):
    url = 'http://example.com/purchases/'

class Invoice(RestAPI):
    url = 'http://example.com/invoices/'

...
And I would like to use the API like this:
api_admin = Api('adminmytoken')
api_user = Api('usertoken')
…
amount = api_admin.Purchase(2).amount
api_user.Purchase(2).amount # raises because api_user is not authorized for this purchase
The problem is that each object needs to know its API key depending on the client I want to use.
That pattern looks to me like a "class factory": all the classes of RestAPI need to know the provided token.
How is it possible to do that cleanly, without manually passing the token to each model?
I think the issue here is that your design is a little backwards. Inheritance might not be the key here. What I might do is take the API token as an argument on the user class, which then gets passed to an instance-level binding on the REST interface:
import requests

class APIUser:
    def __init__(self, id, api_key, **kwargs):
        self._rest = Interface(id, api_key, **kwargs)

    def purchase(self, some_arg):
        # the interface itself does the actual legwork,
        # and you are simply using APIUser to call functions with the interface
        return self._rest.fetch('PURCHASE', some_arg)

# the business objects need to exist before Interface references them below
class Purchase:
    http_method = 'GET'

    def __init__(self, *args, **kwargs):
        # do some setup here with your params passed by the json
        # from the api
        ...

class Invoice:
    http_method = 'GET'

class Interface:
    methods = {
        # call you want: (class, url)
        'PURCHASE': (Purchase, 'https://myexample.com/purchases'),
        'INVOICE': (Invoice, 'https://myexample.com/invoices'),
        # add more methods here
    }

    def __init__(self, id, key):
        self.id = id
        self.key = key
        self.session = requests.Session()

    def fetch(self, method, *args, **kwargs):
        # do some methods to go get data
        try:
            # use the interface to look up your class objects
            # which you may or may not need
            _class, url = self.methods[method]
        except KeyError as e:
            supported = '\n'.join(self.methods)
            raise ValueError(f"Got unsupported method, expected {supported}") from e
        headers = kwargs.pop('headers', {})
        # I'm not sure the actual interface here, maybe you call the
        # url to get metadata to populate the class with first...
        req = requests.Request(_class.http_method, url + self.id,
                               auth=self.key, headers=headers).prepare()
        resp = self.session.send(req)
        # this will raise the 401 ahead of time
        resp.raise_for_status()
        # maybe your object uses metadata from the response
        params = resp.json()
        # return the business object only if the user should see it
        return _class(*args, **kwargs, **params)

user = APIUser("token", "key")  # this is my user session
some_purchase = user.purchase(2)  # will raise a 401 Unauthorized error from the requests session

admin = APIUser("admintoken", "adminkey")  # admin session
some_purchase = admin.purchase(2)
# returns a purchase object
some_purchase.amount
There are a few reasons why you might want to go this way:
You don't get the object back if you aren't allowed to see it
Now the REST interface is in control of who sees what, and that's implicitly tied to the user object itself, without every other class needing to be aware of what's going on.
You can change your URLs in one place (if you need to).
Your business objects are just business objects; they don't need to do anything else.
By separating out what your objects actually are, you still only need to pass the api keys and tokens once, to the User class. The Interface is bound on the instance, still giving you the flexibility of multiple users within the same script.
You also get the models you call on explicitly. If you try to take a model, you have to call it, and that's when the Interface can enforce your authentication. You no longer need your authentication to be enforced by your business objects

Python Celery single base class instance for all tasks

I have a tasks.py that contains a subclass of Task.
According to the docs, the base class is instantiated only once per task.
But this is only true for the same task method. Calling a different task creates a new instance. So I can't access sessions via get_sessions that were created with create_session. How can I have only a single instance that is shared between different tasks?
class AuthentificationTask(Task):
    connections = {}

    def login(self, user, password, server):
        if not user in self.connections:
            self.connections = {user: ServerConnection(verbose=True)}
        # from celery.contrib import rdb
        # rdb.set_trace()
        self.connections[user].login(user=user, password=password, server=server)

@task(bind=True, max_retries=1, queue='test', base=AuthentificationTask)
def create_session(self, user, password, server):
    self.login(user, password, server)

@task(bind=True, max_retries=1, queue='test', base=AuthentificationTask)
def get_sessions(self, user, password, server):
    return self.connections[user].sessions
Set the task_cls arg for your Celery application like this:
class AuthentificationTask(Task):
    def example(self):
        logger.info('AuthentificationTask.example() method was called')

@celery.task(bind=True)
def test_my_task(self):
    # call AuthentificationTask.example
    self.example()

app = celery.Celery(
    __name__,
    broker='redis://localhost:6379/0',
    task_cls=AuthentificationTask,
    # other args
)
In this case, your custom class will be used as the default base for all tasks.
It seems this was an issue on my side, caused by reinitialising self.connections each time:
self.connections = {user: ServerConnection(verbose=True)}
In further tests, base was instantiated only once for all (different) tasks. Thanks @Danila Ganchar for suggesting an alternative approach. I will give it a try!
You're on the right track by making connections a class variable on AuthentificationTask. That makes it available as an attribute on the class itself (i.e. as AuthentificationTask.connections). The problem is that assigning self.connections = {...} in the login method creates an instance variable connections that shadows the class variable of the same name. For the desired behavior, replace self.connections (in both login and get_sessions) with AuthentificationTask.connections.
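A minimal sketch of that change, keeping the rest of the task code the same (the only differences are referencing the class attribute explicitly and updating the dict in place instead of rebinding it):

class AuthentificationTask(Task):
    connections = {}

    def login(self, user, password, server):
        if user not in AuthentificationTask.connections:
            # mutate the shared class-level dict instead of assigning a new
            # dict to the instance, which would shadow the class attribute
            AuthentificationTask.connections[user] = ServerConnection(verbose=True)
        AuthentificationTask.connections[user].login(user=user, password=password, server=server)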

Accessing parent class variable from an inner class?

I am working with locust and I am trying to mimic the behavior of a user. However, I am having trouble accessing the parent class variable. Any idea how I can pass it?
class User(TaskSet):
    some_user = ''

    def on_start(self):
        self.get_user()

    def get_user(self):
        some_user = self.client.get...  # gets user

    @task
    class UpdatingUser(TaskSet):
        def updating(self):
            path = "/posts/" + User.some_user
By the time I get to User.some_user I never have the user.
You've not provided all of the code, but the problem may be that get_user() is setting some_user as an instance attribute somewhere, as in self.some_user = foo.
This will only set some_user for that specific instance of User however (so for Bob, Lisa, Beto, User53, etc.), but not for the User class itself. When accessing some_user with self, as in self.some_user, you set it for the specific instance that's executing those statements, not the class. In updating() you're accessing the class attribute User.some_user, not a specific instance attribute like usr53.some_user. In order to update the class attribute, invariant by default for all instances of User, you ought to be setting it with User.some_user = foo in get_user().
Right now in path = "/posts/" + User.some_user, it's trying to access the class attribute which may never have been set. Because nested classes like UpdatingUser can't access the instances of the nesting class (User) that they're called from, UpdatingUser won't be able to access any some_user set with self or any other instance attributes of User. So the solution would be to have get_user() set the class attribute instead of the instance attribute as described in the previous paragraph.
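A minimal sketch of that fix; the endpoint and response parsing are placeholders, since the original get_user() is truncated:

class User(TaskSet):
    some_user = ''

    def on_start(self):
        self.get_user()

    def get_user(self):
        response = self.client.get("/user")  # placeholder endpoint
        # assign to the class attribute so the nested UpdatingUser
        # TaskSet can read User.some_user later
        User.some_user = response.json()["username"]  # placeholder parsing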
This answer is a bit late, but if anyone has this issue: the TaskSet has a parent attribute, which can be used to access the parent's instance variables. The following is what I used for a basic one-time login:
class UserBehaviour(TaskSet):
    def on_start(self):
        self.token = self.login()
        self.headers = {'Authorization': 'Bearer ' + self.token}

    def login(self):
        with self.client.post("/login", catch_response=True) as response:
            return response.json()['token']

    @task
    class UserTask1(TaskSet):
        @task
        def get_data(self):
            self.client.get("/data", headers=self.parent.headers)

class WebsiteUser(HttpLocust):
    task_set = UserBehaviour

How to use a custom __init__ of an app engine Python model class properly?

I'm trying to implement a delayed blog post deletion scheme. So instead of an annoying Are you sure?, you get a 2 minute time frame to cancel deletion.
I want to track what will be deleted and when with a db.Model class (DeleteQueueItem), as I found no way to delete a task from the queue and suspect I can't query what's in there.
Creating a DeleteQueueItem entity should automatically set a delete_when property and add a task to the queue. I use the relative path of blog posts as their key_name and want to use that as the key_name here, too. This led me to a custom __init__:
class DeleteQueueItem(db.Model):
    """Model to keep track of items that will be deleted via task queue."""
    # URL path to the blog post is handled as key_name
    delete_when = db.DateTimeProperty()

    def __init__(self, **kwargs):
        delay = 120  # seconds
        t = datetime.timedelta(seconds=delay)
        deadline = datetime.datetime.now() - t

        key_name = kwargs.get('key_name')
        db.Model.__init__(self, **kwargs)
        self.delete_when = deadline

        taskqueue.add(url='/admin/task/delete_page',
                      countdown=delay,
                      params={'path': key_name})
This seems to work, until I try to delete the entity:
fetched_item = models.DeleteQueueItem.get_by_key_name(path)
This fails with:
TypeError: __init__() takes exactly 1 non-keyword argument (2 given)
What am I doing wrong?
Generally, you shouldn't try to override the __init__ method of Model classes. While it's possible to get right, the correct constructor behaviour is fairly complex and may even change between releases, breaking your code (though we try to avoid doing so!). Part of the reason for this is that the constructor has to be used both by your own code, to construct new models, and by the framework, to reconstitute models loaded from the datastore.
A better approach is to use a factory method, which you call instead of the constructor.
Also, you probably want to add the task at the same time as you write the entity, rather than at creation time. If you don't, you end up with a race condition: the task may execute before you've stored the new entity to the datastore!
Here's a suggested refactoring:
class DeleteQueueItem(db.Model):
    """Model to keep track of items that will be deleted via task queue."""
    # URL path to the blog post is handled as key_name
    delete_when = db.DateTimeProperty()

    @classmethod
    def new(cls, key_name):
        delay = 120  # seconds
        t = datetime.timedelta(seconds=delay)
        deadline = datetime.datetime.now() - t
        return cls(key_name=key_name, delete_when=deadline)

    def put(self, **kwargs):
        delay = 120  # seconds; keep in sync with new()

        def _tx():
            # the entity was constructed with a key_name, so its key
            # (and therefore its path) is available before the put
            taskqueue.add(url='/admin/task/delete_page',
                          countdown=delay,
                          params={'path': self.key().name()},
                          transactional=True)
            return super(DeleteQueueItem, self).put(**kwargs)

        if not self.is_saved():
            return db.run_in_transaction(_tx)
        else:
            return super(DeleteQueueItem, self).put(**kwargs)
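Usage would then look something like this (a sketch, assuming path is the same relative blog-post path used as the key_name in the question):

item = DeleteQueueItem.new(key_name=path)
item.put()  # stores the entity and enqueues the delete task in one transaction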
