What metadata can actually go into a scrapy.Field object? - python

I was reviewing the docs for Items in Scrapy today and came across the following line:
Field objects are used to specify metadata for each field...You can
specify any kind of metadata for each field. There is no restriction
on the values accepted by Field objects.
Within the docs, however, it seems like the only kinds of "metadata" passed to Field objects are functions (in this example, a serializer) or input/output processors.
So I went into Python and tried to make the following Item:
class ScrapyPracticeItem(scrapy.Item):
    name = scrapy.Field()
    age = scrapy.Field('color':'purple')
But this was not accepted syntax either.
I am confused now -- could anyone give me a better definition of what they mean by metadata? Do they only mean transformations of the data in the item? Could it contain more information?

The Field class is simply an alias for the standard Python dictionary. This is the actual description in the Scrapy API docs:
class scrapy.Field([arg])
The Field class is just an alias to the built-in dict class and doesn’t provide any extra functionality or attributes. In other words, Field objects are plain-old Python dicts. A separate class is used to support the item declaration syntax based on class attributes.
So anything that can be used as a dictionary value can be stored as field metadata. A field also needs no constructor arguments at all to be usable on an item. The following example shows how you can declare a color field:
import scrapy
class MyItem(scrapy.Item):
    color = scrapy.Field()
    age = scrapy.Field()
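Field is a dict subclass, so you pass metadata exactly as you would pass it to dict(): either as keyword arguments or as a single mapping. The question's scrapy.Field('color':'purple') fails only because 'color':'purple' is not valid syntax inside a function call. The sketch below uses a local stand-in Field class that mirrors Scrapy's own two-line definition, so it runs without Scrapy installed. With Scrapy you would write scrapy.Field(color='purple'). The keys unit, max-length, serializer and required are arbitrary illustrations.

```python
# Stand-in for scrapy.Field so this runs without Scrapy installed;
# Scrapy's real definition is likewise just a dict subclass.
class Field(dict):
    """Container of field metadata"""


# Keyword arguments become dict entries, just like dict(color='purple').
age = Field(color='purple', unit='years')

# A single mapping works too, handy when keys aren't valid identifiers.
name = Field({'color': 'purple', 'max-length': 40})

# Values can be anything: strings, numbers, callables, other objects.
price = Field(serializer=str, required=True)

print(age['color'])            # purple
print(name['max-length'])      # 40
print(price['serializer'](9))  # 9
print(isinstance(age, dict))   # True
```

Nothing in the Field class checks these keys. They only mean something to code that looks them up. With Scrapy installed, the metadata of a declared field is available on the item class, e.g. MyItem.fields['age'].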
When they say you can set the metadata, the field is the metadata for the item that you are setting. The option to add a serializer isn't actually handled by the Field itself. Other components read that key from the field's metadata: item exporters look up serializer, and item loaders look up input_processor and output_processor.
This is the actual source code for the scrapy.Field class:
class Field(dict):
    """Container of field metadata"""
    # that is it
All data processing and name assignment is taken care of by one of scrapy's custom metaclasses.
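The sketch below makes that division of labour concrete. It is a heavily simplified, hypothetical re-implementation, not Scrapy's actual code. The names ItemMeta, Item and fields are borrowed from Scrapy, but the logic is cut down to the essentials: the metaclass collects every Field class attribute into a fields dict, and each instance stores its values in its own dict and rejects undeclared keys. The Field objects themselves never do anything.

```python
class Field(dict):
    """Container of field metadata (same shape as Scrapy's definition)."""


class ItemMeta(type):
    """Hypothetical metaclass: collects Field attributes into `fields`."""

    def __new__(mcs, name, bases, namespace):
        fields = {}
        # Inherit fields declared on base classes first.
        for base in bases:
            fields.update(getattr(base, 'fields', {}))
        # Then pick up the Field objects declared on this class.
        for key, value in namespace.items():
            if isinstance(value, Field):
                fields[key] = value
        namespace['fields'] = fields
        return super().__new__(mcs, name, bases, namespace)


class Item(metaclass=ItemMeta):
    """Dict-like container that only accepts declared fields."""

    def __init__(self, **values):
        self._values = {}  # per-instance storage for field values
        for key, value in values.items():
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f'{type(self).__name__} does not support field: {key}')
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


class MyItem(Item):
    color = Field()
    age = Field(color='purple')  # arbitrary metadata, never interpreted


item = MyItem(color='red')
item['age'] = 42
print(item['color'], item['age'])  # red 42
print(MyItem.fields['age'])        # {'color': 'purple'}
```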
Scrapy is intentionally structured this way and borrows a lot of its methodology from the Django framework. The Item class and its associated metaclass are designed to work similarly to Django's Model class, which Django uses to communicate with a storage backend, usually a database.
However, because Scrapy items can be extracted and used in innumerable ways, the Item class allows much more flexibility than its Django counterpart. So there really are no limitations on what can be considered metadata or on what can be stored in an Item class.

Related

Django: Are Django models dataclasses?

Can we say that Django models are dataclasses? I don't see a @dataclass decorator on them or on their base class models.Model. However, we do treat them like dataclasses because they don't have constructors and we can create new objects by naming their arguments, for example MyDjangoModel(arg1=..., arg2=...).
On the other hand, Django models also don't have __init__ methods (constructors) or inherit from the NamedTuple class.
What happens under the hood when I create new Django model objects?
A lot of the magic that happens with models, if not nearly all of it, is from its base meta class.
This can be found in django.db.models.base.ModelBase, specifically in its __new__ method.
Whether or not an __init__ method is defined (and, as per Abdul's comment, one actually is, on the Model base class), that doesn't mean a model can or should be considered a dataclass.
As someone else described very eloquently in this SO post:
What are data classes and how are they different from common classes?
Django models do clearly seem to have some kind of data stored in them. Even so, they are more like an easy-to-use (and reusable) set of functions that leverage a database backend. The database is where the real state of an object is stored, and the model just gives access to it.
It's also worth noting that models don't persist data themselves; they retrieve it from the database and write it back.
Take for example this simple model:
class Person(models.Model):
    name = models.CharField(max_length=100)
And then we did something like this in a shell:
person = Person.objects.get(...)
print(person.name)
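When we call Person.objects.get(...), Django runs a query and copies the row's column values onto the new instance. Accessing person.name afterwards just reads that cached value without another query. The exceptions are deferred fields and related objects, which are loaded lazily on first access.
The database remains the source of truth, though. The instance is only a snapshot of the row, and it won't reflect later changes until you call person.refresh_from_db().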
With that in mind, inherently, django models ARE NOT dataclasses. They are plain old regular classes.
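Here is the distinction in concrete form. A keyword-argument constructor inherited from an ordinary base class is enough to allow calls like Person(name='Ada'). It does not make the class a dataclass. The sketch below uses hypothetical, Django-free classes (SimpleModel, Person, PersonData). Django's real Model.__init__ also handles positional arguments, defaults and related objects, which are omitted here.

```python
import dataclasses


class SimpleModel:
    """Hypothetical base class with a generic keyword-argument constructor."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


class Person(SimpleModel):
    name = None  # stands in for a field declaration


@dataclasses.dataclass
class PersonData:
    name: str


p = Person(name='Ada')
d = PersonData(name='Ada')
print(p.name, d.name)                        # Ada Ada
print(dataclasses.is_dataclass(Person))      # False
print(dataclasses.is_dataclass(PersonData))  # True
```

Both classes accept the same call syntax. Only the one processed by the @dataclass decorator gets the generated __init__, __repr__ and __eq__ and is recognized by dataclasses.is_dataclass.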
Django does not work with dataclasses. You could define a custom model field, but that will likely take some development work.

How to do typing in django model field?

With the recent addition of inlay type hints for Python in VS Code, I noticed that a model field is shown as Field[Unknown, Unknown].
As you know, if you don't provide a type for a field, you won't get attribute hints for it, and the field will be shown as Unknown.
You could just annotate the field as str if, for example, you have a CharField, something like this:
field: str = models.CharField()
The problem is that if you use a strongly typed linter, it will report an error saying the assigned value is not of type str.
So after seeing this inlay, I started playing around with the generic and noticed that its second parameter is the type used to represent the field attribute.
My question is: does anyone know what the first parameter of the generic is used for, and where this generic is even created? Inside the package, the Field class does not inherit from Generic.
Django does not allow mutating fields, since a change to a field of a model would lead to a database migration.
Nevertheless, under the hood many fields use the same types and are basically interchangeable. For example, an ImageField just stores a file path as a string, similar to what a CharField could do. The internal representation of the data, or how it is stored, might still differ for some fields.
Still, all of the fields come with a lot of functionality and are usually deeply embedded and wired into the framework. Therefore Django model fields are not generic. I guess your IDE is doing something... not appropriate :)
In the documentation you can find additional information on fields. Here you can find a list of all built-in fields.
edit:
I was thinking some more about the issue. Almost all fields (I believe all of them) extend the models.Field class, so this might be why your IDE shows them this way. Some kind of polymorphism is probably being picked up by the IDE's type analysis.
I think the best way to type a field is to use a union, like the example below. That way you get attribute hints (methods and attributes) for the field's value and avoid linting errors under both strict and basic type checking:
username: str|CharField = CharField()
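On the question of the first parameter: at runtime Django's Field really isn't generic. The Field[Unknown, Unknown] shown in the editor most likely comes from type stubs such as django-stubs, which declare Field as generic over two type variables. One is a "set" type (what you may assign to the attribute) and the other a "get" type (what reading the attribute returns), which would make the first parameter the assignment type. The sketch below imitates that idea with a hypothetical TypedField descriptor using only the typing module. It is not the stubs' actual code.

```python
from typing import Any, Generic, Optional, TypeVar, Union, overload

_ST = TypeVar('_ST')  # type accepted on assignment ("set type")
_GT = TypeVar('_GT')  # type returned on attribute access ("get type")


class TypedField(Generic[_ST, _GT]):
    """Hypothetical descriptor generic over its set and get types."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.attname = name

    @overload
    def __get__(self, instance: None, owner: type) -> 'TypedField[_ST, _GT]': ...
    @overload
    def __get__(self, instance: object, owner: type) -> _GT: ...

    def __get__(self, instance: Optional[object], owner: type) -> Any:
        if instance is None:
            return self  # class access returns the field object itself
        return instance.__dict__[self.attname]

    def __set__(self, instance: object, value: _ST) -> None:
        instance.__dict__[self.attname] = value


# A CharField-like field: accepts str or int on assignment, always returns str.
class CharLikeField(TypedField[Union[str, int], str]):
    def __set__(self, instance: object, value: Union[str, int]) -> None:
        super().__set__(instance, str(value))


class User:
    username = CharLikeField()


u = User()
u.username = 42                                  # allowed by the "set" type
print(repr(u.username))                          # '42'
print(isinstance(User.username, CharLikeField))  # True
```

A type checker reading these annotations would infer u.username as str, accept the assignment of 42, and still treat User.username as the field object. With two parameters, the accepted and returned types can differ.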

Django model polymorphism without Multi-Table Inheritance and additional JOIN

I’m quite new to Django and I’m trying to implement polymorphism inside a Django model, but I can’t see how to do it. Before going on, I should say I’ve already tried django-model-utils and django-polymorphic, but they don’t do exactly what I’m looking for.
I have a model called Player, each player has a Role and each Role has different behaviours (i.e. their methods return different values):
class Player(models.Model):
    username = models.TextField()
    role = models.ForeignKey(Role, on_delete=models.CASCADE)  # Role is another model with a field called 'name'
    def allow_action(self):
        pass  # some stuff
class RoleA():
    def allow_action(self):
        pass  # some specific stuff
class RoleB():
    pass
Every time I retrieve any Player instance (for example through Player.objects.filter(…)), I want its allow_action() method to be overridden by the custom one defined in the matching class (RoleA, RoleB, etc.). If that class doesn't define a method with the same name, it should fall back to the default method in Player. (RoleA, RoleB, etc. are named after the role names stored in Player.role.name.)
CONSTRAINTS:
Since the subclasses (RoleA, RoleB, etc.) don't add new fields but only override methods, all data has to be stored in Player’s table. So I don’t want to use Django multi-table inheritance but something more similar to proxies.
I don’t want to perform an additional JOIN to determine the specific subclass type, since all the information needed is stored in Player’s table.
I think this is a standard polymorphism pattern, but I don’t see how to implement it in Django using the same table for all players. (I've already implemented this polymorphism, just not linked to a Django model.) I’ve seen that Django has a kind of inheritance called “Proxy”. As far as I understand, though, it doesn’t let me make queries like Player.objects.filter(…) and get back instances whose methods are overridden by the custom ones.
Thanks in advance.
Disclaimer: I've not used django-polymorphic, and this code is based on 5 minutes spent scanning the docs and is entirely untested, but I'd be interested to see if it works:
from django.db import models
from polymorphic.models import PolymorphicModel
class Role(PolymorphicModel):
    name = models.CharField(max_length=50)
class RoleA(Role):
    def allow_action(self):
        pass  # some specific stuff...
class RoleB(Role):
    pass
class Player(models.Model):
    username = models.TextField()
    role = models.ForeignKey(Role, on_delete=models.CASCADE)
    def allow_action(self):
        if callable(getattr(self.role, "allow_action", None)):
            return self.role.allow_action()
        else:
            pass  # default action...
Now I believe you should be able to create an instance of Role, RoleA, or RoleB and have Player point to it in the foreign key. Calling allow_action() on an instance of Player will check to see if the instance of Role (or RoleA, RoleB etc) has a callable attribute allow_action() and if so, it will use that, otherwise it will use the default.
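The getattr-with-fallback dispatch does not depend on django-polymorphic. Here is a plain-Python sketch of the same idea that also meets the question's no-JOIN constraint. It assumes the role name is stored as a column on the player row itself (a hypothetical role_name attribute) and is mapped to a behaviour class through a hypothetical ROLE_BEHAVIOURS registry. In Django, Player would be the model, and the registry lookup would run in Python after the query, so no extra query or JOIN would be needed.

```python
class RoleA:
    def allow_action(self, player):
        return f'{player.username}: RoleA-specific action'


class RoleB:
    pass  # no override, so Player's default is used


# Hypothetical registry mapping the role name stored on the player row
# to a behaviour class; no database lookup is involved.
ROLE_BEHAVIOURS = {'RoleA': RoleA, 'RoleB': RoleB}


class Player:
    def __init__(self, username, role_name):
        self.username = username
        self.role_name = role_name  # on a real model, e.g. a CharField

    def allow_action(self):
        behaviour_cls = ROLE_BEHAVIOURS.get(self.role_name)
        role = behaviour_cls() if behaviour_cls else None
        override = getattr(role, 'allow_action', None)
        if callable(override):
            return override(self)  # role-specific behaviour
        return f'{self.username}: default action'


print(Player('alice', 'RoleA').allow_action())  # alice: RoleA-specific action
print(Player('bob', 'RoleB').allow_action())    # bob: default action
```

Keeping the role name in the player's own table is what avoids the JOIN. The trade-off is that the set of roles lives in code rather than in a Role table.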

Following backreferences of unknown kinds in NDB

I'm in the process of writing my first RESTful web service atop GAE and the Python 2.7 runtime; I've started out using Guido's shiny new ndb API.
However, I'm unsure how to solve a particular case without the implicit back-reference feature of the original db API. If the user-agent requests a particular resource and those resources 1 degree removed:
host/api/kind/id?depth=2
What's the best way to discover a related collection of entities from the "one" in a one-to-many relationship, given that the kind of the related entity is unknown at development time?
I'm unable to use a replacement query as described in a previous SO inquiry due to the latter restriction. The fact that my model is definable at runtime (and therefore isn't hardcoded) prevents me from using a query to filter properties for matching keys.
Ancestor and other kindless queries are also out due to the datastore limitation that prevents me from filtering on a property without the kind specified.
Thus far, the only idea I've had (beyond reverting to the db API) is to use a cross-group transaction to write my own reference on the "one". I could either update an ndb.StringProperty(repeated=True) listing all the related kinds whenever an entity of a new kind is introduced, or simply maintain a list of keys in an ndb.KeyProperty(repeated=True) on the "one" every time a related "many" entity is written to the datastore.
I'm hoping someone more experienced than myself can suggest a better approach.
Given jmort253's suggestion, I'll try to augment my question with a concrete example adapted from the docs:
from google.appengine.ext import ndb
class Contact(ndb.Expando):
    """ The One """
    # basic info
    name = ndb.StringProperty()
    birth_day = ndb.DateProperty()
    # If I were using db, a collection called 'phone_numbers' would be implicitly
    # created here. I could use this property to retrieve related phone numbers
    # when this entity was queried. Since NDB lacks this feature, the service
    # will neither have a reference to query nor the means to know the
    # relationship exists in the first place since it cannot be hard-coded. The
    # data model is extensible and user-defined at runtime; most relationships
    # will be described only in the data, and must be discoverable by the server.
    # In this case, when Contact is queried, I need a way to retrieve the
    # collection of phone numbers.
    # Company info.
    company_title = ndb.StringProperty()
    company_name = ndb.StringProperty()
    company_description = ndb.StringProperty()
    company_address = ndb.StringProperty()  # ndb has no PostalAddressProperty (db-only)
class PhoneNumber(ndb.Expando):
    """ The Many """
    # no collection_name='phone_numbers' equivalent exists for the key property
    contact = ndb.KeyProperty(kind='Contact')
    number = ndb.StringProperty()  # ndb has no PhoneNumberProperty (db-only)
Interesting question! So basically you want to look at the Contact class and find out if there is some other model class that has a KeyProperty referencing it; in this example PhoneNumber (but there could be many).
I think the solution is to ask your users to explicitly add this link when the PhoneNumber class is created.
You can make this easy for your users by giving them a subclass of KeyProperty that takes care of this; e.g.
class LinkedKeyProperty(ndb.KeyProperty):
    def _fix_up(self, cls, code_name):
        super(LinkedKeyProperty, self)._fix_up(cls, code_name)
        modelclass = ndb.Model._kind_map[self._kind]
        collection_name = '%s_ref_%s_to_%s' % (cls.__name__,
                                               code_name,
                                               modelclass.__name__)
        setattr(modelclass, collection_name, (cls, self))
Exactly how you pick the name for the collection and the value to store there is up to you; just put something there that makes it easy for you to follow the link back. The example would create a new attribute on Contact:
Contact.PhoneNumber_ref_contact_to_Contact == (PhoneNumber, PhoneNumber.contact)
[edited to make the code working and to add an example. :-) ]
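The _fix_up hook is ndb's way of letting a property learn its owner class and attribute name when the model class is created. Modern Python offers the same hook as __set_name__. Below is a minimal, GAE-free sketch of the same back-reference registration. A hypothetical KIND_MAP stands in for ndb.Model._kind_map, and the collection is a single hypothetical back_references list on the target class instead of one generated attribute per link, which makes unknown referrers easy to discover at runtime.

```python
# Hypothetical stand-in for ndb.Model._kind_map: kind name -> model class.
KIND_MAP = {}


class Model:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        KIND_MAP[cls.__name__] = cls  # register the kind by class name
        cls.back_references = []      # (referring class, attribute name) pairs


class LinkedKeyProperty:
    """Records a back-reference on the target kind when its owner is created."""

    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        # Like ndb's _fix_up: runs once, while the owning class is being created.
        KIND_MAP[self.kind].back_references.append((owner, name))


class Contact(Model):
    pass


class PhoneNumber(Model):
    contact = LinkedKeyProperty(kind='Contact')


class Email(Model):
    owner = LinkedKeyProperty(kind='Contact')


for cls, attr in Contact.back_references:
    print(cls.__name__, attr)
# PhoneNumber contact
# Email owner
```

As in the ndb version, the target kind must already be registered when the referring class is defined. With real ndb models, each recorded pair could then drive a query such as PhoneNumber.query(PhoneNumber.contact == contact_key).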
Sounds like a good use case for ndb.StructuredProperty.

Django's Model fields are defined on the class level?

Maybe my question is a little childish. A Django model is typically defined like this:
class DummyModel(models.Model):
    field1 = models.CharField(max_length=100)
    field2 = models.CharField(max_length=100)
As per my understanding, field1 and field2 are defined on the class level instead of instance level. So different instances will share the same field value. How can this be possible considering a web application should be thread safe? Am I missing something in my python learning curve?
You are correct that normally attributes declared at the class level will be shared between instances. However, Django uses some clever code involving metaclasses to allow each instance to have different values. If you're interested in how this is possible, Marty Alchin's book Pro Django has a good explanation - or you could just read the code.
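The sketch below shows why class-level declarations don't mean shared values. It uses hypothetical Field and Model classes; Django's real machinery in ModelBase and Model.__init__ is more involved, but the values likewise end up on each instance. The class attribute is a descriptor that acts as the specification, while each instance's value lives in that instance's own __dict__. So every object, and therefore every request working with its own objects, gets its own data.

```python
class Field:
    """Hypothetical field: a descriptor acting as a per-class specification."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the specification
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value  # stored per instance


class Model:
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


class DummyModel(Model):
    field1 = Field()
    field2 = Field()


a = DummyModel(field1='first')
b = DummyModel(field1='second')
print(a.field1, b.field1)                    # first second
print(isinstance(DummyModel.field1, Field))  # True
```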
Think of the models you define as specifications. You specify the fields that you want, and when Django hands you back an instance, it has used your specifications to build you an entirely different object that looks the same.
For instance,
field1 = models.CharField()
When you assign a value to field1, such as 'I am a field', don't you think it's strange that you can assign a string to a field that is supposed to be a 'CharField'? But when you save that instance, everything still works?
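Django looks at the CharField, says "this should be a string", and hands it off to you. When you save the instance, Django uses the specification to convert the value for the database and writes it. Full validation against the specification only happens if you call full_clean(), which ModelForms do for you.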
This is a very simplistic view of course, but it should highlight the difference between defining a model, and the actual instance you get to work with.
