I want to store an enum in the database, according to this.
So let's say I have a Gender enum and a Person model. I want to do a select like Person.select().where(Person.gender == Gender.MALE).
This can be achieved by creating a GenderField in Person as described here. But the gender won't be in the database as a table, and I want the Person to have foreign keys to the Gender table.
So how can I store static Gender data in the database, then query the Person table by the enum values?
You can, as part of the table creation process, populate the gender table:
class Gender(Model):
    label = CharField(primary_key=True)

    class Meta:
        database = db

def create_schema():
    # Use an execution context to ensure the connection is closed when
    # we're finished.
    with db.execution_context():
        db.create_tables([Gender, ...], True)
        if not Gender.select().exists():
            defaults = ['MALE', 'FEMALE', 'UNSPECIFIED']
            Gender.insert_many([{'label': label} for label in defaults]).execute()
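To tie Person to that table and query by the enum values, a minimal sketch of the foreign-key variant could look like this (not part of the answer above; the field names are assumptions):

class Person(Model):
    name = CharField()
    # Foreign key to the Gender table; the related primary key is the 'label' column.
    gender = ForeignKeyField(Gender)

    class Meta:
        database = db

# Comparing the foreign-key field to a value compares against Gender's primary key:
males = Person.select().where(Person.gender == 'MALE')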
The only reason to make Gender a table in this example, though, would be if you plan to dynamically add new Gender labels at run-time. If the set of values is fixed, my opinion is you're better off doing something like this:
class Person(Model):
    GENDER_MALE = 'M'
    GENDER_FEMALE = 'F'
    GENDER_UNSPECIFIED = 'U'

    name = CharField()
    gender = CharField()

# Then, simply:
Person.create(name='Huey', gender=Person.GENDER_MALE)
Person.create(name='Zaizee', gender=Person.GENDER_UNSPECIFIED)
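With the constants approach, the select from the question becomes a plain filter on the CharField:

# Filter on the plain CharField using the class-level constant:
males = Person.select().where(Person.gender == Person.GENDER_MALE)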
I set up the column names in the class like below:
class Stat1(Base):
    __tablename__ = 'stat1'
    __table_args__ = {'sqlite_autoincrement': True}

    id = Column(VARCHAR, primary_key=True, nullable=False)
    Date_and_Time = Column(VARCHAR)
    IP_Address = Column(VARCHAR)
    Visitor_Label = Column(VARCHAR)
    Browser = Column(VARCHAR)
    Version = Column(VARCHAR)
The CSV file does not use underscores in the column names; it is a file downloaded from the internet. For instance, when I import it, column headers like "Date_and_Time" come in as "Date and Time".
I had assumed (that's wrong, right?) that the CSV's column names would map to the class columns I set up, but that's not happening, and the queries are not running properly because of it. I am getting messages like this:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such
column: stat1.Date_and_Time [SQL: 'SELECT stat1.id AS stat1_id,
stat1."Date_and_Time" AS "stat1_Date_and_Time", stat1."IP_Address" AS
"stat1_IP_Address"...etc.
Is there a way to map these automatically so that queries are successful? Or a way to change the CSV's column headings automatically to insert an UNDERSCORE in the column headings to match with the columns defined in the Class?
There are a couple of different ways that you can approach this:
Implement Your Own De-serialization Logic
This means that the process of reading your CSV file and mapping its columns to your Base model class' attributes is done manually (as in your question), and then you read / map your CSV using your own custom code.
I think, in this scenario, having underscores in your model class attributes (Stat1.Date_and_Time) but not in your CSV header (..., "Date and Time", ...) will complicate your code a bit. However, depending on how you've implemented your mapping code, you can set your Column to use one model attribute name (Stat1.Date_and_Time) and a different database column name (e.g. have Stat1.Date_and_Time map to the database column "Date and Time"). To accomplish this, pass the name argument as below:
class Stat1(Base):
    __tablename__ = 'stat1'
    __table_args__ = {'sqlite_autoincrement': True}

    id = Column(name = 'id', type_ = VARCHAR, primary_key = True, nullable = False)
    Date_and_Time = Column(name = 'Date and Time', type_ = VARCHAR)
    IP_Address = Column(name = 'IP Address', type_ = VARCHAR)
    # etc.
Now when you read records from your CSV file, you will need to load them into the appropriate model attributes in your Stat1 class. A pseudo-code example would be:
id, date_and_time, ip_address = read_csv_record(csv_record)
# Let's assume the "read_csv_record()" function reads your CSV record and returns
# the appropriate values for `id`, `Date_and_Time`, and `IP_Address`.

my_record = Stat1(id = id,
                  Date_and_Time = date_and_time,
                  IP_Address = ip_address,
                  # etc.
                  )
Here, the trick is in implementing your read_csv_record() function so that it reads and returns the column values for your model attributes, so that you can then pass them appropriately to your Stat1() constructor.
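For illustration only, a minimal variant that walks the whole file could use the standard csv module; the header names below ('id', 'Date and Time', 'IP Address') are assumptions about your file:

import csv

def read_csv_records(path):
    # Yield one (id, date_and_time, ip_address) tuple per row, keyed by the
    # space-separated header names used in the downloaded CSV.
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            yield row['id'], row['Date and Time'], row['IP Address']

# Usage:
# for id_, date_and_time, ip_address in read_csv_records('your_csv_file.csv'):
#     my_record = Stat1(id=id_, Date_and_Time=date_and_time, IP_Address=ip_address)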
Use SQLAthanor
An (I think easier) alternative to implementing your own de-serialization solution is to use a library like SQLAthanor (full disclosure: I'm the library's author, so I'm a bit biased). Using SQLAthanor, you can either:
Create your Stat model class programmatically:
from sqlathanor import generate_model_from_csv

Stat1 = generate_model_from_csv('your_csv_file.csv',
                                'stat1',
                                primary_key = 'id')
Please note, however, that if your column header names are not ANSI SQL standard column names (if they contain spaces, for example), this will likely produce an error.
Define your model, and then create instances from your CSV.
To do this, you would define your model very similarly to how you do above:
from sqlathanor import BaseModel

class Stat1(BaseModel):
    __tablename__ = 'stat1'
    __table_args__ = {'sqlite_autoincrement': True}

    id = Column(name = 'id', type_ = VARCHAR, primary_key = True, nullable = False, supports_csv = True, csv_sequence = 1)
    Date_and_Time = Column(name = 'Date and Time', type_ = VARCHAR, supports_csv = True, csv_sequence = 2)
    IP_Address = Column(name = 'IP Address', type_ = VARCHAR, supports_csv = True, csv_sequence = 3)
    # etc.
The supports_csv argument tells your Stat1 class that model attribute Stat1.id can be de-serialized from (and serialized to) CSV, and the csv_sequence argument indicates that it will always be the first column in a CSV record.
Now you can create a new Stat1 instance (a record in your database) by passing your CSV record to Stat1.new_from_csv():
# let's assume you have loaded a single CSV record into a variable "csv_record"
my_record = Stat1.new_from_csv(csv_record)
and that's it! Now your my_record variable will contain an object representation of your CSV record, which you can then commit to the database if and when you choose. Since there is a wide variety of ways that CSV files can be constructed (different delimiters, wrapping strategies, etc.), there are a large number of configuration arguments that can be supplied to .new_from_csv(); you can find all of them documented here: https://sqlathanor.readthedocs.io/en/latest/using.html#new-from-csv
SQLAthanor is an extremely robust library for moving data into / out of CSV and SQLAlchemy, so I strongly recommend you review the documentation. Here are the important links:
Github Repo
Comprehensive Documentation
PyPi
Hope this helps!
I have this query that joins multiple tables together:
select
    p.player_id
    , d.player_data_1
    , l.year
    , l.league
    , s.stat_1
    , l.stat_1_league_average
from
    stats s
    inner join players p on p.player_id = s.player_id
    left join player_data d on d.other_player_id = p.other_player_id
    left join league_averages as l on l.year = s.year and l.league = s.league
where
    p.player_id = 123
My models look like this:
class Stats(models.Model):
    player_id = models.ForeignKey(Player)
    stat_1 = models.IntegerField()
    year = models.IntegerField()
    league = models.IntegerField()

class Player(models.Model):
    player_id = models.IntegerField(primary_key=True)
    other_player_id = models.ForeignKey(PlayerData)

class PlayerData(models.Model):
    other_player_id = models.IntegerField(primary_key=True)
    player_data_1 = models.TextField()

class LeagueAverages(models.Model):
    year = models.IntegerField()
    league = models.IntegerField()
    stat_1_league_average = models.DecimalField()
I can do something like this:
Stats.objects.filter(player_id=123).select_related('player')
to do the first join. For the second join, I tried:
Stats.objects.filter(player_id=123).select_related('player').select_related('player_data')
but I got this error:
django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'player_data'. Choices are: player
How would I do the third join considering that year and league aren't foreign keys in any of the tables? Thanks!
select_related(*fields) Returns a QuerySet that will “follow” foreign-key relationships, [...]
According to the Django documentation, select_related follows foreign-key relationships. player_data is neither a foreign key nor even a field of Stats. If you want to INNER join PlayerData and Player, you can follow their foreign keys from Stats. In your case, use the double-underscore syntax to get to PlayerData:

(Stats.objects.all()
    .select_related('player_id')
    .select_related('player_id__other_player_id'))
As for joining LeagueAverages: there is no way to join models without an appropriate foreign key other than to use raw SQL. Have a look at a related question: Django JOIN query without foreign key. By using .raw(), your LEFT join (which, by the way, is also not that easy without raw SQL: Django Custom Left Outer Join) can be taken care of as well; see the sketch below.
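For illustration, a hedged sketch of that raw() route (not from the linked answers; the table names assume Django's default <app>_<model> naming):

query = """
    SELECT s.id, s.player_id, s.stat_1, s.year, s.league,
           l.stat_1_league_average
    FROM myapp_stats s
    LEFT JOIN myapp_leagueaverages l
        ON l.year = s.year AND l.league = s.league
    WHERE s.player_id = %s
"""
# Columns selected beyond the model's own fields become attributes on each object.
for stat in Stats.objects.raw(query, [123]):
    print(stat.stat_1, stat.stat_1_league_average)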
Quick notes about your models:
Each model by default has an automatically incrementing primary key that can be accessed via .id or .pk, so there is no need to add, for example, player_id.
A models.ForeignKey field references an object, not its id. Therefore it is more intuitive to rename, for example, player_id to player. If you name your field player, Django automatically lets you access its id via player_id.
How can I update a table's columns and column data types in peewee?
I have already created the table Person in the database from my model. But I've now added some new fields to the model and changed the type of certain existing fields/columns.
The following doesn't update the table structure:
psql_db = PostgresqlExtDatabase(
    'MyDB',
    user='foo',
    password='bar',
    host='',
    port='5432',
    register_hstore=False
)

class PsqlModel(Model):
    """A base model that will use our Postgresql database"""
    class Meta:
        database = psql_db

class Person(PsqlModel):
    name = CharField()
    birthday = DateField()        # New field
    is_relative = BooleanField()  # Field type changed from varchar to bool

    def __str__(self):
        return '%s, %s, %s' % (self.name, self.birthday, self.is_relative)

psql_db.connect()

# is there a function to update/change the models table columns??
psql_db.create_tables([Person], True)  # Hoping an update of the table columns occurs

# Error because no column birthday and incorrect type for is_relative
grandma_glen = Person.create(name='Glen', birthday=date(1966, 1, 12), is_relative=True)
From the documentation: http://docs.peewee-orm.com/en/latest/peewee/example.html?highlight=alter
Adding fields after the table has been created will require you to either drop the table and re-create it or manually add the columns using an ALTER TABLE query.
Alternatively, you can use the schema migrations extension to alter your database schema using Python.
From http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#migrate:
# Postgres example:
my_db = PostgresqlDatabase(...)
migrator = PostgresqlMigrator(my_db)

title_field = CharField(default='')
status_field = IntegerField(null=True)

migrate(
    migrator.add_column('some_table', 'title', title_field),
    migrator.rename_column('some_table', 'pub_date', 'publish_date'),
    migrator.add_column('some_table', 'status', status_field),
    migrator.drop_column('some_table', 'old_column'),
)
Many other operations are possible as well.
So, first you will need to alter the table schema, and then you can update your model to reflect those changes.
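Applied to the Person model above, a hedged sketch might look like the following (the lowercase table name 'person' follows peewee's default naming and is an assumption; the column type is changed here by dropping and re-adding the column, which discards its existing values):

from peewee import DateField, BooleanField
from playhouse.migrate import PostgresqlMigrator, migrate

migrator = PostgresqlMigrator(psql_db)

migrate(
    # Add the new birthday column.
    migrator.add_column('person', 'birthday', DateField(null=True)),
    # Replace the old varchar is_relative column with a boolean one.
    migrator.drop_column('person', 'is_relative'),
    migrator.add_column('person', 'is_relative', BooleanField(default=False)),
)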
I have a peewee model like the following:
class Parrot(Model):
    is_alive = BooleanField()
    bought = DateField()
    color = CharField()
    name = CharField()
    id = IntegerField()
I get this data from the user and look for the corresponding id in the (MySQL) database. What I want to do now is to update those attributes which are not set/empty at the moment. For example, if the new data has the following attributes:
is_alive = True
bought = '1965-03-14'
color = None
name = 'norwegian'
id = 17
and the data from the database has:
is_alive = False
bought = None
color = 'blue'
name = ''
id = 17
I would like to update the bought date and the name (which are not set or empty), but without changing the is_alive status. In this case, I could get the new and old data in separate class instances, manually create a list of attributes, compare them one by one, update where necessary, and finally save the result to the database. However, I feel there might be a better way of handling this, one which could also be used for any class with any attributes. Is there?
MySQL Solution:
UPDATE my_table SET
      bought = (CASE WHEN bought IS NULL OR bought = '' THEN ? ELSE bought END)
    , name   = (CASE WHEN name IS NULL OR name = '' THEN ? ELSE name END)
    -- include other field values, if any, here
WHERE
    id = ?
Use your scripting language to set the parameter values.
If the parameters match the old values, the update will not be performed, by default.
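To run that from peewee, a minimal sketch could pass the new values as parameters through execute_sql(); the table name 'parrot' and the db handle are assumptions:

sql = """
    UPDATE parrot SET
        bought = (CASE WHEN bought IS NULL OR bought = '' THEN %s ELSE bought END),
        name   = (CASE WHEN name IS NULL OR name = '' THEN %s ELSE name END)
    WHERE id = %s
"""
# peewee's MySQL backend uses %s-style placeholders for parameters.
db.execute_sql(sql, ('1965-03-14', 'norwegian', 17))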
I am a relative newcomer to SQLAlchemy and have read the basic docs. I'm currently following Mike Driscoll's MediaLocker tutorial and modifying/extending it for my own purpose.
I have three tables (loans, people, cards). Card to Loan and Person to Loan are both one-to-many relationships and modelled as such:
from sqlalchemy import Table, Column, DateTime, Integer, Boolean, ForeignKey, Unicode
from sqlalchemy.orm import backref, relation
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine("sqlite:///cardsys.db", echo=True)
DeclarativeBase = declarative_base(engine)
metadata = DeclarativeBase.metadata

class Loan(DeclarativeBase):
    """
    Loan model
    """
    __tablename__ = "loans"

    id = Column(Integer, primary_key=True)
    card_id = Column(Unicode, ForeignKey("cards.id"))
    person_id = Column(Unicode, ForeignKey("people.id"))
    date_issued = Column(DateTime)
    date_due = Column(DateTime)
    date_returned = Column(DateTime)
    issue_reason = Column(Unicode(50))
    person = relation("Person", backref="loans", cascade_backrefs=False)
    card = relation("Card", backref="loans", cascade_backrefs=False)

class Card(DeclarativeBase):
    """
    Card model
    """
    __tablename__ = "cards"

    id = Column(Unicode(50), primary_key=True)
    active = Column(Boolean)

class Person(DeclarativeBase):
    """
    Person model
    """
    __tablename__ = "people"

    id = Column(Unicode(50), primary_key=True)
    fname = Column(Unicode(50))
    sname = Column(Unicode(50))
When I try to create a new loan (using the below method in my controller), it works fine for unique cards and people, but once I try to add a second loan for a particular person or card, it gives me a "non-unique" error. Obviously it's not unique; that's the point. But I thought SQLAlchemy would take care of the behind-the-scenes stuff for me and add the correct existing person or card id as the FK in the new loan, rather than trying to create new person and card records. Is it up to me to query the db to check PK uniqueness and handle this manually? I got the impression this should be something SQLAlchemy might be able to handle automatically.
def addLoan(session, data):
    loan = Loan()
    loan.date_due = data["loan"]["date_due"]
    loan.date_issued = data["loan"]["date_issued"]
    loan.issue_reason = data["loan"]["issue_reason"]

    person = Person()
    person.id = data["person"]["id"]
    person.fname = data["person"]["fname"]
    person.sname = data["person"]["sname"]
    loan.person = person

    card = Card()
    card.id = data["card"]["id"]
    loan.card = card

    session.add(loan)
    session.commit()
In the MediaLocker example, new rows are created with an auto-increment PK (even for duplicates, which doesn't conform to normalisation rules). I want to have a normalised database (even in a small project, just for best practice while learning) but can't find any examples online to study.
How can I achieve the above?
It's up to you to retrieve and assign the existing Person or Card object to the relationship before attempting to add a new one with a duplicate primary key. You can do this with a couple of small changes to your code.
def addLoan(session, data):
    loan = Loan()
    loan.date_due = data["loan"]["date_due"]
    loan.date_issued = data["loan"]["date_issued"]
    loan.issue_reason = data["loan"]["issue_reason"]

    person = session.query(Person).get(data["person"]["id"])
    if not person:
        person = Person()
        person.id = data["person"]["id"]
        person.fname = data["person"]["fname"]
        person.sname = data["person"]["sname"]
    loan.person = person

    card = session.query(Card).get(data["card"]["id"])
    if not card:
        card = Card()
        card.id = data["card"]["id"]
    loan.card = card

    session.add(loan)
    session.commit()
There are also some solutions for get_or_create functions, if you want to wrap it into one step.
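For reference, a minimal get_or_create-style helper for this case could look like the sketch below (an illustration only, not a specific library's implementation):

def get_or_create(session, model, pk, **kwargs):
    # Return the existing row for this primary key, or stage a new instance.
    instance = session.query(model).get(pk)
    if instance is None:
        instance = model(id=pk, **kwargs)
        session.add(instance)
    return instance

# e.g. person = get_or_create(session, Person, data["person"]["id"],
#                             fname=data["person"]["fname"],
#                             sname=data["person"]["sname"])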
If you're loading large numbers of records into a new database from scratch, and your query is more complex than a get (the session object is supposed to cache get lookups on its own), you could avoid the queries altogether at the cost of memory by adding each new Person and Card object to a temporary dict by ID, and retrieving the existing objects there instead of hitting the database.