Database field length is not enforced - python

I am using web2py (python) with sqlite3 database (test flowers database :) ). Here is the declaration of the table:
db.define_table('flower',
    Field('code', type='string', length=4, required=True, unique=True),
    Field('name', type='string', length=100, required=True),
    Field('description', type='string', length=250, required=False),
    Field('price', type='float', required=True),
    Field('photo', 'upload'))
Which translates into correct SQL in sql.log:
CREATE TABLE flower(
id INTEGER PRIMARY KEY AUTOINCREMENT,
code CHAR(4),
name CHAR(200),
description CHAR(250),
price CHAR(5),
photo CHAR(512)
);
But when I insert a value of "code" field that's greater than 4 chars, it still inserts. I tried setting to CHAR(10) (simple test, I guess) with the same result.
>>> db.flower.insert(code="123456789999", name="flower2", description="test flower 2", price="5.00")
1L
The same problem applies to all fields where I set the length. I also tried validation (although I am not 100% sure I am using it correctly). This is also in the flower model file, flowers.py, where the table is defined, and it follows the table declaration:
db.flower.code.requires = [ IS_NOT_EMPTY(), IS_LENGTH(4), IS_NOT_IN_DB(db, 'flower.code')]
Documentation on this is here, but I can't find anything that's limiting SQLite3 or web2py length check of the string. I would expect to see an error on insert.
I would appreciate some help on this. What did I miss in the documentation? I used Symfony2 with PHP and MySQL before and would expect similar behaviour here.

SQLite is not like other databases. For all (most) practical purposes columns are untyped and INSERTs will always succeed and not lose data or precision (meaning, you can INSERT a text value into a REAL field if you want).
The declared type of the column is used for a system called "type affinity", which is described here: https://www.sqlite.org/datatype3.html.
Once you get used to it, it's kind of fun -- but definitely not what you'd expect!
You have to perform length checking in your code before issuing the INSERT.
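A minimal sketch of such a check, using the standard-library sqlite3 module rather than web2py. The helper name insert_flower and the constant MAX_CODE_LENGTH are illustrative assumptions, not part of the original code:

```python
import sqlite3

MAX_CODE_LENGTH = 4  # assumption: mirrors the CHAR(4) declaration in the question


def insert_flower(conn, code, name):
    """Insert a flower row, rejecting codes longer than MAX_CODE_LENGTH."""
    if len(code) > MAX_CODE_LENGTH:
        raise ValueError("code must be at most %d characters" % MAX_CODE_LENGTH)
    conn.execute("INSERT INTO flower (code, name) VALUES (?, ?)", (code, name))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flower ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, code CHAR(4), name CHAR(100))"
)

# SQLite itself stores the over-long value despite the CHAR(4) declaration.
conn.execute(
    "INSERT INTO flower (code, name) VALUES (?, ?)", ("123456789999", "flower2")
)
stored = conn.execute("SELECT code FROM flower").fetchone()[0]
```

Here stored holds the full 12-character string, while insert_flower raises ValueError for the same input before any SQL is issued.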

As already mentioned, SQLite does not enforce character field length declarations (see https://www.sqlite.org/faq.html#q9). Furthermore, the IS_LENGTH validator is only applied if you do the insert via a SQLFORM submission or via the .validate_and_insert method -- if you just use the .insert method, the validators stored in the requires attribute are not applied, so you will get no error.

Related

Is my code susceptible to SQL injection attack? [duplicate]

I have some code in Python that sets a char(80) value in an sqlite DB.
The string is obtained directly from the user through a text input field and sent back to the server with a POST method in a JSON structure.
On the server side I currently pass the string to a method calling the SQL UPDATE operation.
It works, but I'm aware it is not safe at all.
I expect that the client side is unsafe anyway, so any protection has to be put on the server side. What can I do to secure the UPDATE operation against SQL injection?
A function that would "quote" the text so that it can't confuse the SQL parser is what I'm looking for. I expect such a function exists, but I couldn't find it.
Edit:
Here is my current code setting the char field name label:
def setLabel( self, userId, refId, label ):
    self._db.cursor().execute( """
        UPDATE items SET label = ? WHERE userId IS ? AND refId IS ?""", ( label, userId, refId) )
    self._db.commit()
From the documentation:
con.execute("insert into person(firstname) values (?)", ("Joe",))
This escapes "Joe", so what you want is
con.execute("insert into person(firstname) values (?)", (firstname_from_client,))
The DB-API's .execute() supports parameter substitution, which will take care of escaping for you. It's mentioned near the top of the docs (http://docs.python.org/library/sqlite3.html), above the example marked "Never do this -- insecure".
Noooo... USE BIND VARIABLES! That's what they're there for. See this
Another name for the technique is parameterized sql (I think "bind variables" may be the name used with Oracle specifically).
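As a rough illustration of why bound parameters are safe, the sketch below runs an UPDATE like the one in the question against an in-memory database. The table layout, the function name set_label, and the hostile input are assumptions made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout; the question only shows the UPDATE statement.
conn.execute("CREATE TABLE items (userId INTEGER, refId INTEGER, label CHAR(80))")
conn.execute("INSERT INTO items VALUES (1, 10, 'old')")


def set_label(conn, user_id, ref_id, label):
    """Update the label using bound parameters, never string formatting."""
    conn.execute(
        "UPDATE items SET label = ? WHERE userId = ? AND refId = ?",
        (label, user_id, ref_id),
    )
    conn.commit()


# Input that would break out of a naively quoted string literal.
hostile = "x'; DROP TABLE items; --"
set_label(conn, 1, 10, hostile)

stored_label = conn.execute(
    "SELECT label FROM items WHERE userId = 1 AND refId = 10"
).fetchone()[0]
```

The hostile string ends up stored verbatim as the label, because the value is passed to the database separately from the SQL text and is never parsed as SQL.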

SQLite: Why can't parameters be used to set an identifier?

I'm refactoring a little side project to use SQLite instead of a python data structure so that I can learn SQLite. The data structure I've been using is a list of dicts, where each dict's keys represent a menu item's properties. Ultimately, these keys should become columns in an SQLite table.
I first thought that I could create the table programmatically by creating a single-column table, iterating over the list of dictionary keys, and executing an ALTER TABLE, ADD COLUMN command like so:
# Various import statements and initializations
conn = sqlite3.connect(database_filename)
cursor = conn.cursor()
cursor.execute("CREATE TABLE menu_items (item_id text)")
# Here's the problem:
cursor.executemany("ALTER TABLE menu_items ADD COLUMN ? ?", [(key, type(value)) for key, value in menu_data[0].iteritems()])
After some more reading, I realized parameters cannot be used for identifiers, only for literal values. The PyMOTW on sqlite3 says
Query parameters can be used with select, insert, and update statements. They can appear in any part of the query where a literal value is legal.
Kreibich says on p. 135 of Using SQLite (ISBN 9780596521189):
Note, however, that parameters can only be used to replace literal
values, such as quoted strings or numeric values. Parameters
cannot be used in place of identifiers, such as table names or
column names. The following bit of SQL is invalid:
SELECT * FROM ?; -- INCORRECT: Cannot use a parameter as an identifier
I accept that positional or named parameters cannot be used in this way. Why can't they? Is there some general principle I'm missing?
Similar SO question:
Python sqlite3 string formatting
Identifiers are syntactically significant while variable values are not.
Identifiers need to be known at SQL compilation phase so that the compiled internal bytecode representation knows about the relevant tables, columns, indices and so on. Just changing one identifier in the SQL could result in a syntax error, or at least a completely different kind of bytecode program.
Literal values can be bound at runtime. Variables behave essentially the same in a compiled SQL program regardless of the values bound in them.
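The difference can be seen directly with the sqlite3 module. The sketch below first tries a placeholder in identifier position, then falls back to one common workaround: checking the name against a fixed whitelist before building the statement. The ALLOWED_COLUMNS mapping and the add_column helper are hypothetical names for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu_items (item_id TEXT)")

# A placeholder in identifier position is rejected when the SQL is compiled.
try:
    conn.execute("ALTER TABLE menu_items ADD COLUMN ? TEXT", ("price",))
    placeholder_error = None
except sqlite3.OperationalError as exc:
    placeholder_error = exc

# Hypothetical whitelist mapping permitted column names to their SQL types.
ALLOWED_COLUMNS = {"name": "TEXT", "price": "REAL"}


def add_column(conn, column):
    """Add a column only if its name is whitelisted (KeyError otherwise)."""
    col_type = ALLOWED_COLUMNS[column]
    # The identifier is interpolated into the SQL text, which is acceptable
    # only because it comes from the fixed whitelist above.
    conn.execute('ALTER TABLE menu_items ADD COLUMN "%s" %s' % (column, col_type))


add_column(conn, "price")
columns = [row[1] for row in conn.execute("PRAGMA table_info(menu_items)")]
```

After this runs, placeholder_error holds the syntax error from the first attempt, and columns lists item_id followed by price.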
I don't know why, but every database I ever used has the same limitation.
I think it would be analogous to using a variable to hold the name of another variable. Most languages do not allow that, PHP being the only exception I know of.
Regardless of the technical reasons, dynamically choosing table/column names in SQL queries is a design smell, which is why most databases do not support it.
Think about it; if you were coding a menu in Python, would you dynamically create a class for each combination of menu items? Of course not; you'd have one Menu class that contains a list of menu items. It's similar in SQL too.
Most of the time, when people ask about dynamically choosing table names, it's because they've split up their data into different tables, like collection1, collection2, ... and use the name to select which collection to query from. This isn't a very good design; it requires the service to repeat the schema for each table, including indexes, constraints, permissions, etc, and also makes altering the schema harder (Need to add a field? Now you need to do it across hundreds of tables instead of one).
The correct way of designing the database would be to have a single collection table and add a collection_id column; instead of querying collection4, you'd add a WHERE collection_id = 4 constraint to your SELECT queries. Note that the 4 is now a value, and can be replaced with a query parameter.
For your case, I would use this schema:
CREATE TABLE menu_items (
item_id TEXT,
key TEXT,
value NONE,
PRIMARY KEY(item_id, key)
);
Use executemany to insert a row for each entry in the dictionary. When you need to load the dictionary, run a SELECT filtering on item_id and recreate the dictionary one row/entry at a time.
(Of course, as with everything in software engineering, there are exceptions. Tools that operate on schemas generically, such as ORMs, will need to specify table/column names dynamically.)
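A minimal sketch of the key/value approach described above, using the schema from this answer as given. The helper names save_item and load_item and the sample menu item are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE menu_items (
        item_id TEXT,
        key TEXT,
        value NONE,
        PRIMARY KEY (item_id, key)
    )
""")


def save_item(conn, item_id, properties):
    """Store one row per dictionary entry; keys are bound as ordinary values."""
    conn.executemany(
        "INSERT INTO menu_items (item_id, key, value) VALUES (?, ?, ?)",
        [(item_id, key, value) for key, value in properties.items()],
    )


def load_item(conn, item_id):
    """Rebuild the dictionary for one item from its key/value rows."""
    rows = conn.execute(
        "SELECT key, value FROM menu_items WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())


save_item(conn, "espresso", {"name": "Espresso", "price": 2.5})
loaded = load_item(conn, "espresso")
```

Because the dictionary keys are now data rather than column names, every part of the statements that varies can be passed as a query parameter.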

Why do I need to run the postgresql nextval function? And how to prevent it?

I just had an issue with Django and PostgreSQL that I don't understand.
I have a simple model, defined such as:
class MyModel(models.Model):
    my_field = models.IntegerField()
    my_other_field = models.TextField()
In my view, I have something similar to:
my_object = MyModel(my_field=1, my_other_field='blah')
my_object.save()
Everything was working fine, until this morning. I got this error:
IntegrityError at /my_url/
duplicate key value violates unique constraint "my_model_pkey"
DETAIL: Key (id)=(3) already exists.
CONTEXT: Remote SQL command: INSERT INTO public.my_model(id, my_field, my_other_field) VALUES ($1, $2, $3) RETURNING id
I had this error once before. I know it is related to the way PostgreSQL syncs the sequence associated with my model with the id column. I had to run this function in PostgreSQL until the id returned was greater than the biggest value of the id:
select nextval('my_model_id_seq'::regclass);
My question is: why did this happen in the first place, and how can I prevent it in the future?
By the way, that's the only way I insert data into the table; I've never inserted data manually.
I hope the question is clear enough.
I think the question is not "why is my sequence getting messed up" - rather it is "why is Django trying to supply a value for the id column when inserting a row, instead of allowing the database to insert the next value in the sequence".
The Django documentation describes the algorithm it uses to decide whether it should be doing an UPDATE or an INSERT when you call save().
This algorithm involves checking if the 'id' field of the object is already set to some value. If it is not, then it does an INSERT (presumably not specifying a value for the 'id' field). If it is set, then it first tries to do an UPDATE; if that does not result in an updated record, then it will do an INSERT (this time presumably it would specify a value for the 'id' field).
As pointed out in Erwin's answer, the error message which you are seeing indicates it is trying to insert a row while specifying the value for the 'id' field.
I note that it appears this algorithm has changed in version 1.6 of Django. Previously it used a SELECT first to see if a record existed, then an UPDATE if it did or an INSERT if it did not. If your problem has started occurring since upgrading, then that could be a cause. The documentation notes:
There are some rare cases where the database doesn’t report that a row
was updated even if the database contains a row for the object’s
primary key value. An example is the PostgreSQL ON UPDATE trigger
which returns NULL. In such cases it is possible to revert to the old
algorithm by setting the select_on_save option to True.
If this were happening for you, then it would explain your symptoms: the error would actually be occurring when trying to update a value in the database, and Django would erroneously think that the row did not exist and then try to create it.
You could check for this by setting 'select_on_save' to true to revert to the old behavior.
Another possible reason for this would be if your code inadvertently set the 'id' attribute on an object to some value, and then called save(). This could cause various problems, depending on whether the value already existed in the database or not. In particular, it might result in creating a row which has an 'id' value which is ahead of the current range of the sequence associated with the column, so that later on you would get errors trying to insert into the table.
Another possible reason could be using the 'force_insert' argument to save() on a row which had previously been loaded from the database (so that it was actually an existing row you should be updating).
The root of the problem lies here (SQL command from your error message):
INSERT INTO public.my_model(id, my_field, my_other_field)
VALUES ($1, $2, $3)
RETURNING id
Since your id column seems to be a serial type, do not insert values manually. Let the default draw from the sequence automatically. Should be:
INSERT INTO public.my_model(my_field, my_other_field)
VALUES ($1, $2)
RETURNING id;
That's the whole point of adding RETURNING id to begin with: to return the newly generated id. If you pass in a value yourself, you wouldn't need to have it returned.
Fix
If the sequence got out of sync somehow, because manual entries conflict with the numbers from nextval(), run this query once:
SELECT setval('my_model_id_seq', max(id)) FROM my_model;
This sets the sequence to the current maximum. Next call is next number, no off-by-one error.
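The failure mode and the resync step can be modelled with a toy in-memory table. This is plain Python standing in for a serial column, not PostgreSQL or Django code, and the class FakeTable with its methods is entirely hypothetical:

```python
import itertools


class FakeTable:
    """Toy stand-in for a table whose id column defaults to a sequence."""

    def __init__(self):
        self.rows = {}
        self._sequence = itertools.count(1)

    def insert(self, data, pk=None):
        if pk is None:
            pk = next(self._sequence)  # plays the role of nextval(...)
        if pk in self.rows:
            raise KeyError("duplicate key value: id=%d" % pk)
        self.rows[pk] = data
        return pk

    def resync_sequence(self):
        """Analogue of setval(seq, max(id)): the next draw is max(id) + 1."""
        self._sequence = itertools.count(max(self.rows) + 1)


table = FakeTable()
table.insert({"my_field": 1})        # id 1 drawn from the sequence
table.insert({"my_field": 2})        # id 2 drawn from the sequence
table.insert({"my_field": 3}, pk=3)  # explicit id; the sequence is not advanced

try:
    table.insert({"my_field": 4})    # the sequence hands out 3, which is taken
    collision = None
except KeyError as exc:
    collision = exc

table.resync_sequence()
new_id = table.insert({"my_field": 4})
```

The explicit insert of id 3 leaves the sequence behind the data, so the next default insert collides on the same key reported in the question, and resyncing lets inserts continue from 4.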

Django get_or_create raises Duplicate entry for key Primary with defaults

Help! Can't figure this out! I'm getting an IntegrityError on get_or_create even with a defaults parameter set.
Here's how the model looks stripped down.
class Example(models.Model):
    user = models.ForeignKey(User)
    text = models.TextField()

    def __unicode__(self):
        return "Example"
I run this in Django:
def create_example_model(user, textJson):
    # Keys of defaults must be field names given as strings.
    defaults = {"text": textJson.get("text", "undefined")}
    model, created = models.Example.objects.get_or_create(
        user=user,
        id=textJson.get("id", None),
        defaults=defaults)
    if not created:
        # The row already existed, so refresh its text from the input.
        model.text = textJson.get("text", "undefined")
        model.save()
    return model
I'm getting an error on the get_or_create line:
IntegrityError: (1062, "Duplicate entry '3020' for key 'PRIMARY'")
It's live so I can't really tell what the input is.
Help? There's actually a defaults set, so it's not like this problem, where they do not have defaults: "Django : get_or_create Raises duplicate entry with together_unique". Plus, my model doesn't have together-unique.
I'm using Python 2.6 and MySQL.
You shouldn't be setting the id for objects in general; you have to be careful when doing that.
Have you checked to see the value for 'id' that you are putting into the database?
If that doesn't fix your issue, then it may be a database issue. PostgreSQL uses a special sequence to increment the IDs, and sometimes it does not get incremented. Something like the following resets it:
SELECT setval('tablename_id_seq', (SELECT MAX(id) + 1 FROM tablename));
get_or_create() will try to create a new object if it can't find one that is an exact match to the arguments you pass in.
So what I'm assuming is happening is that a different user has made an object with the id of 3020. Since there is no object with the user/id combo you're requesting, it tries to make a new object with that combo, but fails because a different user has already created an item with the id of 3020.
Hopefully that makes sense. See what the following returns. Might give a little insight as to what has gone on.
models.Example.objects.get(id=3020)
You might need to make 3020 a string in the lookup. I'm assuming a string is coming back from your textJson.get() method.
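The scenario described in this answer can be reproduced outside Django. The sketch below is a simplified stand-in for get_or_create written against sqlite3, not Django's actual implementation; the table layout and the function are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)"
)
conn.execute("INSERT INTO example VALUES (3020, 1, 'owned by user 1')")


def get_or_create(conn, user_id, pk, text):
    """Look up by (user_id, id); insert a new row when nothing matches."""
    row = conn.execute(
        "SELECT id, user_id, text FROM example WHERE user_id = ? AND id = ?",
        (user_id, pk),
    ).fetchone()
    if row is not None:
        return row, False
    conn.execute(
        "INSERT INTO example (id, user_id, text) VALUES (?, ?, ?)",
        (pk, user_id, text),
    )
    return (pk, user_id, text), True


# Same user and id: the existing row is found, nothing is inserted.
found, created = get_or_create(conn, 1, 3020, "ignored")

# Different user, same id: the lookup misses, so the insert is attempted and
# collides with the primary key that already exists.
try:
    get_or_create(conn, 2, 3020, "from user 2")
    error = None
except sqlite3.IntegrityError as exc:
    error = exc
```

The second call fails with an integrity error even though defaults were supplied, because the lookup filters on both user and id while uniqueness is enforced on id alone.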
One common but little-documented cause of get_or_create() failures is corrupted database indexes.
Django depends on the assumption that there is only one record for a given identifier, and this is in turn enforced using a UNIQUE index on this particular field in the database. But indexes are constantly being rewritten, and they may get corrupted, e.g. when the database crashes unexpectedly. In such a case the index may no longer return information about an existing record, another record with the same field value is added, and as a result you'll be hitting the IntegrityError each time you try to get or create this particular record.
The solution is, at least in PostgreSQL, to REINDEX this particular index, but you first need to get rid of the duplicate rows programmatically.

AppEngine: Query datastore for records with <missing> value

I created a new property for my db model in the Google App Engine Datastore.
Old:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
New:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
How can I query for the Logo records that do not have the 'is_approved' value set?
I tried
logos.filter("is_approved = ", None)
but it didn't work.
In the Data Viewer the new field values are displayed as <missing>.
According to the App Engine documentation on Queries and Indexes, there is a distinction between entities that have no value for a property, and those that have a null value for it; and "Entities Without a Filtered Property Are Never Returned by a Query." So it is not possible to write a query for these old records.
A useful article is Updating Your Model's Schema, which says that the only currently-supported way to find entities missing some property is to examine all of them. The article has example code showing how to cycle through a large set of entities and update them.
A practice that helps us is to assign a "version" field to every Kind. The version is initially set to 1 on every record. If a need like this comes up (populating a new or existing field in a large dataset), the version field allows iterating through all the records matching "version = 1". For each record, set the new field to "null" or another initial value, bump the version to 2, and store the record; this populates the new or existing field with a default value.
The benefit of the "version" field is that the selection process can keep selecting against the lower version number (initially set to 1) over as many sessions, or as much time, as is needed until ALL records are updated with the new field's default value.
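The batching logic can be sketched with plain dictionaries standing in for datastore entities. This is not App Engine API code; the function migrate_batch and the sample data are hypothetical:

```python
def migrate_batch(entities, batch_size):
    """Upgrade up to batch_size version-1 entities; return how many changed."""
    updated = 0
    for entity in entities:
        if updated >= batch_size:
            break
        if entity["version"] == 1:
            entity.setdefault("is_approved", False)  # default for the new field
            entity["version"] = 2
            updated += 1
    return updated


# Five old records, none of which has the new field yet.
logos = [{"name": "logo%d" % i, "version": 1} for i in range(5)]

first_pass = migrate_batch(logos, batch_size=3)
second_pass = migrate_batch(logos, batch_size=3)
third_pass = migrate_batch(logos, batch_size=3)
```

Each pass picks up where the previous one stopped, because already-migrated records no longer match version 1. In this run the passes update 3, 2, and 0 records respectively.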
Maybe this has changed, but I am able to filter records based on null fields.
When I try the GQL query SELECT * FROM Contact WHERE demo=NULL, it returns only records for which the demo field is missing.
According to the doc http://code.google.com/appengine/docs/python/datastore/gqlreference.html:
The right-hand side of a comparison can be one of the following (as
appropriate for the property's data type): [...] a Boolean literal, as TRUE or
FALSE; the NULL literal, which represents the null value (None in
Python).
I'm not sure that "null" is the same as "missing", though: in my case, these fields already existed in my model but were not populated on creation. Federico, maybe you could let us know if the NULL query works in your specific case?
