What does the second parameter mean in dblite's open - python

As shown in the example in the link below, I am having trouble figuring out what the second parameter in open() is. Can anyone tell me about this? Thank you.
https://pypi.python.org/pypi/scrapy-dblite/0.2.5

The Storage() class constructor documents the second parameter as:
uri - URI to sqlite database, sqlite://<sqlite-database>:<table>
So you name the full path of the database file (sqlite stores a database in one file), and a table name for the items to be stored in.
If you use an absolute path, it should start with an extra slash:
sqlite:///some/path/to/database.db:foobar
will open /some/path/to/database.db (creating it if it doesn't yet exist), and use a table called foobar in that database (again, creating it if it doesn't yet exist).
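For example, a minimal sketch of how the URI is used (the Product item class and its fields are made up for illustration; check the project's README for the exact open() signature):

import dblite
from scrapy.item import Item, Field

class Product(Item):
    _id = Field()   # dblite expects an _id field for its primary key
    name = Field()

# relative path: database file db/products.db, table "product"
ds = dblite.open(Product, 'sqlite://db/products.db:product')

# absolute path: note the extra slash after sqlite://
ds = dblite.open(Product, 'sqlite:///some/path/to/database.db:foobar')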

Related

Why is my code producing a different SQLite file between my local installation and GitHub Action?

I have a Python module to handle my SQLite database, and this module provides two functions:
init_database() creates the database file with all the CREATE TABLE statements, it is executed if the database does not exist yet.
upgrade_database() updates the schema of the database (ALTER TABLE and other changes), it is executed when migrating from an older version of my program.
To ensure this critical part of the application works as expected, I wrote some tests for PyTest, to check that once these functions are executed, we get exactly some content.
My test to check the init part looks like this:
def test_init_database():
    # PATH_DIRNAME is a constant with the path where the files are stored
    db_path = f"{PATH_DIRNAME}/files/database.db"
    if path.exists(db_path):
        # Delete the file if it exists (happens if the test has already been run)
        remove(db_path)
    get_db(db_path).init_database()
    with open(f"{PATH_DIRNAME}/files/database/database.sha256", "r") as file:
        # The file may contain a blank line at the end, we just take the first one
        # This is the hash sum we should get if the file is correctly generated
        expected_sum = file.read().split("\n")[0]
    with open(db_path, "rb") as file:
        # Compute the SHA-256 hash sum of the generated database file, and compare it
        assert hashlib.sha256(file.read()).hexdigest() == expected_sum
If I run this test locally, it passes with no problem. But if I run it on GitHub Action, it fails on the assertion, because the hash is different.
Then I've configured my GH Action workflow to upload the generated files in an artifact, so I could check them by myself, and it looks like there is a subtle difference between the file generated on my local environment and the one generated in my workflow:
$ xxd gha_workflow/database.db > gha_workflow.hex
$ xxd local/database.db > local.hex
$ diff local.hex gha_workflow.hex
7c7
< 00000060: 002e 5f1d 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
---
> 00000060: 002e 5f1c 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
Note how the fourth byte on that line is different (1d vs 1c).
What could cause this difference? Am I doing my test wrong?
Depending on the SQLite version or build options, the resulting database file may vary in its physical format. For example, the library version and the page size can change, and that alters the on-disk bytes even when the logical content is identical. Depending on what you're trying to achieve, you may want to compare a logical representation of your schema and content instead (see the sketch below the links).
You can check these documentation pages for more information on the file format and the build options:
https://www.sqlite.org/compile.html
https://www.sqlite.org/formatchng.html
https://www.sqlite.org/fileformat2.html
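For example, one way to compare logical content rather than raw bytes (just a sketch: the database.sql reference file is hypothetical, while db_path, PATH_DIRNAME and get_db are the names from the question's test) is to diff SQL dumps produced with sqlite3's iterdump():

import sqlite3

def logical_dump(db_path):
    # Return the SQL statements (schema and data) that recreate the database
    conn = sqlite3.connect(db_path)
    try:
        return "\n".join(conn.iterdump())
    finally:
        conn.close()

def test_init_database_content():
    db_path = f"{PATH_DIRNAME}/files/database.db"   # built as in the original test
    get_db(db_path).init_database()
    with open(f"{PATH_DIRNAME}/files/database/database.sql", "r") as file:
        expected = file.read().strip()
    assert logical_dump(db_path).strip() == expected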
Based on the advice given in comments, I think I have found a better way to test my database schema without depending on my file's metadata.
Instead of computing the SHA-256 sum, I get the table schema and compare it to the fields I expect to see if the database is well formed.
To get the schema, SQLite provides the following syntax:
PRAGMA table_info('my_table');
I can run it from Python like any regular SQL query, and it returns a list of tuples that contain, for each field (see the sketch after this list):
the position
the field name
the field type
0 if it is nullable, 1 otherwise
the default value (or None if there isn't)
1 if it is part of the primary key, 0 otherwise
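A minimal sketch of such a check (the table name and the expected column values below are made-up examples, not my real schema):

import sqlite3

# one (cid, name, type, notnull, dflt_value, pk) tuple per expected column
EXPECTED_COLUMNS = [
    (0, "id", "INTEGER", 1, None, 1),
    (1, "name", "TEXT", 1, None, 0),
    (2, "created_at", "TEXT", 0, None, 0),
]

def test_my_table_schema():
    conn = sqlite3.connect(db_path)   # db_path as in the original test
    try:
        rows = conn.execute("PRAGMA table_info('my_table')").fetchall()
    finally:
        conn.close()
    assert rows == EXPECTED_COLUMNS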
Thanks for helping me to understand my mistake!

Python - pandas to_sql

I'm trying to use https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.to_sql.html
When I change the name argument, e.g. say I set
df.to_sql(name="testTable", con=constring)
the actual table name comes up as [UserName].[testTable] rather than just [testTable]
Is there a way I can get rid of the [UserName], which is linked to the user who runs the script?
The [UserName] portion of the table name is the schema that the table is in. I don't know which database engine you're using, but the schema you're looking for might be "dbo".
According to the documentation, you can provide a schema argument:
df.to_sql(name="testTable", con=constring, schema="dbo")
Note that if the schema is left blank, it uses the DB user's default schema (as defined when the user was added to the database), which in your case appears to be the user's own schema.
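For example, with a SQLAlchemy engine (the connection string below is only a placeholder; adjust the driver, credentials and server to your setup):

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string for SQL Server via pyodbc
engine = create_engine("mssql+pyodbc://user:password@my_dsn")

df = pd.DataFrame({"value": [1, 2, 3]})

# an explicit schema keeps the table out of the per-user default schema
df.to_sql(name="testTable", con=engine, schema="dbo", if_exists="replace", index=False)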

Should I use python classes for a MySQL database insert program?

I have created a database to store NGS sequencing results. It consists of 17 tables to store all of the information. The results are stored in spreadsheets which I parse and store in variables using Python (2.7), and then use the Python package MySQLdb to insert the data into the database. I mainly use functions to obtain the information I need in variables, then write a loop in which I call this function followed by a 'try:' statement to insert. Here is a simple example:
def sample_processor(file):
    my_file = open(file, 'r+')
    samples = []
    for line in my_file:
        # ...get info...
        samples.append(line[0])
    return samples

samples = sample_processor('path/to/file')

for sample in samples:
    try:
        samsql = "INSERT IGNORE INTO sample(sample_id, diagnosis, screening) VALUES ("
        samsql = samsql + "'" + sample + "','" + sam_screen_dict.get(sample) + "')"
    except Exception as e:
        db.rollback()
        print("Something went wrong inserting data into the sample table: %s" % (e))
*sam_screen_dict is a dictionary I made from another function.
This is a simple table that I upload into, but many of them rely on different dictionaries to make sure the correct results are uploaded. However, I was wondering whether there would be a more robust way to do this using a class.
For example, my sample_id has an associated screening attribute in the sample table, so this is easy to do with one dictionary. I have more complex junction tables, such as the one in which the sample_id, experiment_id and found mutation are stored alongside other data. Would it be a good idea to create a class for this table, calling on a simple 'sample' class to inherit from? That way I would always know that the results being inserted are for the correct sample/experiment etc.
Also, using classes, could I write rules for each attribute so that if the source spreadsheet is for some reason incorrect, it will not be inserted into the database?
I.e.: sample_id is in the format A123/16. Using a class, it could check that the first character is 'A' and that sample_id[-3] is always '/'. I know I could write these checks as functions, but I feel it would take so much space and time to write so many 'if' statements that storing them once in a class would be a lot better.
Has anybody done anything similar, using classes to validate their variables before they reach the insert stage and cause an error?
I am new to Python classes and understand the basics, but I am still trying to get to grips with them, so a point in the right direction would be great - as would any help on how to go about actually writing a Python class that would make for a more robust database insertion program.
17 tables means you may end up with about 17 classes.
A simpler option is to reuse an existing lightweight database layer, such as web.py's db module: https://github.com/webpy/webpy/blob/master/web/db.py (you would only need to modify a little code).
Then you can use the web.py API, http://webpy.org/docs/0.3/api#web.db, to finish the job.
Hope it's useful for you.
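To make the validation idea from the question concrete, here is a rough sketch (only sample_id and screening are used, and the connection handling is assumed rather than taken from the asker's real code):

class Sample(object):
    """Validates a sample row before it goes anywhere near the database."""

    def __init__(self, sample_id, screening):
        # format rule from the question, e.g. A123/16
        if not sample_id.startswith('A') or sample_id[-3] != '/':
            raise ValueError("Invalid sample_id: %r" % sample_id)
        self.sample_id = sample_id
        self.screening = screening

    def insert(self, db):
        # parameterised query, so values are never concatenated into the SQL string
        cur = db.cursor()
        cur.execute(
            "INSERT IGNORE INTO sample (sample_id, screening) VALUES (%s, %s)",
            (self.sample_id, self.screening),
        )

A junction-table class can then accept (or inherit from) Sample, so only objects that have already passed validation ever reach an INSERT.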

Specifying the full path to an entity with inheritance hierarchies in contains_eager()

I have a query like
(session.query(Root).with_polymorphic('*')
    .outerjoin(Subclass.related1).options(contains_eager(Subclass.related1)))
So far things work.
I also want to eagerly load Related1.related2, so I tried this:
(session.query(Root).with_polymorphic('*')
    .outerjoin(Subclass.related1).options(contains_eager(Subclass.related1))
    .outerjoin(Related1.related2).options(contains_eager(Related1.related2)))
But it doesn't work:
sqlalchemy.exc.ArgumentError: Can't find property 'related2' on any entity specified in this Query. Note the full path from root (Mapper|Root|root) to target entity must be specified.
Given that related1 is related to the root entity via a subclass I don't see how to specify the full path.
I also tried
(session.query(Root).with_polymorphic('*')
    .outerjoin(Subclass.related1).options(contains_eager(Subclass.related1))
    .outerjoin(Related1.related2).options(contains_eager('related1.related2')))
which predictably fails with
sqlalchemy.exc.ArgumentError: Can't find property named 'related1' on the mapped entity Mapper|Root|root in this Query.
How can I specify the full path to the indirectly-related entity in contains_eager()?
contains_eager needs a full path from the entities the query knows about:
contains_eager(Subclass.related1, Related1.related2)
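Putting it together, the query from the question would then look roughly like this (entity names as in the question):

query = (
    session.query(Root)
    .with_polymorphic('*')
    .outerjoin(Subclass.related1)
    .outerjoin(Related1.related2)
    .options(contains_eager(Subclass.related1, Related1.related2))
)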

web2py auto_import vs define_table

The documentation says we can use auto_import if we "need access to the data but not to the web2py table attributes", but this code seems to use the table attributes just fine.
from gluon import DAL, Field
db = DAL('sqlite://storage.sqlite', auto_import=True)
for row in db(db.person).select():
    print row.name
The table was defined in a previous run.
db = DAL('sqlite://storage.sqlite', auto_import=True)
db.define_table('person',
                Field('name'))
db.person[0] = {'name' : 'dave'}
db.commit()
Doing both auto_import=True and the define_table gives an error about "invalid table name". Doing neither gives an error if I try to access db.table.
With auto_import=True, web2py will get the field names and types directly from the *.table files in the application's "databases" folder. When the documentation refers to "web2py table attributes" that will not be available, it is referring to attributes that are defined in the model (i.e., using db.define_table()) but not stored in the database or *.table files, such as "requires", "widget", "represent", etc. Those attributes are defined only in web2py code and therefore cannot be determined merely by reading the *.table files.
Note, the *.table files are used for database migrations, so they only store metadata directly relevant to the database (i.e., field names and types, and database-level constraints, such as "notnull" and "unique"). Attributes like "requires" and "represent" are only used by web2py and have no effect on the database, so they are not recorded in the *.table files.
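A short sketch of the difference (imports assume a standalone web2py environment as in the question; IS_NOT_EMPTY stands in for any model-only attribute):

from gluon import DAL, Field
from gluon.validators import IS_NOT_EMPTY

# Full model definition: "requires" lives only in this Python code and is
# never written to the database or to the *.table files
db = DAL('sqlite://storage.sqlite')
db.define_table('person',
                Field('name', requires=IS_NOT_EMPTY()))

# In a separate run/script: auto_import recovers field names and types from
# the *.table files, but the validator defined above is not restored
db = DAL('sqlite://storage.sqlite', auto_import=True)
print db.person.name.requires   # the IS_NOT_EMPTY validator is not there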
