Python/SQLite: smooth way to set column names in CREATE TABLE

I'm building up a table in Python's SQLite module which will consist of at least 18 columns. These columns are named by times (for example "08-00"), all stored in a list called 'time_range'. I want to avoid writing out all 18 column names by hand in the SQL statement, since they already exist inside the mentioned list and doing so would make the code quite ugly. However, this:
marks = '?,'*18
self.c.execute('''CREATE TABLE finishedJobs (%s)''' % marks, tuple(time_range))
did not work. Apparently Python/SQLite does not accept parameters in that position. Is there any smart workaround for my purpose, or do I really have to name every single column in a CREATE TABLE statement by hand?
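For reference, a minimal sketch of the usual workaround, assuming time_range is a list of strings such as "08-00": the ? placeholders only stand in for values, never identifiers, so the column names have to be spliced into the statement as text, quoting each name so that names like "08-00" are legal identifiers:
# sketch: build the column list from time_range by hand, since
# placeholders cannot be used for column names
columns = ', '.join('"%s" TEXT' % name for name in time_range)
self.c.execute('CREATE TABLE finishedJobs (%s)' % columns)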

Related

Pyspark join conditions using dictionary values for keys

I'm working on a script that tests the contents of some newly generated tables against production tables. The newly generated tables may or may not have the same column names and may have multiple columns that have to be used in join conditions. I'm attempting to write out a function with the needed keys being passed using a dictionary.
something like this:
def check_subset_rel(self, remote_df, local_df, keys):
    join_conditions = []
    for key in keys:
        join_conditions.append(local_df.key['local_key'] == remote_df.key['remote_key'])
    missing_subset_df = local_df.join(remote_df, join_conditions, 'leftanti')
PySpark/Python doesn't like the dictionary usage in local_df.key['local_key'] and remote_df.key['remote_key']. I get a "'DataFrame' object has no attribute 'key'" error. I'm pretty sure that it's expecting the actual name of the column instead of a variable, but I'm not sure if I can make that conversion between value and column name.
Does anyone know how I could go about this?
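For what it's worth, a minimal sketch of one way around this, assuming each entry in keys is a dict with 'local_key' and 'remote_key' entries holding column names: PySpark columns can be looked up by a string with df[name] instead of attribute access, so the dictionary values can be used directly:
def check_subset_rel(self, remote_df, local_df, keys):
    join_conditions = []
    for key in keys:
        # bracket indexing accepts the column name as a string
        join_conditions.append(local_df[key['local_key']] == remote_df[key['remote_key']])
    # rows of local_df that have no match in remote_df
    return local_df.join(remote_df, join_conditions, 'leftanti')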

Inserting python list into SQLite cell [duplicate]

I have a list/array of strings:
l = ['jack','jill','bob']
Now I need to create a table in sqlite3 for Python into which I can insert this array in a column called "Names". I do not want multiple rows with each name in each row. I want a single row which contains the array exactly as shown above, and I want to be able to retrieve it in exactly the same format. How can I insert an array as an element in a db? What am I supposed to declare as the data type of the array while creating the db itself? Like:
c.execute("CREATE TABLE names(id text, names ??)")
How do I insert values too? Like:
c.execute("INSERT INTO names VALUES(?,?)",(id,l))
EDIT: I am being so foolish. I just realized that I can have multiple entries for the id and use a query to extract all relevant names. Thanks anyway!
You can store an array in a single string field if you somehow generate a string representation of it, e.g. using the pickle module. Then, when you read the line, you can unpickle it. Pickle converts many different complex objects (but not all) into a string from which the object can be restored. But that is most likely not what you want to do: you won't be able to do anything with the data in the table except select the lines and then unpickle the array. You won't be able to search.
If you want to store anything of varying length (or of fixed length, but with many instances of similar things), you would not want to put that in one column or in multiple columns. Think vertically, not horizontally: don't think about columns, think about rows. For storing a vector with any number of components, a table is a good tool.
It is a little difficult to explain from the little detail you give, but you should think about creating a second table and putting all the names there, one row per name, for every row of your first table. You'd need some key in your first table that you can use in your second table, too:
c.execute("CREATE TABLE first_table(int id, varchar(255) text, additional fields)")
c.execute("CREATE TABLE names_table(int id, int num, varchar(255) name)")
With this you can still store whatever information you have, except the names, in first_table and store the array of names in names_table; just use the same id as in first_table, and use num to store the index position inside the array. You can then later get the array back by doing something like
SELECT name FROM names_table
WHERE id=?
ORDER BY num
to read the array of names for any of your rows in first_table.
That's a pretty normal way to store arrays in a DB.
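As an illustration, a minimal sketch of inserting and reading the list back, assuming the sqlite3 cursor c and the list l from the question, and an arbitrary row id of 1 for the corresponding row in first_table:
row_id = 1
# one row per name, with num preserving the position in the original list
for num, name in enumerate(l):
    c.execute("INSERT INTO names_table (id, num, name) VALUES (?, ?, ?)", (row_id, num, name))
# reading the list back in its original order
c.execute("SELECT name FROM names_table WHERE id=? ORDER BY num", (row_id,))
names = [row[0] for row in c.fetchall()]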
This is not the way to go. You should consider creating another table for the names, with a foreign key back to the main table.
You could pickle/marshal/JSON-encode your array and store it as a binary/varchar/JSON field in your database.
Something like:
import json
names = ['jack','jill','bill']
snames = json.dumps(names)
c.execute("INSERT INTO nametable " + snames + ";")

How to update all object columns in SqlAlchemy?

I have a table of Users (more than 15 columns) and sometimes I need to completely update all the user attributes. For example, I want to replace
user_in_db = session.query(Users).filter_by(user_twitter_id=user.user_twitter_id).first()
with some other object.
I have found the following solution :
session.query(User).filter_by(id=123).update({"name": user.name})
but I feel that writing out all 15+ attributes is error-prone, and there should be a simpler solution.
You can write:
session.query(User).filter_by(id=123).update(
    {column: getattr(user, column) for column in User.__table__.columns.keys()}
)
This will iterate over the columns of the User model (table) and it'll dynamically create a dictionary with the necessary keys and values.
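A sketch of the same idea in case you want to skip the primary key so the id itself is not overwritten (the column name 'id' is assumed here for illustration):
values = {column: getattr(user, column)
          for column in User.__table__.columns.keys()
          if column != 'id'}
session.query(User).filter_by(id=123).update(values)
session.commit()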

appending non-unique rows to another database using python

Hey all,
I have two databases. One with 145000 rows and approx. 12 columns, and another with around 40000 rows and 5 columns. I am trying to compare based on two columns' values. For example, if in CSV#1 column 1 says 100-199 and column 2 says Main St (meaning that this row is contained within the 100 block of Main Street), how would I go about comparing that with a similar two columns in CSV#2? I need to compare every row in CSV#1 to each single row in CSV#2. If there is a match I need to append the 5 columns of each matching row to the end of the row of CSV#2. Thus CSV#2's number of columns will grow significantly and it will have repeat entries; it doesn't matter how the columns are ordered. Any advice on how to compare two columns with another two columns in a separate database and then iterate across all rows? I've been using Python and the csv module so far for the rest of the work, but this part of the problem has me stumped.
Thanks in advance
-John
1. A CSV file is NOT a database. A CSV file is just rows of text chunks; a proper database (like PostgreSQL or MySQL or SQL Server or SQLite or many others) gives you proper data types, table joins, indexes, row iteration, proper handling of multiple matches, and many other things which you really don't want to rewrite from scratch.
2. How is it supposed to know that Address("100-199")==Address("Main Street")? You will have to come up with some sort of knowledge base which transforms each bit of text into a canonical address or address range which you can then compare; see "Where is a good Address Parser", but be aware that it deals with singular addresses (not address ranges).
Edit:
Thanks to Sven; if you were using a real database, you could do something like
SELECT
    User.firstname, User.lastname, User.account, Order.placed, Order.fulfilled
FROM
    User
    INNER JOIN Order ON
        User.streetnumber = Order.streetnumber
        AND User.streetname = Order.streetname
if streetnumber and streetname are exact matches; otherwise you still need to consider point #2 above.
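A rough sketch of the "use a real database" route with the sqlite3 module, loading both CSV files into an in-memory database and joining there; the file names, table names, and column names below are placeholders, not the real ones from the question:
import csv
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE big (streetnumber TEXT, streetname TEXT, extra TEXT)")
cur.execute("CREATE TABLE small (streetnumber TEXT, streetname TEXT, info TEXT)")

def load(path, table, ncols):
    # read a CSV file and insert its rows into the given table
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row, if there is one
        placeholders = ','.join('?' * ncols)
        cur.executemany("INSERT INTO %s VALUES (%s)" % (table, placeholders),
                        (row[:ncols] for row in reader))

load('csv1.csv', 'big', 3)
load('csv2.csv', 'small', 3)

# every row of the small table, extended with the matching column of the big one
cur.execute("""SELECT small.*, big.extra
               FROM small
               INNER JOIN big
                 ON big.streetnumber = small.streetnumber
                AND big.streetname = small.streetname""")
matches = cur.fetchall()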

sqlite3 and cursor.description

When using the sqlite3 module in Python, all elements of cursor.description except the column names are set to None, so this tuple cannot be used to find the column types for a query result (unlike other DB-API compliant modules). Is the only way to get the types of the columns to use pragma table_info(table_name).fetchall() to get a description of the table, store it in memory, and then match the column names from cursor.description to that overall table description?
No, it's not the only way. Alternatively, you can also fetch one row, iterate over it, and inspect the individual column Python objects and types. Unless the value is None (in which case the SQL field is NULL), this should give you a fairly precise indication of what the database column type was.
sqlite3 only uses sqlite3_column_decltype and sqlite3_column_type in one place each, and neither is accessible to the Python application - so there is no "direct" way that you may have been looking for.
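A minimal sketch of the "inspect one row" approach, assuming an existing sqlite3 cursor cur and a table name some_table used here for illustration:
cur.execute("SELECT * FROM some_table LIMIT 1")
row = cur.fetchone()
col_names = [d[0] for d in cur.description]
for name, value in zip(col_names, row):
    # NULL columns come back as None, so no type can be inferred for them
    print(name, type(value) if value is not None else 'NULL')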
I haven't tried this in Python, but you could try something like
SELECT *
FROM sqlite_master
WHERE type = 'table';
which contains the DDL CREATE statement used to create each table. By parsing the DDL you can get the column type info, such as it is. Remember that SQLite is rather vague and unrestrictive when it comes to column data types.
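A sketch of running that query from Python with the sqlite3 module, assuming an existing connection conn; the column types then have to be parsed out of the returned SQL text by hand:
cur = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
for name, ddl in cur.fetchall():
    # ddl is the original CREATE TABLE statement for this table
    print(name, ddl)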
