Load data into a table in MySQL - Python

I created a table with 4 columns:
cur.execute("""CREATE TABLE videoinfo (
id INT UNSIGNED PRIMARY KEY AUTO INCREMENT,
date DATETIME NOT NULL,
src_ip CHAR(32),
hash CHAR(150));
""")
I have a .txt file with three columns of data in it. I want to use the LOAD DATA LOCAL INFILE command to insert the data, but the problem is that the table I created has four columns, the first one being the id. Can MySQL automatically insert the data starting from the second column, or is an extra command needed?
Many thanks!

AUTO INCREMENT isn't valid syntax. If you check MySQL's documentation for the CREATE TABLE statement, you'll see the proper keyword is AUTO_INCREMENT.
Additionally, date is a keyword, so you'll need to quote it with backticks, as mentioned on the MySQL identifier documentation page. The documentation also lists all keywords, which must be quoted to use them as identifiers. To be safe, you could simply quote all identifiers.
To insert data only into some columns, you can explicitly specify columns. For LOAD DATA INFILE:
LOAD DATA INFILE 'file_name'
INTO TABLE videoinfo
(`date`, src_ip, hash)
For the INSERT statement:
INSERT INTO videoinfo (`date`, src_ip, hash)
VALUES (...);
This, too, is revealed in the MySQL manual. Notice a pattern?
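Putting the pieces together, here is a minimal sketch of the whole flow from Python. It assumes the MySQLdb (mysqlclient) driver, a tab-delimited videoinfo.txt, and placeholder connection settings; LOCAL INFILE must also be enabled on both client and server. mysql.connector works the same way.
# Minimal sketch (assumptions: MySQLdb driver, tab-delimited videoinfo.txt,
# placeholder connection settings, LOCAL INFILE enabled on client and server).
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret",
                       db="mydb", local_infile=1)
cur = conn.cursor()

# Corrected DDL: AUTO_INCREMENT (with underscore) and backticked identifiers.
cur.execute("""CREATE TABLE videoinfo (
    `id`     INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    `date`   DATETIME NOT NULL,
    `src_ip` CHAR(32),
    `hash`   CHAR(150)
)""")

# Name only the three columns present in the file; id is filled in
# automatically by AUTO_INCREMENT.
cur.execute("""LOAD DATA LOCAL INFILE 'videoinfo.txt'
    INTO TABLE videoinfo
    FIELDS TERMINATED BY '\\t'
    (`date`, `src_ip`, `hash`)""")

conn.commit()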

Related

How to avoid explicitly casting NULL during INSERT in PostgreSQL

I am writing Python scripts to synchronize tables from an MSSQL database to a PostgreSQL DB. The original author tends to use super wide tables with a lot of regional consecutive NULL holes in them.
For insertion speed, I serialize the records in bulk into a string of the following form before calling execute():
INSERT INTO A( {col_list} )
SELECT * FROM ( VALUES (row_1), (row_2),...) B( {col_list} )
During row serialization it is not possible to determine the data type of NULL or None in Python, which complicates the job. All NULL values in timestamp columns, integer columns, etc. need an explicit type cast to the proper type, or Pg complains about it.
Currently I am checking the DB-API connection.description property and comparing the column type_code for every column, adding type casts like ::timestamp as needed.
But this feels cumbersome, and it is extra work: the driver has already converted the data from text to the proper Python data types, and now I have to redo it for the columns with all those Nones.
Is there a better way to work around this with elegance and simplicity?
If you don't need the SELECT, go with @Nick's answer.
If you need it (like with a CTE to use the input rows multiple times), there are workarounds depending on the details of your use case.
Example, when working with complete rows:
INSERT INTO A -- complete rows
SELECT * FROM (
VALUES ((NULL::A).*), (row_1), (row_2), ...
) B
OFFSET 1;
{col_list} is optional noise in this particular case, since we need to provide complete rows anyway.
Detailed explanation:
Casting NULL type when updating multiple rows
Instead of inserting from a SELECT, you can attach a VALUES clause directly to the INSERT, i.e.:
INSERT INTO A ({col_list})
VALUES (row_1), (row_2), ...
When you insert from a query, Postgres examines the query in isolation when trying to infer the column types, and then tries to coerce them to match the target table (only to find out that it can't).
When you insert directly from a VALUES list, it knows about the target table when performing the type inference, and can then assume that any untyped NULL matches the corresponding column.
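As a concrete illustration of that difference, here is a minimal sketch assuming psycopg2 and a hypothetical table a(col1 int, col2 timestamp, col3 text): because the multi-row VALUES list is attached directly to the INSERT, the None values need no explicit casts.
# Minimal sketch (assumptions: psycopg2, hypothetical table
# a(col1 int, col2 timestamp, col3 text), placeholder DSN).
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test")
rows = [
    (1, None, "foo"),    # None in the timestamp column: no ::timestamp needed
    (None, None, None),  # an all-NULL row is fine too
]
with conn, conn.cursor() as cur:
    # execute_values expands %s into a single multi-row VALUES list
    # attached directly to the INSERT.
    execute_values(cur,
                   "INSERT INTO a (col1, col2, col3) VALUES %s",
                   rows)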
You could try creating JSON from the data and then a rowset from the JSON using json_populate_record(..).
postgres=# create table js_test (id int4, dat timestamp, val text);
CREATE TABLE
postgres=# insert into js_test
postgres-# select (json_populate_record(null::js_test,
postgres(# json_object(array['id', 'dat', 'val'], array['5', null, 'test']))).*;
INSERT 0 1
postgres=# select * from js_test;
id | dat | val
----+-----+------
5 | | test
You can use json_populate_recordset(..) to do the same with multiple rows in one go. You just pass a single json value that is a JSON array of objects. Make sure it isn't a PostgreSQL array of json values.
So this is OK: '[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]'::json
This is not: array['{"id":1,"dat":null,"val":6}'::json,'{"id":3,"val":"tst"}'::json]
select *
from json_populate_recordset(null::js_test,
'[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]')

Python pyodbc is not recognizing parameter markers. I've supplied 2 markers and two values but it's not recognizing them

Any idea why the code below does not recognize the first placeholder? I'm assuming I have to put a special character in front of it, but I've been unable to find any documentation about it. I've also tried just a simple "create table ?" with no success.
for champ in champion_list:
    UPDATE_SQL = """\
    if not exists (select * from sysobjects where name=? and xtype='U')
    CREATE TABLE [dbo].[?](
        [champId] [varchar](50) NOT NULL,
        [championName] [varchar] NOT NULL,
        [version] [varchar](50) NOT NULL
    ) ON [PRIMARY]
    """
    values = (champ, champ)
    try:
        cursorprod.execute(UPDATE_SQL, values)
        print str(champ), 'table added.'
    except Exception as e:
        print(e)
I get the error
The SQL contains 1 parameter markers, but 2 parameters were supplied
Query parameters are for specifying column values in DML statements; they cannot be used to specify object (e.g., column or table) names in DDL statements. You will need to use dynamic SQL (string substitution) for that ...
... assuming that you really want to create separate tables for each item in the list. If the structure of those tables is identical then that is a bad design. You'd be better served with one table that includes an extra column to identify the list item associated with each row.
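If you do need one table per item, the dynamic-SQL route might look roughly like this (a minimal sketch reusing champion_list and cursorprod from the question; the whitelist check and the added column length are assumptions, not part of the original code). The table name is spliced into the DDL as a string, while the sysobjects lookup keeps a real parameter marker.
# Minimal sketch (assumptions: pyodbc against SQL Server, champion_list and
# cursorprod as in the question, whitelist check and column lengths added).
import re

def create_champion_table(cursor, champ):
    # Object names cannot be parameterized, so validate before splicing
    # the name into the DDL to avoid SQL injection.
    if not re.match(r"^[A-Za-z0-9_]+$", champ):
        raise ValueError("unsafe table name: %r" % champ)
    sql = """
    IF NOT EXISTS (SELECT * FROM sysobjects WHERE name = ? AND xtype = 'U')
        CREATE TABLE [dbo].[{0}] (
            [champId]      [varchar](50) NOT NULL,
            [championName] [varchar](50) NOT NULL,
            [version]      [varchar](50) NOT NULL
        ) ON [PRIMARY]
    """.format(champ)
    cursor.execute(sql, (champ,))   # the ? marker is only for the name check

for champ in champion_list:
    create_champion_table(cursorprod, champ)
cursorprod.commit()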

Google BigQuery API: How to set the field type of destinationTable?

We are using BigQuery's Python API, specifically the jobs resource, to run a query on an existing BigQuery table, and to export the results by inserting the resulting dataset in a new BigQuery table (destinationTable).
Is there a way to also update the schema of the newly created table and set a specific datatype? By default, all fields are set to a 'string' type, but we need one of the fields to be 'timestamp'.
In order to set the field types of the destination table you need to CAST to the new type in your query, as the result set describes the new field type in the destination table.
You need to use simple CAST functions to get numbers/dates, for example:
SELECT TIMESTAMP(t) AS t FROM (SELECT "2015-01-01 00:00:00" t)
The "unflatten" feature for record types was introduced recently, so you can now transfer a whole record to another table while preserving the RECORD structure. To do that, set a destination table (and the desired write disposition), set allowLargeResults=TRUE, and then set Flatten Results=FALSE (see the last post here, where it is explained). Then you can run a query like this to transfer the whole record to the destination table:
SELECT cell.* FROM publicdata:samples.trigrams LIMIT 0;
I'm using tables from the publicdata:samples dataset which is also available to you, so you can run these tests, too. In the above query 'cell' is a record, and if you set Flatten Results=FALSE, you'll see that 'cell' is still a RECORD in your dest table.
You can remove some fields from your record when transferring data to the dest table. Here is the query that demonstrates this (again, you'd need to run it with Flatten Results=FALSE):
SELECT cell.value, cell.volume_count FROM publicdata:samples.trigrams
LIMIT 0;
After you run the above query, the 'cell' record will only contain the fields you specified.
You can rename an existing field within a record when transferring data to the dest table:
SELECT cell.value AS cell.newvalue FROM publicdata:samples.trigrams
LIMIT 0;
Unfortunately, currently there is no way to add a field to a record, e.g., the following query will create 'url' outside of both 'actor_attributes' and 'repository' records.
SELECT
actor_attributes.blog,
repository.created_at,
repository.url AS actor_attributes.url
FROM publicdata:samples.github_nested
LIMIT 0;
So in order to add a field to a record, you'd need to export your data, process it outside of BigQuery, and then load it back with the new schema.
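For reference, here is roughly how those settings map onto the jobs resource from Python. This is a minimal sketch assuming an already-authorized google-api-python-client service object for BigQuery API v2 (built elsewhere with googleapiclient.discovery.build("bigquery", "v2", ...)); the project, dataset and table names are placeholders.
# Minimal sketch (assumptions: authorized google-api-python-client service
# object `bigquery_service` for BigQuery API v2; placeholder project,
# dataset and table names).
job_body = {
    "configuration": {
        "query": {
            "query": "SELECT cell.* FROM publicdata:samples.trigrams LIMIT 0",
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "trigrams_copy",
            },
            "writeDisposition": "WRITE_TRUNCATE",
            "allowLargeResults": True,   # required for unflattened results
            "flattenResults": False,     # keep 'cell' as a RECORD
        }
    }
}
job = bigquery_service.jobs().insert(projectId="my-project",
                                     body=job_body).execute()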
The field types of the destination table will be automatically set. If you need to transform a string to an integer or timestamp, do so in the query.
This will create a destination table with one column (string):
SELECT x FROM (SELECT "1" x)
This will create a destination table with one column (integer):
SELECT INTEGER(x) AS x FROM (SELECT "1" x)
This will create a destination table with one column (timestamp):
SELECT TIMESTAMP(x) AS x FROM (SELECT "2015-10-21 04:29:00" x)

sqlite3 prints without explicit command

I'm trying to get a rowid if a data row exists. What I have now is
row_id = self.dbc.cursor.execute("SELECT ROWID FROM Names where unq_id=?",(namesrow['unq_id'],)).fetchall()[0][0]
where namesrow is a dictionary of column names with the corresponding data to fill into the table. The problem is that this prints 'unq_id' when it runs, and I'm not sure how to get rid of it.
I'm using sqlite3 and Python. Any help's appreciated!
Quoting the SQLite documentation:
With one exception noted below, if a rowid table has a primary key that consists of a single column and the declared type of that column is "INTEGER" in any mixture of upper and lower case, then the column becomes an alias for the rowid.
So if your unq_id is the integer primary key in this table, then rowid and unq_id will be the same field.
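A minimal sketch illustrating that alias, assuming an in-memory database and a Names table whose unq_id is declared INTEGER PRIMARY KEY:
# Minimal sketch (assumptions: in-memory database, Names table with
# unq_id declared as INTEGER PRIMARY KEY).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Names (unq_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Names (name) VALUES (?)", ("alice",))

row_id = conn.execute(
    "SELECT ROWID FROM Names WHERE unq_id = ?", (1,)
).fetchone()[0]
print(row_id)  # 1 -- the same value as unq_id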

Creating a blank field and receiving the INTEGER PRIMARY KEY with SQLite, Python

I am using SQLite with Python. When I insert into table A, I need to feed it an ID from table B. So what I wanted to do is insert default data into B, grab the id (which is auto-increment) and use it in table A. What's the best way to receive the key from the table I just inserted into?
As Christian said, sqlite3_last_insert_rowid() is what you want... but that's the C level API, and you're using the Python DB-API bindings for SQLite.
It looks like the cursor method lastrowid will do what you want (search for 'lastrowid' in the documentation for more information). Insert your row with cursor.execute( ... ), then do something like lastid = cursor.lastrowid to check the last ID inserted.
That you say you need "an" ID worries me, though... does it not matter which ID you have? Unless you are using the data just inserted into B for something (in which case you need that specific row ID), your database structure is seriously screwed up if any old row ID for table B will do.
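A minimal sketch of that flow, assuming sqlite3 and hypothetical tables b and a (the names and columns are placeholders):
# Minimal sketch (assumptions: sqlite3, hypothetical tables b and a).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, note TEXT)")
cur.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id))")

cur.execute("INSERT INTO b (note) VALUES (?)", ("default",))
b_id = cur.lastrowid                        # key auto-assigned to the new B row
cur.execute("INSERT INTO a (b_id) VALUES (?)", (b_id,))
conn.commit()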
Check out sqlite3_last_insert_rowid() -- it's probably what you're looking for:
Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. If the table has a column of type INTEGER PRIMARY KEY then that column is another alias for the rowid.
This routine returns the rowid of the most recent successful INSERT into the database from the database connection in the first argument. If no successful INSERTs have ever occurred on that database connection, zero is returned.
Hope it helps! (More info on ROWID is available here and here.)
Simply use:
SELECT last_insert_rowid();
However, if you have multiple connections writing to the database, you might not get back the key that you expect.
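For completeness, the same thing from Python, as a minimal sketch assuming sqlite3 and a hypothetical table b; last_insert_rowid() reports the most recent successful INSERT made through the same database connection.
# Minimal sketch (assumptions: sqlite3, hypothetical table b).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO b (note) VALUES ('default')")
new_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]
print(new_id)  # 1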
