Importing CSV into MySQL Database (Django Webapp) - python

I'm developing a webapp in Django, and for its database I need to import a CSV file into a particular MySQL database.
I searched around a bit, and found many pages which listed how to do this, but I'm a bit confused.
Most pages say to do this:
LOAD DATA INFILE '<file>' INTO TABLE <tablename>
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
But I'm confused how Django would interpret this, since we haven't mentioned any column names here.
I'm new to Django and even newer to databasing, so I don't really know how this would work out.

It looks like you are in the database admin (i.e. PostgreSQL/MySQL). Others above have given a good explanation of that.
But if you want to import data into Django itself -- Python has its own csv module, which you use with import csv.
If you're new to Django, though, I recommend installing something like the Django CSV Importer: http://django-csv-importer.readthedocs.org/en/latest/index.html. (You install the add-on into your Python library.)
The author, unfortunately, has a typo in the docs. You have to do from csvImporter.model import CsvDbModel, not from csv_importer.model import CsvDbModel.
In your models.py file, create something like:
class MyCSVModel(CsvDbModel):
    pass

    class Meta:
        dbModel = Model_You_Want_To_Reference
        delimiter = ","
        has_header = True
Then, go into your Python shell and do the following command:
my_csv = MyCSVModel.import_data(data=open("my_csv_file_name.csv"))

This isn't Django code, and Django does not care what you call the columns in your CSV file. This is SQL you run directly against your database via the DB shell. You should look at the MySQL documentation for more details, but it will just take the columns in order as they are defined in the table.
If you want more control, you could write some Python code using the csv module to load and parse the file, then add it to the database via the Django ORM. But this will be much, much slower than the SQL way.
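For illustration, a minimal sketch of that csv-plus-ORM approach might look something like this (the Person model and its name/age fields are made up, and bulk_create needs Django 1.4+; on older versions you would call save() in a loop inside one transaction):
import csv

from myapp.models import Person  # hypothetical model with name and age fields

with open("my_csv_file_name.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row, if there is one
    rows = [Person(name=name, age=int(age)) for name, age in reader]

# one bulk INSERT instead of a query per row
Person.objects.bulk_create(rows)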

It will likely just add the data to the columns in order, since the column names are omitted from your SQL statement.
If you want, you can add the fields to the end of the SQL:
LOAD DATA INFILE '<file>' INTO TABLE <tablename>
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@Field1, @Field2, @Field3) /* Fields in CSV */
SET Col1 = @Field1, Col2 = @Field2, Col3 = @Field3; /* Columns in DB */
More in-depth analysis of the LOAD DATA command at MySQL.com

The command is interpreted by MySQL, not Django. As stated in the manual:
By default, when no column list is provided at the end of the LOAD DATA INFILE statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.

Related

How to save/copy value of table column not listed in csv column list in vertica copy command

I am using the following code to copy data from a CSV file to a Vertica table.
copy_command = 'COPY cb.table_format2 (ACC_NO, REF_NO, CUSTOMER_NAME, ADDRESS) FROM STDIN '\
'ENCLOSED BY \'"\' delimiter \',\' SKIP 1 '\
'exceptions \'' + file_path_exception + '\' rejected data \'' + file_path_rejected + '\';'
with open(file_path, "rb") as inf:
    cur.copy(copy_command, inf)
I have another field named 'FileId' in the Vertica table, and I want to populate it with the value of a local variable so that I can check later which data was entered against which FileId. For future requirements, I also want to save a created-on date and user session id in other columns.
Please let me know how I can do this. Is it even possible?
If this can't be done, what are other ways to make sure which data was saved/copied against which timestamp and user session?
Thanks for help in advance.
Update:
I am using Python 3 and the vertica_python module to connect to Vertica from Python.
You need to use AS to assign SQL expressions to fields. You don't mention which Python module you are using, but you can do it as a literal. Add a new field entry that looks like:
FILEID AS 123456
And just get your string to look that way (probably using a formatter, or a bind value).
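For example, a rough sketch of building that COPY statement from Python, reusing the variables from the question (this assumes Vertica will accept a constant expression for the extra column, as described above):
file_id = 123456  # your local variable

copy_command = (
    "COPY cb.table_format2 "
    "(ACC_NO, REF_NO, CUSTOMER_NAME, ADDRESS, FILEID AS {0}) "
    "FROM STDIN ENCLOSED BY '\"' DELIMITER ',' SKIP 1 "
    "EXCEPTIONS '{1}' REJECTED DATA '{2}';"
).format(file_id, file_path_exception, file_path_rejected)

with open(file_path, "rb") as inf:
    cur.copy(copy_command, inf)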

How to read a csv using sql

I would like to know how to read a CSV file using SQL. I would like to use GROUP BY and join other CSV files together. How would I go about this in Python?
example:
select * from csvfile.csv where name LIKE 'name%'
SQL code is executed by a database engine. Python does not directly understand or execute SQL statements.
While some SQL databases store their data in CSV-like files, almost all of them use more complicated file structures. Therefore, you're required to import each CSV file into a separate table in the SQL database engine. You can then use Python to connect to the SQL engine and send it SQL statements (such as SELECT). The engine will perform the SQL, extract the results from its data files, and return them to your Python program.
The most common lightweight engine is SQLite.
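As a small illustration of that import-then-query flow, here is a sketch using the built-in sqlite3 and csv modules (the file name and columns are invented for the example):
import csv
import sqlite3

conn = sqlite3.connect(":memory:")  # or a file on disk
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")

with open("csvfile.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO people (name, city) VALUES (?, ?)", reader)

for row in conn.execute("SELECT * FROM people WHERE name LIKE 'name%'"):
    print(row)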
littletable is a Python module I wrote for working with lists of objects as if they were database tables, but using a relational-like API, not actual SQL select statements. Tables in littletable can easily read and write from CSV files. One of the features I especially like is that every query from a littletable Table returns a new Table, so you don't have to learn different interfaces for Table vs. RecordSet, for instance. Tables are iterable like lists, but they can also be selected, indexed, joined, and pivoted - see the opening page of the docs.
# print a particular customer name
# (unique indexes will return a single item; non-unique
# indexes will return a Table of all matching items)
print(customers.by.id["0030"].name)
print(len(customers.by.zipcode["12345"]))

# print all items sold by the pound
for item in catalog.where(unitofmeas="LB"):
    print(item.sku, item.descr)

# print all items that cost more than 10
for item in catalog.where(lambda o: o.unitprice > 10):
    print(item.sku, item.descr, item.unitprice)

# join tables to create queryable wishlists collection
wishlists = customers.join_on("id") + wishitems.join_on("custid") + catalog.join_on("sku")

# print all wishlist items with price > 10
bigticketitems = wishlists().where(lambda ob: ob.unitprice > 10)
for item in bigticketitems:
    print(item)
Columns of Tables are inferred from the attributes of the objects added to the table. namedtuples work well, as do types.SimpleNamespace objects. You can insert dicts into a Table, and they will be converted to SimpleNamespaces.
littletable takes a little getting used to, but it sounds like you are already thinking along a similar line.
You can easily query an SQL database using a PHP script. PHP runs server-side, so all your code will have to be on a webserver (the one with the database). You could make a function to connect to the database like this:
$con = mysql_connect($hostname, $username, $password)
    or die("An error has occurred");
Then use the $con to accomplish other tasks such as looping through data and creating a table, or even adding rows and columns to an existing table.
EDIT: I noticed you said .CSV file. You can upload a CSV file into an SQL database and create a table out of it. If you are using a control panel service such as phpMyAdmin, you can simply import the CSV file into your database through its import feature.
If you are looking for a free web host to test your SQL and PHP files on, check out x10 hosting.

Django: How do I get every table and all of that table's columns in a project?

I'm creating a set of SQL full database copy scripts using MySQL's INTO OUTFILE and LOAD DATA LOCAL INFILE.
Specifically:
SELECT {columns} FROM {table} INTO OUTFILE '{table}.csv'
LOAD DATA LOCAL INFILE '{table}.csv' REPLACE INTO TABLE {table} ({columns})
Because of this, I don't need just the tables, I also need the columns for the tables.
I can get all of the tables and columns, but this doesn't include m2m tables:
from django.db.models import get_models

for model in get_models():
    table = model._meta.db_table
    columns = [field.column for field in model._meta.fields]
I can also get all of the tables, but this doesn't give me access to the columns:
from django.db import connection
tables = connection.introspection.table_names()
How do you get every table and every corresponding column on that table for a Django project?
More details:
I'm doing this on a reasonably large dataset (>1GB) so using the flat file method seems to be the only reasonable way to make this large of a copy in MySQL. I already have the schema copied over (using ./manage.py syncdb --migrate) and the issue I'm having is specifically with copying the data, which requires me to have the tables and columns to create proper SQL statements. Also, the reason I can't use default column ordering is because the production database I'm copying from has different column ordering than what is created with a fresh syncdb (due to many months worth of migrations and schema changes).
Have you taken a look at manage.py?
You can get boatloads of SQL information, for example to get all the create table syntax for an app within your project you can do:
python manage.py sqlall <appname>
If you type:
python manage.py help
You can see a ton of other features.
I dug in to the source to find this solution. I feel like there's probably a better way, but this does the trick.
This first block gets all of the normal (non-m2m) tables and their columns
from django.db import connection
from django.apps import apps

table_info = []
tables = connection.introspection.table_names()
seen_models = connection.introspection.installed_models(tables)

for model in apps.get_models():
    if model._meta.proxy:
        continue

    table = model._meta.db_table
    if table not in tables:
        continue

    columns = [field.column for field in model._meta.fields]
    table_info.append((table, columns))
This next block was the tricky part. It gets all the m2m field tables and their columns.
for model in apps.get_models():
    for field in model._meta.local_many_to_many:
        if not field.creates_table:
            continue

        table = field.m2m_db_table()
        if table not in tables:
            continue

        columns = ['id']  # They always have an id column
        columns.append(field.m2m_column_name())
        columns.append(field.m2m_reverse_name())
        table_info.append((table, columns))
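From there, a rough sketch of feeding table_info into the dump/load statements from the question (paths and quoting are placeholders, not a tested recipe):
for table, columns in table_info:
    cols = ", ".join(columns)
    print("SELECT {0} FROM {1} INTO OUTFILE '/tmp/{1}.csv';".format(cols, table))
    print("LOAD DATA LOCAL INFILE '/tmp/{1}.csv' REPLACE INTO TABLE {1} ({0});".format(cols, table))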
Have you looked into "manage.py dumpdata" and "manage.py loaddata"? They dump and load in json format. I use it to dump stuff from one site and overwrite another site's database. It doesn't have an "every database" option on dumpdata, but you can call it in a loop on the results of a "manage.py dbshell" command.
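For example, a typical round trip looks like this (the app name is a placeholder):
python manage.py dumpdata myapp > myapp.json
python manage.py loaddata myapp.json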

MySQL LOAD DATA LOCAL INFILE example in python?

I am looking for a syntax definition, example, sample code, wiki, etc. for
executing a LOAD DATA LOCAL INFILE command from python.
I believe I can use mysqlimport as well if that is available, so any feedback (and code snippet) on which is the better route is welcome. A Google search is not turning up much in the way of current info.
The goal in either case is the same: Automate loading hundreds of files with a known naming convention & date structure, into a single MySQL table.
David
Well, using python's MySQLdb, I use this:
connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**')
cursor = connection.cursor()
query = "LOAD DATA INFILE '/path/to/my/file' INTO TABLE sometable FIELDS TERMINATED BY ';' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
cursor.execute( query )
connection.commit()
replacing the host/user/passwd/db as appropriate for your needs. This is based on the MySQL docs here. The exact LOAD DATA INFILE statement will depend on your specific requirements, etc. (note that the FIELDS TERMINATED BY, ENCLOSED BY, and ESCAPED BY clauses will be specific to the type of file you are trying to read in).
You can also get the results for the import by adding the following lines after your query:
results = connection.info()
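To automate loading hundreds of files with a known naming convention, one rough sketch is below; the glob pattern, table name, and delimiters are assumptions to adapt, and using LOCAL may require local_infile to be enabled on both the client connection and the server:
import glob
import MySQLdb

connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**',
                             local_infile=1)
cursor = connection.cursor()

for path in sorted(glob.glob('/data/report_*.csv')):
    query = ("LOAD DATA LOCAL INFILE '%s' INTO TABLE sometable "
             "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'" % path)
    cursor.execute(query)

connection.commit()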

What's the most efficient way to insert thousands of records into a table (MySQL, Python, Django)

I have a database table with a unique string field and a couple of integer fields. The string field is usually 10-100 characters long.
Once every minute or so I have the following scenario: I receive a list of 2-10 thousand tuples corresponding to the table's record structure, e.g.
[("hello", 3, 4), ("cat", 5, 3), ...]
I need to insert all these tuples to the table (assume I verified neither of these strings appear in the database). For clarification, I'm using InnoDB, and I have an auto-incremental primary key for this table, the string is not the PK.
My code currently iterates through this list; for each tuple it creates a Django model object with the appropriate values and calls ".save()", something like so:
@transaction.commit_on_success
def save_data_elements(input_list):
    for (s, i1, i2) in input_list:
        entry = DataElement(string=s, number1=i1, number2=i2)
        entry.save()
This code is currently one of the performance bottlenecks in my system, so I'm looking for ways to optimize it.
For example, I could generate SQL statements, each containing an INSERT command for 100 tuples ("hard-coded" into the SQL), and execute them, but I don't know if it will improve anything.
Do you have any suggestion to optimize such a process?
Thanks
You can write the rows to a file in the format "field1", "field2", ... and then use LOAD DATA to load them:
data = '\n'.join(','.join('"%s"' % field for field in row) for row in data)

with open('data.txt', 'w') as f:
    f.write(data)
Then execute this:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
Reference
For MySQL specifically, the fastest way to load data is using LOAD DATA INFILE, so if you could convert the data into the format it expects, it'll probably be the fastest way to get it into the table.
If you don't use LOAD DATA INFILE, as some of the other suggestions mention, two things you can do to speed up your inserts are (see the sketch below):
Use prepared statements - this cuts out the overhead of parsing the SQL for every insert
Do all of your inserts in a single transaction - this would require using a DB engine that supports transactions (like InnoDB)
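A rough sketch of both ideas with MySQLdb is below; the table and column names are placeholders for whatever your schema actually uses, and MySQLdb's parameterized executemany() stands in for true server-side prepared statements:
import MySQLdb

connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**')
cursor = connection.cursor()

rows = [("hello", 3, 4), ("cat", 5, 3)]  # your 2-10 thousand tuples

# one parameterized statement executed over the whole batch,
# committed as a single transaction
cursor.executemany(
    "INSERT INTO myapp_dataelement (string, number1, number2) "
    "VALUES (%s, %s, %s)",
    rows)
connection.commit()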
If you can do a hand-rolled INSERT statement, then that's the way I'd go. A single INSERT statement with multiple value clauses is much, much faster than lots of individual INSERT statements (a sketch follows below).
Regardless of the insert method, you will want to use the InnoDB engine for maximum read/write concurrency. MyISAM will lock the entire table for the duration of the insert whereas InnoDB (under most circumstances) will only lock the affected rows, allowing SELECT statements to proceed.
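Here is a sketch of that hand-rolled multi-row INSERT, again with placeholder table and column names; the driver expands the placeholders client-side into one statement:
import MySQLdb

connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**')
cursor = connection.cursor()

rows = [("hello", 3, 4), ("cat", 5, 3)]

# one (%s, %s, %s) group per row, all in a single INSERT
placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
flat_params = [value for row in rows for value in row]

cursor.execute(
    "INSERT INTO myapp_dataelement (string, number1, number2) VALUES "
    + placeholders,
    flat_params)
connection.commit()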
What format do you receive the data in? If it is a file, you can do some sort of bulk load: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html
This is unrelated to the actual load of data into the DB, but...
If providing a "The data is loading... The load will be done shortly" type of message to the user is an option, then you can run the INSERTs or LOAD DATA asynchronously in a different thread.
Just something else to consider.
I do not know the exact details, but you can use a JSON-style data representation and use it as fixtures or something. I saw something similar in the Django Video Workshop by Douglas Napoleone. See the videos at http://www.linux-magazine.com/online/news/django_video_workshop and http://www.linux-magazine.com/online/features/django_reloaded_workshop_part_1. Hope this one helps.
Hope you can work it out. I just started learning Django, so I can only point you to resources.
