I created an sqlite3 database using Python to store data, as shown in the code below:
import sqlite3
conn = sqlite3.connect('tweets_data.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS tweets')
cur.execute('''
CREATE TABLE tweets (
id INTEGER PRIMARY KEY, created_at TEXT, full_text TEXT,
favourite_count INTEGER, retweet_count INTEGER)
''')
With this table, I want to store data from a JSON file (parts of it are screenshotted and attached as images), which I have loaded as seen below:
import json
with open('tweets.json') as f:
    data = json.load(f)
After that, I tried inserting the data into the table, using a for loop to pull out each unique tweet id and its associated information. The code below is what I tried:
for records in data:
    cur.execute('INSERT INTO tweets (id, created_at, full_text, favourite_count, retweet_count) VALUES (?, ?, ?, ?, ?)',
                (records['id'], records['created_at'], records['full_text'], records['user']['favourites_count'], records['retweet_count']))
conn.commit()
print(cur.fetchall())
conn.close()
However, when I ran print(cur.fetchall()), the output was only an empty list. It looks as if nothing was inserted into the table. Thank you if anybody is able to help!
json file page 1
json file page 2
cur.fetchall() returns the result of the last query, and INSERT yields no result. You need a SELECT query first:
cur.execute('SELECT * FROM tweets')
rows = cur.fetchall()
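Putting the two pieces together, a minimal sketch might look like the following. It uses an in-memory database and two made-up records in place of tweets_data.sqlite and tweets.json, so the sample data is an assumption for illustration only.

```python
import sqlite3

# In-memory database and sample records are stand-ins for the asker's
# tweets_data.sqlite and tweets.json (assumptions, not the real data).
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY, created_at TEXT, full_text TEXT,
        favourite_count INTEGER, retweet_count INTEGER)
''')

data = [
    {'id': 1, 'created_at': '2020-01-01', 'full_text': 'first',
     'user': {'favourites_count': 3}, 'retweet_count': 0},
    {'id': 2, 'created_at': '2020-01-02', 'full_text': 'second',
     'user': {'favourites_count': 5}, 'retweet_count': 7},
]

for record in data:
    cur.execute(
        'INSERT INTO tweets (id, created_at, full_text, favourite_count, retweet_count) '
        'VALUES (?, ?, ?, ?, ?)',
        (record['id'], record['created_at'], record['full_text'],
         record['user']['favourites_count'], record['retweet_count']))
conn.commit()

# An INSERT produces no result set, so fetching here gives an empty list.
after_insert = cur.fetchall()

# Run a SELECT first, then fetch its rows.
cur.execute('SELECT * FROM tweets')
rows = cur.fetchall()
print(rows)
conn.close()
```

The empty list in the question is therefore not evidence that the inserts failed; it only shows that the last statement executed on the cursor was not a query.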
I'm brand new to PostgreSQL, and to SQL in general.
I'm trying to create a table in a database via Python and then load data from a .csv file into the table.
My code looks like this:
import csv
import psycopg2
#Establish connection to database
con = psycopg2.connect(
host = "localhost",
database = "kundeavgang",
user = "postgres",
password = "postgres",
)
#Cursor
cur = con.cursor()
#If a mistake is made, start from scratch
cur.execute("DROP TABLE IF EXISTS kundeavgang")
#Create table
cur.execute('''
CREATE TABLE "kundeavgang"(
"customerID" TEXT,
"gender" TEXT,
"SeniorCitizen" TEXT,
"Partner" TEXT,
"Dependents" TEXT,
"tenure" INT,
"PhoneService" TEXT,
"MultipleLines" TEXT,
"InternetService" TEXT,
"OnlineSecurity" TEXT,
"DeviceProtection" TEXT,
"TechSupport" TEXT,
"StreamingMovies" TEXT,
"Contract" TEXT,
"PaperlessBilling" TEXT,
"PaymentMethod" TEXT,
"MonthlyCharges" FLOAT,
"TotalCharges" FLOAT,
"Churn" TEXT
)
''')
#Access .csv file
with open('kundeavgang.csv') as csvFile:
    reader = csv.reader(csvFile)
    skipHeader = next(reader) #Account for header
    for row in reader:
        customerID = row[0]
        gender = row[1]
        SeniorCitizen = row[2]
        Partner = row[3]
        Dependents = row[4]
        tenure = row[5]
        PhoneService = row[6]
        MultipleLines = row[7]
        InternetService = row[8]
        OnlineSecurity = row[9]
        OnlineBackup = row[10]
        DeviceProtection = row[11]
        TechSupport = row[12]
        StreamingTV = row[13]
        StreamingMovies = row[14]
        Contract = row[15]
        PaperlessBilling = row[16]
        PaymentMethod = row[17]
        MonthlyCharges = row[18]
        TotalCharges = row[19]
        Churn = row[20]
        cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
        VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
#Commit the transaction
con.commit()
#End connection
con.close()
In pgAdmin, the table comes up as existing in the database. However, I cannot find the actual table. Further, I have no idea about this line of code:
cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
What do all the %s stand for? I took the statement from an online example that was not very helpful, so I tried it without knowing what it means. I have also seen examples that use question marks instead, but again without explanation.
Lastly, as the code stands now, I get the error message:
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
IndexError: tuple index out of range
All help or explanations will be appreciated.
For bulk inserts from text files, consider psycopg2's copy_from or copy_expert. Also, be sure to commit your changes:
cur.execute("DROP TABLE IF EXISTS kundeavgang")
con.commit()
cur.execute('''CREATE TABLE "kundeavgang" ... ''')
con.commit()
with open('kundeavgang.csv') as csvFile:
    next(csvFile) # SKIP HEADERS
    cur.copy_from(csvFile, "kundeavgang", sep=",")
    # POSTGRES COPY COMMAND FOR CSV MODE
    # cur.copy_expert("""COPY "kundeavgang" FROM STDIN WITH CSV""", csvFile)
con.commit()
The %s are placeholders for the values that will be inserted and passed through the following tuple:
(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
The problem is that your INSERT statement targets 20 columns and your tuple provides 20 values, but the statement contains 22 placeholders (%s).
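A quick way to catch this kind of mismatch before the statement reaches the driver is to count the placeholders yourself. The helper below is a hypothetical sketch (check_placeholders is not part of psycopg2), and it naively counts every occurrence of %s, so it assumes the statement contains no literal %s elsewhere. The table and values are made up.

```python
def check_placeholders(stmt, params, marker='%s'):
    """Raise ValueError if the statement and the parameters disagree in count.

    Hypothetical helper: it simply counts occurrences of the marker text.
    """
    expected = stmt.count(marker)
    if expected != len(params):
        raise ValueError(
            'statement has {} placeholders but {} values were supplied'
            .format(expected, len(params)))


# Made-up statement with one placeholder too many for its parameters.
stmt = 'INSERT INTO demo (a, b, c) VALUES (%s, %s, %s, %s)'
params = ('x', 'y', 'z')

try:
    check_placeholders(stmt, params)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
print(mismatch)
```

With psycopg2, too many placeholders for the supplied tuple is what produces the "tuple index out of range" error quoted in the question.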
The problem is a mismatch between the number of columns to be populated and the length of the list provided. This is an easy mistake to make when dealing with a lot of columns. One way to reduce risk of error is to use the length of the columns or values list to build the statement.
cols = [name1, name2,...]
vals = [val1, val2, ...]
assert len(cols) == len(vals), 'mismatch between number of columns and number of values'
template = """INSERT INTO tbl ({}) VALUES ({})"""
stmt = template.format(', '.join(cols), ','.join(['%s'] * len(vals)))
cur.execute(stmt, vals)
Note that when building the column names dynamically it's good practice to quote them. psycopg2 provides tools for this in its psycopg2.sql module (for example sql.Identifier).
Change the line reader = csv.reader(csvFile) to:
reader = csv.reader(csvFile, delimiter=';')
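Whether this applies depends on how the file is actually separated, which the question does not show. As a small illustration with a made-up, semicolon-separated sample, the default reader returns each line as a single field, so indexing beyond row[0] fails:

```python
import csv
import io

# Made-up sample standing in for a semicolon-separated kundeavgang.csv.
sample = 'customerID;gender;tenure\n0001-ABCDE;Female;1\n'

# Default delimiter is a comma, so each line stays one field.
default_rows = list(csv.reader(io.StringIO(sample)))

# With the right delimiter the fields are split as expected.
semicolon_rows = list(csv.reader(io.StringIO(sample), delimiter=';'))

print(default_rows[1])
print(semicolon_rows[1])
```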
I have an Excel sheet that is to be inserted into a database. I wrote a Python script, which takes an Excel file, converts it into a CSV, and then inserts it to the database.
The problem is that the database contains two tables, where one of them has a unique ID which is auto generated and gets set when the data is inserted into the table. The other table uses this as a foreign key.
This is how my tables are created:
create table table (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    foo1 varchar(255),
    foo2 varchar(255),
    foo3 varchar(255),
    foo4 varchar(255)
);
create table another_table (
    id uuid PRIMARY KEY references table (id),
    foo1 varchar(255),
    foo2 varchar(255)
);
This is the code I use to insert the data into the database:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2) VALUES (%s, %s)",
            row
        )
conn.commit()
This will insert data into the database, but the ID field in another_table will be empty. Does anyone know how I can acquire this ID and put it into the second table?
I was able to solve this myself without many changes to my code. I first had to solve another problem: several values in the Excel file were null, but converting to CSV turned them into what looked like empty strings. By using pandas I was able to set all null values to "None", and then clean each row before inserting it into the database:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        clean_row = []
        for x in row:
            if x == "None":
                clean_row.append(None)
            else:
                clean_row.append(x)
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2) VALUES (%s, %s)",
            clean_row
        )
conn.commit()
The values from the CSV are now in a list, which I can use in my query to ask table for its id, like this:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        clean_row = []
        for x in row:
            if x == "None":
                clean_row.append(None)
            else:
                clean_row.append(x)
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2, id) VALUES (%s, %s, (SELECT id FROM table WHERE "
            "foo1 = '" + clean_row[0] + "' AND foo2 = '" + clean_row[1] + "'))",
            clean_row
        )
conn.commit()
This will acquire the ID and put it into another_table, and it works as long as you have unique values in table.
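An alternative that does not depend on unique values is to capture the generated key at insert time and pass it to the second insert. In PostgreSQL this is usually done by appending RETURNING id to the first INSERT and reading it with cur.fetchone()[0]. The sketch below shows the same pattern with SQLite as a stand-in so that it runs without a server: it uses cursor.lastrowid and an integer key instead of a uuid, and the parent/child table names and sample rows are hypothetical.

```python
import sqlite3

# SQLite stand-in for the two PostgreSQL tables; names and data are made up.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE parent (id INTEGER PRIMARY KEY, foo1 TEXT, foo2 TEXT)')
cur.execute('CREATE TABLE child ('
            'id INTEGER PRIMARY KEY REFERENCES parent (id), foo1 TEXT, foo2 TEXT)')

rows = [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h')]
for foo1, foo2, child_foo1, child_foo2 in rows:
    cur.execute('INSERT INTO parent (foo1, foo2) VALUES (?, ?)', (foo1, foo2))
    # Key generated for the row just inserted (RETURNING id in PostgreSQL).
    parent_id = cur.lastrowid
    cur.execute('INSERT INTO child (id, foo1, foo2) VALUES (?, ?, ?)',
                (parent_id, child_foo1, child_foo2))
conn.commit()

cur.execute('SELECT parent.id, child.id FROM parent '
            'JOIN child ON child.id = parent.id ORDER BY parent.id')
pairs = cur.fetchall()
print(pairs)
conn.close()
```

Passing the key as a bound parameter also avoids concatenating values into the SQL string.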
I got stuck trying to create a database in Python using sqlite3. Below is what I did. When I ran it, it kept telling me the tables already exist. I couldn't figure out why. Thanks!
import sqlite3
variables = (data)
functions = (data)
var_func = (data)
conn = sqlite3.connect('python_database')
c = conn.cursor()
#create table
c.execute(''' CREATE table variable_table (
id integer,
name text,
module text,
type text,
desc text) ''')
c.execute(''' CREATE table function_table (
id integer,
name text) ''')
c.execute(''' CREATE table var_func_table (
variable_id integer,
function_id integer,
type text) ''')
#fill tables with data
for row in variables:
    c.execute ('insert into variable_table values (?,?,?,?,?)', row )
for row in functions:
    c.execute ('insert into function_table values (?,?)', row)
for row in var_func:
    c.execute ('insert into var_func_table values (?,?,?)', row)
# Save (commit) the change
conn.commit()
conn.close()
I would like to dump only one table but by the looks of it, there is no parameter for this.
I found this example of the dump but it is for all the tables in the DB:
# Convert file existing_db.db to SQL dump file dump.sql
import sqlite3, os
con = sqlite3.connect('existing_db.db')
with open('dump.sql', 'w') as f:
    for line in con.iterdump():
        f.write('%s\n' % line)
You can copy just the single table into an in-memory database and dump that:
import sqlite3
def getTableDump(db_file, table_to_dump):
    conn = sqlite3.connect(':memory:')
    cu = conn.cursor()
    cu.execute("attach database '" + db_file + "' as attached_db")
    cu.execute("select sql from attached_db.sqlite_master "
               "where type='table' and name='" + table_to_dump + "'")
    sql_create_table = cu.fetchone()[0]
    cu.execute(sql_create_table)
    cu.execute("insert into " + table_to_dump +
               " select * from attached_db." + table_to_dump)
    conn.commit()
    cu.execute("detach database attached_db")
    return "\n".join(conn.iterdump())
TABLE_TO_DUMP = 'table_to_dump'
DB_FILE = 'db_file'
print(getTableDump(DB_FILE, TABLE_TO_DUMP))
Pro:
Simplicity and reliability: you don't have to re-write any library method, and you are more assured that the code is compatible with future versions of the sqlite3 module.
Con:
You need to load the whole table in memory, which may or may not be a big deal depending on how big the table is, and how much memory is available.
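To see the in-memory approach end to end, here is a self-contained sketch that builds a throwaway database with two tables and dumps only one of them. The name/phone tables and their rows are made up, and this variant binds the file name and the table name as parameters where SQLite allows it, which is my variation rather than the code above.

```python
import os
import sqlite3
import tempfile


def get_table_dump(db_file, table_to_dump):
    """Copy one table into an in-memory database and return its SQL dump."""
    conn = sqlite3.connect(':memory:')
    cu = conn.cursor()
    cu.execute('ATTACH DATABASE ? AS attached_db', (db_file,))
    cu.execute("SELECT sql FROM attached_db.sqlite_master "
               "WHERE type='table' AND name=?", (table_to_dump,))
    cu.execute(cu.fetchone()[0])  # Recreate the table in the in-memory db.
    # Identifiers cannot be bound as parameters, so the name is quoted instead.
    cu.execute('INSERT INTO "{0}" SELECT * FROM attached_db."{0}"'
               .format(table_to_dump))
    conn.commit()  # DETACH is not allowed inside an open transaction.
    cu.execute('DETACH DATABASE attached_db')
    dump = '\n'.join(conn.iterdump())
    conn.close()
    return dump


# Throwaway source database with two tables (made-up data).
db_file = os.path.join(tempfile.mkdtemp(), 'existing_db.db')
src = sqlite3.connect(db_file)
src.execute('CREATE TABLE name (id INTEGER, first TEXT)')
src.execute('CREATE TABLE phone (id INTEGER, number TEXT)')
src.execute("INSERT INTO name VALUES (1, 'John')")
src.execute("INSERT INTO phone VALUES (1, '111000')")
src.commit()
src.close()

dump = get_table_dump(db_file, 'phone')
print(dump)
```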
The dump implementation can be found here: http://coverage.livinglogic.de/Lib/sqlite3/dump.py.html (local path: PythonPath/Lib/sqlite3/dump.py)
You can modify it a little:
# Mimic the sqlite3 console shell's .dump command
# Author: Paul Kippes <kippesp#gmail.com>
def _iterdump(connection, table_name):
    """
    Returns an iterator to the dump of the database in an SQL text format.
    Used to produce an SQL dump of the database. Useful to save an in-memory
    database for later restoration. This function should not be called
    directly but instead called from the Connection method, iterdump().
    """
    cu = connection.cursor()
    table_name = table_name
    yield('BEGIN TRANSACTION;')
    # sqlite_master table contains the SQL CREATE statements for the database.
    q = """
        SELECT name, type, sql
        FROM sqlite_master
        WHERE sql NOT NULL AND
        type == 'table' AND
        name == :table_name
        """
    schema_res = cu.execute(q, {'table_name': table_name})
    for table_name, type, sql in schema_res.fetchall():
        if table_name == 'sqlite_sequence':
            yield('DELETE FROM sqlite_sequence;')
        elif table_name == 'sqlite_stat1':
            yield('ANALYZE sqlite_master;')
        elif table_name.startswith('sqlite_'):
            continue
        else:
            yield('%s;' % sql)
        # Build the insert statement for each row of the current table
        res = cu.execute("PRAGMA table_info('%s')" % table_name)
        column_names = [str(table_info[1]) for table_info in res.fetchall()]
        q = "SELECT 'INSERT INTO \"%(tbl_name)s\" VALUES("
        q += ",".join(["'||quote(" + col + ")||'" for col in column_names])
        q += ")' FROM '%(tbl_name)s'"
        query_res = cu.execute(q % {'tbl_name': table_name})
        for row in query_res:
            yield("%s;" % row[0])
    # Now when the type is 'index', 'trigger', or 'view'
    #q = """
    #    SELECT name, type, sql
    #    FROM sqlite_master
    #    WHERE sql NOT NULL AND
    #    type IN ('index', 'trigger', 'view')
    #    """
    #schema_res = cu.execute(q)
    #for name, type, sql in schema_res.fetchall():
    #    yield('%s;' % sql)
    yield('COMMIT;')
Now it accepts the table name as a second argument.
You can use it like this:
with open('dump.sql', 'w') as f:
    for line in _iterdump(con, 'GTS_vehicle'):
        f.write('%s\n' % line)
You will get something like:
BEGIN TRANSACTION;
CREATE TABLE "GTS_vehicle" ("id" integer NOT NULL PRIMARY KEY, "name" varchar(20) NOT NULL, "company_id" integer NULL, "license_plate" varchar(20) NULL, "icon" varchar(100) NOT NULL DEFAULT 'baseicon.png', "car_brand" varchar(30) NULL, "content_type_id" integer NULL, "modemID" varchar(100) NULL, "distance" integer NULL, "max_speed" integer NULL DEFAULT 100, "max_rpm" integer NULL DEFAULT 4000, "fuel_tank_volume" integer NULL DEFAULT 70, "max_battery_voltage" integer NULL, "creation_date" datetime NOT NULL, "last_RFID" text NULL);
INSERT INTO "GTS_vehicle" VALUES(1,'lan1_op1_car1',1,'03115','baseicon.png','UFP',16,'lan_op1_car1',NULL,100,4000,70,12,'2011-06-23 11:54:32.395000',NULL);
INSERT INTO "GTS_vehicle" VALUES(2,'lang_op1_car2',1,'03','baseicon.png','ыва',16,'lan_op1_car2',NULL,100,4000,70,12,'2011-06-23 11:55:02.372000',NULL);
INSERT INTO "GTS_vehicle" VALUES(3,'lang_sup_car1',1,'0000','baseicon.png','Fiat',16,'lan_sup_car1',NULL,100,4000,70,12,'2011-06-23 12:32:09.017000',NULL);
INSERT INTO "GTS_vehicle" VALUES(4,'lang_sup_car2',1,'123','baseicon.png','ЗАЗ',16,'lan_sup_car2',NULL,100,4000,70,12,'2011-06-23 12:31:38.108000',NULL);
INSERT INTO "GTS_vehicle" VALUES(9,'lang_op2_car1',1,'','baseicon.png','',16,'1233211234',NULL,100,4000,70,12,'2011-07-05 13:32:09.865000',NULL);
INSERT INTO "GTS_vehicle" VALUES(11,'Big RIder',1,'','baseicon.png','0311523',16,'111',NULL,100,4000,70,20,'2011-07-07 12:12:40.358000',NULL);
COMMIT;
With iterdump(), the rows of every table are emitted like this:
INSERT INTO "name" VALUES(1, 'John')
INSERT INTO "name" VALUES(2, 'Jane')
INSERT INTO "phone" VALUES(1, '111000')
INSERT INTO "phone" VALUES(2, '111001')
An easy way is to filter the lines for certain keywords with the str.startswith() method.
For example, if the table name is 'phone':
# Convert file existing_db.db to SQL dump file dump.sql
import sqlite3, os
con = sqlite3.connect('existing_db.db')
with open('dump.sql', 'w') as f:
    for line in con.iterdump():
        if line.startswith('INSERT INTO "phone"'):
            f.write('%s\n' % line)
Not very smart, but it can meet your objective.
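Note that filtering on the INSERT prefix alone leaves out the table's CREATE TABLE statement. If the dump should be restorable on its own, the filter can accept a tuple of prefixes, as in the sketch below. The tables and rows are made up, and the CREATE TABLE prefixes are a simple text match, so they would also match another table whose name merely starts with "phone".

```python
import sqlite3

# In-memory database with two made-up tables standing in for existing_db.db.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE name (id INTEGER, first TEXT)')
con.execute('CREATE TABLE phone (id INTEGER, number TEXT)')
con.execute("INSERT INTO name VALUES (1, 'John')")
con.execute("INSERT INTO phone VALUES (1, '111000')")
con.commit()

table = 'phone'
# str.startswith() accepts a tuple, so several prefixes can be tested at once.
prefixes = (
    'INSERT INTO "%s"' % table,
    'CREATE TABLE %s' % table,      # Unquoted form of the CREATE statement.
    'CREATE TABLE "%s"' % table,    # Quoted form of the CREATE statement.
)
lines = [line for line in con.iterdump() if line.startswith(prefixes)]
print('\n'.join(lines))
con.close()
```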