Error when importing CSV to postgres with python and psycopg2 - python

I am trying to COPY a CSV file from a folder into a Postgres table using Python and psycopg2, and I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
psycopg2.ProgrammingError: must be superuser to COPY to or from a file
HINT: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
I also tried to run it through the python environment as:
constr = "dbname='db_name' user='user' host='localhost' password='pass'"
conn = psycopg2.connect(constr)
cur = conn.cursor()
sqlstr = "COPY test_2 FROM '/tmp/tmpJopiUG/downloaded_xls.csv' DELIMITER ',' CSV;"
cur.execute(sqlstr)
I still get the same error. I tried the \copy command, but it only works in psql. What alternative would let me run this from my Python script?
EDITED
After looking at the link provided by @Ilja Everilä, I tried this:
cur.copy_from('/tmp/tmpJopiUG/downloaded_xls.csv', 'test_copy')
I get an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument 1 must have both .read() and .readline() methods
How do I provide an object that has these methods?

Try using cursor.copy_expert():
constr = "dbname='db_name' user='user' host='localhost' password='pass'"
conn = psycopg2.connect(constr)
cur = conn.cursor()
sqlstr = "COPY test_2 FROM STDIN DELIMITER ',' CSV"
with open('/tmp/tmpJopiUG/downloaded_xls.csv') as f:
    cur.copy_expert(sqlstr, f)
conn.commit()
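You have to open the file in Python and pass the file object to psycopg2, which then streams it to Postgres as the STDIN of the COPY statement. This is also why copy_from() raised the TypeError: its first argument must be an open file-like object, not a path string. Because you are using the CSV option of COPY, you have to use copy_expert(), where you supply the full COPY statement yourself.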
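The file object does not have to come from disk: anything with .read() and .readline() should work. Below is a minimal sketch, assuming the rows are already in memory. The helper names rows_to_csv_buffer and copy_rows are hypothetical, and the table name is interpolated directly, so it must come from trusted code.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize rows as CSV into an in-memory, file-like buffer."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


def copy_rows(cur, rows, table="test_2"):
    """Send rows to Postgres through COPY ... FROM STDIN (hypothetical helper).

    `cur` is expected to be a psycopg2 cursor; `table` must be a trusted
    identifier because it is formatted straight into the statement.
    """
    sqlstr = "COPY {} FROM STDIN DELIMITER ',' CSV".format(table)
    cur.copy_expert(sqlstr, rows_to_csv_buffer(rows))
```

With a real connection you would call copy_rows(cur, rows) and then conn.commit(), exactly as in the answer above.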

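You can also use copy_from(), passing an open file object rather than a path:
with open('/tmp/tmpJopiUG/downloaded_xls.csv') as f:
    cur.copy_from(f, table_name, sep=',')
conn.commit()
Note that copy_from() uses COPY's text format rather than CSV, so this only works if no field is quoted or contains an embedded comma.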

Related

Running impala-shell from jupyter notebook

I am trying to connect to Impala from a Jupyter notebook as follows:
from impala.dbapi import connect
conn = connect(host='xx-xx-xx.xx.com',
               port=21000,
               auth_mechanism="PLAIN",
               user='xxxx',
               password='xxxx',
               )
cursor = conn.cursor()
But I am getting this error:
TTransportException Traceback (most recent call last)
<ipython-input-28-6c858acffc1b> in <module>
.
.
TTransportException: Bad status: 3 (b'Unsupported mechanism type PLAIN')
After trying many things that did not work, I tried running the impala-shell command from Python with subprocess.run, but I get no output (return code 1):
r = subprocess.run(['impala-shell', '-q', "select xxx...xxx"],
                   stdout=subprocess.PIPE)
print(r.stdout.decode()) # returncode=1
Running !impala-shell in Jupyter also gives this error:
File "/opt/xxxx/xxxx/xxxx/impala-shell/impala_shell.py", line 262
print "Query options (defaults shown in []):"
^
SyntaxError: invalid syntax
Can anyone let me know the problem and guide me on how to resolve this?

Python Package and Methods not Importing

I built a simple class with a couple methods to make my life a little easier when loading data into Postgres with Python. I also attempted to package it so I could pip install it (just to experiment, never done that before).
import io

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

class py_psql:
    engine = None

    def engine(self, username, password, hostname, port, database):
        # build the SQLAlchemy URL: user:password@host:port/database
        connection = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(username.lower(), password, hostname, port, database)
        self.engine = create_engine(connection)

    def query(self, query):
        pg_eng = self.engine
        return pd.read_sql_query(query, pg_eng)

    def write(self, write_name, df, if_exists='replace', index=False):
        # DataFrame size in megabytes
        mem_size = df.memory_usage().sum()/1024**2
        pg_eng = self.engine

        def write_data():
            # create the (empty) table first, then bulk-load the rows with COPY
            df.head(0).to_sql(write_name, pg_eng, if_exists=if_exists, index=index)
            conn = pg_eng.raw_connection()
            cur = conn.cursor()
            output = io.StringIO()
            df.to_csv(output, sep='\t', header=False, index=False)
            output.seek(0)
            contents = output.getvalue()
            cur.copy_from(output, write_name, null="")
            conn.commit()

        if mem_size > 100:
            validate_size = input('DataFrame is {}mb, proceed anyway? (y/n): '.format(mem_size))
            if validate_size == 'y':
                write_data()
            else:
                print("Canceling write to database")
        else:
            write_data()
My package directory looks like this:
py_psql
    py_psql.py
    __init__.py
setup.py
My __init__.py is empty, since I read elsewhere that this is allowed. I'm not remotely an expert here...
I was able to pip install that package and import it, and if I were to paste this class into a python shell, I would be able to do something like
test = py_psql()
test.engine(ntid, pw, hostname, port, database)
and have it create the sqlalchemy engine. However, when I import it after the pip install I can't even initialize a py_psql object:
>>> test = py_psql()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
>>> py_psql.engine(ntid, pw, hostname, port, database)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'py_psql' has no attribute 'engine'
I'm sure I'm messing up something obvious here, but I found the process of packaging fairly confusing while researching this. What am I doing incorrectly?
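
Are you sure you imported the class correctly after the pip install? With your layout, import py_psql gives you the package (a module object), not the class defined in py_psql/py_psql.py, which is why calling it raises TypeError: 'module' object is not callable. Import the class from the submodule instead.
For example: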
from py_psql.py_psql import py_psql
test = py_psql()
test.engine(ntid, pw, hostname, port, database)
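The difference between the package and the class inside it can be reproduced without pip. The following is a small sketch that builds a throwaway package on disk; the name demo_pkg and its hello() method are made up for illustration and mirror the py_psql/py_psql.py layout from the question.

```python
import importlib
import os
import sys
import tempfile

# Build demo_pkg/__init__.py (empty) and demo_pkg/demo_pkg.py (holds the class).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "demo_pkg.py"), "w") as f:
    f.write("class demo_pkg:\n    def hello(self):\n        return 'hi'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg  # binds the package (a module object), not the class

try:
    demo_pkg()  # same mistake as calling py_psql() in the question
except TypeError as exc:
    error_message = str(exc)

# Importing the class from the submodule works.
from demo_pkg.demo_pkg import demo_pkg as DemoClass
greeting = DemoClass().hello()
```

If you would rather write from py_psql import py_psql, one common option is to put from .py_psql import py_psql in __init__.py instead of leaving it empty.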

I get NotImplementedError when trying to do a prepared statement with mysql python connector

I want to use prepared statements to insert data into a MySQL DB (version 5.7) using python, but I keep getting a NotImplementedError.
I'm following the documentation here: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html
Using Python 2.7 and version 8.0.11 of mysql-connector-python library:
pip show mysql-connector-python
---
Metadata-Version: 2.1
Name: mysql-connector-python
Version: 8.0.11
Summary: MySQL driver written in Python
Home-page: http://dev.mysql.com/doc/connector-python/en/index.html
This is a cleaned version (no specific hostname, username, password, columns, or tables) of the python script I'm running:
import mysql.connector
from mysql.connector.cursor import MySQLCursorPrepared
connection = mysql.connector.connect(user=username, password=password,
                                     host='sql_server_host',
                                     database='dbname')
print('Connected! getting cursor')
cursor = connection.cursor(cursor_class=MySQLCursorPrepared)
select = "SELECT * FROM table_name WHERE column1 = ?"
param = 'param1'
print('Executing statement')
cursor.execute(select, (param,))
rows = cursor.fetchall()
for row in rows:
    value = row.column1
    print('value: '+ value)
I get this error when I run this:
Traceback (most recent call last):
File "test.py", line 18, in <module>
cursor.execute(select, (param,))
File "/home/user/.local/lib/python2.7/site-packages/mysql/connector/cursor.py", line 1186, in execute
self._prepared = self._connection.cmd_stmt_prepare(operation)
File "/home/user/.local/lib/python2.7/site-packages/mysql/connector/abstracts.py", line 969, in cmd_stmt_prepare
raise NotImplementedError
NotImplementedError

The C extension (CEXT) is used by default when it is installed, and prepared statements are not supported in CEXT at the time of writing.
You can disable CEXT when you connect by adding the keyword argument use_pure=True, as follows:
connection = mysql.connector.connect(user=username, password=password,
                                     host='sql_server_host',
                                     database='dbname',
                                     use_pure=True)
Support for prepared statements in CEXT will be included in the upcoming mysql-connector-python 8.0.17 release (according to the MySQL bug report). So once that is available, upgrade to at least 8.0.17 to solve this without needing use_pure=True.
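If the same script has to run against several connector versions, the choice could be made from the version number. This is only a sketch: the helper names parse_version and connect_kwargs are hypothetical, the 8.0.17 threshold is taken from the bug report mentioned above rather than verified, and you are assumed to pass in the installed version string yourself (for example the one shown by pip show).

```python
def parse_version(version):
    """Turn '8.0.11' into (8, 0, 11) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.')[:3])


def connect_kwargs(base_kwargs, connector_version, cext_prepared_since='8.0.17'):
    """Return connect() keyword arguments, adding use_pure=True for old releases.

    cext_prepared_since is an assumption based on the bug report cited above.
    """
    kwargs = dict(base_kwargs)  # copy, so the caller's dict is not modified
    if parse_version(connector_version) < parse_version(cext_prepared_since):
        kwargs['use_pure'] = True
    return kwargs


kwargs_old = connect_kwargs({'database': 'dbname'}, '8.0.11')
kwargs_new = connect_kwargs({'database': 'dbname'}, '8.0.17')
```

The result would then be passed as mysql.connector.connect(**kwargs_old). Comparing tuples of integers rather than raw strings matters here, because '8.0.9' sorts after '8.0.17' as text.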

input() causes unexpected EOF SyntaxError

I have written a return function for my group project.
I am using python 3.4 and wrote this:
import sqlite3

def readrouter(x, y):
    conn = sqlite3.connect('server.db')
    cur = conn.cursor()
    cur.execute("SELECT DISTINCT command FROM router WHERE "
                "function = ? or type = ? ORDER BY key ASC", (x, y))
    read = cur.fetchall()
    return read

a = input("x:")
b = input("y:")
for result in readrouter(a, b):
    print(result[0])
The main member of my group is using Python 2.7, so I now need to follow his version.
When I run my .py file under Python 2.7,
I get an error:
x:create vlan
Traceback (most recent call last):
File "C:/Users/f0449492/Desktop/2015225/database.py", line 322, in <module>
a = input("x")
File "<string>", line 1
create vlan
^
SyntaxError: unexpected EOF while parsing
Process finished with exit code 1
How do I fix this bug?

In Python 2.7, replace input() with raw_input().
The former runs eval() on the input string and expects a valid Python expression. Your input, create vlan, isn't valid Python and can't be evaluated. The latter just returns the string with no further processing.
As a follow-up: to stay compatible with both Python 2 and Python 3, you can use the six library (six.moves.input).
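If you would rather not add a dependency, a small shim does the same job. This is a sketch; the name read_line is made up for illustration.

```python
try:
    read_line = raw_input  # Python 2: returns the typed text unchanged
except NameError:
    read_line = input  # Python 3: input() already behaves like raw_input()
```

The script would then call a = read_line("x:") and b = read_line("y:") on either version.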

Error while importing file into DB2 from python script

I am getting the error below while trying to import a ^-delimited file into a DB2 database using Python 2.4.3.
Error:
Traceback (most recent call last):
File "C:\Python25\Usefulscripts\order.py", line 89, in <module>
load_order_stack()
File "C:\Python25\Usefulscripts\order.py", line 75, in load_order_stack
conn2.execute(importTmp)
ProgrammingError: ('42601', '[42601] [IBM][CLI Driver][DB2/LINUXX8664] SQL0104N An unexpected token "orders_extract"
was found following "import from ".
Code:
import pyodbc

def load_order_stack():
    conn2 = pyodbc.connect('DSN=db2Database;UID=userid;PWD=password')
    # adjacent string literals are concatenated, so the first needs a trailing space
    importTmp = ("import from orders_extract of del modified by coldel0x5E "
                 "insert_update into test.ORDERS_Table (ORDER_ID,item,price);")
    conn2.execute(importTmp)
    conn2.commit()

IMPORT is not an SQL statement. It is a DB2 Command Line Processor (CLP) command and, as such, can only be run by the CLP.
There is an SQL interface to some CLP commands through the ADMIN_CMD() stored procedure; see the manual section "IMPORT using ADMIN_CMD".
You also have the option of reading the file line by line and inserting each row into your database. This will definitely be slower than any native import operation. Assuming the file is named input.txt and is structured as follows:
ORDER_ID^item^price
1^'bat'^50.00
2^'ball'^25.00
Code:
import csv
import pyodbc

connection = pyodbc.connect('DSN=db2Database;UID=userid;PWD=password')
cursor = connection.cursor()
# 'rb' is the Python 2 idiom for csv; on Python 3 use open('input.txt', newline='')
with open('input.txt', 'rb') as f:
    rows = csv.reader(f, delimiter='^')
    # get column names from header in first line
    columns = ','.join(next(rows))
    for row in rows:
        # build sql with placeholders for insert (table name taken from the question)
        placeholders = ','.join('?' * len(row))
        sql = 'insert into test.ORDERS_Table ({}) values ({});'.format(columns, placeholders)
        # execute parameterized database insert
        cursor.execute(sql, row)
        cursor.commit()
Experiment with where you call commit(); you will probably want to commit in batches to improve performance.
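Committing in batches could look roughly like the sketch below. To keep it runnable without a DB2 server, it swaps in the standard-library sqlite3 module for pyodbc; the function name load_in_batches, the orders table, the batch size, and the third sample row are all assumptions made for illustration. It also sets quotechar="'" so the single quotes in the sample data are stripped.

```python
import csv
import io
import sqlite3


def load_in_batches(conn, lines, table, batch_size=2):
    """Insert '^'-delimited rows into `table`, committing once per batch."""
    reader = csv.reader(lines, delimiter='^', quotechar="'")
    columns = next(reader)  # header row supplies the column names
    sql = 'insert into {} ({}) values ({})'.format(
        table, ','.join(columns), ','.join('?' * len(columns)))
    cur = conn.cursor()
    batch, commits = [], 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany(sql, batch)
            conn.commit()
            commits += 1
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(sql, batch)
        conn.commit()
        commits += 1
    return commits


# In-memory stand-in for the real database and input.txt.
conn = sqlite3.connect(':memory:')
conn.execute('create table orders (ORDER_ID integer, item text, price real)')
sample = io.StringIO(
    "ORDER_ID^item^price\n1^'bat'^50.00\n2^'ball'^25.00\n3^'glove'^12.50\n")
commit_count = load_in_batches(conn, sample, 'orders')
stored = conn.execute(
    'select ORDER_ID, item, price from orders order by ORDER_ID').fetchall()
```

With pyodbc the structure should carry over unchanged, since its cursors also provide executemany(); a larger batch_size (hundreds or thousands of rows) is more typical for real loads.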
