I'm losing my sanity over this error. I've uploaded dozens of tables before, but this one keeps giving me the error in the title.
I have a DataFrame with 11 columns and an SQL table already set up for it. All the column names match.
df_rates = df_rates.replace('\t', '', regex=True)  # strip stray tabs so they can't collide with the tab delimiter
data_to_upload_output = io.StringIO()  # create in-memory object to store the csv output
df_rates.to_csv(data_to_upload_output, sep='\t', header=False, index=False,
                date_format='%Y-%m-%d')  # write df_rates as tab-separated text
data_to_upload_output.seek(0)  # return to start of the buffer

conn = psycopg2.connect(host='xxxxx-xxx-x-x',
                        dbname='xxxx',
                        user=uid,
                        password=pwd,
                        port=xxxx,
                        options="-c search_path=dbo,development")

db_table = 'sandbox.gm_dt_input_dist_rates'

with conn:
    with conn.cursor() as cur:
        cur.copy_from(data_to_upload_output, db_table, null='',
                      columns=df_rates.columns)  # null values become '', columns should be lowercase, at least for PostgreSQL
    conn.commit()
conn.close()
The error goes on to say:
CONTEXT: COPY gm_dt_input_dist_rates, line 43:
"IE00B44CGS96 USD 0.9088 0.9088 10323906 97.2815 97.2815 2022-05-12 2022-05-11 cfsad 2022-05-20"
Which makes me think the '\t' hasn't been recognized. But this same code works perfectly for all the other tables I'm uploading. I've checked posts with the same error, but I couldn't find a way to apply their solutions to what I'm experiencing.
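For what it's worth, a quick sanity check I could run (illustrative snippet, not part of the script above) is to pull line 43 out of the buffer and count its tab-separated fields against the 11 table columns:
data_to_upload_output.seek(0)
lines = data_to_upload_output.getvalue().splitlines()
suspect = lines[42]  # COPY reports line 43; the buffer is 0-indexed
print(repr(suspect))  # any literal tab left in the data shows up as \t
print(len(suspect.split('\t')), 'fields')  # should print 11
data_to_upload_output.seek(0)  # rewind again before calling copy_from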
Thanks for your help!
It is much appreciated, have a great weekend!
I'm using pyodbc and SQLAlchemy to insert data into a table in SQL Server, and I'm getting this error.
https://i.stack.imgur.com/miSp9.png
Here are snippets of the functions I use.
This is the function I use to connect to the SQL server (using fast_executemany)
def connect(server, database):
    global cnxn_str, cnxn, cur, quoted, engine
    cnxn_str = ("Driver={SQL Server Native Client 11.0};"
                "Server=<server>;"
                "Database=<database>;"
                "UID=<user>;"
                "PWD=<password>;")
    cnxn = pyodbc.connect(cnxn_str)
    cur = cnxn.cursor()
    cur.fast_executemany = True
    quoted = quote_plus(cnxn_str)
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted),
                           fast_executemany=True)
And this is the function I'm using to query and insert the data into the SQL server
def insert_to_sql_server():
    global df, np_array
    # DataFrame df is built from a numpy array with dtype=object
    df = pd.DataFrame(np_array[1:,], columns=np_array[0])
    # add new columns, data processing
    df['comp_key'] = df['col1'] + "-" + df['col2'].astype(str)
    df['comp_key2'] = df['col3'] + "-" + df['col4'].astype(str) + "-" + df['col5'].astype(str)
    df['comp_statusID'] = df['col6'] + "-" + df['col7'].astype(str)
    convert_dict = {'col1': 'string', 'col2': 'string', ..., 'col_n': 'string'}
    # convert data types of columns from object to string
    df = df.astype(convert_dict)
    connect(<server>, <database>)
    cur.rollback()
    # Delete old records
    cur.execute("DELETE FROM <table>")
    cur.commit()
    # Insert dataframe into table
    df.to_sql(<table name>, engine, index=False,
              if_exists='replace', schema='dbo', chunksize=1000, method='multi')
The insert function runs for about 30 minutes before finally returning the error message.
I encountered no errors when doing it with a smaller df size. The current df size I have is 27963 rows and 9 columns. One thing which I think contributes to the error is the length of the string. By default the numpy array is dtype='<U25', but I had to override this to dtype='object' because it was truncating the text data.
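To illustrate the truncation (a standalone toy example, not from my pipeline): a fixed-width unicode dtype silently cuts longer strings, which is why I had to switch to dtype='object':
import numpy as np

a = np.array(['x' * 40], dtype='<U25')   # fixed-width unicode dtype
print(len(a[0]))                         # 25 -- the string is silently cut to 25 characters
b = np.array(['x' * 40], dtype=object)   # object dtype keeps the full Python string
print(len(b[0]))                         # 40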
I'm out of ideas because it seems like the error is referring to limitations of either Pandas or the SQL Server, which I'm not familiar with.
Thanks
Thanks for all the input (still new here)! I accidentally stumbled upon the solution, which was to reduce the chunksize in df.to_sql from
df.to_sql(chunksize=1000)
to
df.to_sql(chunksize=200)
After some digging, it turns out this comes from a limitation of SQL Server (https://discuss.dizzycoding.com/to_sql-pyodbc-count-field-incorrect-or-syntax-error/).
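For reference (hedged, since I'm inferring from the linked post): SQL Server caps a single statement at 2100 parameters, and with method='multi' each row contributes one parameter per column, so a safe chunksize can be computed rather than guessed. A rough sketch:
# Rough sketch, assuming the 2100-parameters-per-statement limit of SQL Server.
# With method='multi', parameters per INSERT = chunksize * number of columns.
MAX_PARAMS = 2100
safe_chunksize = MAX_PARAMS // len(df.columns) - 1   # leave a little headroom

df.to_sql(<table name>, engine, index=False, if_exists='replace',
          schema='dbo', chunksize=safe_chunksize, method='multi')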
In my case, I had the same "Output exceeds the size limit" error, and I fixed it by adding method='multi' to df.to_sql().
First I tried the chunksize solution and it didn't work.
So check for that if you're in the same scenario!
with engine.connect().execution_options(autocommit=True) as conn:
    df.to_sql('mytable', con=conn, method='multi', if_exists='replace', index=True)
I'm using Python 3, SQLAlchemy, and a MariaDB server.
I'm getting data from a REST server in JSON format, parsing it to a dictionary and then to a pandas DataFrame.
The error I'm getting occurs when I don't save the DataFrame to CSV and then reload it, like this:
df.to_csv("temp_save.csv", index=False)
df = pd.read_csv("temp_save.csv")
When the previous lines are commented out, I get the following error:
(pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '), (), (), 0, '2022-01-26T17:32:49Z', 29101, 1, 3, 2, '2022-01-25T17:32:49Z', '2' at line 1")
[SQL: INSERT INTO `TicketRequesters` (subject, group_id, department_id, category, sub_category, item_category, requester_id, responder_id, due_by, fr_escalated, deleted, spam, email_config_id, fwd_emails, reply_cc_emails, cc_emails, is_escalated, fr_due_by, id, priority, status .....
VALUES (%(subject_m0)s, %(group_id_m0)s, %(department_id_m0)s, %(category_m0)s, %(sub_category_m0)s, %(item_category_m0)s, %(requester_id_m0)s, %(responder_id_m0)s, %(due_by_m0)s, %(fr_escalated_m0)s, %(deleted_m0)s, %(spam_m0)s, %(email_config_id_m0)s, %(fwd_emails_m0)s, %(reply_cc_emails_m0)s, %(cc_emails_m0)s, %(is_escalated_m0)s, %(fr_due_by_m0)s, %(id_m0)s, %(priority_m0)s, %(status_m0)s, %(source_m0)s, %(created_at_m0)s, %(updated_at_m0)s, %(requested_for_id_m0)s, %(to_emails_m0)s, %(type_m0)s, %(description_text_m0)s, %(custom_fields_localidad_m0)s, %(custom_fields_hora_de_la_falla_m0)s, %(custom_fields_hubo_alguna_modificacin_en_el_firewall_o_en_su_pl_m0)s, %(custom_fields_el_incidente_presentado_corresponde_a_m0)s, %(custom_fields_client_type_m0)s, %(custom_fields_riesgos_del_cambio_o_caso_m0)s, %(custom_fields_solucin_del_caso_m0)s, %(custom_fields_estado_de_cierre_m0)s, %(custom_fields_numero_de_oportunidad_m0)s, %(custom_fields_cuales_son_sus_servicios_afectados_especificar_si_m0)s, %(custom_fields_numero_de_ticket_de_cambio_m0)s, %(custom_fields_cantidad_estimada_de_personas_o_departamentos_afe_m0)s, %(cu.....
As shown, "_m0" gets appended to each %()s placeholder in VALUES; I noticed the number grows up to the number of rows I'm trying to upsert.
%(stats_created_at_m29)s, %(stats_updated_at_m29)s, %(stats_ticket_id_m29)s, %(stats_opened_at_m29)s, %(stats_group_escalated_m29)s, %(stats_inbound_count_m29)s, %(stats_status_updated_at_m29)s, %(stats_outbound_count_m29)s, %(stats_pending_since_m29)s, %(stats_resolved_at_m29)s, %(stats_closed_at_m29)s, %(stats_first_assigned_at_m29)s, %(stats_assigned_at_m29)s, %(stats_agent_responded_at_m29)s, %(stats_requester_responded_at_m29)s, %(stats_first_responded_at_m29)s, %(stats_first_resp_time_in_secs_m29)s, %(stats_resolution_time_in_secs_m29)s, %(description_m29)s, %
This is the Python code I'm using, just in case:
engine = db.create_engine(
    f"mariadb+pymysql://{user}:{password}@{host}/{database_name}?charset=utf8mb4"
)
columndict: dict = {"id": Column("id", Integer, primary_key=True)}
# Prepare column list; take the Column object from columndict if it exists, else default to String(256)
column_list = [columndict.get(name, Column(name, String(256))) for name in columns]
# Get an instance of the table
# TODO: Instance table from metadata (without having to give it columns)
instanceTable = Table(table_name, metadata, *column_list)
metadata.create_all(engine)
# Schema created
# Create connection
conn = engine.connect()
# Prepare statement
insert_stmt = db.dialects.mysql.insert(instanceTable).values(values)
on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted, status="U"
)
# Execute statement
result = conn.execute(on_duplicate_key_stmt)
# DATA UPSERTED
I investigated the limitations of MySQL/MariaDB regarding UTF-8 encoding, and the correct setting is ?charset=utf8mb4; this might be related to the query construction issue.
EDIT: I found a fix for this error: replace empty lists and empty strings in the DataFrame with None.
The problem was caused by sending empty lists [] and empty strings '' in the SQLAlchemy values.
It was fixed by replacing those items with None, roughly as sketched below.
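A minimal sketch of that replacement (the helper name and the values construction are illustrative; only the empty-to-None mapping is the actual fix):
def empty_to_none(v):
    # map empty lists and empty strings to None so SQLAlchemy binds NULL
    if isinstance(v, list) and not v:
        return None
    if isinstance(v, str) and v == '':
        return None
    return v

df = df.applymap(empty_to_none)          # newer pandas also offers df.map(empty_to_none)
values = df.to_dict(orient='records')    # list of row dicts for insert_stmt.values(values)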
I am a newbie with sqlite3 (Python user).
I have stored data in a database with the method below before, but it is not working this time. The INSERT INTO executes, no error is reported, but nothing is stored in the db.
I searched this topic a lot on this website, mostly posts about "no commit". But I am pretty sure I commit and close the connection correctly. After the INSERT, I can find table f65 in my db with its header (column names), but no data (it should be 10 columns x 4k rows).
Key code snippet below:
df = pd.DataFrame(pd.read_excel(r'/Users/' + username + r'/Documents/working/addf6-5.xlsx', header=0))
df = df.replace('#NUM!', '')
value = chain(reversed(grouplist2), df.values.tolist())
for x in value:
    x[4] = str(x[4])
    x[7] = str(x[7])
conn = sqlite3.connect('veradb.db')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS f65")
conn.commit()
c.close()
conn.close()
conn = sqlite3.connect('veradb.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS f65 ('offer', 'code', 'desc', 'engine', 'cost', 'supplier', 'remark', 'podate', 'empty', 'border')")
c.executemany("INSERT INTO f65 VALUES (?,?,?,?,?,?,?,?,?,?)", value)
conn.commit()
c.close()
conn.close()
More details for explanation:
The cached data is in addf6-5.xlsx; I fetched new data into grouplist2 (it is a list) and used chain(reversed(grouplist2), df.values.tolist()) to combine the data.
Why did I use
for x in value:
    x[4] = str(x[4])
    x[7] = str(x[7])
Because PyCharm reported "Error binding parameter 4 - probably unsupported type." and "Error binding parameter 7 - probably unsupported type." So I convert them with str() again, even though they should be TEXT by default according to SQLite.
I tried to test "value" via
for x in value:
    x[4] = str(x[4])
    x[7] = str(x[7])
    print(x)
and found the rows could be printed correctly. Maybe this proves value (a list of lists) is correct?
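For reference, one variant I could try (purely illustrative) is to materialize the chain into a list, so it can be printed and still passed to executemany afterwards:
from itertools import chain

value = list(chain(reversed(grouplist2), df.values.tolist()))  # a list can be iterated more than once
for x in value:
    x[4] = str(x[4])
    x[7] = str(x[7])
print(value[:3])  # still available after the loop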
Some people may wonder whether the column types are missing from the "CREATE TABLE" statement. SQLite accepts columns with no declared type, and I have executed this code on other files and it worked.
Looking forward to your help. Thank you in advance.
I've got a problem fetching a result from a MySQL database with Python 3.6.
The long text is a numpy array transformed into a string.
When I check the database and look at the column img_array, everything is just fine; all the data is written.
Next I try to retrieve the text column like this:
con = ...  # SQL connection, which is successful and working fine
cur = con.cursor()  # getting the cursor
cur.execute('SELECT img_array FROM table WHERE id = 1')
result = cur.fetchone()[0]  # result is a tuple with the array at index 0
print(result)
# [136 90 87 ... 66 96 125]
The problem here is that the "..." is literally part of the returned string, so I'm missing all the values in between.
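One thing that may be relevant (a hedged guess, assuming the column was populated with str() of a large array): NumPy's string conversion inserts a literal "..." once the array exceeds its print threshold, so the truncation could already happen at write time. A toy example, plus a lossless alternative:
import json
import numpy as np

arr = np.arange(2000)
print(str(arr))                       # '[   0    1    2 ... 1997 1998 1999]' -- literal "..."

full_text = json.dumps(arr.tolist())  # lossless text representation for storing in the DB
restored = np.array(json.loads(full_text))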
When I try the following it works just fine:
cur.execute('SELECT img_array FROM table LIMIT 1')
result = cur.fetchone()[0] # this gives me the entire string in the DB
print(result)
# The entire array will be printed here without missing values
I really don't know how to fetch this column with a WHERE clause via Python.
Any ideas?
EDIT: OK, the last edit was wrong. I've checked it again and a buffered cursor doesn't change anything. I'm confused because it seemed to work.
I'm trying to load CSV data into Postgres. The table creation part is fine, but when I try to load the data from the CSV, I get an error. My code and the error are attached below. Is %s wrong?
import psycopg2
import csv

conn = psycopg2.connect(host="127.0.0.1", port="5432", database="postgres", user="postgres", password="*******")
print("Opened database successfully")
cur = conn.cursor()
cur.execute('''create table calls_aapl("Ask" float,"Bid" float,"Change" float,"ContractSymbol" varchar(50),"ImpliedVolatility" float,"LastPrice" float,
"LastTradeDate" date,"OpenInterest" int,"PercentChange" float,"Strike" float,"Volume" int);''')
print("Table created successfully")
reader = csv.reader(open('D:/python/Anaconda/AAPL_Data/Calls.csv', 'r'))
for i, row in enumerate(reader):
    print(i, row)
    if i == 0:
        continue  # skip the header row
    cur.execute('''
        INSERT INTO "calls_aapl"(
            "Ask", "Bid", "Change", "ContractSymbol", "ImpliedVolatility", "LastPrice", "LastTradeDate", "OpenInterest", "PercentChange", "Strike", "Volume"
        ) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''', row
    )
conn.commit()
cur.close()
Error:
(0, ['Ask', 'Bid', 'Change', 'ContractSymbol', 'LastPrice', 'LastTradeDate', 'OpenInterest', 'PercentChange', 'PercentImpliedVolatility', 'Strike', 'Volume'])
(1, ['41.7', '39.75', '1.15', 'AAPL180803C00150000', '41.05', '7/31/2018', '52', '2.88', '154.59', '150', '6'])
DataError: invalid input syntax for type double precision: "7/31/2018"
LINE 4: ...1.7','39.75','1.15','AAPL180803C00150000','41.05','7/31/2018...
^
Using %s is ok because PostgreSQL can cast strings to numbers in an INSERT.
Your problem is a different one. Your INSERT statement specifies a column "ImpliedVolatility" (too late for a warning against mixed case identifiers) which is not in the data.
This causes the fifth CSV column (labeled LastPrice) to be inserted into "ImpliedVolatility" and the next column (labeled LastTradeDate) to be inserted into "LastPrice".
The former is wrong but works, because both "LastPrice" and "ImpliedVolatility" are double precision ("float"), but the latter fails because it tries to insert a date string into a double precision column.
Omit the column "ImpliedVolatility" from the INSERT statement.
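Equivalently (a sketch, assuming the CSV's PercentImpliedVolatility value is what belongs in "ImpliedVolatility"), you can keep all 11 values and simply list the columns in the same order as the CSV fields:
    cur.execute('''
        INSERT INTO "calls_aapl"(
            "Ask", "Bid", "Change", "ContractSymbol", "LastPrice", "LastTradeDate",
            "OpenInterest", "PercentChange", "ImpliedVolatility", "Strike", "Volume"
        ) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''', row
    )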
It's just a typo, I think.
You should make the column list in your insert query match the table columns.
"LastTradeDate" is being inserted into "LastPrice", which is not the right column.
Thank you.
This usually occurs when your column headers and values aren't matched up properly. Try checking whether the number of values specified matches the number of columns and whether the data types are compatible.