At work I was asked to create two tables with a one-to-one relationship and insert some rows into them. I have a .CSV with the data dictionary (column name and data type) for the table, and I would like to know how to declare the table's columns automatically (declarative syntax) without writing each column one by one (there are 260 columns). The same goes for the insert: how can I add rows to a table with this many columns without writing them column by column?
I have the data in a DataFrame, but I was not able to insert it using df.to_sql from pandas. Does anyone have a similar example?
The database is MySQL. The table structure was shown in a screenshot (image not available).
Below is what I did.
I have created a function to define the table constraints.
def create_mysql_tables(dataframe, engine):
    # print(dataframe.iloc)
    if 'Active' in dataframe.columns:
        start = time.time()
        engine.execute('DROP TABLE IF EXISTS com_treb;')
        engine.execute('DROP TABLE IF EXISTS cnd_treb;')
        engine.execute('DROP TABLE IF EXISTS res_treb;')
        engine.execute('DROP TABLE IF EXISTS mls_treb;')
        dataframe.to_sql("mls_treb", if_exists='replace', con=engine,
                         dtype={'mls_number': VARCHAR(dataframe.index.get_level_values('mls_number').str.len().max())})
        with engine.connect() as con:
            con.execute('ALTER TABLE `mls_treb` ADD PRIMARY KEY (`mls_number`);')
        end = time.time()
        print(end - start)
My first question: is there a better way to define a primary key with SQLAlchemy that avoids writing the query by hand?
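One way to avoid the hand-written ALTER TABLE is to build the table definition from the data dictionary and mark the key column with primary_key=True. The following is a minimal sketch, assuming the dictionary has already been read into (column name, type name) pairs. The TYPE_MAP, the type names, the sample entries, and the build_table helper are all hypothetical, and an in-memory SQLite engine stands in for MySQL:

```python
from sqlalchemy import (Boolean, Column, Date, MetaData, String, Table,
                        create_engine, inspect)

# Hypothetical mapping from data-dictionary type names to SQLAlchemy types.
TYPE_MAP = {
    "varchar": lambda: String(32),
    "bool": lambda: Boolean(),
    "date": lambda: Date(),
}

# Hypothetical dictionary rows; in practice these would come from the .CSV.
data_dictionary = [
    ("mls_number", "varchar"),
    ("active", "bool"),
    ("class_name", "varchar"),
    ("active_date", "date"),
]


def build_table(name, dictionary, primary_key, metadata):
    """Create a Table with one Column per dictionary row."""
    columns = [
        Column(col_name, TYPE_MAP[type_name](), primary_key=(col_name == primary_key))
        for col_name, type_name in dictionary
    ]
    return Table(name, metadata, *columns)


metadata = MetaData()
mls_treb = build_table("mls_treb", data_dictionary, "mls_number", metadata)

engine = create_engine("sqlite://")  # stand-in for the MySQL engine
metadata.create_all(engine)          # emits CREATE TABLE including the primary key

pk_columns = inspect(engine).get_pk_constraint("mls_treb")["constrained_columns"]
print(pk_columns)
```

With the table created this way, df.to_sql("mls_treb", con=engine, if_exists="append", index=False) appends rows to it rather than recreating it, so the primary key is kept.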
After creating the table, I need to insert a row only if it doesn't already exist. I tried to do that with the function below.
def crud_database(engine, mls_full_dataframe, res_full_dataframe):
    for index, row in mls_full_dataframe.iterrows():
        row_db_mls = pd.read_sql_query("SELECT * FROM mls_treb WHERE `mls_number` LIKE %(mlsnumber)s ", engine, params={'mlsnumber' : row.at['mls_number']})
        if row_db_mls.empty:
            row.to_sql("mls_treb", if_exists='append', con=engine, index_label='mls_number' )
            if row.at['Class'][0] == 'RES':
                row_res = res_full_dataframe.iloc[row.at['mls_number']]
        else:
            print(0)
But I get the error below when I try to insert the row (the original post included an image of the row to be inserted, with index 0):
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
self.dialect.do_executemany(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 729, in do_executemany
cursor.executemany(statement, parameters)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 670, in executemany
return self.execute(stmt)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 568, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 854, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 664, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column '3' in 'field list'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 1419, in to_sql
raise err
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 1411, in to_sql
table.insert(chunksize, method=method)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 845, in insert
exec_insert(conn, keys, chunk_iter)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 762, in _execute_insert
conn.execute(self.table.insert(), data)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
return connection._execute_clauseelement(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
ret = self._execute_context(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
self._handle_dbapi_exception(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
util.raise_(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
raise exception
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
self.dialect.do_executemany(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 729, in do_executemany
cursor.executemany(statement, parameters)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 670, in executemany
return self.execute(stmt)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 568, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 854, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 664, in _handle_result
raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column '3' in 'field list'
[SQL: INSERT INTO mls_treb (mls_number, `3`) VALUES (%(mls_number)s, %(3)s)]
[parameters: ({'mls_number': 'mls_number', '3': 'N5404949'}, {'mls_number': 'active', '3': True}, {'mls_number': 'class_name', '3': 'RES'}, {'mls_number': 'active_date', '3': datetime.date(2022, 1, 16)})]
(Background on this error at: https://sqlalche.me/e/14/f405)
With Gord Thompson's help I found the answer to the problem. The fix was to select the row to insert as a DataFrame:
row_to_insert = mls_full_dataframe.loc[mls_full_dataframe['mls_number'] == row.at['mls_number']]
row_to_insert.to_sql("mls_treb", if_exists='append', con=engine, index=False)
Previously, I was getting a Series instead of a one-row DataFrame. When I tried to insert the row as a Series, its name '3' (the row's index label) was used as the column name, and MySQL could not find a column called "3". Retrieving row_to_insert as a DataFrame, as above, and inserting that fixed it.
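The difference between the two shapes can be seen without MySQL. This is a minimal sketch, assuming a small made-up DataFrame whose index starts at 3, with an in-memory SQLite database standing in for the real one:

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the index labels (3, 4) mimic the failing row.
mls_full_dataframe = pd.DataFrame(
    {"mls_number": ["N5404949", "N5404950"], "class_name": ["RES", "CND"]},
    index=[3, 4],
)

label, row = next(mls_full_dataframe.iterrows())

# iterrows() yields a Series named after the index label. to_sql() converts a
# Series with to_frame(), so the only data column ends up being called 3.
as_frame = row.to_frame()
print(list(as_frame.columns))

# A boolean mask with .loc keeps the DataFrame shape and the real column names.
row_to_insert = mls_full_dataframe.loc[
    mls_full_dataframe["mls_number"] == row.at["mls_number"]
]
print(list(row_to_insert.columns))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mls_treb (mls_number TEXT PRIMARY KEY, class_name TEXT)")
row_to_insert.to_sql("mls_treb", conn, if_exists="append", index=False)

stored = pd.read_sql_query("SELECT * FROM mls_treb", conn)
print(stored)
```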
My pandas DataFrame, named data_five_minutes:
   script_id  date_time            open
0  1          2019-01-11 09:35:00  25
1  1          2019-01-11 09:40:00  30
2  1          2019-01-11 09:45:00  48
Now I am trying to get only the rows whose script_id is 1:
data = ps.sqldf(f"""SELECT d4.script_id,d4.date_time,d4.open,d4.high,d4.low,d4.close,d4.volume
FROM data_five_minutes d4 WHERE d4.script_id = {id}""")
This is just a simple condition; the original query has nested WHERE clauses, so the original code passes a full SQL query, the same one I previously ran against the database.
I am getting the error below:
Traceback (most recent call last):
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1752, in _execute_context
    cursor, statement, parameters, context
  File "C:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.InterfaceError: Error binding parameter 2 - probably unsupported type.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "D:\python_projects\auto\testing.py", line 70, in
    FROM data_five_minutes d4 WHERE d4.script_id = 1""")
  File "C:\Python\lib\site-packages\pandasql\sqldf.py", line 156, in sqldf
    return PandaSQL(db_uri)(query, env)
  File "C:\Python\lib\site-packages\pandasql\sqldf.py", line 58, in __call__
    write_table(env[table_name], table_name, conn)
  File "C:\Python\lib\site-packages\pandasql\sqldf.py", line 121, in write_table
    index=not any(name is None for name in df.index.names))  # load index into db if all levels are named
  File "C:\Python\lib\site-packages\pandas\io\sql.py", line 518, in to_sql
    method=method,
  File "C:\Python\lib\site-packages\pandas\io\sql.py", line 1320, in to_sql
    table.insert(chunksize, method=method)
  File "C:\Python\lib\site-packages\pandas\io\sql.py", line 756, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Python\lib\site-packages\pandas\io\sql.py", line 670, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 324, in _execute_on_connection
    self, multiparams, params, execution_options
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1462, in _execute_clauseelement
    cache_hit=cache_hit,
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1815, in _execute_context
    e, statement, parameters, cursor, context
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1996, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "C:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
    raise exception
  File "C:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1752, in _execute_context
    cursor, statement, parameters, context
  File "C:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.InterfaceError: (sqlite3.InterfaceError) Error binding parameter 2 - probably unsupported type.
[SQL: INSERT INTO data_five_minutes (script_id, date_time, open, high, low, close, volume) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ((1, '2021-05-25 10:30:00.000000', Decimal('1978.6'), Decimal('1985'), Decimal('1978.1'), Decimal('1985'), Decimal('323')), (1, '2021-05-25 10:35:00.000000', Decimal('1985'), Decimal('1986.05'), Decimal('1982.75'), Decimal('1983.85'), Decimal('954')), and so on...]
(Background on this error at: https://sqlalche.me/e/14/rvf5)
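The traceback shows pandasql copying the DataFrame into SQLite through to_sql, and parameter 2 (the open column) holds Decimal objects, which the sqlite3 driver cannot bind. One possible workaround is to convert those columns before querying. This is a minimal sketch, assuming the values can be safely represented as floats; the decimals_to_float helper and the sample rows are hypothetical, and plain sqlite3 is used here in place of pandasql:

```python
import sqlite3
from decimal import Decimal

import pandas as pd

# Hypothetical sample resembling the failing data (Decimal values in "open").
data_five_minutes = pd.DataFrame({
    "script_id": [1, 1, 2],
    "date_time": ["2021-05-25 10:30:00", "2021-05-25 10:35:00", "2021-05-25 10:30:00"],
    "open": [Decimal("1978.6"), Decimal("1985"), Decimal("10.5")],
})


def decimals_to_float(df):
    """Return a copy in which every column containing Decimal values is float."""
    out = df.copy()
    for col in out.columns:
        if out[col].map(lambda v: isinstance(v, Decimal)).any():
            out[col] = out[col].astype(float)
    return out


clean = decimals_to_float(data_five_minutes)

conn = sqlite3.connect(":memory:")
clean.to_sql("data_five_minutes", conn, index=False)  # binds without error now

# A bound parameter replaces the f-string interpolation of the id.
result = pd.read_sql_query(
    "SELECT script_id, date_time, open FROM data_five_minutes WHERE script_id = ?",
    conn,
    params=(1,),
)
print(result)
```

With pandasql itself, the same idea would be to rebind data_five_minutes to the converted frame before calling sqldf, since sqldf looks tables up by variable name.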
Using the mysql.connector package (with Django), and executing:
c.execute("""
select *
from shop_sales
where product_id in %s
""", [(83, 84, 85, 87, 88, 89)])
We get the following traceback:
Traceback (most recent call last):
File "/srv/venv/dev35/lib/python3.5/site-packages/mysql/connector/conversion.py", line 179, in to_mysql
return getattr(self, "_{0}_to_mysql".format(type_name))(value)
AttributeError: 'DjangoMySQLConverter' object has no attribute '_tuple_to_mysql'
This is reported as a bug at https://bugs.mysql.com/bug.php?id=89112, but I'm trying to find a workaround by writing a converter (from my settings.py):
from mysql.connector.django.base import DjangoMySQLConverter

def _tuple_to_mysql(self, value):
    res = []
    for item in value:
        msval = self.to_mysql(item)
        if not isinstance(msval, bytes):
            msval = str(msval).encode()
        res.append(msval)
    return b'(' + b', '.join(res) + b')'

DjangoMySQLConverter._tuple_to_mysql = _tuple_to_mysql
but it looks like mysql.connector is adding extra quotes around it:
Traceback (most recent call last):
File "/srv/venv/dev35/lib/python3.5/site-packages/mysql/connector/django/base.py", line 176, in _execute_wrapper
return method(query, args)
File "/srv/venv/dev35/lib/python3.5/site-packages/mysql/connector/cursor.py", line 561, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "/srv/venv/dev35/lib/python3.5/site-packages/mysql/connector/connection.py", line 525, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "/srv/venv/dev35/lib/python3.5/site-packages/mysql/connector/connection.py", line 427, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''(83, 84, 85, 87, 88, 89)'
What am I doing wrong?
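The error message suggests the converted tuple is being quoted as one string literal, which would be consistent with the connector treating each bound parameter as a single value. A common way around this, independent of any converter, is to generate one %s placeholder per item and pass the items as separate parameters. The following is a minimal sketch; the expand_in_clause helper and the {placeholders} marker are hypothetical names:

```python
def expand_in_clause(sql_template, values):
    """Fill {placeholders} with one %s per value and return (sql, params)."""
    values = list(values)
    if not values:
        # "IN ()" is not valid SQL, so refuse an empty list explicitly.
        raise ValueError("IN clause needs at least one value")
    placeholders = ", ".join(["%s"] * len(values))
    return sql_template.format(placeholders=placeholders), values


query, params = expand_in_clause(
    "select * from shop_sales where product_id in ({placeholders})",
    (83, 84, 85, 87, 88, 89),
)
print(query)
print(params)
```

The result would then be passed on as c.execute(query, params), so only the placeholders are formatted into the SQL text and the values are still bound by the driver.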
I have to read from an OrientDB database. To test that everything works, I tried to read from the database with a SELECT statement, like this:
import pyorient
client = pyorient.OrientDB("adress", 2424)
session_id = client.connect("root", "password")
client.db_open("table","root","password")
print str(client.db_size())
client.query("SELECT * FROM L1_Req",1)
The connection works fine, and so does the print str(client.db_size()) line.
But client.query("SELECT * FROM L1_Req",1) raises the following error:
Traceback (most recent call last):
  File "testpy.py", line 9, in <module>
    client.query("SELECT * FROM L1_Req",1)
  File "C:\app\tools\python27\lib\site-packages\pyorient\orient.py", line 470, in query
    .prepare(( QUERY_SYNC, ) + args).send().fetch_response()
  File "C:\app\tools\python27\lib\site-packages\pyorient\messages\commands.py", line 144, in fetch_response
    super( CommandMessage, self ).fetch_response()
  File "C:\app\tools\python27\lib\site-packages\pyorient\messages\base.py", line 265, in fetch_response
    self._decode_all()
  File "C:\app\tools\python27\lib\site-packages\pyorient\messages\base.py", line 249, in _decode_all
    self._decode_header()
  File "C:\app\tools\python27\lib\site-packages\pyorient\messages\base.py", line 176, in _decode_header
    serialized_exception = self._decode_field( FIELD_STRING )
  File "C:\app\tools\python27\lib\site-packages\pyorient\messages\base.py", line 366, in _decode_field
    _decoded_string = self._orientSocket.read( _len )
  File "C:\app\tools\python27\lib\site-packages\pyorient\orient.py", line 164, in read
    buf = bytearray(_len_to_read)
MemoryError
I also tried some other SQL statements, such as:
client.query("SELECT subSystem FROM L1_Req",1)
I can't work out why this happens. Can you help me?
I am trying to execute a query and return the results to Excel. The query takes in a string of years as input parameters. I am calling it in Python like this:
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            yield el

my_list = (previous_year_1,previous_year,current_year)
sql = 'select year,sum(sales)/case when sum(t_count)=0 then 1 else sum(t_count) as tx_sales from t_sales where year in ({1})'+ 'group by year' + 'order by year'
sql = sql.format ('?',','.join('?' * len(my_list)))
params = tuple(flatten(member_list))
ind_data = pd.read_sql(sql,engine,params)
The query itself, after fixing the end clause, works perfectly when run through SSMS. Just not through the Python code. The error I'm getting is:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1139, in _execute_context
context)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 450, in do_execute
cursor.execute(statement, parameters)
pyodbc.Error: ('07002', '[07002] [Microsoft][SQL Server Native Client 11.0]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\pbp_proj\pbp_proj.py", line 61, in pull_metrics
ind_data = pd.read_sql_query(sql, engine, params)
File "C:\Anaconda3\lib\site-packages\pandas\io\sql.py", line 411, in read_sql_query
parse_dates=parse_dates, chunksize=chunksize)
File "C:\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1128, in read_query
result = self.execute(*args)
File "C:\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1022, in execute
return self.engine.execute(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1989, in execute
return connection.execute(statement, *multiparams, **params)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 906, in execute
return self._execute_text(object, multiparams, params)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1054, in _execute_text
statement, parameters
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context
context)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1341, in _handle_dbapi_exception
exc_info
File "C:\Anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 188, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 181, in reraise
raise value.with_traceback(tb)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1139, in _execute_context
context)
File "C:\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 450, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('07002', '[07002] [Microsoft][SQL Server Native Client 11.0]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')
How can I resolve this error?
As @MYGz has already mentioned, there is a missing space before order by.
Besides that, there is a missing space before group by and, most importantly, your CASE expression must be closed with END.
That said, try the following SQL:
sql = 'select year,sum(sales)/(case when sum(t_count)=0 then 1 else sum(t_count) end)' \
+' as tx_sales from t_sales where year in ({1})'+' group by year order by year'
You can use your SQL pattern directly using .format() - there is no need to overwrite it:
params = tuple(flatten(member_list))
ind_data = pd.read_sql(sql.format('?',','.join('?' * len(params))), engine, params)
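The point of deriving the placeholders from params is that the number of ? markers and the number of bound values cannot drift apart, and a mismatch between the two is one plausible cause of a "COUNT field incorrect" error. The following is a minimal sketch of the corrected pattern, assuming made-up sales figures and using an in-memory SQLite database (which also uses ? placeholders) in place of SQL Server:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_sales (year INTEGER, sales REAL, t_count INTEGER);
INSERT INTO t_sales VALUES (2014, 10, 1), (2015, 100, 4), (2016, 90, 0), (2017, 50, 5);
""")

# Hypothetical values standing in for previous_year_1, previous_year, current_year.
member_list = [2015, 2016, 2017]

# One "?" per value, built from the same list that is passed as params.
placeholders = ", ".join("?" * len(member_list))
sql = (
    "select year, "
    "sum(sales) / (case when sum(t_count) = 0 then 1 else sum(t_count) end) as tx_sales "
    "from t_sales where year in ({}) "
    "group by year order by year"
).format(placeholders)

ind_data = pd.read_sql_query(sql, conn, params=member_list)
print(ind_data)
```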
You have missed a space in your sql string between year and order by.
Try this:
sql = 'select year,sum(sales)/case when sum(t_count)=0 then 1 else sum(t_count) as tx_sales from t_sales where year in ({1}) '+ 'group by year ' + 'order by year '
Resolved this. It's a bit of a hack, but it works. I first switched from sqlalchemy to pyodbc,
so my query string became:
sql = 'select year,sum(sales)/case when sum(t_count)=0 then 1 else sum(t_count) end as tx_sales from t_sales where year in (?,?,?) '+ ' group by year' + ' order by year'
ind_data = pd.read_sql(sql, conn, params=member_list)
summary = ind_data.transpose()
I then had to add another AND clause with another parameter. For this I created:
cur_params = list(member_list)  # copy, so member_list itself is not modified
cur_params.append(var_premium)
and then passed cur_params to pd.read_sql:
ind_data = pd.read_sql(sql, conn, params=cur_params)
Both queries return data correctly now.
Thank you all for reading my post and for all the suggestions.