I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
Create a string out of a column and use that as criteria (there are more than 1,000 values, which is Oracle's limit for an IN list)
Create a new table in the database and use that (don't have access)
Download the entire contents of the table and merge in pandas (there are millions of records in the database, and this would bog down both the db and my system)
I have found packages that let you "register" a dataframe so it acts as a table/view and can be queried, but they will not let that view be used in a query against a different connection. Can anyone point me in the right direction? Either a way to use two different connections in the same SQL statement (Oracle plus a package like DuckDB) so an inner join is possible, or a way to link directly to the dataframe and use it as a table in a join.
SAS does this effortlessly, and I don't want to go back to SAS because its other functionality is not as good as Python / R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here -- after much research.
In short, this cannot be done. Outside of passing the criteria as a list or a concatenated string, you cannot create a dataframe in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write temporary tables in the Oracle database, you're out of options.
Related
Hello, can someone help me? I need to create a program in Python that connects to my database, reads the database records, and creates a pivot table from that data with pandas.
You might be looking for pandas.read_sql
You can use the con parameter to pass your database connection (or connection string).
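A minimal sketch, assuming a MySQL database reachable via SQLAlchemy and a hypothetical orders table; the connection string, table, and column names are placeholders to adjust for your own setup:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string and table/column names -- adjust for your database
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
df = pd.read_sql("SELECT customer, product, amount FROM orders", con=engine)
print(df.head())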
Here's a link on connecting Python to a SQL database. The library will vary depending on the database you use; search for how to connect your particular database to Python.
https://www.geeksforgeeks.org/how-to-connect-python-with-sql-database/
This will give you a 5-minute crash course on pandas and how to use pivot tables.
https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33#:~:text=%20Get%20Started%20With%20Pandas%20In%205%20mins,DataFrame%20and%20understood%20its%20structure%2C%20let%E2%80%99s...%20More%20
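Once the rows are in a DataFrame, the pivot itself is a single call. Continuing the sketch above (the customer/product/amount column names are still assumptions):

# sum amounts per customer and product; fill_value avoids NaN for missing pairs
pivot = pd.pivot_table(df, index="customer", columns="product",
                       values="amount", aggfunc="sum", fill_value=0)
print(pivot)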
Any ideas on getting data into a PostgreSQL stored proc / function from Python?
I have a DataFrame built up from other data sources and I need to do some work with Postgres and then INSERT/UPDATE some data in PostgreSQL if the query is successful. I know I can get it to work using just Python and raw SQL queries in Python strings and inserting variables where needed, but I know this is poor practice.
In the past I've been able to pass a C# DataTable from C# to a SQL Stored Procedure using MS SQL Server and User-Defined Table types. Is there a way to do something similar with Python DataFrames within PostgreSQL?
This link has been really helpful on the syntax for Python variables to Postgres function, but I have not seen anything on passing Pandas DataFrames to a PostgreSQL function. Is this possible? Is there a different design pattern I should be using?
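I have not seen a direct PostgreSQL equivalent of SQL Server's table-valued parameters either, but one pattern that may work is serializing the DataFrame to JSON and letting a plpgsql function unpack it (e.g. with jsonb_to_recordset) and do the INSERT/UPDATE inside the database. A hedged sketch with psycopg2; the function name upsert_from_json, the connection details, and the columns are all assumptions:

import json
import psycopg2
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "amount": [10.5, 20.0]})  # example data

conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
with conn, conn.cursor() as cur:
    # upsert_from_json is a hypothetical plpgsql function taking jsonb;
    # inside the function it would unpack the array of records and upsert them
    cur.execute("SELECT upsert_from_json(%s::jsonb)",
                (json.dumps(df.to_dict(orient="records")),))
conn.close()

The appeal of this pattern is that the INSERT/UPDATE logic stays in the database function, and Python only ships the data across once.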
Is there a way to perform an SQL query that joins a MySQL table with a dict-like structure that is not in the database but instead provided in the query?
In particular, I regularly need to post-process data I extract from a database with the respective exchange rates. Exchange rates are not stored in the database but retrieved on the fly and stored temporarily in a Python dict.
So, I have a dict: exchange_rates = {'EUR': 1.10, 'GBP': 1.31, ...}.
Let's say some query returns something like: id, amount, currency_code.
Would it be possible to add the dict to the query so I can return: id, amount, currency_code, usd_amount? This would remove the need to post-process in Python.
This solution doesn't use a JOIN, but it does combine the data from Python into SQL via a CASE statement. You can generate the SQL you want in Python (as a string) that includes these values in a giant CASE statement.
You give no details and don't say which version of Python, so it's hard to provide useful code, but this works with Python 2.7 and assumes you have some connection to the MySQL db in Python:
exchange_rates = {'EUR': 1.10, 'GBP': 1.31, ...}
# create a long set of "when ... then ..." case conditions as a string
er_case_statement = "\n".join(
    "when mytable.currency = \"{0}\" then {1}".format(k, v)
    for (k, v) in exchange_rates.items()
)
# build the sql with these case statements
sql = """select <some stuff>,
case {0}
end as exchange_rate,
other columns
from tables etc
where etc
""".format(er_case_statement)
Then send this SQL to MySQL
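For example, assuming conn is an already-open MySQLdb connection:

cursor = conn.cursor()   # conn is your existing MySQL connection (an assumption here)
cursor.execute(sql)      # sql is the string built above
rows = cursor.fetchall()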
I don't like this solution; you end up with a very large SQL statement, which can hit the maximum query size (see: What is maximum query size for mysql?).
Another idea is to use a temporary table in MySQL. Again assuming you are connecting to the db in Python: with Python, build the SQL that creates a temporary table and inserts the exchange rates, send that to MySQL, then build a query that joins your data to that temporary table.
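A rough sketch of that idea with MySQLdb; the connection parameters, the tmp_rates name, and the mytable columns are all placeholders:

import MySQLdb

exchange_rates = {'EUR': 1.10, 'GBP': 1.31}  # your dict, fetched elsewhere

conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="mydb")
cur = conn.cursor()
# TEMPORARY tables disappear when the connection closes; tmp_rates is a hypothetical name
cur.execute("CREATE TEMPORARY TABLE tmp_rates (currency CHAR(3) PRIMARY KEY, rate DECIMAL(12,6))")
cur.executemany("INSERT INTO tmp_rates (currency, rate) VALUES (%s, %s)",
                list(exchange_rates.items()))
cur.execute("""
    SELECT t.id, t.amount, t.currency_code, t.amount * r.rate AS usd_amount
    FROM mytable t
    JOIN tmp_rates r ON r.currency = t.currency_code
""")
rows = cur.fetchall()
conn.close()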
Finally, you say you don't want to post-process in Python, but you have a dict from somewhere, so I don't know which environment you are using. But if you can get these exchange rates from the web, say with curl, then you could use the shell to insert those values into a MySQL temp table as well, and do the join there.
Sorry this is general and not specific, but the question could use more specificity. Hope it helps someone else give a more targeted answer.
I have 2 tables, One with new data, and another with old data.
I need to find the diff between the two tables and push only the changes into the table with the old data as it will be in production.
Both the tables are identical in terms of columns, only the data varies.
EDIT:
I am looking for only one way sync
EDIT 2
The table may have foreign keys.
Here are the constraints
I can't use shell utilities like mk-table-sync
I can't use GUI tools, because they cannot be automated, as suggested here.
This needs to be done programmatically, or in the db.
I am working in python on Google App-engine.
Currently I am doing things like
OUTER JOINs and WHERE [NOT] EXISTS to compare each record in SQL queries and pushing the results.
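For reference, the NOT EXISTS variant of that comparison looks roughly like this; old_table, new_table, the id key, and the column list are placeholder names:

# insert rows that exist in the new table but not yet in the old one
sql = """
    INSERT INTO old_table (id, col1, col2)
    SELECT n.id, n.col1, n.col2
    FROM new_table n
    WHERE NOT EXISTS (SELECT 1 FROM old_table o WHERE o.id = n.id)
"""
cursor.execute(sql)  # cursor: whatever DB cursor you already have

Rows that exist in both tables but have changed values need a separate UPDATE joined on the key.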
My questions are
Is there a better way to do this ?
Is it better to do this in python rather than in the db ?
According to your comment to my question, you could simply do:
DELETE FROM OldTable;
INSERT INTO OldTable (field1, field2, ...) SELECT * FROM NewTable;
As I pointed out above, there might be reasons not to do this, e.g., data size.
In MySQL, I have two different databases -- let's call them A and B.
Database A resides on server server1, while database B resides on server server2.
Both servers {A, B} are physically close to each other, but are on different machines and have different connection parameters (different username, different password etc).
In such a case, is it possible to perform a join between a table that is in database A, to a table that is in database B?
If so, how do I go about it, programmatically, in Python? (I am using Python's MySQLdb to interact with each of the databases separately.)
Try the FEDERATED storage engine.
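Roughly, you would create on server1 a local table definition that points at the table on server2, assuming the FEDERATED engine is enabled there; the table names, columns, and credentials below are placeholders, and the column definitions must match the remote table:

# run against server1; b_remote_table becomes a local window onto server2's table
cursor.execute("""
    CREATE TABLE b_remote_table (
        id INT NOT NULL,
        amount DECIMAL(12,2),
        PRIMARY KEY (id)
    )
    ENGINE=FEDERATED
    CONNECTION='mysql://user2:password2@server2:3306/B/remote_table'
""")
# after that, an ordinary join works over the single server1 connection
cursor.execute("SELECT a.id, a.name, b.amount FROM a_table a JOIN b_remote_table b ON b.id = a.id")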
Workaround: it is possible to use another DBMS to retrieve data from the two databases; for example, you could do it using linked servers in MS SQL Server (see the sp_addlinkedserver stored procedure). From the documentation:
A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources.
It is very simple: select data from one server, select data from the other server, and aggregate using Python. If you would like an SQL query with a JOIN, put the results from both servers into separate tables in a local SQLite database and write a SELECT with the JOIN.
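A hedged sketch of that approach with MySQLdb and an in-memory SQLite database; host names, credentials, and table/column names are all placeholders:

import sqlite3
import MySQLdb

# two separate connections, one per server
conn_a = MySQLdb.connect(host="server1", user="user1", passwd="pw1", db="A")
conn_b = MySQLdb.connect(host="server2", user="user2", passwd="pw2", db="B")

cur_a = conn_a.cursor()
cur_a.execute("SELECT id, name FROM table_a")
rows_a = cur_a.fetchall()

cur_b = conn_b.cursor()
cur_b.execute("SELECT id, amount FROM table_b")
rows_b = cur_b.fetchall()

# stage both result sets in a local in-memory SQLite db and do the join there
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE a (id INTEGER, name TEXT)")
local.execute("CREATE TABLE b (id INTEGER, amount REAL)")
local.executemany("INSERT INTO a VALUES (?, ?)", rows_a)
local.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
joined = local.execute(
    "SELECT a.id, a.name, b.amount FROM a JOIN b ON b.id = a.id").fetchall()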
No. It is not possible to do the join as you would like. But you may be able to sort something out by replicating the relevant database from one server to the other.
One data set is under the control of one copy of MySQL and the other dataset is under the control of the other copy of MySQL. The query can only be processed by one of the (MySQL) servers.
If you create a copy of the second database on the first server or vice versa (the one that gets the fewest updates is best) you can set up replication to keep the copy up to date. You will then be able to run the query as you want.