Is there a way to perform an SQL query that joins a MySQL table with a dict-like structure that is not in the database but instead provided in the query?
In particular, I regularly need to post-process data I extract from a database with the respective exchange rates. Exchange rates are not stored in the database but retrieved on the fly and stored temporarily in a Python dict.
So, I have a dict: exchange_rates = {'EUR': 1.10, 'GBP': 1.31, ...}.
Let's say some query returns something like: id, amount, currency_code.
Would it be possible to add the dict to the query so I can return: id, amount, currency_code, usd_amount? This would remove the need to post-process in Python.
This solution doesn't use a JOIN, but it does combine the Python data into SQL via a CASE statement: generate the SQL you want in Python (as a string), embedding the dict's values in one large CASE expression.
You give few details and don't say which version of Python you're on, so it's hard to provide exact code, but the following works with Python 2.7 (on Python 3, use .items() instead of .iteritems()) and assumes you already have a connection to the MySQL database in Python:
exchange_rates = {'EUR': 1.10, 'GBP': 1.31, ...}
# create a long set of case conditions as a string
er_case_statement = "\n".join("when mytable.currency = '{0}' then {1}".format(k, v) for (k, v) in exchange_rates.iteritems())
# build the sql with these case statements
sql = """select <some stuff>,
case {0}
end as exchange_rate,
other columns
from tables etc
where etc
""".format(er_case_statement)
Then send that SQL to MySQL.
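For example, a minimal sketch, assuming an already-open connection object (called conn here purely for illustration):
# `conn` is assumed to be an open MySQL connection (e.g. from mysql.connector);
# `sql` is the statement built above.
cursor = conn.cursor()
cursor.execute(sql)
rows = cursor.fetchall()  # each row now includes the computed exchange_rate column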
I don't like this solution much; you end up with a very large SQL statement, which can hit the maximum allowed query size (see "What is maximum query size for MySQL?").
Another idea is to use a temporary table in MySQL. Again assuming you are connected to the database from Python: build the SQL that creates a temporary table, insert the exchange rates into it, and then write a query that joins your data to that temporary table.
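A rough sketch of that approach, again assuming an open mysql.connector connection called conn; the table name mytable and its columns (id, amount, currency_code) are taken from the question and may need adjusting:
# Sketch only: `conn` is an open mysql.connector connection.
exchange_rates = {'EUR': 1.10, 'GBP': 1.31}

cursor = conn.cursor()
cursor.execute("""
    CREATE TEMPORARY TABLE tmp_exchange_rates (
        currency_code CHAR(3) PRIMARY KEY,
        usd_rate      DECIMAL(12, 6)
    )
""")
cursor.executemany(
    "INSERT INTO tmp_exchange_rates (currency_code, usd_rate) VALUES (%s, %s)",
    list(exchange_rates.items()),
)
cursor.execute("""
    SELECT t.id, t.amount, t.currency_code,
           t.amount * r.usd_rate AS usd_amount
    FROM mytable AS t
    JOIN tmp_exchange_rates AS r ON r.currency_code = t.currency_code
""")
rows = cursor.fetchall()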
Finally, you say you don't want to post-process in Python, but you have a dict from somewhere, so I don't know exactly which environment you are using. If you can fetch these exchange rates from the web, say with curl, you could also use the shell to insert those values into a MySQL temp table and do the join there.
Sorry this is general rather than specific, but the question could use more detail. Hope it helps someone else give a more targeted answer.
I have a table to which I wrote 1.6 million records; it has two columns: an ID and a JSON string column.
I want to select all of those records and write the JSON from each row out as a file. However, the query result is too large, and I get the 403 error associated with that:
"403 Response too large to return. Consider specifying a destination table in your job configuration."
I've been looking at the documentation below and understand that it recommends writing the results to a destination table and viewing them there, but all I want to do is SELECT * from the table, so that would effectively just copy it over, and I suspect I'd run into the same issue when querying that result table.
https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.allow_large_results
What is the best practice here? Pagination? Table sampling? list_rows?
I'm using the python client library as stated in the question title. My current code is just this:
query = f'SELECT * FROM `{project}.{dataset}.{table}`'
return client.query(query)
I should also mention that the IDs are not sequential, they're just alphanumerics.
The best practice, and the more efficient approach, is to export your data and then download the exported files, instead of querying the whole table with SELECT *.
From there, you can extract the data you need from the exported files (e.g. CSV or newline-delimited JSON) in Python, without having to wait for a SELECT * query to finish.
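For example, a minimal sketch of such an export with the Python client, reusing the question's project/dataset/table variables; the Cloud Storage bucket path is a placeholder:
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder bucket/path; the wildcard lets BigQuery split large exports into shards.
destination_uri = "gs://your-bucket/export/rows-*.json"

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    f"{project}.{dataset}.{table}",
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # wait for the export job to finish
You can then download the shards with the google-cloud-storage client and split each newline-delimited JSON file into per-row files locally.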
I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
Create a string out of a column and use it as the criteria (I have more than 1,000 values, which is Oracle's limit for an IN list)
Create a new table in the database and use that (don't have access)
Download the entire contents of the table and merge in pandas (there are millions of records in the database, and that would bog down both the database and my system)
I have found packages that let you "register" a dataframe so it acts as a table/view and can be queried, but they don't allow it to be used in a query against a different connection string. Can anyone point me in the right direction? Either a way to use two different connections in the same SQL statement (one to Oracle and one to a package like DuckDB) so I can do an inner join, or a way to reference the dataframe directly so it can be used as a table in a join.
SAS does this effortlessly, and I don't want to go back to SAS because its other functionality isn't as good as Python/R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here -- after much research.
In short, this cannot be done. Outside of passing the criteria as a list or a concatenated string, you cannot create a dataframe in Python or R and pass it into a query against a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.
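For completeness, a rough sketch of what the temporary-table route would look like if you did have create-table rights (python-oracledb here; the table name, column, and connection details are all placeholders):
# Only viable if you DO have rights to create a table in Oracle --
# the very permission the answer above says is missing.
import oracledb
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3]})  # hypothetical keys you want to join on

conn = oracledb.connect(user="user", password="pw", dsn="dbhost/service")  # placeholders
cur = conn.cursor()
cur.execute("""
    CREATE GLOBAL TEMPORARY TABLE tmp_keys (id NUMBER)
    ON COMMIT PRESERVE ROWS
""")
cur.executemany("INSERT INTO tmp_keys (id) VALUES (:1)",
                [(int(i),) for i in df["id"]])
cur.execute("""
    SELECT t.*
    FROM big_table t
    JOIN tmp_keys k ON k.id = t.id
""")
rows = cur.fetchall()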
I have a table in SQL Server that already has data for the month of November. I need to insert data for the previous months, January through October, which I have in a spreadsheet, and I want to do a bulk insert using Python. I have successfully established a connection to the server from Python and can access the table. However, I don't know how to insert data above the rows that are already present in the table. The table doesn't have any constraints, primary keys, or indexes.
I am not sure whether such a conditional insertion is possible. If it is, kindly share some clues.
Notes: I don't have access to SSIS, and I can't use "BULK INSERT" because I can't map my shared drive to the SQL Server machine. That's why I decided to use a Python script for this operation.
SQL Server Management Studio is just the GUI for interacting with SQL Server.
However, I don't know how to insert data above the rows that are already present in the table.
Tables are ordered, or structured, based on the clustered index. Since you don't have one (you said there are no PKs or indexes), inserting records "below" or "above" the existing rows isn't meaningful. A table without a clustered index is called a heap, which is what you have.
Thus, just insert the data. The order of the results will be determined by any ORDER BY clause you place on a statement, or by the clustered index on the table if you create one.
I assume you think your data is ordered because, by chance, when you run SELECT * FROM table the results appear in the same order each time. However, as this blog shows, that isn't guaranteed: without an ORDER BY clause your results truly aren't ordered.
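A minimal sketch of that plain insert from the spreadsheet, assuming pandas plus pyodbc; the connection string, file name, and column names below are placeholders:
import pandas as pd
import pyodbc

# Placeholders: adjust the connection string, spreadsheet path, and column names.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
df = pd.read_excel("jan_to_oct.xlsx")  # the spreadsheet holding January-October

cursor = conn.cursor()
cursor.fast_executemany = True  # batches the inserts instead of one round trip per row
cursor.executemany(
    "INSERT INTO dbo.MyTable (col1, col2, col3) VALUES (?, ?, ?)",
    df[["col1", "col2", "col3"]].values.tolist(),
)
conn.commit()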
I know there are various ETL tools available to export data from Oracle to MongoDB, but I wish to use Python as the intermediary. Can anyone please guide me on how to proceed?
Requirement:
Initially I want to add all the records from Oracle to MongoDB, and after that I want to insert only the newly added records from Oracle into MongoDB.
Appreciate any kind of help.
To answer your question directly:
1. Connect to Oracle
2. Fetch the delta data by timestamp or id (on the first run this is all records)
3. Transform the rows to JSON documents
4. Write the documents to MongoDB with pymongo
5. Save the maximum timestamp/id for the next iteration
Keep in mind that you should think about data-model considerations: a relational database (like Oracle) and a document database (like MongoDB) will usually have different data models.
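A rough sketch of those steps with python-oracledb and pymongo; the connection details, table name, and incremental column (last_updated) are placeholders, not from the question:
from datetime import datetime

import oracledb
from pymongo import MongoClient

ora = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")
mongo = MongoClient("mongodb://localhost:27017")
target = mongo["mydb"]["my_table"]
state = mongo["mydb"]["sync_state"]

# Step 2: fetch only rows newer than the last high-water mark (all rows on the first run).
checkpoint = state.find_one({"_id": "my_table"}) or {}
last_sync = checkpoint.get("last_updated", datetime(1970, 1, 1))

cur = ora.cursor()
cur.execute(
    "SELECT id, name, last_updated FROM my_table WHERE last_updated > :ts",
    {"ts": last_sync},
)
columns = [d[0].lower() for d in cur.description]

# Step 3: turn each row into a JSON-like dict.
docs = [dict(zip(columns, row)) for row in cur]

if docs:
    target.insert_many(docs)                              # step 4: write to MongoDB
    new_max = max(doc["last_updated"] for doc in docs)
    state.update_one({"_id": "my_table"},                 # step 5: remember the new high-water mark
                     {"$set": {"last_updated": new_max}}, upsert=True)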
I'm playing around with Python 3's sqlite3 module, and acquainting myself with SQL in the process.
I've written a toy program to hash a salted password and store it, the associated username, and the salt into a database. I thought it would be intuitive to create a function of the signature:
def store(table, data, database=':memory:')
Callable as, for example, store('logins', {'username': 'bob', 'salt': 'foo', 'salted_hash': 'bar'}), which would add to logins, in a new row, the value bob for username, foo for salt, et cetera.
Unfortunately, I'm stuck on what SQL to write. I'm trying to do this in a "dynamically typed" fashion, so that I won't be punished for storing the wrong types and can, for example, add new columns at will.
I want the function to do the following, sanitizing all input:
Check if the table exists, and create it if it doesn't, with the passed keys from the dictionary as the columns;
If the table already exists, check whether it has the specified columns (the keys of the passed dictionary), and add them if it doesn't (is this even possible with SQL?);
Add the individual values from my dictionary to the appropriate columns in the table.
I can use INSERT for the latter, but it seems very rigid. What happens if the columns don't exist, for example? How could we then add them?
I don't mind whether the code is tailored to Python 3's sqlite3 or is just the SQL as an outline; as long as I can work with it and use it to some extent (and learn from it), I'm very grateful.
(On a different note, I'm wondering what other approaches I could use instead of an SQL relational database; I've used Amazon's SimpleDB before and have considered using it here, since it is very "dynamically typed", but I'd still like to know what SQL I'd need for this purpose.)
SQLite3 is dynamically typed, so no problem there.
CREATE TABLE IF NOT EXISTS <name> ... See here.
You can see whether the columns you need already exist in the table by querying sqlite_master, documented in this FAQ. You'll need to parse its sql column, but since that is exactly what your program provided when it created the table, you should know the syntax.
If a column does not exist, you can ALTER TABLE <name> ADD COLUMN ... See here.
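Putting those three pieces together, here is a rough sketch of the store() function. It uses PRAGMA table_info to list existing columns rather than parsing sqlite_master's sql text, and note that table/column names cannot be bound as parameters, so they are interpolated here and would need real validation:
import sqlite3

def store(table, data, database=':memory:'):
    """Sketch: create the table/columns on demand, then insert one row of `data`."""
    conn = sqlite3.connect(database)
    with conn:  # commits on success
        cols = ", ".join('"{}"'.format(k) for k in data)
        # 1. Create the table if it doesn't exist, with the dict keys as (untyped) columns.
        conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))

        # 2. Add any columns missing from an existing table (PRAGMA table_info lists them).
        existing = {row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(table))}
        for key in data:
            if key not in existing:
                conn.execute('ALTER TABLE "{}" ADD COLUMN "{}"'.format(table, key))

        # 3. Insert the values; only the values themselves use parameter binding --
        #    identifiers (table/column names) must be validated separately in real code.
        placeholders = ", ".join("?" for _ in data)
        conn.execute(
            'INSERT INTO "{}" ({}) VALUES ({})'.format(table, cols, placeholders),
            tuple(data.values()),
        )
    return conn
Callable exactly as in the question: store('logins', {'username': 'bob', 'salt': 'foo', 'salted_hash': 'bar'}).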