Creating SQL Scripts from Excel Spreadsheets - Best method? - python

I have an Excel spreadsheet with several columns (e.g. table, column name, join conditions) that will be used as inputs to populate DDL scripts for creating other large tables.
I need to parse this spreadsheet to generate those SQL scripts, and I am trying to work out the best way to do it. I was thinking of writing a parser in Python using pandas, but does anybody have ideas on the best approach?
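Roughly what I had in mind with pandas, as a sketch only; the headers 'table_name', 'column_name' and 'data_type' below are hypothetical and the real spreadsheet has more columns:

import pandas as pd

# Assumed layout: one row per column definition, with hypothetical headers
# 'table_name', 'column_name' and 'data_type' -- adjust to the actual file.
meta = pd.read_excel('ddl_inputs.xlsx')

ddl = []
for table, cols in meta.groupby('table_name'):
    col_defs = ',\n  '.join(f"{r.column_name} {r.data_type}" for r in cols.itertuples())
    ddl.append(f"CREATE TABLE {table} (\n  {col_defs}\n);")

print('\n\n'.join(ddl))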

Related

Python or R -- create a SQL join using a dataframe

I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
- Create a string out of a column and use that as a criterion (there are more than 1,000 values, which is the limit for an IN list)
- Create a new table in the database and use that (I don't have access)
- Download the entire contents of the table and merge in pandas (there are millions of records in the database, and it would bog down the db and my system)
I have found packages that let you "register" a dataframe so it acts as a table/view and can be queried, but they will not let it be used in a query against a different connection. Can anyone point me in the right direction? Either a way to use two different connections in the same SQL statement (Oracle plus something like DuckDB) so I can do an inner join, or a way to link the dataframe directly so it can be used as a table in a join?
SAS does this effortlessly, and I don't want to go back to SAS because its other functionality is not as good as Python/R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here, after much research.
In short, this cannot be done. Outside of passing criteria as a list or a concatenated string, you cannot create a dataframe in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.
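To make the "list or concat" route concrete, the usual fallback is to batch the keys so each IN list stays under the 1,000-item limit. A rough sketch, assuming an existing SQLAlchemy engine and a hypothetical some_table/some_key:

import pandas as pd

def fetch_in_chunks(engine, keys, chunk_size=1000):
    """Query once per batch of keys and concatenate the results."""
    frames = []
    for i in range(0, len(keys), chunk_size):
        batch = keys[i:i + chunk_size]
        in_list = ', '.join(f"'{k}'" for k in batch)  # assumes trusted, string-typed keys
        frames.append(pd.read_sql(
            f"SELECT * FROM some_table WHERE some_key IN ({in_list})", engine))
    return pd.concat(frames, ignore_index=True)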

Selecting data from a CSV and entering that data into a table in SQLite

I am trying to figure out how to iterate through the rows of a .CSV file and enter that data into a table in SQLite, but only if the data in a row meets certain criteria.
I am trying to build a database of my personal spending. I have used Python to categorise my spending data, and I now want to enter that data into a database with each category as a different table. This means I need to sort the data and enter it into different tables based on the category of spend.
I have looked for quite a long time. Can anyone help?
You need to read the CSV file using pandas and store it in a pandas DataFrame. Then (if you have not already created a database) use the SQLAlchemy library (here is the documentation) to create an engine: engine = sqlalchemy.create_engine('sqlite:///file.db').
Afterwards, convert the DataFrame to a SQL table using pandas' to_sql function (documentation): df.to_sql('file_name', engine, index=False). I used index=False to avoid creating a column for the index of the DataFrame.
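Putting that together with your filtering requirement, something like the sketch below, where 'spending.csv' and the 'amount'/'category' columns are placeholders for your actual file and column names:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite:///file.db')
df = pd.read_csv('spending.csv')   # placeholder file name

# Write each spending category to its own table, keeping only rows that meet
# your criteria (a positive 'amount' is used here purely as an example).
for category, rows in df[df['amount'] > 0].groupby('category'):
    rows.to_sql(category, engine, if_exists='append', index=False)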

Insert Information to MySQL from multiple related excel files

So I have this huge DB schema for vehicle board cards. The data is actually stored in multiple Excel files, and my job was to design a database schema to dump all this data into MySQL. Now I need to create the process that inserts the data into the DB.
This is an example of how the Excel tables are laid out:
The thing is that these Excel files are not well tagged.
My question is: what do I need to do to create a script that dumps all this data from Excel into the DB?
I'm also using ids, foreign keys, primary keys, joins, etc.
I've thought about this so far:
1. Normalize the structure of the tables in Excel so that the data can be inserted with SQL.
2. Create a script in Python to insert the data of each table.
Can you help me with where I should start and how? What topics should I google?
With pandas you can easily read Excel (xlsx) and CSV files and dump the data into almost any database:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:password@host/db_name')  # placeholder credentials
df = pd.read_excel('file.xlsx')
df.to_sql('table_name', engine, if_exists='append', index=False)
If you have performance issues dumping to MySQL, you can find another way of doing the dump here
python pandas to_sql with sqlalchemy : how to speed up exporting to MS SQL?
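For large dumps, two to_sql parameters are usually the first things worth trying; this reuses the df and engine from the snippet above (a sketch, not benchmarked):

# Batch rows into multi-row INSERT statements instead of one INSERT per row.
df.to_sql('table_name', engine, if_exists='append', index=False,
          chunksize=10_000, method='multi')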

How to pass a pandas DataFrame to a PostgreSQL Function (Stored Procedure)?

Any ideas on getting data into a PostgreSQL stored proc / function from Python?
I have a DataFrame built up from other data sources, and I need to do some work in Postgres and then INSERT/UPDATE some data in PostgreSQL if the query is successful. I know I can get it to work using raw SQL queries in Python strings with variables inserted where needed, but I know this is poor practice.
In the past I've been able to pass a C# DataTable to a SQL Server stored procedure using user-defined table types. Is there a way to do something similar with pandas DataFrames and PostgreSQL?
This link has been really helpful on the syntax for passing Python variables to a Postgres function, but I have not seen anything on passing pandas DataFrames to a PostgreSQL function. Is this possible? Is there a different design pattern I should be using?
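The closest I have come up with so far is serialising the DataFrame to JSON and passing it as a single jsonb argument. A sketch, where the connection string and the my_upsert(jsonb) function are placeholders, not real objects:

import pandas as pd
import psycopg2

df = pd.DataFrame({'id': [1, 2], 'name': ['John', 'Paul']})

conn = psycopg2.connect('dbname=mydb user=me')  # placeholder connection string
with conn, conn.cursor() as cur:
    # my_upsert(jsonb) is a hypothetical plpgsql function that would loop over
    # jsonb_array_elements(...) and do the INSERT/UPDATE work server-side.
    cur.execute("SELECT my_upsert(%s::jsonb)", [df.to_json(orient='records')])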

Best way to store a csv database in memory?

I have a text file (CSV) which acts as a database for my application formatted as follows:
ID(INT),NAME(STRING),AGE(INT)
1,John,23
2,Paul,34
3,Jack,12
Before you ask, I cannot get away from a CSV text file (it is imposed), but I can remove or change the first row (the header) into another format, or move it into another file altogether (I added it to keep track of the schema).
When I start my application I want to read all the data into memory so I can then query it, change it, and so on. I need to extract:
- Schema (column names and types)
- Data
I am trying to figure out the best way to store this in memory using Python (I am very new to the language and its constructs). Any suggestions/recommendations?
Thanks,
If you use a pandas DataFrame you can query and filter it much like an SQL table, and you can read it directly from CSV and write it back out as well. I think this is the best option for you: it's fast and performant, and builds on solid, proven technology.
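A minimal sketch along those lines, with 'people.csv' as a placeholder file name and the typed header row skipped in favour of explicit names and dtypes:

import pandas as pd

# Skip the TYPE-annotated header row and supply clean column names/dtypes.
df = pd.read_csv('people.csv', skiprows=1,
                 names=['ID', 'NAME', 'AGE'],
                 dtype={'ID': int, 'NAME': str, 'AGE': int})

adults = df[df['AGE'] >= 18]                # query/filter in memory
df.loc[df['NAME'] == 'Jack', 'AGE'] = 13    # change a value

df.to_csv('people.csv', index=False)        # write the data back out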
