I received a .db file from my colleague (it contains text and numeric data) which I need to load into a pandas DataFrame for further processing. I have never worked with SQLite and know nothing about it, but after a few Google searches I wrote the following code:
import pandas as pd
import numpy as np
import sqlite3
conn = sqlite3.connect('data.db')  # opens data.db, creating it if it does not exist
sql="""
SELECT * FROM data;
"""
df = pd.read_sql_query(sql, conn)
df.head()
This gives me the following error:
DatabaseError: Execution failed on sql '
SELECT * FROM data;
': no such table: data
What table is this code referring to? I only have data.db.
I do not quite understand where I am going wrong with this. Any advice on how to get my data into the DataFrame df?
I'm also new to SQL, but based on what you've provided, "data" refers to a table inside your database data.db.
The query you typed instructs the program to select all rows from a table called "data", and the error means no table with that name exists in the file; you can list the tables that do exist, as shown below. This website helped me with creating tables: https://www.tutorialspoint.com/sqlite/sqlite_create_table.htm
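A minimal sketch for finding the real table names, using SQLite's built-in sqlite_master catalog:
import sqlite3
conn = sqlite3.connect('data.db')
# sqlite_master is SQLite's internal schema catalog;
# filtering on type='table' lists every table in the file
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables)
Once you know the actual table name, substitute it into the SELECT query above.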
Related
I have an Excel file with 2 sheets that I wish to manipulate using SQL in a Jupyter Notebook. I remember a long time ago going through some tutorial that explained you can use SQL within Python just by adding %sql before each line, but from Google searches I can't figure out how to get this to work. It seems I need to create a database and then a connection, etc. But the database is just going to be the pandas DataFrames I've already imported, so I don't need to connect to any external database, right? (Sorry, I know it must sound like such a stupid question, but I've never created my own databases before.)
Here's what I've tried:
import pandas as pd
import sqlite3
%load_ext sql
%sql sqlite://
path = (my file path)
orders = pd.read_excel(path, sheet_name=0)  # sheet_name is zero-indexed: 0 is the first sheet
items = pd.read_excel(path, sheet_name=1)   # and 1 is the second
%sql select * from orders
Then I get an error:
* sqlite://
(sqlite3.OperationalError) no such table: orders
[SQL: select * from orders]
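The sqlite:// connection that the %sql magic opens is its own in-memory database, so DataFrames created with pandas are not visible to it, which is why the table is "not found". A sketch of one workaround, using a hypothetical orders.db file so that pandas and the magic point at the same database:
import pandas as pd
import sqlite3

orders = pd.read_excel(path, sheet_name=0)  # `path` as defined above
conn = sqlite3.connect('orders.db')  # hypothetical on-disk database file
# write the DataFrame into the database as a table the SQL magic can query
orders.to_sql('orders', conn, if_exists='replace', index=False)
conn.close()
Then point the magic at the same file and query away:
%sql sqlite:///orders.db
%sql select * from orders
ipython-sql also documents a --persist option for writing a DataFrame straight into the connected database, which may be worth a look.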
Hope everyone is well and staying safe.
I'm uploading an Excel file to SQL using Python. I have three fields: CarNumber nvarchar(50), Status nvarchar(50), and DisassemblyStart date.
I'm having an issue importing the DisassemblyStart field dates. The connection and transaction using Python are successful.
However, I get zeroes all over, even though the Excel file is populated with dates. I've tried switching the DisassemblyStart column to nvarchar(50), date, and datetime to see if I could at least get a string, and got nothing. I saved the Excel file as CSV and TXT and tried uploading it, and still got zeroes. I added 0.001 to every date in Excel (to add an artificial time) in case that would make it click, but nothing happened. Still zeroes. I'm sure there's a major oversight from being too deep in the weeds. I need help.
The Excel file contains those three columns.
This is the Python code:
import pandas as pd
import pyodbc
# Import CSV
data = pd.read_csv(r'PATH_TO_CSV\XXX.csv')
df = pd.DataFrame(data, columns=['CarNumber', 'Status', 'DisassemblyStart'])
df = df.fillna(value=0)  # replaces every missing value, including missing dates, with 0
# Connect to SQL Server
conn = pyodbc.connect("Driver={SQL Server};Server=SERVERNAME,PORT;Database=DATABASENAME;Uid=USER;Pwd=PW;")
cursor = conn.cursor()
# Create Table
cursor.execute('DROP TABLE OPS.dbo.TABLE')  # fails if the table does not exist yet
cursor.execute('CREATE TABLE OPS.dbo.TABLE (CarNumber nvarchar(50), Status nvarchar(50), DisassemblyStart date)')
# Insert READ_CSV INTO TABLE
for row in df.itertuples():
    cursor.execute('INSERT INTO OPS.dbo.TABLE (CarNumber,Status,DisassemblyStart) VALUES (?,?,CONVERT(datetime,?,23))',
                   row.CarNumber, row.Status, row.DisassemblyStart)
conn.commit()
conn.close()
Help will be much appreciated.
Thank you and be safe,
David
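One possible culprit, offered as a guess from the code above: df.fillna(value=0) turns every missing date into the integer 0 before the insert, and CONVERT(datetime, ?, 23) expects an ISO yyyy-mm-dd string. A minimal sketch that parses the dates explicitly and passes NULL for missing ones (column name as in the question):
import pandas as pd

df = pd.read_csv(r'PATH_TO_CSV\XXX.csv')
# parse the date column; unparseable or empty cells become NaT instead of 0
df['DisassemblyStart'] = pd.to_datetime(df['DisassemblyStart'], errors='coerce')
# format as ISO yyyy-mm-dd strings, which is what CONVERT(datetime, ?, 23) expects;
# NaT formats to NaN, so map those to None to get NULL in SQL Server
iso = df['DisassemblyStart'].dt.strftime('%Y-%m-%d')
df['DisassemblyStart'] = [s if pd.notna(s) else None for s in iso]
With this in place, the fillna(value=0) call can be dropped so the zeroes never reach the table.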
I'm trying to import a SQL database into a Python pandas DataFrame, and I am getting a syntax error. I am a newbie here, so the issue is probably very simple.
After downloading the SQLite sample database chinook.db from http://www.sqlitetutorial.net/sqlite-sample-database/
and reading the pandas documentation, I tried to load it into a pandas DataFrame with
import pandas as pd
import sqlite3
conn = sqlite3.connect('chinook.db')
df = pd.read_sql('albums', conn)
where 'albums' is a table of chinook.db, found with the sqlite3 command-line tool.
The result is:
...
DatabaseError: Execution failed on sql 'albums': near "albums": syntax error
I tried variations of the above code to import the tables of the database into an IPython session for exploratory data analysis, with no success.
What am I doing wrong? Is there a documentation/tutorial for newbies with some examples around?
Thanks in advance for your help!
Found it!
An example of db connection with SQLAlchemy can be found here:
https://www.codementor.io/sagaragarwal94/building-a-basic-restful-api-in-python-58k02xsiq
import pandas as pd
from sqlalchemy import create_engine
db_connect = create_engine('sqlite:///chinook.db')
df = pd.read_sql('albums', con=db_connect)
print(df)
As suggested by @Anky_91, pd.read_sql_table also works, since read_sql wraps it.
The issue was the connection: when passing a bare table name, it has to be made with SQLAlchemy, not sqlite3.
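For completeness, a sketch of the documented distinction: a plain sqlite3 connection does work with read_sql as long as you pass a full SQL query, because only the bare-table-name form requires SQLAlchemy:
import pandas as pd
import sqlite3

conn = sqlite3.connect('chinook.db')
# a full query string works over a DBAPI connection; a bare table name does not
df = pd.read_sql('SELECT * FROM albums', conn)
print(df.head())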
Thanks
I wrote a small script in Python that could help me extract data from a database. Here is my script:
#!/usr/bin/python3
import pandas as pd
from sqlalchemy import create_engine
#connect to server
mytab = create_engine('mssql+pyodbc://test:test1@mypass')  # user:password@DSN
#sql query that retrieves my table
df = pd.read_sql('select * from FO_INV', mytab)
# write the query result to a CSV file
df.to_csv('inventory.csv', index=False, sep=',', encoding='utf-8')
Everything works fine if I select only the top 100 rows, for example. But for the whole table, it takes forever!
Do you have any idea or recommendations, please ?
Thank you in advance :)
I would suggest using pyodbc instead of SQLAlchemy.
Something like this:
import pyodbc
mytab = pyodbc.connect('DRIVER={SQL SERVER};SERVER=.\;DATABASE=myDB;UID=user;PWD=pwd')
Check your timings with this. This should be faster.
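Whichever driver ends up faster, reading a large table in chunks can also keep memory under control; a sketch using pandas' chunksize parameter (table and file names as in the question, connection details as above):
import pandas as pd
import pyodbc

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=.;DATABASE=myDB;UID=user;PWD=pwd')
# chunksize makes read_sql return an iterator of DataFrames instead of one big frame
chunks = pd.read_sql('SELECT * FROM FO_INV', conn, chunksize=50000)
with open('inventory.csv', 'w', encoding='utf-8', newline='') as f:
    for i, chunk in enumerate(chunks):
        # write the header only for the first chunk, then append
        chunk.to_csv(f, index=False, header=(i == 0))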
I have a tabledata.csv file and I have been using pandas.read_csv to read it and select specific columns with specific conditions.
For instance, I use the following code to select all "name" values where session_id == 1, which works fine in an IPython Notebook on datascientistworkbench.
df = pandas.read_csv('/resources/data/findhelp/tabledata.csv')
df['name'][df['session_id']==1]
I just wonder, after reading the csv file, is it possible to somehow "switch/read" it as a SQL database? (I am pretty sure I did not explain that well using the correct terms, sorry about that!) What I want is to use SQL statements in the IPython notebook to choose specific rows with specific conditions, something like:
SELECT `name`, COUNT(DISTINCT `session_id`) FROM tabledata WHERE `session_id` LIKE '100.1%' GROUP BY `session_id` ORDER BY `session_id`
But I guess I need to figure out a way to turn the csv file into another form so that I can use SQL statements on it. Many thanks!
Here is a quick primer on pandas and SQL, using the built-in sqlite3 package. Generally speaking, you can do all SQL operations in pandas one way or another, but databases are of course still useful. The first thing you need to do is store the original df in a SQL database so that you can query it. The steps are listed below.
import pandas as pd
import sqlite3
#read the CSV
df = pd.read_csv('/resources/data/findhelp/tabledata.csv')
#connect to a database
conn = sqlite3.connect("Any_Database_Name.db") #if the db does not exist, this creates a Any_Database_Name.db file in the current directory
#store your table in the database:
df.to_sql('Some_Table_Name', conn)
#read a SQL Query out of your database and into a pandas dataframe
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
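With the table stored, the kind of query from the question runs directly; a sketch assuming the CSV really has name and session_id columns as described:
query = """
SELECT name, COUNT(DISTINCT session_id) AS n_sessions
FROM Some_Table_Name
WHERE session_id LIKE '100.1%'
GROUP BY session_id
ORDER BY session_id
"""
result = pd.read_sql(query, conn)
print(result)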
Another answer suggested using SQLite. However, DuckDB is a much faster alternative to loading your data into SQLite:
first, loading the data takes time; second, SQLite is not optimized for analytical queries (e.g., aggregations).
Here's a full example you can run in a Jupyter notebook:
Installation
pip install jupysql duckdb duckdb-engine
Note: if you want to run this in a notebook, use %pip install jupysql duckdb duckdb-engine
Example
Load extension (%sql magic) and create in-memory database:
%load_ext sql
%sql duckdb://
Download some sample CSV data:
from urllib.request import urlretrieve
urlretrieve("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv", "penguins.csv")
Query:
%%sql
SELECT species, COUNT(*) AS count
FROM penguins.csv
GROUP BY species
ORDER BY count DESC
More details are available in the JupySQL documentation.
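To get a query result back into pandas, the object returned by the magic can be converted; a sketch assuming JupySQL keeps ipython-sql's ResultSet.DataFrame() method:
# capture the result of a line magic and convert it to a pandas DataFrame
result = %sql SELECT species, COUNT(*) AS count FROM penguins.csv GROUP BY species
df = result.DataFrame()
df.head()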