Data Analysis using SQL/Python

I am working through the Data Analysis using SQL notebook on Kaggle:
https://www.kaggle.com/dimarudov/data-analysis-using-sql/comments
However, I am not sure why tables comes back as an empty DataFrame.
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer/database.sqlite"
database = path + 'database.sqlite'
conn = sqlite3.connect(database)
tables = pd.read_sql("""SELECT *
FROM sqlite_master
WHERE type='table';""", conn)
Output: an empty DataFrame (screenshot omitted).

Instead of using
database = path + 'database.sqlite'
you can pass path directly to sqlite3.connect, since path already points at the sqlite database file.
Modified code:
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer/database.sqlite"
conn = sqlite3.connect(path)
tables = pd.read_sql("""SELECT *
FROM sqlite_master
WHERE type='table';""", conn)
OR
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer/"
database = path + "database.sqlite"
conn = sqlite3.connect(database)
tables = pd.read_sql("""SELECT *
FROM sqlite_master
WHERE type='table';""", conn)
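Either variant works; more generally, building the path with os.path.join avoids this kind of concatenation mistake altogether. A minimal sketch:
import os
import sqlite3

path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer"
database = os.path.join(path, "database.sqlite")
conn = sqlite3.connect(database)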

path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer/database.sqlite"
database = path + 'database.sqlite'
You're appending the database name to the database name. If you look on your disk, you may find a file named
C:\Users\ksumm\OneDrive\Desktop\Python Projects\Euro Soccer\database.sqlitedatabase.sqlite
To avoid this in the future: the Python sqlite3 module has a slightly odd way to refuse to create a database that doesn't already exist when you open it. The connect method accepts a URI, and the URI accepts parameters. When the filename is correct, this will do what you want:
conn = sqlite3.connect('file:%s?mode=rw' % database, uri=True)
If database does not describe an existing file, the rw mode causes the function to fail, raising a sqlite3.OperationalError exception.
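For example, a minimal sketch of that failure mode (assuming the file does not exist at the given path):
import sqlite3

path = r"C:/Users/ksumm/OneDrive/Desktop/Python Projects/Euro Soccer/database.sqlite"
try:
    conn = sqlite3.connect('file:%s?mode=rw' % path, uri=True)
except sqlite3.OperationalError:
    # raised because mode=rw refuses to create a missing database file
    print('No database file found at %s' % path)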


Imported function acts differently than one written in the same file

I have this structure:
Folder_1
- Scripts
  - functions
    - data_import.py
  - main_notebook.ipynb
- Data
  - sales_data_1.csv
- SQL
  - sales_data_1.sql
  - sql_2.sql
Inside data_import.py I have this function:
import os
import pandas as pd
import numpy as np
import psycopg2 as pg

sql_path = r"C:Folder_1\SQL/"    # path to local sql folder
data_path = r"C:Folder_1\Data/"  # path to local data folder

conn = pg.connect(
    dbname="db",
    host="host",
    user="user",
    password="pw",
    port="port",
)
conn.set_session(autocommit=True)

def get_data(sql_file_name):
    # if the csv already exists, load it
    if os.path.isfile(f'{data_path}{sql_file_name}.csv'):
        df = pd.read_csv(f'{data_path}{sql_file_name}.csv')
        return df
    # otherwise, create it
    else:
        # get the data from the database
        query = open(f'{sql_path}{sql_file_name}.sql', 'r').read()
        df = pd.read_sql_query(query, conn)
        # save it to a file for next time
        df.to_csv(f'{data_path}{sql_file_name}.csv', index=False)
        # return it
        return df
In my main_notebook.ipynb I import the functions like so:
from functions import data_import as jp
When I try to use it like this:
sales_data = jp.get_data('sales_data_1')
I get the raw query text back instead of a DataFrame (output screenshot omitted).
But when I use the same, identical function defined directly in my main_notebook.ipynb (with the imports and connection above), I do get the actual df loaded: if the file is not present in the Data folder, the function correctly runs the query and saves the .csv file into the Data folder for the next use.
sales_data = get_data('sales_data_1') # works as expected when defined in the notebook
But when I use it after importing, it gives me the query text instead of the actual pd.DataFrame. I am not sure where my mistake is; the goal is for the function inside the data_import module to work exactly as if it were written in main_notebook.ipynb.
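One thing worth checking (an assumption, since no traceback is shown): r"C:Folder_1\SQL/" has no separator after the drive letter, so Windows resolves it relative to the current directory on drive C:, which makes the module fragile about where it is run from. Anchoring the folders to the module file itself removes that dependency; a minimal sketch using pathlib, assuming the layout shown above:
from pathlib import Path

# data_import.py sits in Folder_1/Scripts/functions/, so parents[2] is Folder_1
base_dir = Path(__file__).resolve().parents[2]
sql_path = base_dir / "SQL"
data_path = base_dir / "Data"
get_data would then build its file names as, e.g., sql_path / f"{sql_file_name}.sql".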

How to insert new row from a dataframe to PostgreSQL table

I am new to Python. I am trying to insert new records from a DataFrame into a Postgres table. However, I observed that every time I run this, the existing rows become duplicated. I want only new records to be inserted into the Postgres table, with existing records ignored. I am using the code below. Can anyone help me with this?
from io import StringIO
from sqlalchemy import create_engine
import psycopg2
import psycopg2.extras as extras
import io
engine = create_engine('postgresql+psycopg2://postgres:Test_1234@localhost:5432/dbname')
# create the target table's columns from the DataFrame header
selling.head(0).to_sql('html', engine, if_exists='append', index=False)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
selling.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'html', null="")
conn.commit()
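No answer was posted here, but for what it's worth: copy_from always appends. A common pattern to skip rows that already exist is to COPY into a temporary staging table and then insert with ON CONFLICT DO NOTHING. This is a sketch only, assuming the html table has a primary key column named id (hypothetical):
# COPY into a temp staging table, then insert only rows whose
# (hypothetical) primary key "id" is not already present in html
cur = conn.cursor()
cur.execute("CREATE TEMP TABLE html_staging (LIKE html INCLUDING DEFAULTS)")
output = io.StringIO()
selling.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_from(output, 'html_staging', null="")
cur.execute("INSERT INTO html SELECT * FROM html_staging ON CONFLICT (id) DO NOTHING")
conn.commit()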

How to let user upload CSV to MongoDB from any directory

I've been trying to write a simple upload function that lets the user choose a CSV file from their PC and upload it into my MongoDB. I am currently using Python, PyMongo and Pandas, and it works, but only with my "local" address (C:\Users\joao.soeiro\Downloads), as shown in the code.
I'd like to know how I could make this string "dynamic" so it reads and uploads files from anywhere, not only my computer. I know it must be a silly question but I'm really a beginner here...
I thought about creating some temporary directory using the tempfile module, but I don't know how I'd put it to work in my code, which is the following:
import pandas as pd
from pymongo import MongoClient
client = MongoClient("mongodb+srv://xxx:xxx#bycardb.lrp4p.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")
print('connected')
db = client['dbycar']
collection = db['users']
data = pd.read_csv(r'C:\Users\joao.soeiro\Downloads\csteste4.csv')
data.reset_index(inplace=True)
data_dict = data.to_dict("records")
collection.insert_many(data_dict)
Solved with this:
import tkinter as tk
from IPython.display import display
from tkinter import filedialog
import pandas as pd
from pymongo import MongoClient
# connecting to the db
client = MongoClient("mongodb+srv://xxxx:xxxx@bycardb.lrp4p.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")
print('connected to the database')
db = client['dbycar']
collection = db['usuarios']
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
data = pd.read_csv(file_path)
data.reset_index(inplace=True)
data_dict = data.to_dict("records")
df = pd.DataFrame(data_dict)
display(df)
collection.insert_many(data_dict)
print('uploaded')
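Note that filedialog.askopenfilename() returns the full path of whatever file the user picks, so pd.read_csv can read it from any directory. It does rely on a local desktop session with tkinter available; a web-facing uploader would need to receive the file over HTTP instead.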

How to save files from postgreSQL to local?

I have a requirement where a lot of files (such as images and .csv files) are saved in a table hosted in Azure PostgreSQL. The files are saved as the binary data type. Is it possible to extract them directly to the local file system by SQL query? I am using Python as my programming language; any guide or code sample is appreciated, thanks!
If you just want to extract binary files from SQL to local disk and save them as files, try the code below:
import psycopg2
import os

connstr = "<conn string>"
rootPath = "d:/"

def saveBinaryToFile(sqlRowData):
    destPath = rootPath + str(sqlRowData[1])
    # if the folder already exists, write into a sibling folder instead
    if os.path.isdir(destPath):
        destPath += '_2'
    os.mkdir(destPath)
    # write the binary column (index 2) out as a .jpg named after column 0
    with open(destPath + '/' + sqlRowData[0] + ".jpg", "wb") as newfile:
        newfile.write(sqlRowData[2])

conn = psycopg2.connect(connstr)
cur = conn.cursor()
sql = 'select * from images'
cur.execute(sql)
rows = cur.fetchall()
print(sql)
print('result:' + str(rows))
for i in range(len(rows)):
    saveBinaryToFile(rows[i])
conn.close()
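One caveat: fetchall() pulls every row, images included, into memory at once. With psycopg2, a named (server-side) cursor streams rows instead; a minimal sketch:
cur = conn.cursor(name='image_stream')
cur.execute(sql)
for row in cur:
    saveBinaryToFile(row)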
(Screenshots of the sample SQL table and the resulting output omitted.)

Read csv into database SQLite3 ODO Python

I am trying to read a csv into a new table in a new database using odo, sqlite3 and Python.
I am following these guides:
https://media.readthedocs.org/pdf/odo/latest/odo.pdf
http://odo.pydata.org/en/latest/perf.html?highlight=sqlite#csv-sqlite3-57m-31s
I am trying the following:
import sqlite3
import csv
from odo import odo
file_path = 'my_path/'
# In this case 'my_path/' is a substitute for my real path
db_name = 'data.sqlite'
conn = sqlite3.connect(file_path + db_name)
This creates a new sqlite file data.sqlite within file_path. I can see it there in the folder.
When I then try to read my csv into this database I get the following error:
csv_path = 'my_path/data.csv'
odo(csv_path, file_path + db_name)
conn.close()
NotImplementedError: Unable to parse uri to data resource: # lists my path
Can you help?
No thanks to the ODO documentation, this successfully created a new table in a new database and read the csv file into that database:
import sqlite3
import csv
import pandas as pd
from odo import odo, discover, resource
# [1]
# Specify file path
file_path = 'my_path/'
# In this case 'my_path/' is a substitute for my real path
# Specify csv file path and name
csv_path = file_path + 'data.csv'
# Specify database name
db_name = 'data.sqlite'
# Connect to new database
conn = sqlite3.connect(file_path + db_name)
# [2]
# Use odo to detect the shape and datatype of your csv:
data_shape = discover(resource(csv_path))
# Read the csv into a new table called 'data' within database 'data.sqlite'
odo(pd.read_csv(csv_path), 'sqlite:///' + file_path + 'data.sqlite::data', dshape=data_shape)
# Close database
conn.close()
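As a side note, odo opens its own connection from the 'sqlite:///...' URI, so the sqlite3.connect call isn't strictly required. A quick way to check that the table landed (a sketch, reusing the names above):
import sqlite3

conn = sqlite3.connect(file_path + db_name)
print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
conn.close()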
Sources used in [1]:
https://docs.python.org/2/library/sqlite3.html
python odo sql AssertionError: datashape must be Record type, got 0 * {...}
Sources used in [2]:
https://stackoverflow.com/a/41584832/2254228
http://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html#creating-a-new-sqlite-database
https://stackoverflow.com/a/33316230/2254228
what is difference between .sqlite and .db file?
The ODO documentation is here (good luck...) https://media.readthedocs.org/pdf/odo/latest/odo.pdf
I found that the documentation on the docs website and on GitHub differ. Please use the GitHub version as the reference.
The
NotImplementedError: Unable to parse uri to data resource
error is mentioned in this section.
You can solve it by running
pip install odo[sqlite]
or
pip install odo[sqlalchemy]
Then you may encounter another error if you use windows and odo 0.5.0:
AttributeError: 'DiGraph' object has no attribute 'edge'
Installing networkx 1.11 instead of networkx 2.0 can solve this error.
(reference)
pip uninstall networkx
pip install networkx==1.11
I hope this helps.
