Read csv into a SQLite3 database using odo in Python

I am trying to read a csv into a new table in a new database using odo, sqlite3, and Python.
I am following these guides:
https://media.readthedocs.org/pdf/odo/latest/odo.pdf
http://odo.pydata.org/en/latest/perf.html?highlight=sqlite#csv-sqlite3-57m-31s
I am trying the following:
import sqlite3
import csv
from odo import odo
file_path = 'my_path/'
# In this case 'my_path/' is a substitute for my real path
db_name = 'data.sqlite'
conn = sqlite3.connect(file_path + db_name)
This creates a new sqlite file data.sqlite within file_path. I can see it there in the folder.
When I then try to read my csv into this database I get the following error:
csv_path = 'my_path/data.csv'
odo(csv_path, file_path + db_name)
conn.close()
NotImplementedError: Unable to parse uri to data resource: # lists my path
Can you help?

No thanks to the ODO documentation, the following successfully created a new table in a new database and read the csv file into that database:
import sqlite3
import csv
from odo import odo, discover, resource
import pandas as pd
# [1]
# Specify file path
file_path = 'my_path/'
# In this case 'my_path/' is a substitute for my real path
# Specify csv file path and name
csv_path = file_path + 'data.csv'
# Specify database name
db_name = 'data.sqlite'
# Connect to new database
conn = sqlite3.connect(file_path + db_name)
# [2]
# Use Odo to detect the shape and datatype of your csv:
data_shape = discover(resource(csv_path))
# Read in csv to a new table called 'data' within database 'data.sqlite'
odo(pd.read_csv(csv_path), 'sqlite:///' + file_path + 'data.sqlite::data', dshape=data_shape)
# Close database
conn.close()
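As a quick sanity check (my own addition, not part of the original solution), you can read the new 'data' table back to confirm the load worked:
import sqlite3
# file_path and db_name as defined above
conn = sqlite3.connect(file_path + db_name)
cursor = conn.cursor()
# Count the rows that odo wrote into the 'data' table
cursor.execute("SELECT COUNT(*) FROM data")
print(cursor.fetchone()[0])
conn.close()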
Sources used in [1]:
https://docs.python.org/2/library/sqlite3.html
python odo sql AssertionError: datashape must be Record type, got 0 * {...}
Sources used in [2]:
https://stackoverflow.com/a/41584832/2254228
http://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html#creating-a-new-sqlite-database
https://stackoverflow.com/a/33316230/2254228
what is difference between .sqlite and .db file?
The ODO documentation is here (good luck...) https://media.readthedocs.org/pdf/odo/latest/odo.pdf

I found that the documentation on the website and on GitHub are different. Please use the GitHub version as a reference.
The
NotImplementedError: Unable to parse uri to data resource
error is mentioned in this section.
You can solve it by running
pip install odo[sqlite] or
pip install odo[sqlalchemy]
Then you may encounter another error if you use Windows and odo 0.5.0:
AttributeError: 'DiGraph' object has no attribute 'edge'
Installing networkx 1.11 instead of networkx 2.0 solves this error.
(reference)
pip uninstall networkx
pip install networkx==1.11
I hope this helps.

Related

How to perform a CName Lookup on a csv file

I am in the process of automating the CNAME lookup process via Python and would like some help / thoughts on my current draft.
The goal is for the script to take in each site under the column 'site' and provide the CNAME of the site in another column named 'CName'.
Here is what I have now:
# pip install pandas
from tkinter.dnd import dnd_start
import pandas as pd
# pip install dnspython
from dns import resolver, reversename
# pip install xlrd, pip install xlsxwriter, pip install socket
from pandas.io.excel import ExcelWriter
import time
import socket
import dns.resolver
startTime = time.time()
# Import excel called logs.xlsx as dataframe
# if CSV change to pd.read_csv('logs.csv', error_bad_lines=False)
logs = pd.read_csv(path to file)
# Create DF with duplicate sites filtered out for the check
logs_filtered = logs.drop_duplicates(['site']).copy()
def cNameLookup(site):
    name = str(site).strip()
    try:
        cname = socket.AddressInfo(site)[0]
        for val in cname:
            print('CNAME Record : ', val.target)
    except:
        return 'N/A'
# Create CName column with the CName Lookup result
logs_filtered['cname'] = logs_filtered['site'].apply(cNameLookup)
# Merge DNS column to full logs matching IP
logs_filtered = logs.merge(logs_filtered[['site', 'cname']], how='left', on=['site'])
# output as Excel
writer = ExcelWriter('validated_logs.xlsx', engine='xlsxwriter', options={
    'strings_to_urls': False})
logs_filtered.to_excel(writer, index=False)
writer.save()
print('File successfully written as validated_logs.xlsx')
print('The script took {0} seconds!'.format(time.time() - startTime))
As of now, when I run the script, all I get for the CName column is 'N/A' all the way down; it seems as though the CNAME lookup portion of the code is not working as intended.
Thank you in advance for any help / suggestions!
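For what it's worth, a minimal sketch of a CNAME lookup using dnspython's resolver (an assumption on my part, not from the question; dnspython 2.x exposes dns.resolver.resolve, older versions use dns.resolver.query) might look like this:
import dns.resolver
def cname_lookup(site):
    # Return the CNAME target for a hostname, or 'N/A' if none exists
    try:
        answers = dns.resolver.resolve(str(site).strip(), 'CNAME')
        return str(answers[0].target)
    except Exception:
        return 'N/A'
# Applied to the dataframe as in the question:
# logs_filtered['cname'] = logs_filtered['site'].apply(cname_lookup)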

Exporting CSV file into a specific path and Creating an automated code that runs daily

I am a novice at Python and I am trying to create my first automated script in Jupyter notebooks that will export my data pull from SQL Server to a specific path, and this code needs to run daily.
My questions:
1- It needs to export the CSV file to a specific folder, and I don't know how to do that
2- I need the code to run by itself on a daily basis
I am stuck. Any help is appreciated.
I have connected to the SQL server and can successfully pull the report and write a CSV file.
import smtplib
import pyodbc
import pandas as pd
import pandas.io.sql
server = 'example server'
db = 'ExternalUser'
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=example server;'
                      'Database=ExternalUser;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
cursor.execute("my SQL query")
col_headers = [ i[0] for i in cursor.description ]
rows = [ list(i) for i in cursor.fetchall()]
df = pd.DataFrame(rows, columns=col_headers)
df.to_csv("Test v2.csv", header = True, index=False)
For exporting the csv to a certain folder: it depends where/how you run the script. If you run the script in the folder where you want the csv file saved, then your current df.to_csv('filename.csv') works as is, or you can add a path, e.g. 'Test_dir/filename.csv'. Otherwise you could use a library like shutil (https://docs.python.org/3/library/shutil.html) to move the .csv file to a given folder afterwards.
For running the code on a daily basis, you could do this locally on your machine (https://medium.com/#thabo_65610/three-ways-to-automate-python-via-jupyter-notebook-d14aaa78de9), or you could look into configuring a cron job.
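A minimal sketch of both ideas (the folder path, file names, and cron schedule below are placeholders, not from the question):
import os
import pandas as pd
# df stands in for the DataFrame built from the SQL query in the question
df = pd.DataFrame({'example_column': [1, 2, 3]})
output_dir = '/path/to/reports'  # hypothetical target folder
os.makedirs(output_dir, exist_ok=True)  # make sure the folder exists
output_path = os.path.join(output_dir, 'Test v2.csv')
df.to_csv(output_path, header=True, index=False)
# To run the script daily on Linux/macOS, a cron entry like the following
# (added via `crontab -e`) would run it every day at 07:00:
#   0 7 * * * /usr/bin/python3 /path/to/export_report.py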

Loading csv files into database in linux

I have been scraping csv files from the web every minute and storing them into a directory.
The files are being named according to the time of retrieval:
name = 'train'+str(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))+'.csv'
I need to upload each file into a database created on some remote server.
How can I do the above?
You can use pandas and sqlalchemy for loading CSV into databases. I use MSSQL and my code looks like this:
import os
import pandas as pd
import sqlalchemy as sa
server = 'your server'
database = 'your database'
directory = 'your directory with csv files'
engine = sa.create_engine('mssql+pyodbc://' + server + '/' + database +
                          '?driver=SQL+Server+Native+Client+11.0')
for filename in os.listdir(directory):  # iterate over files
    df = pd.read_csv(os.path.join(directory, filename), sep=',')
    tableName = os.path.splitext(filename)[0]  # removes .csv extension
    df.to_sql(tableName, con=engine, dtype=None)  # send data to the server
By setting the dtype parameter you can change the data type conversion (e.g. if you want smallint instead of integer, etc.).
To ensure you don't write the same file/table twice, I would suggest keeping a logfile in the directory where you can record which csv files have been written to the DB, and then excluding those in your for-loop.
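A minimal sketch of that bookkeeping idea (the logfile name is my own assumption):
import os
directory = 'your directory with csv files'  # same directory as in the loop above
log_path = os.path.join(directory, 'loaded_files.log')  # hypothetical logfile name
# Read the set of filenames that have already been loaded
if os.path.exists(log_path):
    with open(log_path) as f:
        already_loaded = set(line.strip() for line in f)
else:
    already_loaded = set()
for filename in os.listdir(directory):
    if not filename.endswith('.csv') or filename in already_loaded:
        continue  # skip non-csv files and files already written to the DB
    # ... read the csv and call df.to_sql() as in the code above ...
    with open(log_path, 'a') as f:
        f.write(filename + '\n')  # record the file so it is skipped next time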

How to open a remote file with GDAL in Python through a Flask application

So, I'm developing a Flask application which uses the GDAL library, where I want to stream a .tif file through a url.
Right now I have a method that reads a .tif file using gdal.Open(filepath). When run outside of the Flask environment (e.g. in a Python console), it works fine both when specifying the filepath to a local file and when using a url.
from gdalconst import GA_ReadOnly
import gdal
filename = 'http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif'
dataset = gdal.Open(filename, GA_ReadOnly )
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName, '/', \
          dataset.GetDriver().LongName
However, when the following code is executed inside the Flask environment, I get the following message:
ERROR 4: `http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif' does
not exist in the file system,
and is not recognised as a supported dataset name.
If I instead download the file to the local filesystem of the Flask app, and insert the path to the file, like this:
block_blob_service = get_blobservice() #Initialize block service
block_blob_service.get_blob_to_path('dsm', blobname, filename) # Get blob to local filesystem, path to file saved in filename
dataset = gdal.Open(filename, GA_ReadOnly)
That works just fine...
The thing is, since I'm requesting some big files (200 MB), I want to stream the files using the url instead of the local file reference.
Does anyone have an idea of what could be causing this? I also tried putting "/vsicurl_streaming/" in front of the url as suggested elsewhere.
I'm using Python 2.7, 32-bit with GDAL 2.0.2
Please try the following code snippet:
from gzip import GzipFile
from io import BytesIO
import urllib2
from uuid import uuid4
from gdalconst import GA_ReadOnly
import gdal

def open_http_query(url):
    try:
        request = urllib2.Request(url,
                                  headers={"Accept-Encoding": "gzip"})
        response = urllib2.urlopen(request, timeout=30)
        if response.info().get('Content-Encoding') == 'gzip':
            return GzipFile(fileobj=BytesIO(response.read()))
        else:
            return response
    except urllib2.URLError:
        return None

url = 'http://xxx.blob.core.windows.net/container/example.tif'
image_data = open_http_query(url)
mmap_name = "/vsimem/" + uuid4().get_hex()
gdal.FileFromMemBuffer(mmap_name, image_data.read())
dataset = gdal.Open(mmap_name)
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName, '/', \
          dataset.GetDriver().LongName
This uses a GDAL in-memory file (/vsimem/) to open an image retrieved via HTTP directly, without saving it to a temporary file; the resulting dataset can then be read into a NumPy array.
Refer to https://gist.github.com/jleinonen/5781308 for more info.
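As an alternative (my own suggestion, not from the linked gist), GDAL's /vsicurl/ virtual filesystem can often open remote files directly, provided GDAL was built with curl support; the question notes that /vsicurl_streaming/ did not help, so this may not apply in every environment:
import gdal
url = 'http://xxx.blob.core.windows.net/container/example.tif'
# Prefixing the url with /vsicurl/ asks GDAL to stream the file over HTTP itself
dataset = gdal.Open('/vsicurl/' + url)
assert dataset is not None, 'GDAL could not open the remote file'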

Psycopg2 - copy_expert permission denied error

I'm attempting to make the switch from Windows to Ubuntu (I am using 12.04 LTS) and am trying to use some of my old scripts to run my old databases.
Previously I used PostgreSQL and psycopg2 to maintain them, and I am trying to do so again here.
My error occurs when importing a csv file into a table using the copy_expert command.
Code is as follows:
#!/usr/bin/env python
import psycopg2 as psy
import sys
conn = psy.connect("dbname, user, host, password") # with the appropriate values
curs = conn.cursor()
table = 'tablename' # a table with the appropriate columns etc
file = 'filename' # a csv file
SQL = "COPY %s FROM '%s' WITH CSV HEADERS" % (tablename, filename)
curs.copy_expert(SQL, sys.stdin) # Error occurs here
conn.commit()
curs.close()
conn.close()
The specific error which is occurring is as follows:
psycopg2.ProgrammingError: could not open file "filename" for reading: Permission denied
Any assistance would be greatly appreciated: I am completely stuck and I believe it is due to some quirk of how I've set up the database or the files.
Adding a simple read-and-print loop using the csv module works fine as well (from the same script, in fact). It outputs all of the information from the csv file and then errors out with the permission denied error when attempting to import it into the database:
import csv
f = open(filename, 'rb')
read = csv.reader(f, delimiter=',')
for row in read:
    print row
f.close()
Try executing the command as the superuser using su or sudo. If this doesn't help, the other possibility is that the location of the file is out of bounds, so I would try copying it to your desktop or home directory, or a folder where you know you definitely have full permissions, and see if this works.
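Another option worth mentioning (my own suggestion, not part of the original answer): with copy_expert you can use COPY ... FROM STDIN and pass an open file object, so the file is read by the Python client rather than by the PostgreSQL server process, which sidesteps the server-side permission check:
import psycopg2 as psy
conn = psy.connect("dbname, user, host, password")  # with the appropriate values
curs = conn.cursor()
# COPY ... FROM STDIN makes psycopg2 stream the file contents itself,
# so the postgres server never has to open the file from disk
sql = "COPY tablename FROM STDIN WITH CSV HEADER"
with open('filename', 'r') as f:  # the csv file from the question
    curs.copy_expert(sql, f)
conn.commit()
curs.close()
conn.close()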
