I have a problem: I need to parse multiple XML files and insert their data into a database.
import os
from lxml import etree
import sqlite3
conn = sqlite3.connect("xml.db")
cursor = conn.cursor()
path = 'C:/tools/XML'
for filename in os.listdir(path):
    fullname = os.path.join(path, filename)
    tree = etree.parse(fullname)
    test = tree.xpath('//*[@name="Name"]/text()')
    tpl = tuple(test)
    cursor.executemany("INSERT INTO parsee VALUES (?);", (tpl,))
    conn.commit()
sql = "SELECT * FROM parsee"
cursor.execute(sql)
print(cursor.fetchall())
result:
[('testname1',)]
If I run the program again, it inserts the same name again. Result:
[('testname1',),('testname1',)]
There are 100 files in the folder, each containing elements like:
<curent name="Name">testname1<curent>
<curent name="Name">testname2<curent>
<curent name="Name">testname3<curent>
<curent name="Name">testname4<curent>
Since I don't have admin rights to install lxml on my computer, I will use a module that is included in the Python standard library to handle the XPath lookups: xml.etree.ElementTree. My code will still show you how to insert multiple records into SQLite using executemany().
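As a side note, executemany() expects a sequence of parameter rows, one row per insert and one value per placeholder; passing (tpl,), where tpl holds all the names, gives it a single parameter row, so at best one row gets inserted (and you get a binding error if tpl has more than one value). A minimal sketch of the difference (table name taken from the question, data made up):

import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE parsee (name VARCHAR(100))")

names = ['testname1', 'testname2', 'testname3']

# Wrong shape: one parameter row containing every name at once.
# cursor.executemany("INSERT INTO parsee VALUES (?)", (tuple(names),))

# Correct shape: a sequence of one-tuples, one per row to insert.
cursor.executemany("INSERT INTO parsee VALUES (?)", [(n,) for n in names])
conn.commit()

cursor.execute("SELECT * FROM parsee")
print(cursor.fetchall())  # [('testname1',), ('testname2',), ('testname3',)]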
It looks like C:/tools/XML will contain many XML files of the same structure.
I put the following two in a folder to simulate this (I noticed that your example has 'curent' as the element name; I am not sure if that is a typo, so I am using 'current'):
file1.xml
<note>
    <current name="Name">testname1</current>
    <current name="Name">testname2</current>
    <otherdetail></otherdetail>
</note>
file2.xml
<note>
    <current name="Name">testname3</current>
    <current name="Name">testname4</current>
    <otherdetail></otherdetail>
</note>
I created an SQLite database called xml.db and a table in it with the following statement:
CREATE TABLE PARSEE (NAME VARCHAR(100));
And here is my Python script:
import os
import xml.etree.ElementTree as ET
import sqlite3
conn = sqlite3.connect("xml.db")
cursor = conn.cursor()
path = 'C:/tools/XML'
for filename in os.listdir(path):
    fullname = os.path.join(path, filename)
    print("Parsing file: %s" % fullname)
    tree = ET.parse(fullname)
    root = tree.getroot()
    elements = root.findall(".//*[@name='Name']")
    names = [(e.text,) for e in elements]
    print("Names found: %s" % names)
    cursor.executemany("INSERT INTO PARSEE VALUES (?)", names)
    conn.commit()
sql = "SELECT * FROM PARSEE"
print("Printing table PARSEE content")
cursor.execute(sql)
print(cursor.fetchall())
And here is the output
I would like to store my pictures in my SQLite database with Python. How can I do that?
Here is my code, but it is not working:
import sqlite3
import os
conn = sqlite3.connect('images.db')
cursor = conn.cursor()
sql_bd = 'CREATE TABLE IF NOT EXISTS tabela (foto BLOB);'
for i in os.listdir('\myphotos'):
    cursor.execute("INSERT INTO tabela (foto) VALUES (?);", i)
conn.commit()
cursor.close()
conn.close()
Could anyone help me, please?
Using pathlib instead of os.listdir, it can become:
from pathlib import Path

mydir = Path("myphotos")
for image_path in mydir.iterdir():
    ...
    if image_path.suffix.lower() in [".jpeg", ".jpg", ...]:
        cursor.execute("INSERT INTO tabela (foto) VALUES (?);", (image_path.read_bytes(),))
    ...
Pathlib's iterdir already yields the full path for each file, not just the filename, and Path objects provide the read_bytes method, so there is no need to open the file and call a method on the returned file object.
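Putting it together, here is a self-contained sketch under the same assumptions as the question (an images.db database, a tabela table with a single foto BLOB column, and a myphotos folder next to the script):

import sqlite3
from pathlib import Path

conn = sqlite3.connect("images.db")
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS tabela (foto BLOB);")

# Insert every JPEG found in the folder as one BLOB row.
for image_path in Path("myphotos").iterdir():
    if image_path.suffix.lower() in [".jpeg", ".jpg"]:
        cursor.execute(
            "INSERT INTO tabela (foto) VALUES (?);",
            (image_path.read_bytes(),),
        )

conn.commit()
conn.close()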
I have a table in my PostgreSQL database in which a column type is set to bytea in order to store zipped files.
The storing procedure works fine. I have problems when I need to retrieve the zipped file I uploaded.
def getAnsibleByLibrary(projectId):
    con = psycopg2.connect(
        database="xyz",
        user="user",
        password="pwd",
        host="localhost",
        port="5432",
    )
    print("Database opened successfully")
    cur = con.cursor()
    query = "SELECT ansiblezip FROM library WHERE library.id = (SELECT libraryid from project WHERE project.id = '"
    query += str(projectId)
    query += "')"
    cur.execute(query)
    rows = cur.fetchall()
    repository = rows[0][0]
    con.commit()
    con.close()
    print(repository, type(repository))
    with open("zippedOne.zip", "wb") as fin:
        fin.write(repository)
This code creates a zippedOne.zip file but it seems to be an invalid archive.
I also tried saving repository.tobytes(), but it gives the same result.
I don't understand how to handle memoryview objects.
If I try:
print(repository, type(repository))
the result is:
<memory at 0x7f6b62879348> <class 'memoryview'>
If I try to unzip the file:
chain@wraware:~$ unzip zippedOne.zip
The result is:
Archive: zippedOne.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of zippedOne.zip or
zippedOne.zip.zip, and cannot find zippedOne.zip.ZIP, period.
Trying to extract it in Windows gives me the error: "The compressed (zipped) folder is invalid".
This code, based on the example in the question, works for me:
import io
import zipfile
import psycopg2
DROP = """DROP TABLE IF EXISTS so69434887"""
CREATE = """\
CREATE TABLE so69434887 (
id serial primary key,
ansiblezip bytea
)
"""
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('so69434887.txt', 'abc')

with psycopg2.connect(database="test") as conn:
    cur = conn.cursor()
    cur.execute(DROP)
    cur.execute(CREATE)
    conn.commit()
    cur.execute("""INSERT INTO so69434887 (ansiblezip) VALUES (%s)""", (buf.getvalue(),))
    conn.commit()
    cur.execute("""SELECT ansiblezip FROM so69434887""")
    memview, = cur.fetchone()

with open('so69434887.zip', 'wb') as f:
    f.write(memview)
and the resulting file unzips cleanly (on Linux, at least):
$ unzip -p so69434887.zip so69434887.txt
abc
So perhaps the data is not being inserted correctly.
FWIW, I got the "End-of-central-directory signature not found" error until I made sure I closed the zipfile object before writing to the database.
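To illustrate that last point, here is a minimal standalone sketch (no database involved) of why the ZipFile has to be closed before the buffer's bytes are used; the end-of-central-directory record is only written on close:

import io
import zipfile

buf = io.BytesIO()
zf = zipfile.ZipFile(buf, mode='w')
zf.writestr('example.txt', 'abc')

premature = buf.getvalue()   # central directory not written yet
zf.close()
complete = buf.getvalue()    # now a valid archive

print(zipfile.is_zipfile(io.BytesIO(premature)))  # False
print(zipfile.is_zipfile(io.BytesIO(complete)))   # True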
I have a requirement where a lot of files (such as images and .csv files) are saved in a table hosted in Azure PostgreSQL. The files are saved as a binary data type. Is it possible to extract them directly to the local file system with a SQL query? I am using Python as my programming language; any guide or code sample is appreciated, thanks!
If you just want to extract binary files from SQL and save them locally as files, try the code below:
import psycopg2
import os
connstr = "<conn string>"
rootPath = "d:/"
def saveBinaryToFile(sqlRowData):
    destPath = rootPath + str(sqlRowData[1])
    if os.path.isdir(destPath):
        destPath += '_2'
        os.mkdir(destPath)
    else:
        os.mkdir(destPath)
    newfile = open(destPath + '/' + sqlRowData[0] + ".jpg", "wb")
    newfile.write(sqlRowData[2])
    newfile.close()

conn = psycopg2.connect(connstr)
cur = conn.cursor()
sql = 'select * from images'
cur.execute(sql)
rows = cur.fetchall()
print(sql)
print('result:' + str(rows))

for i in range(len(rows)):
    saveBinaryToFile(rows[i])
conn.close()
I am using Python 3.6 to iterate through a folder structure and return the file paths of all these CSVs I want to import into two already created Oracle tables.
import os
import cx_Oracle

con = cx_Oracle.connect('BLAH/BLAH@XXX:666/BLAH')

# Targets the exact filepaths of the CSVs we want to import into the Oracle database
if os.access(base_cust_path, os.W_OK):
    for path, dirs, files in os.walk(base_cust_path):
        if "Daily" not in path and "Daily" not in dirs and "Jul" not in path and "2017-07" not in path:
            for f in files:
                if "OUTPUT" in f and "MERGE" not in f and "DD" not in f:
                    print("Import to OUTPUT table: " + path + "/" + f)
                    # Run function to import to SQL Table 1
                if "MERGE" in f and "OUTPUT" not in f and "DD" not in f:
                    print("Import to MERGE table: " + path + "/" + f)
                    # Run function to import to SQL Table 2
A while ago I was able to use PHP to produce a function that used the BULK INSERT SQL command for SQL Server:
function bulkInserttoDB($csvPath){
    $tablename = "[DATABASE].[dbo].[TABLE]";
    $insert = "BULK
        INSERT ".$tablename."
        FROM '".$csvPath."'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";
    print_r($insert);
    print_r("<br>");
    $result = odbc_prepare($GLOBALS['connection'], $insert);
    odbc_execute($result) or die(odbc_error($connection));
}
I was looking to replicate this in Python, but a few Google searches led me to believe there is no 'BULK INSERT' command for Oracle. That BULK INSERT command had awesome performance.
Since these CSVs I am loading are huge (2GB x 365), performance is crucial. What is the most efficient way of doing this?
The bulk insert can be done with the cx_Oracle library and the following commands:
con = cx_Oracle.connect(CONNECTION_STRING)
cur = con.cursor()

# prepare your statement once
cur.prepare("""INSERT INTO MyTable VALUES (
    to_date(:1, 'YYYY/MM/DD HH24:MI:SS'),
    :2,
    :3,
    to_date(:4, 'YYYY/MM/DD HH24:MI:SS'),
    :5,
    :6,
    to_date(:7, 'YYYY/MM/DD HH24:MI:SS'),
    :8,
    to_date(:9, 'YYYY/MM/DD HH24:MI:SS'))""")

# prepare your data: append one tuple per parsed CSV line (sline)
rows = []
rows.append((sline[0], sline[1], sline[2], sline[3], sline[4],
             sline[5], sline[6], sline[7], sline[8]))

# insert all rows in one call
cur.executemany(None, rows)
You prepare the insert statement once, then build a list with one tuple per row from your file, and finally call executemany(), which sends all the rows in a single batch instead of one round trip per row.
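Putting that together for the CSV-loading case in the question, here is a minimal sketch that reads a large CSV in chunks and inserts each chunk with executemany(); the connection string, table name, column layout, and batch size are assumptions, not taken from the question:

import csv
import cx_Oracle

CONNECTION_STRING = "user/password@host:1521/service"   # placeholder
BATCH_SIZE = 50000   # tune to available memory

con = cx_Oracle.connect(CONNECTION_STRING)
cur = con.cursor()
# Hypothetical 3-column target table: a date column and two varchar columns.
cur.prepare("""INSERT INTO my_output_table VALUES (
    to_date(:1, 'YYYY/MM/DD HH24:MI:SS'), :2, :3)""")

with open("huge_file.csv", newline="") as f:
    reader = csv.reader(f)
    batch = []
    for sline in reader:
        batch.append((sline[0], sline[1], sline[2]))
        if len(batch) >= BATCH_SIZE:
            cur.executemany(None, batch)   # one round trip per batch
            batch = []
    if batch:
        cur.executemany(None, batch)

con.commit()
con.close()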
Can anyone tell me where I have gone wrong?
My code:
import csv
import sqlite3
import os
import subprocess
import glob
#Connect to database
conn = sqlite3.connect("Mpeg_editor_Final.db")
try:
    conn.execute("drop table Mpeg_editor_Final")
    conn.execute("drop table edited")
    conn.execute("drop table ffmpeg")
except sqlite3.OperationalError, e:
    print e.message
#CREATE tables in the database
conn.execute("PRAGMA foreign_keys = 1")
conn.execute("CREATE TABLE Mpeg_editor_Final (fileName VARCHAR(120), fileType VARCHAR(120), fileFolder VARCHAR(120))")
conn.execute("CREATE TABLE edited (fileName VARCHAR(120), fileType VARCHAR(120), fileFolder VARCHAR(120))")
conn.execute("CREATE TABLE ffmpeg (fileName VARCHAR(120), fileType VARCHAR(120), fileFolder VARCHAR(120))")
#mpegEditorFinal file location
mpegEditorFinal = 'C:\Mpeg_editor_Final'
#list all folders and file in Mpeg_editor_Final
mpegEditorFinaldirs = os.listdir(mpegEditorFinal)
# tell file's extensions
for i in mpegEditorFinaldirs:
    mpegEditorFinalext = os.path.splitext(i)

#find current path
for x in mpegEditorFinaldirs:
    mpegEditorFinalpath = os.path.dirname(x)
#To write information into the Mpeg_editor_final table
conn.executemany("INSERT INTO Mpeg_editor_Final (fileName, fileType, fileFolder) VALUES (?,?,?);", [mpegEditorFinaldirs , mpegEditorFinalext , mpegEditorFinalpath,])
conn.commit()
The error message:
Message File Name Line Position
Traceback
<module> C:\Mpeg_editor_Final\database.py 36
ProgrammingError: Incorrect number of bindings supplied. The current statement uses 3, and there are 16 supplied.
You've used executemany(), but you only provide parameters for a single execution. Either nest them further, or use execute() instead.
conn.executemany("INSERT INTO Mpeg_editor_Final (fileName, fileType, fileFolder) VALUES (?,?,?);", [[mpegEditorFinaldirs, mpegEditorFinalext, mpegEditorFinalpath]])
conn.execute("INSERT INTO Mpeg_editor_Final (fileName, fileType, fileFolder) VALUES (?,?,?);", [mpegEditorFinaldirs, mpegEditorFinalext, mpegEditorFinalpath])
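Judging from the loop variables, the intent is probably one row per file in the folder rather than a single row built from three unrelated values, so the approach that matches executemany() is to collect one (fileName, fileType, fileFolder) tuple per directory entry. A hedged sketch of that idea (folder and table names taken from the question, the rest assumed):

import os
import sqlite3

conn = sqlite3.connect("Mpeg_editor_Final.db")
conn.execute("CREATE TABLE IF NOT EXISTS Mpeg_editor_Final (fileName VARCHAR(120), fileType VARCHAR(120), fileFolder VARCHAR(120))")

mpegEditorFinal = 'C:\\Mpeg_editor_Final'

# Build one (fileName, fileType, fileFolder) tuple per directory entry,
# then hand the whole list to executemany() in a single call.
rows = []
for name in os.listdir(mpegEditorFinal):
    ext = os.path.splitext(name)[1]
    rows.append((name, ext, mpegEditorFinal))

conn.executemany("INSERT INTO Mpeg_editor_Final (fileName, fileType, fileFolder) VALUES (?,?,?);", rows)
conn.commit()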
It's because conn.execute() requires two parameters, and here you haven't provided them correctly.
See here; it solves your problem.