I have more than 40 txt files that need to be loaded into a table in MySQL. Each file contains 3 columns of data, one specific type of data per column; the format of every txt file is exactly the same, but the file names vary. First I tried LOAD DATA LOCAL INFILE 'path/*.txt' INTO TABLE xxx
I thought using *.txt might let MySQL load all the txt files in that folder, but it turned out it does not.
So how can I get MySQL or Python to do this? Or do I need to merge them into one file manually first and then use the LOAD DATA LOCAL INFILE command?
Many thanks!
If you want to avoid merging your text files, you can easily "scan" the folder and run the SQL import query for each file:
import os

for dirpath, dirsInDirpath, filesInDirPath in os.walk("yourFolderContainingTxtFiles"):
    for myfile in filesInDirPath:
        # note the quotes around the file path inside the statement
        sqlQuery = "LOAD DATA INFILE '%s' INTO TABLE xxxx (col1,col2,...);" % os.path.join(dirpath, myfile)
        # execute the query here using your mysql connector.
        # I used string formatting to build the query, but you should use the safe placeholders
        # provided by the mysql api instead of %s, to protect against SQL injections
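For instance, with MySQLdb (or any DB-API connector) the body of the inner loop might look like the sketch below; cursor is assumed to be an already-open cursor, and the table and column names are the same placeholders as above:

path = os.path.join(dirpath, myfile)
# the %s placeholder lets the driver quote and escape the file name for us
cursor.execute("LOAD DATA INFILE %s INTO TABLE xxxx (col1, col2, col3)", (path,))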
The best way is simply to merge your data into one file. That's fairly easy using Python:
fout = open("out.txt", "a")

# first file:
for line in open("file1.txt"):
    fout.write(line)

# now the rest (NB_FILES is the total number of files):
for num in range(2, NB_FILES + 1):
    f = open("file" + str(num) + ".txt")
    for line in f:
        fout.write(line)
    f.close()
fout.close()
Then run the command you know (... INFILE ...) to load that single file into MySQL. It works fine as long as the separator between columns is strictly the same in every file. Tabs are best in my opinion ;)
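For reference, the final import could look something like this, assuming the merged file is tab-separated; the path, table name and column names are placeholders:

LOAD DATA LOCAL INFILE 'path/out.txt'
INTO TABLE xxx
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(col1, col2, col3);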
I have multiple txt files in a folder. I need to insert the data from the txt files into a MySQL table.
I also need to sort the files by modified date before inserting the data into the SQL table named TAR.
Below is the content of one of the txt files. I also need to remove the first character in every line:
SSerial1234
CCustomer
IDivision
Nat22
nAembly
rA0
PFVT
fchassis1-card-linec
RUnk
TP
Oeka
[06/22/2020 10:11:50
]06/22/2020 10:27:22
My code only reads all the files in the folder and prints the contents of each file. I'm not sure how to sort the files before reading them one by one.
Is there also a way to read only specific files (JPE*.log)?
import os

for path, dirs, files in os.walk(r"C:\TAR\TARS_Source"):
    for f in files:
        fileName = os.path.join(path, f)
        with open(fileName, "r") as myFile:
            print(myFile.read())
Use the glob.glob method to get all matching files with a glob pattern like the following...

import glob
files = glob.glob('./JPE*.log')

And you can use the following to sort the files:

sorted_files = sorted(files)
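Since the question asks for sorting by modified date specifically (plain sorted() sorts by file name), here is a minimal sketch that combines the glob filter, a modification-time sort, and stripping the first character of every line; the folder path is taken from the question, and what you do with each cleaned line (the insert into the TAR table) is left as a placeholder:

import glob
import os

files = glob.glob(r"C:\TAR\TARS_Source\JPE*.log")         # only the JPE*.log files
files_by_mtime = sorted(files, key=os.path.getmtime)      # oldest modified first

for file_name in files_by_mtime:
    with open(file_name, "r") as f:
        for line in f:
            cleaned = line.rstrip("\n")[1:]                # drop the first character
            # insert `cleaned` (or its parsed fields) into the TAR table here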
1. I am using Oracle, and the idea is to use a Python script to export tables as a zip archive containing a csv file that holds my data.
2. Additionally: is it possible to save this data in the csv with one value per column? For example, I have 4 columns and all of them end up stored in 1 column of the csv (see this image).
This is my script:
import os
import cx_Oracle
import csv
import zipfile

connection = cx_Oracle.connect('dbconnection')
SQL = "SELECT * FROM AIR_CONDITIONS_YEARLY_MVIEW ORDER BY TIME"
filename = "sample.csv"

cursor = connection.cursor()
cursor.execute(SQL)
with open(filename, 'w') as output:
    writer = csv.writer(output)
    writer.writerow([i[0] for i in cursor.description])
    writer.writerows(cursor.fetchall())

air_zip = zipfile.ZipFile("sample.zip", 'w')
air_zip.write(filename, compress_type=zipfile.ZIP_DEFLATED)

cursor.close()
connection.close()
air_zip.close()
The code exports both the csv file and the zip archive (with the proper csv file inside) separately, and I want it to keep only the exported zip archive!
Both sample.zip (containing sample.csv, as expected) and sample.csv are generated at the same time.
There is a list of problems:

1. The .csv file is not properly formatted (a whole row is seen as a single record (string) instead of a sequence of fields):
Looking (blindly) at the code and the tools' (csv.writer, cx_Oracle) documentation, it seems correct.
When noticing that the file is opened with Excel, I remembered that at some point I had a similar issue. A quick search yielded [SuperUser]: How to get Excel to interpret the comma as a default delimiter in CSV files?
And this was the culprit (the .csv file looks fine in an online viewer / editor).

2. The code "exporting" both the .csv and .zip files (I don't know what export means here; I assume generate, meaning that after running the code both files are present):
The way of getting around this is by deleting the .csv file after it was archived into the .zip file. Translated into code, that would mean adding at the end of the current script snippet:

os.unlink(filename)

As a final observation (if one wants to be pedantic), the lines that close the cursor and the database could be moved just after the with block, or before the air_zip creation (there's no point keeping them open while archiving).
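Putting those two remarks together, the tail of the script could look roughly like this; it is only a sketch, reusing the variable names from the snippet above:

cursor.close()
connection.close()                     # nothing DB-related is needed past this point

with zipfile.ZipFile("sample.zip", 'w') as air_zip:
    air_zip.write(filename, compress_type=zipfile.ZIP_DEFLATED)

os.unlink(filename)                    # keep only the zip archive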
I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files of a folder into one csv. However, I don't need X amount of headers in the one file.
Is there a way to use this and have it write everything but the first row (the header) coming from each of the files?
Use the Python csv module, or do something like the following:
for file_name in file_list:
    with open(file_name) as file_obj:
        file_obj.readline()          # skip the header line
        f.write(file_obj.read())     # write the rest of the file
Unlike file_obj.readlines(), this doesn't build a list of every line in memory; readline() only consumes the header line.
Note that it isn't good practice to name variables after built-in names (like file).
for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on Stack Overflow.
If it's a csv file, look into the Python csv lib.
I have a MySQL database and I am trying to print all the test results for a specific student. I am trying to create a command line tool where I enter the username and it then shows his/her test results.
I already visited this page but couldn't find my answer there:
optparse and strings
# after connecting to mysql
cursor.execute("select * from database")

def main():
    parser = optparse.OptionParser()
    parser.add_option("-n", "--name", type="string", help="student name")
    (options, args) = parser.parse_args()
    studentinfo = []
    f = open("Index", "r")
    # Index is inside the database; it is a folder that holds all kinds of files
Well, the first thing you should do is not use optparse, as it's deprecated; use argparse instead. The documentation I linked is quite useful and informative, guiding you through creating a parser and setting the different options. After reading through it you should have no problem accessing the variables passed from the command line.
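For illustration only, a minimal argparse sketch of the single option the original script defines (the -n/--name option comes from the snippet above; everything else is an assumption about how it might be used):

import argparse

def main():
    parser = argparse.ArgumentParser(description="Show a student's test results")
    parser.add_argument("-n", "--name", required=True, help="student name")
    args = parser.parse_args()
    print("Looking up results for:", args.name)   # use args.name in your DB query

if __name__ == "__main__":
    main()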
However, there are other errors in your script as well that will prevent it from running. First, you can't open a directory with the open() command - you need to use os.listdir() for that, then read the resulting list of files. It is also very much advisable to use a context manager when open()ing files:
filelist = os.listdir("/path/to/Index")
for filename in filelist:
    with open(os.path.join("/path/to/Index", filename), "r") as f:
        for line in f:
            pass  # do stuff with each line
This way you don't need to worry about closing the file handler later on, and it's just a generally cleaner way of doing things.
You don't provide enough information in your question as to how to get the student's scores, so I'm afraid I can't help you there. You'll (I assume) have to connect the data that's coming out of your database query with the files (and their contents) in the Index directory. I suspect that if the student scores are kept in the DB, then you'll need to retrieve them from the DB using SQL, instead of trying to read raw files in the filesystem. You can easily get the student of interest's name from the command line, but then you'll have to interpolate that into a SQL query to find the correct table, select the rows from the table corresponding to the student's test scores, then process the results with Python to print out a pretty summary.
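As a purely illustrative sketch of that last step, the table and column names below are invented (the question doesn't show the schema), and args.name is the value parsed from the command line:

# args.name comes from argparse; the schema here is hypothetical
cursor.execute(
    "SELECT test_name, score FROM test_results WHERE student_name = %s",
    (args.name,),
)
for test_name, score in cursor.fetchall():
    print(test_name, score)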
Good luck!
I am using the Ubuntu 12.04 operating system. I have a folder full of .csv files. I need to import all of these csv files into a MySQL database on the local machine. Currently, I have been using this syntax, from the mysql command line, to load the csv files into the database one by one:
load data local infile 'file_name.csv' into table table_name fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n';
This works really well. I want to know if there is a way that I could load all these files at once. My first idea was to make a python script to handle it:
import MySQLdb as mysql
import os
import string

db = mysql.connect(host="localhost", user="XXXX", passwd="XXXX", db="test")

l = os.listdir(".")
for file_name in l:
    print file_name
    c = db.cursor()
    if (file_name.find("DIV.csv") > -1):
        c.execute("""load data local infile '%s' into table cef_div_table fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n';""" % file_name)
With this solution, I am running into the problem that load data local infile will not work with the new versions of MySQL clients, unless I start MySQL from the command line with the --local-infile option. That is really a drag...
I found a solution that seemed to work. I use the local_infile = 1 option when establishing the connection in Python (as suggested here: MySQL LOAD DATA LOCAL INFILE Python). This way, the code appears to complete without any errors, but nothing is ever uploaded to the database.
It is strange; just to make sure, I tried uploading a single file from the mysql command line, and it worked fine.
I am willing to try another solution to this problem of uploading multiple csv files into MySQL all at once. Any help is greatly appreciated!
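A minimal sketch of what that connection-level fix might look like with MySQLdb, reusing the table name from the script above; note the explicit db.commit(): MySQLdb turns autocommit off by default, which is a common reason why a LOAD DATA that reports no errors still leaves an InnoDB table empty. The MySQL server must also allow LOAD DATA LOCAL (the local_infile server variable) for this to work.

import os
import MySQLdb as mysql

db = mysql.connect(host="localhost", user="XXXX", passwd="XXXX",
                   db="test", local_infile=1)   # enable LOAD DATA LOCAL on the client side
c = db.cursor()

for file_name in os.listdir("."):
    if file_name.endswith("DIV.csv"):
        c.execute("""load data local infile '%s' into table cef_div_table fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n';""" % file_name)

db.commit()   # without this, rows loaded into an InnoDB table are rolled back on disconnect
db.close()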