1. I am using Oracle and the idea is to use a Python script to export tables as a zipped folder containing a CSV file which holds my data.
2. Additionally: is it possible to save this data in the CSV per column? For example, I have 4 columns and all of them end up stored in 1 column in the CSV.
see this image
This is my script:
import os
import cx_Oracle
import csv
import zipfile
connection = cx_Oracle.connect('dbconnection')
SQL = "SELECT * FROM AIR_CONDITIONS_YEARLY_MVIEW ORDER BY TIME"
filename = "sample.csv"
cursor = connection.cursor()
cursor.execute(SQL)
with open(filename, 'w', newline='') as output:
    writer = csv.writer(output)
    writer.writerow([i[0] for i in cursor.description])
    writer.writerows(cursor.fetchall())
air_zip = zipfile.ZipFile("sample.zip", 'w')
air_zip.write(filename, compress_type=zipfile.ZIP_DEFLATED)
cursor.close()
connection.close()
air_zip.close()
The code I have exports both the CSV file and the zipped folder (with the proper CSV file inside) separately, and I want to keep exporting the zipped folder only!
Both sample.zip containing sample.csv, as expected, and sample.csv are generated at the same time.
There is a list of problems:
The .csv file is not properly formatted (a row is seen as a single string instead of a sequence of fields):
Looking (blindly) at the code and the tools' (csv.writer, cx_Oracle) documentation, it seems correct
When noticing that the file is opened with Excel, I remembered that at some point I had a similar issue. A quick search yielded [SuperUser]: How to get Excel to interpret the comma as a default delimiter in CSV files?, and this was the culprit (the .csv file looks fine in an online viewer / editor)
The code "exporting" both .csv and .zip files (I don't know what export means here, I assume generate, meaning that after running the code both files are present):
The way of getting around this is by deleting the .csv file after it was archived into the .zip file. Translated into code, that would mean adding this at the end of the current script snippet:
os.unlink(filename)
As a final observation (if one wants to be pedantic), the lines that close the cursor and the database connection could be moved just after the with block, or at least before the air_zip creation (there's no point keeping them open while archiving).
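Putting those remarks together, a possible reordered version of the snippet (same query and file names as in the question; only the cleanup and the ordering change) could look like this:
import os
import csv
import zipfile
import cx_Oracle

SQL = "SELECT * FROM AIR_CONDITIONS_YEARLY_MVIEW ORDER BY TIME"
filename = "sample.csv"

connection = cx_Oracle.connect('dbconnection')
cursor = connection.cursor()
cursor.execute(SQL)

with open(filename, 'w', newline='') as output:
    writer = csv.writer(output)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())

# done with the database, so close it before archiving
cursor.close()
connection.close()

# archive the .csv, then delete it so that only sample.zip remains
with zipfile.ZipFile("sample.zip", 'w') as air_zip:
    air_zip.write(filename, compress_type=zipfile.ZIP_DEFLATED)
os.unlink(filename)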
I have multiple txt files in a folder. I need to insert the data from the txt files into a MySQL table.
I also need to sort the files by modified date before inserting the data into the SQL table named TAR.
Below is the content of one of the txt files. I also need to remove the first character in every line:
SSerial1234
CCustomer
IDivision
Nat22
nAembly
rA0
PFVT
fchassis1-card-linec
RUnk
TP
Oeka
[06/22/2020 10:11:50
]06/22/2020 10:27:22
My code only reads all the files in the folder and prints the contents of each file. I'm not sure how to sort the files before reading them one by one.
Is there also a way to read only specific files (JPE*.log)?
import os
for path, dirs, files in os.walk("C:\TAR\TARS_Source/"):
    for f in files:
        fileName = os.path.join(path, f)
        with open(fileName, "r") as myFile:
            print(myFile.read())
Use the glob.glob method to get all matching files using a wildcard pattern like the following...
import glob
files=glob.glob('./JPE*.log')
And you can use the following to sort the files by name:
sorted_files=sorted(files)
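Since the question asks for modified-date order rather than name order, a possible variant (a sketch assuming the last-modified timestamps are what you want to sort on, and reusing the folder from the question) is to pass os.path.getmtime as the sort key:
import glob
import os

# match only the JPE*.log files and sort them by last-modified time
files = glob.glob(r"C:\TAR\TARS_Source\JPE*.log")
sorted_files = sorted(files, key=os.path.getmtime)

for fname in sorted_files:
    with open(fname, "r") as fh:
        for line in fh:
            # drop the first character of every line, as required
            print(line[1:].rstrip("\n"))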
I have an Excel source file in a source folder (*.xlsm) and another file (also *.xlsm) that contains some data. I have to create a third file, which has to be a *.xls file and is basically the Excel source file with some data from the second file added. In order to do that I have written this code:
from openpyxl import load_workbook
file1 = "C:\\Users\Desktop\file1.xlsm"
file2 = "C:\\Users\Desktop\file2.xlsm"
file3 = "C:\\Users\Desktop\file3.xls"
wb1 = load_workbook(file1)
sheet1 = wb1["Sheet1"]
wb2 = load_workbook(file2)
sheet2 = wb2["Sheet1"]
sheet1["A1"].value = sheet2["A1"].value
wb1.save(file3)
The code seems to be OK and doesn't return any error, but I cannot open the created file3.
I don't understand why. I tried to change the extension of the third file, but both *.xlsx and *.xlsm show this problem. I also tried to delete the line
sheet1["A1"].value = sheet2["A1"].value
to understand if the problem was linked to the writing of the sheet, but the problem remains.
First of all, please note that your code is not creating any new file but just re-saving an existing one.
Also, it is not clear what you want: do you want to create file3? With what information? Your code is not doing any of that.
However, I tried to run a short version of your code and I got the error:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not
support .xlsm file format, please check you can open it with Excel
first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
Most likely your file format is unsupported. Try to re-save your files in the xlsx format. I think the problem is the macros: if you don't have any of them in your files then changing the format should not be an issue. If you do, I am not sure openpyxl will work that way (without some workaround at least).
This answer might help. It proposes to extract the xlsm files (they are zip files), work on the parts that represent the content of your sheet (not the macros) and then put everything together again.
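As a quick illustration of that idea (the file name is just a placeholder), listing the members of an .xlsm shows the sheet XML next to the vbaProject.bin that holds the macros:
import zipfile

# an .xlsm file is an ordinary zip archive; the macros live in xl/vbaProject.bin
with zipfile.ZipFile("file1.xlsm") as archive:
    for name in archive.namelist():
        print(name)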
One error might be that the backslashes in the file path variables need to be escaped (sequences like \f are otherwise interpreted as escape characters).
Thus, the correct version would be:
file1 = "C:\\Users\\Desktop\\file1.xlsm"
file2 = "C:\\Users\\Desktop\\file2.xlsm"
file3 = "C:\\Users\\Desktop\\file3.xls"
I have downloaded about 100 csv files from the web using Python. Each file is for a month in a year, so effectively I am downloading time series data.
Now what I want is to put all of these csv files into one csv file in time order; I'm not sure how to append them one after the other.
Also, I should note that except for the first file, I want to remove the header every time I add a new csv file.
This will make sense when you see my data.
Appreciate any help, thanks
Sort your CSV files by time (presumably this can be done with an alphanumeric sort of the filenames) and then just concatenate all of them together. This is probably easier to do in bash than in Python, but here's a Python solution (untested):
from glob import glob

# Fetch a sorted list of all .csv files
files = sorted(glob('*.csv'))

# Open output file for writing
with open('cat.csv', 'w') as fi_out:
    # iterate over all csv files
    for i, fname_in in enumerate(files):
        # open each csv file
        with open(fname_in, 'r') as fi_in:
            # iterate through all lines in the csv file
            for i_line, line in enumerate(fi_in):
                # Write all lines of the first file (i == 0)
                # For all other files write all lines except the first one (i_line > 0)
                if i_line > 0 or i == 0:
                    fi_out.write(line)
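If the files are small enough to fit in memory, a pandas-based variant (an alternative sketch, not part of the answer above; it assumes every file has the same header row) handles the header bookkeeping for you:
import pandas as pd
from glob import glob

files = sorted(glob('*.csv'))
# read_csv consumes each file's header, concat stacks the rows,
# and to_csv writes a single header exactly once
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
combined.to_csv('cat.csv', index=False)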
I'm working on upgrading a legacy system and have come across a table full of .pdf files saved as binary data. I have dumped the table into a csv file and am trying to write a script which will take each row and recreate the files that were uploaded in the first place so that I can upload the files to S3.
I have tried this:
new_file = open(file_name, "wb")
doc = doc.encode('utf-8')
new_file.write(doc)
new_file.close()
where file_name is the saved file name and doc is the binary data stored as a string in the database.
But all it gives me is a bunk PDF file with the binary data in it.
Here is what the data looks like stored; it's just the first bit, as it's way too big to copy and paste:
0x255044462D312E340A25E2E3CFD30D0A312030206F626A0A3C3C200A2F43726561746F72202843616E6F6E2069522D4144562043353034352020504446290A2F4372656174696F6E446174652028443A32303133303432393133303830342D303527303027290A2F50726F647563657220285C3337365C3337375C303030415C303030645C3030306F5C303030625C303030655C303030205C303030505C303030445C303030465C303030205C303030535C303030635C3030305C0A615C3030306E5C303030205C3030304C5C303030695C303030625C303030725C303030615C303030725C303030795C303030205C303030315C3030302E5C303030305C303030655C3030305C0A205C303030665C3030306F5C303030725C303030205C303030435C303030615C3030306E5C3030306F5C3030306E5C303030205C303030695C3030306D5C303030615C303030675C3030305C0A655C303030525C303030555C3030304E5C3030304E5C303030455C303030525C3030305C303030290A3E3E200A656E646F626A0A322030206F626A0A3C3C200A2F5061676573203320302052200A2F54797065202F436174616C6F67200A2F4F7574707574496E74656E747320313120302052200A2F4D6574616461746120313220302052200A3E3E200A656E646F626A0A342030206F626A0A3C3C202F54797065202F
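For what it's worth, that dump is a hex-encoded string (0x255044462D312E34 is the ASCII for %PDF-1.4), so a minimal sketch, assuming every row stores the PDF bytes in that 0x... form, would decode the hex instead of encoding the text:
# doc is assumed to be the hex string pulled from the database, e.g. "0x255044..."
hex_string = doc[2:] if doc.startswith("0x") else doc
pdf_bytes = bytes.fromhex(hex_string)

with open(file_name, "wb") as new_file:
    new_file.write(pdf_bytes)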
I have more than 40 txt files that need to be loaded into a table in MySQL. Each file contains 3 columns of data; each column lists one specific type of data, and the format of each txt file is exactly the same, but the file names vary. First I tried LOAD DATA LOCAL INFILE 'path/*.txt' INTO TABLE xxx
because I thought using *.txt might let MySQL load all the txt files in this folder, but it turned out not to work.
So how can I let MySQL or Python do this? Or do I need to merge them into one file manually first and then use the LOAD DATA LOCAL INFILE command?
Many thanks!
If you want to avoid merging your text files, you can easily "scan" the folder and run the SQL import query for each file:
import os
for dirpath, dirsInDirpath, filesInDirPath in os.walk("yourFolderContainingTxtFiles"):
    for myfile in filesInDirPath:
        sqlQuery = "LOAD DATA INFILE '%s' INTO TABLE xxxx (col1,col2,...);" % os.path.join(dirpath, myfile)
        # execute the query here using your mysql connector.
        # I used string formatting to build the query, but you should use the safe placeholders
        # provided by the mysql api instead of %s, to protect against SQL injections
The only (and best) way is to merge your data into one file. That's fairly easy using Python:
fout = open("out.txt", "a")
# first file:
for line in open("file1.txt"):
    fout.write(line)
# now the rest (file2.txt up to and including fileNB_FILES.txt):
for num in range(2, NB_FILES + 1):
    f = open("file" + str(num) + ".txt")
    for line in f:
        fout.write(line)
    f.close()  # not really needed
fout.close()
Then run the command you know (... INFILE ...) to load the one file into MySQL. It works fine as long as the separator between columns is strictly the same in every file. Tabs are best in my opinion ;)
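As an illustration of that last step, here is a minimal sketch of running the LOAD DATA LOCAL INFILE command from Python; the PyMySQL driver, the connection details and the column names (col1, col2, col3) are all assumptions, so adapt them to your setup:
import pymysql

# local_infile must be enabled on both the client and the MySQL server
conn = pymysql.connect(host="localhost", user="user", password="password",
                       database="mydb", local_infile=True)
try:
    with conn.cursor() as cur:
        # tab-separated columns, one row per line, as suggested above
        cur.execute(
            "LOAD DATA LOCAL INFILE 'out.txt' INTO TABLE xxx "
            "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
            "(col1, col2, col3)"
        )
    conn.commit()
finally:
    conn.close()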