Converting DBF file to CSV using Python 3+ getting errors - python

So I'm new to Python and my goal is to convert several different large .dbf files into .csv files. I have looked at different code samples and don't understand a lot of the parts. The code below runs for data1.dbf but not for data2.dbf. I get an error stating:
UnicodeDecodeError: 'ascii' codec can't decode byte...
I did look into encoding for dbfread, but it says that an encoding is not needed... The other part I need is to get these large .dbf files into .csv. If I use dbfread, I'm not knowledgeable enough about the code to write the records into the .csv file.
import sys
import csv
from dbfread import DBF

file = "C:/Users/.../Documents/.../data2.dbf"
table = DBF(file)
writer = csv.writer(sys.stdout)
writer.writerow(table.field_names)
for record in table:
    writer.writerow(list(record.values()))
Here is another try, using the dbf library instead.
import sys
import dbf  # instead of dbfpy

for table in sys.argv[1:]:
    dbf.export(table, header=True)
This one is run from the command prompt with "python dbf2csv_EF.py data1.dbf" and produces a different error with both of my dbf files. The error is:
...AttributeError: 'str' object has no attribute '_meta'

Since you're on Windows and are attempting to write to sys.stdout, I think (part of) your first problem is that the Windows console is not very Unicode savvy, and you should write to files instead.
Assuming that's the case, something like
import sys
import csv
from dbfread import DBF

for file in sys.argv[1:]:
    csv_file = file + ".csv"
    table = DBF(file, encoding="UTF-8")
    # newline="" stops the csv module from writing blank rows on Windows,
    # and an explicit encoding avoids falling back to the console code page
    with open(csv_file, "w", newline="", encoding="utf-8") as outf:
        writer = csv.writer(outf)
        writer.writerow(table.field_names)
        for record in table:
            writer.writerow(list(record.values()))
might do the trick - running the script with e.g. "python thatscript.py foo.dbf" should have you end up with "foo.dbf.csv".
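As for the second attempt with the dbf library: the AttributeError comes from handing dbf.export a plain path string rather than an opened table. A minimal sketch of what that could look like, assuming the dbf package's Table/open/export interface (the exact open() mode argument differs between dbf versions, so treat this as an outline rather than tested code):
import sys
import dbf

for filename in sys.argv[1:]:
    table = dbf.Table(filename)     # a Table object, not a plain path string
    table.open(dbf.READ_ONLY)       # older dbf versions may accept open() with no argument
    dbf.export(table, header=True)  # should write a .csv next to the original file
    table.close()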

Related

Python - Pandas "BadGzipFile" Error When Reading in ".json.gz" File

I am trying to read in data from a ".json.gz" file as a dataframe. I keep getting an error indicating that it is a "BadGzipFile". However, when I unzip the file manually (i.e., just double clicking it in my finder), I am able to successfully open the json file. This leads me to believe that the file is fine, but when I run the below code in Python, I receive the "BadGzipFile" error.
I am very new to gzip files and have done a fair bit of research trying to figure out what the issue is. So far, I have been unsuccessful. Any help would be greatly appreciated!
Here is my code:
import os
import json
import gzip
import pandas as pd

file_path = '/data/data_0_0_0.json.gz'
with gzip.open(file_path, 'rb') as f:
    df = pd.read_json(f, compression='gzip', lines=True)
And here is the error I am receiving:
BadGzipFile: Not a gzipped file (b'{"')
What's happening with your code here:
with gzip.open(file_path, 'rb') as f:
    df = pd.read_json(f, compression='gzip', lines=True)
is that you're opening a gzip file at file_path. Then you're telling pandas that the thing you opened (f) is itself another gzip file. It isn't; it's the decompressed JSON stream. When it says BadGzipFile with b'{"', it is telling you that the data it found starts with a JSON brace instead of the gzip magic number.
You should change it either to open the file with gzip and have pandas read the resulting stream directly, or to let pandas open the file itself.
The first would be:
with gzip.open(file_path, 'rb') as f:
    df = pd.read_json(f, lines=True)
The second is actually easier. Because pd.read_json will infer the compression format from the file name, and your file ends with .gz, you can just write:
df = pd.read_json(file_path, lines=True)

Python - Remove special character from csv file import into PostgreSQL

I would like to import my CSV file into PostgreSQL using Python.
The import works well. However, when I display the imported data, I find a special symbol in the first line, first column.
I tried to solve the problem by adding the encoding in my Python code, but it didn't help.
Here is my code:
import sys
import os
import csv
import io

# curs and conn come from an existing psycopg2 connection (not shown here)
f = io.open(r'C:\\list.csv', mode='r', encoding='utf-8')
curs.copy_from(f, 'list', sep=';')
conn.commit()
Here is the symbol or special character:

Thank you
You are picking up the Byte order mark.
In order to have the io module expect and strip off the BOM, try changing your encoding to utf-8-sig:
f = io.open(r'C:\\list.csv', mode='r', encoding='utf-8-sig')
More info here.
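To see what utf-8-sig actually changes, here is a minimal check, reusing the same path as above, that prints the first character each encoding yields:
import io

with io.open(r'C:\\list.csv', mode='r', encoding='utf-8') as f:
    print(repr(f.read(1)))   # '\ufeff' -- the BOM is returned as the first character

with io.open(r'C:\\list.csv', mode='r', encoding='utf-8-sig') as f:
    print(repr(f.read(1)))   # first real character of the data; the BOM is stripped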

How to resolve an encoding issue?

I need to read the content of a csv file using Python. However when I run this code:
with open(self.path, 'r') as csv_file:
    csv_reader = csv.reader(csv_file, dialect=csv.excel, delimiter=';')
    self.data = [[cell for cell in row] for row in csv_reader]
I get this error:
File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1137: character maps to <undefined>
My understanding is that this file was not encoded in cp1252, and that I need to find out what encoding was used. I tried a bunch of things, but nothing has worked so far.
About the file:
It is sent by an external company; I can't get more information about it.
It comes with other similar files, with which I don't have any issue when I run the same code.
It has an .xls extension, but it is really a CSV file delimited with semicolons.
When I open it with Excel it opens in Compatibility Mode, but I don't see any sort of encoding issue: everything displays correctly.
What I already tried:
Saving it under a different file format to get rid of the compatibility mode
Adding an encoding in the first line of my code (I tried, more or less randomly, some encodings that I know of):
with open(self.path, 'r', encoding='utf8') as csv_file:
Copy-pasting the content of the file into a new file, or deleting the whole content of the file. Still does not work. This one really bugs me, because I feel like it means the problem is not in the content of the file, but in the file itself.
Searching a lot, everywhere, for how to solve this kind of issue.
I recommend using the pandas library (as well as numpy); it is very handy when it comes to data manipulation. This function imports the data from an xlsx or csv file.
/!\ change dataPath according to your needs /!\
import os
import pandas as pd

def GetData(directory, dataUse, format):
    dataPath = os.getcwd() + "\\Data\\" + directory + "\\" + dataUse + "Set." + format
    if format == "xlsx":
        dataSet = pd.read_excel(dataPath, sheet_name='Sheet1')
    elif format == "csv":
        dataSet = pd.read_csv(dataPath)
    return dataSet
I finally found some sort of solution:
Open the file with Excel
Display the file properly using the "Text to Columns" feature
Save the file to csv format
Run the code
This does not quite satisfy me, but it works.
I still don't understand what the problem actually is, or why this solved it, so I am interested in any additional information!
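Another route, if round-tripping through Excel is not an option, would be to guess the encoding first with the third-party chardet package and then open the file with whatever it reports. This is only a sketch (chardet was not part of the original code), and the path and sample size are placeholders:
import csv
import chardet

path = 'the_file.xls'                          # placeholder for self.path
with open(path, 'rb') as raw:
    guess = chardet.detect(raw.read(100000))   # sample the first ~100 KB
print(guess)                                   # e.g. {'encoding': 'cp1252', 'confidence': 0.7, ...}

with open(path, 'r', encoding=guess['encoding'], newline='') as csv_file:
    csv_reader = csv.reader(csv_file, dialect=csv.excel, delimiter=';')
    data = [list(row) for row in csv_reader]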

unicode issue in python when writing to file

I have a CSV sheet that I read like this:
import csv

with open(csvFilePath, 'rU') as csvFile:
    reader = csv.reader(csvFile, delimiter='|')
    numberOfMovies = 0
    for row in reader:
        title = row[1:2][0]
As you see, I am taking the value of title.
Then I search the internet for some info about that value and write it to a file. The writing is like this:
def writeRDFToFile(rdf, fileName):
    f = open("movies/" + fileName + '.ttl', 'a')
    try:
        #rdf = rdf.encode('UTF-8')
        f.write(rdf)  # python will convert \n to os.linesep
    except:
        print "exception happened for movie " + movieTitle
    f.close()
In that function, I am writing the rdf variable to a file.
As you see, there is a commented line.
If the value of the rdf variable contains Unicode characters and that line is not commented out, the code doesn't write anything to the file.
However, if I comment that line out, the code does write to the file.
You could say: keep that line commented out and everything will be fine, but that is not correct, because I have another Java process (a Fuseki server) that reads the file, and if the file contains Unicode characters it throws an error.
So I need to fix the file myself; I need to encode that data to UTF-8.
Help please.
The normal csv library can have difficulty writing unicode to files. I suggest you use the unicodecsv library instead of the csv library. It supports writing unicode to CSVs.
Practically speaking, just write:
import unicodecsv as csv
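For the actual write in writeRDFToFile, another option is to skip the manual encode() and open the output as a UTF-8 text file; a minimal sketch, assuming rdf is a unicode string:
import io

def writeRDFToFile(rdf, fileName):
    # io.open with an explicit encoding accepts unicode text directly
    # (in both Python 2 and Python 3) and encodes it to UTF-8 on write
    with io.open("movies/" + fileName + '.ttl', 'a', encoding='utf-8') as f:
        f.write(rdf)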

Django upload and handle CSV file with right encoding

I am trying to upload and handle a CSV file in my Django project, but I get an encoding error. The CSV file was created on a Mac with Excel.
reader = csv.reader(request.FILES['file'].read().splitlines(), delimiter=";")
if withheader:
    reader.next()
data = [[field.decode('utf-8') for field in row] for row in reader]
With this code example I get an error: http://puu.sh/1VmXc
If I use latin-1 to decode, I get another "error":
data = [[field.decode('latin-1') for field in row] for row in reader]
The result is: v¾gmontere, but it should be: vægmontere.
Does anyone know what to do? I have tried a lot!
The Python 2 csv module comes with lots of Unicode hassle. Try unicodecsv instead, or use Python 3.
Excel on Mac exports to CSV with broken encoding. Don't use it; use something useful like LibreOffice instead (it has a much better CSV export, with options).
When handling user files: either make sure files are consistently encoded in UTF-8 and only decode as UTF-8 (recommended), or use an encoding-detection library like chardet.
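For completeness, a minimal Python 3 sketch of the same read, reusing the names from the question (request, withheader) and assuming the upload really is UTF-8; the per-field decode() is no longer needed:
import csv
import io

uploaded = request.FILES['file']
text = uploaded.read().decode('utf-8')             # decode the whole upload once
reader = csv.reader(io.StringIO(text), delimiter=';')
if withheader:
    next(reader)                                   # reader.next() became next(reader) in Python 3
data = [list(row) for row in reader]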
