Python - Remove special character from CSV file imported into PostgreSQL

I would like to import my csv file into Postgresql using Python.
The import works well. However, when I display the imported data, I find a special symbol on the first line and first column.
I tried to solve the problem by specifying the encoding in my Python code, but it made no difference.
Here is my code:
import sys
import os
import csv
import io
f = io.open(r'C:\\list.csv', mode='r', encoding='utf-8')
curs.copy_from(f, 'list', sep=';')
conn.commit()
Here is the symbol or special character:

Thank you

You are picking up the byte order mark (BOM).
To have the io module expect and strip off the BOM, try changing your encoding to utf-8-sig:
f = io.open(r'C:\\list.csv', mode='r', encoding='utf-8-sig')
More info here.
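To see the difference between the two encodings, here is a minimal sketch. It assumes a small sample file (the path and contents are made up for illustration) that starts with a UTF-8 BOM, and reads its first line both ways:

```python
import codecs
import os
import tempfile

# Build a small sample file that starts with a UTF-8 byte order mark,
# the way some Windows tools save CSV files (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "list.csv")
with open(path, "wb") as raw:
    raw.write(codecs.BOM_UTF8 + "1;alpha\n2;beta\n".encode("utf-8"))

# Plain utf-8 keeps the BOM as the character U+FEFF at the start of the text.
with open(path, mode="r", encoding="utf-8") as f:
    first_plain = f.readline()

# utf-8-sig strips the BOM if it is present.
with open(path, mode="r", encoding="utf-8-sig") as f:
    first_sig = f.readline()

print(repr(first_plain))  # '\ufeff1;alpha\n'
print(repr(first_sig))    # '1;alpha\n'
```

utf-8-sig also reads files that have no BOM, so it should be safe to use whether or not the exporting tool writes one.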

Related

Running Json file in VScode using Python

I am very new to Python. I would like to read JSON files in Python, but I cannot work out what the problem is. Please see the image.
You are trying to read the file, so open it in read mode, "r" (this is also the default for open(), but stating it makes the intent explicit). Your code should be:
import json
with open(r'path/to/read/', 'r') as file:
    data = json.load(file)
Your code should run now.
Your path should not contain spaces. Please modify the file path.
Generally speaking, it is best to keep the file path in plain English, with no spaces and no special characters.
import json
def JsonRead(path):
    # Read and parse a UTF-8 encoded JSON file
    with open(path, encoding='utf-8') as f:
        data = json.load(f)
    return data
# filePath is the path to your JSON file
new_Data = JsonRead(filePath)
Then import JsonRead into your project.

Converting DBF file to CSV using Python 3+ getting errors

So I'm new to Python and my goal is to convert the different large dbf files into csv files. I have looked at different code and don't understand a lot of the parts. The below code runs for data1.dbf but not data2.dbf. I get an error stating:
UnicodeDecodeError: 'ascii' codec can't decode byte...
I did look into encoding for dbfread, but it says that encoding is not needed. The other part I need is to get these large dbf files into csv. With dbfread, I don't know how to write the records to a csv file.
import sys
import csv
from dbfread import DBF
file = "C:/Users/.../Documents/.../data2.dbf"
table = DBF(file)
writer = csv.writer(sys.stdout)
writer.writerow(table.field_names)
for record in table:
    writer.writerow(list(record.values()))
Here is another try using dbf library.
import sys
import dbf # instead of dbfpy
for table in sys.argv[1:]:
    dbf.export(table, header = True)
This one is run from the command prompt with the statement "python dbf2csv_EF.py data1.dbf" and produces a different error for both of my dbf files. The error is:
...AttributeError: 'str' object has no attribute '_meta'
Since you're on Windows and are attempting to write to sys.stdout, I think (part of) your first problem is that the Windows console is not very Unicode savvy, and you should write to files instead.
Assuming that's the case, something like
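import sys
import csv
from dbfread import DBF
for file in sys.argv[1:]:
    csv_file = file + ".csv"
    # encoding must match the code page the DBF was written with; adjust it if UTF-8 is wrong
    table = DBF(file, encoding="UTF-8")
    # newline="" lets the csv module manage line endings; the CSV is written as UTF-8
    with open(csv_file, "w", newline="", encoding="utf-8") as outf:
        writer = csv.writer(outf)
        writer.writerow(table.field_names)
        for record in table:
            writer.writerow(list(record.values()))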
might do the trick - running the script with e.g. "python thatscript.py foo.dbf" should have you end up with "foo.dbf.csv".
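The writing half of that script can be tried without any dbf file. The sketch below uses only the standard library; write_records_to_csv is a hypothetical helper name, and it assumes the records are dict-like, as in the loop above where record.values() is called:

```python
import csv

def write_records_to_csv(field_names, records, csv_path, encoding="utf-8"):
    """Write dict-like records to csv_path, one row per record; return the row count."""
    count = 0
    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with open(csv_path, "w", newline="", encoding=encoding) as outf:
        writer = csv.writer(outf)
        writer.writerow(field_names)
        for record in records:
            # Pick values by field name so the column order always matches the header
            writer.writerow([record[name] for name in field_names])
            count += 1
    return count
```

With dbfread, this would presumably be called as write_records_to_csv(table.field_names, table, csv_file). Because the records are consumed one at a time, a large table does not have to be held in memory all at once.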

Import Data from scraping into CSV

I'm using pycharm and Python 3.7.
I would like to write data to a CSV file, but my code writes only the first line of my data to the file. Does anyone know why?
This is my code:
from pytrends.request import TrendReq
import csv
pytrend = TrendReq()
pytrend.build_payload(kw_list=['auto model A',
'auto model C'])
# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
print(interest_over_time_df.head(100))
writer=csv.writer(open("C:\\Users\\Desktop\\Data\\c.csv", 'w', encoding='utf-8'))
writer.writerow(interest_over_time_df)
Try using pandas. interest_over_time_df is already a DataFrame, so it can be written out directly:
import pandas as pd
interest_over_time_df.to_csv("file.csv")
I once ran into the same problem and solved it like this:
with open("file.csv", "rb") as fh:
Details:
r = read mode
b = binary mode: the file contents stay as bytes and no decoding is attempted. Binary mode takes no encoding argument, so none is passed.
In text mode, Python decodes the file's bytes (which it assumes to be a UTF-8-encoded string) into a Unicode string (str) according to UTF-8 rules. When it does so here, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely the 0xff at position 0).
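A byte 0xff at position 0 is often the start of a UTF-16 byte order mark. The sketch below assumes that case (the sample bytes are made up) and shows the failed UTF-8 decode next to a decode with the matching codec:

```python
import codecs

# Hypothetical sample: text saved as UTF-16 with a little-endian byte order mark,
# so the data starts with the bytes 0xff 0xfe.
raw = codecs.BOM_UTF16_LE + "date,value\n".encode("utf-16-le")

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # 0xff can never start a UTF-8 sequence, so decoding fails at position 0
    error_message = str(exc)

# The "utf-16" codec reads the BOM to pick the byte order, then strips it
text = raw.decode("utf-16")

print(raw[:2])
print(error_message)
print(repr(text))
```

If the file really is UTF-16, opening it in text mode with encoding="utf-16" keeps the content as str, whereas binary mode only postpones the decoding step.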
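You could try something like:
import csv
# Python 3: open in text mode with newline="" (csv.writer cannot write str to a "wb" file)
with open("<path to output_csv>", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    # Header: the index name followed by the column names
    writer.writerow([interest_over_time_df.index.name, *interest_over_time_df.columns])
    # Iterating a DataFrame directly yields column names, so iterate its rows instead
    for row in interest_over_time_df.itertuples():
        writer.writerow(row)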
Read more here: https://www.pythonforbeginners.com/files/with-statement-in-python
You need to loop over the data and write it line by line.
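A likely reason only one line appears is that writerow() writes a single row from whatever iterable it receives, and iterating a DataFrame yields only its column names. The sketch below reproduces that with a plain dict standing in for the DataFrame (the data is made up), since iterating a dict likewise yields only its keys:

```python
import csv
import io

# Stand-in for the DataFrame: column name -> list of values (hypothetical data).
table = {"auto model A": [10, 20, 30], "auto model C": [5, 15, 25]}

# Iterating the mapping yields only its keys, so writerow() emits one line of column names.
only_header = io.StringIO()
csv.writer(only_header).writerow(table)

# Writing the header once and then every row gives the full table.
full = io.StringIO()
writer = csv.writer(full)
writer.writerow(table.keys())
writer.writerows(zip(*table.values()))

print(only_header.getvalue())
print(full.getvalue())
```

The first buffer ends up with the header line only, while the second contains the header plus one line per row.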

Decoding UTF8 literals in a CSV file

Question:
Does anyone know how I could transform this b"it\\xe2\\x80\\x99s time to eat" into this it's time to eat
More details & my code:
Hello everyone,
I'm currently working with a CSV file which is full of rows with UTF8 literals in them, for example:
b"it\xe2\x80\x99s time to eat"
The end goal is to get something like this:
it's time to eat
To achieve this I have tried using the following code:
import pandas as pd
file_open = pd.read_csv("/Users/Downloads/tweets.csv")
file_open["text"]=file_open["text"].str.replace("b\'", "")
file_open["text"]=file_open["text"].str.encode('ascii').astype(str)
file_open["text"]=file_open["text"].str.replace("b\"", "")[:-1]
print(file_open["text"])
After running the code the row that I took as an example is printed out as:
it\xe2\x80\x99s time to eat
I have tried solving this issue using the following code to open the CSV file:
file_open = pd.read_csv("/Users/Downloads/tweets.csv", encoding = "utf-8")
which printed out the example row in the following manner:
it\xe2\x80\x99s time to eat
and I have also tried decoding the rows using this:
file_open["text"]=file_open["text"].str.decode('utf-8')
Which gave me the following error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Thank you very much in advance for your help.
b"it\\xe2\\x80\\x99s time to eat" sounds like your file contains an escaped encoding.
In general, you can convert this to a proper Python3 string with something like:
x = b"it\\xe2\\x80\\x99s time to eat"
x = x.decode('unicode-escape').encode('latin1').decode('utf8')
print(x) # it’s time to eat
(Use of .encode('latin1') explained here)
So, if after you use pd.read_csv(..., encoding="utf8") you still have escaped strings, you can do something like:
pd.read_csv(..., encoding="unicode-escape")
# ...
# Now, your values will be strings but improperly decoded:
# itâs time to eat
#
# So we encode to bytes then decode properly:
val = val.encode('latin1').decode('utf8')
print(val) # it’s time to eat
But I think it's probably better to do this to the whole file instead of to each value individually, for example with StringIO (if the file isn't too big):
from io import StringIO
import pandas as pd
# Read the csv file into a StringIO object
sio = StringIO()
with open('yourfile.csv', 'r', encoding='unicode-escape') as f:
    for line in f:
        line = line.encode('latin1').decode('utf8')
        sio.write(line)
sio.seek(0) # Reset file pointer to the beginning
# Call read_csv, passing the StringIO object
df = pd.read_csv(sio, encoding="utf8")
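If each cell holds the text of a complete Python bytes literal (b"..." including the quotes), as in the question's example, another option is to parse the literal and decode the result. This is only a sketch of that alternative, not the approach described above, and decode_bytes_literal is a hypothetical helper name:

```python
import ast

def decode_bytes_literal(cell: str) -> str:
    """Turn the text of a bytes literal, e.g. 'b"it\\xe2\\x80\\x99s"', into a str."""
    try:
        # literal_eval only parses Python literals; it never executes code
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return cell  # not a Python literal, leave the cell untouched
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return cell  # some other literal (a number, say), leave it untouched

sample = 'b"it\\xe2\\x80\\x99s time to eat"'
print(decode_bytes_literal(sample))  # it’s time to eat
```

With pandas, this could be applied per cell with something like file_open["text"].map(decode_bytes_literal), assuming the column is named "text" as in the question.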

Strange character while reading a CSV file

I am trying to read a CSV file in Python, but the first element in the first row is read with a strange character in front of the 0, although that character isn't in the file; it's just a simple 0. Here is the code I used:
import csv
matriceDist=[]
file=csv.reader(open("distanceComm.csv","r"),delimiter=";")
for row in file:
    matriceDist.append(row)
print (matriceDist)
I had this same issue. Save your Excel file as CSV (MS-DOS) rather than CSV UTF-8 and those odd characters should be gone.
Opening the file with a BOM-aware encoding, as follows, solved my issue:
open('inputfilename.csv', 'r', encoding='utf-8-sig')
Using pandas with an explicit encoding (utf-8, for example) is easier:
import pandas as pd
df = pd.read_csv('distanceComm.csv', header=None, encoding = 'utf8', delimiter=';')
print(df)
I don't know what your input file is. But since it has a byte order mark for UTF-8, you can use something like this:
import codecs
import csv
matriceDist=[]
# utf-8-sig strips the BOM; plain utf-8 would leave it in the first field
file=csv.reader(codecs.open('distanceComm.csv', encoding='utf-8-sig'),delimiter=";")
for row in file:
    matriceDist.append(row)
print (matriceDist)
