Error reading csv numbers like "-1 000,00" - python

I get errror when i try to read numbers like "-1 000,00".
Result is '-1\xa0000,00'.
How to fix my code to clear the error?
def read_csv(filename):
list = []
with open(filename, 'r', encoding='utf-8') as local_file:
fields = ['Account_group_name',
'Current_balance',
'Account_name',
'Transfer_account_name',
'Description',
'Partner_name',
'Category',
'Date',
'Time',
'Memo',
'Sum',
'Currency',
'Face_balance',
'Balance',
]
reader = csv.DictReader(local_file, fields, delimiter=';')
next(reader)
for row in reader:
list.append(row)
return list

Related

Python CSV Writes ;;;;;; to every line

I stored data on a CSV file with Python. Now I need to read it with Python but there are some issues with it. There is a
";;;;;;"
statement on the finish of every line.
Here is the code that I used for writing data to CSV :
file = open("products.csv", "a")
writer = csv.writer(file, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(data)
And I am trying to read that with that code :
with open("products.csv", "r", newline="") as in_file, open("3.csv", "w", newline='') as to_file:
reader = csv.reader(in_file, delimiter="," ,doublequote=True)
for row in reader:
print(row)
Of course, I am not reading it for just printing I need to remove duplicated lines and make it a readable CSV.
I've tried this to fetch strings and edit them and it's worked for other fields except for semicolons. I cant understand why I cant edit those semicolons.
for row in reader:
try:
print(row)
rowList = row[0].split(",")
for index, field in enumerate(rowList):
if '"' in field:
field = field.replace('"', "")
elif ";;;;;;" in rowList[index]:
field = field.replace(";;;;;;", "")
rowList[index] = field
print(rowList)
Here is the output of the code above :
['Product Name', 'Product Description', 'SKU', 'Regular Price', 'Sale Price', 'Images;;;;;;']
Can anybody help me?
I realized that I used 'elif' on there. I changed it and it solved. Thanks for the help. But I still don't know why it added that semicolon to there.

Python 2 replacement specific column from CSV

I have some CSV files the format is ID, timestamp, customerID, email, etc. I want fill the Email column to empty and other columns keep it same. I'm using Python 2.7 and is restricted to use Pandas. Can anyone help me?
Thank you all for the help
My code below, but this is not that efficiency and reliable also if some raw have the strange character it will be broken the logic.
new_columns = [
'\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
'D2 Technical and quality(Q13)',
'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
'D3 Series and films content(Q17)',
'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
'R2 Service and support(Q36)',
'Promoter open text(Q37)'
]
with open(file_path, 'r') as infile:
print file_path
reader = csv.reader(infile, delimiter=";")
first_row = next(reader)
for row in reader:
output_row = []
for column_name in new_columns:
ind = first_row.index(column_name)
data = row[ind]
if ind == first_row.index('Email'):
data = ''
output_row.append(data)
writer.writerow(output_row)
File format before
File format after
So you are reordering the columns and clearing the email column:
with open(file_path, 'r') as infile:
print file_path
reader = csv.reader(infile, delimiter=";")
first_row = next(reader)
for row in reader:
output_row = []
for column_name in new_columns:
ind = first_row.index(column_name)
data = row[ind]
if ind == first_row.index('Email'):
data = ''
output_row.append(data)
writer.writerow(output_row)
I would suggest moving the searches first_row.index(column_name) and first_row.index('Email') out of the per row processing.
with open(file_path, 'r') as infile:
print file_path
reader = csv.reader(infile, delimiter=";")
first_row = next(reader)
email = first_row.index('Email')
indexes = []
for column_name in new_columns:
ind = first_row.index(column_name)
indexes.append(ind)
for row in reader:
output_row = []
for ind in indexes:
data = row[ind]
if ind == email:
data = ''
output_row.append(data)
writer.writerow(output_row)
email is the index of the email column in the input. indexes is a list of the indexes of the columns in the input in the order specified by the new_columns.
Untested.
You could use dict versions of the csv reader/writer to get the column by name. Something like this:
import csv
with open('./test.csv', 'r') as infile:
reader = csv.DictReader(infile, delimiter=";")
with open('./output.csv', 'w') as outfile:
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
row['Email'] = ''
writer.writerow(row)

Python csv file writing in different columns

Im trying to read a csv file, and create a new cvs file, with the contents of the old cvs file with Python. My Problem is, that all entrys are saved in the first column, and i cant find a way to save the informations in different columns. Here is my code:
import csv
from itertools import zip_longest
fieldnamesOrdered = ['First Name', 'Last Name' , 'Email', 'Phone Number',
'Street Address', 'City', 'State', 'HubSpot Owner', 'Lifecyle Stage', 'Lead
Status', 'Favorite Color']
listOne = []
listTwo = []
with open('Contac.csv', 'r', encoding = 'utf-8') as inputFile,
open('result.csv', 'w', encoding = 'utf-8') as outputFile:
reader = csv.DictReader(inputFile)
writer = csv.writer(outputFile, delimiter = 't')
for row in reader:
listOne.append(row['First Name'])
listTwo.append(row['Last Name'])
dataLists = [listOne, listTwo]
export_data = zip_longest(*dataLists, fillvalue='')
writer.writerow(fieldnamesOrdered)
writer.writerows(export_data)
inputFile.close()
outputFile.close()
Thank you very much for your answers
writer = csv.writer(outputFile, delimiter = 't')
Aren't those entries in the first column additionally interspersed with strange unsolicited 't' characters?

looping through variable using os.path.join

I am trying to process XML files one-by-one in the directory. Basically reading the values and then populate CSV file. I am having trouble parsing each XML one by one. Issue with my code is csvWriter.writerow only write the values from the last XML file in the directory. Even I have a loop for all items in root of ElementTree.parse(path). I want it to write each line for each XML file in the directory.
from lxml import etree as ElementTree
import csv
import os
import errno
import shutil
def writeData(item):
csvFile = open('D:\\metadata.csv', 'w', newline='')
csvWriter = csv.writer(csvFile, delimiter='|',
lineterminator='\n')
csvWriter.writerow([
'type',
'object',
'title',
'subject',
'domain',
'name',
'_name',
'version_label',
'creator_name',
'creation_date',
'modifier',
'modify_date',
'content_type',
'chronicle_id',
'antecedent_id',
'activity_date',
'search_from_date',
'number',
'service_code',
'initial_inspection_date',
'search_to_date',
'File Name',
'Location',
])
csvWriter.writerow([
root[0][0].text,
root[0][1].text,
root[0][2].text,
root[0][3].text,
root[0][4].text,
root[0][5].text,
root[0][6].text,
root[0][7].text,
root[0][8].text,
root[0][9].text,
root[0][10].text,
root[0][11].text,
root[0][12].text,
root[0][13].text,
root[0][14].text,
root[0][15].text,
root[0][16].text,
root[0][17].text,
root[0][18].text,
root[0][19].text,
root[0][20].text,
root[2].text,
root[1].text,
])
csvFile.close()
for file in os.listdir('D:\\temp\\Export\\test'):
if file.endswith('.xml'):
path = os.path.join('D:\\temp\\Export\\test', file)
tree = ElementTree.parse(path)
#print(tree)
root = tree.getroot()
#print(root)
for item in root:
print(item)
writeData(item)
The reason you only see the data of the last xml file, is that you keep overwriting the data in the final .csv file. Instead of reopening the .csv file for every write iteration, try opening it just once, and passing it to your writeData function like this:
def writeData(csv_writer, item):
csv_writer.writerow([
'type',
'object',
'title',
'subject',
'domain',
'name',
'_name',
'version_label',
'creator_name',
'creation_date',
'modifier',
'modify_date',
'content_type',
'chronicle_id',
'antecedent_id',
'activity_date',
'search_from_date',
'number',
'service_code',
'initial_inspection_date',
'search_to_date',
'File Name',
'Location',
])
csv_writer.writerow([
root[0][0].text,
root[0][1].text,
root[0][2].text,
root[0][3].text,
root[0][4].text,
root[0][5].text,
root[0][6].text,
root[0][7].text,
root[0][8].text,
root[0][9].text,
root[0][10].text,
root[0][11].text,
root[0][12].text,
root[0][13].text,
root[0][14].text,
root[0][15].text,
root[0][16].text,
root[0][17].text,
root[0][18].text,
root[0][19].text,
root[0][20].text,
root[2].text,
root[1].text,
])
with open('D:\\metadata.csv', 'w', newline='') as csv_file:
csv_writer = csv.writer(csv_file, delimiter='|', lineterminator='\n')
for file in os.listdir('D:\\temp\\Export\\test'):
if file.endswith('.xml'):
path = os.path.join('D:\\temp\\Export\\test', file)
tree = ElementTree.parse(path)
root = tree.getroot()
writeData(csv_writer, root)

Python OrderedDict to CSV: Eliminating Blank Lines

When I run this code...
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")
import csv
with open('c:\test.csv', 'w') as csvfile:
fieldnames = ['contact_name__c', 'recordtypeid']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for con in cons['records']:
writer.writerow({'contact_name__c': con['Id'], 'recordtypeid': '082I8294817IWfiIWX'})
print('done')
I get the following output inside my CSV file...
contact_name__c,recordtypeid
xyzzyID1xyzzy,082I8294817IWfiIWX
abccbID2abccb,082I8294817IWfiIWX
I'm not sure why those extra lines are there.
Any tips for getting rid of them so my CSV file will be normal-looking?
I'm on Python 3.4.3 according to sys.version_info.
Here are a few more code-and-output pairs, to show the kind of data I'm working with:
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
print(sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2"))
produces
OrderedDict([('totalSize', 2), ('done', True), ('records', [OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/xyzzyID1xyzzy')])), ('Id', 'xyzzyID1xyzzy'), ('Name', 'Person One')]), OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/abccbID2abccb')])), ('Id', 'abccbID2abccb'), ('Name', 'Person Two')])])])
and
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")
for con in cons['records']:
print(con['Id'])
produces
xyzzyID1xyzzy
abccbID2abccb
Two likely possibilities: the output file needs to be opened in binary mode and/or the writer needs to be told not to use DOS style line endings.
To open the file in binary mode in Python 3 replace your current with open line with:
with open('c:\test.csv', 'w', newline='') as csvfile:
to eliminate the DOS style line endings try:
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator="\n")

Categories

Resources