I am trying to process XML files one by one in a directory: read the values from each file and populate a CSV file. The problem is that csvWriter.writerow only writes the values from the last XML file in the directory, even though I loop over all the files and parse each one with ElementTree.parse(path). I want one line written for each XML file in the directory.
from lxml import etree as ElementTree
import csv
import os
import errno
import shutil

def writeData(item):
    csvFile = open('D:\\metadata.csv', 'w', newline='')
    csvWriter = csv.writer(csvFile, delimiter='|',
                           lineterminator='\n')
    csvWriter.writerow([
        'type',
        'object',
        'title',
        'subject',
        'domain',
        'name',
        '_name',
        'version_label',
        'creator_name',
        'creation_date',
        'modifier',
        'modify_date',
        'content_type',
        'chronicle_id',
        'antecedent_id',
        'activity_date',
        'search_from_date',
        'number',
        'service_code',
        'initial_inspection_date',
        'search_to_date',
        'File Name',
        'Location',
    ])
    csvWriter.writerow([
        root[0][0].text,
        root[0][1].text,
        root[0][2].text,
        root[0][3].text,
        root[0][4].text,
        root[0][5].text,
        root[0][6].text,
        root[0][7].text,
        root[0][8].text,
        root[0][9].text,
        root[0][10].text,
        root[0][11].text,
        root[0][12].text,
        root[0][13].text,
        root[0][14].text,
        root[0][15].text,
        root[0][16].text,
        root[0][17].text,
        root[0][18].text,
        root[0][19].text,
        root[0][20].text,
        root[2].text,
        root[1].text,
    ])
    csvFile.close()

for file in os.listdir('D:\\temp\\Export\\test'):
    if file.endswith('.xml'):
        path = os.path.join('D:\\temp\\Export\\test', file)
        tree = ElementTree.parse(path)
        #print(tree)
        root = tree.getroot()
        #print(root)
        for item in root:
            print(item)
            writeData(item)
The reason you only see the data of the last XML file is that you keep overwriting the .csv file: opening it in 'w' mode truncates it on every call. Instead of reopening the file on every iteration, open it just once and pass the writer to your writeData function, like this:
def writeData(csv_writer, root):
    csv_writer.writerow([
        root[0][0].text,
        root[0][1].text,
        root[0][2].text,
        root[0][3].text,
        root[0][4].text,
        root[0][5].text,
        root[0][6].text,
        root[0][7].text,
        root[0][8].text,
        root[0][9].text,
        root[0][10].text,
        root[0][11].text,
        root[0][12].text,
        root[0][13].text,
        root[0][14].text,
        root[0][15].text,
        root[0][16].text,
        root[0][17].text,
        root[0][18].text,
        root[0][19].text,
        root[0][20].text,
        root[2].text,
        root[1].text,
    ])

with open('D:\\metadata.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='|', lineterminator='\n')
    # write the header row once, instead of once per file
    csv_writer.writerow([
        'type',
        'object',
        'title',
        'subject',
        'domain',
        'name',
        '_name',
        'version_label',
        'creator_name',
        'creation_date',
        'modifier',
        'modify_date',
        'content_type',
        'chronicle_id',
        'antecedent_id',
        'activity_date',
        'search_from_date',
        'number',
        'service_code',
        'initial_inspection_date',
        'search_to_date',
        'File Name',
        'Location',
    ])
    for file in os.listdir('D:\\temp\\Export\\test'):
        if file.endswith('.xml'):
            path = os.path.join('D:\\temp\\Export\\test', file)
            tree = ElementTree.parse(path)
            root = tree.getroot()
            writeData(csv_writer, root)
I am programming a Discord bot that lets users send an embed to a channel. The embed is split into multiple parts, which I want to save to a CSV file because I want to add features that require the data to be saved.
The problem is that when a user executes the command, the first line in the CSV gets overwritten with the new content of the command/embed.
I tried using a different mode to write to the next line. 'w' (write) is the mode I am experiencing the problem with; 'a' (append) almost works, but it also adds the field names every time.
The CSV Code:
with open('homework.csv', 'a', newline='') as file:
    fieldnames = ['user', 'fach', 'aufgabe', 'abgabedatum']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'user': str(message.author), 'fach': subject, 'aufgabe': task, 'abgabedatum': date})
The CSV output using mode 'a':
user,fach,aufgabe,abgabedatum
user,fach,aufgabe,abgabedatum
Akorian#0187,test,ja mddoiddn ,01.03.2021
user,fach,aufgabe,abgabedatum
Akorian#0187,testddd,ja mddoiddn ,01.03.2021
Try this:
import csv
import os

if os.path.isfile('homework.csv'):
    with open('homework.csv', 'a', newline='') as file:
        fieldnames = ['user', 'fach', 'aufgabe', 'abgabedatum']
        w = csv.DictWriter(file, fieldnames=fieldnames)
        w.writerow({'user': str(message.author), 'fach': subject, 'aufgabe': task, 'abgabedatum': date})
else:
    with open('homework.csv', 'w', newline='') as file:
        fieldnames = ['user', 'fach', 'aufgabe', 'abgabedatum']
        w = csv.DictWriter(file, fieldnames=fieldnames)
        w.writeheader()
        w.writerow({'user': str(message.author), 'fach': subject, 'aufgabe': task, 'abgabedatum': date})
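An alternative sketch (not from the answer above) that avoids duplicating the writer setup: always open in 'a' mode and write the header only when the file is still empty, which f.tell() can detect. append_homework is an illustrative helper name, and the row values here mirror the sample output rather than the bot's real message variables:

```python
import csv

def append_homework(path, row):
    """Append one row; write the header only if the file is new or empty."""
    fieldnames = ['user', 'fach', 'aufgabe', 'abgabedatum']
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:  # nothing in the file yet, so emit the header once
            writer.writeheader()
        writer.writerow(row)

append_homework('homework.csv', {'user': 'Akorian#0187', 'fach': 'test',
                                 'aufgabe': 'ja mddoiddn', 'abgabedatum': '01.03.2021'})
```

In the bot command you would pass {'user': str(message.author), 'fach': subject, ...} instead of the literals.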
I am looking to edit rows in my CSV document, and I keep flip-flopping between two errors: "a bytes-like object is required" and "iterator should return strings, not bytes".
Running Python 3.
I have tried changing the mode from "rb" to "r", as well as placing generic strings in the writer.writerow loop.
The CSV file is definitely comma-separated, not tab-separated.
I am following this youtube tutorial: https://www.youtube.com/watch?v=pOJ1KNTlpzE&t=75s (1:40)
temp_file = NamedTemporaryFile(delete=False)
with open('clientlist.csv', 'rb') as csvfile, temp_file:
    reader = csv.DictReader(csvfile)
    fieldnames = ['Account Name', 'Account Number', 'Date Last Checked']
    writer = csv.DictWriter(temp_file, fieldnames=fieldnames)
    writer.writeheader()
    print(temp_file.name)
    for row in reader:
        writer.writerow({
            'Account Name': row['Account Name'],
            'Account Number': row['Account Number'],
            'Date Last Checked': row['Date Last Checked'],
        })
    #shutil.move(temp_file.name, client_list)
The expected result is that when I open the temp_file there is data; then, from what I read, shutil should move it into place. Right now the temp_file is blank.
Any ideas whether it would be easier to start from scratch and use numpy or pandas? Saw this video on that: https://www.youtube.com/watch?v=pbjGo3oj0PM&list=PLulVrUACBIGX8JT7vpoHVQLYqgOKeunb6&index=16&t=0s
According to the NamedTemporaryFile documentation, named temporary files are opened in w+b mode by default - i.e. binary.
Since you are reading and writing CSV files, it makes no sense (to me) to operate in binary mode, so open the input file in r mode instead, and ask for a temporary file in w mode:
import csv
import tempfile

temp_file = tempfile.NamedTemporaryFile(mode='w', delete=False)  # note the mode argument
with open('clientlist.csv', 'r') as csvfile, temp_file:  # note the mode argument
    reader = csv.DictReader(csvfile)
    fieldnames = ['Account Name', 'Account Number', 'Date Last Checked']
    writer = csv.DictWriter(temp_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({
            'Account Name': row['Account Name'],
            'Account Number': row['Account Number'],
            'Date Last Checked': row['Date Last Checked'],
        })
That seems to behave for me.
Here they recommend defining the encoding:
Python 3.1.3 Win 7: csv writerow Error "must be bytes or buffer, not str"
Nevertheless, why don't you open the temporary file with open yourself? Like:
temp_file = open('new_file.csv', 'wb')
I get an error when I try to read numbers like "-1 000,00".
The result is '-1\xa0000,00'.
How can I fix my code to clear the error?
import csv

def read_csv(filename):
    list = []
    with open(filename, 'r', encoding='utf-8') as local_file:
        fields = ['Account_group_name',
                  'Current_balance',
                  'Account_name',
                  'Transfer_account_name',
                  'Description',
                  'Partner_name',
                  'Category',
                  'Date',
                  'Time',
                  'Memo',
                  'Sum',
                  'Currency',
                  'Face_balance',
                  'Balance',
                  ]
        reader = csv.DictReader(local_file, fields, delimiter=';')
        next(reader)
        for row in reader:
            list.append(row)
    return list
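The '\xa0' is a non-breaking space, used as a thousands separator in some locales. One way to handle it (a sketch, not from the question itself) is to normalize each numeric string before converting it; parse_number is an illustrative helper name:

```python
def parse_number(text):
    """Turn a locale-formatted string like '-1\xa0000,00' into a float:
    strip non-breaking and regular spaces, swap the decimal comma for a dot."""
    cleaned = text.replace('\xa0', '').replace(' ', '').replace(',', '.')
    return float(cleaned)

print(parse_number('-1\xa0000,00'))  # -1000.0
```

You would apply this to the relevant fields of each row after reading, e.g. to row['Sum'] or row['Balance'].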
I'm trying to read a CSV file and create a new CSV file with the contents of the old one, using Python. My problem is that all entries end up in the first column, and I can't find a way to save the information in separate columns. Here is my code:
import csv
from itertools import zip_longest

fieldnamesOrdered = ['First Name', 'Last Name', 'Email', 'Phone Number',
                     'Street Address', 'City', 'State', 'HubSpot Owner',
                     'Lifecyle Stage', 'Lead Status', 'Favorite Color']
listOne = []
listTwo = []

with open('Contac.csv', 'r', encoding='utf-8') as inputFile, \
        open('result.csv', 'w', encoding='utf-8') as outputFile:
    reader = csv.DictReader(inputFile)
    writer = csv.writer(outputFile, delimiter='t')
    for row in reader:
        listOne.append(row['First Name'])
        listTwo.append(row['Last Name'])
    dataLists = [listOne, listTwo]
    export_data = zip_longest(*dataLists, fillvalue='')
    writer.writerow(fieldnamesOrdered)
    writer.writerows(export_data)

inputFile.close()
outputFile.close()
Thank you very much for your answers
writer = csv.writer(outputFile, delimiter = 't')
Aren't those entries in the first column additionally interspersed with strange, unsolicited 't' characters? The delimiter you passed is the letter 't', not the tab character '\t'.
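A minimal sketch of the fix, using io.StringIO in place of the real output file so it runs standalone:

```python
import csv
import io

buf = io.StringIO()
# delimiter='\t' is a real tab character; delimiter='t' would split on the letter t
writer = csv.writer(buf, delimiter='\t')
writer.writerow(['First Name', 'Last Name'])
writer.writerow(['Ada', 'Lovelace'])
print(buf.getvalue())
```

In the code above, changing delimiter='t' to delimiter='\t' should place each field in its own tab-separated column.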
When I run this code...
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")

import csv
with open('c:\test.csv', 'w') as csvfile:
    fieldnames = ['contact_name__c', 'recordtypeid']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for con in cons['records']:
        writer.writerow({'contact_name__c': con['Id'], 'recordtypeid': '082I8294817IWfiIWX'})
print('done')
I get the following output inside my CSV file, with extra blank lines between the rows...
contact_name__c,recordtypeid

xyzzyID1xyzzy,082I8294817IWfiIWX

abccbID2abccb,082I8294817IWfiIWX
I'm not sure why those extra lines are there.
Any tips for getting rid of them so my CSV file will be normal-looking?
I'm on Python 3.4.3 according to sys.version_info.
Here are a few more code-and-output pairs, to show the kind of data I'm working with:
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
print(sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2"))
produces
OrderedDict([('totalSize', 2), ('done', True), ('records', [OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/xyzzyID1xyzzy')])), ('Id', 'xyzzyID1xyzzy'), ('Name', 'Person One')]), OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/abccbID2abccb')])), ('Id', 'abccbID2abccb'), ('Name', 'Person Two')])])])
and
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")
for con in cons['records']:
    print(con['Id'])
produces
xyzzyID1xyzzy
abccbID2abccb
Two likely possibilities: the output file needs to be opened with newline translation disabled, and/or the writer needs to be told not to use DOS-style line endings.
To disable newline translation in Python 3, replace your current with open line with:
with open('c:\test.csv', 'w', newline='') as csvfile:
To eliminate the DOS-style line endings, try:
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator="\n")
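Putting both suggestions together, a self-contained sketch (with the Salesforce records replaced by hard-coded placeholders and test.csv as an illustrative path):

```python
import csv

records = [
    {'Id': 'xyzzyID1xyzzy'},
    {'Id': 'abccbID2abccb'},
]

# newline='' stops Python's text layer from translating the row endings the
# csv module writes, which is what otherwise shows up as blank rows on Windows
with open('test.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile,
                            fieldnames=['contact_name__c', 'recordtypeid'],
                            lineterminator='\n')
    writer.writeheader()
    for con in records:
        writer.writerow({'contact_name__c': con['Id'],
                         'recordtypeid': '082I8294817IWfiIWX'})
```

With the real query results, you would iterate over cons['records'] instead of the placeholder list.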