Python Conditional XML Writing

I am using Python to convert CSV files to XML format. The CSV files have a varying number of rows, anywhere from 2 (including headers) upward. Realistically it is 10-15, but unless there is a major performance issue, I'd like to cover my bases. To convert the files I have the following code:
for row in csvData:
    if rowNum == 0:
        xmlData.write(' <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1:
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-1>' + "\n" + ' <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + ' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-3>' + "\n")
    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()
As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but before Python 3.9 it had no built-in pretty-printing. (Python 3.9 added an indent function; on older versions you need a small indent helper of your own.)
lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring function.
Using lxml.etree, you could do something like this:
import lxml.etree as ET
csvData = [['foo bar', 'baz quux'], ['bing bang', 'bim bop', 'bip burp']]
csvFile = 'rowboat'
name = csvFile[:-4]  # strips the last four characters, leaving 'row'
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    row = ET.SubElement(root, '{f}-{n}'.format(f=name, n=num))
    for text in tags:
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text
# encoding='unicode' makes tostring return a str instead of bytes on Python 3
print(ET.tostring(root, pretty_print=True, encoding='unicode'))
yields
<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>
Some suggestions:
In Python, you almost never need to write
for i in range(len(tags)):
    # do stuff with tags[i]
Instead, write
for tag in tags:
to loop over all the items in tags.
Also, instead of manually counting the passes through a loop with
num = 0
for tags in csvData:
    num += 1
use the enumerate function:
for num, tags in enumerate(csvData):
Strings like
' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n"
are very difficult to read. They mix together the logic of indentation, the XML syntax of tags, and the minutiae of end-of-line characters. That is where xml.etree.ElementTree or lxml.etree will help you: it takes care of serializing the XML, and all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.
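Putting these suggestions together for the original task, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the first CSV row holds the headers and that each header is a valid XML name once spaces are replaced. The function name csv_to_xml, the sample data, and the base name 'rowboat' are illustrative only and are not part of the question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, base_name):
    """Build a <csv_data> tree with one <base_name-N> element per data row."""
    reader = csv.reader(io.StringIO(csv_text))
    # The first row supplies the tag names; XML names cannot contain spaces.
    tags = [header.replace(' ', '_') for header in next(reader)]
    root = ET.Element('csv_data')
    # enumerate(..., start=1) numbers the data rows 1, 2, 3, ...
    for num, row in enumerate(reader, start=1):
        row_elem = ET.SubElement(root, '{}-{}'.format(base_name, num))
        for tag, value in zip(tags, row):
            ET.SubElement(row_elem, tag).text = value
    return root


# Made-up sample data standing in for a real CSV file.
sample = "first name,last name\nAda,Lovelace\nAlan,Turing\n"
root = csv_to_xml(sample, 'rowboat')
if hasattr(ET, 'indent'):
    ET.indent(root)  # pretty-printing helper, only available from Python 3.9
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

The loop body is the same for every row, so the number of rows no longer matters; nothing has to be repeated per row number.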

Related

How can I iterate the "<row></row>" tag in my XML file?

The code below works fine. However, the resulting XML tags every record's row identically, and I need this tag to be unique. My intent was to have the tags read <row 1></row 1>, <row 2></row 2>, ... I've commented out what I attempted, but I get a TypeError when I run it in Python. Does anyone know a fix for this issue?
import csv
csvFile = 'BySystem.csv'
xmlFile = 'BySystem.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else:
        #xmlData.write('<row ' + rowNum + '>' + "\n")
        xmlData.write('<row>' + "\n")
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        #xmlData.write('</row ' + rowNum + '>' + "\n")
        xmlData.write('</row>' + "\n")
    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()
rowNum is an int, and you cannot simply concatenate a str with an int. Either convert the int to a str before concatenating, that is, replace
xmlData.write('<row ' + rowNum + '>' + "\n")
xmlData.write('</row ' + rowNum + '>' + "\n")
with
xmlData.write('<row ' + str(rowNum) + '>' + "\n")
xmlData.write('</row ' + str(rowNum) + '>' + "\n")
or use some form of string formatting, e.g. an f-string (requires Python 3.6 or newer):
xmlData.write(f'<row {rowNum}>\n')
xmlData.write(f'</row {rowNum}>\n')
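One caveat: XML element names cannot contain spaces, so <row 1> is not well-formed even once the TypeError is gone. Below is a minimal sketch of two alternatives, using illustrative tag names and sample rows that are not from the question: carry the number in an attribute, or join it to the name with an underscore.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical header and data rows standing in for the CSV contents.
tags = ['System_Name', 'Status']
rows = [['alpha', 'up'], ['beta', 'down']]

lines = ['<?xml version="1.0"?>', '<csv_data>']
for row_num, row in enumerate(rows, start=1):
    # Option 1: keep the tag name fixed and carry the number in an attribute.
    lines.append(f'<row id="{row_num}">')
    for tag, value in zip(tags, row):
        # escape() protects against &, < and > inside the cell values.
        lines.append(f'    <{tag}>{escape(value)}</{tag}>')
    lines.append('</row>')
lines.append('</csv_data>')
document = '\n'.join(lines)

# Option 2: a numbered name without a space, e.g. row_1, row_2, ...
numbered_names = [f'row_{n}' for n in range(1, len(rows) + 1)]

# Parsing the result confirms that it is well-formed XML.
parsed = ET.fromstring(document)
print(document)
```

The attribute form is usually easier for consumers, since every record can still be found under the same element name.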

Python : Calculate values and send in email

UPDATE: I have corrected my code, and the version below works as expected.
Basically, I need output like the following in the mail.
I have achieved this, but I would like to know whether there is more efficient code than the version below.
name 5001 5010 9000 4 %
name 5002 5010 9000 4 %
name 5003 5010 9000 4 %
name 5004 5010 9000 4 %
Storing the values in a list.
Below are dummy values:
container = []
for server in range(1,5):
    container.append('name')
    container.append(server + 5000)
    container.append(5000+10)
    container.append(4000+5000)
    container.append(2500 % 12)
print('\n' + str(container))
Assign the list of values to msgBody1 in order to send it via email.
I'm only posting a piece of the code here. The code below also works fine:
msgBody1 = ''
for count in range(4):
    if count == 0:
        tempValue = ('\n' + '\n' + str(container[count]) + '\t' + str(container[count+1]) + '\t' + str(container[count+2]) + '\t'
                     + str(container[count+3]) + '\t' + str(container[count+4]))
        msgBody1 = msgBody1 + str(tempValue) + ' %'
    elif count == 1:
        tempValue = ('\n' + '\n' + str(container[count+4]) + '\t' + str(container[count+5]) + '\t' + str(container[count+6]) + '\t'
                     + str(container[count+7]) + '\t' + str(container[count+8]))
        msgBody1 = msgBody1 + str(tempValue) + ' %'
    elif count == 2:
        tempValue = ('\n' + '\n' + str(container[count+8]) + '\t' + str(container[count+9]) + '\t' + str(container[count+10]) + '\t'
                     + str(container[count+11]) + '\t' + str(container[count+12]))
        msgBody1 = msgBody1 + str(tempValue) + ' %'
    elif count == 3:
        tempValue = ('\n' + '\n' + str(container[count+12]) + '\t' + str(container[count+13]) + '\t' + str(container[count+14]) + '\t'
                     + str(container[count+15]) + '\t' + str(container[count+16]))
        msgBody1 = msgBody1 + str(tempValue) + ' %'
Is there any better and shorter code to build msgBody1?
Thanks in advance.
Your question is not clear, and the code example does not make sense as posted. From its structure, it looks like you are trying to use a dict, but you are defining or sourcing lists.
I am not sure why you loop with for server in servers; I assume your servers list is a collection of numerical values, which does not make sense either.
Please read up on list vs. dict, on list.append(), and on how to add new key/value pairs to a dictionary.
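For what it is worth, the four branches in the question differ only in their starting index, which advances by five each time. Here is a minimal sketch under that assumption (five values per server, stored back to back in container); the helper name build_body and its fields_per_row parameter are illustrative.

```python
def build_body(container, fields_per_row=5):
    """Join each group of fields with tabs and end each row with ' %'."""
    body = ''
    # Step through the flat list one row (fields_per_row items) at a time.
    for start in range(0, len(container), fields_per_row):
        row = container[start:start + fields_per_row]
        body += '\n\n' + '\t'.join(str(value) for value in row) + ' %'
    return body


# Same dummy values as in the question.
container = []
for server in range(1, 5):
    container.extend(['name', server + 5000, 5000 + 10, 4000 + 5000, 2500 % 12])

msgBody1 = build_body(container)
print(msgBody1)
```

Storing one list per server (a list of lists) instead of a flat list would remove the need for the index arithmetic altogether.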

Formatting output csv files

Could I please get some help on the following problem. I can't seem to spot where I have gone wrong in my code. I have 2 output csv files from my code. The first produces the right format but the second does not:
First output file (fileB in my code)
A,B,C
D,E,F
Second output file (fileC in my code)
A,B,
C
D,E,
F
Here is my code:
file1 = open ('fileA.csv', 'rt', newline = '')
shore_upstream = open('fileB.csv', 'wt', newline = '')
shore_downstream = open('fileC.csv', 'wt', newline = '')
for line in file1:
    first_comma = line.find(',')
    second_comma = line.find(',', first_comma + 1)
    start_coordinate = line [first_comma +1 : second_comma]
    start_coordinate_number = int(start_coordinate)
    end_coordinte = line [second_comma +1 :]
    end_coordinate_number = int (end_coordinte)
    upstream_start = start_coordinate_number - 2000
    downstream_end = end_coordinate_number + 2000
    upstream_start_string = str(upstream_start)
    downstring_end_string = str(downstream_end)
    upstream_shore = line[:first_comma]+','+ upstream_start_string + ',' + start_coordinate
    shore_upstream.write(upstream_shore + '\n')
    downstream_shore = line[:first_comma]+ ','+ end_coordinte + ',' + downstring_end_string
    shore_downstream.write(downstream_shore + '\n')
file1.close()
shore_upstream.close()
shore_downstream.close()
By the way, I am using python 3.3.
Your variable end_coordinte runs to the end of the line, so it probably still contains the trailing line ending (\n, or \r\n since the file is opened with newline = ''), and possibly other non-decimal characters. That is what produces the broken output.
The simplest solution is to convert those strings to numbers and then write the numbers back as strings.
Replace:
upstream_shore = line[:first_comma]+','+ upstream_start_string + ',' + start_coordinate
downstream_shore = line[:first_comma]+ ','+ end_coordinte + ',' + downstring_end_string
with:
upstream_shore = line[:first_comma]+','+ upstream_start_string + ',' + str(start_coordinate_number)
downstream_shore = line[:first_comma]+ ','+ str(end_coordinate_number) + ',' + downstring_end_string
And pay attention to the line[:first_comma] output, as it may also contain characters you are not expecting.
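An alternative that avoids the stray line ending altogether is to let the csv module do the splitting and joining. This is only a sketch, assuming each input row has exactly three fields (a name, a start coordinate, and an end coordinate). It uses in-memory io.StringIO buffers in place of fileA.csv, fileB.csv, and fileC.csv so that it runs on its own, and the sample rows are made up.

```python
import csv
import io

# Stand-ins for fileA.csv (input) and fileB.csv / fileC.csv (outputs).
file_a = io.StringIO("geneA,5000,7000\r\ngeneB,12000,15000\r\n", newline='')
file_b = io.StringIO(newline='')
file_c = io.StringIO(newline='')

upstream_writer = csv.writer(file_b, lineterminator='\n')
downstream_writer = csv.writer(file_c, lineterminator='\n')

for name, start, end in csv.reader(file_a):
    # csv.reader has already removed the line ending from the last field.
    start, end = int(start), int(end)
    upstream_writer.writerow([name, start - 2000, start])
    downstream_writer.writerow([name, end, end + 2000])

print(file_b.getvalue())
print(file_c.getvalue())
```

With real files, the same loop works on objects returned by open(..., newline=''), which is how the csv documentation recommends opening CSV files.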

Not able to wrap lines in Python

else:
      fullName = curLineFin[1] + ' ' + curLineFin[2]
      players[fullName] = curLineFin[0] + '\t' + curLineFin[1] + \
          '\t' + curLineFin[2] + '\t' + curLineFin[3] + '\t' + \
          curLineFin[4] + '\t' + curLineFin[5] + '\t' + curLineFin[6] + \
          '\t' + curLineFin[7] + '\t' + curLineFin[8] + '\t' + \
          curLineFin[9] + '\t' + curLineFin[10] + '\t'
Every time I run the script, I get the error:
players[fullName] = curLineFin[0] + '\t' + curLineFin[1] + \
^
IndentationError: unindent does not match any outer indentation level
Wrap the expression in parentheses:
players[fullName] = (curLineFin[0] + '\t' + curLineFin[1] +
                     '\t' + curLineFin[2] + '\t' + curLineFin[3] + '\t' +
                     curLineFin[4] + '\t' + curLineFin[5] + '\t' + curLineFin[6] +
                     '\t' + curLineFin[7] + '\t' + curLineFin[8] + '\t' +
                     curLineFin[9] + '\t' + curLineFin[10] + '\t')
or
players[fullName] = '\t'.join(curLineFin[:11]) + '\t'
or, if the trailing tab is not needed and curLineFin has exactly eleven elements:
players[fullName] = '\t'.join(curLineFin)
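A quick check, on made-up data, that the join version builds the same string as the long concatenation. It assumes curLineFin is a list of at least eleven strings; str.join raises a TypeError if any element is not a string.

```python
# Hypothetical parsed line: eleven string fields.
curLineFin = ['7', 'Ada', 'Lovelace', 'C', '12', '3', '4', '5', '6', '8', '9']

# Parenthesized concatenation: no backslashes needed inside the parentheses.
concatenated = (curLineFin[0] + '\t' + curLineFin[1] +
                '\t' + curLineFin[2] + '\t' + curLineFin[3] + '\t' +
                curLineFin[4] + '\t' + curLineFin[5] + '\t' + curLineFin[6] +
                '\t' + curLineFin[7] + '\t' + curLineFin[8] + '\t' +
                curLineFin[9] + '\t' + curLineFin[10] + '\t')

players = {}
fullName = curLineFin[1] + ' ' + curLineFin[2]
# Join the first eleven fields with tabs, then add the trailing tab.
players[fullName] = '\t'.join(curLineFin[:11]) + '\t'

print(players[fullName] == concatenated)
```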
Just use parentheses:
fullName = curLineFin[1] + ' ' + curLineFin[2]
players[fullName] = (curLineFin[0] + '\t' + curLineFin[1] +
                     '\t' + curLineFin[2] + '\t' + curLineFin[3] + '\t' +
                     curLineFin[4] + '\t' + curLineFin[5] + '\t' + curLineFin[6] +
                     '\t' + curLineFin[7] + '\t' + curLineFin[8] + '\t' +
                     curLineFin[9] + '\t' + curLineFin[10] + '\t')
The code you have posted does not generate that error, so it's impossible to diagnose exactly what's happening in the different code you're actually running.
The most likely cause is that it's completely unrelated to the backslashes, and you're doing something like mixing tabs and spaces. (The fact that you're using a weird 6-character indent for the block isn't a good sign…)
Another possibility is that you're putting extra spaces after one of the backslashes. This should usually give you a SyntaxError: unexpected character after line continuation character, but it's possible to confuse Python to the point where that passes and you get the following generic SyntaxError for a + with no right operand or IndentationError for the next line.

.split(",") separating every character of a string

At some point in the program I take the user's text input, separate it at its commas, and then ",".join it again into a txt file. The idea is to have a list with all the comma-separated information.
The problem is that, apparently, when I ",".join it, every single character ends up separated by commas. If I have the string info1,info2, splitting gives info1 | info2, but joining it back produces i,n,f,o,1,,,i,n,f,o,2, which is very inconvenient, since the program later reads the text back from the txt file to show it to the user. Can anyone help me with that?
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'a')
categories.write(BookCategory + '\n')
categories.close()
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'r')
categoryList = categories.readlines()
categories.close()
for category in BookCategory.split(','):
    for readCategory in lastReadCategoriesList:
        if readCategory.split(',')[0] == category.strip():
            count = int(readCategory.split(',')[1])
            count += 1
            i = lastReadCategoriesList.index(readCategory)
            lastReadCategoriesList[i] = category.strip() + "," + str(count).strip()
            isThere = True
    if not isThere:
        lastReadCategoriesList.append(category.strip() + ",1")
    isThere = False
lastReadCategories = open('c:/digitalLibrary/' + connectedUser + '/lastReadCategories.txt', 'w')
for category in lastReadCategoriesList:
    if category.split(',')[0] != "" and category != "":
        lastReadCategories.write(category + '\n')
lastReadCategories.close()
global finalList
finalList.append({"Title":BookTitle + '\n', "Author":AuthorName + '\n', "Borrowed":IsBorrowed + '\n', "Read":readList[len(readList)-1], "BeingRead":readingList[len(readingList)-1], "Category":BookCategory + '\n', "Collection":BookCollection + '\n', "Comments":BookComments + '\n'})
finalList = sorted(finalList, key=itemgetter('Title'))
for i in range(len(finalList)):
    categoryList[i] = finalList[i]["Category"]
    toAppend = (str(i + 1) + ".").ljust(7) + finalList[i]['Title'].strip()
    s.append(toAppend)
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'w')
for i in range(len(categoryList)):
    categories.write(",".join(categoryList[i]))
categories.close()
You should pass str.join() a list; you are passing in a single string instead.
Strings are sequences too, so str.join() treats every character as a separate element:
>>> ','.join('Hello world')
'H,e,l,l,o, ,w,o,r,l,d'
>>> ','.join(['Hello', 'world'])
'Hello,world'
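Applied to the question's code, each categoryList[i] is already a single comma-separated string, so it can be written out unchanged. If the pieces need tidying first, split the string, strip each piece, and then join the resulting list. Here is a minimal sketch with made-up category strings; an io.StringIO buffer stands in for category.txt.

```python
import io

# Hypothetical contents: one comma-separated string per book, newline included.
categoryList = ['info1, info2\n', 'history,science\n']

categories = io.StringIO()  # stand-in for the real category.txt file
for entry in categoryList:
    # Split into a list first, tidy each piece, then join the list back.
    parts = [part.strip() for part in entry.split(',')]
    categories.write(','.join(parts) + '\n')

print(categories.getvalue())
```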
