Formatting output csv files


Could I please get some help with the following problem? I can't spot where I have gone wrong in my code. My code writes two output CSV files; the first has the right format but the second does not:
First output file (fileB in my code)
A,B,C
D,E,F
Second output file (fileC in my code)
A,B,
C
D,E,
F
Here is my code:
file1 = open('fileA.csv', 'rt', newline='')
shore_upstream = open('fileB.csv', 'wt', newline='')
shore_downstream = open('fileC.csv', 'wt', newline='')
for line in file1:
    first_comma = line.find(',')
    second_comma = line.find(',', first_comma + 1)
    start_coordinate = line[first_comma + 1 : second_comma]
    start_coordinate_number = int(start_coordinate)
    end_coordinte = line[second_comma + 1 :]
    end_coordinate_number = int(end_coordinte)
    upstream_start = start_coordinate_number - 2000
    downstream_end = end_coordinate_number + 2000
    upstream_start_string = str(upstream_start)
    downstring_end_string = str(downstream_end)
    upstream_shore = line[:first_comma] + ',' + upstream_start_string + ',' + start_coordinate
    shore_upstream.write(upstream_shore + '\n')
    downstream_shore = line[:first_comma] + ',' + end_coordinte + ',' + downstring_end_string
    shore_downstream.write(downstream_shore + '\n')
file1.close()
shore_upstream.close()
shore_downstream.close()
By the way, I am using Python 3.3.

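Your variable end_coordinte is everything after the second comma, so it still ends with the line terminator (\n, or \r\n, because the input file is opened with newline='' and line endings are not translated). int() ignores surrounding whitespace, so end_coordinate_number is fine, but when you put end_coordinte itself in the middle of downstream_shore, that terminator splits the row in two, resulting in that output.
The simplest solution is to use the numbers you have already parsed and convert them back to strings.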
Replace:
upstream_shore = line[:first_comma]+','+ upstream_start_string + ',' + start_coordinate
downstream_shore = line[:first_comma]+ ','+ end_coordinte + ',' + downstring_end_string
with:
upstream_shore = line[:first_comma]+','+ upstream_start_string + ',' + str(start_coordinate_number)
downstream_shore = line[:first_comma]+ ','+ str(end_coordinate_number) + ',' + downstring_end_string
And pay attention to the line[:first_comma] output, as it may also contain characters you are not expecting.
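Going one step further, the standard csv module can do the splitting and joining for you, so no line terminator ever ends up inside a field. The sketch below assumes fileA.csv has exactly three columns (name, start, end) and no header row, as the question's comma searching implies; write_flanks and main are hypothetical names, and FLANK is the 2000 offset from the question.

```python
import csv

FLANK = 2000  # offset used in the question's code


def write_flanks(src, up, down, flank=FLANK):
    """Read name,start,end rows from src and write upstream/downstream windows.

    src, up and down are open text files (or io.StringIO objects).
    Assumes exactly three columns and no header row.
    """
    up_writer = csv.writer(up)
    down_writer = csv.writer(down)
    for row in csv.reader(src):
        if not row:  # skip blank lines instead of crashing on them
            continue
        name = row[0]
        # csv.reader has already removed the line terminator, and int()
        # ignores surrounding whitespace, so nothing stray reaches the output
        start, end = int(row[1]), int(row[2])
        up_writer.writerow([name, start - flank, start])
        down_writer.writerow([name, end, end + flank])


def main():
    # File names taken from the question; the csv docs recommend newline=''.
    with open('fileA.csv', newline='') as src, \
         open('fileB.csv', 'w', newline='') as up, \
         open('fileC.csv', 'w', newline='') as down:
        write_flanks(src, up, down)
```

Call main() to process the real files. csv.writer ends rows with \r\n by default; pass lineterminator='\n' to csv.writer if you need plain \n like the original code wrote.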

Related

Encountering "List Index out of Range" Exception while Web Scraping via Selenium

I'm scraping data for a data science project using Selenium, and I don't know why I get IndexError exceptions in the write-to-CSV part. When I print the data as-is, the output looks normal.
Code below:
driver = webdriver.Firefox(executable_path="/filepath/geckodriver.exe")
url = 'https://website.com'
driver.get(url)
with open('file.csv', 'w') as f:
    f.write('Column1', 'Column2', 'Column3', '\n')
ids = driver.find_elements_by_xpath('//*[@class="id-name"]')
id_list = []
for i in range(50):
    id_list.append(ids[i].text)
print(len(ids))
print(len(id_list))
print(id_list[0:50])
# Break up into batches to save memory
new_id_list = [id_list[i:i+5] for i in range(0, len(id_list), 5)]
#time.sleep(1200)
for i in range(len(new_id_list)):
    for j in range(len(new_id_list[i])):
        url = 'http://www.website.com?id=' + str(id_list[j])
        driver.get(url)
        col1 = driver.find_elements_by_xpath('//*[@id="field-value-col_1"]/span/span')
        col2 = driver.find_elements_by_xpath('//h1[@id="field-value-col_2"]')
        col3 = driver.find_elements_by_xpath('//*[@id="field-value-col_3"]')
        print(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')
        # This is where I get the error usually.
        with open('bugzilla.csv', 'w') as f:
            f.write(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')
    print('Batch of 5')
f.close()

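Here:
print(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')
you treat id_list as a two-dimensional list, although earlier you define it as a flat list of strings:
id_list = []
for i in range(50):
    id_list.append(ids[i].text)
You probably meant new_id_list, the list of batches:
print(new_id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')
The same mix-up affects the URL: str(id_list[j]) only ever uses j from 0 to 4, so every batch fetches the first five IDs. It should be str(new_id_list[i][j]).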
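A minimal sketch of the batch loop written so it cannot mix up the two lists: it iterates over each batch directly instead of indexing, and writes to a single CSV file opened once by the caller (re-opening 'bugzilla.csv' with mode 'w' inside the loop truncates it on every row). fetch_columns is a hypothetical callback standing in for the Selenium part; in the question it would load the page for item_id and return the .text of the first col1, col2 and col3 element, returning None when any of those lists is empty, since an empty find_elements result is another common source of IndexError on col1[0]. The header names are also assumptions.

```python
import csv


def batches(items, size):
    """Split a flat list into consecutive chunks of at most `size` items."""
    return [items[k:k + size] for k in range(0, len(items), size)]


def write_rows(id_list, fetch_columns, out, batch_size=5):
    """Write one CSV row per ID; fetch_columns(item_id) is a hypothetical
    callback returning (col1, col2, col3) strings, or None if data is missing."""
    writer = csv.writer(out)
    writer.writerow(['id', 'col1', 'col2', 'col3'])  # hypothetical header names
    for batch in batches(id_list, batch_size):
        for item_id in batch:  # use the batch itself, never id_list[i][j]
            cols = fetch_columns(item_id)
            if cols is None:  # e.g. an element was not found on the page
                continue
            writer.writerow([item_id, *cols])
```

Usage would be along the lines of opening the output once with open('bugzilla.csv', 'w', newline='') as out and calling write_rows(id_list, fetch_columns, out).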

Python 3: how to convert a string to "\x00...."

How do I convert the string "Серия 1" to a string like "\x412\x437\x440\x44b\x432\x430\x44f" so I can write it to a file?
def create_playlist(playlist):
    gplaylist = "[playlist]\n"
    playlist1 = json.loads(playlist)
    x = 1;
    for i in enumerate(playlist1):
        for j in enumerate(i[1]['folder']):
            gplaylist += "File" + str(x) + "=" + parse_file(j[1]['file']) + "\n"
            # Variable: j[1]['title'] must be converted to "\x412\x437\x440\x44b\x432\x430\x44f"
            gplaylist += "Title" + str(x) + "=" + j[1]['title'] + "\n"
            x += 1
    gplaylist += "NumberOfEntries=" + str(x-1)
    write_playlist(gplaylist)

def write_playlist(playlist):
    with io.open('play_list.pls', 'w', encoding='utf-8') as outfile:
        outfile.write(to_unicode(playlist))

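You should stop playing with encodings where it's not really necessary. In Python 3 a str is already Unicode, and it is encoded when written, so everything works as it is, provided the file is opened with an encoding that can represent Cyrillic (for example encoding='utf-8', as your write_playlist already does):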
$ python
>>> with open('part1.txt', 'w') as fout :
...     fout.write('Серия 1\n')
...
>>>
$ cat part1.txt
Серия 1
$
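Applied to the question, the playlist can be built from the titles as they are and written with an explicit UTF-8 encoding. This is a sketch under assumptions: the JSON shape (a list of items, each with a 'folder' list of {'file', 'title'} entries) is inferred from the question's indexing, and the question's parse_file() helper is left out; apply it to entry['file'] if you still need it. build_pls and write_pls are hypothetical names.

```python
import json


def build_pls(playlist_json):
    """Build .pls text from JSON shaped like the question's (assumed shape)."""
    lines = ["[playlist]"]
    n = 0
    for item in json.loads(playlist_json):
        for entry in item['folder']:
            n += 1
            lines.append("File{}={}".format(n, entry['file']))
            # The title is used as-is: no manual escaping of Cyrillic needed
            lines.append("Title{}={}".format(n, entry['title']))
    lines.append("NumberOfEntries={}".format(n))
    return "\n".join(lines) + "\n"


def write_pls(text, path='play_list.pls'):
    # An explicit encoding makes the output independent of the platform default
    with open(path, 'w', encoding='utf-8') as out:
        out.write(text)
```

If ASCII escapes really are required, str.encode('unicode_escape') produces \uXXXX sequences, which is not the \x412 form shown in the question.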

.split(",") separating every character of a string

At some point in the program I ask it to take the user's text input and split the text on its commas, and then I ",".join it again into a txt file. The idea is to have a list with all the comma-separated information.
The problem is that, apparently, when I ",".join it, it separates every single character with commas. So if I've got the string info1,info2, splitting gives info1 | info2, but when joining it back again it ends up like i,n,f,o,1,,,i,n,f,o,2. That is highly uncomfortable, since the program reads the text back from the txt file to show it to the user later. Can anyone help me with that?
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'a')
categories.write(BookCategory + '\n')
categories.close()
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'r')
categoryList = categories.readlines()
categories.close()
for category in BookCategory.split(','):
    for readCategory in lastReadCategoriesList:
        if readCategory.split(',')[0] == category.strip():
            count = int(readCategory.split(',')[1])
            count += 1
            i = lastReadCategoriesList.index(readCategory)
            lastReadCategoriesList[i] = category.strip() + "," + str(count).strip()
            isThere = True
    if not isThere:
        lastReadCategoriesList.append(category.strip() + ",1")
    isThere = False
lastReadCategories = open('c:/digitalLibrary/' + connectedUser + '/lastReadCategories.txt', 'w')
for category in lastReadCategoriesList:
    if category.split(',')[0] != "" and category != "":
        lastReadCategories.write(category + '\n')
lastReadCategories.close()
global finalList
finalList.append({"Title":BookTitle + '\n', "Author":AuthorName + '\n', "Borrowed":IsBorrowed + '\n', "Read":readList[len(readList)-1], "BeingRead":readingList[len(readingList)-1], "Category":BookCategory + '\n', "Collection":BookCollection + '\n', "Comments":BookComments + '\n'})
finalList = sorted(finalList, key=itemgetter('Title'))
for i in range(len(finalList)):
    categoryList[i] = finalList[i]["Category"]
    toAppend = (str(i + 1) + ".").ljust(7) + finalList[i]['Title'].strip()
    s.append(toAppend)
categories = open('c:/digitalLibrary/' + connectedUser + '/category.txt', 'w')
for i in range(len(categoryList)):
    categories.write(",".join(categoryList[i]))
categories.close()

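You should pass ','.join() a list, but you are passing in a single string instead: categoryList[i] already holds the comma-separated text (for example 'info1,info2\n'), so just write it as-is with categories.write(categoryList[i]).
Strings are sequences too, so ','.join() treats every character as a separate element: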
>>> ','.join('Hello world')
'H,e,l,l,o, ,w,o,r,l,d'
>>> ','.join(['Hello', 'world'])
'Hello,world'
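A small sketch of the round trip the question is after: split the user's input into a real list of names, and only ever pass that list to join(). parse_categories and format_categories are hypothetical helper names, and stripping spaces and dropping empty entries is an assumption about what "clean" input should look like.

```python
def parse_categories(text):
    """Split user input such as 'info1, info2' into a list of clean names."""
    return [part.strip() for part in text.split(',') if part.strip()]


def format_categories(names):
    """Join a *list* of names back into one comma-separated line.

    Passing a plain string here instead, e.g. ','.join('info1'), would give
    'i,n,f,o,1' because a string is iterated character by character.
    """
    return ','.join(names)
```

Usage: format_categories(parse_categories('info1, info2')) gives 'info1,info2'.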

Python Conditional XML Writing

I am using Python to convert CSV files to XML format. The CSV files have a varying number of rows, anywhere from 2 (including headers) to infinity (realistically 10-15, but unless there's some major performance issue, I'd like to cover my bases). To convert the files I have the following code:
for row in csvData:
    if rowNum == 0:
        xmlData.write(' <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1:
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-1>' + "\n" + ' <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + ' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' </'+csvFile[:-4]+'-3>' + "\n")
    rowNum += 1
xmlData.write('</csv_data>' + "\n")
xmlData.close()
As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but it only gained built-in pretty-printing (xml.etree.ElementTree.indent) in Python 3.9. (For older versions you could use the indent function from here, however.)
lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring method.
Using lxml.etree, you could do something like this:
import lxml.etree as ET
csvData = [['foo bar', 'baz quux'], ['bing bang', 'bim bop', 'bip burp'],]
csvFile = 'rowboat'
name = csvFile[:-4]
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    row = ET.SubElement(root, '{f}-{n}'.format(f=name, n=num))
    for text in tags:
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text
# encoding='unicode' makes tostring return str rather than bytes in Python 3
print(ET.tostring(root, pretty_print=True, encoding='unicode'))
yields
<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>
Some suggestions:
In Python, almost never do you need to say
for i in range(len(tags)):
    # do stuff with tags[i]
Instead say
for tag in tags:
to loop over all the items in tags.
Also instead of manually counting the times through a loop with
num = 0
for tags in csvData:
    num += 1
instead use the enumerate function:
for num, tags in enumerate(csvData):
Strings like
' ' + '<' + tags[i] + '>' \
    + row[i] + '</' + tags[i] + '>' + "\n"
are incredibly difficult to read. They mix together the logic of indentation, the XML syntax of tags, and the minutiae of end-of-line characters. That's where xml.etree.ElementTree or lxml.etree will help you: it takes care of serializing the XML for you, and all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.
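A sketch of the question's actual task with the standard-library xml.etree.ElementTree, applying the suggestions above: enumerate() replaces the rowNum counter, and zip() pairs each header with its value instead of indexing with range(len(tags)). Assumptions: the first CSV row holds the headers, rows are numbered from 1 as in the question's <name-1>, <name-2>, and headers become valid XML names once spaces are replaced (ElementTree does not check this). csv_to_xml is a hypothetical name, and it takes CSV text so it is easy to try; wrap open() around it for real files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, name):
    """Convert CSV text (first row = headers) to an XML string.

    Each data row becomes <name-N>, with one child per column named after
    the header (spaces replaced by underscores).
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = [h.replace(' ', '_') for h in next(reader)]
    root = ET.Element('csv_data')
    for num, row in enumerate(reader, start=1):  # no manual rowNum counter
        record = ET.SubElement(root, '{}-{}'.format(name, num))
        for tag, value in zip(headers, row):  # no range(len(tags)) indexing
            ET.SubElement(record, tag).text = value
    return ET.tostring(root, encoding='unicode')
```

The result is on one line; on Python 3.9 or later, calling ET.indent(root) before tostring pretty-prints it. The same loop handles any number of rows, so nothing has to be repeated per row.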

Python Write To File Missing Lines

I'm having trouble using python to write strings into a file:
(I'm using Python to generate some C programs.)
The code I have is the following:
filename = "test.txt"
i = 0
string = "image"
tempstr = ""
average1 = "average"
average2 = "average*average"
output = ""
FILE = open(filename,"w")
while i < 20:
    j = 0
    output = "square_sum = square_sum + "
    while j < 20:
        tempstr = string + "_" + str(i) + "_" + str(j)
        output = output + tempstr + "*" + tempstr + " + " + average2 + " - 2*" + average1 + "*" + tempstr
        if j != 19:
            output = output + " + "
        if j == 19:
            output = output + ";"
        j = j + 1
    output = output + "\n"
    i = i + 1
    print(output)
    FILE.writelines(output)
FILE.close
The printed output is correct, but the file is missing its last line and part of the second-to-last line. What's wrong with how I write the strings to the file?
Thank you!

It would probably help if you called the method:
FILE.close()

The problem is that you aren't calling the close() method, just mentioning it in the last line. You need parentheses to invoke a function. Without the call, the file is never explicitly closed, so data still sitting in Python's write buffer may not have been written to the file yet.
Python's with statement can make that unnecessary though:
with open(filename,"w") as the_file:
    while i < 20:
        j = 0
        output = "square_sum = square_sum + "
        ...
        print(output)
        the_file.writelines(output)
When the with clause is exited, the_file will be closed automatically.

Try:
with open(filename,"w") as FILE:
    while i < 20:
        # rest of your code with proper indent...
No close() call is needed.

First, a more Pythonic version of your code (Python 3):
img = 'image_{i}_{j}'
avg = 'average'
clause = '{img}*{img} + {avg}*{avg} - 2*{avg}*{img}'.format(img=img, avg=avg)
clauses = (clause.format(i=i, j=j) for i in range(20) for j in range(20))
joinstr = '\n + '
output = 'square_sum = {};'.format(joinstr.join(clauses))
fname = 'output.c'
with open(fname, 'w') as outf:
    print(output)
    outf.write(output)
Second, it looks like you are hoping to speed up your C code by fanatical inlining. I very much doubt the speed gains will justify your efforts over something like
maxi = 20;
maxj = 20;
sum = 0;
sqsum = 0;
for(i=0; i<maxi; i++)
    for(j=0; j<maxj; j++) {
        t = image[i][j];
        sum += t;
        sqsum += t*t;
    }
square_sum = sqsum + maxi*maxj*average*average - 2*sum*average;
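The loop above relies on the identity sum(t*t + a*a - 2*a*t) = sum(t*t) + N*a*a - 2*a*sum(t), where N = maxi*maxj is the number of pixels. A quick Python check of that algebra, assuming integer pixel values so the comparison is exact; both function names are hypothetical.

```python
def square_sum_expanded(image, average):
    """Term-by-term sum, the same shape as the generated C statement."""
    return sum(t * t + average * average - 2 * average * t
               for row in image for t in row)


def square_sum_closed_form(image, average):
    """One pass accumulating the count, sum and sum of squares, as in the C loop."""
    n = total = sqsum = 0
    for row in image:
        for t in row:
            n += 1
            total += t
            sqsum += t * t
    return sqsum + n * average * average - 2 * total * average
```

Both functions agree for any image and average, which is why the compact loop can replace the 400 inlined terms.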

Looks like your indentation may be incorrect, but just some other comments about your code:
writelines() writes each string from an iterable, such as a list, to the file.
Since you're outputting a single string, just use write().
lines = ["line one\n", "line two\n"]
f = open("myfile.txt", "w")
f.writelines(lines)
f.close()
Or just:
output = "big long string\nOf something important\n"
f = open("myfile.txt", "w")
f.write(output)
f.close()
As another side note, it may be helpful to use the += operator.
output += "more text"
# is equivalent to
output = output + "more text"
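Putting the advice in this thread together, a sketch that builds the 20 C statements as a list of lines and writes them inside a with block, so the file is closed (and flushed) automatically. It assumes the question's image_i_j naming and the average variable; generate_lines and write_lines are hypothetical helper names.

```python
def generate_lines(size=20, name="image", avg="average"):
    """Return one C statement per i, each summing `size` terms (as in the question)."""
    lines = []
    for i in range(size):
        terms = ["{0}_{1}_{2}*{0}_{1}_{2} + {3}*{3} - 2*{3}*{0}_{1}_{2}".format(name, i, j, avg)
                 for j in range(size)]
        lines.append("square_sum = square_sum + " + " + ".join(terms) + ";\n")
    return lines


def write_lines(path, lines):
    # Leaving the with block closes the file, which flushes every buffered line
    with open(path, "w") as f:
        f.writelines(lines)  # a list of strings: what writelines() expects
```

Usage: write_lines("test.txt", generate_lines()) writes all 20 lines, including the last one.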
