Python 2.7 CSV writer issue

I have some Python code that lists pull requests on GitHub. If I print the parsed JSON output to the console, I get the expected results, but when I write the parsed JSON to a CSV file, I don't get the same results: the output is cut off after about the sixth result (the exact number varies).
What I'm trying to do is overwrite the CSV file each time with the latest output.
I'm also dealing with Unicode output, for which I use unicodecsv. I don't know whether this is what throws the CSV output off.
Below are both versions of the relevant code: one with the print statement and one with the CSV code.
Thanks for any help.
With print:
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
import csv
import unicodecsv

for pr in result:
    data = pr.as_dict()
    changes = (gh.repository('my-repo', repo).pull_request(data['number'])).as_dict()
    if changes['commits'] == 1 and changes['changed_files'] == 1:
        #keep print to console for testing purposes
        print "Login: " + changes['user']['login'] + '\n' + "Title: " + changes['title'] + '\n' + "Changed Files: " + str(changes['changed_files']) + '\n' + "Commits: " + str(changes['commits']) + '\n'
With csv:
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
import csv
import unicodecsv

for pr in result:
    data = pr.as_dict()
    changes = (gh.repository('my-repo', repo).pull_request(data['number'])).as_dict()
    if changes['commits'] == 1 and changes['changed_files'] == 1:
        with open('c:\pull.csv', 'r+') as f:
            csv_writer = unicodecsv.writer(f, encoding='utf-8')
            csv_writer.writerow(['Login', 'Title', 'Changed files', 'Commits'])
            for i in changes['user']['login'], changes['title'], str(changes['changed_files']), str(changes['commits']) :
                csv_writer.writerow([changes['user']['login'], changes['title'],changes['changed_files'], changes['commits']])

The problem is the way you write data to the file.
Every time you open the file in r+ mode, writing starts again at the beginning of the file, so you overwrite the rows written on previous iterations.
And for dealing with JSON
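A minimal Python 3 sketch of the fix, assuming the `changes` dicts have the shape used in the question (the sample data and the function name `write_single_commit_prs` are hypothetical): open the file once in 'w' mode so it is overwritten on every run, write the header once, and write one row per matching pull request. The inner for loop in the question wrote the same row once per field, which this also avoids. In Python 3 the standard csv module handles Unicode text directly, so unicodecsv is not needed there.

```python
import csv
import os
import tempfile


def write_single_commit_prs(pull_requests, path):
    """Overwrite `path` with one CSV row per pull request that has exactly
    one commit and one changed file. Returns the number of rows written."""
    written = 0
    # 'w' truncates the file once per call; newline='' is what the csv
    # module expects, and encoding='utf-8' handles non-ASCII titles.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Login', 'Title', 'Changed files', 'Commits'])  # header once
        for changes in pull_requests:
            if changes['commits'] == 1 and changes['changed_files'] == 1:
                # one row per matching pull request (not one per field)
                writer.writerow([changes['user']['login'], changes['title'],
                                 changes['changed_files'], changes['commits']])
                written += 1
    return written


# Hypothetical data shaped like the `changes` dicts from the GitHub API.
sample = [
    {'user': {'login': 'alice'}, 'title': 'Fix café typo', 'commits': 1, 'changed_files': 1},
    {'user': {'login': 'bob'}, 'title': 'Big refactor', 'commits': 5, 'changed_files': 12},
]
out_path = os.path.join(tempfile.gettempdir(), 'pull.csv')
write_single_commit_prs(sample, out_path)
written = write_single_commit_prs(sample, out_path)  # second run overwrites, no duplicates
with open(out_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

In the question's loop you would collect the `changes` dicts (or pass a generator over `result`) and call the function once, instead of opening the file inside the loop.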

Related

Python Encoding Issue with JSON and CSV

I am having an encoding issue when I run the script below.
Here is the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 9: ordinal not in range(128)
Here is my script:
import logging
import urllib
import csv
import json
import io
import codecs

with open('/home/local/apple.csv',
          'rb') as csvinput:
    reader = csv.reader(csvinput, delimiter=',')
    firstline = True
    for row in reader:
        if firstline:
            firstline = False
            continue
        address1 = row[0]
        print row[0]
        locality = row[1]
        admin_area = row[2]
        query = ' '.join(str(x) for x in (address1, locality, admin_area))
        normalized = query.replace(" ", "+")
        BaseURL = 'http://localhost:8080/verify?country=JP&freeform='
        URL = BaseURL + normalized
        print URL
        data = urllib.urlopen(URL)
        response = data.getcode()
        print response
        if response == 200:
            file= json.load(data)
            print file
            output_f=open('output.csv','wb')
            csvwriter=csv.writer(output_f)
            count = 0
            for f in file:
                if count == 0:
                    header= f.keys()
                    csvwriter.writerow(header)
                count += 1
                csvwriter.writerow(f.values())
            output_f.close()
        else:
            print 'error'
Can anyone help me fix this? It's getting really annoying. I need to encode to UTF-8.
It looks like you are using Python 2.x. Instead of Python's built-in open, use codecs.open, which lets you pass the encoding to use and what to do when there are errors. This is a little less confusing in Python 3, where the built-in open can do this itself.
So in the two lines where you open files, do:
with codecs.open('/home/local/apple.csv',
                 'rb', 'utf-8') as csvinput:
output_f = codecs.open('output.csv','wb', 'utf-8')
The optional errors parameter defaults to 'strict', which raises an exception if the data can't be mapped to the given encoding. In some contexts you may want to use 'ignore' or 'replace' instead.
See the Python documentation for codecs.open for more details.
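For comparison, a minimal Python 3 sketch of the same JSON-to-CSV step, where the built-in open() accepts encoding (and errors) directly. The JSON payload is a hypothetical stand-in for the verify service's response. It uses csv.DictWriter so that each value is written under its own header by key, and it shows what errors='replace' does when a character can't be encoded.

```python
import csv
import json
import os
import tempfile

# Hypothetical response body standing in for the localhost verify service.
payload = '[{"city": "Café Town", "zip": "100-0001"}, {"city": "Tokyo", "zip": "100-0002"}]'
records = json.loads(payload)

out_path = os.path.join(tempfile.gettempdir(), 'output.csv')
with open(out_path, 'w', newline='', encoding='utf-8') as output_f:
    # Field names come from the first record, like the question's header row.
    csvwriter = csv.DictWriter(output_f, fieldnames=list(records[0].keys()))
    csvwriter.writeheader()
    csvwriter.writerows(records)  # values are matched to columns by key

with open(out_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)

# errors='replace' substitutes '?' for characters the codec can't encode,
# instead of raising UnicodeEncodeError as the default 'strict' does.
replaced = 'Café'.encode('ascii', errors='replace')
print(replaced)  # b'Caf?'
```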

Segmentation Fault

I am using Python 2.4.4 (old machine, can't do anything about it) on a UNIX machine. I am extremely new to Python/programming and have never used a UNIX machine before. This is what I am trying to do:
1. Extract a single sequence from a FASTA file (proteins + nucleotides) to a temporary text file.
2. Give this temporary file to a program called 'threader'.
3. Append the output from threader (called tempresult.out) to a file called results.out.
4. Remove the temporary file.
5. Remove the tempresult.out file.
6. Repeat using the next FASTA sequence.
Here is my code so far:
import os
from itertools import groupby

input_file = open('controls.txt', 'r')
output_file = open('results.out', 'a')

def fasta_parser(fasta_name):
    input = fasta_name
    parse = (x[1] for x in groupby(input, lambda line: line[0] == ">"))
    for header in parse:
        header = header.next()[0:].strip()
        seq = "\n".join(s.strip() for s in parse.next())
        yield (header, '\n', seq)

parsedfile = fasta_parser(input_file)
mylist = list(parsedfile)
index = 0
while index < len(mylist):
    temp_file = open('temp.txt', 'a+')
    temp_file.write(' '.join(mylist[index]))
    os.system('threader' + ' temp.txt' + ' tempresult.out' + ' structures.txt')
    os.remove('temp.txt')
    f = open('tempresult.out', 'r')
    data = str(f.read())
    output_file.write(data)
    os.remove('tempresult.out')
    index +=1

output_file.close()
temp_file.close()
input_file.close()
When I run this script I get the error 'Segmentation Fault'. From what I gather, this has to do with accessing memory I shouldn't be accessing (???). I assume it has something to do with the temporary files, but I have no idea how to get around this.
Any help would be much appreciated!
Thanks!
Update 1:
Threader works fine when I give it the same sequence multiple times like this:
import os
input_file = open('control.txt', 'r')
output_file = open('results.out', 'a')
x=0
while x<3:
    os.system('threader' + ' control.txt' + ' tempresult.out' + ' structures.txt')
    f = open('tempresult.out', 'r')
    data = str(f.read())
    output_file.write(data)
    os.remove('result.out')
    x += 1
output_file.close()
input_file.close()
Update 2: In case someone else gets this error: I forgot to close temp.txt before invoking the threader program.
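A minimal Python 3 sketch of the loop with the Update 2 fix applied: each sequence is written to the temporary file inside a with block, which closes (and flushes) it before the external program runs. The FASTA parsing keeps the question's groupby approach; the names `fasta_records` and `run_each` are hypothetical. Because threader isn't available here, the demo substitutes a hypothetical stand-in that just copies its input file to its output file. With the real tool you would build ['threader', input_path, output_path, 'structures.txt'] instead, following the argument order in the question.

```python
import os
import subprocess
import sys
import tempfile
from itertools import groupby


def fasta_records(lines):
    """Yield (header, sequence) pairs using the question's groupby idea."""
    groups = (group for _, group in groupby(lines, lambda line: line.startswith('>')))
    for header_group in groups:
        header = next(header_group).strip()
        sequence = '\n'.join(s.strip() for s in next(groups))
        yield header, sequence


def run_each(records, build_argv, results_path):
    """Run an external program once per record, appending its output."""
    workdir = tempfile.mkdtemp()
    temp_path = os.path.join(workdir, 'temp.txt')
    result_path = os.path.join(workdir, 'tempresult.out')
    with open(results_path, 'a') as results:
        for header, sequence in records:
            # Leaving the with block closes temp.txt, so its contents are
            # flushed to disk before the external program reads it.
            with open(temp_path, 'w') as temp_file:
                temp_file.write(header + '\n' + sequence + '\n')
            subprocess.check_call(build_argv(temp_path, result_path))
            with open(result_path) as result:
                results.write(result.read())
            os.remove(temp_path)
            os.remove(result_path)
    os.rmdir(workdir)


# Hypothetical stand-in for threader: copies argv[1] to argv[2].
copy_program = [sys.executable, '-c',
                'import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])']
fasta = ['>seq1\n', 'ACGT\n', 'GG\n', '>seq2\n', 'TTAA\n']
results_path = os.path.join(tempfile.mkdtemp(), 'results.out')
run_each(fasta_records(fasta), lambda inp, out: copy_program + [inp, out], results_path)
with open(results_path) as f:
    combined = f.read()
print(combined)
```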

IOError: [Errno 22] invalid mode ('w') or filename

I get this error when trying to create a file. The script is designed to take an existing .csv file and write its contents into a plain text file.
I would like each run to create a new file named with the date and time stamp, but I get Errno 22 when trying to create the file.
Any ideas?
import csv
import time

f = open(raw_input('Enter file name: '),"r")
saveFile = open ('Bursarcodes_'+time.strftime("%x")+ '_'+time.strftime("%X")+
                 '.txt', 'w+')
csv_f = csv.reader(f)
for row in csv_f:
    saveFile.write( 'insert into bursarcode_lookup(bursarcode, note_id)' +
                    ' values (\'' + row[0] + '\', ' + row[1] + ')\n')
f.close()
saveFile.close()
You cannot have slashes (/) or colons (: is allowed on Unix, but not on Windows) in your file name, yet they are exactly what strftime generates for %x and %X.
Python tries to help you; it says:
No such file or directory: 'Bursarcodes_01/09/15_19:59:24.txt'
Replace time.strftime("%x") with this:
time.strftime("%x").replace('/', '.')
...and time.strftime("%X") with this:
time.strftime("%X").replace(':', '_')
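As an alternative to the .replace() calls, a minimal sketch that spells out the date and time format explicitly, so the name never contains / or : regardless of locale (%x and %X are locale-dependent). The function name `bursar_filename` is hypothetical. Putting the year first also makes the files sort chronologically.

```python
import time


def bursar_filename(when=None):
    """Build a timestamped file name containing no '/' or ':' characters."""
    if when is None:
        when = time.localtime()
    # Explicit, locale-independent format instead of %x / %X.
    return time.strftime('Bursarcodes_%Y-%m-%d_%H-%M-%S.txt', when)


fixed = time.strptime('2015-01-09 19:59:24', '%Y-%m-%d %H:%M:%S')
name = bursar_filename(fixed)
print(name)  # Bursarcodes_2015-01-09_19-59-24.txt
```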
A cleaned-up and extended version:
import csv
import sys
import time

def make_output_fname():
    # Thanks to #Andrew:
    return time.strftime("Bursarcodes_%x_%X.txt").replace("/", "-").replace(":", "-")

def main(csv_fname=None, outfname=None, *args):
    if not csv_fname:
        # first arg not given - prompt for filename
        csv_fname = raw_input("Enter .csv file name: ")
    if not outfname:
        # second arg not given - use serialized filename
        outfname = make_output_fname()
    with open(csv_fname) as inf, open(outfname, "w") as outf:
        incsv = csv.reader(inf)
        for row in incsv:
            outf.write(
                "insert into bursarcode_lookup(bursarcode, note_id) values ('{0}', '{1}')\n"
                .format(*row)
            )

if __name__=="__main__":
    # pass any command-line arguments to main()
    main(*sys.argv[1:])
You can now run it from the command line as well.
Note that if any data items in your CSV file contain unescaped single quotes ('), you will get invalid SQL.
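One way to avoid that when generating statements as text is to escape each value the standard SQL way, by doubling embedded single quotes. A minimal Python 3 sketch with hypothetical helper names; if you execute the statements through a database driver instead of writing them to a file, parameterized queries are the safer choice.

```python
import csv


def sql_literal(value):
    """Return value as an SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def insert_statements(csv_lines):
    """Yield one INSERT statement per CSV row (bursarcode, note_id)."""
    for row in csv.reader(csv_lines):
        yield ("insert into bursarcode_lookup(bursarcode, note_id) values ({0}, {1})\n"
               .format(sql_literal(row[0]), sql_literal(row[1])))


# Hypothetical rows; the second contains an apostrophe.
statements = list(insert_statements(["AB1,100\n", "O'Neil,200\n"]))
for statement in statements:
    print(statement, end='')
```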

tab delimited to csv

I can get this to output the result of my MySQL command (which I have removed for security), but I keep getting an error when I try to write this tab-delimited output to a CSV file. Any help for a Python rookie would be appreciated.
#!/usr/bin/pytho
import sys, csv
import MySQLdb
import os
import mysql.connector
import subprocess
import string

if __name__ == '__main__':
    du = sys.argv[1]
    csv_home = '/home/oatey/bundle_' + du + '.csv'
    input = sys.stdin
    output = sys.stdout
    #read and rewrite to file with arguement
    new = open("/home/oatey/valid.sql2", "w")
    with open("/home/oatey/bundle.sql")as write_query:
        #read_file = write_query.read()
        for line in write_query:
            lr = line.replace('{$$}', du)
            print lr
            new.write(lr)
        new.close()
        write_query.close()
    with open("/home/oatey/valid.sql2") as w:
        mysql_output = subprocess.check_output(MYSQL_COMMAND, stdin=w)
        #print mysql_output
        b = open("/home/oatey/" + du + ".txt", "r+")
        #",".join("%s" % i for i in mysql_output
        b.write(mysql_output)
        print mysql_output
        b.close()
    #read tab-delimited file
    with open("/home/oatey/" + du + ".txt", 'rb') as data:
        cr = data.readlines()
        contents = [line for line in cr]
    with open("/home/oatey/" + du + ".csv", "wb") as wd:
        cw = csv.writer(wd, quotechar='', quoting=csv.QUOTE_NONE)
        wd.write(contents)
I bet the error you are getting is:
TypeError: must be string or buffer, not list
contents is a list, and you cannot write a list with write(). Quoting the docs:
file.write(str)
Write a string to the file.
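Instead, use csvwriter.writerows(). Note that writerows() expects a sequence of rows, each a list of fields; a list of raw lines would be written one character per field. So split the tab-delimited file with a csv reader first:
with open("/home/oatey/" + du + ".txt", 'rb') as data:
    contents = list(csv.reader(data, delimiter='\t'))
with open("/home/oatey/" + du + ".csv", "wb") as wd:
    cw = csv.writer(wd, quotechar='', quoting=csv.QUOTE_NONE)
    cw.writerows(contents)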
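A minimal Python 3 sketch of the whole conversion on in-memory text: csv.reader with delimiter='\t' splits each line into fields, and csv.writer with its default QUOTE_MINIMAL quoting wraps any field that contains a comma or a quote character. The sample text and the name `tsv_to_csv` are hypothetical, and the reader is told not to treat quote characters specially, which assumes the tab-delimited input doesn't use quoting.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-delimited text (such as query output) to CSV text."""
    # QUOTE_NONE on the reader: quote characters in the input are kept literally.
    rows = csv.reader(io.StringIO(tsv_text), delimiter='\t', quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The writer's default QUOTE_MINIMAL quotes only fields that need it.
    csv.writer(out, lineterminator='\n').writerows(rows)
    return out.getvalue()


mysql_output = 'id\tname\n1\tSmith, John\n2\tDoe\n'  # hypothetical sample
csv_text = tsv_to_csv(mysql_output)
print(csv_text)
```

To write a file instead, open it with open(path, 'w', newline='') and pass that file object to csv.writer in place of the StringIO.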

Using Argparse to create file converter in Python

I have to use the command prompt and Python to receive a CSV file as input, read it, and convert it into an XML file with the same name as the CSV file but with a .xml extension; alternatively, the user can set the output file name and path using the optional -o/--output command-line argument. I have searched on Google for days. So far my program accepts command-line arguments and converts the CSV to an XML file, but it doesn't name the output after the CSV file or use the name the user sets. Instead, it just produces a blank file. Here is my code:
import sys, argparse
import csv
import indent
from xml.etree.ElementTree import ElementTree, Element, SubElement, Comment, tostring

parser=argparse.ArgumentParser(description='Convert wordlist text files to various formats.', prog='Text Converter')
parser.add_argument('-v','--verbose',action='store_true',dest='verbose',help='Increases messages being printed to stdout')
parser.add_argument('-c','--csv',action='store_true',dest='readcsv',help='Reads CSV file and converts to XML file with same name')
parser.add_argument('-x','--xml',action='store_true',dest='toxml',help='Convert CSV to XML with different name')
parser.add_argument('-i','--inputfile',type=argparse.FileType('r'),dest='inputfile',help='Name of file to be imported',required=True)
parser.add_argument('-o','--outputfile',type=argparse.FileType('w'),dest='outputfile',help='Output file name')
args = parser.parse_args()

def main(argv):
    reader = read_csv()
    if args.verbose:
        print ('Verbose Selected')
    if args.toxml:
        if args.verbose:
            print ('Convert to XML Selected')
        generate_xml(reader)
    if args.readcsv:
        if args.verbose:
            print ('Reading CSV file')
        read_csv()
    if not (args.toxml or args.readcsv):
        parser.error('No action requested')
        return 1

def read_csv():
    with open ('1250_12.csv', 'r') as data:
        return list(csv.reader(data))

def generate_xml(reader):
    root = Element('Solution')
    root.set('version','1.0')
    tree = ElementTree(root)
    head = SubElement(root, 'DrillHoles')
    head.set('total_holes', '238')
    description = SubElement(head,'description')
    current_group = None
    i = 0
    for row in reader:
        if i > 0:
            x1,y1,z1,x2,y2,z2,cost = row
            if current_group is None or i != current_group.text:
                current_group = SubElement(description, 'hole',{'hole_id':"%s"%i})
                collar = SubElement (current_group, 'collar',{'':', '.join((x1,y1,z1))}),
                toe = SubElement (current_group, 'toe',{'':', '.join((x2,y2,z2))})
                cost = SubElement(current_group, 'cost',{'':cost})
        i+=1
    indent.indent(root)
    tree.write(open('hole.xml','w'))

if (__name__ == "__main__"):
    sys.exit(main(sys.argv))
You can ignore the generate_xml() function; it expects CSV files formatted a certain way, so it may not make much sense out of context. I think the problem lies in tree.write(), since that part generates the XML file with a name hard-coded in the script rather than one taken from the command-line arguments.
You need to pass a file argument to generate_xml(). You appear to have the output file in args.outputfile.
generate_xml(reader, args.outputfile)
...
def generate_xml(reader, outfile):
    ...
    tree.write(outfile)
You should probably also make use of args.inputfile:
reader = read_csv(args.inputfile)
...
def read_csv(inputfile):
    return list(csv.reader(inputfile))
And this line does not do anything useful; it processes the .csv file but doesn't do anything with the results:
read_csv()
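A minimal sketch of the default-name logic, assuming plain string paths instead of argparse.FileType. FileType('w') opens (and truncates) the output file as soon as the arguments are parsed, which is a likely reason an empty file appears when nothing is ever written to it. With plain strings, the default name can be derived from the input name with os.path.splitext before anything is opened. The functions `build_parser` and `resolve_paths` are hypothetical.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser: -i is required, -o is optional."""
    parser = argparse.ArgumentParser(prog='Text Converter',
                                     description='Convert CSV files to XML.')
    parser.add_argument('-i', '--inputfile', required=True,
                        help='CSV file to read')
    parser.add_argument('-o', '--outputfile',
                        help='XML file to write (default: input name with .xml)')
    return parser


def resolve_paths(argv):
    """Return (input path, output path) for the given argument list."""
    args = build_parser().parse_args(argv)
    # Without -o, reuse the input name and swap its extension for .xml.
    output = args.outputfile or os.path.splitext(args.inputfile)[0] + '.xml'
    return args.inputfile, output


print(resolve_paths(['-i', '1250_12.csv']))  # ('1250_12.csv', '1250_12.xml')
print(resolve_paths(['-i', '1250_12.csv', '-o', 'out/holes.xml']))  # ('1250_12.csv', 'out/holes.xml')
```

In the real script you would call resolve_paths(sys.argv[1:]), open the input path for csv.reader, and pass the output path to tree.write().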
The following code has been adapted from FB36's recipe on code.activestate.com.
It does what you need, and you don't have to worry about the headers in the CSV file, though there should be only one header row (the first row). Have a look at the bottom of this page if you want to do batch conversion.
'''Convert csv to xml file

csv2xml.py takes two arguments:
1. csvFile: name of the csv file (may need to specify path to file)
2. xmlFile: name of the desired xml file (path to destination can be specified)

If only the csv file is provided, its name is used for the xml file.

Command line usage:
example1: python csv2xml.py 'fileName.csv' 'desiredName.xml'
example2: python csv2xml.py '/Documents/fileName.csv' '/NewFolder/desiredName.xml'
example3: python csv2xml.py 'fileName.csv'

This code has been adapted from: http://code.activestate.com/recipes/577423/
'''
import csv

def converter(csvFile, xmlFile):
    csvData = csv.reader(open(csvFile))
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<csv_data>' + "\n")
    rowNum = 0
    for row in csvData:
        if rowNum == 0:
            tags = row
            # replace spaces w/ underscores in tag names
            for i in range(len(tags)):
                tags[i] = tags[i].replace(' ', '_')
        else:
            xmlData.write('<row>' + "\n")
            for i in range(len(tags)):
                xmlData.write(' ' + '<' + tags[i] + '>' \
                              + row[i] + '</' + tags[i] + '>' + "\n")
            xmlData.write('</row>' + "\n")
        rowNum +=1
    xmlData.write('</csv_data>' + "\n")
    xmlData.close()

## for using csv2xml.py from the command line
if __name__ == '__main__':
    import sys
    if len(sys.argv)==2:
        import os
        csvFile = sys.argv[1]
        xmlFile = os.path.splitext(csvFile)[0] + '.xml'
        converter(csvFile,xmlFile)
    elif len(sys.argv)==3:
        csvFile = sys.argv[1]
        xmlFile = sys.argv[2]
        converter(csvFile,xmlFile)
    else:
        print __doc__
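The recipe concatenates cell values straight into the markup, so a value containing & or < would produce invalid XML. A minimal Python 3 sketch of the same <csv_data>/<row> layout built with xml.etree.ElementTree, which escapes text automatically when serializing; the function name `csv_to_xml_tree` and the sample data are hypothetical. To write a file, call tree.write(xmlFile, encoding='utf-8', xml_declaration=True).

```python
import csv
import xml.etree.ElementTree as ET


def csv_to_xml_tree(csv_lines):
    """Build <csv_data><row>...</row></csv_data>; the first CSV row gives the tag names."""
    reader = csv.reader(csv_lines)
    # replace spaces with underscores in tag names, as the recipe does
    tags = [name.replace(' ', '_') for name in next(reader)]
    root = ET.Element('csv_data')
    for row in reader:
        row_el = ET.SubElement(root, 'row')
        for tag, value in zip(tags, row):
            # ElementTree escapes &, < and > in text when serializing
            ET.SubElement(row_el, tag).text = value
    return ET.ElementTree(root)


# Hypothetical sample with characters that must be escaped in XML.
tree = csv_to_xml_tree(['hole id,cost\n', '1,5 & up\n', '2,<10\n'])
xml_text = ET.tostring(tree.getroot(), encoding='unicode')
print(xml_text)
```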
