Python: csv unexpected quotes in last element

I'm using Python 2.7.6 and learning to use the csv module. When I parsed my input file into a CSV output file, my script somehow added unexpected double quotes and several spaces after the last element of each line. I could not remove those double quotes with a regular substitution.
The only way that I could remove the extra double quote is to use:
tmp[1] = tmp[1][:-3]
I don't understand how the extra double quotes were added when I parsed my input. Please let me know why or how those double quotes were added to the partno field when the input file did not have them.
My code:
import re
import csv

fname = "inputfile"
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

domain = []
server = []
model = []
serial = []
dn = []
memsize = []
vid = []
partno = []

csv_out = open('/tmp/out.csv', 'wb')
writer = csv.writer(csv_out)

for line in fhand:
    words = line.split("; ")
    tmp_row_list = []
    for number in [0, 1, 2, 3, 4, 5, 6, 7]:
        tmp = words[number].split("=")
        if "}" in tmp[1]:
            tmp[1] = tmp[1][:-3]
            #tmp[1] = re.sub('}','', tmp[1])
        if number == 0: domain.append(tmp[1])
        if number == 1: server.append(tmp[1])
        if number == 2: model.append(tmp[1])
        if number == 3: serial.append(tmp[1])
        if number == 4: dn.append(tmp[1])
        if number == 5: memsize.append(tmp[1])
        if number == 6: vid.append(tmp[1])
        if number == 7: partno.append(tmp[1])

rows = zip(domain, server, model, serial, dn, memsize, vid, partno)
writer.writerows(rows)
csv_out.close()
Input file:
ffile:#{Ucs=uname; ServerId=4/6; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
ffile:#{Ucs=uname; ServerId=4/7; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
My bad output with the strange double quotes, before I removed them (if I uncomment the re.sub line instead, the double quotes and extra whitespace show up in the last field/element):
uname,4/6,UCSB-B200-M3,FCH,,98304,V06,"73-14689-04
"
uname,4/7,UCSB-B200-M3,FCH,,98304,V06,"73-14689-04
"

Looks like you can simplify that lot.
Given input.txt as:
ffile:#{Ucs=uname; ServerId=4/6; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
ffile:#{Ucs=uname; ServerId=4/7; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
Then using the following:
import re, csv

get_col_vals = re.compile(r'(?:\w+)=(.*?)[;}]').findall

with open('input.txt') as fin, open('output.csv', 'wb') as fout:
    csvout = csv.writer(fout, quoting=csv.QUOTE_NONE)
    csvout.writerows(get_col_vals(row) for row in fin)
The resulting output.csv is:
uname,4/6,UCSB-B200-M3,FCH,,98304,V06,73-14689-04
uname,4/7,UCSB-B200-M3,FCH,,98304,V06,73-14689-04
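As a quick sanity check, the compiled pattern can be exercised on one input line by itself:

```python
import re

get_col_vals = re.compile(r'(?:\w+)=(.*?)[;}]').findall

line = ("ffile:#{Ucs=uname; ServerId=4/6; Model=UCSB-B200-M3; Serial=FCH; "
        "AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}")

# Each KEY=VALUE pair is matched lazily up to the next ';' or '}',
# so empty values (AssignedToDn=) come out as empty strings
print(get_col_vals(line))
# ['uname', '4/6', 'UCSB-B200-M3', 'FCH', '', '98304', 'V06', '73-14689-04']
```

Because the terminating ';' or '}' is consumed by the match, no line terminator ever reaches the fields, which is why the output needs no quoting.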

Related

Saving formatted data separated by vertical bars (pipe-delimited) to a file in Python

# Get the filepath from the command line
import sys

F1 = sys.argv[1]
F2 = sys.argv[2]

"""load the two files for processing"""
action_log = []

#open first file
with open(F1, 'r') as accounts_file:
    main_log = accounts_file.read().splitlines()
split_main_log = [word.split('|') for word in main_log]

#open second file
with open(F2, 'r') as command_file:
    command_log = command_file.read().splitlines()
split_command_file = [word.split('|') for word in command_log]

for i in range(0, len(split_command_file)):
    if (split_main_log[i][1] == split_command_file[i][3] and split_command_file[i][0] == 'sub'):
        if split_main_log[i][2] >= split_command_file[i][1]:
            split_main_log[i][2] = int(split_main_log[i][2]) - int(split_command_file[i][1])
    elif split_main_log[i][1] == split_command_file[i][3] and split_command_file[i][0] == 'add':
        split_main_log[i][2] = int(split_main_log[i][2]) + int(split_command_file[i][1])

for i in range(0, len(split_main_log)):
    split_main_log[i] = str(split_main_log[i])

for i in range(0, len(split_main_log)):
    split_main_log[i] = '|'.join(split_main_log[i])

output_new = ""
output_new = "\n".join(split_main_log)

out_file = open(F1, 'w')  #openfile
out_file.write(output_new)
I am unsure of why my output has so many vertical bars. I'm just overlooking something and need another eye on it (been at it for a few hours). Any help would be awesome.
How about using the csv module and simply telling the writer to use the pipe sign as the delimiter?
import csv

with open("test.csv", "w", newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='|')
    csv_writer.writerow(["ColumnName1", "ColumnName2", "ColumnName3"])
    for i in listOfDictionary:
        csv_writer.writerow([i["key1"], i["key2"], i["key3"]])
The code inserts | between each character because it passes a str as an argument to join. The str.join method takes a sequence as an argument and returns a str which is each element of the sequence separated by the instance which called join. You may demonstrate this for yourself by running '|'.join('foobarbaz').
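That behaviour is easy to reproduce in isolation:

```python
# join treats a plain string as a sequence of characters,
# so the separator lands between every single character
print('|'.join('foobarbaz'))
# f|o|o|b|a|r|b|a|z

# joining a list of strings is what was intended
print('|'.join(['foo', 'bar', 'baz']))
# foo|bar|baz
```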
You can start by removing this cast to str:
for i in range(0, len(split_main_log)):
    split_main_log[i] = str(split_main_log[i])
This will likely break the str.join call since you're casting the values to int in the parent for loop. You'll need to cast those values to str after you're done with the arithmetic, like this:
for i in range(0, len(split_command_file)):
    if (split_main_log[i][1] == split_command_file[i][3] and split_command_file[i][0] == 'sub'):
        if split_main_log[i][2] >= split_command_file[i][1]:
            split_main_log[i][2] = str(int(split_main_log[i][2]) - int(split_command_file[i][1]))
    elif split_main_log[i][1] == split_command_file[i][3] and split_command_file[i][0] == 'add':
        split_main_log[i][2] = str(int(split_main_log[i][2]) + int(split_command_file[i][1]))
Please provide a minimal reproducible sample and I can help more. It's difficult to debug this without the input files.

How to search for a string in part of text?

I am trying to search multiple text files for the strings "1-2", "2-3" and "3-H", which occur in the last field of lines that start with "play".
An example of the text file is shown below.
id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
play,1,0,lemad001,22,CFBBX,HR/78/F
play,1,0,arenn001,20,BBX,S7/L+
play,1,0,stort001,12,SBCFC,K
play,1,0,gonzc001,02,SS>S,K
play,1,1,perad001,32,BTBBCX,S9/G
play,1,1,polla001,02,CSX,S7/L+.1-2
play,1,1,goldp001,32,SBFBBB,W.2-3;1-2
play,1,1,lambj001,00,X,D9/F+.3-H;2-H;1-3
play,1,1,avila001,31,BC*BBX,31/G.3-H;2-3
play,2,0,grayj003,12,CC*BS,K
play,2,1,dysoj001,31,BBCBX,43/G
play,2,1,corbp001,31,CBBBX,43/G
play,4,1,avila001,02,SC1>X,S8/L.1-2
For the text file above, I would like the output to be '4' since there are 4 occurrences of "1-2","2-3" and "3-H" in total.
The code I have got so far is below; however, I'm not sure where to start with writing the code for this.
import os

input_folder = 'files'  # path of folder containing the multiple text files

# create a list with file names
data_files = [os.path.join(input_folder, file) for file in
              os.listdir(input_folder)]

# open csv file for writing
csv = open('myoutput.csv', 'w')

def write_to_csv(line):
    print(line)
    csv.write(line)

j = 0  # initialise as 0
count_of_plate_appearances = 0  # initialise as 0

for file in data_files:
    with open(file, 'r') as f:  # use context manager to open files
        for line in f:
            lines = f.readlines()
            i = 0
            while i < len(lines):
                temp_array = lines[i].rstrip().split(",")
                if temp_array[0] == "id":
                    j = 0
                    count_of_plate_appearances = 0
                    game_id = temp_array[1]
                    awayteam = lines[i+2].rstrip().split(",")[2]
                    hometeam = lines[i+3].rstrip().split(",")[2]
                    date = lines[i+5].rstrip().split(",")[2]
                    # only check for plate appearances when temp_array[0] == "id"
                    for j in range(i+46, i+120, 1):
                        temp_array2 = lines[j].rstrip().split(",")  # create new array to check for plate appearances
                        if temp_array2[0] == "play" and temp_array2[2] == "1":  # plate appearance occurs when these are true
                            count_of_plate_appearances = count_of_plate_appearances + 1
                            #print(count_of_plate_appearances)
                    output_for_csv2 = (game_id, date, hometeam, awayteam, str(count_of_plate_appearances))
                    print(output_for_csv2)
                    csv.write(','.join(output_for_csv2) + '\n')
                    i = i + 1
                else:
                    i = i + 1
            j = 0
            count_of_plate_appearances = 0
            #quit()
csv.close()
Any suggestions on how I can do this? Thanks in advance!
You can use regex. I put your text in a file called file.txt.
import re

a = ['1-2', '2-3', '3-H']  # What you want to count
find_this = re.compile('|'.join(a))  # Make search string

count = 0
with open('file.txt', 'r') as f:
    for line in f.readlines():
        count += len(find_this.findall(line))  # Each findall returns the list of things found

print(count)  # 7
or a shorter solution: (Credit to wjandrea for hinting the use of a generator)
import re

a = ['1-2', '2-3', '3-H']  # What you want to count
find_this = re.compile('|'.join(a))  # Make search string

with open('file.txt', 'r') as f:
    count = sum(len(find_this.findall(line)) for line in f)

print(count)  # 7
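One caveat: joining the tokens with '|' only works here because '1-2', '2-3' and '3-H' happen to contain no regex metacharacters. For arbitrary tokens it is safer to escape each one first; a sketch:

```python
import re

a = ['1-2', '2-3', '3-H']

# re.escape makes each token safe to embed in the alternation,
# even if it contains characters like '.', '+' or '('
find_this = re.compile('|'.join(map(re.escape, a)))

print(find_this.findall('play,1,1,goldp001,32,SBFBBB,W.2-3;1-2'))
# ['2-3', '1-2']
```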

Read CSV file and filter results

I'm writing a script where one of its functions is to read a CSV file that contains URLs in one of its columns. Unfortunately the system that creates those CSVs doesn't put double quotes around values in the URL column, so when a URL contains commas it breaks all my csv parsing.
This is the code I'm using:
with open(accesslog, 'r') as csvfile, open('results.csv', 'w') as enhancedcsv:
    reader = csv.DictReader(csvfile)
    for row in reader:
        self.uri = (row['URL'])
        self.OriCat = (row['Category'])
        self.query(self.uri)
        print self.URL + "," + self.ServerIP + "," + self.OriCat + "," + self.NewCat
This is a sample URL that is breaking up the parsing - this URL comes on the row named "URL". (note the commas at the end)
ams1-ib.adnxs.com/ww=1238&wh=705&ft=2&sv=43&tv=view5-1&ua=chrome&pl=mac&x=1468251839064740641,439999,v,mac,webkit_chrome,view5-1,0,,2,
The field following the URL always comes with a numeric value between parentheses, e.g. (9999), so this could be used to determine where the URL with commas ends.
How can i deal with a situation like this using the csv module?
You will have to do it a little more manually. Try this
def process(lines, delimiter=','):
    header = None
    url_index_from_start = None
    url_index_from_end = None
    for line in lines:
        if not header:
            header = [l.strip() for l in line.split(delimiter)]
            url_index_from_start = header.index('URL')
            url_index_from_end = len(header) - url_index_from_start
        else:
            data = [l.strip() for l in line.split(delimiter)]
            url_from_start = url_index_from_start
            url_from_end = len(data) - url_index_from_end
            values = data[:url_from_start] + data[url_from_end+1:] + [delimiter.join(data[url_from_start:url_from_end+1])]
            keys = header[:url_index_from_start] + header[url_index_from_end+1:] + [header[url_index_from_start]]
            yield dict(zip(keys, values))
Usage:
lines = ['Header1, Header2, URL, Header3',
         'Content1, "Content2", abc,abc,,abc, Content3']

result = list(process(lines))

assert result[0]['Header1'] == 'Content1'
assert result[0]['Header2'] == '"Content2"'
assert result[0]['Header3'] == 'Content3'
assert result[0]['URL'] == 'abc,abc,,abc'

print(result)
Result:
>>> [{'URL': 'abc,abc,,abc', 'Header2': '"Content2"', 'Header3': 'Content3', 'Header1': 'Content1'}]
Have you considered using Pandas to read your data in?
Another possible solution would be to use regular expressions to pre-process the data...
# read the file once, before searching it
f = open(filein, 'r')
filedata = f.read()
f.close()

#make a list of everything you want to change
old = re.findall(regex, filedata)

#append quotes and create a new list
new = []
for url in old:
    url2 = "\"" + url + "\""
    new.append(url2)

#combine the lists
old_new = list(zip(old, new))

#Then use the list to update the file, accumulating the replacements
#instead of restarting from the original filedata each time
newdata = filedata
for old, new in old_new:
    newdata = newdata.replace(old, new)

f = open(filein, 'w')
f.write(newdata)
f.close()

Reading comma separated values from text file in python

I have a text file consisting of 100 records like
fname,lname,subj1,marks1,subj2,marks2,subj3,marks3.
I need to extract and print lname and marks1+marks2+marks3 in python. How do I do that?
I am a beginner in python.
Please help
When I used split, I got an error saying
TypeError: Can't convert 'type' object to str implicitly.
The code was
import sys

file_name = sys.argv[1]
file = open(file_name, 'r')
for line in file:
    fname = str.split(str=",", num=line.count(str))
    print fname
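For what it's worth, the error comes from `line.count(str)`: `str` there is the built-in type object, not a string, and `count` refuses it (the exact message varies by Python version). A minimal reproduction, together with what was probably intended:

```python
line = "fname,lname,subj1"

try:
    line.count(str)  # passing the type `str` itself, not a string
except TypeError as e:
    print("TypeError:", e)

# what was intended: split the line on the comma
print(line.split(','))
# ['fname', 'lname', 'subj1']
```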
If you want to do it that way, you were close. Is this what you were trying?
file = open(file_name, 'r')
for line in file.readlines():
    fname = line.rstrip().split(',')  #using rstrip to remove the \n
    print fname
Note: this is untested code, but it should solve your problem. Please give it a try:
import csv

with open(file_name, 'rb') as csvfile:
    marksReader = csv.reader(csvfile)
    for row in marksReader:
        if len(row) < 8:  # 8 is the number of columns in your file.
            # row has some missing columns or is empty
            continue
        # Unpack columns of row; you can also do fname = row[0], lname = row[1] and so on ...
        (fname, lname, subj1, marks1, subj2, marks2, subj3, marks3) = row
        # you can use float in place of int if marks contain decimals
        totalMarks = int(marks1) + int(marks2) + int(marks3)
        print '%s %s scored: %s' % (fname, lname, totalMarks)

print 'End.'
"""
sample file content
poohpool#signet.com; meixin_kok#hotmail.com; ngai_nicole#hotmail.com; isabelle_gal#hotmail.com; michelle-878#hotmail.com;
valerietan98#gmail.com; remuskan#hotmail.com; genevieve.goh#hotmail.com; poonzheng5798#yahoo.com; burgergirl96#hotmail.com;
insyirah_powergals#hotmail.com; little_princess-angel#hotmail.com; ifah_duff#hotmail.com; tweety_butt#hotmail.com;
choco_ela#hotmail.com; princessdyanah#hotmail.com;
"""
import pandas as pd
file = open('emaildump.txt', 'r')
for line in file.readlines():
fname = line.split(';') #using split to form a list
#print(fname)
df1 = pd.DataFrame(fname,columns=['Email'])
print(df1)

Problems with Python's file.write() method and string handling

The problem I am having (being new to Python) is writing strings to a text file. The issue is that either the strings don't have line breaks between them, or there is a line break after every character. Code to follow:
import string, io

FileName = input("Arb file name (.txt): ")
MyFile = open(FileName, 'r')
TempFile = open('TempFile.txt', 'w', encoding='UTF-8')
for m_line in MyFile:
    m_line = m_line.strip()
    m_line = m_line.split(": ", 1)
    if len(m_line) > 1:
        del m_line[0]
    #print(m_line)
    MyString = str(m_line)
    MyString = MyString.strip("'[]")
    TempFile.write(MyString)
MyFile.close()
TempFile.close()
MyFile.close()
TempFile.close()
My input looks like this:
1 Jargon
2 Python
3 Yada Yada
4 Stuck
My output when I do this is:
JargonPythonYada YadaStuck
I then modify the source code to this:
import string, io

FileName = input("Arb File Name (.txt): ")
MyFile = open(FileName, 'r')
TempFile = open('TempFile.txt', 'w', encoding='UTF-8')
for m_line in MyFile:
    m_line = m_line.strip()
    m_line = m_line.split(": ", 1)
    if len(m_line) > 1:
        del m_line[0]
    #print(m_line)
    MyString = str(m_line)
    MyString = MyString.strip("'[]")
    #print(MyString)
    TempFile.write('\n'.join(MyString))
MyFile.close()
TempFile.close()
Same input and my output looks like this:
J
a
r
g
o
nP
y
t
h
o
nY
a
d
a
Y
a
d
aS
t
u
c
k
Ideally, I would like each of the words to appear on a separate line without the numbers in front of them.
Thanks,
MarleyH
You have to write the '\n' after each line, since you're stripping the original '\n'.
Your idea of using '\n'.join() doesn't work because it uses '\n' to join the string, inserting it between each character of the string. You need a single '\n' after each name instead.
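A quick illustration of the difference:

```python
name = "Jargon"

# join inserts the separator between every character of a single string,
# producing one character per line
print('\n'.join(name))
# J
# a
# r
# g
# o
# n

# appending one newline after the whole string is what's wanted here
print(name + '\n', end='')
# Jargon
```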
import string, io

FileName = input("Arb file name (.txt): ")
with open(FileName, 'r') as MyFile:
    with open('TempFile.txt', 'w', encoding='UTF-8') as TempFile:
        for line in MyFile:
            line = line.strip().split(": ", 1)
            TempFile.write(line[1] + '\n')
fileName = input("Arb file name (.txt): ")
tempName = 'TempFile.txt'
with open(fileName) as inf, open(tempName, 'w', encoding='UTF-8') as outf:
    for line in inf:
        line = line.strip().split(": ", 1)[-1]
        #print(line)
        outf.write(line + '\n')
Problems:
the result of str.split() is a list (this is why, when you cast it to str, you get ['my item']).
write does not add a newline; if you want one, you have to add it explicitly.
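Both points can be verified in a couple of lines:

```python
parts = "1: Jargon".split(": ", 1)

print(parts)
# ['1', 'Jargon']  -- a list, not a string

print(str(parts))
# ['1', 'Jargon']  -- casting the list to str keeps the brackets and quotes,
#                     which is where the stray characters come from

# write() emits exactly what it is given, so the newline must be explicit
out_line = parts[-1] + '\n'
```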
