IndexError: list index out of range with csv reader in Python

I have the following CSV called report.csv (it was saved from Excel):
email agent_id misc
test#email.com 65483843154f35d54 blah1
test1#email.com sldd989eu99ufj9ej9e blah 2
I have the following code:
import csv

data_file = 'report.csv'

def import_data(data_file):
    attendee_data = csv.reader(open(data_file, 'rU'), dialect=csv.excel_tab)
    for row in attendee_data:
        email = row[1]
        agent_id = row[2]
        pdf_file_name = agent_id + '_' + email + '.pdf'
        generate_certificate(email, agent_id, pdf_file_name)
I get the following error:
Traceback (most recent call last):
File "report_test.py", line 56, in <module>
import_data(data_file)
File "report_test.py", line 25, in import_data
email = row[1]
IndexError: list index out of range
I thought the index was the column number within each row. row[1] and row[2] should be within range, no?

There is most likely a blank line in your CSV file. Also, list indices start at 0, not 1.
import csv

data_file = 'report.csv'

def import_data(data_file):
    attendee_data = csv.reader(open(data_file, 'rU'), dialect=csv.excel_tab)
    for row in attendee_data:
        try:
            email = row[0]
            agent_id = row[1]
        except IndexError:
            pass
        else:
            pdf_file_name = agent_id + '_' + email + '.pdf'
            generate_certificate(email, agent_id, pdf_file_name)
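If you'd rather skip blank lines up front than catch the exception, here is a minimal sketch (same file and the same hypothetical generate_certificate as above):

import csv

def import_data(data_file):
    with open(data_file, 'rU') as f:
        for row in csv.reader(f, dialect=csv.excel_tab):
            if not row:  # csv.reader yields an empty list for blank lines
                continue
            email, agent_id = row[0], row[1]
            pdf_file_name = agent_id + '_' + email + '.pdf'
            generate_certificate(email, agent_id, pdf_file_name)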

You say you have an "Excel CSV", which I don't quite understand, so I'll answer assuming you have an actual .csv file.
If I'm loading a .csv into memory (and the file isn't enormous), I'll often have a load_file method on my class that doesn't care about indexes.
Assuming the file has a header row:
import csv

def load_file(filename):
    # Define data in case the file is empty.
    data = []
    with open(filename) as csvfile:
        reader = csv.reader(csvfile)
        headers = next(reader)
        data = [dict(zip(headers, row)) for row in reader]
    return data
This returns a list of dictionaries you can access by key instead of by index. If a value is missing from a row (say misc, at index 2), the key will simply be absent, so use .get on the row. This is cleaner than a try...except.
for row in data:
    email = row.get('email')
    agent_id = row.get('agent_id')
    misc = row.get('misc')
This way the order of the file columns doesn't matter, only the headers do. Also, if any of the columns have a blank value, your script won't error out with an IndexError. If you don't want to include blank values, simply handle them with a check:
if not email:
    do.something()
if not agent_id:
    do.something_else()
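For what it's worth, the standard library's csv.DictReader does essentially the same header-to-dict mapping, so load_file could be reduced to a sketch like this (one difference: DictReader fills missing trailing values with None rather than omitting the key, which still works fine with .get):

import csv

def load_file(filename):
    with open(filename) as csvfile:
        # DictReader uses the first row as the keys for every following row
        return list(csv.DictReader(csvfile))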

Related

extracting row from CSV file with Python / Django

Hey, I'm trying to extract a certain row from a CSV file with content in this form:
POS,Transaction id,Product,Quantity,Customer,Date
1,E100,TV,1,Test Customer,2022-09-19
2,E100,Laptop,3,Test Customer,2022-09-20
3,E200,TV,1,Test Customer,2022-09-21
4,E300,Smartphone,2,Test Customer,2022-09-22
5,E300,Laptop,5,New Customer,2022-09-23
6,E300,TV,1,New Customer,2022-09-23
7,E400,TV,2,ABC,2022-09-24
8,E500,Smartwatch,4,ABC,2022-09-25
The code I wrote is the following:
def csv_upload_view(request):
    print('file is being uploaded')
    if request.method == 'POST':
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)
        with open(obj.file_name.path, 'r') as f:
            reader = csv.reader(f)
            reader.__next__()
            for row in reader:
                data = "".join(row)
                data = data.split(";")
                #data.pop()
                print(data[0], type(data))
                transaction_id = data[0]
                product = data[1]
                quantity = int(data[2])
                customer = data[3]
                date = parse_date(data[4])
In the console then I get the following output:
Quit the server with CONTROL-C.
[22/Sep/2022 15:16:28] "GET /reports/from-file/ HTTP/1.1" 200 11719
file is being uploaded
1E100TV1Test Customer2022-09-19 <class 'list'>
So I get the correct row, but everything is concatenated. If I instead use a space in " ".join(row), I get the entire row separated by spaces. What I would like to do is access the row with:
transaction_id = data[0]
product = data[1]
quantity = int(data[2])
customer = data[3]
date = parse_date(data[4])
but I always get an
IndexError: list index out of range
I also tried data.replace(" ", ";"), but this gives another error, and the data becomes a string instead of a list:
ValueError: invalid literal for int() with base 10: 'E'
Can someone please show me what I'm missing here?
I'm not sure why you are joining and re-splitting the row. And do you realize your split is using a semicolon, while the file is comma-separated?
I would expect something like this:
import csv
from collections import namedtuple

Transaction = namedtuple('Transaction', ['id', 'product', 'qty', 'customer', 'date'])

f_name = 'data.csv'
transactions = []  # to hold the result

with open(f_name, 'r') as src:
    src.readline()  # burn the header row
    reader = csv.reader(src)  # if you want to use csv reader
    for data in reader:
        # print(data) <-- to see what the csv reader gives you...
        t = Transaction(data[1], data[2], int(data[3]), data[4], data[5])
        transactions.append(t)

for t in transactions:
    print(t)
The above "catches" results with a namedtuple, which is obviously optional. You could put them in lists, etc.
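With the sample data above, the first record printed would look like:

Transaction(id='E100', product='TV', qty=1, customer='Test Customer', date='2022-09-19')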
Also csv.reader will do the splitting (by comma) by default. I edited my previous answer.
As far as your question goes... You mention extracting a "certain row" but you gave no indication of how you would find that row. If you know the row index/number, you could burn lines with readline or just keep a counter while you read (see the sketch below). If you are looking for a keyword in the data, just add a conditional either before or after splitting up the line.
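A rough sketch of the counter approach, assuming the row you want is at a known 0-based position target after the header:

import csv

target = 3  # hypothetical row number to extract

with open('data.csv', 'r') as src:
    src.readline()  # burn the header row
    for i, data in enumerate(csv.reader(src)):
        if i == target:
            print(data)
            break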
This way you can split the rows and find the one you want based on some provided value:
import csv

with open('data.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        # Line 0 is the header
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            line_count += 1
            # Here you can check if the row value equals what you're looking for
            # row[0] = POS
            # row[1] = Transaction id
            # row[2] = Product
            # row[3] = Quantity
            # row[4] = Customer
            # row[5] = Date
            if row[2] == "TV":
                # If you want to add all variables into a single string:
                data = ",".join(row)
                # Make each column into its own variable:
                transaction_id = row[1]
                product = row[2]
                quantity = row[3]
                customer = row[4]
                date = row[5]

Txt file to Excel conversion in Python

I'm trying to convert a text file to an Excel sheet in Python. The txt file contains data in the format specified below.
Column names: reg no, zip code, loc id, emp id, lastname, first name. Each record has one or more error numbers, and each record has its column names listed above its values. I would like to create an Excel sheet containing reg no, firstname, lastname, and the errors listed in separate rows for each record.
How can I put the records in an Excel sheet? Should I be using regular expressions? And how can I insert the error numbers in different rows for the corresponding record?
Here is the link to the input file:
https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt
Any code snippets or suggestions are kindly appreciated.
Here is a draft. Let me know if any changes are needed:
# import pandas as pd
from collections import OrderedDict
from datetime import date
import csv

with open('in.txt') as f:
    with open('out.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        # Remove initial clutter
        while("INPUT DATA" not in f.readline()):
            continue
        header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
        data = list()
        errors = list()
        spamwriter.writerow(header)
        print header
        while(True):
            line = f.readline()
            errors = list()
            if("END" in line):
                exit()
            try:
                int(line.split()[0])
                data = line.strip().split()
                f.readline()  # get rid of \n
                line = f.readline()
                while("ERROR" in line):
                    errors.append(line.strip())
                    line = f.readline()
                spamwriter.writerow(data + errors)
                csvfile.flush()  # csv.writer has no flush(); flush the underlying file instead
            except:
                continue
Run it with Python 2. The errors are appended as subsequent columns. Getting exactly the layout you want is slightly more involved; I can fix it if still needed.
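If you need to run it under Python 3 instead, the main changes (a sketch, not part of the original answer) are the csv file mode and the print call:

# Python 3: open the csv output in text mode with newline='' instead of 'wb'
with open('out.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
# ...and use print(header) instead of `print header`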
You can do this using the openpyxl library, which is capable of depositing items directly into a spreadsheet. This code shows how to do that for your particular situation.
NEW_PERSON, ERROR_LINE = 1, 2

def Line_items():
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue
            items = line.split()
            if items[0].isnumeric():
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line
            else:
                continue

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'

row = 2
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]
        ws['C{:d}'.format(row)] = data[-1]
        first = True
    else:
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data

wb.save(filename='katherine.xlsx')

Python program to iterate a CSV file, match a field, and save the result in a different data file

I am trying to write a program to do the following:
Specify a field from a record in a CSV file called data.
Specify a field from a record in a CSV file called log.
Compare the position of the two in data and log. If they are on the same line, write the record from the log file into a new file called result.
If the field does not match the record position in the log file, move to the next record in the log file and compare until a matching record is found, then save that record in the file called result.
Reset the index of the log file.
Go to the next line in the data file and repeat the verification until the data file reaches the end.
This is what I was able to do, but I am stuck:
import csv

def main():
    datafile_csv = open('data.txt')
    logfile_csv = open('log.txt')
    row_data = []
    row_log = []
    row_log_temp = []
    index_data = 1
    index_log = 1
    index_log_temp = index_log
    counter = 0
    data = ''
    datareader = ''
    logreader = ''
    log = ''
    # row = 0
    logfile_len = sum(1 for lines in open('log.txt'))
    with open('resultfile.csv', 'w') as csvfile:
        out_write = csv.writer(csvfile, delimiter=',', quotechar='"')
        with open('data.txt', 'r') as data:
            row_data = csv.reader(csvfile, delimiter=',', quotechar='"')
            row_data = next(data)
            print(row_data)
            with open('log.txt', 'r') as log:
                row_log = next(log)
                print(row_log)
                while counter != logfile_len:
                    comp_data = row_data[index_data:]
                    comp_log = row_log[index_log:]
                    comp_data = comp_data.strip('"')
                    comp_log = comp_log.strip('"')
                    print(row_data[1])
                    print(comp_data)
                    print(comp_log)
                    if comp_data != comp_log:
                        while comp_data != comp_log:
                            row_log = next(log)
                            comp_log = row_log[index_log]
                        out_write.writerow(row_log)
                        row_data = next(data)
                    else:
                        out_write.writerow(row_log)
                        row_data = next(data)
                        log.seek(0)
                    counter += 1
The problems I have are the following:
I cannot convert the data line into a string properly, so I cannot compare correctly.
I also need to be able to reset the pointer in the log file, but seek does not seem to be working.
This is the content of the data file
"test1","test2","test3"
"1","2","3"
"4","5","6"
This is the content of the log file
"test1","test2","test3"
"4","5","6"
"1","2","3"
This is what the interpreter returns:
t
"test1","test2","test3"
t
test1","test2","test3"
test1","test2","test3"
1
1","2","3"
test1","test2","test3"
Traceback (most recent call last):
File "H:/test.py", line 100, in <module>
main()
File "H:/test.py", line 40, in main
comp_log = row_log[index_log]
IndexError: string index out of range
Thank you very much for the help
Regards
Danilo
This joins the two files by columns (the row number plus a specific column of your choosing) and returns the result limited to the columns of the left/first file:
import petl
log = petl.fromcsv('log.txt').addrownumbers() # Load csv/txt file into PETL table, and add row numbers
log_columns = len(petl.header(log)) # Get the amount of columns in the log file
data = petl.fromcsv('data.txt').addrownumbers() # Load csv/txt file into PETL table, and add row numbers
joined_files = petl.join(log, data, key=['row', 'SpecificField']) # Join the tables using row and a specific field
joined_files = petl.cut(joined_files, *range(1, log_columns)) # Remove the extra columns obtained from right table
petl.tocsv(joined_files, 'resultfile.csv') # Output results to csv file
Also, do not forget to pip install petl (version used for this example):
pip install petl==1.0.11
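If you'd rather avoid the extra dependency, here is a rough stdlib-only sketch of the same idea. It assumes both files have a header row and matches on the first field, a stand-in for the unspecified 'SpecificField':

import csv

with open('data.txt') as d, open('log.txt') as l:
    data_rows = list(csv.reader(d))[1:]  # skip header row
    log_rows = list(csv.reader(l))[1:]

with open('resultfile.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for i, data_row in enumerate(data_rows):
        # same line and matching field: take the log record as-is
        if i < len(log_rows) and log_rows[i][0] == data_row[0]:
            writer.writerow(log_rows[i])
        else:
            # otherwise scan the log for the first record with a matching field
            for log_row in log_rows:
                if log_row[0] == data_row[0]:
                    writer.writerow(log_row)
                    break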

Find duplicates in a column, then add values in adjacent column

I have a CSV file that has a one-word title and a description that is always a number.
My current code extracts just the title and description to another CSV file and then converts the CSV into an Excel file.
import csv
import output

f = open("Johnny_Test-punch_list.csv")
csv_f = csv.reader(f)
m = open('data.csv', "w")

for row in csv_f:
    m.write(row[1])
    m.write(",")
    m.write(row[3])
    m.write("\n")

m.close()
output.toxlsx()
How can I look for matching Titles and then add the descriptions of the titles?
import csv
import output

f = open("Johnny_Test-punch_list.csv")
csv_f = csv.reader(f)
m = open('data.csv', "w")

dict_out = {}
for row in csv_f:
    if row[1] in dict_out:
        dict_out[row[1]] += row[3]  # note: this is string concatenation, not numeric addition
    else:
        dict_out[row[1]] = row[3]

for title, value in dict_out.items():  # .items() works on both Python 2 and 3
    m.write('{},{}\n'.format(title, value))
m.close()
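If "add the descriptions" means a numeric sum rather than string concatenation (a guess, since the description is always a number), convert before adding:

dict_out = {}
for row in csv_f:
    # accumulate a numeric total per title
    dict_out[row[1]] = dict_out.get(row[1], 0) + int(row[3])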
If I understood you correctly, you need to write each record as a single comma-separated string per line.
Can you try the code below?
for row in csv_f:
    m.write(row[1] + "," + str(row[3]) + "\n")

Reading comma separated values from text file in python

I have a text file consisting of 100 records like:
fname,lname,subj1,marks1,subj2,marks2,subj3,marks3
I need to extract and print lname and marks1+marks2+marks3 in Python. How do I do that?
I am a beginner in Python. Please help.
When I used split, I got an error saying:
TypeError: Can't convert 'type' object to str implicitly
The code was:
import sys

file_name = sys.argv[1]
file = open(file_name, 'r')
for line in file:
    fname = str.split(str=",", num=line.count(str))
    print fname
If you want to do it that way, you were close. Is this what you were trying?
file = open(file_name, 'r')
for line in file.readlines():
    fname = line.rstrip().split(',')  # using rstrip to remove the \n
    print fname
Note: this isn't tested code, but it tries to solve your problem. Please give it a try:
import csv

with open(file_name, 'rb') as csvfile:
    marksReader = csv.reader(csvfile)
    for row in marksReader:
        if len(row) < 8:  # 8 is the number of columns in your file.
            # row has some missing columns or is empty
            continue
        # Unpack the columns of the row; you can also do fname = row[0], lname = row[1], and so on ...
        (fname, lname, subj1, marks1, subj2, marks2, subj3, marks3) = row
        # you can use float in place of int if marks contain decimals
        totalMarks = int(marks1) + int(marks2) + int(marks3)
        print '%s %s scored: %s' % (fname, lname, totalMarks)

print 'End.'
"""
sample file content
poohpool#signet.com; meixin_kok#hotmail.com; ngai_nicole#hotmail.com; isabelle_gal#hotmail.com; michelle-878#hotmail.com;
valerietan98#gmail.com; remuskan#hotmail.com; genevieve.goh#hotmail.com; poonzheng5798#yahoo.com; burgergirl96#hotmail.com;
insyirah_powergals#hotmail.com; little_princess-angel#hotmail.com; ifah_duff#hotmail.com; tweety_butt#hotmail.com;
choco_ela#hotmail.com; princessdyanah#hotmail.com;
"""
import pandas as pd
file = open('emaildump.txt', 'r')
for line in file.readlines():
fname = line.split(';') #using split to form a list
#print(fname)
df1 = pd.DataFrame(fname,columns=['Email'])
print(df1)
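For what it's worth, pandas can also read the whole file in one call; a sketch assuming the same semicolon-separated layout (and that no line has more fields than the first):

import pandas as pd

# sep=';' splits each line on semicolons; header=None because there is no header row
df = pd.read_csv('emaildump.txt', sep=';', header=None)
print(df)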
