Python 2: IndexError: list index out of range

I have an "asin.txt" document:
in,Huawei1,DE
out,Huawei2,UK
out,Huawei3,none
in,Huawei4,FR
in,Huawei5,none
in,Huawei6,none
out,Huawei7,IT
I open this file and build two OrderedDicts:
import csv
from collections import OrderedDict

reader = csv.reader(open('asin.txt','r'), delimiter=',')
reader1 = csv.reader(open('asin.txt','r'), delimiter=',')
d = OrderedDict((row[0], row[1].strip()) for row in reader)
d1 = OrderedDict((row[1], row[2].strip()) for row in reader1)
Then I want to unpack variables (a, b, c, d) so that, for the first line of asin.txt, a = in, b = Huawei1, c = Huawei1, d = DE. To do this I'm using a "for" loop:
from itertools import izip
for (a, b), (c, d) in izip(d.items(), d1.items()): # here
    try:
        .......
It worked before, but now, for some reason, it prints an error:
d = OrderedDict((row[0], row[1].strip()) for row in reader)
IndexError: list index out of range
How do I fix that?

You probably have a row in your text file that does not contain at least two fields delimited by ",". E.g.:
in,Huawei1
Try to find the solution along these lines:
d = OrderedDict((row[0], row[1].strip()) for row in reader if len(row) >= 2)
or
l = []
for row in reader:
    if len(row) >= 2:
        l.append((row[0], row[1].strip()))
d = OrderedDict(l)
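To see the failure mode and the guard in action, here is a minimal, self-contained sketch (shown in Python 3 syntax; it uses an inline string via io.StringIO instead of the real asin.txt, whose exact contents are an assumption):

```python
import csv
import io

# Hypothetical data mimicking asin.txt, plus one short row ("out")
# that would raise IndexError on row[1] without the length guard.
sample = "in,Huawei1,DE\nout\nout,Huawei7,IT\n"

reader = csv.reader(io.StringIO(sample), delimiter=',')
pairs = [(row[0], row[1].strip()) for row in reader if len(row) >= 2]
print(pairs)  # [('in', 'Huawei1'), ('out', 'Huawei7')] -- the short row is skipped
```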

Related

Finding Column number in CSV using python

I want to match a certain string in a CSV file and return the column of that string within the file. For example:
import csv
data = ['a','b','c'],['d','e','f'],['h','i','j']
For example, if I'm looking for the word e, I want it to return [1], as it is in the second column.
A solution using a csv.reader object and the enumerate function (to get index/value pairs):
import csv

def get_column(file, word):
    with open(file) as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            for k, v in enumerate(row):
                if v == word:
                    return k  # return immediately to avoid further loop iterations

search_word = 'e'
print(get_column("data/sample.csv", search_word))  # "data/sample.csv" is an example file path
The output:
1
I am not sure why you need csv in this example.
>>> data = ['a','b','c'],['d','e','f'],['h','i','j']
>>> string = 'e'
>>> for idx, lst in enumerate(data):
...     if string in lst:
...         print idx
...
1
A variation of wolendranh's answer:
>>> data = ['a','b','c'],['d','e','f'],['h','i','j']
>>> word = 'e'
>>> for row in data:
...     try:
...         print(row.index(word))
...     except ValueError:
...         continue
Try the following:
>>> data_list = [['a','b','c'],['d','e','f'],['h','i','j']]
>>> col2_list = []
>>>
>>> for d in data_list:
...     col2 = d[1]
...     col2_list.append(col2)
So in the end you get a list with all the values of column [1]:
col2_list = ["b","e","i"]

Finding duplicates in each row and column

The function needs to be able to check a file for duplicates in each row and column.
Example of file with duplicates:
A B C
A A B
B C A
As you can see, row 2 contains a duplicate (two A's), and so does column 1 (two A's).
code:
def duplication_char(dc):
    with open(dc, "r") as duplicatechars:
        linecheck = duplicatechars.readlines()
        linecheck = [line.split() for line in linecheck]
        for row in linecheck:
            if len(set(row)) != len(row):
                print ("duplicates", " ".join(row))
        for column in zip(*linecheck):
            if len(set(column)) != len(column):
                print ("duplicates", " ".join(column))
Well, here is how I would do it.
First, read your files and create a 2d numpy array with the content:
import numpy

with open('test.txt', 'r') as fil:
    lines = fil.readlines()
lines = [line.strip().split() for line in lines]
arr = numpy.array(lines)
Then, check if each row has duplicates using sets (a set has no duplicates, so if the length of the set is different than the length of the array, the array has duplicates):
for row in arr:
    if len(set(row)) != len(row):
        print 'Duplicates in row: ', row
Then, check if each column has duplicates using sets, by transposing your numpy array:
for col in arr.T:
    if len(set(col)) != len(col):
        print 'Duplicates in column: ', col
If you wrap all of this in a function:
def check_for_duplicates(filename):
    import numpy
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    arr = numpy.array(lines)
    for row in arr:
        if len(set(row)) != len(row):
            print 'Duplicates in row: ', row
    for col in arr.T:
        if len(set(col)) != len(col):
            print 'Duplicates in column: ', col
As suggested by Apero, you can also do this without numpy using zip (https://docs.python.org/3/library/functions.html#zip):
def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    for row in lines:
        if len(set(row)) != len(row):
            print 'Duplicates in row: ', row
    for col in zip(*lines):
        if len(set(col)) != len(col):
            print 'Duplicates in column: ', col
In your example, this code prints:
# Duplicates in row: ['A' 'A' 'B']
# Duplicates in column: ['A' 'A' 'B']
You can have a List of Lists and use zip to transpose it.
Given your example, try:
from collections import Counter

with open(fn) as fin:
    data = [line.split() for line in fin]

rowdups = {}
coldups = {}
for d, m in ((rowdups, data), (coldups, zip(*data))):
    for i, sl in enumerate(m):
        count = Counter(sl)
        for c in count.most_common():
            if c[1] > 1:
                d.setdefault(i, []).append(c)
>>> rowdups
{1: [('A', 2)]}
>>> coldups
{0: [('A', 2)]}

How to organise columnar data into a network table

I have a tab-delimited text file with two columns. I need to find a way that prints all values that "hit" each other to one line.
For example, my input looks like this:
A B
A C
A D
B C
B D
C D
B E
D E
B F
C F
F G
F H
H I
K L
My desired output should look like this:
A B C D
B D E
B C F
F G H
H I
K L
My actual data file is much larger than this if that makes any difference. I would prefer to do this in Unix or Python where possible.
Can anybody help?
Thanks in advance!
Is there no way to provide the input file as a .csv? It would make parsing the delimiters easier.
If that isn't possible, try the next example:
from itertools import groupby
from operator import itemgetter

with open('example.txt','rb') as txtfile:
    cleaned = []
    # store file information in a list of lists
    for line in txtfile.readlines():
        cleaned.append(line.split())
    # group by first element of nested list
    for elt, items in groupby(cleaned, itemgetter(0)):
        row = [elt]
        for item in items:
            row.append(item[1])
        print row
Hope it helps you.
Solution using a .csv file:
from itertools import groupby
from operator import itemgetter
import csv

cleaned = []
with open('example.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        cleaned.append(row)
# group by first element of nested list
for elt, items in groupby(cleaned, itemgetter(0)):
    row = [elt]
    for item in items:
        row.append(item[1])
    print row

Adding column in CSV python and enumerating it

My CSV file looks like this:
John,Bomb,Dawn
3,4,5
3,4,5
3,4,5
I want to add ID column in front like so:
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5
using enumerate function, but I don't know how. Here's my code so far:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.append('ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        while count:
            all.append(row)
            row.append(enumerate(reader, 1))
            break
    writer.writerows(all)
And the output comes all wrong:
John,Bomb,Dawn,ID
3,4,5,<enumerate object at 0x7fb2a5728d70>
3,4,5,<enumerate object at 0x1764370>
3,4,5,<enumerate object at 0x17643c0>
So the ID comes in the end, when it should be in the start, and it doesn't even do the 1,2,3. Some weird error comes out.
I can suggest the code below to solve your question:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    for k, row in enumerate(reader):
        all.append([str(k+1)] + row)
    writer.writerows(all)
More compact code can be:
all = [['ID'] + next(reader)] + [[str(k+1)] + row for k, row in enumerate(reader)]
UPDATE (some explanation):
You have misunderstood the enumerate function. enumerate should be used in a for loop: when you iterate over its result, you get a sequence of tuples whose first item is the ordinal number of the element and whose second item is the element itself.
But the enumerate function returns an object (docs), so when you try to convert it to a string, Python calls its __repr__ magic method and the enumerate object is rendered as <enumerate object at ...>.
In other words, enumerate helps you avoid extra counters in loops, such as your count += 1 variable.
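A quick standalone illustration of the difference (a sketch not tied to the asker's file):

```python
letters = ['x', 'y', 'z']

# Iterating over enumerate() yields (index, item) tuples.
pairs = list(enumerate(letters, 1))
print(pairs)  # [(1, 'x'), (2, 'y'), (3, 'z')]

# Stringifying the enumerate object itself only shows its repr,
# which is exactly what ended up inside the CSV cells above.
print(str(enumerate(letters, 1)))  # <enumerate object at 0x...>
```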
Also, you have some very strange code here:
while count:
    all.append(row)
    row.append(enumerate(reader, 1))
    break
This block can never execute more than once.
You should use insert() instead of append(). This will allow you to specify the index where you want to add the element.
Try this
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        row.insert(0, count)
        all.append(row)
    writer.writerows(all)
You can do something like this:
import csv
with open('testi.csv') as inp, open('temp.csv', 'w') as out:
    reader = csv.reader(inp)
    writer = csv.writer(out, delimiter=',')
    # No need for `insert()` or `append()`; simply use `+` to concatenate two lists.
    writer.writerow(['ID'] + next(reader))
    # Iterate over an enumerate object of the reader, passing 1 as the starting index.
    writer.writerows([i] + row for i, row in enumerate(reader, 1))
enumerate() returns an enumerate object that yields the index and item as a tuple, one at a time, so you need to iterate over the enumerate object instead of writing it to the CSV file.
>>> lst = ['a', 'b', 'c']
>>> e = enumerate(lst)
>>> e
<enumerate object at 0x1d48f50>
>>> for ind, item in e:
...     print ind, item
...
0 a
1 b
2 c
Output:
>>> !cat temp.csv
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5

output items in a list using curly braces

I have a text file with 'n' lines. I want to extract the first word, second word, third word, ... of each line into list1, list2, list3, ...
Suppose input txt file contains:
a1#a2#a3
b1#b2#b3#b4
c1#c2
After reading the file, Output should be:
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
The code:
f = open('path','r')
for line in f:
    List = line.split('#')
    List1 = List[0]
    print '{0},'.format(List1),
    List2 = List[1]
    print '{0},'.format(List2),
    List3 = List[2]
    print '{0},'.format(List3),
    List4 = List[3]
    print '{0},'.format(List4),
OUTPUT
a1,b1,c1,a2,b2,c2,a3,b3,b4
You really don't want to use separate lists here; just use a list of lists. Using the csv module would make the splitting a little easier to handle:
import csv

columns = [[] for _ in range(4)]  # 4 columns expected
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)
or, if the number of columns needs to grow dynamically:
import csv

columns = []
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
Or you can use itertools.izip_longest() to transpose the CSV rows:
import csv
from itertools import izip_longest

with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    columns = [filter(None, column) for column in izip_longest(*reader)]
In the end, you can then print your columns with:
for i, col in enumerate(columns, 1):
    print 'List{}: {{{}}}'.format(i, ','.join(col))
Demo:
>>> import csv
>>> from itertools import izip_longest
>>> data = '''\
... a1#a2#a3
... b1#b2#b3#b4
... c1#c2
... '''.splitlines(True)
>>> reader = csv.reader(data, delimiter='#')
>>> columns = [filter(None, column) for column in izip_longest(*reader)]
>>> for i, col in enumerate(columns, 1):
...     print 'List{}: {{{}}}'.format(i, ','.join(col))
...
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
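The demo above is Python 2. On Python 3, izip_longest was renamed to itertools.zip_longest, and filter() returns a lazy iterator, so a list comprehension is the cleaner way to drop the padding. A sketch of the same transpose, with the rows inlined instead of read from a file:

```python
import csv
from itertools import zip_longest  # Python 3 name for izip_longest

data = ['a1#a2#a3\n', 'b1#b2#b3#b4\n', 'c1#c2\n']
reader = csv.reader(data, delimiter='#')

# zip_longest pads short rows with None; the comprehension drops the padding.
columns = [[c for c in column if c] for column in zip_longest(*reader)]

for i, col in enumerate(columns, 1):
    print('List{}: {{{}}}'.format(i, ','.join(col)))  # List1: {a1,b1,c1} etc.
```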
