Adding a column to a CSV file in Python and enumerating it

My CSV looks like this:
John,Bomb,Dawn
3,4,5
3,4,5
3,4,5
I want to add an ID column at the front, like so:
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5
using the enumerate function, but I don't know how. Here's my code so far:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.append('ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        while count:
            all.append(row)
            row.append(enumerate(reader, 1))
            break
    writer.writerows(all)
And the output comes all wrong:
John,Bomb,Dawn,ID
3,4,5,<enumerate object at 0x7fb2a5728d70>
3,4,5,<enumerate object at 0x1764370>
3,4,5,<enumerate object at 0x17643c0>
So the ID ends up at the end when it should be at the start, and it doesn't even produce 1, 2, 3. Some weird output comes out instead.

The code below should solve your problem:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    for k, row in enumerate(reader):
        all.append([str(k+1)] + row)
    writer.writerows(all)
More compact code can be:
all = [['ID'] + next(reader)] + [[str(k+1)] + row for k, row in enumerate(reader)]
UPDATE (some explanation):
You have misunderstood how enumerate works. It is meant to be used in a for loop: when you iterate over its result, you get a sequence of tuples in which the first item is the index of the element and the second is the element itself.
But enumerate itself returns an object (docs), so when you try to convert it to a string, its __repr__ magic method is called and you get <enumerate object at ...>.
In other words, enumerate lets you avoid extra counters in loops, such as your count += 1 variable.
Also, this part of your code is very strange:
while count:
    all.append(row)
    row.append(enumerate(reader, 1))
    break
Because of the unconditional break, the body of this while loop can never run more than once per row.
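To make the difference concrete, here is a minimal sketch, written for Python 3. The `rows` list is made-up data standing in for what csv.reader would yield. It contrasts using the enumerate object itself with iterating over it:

```python
# Hypothetical data standing in for the rows a csv.reader would yield.
rows = [['3', '4', '5'], ['3', '4', '5'], ['3', '4', '5']]

# Converting the enumerate object itself to a string only gives its repr.
wrong = str(enumerate(rows, 1))

# Iterating over it yields (index, item) tuples, starting at 1 here.
numbered = [[str(i)] + row for i, row in enumerate(rows, 1)]

print(wrong.startswith('<enumerate object at'))
print(numbered)
```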

You should use insert() instead of append(). This will allow you to specify the index where you want to add the element.
Try this:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        row.insert(0, count)
        all.append(row)
    writer.writerows(all)

You can do something like this:
import csv
with open('testi.csv') as inp, open('temp.csv', 'w') as out:
    reader = csv.reader(inp)
    writer = csv.writer(out, delimiter=',')
    # No need to use `insert()` or `append()`; simply use `+` to concatenate two lists.
    writer.writerow(['ID'] + next(reader))
    # Iterate over the enumerate object of reader, passing 1 as the starting index.
    writer.writerows([i] + row for i, row in enumerate(reader, 1))
enumerate() returns an enumerate object that yields the index and the item as a tuple, one pair at a time, so you need to iterate over the enumerate object instead of writing the object itself to the CSV file.
>>> lst = ['a', 'b', 'c']
>>> e = enumerate(lst)
>>> e
<enumerate object at 0x1d48f50>
>>> for ind, item in e:
...     print ind, item
...
0 a
1 b
2 c
Output:
>>> !cat temp.csv
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5
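For anyone on Python 3, the same idea can be sketched without touching the disk. This is only an illustration: the io.StringIO objects stand in for testi.csv and temp.csv, and the `lineterminator` setting is there just to keep the demo output free of the csv module's default '\r\n'.

```python
import csv
import io

# In-memory stand-ins for testi.csv and temp.csv (an assumption for the demo).
source = io.StringIO("John,Bomb,Dawn\n3,4,5\n3,4,5\n3,4,5\n")
target = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(target, lineterminator='\n')

# Header first, then every remaining row prefixed with its 1-based index.
writer.writerow(['ID'] + next(reader))
writer.writerows([i] + row for i, row in enumerate(reader, 1))

result = target.getvalue()
print(result)
```

With real files on Python 3, open both in text mode with newline='' (as the csv documentation recommends) rather than with 'rb' and 'wb'.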

Related

Making a list from list of lists

I have the below list of lists
[['Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523'], ['Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535']]
I want to create a new list with only the country names.
So
[Afghanistan, Albania]
Currently using this code.
with open(fileName, "r") as f:
    _= next(f)
    row_lst = f.read().split()
    countryLst = [[i] for i in row_lst]
Try this, using split(','), since the first element of each inner list is a comma-separated string.
>>> lst = [['Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523'], ['Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535']]
Output:
>>> [el[0].split(',')[0] for el in lst]
['Afghanistan', 'Albania']
Explanation:
# el[0] gives the first element of each inner list, which is a string.
# .split(',') returns a list of the pieces obtained by splitting on `,`
# [0] finally selects the first piece, as required.
Edit-1:
Using regex,
import re
pattern = r'([a-zA-Z]+)'
new_lst = []
for el in lst:
    new_lst += re.findall(pattern, el[0])
>>> new_lst # output
['Afghanistan', 'Albania']
This looks like a CSV file. Use the csv module.
Ex:
import csv
with open(fileName, "r") as f:
    reader = csv.reader(f)
    next(reader) #Skip header
    country = [row[0] for row in reader]
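A self-contained way to try the csv approach, written for Python 3. The in-memory file and its header names are made up for the demo, and the `if row` guard is an extra precaution that skips blank lines:

```python
import csv
import io

# Hypothetical file contents; the header names are invented for illustration.
data = io.StringIO(
    "country,score\n"
    "Afghanistan,2.66171813\n"
    "Albania,4.639548302\n"
)

reader = csv.reader(data)
next(reader)  # skip the header row
countries = [row[0] for row in reader if row]  # `if row` skips blank lines
print(countries)
```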

Cannot remove duplicates from a list using Python

I have a csv file which I want to edit so I read the file and copy the contents in a list. The list contains duplicates. So I do:
csv_in = list(set(csv_in))
But I get:
Unhashable list Error
with open(source_initial2, 'r', encoding='ISO-8859-1') as file_in, open(source_initial3, 'w', encoding='ISO-8859-1',newline='') as file_out:
csv_in = csv.reader(file_in, delimiter=',')
csv_out = csv.writer(file_out, delimiter=';')
csv_in = list(set(csv_in))
for row in csv_in:
for i in range(len(row)):
if "/" in row[i]:
row[i] = row[i].replace('/', '')
if "\"" in row[i]:
row[i] = row[i].replace('\"', '')
if "Yes" in row[i]:
row[i] = row[i].replace('Yes', '1')
if "No" in row[i]:
row[i] = row[i].replace('No', '0')
if myrowlen > 5:
break
print(row)
csv_out.writerow(row)
The list is something like
[['DCA.P/C.05820', '5707119001793', 'P/C STEELSERIES SUR... QcK MINI', '5,4', 'Yes'],['DCA.P/C.05820', '5707119001793', 'P/C STEELSERIES SUR... QcK MINI', '5,4', 'Yes'].....['DCA.P/C.05820', '5707119001793', 'P/C STEELSERIES SUR... QcK MINI', '5,4', 'Yes']]
Why do I get this, and how can I solve it?
Thank you.
The problem is that csv_in is a list of lists, and list is not a hashable data type. To get around the issue, you can do the following:
csv_in = list(set([tuple(row) for row in csv_in]))
or if you need it as a list of lists:
csv_in = [list(element) for element in set([tuple(row) for row in csv_in])]
csv.reader yields rows, and each row read from the CSV file is returned as a list of strings.
A set requires its items to be hashable. Immutable built-in types such as strings and tuples of strings qualify, but list is mutable and therefore not hashable.
test_reader = [[0,1,2], [3,4,5]]
print(set(test_reader)) # throws TypeError: unhashable type: 'list'
# after casting to tuple type
test_reader = [(0,1,2), (3,4,5)]
print(set(test_reader)) # {(0, 1, 2), (3, 4, 5)}
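One thing to keep in mind is that a set does not preserve order, which matters if the first row is a header. Here is a small sketch that removes duplicates while keeping the first occurrence of each row in place. It is written for Python 3, and `unique_rows` is a hypothetical helper name, not part of the csv module:

```python
def unique_rows(rows):
    """Return rows with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # tuples of strings are hashable, lists are not
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

sample = [['a', '1'], ['b', '2'], ['a', '1']]
print(unique_rows(sample))
```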

How can I customize map() for a list of strings in Python?

How do I tell map() to selectively convert only some of the strings (not all the strings) within a list to integer values?
Input file (tab-delimited):
abc1 34 56
abc1 78 90
My attempt:
import csv
with open('file.txt') as f:
    start = csv.reader(f, delimiter='\t')
    for row in start:
        X = map(int, row)
        print X
Error message: ValueError: invalid literal for int() with base 10: 'abc1'
When I read in the file with the csv module, it is a list of strings:
['abc1', '34', '56']
['abc1', '78', '90']
map() obviously does not like 'abc1', even though it is a string just like '34' is a string.
I thoroughly examined Convert string to integer using map() but it did not help me deal with the first column of my input file.
def safeint(val):
    try:
        return int(val)
    except ValueError:
        return val
for row in start:
    X = map(safeint, row)
    print X
is one way to do it ... you can step it up even more
from functools import partial
myMapper = partial(map,safeint)
map(myMapper,start)
Map only the part of the list that interests you:
row[1:] = map(int, row[1:])
print row
Here, row[1:] is a slice of the list that starts at the second element (the one with index 1) up to the end of the list.
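The slice assignment also works on Python 3, where map() returns a lazy iterator rather than a list, because slice assignment accepts any iterable. A quick check, using one row copied from the question:

```python
row = ['abc1', '34', '56']

# The right-hand side is a map iterator; slice assignment consumes it.
row[1:] = map(int, row[1:])
print(row)
```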
I like Roberto Bonvallet's answer, but if you want to do things immutably, as you're doing in your question, you can:
import csv
with open('file.txt') as f:
    start = csv.reader(f, delimiter='\t')
    for row in start:
        X = [row[0]] + map(int, row[1:])
        print X
… or…
numeric_cols = (1, 2)
X = [int(value) if col in numeric_cols else value
     for col, value in enumerate(row)]
… or, probably most readably, wrap that up in a map_partial function, so you can do this:
X = map_partial(int, (1, 2), row)
You could implement it as:
def map_partial(func, indices, iterable):
    return [func(value) if i in indices else value
            for i, value in enumerate(iterable)]
If you want to be able to access all of the rows after you're done, you can't just print each one, you have to store it in some kind of structure. What structure you want depends on how you want to refer to these rows later.
For example, maybe you just want a list of rows:
rows = []
with open('file.txt') as f:
    for row in csv.reader(f, delimiter='\t'):
        rows.append(map_partial(int, (1, 2), row))
print('The second column of the first row is {}'.format(rows[0][1]))
Or maybe you want to be able to look them up by the string ID in the first column, rather than by index. Since those IDs aren't unique, each ID will map to a list of rows:
rows = {}
with open('file.txt') as f:
    for row in csv.reader(f, delimiter='\t'):
        rows.setdefault(row[0], []).append(map_partial(int, (1, 2), row))
print('The second column of the first abc1 row is {}'.format(rows['abc1'][0][1]))
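To see what the dictionary version builds, here is a self-contained Python 3 sketch. The io.StringIO object is a stand-in for file.txt and holds the two rows from the question. map_partial is repeated so the snippet runs on its own:

```python
import csv
import io

def map_partial(func, indices, iterable):
    # Apply func only to the items whose position is listed in indices.
    return [func(value) if i in indices else value
            for i, value in enumerate(iterable)]

# In-memory stand-in for the tab-delimited file.txt from the question.
data = io.StringIO("abc1\t34\t56\nabc1\t78\t90\n")

rows = {}
for row in csv.reader(data, delimiter='\t'):
    rows.setdefault(row[0], []).append(map_partial(int, (1, 2), row))

print(rows)
```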

output items in a list using curly braces

I have a text file with 'n' lines. I want to extract first word, second word, third word, ... of each line into a list1, list2, list3,...
Suppose input txt file contains:
a1#a2#a3
b1#b2#b3#b4
c1#c2
After reading the file, Output should be:
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
The code:
f = open('path','r')
for line in f:
List=line.split('#')
List1 = List[0]
print '{0},'.format(List1),
List2 = List[1]
print '{0},'.format(List2),
List3 = List[2]
print '{0},'.format(List3),
List4 = List[3]
print '{0},'.format(List4),
OUTPUT
a1,b1,c1,a2,b2,c2,a3,b3,b4
You really don't want to use separate lists here; just use a list of lists. Using the csv module here would make handling splitting a little easier:
import csv
columns = [[] for _ in range(4)] # 4 columns expected
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)
or, if the number of columns needs to grow dynamically:
import csv
columns = []
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
Or you can use itertools.izip_longest() to transpose the CSV rows:
import csv
from itertools import izip_longest
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    columns = [filter(None, column) for column in izip_longest(*reader)]
In the end, you can then print your columns with:
for i, col in enumerate(columns, 1):
    print 'List{}: {{{}}}'.format(i, ','.join(col))
Demo:
>>> import csv
>>> from itertools import izip_longest
>>> data = '''\
... a1#a2#a3
... b1#b2#b3#b4
... c1#c2
... '''.splitlines(True)
>>> reader = csv.reader(data, delimiter='#')
>>> columns = [filter(None, column) for column in izip_longest(*reader)]
>>> for i, col in enumerate(columns, 1):
...     print 'List{}: {{{}}}'.format(i, ','.join(col))
...
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
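The demo above is Python 2. On Python 3, izip_longest is named zip_longest and filter() returns an iterator, so a rough equivalent could look like the sketch below. One difference is that it drops only the None padding, whereas filter(None, ...) would also drop empty strings.

```python
import csv
from itertools import zip_longest

# The sample lines from the question; csv.reader accepts any iterable of strings.
data = ["a1#a2#a3\n", "b1#b2#b3#b4\n", "c1#c2\n"]
reader = csv.reader(data, delimiter='#')

# zip_longest pads short rows with None; drop that padding per column.
columns = [[cell for cell in column if cell is not None]
           for column in zip_longest(*reader)]

lines = ['List{}: {{{}}}'.format(i, ','.join(col))
         for i, col in enumerate(columns, 1)]
print('\n'.join(lines))
```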

Python 2: IndexError: list index out of range

I have an "asin.txt" document:
in,Huawei1,DE
out,Huawei2,UK
out,Huawei3,none
in,Huawei4,FR
in,Huawei5,none
in,Huawei6,none
out,Huawei7,IT
I open this file and build an OrderedDict from it:
from collections import OrderedDict
reader = csv.reader(open('asin.txt','r'),delimiter=',')
reader1 = csv.reader(open('asin.txt','r'),delimiter=',')
d = OrderedDict((row[0], row[1].strip()) for row in reader)
d1 = OrderedDict((row[1], row[2].strip()) for row in reader1)
Then I want to create variables (a,b,c,d) so if we take the first line of the asin.txt it should be like: a = in; b = Huawei1; c = Huawei1; d = DE. To do this I'm using a "for" loop:
from itertools import izip
for (a, b), (c, d) in izip(d.items(), d1.items()): # here
try:
.......
It worked before, but now, for some reason, it prints an error:
d = OrderedDict((row[0], row[1].strip()) for row in reader)
IndexError: list index out of range
How do I fix that?
Probably you have a row in your textfile which does not have at least two fields delimited by ",". E.g.:
in,Huawei1
Try to find the solution along these lines:
d = OrderedDict((row[0], row[1].strip()) for row in reader if len(row) >= 2)
or
l = []
for row in reader:
    if len(row) >= 2:
        l.append((row[0], row[1].strip()))
d = OrderedDict(l)
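A frequent culprit is a blank line in the file, which csv.reader yields as an empty list. The sketch below (Python 3, with invented sample lines rather than the real asin.txt) builds the second dictionary with a length guard and collects the rows that would have raised the IndexError, so you can inspect them:

```python
import csv
from collections import OrderedDict

# Made-up lines for illustration: one short row and one blank line.
lines = ["in,Huawei1,DE\n", "out,Huawei2\n", "\n", "in,Huawei4,FR\n"]

rows = list(csv.reader(lines))

# Only rows with at least three fields are safe to index with row[2].
d1 = OrderedDict((row[1], row[2].strip()) for row in rows if len(row) >= 3)

# Rows that would have triggered "list index out of range".
skipped = [row for row in rows if len(row) < 3]

print(d1)
print(skipped)
```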
