Remove new line \n reading from CSV - python

I have a CSV file which looks like:
Name1,1,2,3
Name2,1,2,3
Name3,1,2,3
I need to read it into a 2D list line by line. The code I have written almost does the job; however, I am having problems removing the new line characters '\n' at the end of the third index.
score = []
for eachLine in file:
    student = eachLine.split(',')
    score.append(student)
print(score)
The output currently looks like:
[['name1', '1', '2', '3\n'], ['name2', '1', '2', '3\n'],
I need it to look like:
[['name1', '1', '2', '3'], ['name2', '1', '2', '3'],

Simply call str.strip on each line as you process it:
score = []
for eachLine in file:
    student = eachLine.strip().split(',')
    score.append(student)
print(score)
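For anyone who wants to try this without a file on disk, here is a minimal runnable sketch of the same idea. The io.StringIO object and its sample contents are assumptions standing in for the opened CSV file; the second line deliberately uses a Windows-style ending and the last line has no newline at all.

```python
import io

# io.StringIO stands in for the opened CSV file (an assumption for this sketch).
file = io.StringIO("Name1,1,2,3\nName2,1,2,3\r\nName3,1,2,3")

score = []
for each_line in file:
    # strip() removes leading and trailing whitespace, including '\n' and '\r\n'.
    student = each_line.strip().split(',')
    score.append(student)

print(score)
```

Note that strip() also removes spaces and tabs at both ends of the line, which is usually harmless but may matter if the first or last field can legitimately start or end with whitespace.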

You can use splitlines
First method
>>> s = '''Name1,1,2,3
... Name2,1,2,3
... Name3,1,2,3'''
>>> [ item.split(',') for item in s.splitlines() ]
[['Name1', '1', '2', '3'], ['Name2', '1', '2', '3'], ['Name3', '1', '2', '3']]
Second method
>>> l = []
>>> for item in s.splitlines():
...     l.append(item.split(','))
...
>>> l
[['Name1', '1', '2', '3'], ['Name2', '1', '2', '3'], ['Name3', '1', '2', '3']]

If you know it's a \n, and a \n only,
score = []
for eachLine in file:
    student = eachLine[:-1].split(',')
    score.append(student)
print(score)
This uses slicing to remove the trailing newline character before the split happens.
Edited per the commenters' suggestions; much neater.
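The "a \n only" condition matters in practice: if the last line of the file has no trailing newline, the slice cuts off a real character. A small sketch comparing slicing with rstrip('\n') on made-up sample lines (the list of lines is an assumption standing in for the file):

```python
# Two sample lines; the last one has no trailing newline.
lines = ["Name1,1,2,3\n", "Name2,1,2,3"]

# Slicing always drops the final character, whatever it is.
sliced = [line[:-1].split(',') for line in lines]

# rstrip('\n') only drops trailing newline characters, if any are present.
stripped = [line.rstrip('\n').split(',') for line in lines]

print(sliced)    # the last field of the second row is lost
print(stripped)  # both rows are intact
```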

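Use the rstrip method to remove the \n at the end of every line. It returns a new string, so the result has to be assigned back.
See the code below for reference.
score = []
with open('myfile.csv', 'r') as file:
    for line in file:
        line = line.rstrip('\n')  # rstrip returns a new string; it does not modify line in place
        score.append(line.split(','))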

Related

How to check if an item is the first item in a list

I am trying to read a dat file and extract certain information from the dat file.
My code looks like this:
datContent = [i.strip().split() for i in open("data.dat").readlines()]
positions = []
myItem = 'ST'
# collect every row that contains myItem
for row in datContent:
    if myItem in row:
        positions.append(row)
I would like to check whether an item is the first item in a list, and I want the two lists below it. How do I do that?
If you want the second list after each list whose first item is myItem, you can use:
[s for f, s in zip(datContent, datContent[2:]) if f[0] == myItem]
example:
datContent = [['ST', '1', '2', '3'], ['1', '5', '3'],['2', '6', '3'],['ST', '2', '4'], ['ST', '2', '2'],['2', '6', '3']]
myItem = 'ST'
[s for f, s in zip(datContent, datContent[2:]) if f[0] == myItem]
output:
[['2', '6', '3'], ['2', '6', '3']]
Have a look at the built-in zip function for details.
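The zip comprehension above returns only the second list after each match. If both of the following lists are wanted, as the question seems to ask, one possible sketch is to slice the two rows after every match. The names dat_content, my_item and following are illustrative, and the sample data is the same as in the example above.

```python
dat_content = [['ST', '1', '2', '3'], ['1', '5', '3'], ['2', '6', '3'],
               ['ST', '2', '4'], ['ST', '2', '2'], ['2', '6', '3']]
my_item = 'ST'

# For every row whose first element is my_item, take the next two rows.
# "row and ..." skips empty rows, which would otherwise raise IndexError.
following = [
    dat_content[i + 1:i + 3]
    for i, row in enumerate(dat_content)
    if row and row[0] == my_item
]

for group in following:
    print(group)
```

Near the end of the data the slice may contain fewer than two rows, so check the length of each group if exactly two are required.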

How to create lists in list where every future list is separate by spaces in list

User provides input with spaces:
row = list(input())
print(row)
['1','2','3',' ','4','5','6',' ','7','8','9',' ']
So I need to create 'row' list into the below. The list is divided into sub-lists based on whitespace:
[['1','2','3'],['4','5','6'],['7','8','9']]
You can use str.split to split by whitespace:
myinput = '123 456 789'
row = list(map(list, myinput.split()))
print(row)
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Alternatively, using a list comprehension:
row = [list(i) for i in myinput.split()]
You can use str.split to split the input on spaces to give a list of sub-strings.
E.g. '123 456 789' would become ['123', '456', '789'].
Then use a list-comprehension to convert these strings into lists of characters with the list() constructor (as you are already familiar with).
Making the final code:
row = [list(s) for s in input().split()]
#[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Starting with your list rather than the string, you can do that using itertools.groupby:
from itertools import groupby
row = ['1','2','3',' ','4','5','6',' ','7','8','9',' ']
out = [list(group) for key, group in groupby(row, lambda x: x != ' ') if key]
print(out)
# [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
We group the values depending on whether or not they are spaces, and only keep the groups that are not made of spaces.
Try this:
abc = ['1','2','3',' ','4','5','6',' ','7','8','9',' ']
newList = list()
temp = list()
for i in abc:
    if i == ' ':
        newList.append(temp)
        temp = list()
    else:
        temp.append(i)
if temp:  # keep a final group that is not followed by a space
    newList.append(temp)
print(newList)
Output:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]

Getting data from a list on a specific line in a file (python)

I've got a very large file that has a format like this:
[['1', '2', '3', '4'], ['11', '12', '13', '14']]
[['5', '6', '7', '8'], ['55', '66', '77', '88']]
(numbers indicate line number)
The lists on each line are very long, unlike this example.
Now if it was only 1 list I could for example obtain the '11' value with:
itemdatatxt = open("tempoutput", "r")
itemdata = eval(itemdatatxt.read())
print(itemdata[1][0])
However because the file contains a new list on each line I cannot see how I can for example obtain the '55' value.
I thought itemdatatxt.readline(1) would select the second line of the file, but after reading about readline I understand that the argument is a size limit, so this would return only a single character of the first line.
Can anyone explain to me how to do this? (preferably I wouldn't want to change the 'tempoutput' datafile format)
Try this:
import ast
with open("tempoutput", "r") as f:
    for i, line in enumerate(f):
        if i == 1:
            itemdata = ast.literal_eval(line)
            print(itemdata[1][0])
            break
enumerate(f) returns:
0, <<first line>>
1, <<second line>>
...
So when i becomes 1, we've reached the second line and we output 55. We also break out of the loop since we don't care about reading the rest of the lines.
I used ast.literal_eval because it's a safer form of eval.
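If you need this for more than one line, the same enumerate approach can be wrapped in a small helper. This is only a sketch: the function name literal_from_line is made up, and the temporary file with two sample lines is an assumption standing in for 'tempoutput'.

```python
import ast
import os
import tempfile


def literal_from_line(path, line_index):
    """Return the Python literal stored on the zero-based line_index of path."""
    with open(path, "r") as f:
        for i, line in enumerate(f):
            if i == line_index:
                # Stop reading as soon as the wanted line is parsed.
                return ast.literal_eval(line.strip())
    raise IndexError("file has fewer than %d lines" % (line_index + 1))


# Demo data written to a temporary file (stand-in for the real data file).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[['1', '2', '3', '4'], ['11', '12', '13', '14']]\n")
    tmp.write("[['5', '6', '7', '8'], ['55', '66', '77', '88']]\n")
    demo_path = tmp.name

item_data = literal_from_line(demo_path, 1)  # second line of the file
value = item_data[1][0]
print(value)

os.remove(demo_path)  # clean up the temporary file
```

Only the lines up to the requested one are read, which helps when the file is very large.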
You can add the whole file to a dictionary where the key is the line number and the value is the content (the two lists). This way you can easily get any value you want by selecting first the line number, then the list and then the index.
data.txt
[['1', '2', '3', '4'], ['11', '12', '13', '14']]
[['5', '6', '7', '8'], ['55', '66', '77', '88']]
[['5', '6', '3', '8'], ['155', '66', '277', '88']]
code
import ast
data = {}
with open('data.txt', 'r') as f:
    for indx, ln in enumerate(f):
        data[indx] = ast.literal_eval(ln.strip())
print(data[1][1][0]) #55
print(data[1][1][3]) #88
readline() reads until the next line break. If you call it a second time it will read from where it stopped to the linebreak after that. Thus, you could have a loop:
lines = []
with open('filepath', 'r') as f:
    line = f.readline()
    while line:  # readline() returns '' at the end of the file
        lines.append(eval(line))
        line = f.readline()
print(lines) # [[['1', '2', '3', '4'],['11', '12', '13', '14']],
             # [['5', '6', '7', '8'],['55', '66', '77', '88']]]
Or you could read the entire file and split by linebreak:
lines = open('filepath', 'r').read().split('\n')
Alternatively if you want to read a specific line you can use the linecache module:
import linecache
line = linecache.getline('filepath', 2) # 2 is the second line of the file

Membership test results different with list and csv.reader

I get different results if I check for membership with a list than with a csv.reader object.
The below uses the unittest module.
csv.reader test for membership
with open("file.tab", 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    self.assertTrue(['1', '2', '3', '4'] in reader)
    self.assertTrue(['2', '3', '4', '5'] in reader)
    self.assertTrue(['3', '4', '5', '6'] in reader)
list test for membership
with open("file.tab", 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    reader = [record for record in reader]
    self.assertTrue(['1', '2', '3', '4'] in reader)
    self.assertTrue(['2', '3', '4', '5'] in reader)
    self.assertTrue(['3', '4', '5', '6'] in reader)
I know that file.tab contains entries for the three records I'm testing for, but the third assert comes up "False is not true" when using csv.reader and passes when using a list.
csv.reader is a generator; the docs don't explicitly say, but since I can exhaust it I think that means it's a generator. My thinking was this might be the reason, but the following test prints nothing but true:
x = xrange(5)
for m in range(5):
    for n in range(5):
        print(m in x)
        print(n in x)
Which makes me think that there are no problems testing for membership with a generator.
Why does the third assert statement evaluate differently when I use a csv.reader than when I use a list?
You had some bad luck there: xrange isn't actually a generator but a special type of its own that behaves lazily, so it can fool you into thinking it is one.
>>> x = xrange(10)
>>> 5 in x
True
>>> 5 in x
True
but
>>> it = iter(range(10))
>>> 5 in it
True
>>> 5 in it
False
So your logic was right: the reader instance can be exhausted, but the list can't, which is why membership tests can return different answers, depending on the contents. Note though that membership tests may short-circuit, and so they don't have to exhaust in case of a positive result:
>>> it = iter(range(10))
>>> 3 in it
True
>>> next(it)
4
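The sessions above are Python 2. For readers on Python 3, where xrange no longer exists and range takes its place as a lazy but reusable sequence, the same contrast can be sketched like this (the variable names are illustrative):

```python
# Python 3: range is a reusable sequence, so repeated membership tests agree.
x = range(10)
seq_results = (5 in x, 5 in x)

# An iterator is consumed by the membership test.
it = iter(range(10))
iter_results = (5 in it, 5 in it)

# The test stops at the first match, so the iterator resumes right after it.
it = iter(range(10))
found = 3 in it
following = next(it)

print(seq_results, iter_results, found, following)
```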
Yes, csv.reader returns an iterator, and the in operator consumes it until it finds the value, as DSM demonstrated.
In your CSV file the order of the rows is different than in your tests. Your tests will pass if you change the order:
>>> def fake_reader():
...     yield ['1', '2', '3', '4']
...     yield ['2', '3', '4', '5']
...     yield ['3', '4', '5', '6']
>>> reader = fake_reader()
>>> ['1', '2', '3', '4'] in reader
True
>>> ['2', '3', '4', '5'] in reader
True
>>> ['3', '4', '5', '6'] in reader
True
And it fails if the order is different:
>>> def fake_reader():
...     yield ['1', '2', '3', '4']
...     yield ['3', '4', '5', '6'] # changed order
...     yield ['2', '3', '4', '5']
>>> reader = fake_reader()
>>> ['1', '2', '3', '4'] in reader # reads one row
True
>>> ['2', '3', '4', '5'] in reader # reads two rows!
True
>>> ['3', '4', '5', '6'] in reader # there are no more rows to read
False
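The same effect can be reproduced with a real csv.reader, and reading the rows into a list first makes the result independent of row order. This is a sketch written for Python 3; the tab-separated sample data held in io.StringIO is an assumption standing in for file.tab.

```python
import csv
import io

# Sample tab-separated data; rows 2 and 3 are in a different order than the tests.
data = "1\t2\t3\t4\n3\t4\t5\t6\n2\t3\t4\t5\n"

reader = csv.reader(io.StringIO(data), delimiter='\t')
first = ['1', '2', '3', '4'] in reader   # consumes the first row
second = ['2', '3', '4', '5'] in reader  # consumes the second and third rows
third = ['3', '4', '5', '6'] in reader   # nothing left to read

# Materialising the rows in a list allows repeated membership tests.
rows = list(csv.reader(io.StringIO(data), delimiter='\t'))
wanted = (['1', '2', '3', '4'], ['2', '3', '4', '5'], ['3', '4', '5', '6'])
all_found = all(record in rows for record in wanted)

print(first, second, third, all_found)
```

For a file too large to hold in memory, a single pass that checks each row against the wanted records would avoid building the list.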

How to split string array to 2-dimension char array in python

I have a string array, for example:
a = ['123', '456', '789']
I want to split it to form a 2-dimension char array:
b = [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
I'm using
[[element for element in line] for line in array]
to achieve my goal but found it not easy to read, is there any built-in function or any readable way to do it?
Looks like a job for map:
>>> a = ['123', '456', '789']
>>> map(list, a)
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Relevant documentation: map, list
You could do something like:
first_list = ['123', '456', '789']
other_weirder_list = [list(line) for line in first_list]
Your solution isn't that bad, but you might do something like this or the map suggestion by arashajii.
map(list, array) should do it.
You can use map:
>>> a
['123', '456', '789']
>>> map(list, a)
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Although I really don't see why you'd need to do this (unless you plan on editing one specific character in the string?). Strings behave similarly to lists.
First I tried e.split(''), but I get ValueError: empty separator.
Try this:
a = ['123', '456', '789']
b = [list(e) for e in a]
b
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
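One caveat for the map-based answers: the output shown there is from Python 2, where map returns a list. In Python 3 map returns a lazy iterator, so it needs an explicit list() call to get the nested list. A short sketch, with illustrative variable names:

```python
a = ['123', '456', '789']

lazy = map(list, a)   # Python 3: a lazy map object, not a list
b = list(lazy)        # materialise it into a list of character lists
leftover = list(lazy) # the map object is now exhausted

c = [list(s) for s in a]  # the comprehension gives a list directly

print(b)
print(leftover)
print(c == b)
```

The list comprehension behaves the same in both Python versions, which may make it the more convenient choice here.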
