File manipulation: lists, removing `\n`s - python

Code:
with open ("Test1_Votes.txt", 'r'):
    f = open("Test1_Votes.txt")
    lines = f.readlines()
    print(lines[0])
    print(lines[1])
    all_lines = []
    lines = lines.rstrip("\n") #does not work
    for line in lines:
        #in here
        all_lines.append(line)
    print(all_lines)
Right now it outputs something like:
['1,2,3,0,0\n', ...]
I would like it to output [[1, 2, 3, 0, 0], ...]
File Sample:
1,2,3,0,0
1,3,2,0,0
2,3,1,0,0
3,0,1,2,0
3,0,1,0,2
The zeros must be kept, and there are no blank lines between the lines in the .txt file.
Any suggestions/answers?
Thanks in advance

You have a few minor glitches in your code. The with statement already opens the file, so you don't need the second open call; bind the file with "as f" instead. lines is a list of the lines in your file, each including its trailing '\n' character. To remove them, iterate over the list and strip the newline from each line.
with open("Test1_Votes.txt", 'r') as f:
    lines = [line.rstrip() for line in f.readlines()]
print(lines[0])
print(lines[1])

lines is a list, and rstrip is a string method, so lines.rstrip("\n") raises an AttributeError instead of stripping anything. You should strip the newline from each line instead:
with open ("Test1_Votes.txt") as f:
    all_lines = []
    for line in f:
        line = line.rstrip("\n") # strip new line character
        lst = [int(x) for x in line.split(',')] # split line and cast to int
        all_lines.append(lst)
Of course, you can put the entire logic into a list comprehension:
with open ("Test1_Votes.txt") as f:
    all_lines = [[int(x) for x in l.rstrip("\n").split(',')] for l in f]
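To check the approach end to end, here is a minimal, self-contained sketch. It writes the sample data from the question to a temporary file and parses it back into a list of integer lists. The temporary file and the helper name parse_votes are assumptions made for illustration, not part of the original code.

```python
import os
import tempfile

def parse_votes(path):
    """Read a comma-separated file and return one list of ints per line."""
    with open(path) as f:
        # rstrip("\n") drops the line ending; split(",") separates the fields
        return [[int(x) for x in line.rstrip("\n").split(",")] for line in f]

# Sample data from the question, written to a throwaway file for the demo.
sample = "1,2,3,0,0\n1,3,2,0,0\n2,3,1,0,0\n3,0,1,2,0\n3,0,1,0,2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

all_lines = parse_votes(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(all_lines)
```

The zeros survive because every field is converted with int() rather than filtered.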

Try this:
with open("infile.txt", "r") as fle:
    lst = fle.readlines()
lst = [i.strip() for i in lst]
for i in lst:
    print(i)
print(lst)

Use re.split() instead of readlines():
import re
your_file='abc def\nghi jkl\nmno pqr...'
all_lines=re.split('\n', your_file)

A one liner might look like this:
all_lines = list([int(x) for x in line.replace('\n', '').split(',')] for line in open ("filepath", 'r').readlines())
print(all_lines)

Related

how to iterate lines from a text file and do something?

with open("list.txt") as f:
    lst = (f)
    print(lst)
    for h in lst:
        print(lst[h])
In a loop, I want to open a file, take the first line as a variable, do something with selenium, then take the next line and do the selenium steps again, and so on until the last line.
I have been trying since this evening and only get errors, such as "list indices must be integers or slices, not str".
Why not this:
with open("list.txt") as f:
    for line in f:
        print(line)
        # do selenium stuff
        if somecondition :
            break
with open("list.txt", 'r') as f:
    lines = f.readlines()
for line in lines:
    pass  # do something with the line here
It looks like you want to read the data in list.txt and print a list in which each line of the file is an element of lst. You could do that like this:
lst = []
with open("list.txt") as f:
    for new in f:
        if new != '\n':
            addIt = new.rstrip('\n').split(',')
            lst.append(addIt)
for h in lst:
    print(h)
You can try something like this as well:
lst = open("list.txt").read().split('\n')
for each in lst:
    print(each)
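The error in the question comes from using each line as an index: iterating over a file already yields the text of each line, so there is nothing left to look up. A minimal sketch of the per-line loop follows; io.StringIO stands in for open("list.txt") so it runs anywhere, and handle is a hypothetical placeholder for the selenium steps.

```python
import io

# io.StringIO stands in for open("list.txt") so the sketch runs anywhere.
f = io.StringIO("first\nsecond\nthird\n")

def handle(item):
    """Hypothetical placeholder for the per-line work (e.g. the selenium steps)."""
    return item.upper()

results = []
for line in f:                 # each `line` is already the text of one line
    item = line.rstrip("\n")   # drop the trailing newline before using it
    results.append(handle(item))

print(results)
```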

Python - Remove all the lines starting with word/string present in a list

I am trying to parse a large file of 50K lines, from which I have to remove any line that starts with a word present in a predefined list.
I have tried the code below, but the output file (DB12_NEW) does not come out as desired:
rem = ['remove', 'remove1', 'remove2'....., 'removen']
inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)
I am getting the same file as output that I initially put in... the script is not removing the lines that start with any of the strings present in the list.
Can you please help me understand how to achieve this?
Use a tuple instead of a list for str.startswith.
# rem = ['remove', 'rem-ove', 'rem ove']
rem = ('remove', 'rem-ove', 'rem ove')
with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    for line in inputFile.readlines():
        if not line.startswith(rem):
            outputFile.write(line)
Currently you check the line against one word from the remove list at a time, and you write the line every time a single word does not match. For example:
If the line starts with "rem ABCDF..." and the loop is checking 'remove', the if-statement is false and the line is written to your output file, even if another word in the list would match.
You could try something like this:
remove = ['remove', 'rem-ove', 'rem', 'rem ove' ...... 'n']
inputFile = open(r"C:\DB12", "r")
outputFile = open(r"C:\DB12_NEW", "w")
for line in inputFile:  # iterate the file directly; file objects have no splitlines()
    if not any(line.startswith(i) for i in remove):
        outputFile.write(line)
inputFile.close()
outputFile.close()
The built-in any() returns False only if all elements are false.
Sometimes this is caused by leading or trailing whitespace.
Try stripping the whitespace with strip() and check again.
rem = [x.strip() for x in rem]
lines = [line.strip() for line in lines]
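As a runnable illustration of the tuple-based answer above, the sketch below filters lines with a single str.startswith call. The helper name drop_prefixed and the sample data are assumptions for the demo, and io.StringIO stands in for the real input file.

```python
import io

def drop_prefixed(lines, prefixes):
    """Yield only the lines that do not start with any of the prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple, not a list
    for line in lines:
        if not line.startswith(prefixes):
            yield line

source = io.StringIO("remove this\nkeep this\nrem-ove that\nkeep that\n")
kept = list(drop_prefixed(source, ["remove", "rem-ove"]))
print(kept)
```

Because each line is tested once against all prefixes, it is written at most once, which avoids the repeated writes of the nested loop in the question.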

Why is my if statement not working and just outputting the else, everything works till there? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
If you want to avoid this you can add a newline at the end of file:
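with open(the_file, 'rb+') as f:  # binary mode: Python 3 text files cannot seek relative to the end
    f.seek(-1, 2) # go to the last byte of the file (fails on an empty file)
    if f.read(1) != b'\n':
        # add missing newline if not already present
        f.write(b'\n')
        f.flush()
    f.seek(0)
    lines = [line[:-1].decode() for line in f]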
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
This exploits the fact that the return value of or isn't a boolean, but one of its operands: the first if it is truthy, otherwise the second.
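The difference between these variants only shows up when the last line has no trailing newline. The following sketch compares them on such input; the sample text is made up, and io.StringIO stands in for the file object.

```python
import io

text = "alpha\nbeta\ngamma"  # note: no newline after the last line

# Blind slicing removes the last character even when it is not a newline.
sliced = [line[:-1] for line in io.StringIO(text)]

# rstrip("\n") removes the newline only when one is present.
stripped = [line.rstrip("\n") for line in io.StringIO(text)]

# The `or` trick: -(True) is -1, while -(False) is 0, which is falsy,
# so the slice end falls back to len(line) + 1 and keeps the whole line.
tricky = [line[:-(line[-1] == "\n") or len(line) + 1] for line in io.StringIO(text)]

print(sliced)    # ['alpha', 'beta', 'gamm']
print(stripped)  # ['alpha', 'beta', 'gamma']
print(tricky)    # ['alpha', 'beta', 'gamma']
```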
The readlines method is actually equivalent to:
def readlines(self):
    lines = []
    for line in iter(self.readline, ''):
        lines.append(line)
    return lines
# or equivalently
def readlines(self):
    lines = []
    while True:
        line = self.readline()
        if not line:
            break
        lines.append(line)
    return lines
Since readline() keeps the newline, readlines() keeps it too.
Note: for symmetry with readlines(), the writelines() method does not add line endings, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
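The round trip described in the note can be checked with in-memory streams. This is a small sketch with made-up data; io.StringIO objects stand in for f and f2.

```python
import io

src = io.StringIO("one\ntwo\nthree")  # the last line has no newline
dst = io.StringIO()

lines = src.readlines()   # every element keeps its own line ending
dst.writelines(lines)     # writelines adds nothing between elements

print(lines)
print(dst.getvalue() == src.getvalue())
```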
temp = open(filename,'r').read().split('\n')
Read the file one row at a time, removing unwanted characters from the end of each string with str.rstrip(chars).
with open(filename, 'r') as fileobj:
    for row in fileobj:
        print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
This auto-closes the file, so there is no need for with open()...
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
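A short sketch of the pathlib one-liner, using a temporary directory so it leaves nothing behind. The file name example.txt and its contents are assumptions for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "example.txt"  # hypothetical file name for the demo
    path.write_text("first\nsecond\n\nfourth\n")
    # read_text() opens, reads, and closes the file in one call
    lines = path.read_text().splitlines()

print(lines)
```

Note that splitlines() keeps the empty line as an empty string, and that it also splits on a few other line boundaries besides '\n' (for example '\r' and form feed), which may matter for unusual input.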
Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)
To get rid of trailing end-of-line (\n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
You can read the file as a list easily using a list comprehension
with open("foo.txt", 'r') as f:
    lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
    if line[-1:] == "\n":
        print(line[:-1])
    else:
        print(line)
my_file.close()
This script takes the lines from file and writes each one, without its newline and with ,0 appended, to file2.
file = open("temp.txt", "+r")
file2 = open("res.txt", "+w")
for line in file:
    file2.write(f"{line.splitlines()[0]},0\n")
file.close()
file2.close()
Each line has the form data\n, so splitlines() turns it into a one-element list and [0] picks out the text data without the newline.
import csv
with open(filename) as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])
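The csv module used in the last snippet removes the line endings and splits the fields in one step, so no manual stripping is needed. Here is a minimal sketch with made-up comma-separated data; io.StringIO stands in for the opened file, and the int conversion is an added assumption about the data.

```python
import csv
import io

# newline="" leaves line-ending handling to the csv module
data = io.StringIO("1,2,3,0,0\n3,0,1,2,0\n", newline="")

# csv.reader yields each row as a list of strings with no trailing newline
rows = [[int(x) for x in row] for row in csv.reader(data)]
print(rows)
```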

Read a file line by line but only some characters within each line

I have a text file and I would like to get the string from the 300th character to the 500th character (or column in the text file) within each line and put the string into a list.
I started my code with this, but I don't know how to change the file reading so that it only takes the specified characters.
with open("filename") as f:
    for line in f:
        for ch in line:
Try:
with open("filename") as f:
    for line in f:
        chs = line[299:500]
This slices each line and returns the 300th through 500th characters. From there, you can append chs to a list, or do whatever else you need.
You can use slice notation on Python strings:
lines = []
with open("filename") as f:
    for line in f:
        lines.append(line[299:500])  # indices start at 0, so the 300th character is index 299
Or
with open("filename") as f:
    lines = [l[299:500] for l in f]
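The slice bounds are easy to get off by one, so here is a small check. It assumes the 300th through 500th characters are counted from 1 with both ends included; the test string is constructed so that each character reveals its own position.

```python
# Build a 600-character line where the k-th character (1-based) is the digit k % 10.
line = "".join(str(k % 10) for k in range(1, 601))

chunk = line[299:500]  # start index 299 is the 300th character; end index 500 is exclusive
print(len(chunk))      # 201 characters: the 300th through the 500th
print(chunk[0], chunk[-1])

# Slicing never raises on short lines; it just returns what is there.
short = "abc\n"[299:500]
print(repr(short))
```

For lines shorter than 500 characters the slice may include the trailing newline, so applying rstrip('\n') to the line first may be worthwhile.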
