New to Python and trying to learn the ropes of file I/O.
Working with pulling lines from a large (2 million line) file in this format:
56fr4
4543d
4343d
hirh3
I've been reading that readline() is best because it doesn't pull the whole file into memory. But when I try to read the documentation on it, it seems to be Unix only? And I'm on a Mac.
Can I use readline on the Mac without loading the whole file into memory? What would the syntax be to simply read line number 3 in the file? The examples in the docs are a bit over my head.
Edit
Here is the function to return a code:
def getCode(i):
    with open("test.txt") as file:
        for index, line in enumerate(file):
            if index == i:
                code = # what does it equal?
                break
    return code
You don't need readline:
with open("data.txt") as file:
    for line in file:
        pass  # do stuff with line
This will read the entire file line-by-line, but not all at once (so you don't need all the memory). If you want to abort reading the file, because you found the line you want, use break to terminate the loop. If you know the index of the line you want, use this:
with open("data.txt") as file:
    for index, line in enumerate(file):
        if index == 2:  # looking for third line (0-based indexes)
            # do stuff with this line
            break  # no need to go on
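Applied to the getCode function from the question, the same loop might look like the sketch below. The name get_code, the path parameter, stripping the newline, and returning None when the file is too short are all assumptions made for illustration; they are not part of the original question.

```python
def get_code(path, i):
    """Return line i (0-based) of the file at path, or None if the file is too short."""
    with open(path) as f:
        for index, line in enumerate(f):
            if index == i:
                # Strip the trailing newline so only the code itself is returned.
                return line.rstrip("\n")
    # Assumption: a missing line is signalled with None instead of an exception.
    return None
```

Only one line is held in memory at a time, and the loop stops reading as soon as the requested line is found.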
You could also do:
f = open('filepath')
f.readline() # first line - let it pass
f.readline() # second line - let it pass
third_line = f.readline()
f.close()
Related
I want to go to line 34 in a .txt file and read it. How would you do that in Python?
Use the Python standard library's linecache module:
import linecache
line = linecache.getline(thefilename, 34)
should do exactly what you want (linecache numbers lines starting from 1). You don't even need to open the file -- linecache does it all for you!
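A small wrapper can make the two quirks of linecache.getline explicit: line numbers start at 1, and a missing line comes back as an empty string rather than an exception. The helper name get_line and the choice to return None are assumptions for this sketch. Note also that linecache reads and caches the whole file, so it is convenient rather than memory-frugal.

```python
import linecache


def get_line(path, line_number):
    """Return line line_number (1-based) without its newline, or None if it is missing."""
    # getline returns '' (not an exception) when the line or the file does not exist.
    line = linecache.getline(path, line_number)
    return line.rstrip("\n") if line else None
```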
This code will open the file, read the line and print it.
# Open the file and read all of its lines into a list
with open(filename, "r") as f:
    lines = f.readlines()
# Line 34 is at index 33, because list indexes start at 0
x = lines[33]
print(x)
A solution that will not read more of the file than necessary is
from itertools import islice
line_number = 34
with open(filename) as f:
    # Adjust index since Python/islice indexes from 0 and the first
    # line of a file is line 1
    line = next(islice(f, line_number - 1, line_number))
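One thing to watch with the islice version: if the file has fewer lines than requested, next() raises StopIteration. A hedged variant that passes a default to next() is sketched below; the function name nth_line and its default parameter are made up for this example.

```python
from itertools import islice


def nth_line(path, line_number, default=None):
    """Return line line_number (1-based), or default if the file has fewer lines."""
    with open(path) as f:
        # islice skips line_number - 1 lines and then yields at most one line;
        # next() falls back to default when that slice turns out to be empty.
        return next(islice(f, line_number - 1, line_number), default)
```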
A very straightforward solution is
line_number = 34
with open(filename) as f:
    line = f.readlines()[line_number - 1]
There are two ways:
Read the file, line by line, stop when you've gotten to the line you want
Use f.readlines() which will read the entire file into memory, and return it as a list of lines, then extract the 34th item from that list.
Solution 1
Benefit: You only keep, in memory, the specific line you want.
code:
with open(filename) as f:
    for i in range(34):
        line = f.readline()
# when you get here, line will be the 34th line, or an empty string if
# there weren't enough lines in the file
Solution 2
Benefit: Much less code
Downside: Reads the entire file into memory
Problem: Raises IndexError if the list has fewer than 34 elements, so it needs error handling
line = f.readlines()[33]
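The error handling that Solution 2 calls for could be sketched roughly like this. The function name line_or_default and its parameters are assumptions for illustration, and it still reads the whole file into memory.

```python
def line_or_default(path, line_number, default=""):
    """Return line line_number (1-based) via readlines(), or default if it is out of range."""
    if line_number < 1:
        # Guard against 0 or negative numbers, which would otherwise
        # silently index from the end of the list.
        raise ValueError("line_number must be 1 or greater")
    with open(path) as f:
        lines = f.readlines()  # the entire file is loaded here
    try:
        return lines[line_number - 1]
    except IndexError:
        return default
```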
You could just read all the lines and index the line you're after.
line = open('filename').readlines()[33]
for linenum, line in enumerate(open("file")):
    if linenum + 1 == 34:
        print(line.rstrip())
        break  # no need to read the rest of the file
I made a thread about this and didn't receive help, so I took matters into my own hands.
Nothing complicated here.
import linecache
# Import the linecache module to read the line of our choosing
number = int(input("Enter a number from 1-10 for a random quote "))
# Ask the user which line number they would like to read (not necessary)
lines = linecache.getline("Quotes.txt", number)
# Grab the specific line; the variable number can be replaced by any
# integer of your choosing
print(lines)
# This will print the line of your choosing.
If you are running this in Python, make sure you have both files (.py and .txt) in the same location; otherwise Python will not be able to find the text file unless you specify its location, e.g.
linecache.getline("C:/Directory/Folder/Quotes.txt
This is used when the file is in another folder than the .py file you are using.
Hope this helps!
Option that always closes the file and doesn't load the whole file into memory
with open('file.txt') as f:
    for i, line in enumerate(f):
        if i + 1 == 34:
            # print inside the loop so a shorter file prints nothing
            print(line.rstrip())
            break
I wrote this little Python 2.7 prototype script to try and read specified lines (in this example lines 3,4,5) from a formatted input file. I am going to be later parsing data from this and operating on the input to construct other files.
from sys import argv
def comparator(term, inputlist):
    for i in inputlist:
        if term == i:
            return True
    print "fail"
    return False
readthese = [3, 4, 5]
for filename in argv[1:]:
    with open(filename) as file:
        for line in file:
            linenum = # some kind of way to get line number from file
            if comparator(linenum, readthese):
                print(line)
I fixed all the errors I had found in the script, but currently I don't see any way to get a line number from file. It's a bit different from pulling the line number from a file object, since file is a class, not an object, if I'm not mistaken. Is there some way I can pull the line number for my input file?
I think a lot of my confusion probably stems from what I did with my with statement so if someone could also explain what exactly I have done with that line that would be great.
You could just enumerate the file object since enumerate works with anything iterable...
for line_number, line in enumerate(file):
    if comparator(line_number, readthese):
        print(line)
Note, this indexes starting at 0 -- If you want the first line to be 1, just tell enumerate that's where you want to start:
for line_number, line in enumerate(file, 1):
    ...
Note, I'd recommend not using the name file -- on Python 2.x, file is a type, so you're effectively shadowing a builtin (albeit a rarely used one...).
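Putting those notes together, the script's inner loop could be sketched as a small function. The name select_lines is hypothetical, and the comparator helper is swapped for a plain membership test on a set, which is a substitution rather than what the question's code does. On the with statement: it binds the opened file object to the name after as and closes the file automatically when the block exits, even if an error occurs inside it.

```python
def select_lines(path, wanted):
    """Return the lines of path whose 1-based line numbers are in wanted."""
    wanted = set(wanted)  # set membership replaces the comparator loop
    selected = []
    # infile is the open file object; it is closed when the with block ends.
    with open(path) as infile:
        for line_number, line in enumerate(infile, 1):
            if line_number in wanted:
                selected.append(line.rstrip("\n"))
    return selected
```

For the original script this would be called once per filename in argv[1:], with wanted set to [3, 4, 5].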
You could also use the list structure's index itself like so:
with open('a_file.txt', 'r') as f:
    lines = f.readlines()
readthese = [3, 4, 5]
for lineno in readthese:
    print(lines[lineno - 1])
Since the list of lines already implicitly contains the line numbers (line number = index + 1).
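If the file is too large to hold in memory you could also use:
readthese = [3, 4, 5]
with open('a_file.txt', 'r') as f:
    # readline() takes a size limit, not a line number, so count lines with enumerate
    for lineno, line in enumerate(f, 1):
        if lineno in readthese:
            print(line)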
I'm trying to read the rest of a file after finding a word.
I'm trying to write a program that searches for a word in a file and then, once the word is found, does something with the remaining lines below it.
Here's what I have so far, but it's not working. Please assist. Thanks.
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        nextline = file.readlines()
        for line in nextline:
            print(line)
In Python 2, you can't mix iteration (which basically calls the file.next method in the loop) and readlines.
To quote a great man (and file.next documentation):
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right
You can do fine with using just iteration:
def readFile():
    with open("file.txt", "r") as file:
        for line in file:
            if "Hello" in line:
                break
        for line in file:
            pass  # do something with the line
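As a self-contained sketch of the same two-loop idea, the function below collects the lines that follow the first match. The name lines_after, its parameters, and returning an empty list when the word never appears are assumptions for illustration.

```python
def lines_after(path, word):
    """Return the lines that follow the first line containing word."""
    with open(path) as f:
        for line in f:
            if word in line:
                break
        else:
            # The loop ended without break, so word never appeared.
            return []
        # The same iterator resumes right after the matching line.
        return [line.rstrip("\n") for line in f]
```

Both loops share one file iterator, so no read method is mixed with iteration.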
When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
    print line
# this works fine
However, when I run the following for a second time:
for line in f:
    print line
# this does nothing
This does not work because the first loop already moved the file position to the end of the file. You need to rewind (using f.seek(0)) or re-open your file.
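A minimal sketch of the rewind, wrapped in a hypothetical function read_twice so the three passes can be compared side by side:

```python
def read_twice(path):
    """Iterate over a file, show it is exhausted, then rewind and iterate again."""
    with open(path) as f:
        first = list(f)      # the first pass consumes every line
        exhausted = list(f)  # nothing is left, so this is an empty list
        f.seek(0)            # move the file position back to the start
        second = list(f)     # the second pass sees every line again
    return first, exhausted, second
```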
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which will deal with source files coming from platforms with different line endings for you. (This applies to Python 2; Python 3 text mode handles universal newlines by default, and the 'U' flag was removed in Python 3.11.)
Use with when working with file objects, since it will clean up the handles for you even in the case of errors. Ex:
with open(r"H:\Test\test.csv", "rU") as f:
    for line in f:
        ...
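For the csv module pointer, a rough sketch assuming Python 3 is below. The function name read_rows is made up; newline="" is how the csv documentation says to open files that are handed to csv.reader.

```python
import csv


def read_rows(path):
    """Return every row of a CSV file as a list of lists of strings."""
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(path, newline="") as f:
        return list(csv.reader(f))
```

Because the rows end up in a list, they can be looped over as many times as needed without rewinding the file.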
You can read the data from the file into a variable, and then iterate over that data as many times as you want in your script. This is better than seeking back and forth.
with open(r"H:\Test\test.csv", "rb") as f:
    data = f.readlines()
for line in data:
    print line
for line in data:
    print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.