How can I make Python program read line in file - python

I have two files, passwd and dictionary. passwd is a test file with one word, while dictionary holds a few lines of words. My program so far reads and compares only the first line of the dictionary file. For example, my dictionary file contains (egg, fish, red, blue) and my passwd file contains only (egg).
The program runs just fine, but once I move the word egg in the dictionary file to, let's say, the last position in the list, the program won't read it and won't pull up results.
My code is below.
#!/usr/bin/passwd
import crypt
def testPass(line):
    e = crypt.crypt(line,"HX")
    print e
def main():
    dictionary = open('dictionary', 'r')
    password = open('passwd', 'r')
    for line in dictionary:
        for line2 in password:
            if line == line2:
                testPass(line2)
    dictionary.close()
    password.close()
main()

If you do
for line in file_obj:
    ....
you are implicitly consuming the file through its iterator, which advances the file pointer with each line. This means that after the inner loop has finished for the first time, it will not run again, because there are no more lines to read.
One possible solution is to keep one file (preferably the smaller one) in memory using readlines. This way, you can iterate over it for each line you read from the other file.
file_as_list = file_obj.readlines()
for line in file_obj_2:
    for line2 in file_as_list:
        ..

Once your inner loop has run once, it will have reached the end of the password file. When the outer loop hits its second iteration, there's nothing left to read in the password file because you haven't moved the file pointer back to the start of the file.
There are many solutions to the problem. You can use seek to move the file pointer back to the start. Or you can read the whole password file once and save the data in a list. Or you can reopen the file on every iteration of the outer loop. Which choice is best depends on the nature of the data (how many lines there are, whether they are on a slow network share or a fast local disk) and on your performance requirements.
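As a rough illustration of the "read the password file once" option, here is a minimal Python 3 sketch. The function name `find_matches` and the path parameters are hypothetical, and it stores the words in a set rather than a list, which is a small substitution on my part to make lookups cheap. The `crypt` call from the question is left out so the example stays self-contained.

```python
def find_matches(dictionary_path, passwd_path):
    """Return the dictionary words that also appear in the password file."""
    # Read the (small) password file once and keep its non-empty words in a set.
    with open(passwd_path) as password:
        wanted = {line.strip() for line in password if line.strip()}

    matches = []
    # The dictionary file is streamed line by line, so it is only read once.
    with open(dictionary_path) as dictionary:
        for line in dictionary:
            word = line.strip()
            if word in wanted:
                matches.append(word)
    return matches
```

Because neither file is iterated inside a loop over the other, the position of egg in the dictionary no longer matters.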

Related

How to start reading a file from a particular line in the case of a huge text file as I cannot iterate from line one

This is a matter of reaching the line to start from, and proceeding from there, in the shortest time possible.
I have a huge text file that I'm reading and processing line by line. I currently keep track of the line number that I have parsed so that, in case of a system crash, I know how far I got.
How do I resume reading the file from that point if I don't want to start over from the beginning?
import os
count = 0
all_parsed = os.listdir("urltextdir/")
with open(filename,"r") as readfile :
    for eachurl in readfile:
        if str(count)+".txt" not in all_parsed:
            urltext = getURLText(eachurl)
            with open("urltextdir/"+str(count)+".txt","w") as writefile:
                writefile.write(urltext)
            result = processUrlText(urltext)
            saveinDB(result)
        count += 1 # advance the line counter for the next URL
This is what I'm currently doing, but when it crashes at a million lines, I have to go through all those lines in the file again to reach the point I want to start from. My other alternative is to use readlines and load the entire file into memory.
Is there an alternative that I can consider?
Unfortunately, a line number isn't really a basic position for file objects, and the seek/tell functions don't cooperate with next, which your for loop calls: in Python 3, tell() raises an error while a text file is being iterated, and in Python 2 the read-ahead buffer makes its result misleading. You can't jump to a line, but you can jump to a byte position. So one way would be:
line = readfile.readline() #Must use `readline`, not a for loop!
while line:
    lastell = readfile.tell() #Location of the imaginary cursor in the file after reading the line
    print(lastell)
    print(line) #Do with line what you would normally do
    line = readfile.readline() #Advance to the next line
Now you can easily jump back with
readfile.seek(lastell) #You need to keep the last lastell
You would need to keep saving lastell to a file, or printing it, so that on restart you know which byte to start at.
Unfortunately, you can't derive the position from the files you have written, because any change in the number of characters would throw off a count based on them.
Here is one full implementation. Create a file called tell and put 0 inside of it, and then you can run:
with open('tell','r+') as tfd:
    with open('abcdefg') as fd:
        fd.seek(int(tfd.readline())) #Get last position
        line = fd.readline() #Init loop
        while line:
            print(line.strip(),fd.tell()) #Action on line
            tfd.seek(0) #Rewind and
            tfd.write(str(fd.tell())) #overwrite with new position only if successful
            line = fd.readline() #Advance loop
You can, of course, check whether such a file exists and create it in the program.
As @Edwin pointed out in the comments, you may want to call flush() and os.fsync(f.fileno()) (import os if that isn't clear) on each file f you write to, to make sure that after every write your file contents are actually on disk. This applies to both write operations you are doing, the tell file being the quicker of the two, of course. This may slow things down considerably, so if you are satisfied with the synchronicity as is, do not use it, or only flush tfd. You can also specify the buffer size when calling open so that Python flushes automatically more often, as detailed in https://stackoverflow.com/a/3168436/6881240.
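The checkpoint idea, including the flush and fsync step, could be packaged roughly as follows. This is only a sketch under assumptions: the names `process_from_checkpoint`, `save_checkpoint` and `handle_line` are hypothetical, the data file is assumed to be UTF-8, and writing the offset to a temporary file that is then swapped in with `os.replace` is my own variation on the answer's in-place overwrite of the tell file.

```python
import os


def save_checkpoint(checkpoint_path, offset):
    """Write the offset to a temp file, sync it, then swap it into place."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as tmp:
        tmp.write(str(offset))
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the offset is really on disk
    os.replace(tmp_path, checkpoint_path)


def process_from_checkpoint(data_path, checkpoint_path, handle_line):
    """Process data_path line by line, resuming from the saved byte offset."""
    # Start at byte 0 if no checkpoint has been written yet.
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as ckpt:
            offset = int(ckpt.read().strip() or 0)

    # Binary mode keeps tell() and seek() as plain byte offsets.
    with open(data_path, "rb") as data:
        data.seek(offset)
        line = data.readline()
        while line:
            handle_line(line.decode("utf-8"))
            # Only record the new position once the line was handled successfully.
            save_checkpoint(checkpoint_path, data.tell())
            line = data.readline()
```

If `handle_line` raises, the checkpoint still points at the start of the failed line, so the next run retries that line instead of skipping it.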
If I got it right, you could make a simple log file to store the count in.
Still, I would recommend using many smaller files, or storing every line or paragraph in a database such as SQL or MongoDB.
I guess it depends on what system your script is running on, and what resources (such as memory) you have available.
But, as the popular saying goes, "memory is cheap", so you can simply read the file into memory.
As a test, I created a file with 2 million lines, each line 1024 characters long, with the following code:
ms = 'a' * 1024
with open('c:\\test\\2G.txt', 'w') as out:
    for _ in range(0, 2000000):
        out.write(ms+'\n')
This resulted in a 2 GB file on disk.
I then read the file into a list in memory, like so:
with open('c:\\test\\2G.txt', 'r') as f:
    my_file_as_list = f.readlines()
I checked the Python process, and it used a little over 2 GB of memory (on a 32 GB system).
Access to the data was very fast, and can be done with list slicing.
You need to keep track of the list index; when your system crashes, you can start from that index again.
But more important: if your system is "crashing", then you need to find out why it is crashing. Surely a couple of million lines of data is no reason to crash these days.

Remove lines in a text file after processing them in a loop

I have a simple program that processes some lines in a text file (it adds some text to them) and then saves them to another file. I would like to know whether you can remove a line after it has been processed in the loop. Here is an example of how my program works:
datafile = open("data.txt", "a+")
donefile = open("done.txt", "a+")
for i in datafile:
    #My program goes in here
    donefile.write(processeddata)
#end of loop
datafile.close()
donefile.close()
As you can see, it just processes some lines from a file (separated by a newline). Is there a way to remove the line in the end of the loop so that when the program is closed it can continue where it left off?
Just so that I get the question right: you'd like to remove the line from datafile once you've processed it and stored it in donefile?
There is no need to do this, and it's also pretty risky to write to a file that you are reading from.
Instead, why not delete the donefile after you exit the loop (i.e. after you close your files)?
The file iterator is a lazy iterator. So when you do for i in datafile, it loads one line into memory at a time, so you are only working with that one line, and memory constraints shouldn't be a concern.
Lastly, to access files, please consider using the with statement. It closes the file handles even when an exception occurs, which makes your program more robust.
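One way to "continue where it left off" without deleting anything from the source file is to count how many lines the output file already holds and skip that many input lines. The sketch below is not from the answer above; it assumes every input line produces exactly one output line, and the names `resume_processing` and `process` are hypothetical.

```python
import os
from itertools import islice


def resume_processing(data_path, done_path, process):
    """Append process(line) to done_path, skipping lines handled earlier."""
    # Lines already written to the output tell us how far a previous run got.
    already_done = 0
    if os.path.exists(done_path):
        with open(done_path) as done:
            already_done = sum(1 for _ in done)

    with open(data_path) as data, open(done_path, "a") as done:
        # islice skips the finished lines without loading the file into memory.
        for line in islice(data, already_done, None):
            done.write(process(line.rstrip("\n")) + "\n")
            done.flush()  # keep the output file close to the real progress
```

The source file is only ever read, which avoids the risk of writing to the file that is being iterated.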

Creating a program which counts words number in a row of a text file (Python)

I am trying to create a program that takes an input file, counts the number of words in each row, and writes that number as a string to an output file. I managed to develop this code:
in_file = "our_input.txt"
out_file = "output.txt"
f=open(in_file)
g=open(out_file,"w")
for line in f:
    if line == "\n":
        g.write("0\n")
    else:
        g.write(str(line.count(" ")+1)+"\n")
Now, this works well, but the problem is that it works only for a certain number of lines. If my input file has 8000 lines, it will display only the first 6800. If it has 6000, then again only part of them will be displayed (all numbers are rounded).
I tried creating another program, which splits each line to a list, and then counting the length of it, but the problem remains just the same.
Any idea what could cause this?
You need to close each file after you're done with it. The safest way to do this is by using the with statement:
with open(in_file) as f, open(out_file,"w") as g:
    for line in f:
        if line == "\n":
            g.write("0\n")
        else:
            g.write(str(line.count(" ")+1)+"\n")
When reaching the end of a with block, all files you opened in the with line will be closed.
The reason for the behavior you see is that for performance reasons, reading and writing to/from files is buffered. Because of the way hard drives are constructed, data is read/written in blocks rather than in individual bytes - so even if you attempt to read/write a single byte, you have to read/write an entire block. Therefore, most programming languages' built-in file IO functions actually read (at least) one block at a time into memory and feed you data from that in-memory block until it needs to read another block. Similarly, writing is performed by actually writing into a memory block first, and only writing the block to disk when it is full. If you don't close the file writer, whatever is in the last in-memory block won't be written.
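The buffering effect can be observed directly. The following sketch assumes CPython with its default buffering and uses a throwaway temporary file; it writes a few characters, checks the file size before and after an explicit flush, and only then closes the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

out = open(path, "w")
out.write("abc")                            # lands in the in-memory buffer
size_before_flush = os.path.getsize(path)   # size on disk while the data is buffered

out.flush()                                 # push the buffer to the operating system
size_after_flush = os.path.getsize(path)

out.close()                                 # close() flushes as well
print(size_before_flush, size_after_flush)
```

A program that exits without closing or flushing its output file can lose exactly this kind of pending data, which matches the missing lines described in the question.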

edit a file line by line interactively from user input in python

I want to know how to edit a file on the fly, row by row, in Python.
For example, I have a text file where I usually have:
key value
key value
key value
key value
key value
...
The pairs are not necessarily the same on each line; that is just how I wrote the example.
I would like to show the key and value line by line (on my terminal) and then do one of these two things:
-just press enter (or some other hot-key) to go ahead and read (show) the next line.
-enter a new value, then hit enter. This will replace the value that was being shown in the file, and then go ahead and show the next key/value pair.
This continues until the end of the file, or possibly until I type 'quit' or some other keyword; it doesn't matter which.
-Being able to go back to the previous row would be a plus (in case of accidentally going to the next row), but it's not too important for now.
I often find myself editing huge files in a very tedious and repetitive way, and text editors are really frustrating, with their cursors going everywhere when I press the arrow keys. Having to use backspace to delete is also annoying.
I know how to read a file and how to write a file in Python, but not in such an interactive way. I only know how to write the whole file at once. Plus, I don't know whether it is safe to open the same file for both reading and writing. I also know how to manipulate each line, split the text into a list of values, and so on. All I really need is to understand how to modify the file at the exact current line and how to handle this type of interaction well.
What is the best way to do this?
All the answers focus on loading the contents of the file in memory, modifying and then on close saving all on disk, so I thought I'd give it a try:
import os
sep = " "
with open("inline-t.txt", "rb+") as fd:
    seekpos = fd.tell()  # byte offset where the current line starts
    line = fd.readline()
    while line:
        print line
        next = raw_input(">>> ")
        if next == ":q":
            break
        if next:
            values = line.split(sep)
            newval = values[0] + sep + next + '\n'
            if len(newval) == len(line):
                # Same length: overwrite the line in place.
                fd.seek(seekpos)
                fd.write(newval)
                fd.flush()
                os.fsync(fd)
            else:
                # Different length: rewrite the rest of the file after the new line.
                remaining = fd.read()
                fd.seek(seekpos)
                fd.write(newval + remaining)
                fd.truncate()  # drop leftover bytes when the file got shorter
                fd.flush()
                os.fsync(fd)
                fd.seek(seekpos)
                line = fd.readline()
        seekpos = fd.tell()
        line = fd.readline()
The script simply opens the file, reads it line by line, and rewrites a line if the user enters a new value. If the length of the new data matches the previous data, seek and write are enough. If the new data has a different size, we need to clean up after ourselves, so the remainder of the file is read, appended to the new data, and everything is rewritten to disk. fd.flush and os.fsync(fd) guarantee that changes are indeed available in the file as soon as they are written out. It is not the best solution performance-wise, but I believe it is closer to what the asker wanted.
Also, consider that there might be a few quirks in this code, and I'm sure there's room for optimization: perhaps one global read at the beginning, to avoid multiple whole-file reads if changes that need adjusting are made often, or something like that.
The way I would go about this is to load all the lines of the text file into a list, and then iterate through that list, changing its values as you go along. Then at the very end (when you get to the last line, or whenever you want), you write that whole list out to a file with the same name, so that it overwrites the old file.
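The load-edit-rewrite approach could look roughly like this in Python 3. It is a sketch, not the answerer's code: the function name `edit_pairs` and its `ask` parameter are hypothetical (`ask` defaults to the built-in `input` and exists mainly so the prompt can be replaced in tests), and `:q` is borrowed from the in-place script as the quit keyword.

```python
def edit_pairs(path, ask=input, sep=" "):
    """Show each 'key value' line and replace the value if the user types one."""
    with open(path) as fd:
        lines = fd.read().splitlines()

    for i, line in enumerate(lines):
        answer = ask(line + " >>> ")
        if answer == ":q":      # stop editing, keep the remaining lines as they are
            break
        if answer:              # empty input leaves the line unchanged
            key = line.split(sep, 1)[0]
            lines[i] = key + sep + answer

    # Write everything back in one go, overwriting the old file.
    with open(path, "w") as fd:
        fd.write("".join(item + "\n" for item in lines))
```

Since the file is only written once at the end, a crash in the middle of a session loses the edits made so far, which is the trade-off against the in-place version.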

Why doesn't this simple search work?

I am simply iterating through an external file (which contains a phrase) and want to see if a line exists that has the word 'Dad' in it. If I find it, I want to replace it with 'Mum'. Here is the program I've built, but I am not sure why it isn't working.
message_file = open('test.txt','w')
message_file.write('Where\n')
message_file.write('is\n')
message_file.write('Dad\n')
message_file.close()
message_temp_file = open('testTEMP.txt','w')
message_file = open('test.txt','r')
print(message_file.read())
for line in message_file:
    if line == 'Dad': # look for the word
        message_temp_file.write('Mum') # replace it with mum in temp file
    else:
        message_temp_file.write(line) # else, just write the word
message_file.close()
message_temp_file.close()
import os
os.remove('test.txt')
os.rename('testTEMP.txt','test.txt')
This should be so simple, and it's annoying me! Thanks.
You don't have any lines that are "Dad". You have a line that is "Dad\n", but no "Dad". In addition, since you've called message_file.read(), the cursor is at the end of your file, so for line in message_file will hit StopIteration immediately and the loop body never runs. You should call message_file.seek(0) just before your for loop.
print(message_file.read())
message_file.seek(0)
for line in message_file:
    if line.strip() == "Dad":
        ...
That should put the cursor back at the beginning of the file, and strip out the newline and get you what you need.
Note that this exercise is a great example of how not to do things in general! The better implementation would have been:
in_ = message_file.read()
out = in_.replace("Dad","Mum")
message_temp_file.write(out)
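Put together as a runnable Python 3 function, the read-replace-write idea might look like this. The name `replace_in_file` is hypothetical, and using `os.replace` to swap the temporary file in is my substitution for the question's `os.remove` plus `os.rename` pair. Note that `str.replace` also changes "Dad" when it appears inside a longer word.

```python
import os


def replace_in_file(path, old, new):
    """Rewrite path with every occurrence of old replaced by new."""
    with open(path) as src:
        text = src.read()

    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as dst:
        dst.write(text.replace(old, new))

    # os.replace swaps the temp file in, even when the target already exists.
    os.replace(tmp_path, path)
```

For the file from the question, `replace_in_file('test.txt', 'Dad', 'Mum')` would cover the whole loop.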
print(message_file.read())
Here you have already read the whole file, so nothing is left for the for loop to check.
A file object always remembers where it stopped reading or writing the last time you accessed it.
So if you call print(message_file.readline()), the first line of the file is read and printed. The next time you call the same command, the second line is read and printed, and so on until you reach the end of the file. By using print(message_file.read()) you have read the whole file, and any further call of read or readline will give you nothing.
You can get the current position with message_file.tell() and set it to a certain value with message_file.seek(value), or simply reopen the file.
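A small Python 3 experiment shows the effect. It recreates the three-line file from the question in a temporary directory (the path is a throwaway), reads it twice, then rewinds with `seek(0)` before iterating.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("Where\nis\nDad\n")

with open(path) as message_file:
    first_read = message_file.read()     # consumes the whole file
    second_read = message_file.read()    # nothing left to read
    message_file.seek(0)                 # rewind to the start of the file
    lines_after_seek = [line.strip() for line in message_file]

print(repr(second_read), lines_after_seek)
```

Without the `seek(0)` call, the list built from the loop would be empty, which is exactly why the original for loop never saw the 'Dad' line.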
The problem most likely is due to the fact that your conditional will only match the string "Dad", when the string is actually "Dad\n". You could either update your conditional to:
if line == "Dad\n":
OR
if "Dad" in line:
Lastly, you also read the entire file when you call print(message_file.read()). You either need to remove that line, or you need to put a call to message_file.seek(0) in order for the loop that follows to actually do anything.
