Python, checking data file for certain lines - python

I've never taken a class that used python, just c, c++, c#, java, etc..
This should be easy but I'm feeling like I'm missing something huge that python reacts to.
All I'm doing is reading in a file, checking for lines that are only digits, counting how many lines like that and displaying it.
So I'm opening, reading, stripping, checking isdigit(), and incrementing. What's wrong?
# variables
sum = 0
switch = "run"
print( "Reading data.txt and counting..." )
# open the file
file = open( 'data.txt', 'r' )
# run through file, stripping lines and checking for numerics, incrementing sum when needed
while ( switch == "run" ):
    line = file.readline()
    line = line.strip()
    if ( line.isdigit() ):
        sum += 1
    if ( line == "" ):
        print( "End of file\ndata.txt contains %s lines of digits" %(sum) )
        switch = "stop"

The correct way in Python to tell if you've reached the end of a file is not to see if it returns an empty line.
Instead, iterate over all the lines in the file, and the loop will end when the end of the file is reached.
num_digits = 0
with open("data.txt") as f:
    for line in f:
        if line.strip().isdigit():
            num_digits += 1
Because files can be iterated over, you can simplify this using a generator expression:
with open("data.txt") as f:
    num_digits = sum( 1 for line in f if line.strip().isdigit() )
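I would also recommend against using the names of Python built-ins such as sum as variable names, since that shadows the built-in (the generator expression above needs the real sum). Using string comparisons for flow control is also unnecessary; a boolean flag or a break statement is clearer.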
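To see why testing the stripped line against an empty string is fragile, here is a minimal sketch. It assumes an in-memory file built with io.StringIO in place of data.txt, and both function names are made up for illustration. readline() returns '' only at end of file, but a blank line also becomes '' after strip(), so a readline-based loop like the one in the question stops at the first blank line.

```python
import io

SAMPLE = "1\n\n542\nabc\n"  # hypothetical contents; note the blank second line


def count_until_blank(fobj):
    """Mimic the question's loop: stop when the stripped line is empty."""
    count = 0
    while True:
        line = fobj.readline().strip()
        if line == "":          # true at end of file AND on any blank line
            return count
        if line.isdigit():
            count += 1


def count_all(fobj):
    """Iterate over the file object; the loop ends only at end of file."""
    return sum(1 for line in fobj if line.strip().isdigit())


naive = count_until_blank(io.StringIO(SAMPLE))
correct = count_all(io.StringIO(SAMPLE))
print(naive, correct)
```

With this sample the first function stops early and misses 542, which may be one reason a file with blank lines gives an unexpectedly low count.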

sum=0
f=open("file")
for line in f:
    if line.strip().isdigit():
        sum+=1
f.close()

I just tried running your code:
matti@konata:~/tmp$ cat data.txt
1
a
542
dfd
b
42
matti@konata:~/tmp$ python johnredyns.py
Reading data.txt and counting...
End of file
data.txt contains 3 lines of digits
It works fine here. What's in your data.txt?

As several people have said, your code appears to work perfectly. Perhaps your "data.txt" file is in a different directory than your current working directory (not necessarily the directory that your script is in)?
However, here's a more "pythonic" way of doing the same thing:
counter = 0
with open('data.txt', 'r') as infile:
    for line in infile:
        if line.strip().isdigit():
            counter += 1
print('There are a total of {0} lines consisting only of digits'.format(counter))
You could even make it a one-liner with:
counter = sum([line.strip().isdigit() for line in open('data.txt', 'r')])
I'd avoid that route at first though... It's much less readable!
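To check the working-directory possibility raised in this answer, a small sketch like the following may help. The helper name describe_path is made up for illustration; it only reports where a relative name such as data.txt would be looked up and whether a file exists there.

```python
from pathlib import Path


def describe_path(name):
    """Return the working directory, the absolute path `name` resolves to,
    and whether a regular file exists at that path."""
    cwd = Path.cwd()
    target = (cwd / name).resolve()
    return cwd, target, target.is_file()


cwd, target, found = describe_path("data.txt")
print("Working directory:", cwd)
print("data.txt resolves to:", target)
print("File exists:", found)
```

If the last line prints False, open('data.txt') is looking in a different directory than the one that holds the file.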

How are you running the program? Are you sure data.txt has data? Is there an empty line in the file?
try this:
while 1:
    line = file.readline()
    if not line: break
    line = line.strip()
    if ( line.isdigit() ):
        sum += 1
print( "End of file\ndata.txt contains %s lines of digits" %(sum) )

Related

Reading CSV file with python

import os
import time
filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while 1:
    time.sleep(1)
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print (var3)
I have this Python code, which reads values from a CSV file every time an external program writes a new line to it. My problem is that the CSV file is periodically rewritten completely, and then Python stops reading the new lines. My guess is that Python is stuck at some position, and the rewritten file can have maybe 50 lines more or fewer. So, for example, Python is waiting for a new line at line 70 and the new line arrives at line 95. I think the solution is to refresh mycsv.seek(0, os.SEEK_END), but I'm not sure how to do that.
What you want to do is difficult to accomplish without rewinding the file every time to make sure that you are truly on the last line. If you know approximately how many characters there are on each line, then there is a shortcut you could take using mycsv.seek(-end_buf, os.SEEK_END), as outlined in this answer. So your code could look something like this:
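import os
import time
avg_len = 50 # use an appropriate number here
end_buf = 3 * avg_len // 2 # integer byte offset (the file must be at least this long)
filename = 'NTS.csv'
# binary mode: Python 3 text files do not allow a nonzero end-relative seek
mycsv = open(filename, 'rb')
mycsv.seek(-end_buf, os.SEEK_END)
last = mycsv.readlines()[-1]
while 1:
    time.sleep(1)
    mycsv.seek(-end_buf, os.SEEK_END)
    line = mycsv.readlines()[-1]
    if not line == last:
        last = line # remember it so each new line is printed only once
        arr_line = line.decode().split(',')
        var3 = arr_line[3]
        print (var3)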
Here, in each iteration of the while loop, you seek to a position close to the end of the file, just far back enough that you know for sure the last line will be contained in what remains. Then you read in all the remaining lines (this will probably include a partial amount of the second or third to last lines) and check if the last line of these is different to what you had before.
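Another way to cope with a file that is periodically rewritten is to remember the read position and start over whenever the file has become shorter than that position. The following is only a sketch under assumptions: the helper name read_new_lines is made up, the file is assumed to be opened in binary mode so that positions are plain byte offsets, and a rewrite is assumed to truncate the same file rather than replace it with a new one.

```python
import os


def read_new_lines(fobj, position):
    """Return (lines, new_position) for data that appeared after `position`.

    If the file is now shorter than `position`, assume it was rewritten
    and read it again from the beginning.
    """
    size = os.fstat(fobj.fileno()).st_size
    if size < position:
        position = 0          # file shrank: treat it as a fresh file
    fobj.seek(position)
    lines = fobj.readlines()  # bytes objects, because of binary mode
    return lines, fobj.tell()
```

A polling loop would open the file with open('NTS.csv', 'rb', buffering=0), call read_new_lines(mycsv, position) after each time.sleep(1), and decode and split each returned line. A rewrite that leaves the file at least as long as the saved position goes undetected by this size check, so it is a heuristic rather than a guarantee.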
There is a simpler way of reading lines in your program. Instead of trying to use seek to get what you need, try calling readlines on the file object mycsv.
You can do the following:
mycsv = open('NTS.csv', 'r')
csv_lines = mycsv.readlines()
for line in csv_lines:
    arr_line = line.split(',')
    var3 = arr_line[3]
    print(var3)

Python 3.4.3: Iterating over each line and each character in each line in a text file

I have to write a program that iterates over each line in a text file and then over each character in each line in order to count the number of entries in each line.
Here is a segment of the text file:
N00000031,B,,D,D,C,B,D,A,A,C,D,C,A,B,A,C,B,C,A,C,C,A,B,D,D,D,B,A,B,A,C,B,,,C,A,A,B,D,D
N00000032,B,A,D,D,C,B,D,A,C,C,D,,A,A,A,C,B,D,A,C,,A,B,D,D
N00000033,B,A,D,D,C,,D,A,C,B,D,B,A,B,C,C,C,D,A,C,A,,B,D,D
N00000034,B,,D,,C,B,A,A,C,C,D,B,A,,A,C,B,A,B,C,A,,B,D,D
The first and last lines are "unusable lines" because they contain the wrong number of entries (more or fewer than 25). I would like to count the number of unusable lines in the file.
Here is my code:
for line in file:
    answers=line.split(",")
    i=0
    for i in answers:
        i+=1
unusable_line=0
for line in file:
    if i!=26:
        unusable_line+=1
print("Unusable lines in the file:", unusable_line)
I tried using this method as well:
alldata=file.read()
for line in file:
    student=alldata.split("\n")
    answer=student.split(",")
My problem is each variable I create doesn't exist when I try to run the program. I get a "students" is not defined error.
I know my coding is awful but I'm a beginner. Sorry!!! Thank you and any help at all is appreciated!!!
Here is a simplified version of your approach, using a list, len() and an if condition.
Code:
unusable_line = 0
for line in file:
    answers = line.strip().split(",")
    if len(answers) != 26:
        unusable_line += 1
print("Unusable lines in the file:", unusable_line)
Notes:
First I create a variable, unusable_line, to store the count of unusable lines.
Then I iterate over the lines of the file object.
Then I split each line at , to create a list.
Then I check whether the length of the list differs from 26 (the ID plus 25 answers). If so, I increment the unusable_line variable.
Finally I print it.
You could use something like this and wrap it into a function. You don't need to re-iterate over the items in the line: str.split() returns a list that has your elements in it, and you can count its elements with len().
my_file = open('temp.txt', 'r')
lines_count = usable = unusable = 0
for line in my_file:
    lines_count+=1
    if len(line.split(',')) == 26:
        usable+=1
    else:
        unusable+=1
my_file.close()
print("Processed %d lines, %d usable and %d unusable" % (lines_count, usable, unusable))
You can do it much shorter:
with open('my_file.txt') as fobj:
    unusable = sum(1 for line in fobj if len(line.split(',')) != 26)
The line with open('my_file.txt') as fobj: opens the file for reading and closes it automatically after leaving the indented block. I use a generator expression to go through all lines and count those whose number of fields differs from 26.
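If it is useful to know whether a line is unusable because it has too few or too many entries, the same len() check can be split into two counters. This is a sketch only: the function name count_unusable and the sample data are made up, and the default of 26 fields is taken from the question's own check.

```python
def count_unusable(lines, expected_fields=26):
    """Return (too_few, too_many) counts for comma-separated lines."""
    too_few = too_many = 0
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < expected_fields:
            too_few += 1
        elif len(fields) > expected_fields:
            too_many += 1
    return too_few, too_many


# Tiny made-up example with an expected width of 3 fields per line.
sample = ["id1,A,B\n", "id2,A\n", "id3,A,B,C\n", "id4,,B\n"]
print(count_unusable(sample, expected_fields=3))
```

Empty entries such as the one in id4 still count as fields, because split(",") keeps empty strings between consecutive commas.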

Printing specific lines txt file python

I have a text file I wish to analyze. I'm trying to find every line that contains certain characters (ex: "#") and then print the line located 3 lines before it (ex: if line 5 contains "#", I would like to print line 2)
This is what I got so far:
file = open('new_file.txt', 'r')
a = list()
x = 0
for line in file:
    x = x + 1
    if '#' in line:
        a.append(x)
        continue
x = 0
for index, item in enumerate(a):
    for line in file:
        x = x + 1
        d = a[index]
        if x == d - 3:
            print line
            continue
It won't work (it prints nothing when I feed it a file that has lines containing "#"), any ideas?
First, you are going through the file multiple times without re-opening it for subsequent times. That means all subsequent attempts to iterate the file will terminate immediately without reading anything.
Second, your indexing logic is a little convoluted. Assuming your files are not huge relative to your memory size, it is much easier to simply read the whole file into memory (as a list) and manipulate it there.
myfile = open('new_file.txt', 'r')
a = myfile.readlines()
for index, item in enumerate(a):
    if '#' in item and index - 3 >= 0:
        print(a[index - 3].strip())
This has been tested on the following input:
PrintMe
PrintMe As Well
Foo
#Foo
Bar#
hello world will print
null
null
##
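The exhaustion behaviour described in this answer can be seen directly in a short sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made so the example runs without new_file.txt): a second loop over the same file object yields nothing until the position is rewound with seek(0).

```python
import io

fobj = io.StringIO("a\n#b\nc\n")  # stand-in for an open text file

first_pass = [line for line in fobj]
second_pass = [line for line in fobj]   # iterator is already exhausted

fobj.seek(0)                            # rewind to the start of the file
third_pass = [line for line in fobj]

print(len(first_pass), len(second_pass), len(third_pass))
```

So the question's second loop could be made to run by calling file.seek(0) before it, although reading the lines into a list once is usually simpler.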
Ok, the issue is that you have already iterated completely through the file object file in line 4 when you try again in line 11. So line 11 will be an empty loop. Maybe it would be a better idea to iterate over the file only once and remember the last few lines...
file = open('new_file.txt', 'r')
a = ["","",""]
for line in file:
    if "#" in line:
        print(a[0], end="")
    a.append(line)
    a = a[1:]
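The same "remember the last few lines" idea can also be written with collections.deque, which drops the oldest entry automatically once maxlen is reached. This is a sketch, not the answer's own code: the function name lines_before_matches is made up, and unlike the list version it skips matches that have fewer than three lines before them instead of printing an empty string.

```python
from collections import deque


def lines_before_matches(lines, marker="#", offset=3):
    """Yield the line `offset` lines before each line containing `marker`."""
    recent = deque(maxlen=offset)   # holds at most the previous `offset` lines
    for line in lines:
        if marker in line and len(recent) == offset:
            yield recent[0]
        recent.append(line)


sample = ["PrintMe", "PrintMe As Well", "Foo", "#Foo", "Bar#",
          "hello world will print", "null", "null", "##"]
result = list(lines_before_matches(sample))
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.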
For this kind of file processing, regular expressions are often an efficient way to match patterns, both in programmer time and in runtime. Combined with iterating through the lines of the file, your problem becomes straightforward.
import re
file = open('new_file.txt', 'r')
document = file.read()
lines = document.split("\n")
LinesOfInterest = []
for lineNumber,line in enumerate(lines):
    WhereItsAt = re.search( r'#', line)
    if(lineNumber>2 and WhereItsAt):
        LinesOfInterest.append(lineNumber-3)
print(LinesOfInterest)
for lineNumber in LinesOfInterest:
    print(lines[lineNumber])
LinesOfInterest is now a list of the line numbers to print, that is, the line three before each match.
I used
line1,0
line2,0
line3,0
#
line1,1
line2,1
line3,1
#
line1,2
line2,2
line3,2
#
line1,3
line2,3
line3,3
#
as input yielding
[0, 4, 8, 12]
line1,0
line1,1
line1,2
line1,3

Update iteration value in Python for loop

Pretty new to Python and have been writing up a script to pick out certain lines of a basic log file
Basically the function searches the lines of the file, and when it finds one I want to output to a separate file, it adds that line to a list, then also adds the next five lines following it. The list then gets written to a separate file at the end, in a different function.
What I've been trying to do after that is jump the loop ahead to continue from the last of those five lines, rather than going over them again. I thought the last line in the code would solve the problem, but unfortunately not.
Are there any recommended variations of a for loop I could use for this purpose?
def readSingleDayLogs(aDir):
    print 'Processing files in ' + str(aDir) + '\n'
    lineNumber = 0
    try:
        open_aDirFile = open(aDir) #open the log file
        for aLine in open_aDirFile: #total the num. lines in file
            lineNumber = lineNumber + 1
        lowerBound = 0
        for lineIDX in range(lowerBound, lineNumber):
            currentLine = linecache.getline(aDir, lineIDX)
            if (bunch of logic conditions):
                issueList.append(currentLine)
                for extraLineIDX in range(1, 6): #loop over the next five lines of the error and append to issue list
                    extraLine = linecache.getline(aDir, lineIDX+ extraLineIDX) #get the x extra line after problem line
                    issueList.append(extraLine)
                issueList.append('\n\n')
                lowerBound = lineIDX
You should use a while loop:
line = lowerBound
while line < lineNumber:
    ...
    if conditions:
        ...
        for lineIDX in range(line, line+6):
            ...
        line = line + 6
    else:
        line = line + 1
A for-loop uses an iterator over the range, so changing the loop variable inside the body has no effect on the next iteration.
Consider using a while-loop instead. That way, you can update the line index directly.
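The difference between the two loop forms can be shown with a minimal sketch (the variable names are chosen here for illustration). In the for-loop, the next value always comes from the range iterator, so the assignment to i is simply overwritten; in the while-loop, the index is whatever the body leaves it as.

```python
visited_for = []
for i in range(6):
    visited_for.append(i)
    i += 2            # rebinding i here is overwritten by the next iteration

visited_while = []
i = 0
while i < 6:
    visited_while.append(i)
    i += 3            # this really does skip ahead

print(visited_for)
print(visited_while)
```

This is why setting lowerBound inside the for-loop in the question does not move the loop forward.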
I would look at something like:
from itertools import islice
with open('somefile') as fin:
    line_count = 0
    my_lines = []
    for line in fin:
        line_count += 1
        if some_logic(line):
            my_lines.append(line)
            next_5 = list(islice(fin, 5))
            line_count += len(next_5)
            my_lines.extend(next_5)
This way, by using islice on the input, you're able to move the iterator ahead and resume after the 5 lines (perhaps fewer if near the end of the file) are exhausted.
This is based on if I'm understanding correctly that you can read forward through the file, identify a line, and only want a fixed number of lines after that point, then resume looping as per normal. (You may not even require the line counting if that's all you're after as it only appears to be for the getline and not any other purpose).
If you do indeed want to take the next 5 and still consider the following line, you can use itertools.tee to branch at the point of the faulty line, islice the branch, and let the fin iterator resume on the next line.
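One possible reading of the itertools.tee suggestion is sketched below. The function name collect_with_overlap and the is_issue callback are hypothetical, and a while loop with next() is used instead of a for-loop so that the main iterator can be rebound to one of the tee copies (after calling tee, the original iterator should no longer be used directly).

```python
from itertools import islice, tee


def collect_with_overlap(lines, is_issue, extra=5):
    """Collect each issue line plus the `extra` lines after it, while still
    examining those following lines in the main loop."""
    main = iter(lines)
    collected = []
    while True:
        try:
            line = next(main)
        except StopIteration:
            break
        if is_issue(line):
            # Branch here: `lookahead` runs ahead, `main` resumes on the next line.
            main, lookahead = tee(main)
            collected.append(line)
            collected.extend(islice(lookahead, extra))
    return collected


sample = ["E1", "a", "E2", "b", "c"]
print(collect_with_overlap(sample, lambda s: s.startswith("E"), extra=2))
```

Lines that follow two nearby issue lines are collected twice with this approach, which may or may not be what a log extract should contain.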

getting data out of a txt file

I'm only just beginning my journey into Python. I want to build a little program that will calculate shim sizes for when I do the valve clearances on my motorbike. I will have a file that will have the target clearances, and I will query the user to enter the current shim sizes, and the current clearances. The program will then spit out the target shim size. Looks simple enough, I have built a spread-sheet that does it, but I want to learn python, and this seems like a simple enough project...
Anyway, so far I have this:
def print_target_exhaust(f):
    print f.read()
#current_file = open("clearances.txt")
print print_target_exhaust(open("clearances.txt"))
Now, I've got it reading the whole file, but how do I make it ONLY get the value on, for example, line 4. I've tried print f.readline(4) in the function, but that seems to just spit out the first four characters... What am I doing wrong?
I'm brand new, please be easy on me!
-d
To read all the lines:
lines = f.readlines()
Then, to print line 4:
print lines[4]
Note that indices in Python start at 0, so lines[4] is actually the fifth line in the file; use lines[3] to get the fourth.
with open('myfile') as myfile: # Use a with statement so you don't have to remember to close the file
    for line_number, data in enumerate(myfile): # Use enumerate to get line numbers starting with 0
        if line_number == 3:
            print(data)
            break # stop looping when you've found the line you want
More information:
with statement
enumerate
Not very efficient, but it should show you how it works. Basically it keeps a running counter of the lines it reads. When the counter reaches 4, it prints that line.
## Open the file with read only permit
f = open("clearances.txt", "r")
counter = 0
## Read the first line
line = f.readline()
## If the file is not empty keep reading line one at a time
## till the file is empty
while line:
    counter = counter + 1
    if counter == 4:
        print(line)
    line = f.readline()
f.close()
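For completeness, the argument to readline() is a size limit, which is why f.readline(4) in the question returned only the first four characters. A compact way to fetch a single line by number is itertools.islice, sketched here under assumptions: the helper name read_line is made up, and line numbers are assumed to start at 1.

```python
from itertools import islice


def read_line(path, number):
    """Return line `number` (counting from 1) of the file at `path`,
    or None if the file has fewer lines."""
    with open(path) as fobj:
        # islice skips the first number - 1 lines and yields at most one line.
        return next(islice(fobj, number - 1, number), None)
```

Calling read_line("clearances.txt", 4) would return the fourth line including its trailing newline, so .strip() may be wanted before converting it to a number.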
