Testing each line in a file - python

I am trying to write a Python program that reads each line from an infile. This infile is a list of dates. I want to test each line with a function isValid(), which returns true if the date is valid, and false if it is not. If the date is valid, it is written into an output file. If it is not, invalid is written into the output file. I have the function, and all I want to know is the best way to test each line with the function. I know this should be done with a loop, I'm just uncertain how to set up the loop to test each line in the file one-by-one.
Edit: I now have a program that basically works. However, I am getting incorrect output to the output file. Perhaps someone will be able to explain why.
Ok, I now have a program that basically works, but I'm getting strange results in the output file. Hopefully those with Python 3 experience can help.
def main():
datefile = input("Enter filename: ")
t = open(datefile, "r")
c = t.readlines()
ofile = input("Enter filename: ")
o = open(ofile, "w")
for line in c:
b = line.split("/")
e = b[0]
f = b[1]
g = b[2]
text = str(e) + " " + str(f) + ", " + str(g)
text2 = "The date " + text + " is invalid"
if isValid(e,f,g) == True:
o.write(text)
else:
o.write(text2)
def isValid(m, d, y):
if m == 1 or m == 3 or m == 5 or m == 7 or m == 8 or m == 10 or m == 12:
if d is range(1, 31):
return True
elif m == 2:
if d is range(1,28):
return True
elif m == 4 or m == 6 or m == 9 or m == 11:
if d is range(1,30):
return True
else:
return False
This is the output I'm getting.
The date 5 19, 1998
is invalidThe date 7 21, 1984
is invalidThe date 12 7, 1862
is invalidThe date 13 4, 2000
is invalidThe date 11 40, 1460
is invalidThe date 5 7, 1970
is invalidThe date 8 31, 2001
is invalidThe date 6 26, 1800
is invalidThe date 3 32, 400
is invalidThe date 1 1, 1111
is invalid

In the most recent versions of Python you can use the context management features that are implicit for files:
results = list()
with open(some_file) as f:
for line in f:
if isValid(line, date):
results.append(line)
... or even more tersely with a list comprehension:
with open(some_file) as f:
results = [line for line in f if isValid(line, date)]
For progressively older versions of Python you might need to explicitly open and close the file (with simple implicit iteration over the file for line in file:) or add more explicit iteration over the file (f.readline() or f.readlines() (plural) depending on whether you want to "slurp" in the entire file (with the memory overhead implications of that) or iterate line-by-line).
Also note that you may wish to strip the trailing newlines off these file contents (perhaps by calling line.rstrip('\n') --- or possibly just line.strip() if you want to eliminate all leading and trailing whitespace from each line).
(Edit based on additional comment to previous answer):
The function signature isValid(m,d,y) suggests that you're passing a data to this function (month, day, year) but that doesn't make sense given that you must also, somehow, pass in the data to be validated (a line of text, a string, etc).
To help you further you'll have to provide more information (preferable the source or a relevant portion of the source to this "isValid()" function.
In my initial answer I was assuming that your "isValid()" function was merely scanning for any valid date in its single argument. I've modified my code examples to show how one might pass a specific date, as a single argument, to a function which used this calling signature: "isValid(somedata, some_date)."

with open(fname) as f:
for line in f.readlines():
test(line)

Related

Read only the float in a file

I am working with file handling exercise.
So my txt file have this content:
List of Sales
Day 1 : 1250.25
Day 2 : 2560.25
Day 3 : 3241.10
Day 4 : 1530.20
Day 5 : 1247.27
Day 6 : 1646.22
Day 7 : 850.25
I want to only get the amount per day and sum it.
OFile = open('sales.txt','r')
file_content = OFile.read()
print(file_content)
import re
get = re.findall(r'[.]', file_content)
amount = []
for n in range(7):
amount.append(get)
total = sum(amount)
print("Total sales Amount: ", "Php", total)
I keep getting Total sales Amount 0
keep it simple and use str.split and str.strip instead of using regex!
In your case (with the input file you have attached)
Exception may raised from the conversion to float (if you have
invalid line or some string that can not be converted to float!
Or line that have no ":" (e.g. the first line in the file) which causes
the split() call to return the same input string as a list of one string (the line)
without spaces.In both cases you want to
skip and continue to next line!
total_sum = 0
with open('sales.txt','r') as fp:
for line in fp:
try:
current_float_num = line.strip().split(":")[1]
current_float_num = float(current_float_num)
# do work on float_num
# for example add it to the accumulative total_sum
total_sum += current_float_num
except (IndexError,ValueError):
continue

Convert string (letter) from file text to integer

I'm the new one to machine learning. I got some problem when trying to use int for letters. I use Python 3.5 on Mac OS. This is my code:
def file2matrix(filename):
fr = open(filename)
numberOfLines = len(fr.readlines())
returnMat = zeros((numberOfLines, 3))
classLabelVector = []
fr = open(filename)
index=0
for line in fr.readlines():
line = line.strip()
listFromLine1 = line.split('\t')
listFromLine = zeros(3)
i = 0
for value in listFromLine1:
if value.isdigit():
valueAsInt = int(value)
listFromLine[i] = valueAsInt
i += 1
returnMat[index, :] = listFromLine[0:3]
classLabelVector.append(int(listFromLine1[-1]))
index += 1
return returnMat, classLabelVector
This is my txt file:
23 8 1 f
7 8 5 j
5 9 1 j
6 6 6 f
This is the error:
classLabelVector.append(int(listFromLine1[-1])) ValueError: invalid literal for int() with base 10: 'f'
Can anybody help me with these problems?
If I understand your desired outcome correctly, you want to return a list with n lists in it. Each list will be along the line of [23. 8. 1.]. Then you want a second list that takes the last number of each list like this: [1, 5, 1, 6].
Assuming this is all correct, the reason you are getting classLabelVector.append(int(listFromLine1[-1])) ValueError: invalid literal for int() with base 10: 'f' is because you are not returning any numbers, but a string instead. I found 3 issues that should fix the error.
First, I found no '\t' in your text document. I instead used listFromLine1 = line.split(' ') and it split based on spaces. This could simply be from the way it copied over when you posted, though.
Second, when you assign a value for each position in listFromLine you then ignore it and append from listFromLine1 which you did nothing to, so it remains a string.
Third, try using if value.isnumeric(): instead of if value.isdigit():.
Fixing these few problems should get the program working. Also, you open the file and run fr.readlines() twice and never tell it to close. Your making the program work twice for the same information. You should try to rewrite it to only open once and use with open() as fr: because it will close when done.
EDIT: if you want the second list to be the letters instead [f, j, j, f] then keep it as listFromLine1 and use str() instead of int(): classLabelVector.append(str(listFromLine1[-1]))

Formatting output of CSV data?

I'm fairly new to python and made something that had this output:
(The text is in a csv file so so:
1,A
2,B
3,C etc)
Number Letter
1 A
2 B
3 C
26 Z
Unfortunately, I spent a good amount of time making it using a complicated method in which I manually made spaces like this:
Updated Code rn
fx = int(input('Number?\n'))
f=open('nums.txt','r')
lines=f.readlines()
line = lines[fx - 1]
with open('nums.txt','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
NUM, LTR, SMB = line.rsplit(',', 1)
print(NUM.ljust(13) + LTR.ljust(13) + SMB)
How do I get it to make 3 columns? Right now it comes up with a
ValueError: not enough values to unpack (expected 3, got 2)
So is there a simpler method of achieving this that doesn't move the strings around like this:
Number Letter
1 A
2 B
3 C
26 Z #< string moves with spaces.
For simple alignment, you can use ljust or rjust. There is also no need to read the entire file for each line you want to process:
with open('numberletter','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
number, letter = line.rsplit(',', 1)
print(number.ljust(13) + letter)
For more complex output formatting, look at str.format() and the formatting syntax
You can use sys module for that.
import sys
a=[1,"A"]
sys.stdout.write("%-6s %-50s " % (a[0],a[1]))

How do I print a range of lines after a specific pattern into separate files when this pattern appears several times in an input file

Sorry for my previous post, I had no idea what I was doing. I am trying to cut out certain ranges of lines in a given input file and print that range to a separate file. This input file looks like:
18
generated by VMD
C 1.514895 -3.887949 2.104134
C 2.371076 -2.780954 1.718424
C 3.561071 -3.004933 1.087316
C 4.080424 -4.331872 1.114878
C 3.289761 -5.434047 1.607808
C 2.018473 -5.142150 2.078551
C 3.997237 -6.725186 1.709355
C 5.235126 -6.905640 1.295296
C 5.923666 -5.844841 0.553037
O 6.955216 -5.826197 -0.042920
O 5.269004 -4.590026 0.590033
H 4.054002 -2.184680 0.654838
H 1.389704 -5.910354 2.488783
H 5.814723 -7.796634 1.451618
O 1.825325 -1.537706 1.986256
H 2.319215 -0.796042 1.550394
H 3.390707 -7.564847 2.136680
H 0.535358 -3.663175 2.483943
18
generated by VMD
C 1.519866 -3.892621 2.109595
I would like to print every 100th frame starting from the first frame into its own file named "snapshot0.xyz" (The first frame is frame 0).
For example, the above input shows two snapshots. I would like to print out lines 1:20 into its own file named snapshot0.xyz and then skip 100 (2000 lines) snapshots and print out snapshot1.xyz (with the 100th snapshot). My attempt was in python, but you can choose either grep, awk, sed, or Python.
My input file: frames.dat
1 #!/usr/bin/Python
2
3
4
5 mest = open('frames.dat', 'r')
6 test = mest.read().strip().split('\n')
7
8 for i in range(len(test)):
9 if test[i] == '18':
10 f = open("out"+`i`+".dat", "w")
11 for j in range(19):
12 print >> f, test[j]
13 f.close()
I suggest using the csv module for this input.
import csv
def strip_empty_columns(line):
return filter(lambda s: s.strip() != "", line)
def is_count(line):
return len(line) == 1 and line[0].strip().isdigit()
def is_float(s):
try:
float(s.strip())
return True
except ValueError:
return False
def is_data_line(line):
return len(line) == 4 and is_float(line[1]) and is_float(line[2]) and is_float(line[3])
with open('frames.dat', 'r') as mest:
r = csv.reader(mest, delimiter=' ')
current_count = 0
frame_nr = 0
outfile = None
for line in r:
line = strip_empty_columns(line)
if is_count(line):
if frame_nr % 100 == 0:
outfile = open("snapshot%d.xyz" % frame_nr, "w+")
elif outfile:
outfile.close()
outfile = None
frame_nr += 1 # increment the frame counter every time you see this header line like '18'
elif is_data_line(line):
if outfile:
outfile.write(" ".join(line) + "\n")
The opening post mentions to write every 100th frame to an output file named snapshot0.xyz. I assume the 0 should be a counter, ot you would continously overwrite the file. I updated the code with a frame_nr counter and a few lines which open/close an output file depending on the frame_nr and write data if an output file is open.
This might work for you (GNU sed and csplit):
sed -rn '/^18/{x;/x{100}/z;s/^/x/;x};G;/\nx$/P' file | csplit -f snapshot -b '%d.xyz' -z - '/^18/' '{*}'
Filter every 100th frame using sed and pass that file to csplit to create the individual files.

matching and dispalying specific lines through python

I have 15 lines in a log file and i want to read the 4th and 10 th line for example through python and display them on output saying this string is found :
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
please suggest through code how to achieve this in python .
just to elaborate more on this example the first (string or line is unique) and can be found easily in logfile the next String B comes within 40 lines of the first one but this one occurs at lots of places in the log file so i need to read this string withing the first 40 lines after reading string A and print the same that these strings were found.
Also I cant use with command of python as this gives me errors like 'with' will become a reserved keyword in Python 2.6. I am using Python 2.5
You can use this:
fp = open("file")
for i, line in enumerate(fp):
if i == 3:
print line
elif i == 9:
print line
break
fp.close()
def bar(start,end,search_term):
with open("foo.txt") as fil:
if search_term in fil.readlines()[start,end]:
print search_term + " has found"
>>>bar(4, 10, "dsfsfs")
"dsfsfs has found"
#list of random characters
from random import randint
a = list(chr(randint(0,100)) for x in xrange(100))
#look for this
lookfor = 'b'
for element in xrange(100):
if lookfor==a[element]:
print a[element],'on',element
#b on 33
#b on 34
is one easy to read and simple way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
after edits by author:
The easiest thing you can do then is:
looking_for = 'findthis' i = 1 for line in open('filename.txt','r'):
if looking_for == line:
print i, line
i+=1
it's efficient and easy :)

Categories

Resources