I am working on a file-handling exercise.
My text file has this content:
List of Sales
Day 1 : 1250.25
Day 2 : 2560.25
Day 3 : 3241.10
Day 4 : 1530.20
Day 5 : 1247.27
Day 6 : 1646.22
Day 7 : 850.25
I want to get only the amount for each day and sum the amounts.
OFile = open('sales.txt','r')
file_content = OFile.read()
print(file_content)
import re
get = re.findall(r'[.]', file_content)
amount = []
for n in range(7):
    amount.append(get)
total = sum(amount)
print("Total sales Amount: ", "Php", total)
I keep getting Total sales Amount 0
Keep it simple and use str.split and str.strip instead of a regex.
With the input file you have shown, two things can go wrong on a given line.
The conversion to float may raise a ValueError if the text after the ":" is not a number.
A line that has no ":" (e.g. the first line in the file) makes split(":") return
a one-element list containing the whole (stripped) line, so indexing [1] raises an IndexError.
In both cases you want to skip the line and continue with the next one.
total_sum = 0
with open('sales.txt','r') as fp:
    for line in fp:
        try:
            current_float_num = line.strip().split(":")[1]
            current_float_num = float(current_float_num)
            # do work on float_num
            # for example add it to the accumulative total_sum
            total_sum += current_float_num
        except (IndexError,ValueError):
            continue
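The loop above can be wrapped in a small function so it is easy to try without the file. This is only a minimal sketch: the name sum_sales and the inline sample lines are assumptions made for illustration, and the function accepts any iterable of lines.

```python
def sum_sales(lines):
    """Add up the number after the first ':' on each line, skipping other lines."""
    total = 0.0
    for line in lines:
        try:
            # "Day 1 : 1250.25" -> ["Day 1 ", " 1250.25"]; float() ignores the spaces
            total += float(line.strip().split(":")[1])
        except (IndexError, ValueError):
            # no ':' on the line (e.g. the header), or the text after it is not a number
            continue
    return total


# Hypothetical sample data in the same shape as sales.txt
sample = [
    "List of Sales\n",
    "Day 1 : 1250.25\n",
    "Day 2 : 2560.25\n",
    "Day 7 : 850.25\n",
]
print("Total sales Amount: Php %.2f" % sum_sales(sample))
```

With the real file you would pass the open file object instead of the list, for example inside a with open('sales.txt') as fp: block.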
I'm new to Python & here is my question
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
Link of the file:
http://www.pythonlearn.com/code/mbox-short.txt
This is my code:
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
for line in handle:
    if not line.startswith ("From "):continue
    #words = line.split()
    col = line.find(':')
    coll = col - 2
    print coll
    #zero = line.find('0')
    #one = line.find('1')
    #b = line[ zero or one : col ]
    #print b
    #hour = words[5:6]
    #print hour
    #for line in hour:
    #    hr = line.split(':')
    #    x = hr[1]
    for x in coll:
        counts[x] = counts.get(x,0) + 1
for key, value in sorted(counts.items()):
    print key, value
My first try was with list splitting (the commented-out lines), and it didn't work because it treated the 0 and the 1 as the first and second characters rather than as numbers.
My second try was with line.find(':'), which partially worked, but with the minutes rather than the hours as required.
First question
Why when I write line.find(:), it takes automatically the 2 numbers after?
Second question
Why when I run the program now, it gives an error
TypeError: 'int' object is not iterable on line 26 ??
Third question
Why it considered 0 & 1 as first & second letters of the line not 0 & 1 numbers
Finally
If possible please solve me this problem with a little of explanation please (with the same codes to keep my learning sequence)
Thank you...
First question
Why when I write line.find(:), it takes automatically the 2 numbers
after?
str.find() returns the index of the first occurrence of the substring you are looking for (or -1 if it is not found). If your string is "From 00:00:00", it returns 7, because the first ':' is at index 7.
Second question
Why when I run the program now, it gives an error TypeError: 'int'
object is not iterable on line 26 ??
As said above, str.find() returns an int, and you cannot iterate over an int.
Third question
Why it considered 0 & 1 as first & second letters of the line not 0 &
1 numbers
I don't really understand what you mean here. As far as I can tell, you are trying to find the first index at which '0' or '1' occurs and assuming that it is the first character of the hour. What about 8-11 pm (hours 20 to 23), where the hour starts with 2?
Finally If possible please solve me this problem with a little of
explanation please (with the same codes to keep my learning sequence)
Sure, it will be like this:
for line in f:
    if not line.startswith("From "): continue
    first_colon_index = line.find(":")
    if first_colon_index == -1: # there is no ':'
        continue
    first_char_hour_index = first_colon_index - 2
    # string slicing
    # [a:b] gets the characters from index a up to, but not including, index b
    hour = line[first_char_hour_index:first_char_hour_index+2]
    hour_int = int(hour)
    # if key exist, increase by 1. If not, set to 1
    if hour_int in count:
        count[hour_int] += 1
    else:
        count[hour_int] = 1
# print hour & count, in sorting order
for hour in sorted(count):
    print hour, count[hour]
The part about string slicing can be confusing; you can read more about it in the Python docs.
You also have to be sure that the line contains no other ":" before the time, or this method will fail because the first ":" will not be the one between the hour and the minute.
To be safe, it is better to use a regex. Something like:
import re
for line in f:
    if not line.startswith("From "): continue
    match = re.search(r'^From.*?([0-9]{2}:[0-9]{2}:[0-9]{2})', line)
    if match:
        time = match.group(1) # hh:mm:ss
        hh = int(time.split(":")[0])
        # if key exist, increase by 1. If not, set to 1
        if hh in count:
            count[hh] += 1
        else:
            count[hh] = 1
# print hour & count, in sorting order
for hour in sorted(count):
    print hour, count[hour]
That's because str.find() returns an index of the found substring, not the string itself. Consequently, when you subtract 2 from it and then try to loop through it it will complain that you're trying to loop through an integer and raise a TypeError.
You can grab the whole time string as:
time_start = line.find(":")
if time_start == -1: # not found
    continue
time_string = line[time_start-2:time_start+6] # slice out the whole time string
You can then split time_string on ":" to get the hours, minutes and seconds (e.g. hours, minutes, seconds = time_string.split(":", 2)); just keep in mind that those will be strings, not integers. Or, if you just want the hour:
hour = int(line[time_start-2:time_start])
You can take it from there - just increase your dict value and when you're done with parsing the file sort everything out.
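For comparison, here is a sketch of the approach the exercise itself hints at: split the line into words, then split the time word on the colon. It is written for Python 3 and uses collections.Counter in place of the manual dict bookkeeping. The function name count_hours and the sample lines are assumptions for illustration only. It also shows why words[5:6] did not work in the question: a slice returns a list, while words[5] returns the time string itself.

```python
from collections import Counter


def count_hours(lines):
    """Count messages per hour from the 'From ' lines of an mbox file."""
    counts = Counter()
    for line in lines:
        if not line.startswith("From "):
            continue
        words = line.split()
        # Expected shape: From <address> <weekday> <month> <day> <hh:mm:ss> <year>
        if len(words) < 6:
            continue
        hour = words[5].split(":")[0]  # "09:14:16" -> "09"
        counts[hour] += 1
    return counts


# Hypothetical sample lines in the format shown in the question
sample = [
    "From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008\n",
    "From: stephen.marquard#uct.ac.za\n",
    "From someone#example.com Fri Jan 4 18:10:48 2008\n",
    "From another#example.com Fri Jan 4 09:05:31 2008\n",
]
for hour, n in sorted(count_hours(sample).items()):
    print(hour, n)
```

The hours are kept as two-character strings, so sorting them as text gives the same order as sorting them as numbers.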
I am writing code that searches for lines such as "X-DSPAM-Confidence: 0.8475" in the mbox.text file. So far I can find the string and count the number of times it appears in the file. The problem is that I also have to add up the number at the end of that string (here, 0.8475) every time it appears. I am stuck there: I cannot compute the total of the float values that appear at the end of those lines.
The content of my file looks like this:
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan 5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
My code:
text_file = raw_input ("please enter the path of the file that you want to open:")
open_file = open ( text_file )
print "Text file has been open "
count = 0
total = 0.00000
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        total =+ float(line[20:])
        count = count + 1
print total/count
print "The number of line with X-DSPAM-Confidence: is:", count
How can I do that?
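The in-place operator for addition is +=, not =+. Python parses total =+ x as total = (+x), so total is overwritten on every matching line instead of accumulating. Slicing with a hard-coded offset such as line[20:] is also fragile, because the substring it returns depends on the exact length of the header name. That being said, you should use split.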
count = 0
total = 0.00000
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        total += float(line.split()[-1]) # change here.
        count = count + 1
print total/count
Or even better using sum and len.
with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
print(sum(data)/len(data))
Python 3.4 or newer solution using mean from the statistics module.
from statistics import mean
with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
print(mean(data))
The print statement, much like a magic 8-Ball, tells all
>>> print repr(line[20:])
' 0.0000\n'
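The slice is not actually what breaks: float() ignores leading and trailing whitespace, so float(' 0.0000\n') converts fine. The fix that matters is += in place of =+. If you prefer to trim explicitly, strip the slice instead of hard-coding both ends:
total += float(line[20:].strip())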
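Putting the pieces from the answers together, a self-contained sketch in Python 3 might look like the following. The function name average_confidence, its prefix parameter and the sample lines are assumptions for illustration. It slices by len(prefix) rather than by a hard-coded 20, and it guards against a file with no matching lines, which would otherwise cause a division by zero.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Return (count, total, mean) of the numbers on lines starting with prefix."""
    values = []
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            # take everything after the header name; float() ignores the space
            values.append(float(line[len(prefix):]))
    total = sum(values)
    mean = total / len(values) if values else 0.0
    return len(values), total, mean


# Hypothetical sample lines in the format shown in the question
sample = [
    "X-DSPAM-Result: Innocent\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Probability: 0.0000\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
count, total, mean = average_confidence(sample)
print("lines:", count, "total:", total, "average:", mean)
```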
For my code, I need to read a text file where each line contains a last name, a wage, and the hours worked, each separated by a single whitespace character.
Example:
johnson 14 52
doe 12.12 35.5
smith 14.56 42
I then have to print a report using these values until there are no more lines to take data from.
I want to assign the name to a variable (last), the wage to a variable (rate), and the hours worked to a variable (totalHours), and then use these variables to compute other things.
But I can't figure out how to get at the name, rate, and hours worked on each line.
Here's the code I have so far.
f = open('test.txt', 'r')
for line in f:
    data = line.split()
    for word in data:
        last =
        rate =
        totalHours =
    #these are my computations
    otHours = 0
    if totalHours > 40:
        otHours = totalHours - 40
    regPay = (totalHours - otHours) * rate
    otPay = 1.5 * rate * otHours
    gross = regPay + otPay
    print("%-21s%-15.2f%-17.1f%-15.1f%-15.2f%-16.2f%3.2f" % \
          (last, rate, totalHours, otHours, regPay, otPay, gross))
f.close()
Any help is appreciated.
You can unpack the data list, like below:
last,rate,totalHours = data
Or,
last = data[0]
rate = data[1]
totalHours = data[2]
You can simply use
for line in f:
    last, rate, totalHours = line.split()
But there's another problem with your code. All these items will be read as strings. You'll have to convert rate and totalHours to floats before doing any calculations with them.
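A minimal sketch combining the unpacking with the float conversion could look like this. The generator name pay_report and the snake_case variable names are assumptions for illustration; the sample lines are the ones from the question, and the overtime rule (hours beyond 40 paid at 1.5 times the rate) is taken from the question's own computations.

```python
def pay_report(lines):
    """Yield (last, rate, total_hours, ot_hours, reg_pay, ot_pay, gross) per line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        last = fields[0]
        rate = float(fields[1])         # split() returns strings, so convert
        total_hours = float(fields[2])
        ot_hours = max(total_hours - 40, 0.0)  # hours beyond 40 are overtime
        reg_pay = (total_hours - ot_hours) * rate
        ot_pay = 1.5 * rate * ot_hours
        yield last, rate, total_hours, ot_hours, reg_pay, ot_pay, reg_pay + ot_pay


sample = ["johnson 14 52\n", "doe 12.12 35.5\n", "smith 14.56 42\n"]
for row in pay_report(sample):
    print("%-21s%-15.2f%-17.1f%-15.1f%-15.2f%-16.2f%3.2f" % row)
```

With a real file you would pass the open file object to pay_report instead of the list.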
I am trying to write a Python program that reads each line from an infile. This infile is a list of dates. I want to test each line with a function isValid(), which returns true if the date is valid, and false if it is not. If the date is valid, it is written into an output file. If it is not, invalid is written into the output file. I have the function, and all I want to know is the best way to test each line with the function. I know this should be done with a loop, I'm just uncertain how to set up the loop to test each line in the file one-by-one.
Edit: I now have a program that basically works, but I am getting strange results in the output file. Hopefully someone with Python 3 experience can explain why.
def main():
    datefile = input("Enter filename: ")
    t = open(datefile, "r")
    c = t.readlines()
    ofile = input("Enter filename: ")
    o = open(ofile, "w")
    for line in c:
        b = line.split("/")
        e = b[0]
        f = b[1]
        g = b[2]
        text = str(e) + " " + str(f) + ", " + str(g)
        text2 = "The date " + text + " is invalid"
        if isValid(e,f,g) == True:
            o.write(text)
        else:
            o.write(text2)
def isValid(m, d, y):
    if m == 1 or m == 3 or m == 5 or m == 7 or m == 8 or m == 10 or m == 12:
        if d is range(1, 31):
            return True
    elif m == 2:
        if d is range(1,28):
            return True
    elif m == 4 or m == 6 or m == 9 or m == 11:
        if d is range(1,30):
            return True
    else:
        return False
This is the output I'm getting.
The date 5 19, 1998
is invalidThe date 7 21, 1984
is invalidThe date 12 7, 1862
is invalidThe date 13 4, 2000
is invalidThe date 11 40, 1460
is invalidThe date 5 7, 1970
is invalidThe date 8 31, 2001
is invalidThe date 6 26, 1800
is invalidThe date 3 32, 400
is invalidThe date 1 1, 1111
is invalid
In Python 2.6 and later (2.5 with from __future__ import with_statement) you can use the with statement. File objects are context managers, so the file is closed automatically when the block ends:
results = list()
with open(some_file) as f:
    for line in f:
        if isValid(line, date):
            results.append(line)
... or even more tersely with a list comprehension:
with open(some_file) as f:
    results = [line for line in f if isValid(line, date)]
In older versions of Python you may need to open and close the file explicitly. You can still iterate over it implicitly (for line in file:), or iterate more explicitly with f.readline() or f.readlines() (plural), depending on whether you want to go line by line or "slurp" the entire file into memory (with the memory overhead that implies).
Also note that you may wish to strip the trailing newlines off these file contents (perhaps by calling line.rstrip('\n') --- or possibly just line.strip() if you want to eliminate all leading and trailing whitespace from each line).
(Edit based on additional comment to previous answer):
The function signature isValid(m, d, y) suggests that you are passing a date to this function (month, day, year), but that doesn't make sense given that you must also, somehow, pass in the data to be validated (a line of text, a string, etc.).
To help you further, you will have to provide more information (preferably the source, or a relevant portion of the source, of this isValid() function).
In my initial answer I assumed that your isValid() function was merely scanning for any valid date in its single argument. I've modified my code examples to show how one might pass a specific date, as a single argument, to a function that uses this calling signature: isValid(somedata, some_date).
with open(fname) as f:
    for line in f.readlines():
        test(line)
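The output shown in the question has two visible causes. The fields from split("/") are strings, so comparisons such as m == 1 are never true, and the last field still carries the newline, which is why "is invalid" lands on the next line. In addition, d is range(1, 31) tests object identity rather than membership, so it is never true either. The following is a sketch of one way to address this, not the asker's final program: the names is_valid and describe are assumptions, February is fixed at 28 days as in the question (leap years are ignored), and the upper bounds are made inclusive.

```python
def is_valid(month, day, year):
    """Return True if month/day form a real date (February fixed at 28 days)."""
    if month in (1, 3, 5, 7, 8, 10, 12):
        return 1 <= day <= 31
    if month in (4, 6, 9, 11):
        return 1 <= day <= 30
    if month == 2:
        return 1 <= day <= 28
    return False


def describe(line):
    """Turn 'm/d/y' into the text to write to the output file."""
    parts = line.strip().split("/")  # strip() removes the trailing newline
    try:
        month, day, year = (int(p) for p in parts)  # compare numbers, not strings
    except ValueError:
        # wrong number of fields, or a field that is not an integer
        return "The date %s is invalid" % line.strip()
    text = "%d %d, %d" % (month, day, year)
    return text if is_valid(month, day, year) else "The date %s is invalid" % text


# A few of the dates that appear in the question's output
for line in ["5/19/1998\n", "13/4/2000\n", "11/40/1460\n", "3/32/400\n"]:
    print(describe(line))
```

When writing the results to a file, remember that write() does not add a newline, so append "\n" to each string yourself.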
I'm trying to read the last line from a text file. Each line starts with a number, so the next time something is inserted, the new number will be incremented by 1.
For example, this would be a typical file
1. Something here date
2. Something else here date
#next entry would be "3. something date"
If the file is blank I can enter an entry with no problem. However, when there are already entries I get the following error
LastItemNum = lineList[-1][0:1] +1 #finds the last item's number
TypeError: cannot concatenate 'str' and 'int' objects
Here's my code for the function
def AddToDo(self):
    FILE = open(ToDo.filename,"a+") #open file for appending and reading
    FileLines = FILE.readlines() #read the lines in the file
    if os.path.getsize("EnteredInfo.dat") == 0: #if there is nothing, set the number to 1
        LastItemNum = "1"
    else:
        LastItemNum = FileLines[-1][0:1] + 1 #finds the last items number
    FILE.writelines(LastItemNum + ". " + self.Info + " " + str(datetime.datetime.now()) + '\n')
    FILE.close()
I tried to convert LastItemNum to a string but I get the same "cannot concatenate" error.
Convert the slice to an int before adding 1:
LastItemNum = int(lineList[-1][0:1]) + 1
Then you have to convert LastItemNum back to a string before writing it to the file, using:
LastItemNum = str(LastItemNum)
Alternatively, you can use string formatting instead.
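One thing to watch for: the slice [0:1] reads a single character, so it stops working once the list reaches item 10. The sketch below splits on the first "." instead and uses str.format to build the new line. The function names next_item_number and format_entry are assumptions for illustration, and the functions work on a list of lines so they can be tried without the file.

```python
def next_item_number(lines):
    """Return the number to use for the next entry, given the existing lines."""
    entries = [ln for ln in lines if ln.strip()]  # ignore blank lines
    if not entries:
        return 1
    # "12. Something here date" -> "12"; split on the first '.' instead of slicing one char
    last_number = entries[-1].split(".", 1)[0]
    return int(last_number) + 1


def format_entry(number, info, stamp):
    """Build the line to append; str.format converts the int to text."""
    return "{}. {} {}\n".format(number, info, stamp)


existing = ["1. Something here date\n", "2. Something else here date\n"]
print(format_entry(next_item_number(existing), "something", "date"), end="")
```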