Sum of all numbers in a file - python

I have been fiddling around with this code for ages and cannot figure out how to make it pass the doctests. The output is always 1000 less than the correct answer. Is there a simple way to change this code so that it gives the desired output?
My code is:
def sum_numbers_in_file(filename):
    """
    Return the sum of the numbers in the given file (which only contains
    integers separated by whitespace).
    >>> sum_numbers_in_file("numb.txt")
    19138
    """
    f = open(filename)
    m = f.readline()
    n = sum([sum([int(x) for x in line.split()]) for line in f])
    f.close()
    return n
the values in the file are:
1000
15000
2000
1138

The culprit is:
m = f.readline()
Calling f.readline() consumes the first line (1000), so the list comprehension never sees it. That is why the result is 1000 too low.
This should work:
def sum_numbers_in_file(filename):
    """
    Return the sum of the numbers in the given file (which only contains
    integers separated by whitespace).
    >>> sum_numbers_in_file("numb.txt")
    19138
    """
    # Read-only mode is enough here; 'r+' would also require write access.
    f = open(filename)
    m = f.readlines()
    n = sum([sum([int(x) for x in line.split()]) for line in m])
    f.close()
    return n

You pull out the first line and store it in m. Then never use it.
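To see the effect in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened file (an assumption made only so the example runs without numb.txt); the contents are the four values from the question.

```python
import io

# io.StringIO stands in for the opened file; the values match the question.
f = io.StringIO("1000\n15000\n2000\n1138\n")

first = f.readline()  # consumes the first line, "1000\n"

# Iterating over f now starts at the second line.
rest = sum(int(x) for line in f for x in line.split())

print(repr(first), rest)  # '1000\n' 18138
```

The remaining lines add up to 18138, which is exactly 1000 short of the expected 19138.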

You could use two for-loops in one generator expression:
def sum_numbers_in_file(filename):
    """
    Return the sum of the numbers in the given file (which only contains
    integers separated by whitespace).
    >>> sum_numbers_in_file("numb.txt")
    19138
    """
    with open(filename) as f:
        return sum(int(x)
                   for line in f
                   for x in line.split())
The generator expression above is equivalent to
result = []
for line in f:
    for x in line.split():
        result.append(int(x))
return sum(result)

Related

How to return a tuple containing the smallest and largest numbers in the file

In this exercise, I need to write a function that takes as input a string representing a filename. The file contains a list of integers, one integer per line. The function should return a tuple containing the smallest and largest numbers in the file.
My attempt below did pass the auto-grader, but it is ugly. Is there a more efficient way of solving this?
def find_range(filename):
    tu = ()
    with open(filename, 'r') as file:
        m = max(file.readlines(), key=lambda x: int(x))
    with open(filename, 'r') as file:
        s = min(file.readlines(), key=lambda y: int(y))
    tu = int(s), int(m)
    return tu
You open the file twice, and both times read the entire content of the file into a list with file.readlines before finding the min or max respectively. If you read the file all at once, you can just as well map the lines to int directly, collect those in a list, and bind that to a variable and use it for both min and max.
def find_range(filename):
    with open(filename) as f:
        nums = list(map(int, f))
    return min(nums), max(nums)
If the file is so large that you can't load it all in memory at once, you might actually want to open and iterate it twice, e.g. with seek(0) or just another with:
def find_range(filename):
    with open(filename) as f:
        m = max(f, key=int)
    with open(filename) as f:
        s = min(f, key=int)
    return (int(s), int(m))
Or open and iterate the file just once, and test each value independently for min/max qualities.
def find_range(filename):
    with open(filename) as f:
        s = m = int(next(f))
        for x in map(int, f):
            if x < s: s = x
            if x > m: m = x
    return s, m
Each approach has its strong sides and weak sides, e.g. more memory consumption, more file system access, or more evaluation in Python (as opposed to using builtins that might be implemented more efficiently).
You can use itertools.tee to not open and iterate over the file two times:
from itertools import tee
def find_range(filename):
    with open(filename, "r") as f_in:
        i1, i2 = tee(map(int, f_in))
        return min(i1), max(i2)
Solution that iterates only once over the file:
def find_range(filename):
    mn, mx = float("inf"), float("-inf")
    with open(filename, "r") as f_in:
        for i in map(int, f_in):
            if i < mn:
                mn = i
            if i > mx:
                mx = i
    return mn, mx
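Finally, min and max also accept a default argument, which is returned if the iterable is empty.
def find_range(filename):
    with open(filename, 'r') as file:
        # split() with no argument drops the empty string that a
        # trailing newline would otherwise leave behind.
        data = file.read().split()
    return int(min(data, key=int)), int(max(data, key=int))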
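As a small illustration of that default argument, here is a sketch that works on any iterable of lines rather than on a filename. The function name find_range_or_none and the choice of None as the fallback are assumptions made for this example only.

```python
def find_range_or_none(lines):
    """Return (smallest, largest), or (None, None) if there are no numbers."""
    # Skip blank lines so a trailing newline does not break int().
    nums = [int(line) for line in lines if line.strip()]
    # default= is returned instead of raising ValueError on an empty list.
    return min(nums, default=None), max(nums, default=None)

print(find_range_or_none(["3\n", "-1\n", "7\n"]))  # (-1, 7)
print(find_range_or_none([]))                      # (None, None)
```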
Your code can be improved by getting rid of the calls to readlines(), which store the whole text file as a list of its lines. (Note that you're actually calling this function twice, even though you should only need the file once!)
A cheaper approach is to read line by line and store min and max "on the fly":
def find_range(filename, mn=1, mx=1):
    with open(filename, 'r') as f:
        line = f.readline()
        while line:
            mn, mx = min(mn, int(line)), max(mx, int(line))
            line = f.readline()
    return mn, mx
readline() returns an empty string, which is falsy, once the end of the file has been reached.
The function above also accepts mn and mx as optional parameters that set an initial guess for the min and max values. You should tune these to your data: with the defaults of 1, a file whose values are all greater than 1 would report a minimum of 1. This matters if the file can contain both positive and negative integers (you didn't specify that in your question!).
Finally, if the file is not expected to be too large, take a look at other answers.
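If no sensible initial guess is available, one option is to start from None and let the first value seed both results. The sketch below takes an iterable of lines instead of a filename, and the name find_range_lines is hypothetical; both are assumptions made to keep the example self-contained.

```python
def find_range_lines(lines):
    """Single pass over the lines; no initial guess is needed."""
    mn = mx = None
    for line in lines:
        value = int(line)
        # The first value seeds both mn and mx; later ones are compared.
        mn = value if mn is None else min(mn, value)
        mx = value if mx is None else max(mx, value)
    return mn, mx

# All values are greater than 1, so an initial guess of 1 would be wrong here.
print(find_range_lines(["5\n", "9\n", "7\n"]))  # (5, 9)
```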

How to separate different input formats from the same text file with Python

I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same input text file. For example, let's say I have an input file like this, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
Where the format is N followed by N lines of Data1, and M followed by M lines of Data2. I tried opening the file, reading it line by line and storing it in a single list, but I'm not sure how to go about producing two lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]
print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
    print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []
with open("data.txt", "r") as f:
    f = list(f)
originalData = f
i = 0
while i < len(f): # Iterate through every line
    try:
        n = int(f[i]) # See if line can be cast to an integer
        originalData[i] = n # Change string to int in original
        formattedData.append([])
        for j in range(n):
            i += 1
            item = f[i].replace('\n', '')
            originalData[i] = item # Remove newline char in original
            formattedData[-1].append(item)
    except ValueError:
        print("File has incorrect format")
    i += 1
print(originalData)
print(formattedData)
The following code will produce a list results which is equal to [Data1, Data2].
The code assumes that each count matches the number of entries that actually follow it. That means it will not work for a file like this:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()
results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
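The same count-then-rows idea can also be written against an iterator, so the input does not have to be indexed. This is only a sketch under the question's stated format (a count line followed by that many data lines); the function name split_sections is hypothetical.

```python
from itertools import islice

def split_sections(lines):
    """Group lines into sections, each introduced by a line holding a count."""
    it = iter(lines)
    sections = []
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between sections
        count = int(header)
        # islice pulls the next `count` lines from the same iterator,
        # so the outer loop resumes at the following count line.
        sections.append([row.rstrip("\n") for row in islice(it, count)])
    return sections

sample = """5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
data1, data2 = split_sections(sample.splitlines())
print(data1)
print(data2)
```

Because an open file is itself an iterator over its lines, passing the file object instead of sample.splitlines() should work the same way.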

Python: filling a List of objects from a .txt file

For starters I've programmed in C++ for the past year and a half, and this is the first time I'm using Python.
The objects have two int attributes, say i_ and j_.
The text file is as follows:
1,0
2,0
3,1
4,0
...
What I want to do is have the list filled with objects with correct attributes. For example,
print(myList[2].i_, myList[2].j_, end = ' ')
would return
3 1
Here's my attempt after reading a little online.
class myClass:
    def __init__(self, i, j):
        self.i_ = i
        self.j_ = j
with open("myFile.txt") as f:
    myList = [list(map(int, line.strip().split(','))) for line in f]
    for line in f:
        i = 0
        while (i < 28):
            myList.append(myClass(line.split(","), line.split(",")))
            i +=1
But it doesn't work obviously.
Thanks in advance!
Since you're working with a CSV file you might want to use the csv module. First you would pass the file object to the csv.reader function and it will return an iterable of rows from the file. From there you can cast it to a list and slice it to the 29 rows you are required to have. Finally, you can iterate over the rows (e.g. [1,0]) and simply unpack them in the class constructor.
class MyClass:
    def __init__(self, i, j):
        self.i = int(i)
        self.j = int(j)
    def __repr__(self):
        return f"MyClass(i={self.i}, j={self.j})"
with open('test.txt') as f:
    rows = [r.strip().split(',') for r in f.readlines()[:29]]
my_list = [MyClass(*row) for row in rows]
for obj in my_list:
    print(obj.i, obj.j)
print(len(my_list))
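The code above splits each line by hand. The csv.reader route described in the text could be sketched as follows; io.StringIO replaces the real file and the class name Pair is hypothetical, both chosen only so the example runs on its own.

```python
import csv
import io

class Pair:
    def __init__(self, i, j):
        # csv.reader yields strings, so convert them here.
        self.i = int(i)
        self.j = int(j)

# Stand-in for open('test.txt'); the rows match the question's sample.
sample = io.StringIO("1,0\n2,0\n3,1\n4,0\n")

# Each row is a list such as ['1', '0']; empty rows are skipped.
pairs = [Pair(*row) for row in csv.reader(sample) if row]

print(pairs[2].i, pairs[2].j)  # 3 1
```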
I'm not sure you really want to stick with this format:
print(myList[2].i_, myList[2].j_, end = ' ')
My solution is coded quite manually, and I am using a dictionary to store i and j:
result = {'i':[],
          'j':[]}
Below is my code:
result = {'i':[],
          'j':[]}
with open('a.txt', 'r') as myfile:
    data=myfile.read().replace('\n', ',')
print(data)
a = data.split(",")
print (a)
b = [x for x in a if x]
print(b)
for i in range( 0, len(b)):
    if i % 2 == 0:
        result['i'].append(b[i])
    else:
        result['j'].append(b[i])
print(result['i'])
print(result['j'])
print(str(result['i'][2])+","+ str(result['j'][2]))
The result: 3,1
I'm not sure what you're trying to do with myList = [list(map(int, line.strip().split(','))) for line in f]. This will give you a list of lists with those pairs converted to ints. But you really want objects from those numbers. So let's do that directly as we iterate through the lines in the file and do away with the next while loop:
my_list = []
with open("myFile.txt") as f:
    for line in f:
        nums = [int(i) for i in line.strip().split(',') if i]
        if len(nums) >= 2:
            my_list.append(myClass(nums[0], nums[1]))

Iterate through lines changing only one character python

I have a file that looks like this
N1 1.023 2.11 3.789
Cl1 3.124 2.4534 1.678
Cl2 # # #
Cl3 # # #
Cl4
Cl5
N2
Cl6
Cl7
Cl8
Cl9
Cl10
N3
Cl11
Cl12
Cl13
Cl14
Cl15
The three numbers continue down throughout.
What I would like to do is pretty much a permutation. These are 3 data sets, set 1 is N1-Cl5, 2 is N2-Cl10, and set three is N3 - end.
I want every combination of N's and Cl's. For example the first output would be
Cl1
N1
Cl2
Then everything else stays the same. The next set would be Cl1, Cl2, N1, Cl3... and so on.
I have some code but it won't do what I want, because it wouldn't know that there are three individual data sets. Should I put the three data sets in three different files and then combine them, using code like:
list1 = ['Cl1','Cl2','Cl3','Cl4', 'Cl5']
for line in file1:
    line.replace('N1', list1(0))
    list1.pop(0)
    print >> file.txt, line,
Or is there a better way? Thanks in advance.
This should do the trick:
from itertools import permutations
def print_permutations(in_file):
    separators = ['N1', 'N2', 'N3']
    # Labels of the current data set, starting with its Nx.
    group = []
    with open(in_file) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # Skip blank lines.
            # Keep only the label (Nx or Clx), dropping the numbers.
            value = fields[0]
            # Found a new Nx. Print the permutations of the previous set.
            if value in separators and group:
                for perm in permutations(group):
                    print(perm)
                group = []
            group.append(value)
    # Print the permutations of the final set.
    for perm in permutations(group):
        print(perm)
You could use regex to find the line numbers of the "N" patterns and then slice the file using those line numbers:
import re
n_pat = re.compile(r'N\d')
N_matches = []
with open(sample, 'r') as f:  # sample is the path to the input file
    for num, line in enumerate(f):
        if re.match(n_pat, line):
            N_matches.append((num, re.match(n_pat, line).group()))
>>> N_matches
[(0, 'N1'), (12, 'N2'), (24, 'N3')]
After you figure out the line numbers where these patterns appear, you can then use itertools.islice to break the file up into a list of lists:
import itertools
first = N_matches[0][0]
final = N_matches[-1][0]
step = N_matches[1][0]
dataset = []
locallist = []
while first < final + step:
    with open(sample, 'r') as f:  # same path as in the previous snippet
        for item in itertools.islice(f, first, first+step):
            if item.strip():
                locallist.append(item.strip())
    dataset.append(locallist)
    locallist = []
    first += step
itertools.islice is a really nice way to take a slice of an iterable. Here's the result of the above on a sample:
>>> dataset
[['N1 1.023 2.11 3.789', 'Cl1 3.126 2.6534 1.878', 'Cl2 3.124 2.4534 1.678', 'Cl3 3.924 2.1134 1.1278', 'Cl4', 'Cl5'], ['N2', 'Cl6 3.126 2.6534 1.878', 'Cl7 3.124 2.4534 1.678', 'Cl8 3.924 2.1134 1.1278', 'Cl9', 'Cl10'], ['N3', 'Cl11', 'Cl12', 'Cl13', 'Cl14', 'Cl15']]
After that, I'm a bit hazy on what you're seeking to do, but I think you want permutations of each sublist of the dataset? If so, you can use itertools.permutations to find permutations on various sublists of dataset:
for item in itertools.permutations(dataset[0]):
    print(item)
etc.
Final Note:
Assuming I understand correctly what you're doing, the number of permutations is going to be huge. You can calculate how many permutations there are by taking the factorial of the number of items. Anything over 10 items (10!) is going to produce over 3,000,000 permutations.
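A quick way to check the size before generating anything is math.factorial. The sketch below just prints the count for a few list lengths; the lengths chosen are arbitrary examples.

```python
import math

# Number of permutations of n distinct items is n!.
for n in (6, 10, 12):
    print(n, math.factorial(n))
# 6 720
# 10 3628800
# 12 479001600
```

A six-item set like the ones in the sample gives 720 orderings, which is still manageable. Twelve items already gives close to half a billion.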

How to delete line from the file in python

I have a file F containing a huge amount of numbers, e.g. F = [1,2,3,4,5,6,7,8,9,...]. I want to loop over the file F and delete all lines that contain any of the numbers in a list, say f = [1,2,4,7,...].
F = open(file)
f = [1,2,4,7,...]
for line in F:
    if line.contains(any number in f):
        delete the line in F
You cannot delete lines from a file in place, so you have to create a new file to which you write the remaining lines. That is what chonw's example does.
It's not clear to me what the form of the file you are trying to modify is. I'm going to assume it looks like this:
1,2,3
4,5,7,19
6,2,57
7,8,9
128
Something like this might work for you:
filter = set([2, 9])
lines = open("data.txt").readlines()
outlines = []
for line in lines:
    line_numbers = set(int(num) for num in line.split(","))
    if not filter & line_numbers:
        outlines.append(line)
if len(outlines) < len(lines):
    open("data.txt", "w").writelines(outlines)
I've never been clear on what the implications of doing an open() as a one-off are, but I use it pretty regularly, and it doesn't seem to cause any problems.
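On the one-off open(): CPython usually closes such a file as soon as the object is garbage-collected, but that is an implementation detail, so a with block is the safer habit. The sketch below applies the same filter with explicit closing and writes to a temporary file before swapping it in, so the original stays intact if something fails halfway. The function name remove_lines_with is hypothetical, and the comma-separated integer format is the one assumed above.

```python
import os
import tempfile

def remove_lines_with(path, unwanted):
    """Rewrite `path`, dropping every line that contains an unwanted number."""
    unwanted = set(unwanted)
    # Create the temporary file next to the original so os.replace
    # stays on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=dir_name, delete=False) as dst:
        for line in src:
            numbers = {int(tok) for tok in line.split(",") if tok.strip()}
            if not (numbers & unwanted):
                dst.write(line)
    # Both files are closed here; swap the filtered copy into place.
    os.replace(dst.name, path)
```

Calling remove_lines_with("data.txt", [2, 9]) on the sample layout above would keep only the lines 4,5,7,19 and 128.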
exclude = set((2, 4, 8)) # is faster to find items in a set
out = open('filtered.txt', 'w')
with open('numbers.txt') as i: # iterates over the lines of a file
    for l in i:
        if not any((int(x) in exclude for x in l.split(','))):
            out.write(l)
out.close()
I'm assuming the file contains only integer numbers separated by ,
Something like this?:
nums = [1, 2]
f = open("file", "r")
source = f.read()
f.close()
out = open("file", "w")
for line in source.splitlines():
    found = False
    for n in nums:
        if line.find(str(n)) > -1:
            found = True
            break
    if found:
        continue
    out.write(line+"\n")
out.close()
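One caveat with the str.find test above: it matches substrings, so looking for 1 also matches a line containing 128 or 19. Whether that is wanted depends on the file format, which the question leaves open. The sketch below contrasts the two checks under the assumption of comma-separated integers; both helper names are hypothetical.

```python
def matches_substring(line, nums):
    # Mirrors the str.find test: true if any number appears as a substring.
    return any(line.find(str(n)) > -1 for n in nums)

def matches_token(line, nums):
    # Compares whole comma-separated fields instead.
    fields = {int(tok) for tok in line.split(",") if tok.strip()}
    return any(n in fields for n in nums)

print(matches_substring("128", [1, 2]))  # True
print(matches_token("128", [1, 2]))      # False
```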
