Sum of strings extracted from text file using regex - python

I am just learning Python and need some help with my class assignment.
I have a file with text and numbers in it. Some lines have from one to three numbers and others have no numbers at all.
I need to:
Extract numbers only from the file using regex
Find the sum of all the numbers
I used regex to extract out all the numbers. I am trying to get the total sum of all the numbers but I am just getting the sum of each line that had numbers. I have been battling with different ways to do this assignment and this is the closest I have gotten to getting it right.
I know I am missing some key parts but I am not sure what I am doing wrong.
Here is my code:
import re
text = open('text_numbers.txt')
for line in text:
    line = line.strip()
    y = re.findall('([0-9]+)',line)
    if len(y) > 0:
        print sum(map(int, y))
The result I get is something like this
(each is a sum of a line):
14151
8107
16997
18305
3866
And it needs to be one sum like this (sum of all numbers):
134058

import re
import numpy as np
text = open('text_numbers.txt')
final = []
for line in text:
    line = line.strip()
    y = re.findall('([0-9]+)',line)
    if len(y) > 0:
        lineVal = sum(map(int, y))
        final.append(lineVal)
        print "line sum = {0}".format(lineVal)
print "Final sum = {0}".format(np.sum(final))
Is that what you're looking for?

I don't know much Python, but I can give a simple solution.
Try this:
import re
hand = open('text_numbers.txt')
x=list()
for line in hand:
    y=re.findall('[0-9]+',line)
    x=x+y
sum=0
for i in x:
    sum=sum + int(i)
print sum

This is my first attempt at an answer using regular expressions. I find it a great skill to practise, as is reading other people's code.
import re # import regular expressions
chuck_text = open("regex_sum_286723.txt")
numbers = []
Total = 0
for line in chuck_text:
    nmbrs = re.findall('[0-9]+', line)
    numbers = numbers + nmbrs
for n in numbers:
    Total = Total + float(n)
print "Total = ", Total
Thanks to Beer for the list-comprehension one-liner, though his 'r' seems unnecessary and I am not sure what it does. It reads beautifully; I get more confused reading two sets of loops like the ones in my answer.
import re
print sum([int(i) for i in re.findall('[0-9]+',open("regex_sum_286723.txt").read())])
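On the 'r' question: assuming it refers to the second argument in open(..., 'r'), that argument is the file mode, and 'r' (read as text) is already the default, so leaving it out should change nothing. A minimal sketch that checks this on a temporary file; the file contents and the helper name total_numbers are made up for illustration:

```python
import os
import re
import tempfile

# Write a small made-up sample file so there is something to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Why 12 apples\nand 30 pears?\nno numbers here\n")
    path = tmp.name

def total_numbers(text):
    """Sum every run of digits found in text (hypothetical helper)."""
    return sum(int(n) for n in re.findall("[0-9]+", text))

with open(path) as f:        # no mode given: defaults to 'r'
    default_mode_total = total_numbers(f.read())

with open(path, "r") as f:   # explicit read mode
    explicit_mode_total = total_numbers(f.read())

os.remove(path)
print(default_mode_total, explicit_mode_total)
```

Both calls read the same text, so both totals come out the same.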

import re
text = open('text_numbers.txt')
data=text.read()
print sum(map(int,re.findall(r"\b\d+\b",data)))
Use .read() to get the file contents as a single string.
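One thing worth knowing before picking between the two patterns seen in these answers: r"\b\d+\b" only matches digit runs that stand alone as a word, while '[0-9]+' also picks up digits embedded in a word. A small sketch on a made-up sample string (not taken from the assignment file):

```python
import re

sample = "see py4e chapter 3, page 120"  # made-up text for illustration

plain = re.findall(r"[0-9]+", sample)      # every run of digits
bounded = re.findall(r"\b\d+\b", sample)   # only digits with word boundaries on both sides

print(plain)
print(bounded)
```

The 4 inside py4e is found by the first pattern but skipped by the second, so the two can give different sums on the same file. Which one is right depends on what the assignment counts as a number.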

import re
sample = open ('text_numbers.txt')
total =0
dignum = 0
for line in sample:
    line = line.rstrip()
    dig= re.findall('[0-9]+', line)
    if len(dig) >0:
        dignum += len(dig)
        linetotal= sum(map(int, dig))
        total += linetotal
print 'The number of digits are: '
print dignum
print 'The sum is: '
print total
print 'The sum ends with: '
print total % 1000

import re
print sum([int(i) for i in re.findall('[0-9]+',open(raw_input('What is the file you want to analyze?\n'),'r').read())])
You can compact it into one line, but this is only for fun!

Here is my solution to this problem.
import re
file = open('text_numbers.txt')
sum = 0
for line in file:
    line = line.rstrip()
    line = re.findall('([0-9]+)', line)
    for i in line:
        i = int(i)
        sum += i
print(sum)
Inside the first for loop, line becomes a list of strings, so I used a second for loop to convert each element from a string to an integer before adding it to the sum.

import re
fl=open('regex_sum_7469.txt')
ls=[]
for x in fl: # go through the file line by line
    x=x.rstrip()
    print x
    t= re.findall('[0-9]+',x) # all numbers on this line
    for d in t: # t is an empty list for lines without numbers
        ls.append(int(d))
print (sum(ls))

Here is my code:
import re
f = open('regex_sum_text.txt', 'r').read().strip()
y = re.findall('[0-9]+', f)
l = [int(s) for s in y]
s = sum(l)
print(s)
Another, shorter way is:
with open('regex_sum_text.txt', 'r') as f:
    total = sum(map(int, re.findall(r'[0-9]+', f.read())))
print(total)

import re
print(sum(int(value) for value in re.findall('[0-9]+', open('regex_sum_1128122.txt').read())))

Related

Unable to extract numbers and sum them using regex and re.findall()

I am trying to extract numbers from a text file with a regex and then sum them.
Here is the code:
import re
def main():
    sum = 0
    numbers = []
    name = input("Enter file:")
    if len(name) < 1 : name = "sample.txt"
    handle = open(name)
    for line in handle:
        storage = line.split(" ")
        for number in storage:
            check = re.findall('([0-9]+)',number)
            if check:
                numbers.append(check)
    print(numbers)
    print(len(numbers))
    for number in numbers:
        x = ''.join(number)
        num = int(x)
        sum = sum + num
    print(sum)
if __name__ == "__main__":
    main()
The problem is that for the string "http://www.py4e.com/code3/",
the digits get added to the list as [4,3] and are later summed up as 43.
Any idea how I can fix that?
I think you just need to change numbers.append(check) to numbers.extend(check), because you want to add the individual elements to the list rather than the list itself. For that you have to use the extend() function.
Also, you do not need the ( ) in your regex.
I also checked the code in Python:
import re
sum = 0
strings = [
    'http://www.py4e.com/code3/',
    'http://www.py1e.com/code2/'
]
numbers = []
for string in strings:
    check = re.findall('[0-9]+', string)
    if check:
        numbers.extend(check)
for number in numbers:
    x = ''.join(number)
    num = int(x)
    sum = sum + num
print(sum)
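To see why the choice between append and extend matters here, a short sketch on the URL from the question. The variable names appended and extended are only for illustration:

```python
import re

text = "http://www.py4e.com/code3/"
found = re.findall("[0-9]+", text)   # a list of digit strings

appended = []
appended.append(found)               # adds the whole list as ONE element
extended = []
extended.extend(found)               # adds each match as its own element

# Joining a nested list glues its digits together before converting.
sum_appended = sum(int("".join(group)) for group in appended)
sum_extended = sum(int(n) for n in extended)

print(appended, sum_appended)
print(extended, sum_extended)
```

With append the two matches stay grouped and get joined into 43; with extend they are summed separately and give 7.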
I am assuming that instead of 43 you want to get 7.
The number variable is a list of strings, so when you use join it becomes a single string.
Instead, you can either loop through this list, convert each element to int, and add it to the sum,
or
you can do this:
import numpy as np
number = np.array(number).astype('int').tolist()
This turns the list of strings into a list of integers, provided the conversion is possible for all the elements.
When I pass the string "http://www.py4e.com/code3/" directly instead of reading a file (file handling is not done correctly in your code above, FYI), the regex runs inside two for loops and places each value in its own list: [[4],[3]]. The output is correct when I step through it, so I think your issue is with how the file is read in the first statement. I replaced the file with the string you asked about, "http://www.py4e.com/code3/"; you can find running code here:
pyregx link: https://repl.it/join/cxercdju-shaunpritchard
I ran the loop below on a string with the number list and it worked fine.
#### Final conditional loop
```
for number in numbers:
    x = ''.join(number)
    num = int(x)
    sum = sum + num
print(str(sum))
```
You could also try using range or map:
for i in range(0, len(numbers)):
    sum = sum + int(''.join(numbers[i]))
print(str(sum))

How do I read certain 4 digits in a file and move to another line and do the same and so on

So I have this problem I need to solve. I have a file that goes something like:
11-08-2012;1485;10184;7,53;31;706;227;29;6;1102
12-08-2012;2531;10272;7,59;25;695;222;26;22;1234
13-08-2012;1800;13418;8,66;46;714;203;50;6;0757
14-08-2012;2009;11237;9,43;81;655;246;49;7;1783
I should be able to read the "1485" part, then the "2531" part, then the "1800" part, and so on to the end of the file, and finally sum them up. How do I do that? Below is how I tried to approach this with a while loop, but I seem to be lost. Can anyone help?
while True:
    f.seek(12)
    text=f.read(4)
    text=f.readline()
    if(text==""):
        break
return text
There are a number of ways to do this, with numpy, pandas, simple coroutines and so on. I am adding the one closest to your approach.
total = 0
with open('exmplefile.txt','r') as f:
    for line in f:
        elements = line.split(';')
        num_of_interest = int(elements[1])
        # you can add a print if you want
        total += num_of_interest
print(total)
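The same idea can also be written with the standard csv module, which handles the splitting on ; for you. This is only a sketch: it uses io.StringIO to stand in for the real file and assumes the value of interest is always the second field, as in the four sample lines from the question.

```python
import csv
import io

# The four sample lines from the question, held in memory instead of a file.
sample = """11-08-2012;1485;10184;7,53;31;706;227;29;6;1102
12-08-2012;2531;10272;7,59;25;695;222;26;22;1234
13-08-2012;1800;13418;8,66;46;714;203;50;6;0757
14-08-2012;2009;11237;9,43;81;655;246;49;7;1783
"""

total = 0
for row in csv.reader(io.StringIO(sample), delimiter=";"):
    if len(row) > 1:          # skip blank or malformed lines
        total += int(row[1])  # second field, e.g. "1485"

print(total)
```

For a real file, replace io.StringIO(sample) with the file object from open(); the csv documentation recommends opening it with newline=''.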
This solution works by finding the positions of the first and second occurrences of the separator, in this case ;.
with open(filename,'r') as file:
    file_list = file.readlines()
sum = 0
for line in file_list:
    loc = line.find(';')
    first_loc = loc + 1
    last_loc = loc +line[loc+1:].find(';')+1
    sum = sum + int(line[first_loc:last_loc])
print(sum)
Try this:
mylist = []
for string in file:
    mynum = string.split(';')[1]
    mylist.append(mynum)
sum([int(i) for i in mylist])
This solution caters for the case where the 4-digit number is not the second item in the array:
with open("path/to/file") as f:
    f1 = f.readlines()
sum = 0
for line in f1:
    lineInArray = line.split(';')
    for digit in lineInArray:
        if len(digit.strip()) == 4 and digit.strip().isnumeric():
            sum += int(digit)

Increment Number In a String + 1

I put together a Python script which reads the string BatchSequence="NUMBER INCREMENT HERE" and returns just the integers. How can I find a certain integer and increment the rest by one while leaving the integers before it unchanged? The sequence skips from 3 to 5. I want it to go 3, 4, 5.
Also,
once I have figured this script out, how can I replace the numbers in the original text file with the new numbers? Would I have to write to a new file?
I have tried incrementing the numbers by one, but it starts from the beginning.
Code that I tried:
import re
file = '\\\MyDataNEE\\user$\\bxt058y\\Desktop\\75736.oxi.error'
counter = 0
for line in open(file):
    match = re.search(r'BatchSequence="(\d+)"', line)
    if match:
        print(int(match.group(1)) + 1)
Original code:
import re
file = 'FILENAME HERE'
counter = 0
for line in open(file):
    match = re.search(r'BatchSequence="(\d+)"', line)
    if match:
        print(match.group(1))
Currently:
BatchSequence="1"
BatchSequence="2"
BatchSequence="3"
BatchSequence="5"
BatchSequence="6"
BatchSequence="7"
BatchSequence="8"
New output should be:
BatchSequence="1"
BatchSequence="2"
BatchSequence="3"
BatchSequence="4"
BatchSequence="5"
BatchSequence="6"
BatchSequence="7"
My take on the problem:
txt = '''BatchSequence="1"
BatchSequence="2"
BatchSequence="3"
BatchSequence="5"
BatchSequence="6"
BatchSequence="7"
BatchSequence="8"'''
import re
def fn(my_number):
    val = yield
    while True:
        val = yield str(val) if val < my_number else str(val-1)
f = fn(4)
next(f)
s = re.sub(r'BatchSequence="(\d+)"', lambda g: 'BatchSequence="' + f.send(int(g.group(1))) + '"', txt)
print(s)
Prints:
BatchSequence="1"
BatchSequence="2"
BatchSequence="3"
BatchSequence="4"
BatchSequence="5"
BatchSequence="6"
BatchSequence="7"
The generator fn(my_number) returns each value unchanged while it is below my_number; every value from my_number onwards is decremented by one.
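The question also asks about getting the new numbers back into a file. One possible sketch, under a different assumption than the generator above: it simply renumbers every BatchSequence consecutively from 1, which happens to give the same result for the sample data. The function name renumber and the output file name are hypothetical.

```python
import itertools
import re

def renumber(text, start=1):
    """Replace every BatchSequence value with consecutive numbers (assumed goal)."""
    counter = itertools.count(start)
    return re.sub(
        r'BatchSequence="\d+"',
        lambda m: 'BatchSequence="{}"'.format(next(counter)),
        text,
    )

txt = '\n'.join([
    'BatchSequence="1"',
    'BatchSequence="2"',
    'BatchSequence="3"',
    'BatchSequence="5"',
    'BatchSequence="6"',
])
fixed = renumber(txt)
print(fixed)

# To keep the original untouched, write the result to a new file
# (hypothetical name):
# with open("renumbered.oxi", "w") as out:
#     out.write(fixed)
```

Writing to a new file first is the safer choice, since the original can still be checked against the result before it is replaced.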

I want to write a function which prints a sum

I just started learning Python a few weeks ago and I want to write a function which opens a file, counts and adds up the characters in each line and prints that those equal the total number of characters in the file.
For example, given a file test1.txt:
lineLengths('test1.txt')
The output should be:
15+20+23+24+0=82 (+0 optional)
This is what I have so far:
def lineLengths(filename):
    f=open(filename)
    lines=f.readlines()
    f.close()
    answer=[]
    for aline in lines:
        count=len(aline)
It does what I want it to do, but I don't know how to include all of the numbers added together when the function prints.
If you only want to print the sum of the lengths of the lines, you can do it like so:
def lineLengths(filename):
    with open(filename) as f:
        answer = []
        for aline in f:
            answer.append(len(aline))
    print("%s = %s" %("+".join(str(c) for c in answer), sum(answer)))
If, however, you also need to track the lengths of the individual lines, you can append the length of each line to your answer list with the append method and then print the sum using sum(answer).
Try this :
f=open(filename)
mylist = f.read().splitlines()
sum([len(i) for i in mylist])
Simple as this:
sum(map(len, open(filename)))
open(filename) returns an iterator that passes through each line, each of which is run through the len function, and the results are summed.
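A small sketch of that iteration, using io.StringIO as a stand-in for the open file (the text is made up). Note that each line still carries its trailing newline, so len counts it too:

```python
import io

# Stand-in for open(filename); iterating yields one line at a time.
fake_file = io.StringIO("hello\nworld!\n\nlast line")

lengths = [len(line) for line in fake_file]   # newline characters are included
summary = "{}={}".format("+".join(map(str, lengths)), sum(lengths))

print(summary)
```

If the newlines should not be counted, stripping each line with rstrip('\n') before taking len is one option.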
Once you read lines from file you can count sum using:
sum([len(aline) for aline in lines])
Separate your problem into functions: one responsible for returning the total sum of the line lengths and another for formatting the length of each line.
def read_file(file):
    with open(file) as file:
        lines = file.readlines()
    return lines
def format_line_sum(lines):
    lines_in_str = []
    for line in lines:
        lines_in_str.append(str(len(line)))
    return "+".join(lines_in_str)
def lines_length(file):
    lines = read_file(file)
    total_sum = 0
    for line in lines:
        total_sum += len(line)
    return format_line_sum(lines) + "=" + str(total_sum)
And to use:
print(lines_length('file1.txt'))
Assuming your output is literal, something like this should work.
You can use Python's sum() function once you figure out how to add the numbers to a list.
def lineLengths(filename):
    with open(filename) as f:
        line_lengths = [len(l.rstrip()) for l in f]
        summ = '+'.join(map(str, line_lengths)) # can only join strings
    return sum(line_lengths), summ
total_chars, summ = lineLengths(filename)
print("{} = {}".format(summ, total_chars))
This should have the output you want: x+y+z=a
def lineLengths(filename):
    count=[]
    with open(filename) as f: #this is an easier way to open/close a file
        for line in f:
            count.append(len(line))
    print('+'.join(str(x) for x in count) + "=" + str(sum(count)))

Fastest way to print this in python?

Suppose I have a text file containing this, where the number on the left says how many of the characters of the right should be there:
2 a
1 *
3 $
How would I get this output in the fastest time?
aa*$$$
This is my code, but has N^2 complexity:
f = open('a.txt')
line = ''
for item in f:
    item2=item.split()
    num = int(item2[0])
    for i in range(num):
        line+=item2[1]
print(line)
f.close()
KISS
with open('file.txt') as f:
    for line in f:
        count, char = line.strip().split(' ')
        print char * int(count),
Just print immediately:
for item in open('a.txt'):
    num, char = item.strip().split()
    print(int(num) * char, end='')
print() # Newline
You can multiply strings to repeat them in Python:
"foo" * 3 gives you foofoofoo.
parts = []
with open("a.txt") as f:
    for line in f:
        n, c = line.rstrip().split(" ")
        parts.append(c * int(n))
print("".join(parts))
You can print directly but the code above lets you get the output you want in a string if you care about that.
Using a list then joining is more efficient than using += on a string because strings are immutable in Python. This means that a new string must be created for each +=. Of course printing immediately avoids this issue.
You can try it like this:
f = open('a.txt')
print ''.join(int(item.split()[0]) * item.split()[1] for item in f.readlines())
Your code is actually O(sum(n_i)) where n_i is the number in the row i. You can't do any better and none of the solutions in the other answers do, even if they might be faster than yours.
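To make that bound concrete, here is a small sketch that counts how many times the inner loop body runs for the sample from the question. The rows are hard-coded instead of read from a.txt, and the steps counter is only there for illustration:

```python
rows = [(2, "a"), (1, "*"), (3, "$")]  # the sample lines from the question

steps = 0   # number of times the inner loop body runs
out = []
for count, char in rows:
    for _ in range(count):
        out.append(char)
        steps += 1

result = "".join(out)
print(result, steps, len(result))
```

The inner loop runs once per output character, so the work grows with the length of the output, and any solution has to produce at least that many characters.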
