Sum all numbers in a list of strings - python

Sorry if this is a very basic question, but I have tried to solve this on my own for some time and searched around (I tried the "map" function, etc.) without finding a solution. Maybe it's a small mistake somewhere, but I am new to Python and seem to have some sort of tunnel vision.
I have some text (see sample) with numbers in between. I want to extract all numbers with regular expressions into a list and then sum them. I seem to be able to do the extraction, but struggle to convert them to integers and then sum them.
import re
df = ["test 4497 test 6702 test 8454 test",
      "7449 test"]
numlist = list()
for line in df:
    line = line.rstrip()
    numbers = re.findall("[0-9]+", line)  # find numbers
    if len(numbers) < 1: continue  # ignore lines with no numbers, none in this sample
    numlist.append(numbers)  # create list of numbers
Calling sum(numlist) then raises an error.

You don't need a regex for this. Split the strings in the list, and sum those that are numeric in a comprehension:
sum(sum(int(i) for i in s.split() if i.isnumeric()) for s in df)
# 27102
Or similarly, flatten the resulting lists, and sum once:
from itertools import chain
sum(chain.from_iterable((int(i) for i in s.split() if i.isnumeric()) for s in df))
# 27102
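One caveat with the split-based versions: they only count tokens that consist entirely of digits, whereas the regex from the question also picks up digits embedded in other text. Here is a minimal sketch of the difference, using a made-up list (`sample` is an assumption for illustration, not data from the question):

```python
import re

# Hypothetical sample: some digits are glued to letters.
sample = ["abc123 45", "x7y 8"]

# Whitespace split: only whole tokens made of digits are counted.
split_total = sum(int(tok) for s in sample for tok in s.split() if tok.isnumeric())

# Regex: every run of digits is counted, wherever it appears.
regex_total = sum(int(num) for s in sample for num in re.findall(r"[0-9]+", s))

print(split_total)  # 53  (45 + 8)
print(regex_total)  # 183 (123 + 45 + 7 + 8)
```

For the sample in the question both approaches give the same total, because every number there is separated from the text by spaces.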

This is the source of your problem:
findall returns a list, which you are appending to numlist (itself a list), so you end up with a list of lists. You should instead do:
numlist.extend(numbers)
So that you end up with a single list of numbers (well, actually string representations of numbers). Then you can convert the strings to integers and sum:
the_sum = sum(int(n) for n in numlist)
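Put together with the loop from the question, the fix might look like the sketch below. It keeps the question's names (`df`, `numlist`, `the_sum`); the only changes are `extend` instead of `append` and the final conversion to integers.

```python
import re

df = ["test 4497 test 6702 test 8454 test",
      "7449 test"]

numlist = []
for line in df:
    # findall returns a list of digit strings; extend keeps numlist flat.
    numlist.extend(re.findall(r"[0-9]+", line))

# Convert each string to int only when summing.
the_sum = sum(int(n) for n in numlist)

print(numlist)  # ['4497', '6702', '8454', '7449']
print(the_sum)  # 27102
```

Lines without numbers need no special handling here, since extending with an empty list changes nothing.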

Use two nested loops over df and append each number to numlist:
numlist = list()
for item in df:
    for word in item.split():
        if word.isnumeric():
            numlist.append(int(word))
print(numlist)
print(sum(numlist))
Out:
[4497, 6702, 8454, 7449]
27102
You could make a one-liner using list comprehension:
print(sum([int(word) for item in df for word in item.split() if word.isnumeric()]))
>>> 27102

It's as easy as
my_sum = sum(map(int, numbers_list))

Here is an option using map, filter and sum:
This first splits each string at the spaces, filters out the non-numbers, casts the number strings to int, and finally sums them.
# if you want the sum per string in the list
sums = [sum(map(int, filter(str.isnumeric, s.split()))) for s in df]
# [19653, 7449]
# if you simply want the sum of all numbers of all strings
sum(sum(map(int, filter(str.isnumeric, s.split()))) for s in df)
# 27102

Related

Incorrect number of values in output from zip?

I've been working on a problem that involves taking multiple number pairs, and creating some form of sum loop that adds each pair together.
I am not getting the correct number of outputs, e.g. 15 pairs of numbers are inputted and only 8 are coming out.
Here's my code so far...
data = "917128 607663\
907859 281478\
880236 180499\
138147 764933\
120281 410091\
27737 932325\
540724 934920\
428397 637913\
879249 469640\
104749 325216\
113555 304966\
941166 925887\
46286 299745\
319716 662161\
853092 455361"
data_list = data.split(" ") # creating a list of strings
data_list_numbers = [] # converting list of strings to list of integers
for d in data_list:
    data_list_numbers.append(int(d))
# splitting the list into two with every other integer (basically to get the pairs again)
list_one = data_list_numbers[::2]
list_two = data_list_numbers[1::2]
zipped_list = zip(list_one, list_two) #zipping lists
sum = [x+y for x,y in zip(list_one, list_two)] # finding the sum of each pair
print(sum)
What am I missing?
Quote the input string like so: """...""", remove the backslashes, and use re.split to split on whitespace. Note that using backslashes without spaces, as you did, causes the numbers in data to smash into each other. That is, this:
"607663\
907859"
is the same as: "607663907859".
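A quick way to see the effect is to print a shortened version of the string. The sketch below uses only the first two lines of the data from the question:

```python
# A backslash at the end of a line inside a string literal removes the newline,
# so the last number of one line is glued to the first number of the next.
joined = "607663\
907859"
print(joined)  # 607663907859

tokens = "917128 607663\
907859 281478".split(" ")
print(tokens)  # ['917128', '607663907859', '281478']
```

With 15 pairs (30 numbers) and 14 glued line ends, only 16 tokens remain, which is why just 8 sums come out.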
import re
data = """917128 607663
907859 281478
880236 180499
138147 764933
120281 410091
27737 932325
540724 934920
428397 637913
879249 469640
104749 325216
113555 304966
941166 925887
46286 299745
319716 662161
853092 455361"""
data_list = re.split(r'\s+', data) # creating a list of strings
data_list_numbers = [] # converting list of strings to list of integers
for d in data_list:
    data_list_numbers.append(int(d))
# splitting the list into two with every other integer (basically to get the pairs again)
list_one = data_list_numbers[::2]
list_two = data_list_numbers[1::2]
zipped_list = zip(list_one, list_two) #zipping lists
sum = [x+y for x,y in zip(list_one, list_two)] # finding the sum of each pair
print(sum)
# [1524791, 1189337, 1060735, 903080, 530372, 960062, 1475644, 1066310, 1348889, 429965, 418521, 1867053, 346031, 981877, 1308453]
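Since each pair sits on its own line once the string is triple-quoted, the slicing and zip steps are not strictly necessary. As a possible alternative (a sketch, not part of the answer above, shown on the first three lines of the data only), each line can be summed directly:

```python
data = """917128 607663
907859 281478
880236 180499"""

# One sum per line: split the line on whitespace and add up its integers.
pair_sums = [sum(int(n) for n in line.split()) for line in data.splitlines()]

print(pair_sums)  # [1524791, 1189337, 1060735]
```

Using a name like `pair_sums` also avoids rebinding the built-in `sum`, which the original code does.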

Get sum of integers from list of strings

alist = [["Chanel-1000, Dior-2000, Prada-500"],
         ["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [
    min(map(str.strip, x[0].split(',')),
        key=lambda i: int(str.strip(i).split('-')[-1])) for x in alist
]
print(alist_min)
Given this script, how can I get the sum of the numbers in alist_min? Only the integer parts should be added, so for the result ['Prada-500', 'Chloe-200', 'Bag-1'] the output would be
#total: 701
You can use sum() and list comprehension with split() function:
sum([int(x.split('-')[1]) for x in alist_min])
Full code:
alist = [["Chanel-1000, Dior-2000, Prada-500"],
         ["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [
    min(map(str.strip, x[0].split(',')),
        key=lambda i: int(str.strip(i).split('-')[-1])) for x in alist
]
print(alist_min)
print(sum([int(x.split('-')[1]) for x in alist_min]))
Output:
['Prada-500', 'Chloe-200', 'Bag-1']
701
Explanation:
Use split() to split each string in alist_min at the - character into two parts; the second part holds the number.
Convert this to an int.
Use above logic in list comprehension to generate list of numbers
Use sum() to take sum of this list
You can also use a regular expression, along with map and sum:
import re
sum(map(int, map(lambda x: re.findall(r'\d+', x)[0], alist_min)))
#output: 701
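Both the minimum and the sum rely on the same step, namely reading the integer after the last hyphen. One way to avoid repeating that logic is a small helper. In the sketch below, `price` and `cheapest` are hypothetical names introduced for illustration, and each item is assumed to end in `-<integer>`:

```python
alist = [["Chanel-1000, Dior-2000, Prada-500"],
         ["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]

def price(item):
    # Hypothetical helper: the integer after the last '-' in "Name-123".
    return int(item.rsplit('-', 1)[1])

# Cheapest item per row, with surrounding whitespace stripped first.
cheapest = [min((part.strip() for part in row[0].split(',')), key=price)
            for row in alist]
total = sum(price(item) for item in cheapest)

print(cheapest)  # ['Prada-500', 'Chloe-200', 'Bag-1']
print(total)     # 701
```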

searching a list of strings for integers

Given the following list of strings:
my_list = ['element0 123 321\n', 'element1 223 32221\n', 'element2 19823 328771\n', ... ]
how can I split each entry into a list of tuples:
[ (123, 321), (223, 32221), (19823, 328771), ... ]
In an earlier attempt I managed to extract the numbers, but ran into a problem: the element placeholder also contains a number, which this method includes. It also produces a list rather than tuples.
numbers = list()
for s in my_list:
    for x in s:
        if x.isdigit():
            numbers.append((x))
numbers
We can first build a regex that identifies positive integers:
from re import compile
INTEGER_REGEX = compile(r'\b\d+\b')
Here \d stands for digit (so 0, 1, etc.), + for one or more, and \b are word boundaries.
We can then use INTEGER_REGEX.findall(some_string) to identify all positive integers from the input. Now the only thing left to do is iterate through the elements of the list, and convert the output of INTEGER_REGEX.findall(..) to a tuple. We can do this with:
output = [tuple(INTEGER_REGEX.findall(l)) for l in my_list]
For your given sample data, this will produce:
>>> [tuple(INTEGER_REGEX.findall(l)) for l in my_list]
[('123', '321'), ('223', '32221'), ('19823', '328771')]
Note that digits that are not separate words will not be matched. For instance the 8 in 'see you l8er' will not be matched, since it is not a word.
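The tuples produced this way contain strings, while the question asks for integers. A minimal sketch of the extra conversion step, assuming the three sample entries shown in the question, could look like this:

```python
import re

INTEGER_REGEX = re.compile(r'\b\d+\b')

my_list = ['element0 123 321\n', 'element1 223 32221\n', 'element2 19823 328771\n']

# Convert every match to int before building the tuple.
output = [tuple(int(n) for n in INTEGER_REGEX.findall(line)) for line in my_list]

print(output)  # [(123, 321), (223, 32221), (19823, 328771)]
```

The digit in `element0` is skipped because there is no word boundary between the letters and the digit.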
Your attempt iterates over each character of the string. You have to split the string on whitespace instead, a task that str.split handles well.
Also, numbers.append((x)) is the same as numbers.append(x). For a tuple of one element, add a comma before the closing parenthesis, although that alone would not solve the problem either.
Now, each entry seems to contain an id (to be skipped) followed by two integers as strings, so why not split the entry, drop the first token, and convert the rest to a tuple of integers?
my_list = ['element0 123 321\n', 'element1 223 32221\n', 'element2 19823 328771\n']
result = [tuple(map(int,x.split()[1:])) for x in my_list]
print(result)
gives:
[(123, 321), (223, 32221), (19823, 328771)]

Splitting string into two by comma using python

I have following data in a list and it is a hex number,
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split when there is a delimiter in between, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
I assume you want to split after every 6th character.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using list comprehension, this works for multiple items in lst
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
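The slicing and regex approaches agree when the length is a multiple of six, but they differ otherwise. The sketch below illustrates this with a hypothetical helper `chunks` and a made-up second item `'abcdefgh'`, neither of which appears in the question:

```python
import re

def chunks(s, width=6):
    # Hypothetical helper: consecutive slices of at most `width` characters.
    return [s[i:i + width] for i in range(0, len(s), width)]

lst = ['aaaaa955554e', 'abcdefgh']

# Slicing keeps a shorter final chunk.
with_commas = [','.join(chunks(item)) for item in lst]
print(with_commas)  # ['aaaaa9,55554e', 'abcdef,gh']

# A fixed-width regex silently drops an incomplete trailing chunk.
regex_chunks = re.findall(r'\w{6}', 'abcdefgh')
print(regex_chunks)  # ['abcdef']
```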
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.

How to sort out first row in a list

I would like to sort out the first row of a given list.
I have already tried using Python's "replace" to remove the second row, but the replace function does not seem to work at all.
Here is the regular expression I used: replace(r'^ //.*$','')
Here is the list:
//SA/... //short_message/Saint/...
//SS-SA/... //long_message/wonder-girl/...
here is the output I am expecting:
//SA/...
//SS-SA/...
l = ["1 12","3 12","2 12"] # space separated
n = [x.split()[0] for x in l]
print(sorted(n))
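Applied to the two lines from the question, the same idea might look like the sketch below. It assumes the lines are available as a list of strings named `lines` (a name chosen here for illustration). Note that str.replace only substitutes literal text, so a pattern such as the one in the question needs re.sub instead; the pattern used here is an adapted version, not the original one.

```python
import re

lines = [
    "//SA/... //short_message/Saint/...",
    "//SS-SA/... //long_message/wonder-girl/...",
]

# Option 1: keep only the first whitespace-separated field.
first_by_split = [line.split()[0] for line in lines]

# Option 2: delete everything from the whitespace before the second "//" onward.
first_by_regex = [re.sub(r"\s+//.*$", "", line) for line in lines]

print(first_by_split)  # ['//SA/...', '//SS-SA/...']
print(first_by_regex)  # ['//SA/...', '//SS-SA/...']
```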
