Incorrect number of values in output from zip? - python

I've been working on a problem that involves taking multiple number pairs, and creating some form of sum loop that adds each pair together.
I am not getting the correct number of outputs, e.g. 15 pairs of numbers are inputted and only 8 are coming out.
Here's my code so far...
data = "917128 607663\
907859 281478\
880236 180499\
138147 764933\
120281 410091\
27737 932325\
540724 934920\
428397 637913\
879249 469640\
104749 325216\
113555 304966\
941166 925887\
46286 299745\
319716 662161\
853092 455361"
data_list = data.split(" ") # creating a list of strings
data_list_numbers = [] # converting list of strings to list of integers
for d in data_list:
    data_list_numbers.append(int(d))
#splitting the lists into two with every other integer (basically to get the pairs again.
list_one = data_list_numbers[::2]
list_two = data_list_numbers[1::2]
zipped_list = zip(list_one, list_two) #zipping lists
sum = [x+y for x,y in zip(list_one, list_two)] # finding the sum of each pair
print(sum)
What am I missing?

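Use a triple-quoted string ("""..."""), remove the backslashes, and split on whitespace with re.split. A backslash at the end of a line inside a string literal is a line continuation: the newline is dropped, so the last number on each line is joined directly to the first number on the next. Splitting on spaces then yields 16 tokens instead of 30, which is why only 8 sums come out. That is, this: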
"607663\
907859"
is the same as: "607663907859".
import re
data = """917128 607663
907859 281478
880236 180499
138147 764933
120281 410091
27737 932325
540724 934920
428397 637913
879249 469640
104749 325216
113555 304966
941166 925887
46286 299745
319716 662161
853092 455361"""
data_list = re.split(r'\s+', data) # creating a list of strings
data_list_numbers = [] # converting list of strings to list of integers
for d in data_list:
    data_list_numbers.append(int(d))
#splitting the lists into two with every other integer (basically to get the pairs again.
list_one = data_list_numbers[::2]
list_two = data_list_numbers[1::2]
zipped_list = zip(list_one, list_two) #zipping lists
sum = [x+y for x,y in zip(list_one, list_two)] # finding the sum of each pair
print(sum)
# [1524791, 1189337, 1060735, 903080, 530372, 960062, 1475644, 1066310, 1348889, 429965, 418521, 1867053, 346031, 981877, 1308453]
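As a variation on the answer above, here is a minimal sketch that avoids both the regex and the even/odd slicing by summing each line separately. It assumes the data is kept in a triple-quoted string with one pair per line; the names joined and pair_sums are only illustrative. The first lines also demonstrate the backslash behaviour described above.

```python
# A backslash at the end of a line inside a string literal removes the newline,
# so the two numbers end up glued together.
joined = "607663\
907859"
print(joined == "607663907859")  # True

data = """917128 607663
907859 281478
880236 180499
138147 764933
120281 410091
27737 932325
540724 934920
428397 637913
879249 469640
104749 325216
113555 304966
941166 925887
46286 299745
319716 662161
853092 455361"""

# str.split() with no arguments splits on any run of whitespace,
# so each line yields exactly one pair of numbers.
pair_sums = [sum(int(n) for n in line.split()) for line in data.splitlines()]
print(len(pair_sums))  # 15
print(pair_sums[0])    # 1524791
```

Working line by line keeps each pair together, so an odd number of tokens on some line would show up in that line's sum rather than silently shifting every later pair.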


Flatten lists of list for each cell in a pandas column

I have a DF that looks like this
DF =
index goal features
0 1 [[5.20281045, 5.3353545, 7.343434, ...],[2.33435, 4.2133, ...], ...]]
1 0 [[7.23123213, 1.2323123, 2.232133, ...],[1,45456, 0.2313, 2.23213], ...]]
...
The features column holds a very large list of lists of numbers. The number of elements differs between rows, so I want to pad with 0s to get inputs of uniform length, and also flatten each list of lists into a single list.
DF_Desired
index goal features
0 1 [5.20281045, 5.3353545, 7.343434, ..., 2.33435, 4.2133, ... , ...]
0 0 [7.23123213, 1.2323123, 2.232133, ..., 1,45456, 0.2313, 2.23213, ...]
Here is my code:
# Flatten each Lists
flat_list = []
for sublist in data["features"]:
    for item in sublist:
        flat_list.append(item)
or
flat_list = list(itertools.chain.from_iterable(data["features"]))
I (of course) cannot enter flat_list straight into the DF as its length does not match
"ValueError: Length of values (478) does not match length of index (2)"
# Make the Lists equal in length:
length = max(map(len, df["features"]))
X = np.array([xi+[0]*(length-len(xi)) for xi in df["features"]])
print(X)
What this should do is flatten each cell of df["features"] into a single list and then append 0s to each list where needed. But it just returns:
[[5.20281045, 5.3353545, 7.343434, ...]
[2.33435, 4.2133, ...]
[...]
...
[7.23123213, 1.2323123, 2.232133, ...]
[1,45456, 0.2313, 2.23213 ...]]
So what exactly did I do wrong?
You can flatten each list of lists by summing it with an empty list as the start value:
DF['features'] = DF.features.apply(lambda x: sum(x, []))
If I understood correctly you want to flatten the list of lists into one list and also want each entry in features column to be of equal length.
This can be achieved in the following manner:
# flattening
df.features = df.features.apply(lambda x:[leaf for tree in x for leaf in tree])
# make equal in length
max_len = df.features.apply(len).max()
def append_zeros(l):
    # list.append returns None, so build the padded list with + instead
    if len(l) < max_len:
        return l + [0]*(max_len - len(l))
    else:
        return l
df.features = df.features.apply(append_zeros)
If I have not understood something clearly, please comment.
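To illustrate the flatten-then-pad logic in isolation, here is a small sketch in plain Python, without pandas. The sample rows and the helper name flatten_and_pad are made up for illustration; with a DataFrame, the same two steps could be applied to the features column as in the answer above.

```python
from itertools import chain


def flatten_and_pad(rows, fill=0):
    """Flatten each list of lists, then right-pad every row to the same length."""
    flat_rows = [list(chain.from_iterable(row)) for row in rows]
    max_len = max(len(row) for row in flat_rows)
    # row + [...] builds a new list, so the inputs are not modified
    return [row + [fill] * (max_len - len(row)) for row in flat_rows]


# Hypothetical data standing in for the "features" column
features = [
    [[5.2, 5.3, 7.3], [2.3, 4.2]],
    [[7.2, 1.2], [1.4]],
]
padded = flatten_and_pad(features)
print(padded)
# [[5.2, 5.3, 7.3, 2.3, 4.2], [7.2, 1.2, 1.4, 0, 0]]
```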

how to filter a value from list in python?

I have a list of values and need to filter out those that don't follow a naming convention.
For example, given the list: list = ['a1-23','b1-24','c1-25','c1-x-25']
I need to filter out all values that start with 'c1-', except those that start with 'c1-x-'.
Expected output: ['a1-23','b1-24','c1-x-25']
list = ['a1-23','b1-24','c1-25','c1-x-25']
[x for x in list if not x.startswith('c1-')]
['a1-23', 'b1-24']
You have the right idea, but you're missing the handling of values that start with c1-x-:
[x for x in list if not x.startswith('c1-') or x.startswith('c1-x-')]
import re
list1 = ['a1-23','b1-24','c1-25','c1-x-25',"c1-22"]
r = re.compile(r"\bc1-\b\d{2}$") # this regex matches anything with `c1-{2 digits}` exactly
[x for x in list1 if x not in list(filter(r.match,list1))]
# output
['a1-23', 'b1-24', 'c1-x-25']
So what my pattern does is match EXACTLY a word that starts with c1- and ends with two digits only.
Therefore, list(filter(r.match,list1)) will give us all the c1-## and then we do a list comprehension to filter out from list1 all the x's that aren't in the new provided list containing the matches.
[x for x in [1,2,3] if x not in [1,2]]
#output
[3]
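For comparison, the same filter can be sketched with a negative lookahead, which avoids building the intermediate list of matches. Unlike the two-digit pattern above, it removes every value that starts with c1- unless x- follows immediately. The names values and pattern are illustrative; the list is renamed because list shadows the built-in.

```python
import re

values = ['a1-23', 'b1-24', 'c1-25', 'c1-x-25', 'c1-22']

# Matches strings that start with "c1-" unless "x-" follows immediately.
pattern = re.compile(r"c1-(?!x-)")

kept_regex = [v for v in values if not pattern.match(v)]

# Equivalent check using plain string methods
kept_plain = [v for v in values
              if not v.startswith('c1-') or v.startswith('c1-x-')]

print(kept_regex)  # ['a1-23', 'b1-24', 'c1-x-25']
```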

Sum all numbers in a list of strings

sorry if this is very noob question, but I have tried to solve this on my own for some time, gave it a few searches (used the "map" function, etc.) and I did not find a solution to this. Maybe it's a small mistake somewhere, but I am new to python and seem to have some sort of tunnel vision.
I have some text (see sample) with numbers scattered through it. I want to extract all the numbers into a list with a regular expression and then sum them. I seem to be able to do the extraction, but I struggle to convert the results to integers and sum them.
import re
df = ["test 4497 test 6702 test 8454 test",
"7449 test"]
numlist = list()
for line in df:
    line = line.rstrip()
    numbers = re.findall("[0-9]+", line) # find numbers
    if len(numbers) < 1: continue # ignore lines with no numbers, none in this sample
    numlist.append(numbers) # create list of numbers
The sum(numlist) returns an error.
You don't need a regex for this. Split the strings in the list, and sum those that are numeric in a comprehension:
sum(sum(int(i) for i in s.split() if i.isnumeric()) for s in df)
# 27102
Or similarly, flatten the resulting lists, and sum once:
from itertools import chain
sum(chain.from_iterable((int(i) for i in s.split() if i.isnumeric()) for s in df))
# 27102
This is the source of your problem:
findall returns a list, which you are appending to numlist, itself a list. So you end up with a list of lists. You should instead do:
numlist.extend(numbers)
So that you end up with a single list of numbers (well, actually string representations of numbers). Then you can convert the strings to integers and sum:
the_sum = sum(int(n) for n in numlist)
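Applied to the original loop, that fix might look like the sketch below. It keeps the regex approach from the question and only swaps append for extend.

```python
import re

df = ["test 4497 test 6702 test 8454 test",
      "7449 test"]

numlist = []
for line in df:
    numbers = re.findall("[0-9]+", line)  # list of digit strings on this line
    numlist.extend(numbers)               # add each string, not the list itself

the_sum = sum(int(n) for n in numlist)
print(numlist)  # ['4497', '6702', '8454', '7449']
print(the_sum)  # 27102
```

The check for lines without numbers is no longer needed, since extending with an empty list is a no-op.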
Use a nested loop over df and append each number to numlist:
numlist = list()
for item in df:
    for word in item.split():
        if word.isnumeric():
            numlist.append(int(word))
print(numlist)
print(sum(numlist))
Out:
[4497, 6702, 8454, 7449]
27102
You could make a one-liner using list comprehension:
print(sum([int(word) for item in df for word in item.split() if word.isnumeric()]))
>>> 27102
It's as easy as
my_sum = sum(map(int, numbers_list))
Here is an option using map, filter and sum:
First splits the strings at the spaces, filters out the non-numbers, casts the number-strings to int and finally sums them.
# if you want the sum per string in the list
sums = [sum(map(int, filter(str.isnumeric, s.split()))) for s in df]
# [19653, 7449]
# if you simply want the sum of all numbers of all strings
sum(sum(map(int, filter(str.isnumeric, s.split()))) for s in df)
# 27102
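One caveat worth noting for the split-based answers: they only count tokens that are entirely numeric, whereas the regex in the question also picks up digits embedded in other text. A small sketch of the difference, using a made-up sample string:

```python
import re

sample = "abc123 45 x7"

# Only whitespace-separated tokens made up entirely of digits are counted
split_total = sum(int(tok) for tok in sample.split() if tok.isnumeric())

# Every run of digits is counted, wherever it appears
regex_total = sum(int(n) for n in re.findall(r"[0-9]+", sample))

print(split_total)  # 45
print(regex_total)  # 175
```

For the sample data in the question both approaches give the same total, so which one fits depends on what the real text looks like.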

How to convert list pairs into tuple pairs

How do you turn a list of comma-separated pair strings into a list of tuple pairs using simple constructs, e.g. a for loop with x, y = ...?
My code:
def read_numbers():
    numbers = ['68,125', '113,69', '65,86', '108,149', '152,53', '78,90']
    numbers.split(',')
    x,y = tuple numbers
    return numbers
Desired output:
[(68,125), (113,69), (65,86), (108,149), (152,53), (78,90)]
def read_numbers():
    numbers = ['68,125', '113,69', '65,86', '108,149', '152,53', '78,90']
    return [tuple(map(int,pair.split(','))) for pair in numbers]
Try this by using nested list comprehension:
o = [tuple(int(y) for y in x.split(',')) for x in numbers]
Just use list comprehension. Read more about it here!
# Pass in numbers as an argument so that it will work
# for more than 1 list.
def read_numbers(numbers):
    return [tuple(int(y) for y in x.split(",")) for x in numbers]
Here is a breakdown and explanation (in comments) of the list comprehension:
[
    tuple(                # Convert whatever is between these parentheses into a tuple
        int(y)            # Make y an integer
        for y in          # where y is each element in
        x.split(",")      # x.split(","): x is a string, and x.split(",") is the list
                          # of substrings obtained by splitting x at each comma
    ) for x in numbers    # x is each element in numbers
]
However, if you are just doing it for one list, there is no need to create a function.
Try this :
def read_numbers():
    numbers = ['68,125', '113,69', '65,86', '108,149', '152,53', '78,90']
    final_list = []
    for number in numbers:
        final_list.append(tuple(int(test_str) for test_str in number.split(',')))
    return final_list
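Since the question asks for a plain for loop with x, y unpacking, the same result can be sketched that way. This version takes the list as a parameter, as one of the answers above suggests, and assumes every string contains exactly one comma.

```python
def read_numbers(numbers):
    pairs = []
    for pair in numbers:
        x, y = pair.split(',')          # '68,125' -> '68', '125'
        pairs.append((int(x), int(y)))  # convert both halves and store a tuple
    return pairs


numbers = ['68,125', '113,69', '65,86', '108,149', '152,53', '78,90']
result = read_numbers(numbers)
print(result)
# [(68, 125), (113, 69), (65, 86), (108, 149), (152, 53), (78, 90)]
```

A string with more or fewer than two parts would raise a ValueError at the unpacking step, which is usually what you want for malformed input.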

Python: How to evaluate a part of each item in a list, and append matching results?

Problem:
Trying to evaluate first 4 characters of each item in list.
If the first 4 chars match another first 4 chars in the list, then append the last three digits to the first four. See example below.
Notes:
The list values are not hard coded.
The list always has this structure "####.###".
Only need to match first 4 chars in each item of list.
Order is not essential.
Code:
Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
Desired Output:
Grid = ["094G.016\019\032", "194P.005\015", "093T.021\102"]
Research:
I know that sets can find duplicates, could I use a set to evaluate only the 1st 4 chars, would I run into a problem since indexing of sets cannot be done?
Would it be better to split the list items into the 2 parts. The four digits before the period ("094G"), and a separate list of the three digits after the period ("093"), compare them, then join them in a new list?
Is there a better way of doing this all together that I'm not realizing?
Here is one straightforward way to do it.
from collections import defaultdict
grid = ['094G.016', '094G.019', '194P.005', '194P.015', '093T.021', '093T.102', '094G.032']
d = defaultdict(list)
for item in grid:
    k,v = item.split('.')
    d[k].append(v)
result = ['%s.%s' % (k, '/'.join(v)) for k, v in d.items()]
Gives unordered result:
['093T.021/102', '194P.005/015', '094G.016/019/032']
What you'll most likely want is a dictionary mapping the first part of each code to a list of second parts. You can build the dictionary like so:
mappings = {} #Empty dictionary
for code in Grid: #Loop over each code
    first, second = code.split('.') #Separate the code into first.second
    if first in mappings: #if the first was already found
        mappings[first].append(second) #add the second to those already computed
    else:
        mappings[first] = [second] #otherwise, put it in a new list
Once you have the dictionary, it will be quite simple to loop over it and combine the second parts together (ideally, using '\\'.join)
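A minimal sketch of that last step, assuming a backslash separator as in the question's desired output (the other answers use '/'). It uses dict.setdefault as a shorthand for the if/else above and relies on dictionaries preserving insertion order (Python 3.7+).

```python
Grid = ["094G.016", "094G.019", "194P.005", "194P.015",
        "093T.021", "093T.102", "094G.032"]

mappings = {}
for code in Grid:
    first, second = code.split('.')
    # Create the list on first sight of a prefix, then append to it
    mappings.setdefault(first, []).append(second)

# '\\' is a single backslash character
result = [first + '.' + '\\'.join(seconds) for first, seconds in mappings.items()]
for entry in result:
    print(entry)
# 094G.016\019\032
# 194P.005\015
# 093T.021\102
```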
Sounds like a job for defaultdict.
from collections import defaultdict
grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102"]
d = defaultdict(set)
for item in grid:
    prefix, suffix = item.split(".")
    d[prefix].add(suffix)
output = [ "%s.%s" % (prefix, "/".join(d[prefix]), ) for prefix in d ]
>>> from itertools import groupby
>>> Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
>>> Grid = sorted(Grid, key=lambda x:x.split(".")[0])
>>> gen = ((k, g) for k, g in groupby(Grid, key=lambda x:x.split(".")[0]))
>>> gen = ((k,[x.split(".") for x in g]) for k, g in gen)
>>> gen = list((k + '.' + '/'.join(x[1] for x in g) for k, g in gen))
>>> for x in gen:
...     print(x)
...
093T.021/102
094G.016/019/032
194P.005/015
