How to apply the same function to multiple lists - python

I have several lists and want to apply the same function to each of them. I know how to apply it to one list at a time, but not to all of them at once. After that, I want to add the results together element by element to get a total.
a = ['a','b','c','d','e']
b = ['b', np.nan,'c','e','a']
c = ['a','b','c','d','e']
I know I could do the following to get the output, but I wanted to keep the steps separate:
a = [1 if 'a' in a else 99 for x in a]
b = [1 if 'a' in b else 99 for x in b]
c = [1 if 'a' in c else 99 for x in c]
First, I want the outputs below:
a = [1, 99, 99, 99, 99]
b = [99, 99, 99, 99, 1]
c = [1, 99, 99, 99, 99]
Then add the elements position by position into one final list:
sum = [101, 297, 297, 297, 199]

I'm not sure whether I understood your question correctly. As fgblomqvist commented, I replaced 1 if 'a' in a with 1 if x == 'a' in the list comprehension. I then reproduced your second step with a for loop, and used zip to iterate over the values of all lists in parallel to calculate the sums.
a = ['a','b','c','d','e']
b = ['b','a','c','e','a']
c = ['a','b','c','d','e']
# add the lists to a list.
lists = [a,b,c]
outcomes = []
for l in lists:
    outcome = [1 if x == 'a' else 99 for x in l]
    outcomes.append(outcome)
    print(f'one of the outcomes: {outcome}')
results = []
# iterate over all list values synchronously and calculate the sum
for outs in zip(*outcomes):
    results.append(sum(outs))
    print(f'sum of {outs} is {sum(outs)}')
print(f'final result:{results}')
This is the output:
one of the outcomes: [1, 99, 99, 99, 99]
one of the outcomes: [99, 1, 99, 99, 1]
one of the outcomes: [1, 99, 99, 99, 99]
sum of (1, 99, 1) is 101
sum of (99, 1, 99) is 199
sum of (99, 99, 99) is 297
sum of (99, 99, 99) is 297
sum of (99, 1, 99) is 199
final result:[101, 199, 297, 297, 199]
edit: To avoid looping twice you could join the loops together like so:
lists = [a,b,c]
sums = []
for values in zip(*lists):
    the_sum = 0
    for val in values:
        the_sum += 1 if val == 'a' else 99
    sums.append(the_sum)
print(f'sums: {sums}')
Keep in mind that you can replace 1 if val == 'a' else 99 with a call such as some_func(val).
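As a minimal sketch of that idea, the scoring rule can be pulled out into its own function and applied to any number of lists. The names score and sum_scores are made up for this illustration and do not appear in the question; math.nan is used in place of np.nan only to keep the example free of third-party imports.

```python
import math

def score(value):
    # Hypothetical stand-in for some_func: 1 for 'a', 99 for anything else.
    return 1 if value == 'a' else 99

def sum_scores(func, *lists):
    # Apply func to every element, then add the results position by position.
    return [sum(func(v) for v in values) for values in zip(*lists)]

a = ['a', 'b', 'c', 'd', 'e']
b = ['b', math.nan, 'c', 'e', 'a']   # math.nan stands in for np.nan here
c = ['a', 'b', 'c', 'd', 'e']

print(sum_scores(score, a, b, c))  # [101, 297, 297, 297, 199]
```

Because the function is passed in as an argument, a different rule can be swapped in without touching the loop.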

pandas makes this quite easy
(although I'm sure it's almost as easy with just numpy)
import pandas
df = pandas.DataFrame({'a':a,'b':b,'c':c})
mask = df == 'a'
df[mask] = 1
df[~mask] = 99
df.sum(axis=1)

import numpy as np
a = ['a','b','c','d','e']
b = ['b', np.nan,'c','e','a']
c = ['a','b','c','d','e']
dct = {"a":1}
sum_var = [np.nansum([dct.get(aa,99), dct.get(bb,99), dct.get(cc,99)]) for (aa, bb, cc) in zip(a,b,c)]
Explanation:
You can use a list comprehension (as you did in your example) with a small modification. Instead of iterating through a single list, iterate through all the lists at once. You can achieve this with the built-in function zip(), which pairs up the elements of the lists position by position.
Since all 3 lists have the same length you can iterate through it and apply additional transformations on each element like shown in the example.
The additional function in this example is the dictionary.get() method which gets you the value for each key, here the 1 for an "a". Everything which is not in the dictionary will return a 99. But in the same way you can use custom made functions.

Related

Having trouble replacing elements in a 2d list with a 1d list using for-loops

I am asked to replace some elements in a 2D list with elements from a 1D list. The rule is to replace the first row of list1 with list2. The elements in each following row are 3 times the corresponding elements of the previous row. So the second row will contain 33, 39, 45, and 51, and the third row 99, 117, 135, and 153, and so on all the way to the 10th row.
Here is my code:
list1 = [[0]*4]*10
list2 = [11,13,15,17]
list1[0] = list2
i = 1
for i in range(9):
    j = 0
    for j in range(4):
        list1[i][j] = list2[j]*(3**i)
        j+=1
    i+=1
The result I got from this code has a correct first row, but every row after that contains 72171, 85293, 98415, and 111537 (the list2 values multiplied by 3 to the 8th). I am not sure which part is causing the error.
Here are some comments on your code and examples of how to make it do what the question describes:
(1.) References vs copies: list1 = [[0]*4]*10 creates a single 4-element list and populates list1 with 10 references to it (NOT 10 copies of it).
For an example of what this implies, watch this:
list1 = [[0]*4]*10
list1[1][0] = 33
list1[1][1] = 39
list1[1][2] = 45
list1[1][3] = 51
print(list1)
... gives this:
[[33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51], [33, 39, 45, 51]]
In other words, updating one list element within list1 updates them all, since each element of list1 is just a reference to the same list.
If this is not what you want, you can use list1 = [[0]*4 for _ in range(10)] instead to give you a distinct list (10 in total) for each index in list1:
list1 = [[0]*4 for _ in range(10)]
list1[1][0] = 33
list1[1][1] = 39
list1[1][2] = 45
list1[1][3] = 51
print(list1)
... gives:
[[0, 0, 0, 0], [33, 39, 45, 51], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
Your code as written would tend to imply that the second approach above is what is needed.
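One way to check which of the two situations you are in is to compare the rows with the is operator. This is only a small sketch; the names shared and distinct are chosen for this example.

```python
shared = [[0] * 4] * 10                    # ten references to one inner list
distinct = [[0] * 4 for _ in range(10)]    # ten separate inner lists

print(shared[0] is shared[1])      # True: both names point to the same object
print(distinct[0] is distinct[1])  # False: two different objects

# Counting unique object identities shows the same thing.
print(len({id(row) for row in shared}))    # 1
print(len({id(row) for row in distinct}))  # 10
```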
(2.) Code fix assuming nested lists cannot be replaced: It's unclear from the question whether you are allowed to replace each list in list1 to get the desired values, or if you are expected to leave the nested lists in place and simply modify their numerical contents.
If list replacement is not allowed, then your code can be rewritten like this:
list1 = [[0]*4 for _ in range(10)]
list2 = [11,13,15,17]
list1[0][:] = list2
for i in range(9):
    for j in range(4):
        list1[i + 1][j] = list2[j]*(3**(i + 1))
print(list1)
... giving:
[[11, 13, 15, 17], [33, 39, 45, 51], [99, 117, 135, 153], [297, 351, 405, 459], [891, 1053, 1215, 1377], [2673, 3159, 3645, 4131], [8019, 9477, 10935, 12393], [24057, 28431, 32805, 37179], [72171, 85293, 98415, 111537], [216513, 255879, 295245, 334611]]
Note that these changes were made to your code:
Changed the initialization of list1 to allocate 10 distinct nested lists.
Eliminated i = 1 (not needed because the for loop takes care of initializing i), j = 0 (similar reason), j+=1 (not needed because the for loop takes care of updating j) and i+=1 (similar reason).
Changed list1[i][j] to list1[i + 1][j] to fix the indexing.
Changed 3**i to 3**(i + 1) to calculate the correct multiplier.
(3.) Code fix assuming nested lists can be replaced: In this case, your looping logic can be simplified and you don't need to use nested lists when initializing list1.
Here is a long-hand way to do what you ask which will overwrite the contents of list1 with new nested lists that have the desired values:
list1 = [None] * 10
list2 = [11,13,15,17]
list1[0] = list2
mul = 3
for i in range(1, len(list1)):
    temp = [0] * len(list2)
    for j in range(len(list2)):
        temp[j] = mul * list2[j]
    list1[i] = temp
    mul *= 3
print(list1)
Here is a way that uses a list comprehension inside a loop:
list1 = [None] * 10
list2 = [11,13,15,17]
list1[0] = list2
mul = 3
for i in range(1, len(list1)):
    list1[i] = [mul * v for v in list2]
    mul *= 3
print(list1)
And finally, here is a very compact nested list comprehension approach:
list1 = [None] * 10
list2 = [11,13,15,17]
list1 = [[(3 ** i) * v for v in list2] for i in range(len(list1))]
This happens because [[0]*4]*10 does not create ten separate rows: every row of list1 refers to the same inner list, so a change to one row shows up in all of them. It is not a copy; it is the same object. Building a new list for each row avoids the problem:
for i in range(10):
    tmp = []
    for j in range(4):
        tmp.append(list2[j]*(3**i))
    list1[i] = tmp
That is all you need.
You can make this much easier using numpy arrays:
import numpy as np
list1 = np.zeros((10,4), dtype=int)
list1[0] = [11,13,15,17]
for i in range(1,10):
    list1[i] = 3*list1[i-1]

How to compare each element of a delimited string in a pandas DataFrame column with the elements of a Python list

I have a data frame with a delimited string column that has to be compared with a list. If the elements in the delimited string and the elements of the list intersect, that row should be kept.
For example
test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
'colB': ['20,45,50,60', '22,70,35', '10,90,100']})
The output should be as shown below, because the elements 20 and 45 are common to both the list and the delimited text in the first row of the data frame.
Likewise, 35 intersects in row 2.
colA  colB
1     20,45,50,60
2     22,70,35
What I have tried is
test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data
# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data
This does not give the required result.
Any help is appreciated.
This might not be the best approach, but it works.
import pandas as pd
df = pd.DataFrame({'colA': [1, 2, 3],
'colB': ['20,45,50,60', '22,70,35', '10,90,100']})
def match_element(row):
    row_elements = [int(n) for n in row.split(',')]
    test_lst = [20, 45, 35]
    if [value for value in row_elements if value in test_lst]:
        return True
    else:
        return False
mask = df['colB'].apply(lambda row: match_element(row))
df = df[mask]
output:
   colA         colB
0     1  20,45,50,60
1     2     22,70,35
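The attempt in the question returns no rows because str.split produces strings such as '20', while test_lst holds integers, so the set intersection is always empty. A minimal sketch of the set-based idea with the conversion added is below. The helper name has_common is made up here, and the plain Python list col_b only stands in for the colB column.

```python
test_lst = [20, 45, 35]
col_b = ['20,45,50,60', '22,70,35', '10,90,100']   # same strings as colB

def has_common(cell, wanted):
    # Convert the comma-separated text to integers before comparing.
    numbers = {int(n) for n in cell.split(',')}
    return not numbers.isdisjoint(wanted)

mask = [has_common(cell, test_lst) for cell in col_b]
print(mask)  # [True, True, False]
```

With the data frame from the question, the same helper could be used as a row filter, for example data[data['colB'].apply(lambda s: has_common(s, test_lst))].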

Count the number of times values appear within a range of values

How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is the ranking corresponding to the sales:
Sales             Ranking
0-10              Noob
11-100            Pro
101-1000          Champion
1001-10000        King
10001-200000      Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
    range(0, 11),          # Noob
    range(11, 101),        # Pro
    range(101, 1001),      # Champion
    range(1001, 10001),    # King
    range(10001, 200001),  # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule, homework should not be posted on Stack Overflow. As such, here is just a pointer on how to solve this; the implementation is up to you.
Iterate over the totalsales list and check if each number is in range(start,stop). Then for each matching check increment one per category in your result list (however using a dict to store the result might be more apt).
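The steps described above can be sketched roughly as follows. The dictionary layout and the names rankings and counts are choices made for this example, and the Lord range is assumed to start at 10001.

```python
totalsales = [11, 24, 44, 101, 1234, 888]

# range(start, stop) excludes stop, so each stop is the upper bound + 1.
rankings = {
    'Noob': range(0, 11),
    'Pro': range(11, 101),
    'Champion': range(101, 1001),
    'King': range(1001, 10001),
    'Lord': range(10001, 200001),   # assumed lower bound of 10001
}

counts = {name: 0 for name in rankings}
for sale in totalsales:
    for name, bounds in rankings.items():
        if sale in bounds:
            counts[name] += 1
            break   # ranges do not overlap, so stop at the first match

print(counts)
print(list(counts.values()))  # [0, 3, 2, 1, 0]
```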
Here is a possible solution that does not use modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 200000]  # inclusive upper bound of each ranking
output = [0]*len(bins)
for s in totalsales:
    slot = next(i for i, x in enumerate(bins) if s <= x)
    output[slot] += 1
print(output)
# [0, 3, 2, 1, 0]
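For longer lists of bins, the standard-library bisect module can find the slot with a binary search instead of scanning every bin. This is a sketch under the same assumption of sorted, inclusive upper bounds; the top bound of 200000 is taken from the ranking table in the question.

```python
from bisect import bisect_left

totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 200000]   # inclusive upper bound of each ranking

output = [0] * len(bins)
for s in totalsales:
    # bisect_left returns the index of the first bound that is >= s.
    slot = bisect_left(bins, s)
    if slot < len(bins):        # ignore values above the last bound
        output[slot] += 1

print(output)  # [0, 3, 2, 1, 0]
```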
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here I store the values in a DataFrame, then use bins with pd.cut to group the values into the right categories, and finally extract the value counts into a list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
      P     range
0    11       Pro
1    24       Pro
2    44       Pro
3   101  Champion
4  1234      King
5   888  Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()  # count the occurrences of each category, in category order
output = list(map(int, my.tolist()))  # we want the output to be a list of integers
output
The result below:
[0, 3, 2, 1, 0]
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
    0: 'noob',
    10: 'pro',
    100: 'champion',
    1000: 'king',
    10000: 'lord'
}
c = Counter()
for sale in totalsales:
    for k in sorted(ranking.keys(), reverse=True):
        if sale > k:
            c[ranking[k]] += 1
            break
Or as a two-liner (credits to @jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)

Random shuffle multiple lists python

I have a pair of lists in Python and I want to shuffle both of them, but with the same reordering applied to both lists, so that elements in the same position stay together, like this:
a=[11 22 33 44] b = [66 77 88 99]
*do some shuffling like [1 3 0 2]*
a=[22 44 11 33] b = [77 99 66 88]
Is this possible?
Here's a solution that uses list comprehensions:
>>> a = [11, 22, 33, 44]
>>> b = [66, 77, 88, 99]
>>> p = [1, 3, 0, 2]
>>>
>>> [a[i] for i in p]
[22, 44, 11, 33]
>>>
>>> [b[i] for i in p]
[77, 99, 66, 88]
>>>
You can use zip together with the random.shuffle function:
import random
a = [1,2,3,4] # list 1
b = ['a','b','c','d'] # list 2
c = list(zip(a,b)) # zip them together (list() is needed on Python 3)
random.shuffle(c) # shuffle in place
c = list(zip(*c)) # 'unzip' them
a = c[0]
b = c[1]
print(a) # e.g. (3, 4, 2, 1)
print(b) # e.g. ('c', 'd', 'b', 'a')
If you want to retain a,b as lists, then just use a=list(c[0]). If you don't want them to overwrite the original a/b then rename like a1=c[0].
Expanding upon Tom's answer, you can make the p list easily and randomize it like this:
import random
p = [x for x in range(len(a))]
random.shuffle(p)
This works for any size lists, but I'm assuming from your example that they're all equal in size.
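Putting the two pieces together, a small helper can build the random permutation and apply it to every list. The function name shuffle_together and the optional seed parameter are made up for this sketch, and it assumes all lists have the same length.

```python
import random

def shuffle_together(*lists, seed=None):
    # Hypothetical helper: reorder all lists with one shared random permutation.
    rng = random.Random(seed)
    p = list(range(len(lists[0])))
    rng.shuffle(p)
    return [[lst[i] for i in p] for lst in lists]

a = [11, 22, 33, 44]
b = [66, 77, 88, 99]
new_a, new_b = shuffle_together(a, b)

# Elements that started at the same position still line up.
print(list(zip(new_a, new_b)))
```

The originals are left untouched because the helper returns new lists rather than shuffling in place.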
a=[11,22,33,44]
order = [1,0,3,2] #give the order
new_a = [a[k] for k in order] #list comprehension that's it
You just give the order and then use a list comprehension to get the new list.

Reshaping vector with indices in python

I am having some problems resizing a list in Python. I have a vector (A) in which a few of the elements are -9999999. I want to find those elements, remove them, and remove the corresponding elements in B.
I have tried to index the non -9999999 values like this:
i = [i for i in range(len(press)) if press[i] !=-9999999]
But I get an error when I try to use the index to reshape press and my other vector:
TypeError: list indices must be integers, not list
The vectors have a length of about 26000
Basically if I have vector A I want to remove -9999999 elements from A and 65 and 32 in B.
A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]
Since you mentioned vectors, I think you're looking for a NumPy-based solution:
>>> import numpy as np
>>> a = np.array(A)
>>> b = np.array(B)
>>> b[a!=-9999999]
array([22, 33, 87, 43, 87, 77, 99])
Pure Python solution using itertools.compress:
>>> from itertools import compress
>>> list(compress(B, (x != -9999999 for x in A)))
[22, 33, 87, 43, 87, 77, 99]
Timing comparisons:
>>> A = [33,55,-9999999,44,78,22,-9999999,10,34]*10000
>>> B = [22,33,65,87,43,87,32,77,99]*10000
>>> a = np.array(A)
>>> b = np.array(B)
>>> %timeit b[a!=-9999999]
100 loops, best of 3: 2.78 ms per loop
>>> %timeit list(compress(B, (x != -9999999 for x in A)))
10 loops, best of 3: 22.3 ms per loop
A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]
A1, B1 = (list(x) for x in zip(*((a, b) for a, b in zip(A, B) if a != -9999999)))
print(A1)
print(B1)
This yields:
[33, 55, 44, 78, 22, 10, 34]
[22, 33, 87, 43, 87, 77, 99]
c = [j for i, j in zip(A, B) if i != -9999999]
zip pairs up the two lists, yielding the pairs (x, y). Using a list comprehension, you can filter out the elements whose counterpart in A is -9999999.
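The index list from the question can also be made to work. The TypeError appears because a Python list cannot be indexed with another list, so each index has to be applied one at a time. A minimal sketch, where keep, A1 and B1 are names chosen for this example:

```python
A = [33, 55, -9999999, 44, 78, 22, -9999999, 10, 34]
B = [22, 33, 65, 87, 43, 87, 32, 77, 99]

# Positions in A that hold real data.
keep = [i for i, value in enumerate(A) if value != -9999999]

# A[keep] would raise TypeError; index element by element instead.
A1 = [A[i] for i in keep]
B1 = [B[i] for i in keep]

print(A1)  # [33, 55, 44, 78, 22, 10, 34]
print(B1)  # [22, 33, 87, 43, 87, 77, 99]
```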
