The task I have is as shown. Given two lists of ints, such as:
lst1 = [4,1,3,5,2]
lst2 = [1,3,5,2,4]
I want to get the mean of each int's indexes. For example, the int 4 has indexes 0 and 4, and thus an index mean of 2. Similarly, the int 1 has an index mean of 0.5. I then want all the ints in a list, sorted by their 'index mean'. The result should be:
result = [1, 3, 4, 5, 2]
as their means are 0.5, 1.5, 2, 2.5, and 3.5, respectively.
What is a fast pythonic way of doing this without going through some complicated iterations? All help is appreciated!
>>> lst1 = [4,1,3,5,2]
>>> lst2 = [1,3,5,2,4]
>>> sorted(lst1, key=lambda n: lst1.index(n) + lst2.index(n))
[1, 3, 4, 5, 2]
Note that finding the actual average (i.e. dividing the sum by 2) isn't necessary since you're using the values as sort keys; as long as they have the same comparison results as the averages (which they will since each sum is the average times a constant factor of 2) you get the correct result.
For a more efficient solution, you'd build a dictionary so you only need to iterate through each list once to sum up all the indices:
>>> avg_indices = {n: 0 for n in lst1}
>>> for a in (lst1, lst2):
...     for i, n in enumerate(a):
...         avg_indices[n] += i
...
>>> sorted(lst1, key=avg_indices.get)
[1, 3, 4, 5, 2]
Building lists of indices (and taking the actual average) would become necessary if you didn't have the same number of occurrences of each item across the lists.
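To make that last remark concrete, here is a minimal sketch that collects every index per value and sorts by the true average. The helper name `sort_by_mean_index` is hypothetical, and keeping first-seen order for ties is my own assumption, not something the answer above specifies:

```python
from collections import defaultdict
from statistics import mean

def sort_by_mean_index(*lists):
    """Return the distinct values, sorted by the mean of all their indices."""
    positions = defaultdict(list)
    for lst in lists:
        for i, n in enumerate(lst):
            positions[n].append(i)
    # Iterating a dict yields keys in first-seen order, and sorted() is
    # stable, so values with equal means keep that order.
    return sorted(positions, key=lambda n: mean(positions[n]))

print(sort_by_mean_index([4, 1, 3, 5, 2], [1, 3, 5, 2, 4]))  # [1, 3, 4, 5, 2]
```

Because the mean is taken over however many indices each value has, this also works when a value occurs a different number of times across the lists.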
So there are probably multiple ways, and it all depends on the lists you are working on. Here are two examples:
lst1 = [4,1,3,5,2]
lst2 = [1,3,5,2,4]
means = []
for i in range(len(lst1)):
    i2 = lst2.index(lst1[i])
    means.append((i + i2) / 2)  # mean of the two indices
print(means)
Another way is to use a generator function:
def indexmeans(a,b):
    for i in range(len(a)):
        i2 = b.index(a[i])
        yield (i + i2) / 2  # mean of the two indices
lst1 = [4,1,3,5,2]
lst2 = [1,3,5,2,4]
print(list(indexmeans(lst1, lst2)))
If you want to support lists with mismatched number of elements, handle if one value is not part of the other list and so on, you have to add the logic to handle that to suit your application.
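As one possible way to handle a value that is missing from the other list, the sketch below simply skips it. The function name `index_means` and the skip-on-missing policy are assumptions for illustration; your application may prefer to raise an error or use a default instead:

```python
def index_means(a, b):
    """Yield (value, mean index) for each value of `a` that also occurs in `b`."""
    first_pos_in_b = {}
    for j, v in enumerate(b):
        # Keep only the first occurrence, which matches what list.index returns.
        first_pos_in_b.setdefault(v, j)
    for i, v in enumerate(a):
        if v in first_pos_in_b:  # values absent from `b` are skipped
            yield v, (i + first_pos_in_b[v]) / 2

print(list(index_means([4, 1, 3, 5, 2], [1, 3, 5, 2])))
# [(1, 0.5), (3, 1.5), (5, 2.5), (2, 3.5)]
```

The dictionary lookup also avoids calling `b.index` once per element.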
To illustrate my problem, imagine I have a list and I want to compare each element with the next one to check if they are the same value. The problem is that when I try to access the last element of the list and compare it with "the next one", that one is out of range, so I would get an error. So, to avoid this, I put a condition when accessing that last element, so I avoid the comparison.
list = [1, 2, 1, 1, 5, 6, 1,1]
for i in range(len(list)):
    if i == len(list)-1:
        print('Last element. Avoid comparison')
    else:
        if list[i] == list[i+1]:
            print('Repeated')
I guess that there should be a more efficient way to do this. For instance, I was trying to set the condition in the definition of the for loop, something like this:
for i in range(len(list)) and i < len(list)-1
But that is invalid. Any suggestion about how to do this in a more efficient/elegant way?
If you need to start from 0, you should use:
for i in range(len(list) - 1):
    if list[i] == list[i + 1]:
        print('Repeated')
The stop parameter of the range function is just an integer, so you can pass len(list) - 1 instead of len(list) to stop iterating at the second-to-last element.
Other answers have solved this, but I think it's worth mentioning an approach that may be closer to idiomatic Python. Python provides iterable unpacking and other tools like the zip function to avoid accessing elements of sequences by index.
# Better to avoid shadowing the built-in name `list`
a_list = [1, 2, 1, 1, 5, 6, 1, 1]
for value, following_value in zip(a_list, a_list[1:]):
    if value == following_value:
        print("Repeated!")
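The slice `a_list[1:]` copies the list and only works on sequences. For an arbitrary iterable, the same pairing can be sketched with `itertools.tee`; the helper name `pairs` is hypothetical, and on Python 3.10 or newer `itertools.pairwise` does the same job:

```python
from itertools import tee

def pairs(iterable):
    """Yield (item, next_item) for consecutive items of any iterable."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)

a_list = [1, 2, 1, 1, 5, 6, 1, 1]
repeated = [value for value, following in pairs(a_list) if value == following]
print(repeated)  # [1, 1]
```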
You can utilize the functionality of range as follows:
for i in range(1, len(list)):
    if list[i-1] == list[i]:
        print('Repeated')
In this way, you won't overrun the list.
Start from one and look backwards:
for i in range(1, len(list)):
    if list[i-1] == list[i]:
        print('Repeated')
This works!
list = [1, 2, 1, 1, 5, 6, 1, 1]
for i in range(len(list)):
    if i+1 < len(list) and list[i] == list[i+1]:
        print('Repeated')
len(list) is 8
range(len(list)) is 0, 1, ..., 7
but you want the for loop to skip when the index is 6 right?
so given that case ... if i == len(list)-1: this condition will be True when the index is 7 (not the index that you want)
Just change that to if i == len(list)-2:
There are many ways to do this. The most common one is to use zip to pair each item with its successor:
if any(item == successor for item, successor in zip(lst, lst[1:])):
    print('repeated')
groupby from itertools is also a popular choice (but not optimal for this):
import itertools
if any(duplicate for _, (_, *duplicate) in itertools.groupby(lst)):
    print('repeated')
A for-loop would only need to track the previous value (no need for indexing):
prev = object()            # non-matching initial value
for x in lst:
    if prev == x:          # compare to previous
        print('repeated')
        break
    prev = x               # track previous for next iteration
Iterators can be interesting when traversing data in parallel (here the elements and their predecessors):
predecessor = iter(lst)         # iterate over items from first
for x in lst[1:]:               # iterate from 2nd item
    if x == next(predecessor):  # compare to corresponding predecessor
        print('repeated')
        break
list = [1, 2, 1, 1, 5, 6, 1,1]
for i in range(len(list)):
    if list[i] in list[i+1:i+2]:
        print('repeated')
If your list contains only numbers, you might want to work with numpy, for instance:
import numpy as np
np_arr = np.array(lst) # don't use 'list' for your object name.
diffs = np.diff(np_arr)
diffs_indices = np.where(diffs != 0)[0]
It is unclear what your exact use case is, but with the list from the question, for example, this code gives:
>>> diffs_indices
array([0, 1, 3, 4, 5])
which are the indices where element[i] != element[i+1].
Problem to solve: Define a Python function remdup(l) that takes a non-empty list of integers l
and removes all duplicates in l, keeping only the last occurrence of each number. For instance:
For instance, remdup([3,1,3,5]) should give the result [1,3,5].
def remdup(l):
    for last in reversed(l):
        pos=l.index(last)
        for search in reversed(l[pos]):
            if search==last:
                l.remove(search)
    print(l)
remdup([3,5,7,5,3,7,10])
# intended output [5, 3, 7, 10]
In the for loop on line 4, I want reversed to go over every number except the one at index pos, but written as above it takes the value at pos rather than a slice of the list. How can I solve this?
You need to reverse the entire slice, not merely one element:
for search in reversed(l[:pos]):
Note that you will likely run into a problem for modifying a list while iterating. See here
It took me a few minutes to figure out the clunky logic. Instead, you need the rest of the list:
for search in reversed(l[pos+1:]):
Output:
[5, 3, 7, 10]
Your original algorithm could be improved. The nested loop leads to some unnecessary complexity.
Alternatively, you can do this:
def remdup(l):
    seen = set()
    for i in reversed(l):
        if i in seen:
            l.remove(i)
        else:
            seen.add(i)
    print(l)
I use the 'seen' set to keep track of the numbers that have already appeared.
However, this would be more efficient:
def remdup(l):
    seen = set()
    for i in range(len(l)-1, -1, -1):
        if l[i] in seen:
            del l[i]
        else:
            seen.add(l[i])
    print(l)
In the second algorithm, we are iterating over the list in reverse order using a range, and then we delete any item that already exists in 'seen'. I'm not sure what the implementation of reversed() and remove() is, so I can't say what the exact impact on time/space complexity is. However, it is clear to see exactly what is happening in the second algorithm, so I would say that it is a safer option.
This is a fairly inefficient way of accomplishing this:
def remdup(l):
    i = 0
    while i < len(l):
        v = l[i]
        scan = i + 1
        while scan < len(l):
            if l[scan] == v:
                l.remove(v)
                scan -= 1
                i -= 1
            scan += 1
        i += 1
l = [3,5,7,5,3,7,10]
remdup(l)
print(l)
It essentially walks through the list (indexed by i). For each element, it scans forward in the list for a match, and for each match it finds, it removes the original element. Since removing an element shifts the indices, it adjusts both its indices accordingly before continuing.
It takes advantage of the built-in list.remove: "Remove the first item from the list whose value is equal to x."
Here is another solution, iterating backward and popping the index of a previously encountered item:
def remdup(l):
    visited = []
    for i in range(len(l)-1, -1, -1):
        if l[i] in visited:
            l.pop(i)
        else:
            visited.append(l[i])
    print(l)
remdup([3,5,7,5,3,7,10])
#[5, 3, 7, 10]
Using dictionary:
def remdup(ar):
    d = {}
    for i, v in enumerate(ar):
        d[v] = i
    return [pair[0] for pair in sorted(d.items(), key=lambda x: x[1])]
if __name__ == "__main__":
    test_case = [3, 1, 3, 5]
    output = remdup(test_case)
    expected_output = [1, 3, 5]
    assert output == expected_output, f"Error in {test_case}"
    test_case = [3, 5, 7, 5, 3, 7, 10]
    output = remdup(test_case)
    expected_output = [5, 3, 7, 10]
    assert output == expected_output, f"Error in {test_case}"
Explanation
Keep the last index of each occurrence of the numbers in a dictionary. So, we store like: dict[number] = last_occurrence
Sort the dictionary by values and use list comprehension to make a new list from the keys of the dictionary.
Along with other right answers, here's one more.
from iteration_utilities import unique_everseen,duplicates
import numpy as np
list1=[3,5,7,5,3,7,10]
dup=np.sort(list((duplicates(list1))))
list2=list1.copy()
for j,i in enumerate(list2):
    try:
        if dup[j]==i:
            list1.remove(dup[j])
    except:
        break
print(list1)
How about this one-liner (converting it to a function is easy enough as an exercise):
# one-liner version
>>> lst = [3,5,7,5,3,7,10]
>>> list(dict.fromkeys(reversed(lst)))[::-1]
[5, 3, 7, 10]
If you don't want a new list, you can do this instead:
lst[:] = list(dict.fromkeys(reversed(lst)))[::-1]
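For completeness, here is a sketch of that one-liner wrapped in the `remdup` function the exercise asks for. Returning a new list instead of printing or mutating the argument is my own choice here:

```python
def remdup(l):
    """Return a new list keeping only the last occurrence of each value."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order
    # (guaranteed for dicts since Python 3.7), so walk the list backwards
    # and then flip the result back.
    return list(dict.fromkeys(reversed(l)))[::-1]

print(remdup([3, 1, 3, 5]))            # [1, 3, 5]
print(remdup([3, 5, 7, 5, 3, 7, 10]))  # [5, 3, 7, 10]
```

Each element is visited a constant number of times, so this avoids the repeated scans of the nested-loop versions.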
So I want to create a list which is a sublist of some existing list.
For example,
L = [1, 2, 3, 4, 5, 6, 7], I want to create a sublist li such that li contains all the elements in L at odd positions.
While I can do it by
L = [1, 2, 3, 4, 5, 6, 7]
li = []
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1
But I want to know if there is another way to do the same efficiently and in fewer number of steps.
Solution
Yes, you can:
l = L[1::2]
And this is all. The result will contain the elements placed on the following positions (0-based, so first element is at position 0, second at 1 etc.):
1, 3, 5
so the result (actual numbers) will be:
2, 4, 6
Explanation
The [1::2] at the end is just a notation for list slicing. Usually it is in the following form:
some_list[start:stop:step]
If we omitted start, the default (0) would be used. So the first element (at position 0, because the indexes are 0-based) would be selected. In this case the second element will be selected.
Because the second argument (stop) is omitted, the default is used (the end of the list). So the list is iterated from the second element to the end.
We also provided third argument (step) which is 2. Which means that one element will be selected, the next will be skipped, and so on...
So, to sum up, in this case [1::2] means:
1. take the second element (which, by the way, is an odd element, if you judge from the index),
2. skip one element (because we have step=2, so we are skipping one, in contrast to step=1, which is the default),
3. take the next element,
4. repeat steps 2 and 3 until the end of the list is reached.
EDIT: @PreetKukreti gave a link for another explanation on Python's list slicing notation. See here: Explain Python's slice notation
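A short sketch of the start:stop:step defaults described above, using the list from the question; the extra slices are only illustrations:

```python
L = [1, 2, 3, 4, 5, 6, 7]

# Omitting stop is the same as slicing up to the end of the list.
assert L[1::2] == L[1:len(L):2] == L[slice(1, None, 2)]

print(L[1::2])   # [2, 4, 6]     start=1, stop=end, step=2
print(L[::2])    # [1, 3, 5, 7]  start defaults to 0
print(L[1:5:2])  # [2, 4]        stop=5 is exclusive
```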
Extras - replacing counter with enumerate()
In your code, you explicitly create and increase the counter. In Python this is not necessary, as you can enumerate through some iterable using enumerate():
for count, i in enumerate(L):
    if count % 2 == 1:
        l.append(i)
The above serves exactly the same purpose as the code you were using:
count = 0
for i in L:
    if count % 2 == 1:
        l.append(i)
    count += 1
More on emulating for loops with counter in Python: Accessing the index in Python 'for' loops
For the odd positions, you probably want:
>>> list_ = list(range(10))
>>> print(list_[1::2])
[1, 3, 5, 7, 9]
I like List comprehensions because of their Math (Set) syntax. So how about this:
L = [1, 2, 3, 4, 5, 6, 7]
odd_numbers = [y for x,y in enumerate(L) if x%2 != 0]
even_numbers = [y for x,y in enumerate(L) if x%2 == 0]
Basically, if you enumerate over a list, you'll get the index x and the value y. What I'm doing here is putting the value y into the output list (even or odd) and using the index x to find out if that point is odd (x%2 != 0).
You can also use itertools.islice if you don't need to create a list but just want to iterate over the odd/even elements
import itertools
L = [1, 2, 3, 4, 5, 6, 7]
li = itertools.islice(L, 1, len(L), 2)
You can make use of bitwise AND operator &:
>>> x = [1, 2, 3, 4, 5, 6, 7]
>>> y = [i for i in x if i&1]
>>> y
[1, 3, 5, 7]
This will give you the odd elements in the list. Now to extract the elements at odd indices you just need to change the above a bit:
>>> x = [10, 20, 30, 40, 50, 60, 70]
>>> y = [j for i, j in enumerate(x) if i&1]
>>> y
[20, 40, 60]
Explanation
The bitwise AND operator is used with 1, and the reason it works is that an odd number, when written in binary, must have 1 as its last (least significant) digit. Let's check:
23 = 1 * (2**4) + 0 * (2**3) + 1 * (2**2) + 1 * (2**1) + 1 * (2**0) = 10111
14 = 1 * (2**3) + 1 * (2**2) + 1 * (2**1) + 0 * (2**0) = 1110
AND operation with 1 will only return 1 (1 in binary will also have last digit 1), iff the value is odd.
Check the Python Bitwise Operator page for more.
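As a quick check of the explanation above, this sketch prints the binary form of the two example numbers and compares the `& 1` test with the usual `% 2` test. The helper name `is_odd` is hypothetical:

```python
def is_odd(n):
    """True when the least significant bit of n is 1."""
    return (n & 1) == 1

for n in (23, 14):
    print(n, bin(n), is_odd(n))
# 23 0b10111 True
# 14 0b1110 False

# For Python ints the two parity tests agree, including negative values.
assert all(is_odd(n) == (n % 2 == 1) for n in range(-10, 11))
```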
P.S: You can tactically use this method if you want to select odd and even columns in a dataframe. Let's say x and y coordinates of facial key-points are given as columns x1, y1, x2, etc... To normalize the x and y coordinates with width and height values of each image you can simply perform:
for i in range(df.shape[1]):
    if i&1:
        df.iloc[:, i] /= heights
    else:
        df.iloc[:, i] /= widths
This is not exactly related to the question but for data scientists and computer vision engineers this method could be useful.
I am looking for a way to easily split a python list in half.
So that if I have an array:
A = [0,1,2,3,4,5]
I would be able to get:
B = [0,1,2]
C = [3,4,5]
A = [1,2,3,4,5,6]
B = A[:len(A)//2]
C = A[len(A)//2:]
If you want a function:
def split_list(a_list):
    half = len(a_list)//2
    return a_list[:half], a_list[half:]
A = [1,2,3,4,5,6]
B, C = split_list(A)
A little more generic solution (you can specify the number of parts you want, not just split 'in half'):
def split_list(alist, wanted_parts=1):
    length = len(alist)
    return [ alist[i*length // wanted_parts: (i+1)*length // wanted_parts]
             for i in range(wanted_parts) ]
A = [0,1,2,3,4,5,6,7,8,9]
print(split_list(A, wanted_parts=1))
print(split_list(A, wanted_parts=2))
print(split_list(A, wanted_parts=8))
f = lambda A, n=3: [A[i:i+n] for i in range(0, len(A), n)]
f(A)
n - the predefined length of result arrays
def split(arr, size):
    arrs = []
    while len(arr) > size:
        piece = arr[:size]
        arrs.append(piece)
        arr = arr[size:]
    arrs.append(arr)
    return arrs
Test:
x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(split(x, 5))
result:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13]]
If you don't care about the order...
def split(list):
    return list[::2], list[1::2]
list[::2] gets every second element in the list starting from the 0th element.
list[1::2] gets every second element in the list starting from the 1st element.
Using list slicing. The syntax is basically my_list[start_index:end_index]
>>> i = [0,1,2,3,4,5]
>>> i[:3] # same as i[0:3] - grabs from first to third index (0->2)
[0, 1, 2]
>>> i[3:] # same as i[3:len(i)] - grabs from fourth index to end
[3, 4, 5]
To get the first half of the list, you slice from the first index to len(i)//2 (where // is integer division, so 3//2 gives the floored result of 1 instead of the invalid list index 1.5):
>>> i[:len(i)//2]
[0, 1, 2]
...and then swap the values around to get the second half:
>>> i[len(i)//2:]
[3, 4, 5]
B, C = A[:len(A)//2], A[len(A)//2:]
Here is a common solution that splits arr into count parts (the parts are interleaved rather than contiguous):
def split(arr, count):
    return [arr[i::count] for i in range(count)]
def splitter(A):
    B = A[0:len(A)//2]
    C = A[len(A)//2:]
    return (B,C)
I tested, and the double slash is required to force int division in python 3. My original post was correct, although wysiwyg broke in Opera, for some reason.
If you have a big list, It's better to use itertools and write a function to yield each part as needed:
from itertools import islice
def make_chunks(data, SIZE):
    it = iter(data)
    # use `xrange` if you are in python 2.7:
    for i in range(0, len(data), SIZE):
        yield [k for k in islice(it, SIZE)]
You can use this like:
A = [0, 1, 2, 3, 4, 5, 6]
size = len(A) // 2
for sample in make_chunks(A, size):
    print(sample)
The output is:
[0, 1, 2]
[3, 4, 5]
[6]
Thanks to @thefourtheye and @Bede Constantinides
This is similar to other solutions, but a little faster.
# Usage: split_half([1,2,3,4,5]) Result: ([1, 2], [3, 4, 5])
def split_half(a):
    half = len(a) >> 1
    return a[:half], a[half:]
There is an official Python recipe for the more generalized case of splitting an array into smaller arrays of size n.
from itertools import zip_longest  # named izip_longest in Python 2
def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
This code snippet is from the python itertools doc page.
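The recipe relies on repeating one and the same iterator n times, so each output tuple pulls n consecutive items from it. A small sketch of that mechanism in Python 3, where the function is called `zip_longest`; the variable names are illustrative only:

```python
from itertools import zip_longest

it = iter("ABCDEFG")
args = [it] * 3  # three references to the SAME iterator, not three copies

# zip_longest takes one item from each argument in turn, so it consumes
# three consecutive characters per tuple and pads the last tuple with "x".
groups = list(zip_longest(*args, fillvalue="x"))
print(groups)
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Note that, unlike slicing, this pads the final chunk instead of leaving it short.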
10 years later.. I thought - why not add another:
arr = 'Some random string' * 10; n = 4
print([arr[e:e+n] for e in range(0,len(arr),n)])
While the answers above are more or less correct, you may run into trouble because the result of a / 2 is a float in Python 3 (and in earlier versions if you specify from __future__ import division at the beginning of your script), and a float cannot be used as a slice index. You are in any case better off going for integer division, i.e. a // 2, in order to get "forward" compatibility of your code.
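A minimal sketch of the difference in Python 3, with an odd-length list chosen just for illustration:

```python
A = [0, 1, 2, 3, 4]

half = len(A) / 2   # true division: 2.5, a float
try:
    A[:half]
except TypeError as exc:
    print("float slice index rejected:", exc)

half = len(A) // 2  # integer (floor) division: 2
B, C = A[:half], A[half:]
print(B, C)  # [0, 1] [2, 3, 4]
```

With floor division the extra element of an odd-length list ends up in the second half.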
#for python 3
A = [0,1,2,3,4,5]
l = len(A)/2
B = A[:int(l)]
C = A[int(l):]
General solution split list into n parts with parameter verification:
def sp(l,n):
    # split list l into n parts
    if l:
        p = len(l) if n < 1 else len(l) // n   # no split
        p = p if p > 0 else 1                  # split down to elements
        for i in range(0, len(l), p):
            yield l[i:i+p]
    else:
        yield []                               # empty list split returns empty list
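Because the chunk size above is len(l) // n, a length that is not a multiple of n can produce more than n parts (10 elements with n=3 gives four chunks). If exactly n parts are wanted, one possible sketch distributes the remainder over the first parts. The name `split_into` and the choice to raise on n < 1 are assumptions, not part of the answer above:

```python
def split_into(lst, n):
    """Split lst into exactly n contiguous parts whose lengths differ by at most 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size, extra = divmod(len(lst), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` parts each take one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(lst[start:end])
        start = end
    return parts

print(split_into(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

When n exceeds the list length, the trailing parts are empty lists, so the result still has n entries.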
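Since no restriction was put on which packages we can use: NumPy has a function called split with which you can easily split an array any way you like. Note that np.split raises a ValueError when the array cannot be divided into equal parts; np.array_split allows unequal parts.
Example
import numpy as np
A = np.array(list('abcdefgh'))
np.split(A, 2)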
With hints from @ChristopheD
def line_split(N, K=1):
    length = len(N)
    return [N[i*length//K:(i+1)*length//K] for i in range(K)]
A = [0,1,2,3,4,5,6,7,8,9]
print(line_split(A,1))
print(line_split(A,2))
Another take on this problem in 2020: here is a generalization. I interpret 'divide a list in half' to mean two lists only, with no spillover into a third list when there is an odd one out. For instance, if the list length is 19, integer division by two with the // operator gives 9, and chunking by that size yields two lists of length 9 and a third of length 1 (three lists in total). If we want a general solution that always gives two lists, I assume we are happy with the two resulting lists not being equal in length (one will be longer than the other), and that it is OK for the order to be mixed (alternating, in this case).
"""
arrayinput --> is an array of length N that you wish to split 2 times
"""
ctr = 1 # lets initialize a counter
holder_1 = []
holder_2 = []
for i in range(len(arrayinput)):
    if ctr == 1 :
        holder_1.append(arrayinput[i])
    elif ctr == 2:
        holder_2.append(arrayinput[i])
    ctr += 1
    if ctr > 2 : # if it exceeds 2 then we reset
        ctr = 1
This concept works for any number of list partitions (you'd have to tweak the code depending on how many parts you want), and it is rather straightforward to interpret. To speed things up, you can even write this loop in Cython / C / C++. Then again, I've tried this code on relatively small lists (~10,000 rows) and it finishes in a fraction of a second.
Just my two cents.
Thanks!
from itertools import islice
Input = [2, 5, 3, 4, 8, 9, 1]
small_list_length = [1, 2, 3, 1]
Input1 = iter(Input)
Result = [list(islice(Input1, elem)) for elem in small_list_length]
print("Input list :", Input)
print("Split length list: ", small_list_length)
print("List after splitting", Result)
You can try something like this with numpy
import numpy as np
np.array_split([1,2,3,4,6,7,8], 2)
result:
[array([1, 2, 3, 4]), array([6, 7, 8])]