Remove Python list element - python

I have two lists,
l1 = [1,2,3,4,5,6]
l2 = [3,2]
What I want is to remove the elements of l1 that are in l2. For that I have done something like this:
for x in l1:
    if x in l2:
        l1.remove(x)
It gives output like
[1, 3, 4, 5, 6]
but the output should be
[1, 4, 5, 6]
Can anyone shed light on this?

This is easily explained like this.
Consider the first list you have:
| 1 | 2 | 3 | 4 | 5 | 6 |
Now you start iterating:
| 1 | 2 | 3 | 4 | 5 | 6 |
  ^
Nothing happens, the iterator increments:
| 1 | 2 | 3 | 4 | 5 | 6 |
      ^
2 gets removed:
| 1 | 3 | 4 | 5 | 6 |
      ^
The iterator increments:
| 1 | 3 | 4 | 5 | 6 |
          ^
And voila, 3 was skipped, so it is still there.
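To make the skipped element visible, here is a minimal sketch that mimics the loop with an explicit index. The function name remove_while_iterating and the visited list are made up for illustration; the real list iterator works on the same principle (it keeps an index into the list), but this is not its actual implementation.

```python
def remove_while_iterating(items, unwanted):
    """Mimic `for x in items: if x in unwanted: items.remove(x)` with an explicit index."""
    items = list(items)      # work on a copy so the caller's list is untouched
    visited = []             # values the loop actually looked at
    index = 0
    while index < len(items):
        x = items[index]
        visited.append(x)
        if x in unwanted:
            items.remove(x)  # everything after x shifts one slot to the left
        index += 1           # the index still advances, so one value is skipped
    return items, visited

result, visited = remove_while_iterating([1, 2, 3, 4, 5, 6], [3, 2])
print(result)   # [1, 3, 4, 5, 6]
print(visited)  # [1, 2, 4, 5, 6]
```

The value 3 never shows up in visited, which is why it is never tested against l2.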
The solution is to iterate over a copy of the list, e.g.
for x in l1[:]:  # slice of the entire list, i.e. a copy
    if x in l2:
        l1.remove(x)
or to iterate in reverse:
for x in reversed(l1):
    if x in l2:
        l1.remove(x)
Which acts like this:
| 1 | 2 | 3 | 4 | 5 | 6 |
                      ^
| 1 | 2 | 3 | 4 | 5 | 6 |
          ^
| 1 | 2 | 4 | 5 | 6 |
          ^
| 1 | 2 | 4 | 5 | 6 |
      ^
| 1 | 4 | 5 | 6 |
      ^
| 1 | 4 | 5 | 6 |
  ^

Why not make it a bit simpler? There is no need to iterate over l1 if we only want to remove the elements present in l2:
for item in l2:
    while item in l1:
        l1.remove(item)
This gives you exactly the desired output.
Also, as commenters point out, if there is a possibility of duplicates:
l1 = list(filter(lambda x: x not in l2, l1))
... or many other variations using list comprehensions.

You want the outer loop to read:
for x in l1[:]:
    ...
You can't change a list while iterating over it and expect reasonable results. The above trick makes a copy of l1 and iterates over the copy instead.
Note that if order doesn't matter in the output list, and your elements are unique and hashable, you could use a set:
set(l1).difference(l2)
which will give you a set as output, but you can construct a list from it easily:
l1 = list(set(l1).difference(l2))

As others have said, you can't edit a list while you loop over it. A good option here is to use a list comprehension to create a new list.
removals = set(l2)
l1 = [item for item in l1 if item not in removals]
We build a set because a membership check on a set is significantly faster than on a list.
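As a small sketch of the same idea, the comprehension can be wrapped in a helper and, if other names refer to the same list, applied in place with slice assignment. The helper name without and the sample data with duplicates are made up for illustration, and the elements are assumed to be hashable.

```python
def without(items, unwanted):
    """Return a new list with every element of `unwanted` filtered out."""
    removals = set(unwanted)  # average O(1) membership tests; needs hashable elements
    return [item for item in items if item not in removals]

l1 = [1, 2, 3, 2, 4, 5, 6, 3]
l2 = [3, 2]
alias = l1                   # a second reference to the same list object

filtered = without(l1, l2)
print(filtered)              # [1, 4, 5, 6]

l1[:] = without(l1, l2)      # slice assignment mutates the existing list in place
print(alias)                 # [1, 4, 5, 6]
```

Order is preserved and all duplicates of the removed values are dropped.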

If the order and loss of duplicates in l1 do not matter:
list(set(l1) - set(l2))
The final list() is only required if you need the result as a list. You could also just use the resulting set, since it is also iterable.
If you need it ordered, you can of course call .sort() on the resulting list.
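A short sketch of what "order and loss of duplicates do not matter" means in practice. The sample values are made up for illustration; the order of the unsorted result is deliberately not shown because a set does not guarantee one.

```python
l1 = [6, 1, 5, 1, 3, 2]
l2 = [3, 2]

result = list(set(l1) - set(l2))  # order of this list is not guaranteed
result.sort()                     # sort in place to get a predictable order

print(result)                     # [1, 5, 6]
```

The duplicate 1 is gone and the original ordering (6, 1, 5) is not kept.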

Edit: Removed my original answer because even though it did give correct results, it did so for non-intuitive reasons, and it was not very fast either... so I've just left the timings:
import timeit
setup = """l1 = list(range(20)) + list(range(20))
l2 = [2, 3]"""
stmts = {
"mgilson": """for x in l1[:]:
    if x in l2:
        l1.remove(x)""",
"petr": """for item in l2:
    while item in l1:
        l1.remove(item)""",
"Lattyware": """removals = set(l2)
l1 = [item for item in l1 if item not in removals]""",
"millimoose": """for x in l2:
    try:
        while True: l1.remove(x)
    except ValueError: pass""",
"Latty_mgilson": """removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",
"mgilson_set": """l1 = list(set(l1).difference(l2))"""
}
for idea in stmts:
    print("{0}: {1}".format(idea, timeit.timeit(setup=setup, stmt=stmts[idea])))
Results (Python 3.3.0 64bit, Win7):
mgilson_set: 2.5841989922197333
mgilson: 3.7747968857414813
petr: 1.9669433777815701
Latty_mgilson: 7.262900152285258
millimoose: 3.1890831105541793
Lattyware: 4.573971325181478

You're modifying the list l1 while you're iterating over it, which causes weird behaviour. (The 3 gets skipped during iteration.)
Either iterate over a copy, or change your algorithm to iterate over l2 instead:
for x in l2:
    try:
        while True: l1.remove(x)
    except ValueError: pass
(I expected this to perform better than testing if x in l1 explicitly. Nope, this performs terribly as l1 grows in size.)

FWIW I get significantly different results than @Tim Pietzcker did, using what I believe is a more realistic input data set and a slightly more rigorous (but otherwise the same) approach to timing different people's answers.
The names and code snippets are the same as Tim's, except I added a variation of the one named Lattyware, called Lattyware_rev, which determines which elements to keep rather than which to reject. It turned out to be slower than the former. Note that the two fastest don't preserve the order of l1.
Here's the latest timing code:
import timeit
setup = """
import random
random.seed(42) # initialize to constant to get same test values
l1 = [random.randrange(100) for _ in xrange(100)]
l2 = [random.randrange(100) for _ in xrange(10)]
"""
stmts = {
"Minion91": """
for x in reversed(l1):
    if x in l2:
        l1.remove(x)
""",
"mgilson": """
for x in l1[:]: # correction
    if x in l2:
        l1.remove(x)
""",
"mgilson_set": """
l1 = list(set(l1).difference(l2))
""",
"Lattyware": """
removals = set(l2)
l1 = [item for item in l1 if item not in removals]
""",
"Lattyware_rev": """
keep = set(l1).difference(l2)
l1 = [item for item in l1 if item in keep]
""",
"Latty_mgilson": """
removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",
"petr": """
for item in l2:
    while item in l1:
        l1.remove(item)
""",
"petr (handles dups)": """
l1 = filter(lambda x: x not in l2, l1)
""",
"millimoose": """
for x in l2:
    try:
        while True: l1.remove(x)
    except ValueError: pass
""",
"K.-Michael Aye": """
l1 = list(set(l1) - set(l2))
""",
}
N = 10000
R = 3
timings = [(idea,
            min(timeit.repeat(stmts[idea], setup=setup, repeat=R, number=N)),
           ) for idea in stmts]
longest = max(len(t[0]) for t in timings) # length of longest name
exec(setup) # get an l1 & l2 just for heading length measurements
print('fastest to slowest timings of ideas:\n' +\
      ' ({:,d} timeit calls, best of {:d} executions)\n'.format(N, R)+\
      ' len(l1): {:,d}, len(l2): {:,d})\n'.format(len(l1), len(l2)))
for i in sorted(timings, key=lambda x: x[1]): # sort by speed (fastest first)
    print("{:>{width}}: {}".format(*i, width=longest))
Output:
fastest to slowest timings of ideas:
(10,000 timeit calls, best of 3 executions)
len(l1): 100, len(l2): 10)
        mgilson_set: 0.143126456832
     K.-Michael Aye: 0.213544010551
          Lattyware: 0.23666971551
      Lattyware_rev: 0.466918513924
      Latty_mgilson: 0.547516608553
               petr: 0.552547776807
            mgilson: 0.614238139366
           Minion91: 0.728920176815
         millimoose: 0.883061820848
petr (handles dups): 0.984093136969
Of course, please let me know if there's something radically wrong that would explain the radically different results.

l1 = [1, 2, 3, 4, 5, 6]
l2 = [3, 2]
[l1.remove(x) for x in l2]
print l1
[1, 4, 5, 6]

Related

Default stepsize is already 1, so why does adding [::1] change my output?

I encountered something I don't understand. I created a simple example in order to explain it. I have the following list:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I iterate through the list, printing the value at the current index, and remove the value if it is even. This results in the next value getting skipped because they all move up one index (filling the gap created by the removal).
for i in numbers:
    print(i)
    if i % 2 == 0:
        numbers.remove(i)
This results in the following output, which is as expected:
1
2
4
6
8
10
When iterating through the list backwards, this problem will be avoided, since all the lower indexes won't be affected by the removal of a higher index.
for i in numbers[::-1]:
    print(i)
    if i % 2 == 0:
        numbers.remove(i)
Which results in (as expected):
10
9
8
7
6
5
4
3
2
1
Now we've arrived at the part I don't understand. As far as I know the default stepsize is 1, so adding [::1] shouldn't make any difference right? Well, it does...
for i in numbers[::1]:
    print(i)
    if i % 2 == 0:
        numbers.remove(i)
Which results in:
1
2
3
4
5
6
7
8
9
10
As you can see, all the numbers are printed, while I expected some to be skipped due to the shifting explained earlier.
Can someone explain why this is?
So, the answer is already there in the comments, just a proper explanation here:
Basically, when you say:
for i in numbers:
    print(i)
    if i % 2 == 0:
        numbers.remove(i)
you are iterating through the list numbers itself.
But when you write:
for i in numbers[::1]:
    print(i)
    if i % 2 == 0:
        numbers.remove(i)
It can be seen as an equivalent to:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
num2 = numbers.copy()  # or numbers[::1]
# i.e. slicing creates a new list (a copy of numbers),
# and the loop iterates through that copy, not through numbers itself
for i in num2:             # iterates through the copy
    print(i)
    if i % 2 == 0:
        numbers.remove(i)  # removes elements from the original list
The difference? The copy is only kept alive until the iteration is done. Even though elements are removed from numbers, the list being iterated over (the copy) never changes, so no number is skipped.
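The point that a full slice is a separate list object can be checked directly. This is a minimal sketch; the names same_object, full_slice and visited are made up for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

same_object = numbers          # another name for the very same list
full_slice = numbers[::1]      # a new list with equal contents

print(same_object is numbers)  # True
print(full_slice is numbers)   # False
print(full_slice == numbers)   # True

visited = []
for i in full_slice:           # the loop walks the copy...
    visited.append(i)
    if i % 2 == 0:
        numbers.remove(i)      # ...while the original shrinks

print(visited)                 # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(numbers)                 # [1, 3, 5, 7, 9]
```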

Adding None Value Always gives 0

I want to create a function that adds the numbers in the columns in a matrix and output a vector made of the sum. However if there is a "None" value in the matrix, the output vector gets a "None" value for that column automatically. I cannot figure out how to do the part for the "None" value.
I tried the following code.
def sum_matrix (matrix):
    #
    # | 1 2 3 |
    # | 1 2 3 |
    # | 1 2 3 | -> |4 8 12|
    # | 1 2 3 |
    # _________
    # 4 8 12
    vektor = [[0] for i in range(0,len(matrix[0]))]
    for j in range(0, len(matrix[0])): #rows 0-3 4
        buffer = 0
        for i in range(0, len(matrix)): #columns 3
            if matrix[i][j] !=None:
                buffer = buffer + matrix[i][j]
                #vektor[j][0] = buffer
            elif matrix[i][j] ==None:
                vektor[j][0] = None
        vektor[j][0] = buffer
    return vektor
print (sum_matrix ([[0,0,0],[0,0,1],[0,1,0],[0,0,0]]))
print (sum_matrix ([[0,0,0],[0,None,1],[0,1,None],[0,0,0]]))
For sum_matrix ([[0,0,0],[0,0,1],[0,1,0],[0,0,0]]), I get [[0],[1],[1]] which is good.
For sum_matrix ([[0,0,0],[0,None,1],[0,1,None],[0,0,0]]), I still get [[0],[1],[1]] even though I am supposed to get [[0],[None],[None]]
As I always say, you should differentiate a matrix (a mathematical abstraction) from its implementation (a list of lists).
Now, what we have here is basically a list of lists where each inner list represents a row, but we want to take the sum of each column, with the additional constraint that it should be None whenever it contains at least one None value.
The simplest way to do this, I would say, is using a list comprehension in conjunction with zip, which effectively transposes your matrix:
def sum_matrix(m):
    transposed = zip(*m)
    summed = [[sum(col) if None not in col else None]
              for col in transposed]
    return summed
print(sum_matrix([[0,0,0],[0,0,1],[0,1,0],[0,0,0]]))
print(sum_matrix([[0,0,0],[0,None,1],[0,1,None],[0,0,0]]))
Output:
[[0], [1], [1]]
[[0], [None], [None]]
Note: you can also couch the inner list comprehension as [None if None in col else sum(col)], but I prefer to put the "normal" case first.
You could also convert col to a set, which allows constant time lookups, but actual conversion to a set is linear time, and since we're only iterating over each column once, I don't think it'll be faster.
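To see why zip(*m) acts as a transpose here, this small sketch prints the intermediate columns for the second test matrix from the question. The names columns and has_none are made up for illustration.

```python
matrix = [[0, 0, 0],
          [0, None, 1],
          [0, 1, None],
          [0, 0, 0]]

columns = list(zip(*matrix))  # each tuple holds one column of the matrix
print(columns)
# [(0, 0, 0, 0), (0, None, 1, 0), (0, 1, None, 0)]

has_none = [None in col for col in columns]
print(has_none)               # [False, True, True]
```

Each tuple is short, so the linear `None in col` check is cheap.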
When matrix[i][j] == None, you need to store None in buffer, because vektor[j][0] is assigned from buffer after the inner loop exits, so vektor always takes the value of buffer:
def sum_matrix (matrix):
    #
    # | 1 2 3 |
    # | 1 2 3 |
    # | 1 2 3 | -> |4 8 12|
    # | 1 2 3 |
    # _________
    # 4 8 12
    vektor = [[0] for i in range(0,len(matrix[0]))]
    for j in range(0, len(matrix[0])): #rows 0-3 4
        buffer = 0
        for i in range(0, len(matrix)): #columns 3
            if matrix[i][j] !=None:
                buffer = buffer + matrix[i][j]
                #vektor[j][0] = buffer
            elif matrix[i][j] ==None:
                buffer = None
                break
        vektor[j][0] = buffer
    return vektor

Python - Place a character in between a formatted string

Let's say I want to print out
Item 1 | Item 2
A Third item | #4
Which can be done without the | by doing
print('%20s %20s' % (value1,value2))
How would I go about placing the | character so that it is evenly justified between the two formatted values?
I suppose I could manually count the length of the formatted string without the | character and then insert the | in the middle, but I am hoping for a more elegant solution!
Thank you very much for your time.
Edit: Here is a solution that I suggested would be possible
def PrintDoubleColumn(value1, value2):
    initial = '%20s %20s' % (value1,value2)
    lenOfInitial = len(initial)
    print(initial[:lenOfInitial//2] +'|' + initial[lenOfInitial//2+1:])
There is a good source for string format operations: https://pyformat.info/#string_pad_align
x = [1, 2, 3, 4, 5]
y = ['a', 'b', 'c', 'd', 'e']
for i in range(0, 5):
    print("{0:<20}|{1:>20}".format(x[i], y[i]))
Result:
1                   |                   a
2                   |                   b
3                   |                   c
4                   |                   d
5                   |                   e
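For the layout in the question, where the left value sits flush against the separator, one possible sketch right-aligns the left column and left-aligns the right one. The function name format_double_column and its width and sep parameters are made up for illustration; only str.format is assumed.

```python
def format_double_column(left, right, width=20, sep=" | "):
    """Right-align `left` and left-align `right` around a fixed separator."""
    return "{:>{w}}{}{:<{w}}".format(left, sep, right, w=width)

rows = [("Item 1", "Item 2"), ("A Third item", "#4")]
for left, right in rows:
    # rstrip() drops the padding after the right-hand value
    print(format_double_column(left, right, width=12).rstrip())
# prints:
#       Item 1 | Item 2
# A Third item | #4
```

Because both columns use the same width, the separator always lands in the middle of the line.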

Fastest way to get union of lists - Python

There's a C++ comparison to get union of lists from lists of lists: The fastest way to find union of sets
There are also several other Python-related questions, but none suggest the fastest way to take the union of the lists:
Finding a union of lists of lists in Python
Flattening a shallow list in Python
From the answers, I've gathered that there are at least 2 ways to do it:
>>> from itertools import chain
>>> x = [[1,2,3], [3,4,5], [1,7,8]]
>>> list(set().union(*x))
[1, 2, 3, 4, 5, 7, 8]
>>> list(set(chain(*x)))
[1, 2, 3, 4, 5, 7, 8]
Note that I'm casting the set to list afterwards because I need the order of the list to be fixed for further processing.
After some comparison, it seems like list(set(chain(*x))) is more stable and takes less time:
from itertools import chain
import time
import random
# Dry run.
x = [[random.choice(range(10000))
      for i in range(10)] for j in range(10)]
list(set().union(*x))
list(set(chain(*x)))
y_time = 0
z_time = 0
for _ in range(1000):
    x = [[random.choice(range(10000))
          for i in range(10)] for j in range(10)]
    start = time.time()
    y = list(set().union(*x))
    y_time += time.time() - start
    #print 'list(set().union(*x)):\t', y_time
    start = time.time()
    z = list(set(chain(*x)))
    z_time += time.time() - start
    #print 'list(set(chain(*x))):\t', z_time
    assert sorted(y) == sorted(z)
    #print
print(y_time / 1000.)
print(z_time / 1000.)
[out]:
1.39586925507e-05
1.09834671021e-05
Taking out the variable of casting sets to list:
y_time = 0
z_time = 0
for _ in range(1000):
    x = [[random.choice(range(10000))
          for i in range(10)] for j in range(10)]
    start = time.time()
    y = set().union(*x)
    y_time += time.time() - start
    start = time.time()
    z = set(chain(*x))
    z_time += time.time() - start
    assert sorted(y) == sorted(z)
print(y_time / 1000.)
print(z_time / 1000.)
[out]:
1.22241973877e-05
1.02684497833e-05
Here's the full output when I try to print the intermediate timings (without list casting): http://pastebin.com/raw/y3i6dXZ8
Why is it that list(set(chain(*x))) takes less time than list(set().union(*x))?
Is there another way of achieving the same union of lists? Using numpy or pandas or sframe or something? Is the alternative faster?
What's fastest depends on the nature of x -- whether it is a long list or a short list, with many sublists or few sublists, whether the sublists are long or short, and whether there are many duplicates or few duplicates.
Here are some timeit results comparing some alternatives. There are so many possibilities that this is by no means a complete analysis, but perhaps this will give you a framework for studying your use case.
func                 | x                    | time
unique_concatenate   | many_uniques         | 0.863
empty_set_union      | many_uniques         | 1.191
short_set_union_rest | many_uniques         | 1.192
long_set_union_rest  | many_uniques         | 1.194
set_chain            | many_uniques         | 1.224
func                 | x                    | time
long_set_union_rest  | many_duplicates      | 0.958
short_set_union_rest | many_duplicates      | 0.969
empty_set_union      | many_duplicates      | 0.971
set_chain            | many_duplicates      | 1.128
unique_concatenate   | many_duplicates      | 2.411
func                 | x                    | time
empty_set_union      | many_small_lists     | 1.023
long_set_union_rest  | many_small_lists     | 1.028
set_chain            | many_small_lists     | 1.032
short_set_union_rest | many_small_lists     | 1.036
unique_concatenate   | many_small_lists     | 1.351
func                 | x                    | time
long_set_union_rest  | few_large_lists      | 0.791
empty_set_union      | few_large_lists      | 0.813
unique_concatenate   | few_large_lists      | 0.814
set_chain            | few_large_lists      | 0.829
short_set_union_rest | few_large_lists      | 0.849
Be sure to run the timeit benchmarks on your own machine since results may vary.
from __future__ import print_function
import random
import timeit
from itertools import chain
import numpy as np
def unique_concatenate(x):
    return np.unique(np.concatenate(x))
def short_set_union_rest(x):
    # This assumes x[0] is the shortest list in x
    return list(set(x[0]).union(*x[1:]))
def long_set_union_rest(x):
    # This assumes x[-1] is the longest list in x
    return list(set(x[-1]).union(*x[:-1]))
def empty_set_union(x):
    return list(set().union(*x))
def set_chain(x):
    return list(set(chain(*x)))
big_range = list(range(10**7))
small_range = list(range(10**5))
many_uniques = [[random.choice(big_range) for i in range(j)]
                for j in range(10, 10000, 10)]
many_duplicates = [[random.choice(small_range) for i in range(j)]
                   for j in range(10, 10000, 10)]
many_small_lists = [[random.choice(big_range) for i in range(10)]
                    for j in range(10, 10000, 10)]
few_large_lists = [[random.choice(big_range) for i in range(1000)]
                   for j in range(10, 100, 10)]
if __name__=='__main__':
    for x, n in [('many_uniques', 1), ('many_duplicates', 4),
                 ('many_small_lists', 800), ('few_large_lists', 800)]:
        timing = dict()
        for func in [
                'unique_concatenate', 'short_set_union_rest', 'long_set_union_rest',
                'empty_set_union', 'set_chain']:
            timing[func, x] = timeit.timeit(
                '{}({})'.format(func, x), number=n,
                setup='from __main__ import {}, {}'.format(func, x))
        print('{:20} | {:20} | {}'.format('func', 'x', 'time'))
        for key, t in sorted(timing.items(), key=lambda item: item[1]):
            func, x = key
            print('{:20} | {:20} | {:.3f}'.format(func, x, t))
        print(end='\n')
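Independent of speed, it can be worth checking that the pure-Python variants agree on the result. The sketch below uses the small example list from the question; the variable names are made up for illustration, and sorted() is used because a set has no guaranteed order, which matters if the order must be fixed for further processing.

```python
from itertools import chain

x = [[1, 2, 3], [3, 4, 5], [1, 7, 8]]

via_union = set().union(*x)   # union of the empty set with every sublist
via_chain = set(chain(*x))    # flatten first, then deduplicate
via_from_iterable = set(chain.from_iterable(x))  # same, without unpacking x

print(via_union == via_chain == via_from_iterable)  # True
print(sorted(via_union))      # [1, 2, 3, 4, 5, 7, 8]
```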

Slicing a list from the end

Say, I have a list of values:
>>> a = [1, 2, 3, 4]
How can I make it include the end value through slicing? I expected:
>>> a[4:]
[4]
instead of:
>>> a[4:]
[]
Slicing indices start from zero.
So if you have:
>>> xs = [1, 2, 3, 4]
          |  |  |  |
          V  V  V  V
          0  1  2  3    <-- index in xs
And you slice from 4 onwards you get:
>>> xs[4:]
[]
Four is the length of xs, not the last index!
However, if you slice from 3 onwards (the last index of the list):
>>> xs[3:]
[4]
See: Data Structures
Many common programming languages and software systems are in fact zero-based, so please have a read of Zero-based Numbering.
Python indexes are zero based. The last element is at index 3, not 4:
>>> a = [1,2,3,4]
>>> a[3:]
[4]
a = [1,2,3,4]
a[-1:]
In Python you can index values from the beginning to the end, or from the end to the beginning:
 1,  2,  3,  4
 |   |   |   |
 0   1   2   3   (or)
-4  -3  -2  -1
So if you want a slice containing the last element of the list, you can use either a[len(a)-1:] or a[-1:].
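The variants discussed in these answers can be compared side by side in a short sketch, using the list from the question:

```python
a = [1, 2, 3, 4]

print(a[4:])           # [] (4 == len(a), one past the last index)
print(a[3:])           # [4]
print(a[len(a) - 1:])  # [4]
print(a[-1:])          # [4]
print(a[-1])           # 4 (indexing returns the element, not a list)
print(a[10:])          # [] (out-of-range slice bounds are clamped, no IndexError)
```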
