I wish to compare two nested lists of unequal length. I am interested only in a match between the first element of each sublist. Should a match exist, I wish to add the match to another list for subsequent transformation into a tab-delimited file. Here is an example of what I am working with:
x = [['1', 'a', 'b'], ['2', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]
match = []
def find_match():
    for i in x:
        for j in y:
            if i[0] == j[0]:
                match.append(j)
            return match
This returns:
[['1', 'x'], ['1', 'y'], ['1', 'x'], ['1', 'y'], ['1', 'z', 'x']]
Would it be good practise to reprocess the list to remove duplicates or can this be done in a simpler fashion?
Also, is it better to use tuples and/or tuples of tuples for the purposes of comparison?
Any help is greatly appreciated.
Regards,
Seafoid.
Use sets to obtain collections with no duplicates.
You'll have to use tuples instead of lists as the items because set items must be hashable.
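As a minimal sketch of that idea (the sample data below is hypothetical and only modelled on the question), each matching sublist can be converted to a tuple and collected in a set, which discards duplicates automatically:

```python
# Hypothetical sample data, modelled on the lists in the question.
x = [['1', 'a', 'b'], ['2', 'c', 'd'], ['1', 'e', 'f']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]

# Lists are unhashable, so each matching row is stored as a tuple.
# The row from y matches two rows of x, but the set keeps it only once.
matches = {tuple(j) for i in x for j in y if i[0] == j[0]}

print(matches)
```

Note that a set does not preserve the original order of the rows, so this only fits if the order of the output does not matter.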
The code you posted doesn't seem to generate the output you posted. I do not have any idea how you are supposed to generate that output from that input. For example, the output has 'y' and the input does not.
I think the design of your function could be much improved. Currently you define x, y, and match at the module level and read and mutate them explicitly. This is not how you want to design functions: as a general rule, a function shouldn't mutate something at the global level. It should be explicitly passed everything it needs and return a result, not implicitly receive information and change something outside itself.
I would change
x = some list
y = some list
match = []
def find_match():
    for i in x:
        for j in y:
            if i[0] == j[0]:
                match.append(j)
    return match # This is the only line I changed. I think you meant
                 # your return to be over here?
find_match()
to
x = some list
y = some list
def find_match(x, y):
    match = []
    for i in x:
        for j in y:
            if i[0] == j[0]:
                match.append(j)
    return match
match = find_match(x, y)
To take that last change to the next level, I usually replace the pattern
def f(...):
    return_value = []
    for...
        return_value.append(foo)
    return return_value
with the similar generator
def f(...):
    for...
        yield foo
which would make the above function
def find_match(x, y):
    for i in x:
        for j in y:
            if i[0] == j[0]:
                yield j
Another way to express this generator's effect is with the generator expression (j for i in x for j in y if i[0] == j[0]).
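A short usage sketch, assuming the example lists from the question: the generator is consumed once, here by list(), and each matching row is then joined with tabs for the tab-delimited output the question mentions. The names matches and lines are only illustrative.

```python
def find_match(x, y):
    # Yield every row of y whose first element equals the first element of a row of x.
    for i in x:
        for j in y:
            if i[0] == j[0]:
                yield j

x = [['1', 'a', 'b'], ['2', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]

# A generator can only be iterated once, so materialise it if it is needed again.
matches = list(find_match(x, y))

# One tab-separated line per matching row.
lines = ["\t".join(row) for row in matches]
print(lines)
```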
I don't know if I interpret your question correctly, but given your example it seems that you might be using a wrong index:
change
if i[1] == j[1]:
into
if i[0] == j[0]:
You can do this a lot more simply by using sets.
set_x = set([i[0] for i in x])
set_y = set([i[0] for i in y])
matches = list(set_x & set_y)
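The intersection above yields only the shared first elements, not the sublists themselves. If the full rows are wanted, one possible variation (a sketch, not necessarily what this answer intended) is to build a set of keys from x and keep the rows of y whose first element is in it:

```python
x = [['1', 'a', 'b'], ['2', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]

# Set membership tests are fast, so y is scanned once instead of once per row of x.
keys_x = {row[0] for row in x}
matches = [row for row in y if row[0] in keys_x]

print(matches)
```

Unlike the nested loops, this returns each row of y at most once, even when several rows of x share the same first element.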
if i[1] == j[1]
checks whether the second elements of the arrays are identical. You want if i[0] == j[0].
Otherwise, I find your code quite readable and wouldn't necessarily change it.
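A simpler expression should work here too, provided matching sublists sit at the same position in both lists, because zip pairs the elements of x and y by index: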
list_of_lists = filter(lambda l: l[0][0] == l[1][0], zip(x, y))
map(lambda l: l[1], list_of_lists)
I am attempting to perform a list comprehension. I want to check the values in a smaller list against the values of the larger list. I think my code works, at least until one of my inner lists is empty.
The logic makes sense... there is no element at position 0 of the smaller list therefore index error:
['w', 'c']
if x[0] != y[0]:
['w', 'c']
IndexError: list index out of range
However, what I want to know is the proper way to write this so that it won't raise an error here and will instead just assume there was no match and move on to the next list within list_one.
Here is my code:
a = [['a', 'b', 'c'], ['w', 'c'], []]
b = [['a', 'b', 'c'], ['a', 'f', 'g'], ['x', 'y', 'z']]
def check_contents(list_one, list_two):
    if len(list_one)<=len(list_two):
        for x in list_one:
            for y in list_two:
                if x[0] != y[0]:
                    print(x)
    else:
        for x in list_two:
            for y in list_one:
                if x[0] != y[0]:
                    print(x)
check_contents(a, b)
First, your two loops do the same thing. DRY (Don't Repeat Yourself). Second, to see if a list is empty, check its truth value. Empty lists evaluate to False.
def check_contents(list_one, list_two):
    shorter, longer = sorted([list_one, list_two], key = len)
    # As in the original code, the outer loop runs over the shorter list.
    for x in shorter:
        if not x:
            continue
        for y in longer:
            if not y:
                continue
            if x[0] != y[0]:
                print(x)
Try this:
for x, y in zip(list_one, list_two):
    if x and y and x[0] != y[0]:
        print(x)
    else:
        pass  # Rest of the code here
The zip() function creates a zip object so that you can iterate through both list_one and list_two at the same time, comparing their elements pairwise. The x and y checks skip any pair that contains an empty list, which takes care of your empty list problem as well.
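One thing worth checking before adopting zip() here: it pairs elements by position and stops at the shorter input, so it does not compare every sublist with every other sublist the way the nested loops in the question do. A small sketch with the question's data (the names pairs and mismatches are only illustrative):

```python
a = [['a', 'b', 'c'], ['w', 'c'], []]
b = [['a', 'b', 'c'], ['a', 'f', 'g'], ['x', 'y', 'z']]

# zip pairs a[0] with b[0], a[1] with b[1], and so on.
pairs = list(zip(a, b))

# Keep x only when both sublists are non-empty and their first elements differ.
mismatches = [x for x, y in pairs if x and y and x[0] != y[0]]

print(mismatches)
```

Only ['w', 'c'] is reported here, and only once, because it is compared with ['a', 'f', 'g'] alone rather than with every sublist of b.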
You can change your conditional to this:
if x and x[0] != y[0]:
Empty lists are falsy, and non-empty lists are truthy, so this only evaluates x[0] != y[0] if x is non-empty (i.e. x[0] exists).
I have a list of the following structure which is simplified for the purpose of this question:
x = ["f","f","f",0,"f",0,0,0,"f","f"]
Where "f" represents a file path as a string. What I wish to do is remove all elements from the list equal to zero. I have tried iterating over like so:
for h in range(len(x)):
    if x[g] == "0":
        del x[g]
    else:
        pass
This has not worked, since deleting from a list while iterating over it does not work. It seems that a list comprehension is the answer I am looking for, but I can't seem to get the formatting down:
x = [h for h in range(len(x)) if h != 0]
So my final desired output would be:
x = ["f","f","f","f","f","f"]
How do I go about achieving this?
EDIT: Patrick's answer in the comments below is exactly what I was looking for and solves this.
I also want to note that when x is a large list you may want an iterator to be returned after filtering, because using an iterator on a large data set may improve performance (see Performance Advantages to Iterators? for more information).
In this case you can use the built-in filter(function, iterable) function, which returns an iterator in Python 3.
x = ["f","f","f",0,"f",0,0,0,"f","f"]
for ch in filter(lambda sym: sym != 0, x):
    print(ch)
As a result only elements not equal to 0 will be printed.
See https://docs.python.org/3/library/functions.html#filter to get more information about the filter built-in.
Another approach is to use a generator expression. Note that you can iterate over the generator only once. The benefit is that each result is evaluated and yielded on the fly, so a generator expression conserves memory compared with building a full list.
For example:
x = ["f","f","f",0,"f",0,0,0,"f","f"]
for ch in (sym for sym in x if sym != 0):
    print(ch)
See more generator comprehension examples and advantages here https://www.python.org/dev/peps/pep-0289/
Also be careful when filtering results with truth-value testing.
Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below. The following values are considered false:
None
False
zero of any numeric type, for example, 0, 0.0, 0j.
any empty sequence, for example, '', (), [].
any empty mapping, for example, {}.
instances of user-defined classes, if the class defines a __bool__() or __len__() method, when that method returns the integer zero or bool value False.
So the expression [sym for sym in x if sym] will remove all falsy items (i.e. False, '', etc.) from your results, not only 0. So be careful when using truth-value testing (see https://docs.python.org/3.6/library/stdtypes.html#truth-value-testing for more information).
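To make the difference concrete, here is a small sketch with a hypothetical list that mixes 0 with other falsy values; the names only_nonzero and only_truthy are illustrative:

```python
# Hypothetical input: 0 plus other falsy values ('' and None) and the truthy string "0".
x = ["f", 0, "", None, "0", "f"]

# Explicit comparison removes only items equal to 0.
only_nonzero = [item for item in x if item != 0]

# Truth-value testing removes every falsy item, including '' and None.
only_truthy = [item for item in x if item]

print(only_nonzero)
print(only_truthy)
```

Both versions keep the string "0", because a non-empty string is truthy and is not equal to the integer 0.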
You can remove all 0 and "0" values with:
li = ["f","f","f",0,"f",0,"0",0,"f","f"]
[y for y in li if y not in ("0",0)]
# results:
['f', 'f', 'f', 'f', 'f', 'f']
Try this:
x = [h for h in x if h != 0]
Some small bits of advice:
You don't have to specify else: pass.
You can iterate directly over a list, i.e. instead of:
for i in range(len(x)):
    x[i]
you can simply use:
for i in x:
    i
if x[g] == "0" would only check if x[g] is equal to the string "0", not the actual number. Judging by your code, you want if x[g] == 0 instead.
I hope this piece of code will serve your purpose.
x = [i for i in x if i]
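It means: keep every value from x that is truthy, which drops 0. Note that it also drops other falsy values such as '' and None, and that it keeps the string "0".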
The reason you can't do this looping forwards is because you're shortening the list as you go along, but if you loop backwards you won't have this issue:
>>> x = ["f","f","f","0","f","0","0","0","f","f"]
>>> for h in range(len(x)-1,-1,-1):
...     if x[h] == "0":
...         del x[h]
...
>>> x
['f', 'f', 'f', 'f', 'f', 'f']
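If the list really must be modified in place (for example because other names refer to the same list object), an alternative to looping backwards is slice assignment. This is a sketch of a different technique than the one shown above; the name alias is only there to demonstrate that the object is reused:

```python
x = ["f", "f", "f", 0, "f", 0, 0, 0, "f", "f"]
alias = x  # a second reference to the same list object

# Assigning to x[:] replaces the contents of the existing list
# instead of binding x to a brand-new list.
x[:] = [item for item in x if item != 0]

print(x)
print(alias is x)
```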
I want to sort a list of strings based on the string length. I tried to use sort as follows, but it doesn't seem to give me the correct result.
xs = ['dddd','a','bb','ccc']
print xs
xs.sort(lambda x,y: len(x) < len(y))
print xs
['dddd', 'a', 'bb', 'ccc']
['dddd', 'a', 'bb', 'ccc']
What might be wrong?
When you pass a lambda to sort, you need to return an integer, not a boolean. So your code should instead read as follows:
xs.sort(lambda x,y: cmp(len(x), len(y)))
Note that cmp is a builtin function such that cmp(x, y) returns -1 if x is less than y, 0 if x is equal to y, and 1 if x is greater than y.
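Both cmp and the positional comparison-function argument to sort exist only in Python 2. On Python 3, a comparison function can still be used by wrapping it with functools.cmp_to_key. A minimal sketch, where compare_by_length is a hypothetical helper standing in for cmp(len(x), len(y)):

```python
from functools import cmp_to_key

def compare_by_length(a, b):
    # Hypothetical helper: returns -1, 0, or 1, like Python 2's cmp(len(a), len(b)).
    return (len(a) > len(b)) - (len(a) < len(b))

xs = ['dddd', 'a', 'bb', 'ccc']

# cmp_to_key adapts an old-style comparison function to the key= parameter.
xs.sort(key=cmp_to_key(compare_by_length))

print(xs)
```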
Of course, you can instead use the key parameter:
xs.sort(key=lambda s: len(s))
This tells the sort method to order based on whatever the key function returns.
EDIT: Thanks to balpha and Ruslan below for pointing out that you can just pass len directly as the key parameter to the function, thus eliminating the need for a lambda:
xs.sort(key=len)
And as Ruslan points out below, you can also use the built-in sorted function, which creates a new list, rather than the list.sort method, which sorts the existing list in place:
print(sorted(xs, key=len))
The same as in Eli's answer - just using a shorter form, because you can skip a lambda part here.
Creating new list:
>>> xs = ['dddd','a','bb','ccc']
>>> sorted(xs, key=len)
['a', 'bb', 'ccc', 'dddd']
In-place sorting:
>>> xs.sort(key=len)
>>> xs
['a', 'bb', 'ccc', 'dddd']
The easiest way to do this is (where xs is your list):
xs.sort(key=lambda x: len(x))
I would like to add how the key function works while sorting.
Decorate-Sort-Undecorate Design Pattern:
Python’s support for a key function when sorting is implemented using what is known as the decorate-sort-undecorate design pattern.
It proceeds in 3 steps:
1. Each element of the list is temporarily replaced with a “decorated” version that includes the result of the key function applied to the element.
2. The list is sorted based upon the natural order of the keys.
3. The decorated elements are replaced by the original elements.
The key parameter specifies a function to be called on each list element prior to making comparisons.
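The three steps above can be sketched by hand as follows. This only illustrates the pattern and is not a claim about how the interpreter implements key internally; the index in each tuple is an addition of this sketch, used so that ties keep their original order and the strings themselves are never compared.

```python
xs = ['dddd', 'a', 'bb', 'ccc']

# 1. Decorate: pair each element with its key (and its index as a tie-breaker).
decorated = [(len(s), index, s) for index, s in enumerate(xs)]

# 2. Sort: tuples compare by key first, then by index.
decorated.sort()

# 3. Undecorate: keep only the original elements.
undecorated = [s for _, _, s in decorated]

print(undecorated)
```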
Write a function lensort to sort a list of strings based on length.
def lensort(a):
    n = len(a)
    for i in range(n):
        for j in range(i+1,n):
            # Swap so that the shorter string comes first.
            if len(a[i]) > len(a[j]):
                temp = a[i]
                a[i] = a[j]
                a[j] = temp
    return a
print(lensort(["hello","bye","good"]))
I can do it using the two methods below. First, using a function:
def lensort(x):
    list1 = []
    for i in x:
        list1.append([len(i),i])
    return sorted(list1)
lista = ['a', 'bb', 'ccc', 'dddd']
a=lensort(lista)
print([l[1] for l in a])
Second, as a one-liner using a lambda, as already answered above:
lista = ['a', 'bb', 'ccc', 'dddd']
lista.sort(key = lambda x:len(x))
print(lista)
def lensort(list_1):
    list_2 = []
    list_3 = []
    for i in list_1:
        list_2.append([i, len(i)])
    list_2.sort(key=lambda x: x[1])
    for i in list_2:
        list_3.append(i[0])
    return list_3
This works for me!
Let's say I have a multidimensional list l:
l = [['a', 1],['b', 2],['c', 3],['a', 4]]
and I want to return another list consisting only of the rows that have 'a' as their first list element:
m = [['a', 1],['a', 4]]
What's a good and efficient way of doing this?
Definitely a case for a list comprehension:
m = [row for row in l if 'a' in row[0]]
Here I'm taking your "having 'a' in the first element" literally, whence the use of the in operator. If you want to restrict this to "having 'a' as the first element" (a very different thing from what you actually wrote!-), then
m = [row for row in l if 'a' == row[0]]
is more like it;-).
m = [i for i in l if i[0] == 'a']
With the filter function:
m = filter(lambda x: x[0] == 'a', l)
or as a list comprehension:
m = [x for x in l if x[0] == 'a']
What's wrong with just:
m = [i for i in l if i[0] == 'a']
Or:
m = filter(lambda x: x[0] == 'a', l)
I doubt the difference between these will be significant performance-wise. Use whichever is most convenient. I don't like lambdas, but if memory is a concern for larger lists, the filter can be replaced with itertools.ifilter (Python 2 only; in Python 3 the built-in filter already returns a lazy iterator). You can also change the list comprehension to a generator expression (change the [] to ()) to achieve the same general result. Other than that, they're probably identical.
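A brief sketch of the generator-expression variant, using the list from the question (the names lazy_rows, first, and rest are only illustrative). Rows are produced one at a time as they are requested, and the generator can be consumed only once:

```python
l = [['a', 1], ['b', 2], ['c', 3], ['a', 4]]

# Parentheses instead of brackets: nothing is filtered until a row is requested.
lazy_rows = (row for row in l if row[0] == 'a')

first = next(lazy_rows)   # pulls the first matching row
rest = list(lazy_rows)    # consumes whatever is left

print(first)
print(rest)
```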
[i for i in l if i[0]=='a']
btw, take a look at Python's list comprehension with conditions.