Remove empty string from list - python

I just started Python classes and I'm really in need of some help. Please keep in mind that I'm new if you're answering this.
I have to make a program that takes the average of all the elements in a certain list "l". That is a pretty easy function by itself; the problem is that the teacher wants us to remove any empty string present in the list before doing the average.
So when I receive the list [1,2,3,'',4] I want the function to ignore the '' for the average, and just take the average of the other 4/len(l). Can anyone help me with this?
Maybe a cycle that keeps comparing a certain position from the list with the '' and removes those from the list? I've tried that but it's not working.

You can use a list comprehension to remove all elements that are '':
mylist = [1, 2, 3, '', 4]
mylist = [i for i in mylist if i != '']
Then you can calculate the average by taking the sum and dividing it by the number of elements in the list:
avg = sum(mylist)/len(mylist)
Floating Point Average (Assuming python 2)
Depending on your application you may want your average to be a float and not an int. If that is the case, cast one of these values to a float first:
avg = float(sum(mylist))/len(mylist)
Alternatively you can use python 3's division:
from __future__ import division
avg = sum(mylist)/len(mylist)
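For Python 3, where / already performs true division, the two steps above can be combined into one function. This is only a minimal sketch: the name average_ignoring_empty is made up for illustration, and raising an error when nothing is left to average is an assumption, not something the question specifies.

```python
def average_ignoring_empty(values):
    """Return the mean of values, skipping empty strings."""
    # Keep everything except the empty string; 0 is kept on purpose.
    numbers = [v for v in values if v != '']
    if not numbers:
        # Assumption: an input with no numbers is treated as an error.
        raise ValueError("no numeric values to average")
    return sum(numbers) / len(numbers)


print(average_ignoring_empty([1, 2, 3, '', 4]))  # 2.5
```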

You can use filter():
In Python 2, filter() returns a list when it is passed a list; in Python 3 it returns an iterator. As suggested by @PhilH, you can use itertools.ifilter() in Python 2 to get an iterator.
To get a list as output in Python 3, use list(filter(lambda x: x != '', lis)).
In [29]: lis = [1, 2, 3, '', 4, 0]
In [30]: filter(lambda x:x != '', lis)
Out[30]: [1, 2, 3, 4, 0]
Note to filter any falsy value you can simply use filter(None, ...):
>>> lis = [1, 2, 3, '', 4, 0]
>>> filter(None, lis)
[1, 2, 3, 4]
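The sessions above are Python 2. A rough Python 3 equivalent, assuming the same example list, might look like this; note how filter(None, ...) also drops the 0, which matters when computing an average:

```python
lis = [1, 2, 3, '', 4, 0]

# In Python 3, filter() is lazy, so wrap it in list() to see the result.
no_empty_strings = list(filter(lambda x: x != '', lis))
# Passing None as the function drops every falsy value, including 0.
no_falsy_values = list(filter(None, lis))

print(no_empty_strings)  # [1, 2, 3, 4, 0]
print(no_falsy_values)   # [1, 2, 3, 4]
```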

The other answers show you how to create a new list with the desired element removed (which is the usual way to do this in python). However, there are occasions where you want to operate on a list in place -- Here's a way to do it operating on the list in place:
while True:
    try:
        mylist.remove('')
    except ValueError:
        break
Although I suppose it could be argued that you could do this with slice assignment and a list comprehension:
mylist[:] = [i for i in mylist if i != '']
And, as some have raised issues about memory usage and the wonders of generators:
mylist[:] = (i for i in mylist if i != '')
works too.
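To see why operating in place can matter, here is a small sketch (the name alias is just for illustration) that contrasts rebinding the name with slice assignment when a second reference to the list exists:

```python
mylist = [1, 2, 3, '', 4]
alias = mylist  # a second name for the same list object

# Rebinding creates a new list; the alias still sees the old contents.
mylist = [i for i in mylist if i != '']
rebinding_updates_alias = (alias == mylist)

# Slice assignment mutates the existing list, so every reference sees it.
alias[:] = [i for i in alias if i != '']
slice_assignment_result = alias

print(rebinding_updates_alias)  # False
print(slice_assignment_result)  # [1, 2, 3, 4]
```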

itertools.ifilterfalse(lambda x: x=='', myList)
This uses iterators, so it doesn't create copies of the list and should be more efficient both in time and memory, making it robust for long lists.
JonClements points out that this means keeping track of the length separately, so to show that process:
def ave(anyOldIterator):
    elementCount = 0
    runningTotal = 0
    for element in anyOldIterator:
        runningTotal += element
        elementCount += 1
    return runningTotal/elementCount
Or even better
def ave(anyOldIterator):
    idx = None
    runningTotal = 0
    for idx,element in enumerate(anyOldIterator):
        runningTotal += element
    return runningTotal/(idx+1)
Reduce:
def ave(anyOldIterator):
    pieces = reduce(lambda x,y: (y[0],x[1]+y[1]), enumerate(anyOldIterator))
    return pieces[1]/(pieces[0]+1)
Timeit on the average of range(0,1000) run 10000 times gives the list comprehension a time of 0.9s and the reduce version 0.16s. So it's already 5x faster before we add in filtering.
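In Python 3, itertools.ifilterfalse is called itertools.filterfalse. A possible end-to-end version of the iterator approach is sketched below; the guard for an empty input is an assumption added here and is not part of the functions above.

```python
from itertools import filterfalse


def ave(iterable):
    """Average of an iterable in one pass, without building a list."""
    count = 0
    total = 0
    for element in iterable:
        total += element
        count += 1
    if count == 0:
        # Assumption: an empty input is an error rather than 0.
        raise ValueError("cannot average an empty iterable")
    return total / count


my_list = [1, 2, 3, '', 4]
# filterfalse keeps the elements for which the predicate is false.
result = ave(filterfalse(lambda x: x == '', my_list))
print(result)  # 2.5
```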

You can use:
alist = ['',1,2]
new_alist = filter(None, alist)
new_alist_2 = filter(bool, alist)
Result:
new_alist = [1,2]
new_alist_2 = [1,2]

mylist = [1, 2, 3, '', 4]
newlist = []
for i in mylist:
    try:
        newlist.append(int(i))
    except ValueError:
        pass
avg = sum(newlist)/len(newlist)

'' is falsy (it evaluates as False). If we want to keep 0, which is also falsy, we can use a list comprehension:
[x for x in a if x or x == 0]
Or if we strictly want to filter out empty strings :
[x for x in a if x != '']
This may not be the fastest way.
Edit, added some bench results comparing with the other solutions (not for the sake of comparing myself to others, but I was curious too of what method was the fastest)
ragsagar>
6.81217217445
pistache>
1.0873541832
cerealy>
1.07090902328
Matt>
1.40736508369
Ashwini Chaudhary>
2.04662489891
Phil H (just the generator) >
0.935978889465
Phil H with list() >
3.58926296234
I wrote the script quickly using timeit(), with [0,1,2,0,3,4,'',5,8,0,'',4] as the list. I ran multiple tests and the results did not vary.
NOTE: I'm not trying to put my solution on top using speed as a criterion. I know the OP didn't specifically ask for speed, but I was curious, and maybe some others are too.

Related

Recursively multiply all values in an array Python

I've had a look through the forums and can't find anything to do with multiplying all elements in an array recursively.
I've created the following code that almost does what I want. The goal is to use no loops and only recursion.
Here's the code:
def multAll(k,A):
    multAllAux(k,A)
    return A[:]
def multAllAux(k,A):
    B = [0]
    if A == []:
        return 0
    else:
        B[0] = (A[0] * k)
        B.append(multAllAux(k,A[1:]))
    return B
print(multAllAux(10, [5,12,31,7,25] ))
The current output is:
[50, [120, [310, [70, [250, 0]]]]]
However, it should be:
[50,120,310,70,250]
I know I am close, but I am at a complete loss at this point. The restrictions of no loops and solely recursion has left me boggled!
Your multAllAux function returns a list. If you append a list to another list, you get this nested list kind of structure that you are getting right now.
If you instead use the "extend" function; it will work as expected.
>>> a = [1, 2, 3]
>>> a.extend([4, 5])
>>> a
[1, 2, 3, 4, 5]
extend takes the elements from a second list and adds them to the first list, instead of adding the second list itself which is what append does!
Your function also returns a zero at the end of the list, which you don't need. You can try this:
def mult(k, A: list) -> list:
    return [k * A[0]] + mult(k, A[1:]) if A else []
The problem is here:
B.append(multAllAux(k,A[1:]))
What .append(..) does is it takes the argument, considers it as a single element and adds that element to the end of the list. What you want is to concatenate to the list (ie the item being added should be seen as a list of elements rather than one single element).
You can say something like: B = B + multAllAux(..) or just use +=
B += multAllAux(...)
BTW, if you wanted to multiply a single number to a list, there is a very similar construct: map(..). Note that this behaves slightly differently depending on whether you're using Py2 or Py3.
print(map(lambda x: x * 10, [5,12,31,7,25]))
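Applying both suggestions (extend instead of append, and an empty list instead of 0 as the base case) to the asker's helper gives something like the sketch below. The snake_case name mult_all_aux is just a renaming for illustration. The last line shows the Python 3 form of the map call, where map returns a lazy object that has to be wrapped in list() to print the values.

```python
def mult_all_aux(k, a):
    """Multiply every element of a by k using recursion only."""
    if a == []:
        return []  # base case: empty list, so extend() adds nothing
    b = [a[0] * k]
    # extend adds the elements of the returned list; append would nest it.
    b.extend(mult_all_aux(k, a[1:]))
    return b


print(mult_all_aux(10, [5, 12, 31, 7, 25]))  # [50, 120, 310, 70, 250]
print(list(map(lambda x: x * 10, [5, 12, 31, 7, 25])))  # [50, 120, 310, 70, 250]
```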

Extract index of Non duplicate elements in python list

I have a list:
input = ['a','b','c','a','b','d','e','d','g','g']
I want the index of every element in the list except the duplicates.
output = [0,1,2,5,6,8]
You should iterate over the enumerated list and add each element to a set of "seen" elements and add the index to the output list if the element hasn't already been seen (is not in the "seen" set).
Oh, the name input overrides the built-in input() function, so I renamed it input_list.
output = []
seen = set()
for i,e in enumerate(input_list):
    if e not in seen:
        output.append(i)
        seen.add(e)
which gives output as [0, 1, 2, 5, 6, 8].
why use a set?
You could be thinking, why use a set when you could do something like:
[i for i,e in enumerate(input_list) if input_list.index(e) == i]
which would work because .index returns you the index of the first element in a list with that value, so if you check the index of an element against this, you can assert that it is the first occurrence of that element and filter out those elements which aren't the first occurrences.
However, this is not as efficient as using a set, because list.index requires Python to iterate over the list until it finds the element (or doesn't). This operation is O(n) complexity and since we are calling it for every element in input_list, the whole solution would be O(n^2).
On the other hand, using a set, as in the first solution, yields an O(n) solution, because checking if an element is in a set is complexity O(1) (average case). This is due to how sets are implemented (they are like lists, but each element is stored at the index of its hash so you can just compute the hash of an element and see if there is an element there to check membership rather than iterating over it - note that this is a vague oversimplification but is the idea of them).
Thus, since each check for membership is O(1), and we do this for each element, we get an O(n) solution which is much better than an O(n^2) solution.
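To make the comparison concrete, here is a sketch of both approaches wrapped in functions (the function names are invented for this example). They return the same indices; only the amount of work differs. Note that the set version assumes the elements are hashable.

```python
def first_occurrence_indices(items):
    """O(n) on average: one pass with a set of already-seen values."""
    seen = set()
    indices = []
    for i, item in enumerate(items):
        if item not in seen:
            indices.append(i)
            seen.add(item)
    return indices


def first_occurrence_indices_quadratic(items):
    """O(n^2): list.index rescans the list for every element."""
    return [i for i, e in enumerate(items) if items.index(e) == i]


input_list = ['a', 'b', 'c', 'a', 'b', 'd', 'e', 'd', 'g', 'g']
print(first_occurrence_indices(input_list))            # [0, 1, 2, 5, 6, 8]
print(first_occurrence_indices_quadratic(input_list))  # [0, 1, 2, 5, 6, 8]
```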
You could do something like this, checking for counts (although this is computation-heavy):
indexes = []
for i, x in enumerate(inputlist):
    if (inputlist.count(x) == 1
            and x not in inputlist[:i]):
        indexes.append(i)
This checks for the following:
if the item appears only once. If so, continue...
if the item hasn't appeared before in the list up till now. If so, add to the results list
In case you don't mind indexes of the last occurrences of duplicates instead and are using Python 3.6+, here's an alternative solution:
list(dict(map(reversed, enumerate(input))).values())
This returns:
[3, 4, 2, 7, 6, 9]
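The one-liner is dense, so here it is unpacked, assuming a Python version where dicts preserve insertion order. The second half is a variation that is not part of the answer above: dict.setdefault keeps the first index instead of the last.

```python
input_list = ['a', 'b', 'c', 'a', 'b', 'd', 'e', 'd', 'g', 'g']

# enumerate yields (index, value); reversed flips each pair to (value, index).
# Later pairs overwrite earlier ones, so the dict keeps the LAST index.
last_seen = dict(map(reversed, enumerate(input_list)))
last_indices = list(last_seen.values())

# setdefault only stores a key the first time, so this keeps the FIRST index.
first_seen = {}
for index, value in enumerate(input_list):
    first_seen.setdefault(value, index)
first_indices = list(first_seen.values())

print(last_indices)   # [3, 4, 2, 7, 6, 9]
print(first_indices)  # [0, 1, 2, 5, 6, 8]
```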
Here is a one-liner using zip and reversed
>>> input = ['a','b','c','a','b','d','e','d','g','g']
>>> sorted(dict(zip(reversed(input), range(len(input)-1, -1, -1))).values())
[0, 1, 2, 5, 6, 8]
This question is missing a pandas solution. 😉
>>> import pandas as pd
>>> inp = ['a','b','c','a','b','d','e','d','g','g']
>>>
>>> pd.DataFrame(list(enumerate(inp))).groupby(1).first()[0].tolist()
[0, 1, 2, 5, 6, 8]
Yet another version, using a side effect in a list comprehension.
>>> xs=['a','b','c','a','b','d','e','d','g','g']
>>> seen = set()
>>> [i for i, v in enumerate(xs) if v not in seen and not seen.add(v)]
[0, 1, 2, 5, 6, 8]
The list comprehension filters indices of values that have not been seen already.
The trick is that not seen.add(v) is always true because seen.add(v) returns None.
Because of short circuit evaluation, seen.add(v) is performed if and only if v is not in seen, adding new values to seen on the fly.
At the end, seen contains all the values of the input list.
>>> seen
{'a', 'c', 'g', 'b', 'd', 'e'}
Note: it is usually a bad idea to use side effects in a list comprehension, but you might see this trick sometimes.

How do I make a loop to look through the elements of a list?

Am fairly new to python and coding in general. So I have a defined list and I'm trying to make a loop that looks through the elements in a list until it matches what I'm looking for and records the position it is in the list.
list = [1, 2, 3, 4]
x = 3
for x in list:
    if x == list
        print(x, list.index(x))
This is my attempt but it doesn't work at all.
Your search variable is called x, and then you call the loop iterator x as well, overwriting the search query. Call it something else. Also call list something else to avoid masking the built-in. You can also use enumerate to properly handle cases where the query appears multiple times. Also, don't forget the colon at the end of the if statement.
l = [1, 2, 3, 4]
x = 3
for idx,item in enumerate(l):
    if item == x:
        print(x, idx)
You shouldn't loop with for x in list. The loop rebinds x to each element of the list in turn, so x no longer holds 3. Also be careful with indentation; Python is sensitive to it. Maybe this is the code you're looking for:
list = [1, 2, 3, 4]
x = 3
for i in list:
    if (i == x):
        print(i, list.index(i))
I forgot that list is a built-in name. We shouldn't use list as a variable name. Just change it to 'list1' or anything that isn't a built-in name. Here is a list of the built-in functions in Python: https://docs.python.org/2/library/functions.html
The problem, aside from the indentation, is that you use the same variable x for the lookup and for the iteration.
Also, it's recommended that you do not use list as a variable name, since it is the name of a built-in type.
Consider the following:
_list = [1, 2, 3, 4]
lookup = 3
for element in _list:
    if lookup == element:
        print(element, _list.index(lookup))
Also, if you do want to use a loop to find the index, it's better to use the enumerate function:
_list = [1, 2, 3, 4]
lookup = 3
for index, element in enumerate(_list):
    if lookup == element:
        print(element, index)
If you are just looking to find the index of the element inside the list, you can just use index function by itself.
_list = [1, 2, 3, 4]
lookup = 3
print(lookup, _list.index(lookup))
Note that if the lookup is not in the list, ValueError would be raised.
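If a missing element should not crash the program, the ValueError can be caught. A minimal sketch, where the helper name find_index and the choice of returning None are assumptions made for this example:

```python
def find_index(items, target):
    """Return the index of the first match, or None if target is absent."""
    try:
        return items.index(target)
    except ValueError:
        # list.index raises ValueError when the value is not in the list.
        return None


numbers = [1, 2, 3, 4]
print(find_index(numbers, 3))  # 2
print(find_index(numbers, 7))  # None
```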
Try this.
if x in list:
    print(list.index(x))
As idjaw mentioned, though the above piece of code works for your case, do not use built-in names such as list for other purposes, as that shadows the built-in in the current scope. You could always alter the spelling or use a synonym instead. But if it is absolutely necessary to use the same name as a built-in, the Pythonic way is to append a single trailing underscore when naming your variable. That gives you list_, which is perfectly alright.

how to iterate from a specific point in a sequence (Python)

[Edit]
From the feedback/answers I have received, I gather there is some confusion regarding the original question. Consequently, I have reduced the problem to its most rudimentary form
Here are the relevant facts of the problem:
I have a sorted sequence: S
I have an item (denoted by i) that is GUARANTEED to be contained in S
I want a find() algorithm that returns an iterator (iter) that points to i
After obtaining the iterator, I want to be able to iterate FORWARD (BACKWARD?) over the elements in S, starting FROM (and including) i
For my fellow C++ programmers who can also program in Python, what I am asking for, is the equivalent of:
const_iterator std::find (const key_type& x ) const;
The iterator returned can then be used to iterate the sequence. I am just trying to find (pun unintended), if there is a similar inbuilt algorithm in Python, to save me having to reinvent the wheel.
Given your relevant facts:
>>> import bisect
>>> def find_fwd_iter(S, i):
...     j = bisect.bisect_left(S, i)
...     for k in xrange(j, len(S)):
...         yield S[k]
...
>>> def find_bkwd_iter(S, i):
...     j = bisect.bisect_left(S, i)
...     for k in xrange(j, -1, -1):
...         yield S[k]
...
>>> L = [100, 150, 200, 300, 400]
>>> list(find_fwd_iter(L, 200))
[200, 300, 400]
>>> list(find_bkwd_iter(L, 200))
[200, 150, 100]
>>>
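The session above is Python 2 (xrange). A Python 3 sketch of the same idea follows, with renamed functions for illustration. Like the original, it assumes the sequence is sorted and that the item is guaranteed to be present; bisect_left would otherwise return an insertion point rather than a match.

```python
import bisect


def iter_from(sorted_seq, item):
    """Yield elements from the first occurrence of item to the end."""
    start = bisect.bisect_left(sorted_seq, item)
    for k in range(start, len(sorted_seq)):
        yield sorted_seq[k]


def iter_back_from(sorted_seq, item):
    """Yield elements from the first occurrence of item back to the start."""
    start = bisect.bisect_left(sorted_seq, item)
    for k in range(start, -1, -1):
        yield sorted_seq[k]


L = [100, 150, 200, 300, 400]
print(list(iter_from(L, 200)))       # [200, 300, 400]
print(list(iter_back_from(L, 200)))  # [200, 150, 100]
```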
Yes, you can do it like this:
import itertools
from datetime import datetime
data = {
"2008-11-10 17:53:59":"data",
"2005-11-10 17:53:59":"data",
}
list_ = data.keys()
new_list = [datetime.strptime(x, "%Y-%m-%d %H:%M:%S") for x in list_]
begin_date = datetime.strptime("2007-11-10 17:53:59", "%Y-%m-%d %H:%M:%S")
for i in itertools.ifilter(lambda x: x > begin_date, new_list):
    print i
If you know for a fact that the items in your sequence are sorted, you can just use a generator expression:
(item for item in seq if item >= 5)
This returns a generator; it doesn't actually traverse the list until you iterate over it, i.e.:
for item in (item for item in seq if item >= 5):
    print item
will only traverse seq once.
Using a generator expression like this is pretty much identical to using itertools.ifilter, which produces a generator that iterates over the list returning only values that meet the filter criterion:
>>> import itertools
>>> seq = [1, 2, 3, 4, 5, 6, 7]
>>> list(itertools.ifilter(lambda x: x>=3, seq))
[3, 4, 5, 6, 7]
I'm not sure why (except for backwards compatibility) we need itertools.ifilter anymore now that we have generator expressions, but other methods in itertools are invaluable.
If, for instance, you don't know that your sequence is sorted, and you still want to return everything in the sequence from a known item and beyond, you can't use a generator expression. Instead, use itertools.dropwhile. This produces a generator that iterates over the list skipping values until it finds one that meets the filter criterion:
>>> seq = [1, 2, 4, 3, 5, 6, 7]
>>> list(itertools.dropwhile(lambda x: x != 3, seq))
[3, 5, 6, 7]
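The difference between the two approaches on an unsorted sequence can be seen side by side. This sketch uses Python 3 and the same example data; the variable names are only illustrative.

```python
from itertools import dropwhile

seq = [1, 2, 4, 3, 5, 6, 7]

# A filter tests every element independently, so 4 slips in before 3.
filtered = [x for x in seq if x >= 3]

# dropwhile discards elements only until the predicate first fails,
# then yields everything that follows, whatever its value.
from_three = list(dropwhile(lambda x: x != 3, seq))

print(filtered)    # [4, 3, 5, 6, 7]
print(from_three)  # [3, 5, 6, 7]
```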
As far as searching backwards goes, this will only work if the sequence you're using is actually a sequence (like a list, i.e. something that has an end and can be navigated backwards) and not just any iterable (e.g. a generator that returns the next prime number). To do this, use the reversed function, e.g.:
(item for item in reversed(seq) if item >= 5)
One simpler way (albeit slower) would be to use filter and filter for keys before/after that date. Filter has to process each element in the list as opposed to slicing not needing to.
You can do
def on_or_after(date):
    from itertools import dropwhile
    sorted_items = sorted(date_dictionary.iteritems())
    def before_date(pair):
        return pair[0] < date
    on_or_after_date = dropwhile(before_date, sorted_items)
    return on_or_after_date
which I think is about as efficient as it's going to get if you're just doing one such lookup on each sorted collection. on_or_after_date will iterate (date, value) pairs.
Another option would be to build a dictionary as a separate index into the sorted list:
sorted_items = sorted(date_dictionary.iteritems())
date_index = dict((key, i) for i, (key, value) in enumerate(sorted_items))
and then get the items on or after a date with
def on_or_after(date):
    return sorted_items[date_index[date]:]
This second approach will be faster if you're going to be doing a lot of lookups on the same series of sorted dates (which it sounds like you are).
If you want really speedy slicing of the sorted dates, you might see some improvement by storing it in a tuple instead of a list. I could be wrong about that though.
note the above code is untested, let me know if it doesn't work and you can't sort out why.
First off, this question isn't related to dicts. You're operating on a sorted list. You're using the results on a dict, but that's not relevant to the question.
You want the bisect module, which implements binary searching. Starting from your code:
import bisect
mydict = {
"2001-01-01":"data1",
"2005-01-02":"data2",
"2002-01-01":"data3",
"2004-01-02":"data4",
}
# ['2001-01-01', '2002-01-01', '2004-01-02', '2005-01-02']:
sorted_dates = sorted(mydict)
# Iterates over 2002-01-01, 2004-01-02 and 2005-01-02:
offset = bisect.bisect_left(sorted_dates, "2002-01-01")
for item in sorted_dates[offset:]:
    print item

Picking out items from a python list which have specific indexes

I'm sure there's a nice way to do this in Python, but I'm pretty new to the language, so forgive me if this is an easy one!
I have a list, and I'd like to pick out certain values from that list. The values I want to pick out are the ones whose indexes in the list are specified in another list.
For example:
indexes = [2, 4, 5]
main_list = [0, 1, 9, 3, 2, 6, 1, 9, 8]
the output would be:
[9, 2, 6]
(i.e., the elements with indexes 2, 4 and 5 from main_list).
I have a feeling this should be doable using something like list comprehensions, but I can't figure it out (in particular, I can't figure out how to access the index of an item when using a list comprehension).
[main_list[x] for x in indexes]
This will return a list of the objects, using a list comprehension.
def pick(main_list, indexes):  # wrapper name is hypothetical; it makes the return valid
    t = []
    for i in indexes:
        t.append(main_list[i])
    return t
map(lambda x:main_list[x],indexes)
If you're good with numpy:
import numpy as np
main_array = np.array(main_list) # converting to numpy array
out_array = main_array.take([2, 4, 5])
out_list = out_array.tolist() # if you want a list specifically
I think Yuval A's solution is pretty clear and simple. But if you actually want a one-line list comprehension:
[e for i, e in enumerate(main_list) if i in indexes]
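One thing to keep in mind: scanning main_list with enumerate follows the order of main_list and yields each position at most once, so it only matches the direct lookup when indexes is sorted and has no repeats. A small sketch with a deliberately unsorted, repeating index list (chosen here just to show the difference):

```python
main_list = [0, 1, 9, 3, 2, 6, 1, 9, 8]
indexes = [5, 2, 2]

# Direct lookup follows the order (and repeats) of indexes.
by_lookup = [main_list[i] for i in indexes]

# Scanning follows the order of main_list and ignores repeats.
by_scan = [e for i, e in enumerate(main_list) if i in indexes]

print(by_lookup)  # [6, 9, 9]
print(by_scan)    # [9, 6]
```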
As an alternative to a list comprehension, you can use map with list.__getitem__. For large lists you should see better performance:
import random
n = 10**7
L = list(range(n))
idx = random.sample(range(n), int(n/10))
x = [L[x] for x in idx]
y = list(map(L.__getitem__, idx))
assert all(i==j for i, j in zip(x, y))
%timeit [L[x] for x in idx] # 474 ms per loop
%timeit list(map(L.__getitem__, idx)) # 417 ms per loop
For a lazy iterator, you can just use map(L.__getitem__, idx). Note in Python 2.7, map returns a list, so there is no need to pass to list.
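Another standard-library option in the same spirit is operator.itemgetter. This is a sketch rather than a benchmark; no timing claims are made here. Be aware that itemgetter returns a tuple only when given two or more indexes; with exactly one index it returns the bare item, and with none it raises TypeError.

```python
from operator import itemgetter

main_list = [0, 1, 9, 3, 2, 6, 1, 9, 8]
indexes = [2, 4, 5]

# itemgetter(2, 4, 5) builds a callable that returns
# (obj[2], obj[4], obj[5]) as a tuple.
picked = list(itemgetter(*indexes)(main_list))

print(picked)  # [9, 2, 6]
```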
I noticed that there are two ways to do this job: with a loop (list comprehension) or by converting to an np.array. I timed both methods, and the results show that when the dataset is large, [main_list[x] for x in indexes] is about 3 to 5 times faster than np.array.take().
If your code is sensitive to computation time, the highest voted answer is a good choice.
