Python: Get first element of list potentially containing sublists

I'm looking for the first element of a Python list potentially containing either numbers (integer or float), or many levels of nested sublists containing the same. In these examples, let's suppose I am always looking for the number '1'. If the list contains no sublists, we have:
>>> foo = [1,2,3]
>>> foo[0]
1
If the list contains one level of sublists, and I know this, I can again obtain 1 with
>>> foo = [[1,2],[3,4]]
>>> foo[0][0]
1
Similarly if the first element of my list is a list containing a list:
>>> foo = [[[1,2],[3,4]],[[5,6],[7,8]]]
>>> foo[0][0][0]
1
Is there a general way to get the first integer or float in foo, without resorting to calling a function recursively until drilling down to a value of foo[0] that is no longer a list?

There shouldn't be any need for recursion. Assuming that you are always working with lists and ints, this should work perfectly well for you:
foo = [[[1,2],[3,4]],[[5,6],[7,8]]]
current = foo
result = None
while True:
    try:
        result = current[0]  # raises TypeError once current is a number
    except TypeError:
        break
    current = result  # descend one level
Unlike the other answers, this asks for forgiveness rather than for permission, which is a bit more Pythonic.
If you really want to be Pythonic, you could define a function as follows. Admittedly, this is overkill given your specification.
def first_scalar(foo):
    result = None
    while True:
        try:
            result = next(iter(foo))  # raises TypeError once foo is not iterable
        except TypeError:
            return result
        foo = result  # descend one level
Note that it returns None if the argument is not iterable. The same applies to the first snippet.
Note that this doesn't work if the deepest "left-most" child list is empty. To account for this, you'll need to fully flatten the list:
def _flatten(foo):
    try:
        for item in foo:
            yield from _flatten(item)  # recurse into each element
    except TypeError:
        yield foo  # foo is not iterable, so it is a scalar
def flatten(foo):
    for item in foo:
        yield from _flatten(item)
def first_scalar(foo):
    return next(flatten(foo))
Note that the above requires Python 3.3 or later, because it uses yield from.
The following code is for earlier versions of Python:
def _flatten(foo):
    try:
        for item in foo:
            for subitem in _flatten(item):
                yield subitem
    except TypeError:
        yield foo
def flatten(foo):
    for item in foo:
        for subitem in _flatten(item):
            yield subitem
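The flatten helpers still recurse, which the question wanted to avoid. As a sketch (not taken from the answers here), the same "skip empty sublists" behavior can be had iteratively with an explicit stack of iterators. The name first_scalar_iterative is my own, and the sketch assumes that only list counts as a container, so tuples or strings would be returned as values.

```python
def first_scalar_iterative(foo):
    """Return the first non-list value in foo, scanning depth-first and
    left-to-right and skipping empty sublists; None if there is none.
    Assumes foo is a list and that only lists are containers."""
    # stack of iterators; the top one belongs to the list currently being scanned
    stack = [iter(foo)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()  # this (sub)list is exhausted, resume its parent
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend into the sublist
        else:
            return item
    return None
```

For example, first_scalar_iterative([[[], [1, 2]], [3]]) returns 1, where the simple "dive in" loops would hit an IndexError or StopIteration on the empty list.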

The general-case answer for this is "Fix your data structure." Lists are supposed to be homogeneous, i.e. every element of the list should have the same type (be that int, or list of ints, or list of lists of ints, etc.).
The special case here would be to recurse until you find a number and return it:
def foo(lst):
    first_el = lst[0]
    if isinstance(first_el, (float, int)):
        return first_el
    else:
        return foo(first_el)

Create a simple, recursive function:
>>> def getFirst(l):
...     return l[0] if not isinstance(l[0], list) else getFirst(l[0])
...
>>> getFirst([1,2,3,4])
1
>>> getFirst([[1,2,3],[4,5]])
1
>>> getFirst([[[4,2],12,[1,3]],1])
4
This returns l[0] if l[0] is anything but a list. Otherwise, it returns the first item of l[0], recursively.

You can just "dive in", without any recursion:
lst = [[1, 2], [3, 4]]
first = lst
while isinstance(first, list):
    first = first[0]

If you really want to avoid any loops or recursion, there is an ugly workaround: convert the list to a string and then remove the list-specific characters:
','.join(map(str,foo)).replace('[','').replace(']','').replace(' ','').split(',')
This gives a flat list of strings, so the first number is element [0] of the result, still as a string (e.g. '1'); convert it with int() or float() if needed.
Of course, it only works if the list is composed of strings or integers. If the objects in the list are custom, you would have to convert them to strings, but since there is an unknown number of sublists you would need recursion to do that, so this workaround wouldn't make sense.
Also, the elements of the list and sublists might contain the same characters as the list syntax, such as '[' or ',', which would also break this.
In short, this is a bad workaround that only works reliably if the list and sublists are composed of numbers. Otherwise, some kind of recursion is most probably necessary.

Related

Python: How can 2 dictionaries with a list be compared neglecting the list items order? [duplicate]

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
a and b should be considered equal, because they have exactly the same elements, only in a different order.
The thing is, my actual lists will consist of objects (my class instances), not integers.
O(n): The Counter() method is best (if your objects are hashable):
from collections import Counter
def compare(s, t):
    return Counter(s) == Counter(t)
O(n log n): The sorted() method is next best (if your objects are orderable):
def compare(s, t):
    return sorted(s) == sorted(t)
O(n * n): If the objects are neither hashable nor orderable, you can use equality:
def compare(s, t):
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)  # raises ValueError if elem is not in t
    except ValueError:
        return False
    return not t
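Since the question is about class instances, here is a sketch of which approach applies to which kind of class, using two hypothetical dataclasses. Card is frozen and ordered, so dataclasses generates __hash__ and __lt__ and both Counter and sorted work. Note only gets the generated __eq__, which (because it is not frozen) leaves it unhashable and unorderable, so only the equality-based fallback applies; it is repeated here as compare_eq so the snippet is self-contained.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Card:
    # frozen + eq -> __hash__ is generated; order=True -> __lt__ etc. are generated
    rank: int
    suit: str

@dataclass
class Note:
    # eq=True without frozen sets __hash__ to None (unhashable); no ordering methods
    text: str

def compare_eq(s, t):
    # O(n * n) fallback: needs only __eq__
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

cards_a = [Card(1, "hearts"), Card(2, "spades"), Card(1, "hearts")]
cards_b = [Card(2, "spades"), Card(1, "hearts"), Card(1, "hearts")]
print(Counter(cards_a) == Counter(cards_b))  # True (O(n))
print(sorted(cards_a) == sorted(cards_b))    # True (O(n log n))

notes_a = [Note("x"), Note("y")]
notes_b = [Note("y"), Note("x")]
print(compare_eq(notes_a, notes_b))          # True; Counter(notes_a) would raise TypeError
```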
You can sort both:
sorted(a) == sorted(b)
Counting the elements with Counter could also be more efficient (but it requires the objects to be hashable).
>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print(Counter(a) == Counter(b))
True
If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)
In the general case you can't rely on being able to sort or hash the elements, so you need a fallback like this, which is unfortunately O(n^2):
len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)
If you have to do this in tests:
https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual
assertCountEqual(first, second, msg=None)
Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.
New in version 3.2.
or in 2.7:
https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual
Outside of tests I would recommend the Counter method.
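A minimal sketch of assertCountEqual in a test case (the class and test names are made up for illustration). It shows the three properties described in the docs quoted above: order is ignored, unhashable elements are accepted, and duplicate counts must match.

```python
import unittest

class CompareListsTest(unittest.TestCase):
    def test_same_elements_any_order(self):
        a = [1, 2, 3, 1, 2, 3]
        b = [3, 2, 1, 3, 2, 1]
        self.assertCountEqual(a, b)

    def test_unhashable_elements(self):
        # inner lists are unhashable, so Counter() would raise TypeError here
        self.assertCountEqual([[1, 2], [3]], [[3], [1, 2]])

    def test_duplicates_are_counted(self):
        # same set of values but different counts, so the assertion fails
        with self.assertRaises(AssertionError):
            self.assertCountEqual([1, 1, 2], [1, 2, 2])
```

Save it in a test module and run it with python -m unittest.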
The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:
sorted(a) == sorted(b)
It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.
sorted(a, key=id) == sorted(b, key=id)
(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)
If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.
sorted(a, key=repr) == sorted(b, key=repr)
If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.
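Here is a sketch of that last suggestion, using a hypothetical Item class whose "value" is taken to be the tuple (name, qty). It defines __eq__ and __lt__ so sorted() works without key=, and a __hash__ consistent with __eq__ so Counter works too.

```python
from collections import Counter

class Item:  # hypothetical example class
    def __init__(self, name, qty):
        self.name = name
        self.qty = qty

    def _key(self):
        # what "value" means for an Item in this example
        return (self.name, self.qty)

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):  # sorted() only needs __lt__
        if not isinstance(other, Item):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # must agree with __eq__: equal objects need equal hashes
        return hash(self._key())

a = [Item("apple", 2), Item("pear", 1), Item("apple", 2)]
b = [Item("pear", 1), Item("apple", 2), Item("apple", 2)]
print(sorted(a) == sorted(b))    # True
print(Counter(a) == Counter(b))  # True
```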
If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) or assertItemsEqual(a, b) (2.7<=py<3.2).
It works on sequences of unhashable objects too.
If the list contains items that are not hashable (for example, instances of a class that defines __eq__ but not __hash__) and you care about object identity rather than value, you can count their id()s with Counter instead:
from collections import Counter
...
if Counter(map(id, a)) == Counter(map(id, b)):
    print("Lists a and b contain the same objects")
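To make the identity-versus-value distinction concrete, here is a sketch with a hypothetical Point class that defines __eq__ but no __hash__, which makes it unhashable in Python 3. Counting ids treats the same objects as equal, but treats equal-valued copies as different.

```python
from collections import Counter

class Point:  # hypothetical class
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)
    # no __hash__: defining __eq__ alone sets __hash__ to None,
    # so Counter(a) itself would raise TypeError

p, q = Point(0, 0), Point(1, 1)
a = [p, q, p]
b = [q, p, p]
copies = [Point(1, 1), Point(0, 0), Point(0, 0)]  # equal values, different objects

print(Counter(map(id, a)) == Counter(map(id, b)))       # True: same objects
print(Counter(map(id, a)) == Counter(map(id, copies)))  # False: the ids differ
```

This relies on all the objects being alive while their ids are compared, which holds here because the lists keep references to them.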
Let a and b be lists:
def ass_equal(a, b):
    a = list(a)  # work on a copy so the caller's list is not mutated
    try:
        for x in b:
            a.pop(a.index(x))  # remove one matching item from a; ValueError if none is left
    except ValueError:
        return False  # b has an item that a lacks (or more copies of it)
    return len(a) == 0  # if a is empty, b has removed them all
No need to make them hashable or sort them.
I hope the piece of code below works in your case:
if len(a) == len(b) and all(i in a for i in b):
    print('True')
else:
    print('False')
This checks that both lists have the same length and that every element of b also appears in a, regardless of order. Note that it does not compare how many times each element occurs.
For better understanding, refer to my answer in this question
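A small sketch of the duplicate case that the length-plus-membership test misses, compared with counting via collections.Counter:

```python
from collections import Counter

a = [1, 1, 2]
b = [1, 2, 2]

# length + membership: every element of b appears somewhere in a
naive = len(a) == len(b) and all(i in a for i in b)
# counting also compares how often each element occurs
counted = Counter(a) == Counter(b)

print(naive)    # True, although the lists differ
print(counted)  # False
```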
You can write your own function to compare the lists.
Take two lists:
list_1 = ['John', 'Doe']
list_2 = ['Doe', 'Joe']
First, count the items of each list in a dictionary:
def count_list(list_items):
    counts = {}
    for list_item in list_items:
        list_item = list_item.strip()  # assumes string items; ignores surrounding whitespace
        if list_item not in counts:
            counts[list_item] = 1
        else:
            counts[list_item] += 1
    return counts
After that, compare both lists by comparing their counts:
def compare_list(list_1, list_2):
    return count_list(list_1) == count_list(list_2)
compare_list(list_1, list_2)  # False: 'John' vs 'Joe'
from collections import defaultdict
def _list_eq(a: list, b: list) -> bool:
    if len(a) != len(b):
        return False
    b_set = set(b)
    a_map = defaultdict(int)
    b_map = defaultdict(int)
    for item1, item2 in zip(a, b):
        if item1 not in b_set:
            return False
        a_map[item1] += 1
        b_map[item2] += 1
    return a_map == b_map
Sorting can be quite slow if the data is highly unordered (Timsort performs especially well when the items already have some degree of ordering). Sorting both also requires fully iterating through both lists.
Rather than mutating a list, just allocate a set and do a left-to-right membership check, keeping a count of how many of each item exist along the way:
If the two lists are not the same length you can short circuit and return False immediately.
If you hit any item in list a that isn't in list b you can return False
If you get through all items then you can compare the values of a_map and b_map to find out if they match.
This allows you to short-circuit in many cases long before you've iterated both lists.
Plug in this:
import collections
def lists_equal(l1: list, l2: list) -> bool:
    """
    import collections
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    ref:
    - https://stackoverflow.com/questions/9623114/check-if-two-unordered-lists-are-equal
    - https://stackoverflow.com/questions/7828867/how-to-efficiently-compare-two-unordered-lists-not-sets
    """
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    set_comp = set(l1) == set(l2)  # ignores duplicates, so on its own it can give false positives
    multiset_comp = compare(l1, l2)  # exact multiset comparison (elements must be hashable)
    return set_comp and multiset_comp  # set_comp is redundant: multiset_comp already implies it

What is the most efficient way to find the index of an element in a list, given only an element of a sublist (Python)

I.e. Does something like the following exist?
lst = [["a", "b", "c"], [4,5,6],"test"]
>>> getIndex(lst, "a")
0
>>> getIndex(lst, 5)
1
>>> getIndex(lst, "test")
2
I know of the regular index() method, but that only looks at the immediate elements. I have a rough solution of making a new list, parsing through the superlist and adding "y" or "n", then looking for the index of the "y" in that one, but I feel there is a much better way. Thanks
There is a problem with hellpanderrr's solution. It assumes that the main list elements will only be lists or strings. It fails if one searches on a list where another type is in the main list (the in operation raises a TypeError). E.g.:
lst2 = [["a", "b", "c"], [4, 5, 6], "test", 19]
>>> getIndex(lst2, 19)
# Ugly TypeError stack trace ensues
To fix this:
def getIndex2(lst, item):
    for n, i in enumerate(lst):
        try:
            if item == i or item in i:
                return n
        except TypeError:
            pass  # "in" is not supported for this combination of types
    return None
Now:
>>> getIndex2(lst2, "test")
2
>>> getIndex2(lst2, 19)
3
There are several ways to accomplish the "equals or in" test. This solution bowls right through, using a "get forgiveness not permission" idiom to catch the times when the in on i is not type-appropriate. It would also be possible to test the type of i before the in operation, or directly ask if i supports the in operation. But direct type inspection is often frowned upon, and strings and containers in Python have some complex overlapping capabilities. The "get forgiveness" approach gracefully handles those more simply.
Note that this also explicitly handles the case where no value is found.
>>> print(getIndex2(lst2, 333))
None
While functions not returning a value implicitly return None, it is better to be explicit about such default cases.
By the by, this approach handles two levels. If the lists can be arbitrarily nested, a different approach, likely involving recursion, would be needed.
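For arbitrary nesting, here is a sketch that avoids recursion by keeping an explicit stack per top-level element. The name get_index_deep is hypothetical, and, unlike getIndex2, it only searches inside lists and tuples (an assumption), so strings are matched by equality rather than by substring.

```python
def get_index_deep(lst, item):
    """Return the index of the top-level element of lst that equals item or
    contains it at any nesting depth; None if it is not found.
    Only lists and tuples are searched into."""
    for n, top in enumerate(lst):
        stack = [top]  # explicit stack instead of recursion
        while stack:
            current = stack.pop()
            if current == item:
                return n
            if isinstance(current, (list, tuple)):
                stack.extend(current)
    return None
```

With lst = [["a", "b", "c"], [4, [5, [6]]], "test", 19], get_index_deep(lst, 6) returns 1.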
Use a generator expression.
For example, in Python >= 2.6, if you know the item exists in a sublist:
idx = next(i for i,v in enumerate(lst) if item in v)
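If the item might not exist, next() accepts a default as its second argument, which avoids the StopIteration. A sketch (get_index is a hypothetical name); like the one-liner, it assumes every element supports "in" for the searched item, e.g. searching for an int would raise TypeError once it reaches the string "test".

```python
lst = [["a", "b", "c"], [4, 5, 6], "test"]

def get_index(lst, item):
    # the generator expression must be parenthesized when it is not the only argument
    return next((i for i, v in enumerate(lst) if item in v), None)

print(get_index(lst, 5))    # 1
print(get_index(lst, "z"))  # None
```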
Try using the list's built-in index() method on each sublist:
l = [[1,2,3], ['a', 'b', 'c']]
l[0].index(2) # index 1
l[1].index('b') # index 1
This generates a "ValueError" if the item does not exist.
def getIndex(lst, item):
    for n, i in enumerate(lst):
        if (isinstance(i, list) and item in i) or i == item:
            return n
>>> getIndex(lst, 'test')
2

Summing nested lists without recursion in python

Given a Python list whose elements are either integers or lists of integers (only we don't know how deep the nesting goes), how can we find the sum of each individual integer within the list?
It's fairly straightforward to find the sum of a list whose nesting only goes one level deep,
but what if the nesting goes two, three, or more levels deep?
I know the best approach is recursion, but this is a challenge wherein I have to do it without recursion.
Please help!!
L = [...]  # your nested list
while any(isinstance(i, list) for i in L):
    L = [j for i in L for j in (i if isinstance(i, list) else [i])]
result = sum(L)
Basically, you iterate over the outer list and unpack the first level of any inner lists, repeating until there are no inner lists left.
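Here is the same loop run on a concrete list (the sample data is made up). Each pass of the while loop removes exactly one level of nesting, so the number of passes equals the nesting depth, and each pass rebuilds the whole list.

```python
L = [1, [2, [3, [4, [5]]]], 6]

# pass 1: [1, 2, [3, [4, [5]]], 6]
# pass 2: [1, 2, 3, [4, [5]], 6]
# pass 3: [1, 2, 3, 4, [5], 6]
# pass 4: [1, 2, 3, 4, 5, 6]
while any(isinstance(i, list) for i in L):
    L = [j for i in L for j in (i if isinstance(i, list) else [i])]

result = sum(L)
print(result)  # 21
```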
One mostly-readable (and presumably performant, though I haven't tested it) way to iteratively flatten a list:
from collections import deque
def iterative_flatten(li):
    nested = deque(li)
    res = []
    dq = deque()
    while nested or dq:
        x = dq.pop() if dq else nested.popleft()
        if isinstance(x, list):
            dq.extend(reversed(x))  # pushed reversed, so the first child is popped next
        else:
            res.append(x)
    return res
Uses deques to avoid nasty O(n**2) behavior from list.pop(0). You can get equivalent results by making a reversed copy and popping from the end, but I find the code a little easier to follow if you just use deques and popleft. On a similar note, it's a line or two less code if you want to mutate the list in-place but way slower (for the same reason; popping from the head of the list is O(n) since every element in the underlying array has to be shifted).
>>> nested = [1,[[2,3],[[4,5],[6]]],[[[[7]]]]]
>>> iterative_flatten(nested)
[1, 2, 3, 4, 5, 6, 7]
>>> sum(iterative_flatten(nested))
28
After it's flat, summing is (hopefully) trivial :-)
Here is one solution:
from copy import deepcopy
def recursive_sum(int_list):
    # int_list = deepcopy(int_list)  # uncomment if you don't want to modify the original list
    ret = 0
    while len(int_list) > 0:
        elem = int_list.pop(0)
        if type(elem) == int:
            ret += elem
        elif type(elem) == list:
            int_list.extend(elem)
        else:
            raise ValueError
    return ret
testcase = [1,2,3,[4,5,[6,7,8,[9,10]]]]
print(recursive_sum(testcase))  # prints 55
Basically, it pops the first element of the input list. If it's an int, it adds it to the sum; if it's a list, it extends the input list with that list's elements. (Despite its name, the function is iterative, not recursive.)
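The same pop-from-the-front idea can use collections.deque, whose popleft() is O(1) where list.pop(0) is O(n), as another answer above points out. This is a sketch (iterative_sum is a hypothetical name); it works on a deque copy, so the caller's list is left unchanged without needing deepcopy.

```python
from collections import deque

def iterative_sum(int_list):
    """Sum all ints in an arbitrarily nested list without recursion."""
    pending = deque(int_list)  # shallow copy; sublists are only read, never mutated
    total = 0
    while pending:
        elem = pending.popleft()  # O(1), unlike list.pop(0)
        if isinstance(elem, list):
            pending.extend(elem)  # process the sublist's items later
        elif isinstance(elem, int):
            total += elem
        else:
            raise ValueError(f"unexpected element: {elem!r}")
    return total

testcase = [1, 2, 3, [4, 5, [6, 7, 8, [9, 10]]]]
print(iterative_sum(testcase))  # 55
```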

Converting strings to floats in a nested list

I have a list of lists which contain strings of numbers and words
I want to convert only those strings which are numbers to floats
aList= [ ["hi", "1.33"], ["bye", " 1.555"] ]
First, you need a function that does the "convert a string to float if possible, otherwise leave it as a string":
def floatify(s):
    try:
        return float(s)
    except ValueError:
        return s
Now, you can just call that on each value, either generating a new list, or modifying the old one in place.
Since you have a nested list, this means a nested iteration. You might want to start by doing it explicitly in two steps:
def floatify_list(lst):
    return [floatify(s) for s in lst]
def floatify_list_of_lists(nested_list):
    return [floatify_list(lst) for lst in nested_list]
You can of course combine it into one function just by making floatify_list a local function:
def floatify_list_of_lists(nested_list):
    def floatify_list(lst):
        return [floatify(s) for s in lst]
    return [floatify_list(lst) for lst in nested_list]
You could also do it by substituting the inner expression in place of the function call. If you can't figure out how to do that yourself, I would recommend not doing it, because you're unlikely to understand it (complex nested list comprehensions are hard enough for experts to understand), but if you must:
def floatify_list_of_lists(nested_list):
    return [[floatify(s) for s in lst] for lst in nested_list]
Or, if you prefer your Python to look like badly-disguised Haskell:
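from functools import partial
def floatify_list_of_lists(nested_list):
    # in Python 3, map() is lazy, so both levels are materialized with list()
    return list(map(list, map(partial(map, floatify), nested_list)))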
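Applied to the list from the question, a sketch of both options mentioned above: a new list, and modifying the old one in place via slice assignment. Two things worth knowing: float() tolerates surrounding whitespace, so " 1.555" converts, and it also accepts words like "nan" or "inf", which floatify would therefore turn into floats.

```python
def floatify(s):
    try:
        return float(s)
    except ValueError:
        return s

def floatify_list_of_lists(nested_list):
    return [[floatify(s) for s in lst] for lst in nested_list]

aList = [["hi", "1.33"], ["bye", " 1.555"]]

converted = floatify_list_of_lists(aList)  # new list; aList is unchanged here
print(converted)  # [['hi', 1.33], ['bye', 1.555]]

# in-place variant: slice assignment replaces each inner list's contents
for lst in aList:
    lst[:] = [floatify(s) for s in lst]
```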

Python: check if value is in a list no matter the CaSE

I want to check if a value is in a list, no matter what the case of the letters are, and I need to do it efficiently.
This is what I have:
if val in list:
But I want it to ignore case
check = "asdf"
checkLower = check.lower()
print(any(checkLower == val.lower() for val in ["qwert", "AsDf"]))
# prints True
This uses the any() function. It's nice because you aren't recreating the list in lowercase: it iterates over the list and stops as soon as it finds a match.
Demo : http://codepad.org/dH5DSGLP
If you know that your values are all of type str, you can try this:
if val.lower() in map(str.lower, list):
...Or, for unicode values in Python 2:
if val.lower() in map(unicode.lower, list):
If you really have just a list of the values, the best you can do is something like
if val.lower() in [x.lower() for x in list]: ...
but it would probably be better to maintain, say, a set or dict whose keys are lowercase versions of the values in the list; that way you won't need to keep iterating over (potentially) the whole list.
Incidentally, using list as a variable name is poor style, because list is also the name of one of Python's built-in types. You're liable to find yourself trying to call the list builtin function (which turns things into lists) and getting confused because your list variable isn't callable. Or, conversely, trying to use your list variable somewhere where it happens to be out of scope and getting confused because you can't index into the list builtin.
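A sketch of the dict suggestion (the sample values and the name contains_ignore_case are made up): build the mapping once, keyed by the lowercased value, and look up the original spelling when you need it. If two values differ only in case, the later one wins. For non-ASCII text, str.casefold() is a more aggressive alternative to lower(); for example, 'ß'.casefold() is 'ss'.

```python
values = ["qwert", "AsDf", "Hello"]

# build once: lowercased key -> original spelling
by_lower = {v.lower(): v for v in values}

def contains_ignore_case(val):
    return val.lower() in by_lower  # average O(1) dict lookup, no list scan

print(contains_ignore_case("ASDF"))  # True
print(by_lower["asdf"])              # AsDf
```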
You can lower the values and check them:
>>> val
'CaSe'
>>> l
['caSe', 'bar']
>>> val in l
False
>>> val.lower() in (i.lower() for i in l)
True
items = ['asdf', 'Asdf', 'asdF', 'asjdflk', 'asjdklflf']
itemset = set(i.lower() for i in items)
val = 'ASDF'
if val.lower() in itemset:  # O(1)
    print('wherever you go, there you are')
