Is there a way to compare all 2-item combinations of more than 2 lists?
Let's say there is an object:
import random

class obj:
    def __init__(self):
        self.name = ...  # some name
        self.number = random.randint(0, 10)  # assuming a random int was intended here

    def equals(self, other):
        return self.number == other.number
list1,list2,list3....listX - all these lists contain instances of class obj
I want to compare all 2-item combinations from these lists and return equal objects.
So if there is an obj in list2 which obj.number attribute is 5 and obj in list8 which has obj.number 5, it will be returned.
For two lists the comparison would be simple:
for obj1 in list1:
    for obj2 in list2:
        if obj1.equals(obj2):
            print obj1, obj2
But I don't know how to make this comparison for more lists of objects.
Do you have any advice?
As you might know, with X lists the time complexity will go up to O(n^X), which is far from optimal (assuming all lists have the same length n).
Now it all depends on what you actually want as output. It seems to me that you want to find objects that are present in multiple lists.
One way to do this more efficiently is to use a dictionary (hashmap) and iterate through every list, hashing objects based on their self.number.
This will result in something like: {1: [obj1], 2: [obj2, obj3], 3: [obj4], ...}, where the keys are the numbers of the objects and the values are the objects that have these values as number.
By iterating over this dictionary and only considering entries whose list has two or more elements, you end up with the objects that are equal.
Here the time complexity is O(n*X), which is effectively linear in the total number of objects.
To illustrate this, I've created a short simple example that uses 2 lists:
from collections import defaultdict

class Obj():
    def __init__(self, value):
        self.number = value

def find_equals(list1, list2):
    d = defaultdict(list)
    for obj1 in list1:
        d[obj1.number].append(obj1)
    for obj2 in list2:
        d[obj2.number].append(obj2)
    return [d[i] for i in d if len(d[i]) >= 2]

def test():
    l1 = [Obj(1), Obj(2), Obj(3), Obj(4)]
    l2 = [Obj(5), Obj(2), Obj(3), Obj(6)]
    print find_equals(l1, l2)

test()
It can probably be optimised with nifty python constructs, but it shows the idea behind it.
The output is:
[[<__main__.Obj instance at 0x103278440>, <__main__.Obj instance at 0x103278560>], [<__main__.Obj instance at 0x103278488>, <__main__.Obj instance at 0x1032785a8>]]
These are the objects with the numbers 2 and 3 that were used in the test sample.
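The same idea generalizes to any number of lists. Here is a sketch of that generalization (mine, not part of the original answer; the name find_equals_many is just illustrative):

from collections import defaultdict

def find_equals_many(*lists):
    # Group every object from every list by its number attribute.
    d = defaultdict(list)
    for lst in lists:
        for obj in lst:
            d[obj.number].append(obj)
    # Any bucket holding two or more objects is a group of equal objects.
    return [group for group in d.values() if len(group) >= 2]

Called as find_equals_many(list1, list2, list3, ...), it visits every object exactly once, which matches the O(n*X) bound discussed above.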
A (very) simple approach would be to get the intersection of the lists of objects.
To do that, you have to make your object hashable (and give it a matching __eq__, so that two objects with the same number compare equal), which lets you build a set from each list of objects.
def __hash__(self):
    return self.number
Then, to check multiple lists, you simply take the set intersection:
x = [Obj(1), Obj(3), Obj(8), Obj(10), Obj(3)]
y = [Obj(2), Obj(9), Obj(10), Obj(3)]
intersection = set(x) & set(y)  # -> returns {Obj(3), Obj(10)}
This implementation has a worst-case complexity of O((n - 1) * L), where L is the maximum set length and n is the number of sets.
So, in terms of complexity, I think DJanssens's answer is faster.
But if performance is not the problem (e.g. you have small lists etc.), I think it's way more elegant to be able to write:
def intersect(*lists):
    return set.intersection(*map(set, lists))
or the same thing in lambda notation:
intersect = lambda *lists: set.intersection(*map(set, lists))
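Putting the pieces together, a minimal self-contained sketch might look like this (my own example, not from the answers; it assumes Obj defines both __hash__ and __eq__ on number, as noted above):

class Obj:
    def __init__(self, number):
        self.number = number

    def __hash__(self):
        return hash(self.number)

    def __eq__(self, other):
        return isinstance(other, Obj) and self.number == other.number

    def __repr__(self):
        return "Obj(%d)" % self.number

def intersect(*lists):
    return set.intersection(*map(set, lists))

x = [Obj(1), Obj(3), Obj(8), Obj(10), Obj(3)]
y = [Obj(2), Obj(9), Obj(10), Obj(3)]
z = [Obj(3), Obj(7), Obj(10)]

print(intersect(x, y, z))  # {Obj(3), Obj(10)} (set order may vary)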
Related
I have 2D list (matrix) of objects with one attribute as an integer (or float).
I need to perform some operations on the matrix of these integer attributes in such a way that the objects in the first list change as well.
But when I make a 2D list of those attributes, they are not copied by reference. Therefore, changing this list of integers makes no change to the list of objects.
Is there some simple way to pass these integers to my list by reference?
I don't want to always compute the list and rewrite all the object attributes, as it is too time-consuming.
Code for example:
class A:
    def __init__(self, value):
        self.value = value

list_of_objects = []
list_of_values = []
for i in range(2):
    list_of_objects.append([])
    list_of_values.append([])
    for j in range(2):
        item = A(2)
        list_of_objects[i].append(item)
        list_of_values[i].append(item.value)

print(list_of_values)  # getting [[2,2],[2,2]]

list_of_values[1][1] *= 3
print(list_of_objects[1][1].value)  # getting 2, want 6
This is just a simple example. In fact, the thing that I want to do is matrix convolution.
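Not from this thread, but a short sketch of why the integer copies don't track and what does: Python ints are immutable, so list_of_values ends up holding independent int objects, while a matrix that holds the objects themselves reflects every change made through .value (the names matrix and value_matrix below are purely illustrative):

class A:
    def __init__(self, value):
        self.value = value

# A 2x2 matrix of objects.
matrix = [[A(2) for _ in range(2)] for _ in range(2)]

# Writing through the object is visible everywhere the object is referenced.
matrix[1][1].value *= 3
print(matrix[1][1].value)    # 6

# A plain list of the int attributes is a snapshot of copies, not references.
value_matrix = [[obj.value for obj in row] for row in matrix]
value_matrix[0][0] *= 3
print(matrix[0][0].value)    # still 2: only the copied int changed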
I have a program which reads in Python objects one by one (this is fixed) and needs to remove duplicate objects. The program will output a list of unique objects.
Pseudo-code is similar to this:
1. Create an empty list to store unique object and return at the end
2. Read in a single object
3. If the identical object is not in the list, add to the list
4. Repeat 2 and 3 until no more objects to read, then terminate and return the list (and the number of duplicate objects that were removed).
Actual code uses set operation to check for duplicates:
#!/usr/bin/python
import MyObject
import pickle

numDupRemoved = 0
uniqueObjects = set()

with open(inputFile, 'rb') as fileIn:
    while 1:
        try:
            thisObject = pickle.load(fileIn)
            if thisObject in uniqueObjects:
                numDupRemoved += 1
                continue
            else:
                uniqueObjects.add(thisObject)
        except EOFError:
            break

print("Number of duplicate objects removed: %d" % numDupRemoved)
return list(uniqueObjects)
The (simplified) object looks like this (note that all values are integers, so we don't need to worry about floating point precision errors):
#!/usr/bin/python
class MyObject:
    def __init__(self, attr1, attr2, attr3):
        self.attribute1 = attr1  # List of ints
        self.attribute2 = attr2  # List of lists (each list is a list of ints)
        self.attribute3 = attr3  # List of ints

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return (self.attribute1, self.attribute2, self.attribute3) == (other.attribute1, other.attribute2, other.attribute3)

    def __hash__(self):
        return self.generateHash()

    def generateHash(self):
        # Convert lists to tuples
        attribute1_tuple = tuple(self.attribute1)
        # Since attribute2 is a list of lists, convert it to a tuple of tuples
        attribute2_tuple = []
        for sublist in self.attribute2:
            attribute2_tuple.append(tuple(sublist))
        attribute2_tuple = tuple(attribute2_tuple)
        attribute3_tuple = tuple(self.attribute3)
        return hash((attribute1_tuple, attribute2_tuple, attribute3_tuple))
However, I now need to keep track of duplicates by an individual attribute or a subset of attributes of MyObject. That is, if the previous code was only removing duplicates in the darker blue region of the diagram below (where two objects are considered duplicates if all 3 attributes are identical), we would now like to:
1. Remove duplicates by a subset of attributes (attributes 1 and 2) AND/OR an individual attribute (attribute 3)
2. Be able to track 3 disjoint regions of the diagram
I have created two more objects to do this:
#!/usr/bin/python
class MyObject_sub1:
    def __init__(self, attr1, attr2):
        self.attribute1 = attr1  # List of ints
        self.attribute2 = attr2  # List of lists (each list is a list of ints)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return (self.attribute1, self.attribute2) == (other.attribute1, other.attribute2)

    def __hash__(self):
        return self.generateHash()

    def generateHash(self):
        # Convert lists to tuples
        attribute1_tuple = tuple(self.attribute1)
        # Since attribute2 is a list of lists, convert it to a tuple of tuples
        attribute2_tuple = []
        for sublist in self.attribute2:
            attribute2_tuple.append(tuple(sublist))
        attribute2_tuple = tuple(attribute2_tuple)
        return hash((attribute1_tuple, attribute2_tuple))
and
#!/usr/bin/python
class MyObject_sub2:
    def __init__(self, attr3):
        self.attribute3 = attr3  # List of ints

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.attribute3 == other.attribute3

    def __hash__(self):
        return hash(tuple(self.attribute3))
Duplicate removal code is updated as below:
#!/usr/bin/python
import MyObject
import MyObject_sub1
import MyObject_sub2
import pickle

# counters
totalNumDupRemoved = 0
numDupRemoved_att1Att2Only = 0
numDupRemoved_allAtts = 0
numDupRemoved_att3Only = 0

# sets for duplicate removal purposes
uniqueObjects_att1Att2Only = set()
uniqueObjects_allAtts = set()  # Intersection part in the diagram
uniqueObjects_att3Only = set()

with open(inputFile, 'rb') as fileIn:
    while 1:
        try:
            thisObject = pickle.load(fileIn)
            # I will omit how thisObject_sub1 (MyObject_sub1) and thisObject_sub2 (MyObject_sub2) are created for brevity
            if thisObject_sub1 in uniqueObjects_att1Att2Only or thisObject_sub2 in uniqueObjects_att3Only:
                totalNumDupRemoved += 1
                if thisObject in uniqueObjects_allAtts:
                    numDupRemoved_allAtts += 1
                elif thisObject_sub1 in uniqueObjects_att1Att2Only:
                    numDupRemoved_att1Att2Only += 1
                else:
                    numDupRemoved_att3Only += 1
                continue
            else:
                uniqueObjects_att1Att2Only.add(thisObject_sub1)
                uniqueObjects_allAtts.add(thisObject)  # Intersection part in the diagram
                uniqueObjects_att3Only.add(thisObject_sub2)
        except EOFError:
            break

print("Total number of duplicates removed: %d" % totalNumDupRemoved)
print("Number of duplicates where all attributes are identical: %d" % numDupRemoved_allAtts)
print("Number of duplicates where attributes 1 and 2 are identical: %d" % numDupRemoved_att1Att2Only)
print("Number of duplicates where only attribute 3 is identical: %d" % numDupRemoved_att3Only)
return list(uniqueObjects_allAtts)
What's been driving me insane is that "numDupRemoved_allAtts" from the second program does not match "numDupRemoved" from the first program.
For example, both programs read over the same file containing about 80,000 total objects, and the outputs were vastly different:
First program output
Number of duplicate objects removed: 47,742 (which should be the intersecting part of the diagram)
Second program output
Total number of duplicates removed: 66,648
Number of duplicates where all attributes are identical: 18,137 (intersection of diagram)
Number of duplicates where attributes 1 and 2 are identical: 46,121 (left disjoint set of diagram)
Number of duplicates where only attribute 3 are identical: 2,390 (right disjoint set of diagram)
Note that before trying multiple Python objects (MyObject_sub1 and MyObject_sub2) and set operations, I also tried using tuple equality (checking equality of tuples of individual attributes or subsets of attributes) for duplicate checking, but the numbers still didn't match up.
Am I missing some fundamental python concepts here? What would be causing this error?
Any help would be greatly appreciated.
Example: if the first processed object has attributes (1, 2, 3) and the next has (1, 2, 4), then in the first variant both are added as unique (and recognized later).
In the second variant, the first object would be recorded in uniqueObjects_att1Att2Only (and the other sets). When the second object arrives, the check
if thisObject_sub1 in uniqueObjects_att1Att2Only or thisObject_sub2 in uniqueObjects_att3Only:
is true, and the else part that records to uniqueObjects_allAtts isn't executed. This means that (1, 2, 4) will never be added to uniqueObjects_allAtts and will never increment numDupRemoved_allAtts, regardless of how often it appears.
Solution: Let the duplicate detection for each set happen independently one after another.
For recording totalNumDupRemoved, create a flag which is set to True when one of the duplicate detections triggers, and increment totalNumDupRemoved if the flag is true.
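A minimal sketch of what that loop body could look like, in my own words and reusing the asker's variable names (this code is not from the answer itself):

while 1:
    try:
        thisObject = pickle.load(fileIn)
        # thisObject_sub1 / thisObject_sub2 are built from thisObject as before (omitted)
        isDuplicate = False

        # Each set performs its own duplicate detection, independently of the others.
        if thisObject in uniqueObjects_allAtts:
            numDupRemoved_allAtts += 1
            isDuplicate = True
        else:
            uniqueObjects_allAtts.add(thisObject)

        if thisObject_sub1 in uniqueObjects_att1Att2Only:
            numDupRemoved_att1Att2Only += 1
            isDuplicate = True
        else:
            uniqueObjects_att1Att2Only.add(thisObject_sub1)

        if thisObject_sub2 in uniqueObjects_att3Only:
            numDupRemoved_att3Only += 1
            isDuplicate = True
        else:
            uniqueObjects_att3Only.add(thisObject_sub2)

        if isDuplicate:
            totalNumDupRemoved += 1
    except EOFError:
        break

With this structure every object is always offered to all three sets, so uniqueObjects_allAtts (and numDupRemoved_allAtts) stays consistent with the first program.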
I came across this question in a very specific context but I soon realized that it has a quite general relevance.
FYI: I'm getting data from a framework, and at one point I have transformed it into a list of unordered pairs (it could be a list of lists or tuples of any size as well, but at the moment I have 100% pairs). In my case these pairs represent relationships between data objects, and I want to refine my data.
I have a list of unordered tuples and want a list of objects, or in this case a dict of dicts. If the same letter indicates the same class and differing numbers indicate different instances, I want to accomplish this transformation:
[(a1, x1), (x2, a2), (y1, a2), (y1, a1)] -> {a1:{"y":y1,"x":x1},a2:{"y":y1,"x":x2}}
Note that there can be many "a"s connected to the same "x" or "y", but every "a" has at most one "x" and one "y". I can rely on neither the order of the tuples nor the order of the tuples' elements (because the framework does not distinguish between "a" and "x"), and I obviously don't care about the order of elements in my dicts - I just need the proper relations. There are many other pairs I don't care about, and they can contain "a", "x" or "y" elements as well.
So the main question is "How to iterate over nested data when there is no reliable order but a need of accessing and checking all elements of the lowest level?"
I tried it in several ways but they don't seem right. For simplicity I just check for A-X pairs here:
def first_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        if pair[0].__class__ is A and pair[1].__class__ is X:
            result[pair[0]] = {"X": pair[1]}
        if pair[0].__class__ is X and pair[1].__class__ is A:
            result[pair[1]] = {"X": pair[0]}
    return result
def second_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for index, item in enumerate(pair):
            if item.__class__ is A:
                other_index = (index + 1) % 2
                if pair[other_index].__class__ is X:
                    result[item] = {"X": pair[other_index]}
    return result
def third_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for item in pair:
            if item.__class__ is A:
                for any_item in pair:
                    if any_item.__class__ is X:
                        result[item] = {"X": any_item}
    return result
The third draft actually works for sublists of any size and gets rid of any non-Pythonic integer indexing, but iterating over the same list while already iterating over it? And quintuple nesting for just one productive line of code? That does not seem right to me, and I learned "When there is a problem related to iteration in Python and you don't know a good solution - there is a great solution in itertools!" - I just didn't find one.
Does someone know a builtin that can help me, or simply a better way to implement my methods?
You can do something like this with strings:
l = [('a1', 'x1', 'z3'), ('x2', 'a2'), ('y1', 'a2'), ('y1', 'a1')]
res = {}
for tup in l:
    main_class = ""
    sub_classes = ""
    for item in tup:
        if item.startswith('a'):
            main_class = item
            sub_classes = list(tup)
            sub_classes.remove(main_class)
            if not main_class in res:
                res[main_class] = {}
    for item in sub_classes:
        res[main_class][item[0]] = item[-1]
If your objects aren't strings, you just need to change if item.startswith('a'): to something that determines whether that item should be the key or not.
This also handles tuples longer than two items. It iterates over each tuple, finds the "main class", and then removes it from a list version of the tuple (so that the new list contains all the sub classes).
It looks like Ned Batchelder (who said that every time one has a problem with iterables and doesn't think there is a nice solution in Python, there is a solution in itertools) was right. I finally found a solution I overlooked last time: the permutations function.
from itertools import permutations

def final_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for permutation in permutations(pair):
            if permutation[0].__class__ is A:
                my_a = permutation[0]
                if permutation[1].__class__ is X:
                    my_x = permutation[1]
                    if my_a not in result:
                        result[my_a] = {}
                    result[my_a]["key for X"] = my_x
    return result
I still have quintuple nesting because I added a check for whether the key already exists (so my original drafts would have sextuple nesting and two productive lines of code), but I got rid of the double iteration over the same iterable and have both minimal index usage and the possibility of working with triplets in the future.
One could avoid the assignments, but I prefer "my_a" over permutation[0].
I want to make a condition where all selected variables are not equal.
My solution so far is to compare every pair, which doesn't scale well:
if A!=B and A!=C and B!=C:
I want to do the same check for multiple variables, say five or more, and it gets quite confusing with that many. What can I do to make it simpler?
Create a set and check whether the number of elements in the set is the same as the number of variables in the list that you passed into it:
>>> variables = [a, b, c, d, e]
>>> if len(set(variables)) == len(variables):
...     print("All variables are different")
A set doesn't have duplicate elements, so if the set you create has the same number of elements as the original list, you know all the elements are different from each other.
If you can hash your variables (and, uh, your variables have a meaningful __hash__), use a set.
def check_all_unique(li):
    unique = set()
    for i in li:
        if i in unique: return False  # hey I've seen you before...
        unique.add(i)
    return True  # nope, saw no one twice.
O(n) worst case. (And yes, I'm aware that you can also len(li) == len(set(li)), but this variant returns early if a match is found)
If you can't hash your values (for whatever reason) but can meaningfully compare them:
def check_all_unique(li):
    li.sort()
    for i in range(1, len(li)):
        if li[i-1] == li[i]: return False
    return True
O(nlogn), because sorting. Basically, sort everything, and compare pairwise. If two things are equal, they should have sorted next to each other. (If, for some reason, your __cmp__ doesn't sort things that are the same next to each other, 1. wut and 2. please continue to the next method.)
And if ne is the only operator you have....
import operator
import itertools

li = #a list containing all the variables I must check
if all(operator.ne(*i) for i in itertools.combinations(li, 2)):
    #do something
I'm basically using itertools.combinations to pair off all the variables, and then using operator.ne to check for not-equalness. This has a worst-case time complexity of O(n^2), although it should still short-circuit (because generators, and all is lazy). If you are absolutely sure that ne and eq are opposites, you can use operator.eq and any instead.
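For completeness, that eq/any variant might look like this (my sketch, assuming ne and eq really are exact opposites for your objects):

import operator
import itertools

li = [1, 2, 3, 4]  # example values
if not any(operator.eq(*pair) for pair in itertools.combinations(li, 2)):
    print("all different")  # no pair compared equal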
Addendum: Vincent wrote a much more readable version of the itertools variant that looks like
import itertools

lst = #a list containing all the variables I must check
if all(a != b for a, b in itertools.combinations(lst, 2)):
    #do something
Addendum 2: Uh, for sufficiently large datasets, the sorting variant should possibly use heapq. Still would be O(nlogn) worst case, but O(n) best case. It'd be something like
import heapq

def check_all_unique(li):
    heapq.heapify(li)  # O(n), compared to sorting's O(nlogn)
    prev = heapq.heappop(li)
    for _ in range(len(li)):  # O(n)
        current = heapq.heappop(li)  # O(logn)
        if current == prev: return False
        prev = current
    return True
Put the values into a container type. Then just loop through the container, comparing each value. It would take about O(n^2) time.
pseudo code:
a[0] = A; a[1] = B ... a[n];
for i = 0 to n do
    for j = i + 1 to n do
        if a[i] == a[j]
            condition failed
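A Python sketch of that pairwise check (not part of the original answer; all_unique is an illustrative name), keeping the same O(n^2) nested-loop structure:

def all_unique(values):
    # Compare every element with every later element.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return False  # condition failed
    return True

print(all_unique([1, 2, 3]))  # True
print(all_unique([1, 2, 1]))  # False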
You can enumerate a list and check that all values are the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if all(a.index(v) == i for i, v in enumerate(a)):
    print "all elements are unique"
This allows for short-circuiting once the first duplicate is detected due to the behaviour of Python's all() function.
Or equivalently, enumerate a list and check if there are any values which are not the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if not any(a.index(v) != i for i, v in enumerate(a)):
    print "all elements are unique"
How do I get the number of elements in the list items?
items = ["apple", "orange", "banana"]
# There are 3 items.
The len() function can be used with several different types in Python - both built-in types and library types. For example:
>>> len([1, 2, 3])
3
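For instance, it behaves the same way on other built-in containers (a quick illustrative session, not from the original answer):

>>> len("hello")
5
>>> len((1, 2))
2
>>> len({"a": 1, "b": 2})
2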
To find the number of elements in a list, use the builtin function len:
items = []
items.append("apple")
items.append("orange")
items.append("banana")
And now:
len(items)
returns 3.
Explanation
Everything in Python is an object, including lists. All objects have a header of some sort in the C implementation.
Lists and other similar builtin objects with a "size" in Python, in particular, have an attribute called ob_size, where the number of elements in the object is cached. So checking the number of objects in a list is very fast.
But if you're checking if list size is zero or not, don't use len - instead, put the list in a boolean context - it is treated as False if empty, and True if non-empty.
From the docs
len(s)
Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
len is implemented with __len__, from the data model docs:
object.__len__(self)
Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn't define a __nonzero__() [in Python 2, or __bool__() in Python 3] method and whose __len__() method returns zero is considered to be false in a Boolean context.
And we can also see that __len__ is a method of lists:
items.__len__()
returns 3.
Builtin types you can get the len (length) of
And in fact we see we can get this information for all of the described types:
>>> all(hasattr(cls, '__len__') for cls in (str, bytes, tuple, list,
...                                         range, dict, set, frozenset))
True
Do not use len to test for an empty or nonempty list
To test for a specific length, of course, simply test for equality:
if len(items) == required_length:
...
But there's a special case for testing for a zero length list or the inverse. In that case, do not test for equality.
Also, do not do:
if len(items):
...
Instead, simply do:
if items: # Then we have some items, not empty!
...
or
if not items: # Then we have an empty list!
...
I explain why here, but in short: if items or if not items is both more readable and more performant.
While this may not be useful, since it would make a lot more sense as "out of the box" functionality, a fairly simple hack would be to build a class with a length property:
class slist(list):
    @property
    def length(self):
        return len(self)
You can use it like so:
>>> l = slist(range(10))
>>> l.length
10
>>> print l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Essentially, it's exactly identical to a list object, with the added benefit of having an OOP-friendly length property.
As always, your mileage may vary.
Besides len you can also use operator.length_hint (requires Python 3.4+). For a normal list both are equivalent, but length_hint makes it possible to get the length of a list-iterator, which could be useful in certain circumstances:
>>> from operator import length_hint
>>> l = ["apple", "orange", "banana"]
>>> len(l)
3
>>> length_hint(l)
3
>>> list_iterator = iter(l)
>>> len(list_iterator)
TypeError: object of type 'list_iterator' has no len()
>>> length_hint(list_iterator)
3
But length_hint is by definition only a "hint", so most of the time len is better.
I've seen several answers suggesting accessing __len__. This is all right when dealing with built-in classes like list, but it could lead to problems with custom classes, because len (and length_hint) implement some safety checks. For example, both do not allow negative lengths or lengths that exceed a certain value (the sys.maxsize value). So it's always safer to use the len function instead of the __len__ method!
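To illustrate one of those safety checks (my own example, not from the answer), calling __len__ directly happily returns a nonsense value, while len() rejects it:

>>> class Broken:
...     def __len__(self):
...         return -1   # nonsense length
...
>>> Broken().__len__()  # the bad value comes straight back
-1
>>> len(Broken())       # len() applies its safety check
ValueError: __len__() should return >= 0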
And for completeness (primarily educational), it is possible without using the len() function. I would not condone this as a good option (DO NOT PROGRAM LIKE THIS IN PYTHON), but it serves a purpose for learning algorithms.
def count(list):  # list is an iterable object, but no type checking here!
    item_count = 0
    for item in list:
        item_count += 1
    return item_count

count([1, 2, 3, 4, 5])
(The list object must be iterable, implied by the for..in stanza.)
The lesson here for new programmers is: you can't get the number of items in a list without counting them at some point. The question becomes: when is a good time to count them? For example, high-performance code like the connect system call for sockets (written in C), connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);, does not calculate the length of elements (giving that responsibility to the calling code). Notice that the length of the address is passed along to save the step of counting it first? Another option: computationally, it might make sense to keep track of the number of items as you add them within the object that you pass. Mind that this takes up more space in memory. See Naftuli Kay's answer.
Example of keeping track of the length to improve performance while taking up more space in memory. Note that I never use the len() function because the length is tracked:
class MyList(object):
    def __init__(self):
        self._data = []
        self.length = 0  # length tracker that takes up memory but makes the length op O(1) time

    # the implicit iterator in a list class
    def __iter__(self):
        for elem in self._data:
            yield elem

    def add(self, elem):
        self._data.append(elem)
        self.length += 1

    def remove(self, elem):
        self._data.remove(elem)
        self.length -= 1

mylist = MyList()
mylist.add(1)
mylist.add(2)
mylist.add(3)
print(mylist.length)  # 3
mylist.remove(3)
print(mylist.length)  # 2
Answering your question with the same example items as given previously:
items = []
items.append("apple")
items.append("orange")
items.append("banana")
print items.__len__()
You can use the len() function to find the length of an iterable in Python.
my_list = [1, 2, 3, 4, 5]
print(len(my_list)) # OUTPUT: 5
The len() function also works with strings:
my_string = "hello"
print(len(my_string)) # OUTPUT: 5
So to conclude, len() works with any sequence or collection (or any sized object that defines __len__).
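As a small illustration of that last point (my own sketch, not from the answer; Basket is an arbitrary example class), any class that defines __len__ works with len() automatically:

class Basket:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

print(len(Basket(["apple", "orange", "banana"])))  # 3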
There is a built-in function called len() in Python which will help in these situations.
>>> a = [1,2,3,4,5,6]
>>> len(a) # Here the len() function counts the number of items in the list.
6
This works slightly differently in the case of a string: it counts the characters.
>>> a = "Hello"
>>> len(a)
5
There are three ways that you can find the number of elements in a list. I will compare the 3 methods with a performance analysis here.
Method 1: Using len()
items = []
items.append("apple")
items.append("orange")
items.append("banana")
print(len(items))
output:
3
Method 2: Using Naive Counter Method
items = []
items.append("apple")
items.append("orange")
items.append("banana")
counter = 0
for i in items:
    counter = counter + 1

print(counter)
output:
3
Method 3: Using length_hint()
items = []
items.append("apple")
items.append("orange")
items.append("banana")
from operator import length_hint
list_len_hint = length_hint(items)
print(list_len_hint)
output:
3
Performance Analysis – Naive vs len() vs length_hint()
Note: In order to compare them, I am changing the input into a large list that can give a good amount of time difference between the methods.
items = list(range(100000000))
# Performance Analysis
from operator import length_hint
import time
# Finding length of list
# using loop
# Initializing counter
start_time_naive = time.time()
counter = 0
for i in items:
    # incrementing counter
    counter = counter + 1
end_time_naive = str(time.time() - start_time_naive)
# Finding length of list
# using len()
start_time_len = time.time()
list_len = len(items)
end_time_len = str(time.time() - start_time_len)
# Finding length of list
# using length_hint()
start_time_hint = time.time()
list_len_hint = length_hint(items)
end_time_hint = str(time.time() - start_time_hint)
# Printing Times of each
print("Time taken using naive method is : " + end_time_naive)
print("Time taken using len() is : " + end_time_len)
print("Time taken using length_hint() is : " + end_time_hint)
Output:
Time taken using naive method is : 7.536813735961914
Time taken using len() is : 0.0
Time taken using length_hint() is : 0.0
Conclusion
It can be clearly seen that the time taken by the naive method is very large compared to the other two, hence len() and length_hint() are the better choices to use.
To get the number of elements in any sequence object, your go-to function in Python is len(), e.g.
a = range(1000) # range
b = 'abcdefghijklmnopqrstuvwxyz' # string
c = [10, 20, 30] # List
d = (30, 40, 50, 60, 70) # tuple
e = {11, 21, 31, 41} # set
The len() function works on all the above data types because they all define a length (and you can also iterate over them).
all_var = [a, b, c, d, e]  # All variables are stored to a list

for var in all_var:
    print(len(var))
A rough approximation of what the len() function does:
def len(iterable, /):
    total = 0
    for i in iterable:
        total += 1
    return total
Simple: use len(list) or list.__len__()
In terms of how len() actually works, this is its C implementation:
static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);
    if (res < 0) {
        assert(PyErr_Occurred());
        return NULL;
    }
    return PyLong_FromSsize_t(res);
}
Py_ssize_t is a signed integer type that bounds the maximum length an object can have. PyObject_Size() is a function that returns the size of an object. If it cannot determine the size of an object, it returns -1. In that case, this code block will be executed:
if (res < 0) {
    assert(PyErr_Occurred());
    return NULL;
}
And an exception is raised as a result. Otherwise, this code block will be executed:
return PyLong_FromSsize_t(res);
res, which is a C integer, is converted into a Python int (which is still called a "long" in the C code because Python 2 had two types for storing integers) and returned.
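At the Python level you can observe both branches of that C code (a quick illustrative session, not part of the original answer):

>>> len([1, 2, 3])  # PyObject_Size succeeds and the result comes back as a Python int
3
>>> type(len([1, 2, 3]))
<class 'int'>
>>> len(42)         # PyObject_Size fails for objects without a length, so an exception is raised
TypeError: object of type 'int' has no len()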