I've noticed it can make quite a substantial difference in speed whether you specify the protocol used by pickle.dumps via its argument or monkey-patch pickle.DEFAULT_PROTOCOL to the desired protocol version.
On Python 3.6, pickle.DEFAULT_PROTOCOL is 3 and
pickle.HIGHEST_PROTOCOL is 4.
For objects up to a certain length, it seems to be faster to set
DEFAULT_PROTOCOL to 4 instead of passing protocol=4 as an argument.
In my tests, for example, with pickle.DEFAULT_PROTOCOL set to 4, pickling
a list of length 1 by calling pickle.dumps(packet_list_1) takes 481 ns, while calling pickle.dumps(packet_list_1, protocol=4) takes 733 ns: a staggering ~52% speed penalty for passing the protocol explicitly instead of falling back to the default (which was set to 4 beforehand).
"""
pickle.DEFAULT_PROTOCOL = 4
pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):
For a list with length 1 it's 481 ns vs 733 ns (~52% penalty).
For a list with length 10 it's 763 ns vs 999 ns (~30% penalty).
For a list with length 100 it's 2.99 µs vs 3.21 µs (~7% penalty).
For a list with length 1000 it's 25.8 µs vs 26.2 µs (~1.5% penalty).
For a list with length 1_000_000 it's 32 ms vs 32.4 ms (~1.25% penalty).
"""
I've found this behaviour for instances, lists, dicts and arrays, which is
all I've tested so far. The effect diminishes with object size.
For dicts I noticed the effect turning into the opposite at some point, so that
for a dict of length 10**6 (with unique integer values) it's faster to explicitly
pass protocol=4 as an argument (269 ms) than to rely on the default set to 4 (286 ms).
"""
pickle.DEFAULT_PROTOCOL = 4
pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):
For a dict with length 1 it's 589 ns vs 811 ns (~38% penalty).
For a dict with length 10 it's 1.59 µs vs 1.81 µs (~14% penalty).
For a dict with length 100 it's 13.2 µs vs 12.9 µs (~2.3% improvement).
For a dict with length 1000 it's 128 µs vs 129 µs (~0.8% penalty).
For a dict with length 1_000_000 it's 306 ms vs 283 ms (~7.5% improvement).
"""
Glancing over the pickle source, nothing strikes my eye as a possible cause of
such variations.
How can this unexpected behaviour be explained?
Are there any caveats to setting pickle.DEFAULT_PROTOCOL instead of passing
protocol as an argument, to take advantage of the improved speed?
(Timed with IPython's timeit magic on Python 3.6.3, IPython 6.2.1, Windows 7)
Some example code dump:
# instances -------------------------------------------------------------
class Dummy: pass
dummy = Dummy()
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(dummy)
5.8 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.18 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(dummy)
5.74 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.24 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# lists -------------------------------------------------------------
packet_list_1 = [*range(1)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(packet_list_1)
476 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pickle.dumps(packet_list_1, protocol=4)
730 ns ± 2.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(packet_list_1)
481 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pickle.dumps(packet_list_1, protocol=4)
733 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_10 = [*range(10)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(packet_list_10)
714 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pickle.dumps(packet_list_10, protocol=4)
978 ns ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(packet_list_10)
763 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pickle.dumps(packet_list_10, protocol=4)
999 ns ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_100 = [*range(100)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(packet_list_100)
2.96 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(packet_list_100, protocol=4)
3.22 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(packet_list_100)
2.99 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(packet_list_100, protocol=4)
3.21 µs ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# --------------------------
packet_list_1000 = [*range(1000)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(packet_list_1000)
26 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit pickle.dumps(packet_list_1000, protocol=4)
26.4 µs ± 93.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(packet_list_1000)
25.8 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit pickle.dumps(packet_list_1000, protocol=4)
26.2 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
# --------------------------
packet_list_1m = [*range(10**6)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(packet_list_1m)
32 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pickle.dumps(packet_list_1m, protocol=4)
32.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(packet_list_1m)
32 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pickle.dumps(packet_list_1m, protocol=4)
32.4 ms ± 466 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
Let's reorganize your %timeit results by return value:
| DEFAULT_PROTOCOL | call | %timeit | returns |
|------------------+-----------------------------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------|
| 3 | pickle.dumps(dummy) | 5.8 µs ± 33.5 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.' |
| 4 | pickle.dumps(dummy) | 5.74 µs ± 18.8 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.' |
| 3 | pickle.dumps(dummy, protocol=4) | 6.18 µs ± 10.4 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.' |
| 4 | pickle.dumps(dummy, protocol=4) | 6.24 µs ± 26.7 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.' |
| 3 | pickle.dumps(packet_list_1) | 476 ns ± 1.01 ns | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.' |
| 4 | pickle.dumps(packet_list_1) | 481 ns ± 2.12 ns | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.' |
| 3 | pickle.dumps(packet_list_1, protocol=4) | 730 ns ± 2.22 ns | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
| 4 | pickle.dumps(packet_list_1, protocol=4) | 733 ns ± 2.94 ns | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
Notice how the %timeit results correspond well when we pair calls that give the same return value.
As you can see, the value of pickle.DEFAULT_PROTOCOL has no effect on the value returned by pickle.dumps.
If the protocol parameter is not specified, the default protocol is 3 no matter what the value of pickle.DEFAULT_PROTOCOL.
The reason is here:
# Use the faster _pickle if possible
try:
from _pickle import (
PickleError,
PicklingError,
UnpicklingError,
Pickler,
Unpickler,
dump,
dumps,
load,
loads
)
except ImportError:
Pickler, Unpickler = _Pickler, _Unpickler
dump, dumps, load, loads = _dump, _dumps, _load, _loads
The pickle module sets pickle.dumps to _pickle.dumps if it succeeds in importing _pickle, the compiled version of the pickle module.
The _pickle module uses protocol=3 by default. Only if Python fails to import _pickle is dumps set to the Python version:
def _dumps(obj, protocol=None, *, fix_imports=True):
f = io.BytesIO()
_Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
res = f.getvalue()
assert isinstance(res, bytes_types)
return res
Only the Python version, _dumps, is affected by the value of pickle.DEFAULT_PROTOCOL:
In [68]: pickle.DEFAULT_PROTOCOL = 3
In [70]: pickle._dumps(dummy)
Out[70]: b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'
In [71]: pickle.DEFAULT_PROTOCOL = 4
In [72]: pickle._dumps(dummy)
Out[72]: b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'
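A quick way to double-check this on your own build: a pickle stream starts with the PROTO opcode (\x80) followed by the protocol version byte, so inspecting byte 1 reveals which protocol was actually used. A minimal sketch, assuming a standard CPython 3.6 build with the C accelerator present:
import pickle

pickle.DEFAULT_PROTOCOL = 4

# Byte 1 is the protocol version written after the \x80 PROTO opcode.
print(pickle.dumps([])[1])              # 3: the C dumps ignores DEFAULT_PROTOCOL
print(pickle.dumps([], protocol=4)[1])  # 4: the explicit argument is honoured
print(pickle._dumps([])[1])             # 4: the pure-Python _dumps reads DEFAULT_PROTOCOL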
Related
I want to count how often a regex pattern (the preceding and following characters are needed to identify the pattern) occurs in multiple dataframe columns. I found a solution, but it seems a little slow. Is there a more sophisticated way?
| column_A            | column_B     | column_C            |
|---------------------+--------------+---------------------|
| Test • test abc     | winter • sun | snow rain blank     |
| blabla • summer abc | break • Data | test letter • stop. |
So far I created a solution which is slow:
print(df["column_A"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum() + df["column_B"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum() + df["column_C"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum())
str.count can be applied to the whole dataframe without hard-coding each column this way. Try
sum(df.apply(lambda x: x.str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum()))
I tried it with a 1000 × 1000 dataframe. Here is a benchmark for your reference.
%timeit sum(df.apply(lambda x: x.str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum()))
1.97 s ± 54.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
You can use a generator expression with re.search and reduce 938 µs to 26.7 µs. (Make sure you don't build a list; use a generator.)
res = sum(sum(True for item in df[col] if re.search("(?<=[A-Za-z]) • (?=[A-Za-z])", item))
          for col in ['column_A', 'column_B', 'column_C'])
print(res)
# 5
Benchmark:
%%timeit
sum(sum(True for item in df[col] if re.search("(?<=[A-Za-z]) • (?=[A-Za-z])", item)) for col in ['column_A', 'column_B','column_C'])
# 26 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
df["column_A"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum() + df["column_B"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum() + df["column_C"].str.count("(?<=[A-Za-z]) • (?=[A-Za-z])").sum()
# 938 µs ± 149 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
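Note that this generator-based approach counts cells containing at least one match, while str.count sums every occurrence; on the sample data each cell holds at most one separator, so both return 5. As a hedged variation (precompiling the pattern is my addition, not part of the original answer), hoisting re.compile out of the loop avoids re-parsing the pattern for every cell:
import re
import pandas as pd

# Sample frame reconstructed from the table above.
df = pd.DataFrame({
    "column_A": ["Test • test abc", "blabla • summer abc"],
    "column_B": ["winter • sun", "break • Data"],
    "column_C": ["snow rain blank", "test letter • stop."],
})

# Precompile once: a bullet flanked by letters on both sides.
pattern = re.compile(r"(?<=[A-Za-z]) • (?=[A-Za-z])")

res = sum(
    sum(1 for item in df[col] if pattern.search(item))
    for col in ["column_A", "column_B", "column_C"]
)
print(res)  # 5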
# --------------------------------------------------------------------#
The following code for getting the week number and year works:
import pandas as pd
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='w', periods=10), columns=['date'])
df['weekNo'] = df['date'].dt.isocalendar().week
df['year'] = df['date'].dt.year
date weekNo year
0 2021-12-05 48 2021
1 2021-12-12 49 2021
2 2021-12-19 50 2021
3 2021-12-26 51 2021
4 2022-01-02 52 2022
5 2022-01-09 1 2022
6 2022-01-16 2 2022
7 2022-01-23 3 2022
8 2022-01-30 4 2022
9 2022-02-06 5 2022
but,
df['weekYear'] = "%d/%d" % (df['date'].dt.isocalendar().week, df['date'].dt.year)
Gives the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_26440/999845293.py in <module>
----> 1 df['weekYear'] = "%d/%d" % (df['date'].dt.isocalendar().week, df['date'].dt.year)
TypeError: %d format: a number is required, not Series
I am accessing the week and year in a way that yields the Series of values, as shown in the first code snippet. Why doesn't that work when I want a formatted string? How do I rewrite the code in the second snippet to make it work? I don't want to create intermediate columns.
"Why doesn't that work when I want a formatted string?" The error is clear: '%d' expects a single number, not a pandas.Series.
Provided there is a format code for the value to be extracted, dt.strftime can be used.
This requires the 'date' column to be a datetime dtype, which can be done with pd.to_datetime. The column in the following example is already the correct dtype.
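If the column were read in as strings, a minimal conversion sketch (a no-op here, since the sample column is already datetime):
df['date'] = pd.to_datetime(df['date'])  # ensure datetime dtype so the .dt accessor works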
'%V': ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4.
'%Y': Year with century as a decimal number.
import pandas as pd
# sample data
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='w', periods=10), columns=['date'])
# add week number and year
df['weekYear'] = df.date.dt.strftime('%V/%Y')
# display(df)
date weekYear
0 2021-12-05 48/2021
1 2021-12-12 49/2021
2 2021-12-19 50/2021
3 2021-12-26 51/2021
4 2022-01-02 52/2022
5 2022-01-09 01/2022
6 2022-01-16 02/2022
7 2022-01-23 03/2022
8 2022-01-30 04/2022
9 2022-02-06 05/2022
Timing for 1M rows
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
%%timeit
df.date.dt.strftime('%V/%Y')
[out]: 3.74 s ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
You can just use:
df['weekYear'] = df['date'].dt.isocalendar().week.astype(str) + '/' + df['date'].dt.year.astype(str)
Or using pandas.Series.str.cat
df['weekYear'] = df['date'].dt.isocalendar().week.astype(str).str.cat(df['date'].dt.year.astype(str), sep='/')
Or using list comprehension
df['weekYear'] = [f"{week}/{year}" for week, year in zip(df['date'].dt.isocalendar().week, df['date'].dt.year)]
Timing for 1M rows
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
%%timeit
df['date'].dt.isocalendar().week.astype(str) + '/' + df['date'].dt.year.astype(str)
[out]: 886 ms ± 9.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df['date'].dt.isocalendar().week.astype(str).str.cat(df['date'].dt.year.astype(str), sep='/')
[out]: 965 ms ± 8.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
[f"{week}/{year}" for week, year in zip(df['date'].dt.isocalendar().week, df['date'].dt.year)]
[out]: 587 ms ± 7.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you want to keep the %-style formatting, you can use map to apply the formatting to every row. The .dt accessor is not needed, since inside the lambda you are working with each date itself, not a Series of dates. Also, isocalendar() returns a tuple whose second element is the week number:
df["date"] = pd.to_datetime(df["date"])
df['weekYear'] = df['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
Timing for 1M rows
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
%%timeit
df['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
[out]: 2.03 s ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
There are clearly a number of ways this can be solved, so a timing comparison is the best way to determine which is the "best" answer.
Here's a single implementation for anyone to run a timing analysis in Jupyter of all the current answers.
See this answer to modify the code to create a timing analysis plot with a varying number of rows.
See IPython: %timeit for the option descriptions.
import pandas as pd
# sample data with 60M rows
df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='s', periods=60000000), columns=['date'])
# functions
def test1(d):
return d.date.dt.strftime('%V/%Y')
def test2(d):
return d['date'].dt.isocalendar().week.astype(str) + '/' + d['date'].dt.year.astype(str)
def test3(d):
return d['date'].dt.isocalendar().week.astype(str).str.cat(d['date'].dt.year.astype(str), sep='/')
def test4(d):
return [f"{week}/{year}" for week, year in zip(d['date'].dt.isocalendar().week, d['date'].dt.year)]
def test5(d):
return d['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
t1 = %timeit -r2 -n1 -q -o test1(df)
t2 = %timeit -r2 -n1 -q -o test2(df)
t3 = %timeit -r2 -n1 -q -o test3(df)
t4 = %timeit -r2 -n1 -q -o test4(df)
t5 = %timeit -r2 -n1 -q -o test5(df)
print(f'test1 result: {t1}')
print(f'test2 result: {t2}')
print(f'test3 result: {t3}')
print(f'test4 result: {t4}')
print(f'test5 result: {t5}')
test1 result: 3min 45s ± 653 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
test2 result: 53.4 s ± 459 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
test3 result: 59.7 s ± 164 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
test4 result: 35.5 s ± 409 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
test5 result: 2min 2s ± 29.1 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
The question is: "Do I really have to optimize this by hand, or is there a better explanation for this incomprehensible comprehension?"
Thanks! And please don't downvote my question... even FORTRAN has been able to optimize nested loops since 1990, or earlier.
Look at the example.
dict_groups = [{'name': 'Новые Альбомы', 'gid': 4100014},
{'name': 'Synthpop [Futurepop, Retrowave, Electropop]', 'gid': 8564},
{'name': 'E:\\music\\leftfield', 'gid': 101522128},
{'name': 'Бренд одежды | MEDICINE', 'gid': 134709480},
{'name': 'Другая Музыка', 'gid': 35486626},
{'name': 'E:\\music\\trip-hop', 'gid': 27683540},
{'name': 'Depeche Mode', 'gid': 125927592}]
x = [{'gid': 35486626},{'gid': 134709480},{'gid': 27683540}]
I have to receive
rez = [{'name': 'Другая Музыка', 'gid': 35486626},
{'name': 'E:\\music\\trip-hop', 'gid': 27683540},
{'name': 'Бренд одежды | MEDICINE', 'gid': 134709480}]
One of the solutions is:
x_val = tuple(d["gid"] for d in x)
rez = [dict_el for dict_el in dict_groups if dict_el["gid"] in x_val]
with the timings:
%timeit x_val = tuple(d["gid"] for d in x)
1.55 µs ± 81.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [dict_el for dict_el in dict_groups if dict_el["gid"] in x_val]
2.19 µs ± 93.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The one-line nested comprehension solution gives:
%timeit [dict_el for dict_el in dict_groups if dict_el["gid"] in tuple(d["gid"] for d in x)]
11.9 µs ± 756 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is much slower! It looks like the expression tuple(d["gid"] for d in x) is evaluated on each iteration!
7 × 1.55 + 2.19 = 13.04 µs, which is close to the measured 11.9 µs...
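Following that observation, a minimal fix is to hoist the lookup out of the comprehension yourself; using a set instead of a tuple (my suggestion, not from the original question) also makes each membership test O(1):
# Build the lookup once, outside the comprehension; a set gives O(1) membership tests.
wanted = {d["gid"] for d in x}
rez = [dict_el for dict_el in dict_groups if dict_el["gid"] in wanted]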
I want to use TensorFlow Serving to load multiple models. If I mount a directory containing the model, loading everything is done in an instant, while loading it from a gs:// path takes around 10 seconds per model.
While researching the issue I discovered this is probably a TensorFlow issue and not a TensorFlow Serving issue, as loading the models in TensorFlow shows a huge difference as well:
[ins] In [22]: %timeit tf.saved_model.load('test/1')
3.88 s ± 719 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[ins] In [23]: %timeit tf.saved_model.load('gs://path/to/test/1')
30.6 s ± 2.66 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
It could then be that downloading the model (which is very small) is the slow part, but I tested this as well:
from google.cloud import storage  # missing import in the original snippet

def test_load():
    bucket_name = 'path'
    folder = 'test'
    delimiter = '/'
    file = 'to/test/1'
    bucket = storage.Client().get_bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=file, delimiter=delimiter)  # excluding folders inside the bucket
    for blob in blobs:
        print(blob.name)
        destination_uri = '{}/{}'.format(folder, blob.name)
        blob.download_to_filename(destination_uri)
[ins] In [31]: %timeit test_load()
541 ms ± 54.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Any idea what is happening here?
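One hedged workaround consistent with the numbers above: since fetching the files takes ~0.5 s while tf.saved_model.load on a gs:// path takes ~30 s, download the SavedModel to local disk first and load it from there. The paths are illustrative and assume the blobs end up under test/1:
import tensorflow as tf

# ~541 ms: copy the SavedModel files from GCS to local disk (see test_load above),
# then ~3.9 s: load locally instead of pointing tf.saved_model.load at gs://.
test_load()
model = tf.saved_model.load('test/1')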
bson.son.SON is used mainly in pymongo to get an ordered mapping (dict).
But Python already has an ordered dict in collections.OrderedDict.
I have read the docs of bson.son.SON. They say SON is similar to OrderedDict but don't mention the difference.
So what is the difference? When should we use SON and when should we use OrderedDict?
Currently, the slight difference between the two is that bson.son.SON remains backward-compatible with Python 2.7 and older versions.
Also, the claim that SON serializes faster than OrderedDict is no longer correct as of 2018.
The son module was added on Jan 8, 2009.
collections.OrderedDict (PEP 372) was added to Python on Mar 2, 2009.
While the difference in dates doesn't tell us which was released first, it is interesting to see that MongoDB had already implemented an ordered map for their use case. I guess they may have decided to keep maintaining it for backward compatibility instead of switching all SON references in their codebase to collections.OrderedDict.
In small experiments with both, you can easily see that collections.OrderedDict performs better than bson.son.SON.
In [1]: from bson.son import SON
from collections import OrderedDict
import copy
import json  # needed for the json.dumps timings below
print(set(dir(SON)) - set(dir(OrderedDict)))
{'__weakref__', 'iteritems', 'iterkeys', 'itervalues', '__module__', 'has_key', '__deepcopy__', 'to_dict'}
In [2]: %timeit s = SON([('a',2)]); z = copy.deepcopy(s)
14.3 µs ± 758 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [3]: %timeit s = OrderedDict([('a',2)]); z = copy.deepcopy(s)
7.54 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %timeit s = SON(data=[('a',2)]); z = json.dumps(s)
11.5 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit s = OrderedDict([('a',2)]); z = json.dumps(s)
5.35 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In answer to your question about when to use SON:
use SON if your software has to run on versions of Python older than 2.7.
If you can help it, use OrderedDict from the collections module.
You can also use a plain dict; dicts preserve insertion order as of Python 3.7.
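One concrete extra SON offers is to_dict(), visible in the dir() difference above: it recursively converts a SON, including nested ones, into plain dicts. A small sketch:
from bson.son import SON

s = SON([('a', 1), ('b', SON([('c', 2)]))])
print(s.to_dict())  # {'a': 1, 'b': {'c': 2}} -- nested SONs become plain dicts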