Sorting in Python is performed using the sorted()
function for iterable objects and the list.sort()
method for lists. Let’s take a detailed look at how this worked in older versions and how it works now.
Basics of Sorting
How to sort a Python list? To sort in ascending order, simply call the Python sorting function sorted()
, which returns a new sorted list:
sorted([5, 2, 3, 1, 4])
# Output: [1, 2, 3, 4, 5]
You can also use the Python list method list.sort()
to sort the list in place (and returns None
to avoid confusion). Generally, sorting a list in Python is less convenient than using sorted()
, but if you don’t need the original list, it will be slightly more efficient:
a = [5, 2, 3, 1, 4]
a.sort()
a
# Output: [1, 2, 3, 4, 5]
Note: In Python, returning None
and not returning anything is the same.
Another difference is that the method list.sort()
is only defined for lists, while the Python function sorted
works with all iterable objects. Roughly speaking, the Python function sort
sorts the list and stores it in sorted form, while the Python function sorted
creates a new sorted list without modifying the original.
sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
# Output: [1, 2, 3, 4, 5]
Note: When iterating over a dictionary in Python, it returns its keys. If you need their values or “key-value” pairs, use the methods dict.values()
and dict.items()
respectively.
Let’s take a look at the basic functions for sorting in Python.
Key Functions
Starting with Python 2.4, list.sort()
and sorted()
have a parameter key
that specifies a function to be called on each element before making comparisons. Here’s a case-insensitive string comparison:
sorted("This is a test string from Andrew".split(), key=str.lower)
# Output: ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
The key should be a function that takes a single argument and returns a key to use for sorting. It is faster because the key function is called once for each element.
You will often encounter code where a complex object is sorted by one of its indices. For example:
student_tuples = [
('john', 'A', 15),
('jane', 'B', 12),
('dave', 'B', 10),
]
sorted(student_tuples, key=lambda student: student[2]) # Sort by age
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The same method works for objects with named attributes:
class Student:
def __init__(self, name, grade, age):
self.name = name
self.grade = grade
self.age = age
def __repr__(self):
return repr((self.name, self.grade, self.age))
def weighted_grade(self):
return 'CBA'.index(self.grade) / self.age
student_objects = [
Student('john', 'A', 15),
Student('jane', 'B', 12),
Student('dave', 'B', 10),
]
sorted(student_objects, key=lambda student: student.age) # Sort by age
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Functions from the operator
Module
The key function examples shown above are so common that Python provides convenient functions to make things easier and faster. The operator
module includes itemgetter()
, attrgetter()
, and, starting with Python 2.6, methodcaller()
functions. With these, it becomes even easier:
from operator import itemgetter, attrgetter, methodcaller
sorted(student_tuples, key=itemgetter(2))
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
sorted(student_objects, key=attrgetter('age'))
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Operator functions allow for the use of multiple levels to sort Python arrays. Let’s sort students first by grade and then by age:
sorted(student_tuples, key=itemgetter(1, 2))
# Output: [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
sorted(student_objects, key=attrgetter('grade', 'age'))
# Output: [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
Using the methodcaller()
function to sort students by weighted grade:
[(student.name, student.weighted_grade()) for student in student_objects]
# Output: [('john', 0.13333333333333333), ('jane', 0.08333333333333333), ('dave', 0.1)]
sorted(student_objects, key=methodcaller('weighted_grade'))
# Output: [('jane', 'B', 12), ('dave', 'B', 10), ('john', 'A', 15)]
Sorting in Ascending and Descending Order in Python
Both list.sort()
and sorted()
have a parameter reverse
that accepts a Boolean value. It is used to sort in descending order. Let’s sort students by age in descending order:
sorted(student_tuples, key=itemgetter(2), reverse=True)
# Output: [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
sorted(student_objects, key=attrgetter('age'), reverse=True)
# Output: [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
Stability of Sorting and Complex Sorting in Python
Starting with Python 2.2, it is guaranteed that sorting will be stable: if several entries have the same key, their order will remain the same. For example:
data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
sorted(data, key=itemgetter(0))
# Output: [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
Notice that both blue
entries maintained their original order. This property allows for creating complex sorting by performing sequential sorts. We can then sort student data first by age in ascending order and then by grade in descending order, so that the data is sorted first by grade and then by age:
s = sorted(student_objects, key=attrgetter('age')) # Sort by secondary key
sorted(s, key=attrgetter('grade'), reverse=True) # Sort by primary key in reverse
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Sorting algorithms like Timsort can perform multiple sorts so efficiently because they can take advantage of any order already present in the dataset.
Decorate-Sort-Undecorate
First, the original list is filled with new values that control the order. Then the new list is sorted. Finally, the attached values are removed, leaving the sorted list with the original elements.
This is how student data can be sorted by their grade:
decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
decorated.sort()
[student for grade, i, student in decorated] # Undecorate
# Output: [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
This works because tuples are compared lexicographically, comparing the first elements first, and if they are the same, then the second elements, and so on.
It is not always necessary to include the index in the decorated list, but it has its advantages:
- Sorting is stable – if two elements have the same key, their order will not change.
- There is no need to compare the source elements, as the order of the decorated tuples will be determined maximally by the first two elements. For example, the original list may contain complex numbers that cannot be directly compared.
This idiom is also called the Schwartzian Transform, named after Randal Schwartz, who popularized it among Perl programmers.
For large lists and versions of Python lower than 2.4, “decorate-sort-undecorate” sorting would be the optimal way. For versions 2.4+, key functions provide the same functionality.
Using the cmp
Parameter
Python 2.x versions supported the cmp
parameter in sorting functions to handle custom comparison functions. However, this parameter was completely removed in Python 3.0. In Python 2.x, you could pass a function to sort()
that would be used to compare elements. This function should take two arguments and return a negative value for “less than,” a positive value for “greater than,” and zero if they are equal:
def numeric_compare(x, y):
return x - y
sorted([5, 2, 4, 1, 3], cmp=numeric_compare)
# Output: [1, 2, 3, 4, 5]
Sorting can be reversed:
def reverse_numeric(x, y):
return y - x
sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
# Output: [5, 4, 3, 2, 1]
When porting code from version 2.x to 3.x, you might need to convert custom comparison functions to key functions. The following wrapper makes this task easier in Python:
def cmp_to_key(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
return K
sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
# Output: [5, 4, 3, 2, 1]
In Python 2.7, the cmp_to_key()
function was added to the functools
module.
Maintaining Sorted Order
Python’s standard library does not include a module equivalent to C++ data types like set and map. Instead, Python delegates these functions to third-party libraries available in the Python Package Index. These libraries use various methods to maintain sorted order for list, dictionary, and set types. Using a special data structure to maintain order can avoid the very slow process of naive sorting (quadratic time complexity) when frequently editing and re-sorting the data. Here are some modules that implement sorted data types:
- SortedContainers: Pure-Python implementation of sorted list, dictionary, and set types that claims to match the speed of C implementations. It includes 100% code coverage tests and hours of stress tests, with a complete API reference, performance comparison, and contribution guidelines.
- rbtree: Fast implementation of dictionary and set types in C, using the red-black tree data structure.
- treap: Sorted dictionary implementation using a Cartesian tree, with performance enhanced by Cython.
- bintrees: Several tree-based dictionary and set types implemented in C, with the fastest implementations based on AVL and red-black trees. Extends the standard API to provide set operations for dictionaries.
- banyan: Fast implementation of dictionary and set types in C.
- skiplistcollections: Pure-Python implementation based on the skip list, offering a limited API for dictionary and set types.
- blist: Provides sorted list, dictionary, and set types written in Python and C, based on a B-tree implementation called “blist.”
Other Considerations
To sort while considering language, use locale.strxfrm()
as the key function or locale.strcoll()
as the comparison function. The reverse
parameter maintains the stability of the sort. This effect can be emulated without any parameters by using the built-in reversed()
function twice:
data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
assert sorted(data, reverse=True) == list(reversed(sorted(reversed(data))))
To create a default sorting order for a class, implement the relevant comparison methods:
class Student:
def __init__(self, name, grade, age):
self.name = name
self.grade = grade
self.age = age
Student.__eq__ = lambda self, other: self.age == other.age
Student.__ne__ = lambda self, other: self.age != other.age
Student.__lt__ = lambda self, other: self.age < other.age
Student.__le__ = lambda self, other: self.age <= other.age
Student.__gt__ = lambda self, other: self.age > other.age
Student.__ge__ = lambda self, other: self.age >= other.age
sorted(student_objects)
# Output: [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
For types that can be compared naturally, it’s recommended to define all six comparison operators. The functools.total_ordering
class decorator simplifies their implementation. The key function does not need access to the internal data of the objects being sorted; it can also access external resources. For example, if students’ grades are stored in a dictionary, they can be used to sort a separate list of student names:
students = ['dave', 'john', 'jane']
newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
sorted(students, key=newgrades.__getitem__)
# Output: ['jane', 'dave', 'john']