DISTINCT in SQL

Nov 12, 2011

How does an RDBMS execute a query that has the DISTINCT keyword? A common way to ensure the uniqueness of the returned rows is to sort them first; that is, to sort the result on all selected fields, so that duplicate rows end up next to each other and can be dropped in a single pass. Depending on the number and size of the fields SELECTed, this sorting can take a lot of time and need a lot of RAM. If the available RAM is not enough, the RDBMS will resort to using the disk, which is far slower.
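To make the sort-then-deduplicate idea concrete, here is a minimal Python sketch. It is only an illustration of the technique, not how any particular RDBMS implements DISTINCT; the function name `distinct_by_sorting` and the sample rows are made up for this example.

```python
from itertools import groupby


def distinct_by_sorting(rows):
    """Return the unique rows by sorting on all fields, then collapsing
    runs of identical neighbours (hypothetical helper, for illustration)."""
    # Tuples compare field by field, so this sorts on every field.
    ordered = sorted(rows)
    # After sorting, duplicates are adjacent; groupby yields one key per run.
    return [row for row, _ in groupby(ordered)]


rows = [
    ("alice", "en"),
    ("bob", "el"),
    ("alice", "en"),
    ("bob", "de"),
]

print(distinct_by_sorting(rows))
```

The sort has to compare and hold every field of every row, which is why wide rows make this step expensive.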

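So, when using DISTINCT, it is very important to SELECT as few fields as possible. The fewer and smaller the fields, the less data there is to compare and hold per row, which makes the sorting phase much faster and, additionally, requires less RAM. We had such an issue at Transifex.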
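As a rough illustration of the difference, the following sketch uses Python's built-in sqlite3 module with a hypothetical `translations` table (this is not the actual Transifex schema or query). It compares how much data takes part in deduplication when a large text field is selected versus when only the needed fields are. Note that the two queries are not equivalent: dropping a field changes which rows count as duplicates, so this only applies when the extra field is not actually needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations ("
    "id INTEGER PRIMARY KEY, project TEXT, language TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO translations (project, language, body) VALUES (?, ?, ?)",
    [
        ("docs", "el", "x" * 1000),
        ("docs", "el", "y" * 1000),
        ("docs", "de", "z" * 1000),
        ("site", "el", "w" * 1000),
    ],
)

# Wide: the large body field is part of every row that must be deduplicated.
wide = conn.execute(
    "SELECT DISTINCT project, language, body FROM translations"
).fetchall()

# Narrow: only the fields that are actually needed.
narrow = conn.execute(
    "SELECT DISTINCT project, language FROM translations "
    "ORDER BY project, language"
).fetchall()

# Approximate amount of text the database had to compare in each case.
wide_chars = sum(len(field) for row in wide for field in row)
narrow_chars = sum(len(field) for row in narrow for field in row)

print(len(wide), wide_chars)
print(len(narrow), narrow_chars)
```

The character counts here are only a proxy for the memory a real sort would need, but they show how quickly a single wide field dominates the cost.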