Puzzler 24

Double Trouble

One of the differences between floating-point and integer arithmetic is the existence of the special floating-point value NaN. By and large, this value behaves predictably, but you still need to bear in mind how it can affect affect your code.

This code example sorts two collections of floating-point values. What is the result of executing the following code?

  def printSorted(a: Array[Double]) {
    util.Sorting.stableSort(a)
    println(a.mkString(" "))
  }
  
printSorted(Array(7.89Double.NaN1.234.56)) printSorted(Array(7.891.23Double.NaN4.56))

Possibilities

  1. Prints:
      1.23 4.56 7.89 NaN
      1.23 4.56 7.89 NaN
    
  2. Prints:
      1.23 4.56 7.89 NaN
      1.23 7.89 NaN 4.56
    
  3. Prints:
      NaN 1.23 4.56 7.89
      NaN 1.23 4.56 7.89
    
  4. Prints:
      1.23 4.56 7.89 NaN
      1.23 4.56 NaN 7.89
    

Explanation

You may ask yourself whether Double.NaN is regarded as greater, or smaller, than all other Doubles? Or does sorting leave NaNs in place? Surely, though, the presence of NaN values cannot cause arrays to be incorrectly sorted?

Oh, yes, it can—sometimes. The correct answer is number 2:

  scala> printSorted(Array(7.89Double.NaN1.234.56))
  1.23 4.56 7.89 NaN
  
scala> printSorted(Array(7.891.23Double.NaN4.56)) 1.23 7.89 NaN 4.56

To understand what is going on here, you first need to consider a few of the properties of NaN. To adhere to IEEE 754,[1] comparisons involving NaN using ==, <, >, <=, and >= always result in false in Java and Scala. Comparisons involving NaN using != always result in true.

  scala> 1.0 < Double.NaN
  res0: Boolean = false
  
scala> 1.0 > Double.NaN res1: Boolean = false
scala> Double.NaN != Double.NaN res2: Boolean = true

As of 2.10.0,[2] Scala's default implicit Ordering for Double, which is used by util.Sorting, adheres to IEEE 754. As a result, Ordering on Double is inconsistent. For example, compare(1.0, Double.NaN) is negative, implying that 1.0 is less than Double.NaN. On the other hand, the direct comparison lt(1.0, Double.NaN) returns false, in accordance with the IEEE criteria. This implies that 1.0 is not less than Double.NaN:

  val doubleOps = implicitly[Ordering[Double]]
  
scala> doubleOps.compare(1.0Double.NaN) res12: Int = -1
scala> doubleOps.lt(1.0Double.NaN) res13: Boolean = false

This inconsistency is the fundamental reason for the observed behavior of the code example. To see how this causes the second array in the code example, Array(7.89, 1.23, Double.NaN, 4.56), to be incorrectly sorted, you need to look more closely at the implementation of Sorting.stableSort, Scala's default stable sorting algorithm that is intended for almost-sorted arrays and sequences.[3]

To sort an array, stableSort first divides it down the middle and recursively sorts the resulting subarrays, by default using Ordering.lt to compare elements. To produce the final result, it then merges the two sorted subarrays by taking elements from the first subarray as long as they are less than or equal to—again according to Ordering.lt—the current "reference element" of the second subarray. The reference element (initially, the first element) is added to the result array only if is greater, whereupon the next element of the second subarray becomes the new reference.

The soundness of this process relies critically on the fact that the second subarray is assumed to be sorted: in that case, if the current element of the first half is less than or equal to the reference element of the second half, it is necessarily also less than or equal to all the subsequent elements in that half.

What happens in the case of the second array in the code example? First, stableSort recursively sorts 7.89, 1.23, and Double.NaN, 4.56, resulting in the sorted subarrays 1.23, 7.89, and Double.NaN, 4.56. The second subarray is indeed correctly sorted according to the IEEE specification, since Ordering.lt(4.56, Double.NaN) is false, i.e., Double.NaN <= 4.56.

Now, the algorithm starts the merge process, comparing the current element of the first half, 1.23, with the reference element of the second half, Double.NaN. Since 1.23 is less than or equal to Double.NaN (according to Ordering.lt), it is added to the result array. Then, stableSort compares 7.89, the next element of the first half, with Double.NaN and again concludes that, because it is less than or equal to Double.NaN, it should be added to the result array.

Herein lies the error: while Ordering.lt(Double.NaN, 7.89), i.e., 7.89 <= Double.NaN, is true, the conclusion that 7.89 is thus less than all subsequent elements in the second subarray—4.56, in this case—is false. By this point, however, both elements from the first subarray have already been processed. The algorithm concludes that any remaining elements in the second subarray must be larger than all the values already added to the result array, so it simply adds them to the result.

Discussion

The presence of the "unusual" Double.NaN value can at least give you a hint of what might be causing the unexpected behavior. Figuring out what is going on becomes a lot more difficult if there is any kind of "data cleanup" happening that strips out such values. Imagine trying to debug the following:

  import util.Sorting.stableSort
  
def filterNaN(arr: Array[Double]) = arr filter { !_.isNaN() }
val filterBeforeSort =    filterNaN(Array(1.237.89Double.NaN4.56))
stableSort(filterBeforeSort)
scala> println(filterBeforeSort mkString " ") 1.23 4.56 7.89
val sortBeforeFilter = Array(1.237.89Double.NaN4.56) stableSort(sortBeforeFilter)
scala> println(filterNaN(sortBeforeFilter) mkString " ") 1.23 7.89 4.56

Here, there really is no indication of what might be happening—you are simply getting an incorrectly sorted array.

To avoid this problem, define an ordering for Double—or pass an explicit comparison function to the sorting algorithms—that consistently handles NaN. Double.compare, which, like java.lang.Double.compareTo, treats Double.NaN as equal to itself and greater than all other Double values, is a useful candidate:

  def printSortedByDoubleCompare(a: Array[Double]) {
    def ltWithNaNGtAll(x: Double, y: Double) = 
      (x compare y) < 0
    util.Sorting.stableSort(a, ltWithNaNGtAll _)
    println(a.mkString(" "))
  }
  
scala> printSortedByDoubleCompare(     Array(7.89Double.NaN1.234.56)) 1.23 4.56 7.89 NaN
scala> printSortedByDoubleCompare(     Array(7.891.23Double.NaN4.56)) 1.23 4.56 7.89 NaN

Note that the behavior of Double.compare is not IEEE 754-compliant. Surprising as it may seem, there is no way of ordering Doubles that handles NaN consistently and is compliant with the IEEE specification.

image images/moralgraphic117px.png As of Scala 2.10.0, sorting Doubles using the default orderings does not correctly handle NaN. Define your own orderings, or provide explicit comparison functions to sorting algorithms when sorting collections of Doubles that may contain NaN.

Footnotes for Chapter 24:

[1] IEEE 754 is the Standard for Floating-Point Arithmetic.

[2] See SI-5104, "(Double.NaN min 0.0) yields 0.0, should be NaN." [Dou]

[3] A sorting algorithm is stable if it leaves the order of equal elements unchanged.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.96.105