Puzzler 9

Init You, Init Me

As your programs get larger, you may end up with modules that have cyclic dependencies. Reliably initializing such modules can be challenging.

What is the result of executing the following code in the REPL?

  object XY {
    object X {
      val value: Int = Y.value + 1
    }
    object Y {
      val value: Int = X.value + 1
    }
  }
  
  println(if (math.random > 0.5) XY.X.value else XY.Y.value)

Possibilities

  1. Prints:
      1
    
  2. Prints:
      2
    
  3. Prints either:
      1
    

    or:

      2
    
  4. Throws a runtime exception.

Explanation

You may wonder whether the Scala compiler can even handle cyclic definitions of this kind, or whether you will run into an endless loop at runtime.

If you are confident that Scala can indeed handle such definitions without blowing up, you may suspect that the values will be initialized in declaration order. Since you randomly print either the value that is declared first (XY.X.value) or second (XY.Y.value), you would expect to see a non-deterministic result in this case.

Alternatively, you may guess that, while initializing the object accessed first, you will see default values for the other as-yet-uninitialized object, resulting in 1 being printed every time.

In fact, the correct answer is number 2. The value 2 is printed every time:

  scala> println(if (math.random > 0.5) XY.X.value else 
           XY.Y.value)
  2
  
  scala> println(s"X: ${XY.X.value} Y: ${XY.Y.value}")
  X: 2 Y: 1

And after a few :reset commands, you should eventually see:[1]

  scala> println(if (math.random > 0.5) XY.X.value else 
           XY.Y.value)
  2
  
  scala> println(s"X: ${XY.X.value} Y: ${XY.Y.value}")
  X: 1 Y: 2

To understand what is going on, we'll first demonstrate that the Scala compiler has no problem with cycles in val definitions. It does, however, require at least one explicit type specification:

  scala> lazy val x = y; lazy val y = x
  <console>:12: error: recursive value y needs type
         lazy val x = y; lazy val y = x
                      ^
  
  scala> lazy val x: Int = y; lazy val y = x
  x: Int = <lazy>
  y: Int = <lazy>

The language specification says that "the value defined by an object definition is instantiated lazily,"[2] and goes even further by remarking that an object can indeed be seen as "roughly equivalent to [...] a lazy value." This explains why the random choice of printing XY.X.value or XY.Y.value does not influence the outcome: the declaration order is irrelevant, since the values are not initialized when the objects are declared, but when they are accessed. The chosen object is always the first to be initialized, and the object initialized first ends up with the value 2.
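
To see this lazy instantiation in isolation, the following minimal sketch records the moment each object body runs. The names LazyObjects, First, Second, and log are invented for this illustration and are not part of the puzzler.

```scala
import scala.collection.mutable.ListBuffer

object LazyObjects {
  // Records the order in which the nested object bodies actually run.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  object First {
    log += "First initialized"
    val value: Int = 1
  }

  object Second {
    log += "Second initialized"
    val value: Int = 2
  }
}
```

If LazyObjects.Second.value is the first thing you touch, log should contain only the entry for Second, even though First is declared earlier. The entry for First appears only once LazyObjects.First is accessed as well.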

But how does the object initialized first end up with the value 2, and why do you not run into an endless loop once you access it? Here, it helps to examine the output produced by the compiler when you compile just the XY object with scalac -print:

  [[syntax trees at end of        cleanup]] // XY.scala
  package <empty> {
    object XY extends Object {
      def <init>(): XY.type = {
        XY.super.<init>();
        ()
      }
    };
    object XY$X extends Object {
      private[this] val value: Int = _;
      <stable> <accessor> def value(): Int = XY$X.this.value;
      def <init>(): XY$X.type = {
        XY$X.super.<init>();
        XY$X.this.value = XY$Y.value().+(1);
        ()
      }
    };
    object XY$Y extends Object {
      private[this] val value: Int = _;
      <stable> <accessor> def value(): Int = XY$Y.this.value;
      def <init>(): XY$Y.type = {
        XY$Y.super.<init>();
        XY$Y.this.value = XY$X.value().+(1);
        ()
      }
    }
  }

What happens when you access the randomly chosen object? Assume you are trying to get XY.Y.value:

  1. XY is initialized by calling its constructor. Uneventful.
  2. XY$Y is, in turn, initialized by calling its constructor, which attempts to get X's value through the accessor, XY$X.value().
  3. The call to XY$X.value() triggers the initialization of XY$X, again, through its constructor. Therefore, it now tries to retrieve the value for Y by calling XY$Y.value().
  4. At this point, Y's initialization has not yet completed, so you seem to be on the brink of an endless loop. But now "magic" happens: the JVM specification stipulates that a class whose initialization is already in progress on the current thread is not initialized a second time.[3] As a result, XY$X directly invokes XY$Y's accessor method value(), which, since the value has not yet been assigned, returns 0, the default value for the Int type.
  5. Given this value 0, the constructor of XY$X can now complete the initialization of XY$X.this.value, setting it to 1 and returning.
  6. At last, the call to XY$X.value() in XY$Y's constructor can proceed, returning the value 1.
  7. Given this value 1, the constructor of XY$Y completes the assignment of XY$Y.this.value, setting it to 2.
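
The steps above can be modeled by hand. The sketch below is not compiler output: InitModel and all of its members are hypothetical names, and the two "started" flags merely stand in for the JVM's bookkeeping that prevents an initialization already in progress from being started again.

```scala
import scala.collection.mutable.ListBuffer

// A hand-written model of the two module constructors; NOT what scalac emits.
final class InitModel {
  val trace: ListBuffer[String] = ListBuffer.empty[String]

  // Backing fields start at 0, the JVM default for Int.
  private var xField = 0
  private var yField = 0

  // Stand-ins for "initialization of this module has already begun".
  private var xStarted = false
  private var yStarted = false

  // Accessors: trigger initialization on first use, then read the field.
  def xValue(): Int = { initX(); xField }
  def yValue(): Int = { initY(); yField }

  private def initX(): Unit =
    if (!xStarted) {
      xStarted = true
      trace += "X init starts"
      xField = yValue() + 1 // may observe yField while it is still 0
      trace += s"X.value = $xField"
    }

  private def initY(): Unit =
    if (!yStarted) {
      yStarted = true
      trace += "Y init starts"
      yField = xValue() + 1 // may observe xField while it is still 0
      trace += s"Y.value = $yField"
    }
}
```

Calling yValue() on a fresh InitModel follows steps 2 to 7: the nested call back into yValue() finds that Y has already started, skips initialization, and reads the default 0. In this model, whichever accessor is called first ends up with 2 and the other with 1.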

If you happen to choose XY.X.value as the value to print, the initialization takes place with roles reversed. This explains why the first-accessed object will always receive a value of 2, with 1 being assigned to the value of the other object.

Discussion

The observed behavior becomes more surprising when you compare it to what happens with similar kinds of cyclic definitions. For example, given that The Scala Language Specification says that objects are "roughly equivalent to [lazy values]," you might try:[4]

  object XY2 {
    lazy val xvalue: Int = yvalue + 1
    lazy val yvalue: Int = xvalue + 1
  }
  
  scala> println(if (math.random > 0.5) XY2.xvalue else 
           XY2.yvalue)
  java.lang.StackOverflowError
    ...
    at XY2$.xvalue(<console>:8)
    at XY2$.yvalue$lzycompute(<console>:9)
    at XY2$.yvalue(<console>:9)
    at XY2$.xvalue$lzycompute(<console>:8)
    at XY2$.xvalue(<console>:8)

Or you could stick with objects, but put them inside an enclosing class instead of an object:

  class XY3 {
    object X {
      val value: Int = Y.value + 1
    }
    object Y {
      val value: Int = X.value + 1
    }
  }
  
  scala> val xy3 = new XY3()
  xy3: XY3 = XY3@770b07b9
  
  scala> println(if (math.random > 0.5) xy3.X.value else 
           xy3.Y.value)
  java.lang.StackOverflowError
    ...
    at XY3.Y$lzycompute(<console>:11)
    at XY3.Y(<console>:11)
    at XY3$X$.<init>(<console>:9)
    at XY3.X$lzycompute(<console>:8)
    at XY3.X(<console>:8)
    at XY3$Y$.<init>(<console>:12)
    at XY3.Y$lzycompute(<console>:11)

In both cases, you are missing the "endless loop protection" provided by the JVM's refusal to initialize the same class more than once. The compiler happily allows two functions to call each other, so a StackOverflowError is thrown at runtime.

In the second example, Y$lzycompute starts creating a new instance of Y to assign to xy3.Y. This tries to access xy3.X, which triggers X$lzycompute and, because xy3.Y has not been initialized yet, invokes Y$lzycompute again. Y$lzycompute tries to create another instance of Y, and so on.
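
Stripped of lazy values and nested objects, the failure reduces to two methods that each need the other's result before they can return. The sketch below uses plain methods as a simplified stand-in for the generated accessor and lzycompute methods; MutualRecursion and overflows are names made up for this illustration.

```scala
object MutualRecursion {
  // Neither method has a base case, and neither call is a tail call,
  // so every step adds another stack frame.
  def xvalue(): Int = yvalue() + 1
  def yvalue(): Int = xvalue() + 1

  // Returns true if evaluating the cycle exhausts the stack.
  def overflows: Boolean =
    try { xvalue(); false }
    catch { case _: StackOverflowError => true }
}
```

The compiler accepts these definitions without complaint. The problem only surfaces when MutualRecursion.overflows is evaluated, which is expected to return true.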

Alternatively, you can be slightly "less lazy":

  object XY4 {
    lazy val xvalue: Int = yvalue + 1
    val yvalue: Int = xvalue + 1
  }
  
  scala> println(if (math.random > 0.5) XY4.xvalue else 
           XY4.yvalue)
  2
  
  scala> println(s"X: ${XY4.xvalue} Y: ${XY4.yvalue}")
  X: 1 Y: 2

Now, it is no longer the order in which the values are accessed that determines their values: for XY4, yvalue will be evaluated as soon as XY4 is initialized. This triggers the evaluation of xvalue, which sees the default value 0 for yvalue and becomes 1, with yvalue always becoming 2. The order in which xvalue and yvalue are declared still does not matter, though:

  object XY4a {
    val yvalue: Int = xvalue + 1
    lazy val xvalue: Int = yvalue + 1
  }
  
  scala> println(if (math.random > 0.5) XY4a.xvalue else 
           XY4a.yvalue)
  1
  
  scala> println(s"X: ${XY4a.xvalue} Y: ${XY4a.yvalue}")
  X: 1 Y: 2
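
To make the evaluation order of the mixed strict and lazy case visible, here is a sketch of XY4 with logging added. XY4Traced and log are hypothetical additions. The log buffer is declared first on the assumption that strict vals are initialized in declaration order, so that it exists before yvalue is evaluated.

```scala
import scala.collection.mutable.ListBuffer

object XY4Traced {
  // Declared first so that it is available while yvalue is being evaluated.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  lazy val xvalue: Int = {
    // yvalue is still unassigned here, so its default value is observed.
    log += s"xvalue computed, sees yvalue = $yvalue"
    yvalue + 1
  }

  val yvalue: Int = {
    log += "yvalue evaluation starts"
    xvalue + 1 // forces the lazy xvalue
  }
}
```

Whichever member is accessed first, the log should show yvalue starting before xvalue is computed, with xvalue observing 0.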

You can also avoid lazy values entirely:

  object XY5 {
    val xvalue: Int = yvalue + 1
    val yvalue: Int = xvalue + 1
  }
  
  scala> println(if (math.random > 0.5) XY5.xvalue else 
           XY5.yvalue)
  1
  
  scala> println(s"X: ${XY5.xvalue} Y: ${XY5.yvalue}")
  X: 1 Y: 2

Here, both xvalue and yvalue are immediately evaluated on initialization of XY5. xvalue tries to retrieve the value of the as-yet-unassigned yvalue, again sees the default value 0, and is set to 1. yvalue is then always set to 2. Here, though, the problem is so predictable that the compiler emits a warning as soon as XY5 is defined:

  scala> object XY5 {
           val xvalue: Int = yvalue + 1
           val yvalue: Int = xvalue + 1
         }
  <console>:8: warning: Reference to uninitialized value yvalue
           val xvalue: Int = yvalue + 1
                             ^
  defined object XY5

Furthermore, unlike the other examples, here the declaration order determines the values of xvalue and yvalue. Inverting the order flips the values:

  object XY5a {
    val yvalue: Int = xvalue + 1
    val xvalue: Int = yvalue + 1
  }
  
  scala> println(s"X: ${XY5a.xvalue} Y: ${XY5a.yvalue}")
  X: 2 Y: 1

In summary, cyclic dependencies and definitions are tricky and hard to reason about. Some forms are dependent on the declaration order, others on the order of initialization, yet others result in endless loops. Avoid them where possible.

Avoid cyclic dependencies and definitions where possible. If you really can find no way to remove the cycle, ensure you understand the initialization behavior of all its components and values. Test thoroughly to ensure you get the intended result, especially if the order in which elements will be initialized is not deterministic.
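
One possible way to remove such a cycle, sketched under the assumption that both values can be derived from a common starting point, is to move that starting point into a third object that depends on nothing else. Acyclic, Base, and start are hypothetical names, and this is only one of several ways to restructure the code.

```scala
object Acyclic {
  // Hypothetical shared starting point with no dependencies of its own.
  object Base {
    val start: Int = 0
  }

  object X {
    val value: Int = Base.start + 1
  }

  object Y {
    // Depends on X only; X never refers back to Y.
    val value: Int = X.value + 1
  }
}
```

Because the dependencies now form a chain rather than a loop, X.value is 1 and Y.value is 2 regardless of which object is accessed first. Note that this fixes the outcome rather than reproducing the puzzler's access-order-dependent behavior.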

Footnotes for Chapter 9:

[1] The :reset command tells the REPL to "forget" all definitions, allowing you to initialize XY.X.value and XY.Y.value again.

[2] Odersky, The Scala Language Specification, Section 5.4. [Ode14]

[3] Lindholm, et al., The Java Virtual Machine Specification, Section 5.5. [Lin13]

[4] See Puzzler 4 for a more detailed discussion of initialization options for variables.
