ci: Run all flake-regressions tests

libexpr: Make attribute set comparison use deterministic lexicographic order of attributes
See the added test case for the reasoning. Current order of comparison depends on the global state and sneaks in an impurity. Introducing a canonical order for comparisons is the only reasonable approach for plugging this hole. The order was already unspecified and could change for various reasons like changes to internal files/symbols used, so this isn't exactly a breaking change. While we have the option of choosing an arbitrary comparison order we also choose a more efficient approach by first comparing all keys and only then forcing any values. Notice that this is still compatible with the non-deterministic iteration order that was used prior to this commit. We just behave "as if" we choose the most efficient iteration order that also happens to be strictly not less lazy. I consider plugging this purity hole to be a requirement for adding any sort of parallel/async eval, which would make the problem 100 times worse by introducing true randomness to evaluation (depending on the OS scheduling/order in which thunks finish evaluating). Note that the choice of iteration order is consistent with snix [1], which is good for reproducibility (modulo short circuit on non-matching keys). [1]: a35d6558dd/snix/eval/src/value/mod.rs (L625-L628)
2026-02-26 15:53:52 +03:00 · 2026-02-26 15:53:47 +03:00
9 changed files with 165 additions and 19 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -245,7 +245,7 @@ jobs:
          install_url: ${{ format('{0}/install', steps.installer-tarball-url.outputs.installer-url) }}
          install_options: ${{ format('--tarball-url-prefix {0}', steps.installer-tarball-url.outputs.installer-url) }}
      - name: Run flake regressions tests
-        run: MAX_FLAKES=25 flake-regressions/eval-all.sh
+        run: MAX_FLAKES=100000 flake-regressions/eval-all.sh

  profile_build:
    needs: tests
--- a/doc/manual/rl-next/deterministic-comparison-order-attrsets.md
+++ b/doc/manual/rl-next/deterministic-comparison-order-attrsets.md
@@ -0,0 +1,40 @@
+---
+synopsis: "Attribute set equality now compares in lexicographic order of attribute names"
+prs: [15327]
+---
+
+Attribute set equality (`==`) now compares attributes in lexicographic order of attribute names, rather than the internal symbol table order. Comparing attribute sets with mismatching `attrNames` now never evaluates any values, whereas previously that was undefined and depended on the unspecified symbol table order.
+
+Previously, the evaluation order of attribute values during comparison was non-deterministic, meaning expressions like the following could silently succeed or fail depending on evaluation history or unspecified order of evaluation.
+
+For example, depending on the order in which attribute names `a` and `b` have been seen by the evaluator, the following code would either yield `false` or fail to evaluate.
+
+```nix
+{ a = throw "oops"; b = 2; } == { a = 2; b = 1; }
+```
+
+
+```
+nix-repl> a = 1
+
+nix-repl> b = 2
+
+nix-repl> { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
+error:
+       … while calling the 'throw' builtin
+         at «string»:1:7:
+            1| { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
+             |       ^
+
+       error: oops
+```
+
+```
+nix-repl> b = 2
+
+nix-repl> a = 1
+
+nix-repl> { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
+false
+```
+
--- a/doc/manual/source/language/operators.md
+++ b/doc/manual/source/language/operators.md
@@ -188,13 +188,14 @@ All comparison operators are implemented in terms of `<`, and the following equi

 ## Equality

- [Attribute sets][attribute set] are compared first by attribute names and then by items until a difference is found.
+- [Attribute sets][attribute set] are compared by size first and then by attribute names and their values until a difference is found.
 - [Lists][list] are compared first by length and then by items until a difference is found.
 - Comparison of distinct [functions][function] returns `false`, but identical functions may be subject to [value identity optimization](#value-identity-optimization).
 - Numbers are type-compatible, see [arithmetic] operators.
 - Floating point numbers only differ up to a limited precision.

-The `==` operator is [strict](@docroot@/language/evaluation.md#strictness) in both arguments; when comparing composite types ([attribute sets][attribute set] and [lists][list]), it is partially strict in their contained values: they are evaluated until a difference is found. <!-- this is woefully underspecified, affecting which expressions evaluate correctly; not just "ordering" or error messages. -->
+The `==` operator is [strict](@docroot@/language/evaluation.md#strictness) in both arguments; when comparing composite types ([attribute sets][attribute set] and [lists][list]), it is partially strict in their contained values: they are evaluated until a difference is found. On `attrNames` mismatches, no contained values are evaluated. Attribute set comparison `a == b` behaves equivalently to `(builtins.attrNames a) == (builtins.attrNames b) && (builtins.attrValues a == builtins.attrValues b)` if [value identity optimization](#value-identity-optimization) is not taken into account.
+Equality comparison of contained values happens in canonical order: attributes are compared in lexicographic order of attribute names and list elements are compared in the order in which they appear in the list.

 ### Value identity optimization

--- a/src/libexpr/eval.cc
+++ b/src/libexpr/eval.cc
@@ -2794,9 +2794,12 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
        return;

    case nAttrs: {
+        const Bindings & attrs1 = *v1.attrs();
+        const Bindings & attrs2 = *v2.attrs();
+
        if (isDerivation(v1) && isDerivation(v2)) {
-            auto i = v1.attrs()->get(s.outPath);
-            auto j = v2.attrs()->get(s.outPath);
+            auto i = attrs1.get(s.outPath);
+            auto j = attrs2.get(s.outPath);
            if (i && j) {
                try {
                    assertEqValues(*i->value, *j->value, pos, errorCtx);
@@ -2809,7 +2812,7 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
            }
        }

-        if (v1.attrs()->size() != v2.attrs()->size()) {
+        if (attrs1.size() != attrs2.size()) {
            error<AssertionError>(
                "attribute names of attribute set '%s' differs from attribute set '%s'",
                ValuePrinter(*this, v1, errorPrintOptions),
@@ -2817,12 +2820,8 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
                .debugThrow();
        }

-        // Like normal comparison, we compare the attributes in non-deterministic Symbol index order.
-        // This function is called when eqValues has found a difference, so to reliably
-        // report about its result, we should follow in its literal footsteps and not
-        // try anything fancy that could lead to an error.
-        Bindings::const_iterator i, j;
-        for (i = v1.attrs()->begin(), j = v2.attrs()->begin(); i != v1.attrs()->end(); ++i, ++j) {
+        /* This mirrors what EvalState::eqValues does. See the corresponding comment there. */
+        for (auto i = attrs1.begin(), j = attrs2.begin(); i != attrs1.end(); ++i, ++j) {
            if (i->name != j->name) {
                // A difference in a sorted list means that one attribute is not contained in the other, but we don't
                // know which. Let's find out. Could use <, but this is more clear.
@@ -2844,6 +2843,22 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
                }
                assert(false);
            }
+        }
+
+        /* Comparing in lexicographical order is crucial for determinism. Short-circuiting
+           the comparison without a canonical ordering of iteration would mean that depending
+           on the global symbol table order this could either fail or successfully evaluate if
+           forcing an attribute leads to an error.
+           This function is called when eqValues has found a difference, so to reliably
+           report about its result, we should follow in its literal footsteps and not
+           try anything fancy that could lead to an error. */
+        boost::container::small_vector<Attr, conservativeStackReservation> sorted1;
+        v1.attrs()->lexicographicOrder(sorted1, symbols);
+        boost::container::small_vector<Attr, conservativeStackReservation> sorted2;
+        v2.attrs()->lexicographicOrder(sorted2, symbols);
+
+        for (auto i = sorted1.begin(), j = sorted2.begin(); i != sorted1.end(); ++i, ++j) {
+            assert(i->name == j->name);
            try {
                assertEqValues(*i->value, *j->value, pos, errorCtx);
            } catch (Error & e) {
@@ -2951,23 +2966,46 @@ bool EvalState::eqValues(Value & v1, Value & v2, const PosIdx pos, std::string_v
        return true;

    case nAttrs: {
+        const Bindings & attrs1 = *v1.attrs();
+        const Bindings & attrs2 = *v2.attrs();
+
        /* If both sets denote a derivation (type = "derivation"),
           then compare their outPaths. */
        if (isDerivation(v1) && isDerivation(v2)) {
-            auto i = v1.attrs()->get(s.outPath);
-            auto j = v2.attrs()->get(s.outPath);
+            auto i = attrs1.get(s.outPath);
+            auto j = attrs2.get(s.outPath);
            if (i && j)
                return eqValues(*i->value, *j->value, pos, errorCtx);
        }

-        if (v1.attrs()->size() != v2.attrs()->size())
+        if (attrs1.size() != attrs2.size())
            return false;

-        /* Otherwise, compare the attributes one by one. */
-        Bindings::const_iterator i, j;
-        for (i = v1.attrs()->begin(), j = v2.attrs()->begin(); i != v1.attrs()->end(); ++i, ++j)
-            if (i->name != j->name || !eqValues(*i->value, *j->value, pos, errorCtx))
+        /* Compare in 2 passes: first make sure the attribute keys are equal
+           without evaluating or recursing into values. Historically, attribute
+           set comparison has been done in undefined symbol table order. This
+           approach is compatible with nondeterministic iteration order. */
+        for (auto i = attrs1.begin(), j = attrs2.begin(); i != attrs1.end(); ++i, ++j) {
+            /* If we do SoA transformation for Bindings then this could just be a memcmp. */
+            if (i->name != j->name)
                return false;
+        }
+
+        /* Comparing in lexicographical order is crucial for determinism. Short-circuiting
+           the comparison without a canonical ordering of iteration would mean that depending
+           on the global symbol table order this could either fail or successfully evaluate if
+           forcing an attribute leads to an error. */
+        boost::container::small_vector<Attr, conservativeStackReservation> sorted1;
+        v1.attrs()->lexicographicOrder(sorted1, symbols);
+        boost::container::small_vector<Attr, conservativeStackReservation> sorted2;
+        v2.attrs()->lexicographicOrder(sorted2, symbols);
+
+        /* Otherwise, compare the attributes one by one. */
+        for (auto i = sorted1.begin(), j = sorted2.begin(); i != sorted1.end(); ++i, ++j) {
+            assert(i->name == j->name);
+            if (!eqValues(*i->value, *j->value, pos, errorCtx))
+                return false;
+        }

        return true;
    }
--- a/src/libexpr/include/nix/expr/attr-set.hh
+++ b/src/libexpr/include/nix/expr/attr-set.hh
@@ -414,6 +414,29 @@ public:

    void sort();

+    /**
+     * Get the attributes in lexicographically sorted order. Needed for deterministic
+     * iteration order.
+     *
+     * @note Unlike the overload returning a vector of const Attr *, this
+     * copies the attribute values into a @ref out container by value. This is
+     * to reduce the amount of indirection. Note that the resulting attribute
+     * values are copies and one must not use a pointer to the Attr object for
+     * any purpose other than reading its members.
+     */
+    template<typename C>
+    void lexicographicOrder(C & out, const SymbolTable & symbols) const
+    {
+        out.reserve(size());
+        for (auto attr : *this)
+            out.push_back(attr);
+        assert(out.size() == size());
+        std::sort(out.begin(), out.end(), [&symbols](const Attr & a, const Attr & b) {
+            std::string_view sa = symbols[a.name], sb = symbols[b.name];
+            return sa < sb;
+        });
+    }
+
    /**
     * Returns the attributes in lexicographically sorted order.
     */
--- a/tests/functional/lang/eval-okay-attrset-comparison-mismatching-doesnt-force.exp
+++ b/tests/functional/lang/eval-okay-attrset-comparison-mismatching-doesnt-force.exp
@@ -0,0 +1 @@
+"ok"
--- a/tests/functional/lang/eval-okay-attrset-comparison-mismatching-doesnt-force.nix
+++ b/tests/functional/lang/eval-okay-attrset-comparison-mismatching-doesnt-force.nix
@@ -0,0 +1,20 @@
+# Test that comparing attribute sets with mismatched attrNames never forces values.
+
+let
+  attrs1 = {
+    # This symbol will appear first in the symbol table, but later in the canonical
+    # lexicographical order.
+    symbol2 = throw "a";
+    symbol3 = throw "b";
+  };
+
+  attrs2 = {
+    symbol2 = throw "a";
+    symbol4 = throw "b";
+  };
+
+  equal = attrs1 == attrs2;
+in
+assert attrs1 != attrs2;
+assert !equal;
+"ok"
--- a/tests/functional/lang/eval-okay-attrset-comparison-order.exp
+++ b/tests/functional/lang/eval-okay-attrset-comparison-order.exp
@@ -0,0 +1 @@
+"ok"
--- a/tests/functional/lang/eval-okay-attrset-comparison-order.nix
+++ b/tests/functional/lang/eval-okay-attrset-comparison-order.nix
@@ -0,0 +1,22 @@
+# Test that comparison happens in lexicographical order of attribute names and
+# not in the unspecified symbol table order. Symbol table ordering is an impurity
+# that changes depending on the evaluation order.
+
+let
+  attrs1 = {
+    # This symbol will appear first in the symbol table, but later in the canonical
+    # lexicographical order.
+    symbol2 = throw "bad";
+    symbol1 = "str1";
+  };
+
+  attrs2 = {
+    symbol1 = "str2";
+    symbol2 = "ok";
+  };
+
+  equal = attrs1 == attrs2;
+in
+assert attrs1 != attrs2;
+assert !equal;
+"ok"