Compare commits

...

2 Commits

Author SHA1 Message Date
Sergei Zimmerman
dee3b49920 ci: Run all flake-regressions tests 2026-02-26 15:53:52 +03:00
Sergei Zimmerman
2d9bb33ba2 libexpr: Make attribute set comparison use deterministic lexicographic order of attributes
See the added test case for the reasoning. Current order of comparison
depends on the global state and sneaks in an impurity. Introducing a
canonical order for comparisons is the only reasonable approach for
plugging this hole. The order was already unspecified and could change
for various reasons like changes to internal files/symbols used, so this
isn't exactly a breaking change.

While we have the option of choosing an arbitrary comparison order we
also choose a more efficient approach by first comparing all keys and
only then forcing any values. Notice that this is still compatible with
the non-deterministic iteration order that was used prior to this commit.
We just behave "as if" we choose the most efficient iteration order that
also happens to be strictly not less lazy.

I consider plugging this purity hole to be a requirement for adding
any sort of parallel/async eval, which would make the problem 100 times
worse by introducing true randomness to evaluation (depending on the
OS scheduling/order in which thunks finish evaluating).

Note that the choice of iteration order is consistent with snix [1],
which is good for reproducibility (modulo short circuit on non-matching keys).

[1]: a35d6558dd/snix/eval/src/value/mod.rs (L625-L628)
2026-02-26 15:53:47 +03:00
9 changed files with 165 additions and 19 deletions

View File

@@ -245,7 +245,7 @@ jobs:
install_url: ${{ format('{0}/install', steps.installer-tarball-url.outputs.installer-url) }}
install_options: ${{ format('--tarball-url-prefix {0}', steps.installer-tarball-url.outputs.installer-url) }}
- name: Run flake regressions tests
run: MAX_FLAKES=25 flake-regressions/eval-all.sh
run: MAX_FLAKES=100000 flake-regressions/eval-all.sh
profile_build:
needs: tests

View File

@@ -0,0 +1,40 @@
---
synopsis: "Attribute set equality now compares in lexicographic order of attribute names"
prs: [15327]
---
Attribute set equality (`==`) now compares attributes in lexicographic order of attribute names, rather than the internal symbol table order. Comparing attribute sets with mismatching `attrNames` now never evaluates any values, whereas previously that was undefined and depended on the unspecified symbol table order.
Previously, the evaluation order of attribute values during comparison was non-deterministic, meaning expressions like the following could silently succeed or fail depending on evaluation history or unspecified order of evaluation.
For example, depending on the order in which attribute names `a` and `b` have been seen by the evaluator, the following code would either yield `false` or fail to evaluate.
```nix
{ a = throw "oops"; b = 2; } == { a = 2; b = 1; }
```
```
nix-repl> a = 1
nix-repl> b = 2
nix-repl> { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
error:
… while calling the 'throw' builtin
at «string»:1:7:
1| { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
| ^
error: oops
```
```
nix-repl> b = 2
nix-repl> a = 1
nix-repl> { a = throw "oops"; b = 2; } == { a = 2; b = 1; }
false
```

View File

@@ -188,13 +188,14 @@ All comparison operators are implemented in terms of `<`, and the following equi
## Equality
- [Attribute sets][attribute set] are compared first by attribute names and then by items until a difference is found.
- [Attribute sets][attribute set] are compared by size first and then by attribute names and their values until a difference is found.
- [Lists][list] are compared first by length and then by items until a difference is found.
- Comparison of distinct [functions][function] returns `false`, but identical functions may be subject to [value identity optimization](#value-identity-optimization).
- Numbers are type-compatible, see [arithmetic] operators.
- Floating point numbers only differ up to a limited precision.
The `==` operator is [strict](@docroot@/language/evaluation.md#strictness) in both arguments; when comparing composite types ([attribute sets][attribute set] and [lists][list]), it is partially strict in their contained values: they are evaluated until a difference is found. <!-- this is woefully underspecified, affecting which expressions evaluate correctly; not just "ordering" or error messages. -->
The `==` operator is [strict](@docroot@/language/evaluation.md#strictness) in both arguments; when comparing composite types ([attribute sets][attribute set] and [lists][list]), it is partially strict in their contained values: they are evaluated until a difference is found. On `attrNames` mismatches, no contained values are evaluated. Attribute set comparison `a == b` behaves equivalently to `(builtins.attrNames a) == (builtins.attrNames b) && (builtins.attrValues a == builtins.attrValues b)` if [value identity optimization](#value-identity-optimization) is not taken into account.
Equality comparison of contained values happens in canonical order: attributes are compared in lexicographic order of attribute names and list elements are compared in the order in which they appear in the list.
### Value identity optimization

View File

@@ -2794,9 +2794,12 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
return;
case nAttrs: {
const Bindings & attrs1 = *v1.attrs();
const Bindings & attrs2 = *v2.attrs();
if (isDerivation(v1) && isDerivation(v2)) {
auto i = v1.attrs()->get(s.outPath);
auto j = v2.attrs()->get(s.outPath);
auto i = attrs1.get(s.outPath);
auto j = attrs2.get(s.outPath);
if (i && j) {
try {
assertEqValues(*i->value, *j->value, pos, errorCtx);
@@ -2809,7 +2812,7 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
}
}
if (v1.attrs()->size() != v2.attrs()->size()) {
if (attrs1.size() != attrs2.size()) {
error<AssertionError>(
"attribute names of attribute set '%s' differs from attribute set '%s'",
ValuePrinter(*this, v1, errorPrintOptions),
@@ -2817,12 +2820,8 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
.debugThrow();
}
// Like normal comparison, we compare the attributes in non-deterministic Symbol index order.
// This function is called when eqValues has found a difference, so to reliably
// report about its result, we should follow in its literal footsteps and not
// try anything fancy that could lead to an error.
Bindings::const_iterator i, j;
for (i = v1.attrs()->begin(), j = v2.attrs()->begin(); i != v1.attrs()->end(); ++i, ++j) {
/* This mirrors what EvalState::eqValues does. See the corresponding comment there. */
for (auto i = attrs1.begin(), j = attrs2.begin(); i != attrs1.end(); ++i, ++j) {
if (i->name != j->name) {
// A difference in a sorted list means that one attribute is not contained in the other, but we don't
// know which. Let's find out. Could use <, but this is more clear.
@@ -2844,6 +2843,22 @@ void EvalState::assertEqValues(Value & v1, Value & v2, const PosIdx pos, std::st
}
assert(false);
}
}
/* Comparing in lexicographical order is crucial for determinism. Short-circuiting
the comparison without a canonical ordering of iteration would mean that depending
on the global symbol table order this could either fail or successfully evaluate if
forcing an attribute leads to an error.
This function is called when eqValues has found a difference, so to reliably
report about its result, we should follow in its literal footsteps and not
try anything fancy that could lead to an error. */
boost::container::small_vector<Attr, conservativeStackReservation> sorted1;
v1.attrs()->lexicographicOrder(sorted1, symbols);
boost::container::small_vector<Attr, conservativeStackReservation> sorted2;
v2.attrs()->lexicographicOrder(sorted2, symbols);
for (auto i = sorted1.begin(), j = sorted2.begin(); i != sorted1.end(); ++i, ++j) {
assert(i->name == j->name);
try {
assertEqValues(*i->value, *j->value, pos, errorCtx);
} catch (Error & e) {
@@ -2951,23 +2966,46 @@ bool EvalState::eqValues(Value & v1, Value & v2, const PosIdx pos, std::string_v
return true;
case nAttrs: {
const Bindings & attrs1 = *v1.attrs();
const Bindings & attrs2 = *v2.attrs();
/* If both sets denote a derivation (type = "derivation"),
then compare their outPaths. */
if (isDerivation(v1) && isDerivation(v2)) {
auto i = v1.attrs()->get(s.outPath);
auto j = v2.attrs()->get(s.outPath);
auto i = attrs1.get(s.outPath);
auto j = attrs2.get(s.outPath);
if (i && j)
return eqValues(*i->value, *j->value, pos, errorCtx);
}
if (v1.attrs()->size() != v2.attrs()->size())
if (attrs1.size() != attrs2.size())
return false;
/* Otherwise, compare the attributes one by one. */
Bindings::const_iterator i, j;
for (i = v1.attrs()->begin(), j = v2.attrs()->begin(); i != v1.attrs()->end(); ++i, ++j)
if (i->name != j->name || !eqValues(*i->value, *j->value, pos, errorCtx))
/* Compare in 2 passes: first make sure the attribute keys are equal
without evaluating or recursing into values. Historically, attribute
set comparison has been done in undefined symbol table order. This
approach is compatible with nondeterministic iteration order. */
for (auto i = attrs1.begin(), j = attrs2.begin(); i != attrs1.end(); ++i, ++j) {
/* If we do SoA transformation for Bindings then this could just be a memcmp. */
if (i->name != j->name)
return false;
}
/* Comparing in lexicographical order is crucial for determinism. Short-circuiting
the comparison without a canonical ordering of iteration would mean that depending
on the global symbol table order this could either fail or successfully evaluate if
forcing an attribute leads to an error. */
boost::container::small_vector<Attr, conservativeStackReservation> sorted1;
v1.attrs()->lexicographicOrder(sorted1, symbols);
boost::container::small_vector<Attr, conservativeStackReservation> sorted2;
v2.attrs()->lexicographicOrder(sorted2, symbols);
/* Otherwise, compare the attributes one by one. */
for (auto i = sorted1.begin(), j = sorted2.begin(); i != sorted1.end(); ++i, ++j) {
assert(i->name == j->name);
if (!eqValues(*i->value, *j->value, pos, errorCtx))
return false;
}
return true;
}

View File

@@ -414,6 +414,29 @@ public:
void sort();
/**
* Get the attributes in lexicographically sorted order. Needed for deterministic
* iteration order.
*
* @note Unlike the overload returning a vector of const Attr *, this
* copies the attribute values into a @ref out container by value. This is
* to reduce the amount of indirection. Note that the resulting attribute
* values are copies and one must not use a pointer to the Attr object for
* any purpose other than reading its members.
*/
template<typename C>
void lexicographicOrder(C & out, const SymbolTable & symbols) const
{
out.reserve(size());
for (auto attr : *this)
out.push_back(attr);
assert(out.size() == size());
std::sort(out.begin(), out.end(), [&symbols](const Attr & a, const Attr & b) {
std::string_view sa = symbols[a.name], sb = symbols[b.name];
return sa < sb;
});
}
/**
* Returns the attributes in lexicographically sorted order.
*/

View File

@@ -0,0 +1,20 @@
# Test that comparing attribute sets with mismatched attrNames never forces values.
let
attrs1 = {
# This symbol will appear first in the symbol table, but later in the canonical
# lexicographical order.
symbol2 = throw "a";
symbol3 = throw "b";
};
attrs2 = {
symbol2 = throw "a";
symbol4 = throw "b";
};
equal = attrs1 == attrs2;
in
assert attrs1 != attrs2;
assert !equal;
"ok"

View File

@@ -0,0 +1 @@
"ok"

View File

@@ -0,0 +1,22 @@
# Test that comparison happens in lexicographical order of attribute names and
# not in the unspecified symbol table order. Symbol table ordering is an impurity
# that changes depending on the evaluation order.
let
attrs1 = {
# This symbol will appear first in the symbol table, but later in the canonical
# lexicographical order.
symbol2 = throw "bad";
symbol1 = "str1";
};
attrs2 = {
symbol1 = "str2";
symbol2 = "ok";
};
equal = attrs1 == attrs2;
in
assert attrs1 != attrs2;
assert !equal;
"ok"