It's not being excersised by anything and already contains
bugs (like the fact that copyRecursive only handles the first entry
in a directory). Evidently, it can just be removed and was never used
by anything.
We don't use the various set<string_view>s that we construct,
and all we really care about is ensuring that all outputs are
of a single, consistent type.
This is necessary to ban symlink following. It can be considered
a defense in depth against issues similar to CVE-2024-45593. By
slightly changing the API in a follow-up commit we will be able
to mitigate the symlink following issue for good.
This is an example of "Parse, don't validate" principle [1].
Before, we had a number of `StringSet`s in `DerivationOptions` that
were not *actually* allowed to be arbitrary sets of strings. Instead,
each set member had to be one of:
- a store path
- a CA "downstream placeholder"
- an output name
Only later, in the code that checks outputs, would these strings be
further parsed to match these cases. (Actually, only 2 by that point,
because the placeholders must be rewritten away by then.)
Now, we fully parse everything up front, and have an "honest" data type
that reflects these invariants:
- store paths are parsed, stored as (opaque) deriving paths
- CA "downstream placeholders" are rewritten to the output deriving
paths they denote
- output names are the only arbitrary strings left
Since the first two cases both become deriving paths, that leaves us
with a `std::variant<SingleDerivedPath, String>` data type, which we use
in our sets instead.
Getting rid of placeholders is especially nice because we are replacing
them with something much more internally-structured / transparent.
[1]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
Now the error message looks something like:
error:
… during upload of 'file:///tmp/storeabc/4yxrw9flcvca7f3fs7c5igl2ica39zaw.narinfo'
error: blah blah
Also makes fail and failEx themselves noexcept, since all the operations they
do are noexcept and we don't want exceptions escaping from them.
The indentation level of the code is already high enough. We can just
wrap the whole function in a try/catch and mark it noexcept.
Partially cherry-picked from https://gerrit.lix.systems/c/lix/+/2133
Co-authored-by: eldritch horrors <pennae@lix.systems>
Makes the cross-x86_64-w64-mingw32 devshell slightly less
broken. It still needs a bit of massaging to function, but
that's much less cumbersome now that the generic machinery
with genericClosure that evaluates drvPath doesn't barf on
unavailable packages.
The test was failing because nix path-info --json now returns narHash as
a structured dictionary {"algorithm": "sha256", "format": "base64",
"hash": "..."} instead of an SRI string "sha256-...".
This change was introduced in commit 5e7ee808d. The functional test
path-info.sh was updated at that time, but this NixOS test was missed.
The fix converts the dictionary format to SRI format inline:
tarball_hash_sri = f"{narHash_obj['algorithm']}-{narHash_obj['hash']}"
Several bugs to squash:
- Apparently DELETE is an already used macro with Win32. We can avoid it
by using Camel case instead (slightly hacky but also fits the naming
convention better)
- Gets rid of the raw usage of isatty. Added an isTTY impl to abstract over
the raw API.
Replaces the usage of createAtRoot, which goes as far up the
directory tree as possible with rooted variant makeFSSourceAccessor.
The changes in this patch should be safe wrt to not asserting on relative
paths. Arguments passed to makeFSSourceAccessor here should already be using
absolute paths.
without this, there is no way to swap them out for structures using a
different allocator. This should be reverted as part of redesiging
ExprAttrs to use an ExprAttrsBuilder
And also render the docs nicely.
I would like to use a markdown AST for this, but to avoid new deps
(lowdown's AST doesn't suffice) I am just doing crude string
manipulations for now.
Since 918c1a9e58 configurePhase variable points to cmakeConfigurePhase
and runPhase configurePhase does the wrong thing.
configurePhase function on the other hand still worked correctly.
Move StorePath and Derivation declarations to their own headers in a
backwards compatible way:
- Created nix_api_store/store_path.h for StorePath operations
- Created nix_api_store/derivation.h for Derivation operations
- Main nix_api_store.h includes both headers for backwards compatibility
This reorganization improves modularity and hopefully makes the API
easier to navigate.
I found this because of a test failure on cygwin in
nix_api_expr_test.nix_eval_state_lookup_path:
'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: cannot remove all: Device or resource busy
[...]
[.../my_state/db/db.sqlite]
LocalState was never getting destroyed due to a reference leak. These
_free functions use an 'operator delete' which doesn't call the
destructor for the type.
Fixes: 309d55807c
Currently, --gtest_filter=nix_api_store_test.nix_eval_state_lookup_path
will result in:
terminating due to unexpected unrecoverable internal error: Assertion
'gcInitialised' failed in void nix::assertGCInitialized() at
../src/libexpr/eval-gc.cc:138
Changing the test fixture to _exr_test causes GC to be initialised.
Typically PosixSourceAccessor can be used from multiple threads,
but mtime is not updated atomically (i.e. with compare_exchange_weak),
so mtime gets raced. It's only needed in dumpPathAndGetMtime and mtime
tracking can be gated behind that.
Also start using getLastModified interface instead of dynamic casts.
This makes the output easier to compare with the new machine-generated
lists in #9732.
The hand-curated order did have the advantage of putting more important
attributes at the top, but I don't think it is worth preserving that
when `std::map` is so much easier to work with. The right solution to
leading the reader to the more important attributes is to call them out
in the intro texts.
This makes the proto serializer characterisation test data be
accompanied by JSON data.
This is arguably useful for a reasons:
- The JSON data is human-readable while the binary data is not, so it
provides some indication of what the test data means beyond the C++
literals.
- The JSON data is language-agnostic, and so can be used to quickly rig
up tests for implementation in other languages, without having source
code literals at all (just go back and forth between the JSON and the
binary).
- Even though we have no concrete plans to place the binary protocol 1-1
or with JSON, it is still nice to ensure that the JSON serializers and
binary protocols have (near) equal coverage over data types, to help
ensure we didn't forget a JSON (de)serializer.
Make instances for them that share code with `nix path-info`, but do a
slightly different format without store paths containing store dirs
(matching the other latest JSON formats).
Progress on #13570.
If we depend on the store dir, our JSON serializers/deserializers take
extra arguements, and that interfaces with the likes of various
frameworks for associating these with types (e.g. nlohmann in C++, Serde
in Rust, and Aeson in Haskell).
For now, `nix path-info` still uses the previous format, with store
dirs. We may yet decide to "rip of the band-aid", and just switch it
over, but that is left as a future PR.
The variant has on the left-hand side the topologically sorted vector
and the right-hand side is a pair showing the path and its parent that
represent a cycle in the graph making the sort impossible.
This change prepares for enhanced cycle error messages that can provide
more context about the cycle. The variant approach allows callers to
handle cycles more flexibly, enabling better error reporting that shows
the full cycle path and which files are involved.
Adapted from Lix commit f7871fcb5.
Change-Id: I70a987f470437df8beb3b1cc203ff88701d0aa1b
Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
Add support for configuring S3 storage class via the storage-class
parameter for S3BinaryCacheStore. This allows users to optimize costs
by selecting appropriate storage tiers (STANDARD, GLACIER,
INTELLIGENT_TIERING, etc.) based on access patterns.
The storage class is applied via the x-amz-storage-class header for
both regular PUT uploads and multipart upload initiation.
Replace the null-terminated C-style strings in Value with hybrid C /
Pascal strings, where the length is stored in the allocation before the
data, and there is still a null byte at the end for the sake of C
interopt.
Co-Authored-By: Taeer Bar-Yam <taeer@bar-yam.me>
Co-Authored-By: Sergei Zimmerman <sergei@zimmerman.foo>
These steps are done (originally in order, but I squashed it as the end
result is still pretty small, and the churn in the code comments was a
bit annoying to keep straight).
1. Create proper struct type for string contexts on the heap
This will make it easier to change this type in the future.
2. Make `Value::StringWithContext` iterable
This make some for loops a lot more terse.
3. Encapsulate `Value::StringWithContext::Context::elems`
It turns out the iterators we just exposed are sufficient.
4. Make `StringWithContext::Context` length-prefixed instead
Rather than having a null pointer at the end, have a `size_t` at the
beginning. This is the exact same size (note that null pointer is
longer than null byte) and thus takes no more space!
Also, see the new TODO on naming. The thing we already so-named is a
builder type for string contexts, not the on-heap type. The
`fromBuilder` static method reflects what the names ought to be too.
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Otherwise PosTable grows indefinitely for each reload. Since
the total input size is limited to 4GB (uint32_t for byte offset PosIdx)
it can get exhausted pretty. This ensures that we don't waste memory
on reloads as well.
Update all channel URLs from https://nixos.org/channels/ to
https://channels.nixos.org/ to use the more reliable subdomain.
The nixos.org domain apex lacks IPv6 support due to DNS hoster
limitations. Using the subdomain allows better CDN distribution
and improved reliability.
Updated files:
- Installation scripts (multi-user and tarball installers)
- Channel URL resolution in eval-settings.cc
- Documentation and examples
- Docker image default channel URL
- Release notes (added note about URL change)
Fixes#14517
I've run into this quite a few times when working with characterization test
infra. It would print an invalid command:
_NIX_TEST_ACCEPT=1 meson test main/lang
Which you'd then proceed to run and it would fail. This commit makes it
be honest about the command you need to run:
_NIX_TEST_ACCEPT=1 meson test --suite main lang
I would run `nix flake show` on a flake than hit:
===
├───ihaskell: package 'ihaskell-wrapper'
├───ihaskell-96: package 'ihaskell-wrapper'
├───ihaskell-96-dev: package 'ghc-shell-for-ihaskell-0.10.4.0'
error: expected a derivation
===
and it is not obvious what package is the culprit here since nix stops
rightaway.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
boost::concurrent_flat_map (used in libutil and libstore) includes the
C++17 <execution> header. GCC's libstdc++ implements parallel algorithms
using Intel TBB as the backend, which creates a link-time dependency on
libtbb even though we don't actually use any parallel algorithms.
Disable the TBB backend for libstdc++ by setting
_GLIBCXX_USE_TBB_PAR_BACKEND=0. This makes parallel algorithms fall back
to serial execution, which is acceptable since we don't use them anyway.
This only affects libstdc++ (GCC's standard library); other standard
libraries like libc++ (LLVM) are unaffected.
`allowedReferences` and friends can, in addition to supporting store
paths (and placeholders, but because those will be rewritten to store
paths), they also support to refering to other outputs in the derivation
by name.
We update the tests in order to cover for that.
(While we are at it, also introduce some scratch variables for paths and
placeholders to make the C++ literalsf for this test more concise.)
Since we haven't released v2 yet (2.32 has v1) we can just update this
in-place and avoid version churn.
Note that as a nice side effect of using the standard `Hash` JSON impl,
we don't neeed this `hashFormat` parameter anymore.
It turns out this code path is only used for unit tests (to ensure our
JSON formats are possible to parse by other code, elsewhere). No
user-facing functionality consumes this format.
Therefore, let's drop the old version parsing support.
We now have functional tests for these. The unit tests added negligible
value while imposing a much higher maintenance cost.
The maintenance cost is high:
- No automatic accept option
- They broke 5+ times during this session due to implementation changes (trace count, ordering)
- They require understanding ANSI escape codes, Uncolored() wrappers, trace reversal
- They test empty traces HintFmt("") from withTrace(pos, "") - pure implementation detail
- They're fragile: adding any trace anywhere breaks the exact count assertions
The additional value over functional tests is minimal:
- Functional tests already verify the error message
- Functional tests already show trace order and content (as users see it, helps review)
- Unit tests verify "exactly 3 traces, not 2 or 4" - but users don't count traces
- Unit tests verify empty traces exist - but users never see them
The white-box testing catches the wrong things:
- It catches "you added helpful context" as a failure
- It doesn't catch "the context is confusing" (which functional tests would show)
- It enforces implementation details that should be allowed to evolve
Show which element(s) are involved at each error point:
- When an element is missing the "key" attribute, show the element
- When an element is not an attribute set, show the element
- When comparing keys fails, show both elements being compared
- When calling operator fails, show which element was being processed
This provides concrete context using ValuePrinter with errorPrintOptions.
Note: errorPrintOptions uses maxDepth=10 by default, which may print
quite deeply nested structures in error messages. This could potentially
be overwhelming, but follows the existing default for error contexts.
The old string format is a holdover from the pre JSON days. It is not
friendly to users who need to get the information out of it.
Also introduce the sort of versioning we have for derivation for this
format too.
- Use canonical content address JSON format for floating content
addressed derivation outputs
This keeps it more consistent.
- Reorganize inputs into nested structure (`inputs.srcs` and
`inputs.drvs`)
This will allow for an easier to use, but less compact, alternative
where `srcs` is just a list of derived paths.
It also allows for other experiments for derivations with a different
input structure, as I suspect will be needed for secure build traces.
This was already dropped in `inputFromURL()`, but not in
`inputFromAttrs()`. Now it's done in `fixGitURL()`, which is used by
both.
In principle, `git+` shouldn't be used in the `url` attribute, since
we already know that it's a Git URL. But since it currently works, we
don't want to break it.
Fixes#14429.
Have one to that instead of one to `Derivation`. `DerivationBuilder`
doesn't need `inputDrvs`, so `BasicDerivation` suffices.
(In fact, it doesn't need `inputSrcs` either, but we don't yet hve a
type to exclude that.)
We were calling git with `--quiet` in order not to mess up Nix's
progress bar. However, `runProgram()` already suspends the progress
bar (since git may be interactive) so that's no longer an issue. So we
can just run with `--progress` instead.
Fix#14480
This method is not well-defined for arbitrary stores, which do not have
a notion of a "real path" -- it is only well-defined for local file
systems stores, which do have exactly that notion, and so it is moved to
that sub-interface instead.
Some call-sites had to be fixed up for this, but in all cases the
changes are positive. Using `getFSSourceAccessor` allows for more other
stores to work properly. `nix-channel` was straight-up wrong in the case
of redirected local stores. And the building logic with remote building
and a non-local store is also fixed, properly gating some deletions on
store type.
Co-authored-by: Robert Hensing <robert@roberthensing.nl>
The assumption that no unknown paths can be returned is incorrect. It
can happen if a derivation has outputs that are substitutable, but
that have references that cannot be substituted (i.e. an incomplete
closure in the binary cache). This can easily happen with
magic-nix-cache.
Previously, only shared memory segments were cleaned up.
This could lead to leaked message queues and semaphore sets when builds use System V IPC, exhausting kernel IPC limits over time.
This commit extends the cleanup to all three System V IPC types:
1. Shared memory segments
2. Message queues
3. Semaphores
Additionally, we stop removing IPC objects during iteration, as it could corrupt the kernel's iterator state and cause some objects to be skipped. The new implementation uses a two-pass approach where we list first and then remove them in a separate pass.
The IPC IDs are now extracted during iteration using actual system calls (shmget, msgget, semget) rather than being looked up later, ensuring the objects exist when we capture their IDs.
In Linux, IPC objects are automatically cleaned up when the IPC namespace is destroyed.
On Darwin, since there are no IPC namespaces, the IPC objects may sometimes persist after the build user's processes are killed.
This patch modifies the cleanup logic to use sysctl calls to identify and remove left over shm segments associated with the build user.
Fixes: #12548
For repos with a lot of non-linearity in the commit graph (like
Nixpkgs), this speeds up getting the revcount a lot, e.g. `nix flake
metadata /path/to/nixpkgs?rev=9dc7035bbee85ffc740d893e02cb64460f11989f` went
from 9.1s to 3.7s.
Warning:
```
[39/483] Generating src/kaitai-struct-checks/kaitai-generated-sources with a custom command
../src/kaitai-struct-checks/nar.ksy: /types/padded_str/seq/1/encoding:
warning: use canonical encoding name `ASCII` instead of `ascii` (see https://doc.kaitai.io/ksy_style_guide.html#encoding-name)
```
This will allow us to more accurately test dropping support for
dependent realisations, by separating the tests that should not change
from the tests that should.
I do that change in PR #14247, but even if for some reasons we don't end
up doing this soon, I think it is still good to separate the test data
this way so we have the option of doing that at some point.
Progress on #13405, which asks for an explicit characterisation of the
equivalence relation like the one given here.
Also progress on #11895, because we're using the term "build trace
entry" instead of "realisation".
Mention #9259, a future work item.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
It is better to avoid null termination for performance and memory
safety, wherever possible.
These are good cleanups extracted from the Pascal String work that we
can land by themselves first, shrinking the diff in that PR.
Co-Authored-By: Aspen Smith <root@gws.fyi>
Co-Authored-By: Sergei Zimmerman <sergei@zimmerman.foo>
Add three configuration settings to `S3BinaryCacheStoreConfig` to control
multipart upload behavior:
- `bool multipart-upload` (default `false`): Enable/disable multipart uploads
- `uint64_t multipart-chunk-size` (default 5 MiB): Size of each upload part
- `uint64_t multipart-threshold` (default 100 MiB): Minimum file size for multipart
The feature is disabled by default.
The rule of silence can be a little surprising. As a compromise to
changing the default behavior, this adds printing a success message in
verbose mode, where we don't really have a reason to be silent about
our success.
Refactor `ExprConcatStrings::eval` by inlining two only-called-once
closures into the call-site, so that the code is easier to reason about
locally (especially since the variables that were closed over were
mutated all over the place within this function).
Also use curly braces with each branch for consistency in the the
resulting code.
This is a pure refactor, but also arguably causes us to depend less on
the optimizer; now, we don't have to make sure that this closure is
inlined.
3a3c062982 introduced a buffer overflow for the
case when there are more than 65535 formal arguments. It is a perfectly reasonable
limitation, but we *must* not crash, corrupt memory or otherwise crash the process.
Add a test for the graceful behavior and switch to using an explicit uninitialized_copy_n
to further guard against buffer overflows.
Stop delegating to `HttpBinaryCacheStore::upsertFile` and instead
handle compression in the S3 store's `upsertFile` override, then call
our own `upload()` method. This separation is necessary for future
multipart upload support.
Introduce protected `upload` method overloads in `HttpBinaryCacheStore`
that handle the actual upload after compression has been applied. This
separates compression concerns (in `upsertFile`) from upload mechanics
(in `upload`).
Two overloads are provided:
1. `upload(path, RestartableSource &, sizeHint, mimeType, contentEncoding)`
2. `upload(path, CompressedSource &, mimeType)`
Introduce a `CompressedSource` class in libutil's `serialise.hh` that
compresses a `RestartableSource` and owns the compressed data. This is a
general-purpose utility that can be used anywhere compressed data needs
to be treated as a source.
We were getting this flex lexer warning during build:
```
../src/libexpr/lexer.l:333: warning, -s option given but default rule can be matched
```
The lexer uses `%option nodefault` but the `PATH_START` state only had
rules for specific patterns (`PATH_SEG` and `HPATH_START`) without a
catch-all rule to handle unexpected input.
Added a catch-all rule with `unreachable()`. This code path should never
be reached in normal operation since `PATH_START` is only entered after
matching `PATH_SEG` or `HPATH_START`, and we immediately rewind to
re-parse those same patterns. The catch-all exists solely to satisfy
flex's `%option nodefault` requirement.
Make uploads run in constant memory. Also change the callbacks to be
noexcept, since we really don't want to be unwinding the stack in the
curl thread. That will definitely corrupt that stack and make nix/curl
crash in very bad ways.
Fix a race condition where interrupting a download (via Ctrl-C) during a
retry attempt could cause a crash. When `enqueueItem()` throws because the
download thread is shutting down, the exception would propagate without
setting `done=true`, causing the `TransferItem` destructor to invoke the
callback a second time.
This triggered an assertion failure in `Callback::rethrow()` with:
`Assertion '!prev' failed` and the error message `cannot enqueue download
request because the download thread is shutting down`.
The fix catches the exception from `enqueueItem()` and calls `fail()` to
properly complete the transfer, ensuring the callback is invoked exactly
once.
Some zsh setups (including mine) do not load the
completion if `#compdef` is not on the first line.
So we move the `# shellcheck` comment to the
second line to avoid this issue.
This continues the work for formalizing our current JSON docs. Note that
in the process, a few bugs were caught:
- `closureSize` was repeated twice, forgot `closureDownloadSize`
- `file*` fields should be `download*`. They are in fact called that in
the line-oriented `.narinfo` file, but were renamed in the JSON
format.
We immediately use this in the JSON schemas for Derivation and Deriving
Path, but we cannot yet use it in Store Object Info because those paths
*do* include the store dir currently.
- Uses the more explicit `@ingroup` most of the time, to avoid problems
with nested groups, and to make group membership more explicit.
The division into headers is not great for documentation purposes,
so this helps.
- More attention for memory management details
- Various other improvements to doc comments
Per #7591, the `nix-store --gc --print-dead` command does not provide
any feedback about the amount of disk space that is used by dead store
paths. It looks like this has been the case since 7ab68961e (* Garbage
collector: added an option `--use-atime' to delete paths in...,
2008-09-17).
Update the nix-store documentation to remove the claim that this is
function that `nix-store --gc --print-dead` performs.
Implement `uploadPart()` for uploading individual parts in S3 multipart
uploads:
- Constructs URL with `?partNumber=N&uploadId=ID` query parameters
- Uploads chunk data with `application/octet-stream` mime type
- Extracts and returns `ETag` from response
Add concurrency group configuration to the CI workflow to automatically
cancel outdated runs when a PR receives new commits or is force-pushed.
This prevents wasting CI resources on superseded code.
This is a good default (the methods that allow for an arbitrary choice
of source accessor are generally preferable both to implement and to
use). And it also pays its way by allowing us to delete *both* the
`DummyStore` and `LocalStore` implementations.
Introduces `scanForReferencesDeep` to provide per-file granularity when
scanning for store path references, enabling better diagnostics for
cycle detection and `nix why-depends --precise`.
Implement `abortMultipartUpload()` for cleaning up incomplete multipart
uploads on error:
- Constructs URL with `?uploadId=ID` query parameter
- Issues `DELETE` request to abort the multipart upload
With #14314, in some places in the parser we started using C++ objects
directly rather than pointers. In those places lines like `$$ = $1` now
imply a copy when we don't need one. This commit changes those to `$$ =
std::move($1)` to avoid those copies.
Previously it used the `ThreadPool` default,
i.e. `std::thread::hardware_concurrency()`. But copying signatures is
not primarily CPU-bound so it makes more sense to use the
`http-connections` setting (since we're typically copying from/to a
binary cache).
The `showBytes()` function was redundant with `renderSize()` as the
latter automatically selects the appropriate unit (KiB, MiB, GiB, etc.)
based on the value, whereas `showBytes()` always formatted as MiB
regardless of size.
Co-authored-by: Bernardo Meurer Costa <beme@anthropic.com>
Instead of iterating over the newly built bindings we can
do a cheaper set_intersection to count duplicates or fall back
to a per-element binary search over the "base" bindings.
This speeds up `hello` evaluation by around 10ms (0.196s -> 0.187s) and
`nixos.closures.ec2.x86_64-linux` by 140ms (2.744s -> 2.609s).
This addresses a somewhat steep performance regression from 82315c3807
that reduced memory requirements of attribute set merges. With this patch
we get back around to 2.31 level of eval performance while keeping the memory
usage optimization.
Also document the optimization a bit more.
In particular
- Remove `get`, it is redundant with `valueAt` and the `get` in
`util.hh`.
- Remove `nullableValueAt`. It is morally just the function composition
`getNullable . valueAt`, not an orthogonal combinator like the others.
- `optionalValueAt` return a pointer, not `std::optional`. This also
expresses optionality, but without creating a needless copy. This
brings it in line with the other combinators which also return
references.
- Delete `valueAt` and `optionalValueAt` taking the map by value, as we
did for `get` in 408c09a120, which
prevents bugs / unnecessary copies.
`adl_serializer<DerivationOptions::OutputChecks>::from_json` was the one
use of `getNullable`. I give it a little static function for the
ultimate creation of a `std::optional` it does need to do (after
switching it to using `getNullable . valueAt`. That could go in
`json-utils.hh` eventually, but I didn't bother for now since only one
things needs it.
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
S3 buckets support object versioning to prevent unexpected changes,
but Nix previously lacked the ability to fetch specific versions of
S3 objects. This adds support for a `versionId` query parameter in S3
URLs, enabling users to pin to specific object versions:
```
s3://bucket/key?region=us-east-1&versionId=abc123
```
This has already been implemented in 1e709554d5
as a side-effect of mounting the accessors in storeFS. Let's test this so it
doesn't regress.
(cherry-picked from https://github.com/NixOS/nix/pull/12915)
Move HttpBinaryCacheStore class from .cc file to header to enable
inheritance by S3BinaryCacheStore. Create S3BinaryCacheStore class that
overrides upsertFile() to implement multipart upload logic.
Add a sizeHint parameter to BinaryCacheStore::upsertFile() to enable
size-based upload decisions in implementations. This lays the groundwork
for reintroducing S3 multipart upload support.
Add support for HTTP DELETE requests to FileTransfer infrastructure:
This enables S3 multipart upload abort functionality via DELETE requests
to S3 endpoints.
This reverts commit 90d1ff4805.
The initial issue with EPIPE was solved in 9f680874c5.
Now this patch does move bad than good by eating up boost::io::format_error that are
bugs.
addToStore(): Don't parse the NAR
* StringSource: Implement skip()
This is slightly faster than doing a read() into a buffer just to
discard the data.
* LocalStore::addToStore(): Skip unnecessary NARs rather than parsing them
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
A few changes had cropped up with `_NIX_TEST_ACCEPT=1`:
1. Blake hashing test JSON had a different indentation
2. Store URI had improper non-quoted spaces
(1) was is just fixed, as we trust nlohmann JSON to parse JSON
correctly, regardless of whitespace.
For (2), the existing URL was made a read-only test, since we very much
wish to continue parsing such invalid URLs directly. And then the
original read/write test was updated to properly percent-encode the
space, as the normal form should be.
Since 2.32, nix now needs boost 1.87 or later to build,
due to using unordered::concurrent_flat_map try_emplace_and_cvisit
../src/libexpr/eval.cc: In member function ‘void nix::EvalState::evalFile(const nix::SourcePath&, nix::Value&, bool)’:
../src/libexpr/eval.cc:1096:20: error: ‘class boost::unordered::concurrent_flat_map<nix::SourcePath, nix::Value*, std::hash<nix::SourcePath>, std::equal_to<nix::SourcePath>, traceable_allocator<std::pair<const nix::SourcePath, nix::Value*> > >’ has no member named ‘try_emplace_and_cvisit’; did you mean ‘try_emplace_or_cvisit’?
1096 | fileEvalCache->try_emplace_and_cvisit(
| ^~~~~~~~~~~~~~~~~~~~~~
| try_emplace_or_cvisit
See 834580b539
The s3:ListBucket permission is required for read operations on S3
binary caches, not just for writes. Without this permission, users get
"Access Denied" errors when running nix-build.
Extract the path-based compression method determination logic into a
protected method that returns std::optional<std::string>. This allows
subclasses to reuse the logic and makes the semantics clearer (nullopt
means no compression, not empty string).
This prepares for S3BinaryCacheStore to apply the same compression
rules when implementing multipart uploads.
Fix POST requests with data to use the correct curl option for specifying
body size. Previously used CURLOPT_INFILESIZE_LARGE for both POST and PUT,
but POST requires CURLOPT_POSTFIELDSIZE_LARGE.
This caused POST request bodies to not be sent correctly, manifesting as
S3 multipart CompleteMultipartUpload requests failing with "You must
specify at least one part" even though the XML body contained valid parts.
When Nix's SQLite narinfo cache indicates a NAR exists, but the NAR
has been garbage collected from the binary cache, Nix displays error
messages even though the operation succeeds via fallback. This is
misleading because the cached narinfo is simply outdated.
This changes SubstituteGone exceptions to produce warnings instead of
errors, accurately reflecting that this is an expected cache coherency
issue, not an actual failure.
Fixes#11411🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
At least one user has probably used `file+git://` when they mean `git+file://`, maybe thinking of it as "a file-based git repository". This adds a specific error message to hint at the correct URL scheme format and may save some users from resorting to `path:///` and copying an entire repo.
Adds a comprehensive test to verify that `nix-prefetch-url` correctly
handles S3 URLs with query parameters (e.g., custom endpoints and regions).
Previously, nix-prefetch-url would fail with "invalid store
path" errors when given S3 URLs with query parameters like
`?endpoint=http://server:9000®ion=eu-west-1`, because it incorrectly
extracted the filename from the query parameters instead of the path.
Previously, `prefetchFile()` used `baseNameOf()` directly on the URL string
to extract the filename. This caused issues with URLs containing query
parameters that include slashes, such as S3 URLs with custom endpoints:
```
s3://bucket/file.txt?endpoint=http://server:9000
```
The `baseNameOf()` function naively searches for the rightmost `/` in the
entire string, which would find the `/` in `http://server:9000` and extract
`server:9000®ion=...` as the filename. This resulted in invalid store
path names containing illegal characters like `:`.
This commit fixes the issue by:
1. Adding a `VerbatimURL::lastPathSegment()` method that extracts the last
non-empty path segment from a URL, using `pathSegments(true)` to filter
empty segments
2. Changing `prefetchFile()` to accept `const VerbatimURL &` and use the new
`lastPathSegment()` method instead of manual path parsing
3. Adding early validation with `checkName()` to fail quickly on invalid
filenames
4. Maintains backward compatibility by falling back to `baseNameOf()` for
unparsable `VerbatimURL`s
Old code would do very much incorrect reentrancy crimes (trying to do an
erase inside the emplace callback). This would fail miserably with an assertion
in Boost:
terminating due to unexpected unrecoverable internal error: Assertion '(!find(px))&&("reentrancy not allowed")' failed in boost::unordered::detail::foa::entry_trace::entry_trace(const void *) at include/boost/unordered/detail/foa/reentrancy_check.hpp:33
This is trivially reproduced by using any S3 URL with a non-empty profile:
nix-prefetch-url "s3://happy/crash?profile=default"
The previous message was vague about what "deprecated" meant and why
unlocked inputs with NAR hashes "may not be reproducible". It also
used "verifiable" which was confusing.
The new message makes it clear that the NAR hash provides verification
(is checked by NAR hash) and explicitly states the failure modes:
garbage collection and sharing.
Add `test_public_bucket_operations` to validate that store operations
work correctly on public S3 buckets without requiring credentials.
Tests nix store info and nix copy operations.
Add cleanup of client store in the finally block of setup_s3 decorator.
Uses `nix store delete --ignore-liveness` to properly handle GC roots
and only attempts deletion if the path exists.
This slightly improves the logs situation by including the region/profile/endpoint
in the logs when S3 store references get printed. Instead of:
copying path '/nix/store/lxnp9cs4cfh2g9r2bs4z7gwwz9kdj2r9-test-package-c' to 's3://bucketname'...
This now includes:
copying path '/nix/store/lxnp9cs4cfh2g9r2bs4z7gwwz9kdj2r9-test-package-c' to 's3://bucketname?endpoint=http://server:9000®ion=eu-west-1'...
Nix attempts to set the stack size to 64 MB during initialization, which is
required for the repl tests to run successfully. Skip the tests on systems
where the hard stack limit is less than this value rather than failing.
We now unconditionally compile support for s3:// URLs and stores
without authentication. The whole curl version check can be greatly
simplified by the previous commit, which bumps the minimum required curl
version.
This version has been released a long time ago in 2021 and it's doubtful
that anybody actually uses it still, since it's full of vulnerabilities [^]
[^]: https://curl.se/docs/vuln-7.75.0.html
I realized that we can actually do this thing, even though it is not
what nlohmann expects at all, because the extra parameter has a default
argument so nlohmann doesn't need to care. Sneaky!
Since 3c610df550 this resulted in `getting status of`
errors on paths inside the chroot if a path was already valid. Careful inspection
of the logic shows that if buildMode != bmCheck actualPath gets reassigned to
store.toRealPath(finalDestPath). The only branch that cares about actualPath is
the buildMode == bmCheck case, which doesn't lead to optimisePath anyway.
Instead of the cryptic:
> error: Failed to resolve AWS credentials: error code 6153`
We now get more legible:
> error: AWS authentication error: 'Valid credentials could not be sourced by the IMDS provider' (6153)
This makes it so we don't need to rely on global variables and hacky destructors to
clean up another global variable. Just putting it in the correct order in the class
is more than enough.
This partially reverts commit 5e46df973f,
partially reversing changes made to
8c789db05b.
We do this because Hydra, while using the newer version of the protocol,
still uses this command, even though Nix (as a client) doesn't use it.
On that basis, we don't want to remove it (or consider it only part of
the older versions of the protocol) until Hydra no longer uses the
Legacy SSH Protocol.
This is necessary to fix nix-everything-llvm.
The problem here is that nix-cli is taken from the previous
stage that is built with libstdc++, but this derivation builds
plugins with libc++ and the plugin load fails miserably.
Realisations are conceptually key-value pairs, mapping `DrvOutputs` (the
key) to information about that derivation output.
This separate the value type, which will be useful in maps, etc., where
we don't want to denormalize by including the key twice.
This matches similar changes for existing types:
| keyed | unkeyed |
|--------------------|------------------------|
| `ValidPathInfo` | `UnkeyedValidPathInfo` |
| `KeyedBuildResult` | `BuildResult` |
| `Realisation` | `UnkeyedRealisation` |
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Turns out there's a much better API for this that doesn't have the
footguns of the previous method.
isLegalRefName is somewhat of a misnomer, since it's mainly used to
validate user inputs that can be either references, branch names,
psedorefs or tags.
The macro now accurately reflects its purpose: gating only AWS
authentication code, not all S3 functionality. S3 URL parsing, store
configuration, and public bucket access work regardless of this flag.
This rename clarifies that:
- S3 support is always available (URL parsing, store registration)
- Only AWS credential resolution requires the flag
- The flag controls AWS CRT SDK dependency, not S3 protocol support
Move S3 URL parsing, store configuration, and public bucket support
outside of NIX_WITH_S3_SUPPORT guards. Only AWS credential resolution
remains gated, allowing builds with withAWS = false to:
- Parse s3:// URLs
- Register S3 store types
- Access public S3 buckets (via HTTPS conversion)
- Use S3-compatible services without authentication
The setupForS3() function now always performs URL conversion, with
authentication code conditionally compiled based on NIX_WITH_S3_SUPPORT.
The aws-creds.cc file (only code using AWS CRT SDK) is now conditionally
compiled by meson.
This commit replaces the AWS C++ SDK with a lighter curl-based approach
for S3 binary cache operations.
- Removed dependency on the heavy aws-cpp-sdk-s3 and aws-cpp-sdk-transfer
- Added lightweight aws-crt-cpp for credential resolution only
- Leverages curl's native AWS SigV4 authentication (requires curl >= 7.75.0)
- S3BinaryCacheStore now delegates to HttpBinaryCacheStore
- Function s3ToHttpsUrl converts ParsedS3URL to ParsedURL
- Multipart uploads are no longer supported (may be reimplemented later)
- Build now requires curl >= 7.75.0 for AWS SigV4 support
Fixes: #13084, #12671, #11748, #12403, #5947
This forces the code to go through proper abstractions instead of the raw filesystem
API.
This issue is evident from this reproducer:
nix eval --expr 'builtins.fetchurl { url = "https://example.com"; sha256 = ""; }' --json --eval-store "dummy://?read-only=false"
error:
… while calling the 'fetchurl' builtin
at «string»:1:1:
1| builtins.fetchurl { url = "https://example.com"; sha256 = ""; }
| ^
error: opening file '/nix/store/r4f87yrl98f2m6v9z8ai2rbg4qwlcakq-example.com': No such file or directory
We only care about the accessor for a single store object anyway, but
the validity gets ignored. Also `pathExists(store.printStorePath(path))`
is definitely incorrect since it confuses the logical location vs physical
location in case of a chroot store.
This is a simple wrapper around getFSAccessor that throws an InvalidPath
error. This simplifies usage in callsites that only care about getting
a non-null accessor.
Wrap fmt() calls in lambdas to defer string formatting until the
feature check fails. This avoids unnecessary string formatting in
the common case where the feature is enabled.
Addresses performance concern raised by xokdvium in PR review.
This, alongside the other invariants of the CanonPath is important
to uphold. std::filesystem happily crashes on NUL bytes in the constructor,
as we've seen with `path:%00` prior to c436b7a32a.
Best to stay clear of NUL bytes when we're talking about syscalls, especially
on Unix where strings are null terminated.
Very nice to have if we decide to switch over to pascal-style strings.
The refactor in the last commit fixed the bug it was supposed to fix,
but introduced a new bug in that sometimes we tried to write a resolved
derivation to a store before all its `inputSrcs` were in that store.
The solution is to defer writing the derivation until inside
`DerivationBuildingGoal`, just before we do an actual build. At this
point, we are sure that all inputs in are the store.
This does have the side effect of meaning we don't write down the
resolved derivation in the substituting case, only the building case,
but I think that is actually fine. The store that actually does the
building should make a record of what it built by storing the resolved
derivation. Other stores that just substitute from that store don't
necessary want that derivation however. They can trust the substituter
to keep the record around, or baring that, they can attempt to re
resolve everything, if they need to be audited.
(cherry picked from commit c97b050a6c)
Resolve the derivation before creating a building goal, in a context
where we know what output(s) we want. That way we have a chance just to
download the outputs we want.
Fix#13247
(cherry picked from commit 39f6fd9b46)
Store the reason string as a field in the exception class rather than
only embedding it in the error message. This supports better structured
error handling and future JSON error reporting.
Suggested by Ericson2314 in PR review.
std::regex is a really bad tool for parsing things, since
it tends to overflow the stack pretty badly. See the build failure
under ASan in [^].
[^]: https://hydra.nixos.org/build/310077167/nixlog/5
CURL is not very strict about validation of URLs passed to it. We
should reflect this in our handling of URLs that we get from the user
in <nix/fetchurl.nix> or builtins.fetchurl. ValidURL was an attempt to
rectify this, but it turned out to be too strict. The only good way to
resolve this is to pass (in some cases) the user-provided string verbatim
to CURL. Other usages in libfetchers still benefit from using structured
ParsedURL and validation though.
nix store prefetch-file --name foo 'https://cdn.skypack.dev/big.js@^5.2.2'
error: 'https://cdn.skypack.dev/big.js@^5.2.2' is not a valid URL: leftover
Add support for pre-resolving AWS credentials in the parent process
before forking for builtin:fetchurl. This avoids recreating credential
providers in the forked child process.
The previous implementation had a check-then-create race condition where
multiple threads could simultaneously:
1. Check the cache and find no provider (line 122)
2. Create their own providers (lines 126-145)
3. Insert into cache (line 161)
This resulted in multiple credential providers being created when
downloading multiple packages in parallel, as each .narinfo download
would trigger provider creation on its own thread.
Fix by using boost::concurrent_flat_map's try_emplace_and_cvisit, which
provides atomic get-or-create semantics:
- f1 callback: Called atomically during insertion, creates the provider
- f2 callback: Called if key exists, returns cached provider
- Other threads are blocked during f1, so no nullptr is ever visible
This will reduce the load on hydra. It doesn't make sense to
build 2 slightly different variations where the difference
is only in the nix-perl-bindings and additional sanitizers.
There's some unfortunate ODR violations that get dianosed with GCC but not Clang
for static inline constexpr variables defined inside the class body:
template<typename T>
struct static_const
{
static JSON_INLINE_VARIABLE constexpr T value{};
};
This can be ignored pretty much. There is the same problem for std::piecewise_construct:
http://lists.boost.org/Archives/boost/2007/06/123353.php
==2455704==ERROR: AddressSanitizer: odr-violation (0x7efddc460e20):
[1] size=1 'value' /nix/store/235hvgzcbl06fxy53515q8sr6lljvf68-nlohmann_json-3.11.3/include/nlohmann/detail/meta/cpp_future.hpp:156:45 in /nix/store/pkmljfq97a83dbanr0n64zbm8cyhna33-nix-store-2.33.0pre/lib/libnixstore.so.2.33.0
[2] size=1 'value' /nix/store/235hvgzcbl06fxy53515q8sr6lljvf68-nlohmann_json-3.11.3/include/nlohmann/detail/meta/cpp_future.hpp:156:45 in /nix/store/gbjpkjj0g8vk20fzlyrwj491gwp6g1qw-nix-util-2.33.0pre/lib/libnixutil.so.2.33.0
Instead of specifying env variables all the time
we can instead embed the __asan_default_options symbol
in all executables / shared objects. This reduces code
duplication.
This change overrides __assert_fail on glibc/musl
to instead call std::terminate that we have a custom
handler for. This ensures that we have more context
to diagnose issues encountered by users in the wild.
This commit adds two key fixes to http-binary-cache-store.cc to
properly support the new curl-based S3 implementation:
1. **Consistent cache key handling**: Use `getReference().render(withParams=false)`
for disk cache keys instead of `cacheUri.to_string()`. This ensures cache
keys are consistent with the S3 implementation and don't include query
parameters, which matches the behavior expected by Store::queryPathInfo()
lookups.
2. **S3 query parameter preservation**: When generating file transfer requests
for S3 URLs, preserve query parameters from the base URL (region, endpoint,
etc.) when the relative path doesn't have its own query parameters. This
ensures S3-specific configuration is propagated to all requests.
I want to separate "policy" from "mechanism".
Now the logic to decide how to build (a policy choice, though with some
hard constraints) is all in derivation building goal, and all in the
same spot. build hook, external builder, or local builder --- the choice
between all three is made in the same spot --- pure policy.
Now, if you want to use the external deriation builder, you simply
provide the `ExternalBuilder` you wish to use, and there is no
additional checking --- pure mechanism. It is the responsibility of the
caller to choose an external builder that works for the derivation in
question.
Also, `checkSystem()` was the only thing throwing `BuildError` from
`startBuilder`. Now that that is gone, we can now remove the
`try...catch` around that.
Add a new S3BinaryCacheStore implementation that inherits from
HttpBinaryCacheStore.
The implementation is activated with NIX_WITH_CURL_S3, keeping the
existing NIX_WITH_S3_SUPPORT (AWS SDK) implementation unchanged.
This code had several issues:
1. Not going through the SourceAccessor means that we can only work
with physical paths.
2. It did not actually check that the file exists. (std::ifstream does not check
it by default).
Most of the eval cache logic is flake-independent and libexpr,
but the loading part is not.
`nix-flake` is the right component for this, as the eval cache
isn't exactly specific to the command line.
we have now merge queues for maintainance branches. We still build it
for master to have our installer beeing updated. In future this part
could go in new workflow instead.
This barfed with
error: [json.exception.type_error.302] type must be string, but is array
on `nix build github:malt3/bazel-env#bazel-env` because it has a `exportReferencesGraph` with a value like `["string",...["string"]]`.
Add a `UsernameAuth` struct and optional `usernameAuth` field to
`FileTransferRequest` to support programmatic username/password
authentication.
This uses curl's `CURLOPT_USERNAME`/`CURLOPT_PASSWORD` options, which
works with multiple protocols (HTTP, FTP, etc.) and is not specific to
any particular authentication scheme.
The primary motivation is to enable S3 authentication refactoring where
AWS credentials (access key ID and secret access key) can be passed
through this general-purpose mechanism, reducing the amount of
S3-specific code behind `#if NIX_WITH_CURL_S3` guards.
This breaks gdb pretty-printers inserted into .debug_gdb_scripts section,
because it implies --compress-debug-sections=zlib, -Wa,--compress-debug-sections.
This is very unfortunate, because then gdb can't use pretty printers for
Boost.Unordered (which are very useful, since boost::unoredred_flat_map is
impossible to debug). This seems perfectly fine to disable in the dev-shell for
the time being.
See [1-3] for further references.
With this change I'm able to use boost's pretty-printers out-of-the box:
```
p *importResolutionCache
$2 = boost::concurrent_flat_map with 1 elements = {[{accessor = {p = std::shared_ptr<nix::SourceAccessor> (use count 5, weak count 1) = {
get() = 0x555555d830a8}}, path = {static root = {static root = <same as static member of an already seen type>, path = "/"},
path = "/derivation-internal.nix"}}] = {accessor = {p = std::shared_ptr<nix::SourceAccessor> (use count 5, weak count 1) = {
get() = 0x555555d830a8}}, path = {static root = {static root = <same as static member of an already seen type>, path = "/"},
path = "/derivation-internal.nix"}}}
```
When combined with a simple `add-auto-load-safe-path ~/code` in .gdbinit
[1]: https://gerrit.lix.systems/c/lix/+/3880
[2]: https://git.lix.systems/lix-project/lix/issues/1003
[3]: https://sourceware.org/pipermail/gdb-patches/2025-October/221398.html
Until these repos are potentially merged, this is good for dogfooding
alongside the experimental installer. It also uses the more official
`artifacts.nixos.org` endpoint to install stable releases now
More immediately though, we need a patch for the experimental installer
to really work in CI at all, and that hasn't landed in a tag yet. So,
this lets us use it right from `main`!
synopsis: Channel URLs migrated to channels.nixos.org subdomain
prs: [14518]
issues: [14517]
---
Channel URLs have been updated from `https://nixos.org/channels/` to `https://channels.nixos.org/` throughout Nix.
The subdomain provides better reliability with IPv6 support and improved CDN distribution. The old domain apex (`nixos.org/channels/`) currently redirects to the new location but may be deprecated in the future.
synopsis: "S3 binary cache stores now support storage class configuration"
prs: [14464]
issues: [7015]
---
S3 binary cache stores now support configuring the storage class for uploaded objects via the `storage-class` parameter. This allows users to optimize costs by selecting appropriate storage tiers based on access patterns.
The storage class applies to both regular uploads and multipart uploads. When not specified, objects use the bucket's default storage class.
See the [S3 storage classes documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html) for available storage classes and their characteristics.
Channels are a mechanism for referencing remote Nix expressions and conveniently retrieving their latest version.
The moving parts of channels are:
- The official channels listed at <https://nixos.org/channels>
- The official channels listed at <https://channels.nixos.org>
- The user-specific list of [subscribed channels](#subscribed-channels)
- The [downloaded channel contents](#channels)
- The [Nix expression search path](@docroot@/command-ref/conf-file.md#conf-nix-path), set with the [`-I` option](#opt-i) or the [`NIX_PATH` environment variable](#env-NIX_PATH)
- The [Nix expression search path](@docroot@/command-ref/conf-file.md#conf-nix-path), set with the [`-I` option](#opt-I) or the [`NIX_PATH` environment variable](#env-NIX_PATH)
> **Note**
>
@@ -88,9 +88,9 @@ This command has the following operations:
Subscribe to the Nixpkgs channel and run `hello` from the GNU Hello package:
To build all dependencies and start a shell in which all environment variables are set up so that those dependencies can be found:
@@ -256,7 +256,7 @@ You can use any of the other supported environments in place of `nix-cli-ccacheS
## Editor integration
The `clangd` LSP server is installed by default on the `clang`-based `devShell`s.
See [supported compilation environments](#compilation-environments) and instructions how to set up a shell [with flakes](#nix-with-flakes) or in [classic Nix](#classic-nix).
See [supported compilation environments](#compilation-environments) and instructions how to set up a shell [with flakes](#building-nix-with-flakes) or in [classic Nix](#building-nix).
To use the LSP with your editor, you will want a `compile_commands.json` file telling `clangd` how we are compiling the code.
Meson's configure always produces this inside the build directory.
[An experimental feature](#@docroot@/development/experimental-features.md#xp-feature-impure-derivations) that allows derivations to be explicitly marked as impure,
[An experimental feature](@docroot@/development/experimental-features.md#xp-feature-impure-derivations) that allows derivations to be explicitly marked as impure,
so that they are always rebuilt, and their outputs not reused by subsequent calls to realise them.
- [Nix database]{#gloss-nix-database}
@@ -279,7 +279,7 @@
See [References](@docroot@/store/store-object.md#references) for details.
- [referrer]{#gloss-reference}
- [referrer]{#gloss-referrer}
A reversed edge from one [store object] to another.
@@ -367,8 +367,8 @@
Nix represents files as [file system objects][file system object], and how they belong together is encoded as [references][reference] between [store objects][store object] that contain these file system objects.
The [Nix language] allows denoting packages in terms of [attribute sets](@docroot@/language/types.md#attribute-set) containing:
- attributes that refer to the files of a package, typically in the form of [derivation outputs](#output),
The [Nix language] allows denoting packages in terms of [attribute sets](@docroot@/language/types.md#type-attrs) containing:
- attributes that refer to the files of a package, typically in the form of [derivation outputs](#gloss-output),
- attributes with metadata, such as information about how the package is supposed to be used.
The exact shape of these attribute sets is up to convention.
@@ -333,7 +333,7 @@ Here is more information on the `output*` attributes, and what values they may b
`outputHashAlgo` can only be `null` when `outputHash` follows the SRI format, because in that case the choice of hash algorithm is determined by `outputHash`.
When defining an [attribute set](./types.md#attribute-set) or in a [let-expression](#let-expressions) it is often convenient to copy variables from the surrounding lexical scope (e.g., when you want to propagate attributes).
When defining an [attribute set](./types.md#type-attrs) or in a [let-expression](#let-expressions) it is often convenient to copy variables from the surrounding lexical scope (e.g., when you want to propagate attributes).
This can be shortened using the `inherit` keyword.
For historical reasons, [store derivations][store derivation] are stored on-disk in [ATerm](https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html) format.
For historical reasons, [store derivations][store derivation] are stored on-disk in "Annotated Term" (ATerm) format
This is used when calculating the store paths of the derivation's outputs.
*`version`:
Must be `3`.
This is a guard that allows us to continue evolving this format.
The choice of `3` is fairly arbitrary, but corresponds to this informal version:
- Version 0: A-Term format
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from A-Term format.
- Version 2: Separate `method` and `hashAlgo` fields in output specs
- Verison 3: Drop store dir from store paths, just include base name.
Note that while this format is experimental, the maintenance of versions is best-effort, and not promised to identify every change.
*`outputs`:
Information about the output paths of the derivation.
This is a JSON object with one member per output, where the key is the output name and the value is a JSON object with these fields:
*`path`:
The output path, if it is known in advanced.
Otherwise, `null`.
*`method`:
For an output which will be [content addressed], a string representing the [method](@docroot@/store/store-object/content-address.md) of content addressing that is chosen.
This schema describes the JSON representation of Nix's `BuildResult` type, which represents the result of building a derivation or substituting store paths.
Build results can represent either successful builds (with built outputs) or various types of failures.
oneOf:
- "$ref": "#/$defs/success"
- "$ref": "#/$defs/failure"
type:object
required:
- success
- status
properties:
timesBuilt:
type:integer
minimum:0
title:Times built
description:|
How many times this build was performed.
startTime:
type:integer
minimum:0
title:Start time
description:|
The start time of the build (or one of the rounds, if it was repeated), as a Unix timestamp.
stopTime:
type:integer
minimum:0
title:Stop time
description:|
The stop time of the build (or one of the rounds, if it was repeated), as a Unix timestamp.
cpuUser:
type:integer
minimum:0
title:User CPU time
description:|
User CPU time the build took, in microseconds.
cpuSystem:
type:integer
minimum:0
title:System CPU time
description:|
System CPU time the build took, in microseconds.
"$defs":
success:
type:object
title:Successful Build Result
description:|
Represents a successful build with built outputs.
required:
- success
- status
- builtOutputs
properties:
success:
const:true
title:Success indicator
description:|
Always true for successful build results.
status:
type:string
title:Success status
description:|
Status string for successful builds.
enum:
- "Built"
- "Substituted"
- "AlreadyValid"
- "ResolvesToAlreadyValid"
builtOutputs:
type:object
title:Built outputs
description:|
A mapping from output names to their build trace entries.
additionalProperties:
"$ref": "build-trace-entry-v1.yaml"
failure:
type:object
title:Failed Build Result
description:|
Represents a failed build with error information.
required:
- success
- status
- errorMsg
properties:
success:
const:false
title:Success indicator
description:|
Always false for failed build results.
status:
type:string
title:Failure status
description:|
Status string for failed builds.
enum:
- "PermanentFailure"
- "InputRejected"
- "OutputRejected"
- "TransientFailure"
- "CachedFailure"
- "TimedOut"
- "MiscFailure"
- "DependencyFailed"
- "LogLimitExceeded"
- "NotDeterministic"
- "NoSubstituters"
- "HashMismatch"
errorMsg:
type:string
title:Error message
description:|
Information about the error if the build failed.
isNonDeterministic:
type:boolean
title:Non-deterministic flag
description:|
If timesBuilt > 1, whether some builds did not produce the same result.
Note that 'isNonDeterministic = false' does not mean the build is deterministic,
just that we don't have evidence of non-determinism.
The path to the store object that resulted from building this derivation for the given output name.
dependentRealisations:
type:object
title:Underlying Base Build Trace
description:|
This is for [*derived*](@docroot@/store/build-trace.md#derived) build trace entries to ensure coherence.
Keys are derivation output IDs (same format as the main `id` field).
Values are the store paths that those dependencies resolved to.
As described in the linked section on derived build trace traces, derived build trace entries must be kept in addition and not instead of the underlying base build entries.
This is the set of base build trace entries that this derived build trace is derived from.
(The set is also a map since this miniature base build trace must be coherent, mapping each key to a single value.)
patternProperties:
"^sha256:[0-9a-f]{64}![a-zA-Z_][a-zA-Z0-9_-]*$":
$ref:"store-path-v1.yaml"
title:Dependent Store Path
description:Store path that this dependency resolved to during the build
additionalProperties:false
signatures:
type:array
title:Build Signatures
description:|
A set of cryptographic signatures attesting to the authenticity of this build trace entry.
This schema describes the JSON representation of Nix's `ContentAddress` type, which conveys information about [content-addressing store objects](@docroot@/store/store-object/content-address.md).
> **Note**
>
> For current methods of content addressing, this data type is a bit suspicious, because it is neither simply a content address of a file system object (the `method` is richer), nor simply a content address of a store object (the `hash` doesn't account for the references).
> It should thus only be used in contexts where the references are also known / otherwise made tamper-resistant.
<!--
TODO currently `ContentAddress` is used in both of these, and so same rationale applies, but actually in both cases the JSON is currently ad-hoc.
That will be fixed, and as each is fixed, the example (along with a more precise link to the field in question) should be become part of the above note, so what is is saying is more clear.
> For example:
> - Fixed outputs of derivations are not allowed to have any references, so an empty reference set is statically known by assumption.
> - [Store object info](./store-object-info.md) includes the set of references along side the (optional) content address.
> This data type is thus safely used in both of these contexts.
-->
type:object
properties:
method:
"$ref": "#/$defs/method"
hash:
title:Content Address
description:|
This would be the content-address itself.
For all current methods, this is just a content address of the file system object of the store object, [as described in the store chapter](@docroot@/store/file-system-object/content-address.md), and not of the store object as a whole.
In particular, the references of the store object are *not* taken into account with this hash (and currently-supported methods).
"$ref": "./hash-v1.yaml"
required:
- method
- hash
additionalProperties:false
"$defs":
method:
type:string
enum:[flat, nar, text, git]
title:Content-Addressing Method
description:|
A string representing the [method](@docroot@/store/store-object/content-address.md) of content addressing that is chosen.
Valid method strings are:
- [`flat`](@docroot@/store/store-object/content-address.md#method-flat) (provided the contents are a single file)
[Structured Attributes](@docroot@/store/derivation/index.md#structured-attrs), only defined if the derivation contains them.
Structured attributes are JSON, and thus embedded as-is.
type:object
additionalProperties:true
"$defs":
output:
overall:
title:Derivation Output
description:|
A single output of a derivation, with different variants for different output types.
oneOf:
- "$ref": "#/$defs/output/inputAddressed"
- "$ref": "#/$defs/output/caFixed"
- "$ref": "#/$defs/output/caFloating"
- "$ref": "#/$defs/output/deferred"
- "$ref": "#/$defs/output/impure"
inputAddressed:
title:Input-Addressed Output
description:|
The traditional non-fixed-output derivation type.
The output path is determined from the derivation itself.
See [Input-addressing derivation outputs](@docroot@/store/derivation/outputs/input-address.md) for more details.
type:object
required:
- path
properties:
path:
$ref:"store-path-v1.yaml"
title:Output path
description:|
The output path determined from the derivation itself.
additionalProperties:false
caFixed:
title:Fixed Content-Addressed Output
description:|
The output is content-addressed, and the content-address is fixed in advance.
See [Fixed-output content-addressing](@docroot@/store/derivation/outputs/content-address.md#fixed) for more details.
"$ref": "./content-address-v1.yaml"
required:
- method
- hash
properties:
method:
description:|
Method of content addressing used for this output.
hash:
title:Expected hash value
description:|
The expected content hash.
additionalProperties:false
caFloating:
title:Floating Content-Addressed Output
description:|
Floating-output derivations, whose outputs are content
addressed, but not fixed, and so the output paths are dynamically calculated from
whatever the output ends up being.
See [Floating Content-Addressing](@docroot@/store/derivation/outputs/content-address.md#floating) for more details.
type:object
required:
- method
- hashAlgo
properties:
method:
"$ref": "./content-address-v1.yaml#/$defs/method"
description:|
Method of content addressing used for this output.
hashAlgo:
title:Hash algorithm
"$ref": "./hash-v1.yaml#/$defs/algorithm"
description:|
What hash algorithm to use for the given method of content-addressing.
additionalProperties:false
deferred:
title:Deferred Output
description:|
Input-addressed output which depends on a (CA) derivation whose outputs (and thus their content-address
are not yet known.
type:object
properties:{}
additionalProperties:false
impure:
title:Impure Output
description:|
Impure output which is just like a floating content-addressed output, but this derivation runs without sandboxing.
As such, we don't record it in the build trace, under the assumption that if we need it again, we should rebuild it, as it might produce something different.
required:
- impure
- method
- hashAlgo
properties:
impure:
const:true
method:
"$ref": "./content-address-v1.yaml#/$defs/method"
description:|
How the file system objects will be serialized for hashing.
hashAlgo:
title:Hash algorithm
"$ref": "./hash-v1.yaml#/$defs/algorithm"
description:|
How the serialization will be hashed.
additionalProperties:false
outputName:
type:string
title:Output name
description:Name of the derivation output to depend on
outputNames:
type:array
title:Output Names
description:Set of names of derivation outputs to depend on
A cryptographic hash value used throughout Nix for content addressing and integrity verification.
This schema describes the JSON representation of Nix's `Hash` type.
type:object
properties:
algorithm:
"$ref": "#/$defs/algorithm"
format:
type:string
enum:
- base64
- nix32
- base16
- sri
title:Hash format
description:|
The encoding format of the hash value.
- `base64` uses standard Base64 encoding [RFC 4648, section 4](https://datatracker.ietf.org/doc/html/rfc4648#section-4)
- `nix32` is Nix-specific base-32 encoding
- `base16` is lowercase hexadecimal
- `sri` is the [Subresource Integrity format](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity).
hash:
type:string
title:Hash
description:|
The encoded hash value, itself.
It is specified in the format specified by the `format` field.
It must be the right length for the hash algorithm specified in the `algorithm` field, also.
The hash value does not include any algorithm prefix.
required:
- algorithm
- format
- hash
additionalProperties:false
"$defs":
algorithm:
type:string
enum:
- blake3
- md5
- sha1
- sha256
- sha512
title:Hash algorithm
description:|
The hash algorithm used to compute the hash value.
`blake3` is currently experimental and requires the [`blake-hashing`](@docroot@/development/experimental-features.md#xp-feature-blake3-hashes) experimental feature.
Information about a [store object](@docroot@/store/store-object.md).
This schema describes the JSON representation of store object metadata as returned by commands like [`nix path-info --json`](@docroot@/command-ref/new-cli/nix3-path-info.md).
Store object information can come in a few different variations.
Firstly, "impure" fields, which contain non-intrinsic information about the store object, may or may not be included.
Second, binary cache stores have extra non-intrinsic infomation about the store objects they contain.
Thirdly, [`nix path-info --json --closure-size`](@docroot@/command-ref/new-cli/nix3-path-info.html#opt-closure-size) can compute some extra information about not just the single store object in question, but the store object and its [closure](@docroot@/glossary.md#gloss-closure).
The impure and NAR fields are grouped into separate variants below.
See their descriptions for additional information.
The closure fields however as just included as optional fields, to avoid a combinatorial explosion of variants.
oneOf:
- $ref:"#/$defs/base"
- $ref:"#/$defs/impure"
- $ref:"#/$defs/narInfo"
$defs:
base:
title:Store Object Info
description:|
Basic store object metadata containing only intrinsic properties.
This is the minimal set of fields that describe what a store object contains.
type:object
required:
- version
- narHash
- narSize
- references
- ca
properties:
version:
type:integer
const:2
title:Format version (must be 2)
description:|
Must be `2`.
This is a guard that allows us to continue evolving this format.
Here is the rough version history:
- Version 0: `.narinfo` line-oriented format
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from `.narinfo` format.
- Version 2: Use structured JSON type for `ca`
path:
type:string
title:Store Path
description:|
[Store path](@docroot@/store/store-path.md) to the given store object.
Note: This field may not be present in all contexts, such as when the path is used as the key and the the store object info the value in map.
narHash:
"$ref": "./hash-v1.yaml"
title:NAR Hash
description:|
Hash of the [file system object](@docroot@/store/file-system-object.md) part of the store object when serialized as a [Nix Archive](@docroot@/store/file-system-object/content-address.md#serial-nix-archive).
narSize:
type:integer
minimum:0
title:NAR Size
description:|
Size of the [file system object](@docroot@/store/file-system-object.md) part of the store object when serialized as a [Nix Archive](@docroot@/store/file-system-object/content-address.md#serial-nix-archive).
references:
type:array
title:References
description:|
An array of [store paths](@docroot@/store/store-path.md), possibly including this one.
items:
type:string
ca:
oneOf:
- type:"null"
const:null
- "$ref": "./content-address-v1.yaml"
title:Content Address
description:|
If the store object is [content-addressed](@docroot@/store/store-object/content-address.md),
this is the content address of this store object's file system object, used to compute its store path.
Otherwise (i.e. if it is [input-addressed](@docroot@/glossary.md#gloss-input-addressed-store-object)), this is `null`.
additionalProperties:false
impure:
title:Store Object Info with Impure Fields
description:|
Store object metadata including impure fields that are not *intrinsic* properties.
In other words, the same store object in different stores could have different values for these impure fields.
If known, the path to the [store derivation](@docroot@/glossary.md#gloss-store-derivation) from which this store object was produced.
Otherwise `null`.
> This is an "impure" field that may not be included in certain contexts.
registrationTime:
type:["integer","null"]
title:Registration Time
description:|
If known, when this derivation was added to the store (Unix timestamp).
Otherwise `null`.
> This is an "impure" field that may not be included in certain contexts.
ultimate:
type:boolean
title:Ultimate
description:|
Whether this store object is trusted because we built it ourselves, rather than substituted a build product from elsewhere.
> This is an "impure" field that may not be included in certain contexts.
signatures:
type:array
title:Signatures
description:|
Signatures claiming that this store object is what it claims to be.
Not relevant for [content-addressed](@docroot@/store/store-object/content-address.md) store objects,
but useful for [input-addressed](@docroot@/glossary.md#gloss-input-addressed-store-object) store objects.
> This is an "impure" field that may not be included in certain contexts.
items:
type:string
# Computed closure fields
closureSize:
type:integer
minimum:0
title:Closure Size
description:|
The total size of this store object and every other object in its [closure](@docroot@/glossary.md#gloss-closure).
> This field is not stored at all, but computed by traversing the other fields across all the store objects in a closure.
additionalProperties:false
narInfo:
title:Store Object Info with Impure fields and NAR Info
description:|
The store object info in the "binary cache" family of Nix store type contain extra information pertaining to *downloads* of the store object in question.
(This store info is called "NAR info", since the downloads take the form of [Nix Archives](@docroot@/store/file-system-object/content-address.md#serial-nix-archive, and the metadata is served in a file with a `.narinfo` extension.)
This download information, being specific to how the store object happens to be stored and transferred, is also considered to be non-intrinsic / impure.
Where to download a compressed archive of the file system objects of this store object.
> This is an impure "`.narinfo`" field that may not be included in certain contexts.
compression:
type:string
title:Compression
description:|
The compression format that the archive is in.
> This is an impure "`.narinfo`" field that may not be included in certain contexts.
downloadHash:
"$ref": "./hash-v1.yaml"
title:Download Hash
description:|
A digest for the compressed archive itself, as opposed to the data contained within.
> This is an impure "`.narinfo`" field that may not be included in certain contexts.
downloadSize:
type:integer
minimum:0
title:Download Size
description:|
The size of the compressed archive itself.
> This is an impure "`.narinfo`" field that may not be included in certain contexts.
closureDownloadSize:
type:integer
minimum:0
title:Closure Download Size
description:|
The total size of the compressed archive itself for this object, and the compressed archive of every object in this object's [closure](@docroot@/glossary.md#gloss-closure).
> This is an impure "`.narinfo`" field that may not be included in certain contexts.
> This field is not stored at all, but computed by traversing the other fields across all the store objects in a closure.
[file system object]: @docroot@/store/file-system-object.md
The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), with the exception of the `str(..)` function / parameterized rule, which length-prefixes and pads strings.
@@ -41,3 +41,15 @@ The `str` function / parameterized rule is defined as follows:
- `int(n)` = the 64-bit little endian representation of the number `n`
- `pad(s)` = the byte sequence `s`, padded with 0s to a multiple of 8 byte
## Kaitai Struct Specification
The Nix Archive (NAR) format is also formally described using [Kaitai Struct](https://kaitai.io/), an Interface Description Language (IDL) for defining binary data structures.
> Kaitai Struct provides a language-agnostic, machine-readable specification that can be compiled into parsers for various programming languages (e.g., C++, Python, Java, Rust).
```yaml
{{#include nar.ksy}}
```
The source of the spec can be found [here](https://github.com/nixos/nix/blob/master/src/nix-manual/source/protocols/nix-archive/nar.ksy). Contributions and improvements to the spec are welcomed.
-`nix-shell` shebang lines now support single-quoted arguments.
-`builtins.fetchTree` is now its own experimental feature, [`fetch-tree`](@docroot@/development/experimental-features.md#xp-fetch-tree).
This allows stabilising it independently of the rest of what is encompassed by [`flakes`](@docroot@/development/experimental-features.md#xp-fetch-tree).
-`builtins.fetchTree` is now its own experimental feature, [`fetch-tree`](@docroot@/development/experimental-features.md#xp-feature-fetch-tree).
This allows stabilising it independently of the rest of what is encompassed by [`flakes`](@docroot@/development/experimental-features.md#xp-feature-flakes).
- The interface for creating and updating lock files has been overhauled:
- Modify `nix derivation {add,show}` JSON format [#9866](https://github.com/NixOS/nix/issues/9866) [#10722](https://github.com/NixOS/nix/pull/10722)
The JSON format for derivations has been slightly revised to better conform to our [JSON guidelines](@docroot@/development/cli-guideline.md#returning-future-proof-json).
The JSON format for derivations has been slightly revised to better conform to our [JSON guidelines](@docroot@/development/json-guideline.md).
In particular, the hash algorithm and content addressing method of content-addressed derivation outputs are now separated into two fields `hashAlgo` and `method`,
rather than one field with an arcane `:`-separated format.
- Support unit prefixes in configuration settings [#10668](https://github.com/NixOS/nix/pull/10668)
Configuration settings in Nix now support unit prefixes, allowing for more intuitive and readable configurations. For example, you can now specify [`--min-free 1G`](@docroot@/command-ref/opt-common.md#opt-min-free) to set the minimum free space to 1 gigabyte.
Configuration settings in Nix now support unit prefixes, allowing for more intuitive and readable configurations. For example, you can now specify [`--min-free 1G`](@docroot@/command-ref/conf-file.md#conf-min-free) to set the minimum free space to 1 gigabyte.
This enhancement was extracted from [#7851](https://github.com/NixOS/nix/pull/7851) and is also useful for PR [#10661](https://github.com/NixOS/nix/pull/10661).
The *build trace* is a [memoization table](https://en.wikipedia.org/wiki/Memoization) for builds.
It maps the inputs of builds to the outputs of builds.
Concretely, that means it maps [derivations][derivation] to maps of [output] names to [store objects][store object].
In general the derivations used as a key should be [*resolved*](./resolution.md).
A build trace with all-resolved-derivation keys is also called a *base build trace* for extra clarity.
If all the resolved inputs of a derivation are content-addressed, that means the inputs will be fully determined, leaving no ambiguity for what build was performed.
(Input-addressed inputs however are still ambiguous. They too should be locked down, but this is left as future work.)
Accordingly, to look up an unresolved derivation, one must first resolve it to get a resolved derivation.
Resolving itself involves looking up entries in the build trace, so this is a mutually recursive process that will end up inspecting possibly many entries.
Except for the issue with input-addressed paths called out above, base build traces are trivially *coherent* -- incoherence is not possible.
That means that the claims that each key-value base build try entry makes are independent, and no mapping invalidates another mapping.
Whether the mappings are *true*, i.e. the faithful recording of actual builds performed, is another matter.
Coherence is about the multiple claims of the build trace being mutually consistent, not about whether the claims are individually true or false.
In general, there is no way to audit a build trace entry except for by performing the build again from scratch.
And even in that case, a different result doesn't mean the original entry was a "lie", because the derivation being built may be non-deterministic.
As such, the decision of whether to trust a counterparty's build trace is a fundamentally subject policy choice.
Build trace entries are typically *signed* in order to enable arbitrary public-key-based trust polices.
## Derived build traces {#derived}
Implementations that wish to memoize the above may also keep additional *derived* build trace entries that do map unresolved derivations.
But if they do so, they *must* also keep the underlying base entries with resolved derivation keys around.
Firstly, this ensures that the derived entries are merely cache, which could be recomputed from scratch.
Secondly, this ensures the coherence of the derived build trace.
Unlike with base build traces, incoherence with derived build traces is possible.
The key ingredient is that derivation resolution is only deterministic with respect to a fixed base build trace.
Without fixing the base build trace, it inherits the subjectivity of base build traces themselves.
Concretely, suppose there are three derivations \\(a\\), \\(b\\), and \\(c\\).
Let \\(a\\) be a resolved derivation, but let \\(b\\) and \\(c\\) be unresolved and both take as an input an output of \\(a\\).
Now suppose that derived entries are made for \\(b\\) and \\(c\\) based on two different entries of \\(a\\).
(This could happen if \\(a\\) is non-deterministic, \\(a\\) and \\(b\\) are built in one store, \\(a\\) and \\(c\\) are built in another store, and then a third store substitutes from both of the first two stores.)
If trusting the derived build trace entries for \\(b\\) and \\(c\\) requires that each's underlying entry for \\(a\\) be also trusted, the two different mappings for \\(a\\) will be caught.
However, if \\(b\\) and \\(c\\)'s entries can be combined in isolation, there will be nothing to catch the contradiction in their hidden assumptions about \\(a\\)'s output.
- Once this is done, the derivation is *normalized*, replacing each input deriving path with its store path, which we now know from realising the input.
## Builder Execution
## Builder Execution {#builder-execution}
The [`builder`](./derivation/index.md#builder) is executed as follows:
@@ -102,11 +102,11 @@ But rather than somehow scanning all the other fields for inputs, Nix requires t
### System {#system}
The system type on which the [`builder`](#attr-builder) executable is meant to be run.
The system type on which the [`builder`](#builder) executable is meant to be run.
A necessary condition for Nix to schedule a given derivation on some [Nix instance] is for the "system" of that derivation to match that instance's [`system` configuration option] or [`extra-platforms` configuration option].
By putting the `system` in each derivation, Nix allows *heterogenous* build plans, where not all steps can be run on the same machine or same sort of machine.
By putting the `system` in each derivation, Nix allows *heterogeneous* build plans, where not all steps can be run on the same machine or same sort of machine.
Nix can schedule builds such that it automatically builds on other platforms by [forwarding build requests](@docroot@/advanced-topics/distributed-builds.md) to other Nix instances.
@@ -167,10 +167,10 @@ It is only in the potential for that check to fail that they are different.
>
> In a future world where floating content-addressing is also stable, we in principle no longer need separate [fixed](#fixed) content-addressing.
> Instead, we could always use floating content-addressing, and separately assert the precise value content address of a given store object to be used as an input (of another derivation).
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [Issue #11955](https://github.com/NixOS/nix/issues/11955).
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [issue #11955](https://github.com/NixOS/nix/issues/11955).
>
> In the current version of Nix, fixed outputs which fail their hash check are still registered as valid store objects, just not registered as outputs of the derivation which produced them.
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, derivation does not need to be rebuilt --- avoiding downloading potentially large amounts of data twice.
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, derivation does not need to be rebuilt — avoiding downloading potentially large amounts of data twice.
> This optimisation prefigures the design above:
> If the output hash assertion was removed outside the derivation itself, Nix could additionally not only register that outputted store object like today, but could also make note that derivation did in fact successfully download some data.
For example, for the "fetch URL" example above, making such a note is tantamount to recording what data is available at the time of download at the given URL.
@@ -43,7 +43,7 @@ In particular, the specification decides:
- if the content is content-addressed, how is it content addressed
- if the content is content-addressed, [what is its content address](./content-address.md#fixed-content-addressing) (and thus what is its [store path])
- if the content is content-addressed, [what is its content address](./content-address.md#fixed) (and thus what is its [store path])
That is to say, an input-addressed output's store path is a function not of the output itself, but of the derivation that produced it.
Even if two store paths have the same contents, if they are produced in different ways, and one is input-addressed, then they will have different store paths, and thus guaranteed to not be the same store object.
A naive implementation of an output hash computation for input-addressedoutputs would be to hash the derivation hash and output together.
This clearly has the uniqueness properties we want for input-addressed outputs, but suffers from an inefficiency.
Specifically, new builds would be required whenever a change is made to a fixed-output derivation, despite having provably no differences in the inputs to the new derivation compared to what it used to be.
Concretely, this would cause a "mass rebuild" whenever any fetching detail changes, including mirror lists, certificate authority certificates, etc.
**TODO hash derivation modulo.**
To solve this problem, we compute output hashes differently, so that certain output hashes become identical.
We call this concept quotient hashing, in reference to quotient types or sets.
So how do we compute the hash part of the output path of a derivation?
This is done by the function `hashDrv`, shown in Figure 5.10.
It distinguishes between two cases.
If the derivation is a fixed-output derivation, then it computes a hash over just the `outputHash` attributes.
So how do we compute the hash part of the output paths of an input-addressed derivation?
This is done by the function `hashQuotientDerivation`, shown below.
If the derivation is not a fixed-output derivation, we replace each element in the derivation’s inputDrvs with the result of a call to `hashDrv` for that element.
(The derivation at each store path in `inputDrvs` is converted from its on-disk ATerm representation back to a `StoreDrv` by the function `parseDrv`.) In essence, `hashDrv` partitions storederivations into equivalence classes, and for hashing purpose it replaces each store path in a derivation graph with its equivalence class.
First, a word on inputs.
`hashQuotientDerivation` is only defined on derivations whose [inputs](@docroot@/store/derivation/index.md#inputs) take the first-order form:
```typescript
typeConstantPath={
path: StorePath;
};
The recursion in Figure 5.10 is inefficient:
it will call itself once for each path by which a subderivation can be reached, i.e., `O(V k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`.
In the actual implementation, memoisation is used to reduce this to `O(V + E)` complexity for a graph with E edges.
inputDrvOutputs: Set<FirstOrderOutputPath>;// new instead
// ...other fields...
};
```
In the [currently-experimental][xp-feature-dynamic-derivations] higher-order case where outputs of outputs are allowed as [deriving paths][deriving-path] and thus derivation inputs, derivations using that generalization are not valid arguments to this function.
Those derivations must be (partially) [resolved](@docroot@/store/resolution.md) enough first, to the point where no such higher-order inputs remain.
Then, and only then, can input addresses be assigned.
```
function hashQuotientDerivation(drv) -> Hash:
assert(drv.outputs are input-addressed)
drv′ ← drv with {
inputDrvOutputs = ⋃(
assert(drvPath is store path)
case hashOutputsOrQuotientDerivation(readDrv(drvPath)) of
drvHash : Hash →
(drvHash.toBase16(), output)
outputHashes : Map[String, Hash] →
(outputHashes[output].toBase16(), "out")
| (drvPath, output) ∈ drv.inputDrvOutputs
)
}
return hashSHA256(printDrv(drv′))
function hashOutputsOrQuotientDerivation(drv) -> Map[String, Hash] | Hash:
, ca = output.contentAddress // or get from build trace if floating
}
else: // drv.outputs are input-addressed
return hashQuotientDerivation(drv)
```
### `hashQuotientDerivation`
We replace each element in the derivation's `inputDrvOutputs` using data from a call to `hashOutputsOrQuotientDerivation` on the `drvPath` of that element.
When `hashOutputsOrQuotientDerivation` returns a single drv hash (because the input derivation in question is input-addressing), we simply swap out the `drvPath` for that hash, and keep the same output name.
When `hashOutputsOrQuotientDerivation` returns a map of content addresses per-output, we look up the output in question, and pair it with the output name `out`.
The resulting pseudo-derivation (with hashes instead of store paths in `inputDrvs`) is then printed (in the ["ATerm" format](@docroot@/protocols/derivation-aterm.md)) and hashed, and this becomes the hash of the "quotient derivation".
When calculating output hashes, `hashQuotientDerivation` is called on an almost-complete input-addressing derivation, which is just missing its input-addressed outputs paths.
The derivation hash is then used to calculate output paths for each output.
<!-- TODO describe how this is done. -->
Those output paths can then be substituted into the almost-complete input-addressed derivation to complete it.
> **Note**
>
> There may be an unintentional deviation from specification currently implemented in the `(outputHashes[output].toBase16(), "out")` case.
> This is not fatal because the deviation would only apply for content-addressing derivations with more than one output, and that only occurs in the floating case, which is [experimental][xp-feature-ca-derivations].
> Once this bug is fixed, this note will be removed.
### `hashOutputsOrQuotientDerivation`
How does `hashOutputsOrQuotientDerivation` in turn work?
It consists of two main cases, based on whether the outputs of the derivation are to be input-addressed or content-addressed.
#### Input-addressed outputs case
In the input-addressed case, it just calls `hashQuotientDerivation`, and returns that derivation hash.
This makes `hashQuotientDerivation` and `hashOutputsOrQuotientDerivation` mutually-recursive.
> **Note**
>
> In this case, `hashQuotientDerivation` is being called on a *complete* input-addressing derivation that already has its output paths calculated.
> The `inputDrvs` substitution takes place anyways.
#### Content-addressed outputs case
If the outputs are [content-addressed](./content-address.md), then it computes a hash for each output derived from the content-address of that output.
> **Note**
>
> In the [fixed](./content-address.md#fixed) content-addressing case, the outputs' content addresses are statically specified in advance, so this always just works.
> (The fixed case is what the pseudo-code shows.)
>
> In the [floating](./content-address.md#floating) case, the content addresses are not specified in advance.
> This is what the "or get from [build trace](@docroot@/store/build-trace.md) if floating" comment refers to.
> In this case, the algorithm is *stuck* until the input in question is built, and we know what the actual contents of the output in question is.
>
> That is OK however, because there is no problem with delaying the assigning of input addresses (which, remember, is what `hashQuotientDerivation` is ultimately for) until all inputs are known.
### Performance
The recursion in the algorithm is potentially inefficient:
it could call itself once for each path by which a subderivation can be reached, i.e., `O(V^k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`.
In the actual implementation, [memoisation](https://en.wikipedia.org/wiki/Memoization) is used to reduce this cost to be proportional to the total number of `inputDrvOutputs` encountered.
### Semantic properties
*See [this chapter's appendix](@docroot@/store/math-notation.md) on grammar and metavariable conventions.*
In essence, `hashQuotientDerivation` partitions input-addressing derivations into equivalence classes: every derivation in that equivalence class is mapped to the same derivation hash.
We can characterize this equivalence relation directly, by working bottom up.
We start by defining an equivalence relation on first-order output deriving paths that refer content-addressed derivation outputs. Two such paths are equivalent if they refer to the same store object:
where \\({}^*(s, o)\\) denotes the store object that the output deriving path refers to.
We will also need the following construction to lift any equivalence relation on \\(X\\) to an equivalence relation on (finite) sets of \\(X\\) (in short, \\(\\mathcal{P}(X)\\)):
\\[
\\begin{prooftree}
\\AxiomC{$\\forall a \\in A. \\exists b \\in B. a \\,\\sim\_X\\, b$}
\\AxiomC{$\\forall b \\in B. \\exists a \\in A. b \\,\\sim\_X\\, a$}
\\BinaryInfC{$A \\,\\sim_{\\mathcal{P}(X)}\\, B$}
\\end{prooftree}
\\]
Now we can define the equivalence relation \\(\\sim_\\mathrm{IA}\\) on input-addressed derivation outputs. Two input-addressed outputs are equivalent if their derivations are equivalent (via the yet-to-be-defined \\(\\sim_{\\mathrm{IADrv}}\\) relation) and their output names are the same:
And now we can define \\(\\sim_{\\mathrm{IADrv}}\\).
Two input-addressed derivations are equivalent if their content-addressed inputs are equivalent, their input-addressed inputs are also equivalent, and they are otherwise equal:
<!-- cheating a bit with the semantics to get a good layout that fits on the page -->
where \\(\\mathrm{caInputs}(d)\\) returns the content-addressed inputs of \\(d\\) and \\(\\mathrm{iaInputs}(d)\\) returns the input-addressed inputs.
> **Note**
>
> An astute reader might notice that that nowhere does `inputSrcs` enter into these definitions.
> That means that replacing an input derivation with its outputs directly added to `inputSrcs` always results in a derivation in a different equivalence class, despite the resulting input closure (as would be mounted in the store at build time) being the same.
> [Issue #9259](https://github.com/NixOS/nix/issues/9259) is about creating a coarser equivalence relation to address this.
>
> \\(\\sim_\mathrm{Drv}\\) from [derivation resolution](@docroot@/store/resolution.md) is such an equivalence relation.
> It is coarser than this one: any two derivations which are "'hash quotient derivation'-equivalent" (\\(\\sim_\mathrm{IADrv}\\)) are also "resolution-equivalent" (\\(\\sim_\mathrm{Drv}\\)).
> It also relates derivations whose `inputDrvOutputs` have been rewritten into `inputSrcs`.
A few times in this manual, formal "proof trees" are used for [natural deduction](https://en.wikipedia.org/wiki/Natural_deduction)-style definition of various [relations](https://en.wikipedia.org/wiki/Relation_(mathematics)).
The following grammar and assignment of metavariables to syntactic categories is used in these sections.
*See [this chapter's appendix](@docroot@/store/math-notation.md) on grammar and metavariable conventions.*
To *resolve* a derivation is to replace its [inputs] with the simplest inputs — plain store paths — that denote the same store objects.
Derivations that only have store paths as inputs are likewise called *resolved derivations*.
(They are called that whether they are in fact the output of derivation resolution, or just made that way without non-store-path inputs to begin with.)
## Input Content Equivalence of Derivations
[Deriving paths][deriving-path] intentionally make it possible to refer to the same [store object] in multiple ways.
This is a consequence of content-addressing, since different derivations can produce the same outputs, and the same data can also be manually added to the store.
This is also a consequence even of input-addressing, as an output can be referred to by derivation and output name, or directly by its [computed](./derivation/outputs/input-address.md) store path.
Since dereferencing deriving paths is thus not injective, it induces an equivalence relation on deriving paths.
Let's call this equivalence relation \\(\\sim\\), where \\(p_1 \\sim p_2\\) means that deriving paths \\(p_1\\) and \\(p_2\\) refer to the same store object.
**Content Equivalence**: Two deriving paths are equivalent if they refer to the same store object:
\\[
\\begin{prooftree}
\\AxiomC{${}^*p_1 = {}^*p_2$}
\\UnaryInfC{$p_1 \\,\\sim_\\mathrm{DP}\\, p_2$}
\\end{prooftree}
\\]
where \\({}^\*p\\) denotes the store object that deriving path \\(p\\) refers to.
This also induces an equivalence relation on sets of deriving paths:
\\[
\\begin{prooftree}
\\AxiomC{$\\{ {}^*p | p \\in P_1 \\} = \\{ {}^*p | p \\in P_2 \\}$}
**Input Content Equivalence**: This, in turn, induces an equivalence relation on derivations: two derivations are equivalent if their inputs are equivalent, and they are otherwise equal:
Derivation resolution always maps derivations to input-content-equivalent derivations.
## Resolution relation
Dereferencing a derived path — \\({}^\*p\\) above — was just introduced as a black box.
But actually it is a multi-step process of looking up build results in the [build trace] that itself depends on resolving the lookup keys.
Resolution is thus a recursive multi-step process that is worth diagramming formally.
We can do this with a small-step binary transition relation; let's call it \\(\rightsquigarrow\\).
We can then conclude dereferenced equality like this:
\\[
\\begin{prooftree}
\\AxiomC{$p\_1 \\rightsquigarrow^* p$}
\\AxiomC{$p\_2 \\rightsquigarrow^* p$}
\\BinaryInfC{${}^*p\_1 = {}^*p\_2$}
\\end{prooftree}
\\]
I.e. by showing that both original items resolve (over 0 or more small steps, hence the \\({}^*\\)) to the same exact item.
With this motivation, let's now formalize a [small-step](https://en.wikipedia.org/wiki/Operational_semantics#Small-step_semantics) system of reduction rules for resolution.
### Formal rules
### \\(\text{resolved}\\) unary relation
\\[
\\begin{prooftree}
\\AxiomC{$s \in \text{store-path}$}
\\UnaryInfC{$s$ resolved}
\\end{prooftree}
\\]
\\[
\\begin{prooftree}
\\AxiomC{$\forall i \in \mathrm{inputs}(d). i \text{ resolved}$}
\\UnaryInfC{$d$ resolved}
\\end{prooftree}
\\]
### \\(\rightsquigarrow\\) binary relation
> **Remark**
>
> Actually, to be completely formal we would need to keep track of the build trace we are choosing to resolve against.
>
> We could do that by making \\(\rightsquigarrow\\) a ternary relation, which would pass the build trace to itself until it finally uses it in that one rule.
> This would add clutter more than insight, so we didn't bother to write it.
>
> There are other options too, like saying the whole reduction rule system is parameterized on the build trace, essentially [currying](https://en.wikipedia.org/wiki/Currying) the ternary \\(\rightsquigarrow\\) into a function from build traces to the binary relation written above.
Like all well-behaved evaluation relations, partial resolution is [*confluent*](https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)).
Also, if we take the symmetric closure of \\(\\rightsquigarrow^\*\\), we end up with the equivalence relations of the previous section.
Resolution respects content equivalence for deriving paths, and input content equivalence for derivations.
> **Remark**
>
> We chose to define from scratch an "resolved" unary relation explicitly above.
> But it can also be defined as the normal forms of the \\(\\rightsquigarrow^\*\\) relation:
>
> \\[ a \text{ resolved} \Leftrightarrow \forall b. b \rightsquigarrow^* a \Rightarrow b = a\\]
>
> In prose, resolved terms are terms which \\(\\rightsquigarrow^\*\\) only relates on the left side to the same term on the right side; they are the terms which can be resolved no further.
## Partial versus Complete Resolution
Similar to evaluation, we can also speak of *partial* versus *complete* derivation resolution.
Partial derivation resolution is what we've actually formalized above with \\(\\rightsquigarrow^\*\\).
Complete resolution is resolution ending in a resolved term (deriving path or derivation).
(Which is a normal form of the relation, per the remark above.)
With partial resolution, a derivation is related to equivalent derivations with the same or simpler inputs, but not all those inputs will be plain store paths.
This is useful when the input refers to a floating content addressed output we have not yet built — we don't know what (content-address) store path will used for that derivation, so we are "stuck" trying to resolve the deriving path in question.
(In the above formalization, this happens when the build trace is missing the keys we wish to look up in it.)
Complete resolution is a *functional* relation, i.e. values on the left are uniquely related with values on the right.
It is not however, a *total* relation (in general, assuming arbitrary build traces).
This is discussed in the next section.
## Termination
For static derivations graphs, complete resolution is indeed total, because it always terminates for all inputs.
(A relation that is both total and functional is a function.)
For [dynamic][xp-feature-dynamic-derivations] derivation graphs, however, this is not the case — resolution is not guaranteed to terminate.
The issue isn't rewriting deriving paths themselves:
a single rewrite to normalize an output deriving path to a constant one always exists, and always proceeds in one step.
The issue is that dynamic derivations (i.e. those that are filled-in the graph by a previous resolution) may have more transitive dependencies than the original derivation.
> **Example**
>
> Suppose we have this deriving path
> ```json
> {
> "drvPath": {
> "drvPath": "...-foo.drv",
> "output": "bar.drv"
> },
> "output": "baz"
> }
> ```
> and derivation `foo` is already resolved.
> When we resolve deriving path we'll end up with something like.
> ```json
> {
> "drvPath": "...-foo-bar.drv",
> "output": "baz"
> }
> ```
> So far is just an atomic single rewrite, with no termination issues.
> But the derivation `foo-bar` may have its *own* dynamic derivation inputs.
> Resolution must resolve that derivation first before the above deriving path can finally be normalized to a plain `...-foo-bar-baz` store path.
The important thing to notice is that while "build trace" *keys* must be resolved.
The *value* those keys are mapped to have no such constraints.
An arbitrary store object has no notion of being resolved or not.
But, an arbitrary store object can be read back as a derivation (as will in fact be done in case for dynamic derivations / nested output deriving paths).
And those derivations need *not* be resolved.
It is those dynamic non-resolved derivations which are the source of non-termination.
By the same token, they are also the reason why dynamic derivations offer greater expressive power.
// console.log(`TeX error in "${jax.latex}": ${error.message}`);
// return jax.formatError(error);
//}
}
};
</script>
<!-- Load a newer versino of MathJax than mdbook does by default, and which in particular has working relative paths for the "bussproofs" extension. -->
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.