
Conversation

@leana8959 commented Nov 11, 2025

This is the second part of the exact print parser. We changed the parser so that it no longer merges the common stanzas during the parsing stage.

Depends on #11285 to update the test suite (after which the diff will be significantly smaller).

Background

Common stanzas are merged into their corresponding sections during the parsing stage. This merging process retains neither the definitions of the imports (i.e. common <name> ...) nor where they are used (import: <name>). To achieve exact printing, and to be able to programmatically update the import list, we need to retain this information.

This PR remedies exactly that by deferring the merging process.

Implementation

We insert a new field, gpdCommonStanza :: Map ImportName (CondTree ConfVar [Dependency] BuildInfo), in the GenericPackageDescription data type to track all the defined common stanzas. The merging runs only when the new accessors are called on a GenericPackageDescription.
We also added a bidirectional PatternSynonym that uses the old record syntax, so the impact on existing code should be very low. The comparison against my PR to update the test suite is promising: leana8959/cabal@accessor-tests...leana8959:cabal:no-flatten-package-description.
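To illustrate the mechanism, here is a minimal, self-contained sketch of a bidirectional pattern synonym over a heavily simplified stand-in for GenericPackageDescription. The type GPD, the field gpdLibraryDeps, and the functions mergedLibraryDeps and FlatGPD are all hypothetical names invented for this example; only gpdCommonStanza mirrors the field named in this PR.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

import qualified Data.Map.Strict as Map

-- Hypothetical, heavily simplified stand-ins for the real Cabal types:
-- common stanzas map stanza names to dependency lists, and the library's
-- own dependencies are kept unmerged.
data GPD = GPD
  { gpdCommonStanza :: Map.Map String [String]
  , gpdLibraryDeps  :: [String]
  }

-- Merging runs only when this accessor is called.
mergedLibraryDeps :: GPD -> [String]
mergedLibraryDeps gpd =
  concat (Map.elems (gpdCommonStanza gpd)) ++ gpdLibraryDeps gpd

-- Bidirectional pattern synonym: the part before 'where' (a view pattern)
-- flattens on match; the part after 'where' builds a GPD from flat deps.
pattern FlatGPD :: [String] -> GPD
pattern FlatGPD deps <- (mergedLibraryDeps -> deps)
  where
    FlatGPD deps = GPD Map.empty deps

main :: IO ()
main = do
  let gpd = GPD (Map.fromList [("warnings", ["base"])]) ["containers"]
  case gpd of
    FlatGPD deps -> print deps  -- matching merges on the fly
```

Matching with FlatGPD sees the merged list ["base","containers"], while constructing with FlatGPD produces a GPD with an empty stanza map, mirroring the trade-off discussed later in this thread.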

Please let me know your thoughts :)


This PR modifies behaviour or interface.

@leana8959 (Author)

Here are the benchmark results by timing the ./validate --partial-hackage-tests test.

  • baseline (master branch)

    $ hyperfine --setup "./validate.sh" "./validate.sh --partial-hackage-tests"
    Benchmark 1: ./validate.sh --partial-hackage-tests
      Time (mean ± σ):     224.440 s ±  5.713 s    [User: 191.896 s, System: 34.800 s]
      Range (min … max):   218.596 s … 239.915 s    10 runs
    
  • no-flatten-package-description

    $ hyperfine --setup "./validate.sh" "./validate.sh --partial-hackage-tests"
    Benchmark 1: ./validate.sh --partial-hackage-tests
      Time (mean ± σ):     232.745 s ± 10.650 s    [User: 194.793 s, System: 38.621 s]
      Range (min … max):   220.815 s … 254.096 s    10 runs
    

@ulysses4ever (Collaborator)

I'm not sure why validate is used for benchmarking parser changes. Validate is likely dominated by completely unrelated factors (e. g. solver). Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?

@leana8959 (Author)

@ulysses4ever

Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?

The --{partial,complete}-hackage-tests options run the Hackage tests. These tests run the readFields and parsec parsers against the entire Hackage index. I'll look into clc-stackage as well, but it seems like hackage-tests is already exactly what we want here?

Validate is likely dominated by completely unrelated factors (e. g. solver)

I thought it would be important to test whether I broke the solver by changing the representation of GPD, which is why I tested everything :)

@ulysses4ever (Collaborator)

The options --{partial,complete}-hackage-tests call the hackage test.

These are tests, which is different from benchmarking. You seem to be using validate for benchmarking, which makes no sense to me, as explained above. On the other hand, using validate for testing may have some value, although it is preferable to use the actual tests like hackage-tests, which you do, so I don't have any issue with your approach to testing.

@leana8959 (Author)

Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?

This is pretty much what the hackage-tests test does. Do you think timing this alone would be more acceptable (compared to ./validate), even though it's not a benchmark?

@ulysses4ever (Collaborator)

If timing hackage-tests would mean timing download times too, then no, it doesn't sound right. I don't know if you can arrange for a separate download, which would make it more reasonable. Generally, if you want an estimate of the performance footprint, you have to develop a benchmark; you can't just hijack test executables wholesale. You could probably reuse some of the testing infrastructure (tasty-bench exists for a reason). If this sounds like too much, you should just stop trying to achieve it, because currently you're posting numbers for validate that most likely have no connection to the actual performance change, and as such are actively misleading. It'd be great to have some very simple benchmark that shows we don't degrade performance significantly, but other than that we should focus on design rather than performance, I think: the design issue has been a die-hard for a long time...

@leana8959 (Author) commented Nov 21, 2025

If timing hackage-tests would mean timing download times too, then no, it doesn't sound right. I don't know if you can arrange for separate download, which would make it more reasonable.

hackage-tests looks for $XDG_CACHE/cabal/packages/hackage.haskell.org/01-index.tar and reads from it. It doesn't do any downloading (it fails if the file doesn't exist and tells the user to run cabal update).

If this sounds too much, you should just stop trying to achieve it because currently you're posting numbers for validate that most likely have no connection to the actual performance change, and as such are actively misleading.

I totally understand. I can make sure cabal update is called before I invoke the test suite; does this make it more reasonable? It seems like the Hackage test suite is doing what we want, which is parsing a huge collection of cabal files. I can also restrict it to only the read-fields (envelope cabal format) parser or the parsec (the actual cabal grammar) parser. If not, I'll stop testing performance with hackage-tests 👍

Otherwise, do you have an example of using tasty-bench to benchmark something similar in cabal? We wanted to add a benchmark because people were concerned about performance during the tech proposal phase.
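For reference, a minimal tasty-bench sketch might look like the following (requires the tasty-bench package). The countFields function is a hypothetical stand-in for a real parser entry point; in practice one would swap in the actual cabal parser and feed it cached .cabal files rather than this inline sample.

```haskell
import Test.Tasty.Bench (bench, defaultMain, nf)

-- Hypothetical stand-in for a parser: counts "field: value" lines.
-- Replace with the real parsing function to benchmark it.
countFields :: String -> Int
countFields = length . filter (elem ':') . lines

-- Inline sample; a real benchmark would read cached .cabal files instead.
sample :: String
sample = unlines ["name: demo", "version: 0.1", "library"]

main :: IO ()
main = defaultMain
  [ bench "parse sample" (nf countFields sample) ]
```

Using nf forces the result to normal form so lazy evaluation doesn't hide the parsing cost, which is the usual pitfall when benchmarking pure Haskell functions.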

@mpickering (Collaborator)

How does the bidirectional pattern synonym work? If the accessor flattens the structure then the setter won't know whether to update the common stanza or the specific component. This seems like a big foot-gun in the API if the goal is to allow people to manipulate GenericPackageDescription. How much are these pattern synonyms used for setting fields? Should we rewrite those situations to work properly with common stanzas?

It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.

@leana8959 (Author) commented Nov 25, 2025

How does the bidirectional pattern synonym work?

The part before the where keyword defines how the synonym expands in patterns; the part after where defines its expansion in expressions. Having both defined allows us to use the pattern fields identically to normal record fields 1.
We use the PatternSynonyms extension in conjunction with ViewPatterns to call the merging functions, and the merging is done with the same logic as prior to this patch.

The pattern synonym was introduced to maintain backward compatibility of the record syntax. We managed to leave the majority of the existing code unchanged (except, for example, traversals that touch the condtrees). Without the record-syntax pattern synonym, one would need to pattern match on the entire GenericPackageDescription to change a single field, which is not ergonomic.

If the accessor flattens the structure then the setter won't know whether to update the common stanza or the specific component.

Either the accessors or their callers have to decide whether to update the common stanza or the specific component; this is a hard problem to solve. Should we choose not to flatten the structure, the call sites of the accessors would then have to decide where to modify the data (e.g. the common stanza or the component). This modification must also not create missing imports in any component.
With the pattern synonym, all imports are merged on the fly so that the accessor call sites don't have to deal with the implementation details of common stanza resolution.

This is done by

  • merging the imports on the fly using the first part of the pattern synonym, and
  • constructing the data structure using the second part of the pattern synonym.

cf. the semantics of accessors using bidirectional PatternSynonyms 1

Furthermore, the first part and the second part of an explicitly bidirectional pattern synonym must have the same type. If we want to be able to construct a GenericPackageDescription with import information in the second part (i.e. not flatten the condtrees), the first part must not merge them, which renders the whole "compatibility accessors" idea futile.

We could also choose to not have a right-hand side, but due to how pattern synonyms work 1, we wouldn't be able to use these fields as setters, only as accessors. This is also a valid option.
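The unidirectional option can be sketched as follows. The synonym Merged and the tuple type are hypothetical; the point is only that a synonym without a where clause matches fine but is rejected by GHC in expression position, so it cannot be used for construction or record updates.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

-- A unidirectional pattern synonym: the view pattern merges the two lists
-- on match, but there is no 'where' clause, so 'Merged' cannot appear as
-- an expression (GHC rejects e.g. `Merged ["a"]` as a constructor).
pattern Merged :: [String] -> ([String], [String])
pattern Merged xs <- (uncurry (++) -> xs)

main :: IO ()
main =
  case (["a"], ["b"]) of
    Merged xs -> print xs  -- reads the merged view
```

This gives read-only "compatibility accessors": safe, because nothing can silently construct a GPD with a dropped stanza map, at the cost of losing record-update syntax.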

As a side note, PatternSynonyms would give us a new set of "compatibility" accessors that don't work together with the old ones. This can also be a footgun (or just plain annoying) for other people working on cabal. The following snippet demonstrates that.

-- This doesn't work
foo
  { internalField      = bar -- a field from the internal description
  , compatibilityField = baz -- a field from the new "compatibility" pattern
  }

-- We have to do this because these two fields don't come from the same pattern.
foo
  { internalField      = bar -- a field from the internal description
  }
  { compatibilityField = baz -- a field from the new "compatibility" pattern
  }

What do you think about this trade-off for backward compatibility? Are there different solutions/details that I might be missing?

How much are these pattern synonyms used for setting fields?

I grepped with \{ <gpdFieldName> for each field name of the gpd pattern synonym and found only three occurrences. They all generate dummy gpds, either for partial parsing failures, backward compatibility functions, or tests.

It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.

I will come back later for this :)

Footnotes

  1. Pattern synonyms in record syntax: https://gitlab.haskell.org/ghc/ghc/-/wikis/pattern-synonyms/record-pattern-synonyms#record-updates

@leana8959 (Author)

It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.

We annotate each node of a CondTree's data with its imports, rather than merging in the imports as we go. Notably, the gpdCommonStanza field holds a map of WithImports BuildInfos indexed by common stanza name, and each CondTree ConfVar [Dependency] a is now CondTree ConfVar [Dependency] (WithImports a). This representation can capture recursive imports within nested conditionals as well as transitive imports. During import resolution, we first resolve the imports within the common stanzas, and then resolve the imports for each component.
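A minimal sketch of this representation, with hypothetical simplified types (the real payloads are BuildInfos inside CondTrees, and resolution is done by mergeImports in the actual patch):

```haskell
import qualified Data.Map.Strict as Map

newtype ImportName = ImportName String deriving (Eq, Ord, Show)

-- Each node carries the names of the common stanzas it imports
-- alongside its payload, instead of having them merged in during parsing.
data WithImports a = WithImports
  { nodeImports :: [ImportName]  -- the "import: <name>" lines at this node
  , nodePayload :: a
  } deriving Show

-- Deferred resolution: look up each imported stanza (assumed already
-- resolved among themselves) and prepend it to the node's own payload.
resolveImports :: Monoid a => Map.Map ImportName a -> WithImports a -> a
resolveImports stanzas (WithImports names payload) =
  mconcat [Map.findWithDefault mempty n stanzas | n <- names] <> payload

main :: IO ()
main = do
  let stanzas = Map.fromList [(ImportName "deps", ["base"])]
  print (resolveImports stanzas (WithImports [ImportName "deps"] ["containers"]))
```

Keeping nodeImports around is what makes exact printing and programmatic edits of the import list possible: the merged view is recomputed on demand rather than baked in.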

In the parser test common-conditional you can compare the cabal file with its parsed expression and see this.
https://github.com/haskell/cabal/blob/6ddbcc40f18d2732878c0fe2c726196a7b7da890/Cabal-tests/tests/ParserTests/regressions/common-conditional.cabal
https://github.com/haskell/cabal/blob/6ddbcc40f18d2732878c0fe2c726196a7b7da890/Cabal-tests/tests/ParserTests/regressions/common-conditional.expr

The merging logic is done in Cabal-syntax/src/Distribution/Types/GenericPackageDescription.hs in mergeImports. The specific merging function for each field is named condLibrary', condSubLibraries', and so on.

Previously, merging was done in Cabal-syntax/src/Distribution/PackageDescription/Parsec.hs by keeping the common stanza map as state, adding entries to it and merging from it as parsing proceeded. That common stanza map is essentially the same as the one in this patch; we only defer the merging process.

The type TestSuite has an internal parsing representation, TestSuiteStanza. Once a TestSuiteStanza configuration has been validated, it can be turned into a TestSuite. Previously it was validated and converted in one go during parsing; we have split the function so that the conversion and the validation happen separately. This type now lives in Cabal-syntax/src/Distribution/Types/TestSuiteStanza.hs.
To retrieve a CondTree ConfVar [Dependency] TestSuite (the merged, validated and converted representation), we first merge the imports and then convert. It doesn't make sense to validate an incomplete TestSuiteStanza without its imports. We assume the conversion is valid because we do the validation during parsing; other code that manipulates a TestSuiteStanza after parsing should validate again before getting the CondTree ConfVar [Dependency] TestSuite.
The same kind of internal representation exists for Benchmark and its counterpart BenchmarkStanza.
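The validate-then-convert split can be sketched like this. The field names and the error messages here are hypothetical simplifications; the real TestSuiteStanza carries many more fields and the validation logic lives in the module mentioned above.

```haskell
-- Unvalidated parsing representation: fields may still be missing
-- until the imports have been merged in.
data TestSuiteStanza = TestSuiteStanza
  { stanzaInterface :: Maybe String    -- e.g. "exitcode-stdio-1.0"
  , stanzaMainIs    :: Maybe FilePath
  }

-- Validated, converted representation.
data TestSuite = TestSuite
  { testInterface :: String
  , testMainIs    :: FilePath
  } deriving (Eq, Show)

-- Validation, run during parsing once imports are merged in.
validateTestSuite :: TestSuiteStanza -> Either String TestSuiteStanza
validateTestSuite s =
  case (stanzaInterface s, stanzaMainIs s) of
    (Nothing, _) -> Left "missing test type"
    (_, Nothing) -> Left "missing main-is"
    _            -> Right s

-- Conversion, assumed to follow a successful validation.
convertTestSuite :: TestSuiteStanza -> TestSuite
convertTestSuite (TestSuiteStanza (Just i) (Just m)) = TestSuite i m
convertTestSuite _ = error "convertTestSuite: call validateTestSuite first"

main :: IO ()
main = print (convertTestSuite (TestSuiteStanza (Just "exitcode-stdio-1.0") (Just "Main.hs")))
```

Separating the two steps is what allows validation to be deferred until after import resolution, while conversion stays a cheap, total-by-assumption step.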

Let me know if I can elaborate on some details!

proof of concept

retain imports for foreignLib section

retain imports for executable section

retain imports for test-suite

retain imports for benchmark

attempt to insert used imports and not all imports

use newtype

change import data type; fix issues with import propagation & decoration

fix missing import names in testSuite

fix missing import names in benchmark

run fourmolu

run hlint

defer merging prototype

Currently we have achieved the following:
- Stop merging; the merging function "endo" is id
- CondTrees are completely retained in bigger types such as library and
  executable

We will need to do the following:
- Allow merging in the accessor

We broke:
- A bunch of Read and Ord instances

Revert "defer merging prototype"

This reverts commit 21636db.

stop merging

retain common stanza in GenericPackageDescription

experiment: use WithImport in gpd

add function to merge imports

mergeLibrary

fix transitive imports retention and merging

run fourmolu

deferred merging for sublibraries

simplification; remove todos

retain foreignlib imports

retain executable imports

retain TestSuiteStanza imports

We isolated the type TestSuiteStanza and the logic to infer test type

retain BenchmarkStanza imports

We isolated the BenchmarkStanza type and the logic to infer benchmark
type.

clean up

remove benchmark import field

We now use the WithImports type to tag imports

introduce type alias in GenericPackageDescription

add GenericPackageDescription pattern to hide internal implementation

backward compatible accessors

fix compiler errors

add todo

fix compiler errors for integration test

run fourmolu

don't expose intermediary accessors

remove early experiment "import" fields in TestSuite and Benchmark

move TestSuiteStanza validation to its module

clean up {TestSuite,Benchmark}Stanza exports from FieldGrammar

restore old behaviour in code working with PackageDescription

we fixed GenericPackageDescription's constructor

patch {TestSuite,Benchmark}Stanza type when using accessor

remove accessors tests

fix accessor dropping common stanza map when it is not required

This guarantees that unmerged internal representation is correct

run fourmolu

remove FieldGrammar export {TestSuite,Benchmark}Stanza

fix(GenericPackageDescription): "pattern" keyword deprecated after 914

test: remove new test files

test: add tests for gpd accessors

test: check equality on each field

test: improve import list

test: use @? operator

test: use a tuple to store all gpd fields

test: define ToExpr tuple instance manually

test: update expected

test: use Rec constructor to annotate field names

test: update expected

test: remove comment not in scope for this PR

test: use field equality in hackage tests

test: add new test files

test: fix test build

test: newtype to guide GPD ToExpr instance

test: add internal accessors test

test: update expected for internal accessors test

test: update expected

test: fix hackage tests ToExpr instance
@leana8959 force-pushed the no-flatten-package-description branch from 6ddbcc4 to 413bb83 on November 26, 2025 05:27
@leana8959 (Author)

I downloaded the latest version of every package on Hackage using nh2/hackage-download and grepped to see how much downstream code uses GenericPackageDescription accessors in record updates. There are 60 occurrences.

You can also view it here using serokell's tool, though I don't know how exhaustive it is.

The regex pattern I used: \{[ \n]*(packageDescription|gpdScannedVersion|genPackageFlags|condLibrary|condSubLibraries|condForeignLibs|condExecutables|condTestSuites|condBenchmarksUnmerged)[ \n]*=. The = sign excludes usages of gpd fields as getters.

60 occurrences of accessor usages in latest hackage
./autopack-0.0.0.0/src/Autopack.hs:                { condLibrary = condLibrary gPackDescr <&> \condLib -> condLib
./LibClang-3.8.0/Setup.hs:              gpd' = gpd { condLibrary = Just (cdt { condTreeData = lib' }) }
./cabal2nix-2.20.1/src/Distribution/Nixpkgs/Haskell/PackageSourceSpec.hs:setCabalFileHash sha256 gpd = gpd { packageDescription = (packageDescription gpd) {
./cabal-doctest-1.0.12/src/Distribution/Extra/Doctest.hs:    { condTestSuites = map f (condTestSuites gpd)
./cabal-edit-0.1.0.0/exe/Main.hs:setBuildInfo binfo pkg@GenericPackageDescription {..} = pkg {condLibrary = fmap go condLibrary}
./cabal-edit-0.1.0.0/exe/Main.hs:setDeps deps pkg@GenericPackageDescription {..} = pkg {condLibrary = fmap go condLibrary}
./constraints-deriving-1.1.1.2/Setup.hs:addReexportsGPD gpd = gpd { condLibrary = addReexportsCT <$> condLibrary gpd }
./Cabal-3.16.0.0/src/Distribution/PackageDescription/Check.hs:        { packageDescription = pd
./Cabal-ide-backend-1.23.0.0/Distribution/Simple/Configure.hs:            pkg_descr0'' = pkg_descr0 { condTestSuites = flaggedTests
./Cabal-syntax-3.16.0.0/src/Distribution/PackageDescription/PrettyPrint.hs:    { packageDescription = pd
./cabal-install-3.16.0.0/tests/IntegrationTests2.hs:              { packageDescription = emptyPackageDescription{package = pkgid}
./cabal-install-3.16.0.0/src/Distribution/Client/IndexUtils.hs:            { packageDescription =
./base-compat-migrate-0.2.0.1/Generate.hs:    { condLibrary =
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/UpdatePackage.hs:updatePackage_packageDescription = Lens.lens (\UpdatePackage' {packageDescription} -> packageDescription) (\s@UpdatePackage' {} a -> s {packageDescription = a} :: UpdatePackage)
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/Types/PackageDetails.hs:packageDetails_packageDescription = Lens.lens (\PackageDetails' {packageDescription} -> packageDescription) (\s@PackageDetails' {} a -> s {packageDescription = a} :: PackageDetails)
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/CreatePackage.hs:      { packageDescription =
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/CreatePackage.hs:createPackage_packageDescription = Lens.lens (\CreatePackage' {packageDescription} -> packageDescription) (\s@CreatePackage' {} a -> s {packageDescription = a} :: CreatePackage)
./cabal-add-0.2/src/Distribution/Client/Add.hs:    Right GenericPackageDescription {packageDescription = PackageDescription {specVersion = CabalSpecV1_0}} ->
./cabal-auto-expose-0.1.0.1/src/Distribution/Simple/AutoExpose.hs:        gpd { condLibrary = fmap (updateCondTreeLib exposedLib) (condLibrary gpd)
./dist-upload-0.0.4/Main.hs:       {packageDescription = 
./stackage2nix-0.7.2/src/AllCabalHashes.hs:  { packageDescription = (packageDescription cabal)
./stackage2nix-0.7.2/src/Distribution/Nixpkgs/Haskell/FromStack.hs:removeTests gd = gd { condTestSuites = [] }
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/UpdatePackage.hs:updatePackage_packageDescription = Lens.lens (\UpdatePackage' {packageDescription} -> packageDescription) (\s@UpdatePackage' {} a -> s {packageDescription = a} :: UpdatePackage)
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/Types/PackageDetails.hs:packageDetails_packageDescription = Lens.lens (\PackageDetails' {packageDescription} -> packageDescription) (\s@PackageDetails' {} a -> s {packageDescription = a} :: PackageDetails)
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/CreatePackage.hs:      { packageDescription =
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/CreatePackage.hs:createPackage_packageDescription = Lens.lens (\CreatePackage' {packageDescription} -> packageDescription) (\s@CreatePackage' {} a -> s {packageDescription = a} :: CreatePackage)
./ide-backend-0.10.0.1/IdeSession/Cabal.hs:        { packageDescription = pkgDesc
./ide-backend-0.10.0.1/IdeSession/Cabal.hs:        { packageDescription = pkgDesc
./ide-backend-0.10.0.1/IdeSession/Cabal.hs:        { packageDescription = pkgDescFromName libName version
./hpack-0.39.0/test/Hpack/RenderSpec.hs:      renderPackage_ package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-0.39.0/test/Hpack/RenderSpec.hs:      renderPackageWith defaultRenderSettings 16 [] [] package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-convert-1.0.1/test/Hpack/RunSpec.hs:      renderPackage_ package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-convert-1.0.1/test/Hpack/RunSpec.hs:      renderPackage defaultRenderSettings 16 [] [] package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hackport-0.9.1.0/src/Merge/Dependencies.hs:mkRetroPD pd = RetroPackageDescription { packageDescription = pd, buildDepends = exeAndLibDeps pd }
./hackport-0.9.1.0/cabal/Cabal-syntax/src/Distribution/PackageDescription/PrettyPrint.hs:    { packageDescription = pd
./hackport-0.9.1.0/cabal/cabal-install/src/Distribution/Client/IndexUtils.hs:            { packageDescription = emptyPackageDescription
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  [ ("packageDescription", \gpd -> gpd { packageDescription = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("genPackageFlags",    \gpd -> gpd { genPackageFlags    = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("condLibrary",        \gpd -> gpd { condLibrary        = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("condSubLibraries",   \gpd -> gpd { condSubLibraries   = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("condForeignLibs",    \gpd -> gpd { condForeignLibs    = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("condExecutables",    \gpd -> gpd { condExecutables    = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs:  , ("condTestSuites",     \gpd -> gpd { condTestSuites     = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/custom-setup/CabalDoctestSetup.hs:    { condTestSuites = map f (condTestSuites gpd)
./haskell-gi-0.26.17/lib/Data/GI/CodeGen/CabalHooks.hs:      gpd' = gpd {condLibrary = Just cL'}
./haskell-gi-0.26.17/lib/Data/GI/CodeGen/CabalHooks.hs:      gpd' = gpd {condLibrary = Just cL'}
./jailbreak-cabal-1.4.1/Main.hs:stripVersionRestrictions pkg = pkg { condLibrary = fmap relaxLibraryTree (condLibrary pkg)
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:getSrcDirs (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) = 
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:ghcOpts (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) =
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:cppOpts (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) =
./g2-0.2.0.0/src/G2/Translation/Cabal/Cabal.hs:                                            { condLibrary = cl
./exherbo-cabal-0.2.1.1/src/Main.hs:                        descr' = descr { packageDescription = pkgDescr' }
./hackage-proxy-0.3.0.1/TweakCabal.hs:            { condLibrary = tweakCondTree <$> condLibrary gpd
./leksah-0.16.2.2/src/IDE/Pane/PackageEditor.hs:        { packageDescription = pd{
./leksah-0.16.2.2/src/IDE/ImportTool.hs:            gpd { condLibrary     = addDepToLib d (condLibrary gpd),
./leksah-0.16.2.2/src/IDE/Package.hs:    addModule LibExposedMod gpd@GenericPackageDescription{condLibrary = Just lib} =
./leksah-0.16.2.2/src/IDE/Package.hs:        gpd {condLibrary = Just (addModToLib moduleName lib)}
./leksah-0.16.2.2/src/IDE/Package.hs:    addModule LibOtherMod gpd@GenericPackageDescription{condLibrary = Just lib} =
./leksah-0.16.2.2/src/IDE/Package.hs:        gpd {condLibrary = Just (addModToBuildInfoLib moduleName lib)}
./voyeur-0.1.0.1/Setup.hs:onPkgDescr f gpd = gpd { packageDescription = f (packageDescription gpd) }
