MRG: adjust warnings around tax abund and provide v5 upgrades to tax metagenome #3711
for more information, see https://pre-commit.ci
…o v5_sig_manifest
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           latest    #3711      +/-   ##
==========================================
+ Coverage   88.17%   88.19%   +0.01%
==========================================
  Files         137      137
  Lines       22491    22552      +61
  Branches     2281     2298      +17
==========================================
+ Hits        19831    19889      +58
- Misses       2347     2348       +1
- Partials      313      315       +2
I like it!
…to update_tax_abund
See new validation workflow here: https://github.com/sourmash-bio/2025-validate-sourmash-tax-formats.
…nto update_tax_abund
…nto update_tax_abund
Ready for review @bluegenes! This got really complicated from a code and testing perspective. My suggestion for detailed review efforts is to confirm that the various
Major new features:

* start writing v4->v5 migration docs (#3721)
* adjust warnings around tax abund and provide v5 upgrades to `tax metagenome` (#3711)

Minor new features:

* try setting up --v4 and --v5 behavior differences for `sig check` (#3072)
* update `sig manifest` default rebuilding behavior for v5. (#3074)
* handle (ignore) empty taxids for `bioboxes` format (#3748)
* improve summary_csv for lingroups (#3758)

Cleanup and documentation updates:

* use auto-generated database list (#3754)

Developer updates:

* CI: fix dependabot config syntax, and clippy beta lints (#3762)
* CI: update to cibuildwheel 3.1.1 (#3738)
* ci: group dependabot updates by language (#3749)
* Remove docutils dep (#3769)
* bump version to 4.9.4-dev (#3715)
* disable WebAssembly builds, for now (#3724)

Dependabot updates:

* Build(ci): Bump actions/download-artifact from 4 to 5 (#3766)
* Build(deps): Bump DeterminateSystems/nix-installer-action from 17 to 18 (#3727)
* Build(deps): Bump DeterminateSystems/nix-installer-action from 18 to 19 (#3746)
* Build(deps): Bump criterion from 0.6.0 to 0.7.0 (#3741)
* Build(deps): Bump md5 from 0.7.0 to 0.8.0 (#3719)
* Build(deps): Bump memmap2 from 0.9.5 to 0.9.7 (#3732)
* Build(deps): Bump prefix-dev/setup-pixi from 0.8.10 to 0.8.11 (#3733)
* Build(deps): Bump prefix-dev/setup-pixi from 0.8.11 to 0.8.14 (#3747)
* Build(deps): Bump rand from 0.9.1 to 0.9.2 (#3743)
* Build(deps): Bump serde_json from 1.0.140 to 1.0.141 (#3742)
* [pre-commit.ci] pre-commit autoupdate (#3718)
* [pre-commit.ci] pre-commit autoupdate (#3725)
* [pre-commit.ci] pre-commit autoupdate (#3731)
* [pre-commit.ci] pre-commit autoupdate (#3737)
* [pre-commit.ci] pre-commit autoupdate (#3740)
* [pre-commit.ci] pre-commit autoupdate (#3756)
Add new command line arguments to `tax metagenome` around abundances. `--use-abundances` turns on the use of sketch abundances for `krona` and maybe other output formats, while `--ignore-abundances`/`--no-abundances` turns them off. To add to the complexity, by default, the `human` and `kreport` output formats do use abundances when present 😭.

Here's a chart to disambiguate 😆

This PR provides a warning if no abundances are available for `tax metagenome` to use, or if `--use-abundances`/`--ignore-abundances` is not explicitly set (for sourmash v4).

For sourmash v5, `--use-abundances` becomes default and it is an error to not have abundances. This can be overridden with `--ignore-abundances`.

For sourmash v5, this PR also changes the default output format for `tax genome` and `tax metagenome` to `-F human` from `-F csv_summary` (#2162).

Uses `--v4`/`--v5` command line switches and test infrastructure, ref #3076.

As of this PR,

* sourmash v4 - all `tax metagenome` commands warn if abundances are not available.
* sourmash v5 - all `tax metagenome` commands require abund, unless `--ignore-abund` is set.
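
The v4/v5 rules described above can be sketched as a small decision function. This is only an illustration of the behavior as stated in this description, not the code in this PR: the names `resolve_use_abundances` and `default_output_format` are hypothetical, the `flag` argument stands in for `--use-abundances` (True), `--ignore-abundances` (False), or neither (None), and the v4 fallback for formats other than `human` and `kreport` is an assumption, since the chart is not reproduced here.

```python
import warnings

# Formats that, per the description, use abundances by default when present (v4).
ABUND_BY_DEFAULT_V4 = ("human", "kreport")


def resolve_use_abundances(major_version, has_abundances, flag=None,
                           output_format="csv_summary"):
    """Hypothetical helper: decide whether abundances are used.

    flag: True for --use-abundances, False for --ignore-abundances,
    None when neither was given on the command line.
    """
    if flag is False:
        # Explicit opt-out wins in both versions (assumed to be silent).
        return False

    if major_version >= 5:
        # v5: using abundances is the default, and missing abundances is an error.
        if not has_abundances:
            raise ValueError(
                "no abundances available; pass --ignore-abundances to override"
            )
        return True

    # v4: warn rather than fail.
    if flag is None:
        warnings.warn("--use-abundances/--ignore-abundances not explicitly set")
    if not has_abundances:
        warnings.warn("no abundances available for tax metagenome")
        return False
    if flag is True:
        return True
    # Assumption: without an explicit flag, only human/kreport use abundances.
    return output_format in ABUND_BY_DEFAULT_V4


def default_output_format(major_version):
    """Hypothetical helper: default -F value for tax genome / tax metagenome."""
    return "human" if major_version >= 5 else "csv_summary"
```

For example, `resolve_use_abundances(5, False)` raises, while `resolve_use_abundances(5, False, flag=False)` returns `False`, matching the "require abund, unless `--ignore-abund` is set" rule.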
TODO:
`--use-abund` `--use-abund` (v4/v5) `tax genome`

Fixes #2162
Fixes #3577
Fixes #3598
UPDATE: I created a validation workflow here: https://github.com/sourmash-bio/2025-validate-sourmash-tax-formats. It converts everything to the new taxburst JSON format and then does a tree comparison. The new krona format passes 🎉 !