MRG: handle (ignore) empty taxids for bioboxes format #3748

Merged: bluegenes merged 9 commits into latest from bioboxes-taxid on Aug 5, 2025

@bluegenes (Contributor) commented Jul 29, 2025

  • The bioboxes format relies on having a taxid, so output fails on empty or missing taxids. Some NCBI lineages have missing or empty internal tax ranks (e.g. order, genus). This PR ignores (does not print) results for empty taxids, which bypasses the problematic ranks while still producing output for the classifiable ranks.

example taxonomy with missing ranks (lineage from an older CAMI db):

GCF_003480425.1,2292892,2|1239|||||2292892|,Bacteria,Firmicutes,unclassified Firmicutes class,unclassified Firmicutes order,unclassified Firmicutes family,unclassified Firmicutes genus,Firmicutes bacterium AM31-12AC,unclassified Firmicutes bacterium AM31-12AC subspecies/strain
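
To make the effect concrete, here is a minimal sketch of the filtering idea applied to the example row above. It is not the code changed in `tax_utils.py`: the function name `classifiable_ranks` and the `RANKS` list are hypothetical, and the rank order is an assumption chosen to match the eight `|`-separated slots in the example taxpath.

```python
# Hypothetical rank order, assumed to line up with the eight taxpath slots
# in the example lineage; not taken from the sourmash source.
RANKS = [
    "superkingdom", "phylum", "class", "order",
    "family", "genus", "species", "strain",
]


def classifiable_ranks(taxpath, names, ranks=RANKS):
    """Return (rank, taxid, name) tuples for ranks with a non-empty taxid."""
    taxids = taxpath.split("|")
    kept = []
    for rank, taxid, name in zip(ranks, taxids, names):
        if not taxid.strip():
            # Empty taxid: skip this rank rather than failing on it.
            continue
        kept.append((rank, taxid, name))
    return kept


# The example lineage row from this PR description.
row = (
    "GCF_003480425.1,2292892,2|1239|||||2292892|,Bacteria,Firmicutes,"
    "unclassified Firmicutes class,unclassified Firmicutes order,"
    "unclassified Firmicutes family,unclassified Firmicutes genus,"
    "Firmicutes bacterium AM31-12AC,"
    "unclassified Firmicutes bacterium AM31-12AC subspecies/strain"
)
fields = row.split(",")
ident, species_taxid, taxpath = fields[:3]
names = fields[3:]

kept = classifiable_ranks(taxpath, names)
for rank, taxid, name in kept:
    print(rank, taxid, name, sep="\t")
```

With this row, only the first two slots and the seventh slot carry taxids, so the ranks in between (and the final strain slot) are dropped instead of raising an error.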

codecov bot commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.18%. Comparing base (7946469) to head (bc21aaf).
⚠️ Report is 49 commits behind head on latest.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sourmash/tax/tax_utils.py | 0.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           latest    #3748      +/-   ##
==========================================
- Coverage   88.19%   88.18%   -0.01%     
==========================================
  Files         137      137              
  Lines       22553    22555       +2     
  Branches     2298     2299       +1     
==========================================
  Hits        19890    19890              
- Misses       2348     2349       +1     
- Partials      315      316       +1     

| Flag | Coverage Δ |
|---|---|
| hypothesis-py | 25.26% <0.00%> (-0.01%) ⬇️ |
| python | 92.65% <0.00%> (-0.02%) ⬇️ |
| rust | 81.62% <ø> (ø) |

@bluegenes changed the title from "WIP: handle (ignore) empty taxids for bioboxes format" to "MRG: handle (ignore) empty taxids for bioboxes format" Aug 4, 2025
@bluegenes changed the title from "MRG: handle (ignore) empty taxids for bioboxes format" to "MRG: handle (ignore) empty taxids for bioboxes format" Aug 4, 2025
@bluegenes (Contributor, Author) commented

@ctb ready for review

@bluegenes bluegenes enabled auto-merge (squash) August 5, 2025 17:11
@bluegenes bluegenes disabled auto-merge August 5, 2025 17:11
@bluegenes bluegenes merged commit 9262d51 into latest Aug 5, 2025
40 of 43 checks passed
@bluegenes bluegenes deleted the bioboxes-taxid branch August 5, 2025 17:11
ctb added a commit that referenced this pull request Aug 7, 2025
Major new features:

* start writing v4->v5 migration docs (#3721)
* adjust warnings around tax abund and provide v5 upgrades to `tax metagenome` (#3711)

Minor new features:

* try setting up --v4 and --v5 behavior differences for `sig check` (#3072)
* update `sig manifest` default rebuilding behavior for v5. (#3074)
* handle (ignore) empty taxids for `bioboxes` format (#3748)
* improve summary_csv for lingroups (#3758)

Cleanup and documentation updates:

* use auto-generated database list (#3754)

Developer updates:

* CI: fix dependabot config syntax, and clippy beta lints (#3762)
* CI: update to cibuildwheel 3.1.1 (#3738)
* ci: group dependabot updates by language (#3749)
* Remove docutils dep (#3769)
* bump version to 4.9.4-dev (#3715)
* disable WebAssembly builds, for now (#3724)

Dependabot updates:

* Build(ci): Bump actions/download-artifact from 4 to 5 (#3766)
* Build(deps): Bump DeterminateSystems/nix-installer-action from 17 to 18 (#3727)
* Build(deps): Bump DeterminateSystems/nix-installer-action from 18 to 19 (#3746)
* Build(deps): Bump criterion from 0.6.0 to 0.7.0 (#3741)
* Build(deps): Bump md5 from 0.7.0 to 0.8.0 (#3719)
* Build(deps): Bump memmap2 from 0.9.5 to 0.9.7 (#3732)
* Build(deps): Bump prefix-dev/setup-pixi from 0.8.10 to 0.8.11 (#3733)
* Build(deps): Bump prefix-dev/setup-pixi from 0.8.11 to 0.8.14 (#3747)
* Build(deps): Bump rand from 0.9.1 to 0.9.2 (#3743)
* Build(deps): Bump serde_json from 1.0.140 to 1.0.141 (#3742)
* [pre-commit.ci] pre-commit autoupdate (#3718)
* [pre-commit.ci] pre-commit autoupdate (#3725)
* [pre-commit.ci] pre-commit autoupdate (#3731)
* [pre-commit.ci] pre-commit autoupdate (#3737)
* [pre-commit.ci] pre-commit autoupdate (#3740)
* [pre-commit.ci] pre-commit autoupdate (#3756)