


How Python is Fighting Open Source's 'Phantom' Dependencies Problem (blogspot.com) 15
Since 2023 the Python Software Foundation has had a Security Developer-in-Residence (sponsored by the Open Source Security Foundation's vulnerability-finding "Alpha-Omega" project). And he's just published a new 11-page white paper about open source's "phantom dependencies" problem — suggesting a way to solve it.
"Phantom" dependencies aren't tracked with packaging metadata, manifests, or lock files, which makes them "not discoverable" by tools like vulnerability scanners or compliance and policy tools. So Python security developer-in-residence Seth Larson authored a recently-accepted Python Enhancement Proposal offering an easy way for packages to provide metadata through Software Bill-of-Materials (SBOMs). From the whitepaper: Python Enhancement Proposal 770 is backwards compatible and can be enabled by default by tools, meaning most projects won't need to manually opt in to begin generating valid PEP 770 SBOM metadata. Python is not the only software package ecosystem affected by the "Phantom Dependency" problem. The approach using SBOMs for metadata can be remixed and adopted by other packaging ecosystems looking to record ecosystem-agnostic software metadata...
Within Endor Labs' [2023 dependencies] report, Python is named as one of the most affected packaging ecosystems by the "Phantom Dependency" problem. There are multiple reasons that Python is particularly affected:
- There are many methods for interfacing Python with non-Python software, such as through the C-API or FFI. Python can "wrap" and expose an easy-to-use Python API for software written in other languages like C, C++, Rust, Fortran, Web Assembly, and more.
- Python is the premier language for scientific computing and artificial intelligence, meaning many high-performance libraries written in system languages need to be accessed from Python code.
- Finally, Python packages have a distribution type called a "wheel", which is essentially a zip file that is "installed" by being unzipped into a directory, meaning there is no compilation step allowed during installation. This is great for being able to inspect a package before installation, but it means that all compiled languages need to be pre-compiled into binaries before installation...
When designing a new package metadata standard, one of the top concerns is reducing the amount of effort required from the mostly volunteer maintainers of packaging tools and the thousands of projects being published to the Python Package Index... By defining PEP 770 SBOM metadata as using a directory of files, rather than a new metadata field, we were able to side-step all the implementation pain...
We'll be working to submit issues on popular open source SBOM and vulnerability scanning tools, and gradually, Phantom Dependencies will become less of an issue for the Python package ecosystem.
The white paper "details the approach, challenges, and insights into the creation and acceptance of PEP 770 and adopting Software Bill-of-Materials (SBOMs) to improve the measurability of Python packages," explains an announcement from the Python Software Foundation. And the white paper ends with a helpful note.
"Having spoken to other open source packaging ecosystem maintainers, we have come to learn that other ecosystems have similar issues with Phantom Dependencies. We welcome other packaging ecosystems to adopt Python's approach with PEP 770 and are willing to provide guidance on the implementation."
"Phantom" dependencies aren't tracked with packaging metadata, manifests, or lock files, which makes them "not discoverable" by tools like vulnerability scanners or compliance and policy tools. So Python security developer-in-residence Seth Larson authored a recently-accepted Python Enhancement Proposal offering an easy way for packages to provide metadata through Software Bill-of-Materials (SBOMs). From the whitepaper: Python Enhancement Proposal 770 is backwards compatible and can be enabled by default by tools, meaning most projects won't need to manually opt in to begin generating valid PEP 770 SBOM metadata. Python is not the only software package ecosystem affected by the "Phantom Dependency" problem. The approach using SBOMs for metadata can be remixed and adopted by other packaging ecosystems looking to record ecosystem-agnostic software metadata...
Within Endor Labs' [2023 dependencies] report, Python is named as one of the most affected packaging ecosystems by the "Phantom Dependency" problem. There are multiple reasons that Python is particularly affected:
- There are many methods for interfacing Python with non-Python software, such as through the C-API or FFI. Python can "wrap" and expose an easy-to-use Python API for software written in other languages like C, C++, Rust, Fortran, Web Assembly, and more.
- Python is the premier language for scientific computing and artificial intelligence, meaning many high-performance libraries written in system languages need to be accessed from Python code.
- Finally, Python packages have a distribution type called a "wheel", which is essentially a zip file that is "installed" by being unzipped into a directory, meaning there is no compilation step allowed during installation. This is great for being able to inspect a package before installation, but it means that all compiled languages need to be pre-compiled into binaries before installation...
When designing a new package metadata standard, one of the top concerns is reducing the amount of effort required from the mostly volunteer maintainers of packaging tools and the thousands of projects being published to the Python Package Index... By defining PEP 770 SBOM metadata as using a directory of files, rather than a new metadata field, we were able to side-step all the implementation pain...
We'll be working to submit issues on popular open source SBOM and vulnerability scanning tools, and gradually, Phantom Dependencies will become less of an issue for the Python package ecosystem.
The white paper "details the approach, challenges, and insights into the creation and acceptance of PEP 770 and adopting Software Bill-of-Materials (SBOMs) to improve the measurability of Python packages," explains an announcement from the Python Software Foundation. And the white paper ends with a helpful note.
"Having spoken to other open source packaging ecosystem maintainers, we have come to learn that other ecosystems have similar issues with Phantom Dependencies. We welcome other packaging ecosystems to adopt Python's approach with PEP 770 and are willing to provide guidance on the implementation."
Re:The Language for Losers (Score:4, Funny)
The only dependency manager that's any good is NPM!
The One True Build system is Webpack!!! Nothing could go wrong!!
Re: The Language for Losers (Score:2)
Re: (Score:2)
Fear not. All those famous Python packages Pythonistas depend on are written in real languages like C, C++, Rust etc. You know for doing all the real work of maths, statistics, AI, accessing GPU and other hardware, etc, etc.
Re: (Score:2)
To be fair, it does have some nice features. ... else control flow mechanism.
One of my favourites is the while/for
Trying to wrap my head around this (Score:1)
It sounds and looks like this just means "files at a specific path that weren't included in the package."
Why is this not a bug?
Re: Trying to wrap my head around this (Score:2)
AI and scientific (Score:2)
FTA:
" Python is the premier language for scientific computing and artificial intelligence, meaning many high-performance libraries written in system languages need to be accessed from Python code."
Which has for a long time made me wonder why the entire program isn't written in system languages such as C++. Given how critical performance is to these paradigms and the amount spent on GPU hardware for acceleration you'd think ditching a slow language such a Python would be a no brainer even if maybe as a wrapp
Re: (Score:3)
Data scientists and machine learning specialists tend to know Python and not C++, so that's why all these libraries are written for use with Python.
Re: (Score:2)
Frankly if they can learn the maths around data science and AI then learning a new progamming language such as C++ shouldn't be much of a problem for them (thought the C++ steering committee seem to be doing their best to turn it into a dogs dinner write only language).
Re: (Score:3)
Re: (Score:2)
I guess if they or their organisation want to spend money on extra compute power to offset using python instead of spending a few weeks getting up to speed on C++ thats up to them. Sure, C++ isn't trivial but its hardly Klingon either and they wouldn't need to learn all the obscure dusty parts of its cupboard to use it in this arena.
oh for fuck's sack (Score:1)
There is no reason, none whatsoever for Python to impose this crap on its package managers. Yes the government and big banks want SBOMs from their software vendors, but that sounds like a "them" problem not a "Python" problem. They can use Python or not, makes no difference to the Python maintainers. Just rolling over and doing what the feds want on a hobby project. Awesome.
The bigger problem is that SBOMs in no way shape or form actually provide for security which is the goal. The security problems are fro