
Are Python Libraries Riddled With Security Holes? (techradar.com)

"Almost half of the packages in the official Python Package Index (PyPI) repository have at least one security issue," reports TechRadar, citing a new analysis by Finnish researchers, which even found five packages with more than a thousand issues each... The researchers used static analysis to uncover the security issues in the open source packages, which they reason end up tainting software that use them. In total the research scanned through 197,000 packages and found more than 749,000 security issues in all... Explaining their methodology the researchers note that despite the inherent limitations of static analysis, they still found at least one security issue in about 46% of the packages in the repository. The paper reveals that of the issues identified, the maximum (442,373) are of low severity, while 227,426 are moderate severity issues. However, 11% of the flagged PyPI packages have 80,065 high severity issues.
The Register supplies some context: Other surveys of this sort have come to similar conclusions about software package ecosystems. Last September, a group of IEEE researchers analyzed 6,673 actively used Node.js apps and found about 68 per cent depended on at least one vulnerable package... The situation is similar with package registries like Maven (for Java), NuGet (for .NET), RubyGems (for Ruby), CPAN (for Perl), and CRAN (for R). In a phone interview, Ee W. Durbin III, director of infrastructure at the Python Software Foundation, told The Register, "Things like this tend not to be very surprising. One of the most overlooked or misunderstood parts of PyPI as a service is that it's intended to be freely accessible, freely available, and freely usable. Because of that we don't make any guarantees about the things that are available there..."

Durbin welcomed the work of the Finnish researchers because it makes people more aware of issues that are common among open package management systems and because it benefits the overall health of the Python community. "It's not something we ignore but it's also not something we historically have had the resources to take on," said Durbin. That may be less of an issue going forward. According to Durbin, there's been significantly more interest over the past year in supply chain security and what companies can do to improve the situation. For the Python community, that's translated into an effort to create a package vulnerability reporting API and the Python Advisory Database, a community-run repository of PyPI security advisories that's linked to the Google-spearheaded Open Vulnerability Database.
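As a quick sanity check on the severity figures quoted above, the three buckets add up to the reported total of "more than 749,000". Note that the percentages computed here are shares of issues, not of packages, so they are a different quantity from the "11% of the flagged PyPI packages" figure; the counts are the ones quoted in the summary, nothing else is assumed.

```python
# Issue counts by severity, as quoted in the summary above.
issues_by_severity = {"low": 442_373, "moderate": 227_426, "high": 80_065}

total = sum(issues_by_severity.values())
print(f"total: {total:,}")  # total: 749,864

# Share of all flagged *issues* (not packages) in each severity bucket.
for severity, count in issues_by_severity.items():
    print(f"{severity}: {count / total:.1%}")
# low: 59.0%
# moderate: 30.3%
# high: 10.7%
```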

  • No (Score:4, Funny)

    by Anonymous Coward on Sunday August 01, 2021 @05:16PM (#61645103)

    Managed / sandboxed languages never have security holes. Only C.

  • by gweihir ( 88907 ) on Sunday August 01, 2021 @05:27PM (#61645121)

    In general, most code is bad and most libraries are bad too. The problem is one of selection: if anybody can submit to a repo and there is no review, things will generally be pretty bad. The old saying "you get what you pay for" is true for FOSS as well, even if the mechanisms are a bit different. For a counter-example, look at the Linux kernel (minus drivers) and find that FOSS can produce really high quality. But there is reviev, testing, and other quality gates, and there are people getting paid to do this work.

    • by sg_oneill ( 159032 ) on Sunday August 01, 2021 @06:59PM (#61645347)

      I kind of attribute a lot of this to Python's recent surge in popularity over the last five years.

      Traditionally, Python's been a slow, careful evolver. The libraries we use had been developed over long periods of time, with a lot of work put into keeping things clean and well architected.

      But it would seem of late I'm seeing a LOT of the sorts of fad packages you'd see in Ruby and JS turning up in Python, with barf.io-type names, websites clearly built by marketing companies, and a lot of practices not normally associated with the "Python way". I assume they're by the same clowns that built a tonne of this crap in JS and Ruby and then abandoned it.

      I'd say this to any new Python person: before starting that whizz-bang package, look to see if an existing one is there, and just work on that instead. Python's actually an old language (of the TIOBE top ten, I'd wager only C and C++ are older, and in the case of C++ not by much), and there's an already existing ecosystem of battle-tested and proven packages that cover MOST needs. Don't poison that ecosystem. And for God's sake, if you DO create a new package, don't abandon it.

      • by Z00L00K ( 682162 )

        I wouldn't say careful since there have been backwards compatibility breaks.

        For me Python isn't an old language; it's only in the last two decades that it became a fad.

        • Well fine, but I've been using python since the late 1990s, when Javascript was still shitty browser scripting, and Java was a temperamental thing taught to college students to create shockingly ugly desktop apps. Even then the older python seemed a lot more stable and thoughtfully designed.

          • Re: (Score:2, Insightful)

            by sfcat ( 872532 )

            thoughtfully designed

            There is absolutely nothing about Python that is thoughtfully designed. It is a language intended to replace Perl and is targeted at system operators. And it very much looks like such a language. That's why a Google search for "Python sucks" yields 24,000,000 results. Here is a good summary of what is wrong with Python. [github.com]

            • by maevius ( 518697 )

              There is absolutely nothing about Python that is thoughtfully designed.

              I'm going to assume that you are going for shock value/trolling here.

              It is a language intended to replace Perl and is targeted at system operators. And it very much looks like such a language.

              Condescending much? You haven't used Python for any amount of time, have you?

              That's why a Google search for "Python sucks" yields 24,000,000 results.

              Your argument is a google search for "python sucks"? Java, C, ruby, go have more results. Go even has its own website: http://www.golang.sucks/ [www.golang.sucks]

              Here is a good summary of what is wrong with Python. [github.com]

              That's not a good summary, that's an OK summary. With the exception of list comprehensions, which become unreadable very quickly, most points are rather minor, or things that you learn to live with - as is the case with all

            • You can mash almost any two words and get a million or so results. Seriously, it means *nothing*.

              And no, it wasn't intended to replace Perl, which came out at roughly the same time (Perl came out around the time implementation of Python was in progress, ~87/88). Considering it wasn't really till the early 90s that Perl became well known, nothing about your theory matches. Python was designed as a system programming language for AMOEBA, a research OS by the dude who designed Minix, and it's very clear where its infl

            • by gweihir ( 88907 )

              thoughtfully designed

              There is absolutely nothing about Python that is thoughtfully designed.

              Sorry, but the only way you can say that is if you really, really do not understand the concepts involved. Python is actually designed with a lot of insight and has quite a few characteristics that are very advanced. It is very much not a language for beginners though.

          • > python seemed a lot more stable and thoughtfully designed.

            A common idiom is dual operations such as push/pop, begin/end, open/close, etc. Using whitespace makes it easier for people to see the intended effect and to spot bugs such as this:

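            // Amateur hour -- no indentation
            glBegin(GL_TRIANGLES);
            glVertex3f(0.0f, 0.0f, 0.0f);
            glVertex3f(1.0f, 0.0f, 0.0f);
            glVertex3f(0.0f, 1.0f, 0.0f);

            // Professional code -- good use of indentation
            glBegin(GL_TRIANGLES);
                glVertex3f(0.0f, 0.0f, 0.0f);
                glVertex3f(1.0f, 0.0f, 0.0f);
                glVertex3f(0.0f, 1.0f, 0.0f);
            glEnd();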

            Having the compil
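For comparison, in Python itself a begin/end pair like the one in this comment is commonly tied together with a context manager, so the indented `with` body and the begin/end pairing are the same thing and the closing call cannot be left out. A minimal sketch; `begin`, `end`, `vertex` and `primitive` are hypothetical stand-ins that only record calls, not real OpenGL bindings.

```python
from contextlib import contextmanager

# Hypothetical stand-ins for a begin/end drawing API such as glBegin()/glEnd().
# They only record the calls, so the sketch runs without OpenGL.
calls = []

def begin(mode):
    calls.append(f"begin({mode})")

def end():
    calls.append("end()")

def vertex(x, y, z):
    calls.append(f"vertex({x}, {y}, {z})")

@contextmanager
def primitive(mode):
    """Pair begin() with end() so the closing call cannot be forgotten."""
    begin(mode)
    try:
        yield
    finally:
        end()  # runs even if the indented body raises

# The indented body is exactly the span between begin() and end().
with primitive("TRIANGLES"):
    vertex(0, 0, 0)
    vertex(1, 0, 0)
    vertex(0, 1, 0)

print(calls)
# ['begin(TRIANGLES)', 'vertex(0, 0, 0)', 'vertex(1, 0, 0)', 'vertex(0, 1, 0)', 'end()']
```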

            • by gweihir ( 88907 )

              Python was designed by an fucking amateur completely clueless about the purpose of indentation: Code is written and read for people not compilers.

              Nope. Python was designed by people that were _tired_ of said "fucking clueless amateurs". Of which you are clearly one. Using indentation as part of the syntax takes a bit of getting used to, but not much. Anybody that cannot deal with it has a problem and has no business writing anything more complicated than "Hello, world".

              • /whoosh

                Found the fucking clueless amateur who thinks indentation is for compilers instead of people.

                You are probably the same fucking idiot that thinks a=b+c means something different than a = b + c.

                Wait till you discover multiple column alignment with whitespace! /s

                • You are probably the same fucking idiot that thinks a=b+c means something different than a = b + c.

                  Sadly, in MATLAB, there are several instances where those two in fact are different. Then again, MATLAB also thinks it's ok to define a vector either as [a,b,c] or [a, b, c] OR [a b c] .

                  As you might guess, I don't like either MATLAB or python.

                • by gweihir ( 88907 )

                  Your "fucking clueless amateur" just happens to be a PhD level CS type with an engineering PhD and lots of experience using Python and many other languages. There is absolutely no reason why the compiler should not be able to look at indentation if the language is so designed. None at all.

                  But your complete cluelessness nicely shows: Indentation and whitespace are two fundamentally different things. But that probably flies right over your head. You are one of these people that desperately tries to identify a

                  • /whoosh

                    Someday you'll learn to write code for people.

                  • > PhD level CS type

                    All that knowledge and yet you STILL can't figure out that MULTIPLE indentations can exist in the SAME scope block. /s

                    One day you might actually comprehend the fact that some people write code in "columns" -- such as DEBUG only / Logging / Tracing / etc. code.

                    . . . foo();
                    #if DEBUG
                    Log( "Foo" );
                    #endif

                    . . . bar();
                    #if DEBUG
                    Log( "bar" );
                    #endif

                    When everything is at the same indent level the code becomes harder to read.

                    . . . foo();
                    . . . #if DEBUG
                    . . . . Log( "Foo"

      • by glum64 ( 8102266 )

        if you DO create a new package, dont abandon it.

        The amount of abandoned code is enormous. It is a problem. Before deciding on using a package, I always check its activity and the number of contributors. Fewer than five contributors over the last three years is usually a warning sign. At the same time, I do not blame people for losing interest or simply dying.
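The rule of thumb above can be sketched as a small check. This is one reading of it (fewer than five distinct contributors over the last three years); the `maintenance_warning` function and the `(author, datetime)` commit format are hypothetical, and in practice the history would come from the project's own repository.

```python
from datetime import datetime, timedelta

def maintenance_warning(commits, now, min_contributors=5, years=3):
    """Hypothetical check: True when fewer than `min_contributors` distinct
    authors committed within the last `years` years.

    `commits` is an iterable of (author, datetime) pairs.
    """
    cutoff = now - timedelta(days=365 * years)  # ignores leap days; fine for a heuristic
    recent_authors = {author for author, when in commits if when >= cutoff}
    return len(recent_authors) < min_contributors

# Made-up example history: only two people have committed since mid-2018.
history = [
    ("alice", datetime(2021, 6, 1)),
    ("bob", datetime(2020, 3, 15)),
    ("alice", datetime(2019, 11, 2)),
    ("carol", datetime(2016, 5, 9)),
]
print(maintenance_warning(history, now=datetime(2021, 8, 1)))  # True
```

Counting distinct authors rather than commits keeps one very active maintainer from masking a bus factor of one.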

      • Looking through the TIOBE top 10 (and going by the "first appeared" dates on Wikipedia):

        C: 1972 (no month given)
        Java: May 1995
        Python: February 1991
        C++: 1985 (no month given)
        C#: 2000 (no month given)
        VB: 1991 (no month given)
        JavaScript: December 1995
        PHP: 1995 (no month given)
        Assembly language: 1949 (no month given)
        SQL: 1974 (no month given)

        So we have 4 entries older than python, 1 from the same year as python (not clear whether it was earlier or later in the year) and 4 entries newer than python.
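The tally above can be checked mechanically. A minimal sketch using only the years listed in the comment; months are ignored, which is why VB ends up tied with Python, matching the comment's caveat.

```python
# First-appeared years as listed in the comment above (year only).
FIRST_APPEARED = {
    "C": 1972, "Java": 1995, "Python": 1991, "C++": 1985, "C#": 2000,
    "VB": 1991, "JavaScript": 1995, "PHP": 1995,
    "Assembly language": 1949, "SQL": 1974,
}

def compare_to(reference, years):
    """Split the other languages into older / same year / newer than `reference`."""
    ref_year = years[reference]
    others = {lang: y for lang, y in years.items() if lang != reference}
    older = sorted(lang for lang, y in others.items() if y < ref_year)
    same = sorted(lang for lang, y in others.items() if y == ref_year)
    newer = sorted(lang for lang, y in others.items() if y > ref_year)
    return older, same, newer

older, same, newer = compare_to("Python", FIRST_APPEARED)
print(len(older), len(same), len(newer))  # 4 1 4
```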

        Though it's argua

      • by DarkOx ( 621550 )

        Both PyPI and the public gem repos on the Ruby side are literally piles of hot garbage. There have been supply chain attacks on both, and the amount of trash is so large it's hard to find the wheat among the chaff outside cases where you are doing something so bog standard there is essentially one accepted tool/library/framework.

        However, I don't think the situation is as dire as some of these articles make it appear. You can point a SAST tool at almost any code base that was not developed using one along the way most o

      • by gweihir ( 88907 )

        Yes, pretty much. Becoming popular is always a problem for quality. Too many people with low or no skills will rush in and many of them will not know that they have low skills. Also, actually maintaining a package means being in it for the long run. People that rush to the next hype rarely are.

      • by epine ( 68316 )

        I'd wager only C and C++ are older, and in the case of C++ not by much ...

        The first Python release wasn't until 1991.

        In 1985, C++ had its first official reference manual, and later that same year, its first commercial compiler.

        Maybe that's your definition of "not very much".

        But I was there, and more happened in the world of computing during those six years than any six year period that followed.

        Although the magazine replied to the reader's proposal with "Please say you're kidding about the bi-weekly schedu

    • But there is reviev

      But there is review

      [Submit patch for post #61645121]

  • Well, duh. (Score:5, Insightful)

    by Todd Knarr ( 15451 ) on Sunday August 01, 2021 @05:28PM (#61645125) Homepage

    This is why you don't trust random packages from repositories. Those repositories are public, anyone can publish packages to them and make them available. That doesn't mean the package is any good. You should be evaluating the packages you use to see what their history and security situation is, and sticking to well-known, widely-used packages with a solid history of not having bugs in them. And when it comes to tiny, single-purpose packages, ask yourself whether you really need a package for that or whether you could write the code yourself. Then you can make sure it's bug-free in your use cases and not have to depend on a possibly-unreliable source for it.

    Be careless, pay the price.

  • Microsoft? (Score:2, Interesting)

    by Haydn ( 592455 )
    Doesn't a higher percentage of Microsoft's commercial software contain serious bugs?

    Earlier today there was an article with a link to the top 12 most recently frequently exploited vulnerabilities from CISA: https://us-cert.cisa.gov/ncas/... [cisa.gov]
    And 11 of the 12 were Microsoft specific.
    • Re:Microsoft? (Score:4, Informative)

      by BladeMelbourne ( 518866 ) on Sunday August 01, 2021 @06:27PM (#61645263)

      "most recently frequently exploited" is highly dependent upon how many machines run said software, which when it comes to Microsoft software - is more than any python library.

      Microsoft have many millions of lines-of-code in their operating systems, web browsers, frameworks and vast array of applications, so it's not exactly unexpected. They also fix stuff quickly and regularly... so I'm not going to grab my pitch fork and flame torch.

  • Yes, just like typical libraries for any language. Next!

  • by Opportunist ( 166417 ) on Sunday August 01, 2021 @05:44PM (#61645179)

    Some are. But let's take a look at what kind of software is written in Python.

    I don't know what you use Python for. My main application is to run scripts that I quickly slapped together to get some shit done. It's not something that will ever touch a production server.

    For this application, security isn't that big an issue.

    If you want to run production code... well, you better audit that library.

    • There are many small web services written as Python utilities. Many of these wind up far more exposed than their authors intended, particularly if they provide a useful configuration. The pattern is particularly popular for configuration tools.
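One common way a small utility ends up more exposed than intended (an assumption here, not something the comment spells out) is the address its server binds to. A minimal sketch with the standard library's `http.server`; `StatusHandler` and `make_server` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical tiny utility endpoint that reports a status string."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(host="127.0.0.1", port=8000):
    # Binding to 127.0.0.1 keeps the service reachable only from this machine;
    # binding to "" or "0.0.0.0" would listen on every network interface.
    return HTTPServer((host, port), StatusHandler)
```

Calling `make_server().serve_forever()` would serve it on localhost only; reaching it from elsewhere then requires a deliberate choice such as a reverse proxy or an explicit host argument.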

    • by pjt33 ( 739471 )

      Are you running those scripts in a VM or sandbox? Because the main security issue around Python packages seems to be scanning the computer for credentials which they can upload, so you might be exposing your production servers without ever deploying to them.

      • Since Python is mostly a tool I employ to automate security tests, yes. The worst thing that could happen in general is that it exposes a set of credentials that belong to a machine that is supposed to be penetrated, which has no relevant real data and will be purged within a week or two.

  • ...popular platform is bound to be targeted by those wishing to steal from others.

  • J/K. It's an even worse piece of shit wrt third party code and probably even core.
  • by ceoyoyo ( 59147 )

    These studies are always just lazy ways to bump your paper count and maybe make a headline or two.

    If you're hooking up your Python installation with joes_test_module installed, or R (ffs) to the Internet, then you're doing it wrong. Otherwise, who cares?

    • There is also a bait and switch between "security holes" and then it is really "security issues."

      • Well, also, "security issues" doesn't generally mean "security vulnerability". If I'm writing a Python script to automate something on my system and it uses a library that has a security issue that makes it susceptible to a buffer overflow, that doesn't suddenly mean my machine is vulnerable.
        • Exactly. And a "security issue" can even mean you didn't use 2FA. Or you did use 2FA, but in most cases it places both factors on your phone. One person might call one way an "issue," the next might call the proposed solution an "issue."

          If I put on my sysadmin hat, it is instantly obvious that everything is an "issue," even things you think you did right are security issues, because what if you messed them up? You can test, but what if you forgot a condition, or you coded the test wrong? You might have, h

  • by dabears85 ( 6999352 ) on Sunday August 01, 2021 @08:14PM (#61645463)

    Where I work, if I write a unit test that involves a username/password, the scanners flag it as a critical vulnerability. Then I have to fill out a bunch of paperwork and attend many meetings to prove the hardcoded password is in a unit test, where all dependencies are mocked out.

    Our Java scanners flag reading from a file as a medium vulnerability, even when the input is sanitized. Writing to a file is a high vulnerability because you might be writing sensitive information, such as passwords or SSNs.

    So if this study uses the same scanners as my company, yeah, PyPi will be riddled with a crazy amount of vulnerabilities!
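To illustrate the kind of test described above, here is a minimal sketch in Python rather than Java. Bandit, for instance, reports string literals assigned to password-like names as B105 (`hardcoded_password_string`) regardless of whether the file is a test, unless tests are excluded from the scan. `login` is a hypothetical function under test, and the dummy credential never leaves the mock.

```python
import unittest
from unittest import mock

def login(client, username, password):
    """Hypothetical code under test: delegates authentication to `client`."""
    return client.authenticate(username, password)

class LoginTest(unittest.TestCase):
    def test_credentials_are_passed_through(self):
        client = mock.Mock()  # no real service is contacted
        client.authenticate.return_value = True
        # A dummy literal bound to a password-like name is what Bandit's
        # B105 check matches; "# nosec" asks Bandit to skip this line.
        password = "not-a-real-password"  # nosec
        self.assertTrue(login(client, "alice", password))
        client.authenticate.assert_called_once_with("alice", password)
```

Run it with `python -m unittest`. Marking the line with `# nosec` is one way to record in the code itself that the finding was reviewed, instead of re-arguing it in a meeting.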

    • That's actually a very good comment. The only static analysis tool I've used for quite a bit is SonarQube, in my case for Java code. It has so many rules that some violations are really minor things and, IMO, they're not always classified at the right level of "seriousness" (sorry, I forgot the right word for this). So, if you're going to use static analysis tools, review and limit the list of rules used.
    • by Corbets ( 169101 )

      Not quite sure what your point is. Hard coding a password, even for unit testing, IS a security vulnerability, albeit using it for unit testing is a lesser exposure than hard coding it in permanently

  • That will be safer that way! /s
  • Who funded this analysis?
  • by nickovs ( 115935 ) on Sunday August 01, 2021 @09:04PM (#61645533)
    The paper seems to classify everything that Bandit (their static analyser of choice) flags as a security issue. A great many of these issues are fine examples of poor practice that can come back to bite you, but they are certainly not all security issues per se. For instance, of the 442,373 low-severity issues found, 72,686 are instances of people using 'try-except' clauses that catch all exceptions where the except branch only contains the 'pass' statement. Doing that indiscriminately is a very bad idea, but there are plenty of cases where it's perfectly safe to do. Similarly, they count merely importing the standard, built-in pickle and subprocess modules (23,081 and 54,913 instances, respectively) as security issues.

    Public repositories with open submission policies are likely to be chock full of terrible submissions; if you set no bar at all then the standard is bound to be low. It would appear that this holds just as true for the arXiv pre-print repo of academic papers as it does for PyPI.
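The two patterns this comment mentions are easy to detect mechanically with Python's `ast` module, which is part of why they inflate the counts. The sketch below is a simplified imitation, not Bandit's implementation (Bandit reports these as B110 `try_except_pass`, B403 `import_pickle` and B404 `import_subprocess`). Here an `except` clause counts as catch-all only when it is bare or names `Exception`/`BaseException`, so the typed optional-import case in the sample is not flagged; only plain `import` statements are checked.

```python
import ast

CATCH_ALL = {"Exception", "BaseException"}

def find_try_except_pass(source):
    """Line numbers of catch-all `except` clauses whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        catch_all = node.type is None or (
            isinstance(node.type, ast.Name) and node.type.id in CATCH_ALL
        )
        only_pass = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
        if catch_all and only_pass:
            hits.append(node.lineno)
    return sorted(hits)

def find_flagged_imports(source, modules=("pickle", "subprocess")):
    """(line, module) pairs for plain `import` statements of the given modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits.extend((node.lineno, a.name) for a in node.names if a.name in modules)
    return sorted(hits)

# Sample code to scan; it is only parsed, never executed.
SAMPLE = '''
import pickle
import subprocess

try:
    import ujson as json
except ImportError:
    pass

try:
    cleanup()
except Exception:
    pass
'''

print(find_try_except_pass(SAMPLE))  # [12]
print(find_flagged_imports(SAMPLE))  # [(2, 'pickle'), (3, 'subprocess')]
```

Neither check knows anything about how the code is used, which is exactly the gap between "flagged" and "vulnerable" discussed here.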

    • An issue closely related to this is that something may or may not be a security problem depending on how it is used. I have a library that does parameter estimation and is designed to run on large machines. It uses pickle for checkpointing so that calculations can be resumed. You can't access the program remotely because it has no networking in it.

      In order for someone to exploit the pickle they would have to get access to your account, manipulate your files and then get you to run it. Who cares about pick
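Checkpointing of the kind described above might look like the sketch below. The function names are hypothetical, and the write-to-temp-file-then-rename step is an addition of this sketch, not something the comment describes. The comment in `load_checkpoint` states the property that makes scanners flag `pickle` at all, and why it matters little when only the owner can write the file.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    """Write `state` so a crash mid-write never leaves a truncated checkpoint.

    Dumps to a temporary file in the same directory, then renames it over
    `path`; on POSIX, os.replace() is atomic within one filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave partial temp files behind
        raise

def load_checkpoint(path):
    # pickle.load() can run arbitrary code from a crafted file, which is why
    # scanners flag pickle. Only load checkpoints you wrote yourself, from a
    # location other users cannot modify.
    with open(path, "rb") as f:
        return pickle.load(f)
```

A resumable run would call `save_checkpoint({"iteration": i, "params": params}, "run.ckpt")` periodically and `load_checkpoint("run.ckpt")` at startup if the file exists.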

  • No? Then I'm pretty sure your software is riddled with security holes.

    Until some formal process is in place for rigorous testing, we can't reasonably expect a piece of software to be more than a toy. And a dangerous toy if placed in a critical system.

    It's not a problem unique to Python. But the popularity of user submitted package repositories makes solving the problems of software quality much more complicated.

    I'd suggest Python, Perl, Ruby, Go, Rust and others take testing seriously. At least offer developers some useful resources to devise tests, such as the OWASP Top 10 [owasp.org]. A project scoring system based on code coverage would help as well. One that allows individual unit tests to be categorized and checked off to bump up the score would encourage the right sorts of tests. And because it's a public project, cheating will be difficult as the tests and logs are visible to all.

    • I disagree with this. There is a lot of software where security is just not an issue because there is no network connectivity to it. A lot of scientific software that runs on clusters falls into this category. If someone can SSH into your account with permission to read and edit your files, or ask you to run software, then who cares at that point whether the molecular dynamics library has a potential security problem? If someone can't do that, then there is no problem at all.

      I work on high performance scientif

    I think languages without mature ecosystems that have multiple ways of doing the same thing are more at risk of being hit by bad actors.

    If there is only one library to do 'X' or only one way to do 'Y', then large numbers of apps will hit those bad links. In languages where there are 15 ways to do 'X' or 'Y', apps are not only statistically less likely to hit bad code, but bad code is also more likely to show aberrant behavior when put side by side with the other 14 ways to do X and Y.

    Python's straight-j
