Python Programming Spam

How Spam Flooded the Official Python Software Package Repository PyPI (bleepingcomputer.com) 41

"The official Python software package repository, PyPI, is getting flooded with spam packages..." Bleeping Computer reported Thursday.

"Each of these packages is posted by a unique pseudonymous maintainer account, making it challenging for PyPI to remove the packages and spam accounts all at once..." PyPI is being flooded with spam packages named after popular movies in a style commonly associated with torrent or "warez" sites that provide pirated downloads: watch-(movie-name)-2021-full-online-movie-free-hd-... Although some of these packages are a few weeks old, BleepingComputer observed that spammers are continuing to add newer packages to PyPI... The web page for these bogus packages contain spam keywords and links to movie streaming sites, albeit of questionable legitimacy and legality...

February of this year, PyPI had been flooded with bogus "Discord", "Google", and "Roblox" keygens in a massive spam attack, as reported by ZDNet. At the time, Ewa Jodlowska, Executive Director of the Python Software Foundation had told ZDNet that the PyPI admins were working on addressing the spam attack, however, by the nature of pypi.org, anyone could publish to the repository, and such occurrences were common.

Other than containing spam keywords and links to quasi-video streaming sites, these packages contain files with functional code and author information lifted from legitimate PyPI packages... As previously reported by BleepingComputer, malicious actors have combined code from legitimate packages with otherwise bogus or malicious packages to mask their footsteps, and make the detection of these packages a tad more challenging...

In recent months, the attacks on open-source ecosystems like npm, RubyGems, and PyPI have escalated. Threat actors have been caught flooding software repositories with malware, malicious dependency confusion copycats, or simply vigilante packages to spread their message. As such, securing these repositories has turned into a whack-a-mole race between threat actors and repository maintainers.
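
The naming pattern quoted in the summary (watch-(movie-name)-2021-full-online-movie-free-hd-...) lends itself to a simple name-based heuristic. The following is only a rough sketch, not anything PyPI is known to use: the token list, the `min_hits` threshold, and the function name `looks_like_movie_spam` are all assumptions made up for illustration.

```python
import re

# Assumed keyword list, taken from the example name quoted in the summary.
SPAM_TOKENS = {"watch", "full", "online", "movie", "free", "hd"}


def looks_like_movie_spam(package_name: str, min_hits: int = 4) -> bool:
    """Flag names that start with 'watch' and contain several spam keywords.

    Hypothetical heuristic: min_hits is an arbitrary threshold.
    """
    # Split on the separators commonly found in package names.
    tokens = re.split(r"[-_.]+", package_name.lower())
    if not tokens or tokens[0] != "watch":
        return False
    hits = SPAM_TOKENS.intersection(tokens)
    return len(hits) >= min_hits
```

A check like this would only catch this one campaign; spammers can change the naming scheme as soon as it is filtered, which is the whack-a-mole problem the summary describes.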

This discussion has been archived. No new comments can be posted.

  • The web page for these bogus packages contain spam keywords and links to movie streaming sites, albeit of questionable legitimacy and legality...

    Oh legality has always been questionable.

    • What was that FP supposed to mean? Care to clarify?

      Just came across some recommendations to use PyPI for a minor project, but I hadn't been able to figure out whether it was worth the effort. Therefore, I feel like I need to start with a request for clarification. Yeah, I've done a number of noddies in Python, but can't decide if I should get more serious with the language. As regards PyPI in particular, I resent any name that is so confusing because of case mixing. (I had to use the search trick to figure

    • Expect more major language package repositories to come under attack.

      #1 reason to use as few open source packages as possible in any large software system.

  • Could this be addressed with an authenticated, moderated, classified directory of "recommended packages" that are known to be maintained well?
    • Only if they are pre-packaged. "pip install" tends to pull in leading edge versions of other packages from pypi.org, and resolving their dependencies can be a nightmare.

    • PyPI isn't the only game in town. There are other distributions, such as Anaconda, that use a different package repository.

    • The Open Source Community is going to start getting more and more attacks from malware groups once they realize that 95% of developers don't look past 'pip install X' before deploying code, and that 'pip install X' pulls 50 other nested packages into their open source projects, all of which have reviewers who are donating free time to a side project.

      Programming languages have 3 key elements that they need to build on: easy syntax, a strong standard library, and a strong community library. Python without trusted commun

  • by presidenteloco ( 659168 ) on Sunday May 23, 2021 @12:00AM (#61412028)
    Is there an open identity and reputation service good enough to identify people who would publish?

    If so, it's probably time we start using that kind of thing, even in the open-source world. Unfortunately we live in a world with a statistically significant number of complete assholes, and some of them have learned how to code.

    If there isn't an open and transparent and useful such identity service yet, maybe working on that should be a top priority.

    Real identity, and rating (of the identity and/or the submitted packages) would go a long way to preventing this kind of problem.
    • It seems like PGP would work for this. If you have a private encryption key to authenticate a public one, that's really all you need to verify your identity, because you can attach proof to your official public communications. It's easy to show you, as a real person, have control of it, so no need for the hassle of verifying who you are besides your public persona. The thing we'd need is an official repository (or a small set of them), instead of the various scattered databases maintained by individual o

      • PGP has never worked well for large projects. It's proven useful for tracing an individually published tarball or software package from a particular vendor, but for large repositories it's proven too troublesome to insist on signature chains. It's theoretically possible, but has never been easy, to pre-bundle sets of Python modules into individual packages.

        • I probably should have just said "public-private encryption technology" rather than a specific implementation like PGP, because that's just a specific implementation. Keep in mind that certificates for the web work by using this same underlying technology, and naturally that scales *very* well. What's needed is a central repository for authentication, which I suspect is where PGP was lacking, being too distributed with too many scattered key databases.

          • rather than a specific implementation like PGP, because that's just a specific implementation.

            Don't mind me. I work for the Department of Redundancy Dept.

          • The same issue applies to any non-commercial-signature-based technology. The spammers can generate fake accounts with individual keys _very_ quickly. There are more locked-in signature technologies, such as Trusted Computing, but Trusted Computing is not about trust. It's about DRM, and has been used only very reluctantly in the free software world.

            • by Entrope ( 68843 )

              That's why something like the PGP Web of Trust could be helpful. Require a third party to attest who is the owner of a given private key, based on in-person claim of the key and confirmation of legal identity. (Yes, this is hard to bootstrap in the middle of this pandemic.) X.509 certs have centralized attestation chains. Or create a novel scheme.

              • Unfortunately, without money changing hands or a centralized authority, it doesn't scale well. PGP and GPG tried this, and it's simply not used. Very, very few software packaging tools establish more than one GPG key published on their corporate or organization's website, and _no one_ checks the signatures. I'm also afraid that even if they tried, they'd run into precisely what SSL ran into: a few poor quality central authorities will poison the web of trust by carelessness or malice.

                The technology is worka

        • Python, however, traditionally valued well-maintained, mature packages over the "there's a package for everything, also 3/4 of them are abandoned" approach of the JS world.

          Part of this was Python's underdog status for most of its history (remember folks, Python's a pretty old language, in the scheme of things), and I had been worried that with its recent ascent to popularity over the past few years it'd attract the daft hipsters that wrecked JS's ecosystem who'd make a mess of the place and then leave chasing

      • by tepples ( 727027 )

        If you have a private encryption key to authenticate a public one, that's really all you need to verify your identity, because you can attach proof to your official public communications. It's easy to show you, as a real person, have control of it
        [...]
        I imagine some system of a reputation "web of trust" might work in some manner

        Assuming this cryptographic web of trust is anything like that implemented by PGP:

        Anyone can generate one or a thousand private keys, register each key with the repository, and publish spam through each key. If you're planning on deterring this by associating each key with a natural person, that raises a couple practical questions. Would showing proof of identity as a real person be done with the cooperation of government, with the cooperation of private-sector commercial certificate authorities, or with ke

        • by Entrope ( 68843 )

          Governments and private certificate authorities are functionally similar; you could use either or both with fundamentally the same mechanisms. The real question is centralized or decentralized trust for attestation of identity.

          For a decentralized mechanism, (A) can be mitigated by combining conferences or business meetings with key-signing parties. Not everyone needs to have direct intercontinental links to be strongly and robustly connected across continents. (B) is part of why PGP's WoT has an integrat

          • Governments and private certificate authorities are functionally similar; you could use either or both with fundamentally the same mechanisms.

            Other than that private certificate authorities are likely to price the service such that only an established business, not a hobbyist or a lone freelancer, can afford the cost of obtaining and keeping a valid certificate. We've already seen this with Windows requiring a commercial extended validation (EV) code signing certificate at hundreds of dollars per year for things like instant SmartScreen reputation or ability to run as a kernel extension (KMCS). A government, by contrast, has an incentive to sign

    • by lkcl ( 517947 ) <lkcl@lkcl.net> on Sunday May 23, 2021 @04:19AM (#61412254) Homepage

      Real identity, and rating (of the identity and/or the submitted packages) would go a long way to preventing this kind of problem.

      and it would also be run by e.g. google or facebook or microsoft or apple or yahoo, and it would be entirely centralised and under their complete control. how well do you think that would be received in the *Libre and Open* world?

      Is there an open identity and reputation service good enough to identify people who would publish?

      as others have mentioned, GPG has been used for decades. however GPG on its own is not enough: you also need a "Webring of Trust", and a complete signing system that tracks everything.

      there does exist a pre-established "gold standard" for this (i spent 3 weeks analysing all package distribution systems): Debian's package management was the *only one* that satisfied a whopping *seventeen* requirements.

      the nice thing about this solution - which has been in operation, proven and refined for over 20 years - is that it's entirely Libre-licensed and available to be extracted from the Debian Project's source code, and applied to other projects. this includes automated package building, archive creation, mirroring systems, package caching systems - everything that's needed is right there.

      OpenEmbedded (aka Yocto) extracted purely the "deb archive" system, over 15 years ago, but did not also create an archive keyring package or include the archive creation system etc. because they're a different type of project.

      given that this "gold standard" package management has existed for 20 years, harsh as it is to say this, node and pypi basically get what they deserve.

      • by mwa ( 26272 )

        I haven't friended anyone on here in a decade but you just resurrected my "I told you so" from the first days of pypi.

        It's like remembering when obvious things were obvious.

      • Would you mind sharing the requirements/report? I really would like to have some arguments, but don't have the three weeks to do such an analysis.

      • Please post links to the requirements and to the results of the study. Thank you. I recently had a discussion with colleagues on python pip install without knowing what you get... I'd be interested to know how various distributions line up.
  • by Halo1 ( 136547 ) on Sunday May 23, 2021 @01:03AM (#61412082)

    When our mediawiki was being flooded with fake "microsoft support phone number" scams, the only thing that worked in the end was
    1) write a custom captcha. It's a wiki for a Pascal compiler, so it generates a template-based semi-random program that contains trivial syntax errors (so it can't be automatically compiled), and you have to write what the program would print if you compiled and ran it after fixing the errors. ReCaptcha and the like were completely useless (and way more annoying for legitimate users). This was to hamper automated submissions.
    2) lots of regexes to match 1-8xx phone numbers in all possible fields (article content, user name, upload comments, ...). This was to prevent the manual submissions.

    I combined point 2 with reporting 403 errors, and having fail2ban [fail2ban.org] ban, for a week, any IP address that triggered more than a couple of those within a few hours. It was a lot of work, but it did stop them in the end (I guess the effort was simply not worth it any more at some point).
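
The regex filtering in point 2 might look something like the sketch below. This is not the actual filter from that wiki: the pattern, the list of toll-free prefixes, and the function name `contains_toll_free_number` are assumptions chosen for illustration, and the real regexes were presumably broader.

```python
import re

# Assumed pattern: a leading 1, a toll-free area code (800, 833, 844, ...),
# then seven digits, with optional spaces, dots, dashes or parentheses.
TOLL_FREE = re.compile(
    r"\b1[\s.\-]*\(?8(?:00|33|44|55|66|77|88)\)?[\s.\-]*\d{3}[\s.\-]*\d{4}\b"
)


def contains_toll_free_number(*fields):
    """Return True if any submitted field contains a 1-8xx style number.

    Fields would be things like article content, user name and
    upload comment; empty or missing fields are skipped.
    """
    return any(TOLL_FREE.search(field) for field in fields if field)
```

A submission that matches would then be rejected with a 403, leaving it to a tool such as fail2ban to count those responses per address and apply the ban.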

  • ... and make the detection of these packages a tad more challenging.

    Simply have these "developers" answer a question when checking things in like, "Is this spam?" or require them to tag the packages as "spam". Problem solved ... :-)

  • I expect botnets to appear shortly which will use your phone number to silently receive and relay activation codes.
  • If it's anything like other similar links I have seen, the links probably go to a website that claims that you can stream all these movies if you just enter your credit card details (which of course the threat actors will then use for who knows what).

  • Which should be long enough to validate the submissions.

  • I don't recall CPAN having this issue even back when it was the most popular package repo across all languages. It still works as well as ever of course, although it is easier now with Perl being much less popular.
    Anyway good package repo to take ideas from...

  • This just feels like comedic irony.
  • by dexterace ( 68362 ) on Sunday May 23, 2021 @09:47AM (#61412818)

    It's interesting that all these identification schemes pop up... The Java ecosystem solved this long ago: http://maven.apache.org/reposi... [apache.org]

    In short, along with the usual digital signatures and such, you must also provide proof of owning the domain name that your library's package name relates to: this makes it really hard to auto-generate spam packages, and even more focused attacks are much, much harder.
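
The name-to-domain relationship described above can be illustrated with a small check. This is a sketch of the idea only, assuming the reverse-domain naming convention (e.g. a group ID of `com.example.mylib` for the domain `example.com`) and assuming domain ownership has already been proven by some other step; the function name `group_id_matches_domain` is hypothetical and is not part of any Maven tooling.

```python
def group_id_matches_domain(group_id: str, verified_domain: str) -> bool:
    """Check that a reverse-domain group ID falls under a verified domain.

    Hypothetical helper: 'verified_domain' is assumed to have been
    proven as owned by the publisher in a separate step.
    """
    # "example.com" -> ["com", "example"]
    domain_labels = verified_domain.lower().strip(".").split(".")
    prefix = list(reversed(domain_labels))
    group_labels = group_id.lower().split(".")
    # Compare whole labels so "com.examplefoo" does not pass for "example.com".
    return group_labels[:len(prefix)] == prefix
```

Because each namespace is tied to a domain someone had to register and prove control of, mass-producing throwaway publisher identities becomes costlier than creating free accounts.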
