Pytosquatting

Work in progress: Fixing typosquatting+namesquatting threats in Python Package Index (PyPI).

Subsequent updates

Last updated: 20210318

20210301: Some new kid on the block, RemindSupplyChainRisks, squats 3591 packages: [github.com/pypa/pypi-support]

20210209: A popular article surfaces about supply chain attacks: [Alex Birsan, medium.com: Dependency Confusion...]

20200519: An article about how --extra-index-url is often misunderstood: [Matt Kubilus, medium.com: Pip --extra-index-url, Considered Dangerous]

20191201: In-the-wild malware found on PyPI stealing PGP and SSH keys through the namesquatted python3-dateutil library [ZDNet]

20190717: The company ReversingLabs publishes a blog post, "SupPy Chain Malware", reporting 3 packages on PyPI that install backdoors, one of which has 82 downloads per month on average.

20181209: A GitHub repository by developer "roscoe" discloses 18 typosquatted packages, some uploading user credentials, others installing malware.

20181013: Independent security researcher Bertus announces a code analysis revealing 12 malicious packages on PyPI.

20180508: A post on Reddit explains how the package ssh-decorator sends a user's private SSH key to a server on the internet. The GitHub and PyPI projects have since been taken down. Evidence is inconclusive as to whether the original author of the package had their PyPI credentials hijacked or was complicit.

Epilogue

We have closed the Pytosquatting initiative for now, because the Python Security Response Team (PSRT) has announced that they will take action (see the Timeline below).

Timeline

In June 2016, "Typosquatting programming language package managers" stated that urllib2 had ~4,000 downloads in 2 weeks. In June 2017, however, we found the same package name vacant, so we (being the good guys) squatted it for several months, up until this disclosure. We take these findings seriously.

20170519: Steve Stagg writes about how he registered stdlib names and sent emails, and notes: »I raised an issue on the official pypi github issue tracker in January. This also got no reply.«

20170628: PyPI Warehouse issue #2151 is opened. Title is "Block package names that conflict with core libraries", but no names were blocked.

20170913: We squatted all available names of stdlib packages (128) - scroll down to see statistics from pingbacks.

20170914: A number of in-the-wild malicious packages on PyPI were disclosed by the Slovak National Security Authority.

20170917: PyPI's main developer Donald Stufft creates PR#2396 for database-backed blacklisting of package names. It's unclear how they want to apply the blacklist, but it would mean a more efficient process for administrators. Most of the stdlib names that we squatted are blacklisted.

20170922: The Python Security Response Team (PSRT) takes action by announcing a detailed plan to mitigate future attacks. The plan is included in an overall boost of PyPI, which is receiving a $170k grant from the Mozilla Foundation.

Mitigation

Here are a couple of proposals that we originally posted, and which have since been expanded nicely in PSRT's security announcement.

  • Strategy #1: We are namesquatting a bunch of stuff on PyPI (all available Python 2 and Python 3 standard library names). So whether or not you use a security-hardened pip installer, we have managed to mitigate the bulk of the immediate problem.
  • Strategy #2: Use a pip installer that does safety lookups and fails loudly if the attempted package name does not validate. This should be implemented in your automated deployments and test builds!
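To give an idea of what "fails loudly" in Strategy #2 could mean for an automated build, here is a minimal sketch of a pre-install check. It is not an existing pip feature. The names validate_requirements, UnsafePackageName and EXTRA_BLOCKED are made up for illustration, and the sketch assumes that sys.stdlib_module_names (Python 3.10 and later) is an acceptable source of stdlib names.

```python
import sys

# Python 3 stdlib names; sys.stdlib_module_names only exists on Python 3.10+,
# so older interpreters fall back to an empty set here.
PY3_STDLIB = frozenset(getattr(sys, "stdlib_module_names", ()))

# Assumption: an illustrative, hand-kept list of Python 2 stdlib names.
EXTRA_BLOCKED = frozenset({"urllib2"})

BLOCKED = frozenset(n.lower() for n in PY3_STDLIB | EXTRA_BLOCKED)


class UnsafePackageName(Exception):
    """Raised when a requested package name fails validation."""


def validate_requirements(names):
    """Return the names unchanged, or raise if any of them is a stdlib name."""
    clashes = sorted(n for n in names if n.strip().lower() in BLOCKED)
    if clashes:
        raise UnsafePackageName(
            "refusing to install stdlib names: " + ", ".join(clashes)
        )
    return list(names)
```

A deployment script could call validate_requirements() on its requirement names before handing them to pip, so a build with a bad name stops with an exception instead of silently installing a squatted package.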

Aftermath

We had a pingback in the setup.py of the packages involved in Strategy #1, meaning that for a limited period we gathered statistics on the extent of the issue. The callback didn't collect any data from user systems, just an IP address, so that we could count each unique system that attempted to install a non-existent package that could have been exploited.
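The counting itself is simple. The sketch below shows one way unique systems per package could be tallied from pingback records. The record layout (package name, IP address) and the sample values are assumptions for illustration, not our actual logs; the IPs are from documentation-only ranges.

```python
from collections import defaultdict


def count_unique_systems(records):
    """Return {package: number of distinct IPs} from (package, ip) pairs."""
    seen = defaultdict(set)
    for package, ip in records:
        seen[package].add(ip)
    return {package: len(ips) for package, ips in seen.items()}


# Hypothetical sample records, not real data.
sample = [
    ("urllib2", "192.0.2.1"),
    ("urllib2", "192.0.2.1"),     # repeat install from the same system
    ("urllib2", "198.51.100.7"),
    ("timeit", "203.0.113.5"),
]
print(count_unique_systems(sample))
```

Counting by IP is only an approximation: several machines behind one NAT count once, and one machine with a changing address counts more than once.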

We are calling for analysis of the current PyPI resources to find in-the-wild exploits of typosquatting, as the Slovak National Security Authority has done. We hope there are none, but the problem has been around for a long time, and our primer didn't get reactions from the PyPI admins.

Mockup of Strategy #2

Once done, we hope to achieve a better pip installer that:

  • Verifies that you don't install a package with the name of a stdlib
  • Asks a webservice or local database if you are installing a typo of a popular package

It could look like this...

pip install pipsec  # Install security-hardening plugin for pip
pip install virtualenv-wrapper  # See that it fails
pip install virtualenvwrapper  # This is correct
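
The typo lookup behind such a plugin could be sketched with Python's difflib. This is only an illustration: pipsec is a mockup, and the POPULAR list, the 0.85 cutoff and the function names below are assumptions, standing in for the webservice or local database mentioned above.

```python
import difflib

# Hypothetical, tiny stand-in for a list of popular package names.
POPULAR = ["virtualenvwrapper", "virtualenv", "requests", "setuptools"]


def _normalize(name):
    # Simplified normalization: lowercase and treat "-", "_" and "." alike.
    return name.lower().replace("_", "-").replace(".", "-")


def suggest_intended(name, popular=POPULAR, cutoff=0.85):
    """Return the popular package that `name` probably misspells, or None."""
    normalized = {_normalize(p): p for p in popular}
    key = _normalize(name)
    if key in normalized:
        return None  # exact match for a known package, nothing to warn about
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None
```

With this list, suggest_intended("virtualenv-wrapper") points at virtualenvwrapper, while the correct name returns None. A real installer would need a much larger list and a tuned cutoff to keep false alarms down.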
          

The closure of pip#4527 seems to hint that attempts to add security on the client side aren't popular. The arguments are weak, though, so there's no real reason not to do something like the above.

Media

Ars Technica: Devs unknowingly use “malicious” modules snuck into official Python repository

Golem.de: Bösartige Python-Pakete entdeckt (DE)

Hacker News: Malicious software libraries found in PyPI posing as well known libraries

Ack

Send comments or complaints to Benjamin Bach and Hanno Böck.

Check out the code for this website at https://github.com/benjaoming/pytosquatting.


Appendix

Stdlib installations

Blocked stdlib installations from 20170913 to 20170916: 20188

On 20170916, PyPI removed our Top 20 of squatted packages, so our statistics won't match up anymore. They didn't remove the other 108 squatted packages.

Rank  Package      Average per day
1     timeit       10.1
2     pkgutil      2.9
3     ntpath       2.2
4     urllib2      1.5
5     subprocess   0.9
6     argparser    0.8
7     this         0.8
8     collections  0.8
9     setuptols    0.7
10    smtplib      0.6
11    shutil       0.6
12    venv         0.6
13    curses       0.6
14    idlelib      0.6
15    glob         0.6
16    docutil      0.5
17    base64       0.4
18    concurrent   0.4
19    threading    0.4
20    csselect     0.4