One Ring to Rule Them All

Munawar Hafiz, CEO of OpenRefactory, writes about how a simple mistake can result in critical supply chain attacks. Edited by Charlie Bedard.

On June 28, JFrog’s Brian Moussalli reported a leaked GitHub Personal Access Token (PAT) belonging to Ee Durbin (@ewdurbin), an administrator of PyPI. The incident report states: “This token was immediately revoked, and a review of my GitHub account and activity was performed. No indicators of malicious activity were found.”

The issue arose during development of cabotage-app. While working on the build code locally, Ee Durbin ran into GitHub API rate limits. To get around them, Ee added a personal access token directly to the local source files instead of configuring a localhost GitHub App. Although these changes were meant to stay local, the token ended up in compiled bytecode (.pyc) files, which were inadvertently pushed to hub.docker.com.
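
The leak mechanism is easy to reproduce. A string literal in a Python module becomes a constant of the compiled code object, and that constant is written verbatim into the serialized bytecode that fills a .pyc file, so editing or deleting the .py source later does not remove it from bytecode that has already been built or shipped. The sketch below uses only the standard library; the token value is a made-up placeholder, and GITHUB_TOKEN is an assumed environment variable name used for illustration.

```python
import marshal

# A made-up placeholder that mimics the shape of a classic GitHub PAT.
FAKE_TOKEN = "ghp_" + "A" * 36

# Source that hardcodes the token, as happened in the incident.
hardcoded_src = f'TOKEN = "{FAKE_TOKEN}"\n'

# Source that reads the token at runtime instead.
# GITHUB_TOKEN is an assumed variable name, used only for illustration.
env_src = 'import os\nTOKEN = os.environ.get("GITHUB_TOKEN")\n'


def bytecode_bytes(source: str, filename: str) -> bytes:
    """Compile source and serialize the code object as a .pyc body is serialized."""
    code = compile(source, filename, "exec")
    return marshal.dumps(code)


def leaks_token(source: str, token: str) -> bool:
    """Return True if the token string appears in the serialized bytecode."""
    return token.encode("ascii") in bytecode_bytes(source, "<example>")


print(leaks_token(hardcoded_src, FAKE_TOKEN))  # True
print(leaks_token(env_src, FAKE_TOKEN))        # False
```

Importing a module or running `python -m py_compile` writes this same serialized data into `__pycache__`, which is why keeping secrets out of source and keeping `*.pyc` files out of published images both matter.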

Had an attacker obtained Ee's GitHub PAT, its extensive permissions would have allowed them to carry out a range of malicious activities across PyPI's repositories and related projects. It would have been like possessing the One Ring. Rather than wielding a single power, the token granted dominion over every repository and project it could reach, much as the One Ring controlled all the other rings of power, allowing the bearer to reshape the entire codebase at will.

What Could Have Happened?

Here are specific supply chain attacks that could have been conducted:

Malicious Code Injection:
- Insertion of Malicious Code: The attacker could inject malicious code into the source repositories of packages published on PyPI. This could include backdoors, data exfiltration mechanisms, or other harmful code that would execute when end users install or run the compromised packages.
- Modification of Existing Packages: They could alter existing packages to include malware, which would then be distributed to users downloading or updating those packages.
Publishing Fake or Compromised Packages:
- Creating Malicious Packages: The attacker could publish new packages that seem legitimate but contain malicious code, tricking users into downloading and installing them.
- Hijacking Package Names: By taking control of abandoned or poorly maintained packages, they could publish new, malicious versions under the same package name.
Compromising Build and Deployment Pipelines:
- Infiltrating CI/CD Pipelines: The attacker could modify continuous integration and continuous deployment (CI/CD) pipelines to introduce malicious code during the build process, ensuring that compromised code is distributed in official releases.
- Subverting Security Checks: They could disable or alter security checks and tests to ensure that their malicious modifications go unnoticed.
Data Exfiltration and Espionage:
- Impersonation: The token could be used to perform actions on GitHub while appearing to be its owner (Ee in this case), including making commits, opening issues, or commenting on discussions. An attacker could also access any private repositories Ee can access on GitHub, potentially viewing, modifying, or even deleting code and files.
- Industrial Espionage: They could gather intelligence on development practices, upcoming features, and strategic plans, which could be valuable to competitors or malicious entities.
Supply Chain Attacks on Dependencies:
- Targeting Dependencies: By compromising packages that are dependencies of other widely used projects, the attacker could create a cascading effect, compromising a broader range of projects and users.
- Exploiting Trust Relationships: Leveraging the trust that developers place in PyPI and its packages, the attacker could distribute compromised packages to a wide audience, potentially affecting thousands of projects that depend on these packages.

What Should You Do?

To protect yourself from a similar debacle in the future, here is a best practice to follow:

1. Expire Tokens: Use short-lived tokens that expire automatically after a set period, reducing the risk of long-term exposure. In Ee Durbin's case, the token had been sitting in the .pyc file since March 2023! Perhaps it was set to never expire. It was used to bypass GitHub API rate limits while building a single image (cabotage/cabotage-app:v3.0.0b35), which was a short-lived task. A dedicated token with a limited lifetime could have been used instead.
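
A token's expiration is chosen when it is created, but a script can also refuse to run with a token that violates a short-lifetime policy. The sketch below assumes the token and its expiry are supplied through two environment variables; GITHUB_TOKEN and GITHUB_TOKEN_EXPIRES_AT are hypothetical names, not a GitHub convention, and the 7-day ceiling is an illustrative choice. Reading the token from the environment also keeps it out of the source and therefore out of any .pyc file.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Illustrative policy ceiling, matching the 7-day option GitHub offers.
MAX_REMAINING_LIFETIME = timedelta(days=7)


class TokenPolicyError(RuntimeError):
    """Raised when a token is missing, non-expiring, expired, or too long-lived."""


def load_short_lived_token(
    env: Mapping[str, str] = os.environ,
    now: Optional[datetime] = None,
) -> str:
    """Return the token from env only if it satisfies the lifetime policy.

    GITHUB_TOKEN and GITHUB_TOKEN_EXPIRES_AT are hypothetical variable names;
    the expiry is assumed to be an ISO 8601 timestamp with a UTC offset.
    """
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise TokenPolicyError("no token configured")

    expires_raw = env.get("GITHUB_TOKEN_EXPIRES_AT")
    if not expires_raw:
        # No recorded expiry is treated the same as a token that never expires.
        raise TokenPolicyError("token has no expiry; refusing to use it")

    expires_at = datetime.fromisoformat(expires_raw)
    if expires_at.tzinfo is None:
        raise TokenPolicyError("expiry must include a UTC offset")

    now = now or datetime.now(timezone.utc)
    remaining = expires_at - now
    if remaining <= timedelta(0):
        raise TokenPolicyError("token has expired")
    if remaining > MAX_REMAINING_LIFETIME:
        raise TokenPolicyError("token is valid for longer than the policy allows")
    return token


if __name__ == "__main__":
    # Fixed clock and a placeholder token so the example is deterministic.
    fixed_now = datetime(2024, 7, 1, tzinfo=timezone.utc)
    example_env = {
        "GITHUB_TOKEN": "placeholder-token",
        "GITHUB_TOKEN_EXPIRES_AT": "2024-07-05T00:00:00+00:00",
    }
    print(load_short_lived_token(example_env, fixed_now))  # placeholder-token
```

The guard treats a missing expiry as "never expires" and rejects it, which is exactly the case that let a token stay usable for over a year in this incident.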

What Should GitHub and Docker Hub Do?

GitHub and Docker Hub should adopt the following policies:

1. Enforce Token Lifetime Policy: GitHub could enforce short token lifetimes as a policy. Currently, the default expiration for a personal access token is 30 days. Is that too long? A 7-day option is available but is not the default. Perhaps the default duration should be reviewed.
2. Allow Easy Renewal: Provide simple mechanisms for users to renew tokens without requiring a full reset. This would reduce the temptation to create tokens with long lifetimes.
3. Enforce Best Practices for .gitignore and .dockerignore Files: By default, GitHub and Docker Hub should require that certain kinds of build artifacts (such as .pyc files) be listed in .gitignore and .dockerignore files so that they are not pushed. Developers who deliberately want to bypass this policy should have to opt out explicitly.
4. Token Scanning and Revocation: Implement automatic scanning of public repositories and commits for leaked tokens. When a token is detected, notify the user immediately and automatically revoke the token to prevent misuse.
5. Fine-Grained Permissions: Allow users to create tokens with specific scopes and permissions, limiting the potential impact of a token leak. Tokens should default to the least privilege necessary.
6. Revocation and Rotation: Make it simple for users to revoke and rotate tokens, encouraging regular token management.
7. Activity Logs: Maintain detailed logs of all token usage, including creation, access, and revocation events.
8. Anomaly Detection: Implement anomaly detection to identify and alert users of suspicious token activity, such as unusual access patterns or usage from unexpected locations.
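
To have caught this particular leak, token scanning has to look inside binary artifacts such as .pyc files, not only text commits. A minimal sketch of that idea follows. The regular expressions approximate GitHub's token prefixes (`ghp_` and related `ghX_` tokens, and `github_pat_` for fine-grained tokens); the exact lengths are an assumption, and GitHub's own documentation is authoritative. The scan works on raw bytes so binary files are covered, and it redacts matches so the report does not leak the token again.

```python
import py_compile
import re
import tempfile
from pathlib import Path
from typing import Iterator, Tuple

# Approximate GitHub token shapes (an assumption; GitHub's documentation is
# authoritative). Byte patterns let the scan cover binary .pyc files too.
TOKEN_PATTERNS = [
    re.compile(rb"gh[pousr]_[A-Za-z0-9]{36}"),    # classic PATs and other ghX_ tokens
    re.compile(rb"github_pat_[A-Za-z0-9_]{82}"),  # fine-grained PATs
]


def redact(secret: bytes) -> str:
    """Keep only the first and last four characters so reports do not re-leak the token."""
    text = secret.decode("ascii")
    return f"{text[:4]}...{text[-4:]}"


def scan_tree(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (file, redacted token) for every token-like byte string under root."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        for pattern in TOKEN_PATTERNS:
            for match in pattern.finditer(data):
                yield path, redact(match.group(0))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "settings.py"
        # Made-up placeholder in the shape of a classic PAT, not a real credential.
        src.write_text('TOKEN = "ghp_' + "A" * 36 + '"\n')
        py_compile.compile(str(src), cfile=str(root / "settings.pyc"))
        src.unlink()  # the source is gone, but the compiled bytecode remains
        for path, hit in scan_tree(root):
            print(path.name, hit)  # settings.pyc ghp_...AAAA
```

A provider-side scanner would run on pushed commits and image layers and pair detection with notification and revocation; this sketch only detects and redacts.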