Munawar Hafiz, CEO of OpenRefactory, writes about how a simple mistake can result in critical supply chain attacks. Edited by Charlie Bedard.
On June 28, JFrog’s Brian Moussalli reported a leaked GitHub Personal Access Token (PAT) belonging to Ee Durbin (@ewdurbin), the Administrator of PyPI. Quoting from the incident report, “This token was immediately revoked, and a review of my GitHub account and activity was performed. No indicators of malicious activity were found.”
The issue arose from the development process of the cabotage-app. While working on the build code locally, Ee Durbin ran into GitHub API rate limits. To get around these limits, Ee put a personal access token in local files instead of configuring a localhost GitHub App. Although the intention was to keep these changes local, the access token ended up in .pyc files (compiled Python bytecode), which were inadvertently pushed to hub.docker.com.
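To see why compiled bytecode is a leak vector, here is a minimal Python sketch that compiles a throwaway module containing a hardcoded string and checks whether the string is still present in the .pyc file once the source is gone. The file name build_helper.py, the variable name GITHUB_TOKEN, and the placeholder token are hypothetical and are not taken from the cabotage-app code.

```python
import py_compile
import tempfile
from pathlib import Path

# Placeholder with a token-like shape; this is NOT a real credential.
FAKE_TOKEN = "ghp_" + "x" * 36


def token_survives_compilation(token: str) -> bool:
    """Compile a module that hardcodes `token`, delete the source,
    and report whether the token is still readable in the .pyc bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "build_helper.py"  # hypothetical file name
        source.write_text(f'GITHUB_TOKEN = "{token}"\n')

        compiled = Path(tmp) / "build_helper.pyc"
        py_compile.compile(str(source), cfile=str(compiled), doraise=True)

        # Removing or reverting the source does not clean the bytecode.
        source.unlink()
        return token.encode("ascii") in compiled.read_bytes()


if __name__ == "__main__":
    print(token_survives_compilation(FAKE_TOKEN))
```

On CPython the check is expected to report that the string survives, because string constants are stored in the bytecode file as plain bytes. Cleaning up the source file alone is therefore not enough; stale __pycache__ directories have to be removed or excluded as well.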
If an attacker had gained access to Ee's GitHub Personal Access Token (PAT), particularly given its extensive permissions, they could have carried out several malicious activities within the PyPI repository and related projects. It was like possessing the One Ring. Instead of just wielding a single power, the token granted dominion over all repositories and projects, much like how the One Ring controlled all other rings of power, allowing the bearer to reshape the entire codebase to their will.
What Could Have Happened?
Here are the specific supply chain attacks that could have been conducted:
- Malicious Code Injection:
- Insertion of Malicious Code: The attacker could inject malicious code into the source code of packages hosted on PyPI. This could include backdoors, data exfiltration mechanisms, or other harmful code that would execute when the compromised packages are installed or run by end-users.
- Modification of Existing Packages: They could alter existing packages to include malware, which would then be distributed to users downloading or updating those packages.
- Publishing Fake or Compromised Packages:
- Creating Malicious Packages: The attacker could publish new packages that seem legitimate but contain malicious code, tricking users into downloading and installing them.
- Hijacking Package Names: By taking control of abandoned or poorly maintained packages, they could publish new, malicious versions under the same package name.
- Compromising Build and Deployment Pipelines:
- Infiltrating CI/CD Pipelines: The attacker could modify continuous integration and continuous deployment (CI/CD) pipelines to introduce malicious code during the build process, ensuring that compromised code is distributed in official releases.
- Subverting Security Checks: They could disable or alter security checks and tests to ensure that their malicious modifications go unnoticed.
- Data Exfiltration and Espionage:
- Impersonation: The token could be used to perform actions on GitHub while appearing to be another person (Ee in this case), including making commits, opening issues, or commenting on discussions. An attacker could access any private repositories Ee has on GitHub, potentially allowing the attacker to view, modify, or even delete code and files.
- Industrial Espionage: They could gather intelligence on development practices, upcoming features, and strategic plans, which could be valuable to competitors or malicious entities.
- Supply Chain Attacks on Dependencies:
- Targeting Dependencies: By compromising packages that are dependencies of other widely used projects, the attacker could create a cascading effect, compromising a broader range of projects and users.
- Exploiting Trust Relationships: Leveraging the trust that developers place in PyPI and its packages, the attacker could distribute compromised packages to a wide audience, potentially affecting thousands of projects that depend on these packages.
What Should You Do?
To protect yourself from such a debacle in the future, here are some best practices you can follow:
- Expire Tokens: Use short-lived tokens that automatically expire after a set period, reducing the risk of long-term exposure. The token in Ee Durbin's case had been sitting in a .pyc file since March of 2023! Perhaps the token was set to never expire. The token was used to bypass the GitHub API rate limits for a project (cabotage/cabotage-app:v3.0.0b35). That was a short-lived task. A specific token with a limited lifetime could have been used for that task.
- Use Environment Variables: Store secrets in environment variables rather than hardcoding them in your codebase. This way, the secrets are kept outside of the code and can be managed more securely.
- Scope Tokens: Limit the scope of tokens to only the permissions required for their specific use case. Apply the principle of least privilege and avoid granting more permissions than necessary.
- Use Secret Management Tools: Use secret management tools such as HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager. These tools provide secure storage, access control, and auditing capabilities for sensitive information.
- Update and Monitor .gitignore and .dockerignore: Ensure that sensitive files and directories (e.g., .env, config/secrets.yml, __pycache__, *.pyc) are listed in .gitignore and .dockerignore to prevent them from being accidentally committed to version control or copied into Docker images.
- Use Automated Scanning: Use automated tools to scan your codebase for potential secret leaks. Tools like GitHub Secret Scanning, TruffleHog, or Gitleaks can help identify secrets in your repository.
- Follow a Rotation Schedule: Regularly rotate your tokens and secrets to minimize the risk of long-term exposure. Implement automated rotation mechanisms where possible.
- Avoid Using PATs: OpenID Connect (OIDC) is a modern approach to managing authentication and access that is more secure and scalable than Personal Access Tokens (PATs).
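As an illustration of the environment variable practice above, here is a minimal Python sketch that reads a token at runtime and fails loudly when it is missing, so no secret ever needs to be written into a source file. The variable name GITHUB_TOKEN, the function names, and the MissingTokenError class are assumptions made for this example, not part of any project mentioned here.

```python
import os
from typing import Mapping, Optional


class MissingTokenError(RuntimeError):
    """Raised when the expected environment variable is unset or empty."""


def load_github_token(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "GITHUB_TOKEN",  # hypothetical variable name
) -> str:
    """Return the token from the environment instead of from source code."""
    source = os.environ if env is None else env
    token = source.get(var_name, "").strip()
    if not token:
        raise MissingTokenError(
            f"Set {var_name} in the environment; do not hardcode it."
        )
    return token


def auth_headers(token: str) -> dict:
    """Build the Authorization header for a GitHub REST API request."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    try:
        headers = auth_headers(load_github_token())
        print("Token loaded; Authorization header prepared.")
    except MissingTokenError as exc:
        print(f"No token available: {exc}")
```

Because the token only exists in the process environment, it never becomes a string constant in a .py file and therefore never reaches the compiled .pyc output. Passing env explicitly is a design choice that keeps the function easy to test without touching the real environment.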
What Should GitHub/DockerHub Do?
The following policies should be enforced by GitHub/DockerHub:
- Enforce Token Lifetime Policy: GitHub could enforce this as a policy. Currently, the default lifetime for a personal access token is 30 days. Is that too long? There is a 7-day option available, but it is not used. Perhaps this duration can be reviewed.
- Allow Easy Renewal: Implement easy mechanisms for users to renew tokens without requiring a full reset. This would reduce a user’s urge to make a token with a long lifetime.
- Enforce Best Practices on .gitignore and .dockerignore Files: By default, GitHub and DockerHub should require that certain kinds of artifact files be listed in .gitignore and .dockerignore, and should block them from being pushed. A developer who deliberately wants to bypass this policy should have to opt out explicitly.
- Token Scanning and Revocation: Implement automatic scanning of public repositories and commits for leaked tokens. When a token is detected, notify the user immediately and automatically revoke the token to prevent misuse.
- Fine-Grained Permissions: Allow users to create tokens with specific scopes and permissions, limiting the potential impact of a token leak. Default to the least privilege necessary for tokens.
- Revocation and Rotation: Make it simple for users to revoke and rotate tokens, encouraging regular token management.
- Activity Logs: Maintain detailed logs of all token usage, including creation, access, and revocation events.
- Anomaly Detection: Implement anomaly detection to identify and alert users of suspicious token activity, such as unusual access patterns or usage from unexpected locations.
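The token scanning item in the list above can be sketched as a byte-level pattern match over every file in a directory tree, including binary artifacts such as .pyc files. This is a minimal illustration in Python, not how GitHub Secret Scanning, TruffleHog, or Gitleaks work internally. The function name is hypothetical, and the regular expression is an assumption that covers only the ghp_ and github_pat_ token prefixes with approximate lengths; real scanners use many more patterns and verify their findings.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed shapes: "ghp_" plus 36 alphanumerics (classic PAT) and
# "github_pat_" plus a long alphanumeric/underscore tail (fine-grained PAT).
TOKEN_PATTERN = re.compile(
    rb"ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,}"
)


def find_token_like_strings(root: Path) -> List[Tuple[Path, str]]:
    """Scan every file under `root` as raw bytes and return
    (path, redacted preview) pairs for token-like matches."""
    findings: List[Tuple[Path, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for match in TOKEN_PATTERN.finditer(data):
            # Keep only a short prefix so the report does not leak the secret.
            preview = match.group()[:8].decode("ascii") + "..."
            findings.append((path, preview))
    return findings
```

Calling find_token_like_strings(Path("build_context")), where build_context is a hypothetical directory about to be packaged into an image, returns an empty list when nothing token-like is found. Reading files as bytes rather than text is the important design choice here, since the leaked token in this incident lived in a binary file that a text-only scan could skip.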