ctfnote
  • /home/ret2basic.eth
  • Game Hacking
    • ✅C++
    • Ghidra
    • Cheat Engine
    • Proxy
    • DLL injection
    • Keygen
    • Aimbot
  • Web3 Security Research
    • 👑Web3 Security Research Trivia
    • ✅Solidity
      • ✅Mastering Ethereum
      • ✅Storage
      • ✅Memory
      • ✅Calldata
      • ✅ABI
    • ✅Foundry
      • ✅Introduction
      • ✅How to Write Basic Tests
      • ✅Set Soliditiy Compiler Version
      • ✅Remappings
      • ✅Auto Format Code
      • ✅Console Log
      • ✅Authentication
      • ✅Error
      • ✅Event
      • ✅Time
      • ✅Send ETH
      • ✅Signature
      • ✅Fork
      • ✅Mint 1 Million DAI on Mainnet Fork
      • ✅FFI
      • ✅Fuzz
      • ✅Invariant Testing - Part 1
      • Invariant Testing - Part 2
      • Invariant Testing - Part 3
      • Differential Test
    • ✅Secureum
      • ✅Epoch 0
        • ✅Slot 1: Ethereum 101
          • ✅Notes
          • ✅Ethereum Whitepaper
          • ✅Extra Study: What happens when you send 1 DAI
          • ✅Quiz
        • ✅Slot 2: Solidity 101
          • ✅Notes
          • ✅OpenZeppelin ERC20
          • ✅OpenZeppelin ERC721
          • ✅OpenZeppelin Ownable
          • ✅OpenZeppelin Pausable
          • ✅OpenZeppelin ReentrancyGuard
          • ✅Quiz
        • ✅Slot 3: Solidity 201
          • ✅Notes
          • ✅OpenZeppelin SafeERC20
          • ✅OpenZeppelin ERC-777
          • ✅OpenZeppelin ERC-1155
          • ✅OpenZeppelin ERC-3156
          • ✅OpenZeppelin - Proxy Upgrade Pattern
          • ✅Quiz
        • ✅Slot 4: Pitfalls and Best Practices 101
          • ✅Notes
          • ✅Intro to Security First Development
          • ✅Quiz
        • ✅Slot 5: Pitfalls and Best Practices 201
          • ✅Notes
          • So you want to use a price oracle
          • The Dangers of Surprising Code
          • ✅Quiz
        • ✅Slot 6: Auditing Techniques & Tools 101
          • ✅Notes
          • ✅Quiz
        • ✅Slot 7: Audit Findings 101
          • Notes
          • ✅Fei Protocol - ConsenSys
          • ✅Uniswap V3 - Trail of Bits
          • ✅Chainlink - Sigma Prime
          • ✅Opyn Gamma - OpenZeppelin
          • ✅Quiz
        • ✅Slot 8: Audit Findings 201
          • Notes
          • 1inch Liquidity - Consensus
          • Original Dollar - Trail of Bits
          • Synthetix EtherCollateral - Sigma Prime
          • Holdefi - OpenZeppelin
          • Quiz
      • ✅Epoch ∞
        • ✅RACE #4 - ERC20 Implementation
        • ✅RACE #5 - ERC1155 Implementation
        • ✅RACE #6 - ERC721 Application
        • ✅RACE #7 - Bored Ape
        • ✅RACE #8 - ERC721 Roles
        • ✅RACE #9 - Proxy
        • ✅RACE #10 - Test Cases
        • ✅RACE #11 - Staking
        • ✅RACE #12 - ERC20 Permit
        • ✅RACE #13 - ERC20 with Callback
        • ✅RACE #14 - Lending
        • ✅RACE #15 - DEX
        • ✅RACE #16 - Flash Loan
        • ✅RACE #17
    • DeFi
      • Glossary
        • TWAP vs. VWAP
        • Tranches
      • DeFi MOOC
        • Lecture 2: Introduction to Blockchain Technologies
        • Lecture 5: DEX
        • Lecture 6: Decentralized Lending
        • Lecture 10: Privacy on the Blockchain
        • Lecture 12: Practical Smart Contract Security
        • Lecture 13: DeFi Security
      • Uniswap V2
      • Compound V3
        • ✅Whitepaper
        • ✅Interacting with Compound
          • ✅Supply and Redeem
          • ✅Borrow and Repay
          • ✅Liquidation
          • ✅Long and Short
        • ✅Interest Model
        • CToken
      • Aave
      • Chainlink
        • ✅Getting Started
        • ✅Data Feeds
        • ✅VRF
      • Optimism
        • Bedrock
      • LayerZero
      • Opensea
        • Seaport
    • EVM
      • ✅Andreas Antonopoulos - The Ethereum Virtual Machine
      • ✅Program The Blockchain - Smart Contract Storage
      • ✅EVM Codes - EVM Playground for Opcodes
      • ✅Fvictorio - EVM Puzzles
      • ✅Daltyboy11 - More EVM Puzzles
      • ✅EVM Through Huff
      • Noxx - EVM Deep Dives
      • ✅Jordan McKinney - EVM Explained
      • Openzepplin - Deconstructing a Solidity Contract
      • Jeancvllr - EVM Assembly
      • Peter Robinson - Solidity to Bytecode, Memory & Storage
      • Marek Kirejczyk - Ethereum Under The Hood
      • ✅Official Solidity Docs
      • Dissecting EVM using go-ethereum Eth client implementation - deliriusz.eth
    • Vulnerabilities
      • Rounding Issues
        • Kyberswap
      • Bridges
      • Governance / Voting Escrows
      • Bizzare Bug Classes
        • TIME - ERC2771Context + Multicall calldata manipulation
    • Fancy Topics
      • Vulnerabilities SoK
        • ✅Demystifying Exploitable Bugs in Smart Contracts
        • Blockchain Hacking Techniques 2022 Top 10 - Todo
      • yAcademy
        • Proxies
          • yAcademy - Proxy Basics
          • yAcademy - Proxies Deep Dive
          • yAcademy - Security Guide to Proxy Vulns
        • defi-fork-bugs
      • Spearbit
        • ✅Community Workshop: Riley Holterhus
        • Economic Security with fmrmf
        • Numerical Analysis for DeFi Audits: A TWAMM Case Study by Kurt Barry
  • Red Teaming
    • ✅Enumeration
      • Service Enumeration
        • SMTP (Port 25)
        • Samba (Port 139, 445)
        • SNMP (Port 161,162,10161,10162)
        • rsync (Port 873)
        • NFS (Port 2049)
        • Apache JServ Protocol (Port 8081)
        • NetBIOS
      • Nmap
      • Gobuster / Feroxbuster / FUFF / Wfuzz
      • Drupal
    • ✅Exploitation
      • Public Exploits
      • PHP Webshells
      • Reverse Shell
      • TTY
      • File Transfer
      • Metasploit
      • Password Spray
    • ✅Buffer Overflow
      • Step 0: Spiking (Optional)
      • Step 1: Fuzzing
      • Step 2: Finding the Offset
      • Step 3: Overwriting the EIP
      • Step 4: Finding Bad Characters
      • Step 5: Finding the Right Module
      • Step 6: Generating Shellcode and Gaining Root
    • ✅Privilege Escalation
      • Linux Privilege Escalation
        • Linux Permissions
        • Manual Enumeration
        • Automated Tools
        • Kernel Exploits
        • Passwords and File Permissions
        • SSH Keys
        • Sudo
        • SUID
        • Capabilities
        • Cron Jobs
        • NFS Root Squashing
        • Docker
        • GNU C Library
        • Exim
        • Linux Privilege Escalation Course Capstone
      • Windows Privilege Escalation
        • Manual Enumeration
        • Automated Tools
        • Kernel Exploits
        • Passwords and Port Forwarding
        • WSL
        • Token Impersonation and Potato Attacks
        • Meterpreter getsystem
        • Runas
        • UAC Bypass
        • Registry
        • Executable Files
        • Startup Applications
        • DLL Hijacking
        • Service Permissions (Paths)
        • CVE-2019-1388
        • HiveNightmare
        • Bypass Space Filter
    • ✅Post Exploitation
      • Linux Post Exploitation
        • Add a User
        • SSH Key
      • Windows Post Exploitation
        • windows-resources
        • Add a User
        • RDP
    • ✅Pivoting
      • Windows: Chisel
      • Linux: sshuttle
    • Active Directory (AD)
      • Initial Compromise
        • HTA Phishing
        • VBA Macro Phishing
        • LLMNR Poisoning
        • SMB Relay
        • GPP / cPassword
      • Domain Enumeration
        • Manual Enumeration
        • PowerView
        • BloodHound
      • Lateral Movement
        • PsExec
        • WMI
        • Runas
        • Pass the Hash
        • Overpass the Hash
        • Pass the Ticket
      • Kerberos
        • Kerberoast
        • AS-REP Roast
      • MS SQL Server
    • Command & Control (C2)
      • Cobalt Strike
        • Bypassing Defences
          • Artifact Kit
          • Resource Kit
          • AMSI Bypass
          • PowerPick
        • Extending Cobalt Strike
          • Elevate Kit
          • Malleable C2 Profile
      • Metasploit
        • Payloads
        • Post Exploitation
        • Automation
      • C2 Development
    • Malware Development
      • "Hot Dropper"
      • PE Format
        • Overview
      • Process Injection
      • Reflective DLL
      • x86 <=> x64
      • Hooking
      • VeraCry
      • Offensive C#
      • AV Evasion
        • AV Evasion with C# and PowerShell
        • AMSI Bypass
  • Cryptography
    • Hash Functions
    • MAC
    • AES
      • Byte at a Time
      • CBC CCA
      • CBC Bit Flipping
      • CBC Padding Oracle
    • Diffie-Hellman
    • RSA
      • Prime Factors
      • Multiple Ciphertexts
      • Low Public Exponent
      • Low Private Exponent
    • ECC
    • Digital Signature
    • JWT
    • PRNG
    • SSL/TLS
    • Research
      • ✅Lattice-based Cryptography (Lattice)
      • Elliptic Curve Cryptography (ECC)
      • Oblivious Transfer (OT)
      • Secure Multi-party Computation (MPC)
      • Learning with Error (LWE)
      • Fully Homomorphic Encryption (FHE)
      • Zero Knowledge Proof (ZKP)
      • Oblivious RAM (ORAM)
  • Computer Science
    • Linux
      • Setup
      • curl
      • Hard Link vs. Symlink
      • Man Page
      • /dev/null
    • Python
      • New Features
      • Operators, Expressions, and Data Manipulation
      • Program Structure and Control Flow
      • Objects, Types, and Protocols
      • Functions 101
      • Generators
      • Classes and Object-Oriented Programming
      • Memory Management
      • Concurrency and Parallelism
        • Multithreading and Thread Safety
        • Asynchronization
        • Multiprocessing
        • Global Interpreter Lock (GIL)
      • Built-in Functions and Standard Library
        • import collections
        • import itertools
        • import sys
        • import re
        • import pickle
        • import json
      • Third-party Library
        • from pwn import *
        • import requests
        • from bs4 import BeautifulSoup
        • from scapy.all import *
        • py2exe
    • HTML, CSS, JavaScript, and React
      • HTML
      • CSS
      • JavaScript
        • var vs. let
        • Objects
        • Arrays
        • Functions
        • Modules
        • Asynchronous JavaScript
      • React
    • Data Structures and Algorithms
      • Binary Search
    • The Linux Programming Interface
      • Processes
        • Memory Allocation
        • The Process API
        • Process Creation
        • Process Termination
        • Monitoring Child Processes
        • Program Execution
      • Signals
      • Threads
        • Thread Synchronization
        • Thread Safety and Pre-Thread Storage
      • IPC
        • Pipes and FIFOs
        • Memory Mappings
        • Virtual Memory Operations
      • Sockets
    • Computer Systems
      • Hexadecimal
      • Signedness
      • Registers
      • Instructions
      • Syscall
      • Process Memory
      • Stack Frame
      • Preemptive Multitasking
      • IPC
      • Threads
    • Databases
      • MySQL
        • Basic Syntax
        • Data Types
        • Modifying Tables
        • Duplicating and Deleting
        • SELECT
        • Transaction
      • GraphQL
    • Distributed Systems
      • Introduction
        • What is a Distributed System?
        • Design Goals
        • Scaling Techniques
        • Types of Distributed Systems
      • Architecture
        • System Architectures
        • Example Architectures
      • Communication
        • Foundations
        • Remote Procedure Call
        • Message-oriented Communication
      • Coordination
        • Clock Synchronization
        • Logical Clock
      • Consistency and Replication
        • Introduction
        • Data-centric Consistency
        • Client-centric Consistency
    • Static Analysis
      • Intermediate Representation
      • Data Flow Analysis
      • Interprocedural Analysis
      • Pointer Analysis
      • Static Analysis for Security
      • Datalog-Based Program Analysis
      • Soundness and Soundiness
      • CFL-Reachability and IFDS
  • Web
    • ✅Prerequisites
      • OWASP Top 10
        • 1. Broken Access Control
        • 2. Cryptographic Failures
        • 3. Injection
        • 4. Insecure Design
        • 5. Security Misconfiguration
        • 6. Vulnerable and Outdated Components
        • 7. Identification and Authentication Failures
        • 8. Software and Data Integrity Failures
        • 9. Security Logging and Monitoring Failures
        • 10. SSRF
      • HTTP
        • HTTP Status Codes
        • HTTP Headers
      • Burp Suite
        • Burp Intruder
        • Burp Extender
        • Burp Collaborator
      • Information Gathering
        • DNS
        • Git
        • Editor
        • Server
      • Bug Bounty Report Writing
    • File Upload
      • Webshell
      • IIS, Nginx, and Apache Vulnerabilities
      • .htaccess (Apache) / web.config (IIS)
      • Alternate Data Stream
      • Code Review: bWAPP Unrestricted File Upload
    • SQL Injection (SQLi)
      • Cheat Sheet
      • UNION Attacks
      • Examining the Database
      • Blind SQL Injection
      • WAF Bypass
      • Out-Of-Band (OOB)
      • Webshell and UDF
      • sqlmap
        • Code Review: Initialization
        • Code Review: tamper
    • Cross-Site Scripting (XSS)
      • Cheat Sheet
      • Reflected XSS
      • Stored XSS
      • DOM-Based XSS
      • XSS Contexts
      • CSP
    • CSRF and SSRF
      • Client-Side Request Forgery (CSRF)
        • XSS vs. CSRF
        • CSRF Tokens and SameSite Cookies
      • Server-Side Request Forgery (SSRF)
        • Attacks
        • Bypassing Restrictions
        • SSRF + Redis
    • XML External Entities (XXE)
    • Insecure Deserialization
      • Python Deserialization
      • PHP Deserialization
      • Java Deserialization
        • Shiro
        • FastJSON
        • WebLogic
    • HTTP Request Smuggling
    • OS Command Injection
      • Whitespace Bypass
      • Blacklist Bypass
      • Blind OS Command Injection
      • Lab 1: HITCON 2015 BabyFirst
      • Lab 2: HITCON 2017 BabyFirst Revenge
      • Lab 3: HITCON 2017 BabyFirst Revenge v2
    • ✅Directory Traversal
    • HTTP Parameter Pollution
    • Server-Side Template Injection (SSTI)
    • LDAP Injection
    • Redis
      • Authentication
      • RCE
      • Mitigations
  • Pwn
    • Linux Exploitation
      • Protections
      • Shellcoding
        • Calling Convention
        • Null-free
        • Reverse Shell
        • ORW
      • ROP
        • Stack Alignment
        • ret2text
        • ret2syscall
        • ret2libc
        • ret2csu
        • BROP
        • SROP
        • Stack Pivot
      • ptmalloc
        • chunks
        • malloc() and free()
        • bins
        • tcache
      • UAF
      • Race Conditions
        • TOCTTOU
        • Dirty Cow
        • Meltdown
        • Spectre
      • Kernel
      • Appendix: Tools
        • socat
        • LibcSearcher-ng
        • OneGadget
    • Windows Exploitation
      • Classic
      • SEH
      • Egghunting
      • Unicode
      • Shellcoding
      • ROP
      • Appendix: Tools
        • ImmunityDbg
        • Mona.py
    • Fuzzing
      • AFL++
        • Quickstart
        • Instrumentation
        • ASAN
        • Code Coverage
        • Dictionary
        • Parallelization
        • Partial Instrumentation
        • QEMU Mode
        • afl-libprotobuf-mutator
      • WinAFL
      • Fuzzilli
  • Reverse
    • Bytecode
      • Python Bytecode
    • 👑Z3 solver
    • angr
      • angr Template
Powered by GitBook
On this page
  • Serialization vs. Deserialization
  • Serialization (object => byte stream)
  • Deserialization (byte stream => object)
  • What is Insecure Deserialization?
  • Manipulating Serialized Objects
  • Modifying Object Attributes
  • Modifying Data Types
  • Magic Methods
  • Object Injection
  • Gadget Chains
  • Defense
  • Reference

Was this helpful?

  1. Web

Insecure Deserialization

PreviousXML External Entities (XXE)NextPython Deserialization

Last updated 3 years ago

Was this helpful?

Serialization vs. Deserialization

Serialization (object => byte stream)

Serialization is the process of converting complex data structures, such as objects and their fields, into a "flatter" format that can be sent and received as a sequential stream of bytes.

Serializing data makes it much simpler to:

  • Write complex data to inter-process memory, a file, or a database

  • Send complex data, for example, over a network, between different components of an application, or in an API call

Crucially, when serializing an object, its state is also persisted. In other words, the object's attributes are preserved, along with their assigned values.

Deserialization (byte stream => object)

Deserialization is the process of restoring this byte stream to a fully functional replica of the original object, in the exact state as when it was serialized. The website's logic can then interact with this deserialized object, just like it would with any other object.

Many programming languages offer native support for serialization. Exactly how objects are serialized depends on the language. Some languages serialize objects into binary formats, whereas others use different string formats, with varying degrees of human readability. Note that all of the original object's attributes are stored in the serialized data stream, including any private fields. To prevent a field from being serialized, it must be explicitly marked as "transient" in the class declaration.

Be aware that when working with different programming languages, serialization may be referred to as marshalling (Ruby) or pickling (Python). These terms are synonymous with "serialization" in this context.

What is Insecure Deserialization?

Insecure deserialization is when user-controllable data is deserialized by a website. This potentially enables an attacker to manipulate serialized objects in order to pass harmful data into the application code.

It is even possible to replace a serialized object with an object of an entirely different class. Alarmingly, objects of any class that is available to the website will be deserialized and instantiated, regardless of which class was expected. For this reason, insecure deserialization is sometimes known as an "object injection" vulnerability.

An object of an unexpected class might cause an exception. By this time, however, the damage may already be done. Many deserialization-based attacks are completed before deserialization is finished. This means that the deserialization process itself can initiate an attack, even if the website's own functionality does not directly interact with the malicious object. For this reason, websites whose logic is based on strongly typed languages can also be vulnerable to these techniques.

Insecure deserialization typically arises because there is a general lack of understanding of how dangerous deserializing user-controllable data can be. Ideally, user input should never be deserialized at all.

However, sometimes website owners think they are safe because they implement some form of additional check on the deserialized data. This approach is often ineffective because it is virtually impossible to implement validation or sanitization to account for every eventuality. These checks are also fundamentally flawed as they rely on checking the data after it has been deserialized, which in many cases will be too late to prevent the attack.

Vulnerabilities may also arise because deserialized objects are often assumed to be trustworthy. Especially when using languages with a binary serialization format, developers might think that users cannot read or manipulate the data effectively. However, while it may require more effort, it is just as possible for an attacker to exploit binary serialized objects as it is to exploit string-based formats.

Deserialization-based attacks are also made possible due to the number of dependencies that exist in modern websites. A typical site might implement many different libraries, which each have their own dependencies as well. This creates a massive pool of classes and methods that is difficult to manage securely. As an attacker can create instances of any of these classes, it is hard to predict which methods can be invoked on the malicious data. This is especially true if an attacker is able to chain together a long series of unexpected method invocations, passing data into a sink that is completely unrelated to the initial source. It is, therefore, almost impossible to anticipate the flow of malicious data and plug every potential hole.

In short, it can be argued that it is NOT possible to securely deserialize untrusted input.

The impact of insecure deserialization can be very severe because it provides an entry point to a massively increased attack surface. It allows an attacker to reuse existing application code in harmful ways, resulting in numerous other vulnerabilities, often remote code execution.

Even in cases where remote code execution is not possible, insecure deserialization can lead to privilege escalation, arbitrary file access, and denial-of-service attacks.

The following functions are often linked with insecure deserialization:

  • PHP: unserialize()

  • Java: readObject()

  • Python: pickle.loads()

  • Ruby: Marshall.load()

Manipulating Serialized Objects

Exploiting some deserialization vulnerabilities can be as easy as changing an attribute in a serialized object. As the object state is persisted, you can study the serialized data to identify and edit interesting attribute values. You can then pass the malicious object into the website via its deserialization process. This is the initial step for a basic deserialization exploit.

Broadly speaking, there are two approaches you can take when manipulating serialized objects. You can:

  • either edit the object directly in its byte stream form

  • or write a short script in the corresponding language to create and serialize the new object yourself. This approach is often easier when working with binary serialization formats.

Modifying Object Attributes

When tampering with the data, as long as the attacker preserves a valid serialized object, the deserialization process will create a server-side object with the modified attribute values.

As a simple example, consider a website that uses a serialized User object to store data about a user's session in a cookie. If an attacker spotted this serialized object in an HTTP request, they might decode it to find the following byte stream:

O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}

The isAdmin attribute is an obvious point of interest. An attacker could simply change the boolean value of the attribute to 1 (true), re-encode the object, and overwrite their current cookie with this modified value. In isolation, this has no effect. However, let's say the website uses this cookie to check whether the current user has access to certain administrative functionality:

$user = unserialize($_COOKIE);

if ($user->isAdmin === true) {
    // allow access to admin interface
}

This vulnerable code would instantiate a User object based on the data from the cookie, including the attacker-modified isAdmin attribute. At no point is the authenticity of the serialized object checked. This data is then passed into the conditional statement and, in this case, would allow for an easy privilege escalation.

This simple scenario is not common in the wild. However, editing an attribute value in this way demonstrates the first step towards accessing the massive amount of attack-surface exposed by insecure deserialization.

Modifying Data Types

We've seen how you can modify attribute values in serialized objects, but it's also possible to supply unexpected data types.

PHP-based logic is particularly vulnerable to this kind of manipulation due to the behavior of its loose comparison operator (==) when comparing different data types. For example, if you perform a loose comparison between an integer and a string, PHP will attempt to convert the string to an integer, meaning that 5 == "5" evaluates to true.

Unusually, this also works for any alphanumeric string that starts with a number. In this case, PHP will effectively convert the entire string to an integer value based on the initial number. The rest of the string is ignored completely. Therefore, 5 == "5 of something" is in practice treated as 5 == 5.

This becomes even stranger when comparing a string the integer 0:

0 == "Example string" // true

Why? Because there is no number, that is, 0 numerals in the string. PHP treats this entire string as the integer 0.

Consider a case where this loose comparison operator is used in conjunction with user-controllable data from a deserialized object. This could potentially result in dangerous logic flaws:

$login = unserialize($_COOKIE)

if ($login['password'] == $password) {
    // log in successfully
}

Let's say an attacker modified the password attribute so that it contained the integer 0 instead of the expected string. As long as the stored password does not start with a number, the condition would always return true, enabling an authentication bypass. Note that this is only possible because deserialization preserves the data type. If the code fetched the password from the request directly, the 0 would be converted to a string and the condition would evaluate to false.

Be aware that when modifying data types in any serialized object format, it is important to remember to update any type labels and length indicators in the serialized data too. Otherwise, the serialized object will be corrupted and will not be deserialized.

When working directly with binary formats, we recommend using the Hackvertor extension, available from the BApp store. With Hackvertor, you can modify the serialized data as a string, and it will automatically update the binary data, adjusting the offsets accordingly. This can save you a lot of manual effort.

Magic Methods

Magic methods are a special subset of methods that you do not have to explicitly invoke. Instead, they are invoked automatically whenever a particular event or scenario occurs. Magic methods are a common feature of OOP in various languages. They are sometimes indicated by prefixing or surrounding the method name with double-underscores.

Developers can add magic methods to a class in order to predetermine what code should be executed when the corresponding event or scenario occurs. Exactly when and why a magic method is invoked differs from method to method. One of the most common examples in PHP is __construct(), which is invoked whenever an object of the class is instantiated, similar to Python's __init__. Typically, constructor magic methods like this contain code to initialize the attributes of the instance. However, magic methods can be customized by developers to execute any code they want.

Magic methods are widely used and do not represent a vulnerability on their own. But they can become dangerous when the code that they execute handles attacker-controllable data, for example, from a deserialized object. This can be exploited by an attacker to automatically invoke methods on the deserialized data when the corresponding conditions are met.

Most importantly in this context, some languages have magic methods that are invoked automatically during the deserialization process. For example, PHP's unserialize() method looks for and invokes an object's __wakeup() magic method.

In Java deserialization, the same applies to the ObjectInputStream.readObject() method, which is used to read data from the initial byte stream and essentially acts like a constructor for "re-initializing" a serialized object. However, Serializable classes can also declare their own readObject() method as follows:

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException
{
    // implementation
}

A readObject() method declared in exactly this way acts as a magic method that is invoked during deserialization. This allows the class to control the deserialization of its own fields more closely.

You should pay close attention to any classes that contain these types of magic methods. They allow you to pass data from a serialized object into the website's code before the object is fully deserialized. This is the starting point for creating more advanced exploits.

Object Injection

As we've seen, it is occasionally possible to exploit insecure deserialization by simply editing the object supplied by the website. However, injecting arbitrary object types can open up many more possibilities.

In object-oriented programming, the methods available to an object are determined by its class. Therefore, if an attacker can manipulate which class of object is being passed in as serialized data, they can influence what code is executed after, and even during, deserialization.

Deserialization methods do not typically check what they are deserializing. This means that you can pass in objects of any serializable class that is available to the website, and the object will be deserialized. This effectively allows an attacker to create instances of arbitrary classes. The fact that this object is not of the expected class does not matter. The unexpected object type might cause an exception in the application logic, but the malicious object will already be instantiated by then.

If an attacker has access to the source code, they can study all of the available classes in detail. To construct a simple exploit, they would look for classes containing deserialization magic methods, then check whether any of them perform dangerous operations on controllable data. The attacker can then pass in a serialized object of this class to use its magic method for an exploit.

Gadget Chains

A "gadget" is a snippet of code that exists in the application that can help an attacker to achieve a particular goal. An individual gadget may not directly do anything harmful with user input. However, the attacker's goal might simply be to invoke a method that will pass their input into another gadget. By chaining multiple gadgets together in this way, an attacker can potentially pass their input into a dangerous "sink gadget", where it can cause maximum damage.

It is important to understand that, unlike some other types of exploit, a gadget chain is not a payload of chained methods constructed by the attacker. All of the code already exists on the website. The only thing the attacker controls is the data that is passed into the gadget chain. This is typically done using a magic method that is invoked during deserialization, sometimes known as a "kick-off gadget".

In the wild, many insecure deserialization vulnerabilities will only be exploitable through the use of gadget chains. This can sometimes be a simple one or two-step chain, but constructing high-severity attacks will likely require a more elaborate sequence of object instantiations and method invocations. Therefore, being able to construct gadget chains is one of the key aspects of successfully exploiting insecure deserialization.

Defense

Generally speaking, deserialization of user input should be avoided unless absolutely necessary. The high severity of exploits that it potentially enables, and the difficulty in protecting against them, outweigh the benefits in many cases.

If you do need to deserialize data from untrusted sources, incorporate robust measures to make sure that the data has not been tampered with. For example, you could implement a digital signature to check the integrity of the data. However, remember that any checks must take place before beginning the deserialization process. Otherwise, they are of little use.

If possible, you should avoid using generic deserialization features altogether. Serialized data from these methods contains all attributes of the original object, including private fields that potentially contain sensitive information. Instead, you could create your own class-specific serialization methods so that you can at least control which fields are exposed.

Finally, remember that the vulnerability is the deserialization of user input, not the presence of gadget chains that subsequently handle the data. Don't rely on trying to eliminate gadget chains that you identify during testing. It is impractical to try and plug them all due to the web of cross-library dependencies that almost certainly exist on your website. At any given time, publicly documented memory corruption exploits are also a factor, meaning that your application may be vulnerable regardless.

Reference

Insecure deserialization | Web Security AcademyWebSecAcademy
Insecure deserialization - Web Security Academy
Logo
Serialization vs. Deserialization
Insecure Deserialization