TCP Message Framing: Reassembling Length-Prefixed Messages from a Byte Stream
TCP is a stream protocol — it has no concept of message boundaries. A single send() call might arrive as three recv() calls, or three send() calls might coalesce into one buffer. If you’re building a trading protocol over TCP, you need message framing: a way for the receiver to know where one message ends and the next begins.
This article covers the simplest framing scheme — length-prefixing — and a complete C++ implementation that handles all the edge cases.
The Problem
TCP’s abstraction is a bidirectional byte stream:
Sender:                send(msg1)  send(msg2)  send(msg3)
                            │           │           │
                            └───────────┼───────────┘
                                        ▼
                                 TCP byte stream
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
Receiver:   recv() → 4 bytes    recv() → 11 bytes   recv() → 0 (EOF)
The receiver has no idea where the message boundaries are. Two common framing strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Delimiter-based | Messages end with \n or a special byte sequence | Text protocols, HTTP headers |
| Length-prefixed | Each message starts with N bytes encoding its total length | Binary protocols, exchange order-entry and market-data feeds |
For trading systems, length-prefixing is the standard. It’s deterministic, requires no escaping, and allows the receiver to know exactly how many bytes to wait for.
Message Format
Our format is the simplest possible: a 2-byte little-endian header encoding the total message length (including itself), followed by the payload.
| Bytes | Description |
|---|---|
| 0–1 | Message length (uint16_t, little-endian) — total message size including these 2 bytes |
| 2…N | Payload — (length - 2) bytes |
Example: [0x05, 0x00, 'A', 'B', 'C'] → length = 0x0005 = 5 → total 5 bytes → payload = “ABC”.
The length includes the header bytes, so the minimum valid message is 2 bytes (length field only, empty payload). It follows that a length of 0 or 1 can never describe a real frame and should be treated as a protocol error.
The Interface
We’re given two abstractions:
#include <cstddef>  // std::byte

struct IDataProvider {
    virtual int GetData(std::byte* data, int maxLength) { return 0; }
    virtual ~IDataProvider() = default;
};

struct ITcpSocket {
    virtual void OnMessage(std::byte* bytes, int length) {}
    virtual ~ITcpSocket() = default;
};
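GetData mimics recv(): it fills data with up to maxLength bytes and returns the number actually read, or -1 for EOF. (A real recv() reports an orderly shutdown by returning 0 and reserves -1 for errors; this interface uses -1 for end of stream instead.) OnMessage is the callback for each complete frame.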
Our task: implement TcpSocket::Process() that loops until EOF, calling OnMessage for each complete message, handling all edge cases.
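Exercising Process() without a real socket needs a data source whose chunk boundaries you control. The sketch below is one way to do that; ScriptedProvider is a hypothetical test double (not part of the given interface) that replays a fixed list of chunks and then returns -1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <deque>
#include <utility>
#include <vector>

// Interface as given in the article.
struct IDataProvider {
    virtual int GetData(std::byte* data, int maxLength) { return 0; }
    virtual ~IDataProvider() = default;
};

// Hypothetical test double: hands out the scripted chunks one call at a
// time, then reports EOF (-1). A chunk larger than maxLength is split
// across consecutive calls, just as a real stream might split it.
class ScriptedProvider : public IDataProvider {
public:
    explicit ScriptedProvider(std::deque<std::vector<std::byte>> chunks)
        : chunks_{std::move(chunks)} {}

    int GetData(std::byte* data, int maxLength) override {
        if (chunks_.empty()) return -1;  // script exhausted: EOF
        if (maxLength <= 0) return 0;    // caller asked for nothing
        auto& front = chunks_.front();
        const int n = std::min<int>(maxLength, static_cast<int>(front.size()));
        if (n > 0) {
            std::memcpy(data, front.data(), static_cast<std::size_t>(n));
            front.erase(front.begin(), front.begin() + n);
        }
        if (front.empty()) chunks_.pop_front();
        return n;
    }

private:
    std::deque<std::vector<std::byte>> chunks_;
};
```

Scripting the chunks as {0x05} followed by {0x00, 'A', 'B', 'C'} reproduces a split header, which makes each edge case a deterministic unit test.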
Implementation
The State Machine
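We need two pieces of running state: totalReceived, the number of bytes currently buffered, and totalSize, the length of the message being assembled. When totalSize is 0, we’re waiting for a header. When it’s non-zero, we know how many bytes the current message needs and are accumulating toward that total.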
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

class TcpSocket : public ITcpSocket {
public:
    explicit TcpSocket(IDataProvider* provider) : provider_{provider} {}

    void Process() {
        // Large enough for the biggest frame a uint16_t length can describe
        constexpr int AllocationSize = 65535;
        auto bytes = std::make_unique<std::byte[]>(AllocationSize);
        int totalReceived = 0;   // Bytes currently buffered
        uint16_t totalSize = 0;  // Length of current message; 0 = header not parsed yet

        while (true) {
            // Determine how many bytes to ask for
            int remaining = totalReceived < 2
                ? AllocationSize - totalReceived  // Need header
                : totalSize - totalReceived;      // Need rest of message
            int received = provider_->GetData(
                bytes.get() + totalReceived,
                remaining
            );
            if (received == -1)
                break;
            totalReceived += received;

            // Parse the header once we have at least 2 bytes
            if (totalReceived > 1 && totalSize == 0) {
                totalSize = static_cast<uint16_t>(bytes[0])
                    | (static_cast<uint16_t>(bytes[1]) << 8);
                if (totalSize < 2)
                    return;  // Length cannot cover its own header: malformed stream
            }

            // Process complete messages
            while (totalSize > 0 && totalReceived >= totalSize) {
                OnMessage(bytes.get(), totalSize);

                // Shift remaining data to the front
                if (totalReceived > totalSize) {
                    std::memmove(
                        bytes.get(),
                        bytes.get() + totalSize,
                        totalReceived - totalSize
                    );
                }
                totalReceived -= totalSize;
                totalSize = 0;

                // Re-parse header if we have enough trailing data
                if (totalReceived > 1) {
                    totalSize = static_cast<uint16_t>(bytes[0])
                        | (static_cast<uint16_t>(bytes[1]) << 8);
                    if (totalSize < 2)
                        return;  // Malformed header in the trailing data
                }
            }
        }
    }

private:
    IDataProvider* provider_;
};
The Key Operations
Let’s walk through each critical step.
1. Reading the Right Amount
int remaining = totalReceived < 2
    ? AllocationSize - totalReceived  // Don't know message size yet
    : totalSize - totalReceived;      // Know exactly how many bytes to wait for
Before the header is parsed, ask for as much as the buffer can hold. That first read may run past the current message into the next one, which is why leftover bytes are shifted forward later. Once the header is parsed, ask for exactly the bytes the current message still needs. Either way the request is capped so the buffer never holds more than AllocationSize bytes and cannot overflow.
2. Parsing Little-Endian uint16_t
totalSize = static_cast<uint16_t>(bytes[0])
    | (static_cast<uint16_t>(bytes[1]) << 8);
Byte 0 is the LSB, byte 1 is the MSB. The bitwise OR composes them. This is host-endian-independent — it works correctly on both little-endian (x86) and big-endian machines because we’re explicitly reconstructing the value from known byte positions.
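An alternative that is only correct as written on little-endian hosts:
uint16_t totalSize;
std::memcpy(&totalSize, bytes.get(), sizeof(totalSize));
On a little-endian host this copies the two bytes straight into place with no swap; on a big-endian host you’d need to swap them afterwards, for example with __builtin_bswap16() on GCC or Clang. The manual shift approach is more explicit and typically just as fast, since mainstream compilers usually reduce it to a single 16-bit load on x86.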
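Since the same two-line expression appears at both parse sites, it can be pulled into a small function. The sketch below uses std::to_integer, the standard library's accessor for std::byte values; the name ReadU16LE is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes a little-endian uint16_t from two bytes.
// The caller must guarantee that p[0] and p[1] are readable.
constexpr std::uint16_t ReadU16LE(const std::byte* p) {
    return static_cast<std::uint16_t>(
        std::to_integer<unsigned>(p[0]) |          // byte 0 is the LSB
        (std::to_integer<unsigned>(p[1]) << 8));   // byte 1 is the MSB
}
```

With this in place, both the initial parse and the re-parse could read totalSize = ReadU16LE(bytes.get()), which leaves one place to get the byte order right.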
3. Shifting Leftover Data with memmove
if (totalReceived > totalSize) {
    std::memmove(
        bytes.get(),
        bytes.get() + totalSize,
        totalReceived - totalSize
    );
}
This is the critical operation that beginners often get wrong. After processing a message, if there are extra bytes left in the buffer, they belong to the next message. We must shift them to the front without discarding them.
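Why memmove and not memcpy? The source and destination regions can overlap: whenever the leftover is longer than the message just delivered, the source range starting at bytes.get() + totalSize runs into the destination range starting at bytes.get(). memcpy has undefined behavior on overlapping memory. memmove handles overlaps correctly.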
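The loop above shifts the buffer once per delivered message, so a read containing many small messages moves the same trailing bytes repeatedly. A common variant (not the approach used in this article) walks a read cursor across the buffer and compacts once at the end. The sketch below illustrates it; DrainFrames and its std::function callback are hypothetical names, and it returns -1 when it meets a length below 2.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical variant: delivers every complete frame in buf[0, used),
// then moves the unconsumed tail to the front with a single memmove.
// Returns the number of bytes still buffered, or -1 on a malformed header.
int DrainFrames(std::byte* buf, int used,
                const std::function<void(const std::byte*, int)>& onMessage) {
    int offset = 0;  // read cursor: start of the next undelivered frame
    while (used - offset >= 2) {
        const int length = std::to_integer<int>(buf[offset]) |
                           (std::to_integer<int>(buf[offset + 1]) << 8);
        if (length < 2) return -1;          // cannot cover its own header
        if (used - offset < length) break;  // frame still incomplete
        onMessage(buf + offset, length);
        offset += length;
    }
    const int leftover = used - offset;
    if (offset > 0 && leftover > 0) {
        // Regions may overlap, so memmove is still required here.
        std::memmove(buf, buf + offset, static_cast<std::size_t>(leftover));
    }
    return leftover;
}
```

The trade-off is a little more bookkeeping in exchange for one copy per read instead of one per message.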
4. Re-parsing After Shift
totalReceived -= totalSize;
totalSize = 0;
if (totalReceived > 1) {
    totalSize = static_cast<uint16_t>(bytes[0])
        | (static_cast<uint16_t>(bytes[1]) << 8);
}
After shifting, the remaining bytes are now at the front of the buffer. If we have at least 2, we can immediately parse the next message’s header. This handles the case where a single GetData call contains multiple complete messages.
Edge Cases, One by One
Split Header
Read 1: [0x05] → totalReceived=1, totalSize=0 (need 2 bytes for header)
Read 2: [0x00, 'A','B','C'] → totalReceived=5, parse length=0x0005, full message!
Split Message Body
Read 1: [0x05, 0x00, 'A'] → totalReceived=3, totalSize=5, not enough
Read 2: ['B', 'C'] → totalReceived=5, full message!
Multiple Messages in One Read
GetData returns: [0x03,0x00,'X', 0x04,0x00,'Y','Z']
                  ↑ msg1 (3B)    ↑ msg2 (4B)
Processing:
- Parse header → length=3
- totalReceived=7 >= 3 → deliver msg1 (bytes 0-2)
- Shift bytes [3,6] to front → buffer = [0x04,0x00,'Y','Z', …]
- totalReceived=4, re-parse header → length=4
- totalReceived=4 >= 4 → deliver msg2 (bytes 0-3)
- totalReceived=0, loop exits on EOF
No Data / Immediate EOF
GetData returns: -1 → break out of loop, no messages delivered
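The traces above can be turned into executable checks. The sketch below is a simplified stand-in for Process(), not the article's implementation: FrameAll is a hypothetical function that takes the sequence of chunks a provider would return, accumulates them in a growable std::vector instead of a fixed buffer, and returns each frame's payload as a string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical harness: each inner vector simulates one GetData result.
// Returns the payload of every complete frame, in order. Stops at the
// first header whose length is below 2 (malformed).
std::vector<std::string> FrameAll(
        const std::vector<std::vector<std::uint8_t>>& chunks) {
    std::vector<std::uint8_t> pending;  // received but not yet delivered
    std::vector<std::string> payloads;
    for (const auto& chunk : chunks) {
        pending.insert(pending.end(), chunk.begin(), chunk.end());
        while (pending.size() >= 2) {
            const std::size_t length =
                pending[0] | (std::size_t{pending[1]} << 8);
            if (length < 2) return payloads;     // malformed header
            if (pending.size() < length) break;  // wait for more data
            payloads.emplace_back(pending.begin() + 2,
                                  pending.begin() + length);
            pending.erase(pending.begin(), pending.begin() + length);
        }
    }
    return payloads;
}
```

Feeding it the split-header, split-body, and multiple-message inputs from this section should produce "ABC", "ABC", and the pair "X", "YZ" respectively; an empty chunk list produces no messages.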
Follow-Up: Preventing Memory Exhaustion
A common interview follow-up: what if a malicious client sends a huge length value?
The fix: validate the length before allocating or waiting for data, at every place the header is parsed:
constexpr uint16_t MAX_MESSAGE_SIZE = 4096;

if (totalReceived > 1 && totalSize == 0) {
    totalSize = static_cast<uint16_t>(bytes[0])
        | (static_cast<uint16_t>(bytes[1]) << 8);
    if (totalSize < 2 || totalSize > MAX_MESSAGE_SIZE) {
        // Protocol violation (shorter than its own header, or too large): close the connection
        return;
    }
}
Without this check, a totalSize of 0xFFFF (65535) would cause you to allocate and wait for nearly 64 KiB of data. On a busy trading gateway with thousands of connections, this becomes a denial-of-service vector.
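Process() parses the header in two places (the first parse and the re-parse after a shift), so the bounds check is easy to apply to one and forget on the other. One way to keep them consistent is a shared predicate, sketched below. The names kHeaderSize, kMaxMessageSize, and IsValidLength are assumptions for illustration, and the 4096 limit simply mirrors the value used above.

```cpp
#include <cstdint>

constexpr std::uint16_t kHeaderSize = 2;
constexpr std::uint16_t kMaxMessageSize = 4096;  // illustrative limit

// Hypothetical predicate: true when a decoded length can describe a frame.
// A length below the header size cannot even cover the length field, and
// a length above the limit is rejected before any waiting begins.
constexpr bool IsValidLength(std::uint16_t length) {
    return length >= kHeaderSize && length <= kMaxMessageSize;
}
```

Calling this right after each header parse, and closing the connection when it returns false, covers both the oversized case and the 0 or 1 case with the same test.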
Follow-Up: 4-Byte Headers
What if the header were 4 bytes instead of 2?
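// totalSize must now be declared as uint32_t, or the upper bytes are truncated
if (totalReceived > 3 && totalSize == 0) {
    totalSize = static_cast<uint32_t>(bytes[0])
        | (static_cast<uint32_t>(bytes[1]) << 8)
        | (static_cast<uint32_t>(bytes[2]) << 16)
        | (static_cast<uint32_t>(bytes[3]) << 24);
}
The logic is the same, with more bytes to accumulate before parsing, but a few details change with it: totalSize must be widened to uint32_t, the totalReceived < 2 test in the read-size calculation becomes totalReceived < 4, and a fixed buffer can no longer cover every representable length, so the maximum-length check becomes mandatory. A 4-byte header supports messages up to ~4 GB, which is overkill for most trading protocols (FIX messages are typically < 8 KB).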
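A sketch of what the wider header parse might look like with the bounds check folded in. Everything here is an assumption for illustration: the names ParseHeader32 and HeaderStatus, the 1 MiB cap, and the choice that the length still counts the 4 header bytes (by analogy with the 2-byte format).

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHeaderSize32 = 4;
constexpr std::uint32_t kMaxFrame32 = 1u << 20;  // assumed 1 MiB cap

enum class HeaderStatus { NeedMoreData, Ok, Invalid };

struct Header32 {
    HeaderStatus status;
    std::uint32_t length;  // meaningful only when status == Ok
};

// Hypothetical: decodes a 4-byte little-endian length once enough bytes
// are buffered, and rejects lengths outside [4, kMaxFrame32].
Header32 ParseHeader32(const std::byte* buf, std::size_t available) {
    if (available < kHeaderSize32) {
        return {HeaderStatus::NeedMoreData, 0};
    }
    const std::uint32_t length =
        std::to_integer<std::uint32_t>(buf[0]) |
        (std::to_integer<std::uint32_t>(buf[1]) << 8) |
        (std::to_integer<std::uint32_t>(buf[2]) << 16) |
        (std::to_integer<std::uint32_t>(buf[3]) << 24);
    if (length < kHeaderSize32 || length > kMaxFrame32) {
        return {HeaderStatus::Invalid, 0};
    }
    return {HeaderStatus::Ok, length};
}
```

Separating "not enough bytes yet" from "invalid" keeps the caller's loop simple: keep reading on NeedMoreData, drop the connection on Invalid.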
Key Takeaways
- TCP is a stream: message boundaries are your responsibility
- Length-prefixing is the simplest framing strategy for binary protocols
- Maintain running state: track totalReceived and totalSize between reads
- memmove, not memcpy: the buffer shift is an overlapping copy
- Re-parse after shifting: handle multiple messages in a single read
- Validate the length: prevent memory exhaustion from malicious or corrupt headers
- Host-independent endianness: manual byte reconstruction works everywhere
Message framing is one of those problems that looks trivial until you’ve been bitten by a missing memmove at 3 AM. A correct implementation handles partial reads, multiple messages, and trailing data without losing a single byte — and now you have one.