The ITCH v5.0 specification declares over 20 message types related to system events, stock characteristics, the placement and modification of limit orders, and trade execution. It also contains information about the net order imbalance before the opening and closing crosses.
Nasdaq offers samples of daily binary files for several months. The GitHub repository for this chapter contains a notebook, build_order_book.ipynb, that illustrates how to parse a sample file of ITCH messages and reconstruct both the executed trades and the order book for any given tick.
The following table shows the frequency of the most common message types for the sample file dated March 29, 2018:
| Message type | Order book impact | Number of messages |
| --- | --- | --- |
| A | New unattributed limit order | 136,522,761 |
| D | Order canceled | 133,811,007 |
| U | Order canceled and replaced | 21,941,015 |
| E | Full or partial execution; possibly multiple messages for the same original order | 6,687,379 |
| X | Modified after partial cancellation | 5,088,959 |
| F | Add attributed order | 2,718,602 |
| P | Trade Message (non-cross) | 1,120,861 |
| C | Executed in whole or in part at a price different from the initial display price | 157,442 |
| Q | Cross Trade Message | 17,233 |
For each message, the specification lays out the components and their respective lengths and data types. The Add Order with MPID attribution message (type F), for example, is laid out as follows:
| Name | Offset | Length (bytes) | Value | Notes |
| --- | --- | --- | --- | --- |
| Message type | 0 | 1 | F | Add Order MPID attribution message |
| Stock locate | 1 | 2 | Integer | Locate code identifying the security |
| Tracking number | 3 | 2 | Integer | Nasdaq internal tracking number |
| Timestamp | 5 | 6 | Integer | Nanoseconds since midnight |
| Order reference number | 11 | 8 | Integer | Unique reference number of the new order |
| Buy/sell indicator | 19 | 1 | Alpha | The type of order: B = Buy Order, S = Sell Order |
| Shares | 20 | 4 | Integer | Number of shares for the order being added to the book |
| Stock | 24 | 8 | Alpha | Stock symbol, right-padded with spaces |
| Price | 32 | 4 | Price (4) | The display price of the new order |
| Attribution | 36 | 4 | Alpha | Nasdaq Market participant identifier associated with the order |
Python provides the struct module to parse binary data using format strings. A format string identifies the message elements by indicating the length and type of each component of the byte string, as laid out in the specification.
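As a minimal sketch of how such a format string works, the following example packs and then unpacks a synthetic Add Order with MPID attribution message using the field layout from the table above. All field values (locate code, order reference, symbol, price, MPID) are made up for illustration. The conversion of the price assumes the ITCH convention that a Price (4) field carries four implied decimal places.

```python
import struct

# Big-endian ('>') layout of the 40-byte 'F' message from the table above:
# type(1) locate(2) tracking(2) timestamp(6) order ref(8) side(1)
# shares(4) stock(8) price(4) attribution(4)
ADD_ORDER_MPID = '>cHH6sQcI8sI4s'

# Synthetic message; every value below is made up for illustration.
timestamp_ns = 34_200_000_000_000            # 09:30:00 as nanoseconds since midnight
raw = struct.pack(
    ADD_ORDER_MPID,
    b'F', 13, 0, timestamp_ns.to_bytes(6, 'big'), 1001,
    b'B', 100, b'AAPL    ', 1_675_000, b'NSDQ',
)

(msg_type, locate, tracking, ts, order_ref,
 side, shares, stock, price, mpid) = struct.unpack(ADD_ORDER_MPID, raw)

print(len(raw))                              # total message length in bytes
print(msg_type.decode('ascii'), side.decode('ascii'), shares)
print(stock.decode('ascii').strip())         # remove the right padding
print(int.from_bytes(ts, 'big'))             # 6-byte timestamp -> integer
print(price / 10_000)                        # Price (4): four implied decimals
```

The 6-byte timestamp has no native struct integer code, which is why it is read as a byte string and converted with int.from_bytes afterward.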
Let's walk through the critical steps to parse the trading messages and reconstruct the order book:
- The ITCH parser relies on the message specifications provided as a .csv file (created by create_message_spec.py) and assembles format strings according to the formats dictionary:
    formats = {
        ('integer', 2): 'H',   # int of length 2 => format string 'H'
        ('integer', 4): 'I',
        ('integer', 6): '6s',  # int of length 6 => parse as string, convert later
        ('integer', 8): 'Q',
        ('alpha', 1): 's',
        ('alpha', 2): '2s',
        ('alpha', 4): '4s',
        ('alpha', 8): '8s',
        ('price_4', 4): 'I',
        ('price_8', 8): 'Q',
    }
- The parser translates the message specs into format strings and namedtuples that capture the message content:
    from collections import namedtuple
    import pandas as pd

    # Get ITCH specs and create formatting (type, length) tuples
    specs = pd.read_csv('message_types.csv')
    specs['formats'] = specs[['value', 'length']].apply(tuple, axis=1).map(formats)

    # Formatting for alpha fields
    alpha_fields = specs[specs.value == 'alpha'].set_index('name')
    alpha_msgs = alpha_fields.groupby('message_type')
    alpha_formats = {k: v.to_dict() for k, v in alpha_msgs.formats}
    alpha_length = {k: v.add(5).to_dict() for k, v in alpha_msgs.length}

    # Generate message classes as named tuples and format strings
    message_fields, fstring = {}, {}
    for t, message in specs.groupby('message_type'):
        message_fields[t] = namedtuple(typename=t, field_names=message.name.tolist())
        fstring[t] = '>' + ''.join(message.formats.tolist())
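To make the translation from specification rows to format strings concrete, here is a small sketch that does not require pandas or the .csv file. The spec_rows list is a hypothetical stand-in for message_types.csv and describes the System Event message (S) without its leading type byte. The payload values are synthetic.

```python
from collections import namedtuple
from struct import calcsize, unpack

# Subset of the formats dictionary needed for this message
formats = {('integer', 2): 'H', ('integer', 6): '6s', ('alpha', 1): 's'}

# Hypothetical stand-in for message_types.csv:
# (message_type, name, value, length)
spec_rows = [
    ('S', 'stock_locate', 'integer', 2),
    ('S', 'tracking_number', 'integer', 2),
    ('S', 'timestamp', 'integer', 6),
    ('S', 'event_code', 'alpha', 1),
]

# One namedtuple class and one format string per message type
message_fields, fstring = {}, {}
for mtype in sorted({row[0] for row in spec_rows}):
    rows = [row for row in spec_rows if row[0] == mtype]
    message_fields[mtype] = namedtuple(mtype, [name for _, name, _, _ in rows])
    fstring[mtype] = '>' + ''.join(formats[(value, length)]
                                   for _, _, value, length in rows)

# Unpack a synthetic 11-byte payload (values made up for illustration)
payload = ((0).to_bytes(2, 'big') + (0).to_bytes(2, 'big')
           + (34_200_000_000_000).to_bytes(6, 'big') + b'Q')
message = message_fields['S']._make(unpack(fstring['S'], payload))

print(fstring['S'], calcsize(fstring['S']))
print(message.event_code, int.from_bytes(message.timestamp, 'big'))
```

The leading '>' selects big-endian byte order without alignment padding, so the size of the format string equals the sum of the field lengths.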
- Fields of the alpha type require post-processing as defined in the format_alpha function:
    def format_alpha(mtype, data):
        """Decode and clean the alpha (byte string) columns of one message type."""
        # encoding: dict of per-column integer codes, assumed to be defined elsewhere
        for col in alpha_formats.get(mtype).keys():
            # stock name only in summary message 'R'
            if mtype != 'R' and col == 'stock':
                data = data.drop(col, axis=1)
                continue
            data.loc[:, col] = data.loc[:, col].str.decode("utf-8").str.strip()
            if encoding.get(col):
                # int encoding
                data.loc[:, col] = data.loc[:, col].map(encoding.get(col))
        return data
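The effect of this post-processing can be illustrated on a single record without pandas. In the sketch below, the clean_alpha helper and the encoding mapping are hypothetical and serve only as an example. They are not necessarily the names or codes used in the notebook.

```python
# Hypothetical integer encoding for one alpha field (an assumption for illustration)
encoding = {'buy_sell_indicator': {'B': 1, 'S': -1}}

def clean_alpha(record, alpha_cols):
    """Decode byte strings, strip padding, and apply integer encodings."""
    cleaned = dict(record)                   # leave the input record untouched
    for col in alpha_cols:
        text = cleaned[col].decode('utf-8').strip()
        # Replace the text with its integer code if an encoding exists
        cleaned[col] = encoding[col][text] if col in encoding else text
    return cleaned

raw = {'stock': b'MSFT    ', 'buy_sell_indicator': b'S', 'shares': 200}
print(clean_alpha(raw, alpha_cols=['stock', 'buy_sell_indicator']))
```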
- The binary file for a single day contains over 300,000,000 messages and exceeds 9 GB in size. The script appends the parsed result iteratively to a file in the fast HDF5 format to avoid memory constraints (see the last section of this chapter for more on this format). The following (simplified) code processes the binary file and produces the parsed orders stored by message type:
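    # message_fields and fstring come from the previous step; unpack is struct.unpack;
    # messages, message_type_counter, and store_messages are assumed to be defined elsewhere
    with (data_path / file_name).open('rb') as data:
        while True:
            # the first two bytes contain the message size in bytes
            message_size = int.from_bytes(data.read(2), byteorder='big', signed=False)
            # the next byte identifies the message type
            message_type = data.read(1).decode('ascii')
            message_type_counter.update([message_type])
            # read and parse the remainder of the message
            record = data.read(message_size - 1)
            message = message_fields[message_type]._make(unpack(fstring[message_type], record))
            messages[message_type].append(message)

            # deal with system events like market open/close
            if message_type == 'S':
                timestamp = int.from_bytes(message.timestamp, byteorder='big')
                if message.event_code.decode('ascii') == 'C':  # close
                    store_messages(messages)
                    break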
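The framing logic of this loop (a two-byte size, a one-byte type, then the payload) can be tried out on an in-memory stream. The following sketch builds three synthetic System Event messages with a hypothetical frame helper and reads them back. Unlike the simplified loop above, it also stops when the stream is exhausted.

```python
from collections import Counter
from io import BytesIO
from struct import unpack

SYSTEM_EVENT = '>HH6ss'  # payload layout of an 'S' message without the type byte

def frame(event_code, timestamp_ns):
    """Build one length-prefixed System Event message (synthetic values)."""
    payload = (0).to_bytes(2, 'big') * 2 + timestamp_ns.to_bytes(6, 'big') + event_code
    body = b'S' + payload
    return len(body).to_bytes(2, 'big') + body

stream = BytesIO(frame(b'O', 1) + frame(b'Q', 2) + frame(b'C', 3))

counter, events = Counter(), []
while True:
    header = stream.read(2)
    if len(header) < 2:                       # end of stream
        break
    message_size = int.from_bytes(header, byteorder='big', signed=False)
    message_type = stream.read(1).decode('ascii')
    counter.update([message_type])
    record = stream.read(message_size - 1)
    if message_type == 'S':
        *_, timestamp, event_code = unpack(SYSTEM_EVENT, record)
        events.append((int.from_bytes(timestamp, 'big'), event_code.decode('ascii')))
        if event_code == b'C':                # end of messages
            break

print(counter, events)
```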
- As expected, a small number of the over 8,500 equity securities traded on this day account for most of the traded value:
    import matplotlib.pyplot as plt
    import pandas as pd
    from matplotlib.ticker import FuncFormatter

    with pd.HDFStore(hdf_store) as store:
        stocks = store['R'].loc[:, ['stock_locate', 'stock']]
        # combine non-cross (P) and cross (Q) trades; pd.concat replaces
        # DataFrame.append, which was removed in pandas 2.0
        trades = (pd.concat([store['P'],
                             store['Q'].rename(columns={'cross_price': 'price'})],
                            sort=False)
                  .merge(stocks))

    trades['value'] = trades.shares.mul(trades.price)
    trades['value_share'] = trades.value.div(trades.value.sum())
    trade_summary = (trades.groupby('stock')
                     .value_share.sum()
                     .sort_values(ascending=False))
    trade_summary.iloc[:50].plot.bar(figsize=(14, 6), color='darkblue',
                                     title='% of Traded Value')
    plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
The resulting plot shows the share of traded value for the 50 most-traded securities: