SiLK, the System for Internet-Level Knowledge, is a toolkit originally developed by Carnegie Mellon’s CERT to conduct large-scale NetFlow analysis. SiLK is now used extensively by the Department of Defense, academic institutions, and industry as a basic analytical toolkit.
This chapter focuses primarily on using SiLK as an analytical tool. The CERT Network Situational Awareness team has published extensive references on using SiLK, installing collectors, and setting up the suite.
SiLK is a suite of tools for querying and analyzing NetFlow data. The SiLK suite enables an analyst to rapidly and efficiently query very large volumes of network traffic in order to identify complex aggregate phenomena or extract individual events.
SiLK is effectively a database at the command line.
Each tool performs a specific query, manipulation, or aggregation of data, and commands are chained together to produce results. By chaining multiple tools together along pipes, SiLK enables the analyst to create complex commands that field data along multiple channels simultaneously. For example, the following sequence of SiLK queries pulls HTTP (port 80) traffic from flow data, producing a time series and a list of activity by busiest address. See Example 5-1 for the basics of SiLK operation: data is passed between commands through a series of pipes, which can be stdin, stdout, or fifos (named pipes).
$ mkfifo out2
$ rwfilter --proto=6 --aport=80 data.rwf --pass=stdout |
    rwfilter --input=stdin --proto=6 --pass=stdout --all=out2 |
    rwstats --top --count=10 --fields=1 & rwcount out2 --bin-size=300
Data is kept in an efficient binary representation until the last possible moment; it is converted only when a command that produces text (or some other optional output) is called.
SiLK is very much an old-school Unix application suite: a family of tools tied together with pipes and using a lot of optional arguments. By using this approach, it’s possible to create powerful analytic scripts with SiLK, because the tools have well-defined interfaces that will efficiently handle binary data. Effectively using SiLK involves connecting the appropriate tools together in order to process binary data and produce text only at the very end of the process.
This chapter also uses some basic Unix shell commands such as ls, cat, and head. I don’t require you to know the shell on an expert level.
The SiLK homepage is maintained at the CERT NetSA Security Suite web page. The SiLK package is available free for download, and can be installed on most Unix systems without much difficulty. The CERT also provides a live CD image that can be used on its own.
The SiLK live CD comes with a training dataset called LBNL-05, anonymized header traces from Lawrence Berkeley National Labs in 2005. If you install the live CD, the data will be immediately accessible. If not, you can fetch the data from the LBNL-05 reference data page.[5]
In addition to the live CD, SiLK is available in several package managers, including homebrew.
The LBNL datafiles are stored in a file hierarchy; Example 5-2 shows the results of downloading and unarchiving them.
$ gunzip -c SiLK-LBNL-05-noscan.tar
$ gunzip -c SiLK-LBNL-05-scanners.tar
$ cd SiLK-LBNL-05
$ ls
README-S0.txt in out silk.conf
README-S1.txt inweb outweb
$ ls in/2005/01/07/*.01
in/2005/01/07/in-S0_20050107.01 in/2005/01/07/in-S1_20050107.01
When collecting data, SiLK partitions the data into subdirectories that divide traffic by the type of traffic and the time the event occurred. This provides scalability and speeds up analysis. However, it’s also generally a black box, and one we’re breaking right now simply to have some files to work with. For the purposes of demonstration and education, we’re going to work with four specific files:
These files are not special in any way. I chose them just to provide examples of scan and nonscan traffic. The following data discusses how to partition data and what the filenames mean.
SiLK records are stored in a compact binary format. They can’t be read directly, and are instead accessed using the rwcut tool (see Example 5-3). In the following example, and any other examples with an output greater than 80 characters, the lines are manually broken for clarity.
$ rwcut inweb/2005/01/06/iw-S0_20050106.20 | more
sIP| dIP|sPort|dPort|pro| packets| bytes| flags|
    sTime| dur| eTime|sen|
148.19.251.179| 128.3.148.48| 2497| 80| 6| 16| 2631| FS PA |
    2005/01/06T20:01:54.119| 0.246|2005/01/06T20:01:54.365| ?|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 14| 2159| S PA |
    2005/01/06T20:01:54.160| 0.260|2005/01/06T20:01:54.420| ?|
...
In its default invocation, rwcut outputs 12 fields: source and destination IP addresses and ports, protocol, number of packets, number of bytes, TCP flags, start time, duration, end time, and sensor of a flow. These values have been discussed previously in Chapter 2, except for the sensor field. SiLK can be configured to identify individual sensors, which is useful when you’re trying to figure out where traffic came from or where it’s going. The sensor field is whatever ID is assigned during configuration. In the default data there are no sensors, so the value is set to a question mark (?).
All SiLK commands have built-in documentation. Typing rwcut --help brings up an enormous help page. We will cover the basic options. A fuller description of options can be found in the SiLK documentation for rwcut.
The most commonly used rwcut options select the fields displayed during invocation. rwcut can actually print 29 different fields, in arbitrary order. A list of these fields is in Table 5-1.
rwcut fields are specified using the --fields= option, which takes the numeric values in Table 5-1 or the string values, and prints the requested fields in the order specified, as in Example 5-4.
Field | Numeric ID | Description |
sIP | 1 | Source IP address |
dIP | 2 | Destination IP address |
sPort | 3 | Source port |
dPort | 4 | Destination Port: if ICMP, the ICMP type and code is encoded here also |
protocol | 5 | IP protocol number (for example, 6 for TCP, 17 for UDP) |
packets | 6 | Packets in the flow |
bytes | 7 | Bytes in the flow |
flags | 8 | OR of TCP flags |
sTime | 9 | Start time in seconds |
dur | 10 | Duration (eTime–sTime) |
eTime | 11 | End time in seconds |
sensor | 12 | Sensor ID |
in | 13 | SNMP ID of the incoming interface on the router |
out | 14 | SNMP ID of the outgoing interface on the router |
nhIP | 15 | Next hop address |
sType | 16 | Classification of the source address (internal, external) |
dType | 17 | Classification of the destination address (internal, external) |
scc | 18 | Country code of the source IP |
dcc | 19 | Country code of the destination IP |
class | 20 | Class of the flow |
type | 21 | Type of the flow |
sTime +msec | 22 | sTime in milliseconds |
eTime +msec | 23 | eTime in milliseconds |
dur +msec | 24 | duration msecs |
icmpTypeCode | 25 | ICMP type and code |
initialFlags | 26 | Flags in the first TCP packet |
sessionFlags | 27 | Flags in all packets except the first |
attributes | 28 | Attributes of the flow observed by the generator |
application | 29 | Guess as to the application in the flow |
$ # Show a limited set of fields
$ rwcut --field=1-5 inweb/2005/01/06/iw-S0_20050106.20 | head -2
sIP| dIP|sPort|dPort|pro|
148.19.251.179| 128.3.148.48| 2497| 80| 6|
$ # Note the -, now explicitly enumerate
$ rwcut --field=1,2,3,4,5 inweb/2005/01/06/iw-S0_20050106.20 | head -2
sIP| dIP|sPort|dPort|pro|
148.19.251.179| 128.3.148.48| 2497| 80| 6|
$ # Field order is based on what you enter in --field
$ rwcut --field=5,1,2,3,4 inweb/2005/01/06/iw-S0_20050106.20 | head -2
pro| sIP| dIP|sPort|dPort|
6| 148.19.251.179| 128.3.148.48| 2497| 80|
$ # We can use text instead of numbers
$ rwcut --field=sIP,dIP,proto inweb/2005/01/06/iw-S0_20050106.20 | head -2
sIP| dIP|pro|
148.19.251.179| 128.3.148.48| 6|
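To make the ordering rule concrete, here is a minimal Python sketch of how a --fields specification could be expanded into an ordered list of column names. It is a model of the behavior shown in Example 5-4, not SiLK's own parser: the function names are hypothetical, only the first nine fields of Table 5-1 are included for brevity, and resolving an unambiguous prefix such as proto is an assumption made so that the last command in the example resolves.

```python
# Names for the first nine numeric field IDs, copied from Table 5-1.
FIELD_NAMES = {1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
               6: "packets", 7: "bytes", 8: "flags", 9: "sTime"}


def resolve_name(token):
    """Map a textual field name (or unambiguous prefix) to its full name."""
    lowered = token.lower()
    matches = [n for n in FIELD_NAMES.values() if n.lower().startswith(lowered)]
    exact = [n for n in matches if n.lower() == lowered]
    if exact:
        return exact[0]
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"unknown or ambiguous field name: {token!r}")


def expand_fields(spec):
    """Expand a spec such as '1-5,9' or 'sIP,dIP,proto' into ordered names."""
    result = []
    for part in spec.split(","):
        part = part.strip()
        if part[0].isdigit():
            if "-" in part:
                low, high = part.split("-")
                numbers = range(int(low), int(high) + 1)
            else:
                numbers = [int(part)]
            result.extend(FIELD_NAMES[n] for n in numbers)
        else:
            result.append(resolve_name(part))
    return result


print(expand_fields("5,1,2,3,4"))
```

The output order follows the order of the specification, which is the point the third command in Example 5-4 makes.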
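rwcut supports a number of other output formatting and manipulation options. Some particularly useful ones, which let you control the lines that appear in the output, include:
--no-title
Drops the title line from the output.
--num-recs
Limits the number of records printed, eliminating the need for the head pipe in the previous example. The default value is zero, which makes rwcut dump the entire contents of whatever file it’s reading.
--start-rec-num and --end-rec-num
Print only the records whose record numbers fall within the specified range.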
Example 5-5 shows a few ways to manipulate record numbers and headers.
$ # Drop the title
$ rwcut --field=1-9 --no-title inweb/2005/01/06/iw-S0_20050106.20 | head -5
148.19.251.179| 128.3.148.48| 2497| 80| 6| 16| 2631|FS PA |2005/01/06T20:01:54.119|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 14| 2159| S PA |2005/01/06T20:01:54.160|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 2| 80|F A |2005/01/06T20:07:07.845|
56.71.233.157| 128.3.148.48|48906| 80| 6| 5| 300| S |2005/01/06T20:01:50.011|
56.96.13.225| 128.3.148.48|50722| 80| 6| 6| 360| S |2005/01/06T20:02:57.132|
$ # Drop the head statement
$ rwcut --field=1-9 inweb/2005/01/06/iw-S0_20050106.20 --num-recs=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
148.19.251.179| 128.3.148.48| 2497| 80| 6| 16| 2631|FS PA |2005/01/06T20:01:54.119|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 14| 2159| S PA |2005/01/06T20:01:54.160|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 2| 80|F A |2005/01/06T20:07:07.845|
56.71.233.157| 128.3.148.48|48906| 80| 6| 5| 300| S |2005/01/06T20:01:50.011|
56.96.13.225| 128.3.148.48|50722| 80| 6| 6| 360| S |2005/01/06T20:02:57.132|
$ # Print only the third through fifth record
$ rwcut --field=1-9 inweb/2005/01/06/iw-S0_20050106.20 --start-rec-num=3 --end-rec-num=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 2| 80|F A |2005/01/06T20:07:07.845|
56.71.233.157| 128.3.148.48|48906| 80| 6| 5| 300| S |2005/01/06T20:01:50.011|
56.96.13.225| 128.3.148.48|50722| 80| 6| 6| 360| S |2005/01/06T20:02:57.132|
A number of options manipulate output format. Tabulation is controllable with the --column-separator, --no-final-delimiter, and --no-columns switches. --column-separator will change the character used to distinguish columns, while --no-final-delimiter drops the delimiter at the end of the line. --no-columns removes any space padding between columns. The --delimited switch combines all three: it takes a character as an argument, uses that character as a column separator, removes all padding in the columns, and drops the final column separator.
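The interaction of these switches is easier to see in a small model. The Python sketch below is not how rwcut is implemented; format_row, delimited, and the WIDTHS list are hypothetical names and values chosen for illustration, and the function only mimics the three independent behaviors (separator character, padding, trailing separator) plus the combined shortcut.

```python
# Illustrative column widths; rwcut's real widths depend on the field type.
WIDTHS = [15, 15, 5, 5, 3]


def format_row(values, widths, separator="|", pad=True, final_separator=True):
    """Render one record, mimicking rwcut's column layout switches."""
    cells = [str(v).rjust(w) if pad else str(v) for v, w in zip(values, widths)]
    line = separator.join(cells)
    return line + separator if final_separator else line


def delimited(values, separator):
    """Model of the combined switch: new separator, no padding, no trailing one."""
    return format_row(values, [0] * len(values), separator,
                      pad=False, final_separator=False)


row = ["148.19.251.179", "128.3.148.48", 2497, 80, 6]
print(format_row(row, WIDTHS))             # padded, '|' separated, trailing '|'
print(format_row(row, WIDTHS, pad=False))  # padding removed
print(delimited(row, ":"))                 # everything changed at once
```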
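In addition, there are a variety of switches for changing column content:
--integer-ips
Prints IP addresses as integers instead of in dotted quad format. In SiLK v3, the equivalent is --ip-format=decimal.
--ip-format
The updated version of --integer-ips, --ip-format specifies how addresses are rendered. Options include canonical (dotted quad for IPv4, canonical IPv6 for IPv6), zero-padded (canonical, except that each value is padded with zeroes to the maximal width for its format, so 127.0.0.1 is 127.000.000.001), decimal (print as the corresponding 32-bit or 128-bit integer), hexadecimal (print the integer in hexadecimal format), and force-ipv6 (prints all addresses in canonical IPv6 format, including IPv4 addresses mapped to the ::ffff:0:0/96 netblock).
--epoch-time
Prints timestamps as seconds since the Unix epoch.
--integer-tcp-flags
Prints TCP flags as an integer instead of as letters.
--zero-pad-ips
Pads each quad of an IP address with zeroes. Replaced by --ip-format in SiLK v3.
--icmp-type-and-code
For ICMP flows, prints the ICMP type and code in the sPort and dPort columns.
--pager
Specifies the program used to page the output.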
Example 5-6 shows some of the preceding options.
$ # Change from fixed-width columns to delims
$ rwcut --field=1-5 inweb/2005/01/06/iw-S0_20050106.20 --no-columns --num-recs=2
sIP|dIP|sPort|dPort|protocol|
148.19.251.179|128.3.148.48|2497|80|6|
148.19.251.179|128.3.148.48|2498|80|6|
$ # Change the column separator
$ rwcut --field=1-5 inweb/2005/01/06/iw-S0_20050106.20 --column-sep=: --num-recs=2
sIP: dIP:sPort:dPort:pro:
148.19.251.179: 128.3.148.48: 2497: 80: 6:
148.19.251.179: 128.3.148.48: 2498: 80: 6:
$ # Use --delim to change everything at once
$ rwcut --field=1-5 inweb/2005/01/06/iw-S0_20050106.20 --delim=: --num-recs=2
sIP:dIP:sPort:dPort:protocol
148.19.251.179:128.3.148.48:2497:80:6
148.19.251.179:128.3.148.48:2498:80:6
$ # Convert IP addresses to integers
$ rwcut --field=1-5 inweb/2005/01/06/iw-S0_20050106.20 --integer-ip --num-recs=2
sIP| dIP|sPort|dPort|pro|
2484337587|2147718192| 2497| 80| 6|
2484337587|2147718192| 2498| 80| 6|
$ # Use epoch time
$ rwcut --field=1-5,9 inweb/2005/01/06/iw-S0_20050106.20 --epoch --num-recs=2
sIP| dIP|sPort|dPort|pro| sTime|
148.19.251.179| 128.3.148.48| 2497| 80| 6|1105041714.119|
148.19.251.179| 128.3.148.48| 2498| 80| 6|1105041714.160|
$ # Zero pad IP addresses
$ rwcut --field=1-5,9 inweb/2005/01/06/iw-S0_20050106.20 --zero-pad --num-recs=2
sIP| dIP|sPort|dPort|pro| sTime|
148.019.251.179|128.003.148.048| 2497| 80| 6|2005/01/06T20:01:54.119|
148.019.251.179|128.003.148.048| 2498| 80| 6|2005/01/06T20:01:54.160|
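The integer, zero-padded, and epoch renderings in Example 5-6 are plain conversions, and they can be reproduced with Python's standard library as a sanity check. This is a sketch with hypothetical helper names; treating the timestamps as UTC is an assumption, although it is the reading under which the epoch values in the example line up.

```python
import ipaddress
from datetime import datetime, timezone


def ip_decimal(addr):
    """Dotted quad to the corresponding 32-bit integer."""
    return int(ipaddress.ip_address(addr))


def ip_zero_padded(addr):
    """Pad every quad of an IPv4 address to three digits."""
    return ".".join(f"{int(quad):03d}" for quad in addr.split("."))


def epoch_time(stamp):
    """Convert a SiLK-style timestamp (assumed UTC) to epoch seconds."""
    parsed = datetime.strptime(stamp, "%Y/%m/%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


print(ip_decimal("148.19.251.179"))
print(ip_zero_padded("128.3.148.48"))
print(epoch_time("2005/01/06T20:01:54.119"))
```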
You will note that, as the command lines get more complex, I have truncated the longer options. SiLK uses GNU-style long options universally, so the only requirement for specifying an option is to type enough characters to make the name unambiguous. Expect more and more truncation as we build more and more complex commands.
The most basic SiLK command with analytical value is rwcut paired with rwfilter through a pipe. Example 5-7 shows a simple rwfilter command.
$ rwfilter --dport=80 inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-9 --num-recs=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
148.19.251.179| 128.3.148.48| 2497| 80| 6| 16| 2631|FS PA |2005/01/06T20:01:54.119|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 14| 2159| S PA |2005/01/06T20:01:54.160|
148.19.251.179| 128.3.148.48| 2498| 80| 6| 2| 80|F A |2005/01/06T20:07:07.845|
56.71.233.157| 128.3.148.48|48906| 80| 6| 5| 300| S |2005/01/06T20:01:50.011|
56.96.13.225| 128.3.148.48|50722| 80| 6| 6| 360| S |2005/01/06T20:02:57.132|
rwfilter with a single filter (the --dport option in this case), and a single redirect (the --pass=stdout) is about as simple as you can get. rwfilter is the workhorse of the SiLK suite: it reads input (directly from a file, using a set of globbing specifications, or through a pipe), applies one or more filters to each record in the data, and then redirects the records based on whether a record matches the filters (passes) or doesn’t match (fails).
SiLK’s rwfilter documentation is humongous, but primarily consists of repetitively describing the filter specifications for every field, so don’t be intimidated.
rwfilter options basically do one of three things: they specify how to filter data, how to read data, or how to direct the result of those filters.
The easiest filters to start with are --sport, --dport, and --protocol. As the names imply, they filter on the source port, destination port, and protocol, respectively (see Example 5-8). These values can filter on a specific value (e.g., --sport=80 will pass any traffic where the source port is 80), or a range specified with a dash or commas (so --sport=79-83 will pass anything where the source port is between 79 and 83 inclusive, and could be expressed as --sport=79,80,81,82,83).
$ rwfilter --dport=4350-4360 inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-9 --num-recs=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
218.131.115.42| 131.243.105.35| 80| 4360| 6| 2| 80|F A |2005/01/06T20:24:21.879|
148.19.96.160|131.243.107.239| 80| 4350| 6| 27| 35445|FS PA |2005/01/06T20:59:42.451|
148.19.96.160|131.243.107.239| 80| 4352| 6| 4| 709|FS PA |2005/01/06T20:59:42.507|
148.19.96.160|131.243.107.239| 80| 4351| 6| 15| 16938|FS PA |2005/01/06T20:59:42.501|
148.19.96.160|131.243.107.239| 80| 4353| 6| 4| 704|FS PA |2005/01/06T20:59:42.544|
$ rwfilter --sport=4000- inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-9 --num-recs=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
56.71.233.157| 128.3.148.48|48906| 80| 6| 5| 300| S |2005/01/06T20:01:50.011|
56.96.13.225| 128.3.148.48|50722| 80| 6| 6| 360| S |2005/01/06T20:02:57.132|
56.96.13.225| 128.3.148.48|50726| 80| 6| 6| 360| S |2005/01/06T20:02:57.432|
58.236.56.129| 128.3.148.48|32621| 80| 6| 3| 144| S |2005/01/06T20:12:10.747|
56.96.13.225| 128.3.148.48|54497| 443| 6| 6| 360| S |2005/01/06T20:09:30.124|
$ rwfilter --dport=4350,4352 inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-9 --num-recs=5
sIP| dIP|sPort|dPort|pro| packets| bytes| flags | sTime|
148.19.96.160|131.243.107.239| 80| 4350| 6| 27| 35445|FS PA |2005/01/06T20:59:42.451|
148.19.96.160|131.243.107.239| 80| 4352| 6| 4| 709|FS PA |2005/01/06T20:59:42.507|
148.19.96.160|131.243.107.239| 80| 4352| 6| 1| 40| A |2005/01/06T20:59:42.516|
$ rwfilter --proto=1 in/2005/01/07/in-S0_20050107.01 --pass=stdout |
    rwcut --field=1-6 --num-recs=2
sIP| dIP|sPort|dPort|pro| packets|
35.223.112.236| 128.3.23.93| 0| 2048| 1| 1|
62.198.182.170| 128.3.23.81| 0| 2048| 1| 1|
$ rwfilter --proto=1,6,17 in/2005/01/07/in-S0_20050107.01 --pass=stdout |
    rwcut --num-recs=2 --fields=1-6
sIP| dIP|sPort|dPort|pro| packets|
116.66.41.147|131.243.163.201| 4283| 1026| 17| 1|
116.66.41.147|131.243.163.201| 3131| 1027| 17| 1|
$ rwfilter --proto=1,6,17 in/2005/01/07/in-S0_20050107.01 --fail=stdout |
    rwcut --num-recs=2 --fields=1-6
sIP| dIP|sPort|dPort|pro| packets|
57.120.186.177| 128.3.26.171| 0| 0| 50| 70|
57.120.186.177| 128.3.26.171| 0| 0| 50| 81|
Note the use of --fail in the last example. Because there are 256 potential protocol values (0 through 255), specifying “everything but TCP, ICMP, and UDP” could be expressed in two ways: either by specifying everything you want (--proto=0,2-5,7-16,18-), or by using the --fail option. I’ll discuss more advanced manipulation of --pass and --fail in the next chapter.
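The comma-dash notation and the equivalence between an explicit list and a --fail complement can be checked with a short Python sketch. The helper names parse_ranges and to_spec are hypothetical, and treating an open-ended range such as 18- as running to the field's maximum is an assumption based on how the examples above behave.

```python
def parse_ranges(spec, maximum):
    """Expand '79-83', '4000-', or '1,6,17' into the set of matching values."""
    values = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            upper = int(high) if high else maximum  # open-ended range
            values.update(range(int(low), upper + 1))
        else:
            values.add(int(part))
    return values


def to_spec(values):
    """Collapse a non-empty set of integers back into comma-dash notation."""
    ordered = sorted(values)
    runs = []
    start = prev = ordered[0]
    for value in ordered[1:]:
        if value != prev + 1:
            runs.append((start, prev))
            start = value
        prev = value
    runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


# Everything that a --proto=1,6,17 filter fails, written out explicitly.
everything_else = set(range(256)) - parse_ranges("1,6,17", 255)
print(to_spec(everything_else))
```

The printed specification is the long form of the same set that the --fail channel delivers for free.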
Volume (size) options (bytes and packets) are similar to the protocol and port options in that you express them numerically. Unlike the enumerations (ports and protocol), these numeric values can be expressed only as single values or ranges, not as comma-separated values. So, --packets=70-81 is acceptable, but --bytes=1,2,3,4 is not.
The simplest form of IP address filtering simply expresses the IP address directly (see Example 5-9). The following examples show strict filtering on the source (--saddress) and destination (--daddress) address, and the --any-address option. --any-address will match either source or destination addresses.
$ rwfilter --saddress=197.142.156.83 --pass=stdout in/2005/01/07/in-S0_20050107.01 |
    rwcut --num-recs=2
sIP| dIP|sPort|dPort|pro| packets| bytes| flags|
    sTime| dur| eTime|sen|
197.142.156.83| 224.2.127.254|44510| 9875| 17| 12| 7163| |
    2005/01/07T01:24:44.359| 16.756|2005/01/07T01:25:01.115| ?|
197.142.156.83| 224.2.127.254|44512| 9875| 17| 4| 2590| |
    2005/01/07T01:25:02.375| 5.742|2005/01/07T01:25:08.117| ?|
$ rwfilter --daddress=128.3.26.249 --pass=stdout in/2005/01/07/in-S0_20050107.01 |
    rwcut --num-recs=2
sIP| dIP|sPort|dPort|pro| packets| bytes| flags|
    sTime| dur| eTime|sen|
211.210.215.142| 128.3.26.249| 4068| 25| 6| 7| 388|FS PA |
    2005/01/07T01:27:06.789| 5.052|2005/01/07T01:27:11.841| ?|
203.126.20.182| 128.3.26.249|51981| 4587| 6| 56| 2240|F A |
    2005/01/07T01:27:04.812| 18.530|2005/01/07T01:27:23.342| ?|
$ rwfilter --any-address=128.3.26.249 --pass=stdout in/2005/01/07/in-S0_20050107.01 |
    rwcut --num-recs=2
sIP| dIP|sPort|dPort|pro| packets| bytes| flags|
    sTime| dur| eTime|sen|
211.210.215.142| 128.3.26.249| 4068| 25| 6| 7| 388|FS PA |
    2005/01/07T01:27:06.789| 5.052|2005/01/07T01:27:11.841| ?|
203.126.20.182| 128.3.26.249|51981| 4587| 6| 56| 2240|F A |
    2005/01/07T01:27:04.812| 18.530|2005/01/07T01:27:23.342| ?|
Address options accept a variety of range descriptors. Each quad in an IP address can be expressed using the same comma-dash format that protocols and ports use. IP addresses will also accept the character x to mean 0-255. This expression can be used within each quad; SiLK will match each quad separately. In addition to this comma-dash format, SiLK can match on CIDR blocks.
SiLK supports IPv6 by using IPv6’s colon-based notation. The following are all examples of valid IPv6 filters in SiLK, and Example 5-10 shows how to filter them:
::ffff:x
::ffff:0:aaaa,0-5
::ffff:0.0.5-130,1,255.x
$ # Filtering on the last quad
$ rwfilter --daddress=131.243.104.x inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-5 --num-recs=5
sIP| dIP|sPort|dPort|pro|
150.52.105.212|131.243.104.181| 80| 1262| 6|
150.52.105.212|131.243.104.181| 80| 1263| 6|
59.100.39.174| 131.243.104.27| 80| 3188| 6|
59.100.39.174| 131.243.104.27| 80| 3191| 6|
59.100.39.174| 131.243.104.27| 80| 3193| 6|
$ # Filtering a range of specific values in the third quad
$ rwfilter --daddress=131.243.104,107,219.x inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-5 --num-recs=5
sIP| dIP|sPort|dPort|pro|
208.122.23.36|131.243.219.201| 80| 2473| 6|
205.233.167.250|131.243.219.201| 80| 2471| 6|
58.68.205.40| 131.243.219.37| 80| 3433| 6|
208.233.181.122| 131.243.219.37| 80| 3434| 6|
58.68.205.40| 131.243.219.37| 80| 3435| 6|
$ # Using CIDR blocks
$ rwfilter --saddress=56.81.0.0/16 inweb/2005/01/06/iw-S0_20050106.20 --pass=stdout |
    rwcut --field=1-5 --num-recs=5
sIP| dIP|sPort|dPort|pro|
56.81.19.218|131.243.219.201| 80| 2480| 6|
56.81.16.73|131.243.219.201| 80| 2484| 6|
56.81.16.73|131.243.219.201| 80| 2486| 6|
56.81.30.48|131.243.219.201| 443| 2490| 6|
56.81.31.159|131.243.219.201| 443| 2489| 6|
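The per-quad matching rule can be modeled in a few lines of Python. This is a simplified, IPv4-only sketch with hypothetical function names; it is meant to show that each quad is compared separately and that CIDR blocks are a distinct case, not to reproduce SiLK's address parser.

```python
import ipaddress


def quad_matches(pattern, value):
    """Match one quad against 'x', a single value, or a comma-dash list."""
    if pattern == "x":
        return True  # x stands for 0-255
    for part in pattern.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def address_matches(spec, addr):
    """Return True if the IPv4 address satisfies the wildcard or CIDR spec."""
    if "/" in spec:
        return ipaddress.ip_address(addr) in ipaddress.ip_network(spec)
    patterns = spec.split(".")
    quads = [int(quad) for quad in addr.split(".")]
    return len(patterns) == 4 and all(
        quad_matches(p, q) for p, q in zip(patterns, quads))


print(address_matches("131.243.104,107,219.x", "131.243.219.201"))
print(address_matches("56.81.0.0/16", "56.81.19.218"))
```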
There are three time options: --stime, --etime, and --active-time. These options require a time range, which in SiLK is written in the format:
YYYY/MM/DDTHH:MM:SS-YYYY/MM/DDTHH:MM:SS
Note the T separating the day and hour. The --stime and --etime options do exactly what they say on the can, which can be a bit counterintuitive; specifying --stime=2012/11/08T00:00:00-2012/11/08T00:02:00 passes any record whose start time is between midnight and two minutes after midnight on November 8, 2012. Records that started before midnight and are still being transmitted during that range will not pass. To find records that were active at any point within a particular period, use the --active-time filter.
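The difference between filtering on start time and filtering on activity is easy to state as two predicates. The Python sketch below uses hypothetical function names and assumes inclusive range endpoints; it models the behavior described above rather than SiLK's implementation.

```python
from datetime import datetime

FORMAT = "%Y/%m/%dT%H:%M:%S"


def parse_range(spec):
    """Split 'start-end' into two datetimes (the dates contain no dashes)."""
    start_text, end_text = spec.split("-")
    return datetime.strptime(start_text, FORMAT), datetime.strptime(end_text, FORMAT)


def passes_stime(flow_start, flow_end, spec):
    """Pass only if the flow *started* inside the range."""
    low, high = parse_range(spec)
    return low <= flow_start <= high


def passes_active_time(flow_start, flow_end, spec):
    """Pass if any part of the flow overlaps the range."""
    low, high = parse_range(spec)
    return flow_start <= high and flow_end >= low


window = "2012/11/08T00:00:00-2012/11/08T00:02:00"
# A flow that began 30 seconds before midnight and ran for 90 seconds.
start = datetime(2012, 11, 7, 23, 59, 30)
end = datetime(2012, 11, 8, 0, 1, 0)
print(passes_stime(start, end, window), passes_active_time(start, end, window))
```

The flow in the usage example is still transmitting during the window, so only the activity predicate accepts it.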
Flows are aggregates of packets, and in the majority of cases, this aggregation is relatively easy to understand. For example, the number of bytes in a flow is the sum of the number of bytes of all the packets that comprise the flow. TCP flags, however, are a bit more problematic. In NetFlow v5, a flow’s flags are the bitwise OR of the flags in its constituent packets—meaning that a flow indicates that a flag was present or absent in the entire flow, but not where. A flow could conceivably consist of a gibberish sequence of flags such as a FIN, then an ACK and SYN. Monitoring software such as YAF expands NetFlow to include additional flag fields, which SiLK can take advantage of.
The core flag filtering switches are --flags-initial, --flags-all, and --flags-session. These options accept flags in the form high flags/mask flags. If a flag is listed in the mask, SiLK always examines it. If a flag is listed in the high flags, SiLK passes the record only if that flag is high. The flags themselves are expressed using the characters in Table 5-2.
The combination of high flags and mask flags tends to confuse people, so let’s review some examples. Remember that the basic rule is that in order to evaluate a flag, it must be in the mask. A flag specified as high but not specified in the mask will be ignored.
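One way to internalize the rule is to write it as a bitmask comparison. The Python sketch below is a model, not SiLK code: the single-letter flag characters (F, S, R, P, A, U, E, C) and their bit values follow the standard TCP header layout and are an assumption here, and flags_match is a hypothetical name.

```python
# Assumed single-letter codes, using the TCP header bit positions.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def to_bits(letters):
    """Turn a string such as 'FS PA' into a bitmask (spaces are ignored)."""
    bits = 0
    for letter in letters.replace(" ", ""):
        bits |= FLAG_BITS[letter]
    return bits


def flags_match(flow_flags, spec):
    """Evaluate a 'high/mask' spec: only flags in the mask are examined."""
    high, mask = spec.split("/")
    mask_bits = to_bits(mask)
    return (to_bits(flow_flags) & mask_bits) == (to_bits(high) & mask_bits)


print(flags_match("S", "S/SA"))      # SYN high, ACK low: passes
print(flags_match("SA", "S/SA"))     # ACK is high but must be low: fails
print(flags_match("FS PA", "S/S"))   # only SYN is examined: passes
print(flags_match("S", "SA/S"))      # ACK is high-listed but unmasked: ignored
```

Under this model, S/SA reads as "SYN set and ACK clear", while S/S reads as "SYN set, everything else irrelevant".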
In addition to these options, SiLK provides a set of flag-specific options in the form of --syn-flag, --fin-flag, and so on for each potential flag. These options take a 1 or 0 as an argument: setting the value to 1 will pass records where the flag is high, 0 will pass records where the flag is low, and not including the option will pass all records.
If you compare rwfilter’s option-based filtering against tcpdump’s BPF filtering, it’s immediately obvious that rwfilter’s approach is much more primitive. This was an intentional decision: rwfilter is focused on processing large volumes as quickly as possible, and the overhead involved in processing some kind of parseable language was deemed too expensive.
The place where this usually trips people up is the lack of obvious not and or operators. For example, if you want to filter out all web sessions, you may try to filter traffic where one port is 80, and the other is ephemeral. The initial attempt might be:
rwfilter --sport=80,1024-65535 --dport=80,1024-65535 --pass=stdout
The problem is that this will also pass any flows where the source and destination port are both 80, and flows where the source and destination port are both ephemeral. To deal with problems like this, rwfilter has a collection of helper functions, which combined with the --fail option and multiple filters should be able to address any of these problems.
In the case of ports, the --aport option refers to either the source or the destination port. Using --aport and two filters, you can identify the appropriate sessions as follows:
rwfilter --aport=80 --pass=stdout | rwfilter --input-pipe=stdin --aport=1024-65535 --pass=stdout
The first filter identifies anything engaged in port 80 traffic, and the second takes that set and identifies anything that also used an ephemeral port.
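The difference between the naive single filter and the two chained --aport filters can be verified by writing both as predicates over a flow's port pair. This Python sketch uses hypothetical function names and models only the port logic described above.

```python
EPHEMERAL = range(1024, 65536)


def naive_filter(sport, dport):
    """Model of --sport=80,1024-65535 --dport=80,1024-65535."""
    def allowed(port):
        return port == 80 or port in EPHEMERAL
    return allowed(sport) and allowed(dport)


def chained_filter(sport, dport):
    """Model of --aport=80 piped into --aport=1024-65535."""
    first = sport == 80 or dport == 80
    second = sport in EPHEMERAL or dport in EPHEMERAL
    return first and second


for ports in [(48906, 80), (80, 80), (5000, 6000)]:
    print(ports, naive_filter(*ports), chained_filter(*ports))
```

Only the first port pair is a web session; the naive filter accepts all three, while the chained version accepts just that one.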
A number of IP address helper options are available. --any-address filters across source and destination addresses simultaneously. --not-saddress and --not-daddress pass records with addresses that don’t match the option specification.
rwfilter has a couple of direct text output options: --print-stat (see Example 5-11) and --print-volume-stat. These can be used to print a summary of the traffic without having to resort to cut, count, or other display tools. They also will print volumes of records that did not pass a filter.
$ rwfilter --print-volume-stat in/2005/01/07/in-S0_20050107.01 --proto=0-255
     | Recs| Packets| Bytes| Files|
Total| 2019| 2730488| 402105501| 1|
 Pass| 2019| 2730488| 402105501| |
 Fail| 0| 0| 0| |
$ rwfilter --print-stat in/2005/01/07/in-S0_20050107.01 --proto=0-255
Files 1. Read 2019. Pass 2019. Fail 0.
Note in Example 5-11 the use of the --proto=0-255 option. In almost all invocations, rwfilter expects some form of filtering applied to it, so when you need a filter that passes everything, the easiest approach is just to specify all the protocols. --print-stat and --print-volume-stat output to stderr, so you can still use stdout for pass, fail, and all channels.
Like rwcut, rwfilter has record limit options. --max-pass-records and --max-fail-records can be used to limit the number of records passed through a pass or fail channel.
SiLK filter files contain a fair amount of metadata, which can be accessed using the rwfileinfo command (see Example 5-12). rwfileinfo can work with files, as seen in the examples below, or directly on stdin by using stdin or - as an argument.
$ rwfileinfo in/2005/01/07/in-S0_20050107.01
in/2005/01/07/in-S0_20050107.01:
  format(id)          FT_RWAUGMENTED(0x14)
  version             2
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       28
  record-length       28
  record-version      2
  silk-version        0
  count-records       2019
  file-size           56560
  packed-file-info    2005/01/07T01:00:00 ? ?
$ rwfilter --print-stat in/2005/01/07/in-S0_20050107.01 --proto=6 --pass=example.rwf
Files 1. Read 2019. Pass 1353. Fail 666.
$ rwfileinfo example.rwf
example.rwf:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       156
  record-length       52
  record-version      5
  silk-version        2.1.0
  count-records       1353
  file-size           70512
  command-lines
    1 rwfilter --print-stat --proto=6 --pass=example.rwf in/2005/01/07/in-S0_20050107.01
$ rwfilter --aport=25 example.rwf --pass=example2.rwf --fail=example2_fail.rwf
$ rwfileinfo example2.rwf
example2.rwf:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       208
  record-length       52
  record-version      5
  silk-version        2.1.0
  count-records       95
  file-size           5148
  command-lines
    1 rwfilter --print-stat --proto=6 --pass=example.rwf in/2005/01/07/in-S0_20050107.01
    2 rwfilter --aport=25 --pass=example2.rwf --fail=example2_fail.rwf example.rwf
The fields reported by rwfileinfo are as follows:
example2.rwf
The first line of every rwfileinfo dump is the name of the file.
format(id)
The format value is a C macro describing the type of the file, followed by the hexadecimal ID of that type.
version
byte-order
compression(id)
header-length
record-length
record-version
silk-version
count-records
file-size
command-lines
Example 5-13 shows how to use the --note-add option.
$ rwfilter --aport=22 example.rwf --note-add='Filtering ssh' --pass=ex2.rwf
$ rwfileinfo ex2.rwf
ex2.rwf:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       260
  record-length       52
  record-version      5
  silk-version        2.1.0
  count-records       10
  file-size           780
  command-lines
    1 rwfilter --print-stat --proto=6 --pass=example.rwf in/2005/01/07/in-S0_20050107.01
    2 rwfilter --aport=22 --note-add=Filtering ssh --pass=ex2.rwf example.rwf
  annotations
    1 Filtering ssh
rwcount can produce time series data from the output of an rwfilter command. It works by placing counts of bytes, packets, and flow records into fixed-duration bins, which are equally sized time periods specified by the user. rwcount is a relatively straightforward application. Most of its complexity comes from relating the flows, which themselves have a duration, to the bins.
The simplest invocation of rwcount is shown in Example 5-14. The first thing to notice is the use of the --bin-size option. In this example, the bins are half an hour, or 1,800 seconds. If --bin-size isn’t specified, rwcount will default to 30-second bins. Bin sizes don’t have to be integers; floating-point specifications with a resolution down to the millisecond are acceptable for people who like lots of bins in their output.
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout | rwcount --bin-size=1800
Date| Records| Bytes| Packets|
2005/01/07T01:00:00| 257.58| 42827381.72| 248724.14|
2005/01/07T01:30:00| 1589.61| 211453506.60| 1438751.93|
2005/01/07T02:00:00| 171.81| 147824612.67| 1043011.93|
As Example 5-14 shows, rwcount outputs four columns: a date column in SiLK’s standard date format (YYYY/MM/DDTHH:MM:SS), followed by record, byte, and packet columns. The floating-point values are a function of rwcount interpolating how much traffic should be in each bin; rwcount calls this a load scheme.
The load scheme is an attempt by rwcount to approximate how much of a flow took place over the period specified by the bins. In the default load scheme, rwcount splits each flow proportionally across all the bins during which the flow was taking place. For example, if a flow takes place from 00:04:00 to 00:11:00, and bins are five minutes long, 1/7 of the flow will be added to the first (00:00:00-00:04:59) bin, 5/7 to the second bin (00:05:00-00:09:59), and 1/7 to the third (00:10:00-00:14:59) bin. rwcount takes an integer parameter in the --load-scheme option, with the following results:
0 | Split the traffic evenly across all bins covered. In the previous example, the flow would be split into thirds, and a third added to each bin. |
1 | Add the entire flow to the first bin covered by the flow. In the previous example, 00:00:00-00:04:59. |
2 | Add the entire flow to the last bin covered by the flow. In the previous example, 00:10:00-00:14:59. |
3 | Add the entire flow to the middle bin covered by the flow. In the previous example, 00:05:00-00:09:59. |
4 | The default load scheme: split the flow across the bins in proportion to the time it spends in each, as described above. |
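The five schemes can be worked through for the 00:04:00 to 00:11:00 flow with a short Python sketch. The function bin_shares is hypothetical, times are integer seconds since midnight, and taking the "middle" bin to be the one containing the flow's midpoint is an assumption; the sketch models the table above rather than rwcount's internals.

```python
from fractions import Fraction


def bin_shares(start, end, bin_size, scheme=4):
    """Return {bin_start_second: fraction_of_flow} for a single flow."""
    first = (start // bin_size) * bin_size
    # The last covered bin is the one holding the final second of the flow.
    last = max(first, ((end - 1) // bin_size) * bin_size)
    bins = list(range(first, last + bin_size, bin_size))
    if scheme == 0:    # even split across every covered bin
        return {b: Fraction(1, len(bins)) for b in bins}
    if scheme == 1:    # everything in the first bin
        return {bins[0]: Fraction(1)}
    if scheme == 2:    # everything in the last bin
        return {bins[-1]: Fraction(1)}
    if scheme == 3:    # everything in the bin holding the midpoint
        middle = ((start + end) // 2 // bin_size) * bin_size
        return {middle: Fraction(1)}
    if scheme != 4:
        raise ValueError("scheme must be between 0 and 4")
    duration = end - start
    if duration == 0:  # zero-length flow: nothing to apportion
        return {first: Fraction(1)}
    shares = {}
    for b in bins:     # proportional to the seconds spent in each bin
        overlap = min(end, b + bin_size) - max(start, b)
        shares[b] = Fraction(overlap, duration)
    return shares


# Flow from 00:04:00 (240 s) to 00:11:00 (660 s) with five-minute bins.
for scheme in range(5):
    print(scheme, bin_shares(240, 660, 300, scheme))
```

Multiplying a flow's bytes, packets, or record count by these fractions gives the non-integer values seen in rwcount's output.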
rwcount uses the flow data provided to guess which time bins are required, but sometimes you have to explicitly specify the time, especially when coordinating multiple files. This can be done using the --start-epoch and --end-epoch options to specify starting and ending bin times. Note that these parameters can use the epoch time or yyyy/mm/dd:HH:MM:SS format. rwcount also has an option to print dates using epoch time: the --epoch-slots option.
The --skip-zero option (see Example 5-15) is one of a number of output format options. Normally, rwcount prints every bin it has allocated, including empty ones, but --skip-zero causes empty bins to be omitted from the output. In addition, rwcount supports many of the output options mentioned for rwcut: --no-titles, --no-columns, --column-separator, --no-final-delimiter, and --delimited.
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout |
    rwcount --bin-size=1800.00 --epoch
Date| Records| Bytes| Packets|
1105059600| 257.58| 42827381.72| 248724.14|
1105061400| 1589.61| 211453506.60| 1438751.93|
1105063200| 171.81| 147824612.67| 1043011.93|
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout |
    rwcount --bin-size=1800.00 --epoch --start-epoch=1105057800
Date| Records| Bytes| Packets|
1105057800| 0.00| 0.00| 0.00|
1105059600| 257.58| 42827381.72| 248724.14|
1105061400| 1589.61| 211453506.60| 1438751.93|
1105063200| 171.81| 147824612.67| 1043011.93|
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout |
    rwcount --bin-size=1800.00 --epoch --start-epoch=1105056000
Date| Records| Bytes| Packets|
1105056000| 0.00| 0.00| 0.00|
1105057800| 0.00| 0.00| 0.00|
1105059600| 257.58| 42827381.72| 248724.14|
1105061400| 1589.61| 211453506.60| 1438751.93|
1105063200| 171.81| 147824612.67| 1043011.93|
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout |
    rwcount --bin-size=1800.00 --epoch --start-epoch=1105056000 --skip-zero
Date| Records| Bytes| Packets|
1105059600| 257.58| 42827381.72| 248724.14|
1105061400| 1589.61| 211453506.60| 1438751.93|
1105063200| 171.81| 147824612.67| 1043011.93|
IP sets are SiLK’s most powerful capability, and something that distinguishes the toolkit from most other analytical tools. An IP set is a binary representation of an arbitrary collection of IP addresses. IP sets can be created from text files, from SiLK data, or by using other binary SiLK structures.
The easiest way to start with IP sets is to create one, as in Example 5-16.
$ rwfilter in/2005/01/07/in-S0_20050107.01 --all=stdout |
    rwset --sip-file=sip.set --dip-file=dip.set
$ ls -l *.set
-rw-r--r--  1 mcollins  staff    580 Jan 10 01:06 dip.set
-rw-r--r--  1 mcollins  staff  15088 Jan 10 01:06 sip.set
$ rwsetcat sip.set | head -5
0.0.0.0
32.16.40.178
32.24.41.181
32.24.215.49
32.30.13.177
$ rwfileinfo sip.set
sip.set:
  format(id)          FT_IPSET(0x1d)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       76
  record-length       1
  record-version      2
  silk-version        2.1.0
  count-records       15012
  file-size           15088
  command-lines
                   1  rwset --sip-file=sip.set --dip-file=dip.set
rwset takes flow records and produces up to four output files. The file specified with --sip-file will contain source IP addresses from the flow, --dip-file will contain destination addresses, --any-file will contain source and destination IP addresses, and --nhip-file will contain next hop addresses. The output is binary and read with rwsetcat, and as with all SiLK files, the file can be examined using rwfileinfo.
The power of IP sets comes when they’re combined with rwfilter. rwfilter has eight options that accept IP sets (--sipset, --dipset, --nhipset, --anyset, and their negations, such as --not-dipset). Sets are explicitly designed so rwfilter can rapidly query using them, enabling a variety of useful queries, as seen in Example 5-17.
$ # First, we create IP sets; I use aport=123 (NTP on UDP) to filter down
$ # to a reasonable set of addresses. NTP clients and servers use the same
$ # port.
$ rwfilter in/2005/01/07/in-S0_20050107.01 --pass=stdout --aport=123 |
    rwset --sip-file=sip.set --dip-file=dip.set
$ # Now, let's see how many IP addresses are created
$ rwsetcat --count-ip sip.set
15
$ # Generating output using rwfilter; note the use of sip.set as the
$ # --dipset; this means that I'm now looking for messages that responded to
$ # these addresses. This means that I've seen ntp going to and from the
$ # address, meaning it's likely to be a legitimate speaker, as opposed to a
$ # scan on port 123.
$ rwfilter out/2005/01/07/out-S0_20050107.01 --dipset=sip.set --pass=stdout --aport=123 |
    rwcut | head -5
         sIP|            dIP|sPort|dPort|pro| packets| bytes| flags|                  sTime|   dur|                  eTime|sen|
128.3.23.152|    56.7.90.229|  123|  123| 17|       1|    76|      |2005/01/07T01:10:00.520| 0.083|2005/01/07T01:10:00.603|  ?|
128.3.23.152|  192.41.221.11|  123|  123| 17|       1|    76|      |2005/01/07T01:10:15.519| 0.000|2005/01/07T01:10:15.519|  ?|
128.3.23.231| 87.221.134.185|  123|  123| 17|       1|    76|      |2005/01/07T01:24:46.251| 0.005|2005/01/07T01:24:46.256|  ?|
128.3.26.152| 58.243.214.183|  123|10123| 17|       1|    76|      |2005/01/07T01:27:08.854| 0.000|2005/01/07T01:27:08.854|  ?|
$ # Let's look at statistics; using the same file, I look at the hosts
$ # that responded
$ rwfilter out/2005/01/07/out-S0_20050107.01 --dipset=sip.set --aport=123 --print-stat
Files 1. Read 12393. Pass 21. Fail 12372.
$ # Now I look at everyone else; not-dipset means that I'm looking at everything
$ # on port 123 that doesn't go to these addresses.
$ rwfilter out/2005/01/07/out-S0_20050107.01 --not-dipset=sip.set --aport=123 --print-stat
Files 1. Read 12393. Pass 337. Fail 12056.
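The set-based filtering in Example 5-17 amounts to a membership test on each record. The following Python sketch models that idea under simplifying assumptions: the function name and the sample records are hypothetical, the set contents are stand-ins rather than the real contents of sip.set, and the port test from the example is left out.

```python
from ipaddress import ip_address

def partition_by_dipset(records, ipset, negate=False):
    """Split records into (passed, failed) by whether the destination is in ipset.

    negate=True models the negated form of the option (--not-dipset).
    """
    passed, failed = [], []
    for rec in records:
        hit = rec["dip"] in ipset
        (passed if hit != negate else failed).append(rec)
    return passed, failed

# Hypothetical addresses standing in for the contents of sip.set
sip_set = {ip_address("56.7.90.229"), ip_address("192.41.221.11")}

# Hypothetical outbound records; only the address fields are modeled
records = [
    {"sip": ip_address("128.3.23.152"), "dip": ip_address("56.7.90.229")},
    {"sip": ip_address("128.3.23.152"), "dip": ip_address("192.41.221.11")},
    {"sip": ip_address("128.3.23.231"), "dip": ip_address("10.1.1.1")},
]

passed, failed = partition_by_dipset(records, sip_set)
print(f"Read {len(records)}. Pass {len(passed)}. Fail {len(failed)}.")

# The negated filter passes exactly the records the plain filter failed
inv_passed, inv_failed = partition_by_dipset(records, sip_set, negate=True)
```

A hash-based set makes each membership test cheap, which is the same property that lets a set file be applied to very large numbers of flow records.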
Sets can also be generated by hand using rwsetbuild, which takes text input and produces a set file as the output. The rwsetbuild specification takes any of the IP address specifications used by the --saddress option in rwfilter: literal addresses, integers, ranges within dotted quads, and netmasks. Example 5-18 demonstrates this.
$ cat > setsample.txt
# Comments in set files are prefaced with a hashmark
# Literal address
255.230.1.1
# Note that I'm putting addresses in some semi-random order; the output
# will be ordered.
111.2.3-4.1-2
# Netmask
22.11.1.128/30
^D
$ rwsetbuild setsample.txt setsample.set
$ rwsetcat --print-ip setsample.set
22.11.1.128
22.11.1.129
22.11.1.130
22.11.1.131
111.2.3.1
111.2.3.2
111.2.4.1
111.2.4.2
255.230.1.1
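To see how these specifications expand into individual addresses, here is a minimal Python sketch. It is an illustration, not the parser rwsetbuild uses: the function names are hypothetical, and it covers only literals, ranges within dotted quads, and CIDR blocks (integer addresses are not handled).

```python
from ipaddress import ip_address, ip_network
from itertools import product

def expand_spec(spec):
    """Expand one address specification into a set of IPv4 addresses."""
    if "/" in spec:
        # CIDR block: every address in the block, including both ends
        return set(ip_network(spec, strict=False))
    octets = []
    for part in spec.split("."):
        # "3-4" becomes range(3, 5); a plain "3" becomes range(3, 4)
        lo, _, hi = part.partition("-")
        octets.append(range(int(lo), int(hi or lo) + 1))
    return {ip_address(".".join(map(str, combo))) for combo in product(*octets)}

def build_set(lines):
    """Parse set-file text, skipping # comments, and return sorted addresses."""
    addrs = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            addrs |= expand_spec(line)
    return sorted(addrs)

text = """\
# Literal address
255.230.1.1
111.2.3-4.1-2
# Netmask
22.11.1.128/30
"""
for addr in build_set(text.splitlines()):
    print(addr)
```

The /30 block covers four consecutive addresses (.128 through .131), and the range specification 111.2.3-4.1-2 expands to the four combinations of its third and fourth octets.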
Sets can also be manipulated using the rwsettool command, which provides a variety of mechanisms for adding and removing sets. rwsettool supports four manipulations:
--union
--intersect
--difference
--sample
rwsettool is generally invoked using an output path (--output=file), but if nothing is specified, it will dump to stdout. As with rwfilter, rwsettool output is binary, so a pure terminal dump triggers an error. Example 5-19 shows a manipulation with rwsettool.
$ rm setsample2.set
$ cat > setsample2.txt
# Build a set that covers our original setsample file to
# see what happens with various functions
22.11.1.128/29
^D
$ rwsetbuild setsample2.txt setsample2.set
$ rwsettool --union setsample.set setsample2.set | rwsetcat
22.11.1.128
22.11.1.129
22.11.1.130
22.11.1.131
22.11.1.132
22.11.1.133
22.11.1.134
22.11.1.135
111.2.3.1
111.2.3.2
111.2.4.1
111.2.4.2
255.230.1.1
$ rwsettool --intersect setsample.set setsample2.set | rwsetcat
22.11.1.128
22.11.1.129
22.11.1.130
22.11.1.131
$ rwsettool --difference setsample.set setsample2.set | rwsetcat
111.2.3.1
111.2.3.2
111.2.4.1
111.2.4.2
255.230.1.1
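The union, intersection, and difference operations behave like ordinary set algebra. As a rough cross-check, the same two sets can be rebuilt with Python's built-in set type; the variable names are illustrative, and this models only the semantics, not the binary set format.

```python
from ipaddress import ip_address, ip_network

# setsample: a /30 block plus five individual addresses
sample = set(ip_network("22.11.1.128/30")) | {
    ip_address(a)
    for a in ("111.2.3.1", "111.2.3.2", "111.2.4.1", "111.2.4.2", "255.230.1.1")
}
# setsample2: a /29 block that covers the /30 above
sample2 = set(ip_network("22.11.1.128/29"))

union = sorted(sample | sample2)        # models --union
intersect = sorted(sample & sample2)    # models --intersect
difference = sorted(sample - sample2)   # models --difference

for name, addrs in (("union", union), ("intersect", intersect),
                    ("difference", difference)):
    print(name, len(addrs), [str(a) for a in addrs])
```

Because the /29 contains the /30, the intersection is exactly the /30, and the difference leaves only the addresses outside 22.11.1.128/29.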
rwuniq is the utility knife of counting tools. It allows an analyst to specify a key containing one or more fields, and will then count a number of different values, including total number of bytes, packets, flow records, or unique IP addresses matching the key.
rwuniq’s default configuration counts the number of flows that occurred for a particular key. The key itself must be specified using the --field option, which accepts the field specifiers in Table 5-1. rwuniq can accept multiple fields, and the key will be generated in the order specified in the command line. Example 5-20 demonstrates the key features of the --field option. As it shows, field order in the option affects field ordering in the output.
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sip,proto | head -4
            sIP|pro|   Records|
 131.243.142.85| 17|         1|
131.243.141.187| 17|         6|
    128.3.23.41| 17|         4|
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=1,2 | head -4
            sIP|            dIP|   Records|
  128.3.174.158|    128.3.23.44|         2|
    128.3.191.1|239.255.255.253|         8|
   128.3.161.98|131.243.163.206|         1|
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sip,sport | head -4
            sIP|sPort|   Records|
 131.243.63.143|53504|         1|
 131.243.219.52|61506|         1|
131.243.163.206| 1032|         1|
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sport,sip | head -4
sPort|            sIP|   Records|
55876|  131.243.61.70|         1|
51864|131.243.103.106|         1|
50955| 131.243.103.13|         1|
Also, note that when the field order changes, the order in which records are output also changes. rwuniq does not guarantee record ordering by default; output can be sorted by using the --sort-output option.
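Conceptually, rwuniq groups records by a tuple built from the key fields, in the order given. A minimal Python sketch of that grouping follows; the function name and the sample records are hypothetical, and sorting here compares addresses as plain strings, which is a simplification.

```python
from collections import Counter

def uniq(records, fields, sort_output=False):
    """Count flow records per key; fields sets both the key and column order."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    items = list(counts.items())
    if sort_output:
        # Simplification: addresses are strings, so this is not numeric IP order
        items.sort()
    return items

# Hypothetical flow records with only the fields used below
records = [
    {"sip": "131.243.61.70", "sport": 55876, "proto": 17},
    {"sip": "131.243.61.70", "sport": 55876, "proto": 17},
    {"sip": "128.3.23.41", "sport": 123, "proto": 17},
]

for key, n in uniq(records, ["sport", "sip"], sort_output=True):
    print("|".join(str(k) for k in key), n, sep="|")
```

Swapping the field list to ["sip", "sport"] produces the same counts with the key columns reversed, mirroring the last two commands in Example 5-20.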
rwuniq provides a number of count switches that instruct it to count additional values (see Example 5-21). The counting switches are --bytes, --packets, --flows, --sip-distinct, and --dip-distinct. Each of these switches can be used on its own or with a threshold (e.g., --bytes, --bytes=10, or --bytes=10-100). A single-value threshold (--bytes=10) provides a minimum, while a two-value threshold (--bytes=10-100) provides a range with a minimum and maximum. If you don’t specify an argument, then the switch returns all values.
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sport,sip --bytes --packets | head -5
sPort|            sIP|     Bytes|   Packets|
55876|  131.243.61.70|       308|         4|
51864|131.243.103.106|       308|         4|
50955| 131.243.103.13|       308|         4|
56568|  128.3.212.145|       360|         5|
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sport,sip --bytes --packets=8 | head -5
sPort|            sIP|     Bytes|   Packets|
    0| 131.243.30.224|      2520|        30|
  959|   128.3.215.60|       876|        19|
 2315|131.243.124.237|       608|         8|
56838| 131.243.61.187|       616|         8|
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwuniq --field=sport,sip --bytes --packets=8-20 | head -5
sPort|            sIP|     Bytes|   Packets|
  959|   128.3.215.60|       876|        19|
 2315|131.243.124.237|       608|         8|
56838| 131.243.61.187|       616|         8|
  514|   128.3.97.166|      2233|        20|
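The threshold syntax can be summarized as "no argument means everything, one value is a minimum, two values are an inclusive range." The sketch below models that rule in Python; the helper names are hypothetical and the rows are taken from Example 5-21.

```python
def parse_threshold(spec):
    """Turn '', '8', or '8-20' into an inclusive (minimum, maximum) pair."""
    if not spec:
        return 0, float("inf")           # no argument: keep every value
    lo, _, hi = spec.partition("-")
    return int(lo), (int(hi) if hi else float("inf"))

def apply_threshold(rows, column, spec):
    """Keep only the rows whose value in column falls inside the threshold."""
    lo, hi = parse_threshold(spec)
    return [row for row in rows if lo <= row[column] <= hi]

# Aggregated rows copied from Example 5-21
rows = [
    {"sport": 0, "sip": "131.243.30.224", "bytes": 2520, "packets": 30},
    {"sport": 959, "sip": "128.3.215.60", "bytes": 876, "packets": 19},
    {"sport": 2315, "sip": "131.243.124.237", "bytes": 608, "packets": 8},
    {"sport": 55876, "sip": "131.243.61.70", "bytes": 308, "packets": 4},
]

for spec in ("", "8", "8-20"):
    kept = apply_threshold(rows, "packets", spec)
    print(repr(spec), [row["packets"] for row in kept])
```

Note that the threshold is applied to the aggregated count for each key, not to individual flow records.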
The last set of tools to discuss in this chapter is the bag tools. A bag is a form of storage structure. It contains a key (which can be an IP address, a port, the protocol, or an interface index), and a count of values for that key. Bags can be created from scratch or from flow data using the rwbag command (see Example 5-22).
$ rwfilter out/2005/01/07/out-S0_20050107.01 --all=stdout |
    rwbag --sip-bytes=sip_bytes.bag
$ rwbagcat sip_bytes.bag | head -5
     128.3.2.16|            10026403|
     128.3.2.46|               27946|
     128.3.2.96|              218605|
     128.3.2.98|                 636|
    128.3.2.102|                1568|
Like sets, bags are a second-order binary structure for SiLK, meaning that they have their own toolkit (rwbagcat, rwbagtool, and rwbagbuild), the data is binary (so it can’t be read with cat or a text editor), and they can be derived from flow data or built from a datafile.
The basic bag generation tool is rwbag, which as seen in Example 5-22, takes flow data and produces a bag file from it. rwbag can generate 27 types of bags, simultaneously if you’re so inclined. These 27 types comprise three types of counting (bytes, packets, and flows), and nine types of key (sip, dip, sport, dport, proto, sensor, input, output, nhip). Combine the key and the counting type, and you have a switch that will create a bag. For example, to count all packets from source and destination IP addresses, call rwbag --sip-packets=b1.bag --dip-packets=b2.bag.
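A bag behaves much like a dictionary from key to running total, and the switch names follow mechanically from the key and counter lists given in the text. The Python sketch below illustrates both points; the function name and sample records are hypothetical, and the switch list is derived from the text rather than from the tool's own documentation.

```python
from collections import defaultdict
from itertools import product

KEYS = ["sip", "dip", "sport", "dport", "proto", "sensor", "input", "output", "nhip"]
COUNTERS = ["bytes", "packets", "flows"]

# Every key/counter pairing, written in the --KEY-COUNTER form used in the text
switches = [f"--{key}-{counter}" for key, counter in product(KEYS, COUNTERS)]

def make_bag(records, key, counter):
    """Sum a counter per key; 'flows' counts one per record."""
    bag = defaultdict(int)
    for rec in records:
        bag[rec[key]] += 1 if counter == "flows" else rec[counter]
    return dict(bag)

# Hypothetical flow records
records = [
    {"sip": "128.3.2.98", "dip": "10.0.0.1", "bytes": 300, "packets": 3},
    {"sip": "128.3.2.98", "dip": "10.0.0.2", "bytes": 336, "packets": 4},
    {"sip": "128.3.2.102", "dip": "10.0.0.1", "bytes": 1568, "packets": 9},
]

print(len(switches))
print(make_bag(records, "sip", "bytes"))
```

Producing several bags in one pass, as in the rwbag call with two switches, corresponds to calling make_bag with different key and counter pairs over the same records.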
In this section, we discuss more advanced SiLK facilities, in particular, the use of PMAPs and the collection and conversion of SiLK data.
A SiLK prefix map (PMAP) is a binary file that associates specific subnetworks (prefixes) with tags. PMAPs are used to record various mappings of a network, such as whether a network belongs to a particular organization or ASN, or country code lookup. Using a source such as GeoIP, you can build a PMAP that associates IP addresses with their country of origin.
The SiLK tool suite expects some basic PMAPs:
address_types.pmap
country_codes.pmap
PMAPs, like set files, can be created from text. Example 5-23 shows a simple PMAP file. Note the following attributes:
# This is a simple PMAP file that tracks some of the standard RFC 1918
# reserved addresses
#
# First we create some labels
label 0 1918-reserved
label 1 multicast
label 2 future
label 3 normal
#
# Specify the mode; this must be either ip or proto-port. ip in this case
# refers to v4 addresses
#
mode ip
#
# Everything otherwise not specified is normal
default normal
# Now the maps
192.168.0.0/16 1918-reserved
10.0.0.0/8     1918-reserved
172.16.0.0/12  1918-reserved
224.0.0.0/4    multicast
240.0.0.0/4    future
Once you’ve created a text representation of the PMAP, you can compile the binary PMAP file using the rwpmapbuild command. rwpmapbuild has two mandatory arguments: an input filename, with the file in the text format described above, and a name for the output file. As with most SiLK commands, rwpmapbuild will not overwrite an existing output file. For example:
$ rwpmapbuild -i reserve.txt -o reserve.pmap
$ ls -l reserve.*
-rw-r--r--  1 mcollins  staff  406 May 27 17:16 reserve.pmap
-rw-r--r--  1 mcollins  staff  526 May 27 17:00 reserve.txt
Once a PMAP file is created, it can be added to rwfilter and rwcut using the --pmap-file argument. Specifying the use of a PMAP file effectively creates a new set of fields in the filter and cut commands; since PMAP files are explicitly related to IP addresses, these new fields are bound to IP addresses.
Consider Example 5-24, which uses rwcut. In this example, the --pmap-file argument is colon-delimited; the value before the colon (reserve in the example) is a label, and the value after is a filename. rwcut binds the term reserve to the pmaps for the source and destination IP address, creating two new fields: src-reserve (for the mapping of the source address to the PMAP) and dst-reserve (for the mapping of the destination address to the PMAP).
$ rwcut --pmap-file=reserve:reserve.pmap --fields=1-4,src-reserve,dst-reserve \
    traceroute.rwf | head -5
         sIP|         dIP|sPort|dPort|   src-reserve|   dst-reserve|
192.168.1.12| 192.168.1.1|65428|   53| 1918-reserved| 1918-reserved|
192.168.1.12| 192.168.1.1|56126|   53| 1918-reserved| 1918-reserved|
192.168.1.12| 192.168.1.1|52055|   53| 1918-reserved| 1918-reserved|
 192.168.1.1|192.168.1.12|   53|56126| 1918-reserved| 1918-reserved|
$ # Using the pmap in filter; note that rwcut is not using the pmap
$ rwfilter --pmap-file=reserve:reserve.pmap --pass=stdout traceroute.rwf \
    --pmap-src-reserve=1918-reserved | rwcut --field=1-5 | head -5
         sIP|         dIP|sPort|dPort|pro|
192.168.1.12| 192.168.1.1|65428|   53| 17|
192.168.1.12| 192.168.1.1|56126|   53| 17|
192.168.1.12| 192.168.1.1|52055|   53| 17|
 192.168.1.1|192.168.1.12|   53|56126| 17|
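The lookup behind fields such as src-reserve can be thought of as a longest-prefix match with a default label. The Python sketch below parses a trimmed copy of the text PMAP and performs that lookup with a linear scan. It is a model for illustration only: the function names are hypothetical, and the compiled PMAP format and the actual lookup algorithm are not represented.

```python
from ipaddress import ip_address, ip_network

PMAP_TEXT = """\
# Trimmed copy of the RFC 1918 example PMAP
label 0 1918-reserved
label 1 multicast
label 2 future
label 3 normal
mode ip
default normal
192.168.0.0/16 1918-reserved
10.0.0.0/8 1918-reserved
172.16.0.0/12 1918-reserved
224.0.0.0/4 multicast
240.0.0.0/4 future
"""

def parse_pmap(text):
    """Return (default_label, [(network, label), ...]) from text-format input."""
    default, entries = None, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] in ("label", "mode"):
            continue  # label numbers and mode are not needed for this sketch
        if words[0] == "default":
            default = words[1]
        else:
            entries.append((ip_network(words[0]), words[1]))
    return default, entries

def lookup(addr, default, entries):
    """Return the label of the most specific matching prefix, else the default."""
    addr = ip_address(addr)
    matches = [(net.prefixlen, label) for net, label in entries if addr in net]
    return max(matches)[1] if matches else default

default, entries = parse_pmap(PMAP_TEXT)
for addr in ("192.168.1.12", "224.0.0.5", "8.8.8.8"):
    print(addr, lookup(addr, default, entries))
```

Applying lookup to a record's source and destination addresses yields the two derived columns shown in Example 5-24.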
There are a number of different tools for collecting data and pushing it into SiLK. The major ones are YAF, which is a flow collector, and rwptoflow and rwtuc, which convert other data into SiLK format.
Yet Another Flowmeter (YAF) is the reference implementation for the IETF IPFIX standard, and is the standard flow collection software for the SiLK toolkit. YAF can read pcap data from files or capture packets directly, which it then assembles into flow records and exports to disk. It has online documentation. The tool itself can be entirely configured using command-line options, but the number of options is fairly daunting. At its simplest, a YAF command looks like this:
$ sudo yaf -i en1 --live=pcap --out /tmp/yaf/yaf
This reads data from interface en1 and writes it to a file in the temporary directory. Additional options control how data is read and how it is converted into flow and output format.
yaf output is specified via the --out switch in tandem with the --ipfix and --rotate switches. By default, --out outputs to a file; in the example above, the file is /tmp/yaf/yaf, but any valid filename will do (if --out is set to -, then yaf will output to stdout).
When --out is specified with --rotate, yaf writes the output to files that are rotated by a delay specified by the --rotate switch (e.g., --rotate 3600 will update files every hour). In this mode, yaf uses the name specified by --out as a base filename, and attaches a suffix specified in YYYYMMDDhhmmss format, along with a decimal serial number and then a .yaf file extension.
When yaf is invoked with the --ipfix switch, it communicates IPFIX data to a daemon located elsewhere on the network. In this case (the most complicated option), --ipfix takes a transport protocol as an argument, while --out takes the IP address of the host. The additional --ipfix-port switch takes a port number when needed. Consult the documentation for more information.
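The most important options are:
--live
Takes pcap, dag, or napatech as its argument. dag and napatech refer to proprietary packet capture systems, so unless you have that hardware, just set --live to pcap.
--filter
Takes a BPF filter expression that is applied to the incoming packets.
--out
Specifies where output goes: a filename (or filename prefix), - for stdout, or the address of the receiving host when --ipfix is used.
--ipfix
Takes a transport protocol (tcp, udp, sctp, or spread) as an argument, and specifies that output is IPFIX transported over the network. Consult the yaf documentation for more information.
--ipfix-port
Used only if --ipfix is specified. It specifies the port that the IPFIX data is sent to.
--rotate
If specified, the name given to --out is used as a prefix, and files are written with a timestamp appended to them. The --rotate option takes as its argument the number of seconds before moving to a new file.
--silk
Specifies that the output should be compatible with SiLK’s rwflowpack tools.
--idle-timeout
Takes a number of seconds; a flow that has seen no packets for this long is flushed from the flow table.
--active-timeout
Takes a number of seconds; a flow that has been active for longer than this is flushed even if packets are still arriving.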
YAF has many more options, but these are the basic ones to consider when configuring flows. Consult the YAF manpage for more details.
SiLK uses its own compact binary formats to represent NetFlow data that tools such as rwcut and rwcount present in a human-readable form. There are times when an analyst needs to convert other data into SiLK format, such as taking packet captures from IDS alerts and converting them into a format where IP set filtering can be done on the data.
The go-to tool for this task is rwptoflow. rwptoflow is a packet data to flow conversion tool. It does not aggregate flows; instead, each packet read by rwptoflow is converted into a one-packet flow record. The resulting file can then be manipulated by the SiLK suite as any other flow file.
rwptoflow is invoked relatively simply with an input filename as its argument. In Example 5-25, the pcap data from a traceroute is converted into flow data using rwptoflow. The resulting raw file is then read using rwcut and you can see the correspondence between the traceroute records and the resulting flow records.
$ tcpdump -v -n -r traceroute.pcap | head -6
reading from file traceroute.pcap, link-type EN10MB (Ethernet)
21:06:50.559146 IP (tos 0x0, ttl 255, id 8010, offset 0, flags [none], proto UDP (17), length 64)
    192.168.1.12.65428 > 192.168.1.1.53: 63077+ A? jaws.oscar.aol.com. (36)
21:06:50.559157 IP (tos 0x0, ttl 255, id 37467, offset 0, flags [none], proto UDP (17), length 86)
    192.168.1.12.56126 > 192.168.1.1.53: 30980+ PTR? dr._dns-sd._udp.0.1.168.192.in-addr.arpa. (58)
21:06:50.559158 IP (tos 0x0, ttl 255, id 2942, offset 0, flags [none], proto UDP (17), length 66)
    192.168.1.12.52055 > 192.168.1.1.53: 990+ PTR? db._dns-sd._udp.home. (38)
$ rwptoflow traceroute.pcap > traceroute.rwf
$ rwcut --num-recs=3 --fields=1-5 traceroute.rwf
         sIP|         dIP|sPort|dPort|pro|
192.168.1.12| 192.168.1.1|65428|   53| 17|
192.168.1.12| 192.168.1.1|56126|   53| 17|
192.168.1.12| 192.168.1.1|52055|   53| 17|
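The one-packet-per-record conversion can be pictured with a short Python sketch. The Packet class and packet_to_flow function are hypothetical names, the packet values are taken from the tcpdump output in Example 5-25, and using the IP length as the byte count is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    sip: str
    dip: str
    sport: int
    dport: int
    proto: int
    length: int  # IP length as reported by tcpdump

def packet_to_flow(pkt):
    """Build a flow-like record from a single packet, with no aggregation."""
    return {
        "sip": pkt.sip, "dip": pkt.dip,
        "sport": pkt.sport, "dport": pkt.dport, "proto": pkt.proto,
        "packets": 1,          # always one packet per record
        "bytes": pkt.length,   # assumption: byte count taken from the IP length
    }

packets = [
    Packet("192.168.1.12", "192.168.1.1", 65428, 53, 17, 64),
    Packet("192.168.1.12", "192.168.1.1", 56126, 53, 17, 86),
    Packet("192.168.1.12", "192.168.1.1", 52055, 53, 17, 66),
]

flows = [packet_to_flow(p) for p in packets]
for flow in flows:
    print(flow["sip"], flow["dip"], flow["sport"], flow["dport"],
          flow["proto"], flow["packets"], flow["bytes"])
```

Three packets yield three records, even though they share the same address pair; a true flow collector such as YAF would instead merge packets belonging to the same flow.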
When correlating data between different sources, you will occasionally want to convert it into SiLK’s format. rwtuc is the default tool for converting data into SiLK representation, as it works with columnar text files. Using rwtuc, you can convert IDS alerts and other data into SiLK data for further manipulations.
The easiest way to invoke rwtuc is to use it as an inverse of rwcut. Create a file with columnar entries and make sure that the titles match those used by rwcut:
$ cat rwtuc_sample.txt
sIP |dIP |proto
128.2.11.4 | 29.3.11.4 | 6
11.8.3.15 | 9.12.1.4 | 17
$ rwtuc < rwtuc_sample.txt > rwtuc_sample.rwf
$ rwcut rwtuc_sample.rwf --field=1-6
        sIP|        dIP|sPort|dPort|pro|   packets|
 128.2.11.4|  29.3.11.4|    0|    0|  6|         1|
  11.8.3.15|   9.12.1.4|    0|    0| 17|         1|
As the fragment above shows, rwtuc reads the columns, uses the headers to determine column content, and fills any field for which no column is provided with a default value. rwtuc can also take column specifications at the command line using the --fields and --column-separator switches, like so:
$ cat rwtuc_sample2.txt
128.2.11.4 x 29.3.11.4 x 6 x 5
7.3.1.1 x 128.2.11.4 x 17 x 3
$ rwtuc --fields=sip,dip,proto,packets --column-sep=x < rwtuc_sample2.txt > rwtuc_sample2.rwf
$ rwcut --fields=1-7 rwtuc_sample2.rwf
        sIP|        dIP|sPort|dPort|pro|   packets|     bytes|
 128.2.11.4|  29.3.11.4|    0|    0|  6|         5|         5|
    7.3.1.1| 128.2.11.4|    0|    0| 17|         3|         3|
SiLK’s binary format requires values for every field, which means that rwtuc makes a best guess for field values that it doesn’t have. For instance, the previous example specifies packets as a field but not bytes, so rwtuc just sets the byte value to be identical to the packet value.
If there exists a common default value (e.g., all traffic has the same protocol), this value can be defined using one of a number of field-stuffing options in rwtuc. These options are identical to the field filtering options in rwfilter, except they only take single values. For example, --proto=17 sets the protocol of every entry to 17.
In the fragment below, we use the field-stuffing option --bytes=300 to set a value of 300 bytes for every entry in rwtuc_sample2.txt:
$ rwtuc --fields=sip,dip,proto,packets --column-sep=x --bytes=300 < rwtuc_sample2.txt > rwtuc_sample2.rwf
$ rwcut --fields=1-7 rwtuc_sample2.rwf
        sIP|        dIP|sPort|dPort|pro|   packets|     bytes|
 128.2.11.4|  29.3.11.4|    0|    0|  6|         5|       300|
    7.3.1.1| 128.2.11.4|    0|    0| 17|         3|       300|
The resulting RWF file will contain a value of 300 bytes, even though the byte value is not in the original text file. The packet values, which are specified in the file, are set to whatever was specified there.
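The default-filling and field-stuffing behavior can be modeled in Python as follows. This is a sketch of the behavior seen in the fragments above, not of rwtuc itself: the function name is hypothetical, the defaults are inferred from the example output, and letting a stuffed value take precedence over a column value is an arbitrary choice made here.

```python
# Defaults inferred from the example output: ports 0, one packet per record
DEFAULTS = {"sport": 0, "dport": 0, "packets": 1}

def text_to_records(lines, fields, sep="|", **stuffed):
    """Convert delimited text rows to records, filling fields that are missing."""
    records = []
    for line in lines:
        values = [value.strip() for value in line.split(sep)]
        rec = dict(DEFAULTS)
        rec.update(zip(fields, values))
        for name in ("proto", "packets"):
            if name in rec:
                rec[name] = int(rec[name])
        rec.update(stuffed)  # arbitrary choice: stuffed values win in this sketch
        # With no byte value at all, reuse the packet count, as the output shows
        rec.setdefault("bytes", rec["packets"])
        records.append(rec)
    return records

lines = [
    "128.2.11.4 x 29.3.11.4 x 6 x 5",
    "7.3.1.1 x 128.2.11.4 x 17 x 3",
]
fields = ["sip", "dip", "proto", "packets"]

plain = text_to_records(lines, fields, sep="x")
stuffed = text_to_records(lines, fields, sep="x", bytes=300)
print([(r["packets"], r["bytes"]) for r in plain])
print([(r["packets"], r["bytes"]) for r in stuffed])
```

In the first call the byte values follow the packet values; in the second, every record carries 300 bytes while the packet counts from the file are preserved.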
[5] You’ll notice that there are two datasets, one with scans and one without. To understand why, read Pang et al., “The Devil and Packet Trace Anonymization,” ACM CCR 36(1), January 2006.