9.4 IPsec VPN Deployment

9.4.1 Cell and Hub Site Solutions

When designing the VPN and choosing a suitable implementation for the cell site, the network designer should take into account the unique requirements of the BTS site.

BTSs are located in a variety of places: indoors, outdoors on rooftops, inside or outside protective cabinets, attached to building walls or at the antenna mast. They can also be subject to rain, high humidity, extreme temperatures, etc. Most of these site solutions and environmental conditions present challenges for the BTS and for providing a suitable VPN solution.

Another factor that needs to be considered is the topology of the site: whether the site has only one BTS or multiple BTSs, and whether other site support equipment shares the same backhaul links.

Existing standalone SEGs for cell sites are based on small routing devices, with a capacity suitable for a few BTSs. These devices are not usually meant to withstand outdoor weather conditions, so they are only suitable for sites where there is at least a cabinet to shelter the equipment. They also possess enough physical ports and routing capabilities to act as site routers. They are therefore mostly found in bigger sites.

Management of the standalone SEGs can be done either with the appliance vendor's own system or by integrating it with the overall transport network management system.

For smaller sites with only one BTS, or for sites with a reduced footprint, an integrated SEG solution could be the most convenient one. An integrated SEG is provided as part of the BTS's own implementation, with no extra hardware required at the site other than a possible expansion module to cater for the additional processing load. The capacity of the integrated SEG will be lower than that of the standalone one, but enough to serve the host BTS and possibly some other site equipment. Resilience in this case follows the general BTS high availability concept, as the SEG is an integral part of the BTS.

Management would normally also be integrated into the BTS management system, so it is simpler to perform normal provisioning and maintenance tasks. However, some operator organizations might require that the security appliances are managed by the security department rather than by a mobile network department. In these cases the integrated SEG introduces a challenge that calls for a change in the operator's processes, a device with split management or, alternatively, the use of a standalone device.

In the hub site, requirements are completely different to the cell site, with controlled environmental conditions and no practical footprint limitation. On the other hand, capacity and high availability are the main factors to take into account. High capacity SEGs are usually deployed, providing VPN services to a number of controllers and Core Network devices. Several SEGs are also equipped to support high availability, and load sharing configurations would probably be implemented to increase both the capacity and availability. There might be some exceptions for small controller sites (with only one or two controllers) where the VPN termination is fully integrated into the mobile equipment.

Some solutions are built on dedicated security appliances, while others are built around general purpose routers. In both cases, chassis of different sizes are equipped with application cards or cryptographic accelerators to fit the capacity requirements.

The VPN solution is an integral part of the hub site solution, and the interconnection and interoperability with the other devices, namely the site routers, need to be taken into account. In particular, it should be ensured that the routing and high availability configuration of the SEG is compatible with and supported by the site routers. Another aspect to consider is the separation that the SEGs introduce between the private and public networks. The site routers and the overall site need to be designed with that separation in mind, ensuring that both networks are kept separated at all times. Separation can be achieved by means of VLANs, for traffic separation, and Virtual Routing, for routing separation.

9.4.2 IPsec Profiles

3GPP specifies the following profiles for IKEv1, IKEv2 and IPsec [3]:

IKEv1:

Phase 1:

  • Pre-shared keys for authentication (note that TS33.310 also specifies the use of certificates).
  • Main mode (no aggressive mode).
  • FQDN is supported for node authentication.
  • Encryption algorithms: ENCR_AES_CBC (128-bit key), ENCR_3DES.
  • Authentication algorithms: AUTH_HMAC_SHA1_96.
  • Diffie-Hellman group 2.
  • IKE SA lifetime longer than IPsec SA lifetime.

Phase 2:

  • PFS (Perfect Forward Secrecy) is optional.
  • Only IP addresses or subnets are mandatory to support as identity types.
  • Notifications are mandatory to support.
  • Diffie-Hellman group 2 (required if PFS is used) is mandatory to support.

IKEv2:

IKE_SA_INIT

  • Encryption algorithms: ENCR_AES_CBC (128-bit key), ENCR_3DES [29].
  • Authentication algorithms: AUTH_HMAC_SHA1_96 [29].
  • Pseudo-random function: PRF_HMAC_SHA1.
  • Diffie-Hellman groups 2 and 14.
  • Optionally, AUTH_AES_XCBC_96 should be used for authentication and PRF_AES128_XCBC as PRF.

IKE_AUTH

  • Pre-shared keys for authentication (note that TS33.310 also specifies the use of certificates).
  • IP addresses and FQDN are supported for node authentication.

CREATE_CHILD_SA

  • PFS (Perfect Forward Secrecy) is optional.

IPsec:

  • ESP protocol in tunnel mode.
  • Encryption algorithms: null, ENCR_AES_CBC (128-bit key), ENCR_3DES [30].
  • Authentication algorithms: AUTH_HMAC_SHA1_96 [30].
  • The IV (Initialization Vector) should be random and of the same size as the block of the chosen encryption algorithm.

This is the minimum set, which ensures compatibility between different implementations. However, many implementations support a much larger variety of algorithms and key lengths.
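To illustrate why a common minimum set guarantees interoperability, the following minimal sketch intersects the IKE_SA_INIT transforms listed above with the set offered by a peer. The dictionary layout, the function name and the peer's extra algorithms are assumptions made for illustration only; a real IKEv2 negotiation exchanges proposals and lets the responder choose, which this sketch does not model.

```python
# Minimum IKEv2 IKE_SA_INIT transforms taken from the profile listed above.
# The data layout is an illustrative assumption, not a 3GPP or IANA format.
MANDATORY_IKEV2 = {
    "encr": {("ENCR_AES_CBC", 128), ("ENCR_3DES", None)},
    "integ": {"AUTH_HMAC_SHA1_96"},
    "prf": {"PRF_HMAC_SHA1"},
    "dh": {2, 14},
}


def common_transforms(local, peer):
    """Return the transforms both sides support, per transform type.

    Returns None if any transform type has no overlap, in which case
    no IKE SA could be negotiated.
    """
    common = {}
    for kind, supported in local.items():
        shared = supported & peer.get(kind, set())
        if not shared:
            return None
        common[kind] = shared
    return common


# Hypothetical peer that supports stronger algorithms plus part of the minimum set.
peer = {
    "encr": {("ENCR_AES_CBC", 128), ("ENCR_AES_CBC", 256)},
    "integ": {"AUTH_HMAC_SHA1_96", "AUTH_HMAC_SHA2_256_128"},
    "prf": {"PRF_HMAC_SHA1", "PRF_HMAC_SHA2_256"},
    "dh": {14, 19},
}

result = common_transforms(MANDATORY_IKEV2, peer)
print(result)
```

As long as both ends implement the minimum set, the intersection is never empty, whatever additional algorithms each side supports.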

9.4.3 VPN Resilience

Mobile networks are now so widespread that they carry a significant share of today's voice and data communications. Some of the services offered by the network are critical, either by their nature (emergency calls) or because of the high revenues they bring to the operator. Those services place tight requirements on network availability, which therefore becomes paramount for the operator.

Besides, end user quality expectations for some of the services are the same as for wireline services in terms of call breaks and download times. A voice call user will certainly not be willing to wait tens of seconds for the network to restore the service, and will terminate the call instead.

Additionally, long breaks can cause higher layer protocol timers to trigger recovery actions, and can lead to unstable network behaviour or to network restarts, which delay service restoration even further.

From the end user point of view, resilience requirements are largely independent of the radio technology in question. However, the more capacity a technology has, the more important it is to provide a reliable network.

These factors need to be taken into account when defining the availability target of the mobile backhaul and therefore of the security solution. Exact figures depend on the service availability targets defined by the operator, but the tolerance for breaks is certainly much lower than in traditional data communication networks.

Resilience should be considered for the backhaul link as well as for the network equipment. Backhaul link resilience is discussed in Chapter 07.

Network equipment resilience, focusing in this context on the SEG, is provided by deploying redundant devices in a farm with at least two devices. If one of the devices fails, another device in the farm takes over. There are different approaches to service restoration, depending on the capabilities of the devices as well as on the targeted service break duration. In the following resilience discussion it is assumed that the BTS terminates the VPN. However, the same conclusions can be drawn for cell sites with an external SEG.

One possible approach to service restoration is that once the active SEG is down, the clients (the BTSs) trigger a recovery action. The BTS would monitor the SEG availability using a mechanism such as DPD (Dead Peer Detection), and as the monitoring is done across the backhaul, the backhaul availability is also taken into account. Therefore, this approach can protect against certain transport failures. Once the failure is detected, the BTS would select a SEG from a list of backup devices and re-establish all the SAs.

On the other hand, this approach presents the significant drawback that the BTS first needs to detect that the active SEG is down, usually by using a mechanism such as DPD. In order not to load the network with excessive monitoring traffic, the detection mechanism is rather slow, so the failure detection might take anything from a few seconds up to several minutes depending on the implementation and the configured timers. The re-establishment of the SAs then takes additional time, in the order of a few seconds depending on the number of SAs and the performance of the SEGs. Altogether, the outage period is long enough for all the voice calls to be dropped.
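The two contributions to the outage described above can be put into a rough back-of-the-envelope model. The sketch below assumes that the peer is declared dead after a fixed number of unanswered probes sent at a fixed interval, and that the SAs are re-established one after another; both are simplifications, and all the timer values are hypothetical rather than taken from any specific implementation.

```python
def bts_initiated_outage_s(probe_interval_s, missed_probes, sa_count, sa_setup_s):
    """Estimate the outage when the BTS detects the failure and re-establishes the SAs.

    Simplified model (assumption): detection takes `missed_probes` unanswered
    probes sent every `probe_interval_s` seconds, and each of the `sa_count`
    SAs takes `sa_setup_s` seconds to set up sequentially.
    """
    detection_s = probe_interval_s * missed_probes
    reestablishment_s = sa_count * sa_setup_s
    return detection_s + reestablishment_s


# Hypothetical values: probes every 10 s, 3 misses, 4 SAs at 0.5 s each.
outage = bts_initiated_outage_s(probe_interval_s=10, missed_probes=3,
                                sa_count=4, sa_setup_s=0.5)
print(outage)  # 32.0 seconds with these example values
```

With these example values the detection time clearly dominates, which is why the approaches discussed next concentrate on shortening it.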

Furthermore, it is also possible that upper layers in the BTS would detect that the control plane and management plane connections are down, and a recovery action could be started. One of the common recovery actions is the restart of the BTS, which leads to an extended outage. This scenario is depicted in Figure 9.21.

Figure 9.21 Service restoration initiated by the BTS.


Outage periods can be dramatically reduced if failure detection is shortened. Instead of relying on the BTS to detect the failure, the SEGs can monitor the availability of each other with a fast polling mechanism, and upon failure detection, initiate the re-establishment of all the connections. Given that the polling is locally performed and only a few devices are monitored, the amount of traffic is not relevant. If the SEGs are configured with the identities (IP addresses) of the BTSs, they will re-establish the VPNs.

However, the SEGs are probably configured without the identities of the BTS (road warrior configuration) so that when new BTSs are added, policies do not need to be updated, making the management of the VPN easier and scalability better. In this case, the SEGs are only able to accept incoming IKE requests from the BTSs instead of initiating the connections themselves. Also the backup SEGs do not have previous knowledge of which connections were already established in the active SEG, so they are unable to restore the connections by themselves.

Another approach to resilience is to have two redundant tunnels for each BTS towards two different SEGs in the farm. The selection of which tunnel to use would be done by the BTS based on standard routing techniques, and the monitoring of the availability of the tunnel would be left to the routing protocol. The restoration of the service would depend once more on how fast the BTS and the SEGs are able to detect that one of the paths is down. Typically routing protocols are not able to detect the failures very fast. However, when they are combined with fast detection protocols such as Bidirectional Forwarding Detection (BFD) [32] the failure detection can be performed in a few seconds or less.
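As a rough illustration of how fast BFD can detect a path failure, the sketch below follows our reading of the asynchronous mode rule in the BFD specification: the local detection time is the remote detect multiplier times the agreed remote transmit interval, where the agreed interval is the larger of the locally required receive interval and the remote desired transmit interval. The function name and the timer values are illustrative assumptions.

```python
def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Detection time of a BFD session in asynchronous mode, in milliseconds.

    The remote system transmits no faster than the local system is willing
    to receive, so the agreed interval is the larger of the two values.
    """
    agreed_tx_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_tx_ms


# Hypothetical timers: the peer wants to send every 100 ms, but the local
# node only accepts one packet every 300 ms; three packets may be missed.
detection_ms = bfd_detection_time_ms(3, 300, 100)
print(detection_ms)  # 900 ms with these example values
```

Shorter intervals give faster detection at the cost of more monitoring traffic and a higher risk of false alarms on a congested backhaul.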

One aspect of this approach to take into account is that many routing protocols operate by broadcasting or multicasting the advertisement and monitoring packets. While broadcast and multicast are possible with IPsec when the SAs are established manually (via the management interface), IKE does not support that possibility, so only point to point connections are possible. This limitation can be overcome by using GRE encapsulation together with IPsec. In this way IKE only needs to handle the GRE tunnel (which is point to point), while the routing advertisements and monitoring packets travel transparently within the GRE tunnel (see Figure 9.22).

An additional aspect to consider in this approach is that addressing of the BTS becomes a bit more complicated. While in other approaches the traffic endpoint address can be the same as the tunnel endpoint address, in this case they need to be different, so that routing is possible at the BTS. One possible configuration for the BTS addressing is to use network interface addresses as tunnel addresses, and loopback addresses for the traffic.

Figure 9.22 Service restoration by means of routing.


An additional approach, convenient from the BTS point of view, is to rely completely on the SEG to restore the service without any action from the BTS and with minimum effect on the end user. We saw earlier that re-establishing the connections from the backup SEG might not be feasible if it lacks the knowledge of which connections were already established. In the stateful failover case, a synchronization connection exists between the SEGs so that the backup SEGs are continuously updated with the state information required to keep the IKE SAs and IPsec SAs up. They also share virtual IP addresses for tunnel termination. Therefore, when a failure is detected, the SAs are shifted to one of the backup SEGs and the BTSs are not aware of the failover (see Figure 9.23). The effect on the end service can also be small if the failure detection and the failover are fast enough. The service restoration time of this approach should be expected to be in the range of a few seconds.

To benefit from the stateful failover, the two SEGs should also synchronize their routing state so that they behave as a virtual router. This could be achieved by using HSRP/VRRP. Both SEGs will share the same virtual IP addresses, but only one of them is forwarding traffic. When a failure is detected, both VPN and routing functionalities are transferred to the backup SEG, which will advertise to the network neighbours that the IP address has been transferred.

Figure 9.23 Service restoration by using stateful failover.


A completely different approach is not to have a fully redundant system but to mitigate the impact of a SEG failure by sharing the load among multiple devices. The failure of one SEG will put out of service all the BTSs connected to it, but the service could still be offered by neighbouring BTSs. The network capacity would be reduced, but this could be acceptable depending on the area to be served. This approach can be combined with any of the other approaches, either for the benefits of load sharing or to reduce the impact of the failover (see Figure 9.24).

Figure 9.24 Service restoration by using load sharing.


9.4.4 Fragmentation

Applying IPsec in tunnel mode implies that clear text IP packets are encapsulated into another IP packet, typically using ESP encapsulation. If GRE is also used, two encapsulations take place. The encapsulation overhead varies depending not only on the protocols but also on the security services (encryption vs. integrity protection), the selected algorithms and the original packet size. In any case it can cause the encapsulated packet to exceed the egress interface MTU, which would require IP fragmentation.

In general, IP fragmentation should be avoided because of the reassembly effort required at the receiving node, the increased network load, the packet delay and delay variation, etc. To avoid it, PMTUD (Path MTU Discovery) [21][22] can be used, if supported by the BTSs, the SEGs and the end nodes. By using PMTUD, the nodes discover the smallest of the MTUs along the path to the destination and are able to adjust the size of the data handed to the IP layer, so that the packet will not experience any further fragmentation along the path.

Unfortunately, PMTUD is not supported by all the implementations or by all the protocols. While PMTUD is an integral part of TCP and SCTP, there is no support in UDP itself, but rather in the applications using UDP.

The MTU of the sources could also be manually configured to ensure that IP fragmentation happens only there, and nowhere else in the path. While this approach is possible in most of the cases, some implementations might not support configurable MTU. It also requires a good knowledge of other MTUs in the path to avoid additional fragmentations.

As can be seen, there will be cases when the IPsec stack of the SEG or the BTS needs to perform fragmentation. When the clear text packet at the IPsec stack is expected to exceed the interface MTU after encapsulation, the IPsec stack could decide to fragment the packet before the encapsulation (pre-fragmentation), based on a pre-defined tunnel MTU. Alternatively, the packet can be encapsulated and then fragmented by the IP layer before forwarding (post-fragmentation). Each approach has its benefits and drawbacks, which are analyzed next.
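Whichever side of the encapsulation it happens on, the fragmentation itself follows the usual IPv4 rules. The minimal sketch below computes the sizes of the fragments produced for a given MTU, assuming a 20-byte IPv4 header without options; applied to the clear text packet with the tunnel MTU it models pre-fragmentation, and applied to the encapsulated packet with the interface MTU it models post-fragmentation. The function name and the MTU values are hypothetical.

```python
def ipv4_fragment_lengths(packet_len, mtu, header_len=20):
    """Return the total length of each IPv4 fragment of a packet.

    Every fragment except the last must carry a payload that is a
    multiple of 8 bytes, because fragment offsets are counted in
    8-byte units.
    """
    if packet_len <= mtu:
        return [packet_len]          # fits as it is, no fragmentation
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    remaining = packet_len - header_len
    lengths = []
    while remaining > 0:
        chunk = min(max_payload, remaining)
        lengths.append(header_len + chunk)
        remaining -= chunk
    return lengths


# Pre-fragmentation of a 1500-byte clear text packet with a hypothetical
# tunnel MTU of 1400 bytes.
fragments = ipv4_fragment_lengths(1500, 1400)
print(fragments)  # [1396, 124]
```

Each fragment repeats the IP header, which is one of the reasons fragmentation increases the network load.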

  • Pre-fragmentation

Pre-fragmentation has the distinct advantage that the VPN termination does not need to reassemble the packets before decryption. Only the final destination performs the reassembly. In this way the VPN termination is relieved of this resource consuming task. This matters most for the SEG, if there are multiple devices behind it. For the BTS it is less important since typically all the traffic is consumed by the BTS itself, so it still needs to do the final reassembly.

On the other hand, if further fragmentation happens in the public link, the benefits of pre-fragmentation are void, as both types would take place at the same time. To avoid this additional fragmentation, careful planning is needed, or PMTUD should be used.

Pre-fragmentation by the IPsec stack is not compliant with IPv6 as the packets can be fragmented only at the source. In this case, if the IPsec stack needs to fragment, post-fragmentation would be the only possibility.

  • Post-fragmentation

As noted above, when the public link MTU is not known or there is a routing change, the packet would need to be fragmented by the transit routers. In this case it is better to perform only post-fragmentation, since there is no benefit in pre-fragmentation.

Post-fragmentation would also be the only possibility for IPv6.

9.4.5 IPsec and Quality of Service

As discussed in Chapter 08, Quality of Service (QoS) is a fundamental concept in today's mobile networks, with multiple services of different types, different customer expectations and a variety of service levels offered by the operators. End-to-end QoS is based on a collection of mechanisms and tools which are deployed in network elements and subnets, and it is therefore paramount for the overall QoS that each of the components works as planned, together with other network aspects, in particular security.

Packets are classified according to the QoS class and they are marked in the IP header with a DSCP value. This marking is meant to be inspected by the network elements in order to apply a suitable QoS mechanism to the packet. When applying encryption with a protocol such as ESP in tunnel mode, all the information carried inside the tunnel is hidden, including the inner DSCP. So if the public network is meant to apply a differentiated service to the packet, a DSCP value must remain visible. The approach to follow is that the IPsec implementation in the SEG and the BTS populates the outer IP header DSCP with a proper value.

The most straightforward way to generate the DSCP is just to copy the DSCP value received in the clear text packet, which is a suitable approach in many cases. However, the encapsulated packet will transit a different network, which could also be a different QoS domain, managed by a different organization or a service provider. This service provider might have a different QoS policy and the packet markings can be different. Therefore the IPsec implementation, to be compatible with the new QoS domain, would need to have a flexible mapping between the inner DSCP values and the outer ones.
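The two options described above, copying the inner DSCP or translating it for a different QoS domain, can be sketched as follows. The mapping table, the default value and the function name are hypothetical examples; the actual values would come from the QoS policy agreed with the service provider.

```python
# Hypothetical inner-to-outer DSCP mapping for a provider with its own QoS policy.
INNER_TO_OUTER_DSCP = {
    46: 46,  # EF kept as EF
    34: 26,  # AF41 remapped to AF31
    26: 18,  # AF31 remapped to AF21
    0: 0,    # best effort unchanged
}


def outer_dscp(inner_dscp, mapping=None, default=0):
    """Choose the DSCP written into the outer IP header of the tunnel packet.

    Without a mapping the inner value is simply copied; with a mapping,
    unknown inner values fall back to `default` (an assumption of this sketch).
    """
    if mapping is None:
        return inner_dscp
    return mapping.get(inner_dscp, default)


print(outer_dscp(34))                       # 34, plain copy
print(outer_dscp(34, INNER_TO_OUTER_DSCP))  # 26, remapped for the provider domain
```

The inner DSCP is left untouched in both cases, so the original marking is available again once the packet leaves the tunnel.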

It should be noted that the outer DSCP is not authenticated (for both ESP and AH). This creates the risk that an attacker changes the DSCP in order to cause a DoS, which is difficult to mitigate.

At the receiver side, when IPsec is terminated, the implementation has the option to maintain the inner DSCP as received. This has the advantage that the DSCP can be trusted as it has been authenticated and not altered by any node during transit in the VPN. According to RFC4301, implementations might also choose to use the outer DSCP in those cases when the DSCP space at the receiver and the sender sides are different, and this inner DSCP is not meaningful anymore.

The security policies defined in the BTS and the SEG usually apply to a traffic aggregate, defined by the IP addresses, protocol types and sometimes also port numbers. This traffic aggregate might contain flows with different QoS classes, which are accordingly marked with different DSCPs. This is typically the case for the user plane of the mobile network, where real time calls will be assigned to a higher QoS class than non real time calls. For other traffic types, such as the control plane, all the packets usually belong to the same QoS class.

The traffic aggregate with multiple traffic classes will be carried by a single IPsec SA, and therefore a single running sequence number is used for all the packets in that SA. When the packets are transmitted across the network and arrive at the routers, they can be assigned to different queues as the packets have different priorities, and if congestion happens, the order of packets belonging to the same IPsec SA will change. When the packets arrive at the receiver, it will check whether they fit within the anti-replay window. Packets with higher priority will probably be at the beginning of the window as they arrive first, and packets with lower priority will be towards the end. If the congestion is sufficiently high for a given window size, low priority packets will not fit in the window anymore and they will be dropped.
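The effect of re-ordering on the anti-replay check can be reproduced with a small simulation. The class below is a simplified sliding window that keeps the accepted sequence numbers in a set instead of the bitmap a real implementation would normally use; the window sizes and the traffic pattern are hypothetical.

```python
class AntiReplayWindow:
    """Simplified anti-replay check over the last `size` sequence numbers."""

    def __init__(self, size):
        self.size = size
        self.highest = 0      # highest sequence number accepted so far
        self.seen = set()     # accepted sequence numbers still inside the window

    def accept(self, seq):
        if seq + self.size <= self.highest:
            return False      # too old: falls to the left of the window
        if seq in self.seen:
            return False      # duplicate: replayed packet
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget sequence numbers that dropped out of the window.
            self.seen = {s for s in self.seen if s + self.size > self.highest}
        return True


def delivered(window_size, arrival_order):
    """Return the sequence numbers that pass the anti-replay check."""
    window = AntiReplayWindow(window_size)
    return [seq for seq in arrival_order if window.accept(seq)]


# Packet 1 is low priority and is queued while 99 high priority packets overtake it.
arrival = list(range(2, 101)) + [1]
small = delivered(64, arrival)
large = delivered(128, arrival)
print(1 in small, 1 in large)  # False True
```

With a 64-packet window the delayed packet is dropped even though it is perfectly legitimate, while a 128-packet window still accepts it.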

In order to avoid the packets being dropped, the first possible solution is to disable the anti-replay window, but this would open the system to replay attacks. A more sensible solution would be to use large window sizes. This requires more memory and there could be a performance impact for large windows.

Another solution, supported by the newer IPsec specification [25], is the use of multiple IPsec SAs with the same traffic selectors for packets with different QoS classes. In the egress direction the IPsec implementation inspects the DSCP of the clear text packet along with the other relevant fields in the header, and maps it to the correct SA (see Figure 9.25). In this way, all the packets within the SA have the same DSCP and re-ordering should not happen. It should be noted that the DSCP is not negotiated by IKE, as mapping the DSCPs to the SAs is a local matter for the sender, and therefore all the SAs established for the same traffic aggregate will have the same traffic selectors. On the other hand, these parallel SAs are only supported by IKEv2 and not by IKEv1, which requires the established SAs to have unique traffic selectors (see RFC5996, Section 2.8 for further details).
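The sender-side mapping of packets to parallel SAs can be sketched as below. The SPI values, the DSCP groups and the rule that the last SA acts as the default are hypothetical choices for illustration; the point is only that each SA has its own sequence number space, so packets of different priorities no longer share one anti-replay window.

```python
from dataclasses import dataclass


@dataclass
class ChildSA:
    spi: int              # hypothetical SPI identifying the SA
    dscps: frozenset      # DSCP values carried by this SA
    next_seq: int = 1     # per-SA running sequence number


def select_sa(sas, dscp):
    """Pick the SA whose DSCP set contains the packet's DSCP.

    All SAs share the same traffic selectors, so the DSCP is the only
    distinguishing criterion. The last SA is used as default (assumption).
    """
    for sa in sas:
        if dscp in sa.dscps:
            return sa
    return sas[-1]


def send(sas, dscp):
    """Assign an outgoing packet to an SA and allocate its sequence number."""
    sa = select_sa(sas, dscp)
    seq = sa.next_seq
    sa.next_seq += 1
    return sa.spi, seq


sas = [
    ChildSA(spi=0x1001, dscps=frozenset({46})),      # real time traffic
    ChildSA(spi=0x1002, dscps=frozenset({0, 10})),   # non real time traffic
]
sent = [send(sas, dscp) for dscp in (46, 0, 46, 10)]
print(sent)
```

In this run the real time packets get sequence numbers 1 and 2 on the first SA, and the non real time packets get 1 and 2 on the second one, independently of each other.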

Figure 9.25 IPsec SA selection based on the DSCP.


9.4.6 LTE S1 and X2 Study Case

In this section, we will look at how the concepts explained previously apply to a practical case. For this case we will consider the LTE system, with two eNodeBs, eNB1 and eNB2, connected through a leased line service to the Core Network site.

In the Core Network site there will be an MME, an SGW, a clock reference (for packet based synchronization) and a gateway towards the management system.

A pair of redundant routers provides the site connectivity, and two SEGs are connected to them in redundant configuration, terminating the VPN tunnels from the eNodeBs. A single tunnel is configured between each eNB and the pair of SEGs.

Figure 9.26 shows how all the S1 traffic from eNB1 is carried over the tunnel, and routed in clear text from the SEG1 to the destination.

Figure 9.26 S1 interface, management and synchronization connections.


The X2 interface, between eNB1 and eNB2, is implemented through the Core Network site routers (star topology). Therefore the packets are forwarded first to the Core Network site, decrypted there, routed, encrypted again and forwarded towards the destination eNB (see Figure 9.27).

A direct connection between the eNodeBs (mesh topology) would also be possible, but it would add complexity to the VPN configuration, as distinct policies would need to be defined for each X2 neighbour. This would be difficult to manage unless an automatic mechanism such as ANR is used.

Figure 9.27 X2 interface connections.


For simplicity, a single VPN is created between each eNodeB and the SEG, and traffic is separated for each plane by assigning it to a different IPsec SA:

  • User plane (S1-U, X2-U).
  • Control plane (S1-C, X2-C).
  • Management plane.
  • Synchronization plane.

For this case, it is assumed that there is no application level protection, such as TLS, for the Management plane. If TLS were used, there would be no security reason to protect the Management plane with IPsec as well.

The routers provide a separate virtual routing instance for the VPN interfaces, and others for the Core Network interfaces, effectively separating the routing domains and reducing the risk of misconfiguration.

The device authentication, for both the eNodeBs and the SEGs, is based on digital certificates, which are signed by a CA owned by the mobile operator and located at the management site. Certificate Management Protocol (CMP) is used for the certificate management of the eNodeBs and the SEGs. A simple trust model is used, where the root CA directly issues the certificates of the end entities. Given that all the devices belong to the same operator, there is no need for cross-certification.

Therefore, each eNodeB will be provisioned with its own device certificate and the root CA certificate, and the SEGs will be provisioned with the SEG device certificate and the root CA certificate. Additionally each device requires the private key associated to its own device certificate. Figure 9.28 illustrates the certificate management architecture.

Figure 9.28 Certificate management interfaces.


All the traffic exchanged between the eNodeBs and the SEGs receives the same kind of protection: it is encrypted, authenticated and protected against replay. Even though not all the traffic types might require confidentiality protection, this makes configuration easier as there are fewer policies to handle. As indicated earlier, each traffic type is in its own IPsec SA and one security policy is defined for each SA. This gives each traffic type a dedicated running sequence number and anti-replay window, which mitigates the effect of packet re-ordering, in case it happens. The security policies are defined based on the IP addresses only. Tables 9.1 and 9.2 show the policies for the eNBs.

Table 9.1 eNB1 security policies.


Table 9.2 eNB2 security policies.


Table 9.3 SEG security policies.


Table 9.3 shows the SEG policies. Note that the SEG policies do not contain the eNB addresses in order to simplify the commissioning (additional eNBs can be added without changing the SEG policies).

IKEv2 is selected for the key management. The selected algorithms are AES-CBC with a 128-bit key for confidentiality, HMAC-SHA1 for integrity and Diffie-Hellman group 2. The same algorithms are used for IKE. The other parameters also follow the 3GPP IKEv2 profile.

Resilience is provided by a redundant pair of SEGs, with stateful failover. The two SEGs are seen by the eNBs as one single node, as they have the same (virtual) address IP@ and the same credentials (certificate). The SEG resilience also relies on the fact that the site routers are redundant as well. The SEGs also present a virtual address IP2@ towards the Core Network. Figure 9.29 shows the addressing used by the SEGs.

Figure 9.29 Stateful failover.


In case SEG1 fails, SEG2 will take over and advertise to the R1 and R2 routers that it now has the addresses IP@ and IP2@. This advertising is usually done using Gratuitous ARP. After that, the traffic from the eNBs and from the Core Network devices is rerouted by R1 to SEG2. No disruption would normally be visible to the eNBs or to the Core Network elements except a few lost packets. Figure 9.30 depicts the traffic path after the failover, showing how the addresses IP@ and IP2@ are now active in SEG2.

Figure 9.30 Stateful failover after the failure.
