To get a baseline idea of what a successful login and resource access should look like, over the next few pages we will examine the various stages of a NetScaler Gateway VPN session using a Wireshark capture. The intent is to provide you with the knowledge of a known good trace that you can compare against when troubleshooting issues.
We will then follow up with a discussion of the tools and techniques for troubleshooting NetScaler Gateway VPNs.
VPN session establishment is a multi-step process where the client and NetScaler exchange a number of control messages. To make this exchange easier to digest, let's break this into different phases:
To avoid duplication, we will assume the SSL handshake was successful. SSL handshake troubleshooting would be exactly the same as covered in the SSL section of Chapter 2, Traffic Management Features.
Pre-authentication, if configured, will be the first step of the exchange:
/epatype page (Packet 275).
/epatype and learns what the settings for EPA and device certificate check are. In our example, EPA is enabled and the device certificate check is off, which is reflected in NetScaler's response:
GET to /epaq.
GET /epas contains 0, which indicates success. In the troubleshooting section, we will talk about how to interpret this value.
The User provides their credentials. As a result, authentication and group extraction happen.
POST request with the credentials.
Following successful authentication, depending on whether client choices are configured in the session profile, the NetScaler Gateway presents the User with a list of options. The possibilities here are:
If ICA Proxy is set to ON, the client choices will not be displayed and the User will go directly to the StoreFront page. Sometimes users might report seeing Error: Logins Exceeded after successful authentication. This might happen for one of three reasons:
Let's now look at a trace from a scenario in which the User chooses FULL VPN:
/cfg requests are configuration download requests from the VPN client.
/cs requests are connection setup messages.
/dns requests are DNS requests. By default, they are exchanged as HTTP and get converted to the regular DNS protocol in NetScaler before being sent to the DNS server.
/cgi/logout, redirecting the User to the post-logout page. If configured, a clean-up script will be triggered at this point.
/cgi/setclient path will be set to cvpn. In that case, you will not see any control messages (/cfg, /dns). Instead, a /cvpn/ path will be added, and the original request path will be shown as is, base64 encoded, or encrypted, depending on the Clientless Access URL Encoding setting.
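When reading a decrypted trace, it can help to tag each request with the phase it belongs to. The following is a minimal Python sketch, assuming you have already exported the request paths from Wireshark as plain strings. The function name and the phase labels are this sketch's own wording, not NetScaler terminology; only the path prefixes come from the trace discussed above.

```python
# Path prefixes from the trace walkthrough, mapped to illustrative phase labels.
# The labels are assumptions made for this sketch, not official names.
PHASES = [
    ("/epatype", "pre-authentication (EPA settings)"),
    ("/epaq", "pre-authentication (EPA)"),
    ("/epas", "pre-authentication (EPA result)"),
    ("/cgi/logout", "logout"),
    ("/cvpn/", "clientless access"),
    ("/cfg", "configuration download"),
    ("/cs", "connection setup"),
    ("/dns", "DNS over HTTP"),
]


def classify_request(path):
    """Return the phase label for a request path, or 'other' if unrecognized."""
    bare = path.split("?", 1)[0]  # ignore any query string
    for prefix, phase in PHASES:
        stem = prefix.rstrip("/")
        # Match the exact path or anything below it, so /cs does not match /css/...
        if bare == stem or bare.startswith(stem + "/"):
            return phase
    return "other"


if __name__ == "__main__":
    for p in ["/epatype", "/cfg", "/cs", "/dns", "/cvpn/example", "/index.html"]:
        print(p, "->", classify_request(p))
```

In a full VPN trace you would expect to see the configuration download, connection setup, and DNS labels; in a clientless session those should be absent and only the clientless access label should appear.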
There are a number of tools and techniques available to troubleshoot the VPN feature. We will explore these in the following order:
aaad.debug log file for authentication issues
ns.log on NetScaler for session information
pol_hits nsconmsg counter to verify which policies are getting hit
These logs contain a wealth of information across several files on the client's PC. In order to capture the maximum detail, you need to enable debug. This can be enabled in two ways:
Once the issue has been reproduced, you can ask the User to run the nsClientCollect.exe utility, which will create a ZIP file containing all the necessary logs so they can be easily shared with you. Here is a sample run of the command:
Let's troubleshoot an example EPA failure.
The issue is that the User reports that they cannot log in and see an error stating that the client machine doesn't meet the security requirements:
From the details in the error it's clear that it's EPA and not authentication that is failing. To see the reason for this failure, the file nsepa.txt picked up by the nsClientCollect utility would be the best resource for identifying the problem. Open this file and look for a header called CSEC. This field contains values, usually 0 or 3, for each of the checks:
0 indicates a success
3 indicates a failure
Now let's consider the following screenshot:
Here, the value 03 means that there are two checks (since there are two digits) and that the first check succeeded (0) but the second failed (3). So you need to look at the EPA policy to identify what the second expression is, and then match it to the User's situation to see if it's the User's machine or the expression that needs to be addressed. As well as nsepa.txt, a decrypted trace would also show this information.
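The digit-by-digit reading of CSEC described above can be sketched in a few lines of Python. This is only an illustration under the assumption that you have already copied the CSEC value out of nsepa.txt or the trace as a string; the function names are hypothetical, and any digit other than 0 or 3 is reported as unknown because the text only documents those two values.

```python
def interpret_csec(value):
    """Map each digit of a CSEC value to (check position, result)."""
    meanings = {"0": "success", "3": "failure"}
    return [
        (position, meanings.get(digit, "unknown"))
        for position, digit in enumerate(value.strip(), start=1)
    ]


def first_failed_check(value):
    """Return the 1-based position of the first failed check, or None."""
    for position, result in interpret_csec(value):
        if result == "failure":
            return position
    return None


if __name__ == "__main__":
    print(interpret_csec("03"))      # one entry per EPA check
    print(first_failed_check("03"))  # position of the expression to inspect
```

The position returned by first_failed_check tells you which expression in the EPA policy to compare against the User's machine.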
The check for domain joined is a popular one; you can set it up in this way:
add aaa preauthenticationaction allow_xmx.lab_machines
add aaa preauthenticationpolicy is_domain_xmx.lab q/CLIENT.REG('HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters_Domain').VALUE == xmx.lab/ allow_xmx.lab_machines
The file aaad.debug, which we briefly visited in the AAA chapter, is the one you would look at for authentication issues with VPN as well. aaad.debug is especially valuable when using multiple authentication policies, as it allows you to see which of the several authentication policies is failing before you engage in more specific troubleshooting.
Another indispensable aaad.debug feature is the ability to display User group memberships. This is really helpful for identifying situations where incorrect group association is the reason why a User doesn't see the expected resources.
The following screenshot is an example of running cat /tmp/aaad.debug, showing which groups the User is part of:
One aspect of group extraction that has been a challenge for a long time is ensuring the right group is picked up when the User is part of more than one group. NetScaler by default picks up whichever group is returned first, which might not necessarily be the one you are looking for. In other words, a priority is missing.
A solution has been included, starting with version 10.1, in the form of the parameter defaultauthenticationgroup. When this parameter is set, upon successful authentication, NetScaler assumes the User to be part of this group and applies the policies bound to this group.
The file /var/log/ns.log should be a familiar one to you by now as we have relied on it for troubleshooting several other feature issues. It is especially useful in a NetScaler Gateway context, since the logs for this feature are captured in a very detailed fashion. Let's explore its usefulness by trying to troubleshoot another example issue.
The issue is that a User passes EPA and authentication successfully, but instead of seeing a homepage, experiences a browser hang followed by a timeout:
Upon running a tail -f on the ns.log file (tail -f /var/log/ns.log) and having the User access the page at the same time, it becomes evident that it's the session policy 192.168.1.55_443_pol that is denying access:
At this point, you will need to look at the settings in the Security tab of the session policy to ensure that either the default authorization setting is adjusted, or that an appropriate authorization policy is used.
ns.log also captures a ton of other information for VPN issues:
When users log in to NetScaler Gateway VPN, who gets access to what resources and on satisfying what conditions is governed by a combination of policies and profiles. Issues such as users seeing resources or options they aren't meant to see can happen due to the inheritance behavior of NetScaler Gateway policies.
Therefore, it is important to understand how inheritance works. NetScaler Gateway policies, be they pre-auth, session, or traffic policies, follow this processing order:
User level > Group level > VSERVER level > Global
In addition, certain policies, such as pre-auth, can only be bound at the global or vServer level since the User has not yet presented their username and cannot be identified as a result.
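The level ordering above can be illustrated with a short Python sketch. It is a simplified model only: it sorts bindings by bind level and deliberately ignores policy priorities within a level. The function name and the group-level policy name are hypothetical, and the vServer bind level shown for the second policy is an assumption made for the example.

```python
# Processing order from the text: User > Group > vServer > Global.
LEVEL_ORDER = {"user": 0, "group": 1, "vserver": 2, "global": 3}


def order_bindings(bindings):
    """Sort (policy_name, level) pairs by the level processing order.

    sorted() is stable, so policies bound at the same level keep the
    order in which they were listed.
    """
    return sorted(bindings, key=lambda binding: LEVEL_ORDER[binding[1]])


if __name__ == "__main__":
    bindings = [
        ("SETVPNPARAMS_POL", "global"),
        ("192.168.1.55_443_POL", "vserver"),  # bind level assumed for illustration
        ("example_group_pol", "group"),       # hypothetical policy name
    ]
    for name, level in order_bindings(bindings):
        print(level, name)
```

Listing a User's bindings this way makes it easier to spot a more specific binding that overrides the setting you expected to inherit from a broader one.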
The pol_hits nsconmsg counter is a very useful means of identifying what the resultant set of policies is. Taking an example, in the following screenshot we can see that for the User who just logged in, the LDAP authentication policy (ldap_LDAP_pol), the global session policy (SETVPNPARAMS_POL), and a more specific session policy (192.168.1.55_443_POL) are being hit:
In a busy environment, this command can present a lot of information in a short time. Furthermore, you might also have a challenge in being able to tell which User the output is for, when several users are logging in. For this reason, it would be best to use the command during a window when you are able to limit the users logging in, such as after hours.
When troubleshooting, you will often need to make a configuration change to a vServer or its policies. This introduces a need to be able to:
The Active User sessions tool in the NetScaler Gateway tab is great for this purpose:
Considering that NetScaler Gateway exchanges are basically SSL conversations, it's easy to see why Wireshark is a critical tool for troubleshooting. All troubleshooting that we looked at in the SSL Chapter automatically applies here, including:
The following points are important to remember while taking traces:
> set ssl vserver vpn.xmx.lab -sessReuse DISABLED
 Done
0. If you leave it at the default 164-byte truncated size, the complete certificate will not be captured, so the decryption will fail.