Global Server Load Balancing (GSLB) does for data centers what load balancing does for servers. It directs users to the best possible resource (the best data center for a resource). "Best" here can mean the closest, the healthiest, or the least loaded.
GSLB uses a proprietary protocol called Metric Exchange Protocol (MEP) to obtain the metrics needed for this decision. The result of GSLB processing is an Address (A) record, returned in response to a DNS request, which the client then uses to obtain the resource.
In this section, we'll look at what a typical first-time flow looks like and what needs to be in place for GSLB to work correctly, and then move on to troubleshooting.
It is important to understand the several layers involved in a GSLB conversation to be able to troubleshoot it. Let's take a look at an example of a User wanting to launch the home page of web.xmx.lab, which belongs to the xmx.lab domain:
The User requests web.xmx.lab for the first time.
The DNS request for web.xmx.lab lands on the local DNS (LDNS) that the User's computer uses for DNS resolution.
... xmx.lab.
... xmx.lab, and this is the point at which GSLB comes into the picture. The authoritative name server, instead of directly providing an IP that represents web.xmx.lab, provides the IP addresses of the ADNS service of one of the several NetScalers participating in GSLB.
... web.xmx.lab.
Step 6 in the preceding flow can also use CNAMEs instead of the ADNS service IP. In this case, the authoritative DNS will send CNAME records for one of the ADNS services and their corresponding IPs (glue records). The rest of the process is the same.
Understanding the exchange that happens between two (or many) NetScalers participating in GSLB is key for troubleshooting. Exchanging MEP packets is how NetScalers participating in GSLB communicate with each other about key statistics for each of the GSLB services, which in turn represent the configured LB vServers and services. These statistics include:
This will then allow the NetScaler receiving the DNS request to decide which target DNS IP to return.
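As a rough illustration of how exchanged metrics can feed that decision, here is a minimal sketch of a "least loaded among healthy sites" choice. The `SiteMetrics` fields, site names, and addresses are hypothetical stand-ins chosen for the example; they are not the actual MEP statistics or the NetScaler's internal logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteMetrics:
    # Hypothetical per-site view built from exchanged metrics.
    name: str
    vip: str
    up: bool
    active_connections: int


def pick_vip(sites: List[SiteMetrics]) -> Optional[str]:
    """Return the VIP of the healthy site with the fewest connections."""
    healthy = [s for s in sites if s.up]
    if not healthy:
        return None  # nothing healthy to hand out
    return min(healthy, key=lambda s: s.active_connections).vip


sites = [
    SiteMetrics("site-east", "192.0.2.10", up=True, active_connections=120),
    SiteMetrics("site-west", "198.51.100.10", up=True, active_connections=45),
    SiteMetrics("site-apac", "203.0.113.10", up=False, active_connections=3),
]
print(pick_vip(sites))
```

The down site is ignored even though it has the fewest connections, which is the point of combining health with load.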
There are two means by which the local GSLB NetScaler monitors the status of remote sites. MEP is the best way to monitor remote sites, as it includes all this useful information allowing for intelligent GSLB decisions, but it isn't the only way. You can also bind monitors for this purpose. The best practice is to use the Use Monitors when MEP is down option. This way, you indicate a preference for MEP but will fall back to monitors should there be a problem with the MEP exchange.
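The preference order described here can be sketched as a small decision function. This is only a model of the idea (trust MEP while it is up, consult bound monitors when it is not). The state strings and the `"UNKNOWN"` result for the no-monitor case are assumptions made for the example, not documented NetScaler behavior.

```python
from typing import Optional


def effective_state(mep_up: bool,
                    mep_reported_state: str,
                    monitor_state: Optional[str]) -> str:
    """Model of 'use monitors when MEP is down'."""
    if mep_up:
        # MEP is healthy, so its view of the remote service wins.
        return mep_reported_state
    if monitor_state is not None:
        # MEP exchange has failed; fall back to the bound monitor.
        return monitor_state
    # Assumption for this sketch: nothing is left to judge the service by.
    return "UNKNOWN"


print(effective_state(True, "UP", "DOWN"))
print(effective_state(False, "UP", "DOWN"))
```

Note that a stale MEP state is deliberately ignored once MEP itself is down.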
RPC settings give you control over MEP exchange. When you add a GSLB site, it will automatically result in an RPC node entry to give you control over how that site communicates.
You can view these entries using show rpcnode. In the following screenshot, 1 is NetScaler HA related; it is for the NSIP. 2 and 3 are for MEP. The entry says that to reach site 192.168.1.51 (site IP), use the password shown, use any available source, and use non-secure communication.
It is good practice, especially if the communication between the sites happens over the Internet, to enable secure MEP, change the default password, and set a specific source IP so that it sits well with your firewall rules.
In the preceding example, I am choosing the recommended settings and specifying that I want to use my SNIP, which is also my local site IP, to talk to the remote sites.
Now with an understanding of the considerations, let's approach some of the common issues that NetScaler admins run into with GSLB and how to troubleshoot them.
When troubleshooting GSLB, it is important to query the ADNS service directly to understand whether the behavior is as expected or incorrect. There are several points in the GSLB path where caching might mean you are not looking at the NetScaler's exact response. These are:
The amount of time should ideally be based on the TTL, but some devices cache entries for much longer as a way of reducing upstream DNS traffic. You have limited ability to override these caches because they are external entities, so for troubleshooting the best method is to use nslookup or dig to query the ADNS directly.
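If you would rather script the direct query than run it by hand, the following is a minimal sketch using only the Python standard library. It sends a non-recursive A query straight to an ADNS IP and prints each returned address with its TTL, so intermediate caches are out of the picture. The function names are invented for this example, and the ADNS address you pass in is whatever your own ADNS service IP is.

```python
import socket
import struct


def build_a_query(name, txid=0x1234):
    """Build a DNS A query with recursion not requested (flags 0x0000)."""
    header = struct.pack(">HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer is two bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Return [(ip, ttl), ...] for the A records in the answer section."""
    _, _, qdcount, ancount, _, _ = struct.unpack(">HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack(
            ">HHIH", packet[offset:offset + 10])
        offset += 10
        rdata = packet[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rdlength == 4:
            records.append((socket.inet_ntoa(rdata), ttl))
    return records


def query_adns(adns_ip, name, timeout=3.0):
    """Send the query over UDP port 53 directly to the ADNS service."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name), (adns_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_a_records(response)
```

Calling `query_adns(<your ADNS IP>, "web.xmx.lab")` a few times in a row shows which IPs the NetScaler itself hands out and with what TTL, which you can then compare against what the client's resolver reports.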
If MEP is configured but shows as Down, this is either a network-related issue (the most common reason) or an RPC configuration issue.
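To rule the network in or out quickly, a plain TCP connection test from a host on the same path can help. The sketch below assumes the usual MEP ports, TCP 3011 for non-secure and TCP 3009 for secure exchange. The helper names are made up for this example, and the remote site IP is whatever you supply.

```python
import socket


def mep_port(secure: bool) -> int:
    """MEP port to test: 3009 when secure is enabled, otherwise 3011."""
    return 3009 if secure else 3011


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, `can_reach("192.168.1.51", mep_port(secure=False))` tests the non-secure port on the site IP used earlier. A successful connect only shows that the path is open from where the script runs; it says nothing about the RPC password or about source-IP rules that apply specifically to the NetScaler's own addresses.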
Network considerations for MEP:
The default configuration will not fail as such, but, as we discussed, there are good reasons why you might want to adjust the RPC configuration. When doing so, ensure that the password settings match on both sides. The following nsconmsg command is handy when looking at MEP issues.
The output will tell you:
... gslb_tot_gslb_msgs_rcvd going up on the other, it's time to take a trace and follow traffic on ports 3009 or 3011, depending on whether secure is enabled or disabled.
Proximity-based methods use the location of the Local DNS server (LDNS) of a User as a guide to where the User should be directed. Again, the closer the User is to the LDNS, the more accurate the decision, because the LDNS IP is all the NetScalers will see, not the actual Client IP.
There are two methods of Proximity:
The only troubleshooting step for this method is to verify that these protocols (or at least one of them) are able to pass through the firewalls so that RTT can be determined.
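To see why blocked probes matter, consider this small model of an RTT-based choice. The site names and RTT values are hypothetical, and the logic is an illustration rather than the NetScaler's algorithm. A site whose probes never get through has no RTT and so can never win on proximity.

```python
from typing import Dict, Optional


def pick_by_rtt(rtt_ms: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the site with the lowest measured RTT to the LDNS.

    A value of None means no probe reached the LDNS from that site.
    """
    measured = {site: rtt for site, rtt in rtt_ms.items() if rtt is not None}
    if not measured:
        return None  # no RTT data at all, so proximity cannot decide
    return min(measured, key=measured.get)


print(pick_by_rtt({"site-east": 42.0, "site-west": 18.5, "site-apac": None}))
```

If every site returns None, the proximity decision has nothing to work with, which is what a firewall dropping all probe traffic would look like.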
... .csv file) or custom location entries to identify the best VIP for a client. Note that in 11.0, owing to popular demand, Citrix now includes a built-in GeoIP database for this purpose.
If static proximity is not working as it should, check the following:
... show locationFile command, as follows:
... nsmap command. This will tell you which site the NetScaler thinks the Client belongs to. This should explain any mismatch between where you expect the client to be directed and where they are actually being sent.
In the absence of such an entry, you will see round robin behavior, which is suboptimal.