Global Server Load Balancing

Global Server Load Balancing (GSLB) does for data centers what load balancing does for servers. It directs users to the best possible data center for a resource, where "best" can mean the closest, the healthiest, or the least loaded.

GSLB uses a proprietary protocol called Metric Exchange Protocol (MEP) to obtain the metrics needed for this decision. The result of GSLB processing is an Address (A) record, returned in response to a DNS request, which the client then uses to obtain the resource.

In this section, we'll look at what a typical first-time flow looks like and what some of the considerations are for GSLB to work correctly, and then move on to troubleshooting.

GSLB flow

It is important to understand the several layers involved in a GSLB conversation to be able to troubleshoot it. Let's take a look at an example of a User wanting to launch the home page of web.xmx.lab, which belongs to the xmx.lab domain:

  1. User accesses web.xmx.lab for the first time.
  2. The DNS request for web.xmx.lab lands on the local DNS (LDNS) that the User's computer uses for DNS resolution.
  3. LDNS doesn't know the answer to this request, so it looks up the addresses of the root servers it knows and contacts them for further direction.
  4. The root will not know the address either, but let's say it knows the TLD server that is aware of all .lab domains. In this case, it will return this TLD server's address to the LDNS.
  5. LDNS contacts the TLD server, which in turn provides the IP address of the DNS server authoritative for xmx.lab.
  6. LDNS then contacts the authoritative name server for xmx.lab, and this is the point at which GSLB comes into the picture. Instead of directly providing an IP that represents web.xmx.lab, the authoritative name server provides the IP address of the ADNS service on one of the several NetScalers participating in GSLB.
  7. At this point, NetScaler, given its visibility into the various site metrics, decides which site IP is best to return and provides it to the client through the ADNS service.
  8. The client finally sends the actual request (HTTP, HTTPS, and so on) to this IP, treating it as the IP of web.xmx.lab.

    Step 6 in the preceding flow can also use CNAMEs instead of the ADNS service IP. In this case, the authoritative DNS server sends CNAME records for one of the ADNS services, along with their corresponding IPs (glue records). The rest of the process is the same.
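To make the referral chain concrete, here is a toy model of steps 2 to 7, assuming an in-memory referral table in place of real DNS servers. The server names (`tld-lab`, `auth-xmx`, `netscaler-adns`) and the `gslb_pick` callback are hypothetical; a real LDNS speaks the DNS wire protocol and follows NS and glue records, which this sketch only imitates.

```python
from typing import Callable, List, Tuple

# Hypothetical referral table: server -> (suffix it delegates, server to ask next).
REFERRALS = {
    "root": (".lab", "tld-lab"),                    # step 4: root refers to the .lab TLD
    "tld-lab": (".xmx.lab", "auth-xmx"),            # step 5: TLD refers to xmx.lab's authoritative server
    "auth-xmx": ("web.xmx.lab", "netscaler-adns"),  # step 6: authoritative server hands off to ADNS
}


def resolve(name: str, gslb_pick: Callable[[str], str]) -> Tuple[List[str], str]:
    """Follow referrals the way an LDNS would for a name it has not cached."""
    path = ["ldns"]   # step 2: the client's query lands on its LDNS
    server = "root"   # step 3: the LDNS starts at a root server
    while server in REFERRALS:
        path.append(server)
        suffix, next_server = REFERRALS[server]
        if not name.endswith(suffix):
            raise LookupError(f"{server} has no referral for {name}")
        server = next_server
    path.append(server)  # the NetScaler ADNS service is the last hop
    # step 7: the ADNS service answers with the IP that GSLB selected
    return path, gslb_pick(name)


path, ip = resolve("web.xmx.lab", lambda name: "203.0.113.10")
print(" -> ".join(path), ip)
```

The only point where GSLB logic runs is the final hop; everything before it is ordinary DNS delegation, which is why a problem earlier in the chain never reaches the NetScaler at all.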

Metric Exchange Protocol

Understanding the exchange that happens between two (or more) NetScalers participating in GSLB is key to troubleshooting. MEP packets are how these NetScalers tell each other about key statistics for each GSLB service (and, in turn, the configured LB vServers and services it represents). These statistics include:

  • The state of the GSLB service
  • Open connections
  • Surge queue values
  • The number of requests being handled
  • LDNS RTT information
  • Persistence information

This allows the NetScaler that receives the DNS request to decide which IP to return in its response.
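As an illustration of how such metrics can drive the choice, the sketch below applies two common selection ideas, least connections and lowest RTT to the LDNS, to a handful of MEP-style records. The field and function names are hypothetical, and the real NetScaler decision depends on the configured GSLB method; this is a simplified model, not its implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GslbServiceMetrics:
    """A few of the per-service values MEP carries (field names are hypothetical)."""
    service_ip: str
    is_up: bool
    open_connections: int
    ldns_rtt_ms: Optional[float] = None  # RTT from this site to the requesting LDNS


def pick_least_connection(services: List[GslbServiceMetrics]) -> Optional[str]:
    """Return the IP of the UP service with the fewest open connections."""
    candidates = [s for s in services if s.is_up]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.open_connections).service_ip


def pick_lowest_rtt(services: List[GslbServiceMetrics]) -> Optional[str]:
    """Return the IP of the UP service whose site has the lowest RTT to the LDNS."""
    candidates = [s for s in services if s.is_up and s.ldns_rtt_ms is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.ldns_rtt_ms).service_ip
```

Note that a service reported as down is excluded before any metric is compared, which is why an incorrect state in MEP can redirect users as effectively as a real outage.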

MEP versus monitors

There are two ways in which the local GSLB NetScaler can monitor the status of remote sites. MEP is the better option, because it carries all of the information listed above and so allows intelligent GSLB decisions, but it isn't the only one: you can also bind monitors for this purpose. The best practice is to enable the Use Monitors when MEP is down option. This way, you indicate a preference for MEP but fall back to monitors should there be a problem with the MEP exchange.
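The preference order can be modelled in a few lines. This is a minimal sketch assuming a simple three-valued state; the function and parameter names are hypothetical, and what NetScaler actually does when neither MEP nor a monitor is available is not modelled here (the sketch just reports UNKNOWN).

```python
from enum import Enum
from typing import Optional


class State(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"


def remote_service_state(mep_connected: bool,
                         mep_state: Optional[State],
                         monitor_state: Optional[State],
                         use_monitors_when_mep_down: bool = True) -> State:
    """Decide which source of truth to trust for a remote GSLB service."""
    # Preferred source: the state carried in MEP updates while MEP is up.
    if mep_connected and mep_state is not None:
        return mep_state
    # Fallback: a bound monitor, consulted only when MEP is down and the option is on.
    if use_monitors_when_mep_down and monitor_state is not None:
        return monitor_state
    # Neither source is usable; real NetScaler behaviour here is outside this sketch.
    return State.UNKNOWN
```

The key property is that a bound monitor is ignored while MEP is healthy, so monitor results only matter during a MEP outage.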

RPC considerations

RPC settings give you control over the MEP exchange. Adding a GSLB site automatically creates an RPC node entry, which lets you control how that site communicates.

You can view these entries using show rpcnode. In the following screenshot, entry 1 is related to NetScaler HA (it is for the NSIP), while entries 2 and 3 are for MEP. The MEP entry for site 192.168.1.51 (the site IP) says to reach that site using the password shown, any available source IP, and nonsecure communication.


It is good practice, especially if the sites communicate over the Internet, to enable secure MEP, change the default password, and set a specific source IP so that MEP traffic fits your firewall rules.


In the preceding example, I am choosing the recommended settings and specifying that I want to use my SNIP, which is also my local site IP, to talk to the remote sites.
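If you manage many sites, the three recommendations can be checked mechanically. The sketch below assumes you have already transcribed each MEP-related RPC node into a simple record; the `RpcNode` fields and the `"*"` marker for "any available source" are hypothetical representations, not the exact `show rpcnode` output format. On the NetScaler CLI, the corresponding settings are the `-password`, `-srcIP`, and `-secure` options of `set ns rpcNode`.

```python
from dataclasses import dataclass
from typing import List

ANY_SOURCE = "*"  # assumption: how "use any available source" is recorded here


@dataclass
class RpcNode:
    """Hypothetical, simplified view of one MEP-related RPC node entry."""
    site_ip: str
    secure: bool
    password_is_default: bool
    source_ip: str = ANY_SOURCE


def audit_rpc_node(node: RpcNode) -> List[str]:
    """Flag settings that differ from the recommended practices described above."""
    findings = []
    if not node.secure:
        findings.append(f"{node.site_ip}: secure MEP is disabled")
    if node.password_is_default:
        findings.append(f"{node.site_ip}: default RPC password still in use")
    if node.source_ip == ANY_SOURCE:
        findings.append(f"{node.site_ip}: no fixed source IP for firewall rules")
    return findings
```

Remember that any password change has to be made consistently on both sides, or the hardened configuration will bring MEP down.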

Troubleshooting GSLB

Now that we understand these considerations, let's look at some of the common issues that NetScaler admins run into with GSLB and how to troubleshoot them.

DNS caching and GSLB

When troubleshooting GSLB, it is important to query the ADNS service directly to determine whether the behavior is as expected. There are several points in the GSLB path where caching might mean you are not looking at the NetScaler's exact response:

  • End User application (for example, browser)
  • The OS
  • The LDNS
  • Upstream DNS servers of the ISP

Ideally, how long an entry stays cached should be governed by its TTL, but some devices cache entries for much longer to reduce upstream DNS traffic. Because these caches are external entities, you have limited means of overriding them; for troubleshooting, at least, the best method is to use nslookup or dig to query the ADNS service directly.


Querying ADNS directly to verify GSLB results
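When nslookup or dig is not available on the troubleshooting host, the same direct query can be sent with nothing but the Python standard library. The sketch below builds a minimal A-record query in RFC 1035 wire format and sends it over UDP straight to the ADNS IP, so no browser, OS, or LDNS cache is involved; it is roughly what `dig @<adns-ip> web.xmx.lab A` does. It deliberately ignores truncation and EDNS, and the helper names are hypothetical.

```python
import socket
import struct
from typing import List, Tuple


def build_a_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags=0 (no recursion requested), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # a 2-byte compression pointer ends the name
            return offset + 2
        if length == 0:            # the zero-length root label ends the name
            return offset + 1
        offset += 1 + length


def parse_a_records(msg: bytes) -> List[Tuple[str, int]]:
    """Return (IPv4 address, TTL) pairs from the answer section."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            records.append((socket.inet_ntoa(msg[offset:offset + 4]), ttl))
        offset += rdlength  # skip CNAMEs and any other record types
    return records


def query_adns(adns_ip: str, name: str, timeout: float = 2.0) -> List[Tuple[str, int]]:
    """Send one UDP query directly to the ADNS service and return its A records."""
    query = build_a_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (adns_ip, 53))
        response, _ = sock.recvfrom(4096)
    if response[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    return parse_a_records(response)
```

For example, `query_adns("192.0.2.53", "web.xmx.lab")` (with a hypothetical ADNS IP) returns the addresses and TTLs exactly as the NetScaler sent them. Comparing that TTL with what a client-side lookup reports is a quick way to tell whether an intermediate cache is holding on to an old answer.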

MEP down issues

If MEP is configured but shows as Down, the cause is either a network-related issue (the most common reason) or an RPC configuration issue.


MEP showing as Down

Network considerations for MEP:

  • MEP uses one of two ports, depending on whether secure MEP is enabled:
    • 3011 if nonsecure
    • 3009 if secure

    Firewall rules must allow bidirectional communication on these ports.

  • MEP connections are long-lived; intermediate devices should not attempt to tear them down.
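A quick way to rule the network in or out is a plain TCP connection test to the remote site's MEP port. This is a sketch for a Linux or admin host sitting on the same path as the NetScaler; the optional `source_ip` only works if that host actually owns the address, and a successful test from a different host does not prove the NetScaler's own path is open. Run it in both directions, since the rules must be bidirectional. The function names are hypothetical.

```python
import socket
from typing import Optional

MEP_PORT_NONSECURE = 3011
MEP_PORT_SECURE = 3009


def mep_port(secure: bool) -> int:
    """Pick the MEP port that matches the RPC node's secure setting."""
    return MEP_PORT_SECURE if secure else MEP_PORT_NONSECURE


def can_reach_mep(site_ip: str, secure: bool, source_ip: Optional[str] = None,
                  timeout: float = 3.0) -> bool:
    """Attempt a TCP connection to a remote site's MEP port."""
    source = (source_ip, 0) if source_ip else None  # bind only if a source IP is given
    try:
        with socket.create_connection((site_ip, mep_port(secure)), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:  # refused, timed out, unreachable, or source IP not local
        return False
```

A connection that succeeds but drops after some idle period points to the long-lived-connection issue above rather than to a blocked port.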

RPC related issues

The default configuration never fails as such, but, as we discussed, there are good reasons why you might want to adjust the RPC configuration. When you do, ensure that the password settings match on both sides. The following nsconmsg command is handy when looking at MEP issues.


The nsconmsg command for troubleshooting MEP

The output will tell you:

  • Whether MEP messages are being sent and received. If they are being sent on one end but gslb_tot_gslb_msgs_rcvd is not increasing on the other, it's time to take a trace and follow the traffic on port 3009 or 3011, depending on whether secure MEP is enabled
  • Whether there are any RPC related errors contributing to MEP failures
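The send/receive comparison boils down to taking two snapshots of the counters on each end and comparing the deltas. In the sketch below, only `gslb_tot_gslb_msgs_rcvd` comes from the text; the sender-side counter name is left as a parameter because it is not given here, and the snapshot dictionaries are a hypothetical format you would fill from the nsconmsg output yourself.

```python
from typing import Dict

RCVD_COUNTER = "gslb_tot_gslb_msgs_rcvd"  # counter named in the text above


def counter_delta(before: Dict[str, int], after: Dict[str, int], name: str) -> int:
    """Change in one counter between two snapshots (missing counters count as 0)."""
    return after.get(name, 0) - before.get(name, 0)


def diagnose_mep(sender_before: Dict[str, int], sender_after: Dict[str, int],
                 receiver_before: Dict[str, int], receiver_after: Dict[str, int],
                 sent_counter: str, secure: bool) -> str:
    """Compare both ends over the same interval and suggest the next step."""
    sent = counter_delta(sender_before, sender_after, sent_counter)
    rcvd = counter_delta(receiver_before, receiver_after, RCVD_COUNTER)
    port = 3009 if secure else 3011
    if sent <= 0:  # also covers a counter reset between snapshots
        return "sender is not sending MEP messages"
    if rcvd <= 0:
        return f"sent but not received; trace traffic on port {port} at both ends"
    return "MEP messages are flowing"
```

Take both pairs of snapshots over the same interval; otherwise a slow MEP update cycle can look like a delivery failure.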

Troubleshooting proximity-based methods

Proximity-based methods use the location of a user's local DNS server (LDNS) as a guide to where the user should be directed. The closer the user is to their LDNS, the more accurate the decision, because the NetScalers see only the LDNS IP, not the actual client IP.

There are two proximity methods:

  • Dynamic proximity: Each GSLB partner NetScaler calculates RTT values to the client (in practice, to the client's LDNS) to determine the best location for the user. The ADNS service that fronts dynamic RTT relies on being able to use one of three probes: ICMP, DNS, and TCP, tried in that order. The SNIP initiates these probe connections. The RTT table is populated from the time taken to receive a response, and what is learned is also shared between the sites using MEP.

    The only troubleshooting step for this method is to verify that these protocols (or at least one of them) are able to pass through the firewalls so that RTT can be determined.

  • Static proximity: This is more widely used than dynamic proximity. It involves the NetScaler referring to a database (.csv file) or custom location entries to identify the best VIP for a client. Note that in 11.0, owing to popular demand, Citrix now includes a built-in GeoIP database for this purpose.
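The dynamic proximity probe order can be pictured as a simple fallback chain. In this sketch, probes are injected as callables because real ICMP, DNS, and TCP probes need raw sockets or network access; the probe callables, table layout, and function names are all hypothetical. It shows why firewalls matter: if every probe type is blocked, no RTT is ever recorded for that LDNS.

```python
from typing import Callable, Dict, Optional, Tuple

# A probe takes the LDNS IP and returns an RTT in milliseconds, or None if no reply.
Probe = Callable[[str], Optional[float]]
PROBE_ORDER = ("ICMP", "DNS", "TCP")  # order described in the text


def measure_rtt(ldns_ip: str, probes: Dict[str, Probe]) -> Optional[Tuple[str, float]]:
    """Try each probe type in order; the first one that gets a reply wins."""
    for kind in PROBE_ORDER:
        probe = probes.get(kind)
        if probe is None:
            continue
        rtt = probe(ldns_ip)
        if rtt is not None:
            return kind, rtt
    return None  # every probe blocked: nothing can be learned for this LDNS


def record_rtt(table: Dict[str, Dict[str, float]], site: str, ldns_ip: str,
               probes: Dict[str, Probe]) -> None:
    """Store this site's RTT to the LDNS (MEP would share such entries between sites)."""
    result = measure_rtt(ldns_ip, probes)
    if result is not None:
        table.setdefault(ldns_ip, {})[site] = result[1]


def closest_site(table: Dict[str, Dict[str, float]], ldns_ip: str) -> Optional[str]:
    """Return the site with the lowest recorded RTT to this LDNS, if any."""
    rtts = table.get(ldns_ip)
    if not rtts:
        return None
    return min(rtts, key=rtts.get)
```

In this model, a site whose probes are all filtered simply never appears in the table for that LDNS, so it can never be chosen as the closest site.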

If static proximity is not working as it should, check the following:

  • If you are using the database method, verify that it is correctly loaded with the show locationFile command.
  • Query the file with the Client's IP using the nsmap command. This will tell you which site the NetScaler thinks the Client belongs to. This should explain any mismatch between where you expect the client to be directed to versus where they are actually being sent.
  • If you are using custom entries, ensure that the service IP (10.72.142.55 in the screenshot) falls within the boundaries of a specific location entry (from 10.72.142.1 to 10.72.142.75). This is how the NetScaler identifies which site lies in which location.

    In the absence of such an entry, you will see round-robin behavior, which is suboptimal.

  • Also bear in mind that custom entries and any DNS views take precedence over database entries. So search the configuration file to ensure that there isn't such an overriding configuration, which might explain the behavior you are seeing.
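The checks above can be tied together in a small model of static proximity. This sketch assumes exact location matches and IPv4 ranges; real NetScaler location matching compares hierarchical location qualifiers, so treat it as a simplification. `LocationEntry`, `locate`, and `StaticProximity` are hypothetical names. It captures three behaviors from the list: custom entries are consulted before the database, the service IP must fall inside a location range, and with no usable match the choice degrades to round robin.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationEntry:
    """One IP range mapped to a location (hypothetical, simplified form)."""
    start_ip: str
    end_ip: str
    location: str

    def contains(self, ip: str) -> bool:
        addr = ipaddress.IPv4Address(ip)
        return ipaddress.IPv4Address(self.start_ip) <= addr <= ipaddress.IPv4Address(self.end_ip)


def locate(ip: str, custom: List[LocationEntry], database: List[LocationEntry]) -> Optional[str]:
    """Custom entries are checked first because they take precedence over the database."""
    for entry in custom + database:
        if entry.contains(ip):
            return entry.location
    return None


class StaticProximity:
    """Send each LDNS to a service in the same location, else round robin."""

    def __init__(self, custom: List[LocationEntry], database: List[LocationEntry]):
        self.custom = custom
        self.database = database
        self._next = 0  # round-robin position used when no location matches

    def choose(self, ldns_ip: str, service_ips: List[str]) -> str:
        ldns_location = locate(ldns_ip, self.custom, self.database)
        if ldns_location is not None:
            for service_ip in service_ips:
                if locate(service_ip, self.custom, self.database) == ldns_location:
                    return service_ip
        # No usable match (unknown LDNS or service location): fall back to round robin.
        service_ip = service_ips[self._next % len(service_ips)]
        self._next += 1
        return service_ip
```

Using the range from the screenshot, a custom entry covering 10.72.142.1 to 10.72.142.75 places service 10.72.142.55 in that location. Removing the entry in this model makes every query for that LDNS alternate between services, which matches the round-robin symptom described above.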