Geolocation has long been an important input to Global Server Load Balancing (GSLB) and server selection. Ideally, geolocation information lets every client request be directed to the nearest available data center based on the physical location of the client. Every IP packet carries a header that contains the IP address of the sender. Several frequently updated databases map blocks of IP addresses to geographic regions, and many services rely on them. A database entry can associate an IP address with a country, a region, or a city.
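A lookup against such a database can be sketched as a longest-prefix match over IP blocks. The table below is a hypothetical excerpt: the prefixes are reserved documentation ranges, the locations are invented for illustration, and `GEO_BLOCKS` and `locate` are names introduced here rather than part of any real database product.

```python
import ipaddress

# Hypothetical database excerpt: documentation prefixes with made-up locations.
GEO_BLOCKS = [
    (ipaddress.ip_network("192.0.2.0/24"), ("US", "California", "San Francisco")),
    (ipaddress.ip_network("192.0.2.128/25"), ("US", "California", "Los Angeles")),
    (ipaddress.ip_network("198.51.100.0/24"), ("US", "New York", "New York")),
    (ipaddress.ip_network("203.0.113.0/24"), ("DE", "Hesse", "Frankfurt")),
]


def locate(ip):
    """Return (country, region, city) for the most specific matching block, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, loc) for net, loc in GEO_BLOCKS if addr in net]
    if not matches:
        return None
    # The longest prefix is the most specific entry covering this address.
    _, location = max(matches, key=lambda match: match[0].prefixlen)
    return location
```

An address such as `192.0.2.200` falls in both the /24 and the /25 block, and the more specific /25 entry is returned. Real databases hold far more entries and would typically use a trie or a sorted index instead of a linear scan.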
The main issue with geolocation is that it is effective only under certain conditions. The accuracy of the database entries is usually high at the country level, but it falls off at the regional level and especially at the city level. This is a problem in large countries with millions of users, where the estimated location can be off by hundreds of kilometers. Service performance can suffer if, for example, a user on the West Coast is directed to a server on the East Coast because of poor geolocation. Geolocation also cannot distinguish between two or more data centers that lie at roughly the same geographic distance from the user but at very different network distances. A further problem is that the result is sometimes the geographic location of the recursive DNS server rather than that of the client trying to reach the service.
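The effect of a location error on server selection can be illustrated with a minimal sketch that picks the geographically nearest data center by great-circle (haversine) distance. The site names, the approximate coordinates, and the scenario of a client in Denver that the database places near Kansas City are all assumptions chosen for illustration.

```python
import math

# Hypothetical sites with approximate (latitude, longitude) coordinates.
DATA_CENTERS = {
    "west-coast": (37.77, -122.42),  # near San Francisco
    "east-coast": (40.71, -74.01),   # near New York
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_site(client_location):
    """Name of the data center with the smallest geographic distance."""
    return min(DATA_CENTERS,
               key=lambda site: haversine_km(client_location, DATA_CENTERS[site]))


true_location = (39.74, -104.99)      # assumed: client is actually near Denver
database_location = (39.10, -94.58)   # assumed: database says near Kansas City
```

With these assumed coordinates, `nearest_site(true_location)` selects the western site while `nearest_site(database_location)` selects the eastern one, so a database error of several hundred kilometers is enough to flip the decision.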
Privacy is another concern with geolocation. Some users prefer to remain anonymous and not reveal their location, so they use proxy servers and VPNs to hide it. This, however, can degrade the performance of the service, or the service may be blocked altogether if a proxy server or VPN is detected.
Even precise and accurate geolocation does not guarantee the best possible network and service response time, because geographic distance and network (topological) distance can differ significantly. A study was conducted with 6 clients accessing 2 different servers within the same data center, where each server was connected to the Internet through two different ISPs that belong to different autonomous systems. Each client also used a different ISP belonging to a different autonomous system, and all clients were located in the same city, within a radius of 3 km of the servers' data center. Because each of the six clients belongs to a different network, every client has a unique network distance to the servers even though all of them share the same geolocation. The study showed large differences in the measured RTT (Round Trip Time) and in the number of hops from clients to servers. For a single client reaching the same server through its two different ISPs, the largest difference was 8 hops (a reduction from 19 to 11 hops) and 57 ms of RTT (a reduction from 85 ms to 28 ms). This decrease of 42% in hop count and 67% in RTT arises because, in many cases, different ISPs do not have a direct interconnection close enough to both endpoints. Here the client was geographically 3 km from the server, but in the first case the topological distance was almost 2000 km, since the IXP (Internet Exchange Point) connecting the client's and the server's ISPs was far away in a different country. The largest difference in hop count and RTT between clients accessing the same server was 15 hops and 63 ms (19 hops and 85 ms for the topologically furthest client and 5 hops and 23 ms for the topologically closest client). This confirms that geolocation alone is not sufficient to determine the optimal server among multiple available network services, as geographic and network distances can differ significantly.
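The 42% and 67% figures follow directly from the hop counts and RTT values quoted above. The short calculation below reproduces them; `percent_decrease` is a helper name introduced here for illustration.

```python
def percent_decrease(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


hops = percent_decrease(19, 11)  # 8 fewer hops out of 19
rtt = percent_decrease(85, 28)   # 57 ms less out of 85 ms

print(f"hops: {hops:.0f}%, RTT: {rtt:.0f}%")  # prints "hops: 42%, RTT: 67%"
```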
As noted above, DNS-based geolocation relies on the IP address of the recursive DNS server instead of that of the actual client. EDNS Client Subnet (ECS) is an option in the Extension Mechanisms for DNS that allows a recursive DNS resolver to state the subnetwork of the host or client on whose behalf it is making a DNS query. It is mainly used to speed up delivery from content delivery networks: DNS-based load balancing can select a service address near the client even when the client is not close to the recursive resolver. A privacy drawback of ECS is that, with the feature turned on, the network address of the client that initiated the resolution becomes visible to all servers involved in the resolution process. It is also visible from any network traversed by the DNS packets. RFC 7871, which specifies ECS, strongly recommends that the feature be turned off by default in all nameserver software and that operators enable it explicitly only in those circumstances where it provides a clear benefit for their clients. It also encourages the deployment of means that allow users to make use of the opt-out provided.
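To make concrete what a resolver discloses, the sketch below encodes the ECS option in the wire layout defined by RFC 7871: option code 8, then a length, an address family, a source prefix length, a scope prefix length (zero in queries), and only as many address bytes as the prefix requires. The function name `build_ecs_option` and the choice of a 24-bit prefix in the example are assumptions made for illustration; this is not a complete DNS message builder.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet


def build_ecs_option(client_ip, source_prefix_len):
    """Encode an ECS option for a query, exposing only the client's prefix."""
    # strict=False zeroes the host bits beyond the prefix instead of raising.
    net = ipaddress.ip_network(f"{client_ip}/{source_prefix_len}", strict=False)
    family = 1 if net.version == 4 else 2  # address family: 1 = IPv4, 2 = IPv6
    n_bytes = (source_prefix_len + 7) // 8  # send only the bytes the prefix covers
    address = net.network_address.packed[:n_bytes]
    # FAMILY (2 bytes), SOURCE PREFIX-LENGTH (1), SCOPE PREFIX-LENGTH (1, zero in queries)
    payload = struct.pack("!HBB", family, source_prefix_len, 0) + address
    # OPTION-CODE (2 bytes), OPTION-LENGTH (2 bytes), then the payload
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


option = build_ecs_option("192.0.2.77", 24)
```

For the client `192.0.2.77` with a 24-bit prefix, only the bytes for `192.0.2` leave the resolver; the last octet is dropped. The shorter the prefix, the less the authoritative servers and on-path networks learn about the client, at the cost of coarser server selection.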
Network distance measurement, however, is a far more reliable technique than geolocation. By measuring the network distance (round-trip time) between the client and the available data sites, the client can be directed to the data center and server that is topologically closest and can therefore provide the best performance.
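A minimal sketch of this idea follows, assuming RTT is approximated by the time needed to complete a TCP handshake and that the site with the lowest median sample wins. The function names, the port, and the use of the median are illustrative assumptions, not a description of any particular load balancer.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host, port=443, timeout=2.0):
    """Approximate RTT in ms as the duration of one TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def pick_lowest_rtt(samples_by_site):
    """Return the site with the lowest median RTT, ignoring sites without samples."""
    medians = {site: statistics.median(samples)
               for site, samples in samples_by_site.items() if samples}
    if not medians:
        return None
    return min(medians, key=medians.get)
```

Given samples such as `{"dc-a": [85.0, 83.0, 90.0], "dc-b": [28.0, 30.0, 27.0]}`, `pick_lowest_rtt` returns `"dc-b"` regardless of where either site is located geographically. Using the median rather than a single probe reduces the influence of one unusually slow or fast measurement.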