Distance Calculations
A review of distance calculations and a retrospective on simple bounding.
A few years ago, I was working on a project that utilized a
lot of telemetry data and with that a lot of discussions around how to calculate
distance between coordinates. Obviously,
there are known ways to calculate the distance with great accuracy, but they
are computationally costly.
The core issue we focused on was with trying to dynamically determine job sites. We wanted to understand when a crew was working, so we wanted to see when multiple units were near units that had specific triggers in the telemetry. So, when a job site was defined as potential, we wanted to do a check for how many units were within a specific radius at that time. We had a lot of discussions around implementing a distance check with a radial distance vs bounding boxes. At the time we were thinking in terms of radial distance, or specific point to point distance. Our developers at the time were concerned that the number of calculations per day could be overwhelmingly burdensome to any process and would tank a good system.
So, what calculation would be burdensome? The go-to distance calculation is the haversine
formula, also known as “great circle distance”, assumes two sets of latitude
and longitude ${(latitude_1, {longitude}_1), ( {latitude}_2, {longitude}_2)}$
$$ {Distance} = acos(sin( {lat}_1)*sin( {lat}_2) + cos( {lat}_1)*cos( {lat}_2)*cos( {lon}_1- {lon}_2)) $$ Note: acos is the function for arccosine or cos-1.
This is a difficult formula, and it takes good testing to make sure all the sines and cosines are properly accounted.
For development speed we landed on quick boxes, simply truncating the decimal latitude and longitude to .0001 or .001
rounding distances. 0.0001 degree is roughly 1.11m and 0.001 is 11.11m at the equator. This was fine, but we always
wanted to look at solutions that worked better with radial distance and what the error was from simple degree rounding boxes.
This is a deep dive I didn't have the time for back then and what are some shorthand calculations to have lower error than bounding
boxes.
This is a quick look at boxing versus radius distance. This is on a plane, and the impact of that is on the next tab.
To compare the two, we’ll simplify down to the unit circle, a circle with a radius of 1 from the origin.
The circle and square have areas:
$$A_{circle} = π r^2, A_{square} = L*W$$
With a radius of 1, assuming the square is capturing the same maximum distances, the length and width are 2.
$$A_{circle} = π (1)^2, A_{square} = (2)* (2) ⟹ A_{circle} = π, A_{square} = 4$$
With that, the square captures 27.3% more space than the radial distance. A quick simulation shows how
~21% to 23% of total points in a square area outside of the circle.
So, we have a baseline error rate of roughly 23% more than radius calculation. The next tab will look at alternate
calculations to the haversine formula and see how much error they produce.
Above is a comparison between 3 alternate formulas compared to the haversine formula. The default example is from the Art Deco Welcome Center
in Miami Beach to the 8th Street lifegraud tower. This is just under 200m, and around the size of some check's we were looking at for job sites.
Most of the errors are between 0 and +/- 5% from the haversine calculation. Worst case then is around 10 meters, which is better than worst case
with bounding boxes of around 82 meters or 41.4%.
$$ 200 * √2 = 282.8427, 282.8427/200-1 = 41.4213% $$
Earth's Radius
Because Earth is an oblate spheroid, there is a frequently used mean radius of 6,371,009 meters. This
page
and this
wiki
have specifics about the measurement at the equator and poles, as well as the calculation of the mean and other uses.
Haversine Formula
For reference, the haversine formula, after converting all cooridates to radians:
$$ {Distance} = acos(sin( {lat}_1)*sin( {lat}_2) + cos( {lat}_1)*cos( {lat}_2)*cos( {lon}_1- {lon}_2)) * radius$$
The Polar Approximation
This function, and the others, rely on the standard Euclidean distance. This simplified method also converts cooridates to radians.
$$ Lat_{Mean} = (latitude_1 + latitude_2)/2 $$
$$ x = (longitude_1 - longitude_2)*πR/180 * cos(Lat_{Mean}*π/180) $$
$$ y = (latitude_1 - latitude_2)*πR/180 $$
$$ Distance = √(x^2 + y^2)$$
Simple Approximation
Taking from the Polar approximation, this looks at how accurate you can be with Euclidean distance. It is incredibly simple,
but works well for general approximation with a bigger error rate, but better than bounding.
$$ x = (longitude_1 - longitude_2) $$
$$ y = (latitude_1 - latitude_2) $$
$$ Distance = √(x^2 + y^2) * 105,000 $$
Modified Simple Approximation
This modification adds a weight to the 105,000 scaling factor. The error rate with the simple approximation is much lower with
small difference in latitude at 100,000, while it is far better with small longitude differences at 111,000. This attempts to
balance with a scale factor if there is more latitudinal difference or longitudinal difference.
$$ x = (longitude_1 - longitude_2), x_2 = x*π/180 $$
$$ y = (latitude_1 - latitude_2), y_2 = x*π/180 $$
$$ Distance = √(x^2 + y^2) * (105,000+log(y_2/x_2)*1000) $$