Distance is money. For businesses on the move, distance covered by the fleet is used to reimburse fuel, pay out contractors, charge customers, and expense workers. Inaccuracies in either direction lead to friction between the payer and payee. Over-billing means customers and businesses lose out, and under-billing means employees and contractors lose out. And yet, accurate distance computation remains elusive. This post illustrates the technical challenges using real-life examples and recommends a solution.

About the examples

Examples in this post use data from a bike delivery service in S.E. Asia. The examples use OSRM with OpenStreetMap data, self-hosted on AWS ECS, running with the following configuration.

{ "mode" : "driving", "threads": 10, "algorithm": "MLD", "max-viaroute-size": 500, "max-table-size": 500, "max-matching-size": 500, "max-nearest-size": 100, "max-alternatives": 3}

To know more about these OSRM configurations, please refer to Project-OSRM.

Although the ideas and concepts in this post are illustrated using OSRM, they hold just as well with any other map provider: Google Maps, Apple Maps, Mapbox, HERE Maps, TomTom or other regional map providers.
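As a minimal sketch of how a point-to-point estimate is requested from a self-hosted OSRM instance: the route service takes `lon,lat` pairs in the URL and reports distance in meters. The base URL and coordinates below are placeholders, and the helper names are ours for illustration, not part of OSRM.

```python
from urllib.parse import urlencode


def osrm_route_url(base_url, src, dst, profile="driving"):
    """Build an OSRM route request URL. src and dst are (lat, lon) tuples."""
    # OSRM expects coordinates as lon,lat pairs separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in (src, dst))
    query = urlencode({"overview": "false", "alternatives": "false"})
    return f"{base_url}/route/v1/{profile}/{coords}?{query}"


def route_distance_km(response):
    """Return the distance of the first route in km (OSRM reports meters)."""
    if response.get("code") != "Ok" or not response.get("routes"):
        raise ValueError(f"no route returned: {response.get('code')}")
    return response["routes"][0]["distance"] / 1000.0


# Placeholder host and coordinates, for illustration only.
url = osrm_route_url("http://localhost:5000", (1.30, 103.85), (1.35, 103.90))
```

The URL can be fetched with any HTTP client, and the decoded JSON body passed to `route_distance_km`.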

Using maps to estimate routes is false comfort

In the absence of actual locations, maps provide the best estimate for point-to-point distance. Drives are the primary contributor to chargeable distance. From two-wheelers to large trucks, the primary mode of transport for businesses is by motorable roads. Given the source and destination of a journey, maps will give you the route options from one to the other, including distances. Advanced maps provide routes that factor in vehicle type by eliminating roads that prohibit certain types of vehicles, or including turns that might only be accessible for those vehicles. Customers, fleets and businesses find these maps trustworthy because it is easy to open up maps on their phones or computers, and verify the distances between two points.

However, reality on the ground is different. Drivers might take detours due to business considerations like a customer or business request, or on-ground conditions like road closures or traffic. Maps might over-estimate routes, e.g. when legitimate turns are considered invalid, or under-estimate them, e.g. when wrong ways are considered legitimate. These errors erode trust when they are large and financially meaningful.

Fig 1: Actual route v Estimated route using maps

In the example above, the image on the left shows the actual route taken by the driver from point A (green marker) to point B (blue marker), while the image on the right shows the estimated route from A to B using OSRM.

The actual route shows that the driver took a lateral detour midway on the route and then came back on route. Due to this detour, the distance travelled increased from 5.7 km (estimated by maps from A to B) to 8.2 km, an increase of roughly 44%.

Additional discrepancies arise when the location assigned to a source or destination address (aka geocoding) is inaccurate. In some regions, this might happen as often as one out of three cases, and the geocoded location might stray from the actual location by several hundred meters.

Using the driver app for actual routes needs accurate locations

Business on the field is moving to apps. Apps are used by drivers to manage their work on the go. This presents the opportunity to use app location to get actual routes covered by the device and compute better distances. However, this is easier said than done.

App locations have inaccuracies. Getting locations from the OS and then to the server is unreliable. Using unreliable and inaccurate locations to compute actual distances might be more erroneous than point-to-point estimates.

Software teams often make the mistake of making it the map's problem. They feed inaccurate and unreliable locations to maps APIs to get routes.

Giving more locations to the map than just source and destination will result in more accurate routes and distances, right? Wrong!

Teams expect map matching or snap-to-road to remove inaccurate locations, and fill in location gaps with estimated routes. However, a typical setup that uses device locations with maps might observe that this fails 10-40% of the time. This is too large to ignore in a business where distances have material impact.

The issue is garbage-in-garbage-out. Maps are powerful, but with great power comes great responsibility. Let's take a closer look.

Errors introduced by maps

Here are two examples of locations thrown at maps to get actual routes and distances. Map matching over-fits by a factor of ~2.5 in one instance, and under-fits by a factor of ~15 in the other!

Over-fitting by Maps

Fig 2: Actual route v Map-matched route

The image on the left shows fairly accurate locations for a short 1.1 km drive. When the same set of locations was provided to maps for map matching, the output, as seen in the image on the right, showed several additional turns due to:

  1. One-ways: Maps disallowed driving the wrong way, though these might be private streets or lanes where two-wheelers are able to go the other way
  2. Small streets: Maps disallowed cutting through a small street, which the two-wheeler actually cut through in reality

As a result, maps over-fit the locations and computed the distance as 2.8 km, which is ~2.5 times the distance calculated using actual locations (1.1 km).

Under-fitting by Maps

Fig 3a: Actual v Map-matched route (zoomed out)

The image on the left shows a cluster of locations at the start of the journey (top), and another cluster of locations at the end (bottom). There was a tracking gap on the rider's phone, perhaps due to the app being terminated in the background or low-battery mode disallowing background location access.

When these locations were provided for map matching, the cluster of locations at the end was removed as noise. The map engine decided that these locations were too few and too far away to be real.

Please note that although some maps accept location timestamps as inputs, even leading commercial map matching algorithms lack the sophistication to use them meaningfully, resulting in false negatives like the one in this example. The onus is on your system to break the input into parts and use the right map configs for each.
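Breaking the input into parts can start with something as simple as splitting the trace wherever consecutive timestamps are too far apart. The sketch below assumes time-ordered `(timestamp_s, lat, lon)` tuples; the 60-second threshold and the function names are illustrative assumptions, not recommended values.

```python
def split_on_gaps(points, max_gap_s=60):
    """Split time-ordered (timestamp_s, lat, lon) tuples at large time gaps."""
    segments = []
    for point in points:
        # Extend the current segment only if this point follows closely in time.
        if segments and point[0] - segments[-1][-1][0] <= max_gap_s:
            segments[-1].append(point)
        else:
            segments.append([point])
    return segments


def gap_endpoints(segments):
    """Return (last point before gap, first point after gap) for each gap."""
    return [(a[-1], b[0]) for a, b in zip(segments, segments[1:])]
```

Each segment can then be map matched on its own, while each pair from `gap_endpoints` marks a stretch where a point-to-point route estimate is needed instead.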

Fig 3b: Actual v Map-matched route (zoomed in at start)

In the zoomed in image of the same example, we see that maps did a good job of snapping the locations at the start of the journey to the road, thus leading to better accuracy. However, the elimination of the rest of the journey resulted in a 15x gap (0.3 km v 4.4 km)!

In this instance, the system that sends locations for map matching needs to review the output, infer false negatives, reintroduce those locations, and call maps again to get an estimated route for the gap, viz. from the last point at the start to the first point at the end. In the absence of this sophistication, even a simple (Haversine) distance between actual locations will result in a better outcome than the map-matched distance.
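Both ideas can be sketched briefly. The first two functions compute the simple Haversine distance summed over consecutive locations. The third assumes an OSRM match response, where `tracepoints` holds a null entry for each input point the matcher discarded as an outlier; other providers report dropped points differently. The function names are ours for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def path_length_km(points):
    """Sum of Haversine distances between consecutive (lat, lon) points."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def dropped_indices(match_response):
    """Indices of input points that the matcher omitted (null tracepoints)."""
    tracepoints = match_response.get("tracepoints", [])
    return [i for i, tp in enumerate(tracepoints) if tp is None]
```

If `dropped_indices` returns a long run of points at the end of the trace, as in the example above, that is a hint to treat them as a separate part rather than accept the shortened match.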

Using maps responsibly

At this time, we would like to remind the reader that cartographers computed map distances using actual locations traversed. Locations with latitude, longitude and altitude were calibrated at discrete intervals, distances were computed between those points, and then added up. The accuracy of each such point contributed to the accuracy of map distances. While we must reuse the good work done by maps, location input to maps must be provided responsibly.

The two important parts of the location inputs are:

  1. Removing inaccurate outliers that might make maps over-fit
  2. Eliminating data gaps that might make maps under-fit
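As an illustration of the first part, here is a minimal sketch of an outlier filter. It assumes each location carries a timestamp and the horizontal accuracy reported by the device, and it drops points that are too imprecise or that imply an impossible speed from the last kept point. The field names and both thresholds are assumptions for illustration, not tuned values.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371008.8 * math.asin(min(1.0, math.sqrt(h)))


def drop_outliers(points, max_accuracy_m=50.0, max_speed_mps=40.0):
    """Filter time-ordered dicts with keys t (seconds), lat, lon, accuracy (m)."""
    kept = []
    for p in points:
        # Discard fixes whose reported accuracy radius is too large.
        if p["accuracy"] > max_accuracy_m:
            continue
        if kept:
            prev = kept[-1]
            dt = p["t"] - prev["t"]
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            dist = haversine_m(prev["lat"], prev["lon"], p["lat"], p["lon"])
            # Discard jumps that imply an implausible speed.
            if dist / dt > max_speed_mps:
                continue
        kept.append(p)
    return kept
```

Note that this simple version trusts the first accurate point it sees; a production filter would need to handle a bad first fix as well.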

Use HyperTrack, it just works

Using estimated routes from source to destination to compute distances is unfair to your business and its stakeholders. Throwing raw locations at maps and using what comes out will not cut it either. Using app locations and maps to compute distances well enough for your business to depend on them requires work.

HyperTrack has solved this problem so you don't have to. Focus on your business and leave accurate distance computation to us. Follow this simple guide to automate distance-based payouts. To set up a pilot in production to compare with your current system, write to us or simply sign up to get going.