Experience Report: GeoEnrichment

Introduction

If you want to display your data on a map or perform a cluster analysis, you need geo-information such as longitude and latitude. Since this information is often not collected, you need a solution that fills the gap. Beyond longitude and latitude, further information can be collected as well, such as building types, reviews, opening hours, contact details and more.

This blog extends the first blog on "SAP DATA Intelligence: GeoEnrichment via API". While the first blog focused on a simple use case, this one covers experiences and lessons learned that should be considered along the way, with special attention to potential pitfalls.

We first look at use cases, then present the legal and structural aspects of API usage and the differences between the various APIs. After that, coverage rates and search parameters are discussed, and finally the processing of the results. Before any actual use, check the legal framework, since interfaces and terms of use can change quickly. This blog therefore reflects only our own experience and the conditions that applied to us.

Pitfalls and experiences

What is the scenario?

Before a project takes concrete shape, there must be a consensus on which scenario will be pursued. This is particularly important for GeoEnrichment.

Here, we will limit ourselves to the two best-known scenarios:

Scenario | Description
Enrichment | Points of interest already exist, and longitude and latitude are to be added to them.
Sourcing | New points of interest are to be searched for and saved.

An enrichment scenario could be to show the locations of customers on a map in SAP Analytics Cloud to find out in which areas you do not yet have any customers. The figure below shows a simplified version of such a map. Another scenario is the optimization of delivery routes and distances: longitude and latitude are needed before the fastest route, or the route with the lowest fuel consumption, can be calculated.

Figure 1: Source: https://samples.azuremaps.com/controls/bring-data-into-view-control

One sourcing scenario could be finding new sales locations. This involves explicitly searching for new locations, or for places where many people congregate.

The API should be chosen according to the scenario, because APIs differ greatly and have a big impact on the cost structure. For example, an enrichment scenario that only involves looking up longitude and latitude does not need an API that returns a lot of additional information (building type, website, phone number, etc.). Such an API costs more than one that only returns longitude and latitude.
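Google's Places API illustrates the cost point. As far as we know, its Find Place endpoint accepts a `fields` parameter, and Places pricing depends in part on which field groups are requested. The following Python sketch only builds request URLs (no network call); the endpoint reflects the legacy Places API, `YOUR_API_KEY` is a placeholder, and the helper `find_place_url` is hypothetical. Check the current documentation and pricing before relying on it.

```python
from urllib.parse import urlencode

# Legacy Places API "Find Place" endpoint (verify against current Google docs).
FIND_PLACE_URL = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def find_place_url(query: str, api_key: str, fields: list[str]) -> str:
    """Build a Find Place request that asks only for the listed fields."""
    params = {
        "input": query,
        "inputtype": "textquery",
        # Comma-separated field mask: fewer fields means less data returned.
        "fields": ",".join(fields),
        "key": api_key,
    }
    return f"{FIND_PLACE_URL}?{urlencode(params)}"


# Enrichment: coordinates only.
coords_only = find_place_url("Theater, Dortmund", "YOUR_API_KEY", ["geometry"])
# Sourcing: additional details such as name, address and opening hours.
detailed = find_place_url(
    "Theater, Dortmund", "YOUR_API_KEY",
    ["name", "formatted_address", "geometry", "opening_hours"],
)
```

The same request for a sourcing scenario simply lists more fields, which is where the cost difference between scenarios comes from.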

May I store the data?

The most important thing is to evaluate whether you are allowed to store the data at all. To use the geo maps in SAP Analytics Cloud, longitude and latitude are required, and they necessarily have to be stored somewhere. Likewise, newly found points of interest only reach their full potential if the data can be stored.

Storing data is the exception rather than the rule. Providers usually permit storage only if it is assured that the paid service (API call) will be used regularly. The terms of use indicate which case applies. Google generally prohibits storing data but makes exceptions if regular use of the service is assured. Azure Maps also permits storage at no extra cost in some cases, depending on the service used. Bing is in a similar position to Azure Maps. With open-source maps, you set up your own server, so storage poses no difficulty, although the volume per call may be restricted.
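Where storage is permitted only for a limited period, the retention rule can be enforced in code. The sketch below is a hypothetical in-memory cache (`GeoCache` is our own name, not a library class). `max_age_seconds` is an assumption that must be filled in from the provider's terms of use, which we do not quote here; the injectable `clock` keeps it testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GeoCache:
    """In-memory coordinate cache that forgets entries after max_age_seconds."""
    max_age_seconds: float  # assumption: take this value from the provider's terms
    clock: Callable[[], float] = time.time
    _entries: dict = field(default_factory=dict, init=False, repr=False)

    def put(self, query: str, lat: float, lon: float) -> None:
        # Remember when the coordinates were stored.
        self._entries[query] = (lat, lon, self.clock())

    def get(self, query: str) -> Optional[tuple[float, float]]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        lat, lon, stored_at = entry
        if self.clock() - stored_at > self.max_age_seconds:
            # Retention period exceeded: delete the entry and force a new API call.
            del self._entries[query]
            return None
        return lat, lon
```

A `None` from `get` signals that the API has to be called again, so expired data is never silently reused.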

What data do I need and which API is suitable for it?

The answer depends strongly on the chosen scenario. Once the scenario is set, evaluate which data is required, which is optional, and which will not be needed in the future either. Looking ahead keeps future options open without losing sight of the actual scenario.

The table below answers two questions:

  1. Which API do I use in which use case?
  2. Which alternative is available for a use case?
Use case | Google service | Azure Maps service | Bing Maps service
Free-text search for a point of interest ("Theater, Dortmund") | Find Place, Text Search | Get Search POI, Get Search Fuzzy | Find a Location by Query
Search by address ("Theaterkarree 1-3, 44137 Dortmund") | Geocoding | Get Search Address | Find a Location by Address
Free-text search for new points of interest ("Italienisches Restaurant, Dortmund", i.e. Italian restaurant, Dortmund) | Nearby Search, Text Search | Get Nearby Search | Location Search
Detail search for a point of interest ("internal ID") | Place Details | (not listed) | Location Data
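To make the "search by address" row of the table concrete, the following sketch builds one request per provider without sending it. The endpoints and parameter names reflect our understanding of the Google Geocoding API, Azure Maps Get Search Address and Bing Maps Find a Location by Address; verify them against the current documentation. The function name and the placeholder keys are hypothetical.

```python
from urllib.parse import urlencode


def address_search_urls(street: str, postal_code: str, city: str,
                        country_code: str, keys: dict[str, str]) -> dict[str, str]:
    """Build one 'search by address' request URL per provider (nothing is sent)."""
    free_text = f"{street}, {postal_code} {city}"
    return {
        # Google Geocoding API: free-text address
        "google": "https://maps.googleapis.com/maps/api/geocode/json?"
        + urlencode({"address": free_text, "key": keys["google"]}),
        # Azure Maps Get Search Address: free-text address
        "azure": "https://atlas.microsoft.com/search/address/json?"
        + urlencode({"api-version": "1.0", "query": free_text,
                     "subscription-key": keys["azure"]}),
        # Bing Maps Find a Location by Address: structured fields, ISO country code
        "bing": "https://dev.virtualearth.net/REST/v1/Locations?"
        + urlencode({"addressLine": street, "postalCode": postal_code,
                     "locality": city, "countryRegion": country_code,
                     "key": keys["bing"]}),
    }


urls = address_search_urls("Theaterkarree 1-3", "44137", "Dortmund", "DE",
                           {"google": "G_KEY", "azure": "A_KEY", "bing": "B_KEY"})
```

Note that Bing's address variant takes structured fields, while the other two accept the address as a single string.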

The table shows only a small selection of the available APIs; distance and route APIs, for example, were not compared. It is worth emphasizing that many Azure Maps services can also be run as batch services, which can simplify processing. A direct comparison shows differences. Azure Maps is a viable alternative to Google Maps for looking up longitude and latitude. For free-text searches, Google performed significantly better. For perimeter searches (searches within a radius), it is difficult to evaluate the quality of the two APIs. Bing performed worse than its competitors on every point; it is no longer being developed, while Azure Maps continues to be.
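Batch services accept many queries in one request, so the input has to be split into batches. A minimal sketch is shown below; the `batchItems` body shape is our assumption based on the Azure Maps batch documentation, and `BATCH_SIZE` is a placeholder that must respect the limit stated there. The helper names are hypothetical.

```python
from typing import Iterator, Sequence
from urllib.parse import urlencode


def chunked(queries: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size queries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(queries), batch_size):
        yield list(queries[start:start + batch_size])


def batch_body(queries: list[str], limit: int = 1) -> dict:
    """Assumed body shape for an Azure Maps search batch request (verify in docs)."""
    return {"batchItems": [{"query": "?" + urlencode({"query": q, "limit": limit})}
                           for q in queries]}


# BATCH_SIZE is a placeholder; use the limit documented for the chosen batch API.
BATCH_SIZE = 100
addresses = ["Theaterkarree 1-3, 44137 Dortmund"]
bodies = [batch_body(batch) for batch in chunked(addresses, BATCH_SIZE)]
```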

Choose the API carefully and test the scenario in detail so that time and effort are used efficiently.

There are also open-source maps that provide only longitude and latitude. However, it is difficult to judge the coverage and validity of their data, because open-source maps are often populated by the community itself.

How high is the coverage rate?

The geo APIs differ in their coverage, so which one to use depends strongly on the use case. When researching a provider, the keyword to search for is "coverage"; the coverage information found this way can then be compared across providers. Coverage influences the accuracy of the output just as much as the search parameters do.

Data provider | America | Europe | Asia | South America
Google Maps | ++ | ++ | 0 |
Azure Maps | ++ | ++ | 0 |
Bing Maps | + | + | 0 | 0
Baidu Maps | ? | ? | ++ | ?
MapMyIndia | ? | ? | ++ | ?

Europe and America are very well covered by the major providers such as Azure and Google. In Asia, the Chinese provider Baidu and its Indian counterpart MapMyIndia dominate. Country-specific restrictions must be considered here: Baidu Maps, for example, can only be used with a Chinese phone number.

Open-source maps are often maintained by communities. Valid results can be expected especially in Europe and America; outside these regions, the results are often less meaningful than those from Google or Azure. Bing is no longer developed further but can still be used. Unlike Azure Maps, however, Bing offers its services for Japan, which Azure Maps does not yet support.
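Coverage differences like these can be handled by routing each query to a provider by country. The routing table below is a hypothetical illustration derived from the observations above, not a recommendation; `provider_for` and the default provider are assumptions.

```python
# Hypothetical routing table based on the coverage observations in this blog.
PROVIDER_BY_COUNTRY = {
    "CN": "baidu",       # strong coverage in China (Chinese phone number required)
    "IN": "mapmyindia",  # Indian provider with strong local coverage
    "JP": "bing",        # Bing covers Japan; Azure Maps does not (yet)
}
DEFAULT_PROVIDER = "google"  # assumption; Azure Maps is equally strong in Europe/America


def provider_for(country_code: str) -> str:
    """Return the provider to query for an ISO 3166-1 alpha-2 country code."""
    return PROVIDER_BY_COUNTRY.get(country_code.strip().upper(), DEFAULT_PROVIDER)
```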

How do I create meaningful search parameters?

Which search parameter to use depends on the scenario and on the data available.

Scenario | Search parameter | Description
Enrichment | "Theaterkarree 1-3, 44137 Dortmund, Deutschland" | The structure of postal addresses differs from country to country.
Sourcing | "Theater, Deutschland" | Often only keywords are searched for and no address is available. The country and, if necessary, the city should be included. If possible, an area can also be specified as an additional parameter.
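The two kinds of search parameters can be assembled from master data as in the following sketch. The helper names are hypothetical, and the address template assumes the German format (street, postcode city, country); other countries need their own templates.

```python
from typing import Optional


def enrichment_query(street: str, postal_code: str, city: str, country: str) -> str:
    """Full postal address in German order: street, postcode city, country."""
    return f"{street}, {postal_code} {city}, {country}"


def sourcing_query(keyword: str, country: str, city: Optional[str] = None) -> str:
    """Keyword, then the city if known, then the country."""
    parts = [keyword] + ([city] if city else []) + [country]
    return ", ".join(parts)
```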

If further information about a point of interest is needed, the approach always depends on the input parameters of the respective API. Google's Place Details API requires Google's internal ID (place_id), which can only be obtained through another API call. Bing's Get Location API requires the address. Here the correct address must be chosen; otherwise the wrong building, and therefore the wrong information, will be retrieved.
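The two-step lookup for Google can be sketched as follows: the `place_id` from a Find Place response (step 1, which must request the `place_id` field) is passed to Place Details (step 2). The endpoint reflects the legacy Places API as we understand it; the response dict is a minimal stand-in, not real API output, and the helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"


def details_url_from_find_place(find_place_response: dict, api_key: str,
                                fields: list[str]) -> Optional[str]:
    """Step 2: build a Place Details request from the first Find Place candidate."""
    candidates = find_place_response.get("candidates") or []
    if not candidates or "place_id" not in candidates[0]:
        return None  # no usable candidate, so no details request is possible
    params = {"place_id": candidates[0]["place_id"],
              "fields": ",".join(fields),
              "key": api_key}
    return f"{DETAILS_URL}?{urlencode(params)}"


# Minimal stand-in for a Find Place response (step 1); not real API output.
step1 = {"status": "OK", "candidates": [{"place_id": "EXAMPLE_PLACE_ID"}]}
url = details_url_from_find_place(step1, "YOUR_API_KEY",
                                  ["website", "formatted_phone_number"])
```

Since only the first candidate is used, step 1 should be precise enough that this candidate is the intended place.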

How are results influenced?

The various APIs often apply a location bias, which weights the results by the caller's own location. For example, when searching for a Turkish address from a German IP address, the API may first look for, and possibly return, a German result. Countermeasures are needed to ensure that the desired results in other countries are found.
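One countermeasure is to state the target country explicitly instead of relying on IP-based bias. In the sketch below, the parameter names (`components` and `region` for the Google Geocoding API, `countrySet` for Azure Maps Search Address) reflect our understanding of those APIs; the helper itself is hypothetical and only builds the query string.

```python
from urllib.parse import urlencode


def country_pinned_query(address: str, country_code: str, provider: str) -> str:
    """Build a query string that names the target country explicitly."""
    cc = country_code.strip().upper()
    if provider == "google":
        # Geocoding API: 'components' restricts results, 'region' biases them
        # (region takes a ccTLD, which usually matches the ISO code).
        params = {"address": address, "components": f"country:{cc}",
                  "region": cc.lower()}
    elif provider == "azure":
        # Azure Maps Search Address: 'countrySet' restricts results to these countries.
        params = {"api-version": "1.0", "query": address, "countrySet": cc}
    else:
        raise ValueError(f"unsupported provider: {provider!r}")
    return urlencode(params)
```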

Another bias relates to the language of the search. Results can be improved further if the search parameters are written in the local language. A suitable solution that translates them into the correct national language should therefore be planned, although it also increases the effort required.
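Several APIs also accept a `language` parameter that controls the language of the returned results; translating the search text itself is a separate step not shown here. The mapping below is hypothetical, and the accepted language codes differ between providers, so check each provider's list of supported languages.

```python
# Hypothetical mapping from country code to the language requested from the API.
LANGUAGE_BY_COUNTRY = {"DE": "de", "FR": "fr", "TR": "tr"}


def with_language(params: dict, country_code: str, default: str = "en") -> dict:
    """Return a copy of the request parameters with a country-specific 'language'."""
    language = LANGUAGE_BY_COUNTRY.get(country_code.strip().upper(), default)
    return {**params, "language": language}
```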

How do I verify my results?

Accuracy can be checked manually as well as automatically.

Scenario | Description
Enrichment | Manual checks are urgently needed.
Sourcing | Whether manual checks are needed depends on the scenario. The returned data itself is consistent. Possible source of error: category errors. The search is for "Theatre in Dortmund", but an opera in Dortmund is returned. The information about the opera is consistent, but it does not fit the category.
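Category errors like the opera example can be pre-filtered automatically if the API returns place types. The sketch assumes each result carries a `types` list, as Google Places results do; the type strings in the test data are hypothetical placeholders, and unmatched results are routed to manual review rather than discarded. The function name is hypothetical.

```python
def split_by_category(results: list[dict],
                      accepted_types: set[str]) -> tuple[list[dict], list[dict]]:
    """Separate results matching the searched category from those needing review."""
    matching, review = [], []
    for result in results:
        # A result matches if at least one of its types is accepted.
        if accepted_types & set(result.get("types", [])):
            matching.append(result)
        else:
            review.append(result)
    return matching, review
```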

Automated checks help to keep manual checking to a minimum:

Automated checks must be built into the logic that processes the returned results. It is always necessary to check whether the results lie in the right country; this has to be done programmatically, and further processing has to be controlled accordingly. Unfortunately, the check cannot be limited to the country: the city must be considered as well, and verification at city level is significantly more complex.
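A country and city check for Google Geocoding-style results could look like the sketch below. It relies on the `address_components` list, in which each component carries `types` such as `country` or `locality`; the helper names are hypothetical, and the city comparison is deliberately naive, which illustrates why city-level checks are harder.

```python
from typing import Optional


def component(result: dict, type_name: str, key: str = "long_name") -> Optional[str]:
    """Return a field of the first address component carrying type_name."""
    for comp in result.get("address_components", []):
        if type_name in comp.get("types", []):
            return comp.get(key)
    return None


def passes_location_check(result: dict, country_code: str,
                          city: Optional[str] = None) -> bool:
    """Check the country via its ISO code and, optionally, the city name."""
    if component(result, "country", "short_name") != country_code.strip().upper():
        return False
    if city is None:
        return True
    # Naive comparison: districts, spelling variants and translations are not handled.
    locality = component(result, "locality")
    return locality is not None and locality.casefold() == city.casefold()
```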

There are several ways to implement this, and an evaluation shows which approach makes sense for which use case.

How do I process results?

Processing must also be error-free. When processing GET requests, various potential sources of error arise that must be taken into account.

For example, the APIs do not always deliver just one result. If several results are returned for a search parameter, they must be checked and processed further. This can be implemented with self-written criteria, with specialized algorithms, or with ready-made libraries.
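As one example of a self-written criterion, the sketch below ranks several candidates by string similarity between the query and the returned `formatted_address`, using Python's standard `difflib`. The threshold of 0.6 is an arbitrary assumption, `best_match` is a hypothetical name, and specialized algorithms or dedicated libraries can replace the scoring function.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_match(query: str, results: list[dict],
               min_ratio: float = 0.6) -> Optional[dict]:
    """Pick the result whose formatted_address is most similar to the query.

    Returns None if no result reaches min_ratio, so the case can be reviewed manually.
    """
    def score(result: dict) -> float:
        address = result.get("formatted_address", "")
        return SequenceMatcher(None, query.casefold(), address.casefold()).ratio()

    if not results:
        return None
    best = max(results, key=score)
    return best if score(best) >= min_ratio else None
```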

Error handling must also be considered, and checking the status of the response is not enough. In some cases, the structure of the returned payload changes and no longer contains the expected information. Such cases must be detected and handled.
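Defensive parsing can cover both cases: a valid "no result" answer and a payload whose structure has changed. The sketch assumes a Google Geocoding-style response (`status`, `results[0].geometry.location`); the exception class and function name are hypothetical.

```python
from typing import Optional


class UnexpectedResponse(Exception):
    """Raised when a response lacks the structure the processing relies on."""


def extract_coordinates(payload: dict) -> Optional[tuple[float, float]]:
    """Return (lat, lng), None for 'no result', or raise on an unexpected payload."""
    status = payload.get("status")
    if status == "ZERO_RESULTS":
        return None
    if status != "OK":
        raise UnexpectedResponse(f"status {status!r}: {payload.get('error_message', '')}")
    try:
        location = payload["results"][0]["geometry"]["location"]
        return float(location["lat"]), float(location["lng"])
    except (KeyError, IndexError, TypeError, ValueError) as exc:
        # Status was OK, but the expected fields are missing or malformed.
        raise UnexpectedResponse(f"unexpected payload structure: {exc!r}") from exc
```

Raising a dedicated exception keeps "nothing found" (a normal outcome) separate from "response changed" (a case for logging and follow-up).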

In summary, the following can be stated:

Evaluating whether, and for how long, the data may be stored is essential. Once it is clear which search parameters are used and which data is being searched for, compare different providers to determine the best API for the scenario at hand. Use the coverage rate to ensure that the best results are found. Furthermore, it is important to consider how results are biased, how to counteract this, and how the quality and accuracy of the results can be increased.

Ultimately, the better the data quality of a search parameter and the more precise the idea of the desired result, the better GeoEnrichment can be implemented.

The ranking in the table below builds on the points described above and summarizes them in a highly simplified way.

Criteria | Google Maps | Azure Maps | Bing Maps | OpenStreetMap
Data volume | +++ | + | + | +
Data validity | ++ | ++ | 0 | 0
Data storage | 0 | + | + | +
Costs | 0 | +++ | |
Data coverage | ++ | ++ | 0 | 0


Conclusion

Even though this blog mainly describes the pitfalls of GeoEnrichment, its enormous potential should not be overlooked. Existing providers deliver very good results that add real value.

This blog can serve as an aid when deciding on data enrichment. Many problems only surface during data processing and can take a lot of time, so the effort involved in GeoEnrichment should not be underestimated. The possibilities, however, are impressive. The Google Maps API in particular supports a wide range of use cases thanks to its differentiated functionality.


If you are interested and would like to evaluate whether GeoEnrichment makes sense for your projects, feel free to contact us. We are also happy to answer any further questions.

Contact Person

Julius Nordhues

Consultant