SAP Data Intelligence: Geo-Enrichment through APIs

This blog explains how to use the OpenAPI Client operator in an SAP Data Intelligence data flow. Information about a specific location (point of interest), e.g. geo-coordinates, opening hours, and websites, is obtained with the help of Google Maps. This demonstrates how data can be enriched with additional information via APIs.

This blog is structured as follows:

  1. Scenario in SAP Data Intelligence
  2. Prerequisites
  3. Structure
  4. Execution
  5. Outlook
Scenario in SAP Data Intelligence

SAP Data Intelligence (formerly SAP Data Hub) is a data management and pipeline solution. Data flows for orchestrating and monitoring data can be created flexibly and, thanks to a wide range of connection options, offer insights into the entire IT landscape. Within these data flows, data can be cleansed, filtered, enriched, operationalized and optimized, resulting in a solid data foundation that ML scenarios can build on.

Pipelines offer many different operators that perform the functions mentioned above. Most of these operators are self-explanatory and easy to handle; others are less so. One such case is the standard operator "OpenAPI Client", which we describe in more detail using the geo-enrichment use case.

For this blog, the Google Places API (Application Programming Interface) is used with the OpenAPI Client. It can provide detailed information for a Place ID (a Google-internal unique identifier).[1]

Prerequisites

In order to call a Google API, you first need a so-called API key, which identifies you with each call; after a certain number of calls, usage is billed via the Google account.[2],[3]

In addition, a Place ID must be available. This ID represents a specific location and can be derived from names and addresses with the Google Geocoding API. In our case, we take a Place ID from the official documentation.
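For illustration, a Place ID lookup via the Geocoding API could look like this (a minimal sketch, assuming the "requests" library is available; the address and API key are placeholders):

import requests

# Resolve an address to a Place ID with the Google Geocoding API
resp = requests.get(
    "https://maps.googleapis.com/maps/api/geocode/json",
    params={
        "address": "1600 Amphitheatre Parkway, Mountain View, CA",  # placeholder address
        "key": "my_api_key",  # placeholder API key
    },
)
place_id = resp.json()["results"][0]["place_id"]
print(place_id)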

Structure of the graph

The graph is structured as follows:

  • Python operator (alternatively: JavaScript) to prepare the input parameters for the OpenAPI Client operator
  • Optional Wiretap operator to review the input parameters of the OpenAPI Client operator
  • OpenAPI Client operator, the actual heart of the graph, performing the API call
  • Optional Wiretap operator to review the result of the API call
Calling the OpenAPI Client Operator

The first step is to configure the OpenAPI Client operator. We select "Manual" as the connection type in order to maintain the connection details directly. In productive operation, the connection parameters should be stored in Connection Management.[4] In the configuration area of the OpenAPI Client, we have to fill Host, Schemes, Base Path, Method and Produces.

How to fill these fields can be derived from the documentation of the Google Places API linked above. The figure below shows an excerpt from the documentation, indicating the parts of the URL and their respective meaning.

Figure 1: Own illustration based on the Google Places documentation
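Based on this URL structure, one plausible assignment for the Place Details endpoint would be the following (a sketch, not taken from the original configuration; the exact split between Base Path and the request path depends on how the operator is set up):

Host:      maps.googleapis.com
Schemes:   https
Base Path: /maps/api/place/details/json
Method:    GET
Produces:  application/json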

The "mandatory" field API Key Value is not filled directly in the OpenAPI Client operator in this example. Instead, it is set in the Python operator so that the key can also be passed as a parameter when the graph is started. The API Key Name can be chosen freely, and "header" must be selected as the API Key Type.

Python Operator

This leads us to the Python operator. It does not have to be configured or linked to a Dockerfile; only its code needs to be adapted.

The OpenAPI Client operator processes an input message. This message is created in the code of the Python operator with the help of a "header" dictionary, which contains the API key and the Place ID that are passed to the OpenAPI Client.

The keys

header["openapi.query_params.key"] and
header["openapi.query_params.place_id"]

follow a certain logic, which can be found in the OpenAPI Client Operator documentation.[5]

Additional header attributes can be used to pass more information for later processing; for example, we passed User and Role. These additional attributes have no influence on the result of the OpenAPI Client operator and are therefore only passed on, not processed.
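A minimal sketch of the Python operator script could look as follows (assuming the standard Python3 operator API of SAP Data Intelligence; the port name "output" and the concrete values are placeholders):

def gen():
    # Attributes consumed by the OpenAPI Client operator as query parameters
    header = {
        "openapi.query_params.key": "my_api_key",                        # API key
        "openapi.query_params.place_id": "ChIJgUbEo8cfqokR5lP9_wh_DaM",  # Place ID
        # Additional attributes that are only passed through, not processed
        "User": "Nordhues",
        "Role": "Consultant",
    }
    # The body can stay empty; the OpenAPI Client reads the attributes
    api.send("output", api.Message(body="", attributes=header))

api.add_generator(gen)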

Once configuration and implementation are complete, the graph can be executed.

Execution

Once the graph has been executed, the Wiretap operators can be used to view the individual processing steps. The Python operator passes a message to the OpenAPI Client operator, created from a dictionary in Python (see below). This message is the input to the OpenAPI Client operator. The Wiretap illustrates which information is passed and how, which also helps when investigating potential problems.

Result:
[2022-01-29 13:24:15,480]

{"Role": "Consultant", "User": "Nordhues", "openapi.query_params.key": "my_api_key", "openapi.query_params.place_id": "ChIJgUbEo8cfqokR5lP9_wh_DaM"}

The OpenAPI Client processes this input and outputs the result in the final Wiretap. The figure below shows only a small part of the delivered data.
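For orientation, a heavily trimmed Place Details response has roughly the following shape (illustrative values only):

{
  "result": {
    "name": "…",
    "formatted_address": "…",
    "geometry": { "location": { "lat": 0.0, "lng": 0.0 } },
    "opening_hours": { "weekday_text": ["…"] },
    "website": "…"
  },
  "status": "OK"
}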

Outlook

This example can serve as a template for further APIs. The structure by which APIs are called often does not differ, so this example can also be transferred to other APIs, for example what3words.[6] Note, however, that the respective API guidelines should be studied in detail to understand which requirements exist regarding storage and further processing of the API results.

Error handling

Further processing of the output becomes more complex in scenarios where many API calls have to be made. Each OpenAPI Client operator processes only a single call at a time, so additional operators need to be added to the graph accordingly. In addition, error handling must be implemented for the case that the OpenAPI Client operator does not find a result.
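A simple safeguard could be a downstream Python operator that inspects the "status" field of the Places response before further processing (a sketch; port names and routing are assumptions):

import json

def on_input(msg):
    result = json.loads(msg.body)
    if result.get("status") != "OK":
        # Route failed lookups to a separate port for logging or retries
        api.send("error", api.Message(body=msg.body, attributes=msg.attributes))
    else:
        api.send("output", msg)

api.set_port_callback("input", on_input)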

Package processing

In a scenario where a single graph is to process multiple records, it is necessary to wait until all records are processed before taking further steps. The Python operator initiates an API call for each record (through the OpenAPI Client operator), so the step after the OpenAPI Client operator is started for each row individually. Some creativity is required to ensure that all records have been processed and collected again before further steps follow.
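One possible approach is a small "collector" Python operator that buffers the individual responses and only emits a combined message once the expected number of records has arrived (a sketch; the expected count and port names are assumptions):

import json

EXPECTED_COUNT = 3  # number of records sent into the graph (assumption)
collected = []

def on_input(msg):
    collected.append(json.loads(msg.body))
    if len(collected) == EXPECTED_COUNT:
        # All responses are in – emit one combined message for the next step
        api.send("output", api.Message(body=json.dumps(collected)))

api.set_port_callback("input", on_input)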

Examples and scenarios

When using geo-enrichment APIs, you end up with a dataset that has been enriched with geo-data and other information about specific locations. This opens up many different use cases, for example the map functionality in SAP Analytics Cloud (SAC) or distance calculations between different stores.

In the figure below, for example, longitude, latitude, and location type were read for the three locations. This allows datasets to be built and filtered more granularly, so that sales can be reported by location type, for example.

If you have any questions regarding the conception or implementation of SAP DI, please do not hesitate to contact us.


Contact Person

Julius Nordhues

Consultant

Data Products Setup

I’ll start with Data Products setup. If you’re new to the concept, this recent video is a great starting point, but here’s a short summary: a data product is a well-described, easily discoverable, and consumable collection of data sets.

Creating a Data Product in Datasphere

Note that in this article I create Data Products in the Data Sharing Cockpit in Datasphere. This functionality is expected to move into the Data Product Studio, but that had not happened at the time of writing.

Before creating a Data Product in Datasphere, I need to set up a Data Provider profile, which collects descriptive metadata like contact and address details, industry, and regional coverage, and, importantly, defines the Data Product Visibility. Enabling Formations allows me to share the Data Product with systems across my BDC Formation – Databricks, in this case.

With the Data Provider set up, I can go ahead and create a Data Product. As with the Data Provider, I’ll need to add metadata about the product and define its artifacts – the datasets it contains. Only datasets from a space of type SAP HANA Data Lake Files can be selected. Since this Data Product is visible across the Formation, it is available free of charge.

For this demo, the artifact is a local table containing ten years of Ice Cream sales data. Since this is a File type space, importing a CSV file directly to create a local table isn’t an option (see documentation).

I used a Replication Flow to perform an initial load from a BW aDSO table into a local table.

Once the Data Product is created and listed, it becomes available in the Catalog & Marketplace, from where it can be shared with Databricks by selecting the appropriate connection details.

Jump into Databricks

To use the shared object in Databricks, I need to mount it to the Catalog – either by creating a new Catalog or using an existing one.

Databricks appends a version number – ‘:v1’ – to the end of the schema name to maintain versioning in case of future changes to the Data Product.

Once the share is mounted, the schema is created automatically, and the Sales actual data table becomes available within it. From there, I can access the shared table directly in a Notebook.
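Reading the shared table from a Notebook then works like any other Catalog table; note the backticks around the versioned schema name (catalog, schema, and table names below are placeholders):

# Query the mounted Data Product (placeholder names)
df = spark.sql("SELECT * FROM bdc_share.`sales:v1`.sales_actuals")
display(df)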

Creating a Data Product in Databricks

To create a Data Product in Databricks, I first need to create a Share – which I can do either via the Delta Sharing settings in the Catalog:

Or directly out of the table which is going to become a part of the Share:

Since a single Share can contain multiple tables, I have the option to either add the table to an existing Share, or create a new one:
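For reference, the same steps can also be performed with Delta Sharing DDL from a Notebook (a sketch with placeholder names):

# Create a Share and add the forecast table to it (placeholder names)
spark.sql("CREATE SHARE IF NOT EXISTS sales_forecast_share")
spark.sql("ALTER SHARE sales_forecast_share ADD TABLE main.forecasts.sales_forecast")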

To publish the Share as a Data Product, I run a Python script where I define the target table for the forecast and describe the Share in CSN notation, setting the Primary Keys. Primary Keys are required for installing Data Products in Datasphere.
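The CSN description could look roughly like this (a hedged sketch; entity and element names are assumptions, and the publishing call itself is not shown):

# CSN (Core Schema Notation) description of the shared table,
# marking the primary key elements required by Datasphere
csn = {
    "definitions": {
        "SalesForecast": {
            "kind": "entity",
            "elements": {
                "StoreID":     {"type": "cds.String", "key": True},
                "Month":       {"type": "cds.Date",   "key": True},
                "ForecastQty": {"type": "cds.Decimal"},
            },
        }
    }
}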

Jump back into Datasphere

Once the Databricks Data Product is available in Datasphere, I install it into a Space configured as a HANA Database space – since my intention is to build a view on top of the table and use it for planning in SAC.

There are two installation options: as a Remote table for live data access, or as a Replication Flow, in which case the data is physically copied into the object store in Datasphere.

Since I want live access, I install it as a Remote Table:

and build a Graphical view of type Fact on top:

Forecast calculation

With my Data Products set up and the Sales actual data available in Databricks, I create a Notebook to calculate the Sales Forecast.

The approach combines Sales and Weather data to train a Linear Regression model. I import the Weather data* (https://zenodo.org/records/4770937) from an external server directly into Databricks, select the relevant features from the weather dataset, and combine them with the Sales actual data:

* Klein Tank, A.M.G., and Coauthors, 2002: Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol., 22, 1441–1453. Data and metadata available at http://www.ecad.eu

Using the “sklearn” library, I build and train a Linear regression model:
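A minimal sketch of that step (the combined DataFrame and column names are assumptions):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

pdf = combined_df.toPandas()  # combined Sales + Weather data from the step above
X = pdf[["avg_temp", "precipitation", "sunshine_hours"]]  # weather features (assumed names)
y = pdf["sales_qty"]                                      # sales target (assumed name)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))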

Once trained, the model predicts the Sales forecast for Rome in June 2026 based on the weather forecast, and I save the results to my Catalog table:
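Sketched with the same assumed names, prediction and write-back could look like this:

# Predict sales from the June 2026 weather forecast and persist the result
june_forecast = weather_forecast_pdf[["avg_temp", "precipitation", "sunshine_hours"]]
weather_forecast_pdf["sales_forecast"] = model.predict(june_forecast)

spark.createDataFrame(weather_forecast_pdf) \
    .write.mode("overwrite") \
    .saveAsTable("main.forecasts.sales_forecast")  # placeholder table name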

Seamless planning data model

The seamless planning concept is built around physically storing planning data and public dimensions directly in Datasphere, keeping them alongside the actual data.

Since the QRC4 2025 SAC release, it has also been possible to use live versions and bring reference data into planning models without replication.

In this scenario, I build a seamless planning model on top of the Graphical view I created over the Remote table. This lets me use the forecast generated in Databricks as a reference for the final SAC Forecast version.

 

The model setup follows these steps:

Create a new model:

Start with data:

Select Datasphere as the data storage:

From there, I define the model structure and can review the data in the preview.

For a deeper dive into Seamless Planning, I recommend this biX blog.

Process Flow automation

Multi-action triggers Datasphere task chain

The final step is automating the entire forecast generation by using SAC Multi-actions and a Task-Chain in Datasphere – so that my user can trigger the calculation with a single button click from an SAC Story.


Triggering Task Chains from Multi-actions is a recently released feature. This blog post walks through how to set it up.

For details on how to trigger a Databricks Notebook from Datasphere, I recommend referring to this blog.

With everything in place, I create a Story, add my Seamless planning Model, and attach the Multi-action:

Running the Multi-action triggers the Task Chain, which in turn triggers the Databricks Notebook.

I can monitor the execution details in Datasphere:

and in Databricks:

Once the calculation completes, the updated forecast appears in the Story:

The end-to-end calculation took 2 minutes 45 seconds in total. The Task Chain in Datasphere is triggered almost instantly by the Multi-action, and the Databricks Notebook execution itself took 1 minute 29 seconds, with the remaining time spent on Serverless cluster startup.

 

From here, I can copy the calculated forecast into a new private version:

adjust the numbers as needed, and publish it as a new public version to Datasphere:

Conclusion

With SAP Business Data Cloud, it is possible to build a forecasting workflow that feels seamless to the end user — even though it spans multiple systems under the hood.

Companies using BW as the main Data Warehouse and Databricks for ML calculations or Data Science tasks can benefit from using the platform, as the data no longer needs to be physically copied out of BW.

What this scenario demonstrates is that once wrapped as a Data Product, BW sales data can be shared with Databricks via the Delta Share protocol. Databricks, in turn, can then create its own Data Products on top of the calculation results and share them back with Datasphere as a Remote Table.

A Seamless Planning model in SAC sits on top of that Remote Table, giving planners live access to the generated forecast. A single Multi-action in an SAC Story ties it all together, triggering a Datasphere Task Chain that kicks off the Databricks Notebook — completing the full cycle in under three minutes.

As SAP Business Data Cloud continues to mature, scenarios like this one are becoming achievable – leaving the complexity in the architecture and not in the workflow.

Contact

Ilya Kirzner
Consultant
biX Consulting