Databricks and planning, or simple inputs

July 2025

Motivation

Databricks has become an important data platform for companies. This platform has been further strengthened by the announcement of SAP's collaboration with Databricks:

Thanks to the strong AI capabilities in Databricks, you can, for example, have large amounts of data suggested automatically for planning purposes.

But what do you do when you realize that some data is missing – data that you can’t pull from an existing source or that needs to be entered or corrected manually? What options does Databricks offer for this? 

In this blog series, we want to explore exactly these options for planning in Databricks and highlight what is possible. 

In this post, we will start with the question: what exactly is planning and what are the technical requirements for collecting and managing numerical data? Possible use cases range from basic data entries to sophisticated planning applications. 

Not every use case needs all of these requirements. That’s why this initial analysis can help assess the specific needs of your own use case. 

In the following blog posts, we'll share how we were able to implement some of these requirements in Databricks.

What is planning?

If we want to investigate whether planning is possible with Databricks, we first need to define what planning actually means and what kind of requirements the tool must meet to support it. We’ll soon see that there isn’t a single, universal planning process, and that technical needs vary depending on the business context. We will then take a closer look at specific aspects and present our findings in the upcoming blog posts. 

Here are a few examples of features that may be required depending on the planning scenario.  

Simple data entry

Let's start with data entry, even if this is not technically planning but rather resembles an “input screen”. Capturing texts and figures that do not come from an existing source already represents a first, simple variant of planning. This can be used, for example, for basic control logic during data loading.
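
To make this concrete, here is a minimal sketch of what such a simple entry could look like in a Databricks notebook, writing a manually captured value into a table (the table name and its columns are purely illustrative):

```python
# Minimal sketch of a simple manual entry in a Databricks notebook;
# the table "control_parameters" and its columns are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS control_parameters (
        parameter STRING,
        value     DOUBLE,
        comment   STRING
    )
""")

# A manually entered figure, e.g. a threshold used as control logic
# during data loading.
spark.sql("""
    INSERT INTO control_parameters
    VALUES ('load_threshold', 1000.0, 'entered manually')
""")
```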

Combining display and entry columns

The entry of planned figures alongside existing actual figures comes closer to the concept of planning, for example when sales figures for various products from the previous year are available and figures for the upcoming year need to be planned. In this case, existing data must be displayed in a table, and planned values must be entered in an additional column. It must be ensured that the planner enters values only in the planning column and cannot modify the others. In the example in Figure 1, the cells to be changed are visually highlighted and clearly distinguishable.

Figure 1: Example of a simple budget planning of products using SAC with the following functions 

1: Copy of actual data
2: Entry on a hierarchy node and splash-down to single products
3: Addition of a new line that does not exist in the actual data

Distribution of plan figures when entering at an aggregated level

In large planning scenarios, values are often required at a very detailed level (e.g. product, month), but the actual data entry takes place at a much higher level (e.g. year and product group). An initial distribution should then be carried out according to certain rules — for example, based on existing actual data or, if that is unavailable, distributed evenly. This initial distribution can then be adjusted by a planner (see Figure 1 – point 1). 
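
To illustrate the rule, here is a minimal sketch in pandas: a yearly plan value per product is broken down to months in proportion to last year's actuals, with an even split as fallback where no actuals exist (all names and figures are made up for the example):

```python
import pandas as pd

# Actuals at detail level (product, month); names and figures are illustrative.
actuals = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "month":   ["01", "02", "01", "02"],
    "amount":  [100.0, 300.0, 0.0, 0.0],
})

def distribute(plan_total: float, details: pd.DataFrame) -> pd.DataFrame:
    """Distribute an aggregated plan value to the detail rows,
    proportionally to actuals, or evenly if there are none."""
    out = details.copy()
    total = out["amount"].sum()
    if total > 0:
        weights = out["amount"] / total
    else:
        weights = pd.Series(1.0 / len(out), index=out.index)
    out["plan"] = plan_total * weights
    return out

# Product A has actuals (100/300 -> 25 %/75 %); product B gets an even split.
plan = (
    actuals.groupby("product", group_keys=False)
           .apply(lambda g: distribute(500.0, g))
)
print(plan)
```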

In addition to basic distribution functionality, the question quickly arises as to what should be entered manually and what should be suggested by AI. However, even AI-generated suggestions should still be reviewed by a human, as the AI may not account for all external factors (e.g. planned product launches, new store openings, etc.). A distribution function like this is also required when revising AI-generated data. 

Master data in planning and new lines

If new combinations or rows are to be added during planning, it is essential for consistent planning that only valid master data is used. Therefore, it must be possible to validate entries against master data tables when adding new rows. These tables can also serve as input assistance, helping to avoid typos that could otherwise negatively impact the quality of the planning. 

These master data tables can also contain additional attributes that enable grouping or summarization in reporting and planning — for example, product groups as attributes of the products. 

In some cases, it may be necessary to create new master data during the planning process. In such cases, it must be ensured that these entries are created consistently and can later be correctly linked to actual data. 
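
One straightforward way to implement such a validation is an anti-join of the new entries against the master data table. A minimal sketch in pandas (all names are illustrative):

```python
import pandas as pd

# Master data and newly entered planning rows; all names are illustrative.
products = pd.DataFrame({"product": ["A", "B", "C"]})
new_rows = pd.DataFrame({"product": ["B", "X"], "plan": [200.0, 50.0]})

# Anti-join: keep only rows whose product is unknown in the master data.
invalid = new_rows[~new_rows["product"].isin(products["product"])]

if not invalid.empty:
    # Reject the entries (or return them for correction) instead of
    # letting invalid master data degrade the planning quality.
    raise ValueError(f"Unknown products: {invalid['product'].tolist()}")
```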

Audit characteristics and undo function

It is often required in planning to trace who changed which figures, when, and how. For this purpose, all changes must be stored along with the user ID and a timestamp. This requires that all changes are saved as deltas only (see Table 1). 

When data is stored this way, it becomes possible to implement a functionality for reverting data changes. 

Table 1: Delta entry with timestamp
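
As a minimal sketch of this idea (schema and names are illustrative): every change is appended to a log with user and timestamp, the current value is the sum of all deltas per key, and an undo simply appends the inverse of the last delta rather than deleting anything:

```python
import pandas as pd

# Append-only change log; every save stores only the delta per key.
log = pd.DataFrame([
    {"product": "A", "delta": 100.0, "user": "user1", "ts": "2025-07-01 10:00"},
    {"product": "A", "delta":  20.0, "user": "user2", "ts": "2025-07-01 10:05"},
])

# Current value = sum of all deltas per key (here: A -> 120.0).
current = log.groupby("product")["delta"].sum()

# Undo: append the inverse of the most recent change instead of deleting it,
# so the audit trail stays complete.
last = log.iloc[-1]
undo = {"product": last["product"], "delta": -last["delta"],
        "user": "user2", "ts": "2025-07-01 10:06"}
log = pd.concat([log, pd.DataFrame([undo])], ignore_index=True)
```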

Parallel planning / entries by several users 

In larger planning scenarios, multiple users will likely want to enter data at the same time. To prevent them from overwriting each other’s entries, you can apply the following basic strategies (see Figure 2): 

  • No check at all, save everything 
    If data is rarely changed or only by one user, no restrictions are usually necessary. However, the more frequently multiple users enter data, the greater the risk that they will overwrite each other's values. If all displayed data is always saved regardless of changes, User 1 may unintentionally overwrite values changed by User 2 with outdated values that they themselves haven't touched. The longer a user works on data entry, the higher this risk becomes. 
  • Only save changed data, otherwise no check 
    To reduce the risk of accidental overwriting, it helps to store only the modified data, optionally with a timestamp. This allows multiple users to make changes to different values simultaneously. It also enables change tracking (see Figure 2). In this case, the last person to make a change "wins". 
  • Warning if numbers are changed simultaneously 
    A change is considered simultaneous if it occurs between loading the values into a user's cache and saving them (see Figure 3; a minimal code sketch of such a check follows after the figures). A decision must then be made on how to proceed. Otherwise, if only the delta is saved, the resulting value might no longer reflect the input from a single user but rather the sum of all concurrent changes. 
  • Locking 
    The most secure, but also the most complex approach is to lock values as soon as a user opens a specific area for planning. Other users then receive a warning that the planning data is currently locked by User XXX, and changes are not permitted. However, modern applications are increasingly moving away from this classic locking concept. Locks increase implementation effort and are sometimes not properly released when an application is closed incorrectly. This often leads to faulty lock messages that disrupt the planning process. In older SAP BW planning (SAP BPC), planning is based on such a locking concept. In newer SAC-based planning, SAP has abandoned the use of locks. Microsoft, too, no longer locks documents exclusively on a central drive when changes are made. 

Figure 2: Parallel planning - User 1 and User 2 can change and save different values at the same time without a lock. Both changes are taken into account. 

Figure 3: Parallel planning - User 1 and User 2 want to change the same value at the same time. Here it must be clarified how the conflict is resolved. 
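
To make the third strategy concrete, here is a minimal sketch of an optimistic-concurrency check: each record carries the timestamp of its last change, and a save is rejected when that timestamp has moved since the user loaded the data (names and storage are illustrative, not a full implementation):

```python
from datetime import datetime

def save_value(store: dict, key: str, new_value: float, loaded_ts: datetime):
    """Optimistic concurrency: reject the save if the record changed
    after the user loaded it into their cache."""
    if store[key]["ts"] > loaded_ts:
        # Someone else saved in the meantime -> warn instead of overwriting.
        raise RuntimeError(f"'{key}' was changed after loading; please reload.")
    store[key] = {"value": new_value, "ts": datetime.now()}

store = {"A": {"value": 100.0, "ts": datetime(2025, 7, 1, 10, 0)}}
loaded_ts = store["A"]["ts"]              # user 1 loads the value
save_value(store, "A", 120.0, loaded_ts)  # succeeds: no concurrent change
```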

Administration and control of planning

Planning processes require oversight. For recurring planning cycles, such as forecasts, an administrator must define the starting month for the current planning round. All layouts should then automatically adjust accordingly. 

It must also be possible to open and close a planning phase. After the planning period ends, figures should no longer be editable — or, after a review, data may be locked for specific countries, while others might still require updates. 

Versioning

Different versions may be necessary during planning. For example, budget planning might begin with an initial draft created by the respective departments. The overall result is then reviewed, and certain areas may need to revise their input. Alternatively, both an optimistic and a pessimistic scenario can be created. All of this must be managed through separate versions. This requires support through a robust versioning concept.

Functions for initialization / planning preparation

Planning often does not start from scratch but is initially created based on actual data with a simple percentage-based adjustment. Various functions — such as copy functions — are required for this planning preparation (see Figure 1 – point 2). This process may also be further enhanced by using an AI tool that goes beyond basic rule-based adjustments.
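
A sketch of such a copy function in pandas: the actuals are copied into the plan year with a simple percentage uplift (the names and the 5 % uplift are illustrative):

```python
import pandas as pd

# Actual figures of the previous year; names and figures are illustrative.
actuals = pd.DataFrame({
    "product": ["A", "B"],
    "year":    [2024, 2024],
    "amount":  [100.0, 200.0],
})

# Initialize the plan version by copying the actuals with a 5 % uplift.
plan = actuals.copy()
plan["year"] = 2025
plan["amount"] = plan["amount"] * 1.05
```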

Overview of functionalities

In Table 2, we give an overview of all the functionalities presented: 

Table 2: Overview of functionalities

Next steps

As described, a wide range of requirements is involved in planning — although not every planning use case needs all of them. 

We will therefore analyze the requirements individually and evaluate what can be implemented in Databricks and how much effort is involved. We will share our findings in the upcoming blog posts.  

We are just as curious as you are to see how far we can go with Databricks to meet these requirements. 

Data Products Setup

I’ll start with Data Products setup. If you’re new to the concept, this recent video is a great starting point, but here’s a short summary. A data product is a well-described, easily discoverable, and consumable collection of data sets.

Creating a Data Product in Datasphere

Note that in this article I create Data Products in the Data Sharing Cockpit in Datasphere. This functionality is expected to move into the Data Product Studio, but that had not taken place at the time of writing.

Before creating a Data Product in Datasphere, I need to set up a Data Provider profile, collecting descriptive metadata such as contact and address details, industry, and regional coverage, and, importantly, defining the Data Product Visibility. Enabling Formations allows me to share the Data Product with systems across the BDC Formation – Databricks, in this case.

With the Data Provider set up, I can go ahead and create a Data Product. As with the Data Provider, I’ll need to add metadata about the product and define its artifacts – the datasets it contains. Only datasets from a space of SAP HANA Data Lake Files type can be selected. Since this Data Product is visible across the Formation, it is available free of charge.

For this demo, the artifact is a local table containing ten years of Ice Cream sales data. Since this is a File type space, importing a CSV file directly to create a local table isn’t an option (see documentation).

I used a Replication Flow to perform an initial load from a BW aDSO table into a local table.

Once the Data Product is created and listed, it becomes available in the Catalog & Marketplace, from where it can be shared with Databricks by selecting the appropriate connection details.

Jump into Databricks

To use the shared object in Databricks, I need to mount it to the Catalog – either by creating a new Catalog or by using an existing one.

Databricks appends a version number to the end of the schema – ‘:v1’ – to maintain versioning in case of any future changes to the Data Product.

Once the share is mounted, the schema is created automatically, and the Sales actual data table becomes available within it. From there, I can access the shared table directly in a Notebook.
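
In code terms, mounting the share and reading the table could look roughly like this in a notebook (provider, share, catalog, schema, and table names are illustrative; the real ones come from the sharing setup):

```python
# Mount the Delta Share as a catalog; identifiers are illustrative.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS sap_sales
    USING SHARE `datasphere_provider`.`sales_share`
""")

# Read the shared table; note the ':v1' version suffix on the schema.
sales_df = spark.table("sap_sales.`sales_schema:v1`.sales_actuals")
display(sales_df)
```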

Creating a Data Product in Databricks

To create a Data Product in Databricks, I first need to create a Share – which I can do either via the Delta Sharing settings in the Catalog:

Or directly out of the table that is going to become part of the Share:

Since a single Share can contain multiple tables, I have the option to either add the table to an existing Share, or create a new one:

To publish the Share as a Data Product, I run a Python script where I define the target table for the forecast and describe the Share in CSN notation, setting the Primary Keys. Primary Keys are required for installing Data Products in Datasphere.
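
I won't reproduce the script here, but as an illustration, a CSN description essentially declares the entity, its elements, and which of them form the primary key. The entity and element names below are assumptions for this demo, not the exact script:

```python
# Illustrative CSN (Core Schema Notation) fragment for the forecast table;
# entity and element names are assumptions, not the actual demo script.
csn = {
    "definitions": {
        "SalesForecast": {
            "kind": "entity",
            "elements": {
                "PRODUCT":  {"type": "cds.String", "key": True},
                "CALMONTH": {"type": "cds.String", "key": True},
                "FORECAST": {"type": "cds.Decimal"},
            },
        }
    }
}
```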

Jump back into Datasphere

Once the Databricks Data Product is available in Datasphere, I install it into a Space configured as a HANA Database space – since my intention is to build a view on top of the table and use it for planning in SAC.

There are two installation options: as a Remote table for live data access, or as a Replication Flow, in which case the data is physically copied into the object store in Datasphere.

Since I want live access, I install it as a Remote Table:

and build a Graphical view of type Fact on top:

Forecast calculation

With my Data Products set up and the Sales actual data available in Databricks, I create a Notebook to calculate the Sales Forecast.

The approach combines Sales and Weather data to train a Linear Regression model. I import the Weather data* (https://zenodo.org/records/4770937) from an external server directly into Databricks, select the relevant features from the weather dataset, and combine them with the Sales actual data:

* Klein Tank, A.M.G., and Coauthors, 2002: Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol., 22, 1441–1453. Data and metadata available at http://www.ecad.eu
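
The feature preparation boils down to selecting a few weather columns and joining them to the sales figures by date. A simplified sketch (the file path and all column names are illustrative; the real ECA&D files come with their own layout):

```python
import pandas as pd

# Weather series; the path and column names are illustrative.
weather = pd.read_csv("/tmp/weather_rome.csv", parse_dates=["date"])
weather = weather[["date", "temp_mean", "precipitation"]]

# Sales actuals shared from Datasphere (columns "date" and "amount" assumed).
sales = spark.table("sap_sales.`sales_schema:v1`.sales_actuals").toPandas()
sales["date"] = pd.to_datetime(sales["date"])

# One row per day: the weather features next to the sales figure.
df = sales.merge(weather, on="date", how="inner")
```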

Using the “sklearn” library, I build and train a Linear regression model:
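
Sketched with the names from the previous snippet (df is the joined sales/weather frame), the training step comes down to a few lines:

```python
from sklearn.linear_model import LinearRegression

# df is the joined sales/weather frame from the previous sketch.
features = ["temp_mean", "precipitation"]
X, y = df[features], df["amount"]

model = LinearRegression()
model.fit(X, y)
```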

Once trained, the model predicts the Sales forecast for Rome in June 2026 based on the weather forecast, and I save the results to my Catalog table:
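
Continuing the sketch, the prediction and save step could look like this (the weather inputs for June 2026 and the target table name are illustrative):

```python
import pandas as pd

# Assumed weather forecast for Rome in June 2026 (illustrative figures).
future = pd.DataFrame({"temp_mean": [27.5], "precipitation": [12.0]})

forecast = pd.DataFrame({
    "PRODUCT":  ["ICE_CREAM"],
    "CALMONTH": ["2026-06"],
    "FORECAST": model.predict(future),
})

# Persist the result to the Catalog table that is shared back to Datasphere.
(spark.createDataFrame(forecast)
      .write.mode("overwrite")
      .saveAsTable("main.forecast.sales_forecast"))
```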

Seamless planning data model

The Seamless Planning concept is built around physically storing planning data and public dimensions directly in Datasphere, keeping them alongside the actual data.

Since the QRC4 2025 SAC release, it has also been possible to use live versions and bring reference data into planning models without replication.

In this scenario, I build a seamless planning model on top of the Graphical view I created over the Remote table. This lets me use the forecast generated in Databricks as a reference for the final SAC Forecast version.

 

The model setup follows these steps:

Create a new model:

Start with data:

Select Datasphere as the data storage:

From there, I define the model structure and can review the data in the preview.

For a deeper dive into Seamless Planning, I recommend this biX blog.

Process Flow automation

Multi-action triggers Datasphere task chain

The final step is automating the entire forecast generation by using SAC Multi-actions and a Task Chain in Datasphere – so that my user can trigger the calculation with a single button click from an SAC Story.


Triggering Task Chains from Multi-actions is a recently released feature. This blog post walks through how to set it up.

For details on how to trigger a Databricks Notebook from Datasphere, I recommend referring to this blog.
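
The linked blog covers the actual setup; conceptually, the Task Chain calls the Databricks Jobs REST API, which in its simplest form is a single run-now request (the workspace URL, token, and job ID below are placeholders):

```python
import requests

# Placeholders: workspace URL, access token, and the job ID of the
# job that wraps the forecast Notebook.
WORKSPACE = "https://<workspace>.cloud.databricks.com"
TOKEN = "<access-token>"

resp = requests.post(
    f"{WORKSPACE}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 123},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # id of the triggered run
```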

With everything in place, I create a Story, add my Seamless planning Model, and attach the Multi-action:

Running the Multi-action triggers the Task Chain, which in turn triggers the Databricks Notebook.

I can monitor the execution details in Datasphere:

and in Databricks:

Once the calculation completes, the updated forecast appears in the Story:

The end-to-end calculation took 2 minutes 45 seconds in total: the Task Chain in Datasphere was triggered almost instantly by the Multi-action, the Databricks Notebook execution itself took 1 minute 29 seconds, and the remaining time was spent on Serverless Cluster startup.

 

From here, I can copy the calculated forecast into a new private version:

adjust the numbers as needed, and publish it as a new public version to Datasphere:

Conclusion

With SAP Business Data Cloud, it is possible to build a forecasting workflow that feels seamless to the end user — even though it spans multiple systems under the hood.

Companies using BW as the main Data Warehouse and Databricks for ML calculations or Data Science tasks can benefit from using the platform, as the data no longer needs to be physically copied out of BW.

What this scenario demonstrates is that, once wrapped as a Data Product, BW sales data can be shared with Databricks via the Delta Sharing protocol. Databricks, in turn, can then create its own Data Products on top of the calculation results and share them back with Datasphere as a Remote Table.

A Seamless Planning model in SAC sits on top of that Remote Table, giving planners live access to the generated forecast. A single Multi-action in an SAC Story ties it all together, triggering a Datasphere Task Chain that kicks off the Databricks Notebook — completing the full cycle in under three minutes.

As SAP Business Data Cloud continues to mature, scenarios like this one are becoming achievable – leaving the complexity in the architecture and not in the workflow.

Contact

Ilya Kirzner
Consultant
biX Consulting