December 18, 2025 at 12:20 PM | Est. read time: 14 min

Community manager and producer of specialized marketing content
If your SAP HANA data is rich but your reporting cycles are slow, this guide is for you. By combining LangChain (for LLM-driven automation) with SAP HANA (for fast analytics), you can turn natural-language questions into governed SQL, generate charts and documents, and schedule delivery—without manual SQL writing or spreadsheet wrangling.
In this post, you’ll learn how to:
- Connect LangChain to SAP HANA securely
- Use text-to-SQL with guardrails for reliable, read-only queries
- Automate report creation (tables, charts, PDFs, emails)
- Add RAG to ground answers in your official data definitions
- Schedule jobs and monitor results in production
For broader context on where this fits in the data stack, see the strategic role of data engineering in modern organizations: The Role of Data Engineering in Modern Business. And if you plan to use agents (multiple tools working together under an LLM), this practical primer will help: LangChain Agents for Automation and Data Analysis.
Why LangChain + SAP HANA for Automated Reporting?
- Natural-language reporting: Stakeholders can ask “What were our top 10 SKUs by margin last quarter in EMEA?” and get a formatted report.
- Governed access: Grant the LLM read-only access to curated views and enforce row-level security.
- Speed and consistency: Eliminate copy-paste steps and reduce “multiple versions of the truth.”
- Scale: Schedule recurring reports (daily/weekly/monthly) and route results to email, Teams/Slack, or BI tools.
Reference Architecture (What You’ll Build)
- Input layer
- Prompt sources: Slack/Teams bot, web app, scheduled jobs.
- LLM planning
- LangChain text-to-SQL on a pre-approved subset of HANA tables/views.
- Optional RAG grounding with your business glossary and metric definitions.
- Data access
- SQLAlchemy connection via SAP HANA client (hdbcli).
- Read-only user, statement timeouts, safe-guards (no DDL/DML).
- Post-processing
- DataFrames → tables and charts.
- Summaries and narratives (LLM with strict context).
- Output delivery
- HTML/PDF/Excel → email, Teams/Slack, object storage, or BI.
- Orchestration & monitoring
- Cron/Airflow/Cloud scheduler, centralized logs, alerts.
Want to level up the grounding step? RAG is the go-to method to keep LLMs factual and aligned with your definitions. Here’s a concise deep dive: Mastering Retrieval Augmented Generation.
Prerequisites
- SAP HANA (on-prem or cloud) with a technical read-only user
- Network access from your app to SAP HANA
- Python 3.10+ recommended
- API access to your chosen LLM (OpenAI, Azure OpenAI, or an on-prem model)
Python packages:
- langchain, langchain-openai (or another LLM integration)
- sqlalchemy, sqlalchemy-hana, hdbcli
- pandas, matplotlib/plotly, jinja2 (optional for templating), weasyprint/reportlab (optional for PDFs)
Step-by-Step Setup
1) Prepare SAP HANA for AI-driven Reporting
- Create a dedicated read-only user and role:
- No DDL/DML privileges (select-only).
- Set statement timeouts to prevent runaway queries.
- Expose curated Calculation Views or SQL views that encode the business logic:
- Prefer views to raw tables to stabilize schemas and enforce row-level security.
- Use synonyms or a reporting schema to simplify object names and reduce confusion.
2) Configure the Connection (SQLAlchemy + hdbcli)
Install dependencies:
- pip install langchain langchain-openai sqlalchemy sqlalchemy-hana hdbcli pandas
Sample connection string (adjust to your environment):
`python
from sqlalchemy import create_engine
engine = create_engine(
"hana+hdbcli://USER:PASSWORD@HOST:30015", # or HANA Cloud port
connect_args={"sslValidateCertificate": True}, # align with your security policy
pool_pre_ping=True
)
`
3) Register HANA with LangChain and Restrict Scope
Keep the LLM’s world small. Only include approved views to minimize hallucinations and SQL errors.
`python
from langchain_community.utilities import SQLDatabase
ALLOWED_OBJECTS = ["VW_SALES_ORDERS", "VW_INVENTORY_BY_DAY", "VW_CUSTOMERS"]
db = SQLDatabase(
engine=engine,
include_tables=ALLOWED_OBJECTS,
sample_rows_in_table_info=3 # give the LLM a feel for column patterns
)
`
4) Create a Text-to-SQL Chain with Guardrails
`python
from langchain_openai import ChatOpenAI
from langchain.chains import create_sql_query_chain
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
The chain that writes SQL
sql_gen = create_sql_query_chain(llm, db)
The tool that executes SQL (read-only)
sql_exec = QuerySQLDataBaseTool(db=db)
def answer_question(question: str):
1) Generate SQL
sql = sql_gen.invoke({"question": question})
Safety checks
if any(x in sql.upper() for x in ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER"]):
raise ValueError("Unsafe SQL detected.")
2) Execute SQL
rows = sql_exec.run(sql)
return sql, rows
`
Add a “query checker” pass by prompting the LLM to validate its own SQL (“Does this query only read from allowed views?”) and reject anything risky. You can also add an allowlist filter—if the SQL references objects not in ALLOWED_OBJECTS, block it.
5) Make the LLM Fluent in SAP HANA SQL
Few-shot examples help the model learn HANA-specific patterns, such as date formatting and functions:
- Current month start: TRUNC(CURRENT_DATE, 'MM') (or YEAR/MONTH extraction with EXTRACT)
- Last 3 months: ADD_MONTHS(CURRENT_DATE, -3)
- Formatting year-month: TO_VARCHAR(date_col, 'YYYY-MM')
Add a system prompt or few-shot examples like:
`
Use only the following views: VW_SALES_ORDERS, VW_INVENTORY_BY_DAY, VW_CUSTOMERS.
Prefer ISO date filters. For month grouping use TO_VARCHAR(date_col, 'YYYY-MM').
Never use SELECT *; always select named columns. Limit the result to 100 rows unless asked otherwise.
`
6) Post-process Results Into Reports
`python
import pandas as pd
import matplotlib.pyplot as plt
from io import BytesIO
def to_dataframe(rows):
return pd.DataFrame(rows)
def chart_revenue_by_month(df):
ax = df.plot(x="MONTH", y="REVENUE", kind="bar", legend=False, title="Revenue by Month")
fig = ax.get_figure()
buf = BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
buf.seek(0)
return buf # send to email/Slack or embed in HTML
def to_html(df):
return df.to_html(index=False, justify="center")
`
You can generate PDFs via WeasyPrint or ReportLab, Excel via pandas to_excel, and embed charts/images into emails or dashboards. For distribution, wire this into your preferred notification channels.
7) Add RAG Grounding (Optional but Powerful)
To align answers with your official metrics and glossary:
- Store your business definitions in a vector store (or SAP HANA Vector Engine if available in your environment).
- Retrieve the most relevant definitions for each user question.
- Prepend those snippets to the LLM’s system/context prompt, ensuring SQL follows the documented definitions.
Deepen your RAG practice here: Mastering Retrieval Augmented Generation.
8) Schedule and Orchestrate
Start simple with cron or a managed cloud scheduler. For complex pipelines (multi-step reports, retries, dependencies), consider a workflow orchestrator such as Apache Airflow or a durable execution engine. Log every run (prompt, generated SQL, row counts, runtime). Set alerts for failures, slow queries, or empty results where data is expected.
Example Use Cases (With Sample Prompts)
1) Executive Sales Summary (Monthly)
- Prompt: “Show revenue and gross margin by month for 2024. Include a trend chart and a one-paragraph insight.”
- Output: A PDF with a table, bar chart, and narrative (“Revenue grew 8% MoM in Q2, led by EMEA…”).
2) Inventory Turnover and Stockouts
- Prompt: “Which SKUs had the highest stockout days last quarter? Include average days out-of-stock and affected locations.”
- Output: Table ranked by stockout days, optional heatmap chart by location.
3) Customer Health
- Prompt: “List top 20 customers by YoY revenue growth, with their region and primary sales rep. Exclude customers with fewer than 5 orders.”
- Output: Table + optional waterfall chart.
Reliability and Safety Best Practices
- Read-only roles: Strictly enforce select-only permissions for the reporting user.
- Schema whitelisting: Only expose curated views to the LLM.
- Query timeouts and LIMITs: Guard against long-running scans.
- Few-shots and rules: Teach HANA SQL idioms (date math, grouping) and formatting.
- Golden test set: Maintain a list of common business questions with expected SQL/outputs. Run regression tests on every prompt-template change.
- Transparent logs: Keep prompts, generated SQL, and results for traceability.
Performance Tips for SAP HANA
- Favor Calculation Views: Encapsulate business logic and improve stability for the LLM.
- Pre-aggregate heavy facts: Summaries by month/region/material speed up routine reports.
- Use SELECT column lists: Avoid SELECT * and unnecessary joins.
- Push filters down: Encourage date and region filters in the WHERE clause.
- Caching: Cache frequent queries when freshness allows, and invalidate on data updates.
A 30–60–90 Day Roadmap
- 0–30 days: Connect to HANA, whitelist 3–5 core views, implement read-only SQL generation, ship first automated report with basic charts.
- 31–60 days: Add RAG grounding with a business glossary, create golden test questions, wire scheduling and notifications, add error alerts.
- 61–90 days: Expand coverage (finance, supply chain, CX), add multi-lingual prompts, harden governance (data lineage, change control), and monitor adoption.
Common Pitfalls (And How to Avoid Them)
- Ambiguous column names: Normalize names (e.g., REVENUE_NET_USD) in views to reduce LLM confusion.
- Overexposed schemas: Don’t hand the LLM your entire HANA catalog. Curate.
- Non-deterministic prompts: Lock down your system prompt and few-shots. Version them like code.
- “Hallucinated” joins: Provide entity relationships in the prompt or encode them in the views.
- Slow-first runs: Warm up caches on schedules or precompute heavy aggregates.
Where This Fits in the Bigger Picture
Automated reporting with LLMs works best when your data foundations are solid. If you’re still building those foundations—or want to understand the broader operating model—this overview is helpful: The Role of Data Engineering in Modern Business. If your use case requires the LLM to orchestrate multiple tools (SQL, file generation, email, Slack), agents can unlock serious automation—learn how here: LangChain Agents for Automation and Data Analysis.
FAQ: LangChain + SAP HANA Reporting Automation
1) Can I use an on-prem model instead of a cloud LLM?
Yes. LangChain supports multiple providers, including self-hosted/open models. For sensitive environments, deploy an on-prem LLM behind your firewall and connect via LangChain’s model interface. Keep prompts and outputs within your private network.
2) How do I stop the LLM from running unsafe queries?
Use a read-only database user, an allowlist of views, and a SQL “safety check” pass to block DDL/DML. Add timeouts and a “dry run” mode that returns an EXPLAIN PLAN before execution for high-cost queries.
3) What’s the difference between this and SAP Analytics Cloud (SAC)?
SAC is a full BI platform for modeling, dashboards, and governed storytelling. LangChain + HANA is complementary—it adds natural-language, agent-driven automation, and custom workflows (e.g., “generate PDF + email + Slack with charts and a narrative”). Many teams use both.
4) Does SAP HANA support vector search for RAG?
SAP HANA Cloud includes vector capabilities in many editions, enabling similarity search for embeddings. If available, you can store glossary/document embeddings in HANA and use them to ground your prompts. Alternatively, use an external vector store.
5) How do I handle multi-lingual prompts (e.g., English and Portuguese)?
Add a language-detection step and translate the user’s request into your canonical reporting language. Keep your SQL generation in one language to reduce variability. Return the final narrative in the user’s language.
6) What if the schema changes (new columns, renamed views)?
Version your prompts and few-shots. Maintain a contract layer of views (stable names, stable meanings) so underlying table changes don’t break reports. Add schema checks in CI (fail fast if a reporting view changes).
7) Can I schedule recurring reports?
Yes. Use cron, a cloud scheduler, or a workflow engine. Parameterize date ranges (“last month”) and create named jobs (e.g., “EMEA Sales Monday 8 AM”). Log and alert on failures or anomalies (such as empty result sets).
8) How do I control costs?
- Cache frequent queries and reuse embeddings.
- Batch scheduled jobs to reduce peak load.
- Use smaller, efficient models for SQL generation and reserve larger models for complex narratives.
- Limit token lengths with tight prompts and concise schema descriptions.
9) How accurate is text-to-SQL?
With a curated view layer, good few-shots, and a golden test set, accuracy is strong for common questions. For edge cases, add guardrails and fallback patterns (e.g., provide two candidate SQL queries and run the one that passes a validator).
10) What security measures should be mandatory?
- Read-only DB user, network encryption (TLS), and secret management (no hardcoded credentials).
- Row-level security in HANA views.
- Prompt and output logging (without sensitive data).
- Regular audits of executed SQL and access patterns.
By starting small with a curated set of views and a few high-impact reports, you’ll prove value quickly—then scale to broader domains with confidence. With the right safeguards and a thoughtful rollout, LangChain + SAP HANA can move your reporting from manual and reactive to automated, trustworthy, and on-demand.







