In today’s fast-paced world of software development, integrating artificial intelligence into applications is no longer just a trend—it’s a necessity. At the heart of this revolution is Generative AI, a type of artificial intelligence that can create new content, such as text, images, and code, in response to prompts. It’s fundamentally changing how we interact with technology and build software solutions. For the millions of developers who rely on the Spring Framework, the good news is that you don’t need to be an AI expert to get started. The Spring AI project provides a robust, idiomatic, and simplified approach to bringing these capabilities directly into your Java applications.
This article will guide you through the process of adding Spring AI to your project, explore the core AI patterns it supports, and outline the key technologies you can integrate to build powerful, intelligent applications.
1. Adding Spring AI to Your Project
The first step is to configure your build file to include the necessary dependencies. Spring AI follows the familiar Spring Boot conventions, providing starter dependencies that handle the heavy lifting of auto-configuration.
Gradle Project
For Gradle, you’ll first need to add the Spring AI Bill of Materials (BOM) to your dependencies block. The BOM ensures that all Spring AI-related dependencies use compatible versions. You can then add the specific AI model and other dependencies you need.
dependencies {
    // Spring AI BOM for consistent versions
    implementation platform('org.springframework.ai:spring-ai-bom:1.0.0')
    // Starter for OpenAI (or other LLMs)
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
    // Optional: for a vector database like Pinecone
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pinecone'
    // Other Spring Boot dependencies
    implementation 'org.springframework.boot:spring-boot-starter-web'
    // ...
}
Maven Project
For a Maven project, the process is very similar. You add the Spring AI BOM to the <dependencyManagement> section of your pom.xml, and then include the individual starter dependencies in your <dependencies> section.
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <!-- Other dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- ... -->
</dependencies>
2. Key AI Patterns Supported by Spring AI
Spring AI is not just a simple API wrapper; it’s designed to help you implement sophisticated AI patterns in a portable and modular way.
System Prompts & Prompt Engineering
Description: Prompt Engineering is the art of crafting specific instructions and context to guide an LLM’s behavior. A System Prompt is a key part of this, acting as the foundation for the conversation by defining the LLM’s role, rules, and style. It provides constraints and instructions before the user ever provides input.
Use Case: A system prompt is invaluable for ensuring consistency. For a customer service chatbot, you could use a system prompt that says, “You are a friendly and professional customer support assistant. You must always be polite and ask for a ticket number for every new issue.” This helps the LLM maintain a specific persona and follow business rules.
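To make this concrete, here is a minimal sketch of setting a default system prompt once on the client, assuming the fluent ChatClient API from recent Spring AI releases; the persona text and the /support endpoint are illustrative:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SupportController {

    private final ChatClient chatClient;

    public SupportController(ChatClient.Builder builder) {
        // The default system prompt constrains every request made through this client
        this.chatClient = builder
                .defaultSystem("You are a friendly and professional customer support assistant. "
                        + "Always be polite and ask for a ticket number for every new issue.")
                .build();
    }

    @PostMapping("/support")
    public String support(@RequestParam String message) {
        return chatClient.prompt().user(message).call().content();
    }
}

You can also supply a system prompt per request with prompt().system(...) when the persona needs to vary by call.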
Retrieval-Augmented Generation (RAG)
Description: RAG enhances an LLM’s ability to answer questions by giving it access to external, private, or real-time data sources. It overcomes the LLM’s static knowledge by retrieving relevant information from your documents and “stuffing” it into the prompt. This process, often called Prompt Stuffing, provides the LLM with the context it needs to generate a grounded, accurate response.
Use Case: A great example is a Q&A chatbot for an enterprise. The chatbot can’t answer questions about internal policies because that information wasn’t in the LLM’s training data. With RAG, you can use an embedding model to convert your company’s documents into numerical representations (vectors) and store them in a vector database. When a user asks a question, the application finds the most relevant document snippets, which are then used as context for the LLM to formulate an answer.
Function Calling / Tooling
Description: This pattern allows an LLM to dynamically call external APIs or code functions to retrieve real-time data or perform actions. The LLM acts as a reasoning engine, deciding when a tool is needed based on a user’s request. The model doesn’t execute the code itself; it simply provides a structured response indicating the function to call and the parameters to use.
Use Case: Imagine a travel booking chatbot. A user asks, “What’s the weather like in Paris?” The LLM, recognizing that it needs current information, will “request” a call to a getWeather function, passing “Paris” as the city. Your application intercepts this request, calls a weather API, and feeds the live weather data back to the LLM. The LLM then uses this information to formulate a polite, accurate response to the user.
Output Converters
Description: LLMs often return responses as unstructured text. An Output Converter solves this by instructing the model to return a structured format (like JSON or a list) and then parsing that output into a Java object. Spring AI provides a convenient way to map the raw text to a List, a Map, or a custom POJO.
Use Case: A common use case is generating a structured report. You could prompt the LLM to “Give me the top 5 trending topics from the past week in JSON format with a title and summary for each.” An output converter would then automatically parse this JSON string into a List of Topic objects, making it easy to use the data in your application.
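A minimal sketch of this pattern, assuming the fluent ChatClient API; the Topic record and the prompt text are illustrative stand-ins for your own types:

import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.ParameterizedTypeReference;

public class TrendingTopicsExample {

    // A simple record the converter can populate from the model's structured output
    public record Topic(String title, String summary) {}

    public static List<Topic> fetchTopics(ChatClient chatClient) {
        // entity() appends format instructions to the prompt and parses the reply into the target type
        return chatClient.prompt()
                .user("Give me the top 5 trending topics from the past week, with a title and summary for each.")
                .call()
                .entity(new ParameterizedTypeReference<List<Topic>>() {});
    }
}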
Chat Memory
Description: By default, LLMs are stateless; they treat each new prompt as a completely new conversation. Chat Memory gives your application the ability to remember previous messages and provide conversational context. Spring AI offers different implementations, from simple in-memory storage to persistent repositories like JDBC.
Use Case: Chat memory is crucial for creating natural, multi-turn conversations. Without it, if a user asks, “What’s my name?” after telling the chatbot “Hello, my name is Alex,” the chatbot won’t know the answer. With chat memory, the previous message is included in the new prompt, allowing the LLM to recall the user’s name and provide a relevant, personalized response.
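A minimal sketch of wiring in-memory history, assuming the chat memory advisor API from recent Spring AI releases (exact class names have shifted between versions):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

public class MemoryExample {

    public static ChatClient buildClient(ChatClient.Builder builder) {
        // Keeps a sliding window of recent messages for each conversation, in memory
        ChatMemory chatMemory = MessageWindowChatMemory.builder().build();
        return builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }
}

Every call through this client now carries the prior turns, so the follow-up “What’s my name?” can be answered from the conversation history.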
Evaluators
Description: An Evaluator is a tool used to automatically assess the quality of an LLM’s response. This is a critical pattern for building reliable and safe AI applications. Spring AI provides built-in evaluators that can check for things like relevance to the prompt or factual accuracy against a given context.
Use Case: For a RAG-based Q&A system, you can use a RelevancyEvaluator to automatically score how well the LLM’s answer aligns with the user’s question. This allows you to set a quality threshold and, if a response falls below it, either discard it or flag it for human review, ensuring your application provides high-quality information.
3. Technology Integrations
One of the greatest strengths of Spring AI is its modularity and extensive support for a wide range of AI technologies. This allows you to easily switch providers with minimal code changes.
Large Language Models (LLMs)
Spring AI provides starters for all major LLM providers, including:
- OpenAI: The most popular choice, providing access to models like GPT-4.
- Google Gemini: Integrates with Google’s powerful family of models.
- Hugging Face: Connects to a vast ecosystem of open-source models.
- Ollama: Allows you to use a local, self-hosted LLM.
Vector Databases
Vector databases are essential for implementing the RAG pattern. Spring AI supports a number of popular solutions, providing a consistent VectorStore API for each (a query sketch follows the list):
- Pinecone
- Chroma
- Milvus
- PostgreSQL with the pgvector extension
- Elasticsearch
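Because the API is uniform, a similarity search reads the same against any of these stores. A minimal sketch, assuming the SearchRequest builder from recent releases; the query text is illustrative:

import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

public class SearchExample {

    public static List<Document> findSimilar(VectorStore vectorStore) {
        // Returns the five stored chunks closest in meaning to the query
        return vectorStore.similaritySearch(SearchRequest.builder()
                .query("What is our refund policy?")
                .topK(5)
                .build());
    }
}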
Embedding Models
Embedding models are responsible for converting text into numerical vectors. Spring AI offers integrations for popular providers, including the following (a usage sketch follows the list):
- OpenAI
- Mistral AI
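Conceptually, the EmbeddingModel interface turns text into a vector of floats. A minimal sketch, assuming an auto-configured EmbeddingModel bean:

import org.springframework.ai.embedding.EmbeddingModel;

public class EmbeddingExample {

    public static void printDimensions(EmbeddingModel embeddingModel) {
        // Converts the sentence into a dense vector that captures its meaning
        float[] vector = embeddingModel.embed("Spring AI makes AI integration simple.");
        System.out.println("Vector dimensions: " + vector.length);
    }
}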
4. Model Configuration
Configuring Spring AI is a straightforward process thanks to Spring Boot’s property-based configuration. You can manage your API keys, model names, and other options in the application.properties or application.yml file.
OpenAI
To connect to OpenAI, you must provide your API key. You can also specify the model and other options, such as temperature, which controls how creative (random) the responses are.
spring.ai.openai.api-key=YOUR_API_KEY
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.chat.options.temperature=0.7
Google Gemini
For Google Gemini, you configure the project ID and location, which are used to authenticate with Google Cloud’s Vertex AI.
spring.ai.vertex.ai.gemini.project-id=YOUR_PROJECT_ID
spring.ai.vertex.ai.gemini.location=us-central1
Ollama
Since Ollama runs locally, it doesn’t require an API key. You just need to specify the model you want to use.
spring.ai.ollama.chat.options.model=llama3
Hugging Face
For Hugging Face, you provide an API key and the URL for the specific inference endpoint you want to use.
spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL
5. Synchronous vs. Streaming API
The ChatClient in Spring AI provides two primary ways to interact with an LLM: a synchronous call() method and a reactive stream() method. Choosing between them depends on your application’s requirements for responsiveness and user experience.
Synchronous call()
The call() method is a blocking operation. Your application sends a request to the LLM and waits for the entire response to be generated before it can proceed.
Use Case: This approach is suitable for single-turn requests where the response is expected to be relatively short, such as a summary, a classification, or a joke. It’s simple to implement and doesn’t require a reactive programming model.
Example Code:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping("/chat/call")
    public String chatWithCall(@RequestParam String message) {
        // The call() method blocks until the full response is received.
        return chatClient.prompt().user(message).call().content();
    }
}
Streaming stream()
The stream() method provides a non-blocking, reactive approach. The LLM’s response is sent back as a continuous stream of tokens, and your application can process these tokens as they arrive. This is handled using Spring’s reactive framework, Project Reactor, which returns a Flux.
Use Case: This is ideal for building real-time, interactive applications like chatbots or content generators where you want to provide a “typewriter” effect to the user, showing the response as it’s being generated. It significantly improves the perceived responsiveness of your application for longer responses.
Example Code:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class StreamingChatController {

    private final ChatClient chatClient;

    public StreamingChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping(value = "/chat/stream", produces = "text/event-stream")
    public Flux<String> chatWithStream(@RequestParam String message) {
        // The stream() method returns a Flux, emitting tokens as they are generated.
        return chatClient.prompt().user(message).stream().content();
    }
}
6. AI Model Evaluation and Testing
Building a reliable AI application requires more than just integrating with a model; it requires a strategy for validating its outputs. AI Model Evaluation and Testing is a critical part of the development lifecycle, especially for preventing issues like hallucinations (where the model generates false information) or irrelevant responses.
The Role of Evaluators
Spring AI provides a core Evaluator interface and several built-in implementations to help you test and validate your AI-generated content. These evaluators use a separate AI model as a judge, assessing the quality of your primary model’s output. This is a common and effective approach, because an LLM can be an excellent judge of another model’s output.
Key Evaluators in Spring AI
- RelevancyEvaluator: Checks how well an AI-generated response aligns with the original user prompt. It assesses semantic similarity to ensure the answer is on-topic and helpful.
- FactCheckingEvaluator: Designed to combat hallucinations. It compares a specific claim made by the AI against a provided context (e.g., a document from a RAG pipeline) to verify factual accuracy.
Testing with Evaluators
You can integrate these evaluators directly into your JUnit tests to create a robust CI/CD pipeline for your AI features. For example, you can write a test that sends a prompt to your application, receives the response, and then uses a RelevancyEvaluator to assert that the response meets your quality bar.
Example Test Snippet:
import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
public class ChatControllerTests {

    @Autowired
    private ChatController chatController;

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void testChatResponseRelevance() {
        String prompt = "What are the key features of Spring AI?";
        String response = chatController.chatWithCall(prompt);

        // The evaluator makes a second model call to judge the response
        RelevancyEvaluator evaluator = new RelevancyEvaluator(chatClientBuilder);
        EvaluationResponse evaluation = evaluator.evaluate(
                new EvaluationRequest(prompt, List.of(), response));

        // Assert that the judge considered the response relevant to the prompt
        assertThat(evaluation.isPass()).isTrue();
    }
}
7. Implementing Tool Calling with Custom Functions
Tool Calling is one of the most powerful features of Spring AI, allowing you to seamlessly connect an LLM to your own business logic. Spring AI makes this process incredibly simple by using standard annotations on Plain Old Java Objects (POJOs).
Step 1: Create a Tool Service
First, create a simple Spring component that contains the methods you want the LLM to be able to call. You can annotate plain methods with @Tool, or register a standard java.util.function.Function bean.
Example Code:
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
public class WeatherService {

    public record WeatherResponse(String city, double temperature, String conditions) {}

    /**
     * Get the current weather for a given city.
     */
    @Tool(description = "Get the current weather for a given city")
    public WeatherResponse getWeather(@ToolParam(description = "The name of the city") String city) {
        // In a real application, you would call a weather API here.
        // For this example, we'll return mock data.
        System.out.println("Calling the weather service for city: " + city);
        return new WeatherResponse(city, 25.0, "Sunny");
    }
}
The key is the @Tool annotation. It tells Spring AI to expose this method to the LLM, and the description is crucial because the LLM uses it to understand when to call the tool.
Step 2: Register the Tool with the ChatClient
Next, you need to tell your ChatClient about the tools it has access to. You can do this by passing the tool bean to the defaultTools() method on the ChatClient.Builder, or to tools() on an individual prompt.
Example Code:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ToolChatController {

    private final ChatClient chatClient;

    public ToolChatController(ChatClient.Builder builder, WeatherService weatherService) {
        this.chatClient = builder
                .defaultTools(weatherService)
                .build();
    }

    @PostMapping("/chat/tool-calling")
    public String chatWithToolCalling(@RequestParam String message) {
        // When the user asks about the weather, the LLM will request our getWeather tool
        return chatClient.prompt().user(message).call().content();
    }
}
When a user’s prompt (e.g., “What is the weather like in New York?”) matches the description of the getWeather tool, the LLM requests the call, Spring AI invokes the method, and the result is fed back to the model. Spring AI handles the entire orchestration, from the LLM’s request to the function’s execution and the final response generation.
8. RAG Implementation: A Deeper Dive
While RAG is a powerful concept, its implementation involves a detailed, multi-step pipeline. Spring AI provides the necessary abstractions to manage each step seamlessly. The process is broken down into two main phases: Data Ingestion and Query & Retrieval.
Data Ingestion Pipeline
The first step is to get your unstructured data (documents, PDFs, etc.) into a format that a vector database can understand. This process is a classic ETL (Extract, Transform, Load) pipeline.
- Extract: A DocumentReader extracts content from a data source. Spring AI includes readers for common formats like Markdown, PDFs, and web pages.
- Transform: A TextSplitter breaks down large documents into smaller, semantically meaningful chunks. This is crucial because LLMs have a limited context window.
- Embed: An EmbeddingModel converts these text chunks into numerical vectors. This process captures the semantic meaning of the text, allowing for a similarity search later.
- Load: The VectorStore then stores these vectors, ready for retrieval.
Example Ingestion Code:
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

@Configuration
public class VectorStoreConfig {

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel,
            @Value("classpath:/docs/my-policy-manual.pdf") Resource pdfResource) {
        // Use a simple in-memory vector store for this example
        SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();

        // Extract text from the PDF (requires the spring-ai-pdf-document-reader module)
        PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(pdfResource);

        // Split the documents into manageable chunks
        TokenTextSplitter textSplitter = new TokenTextSplitter();
        List<Document> documents = textSplitter.apply(pdfReader.get());

        // Add the documents to the vector store
        vectorStore.add(documents);
        return vectorStore;
    }
}
Query & Retrieval Pipeline
Once your data is in the vector store, you can use it to answer user questions.
- Retrieve: When a user submits a query, the application uses an Advisor to first perform a similarity search on the VectorStore to find the most relevant documents.
- Augment: The retrieved documents are then “stuffed” into the user’s prompt, providing the LLM with the specific context it needs to generate a grounded response.
- Generate: The augmented prompt is sent to the LLM, which uses the provided context to answer the user’s question.
Example Retrieval Code:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RAGController {

    private final ChatClient chatClient;

    public RAGController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                // The advisor automatically retrieves documents and adds them to the prompt
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @PostMapping("/chat/rag")
    public String chatWithRAG(@RequestParam String message) {
        // The user's message is passed, and the advisor handles the RAG process
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}