In today’s fast-paced world of software development, integrating artificial intelligence into applications is no longer just a trend—it’s a necessity. At the heart of this revolution is Generative AI, a type of artificial intelligence that can create new content, such as text, images, and code, in response to prompts. It’s fundamentally changing how we interact with technology and build software solutions. For the millions of developers who rely on the Spring Framework, the good news is that you don’t need to be an AI expert to get started. The Spring AI project provides a robust, idiomatic, and simplified approach to bringing these capabilities directly into your Java applications.

This article will guide you through the process of adding Spring AI to your project, explore the core AI patterns it supports, and outline the key technologies you can integrate to build powerful, intelligent applications.

1. Adding Spring AI to Your Project

The first step is to configure your build file to include the necessary dependencies. Spring AI follows the familiar Spring Boot conventions, providing starter dependencies that handle the heavy lifting of auto-configuration.

Gradle Project

For Gradle, you’ll first need to add the Spring AI Bill of Materials (BOM) to your dependencies block. The BOM ensures that all Spring AI-related dependencies use compatible versions. You can then add the specific AI model starters and other dependencies you need. (This article targets Spring AI 1.0.x; several artifact names changed during the 1.0.0 milestones, so check the Spring AI reference documentation if you’re on an older version.)

dependencies {
    // Spring AI BOM for consistent versions
    implementation platform("org.springframework.ai:spring-ai-bom:1.0.0")

    // Starter for OpenAI (or another LLM provider)
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'

    // Optional: for a vector database like Pinecone
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pinecone'

    // Other Spring Boot dependencies
    implementation 'org.springframework.boot:spring-boot-starter-web'
    // ...
}

Maven Project

For a Maven project, the process is very similar. You add the Spring AI BOM to the <dependencyManagement> section of your pom.xml, and then include the individual starter dependencies in your <dependencies> section.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <!-- Other dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- ... -->
</dependencies>

2. Key AI Patterns Supported by Spring AI

Spring AI is not just a simple API wrapper; it’s designed to help you implement sophisticated AI patterns in a portable and modular way.

System Prompts & Prompt Engineering

Description: Prompt Engineering is the art of crafting specific instructions and context to guide an LLM’s behavior. A System Prompt is a key part of this, acting as the foundation for the conversation by defining the LLM’s role, rules, and style. It provides constraints and instructions before the user ever provides input.

Use Case: A system prompt is invaluable for ensuring consistency. For a customer service chatbot, you could use a system prompt that says, “You are a friendly and professional customer support assistant. You must always be polite and ask for a ticket number for every new issue.” This helps the LLM maintain a specific persona and follow business rules.
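
With the fluent ChatClient API covered later in this article, the system prompt can be set directly on the request. A minimal sketch (assuming a chatClient built as shown in Section 5):

// The system prompt fixes the persona and rules before the user input is added.
String answer = chatClient.prompt()
        .system("You are a friendly and professional customer support assistant. "
                + "You must always be polite and ask for a ticket number for every new issue.")
        .user("My app keeps crashing on startup.")
        .call()
        .content();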

Retrieval-Augmented Generation (RAG)

Description: RAG enhances an LLM’s ability to answer questions by giving it access to external, private, or real-time data sources. It overcomes the LLM’s static knowledge by retrieving relevant information from your documents and “stuffing” it into the prompt. This process, often called Prompt Stuffing, provides the LLM with the context it needs to generate a grounded, accurate response.

Use Case: A great example is a Q&A chatbot for an enterprise. The chatbot can’t answer questions about internal policies because that information wasn’t in the LLM’s training data. With RAG, you can use an embedding model to convert your company’s documents into numerical representations (vectors) and store them in a vector database. When a user asks a question, the application finds the most relevant document snippets, which are then used as context for the LLM to formulate an answer.

Function Calling / Tooling

Description: This pattern allows an LLM to dynamically call external APIs or code functions to retrieve real-time data or perform actions. The LLM acts as a reasoning engine, deciding when a tool is needed based on a user’s request. The model doesn’t execute the code itself; it simply provides a structured response indicating the function to call and the parameters to use.

Use Case: Imagine a travel booking chatbot. A user asks, “What’s the weather like in Paris?” The LLM, recognizing that it needs current information, will “request” a call to a getWeather function, passing “Paris” as the city. Your application intercepts this request, calls a weather API, and feeds the live weather data back to the LLM. The LLM then uses this information to formulate a polite, accurate response to the user.

Output Converters

Description: LLMs often return responses as unstructured text. An Output Converter solves this by instructing the model to return a structured format (like JSON or a list) and then parsing that output into a Java object. Spring AI provides a convenient way to map the raw text to a List, Map, or a custom POJO.

Use Case: A common use case is generating a structured report. You could prompt the LLM to “Give me the top 5 trending topics from the past week in JSON format with a title and summary for each.” An output converter would then automatically parse this JSON string into a List of Topic objects, making it easy to use the data in your application.
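
With the fluent ChatClient API, this mapping is a single entity() call. A minimal, hedged sketch (Topic is a hypothetical record; chatClient is built as in Section 5):

import org.springframework.core.ParameterizedTypeReference;
import java.util.List;

// Hypothetical record type to hold each trending topic.
public record Topic(String title, String summary) {}

// entity() appends format instructions to the prompt and parses the model's JSON reply.
List<Topic> topics = chatClient.prompt()
        .user("Give me the top 5 trending topics from the past week in JSON format with a title and summary for each.")
        .call()
        .entity(new ParameterizedTypeReference<List<Topic>>() {});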

Chat Memory

Description: By default, LLMs are stateless; they treat each new prompt as a completely new conversation. Chat Memory gives your application the ability to remember previous messages and provide conversational context. Spring AI offers different implementations, from simple in-memory storage to persistent, JDBC-backed repositories.

Use Case: Chat memory is crucial for creating natural, multi-turn conversations. Without it, if a user asks, “What’s my name?” after telling the chatbot “Hello, my name is Alex,” the chatbot won’t know the answer. With chat memory, the previous message is included in the new prompt, allowing the LLM to recall the user’s name and provide a relevant, personalized response.
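
One way to wire this up, sketched against the Spring AI 1.0.x API (memory class names have shifted between milestones, so treat this as illustrative):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

// Keep a sliding window of recent messages and replay them on every request.
ChatMemory chatMemory = MessageWindowChatMemory.builder().build();

// "builder" is the auto-configured ChatClient.Builder.
ChatClient chatClient = builder
        .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
        .build();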

Evaluators

Description: An Evaluator is a tool used to automatically assess the quality of an LLM’s response. This is a critical pattern for building reliable and safe AI applications. Spring AI provides built-in evaluators that can check for things like relevance to the prompt or factual accuracy against a given context.

Use Case: For a RAG-based Q&A system, you can use a RelevancyEvaluator to automatically score how well the LLM’s answer aligns with the user’s question. This allows you to set a quality threshold and, if a response falls below it, either discard it or flag it for human review, ensuring your application provides high-quality information.

3. Technology Integrations

One of the greatest strengths of Spring AI is its modularity and extensive support for a wide range of AI technologies. This allows you to easily switch providers with minimal code changes.
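
For example, application code written against the ChatClient abstraction never names a provider; switching from OpenAI to Ollama is a change of starter dependency and configuration, not code. A minimal sketch:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class SummaryService {

    private final ChatClient chatClient;

    // The auto-configured Builder is backed by whichever model starter is on the classpath.
    public SummaryService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String summarize(String text) {
        return chatClient.prompt()
                .user("Summarize in two sentences: " + text)
                .call()
                .content();
    }
}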

Large Language Models (LLMs)

Spring AI provides starters for all major LLM providers, including:

  • OpenAI: The most popular choice, providing access to models like GPT-4.
  • Google Gemini: Integrates with Google’s Gemini family of models through Vertex AI.
  • Hugging Face: Connects to a vast ecosystem of open-source models.
  • Ollama: Allows you to use a local, self-hosted LLM.

Vector Databases

Vector databases are essential for implementing the RAG pattern. Spring AI supports a number of popular solutions, providing a consistent VectorStore API for each (a query sketch follows the list):

  • Pinecone
  • Chroma
  • Milvus
  • PostgreSQL with the pgvector extension
  • Elasticsearch
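
Whichever store backs it, the query side looks the same. A hedged sketch against the 1.0.x API, where SearchRequest is built with a builder:

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.List;

// Find the five document chunks most similar to the query, whatever store is configured.
List<Document> results = vectorStore.similaritySearch(
        SearchRequest.builder()
                .query("What is our refund policy?")
                .topK(5)
                .build());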

Embedding Models

Embedding models are responsible for converting text into numerical vectors. Spring AI offers integrations for popular providers (a usage sketch follows the list), including:

  • OpenAI
  • Google
  • Mistral AI
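
Like the chat abstractions, EmbeddingModel is provider-neutral: inject the auto-configured bean and call embed(). A minimal sketch:

import org.springframework.ai.embedding.EmbeddingModel;

// Convert a piece of text into its vector representation.
float[] vector = embeddingModel.embed("Spring AI simplifies AI integration");
System.out.println("Dimensions: " + vector.length);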

4. Model Configuration

Configuring Spring AI is a straightforward process thanks to Spring Boot’s property-based configuration. You can manage your API keys, model names, and other options in the application.properties or application.yml file.

OpenAI

To connect to OpenAI, you must provide your API key. You can also specify the model and options such as temperature, which controls how random (and therefore how creative) the output is.

spring.ai.openai.api-key=YOUR_API_KEY
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.chat.options.temperature=0.7

Google Gemini

For Google Gemini, you configure the project ID and location, which are used to authenticate with Google Cloud’s Vertex AI.

spring.ai.vertex.ai.gemini.project-id=YOUR_PROJECT_ID
spring.ai.vertex.ai.gemini.location=us-central1

Ollama

Since Ollama runs locally, it doesn’t require an API key. You just need to specify the model you want to use.

spring.ai.ollama.chat.options.model=llama3
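
If your Ollama instance isn’t at the default local address, you can also point Spring AI at it explicitly:

spring.ai.ollama.base-url=http://localhost:11434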

Hugging Face

For Hugging Face, you provide an API key and the URL for the specific inference endpoint you want to use.

spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL

5. Synchronous vs. Streaming API

The ChatClient in Spring AI provides two primary ways to interact with an LLM: a synchronous call() method and a reactive stream() method. Choosing between them depends on your application’s requirements for responsiveness and user experience.

Synchronous call()

The call() method is a blocking operation. Your application sends a request to the LLM and waits for the entire response to be generated before it can proceed.

Use Case: This approach is suitable for single-turn requests where the response is expected to be relatively short, such as a summary, a classification, or a joke. It’s simple to implement and doesn’t require a reactive programming model.

Example Code:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping("/chat/call")
    public String chatWithCall(@RequestParam String message) {
        // The call() method blocks until the full response is received.
        return chatClient.prompt().user(message).call().content();
    }
}

Streaming stream()

The stream() method provides a non-blocking, reactive approach. The LLM’s response is sent back as a continuous stream of tokens, and your application can process these tokens as they arrive. This is handled using Spring’s reactive framework, Project Reactor, which returns a Flux.

Use Case: This is ideal for building real-time, interactive applications like chatbots or content generators where you want to provide a “typewriter” effect to the user, showing the response as it’s being generated. It significantly improves the perceived responsiveness of your application for longer responses.

Example Code:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping(value = "/chat/stream", produces = "text/event-stream")
    public Flux<String> chatWithStream(@RequestParam String message) {
        // The stream() method returns a Flux, emitting tokens as they are generated.
        return chatClient.prompt().user(message).stream().content();
    }
}

6. AI Model Evaluation and Testing

Building a reliable AI application requires more than just integrating with a model; it requires a strategy for validating its outputs. AI Model Evaluation and Testing is a critical part of the development lifecycle, especially for preventing issues like hallucinations (where the model generates false information) or irrelevant responses.

The Role of Evaluators

Spring AI provides a core Evaluator interface and several built-in implementations to help you test and validate your AI-generated content. These evaluators use a separate AI model to act as a judge, assessing the quality of your primary model’s output. This is a common and effective approach because an LLM can be an excellent tool for judging the output of another.

Key Evaluators in Spring AI

  • RelevancyEvaluator: This evaluator checks how well an AI-generated response aligns with the original user prompt. It assesses semantic relevance to ensure the answer is on-topic and helpful.
  • FactCheckingEvaluator: This evaluator is designed to combat hallucinations. It compares a claim made by the AI against a provided context (e.g., a document from a RAG pipeline) to verify factual accuracy; a usage sketch follows this list.
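
As a hedged sketch of the fact-checking flow (chatClientBuilder is an injected ChatClient.Builder; constructor and package details can vary by version):

import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.FactCheckingEvaluator;
import java.util.List;

// A second model acts as the judge: does the context actually support the claim?
var evaluator = new FactCheckingEvaluator(chatClientBuilder);

var request = new EvaluationRequest(
        "",                                                       // no user question needed here
        List.of(new Document("Our return window is 30 days.")),  // the grounding context
        "Customers can return items within 30 days.");           // the claim to verify

boolean supported = evaluator.evaluate(request).isPass();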

Testing with Evaluators

You can integrate these evaluators directly into your JUnit tests to create a robust CI/CD pipeline for your AI features. For example, you can write a test that sends a prompt to your application, receives the response, and then uses a RelevancyEvaluator to assert that the response is judged relevant.

Example Test Snippet:

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
public class ChatControllerTests {

    @Autowired
    private ChatController chatController;

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void testChatResponseRelevance() {
        String prompt = "What are the key features of Spring AI?";
        String response = chatController.chatWithCall(prompt);

        // The evaluator uses its own ChatClient as a judge of the primary model's output.
        RelevancyEvaluator evaluator = new RelevancyEvaluator(chatClientBuilder);

        // No retrieved context documents in this simple example, hence the empty list.
        EvaluationResponse evaluation = evaluator.evaluate(
                new EvaluationRequest(prompt, List.of(), response));

        // Assert that the judge considered the response relevant to the prompt.
        assertThat(evaluation.isPass()).isTrue();
    }
}

7. Implementing Tool Calling with Custom Functions

Tool Calling is one of the most powerful features of Spring AI, allowing you to seamlessly connect an LLM to your own business logic. Spring AI makes this process incredibly simple by using standard annotations on Plain Old Java Objects (POJOs).

Step 1: Create a Tool Service

First, create a simple Spring component that contains the methods you want the LLM to be able to call, marking each one with the @Tool annotation.

Example Code:

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.stereotype.Component;

@Component
public class WeatherService {

    /**
     * Get the current weather for a given city.
     * @param city the city to look up
     * @return the current weather data for that city
     */
    @Tool(description = "Get the current weather for a given city")
    public WeatherResponse getWeather(String city) {
        // In a real application, you would call a weather API here.
        // For this example, we return mock data.
        System.out.println("Calling the weather service for city: " + city);
        return new WeatherResponse(city, 25.0, "Sunny");
    }

    public record WeatherResponse(String city, double temperature, String conditions) {}
}

The key is the @Tool annotation. It tells Spring AI to expose this method to the LLM, and the description is crucial because the LLM uses it to understand when to call the tool.

Step 2: Register the Tool with the ChatClient

Next, you need to tell your ChatClient about the tools it has access to. You can do this by passing the tool-annotated bean to the defaultTools() method on the ChatClient.Builder (or to tools() on an individual prompt).

Example Code:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder, WeatherService weatherService) {
        // Register the @Tool-annotated methods on WeatherService with the ChatClient.
        this.chatClient = builder
                .defaultTools(weatherService)
                .build();
    }

    @PostMapping("/chat/tool-calling")
    public String chatWithToolCalling(@RequestParam String message) {
        // When the user asks about the weather, the model requests a call to getWeather,
        // and Spring AI executes it and feeds the result back to the model.
        return chatClient.prompt().user(message).call().content();
    }
}

When a user’s prompt (e.g., “What is the weather like in New York?”) matches the description of the getWeather tool, the LLM responds with a structured request to call it. Spring AI handles the entire orchestration: executing the method, feeding the result back to the model, and returning the final response.

8. RAG Implementation: A Deeper Dive

While RAG is a powerful concept, its implementation involves a detailed, multi-step pipeline. Spring AI provides the necessary abstractions to manage each step seamlessly. The process is broken down into two main phases: Data Ingestion and Query & Retrieval.

Data Ingestion Pipeline

The first step is to get your unstructured data (documents, PDFs, etc.) into a format that a vector database can understand. This process is a classic ETL (Extract, Transform, Load) pipeline.

  1. Extract: A DocumentReader extracts content from a data source. Spring AI includes readers for common formats like Markdown, PDFs, and web pages.
  2. Transform: A TextSplitter breaks down large documents into smaller, semantically meaningful chunks. This is crucial because LLMs have a limited context window.
  3. Embed: An EmbeddingModel converts these text chunks into numerical vectors. This process captures the semantic meaning of the text, allowing for a similarity search later.
  4. Load: The VectorStore then stores these vectors, ready for retrieval.

Example Ingestion Code (the PDF reader ships in the separate spring-ai-pdf-document-reader artifact):

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

import java.util.List;

@Configuration
public class VectorStoreConfig {

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel,
                                   @Value("classpath:/docs/my-policy-manual.pdf") Resource pdfResource) {

        // A simple in-memory vector store, fine for demos; use a real store in production.
        SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();

        // Extract: read the text content out of the PDF.
        PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(pdfResource);

        // Transform: split the documents into manageable chunks.
        TokenTextSplitter textSplitter = new TokenTextSplitter();
        List<Document> documents = textSplitter.split(pdfReader.read());

        // Load: embed and store the chunks in the vector store.
        vectorStore.add(documents);

        return vectorStore;
    }
}

Query & Retrieval Pipeline

Once your data is in the vector store, you can use it to answer user questions.

  1. Retrieve: When a user submits a query, the application uses an Advisor to first perform a similarity search on the VectorStore to find the most relevant documents.
  2. Augment: The retrieved documents are then “stuffed” into the user’s prompt, providing the LLM with the specific context it needs to generate a grounded response.
  3. Generate: The augmented prompt is sent to the LLM, which uses the provided context to answer the user’s question.

Example Retrieval Code (in recent Spring AI versions, QuestionAnswerAdvisor lives in the separate spring-ai-advisors-vector-store artifact):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RAGController {

    private final ChatClient chatClient;

    public RAGController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                // The advisor automatically retrieves relevant documents and adds them to the prompt
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @PostMapping("/chat/rag")
    public String chatWithRAG(@RequestParam String message) {
        // The user's message is passed through; the advisor handles retrieval and augmentation
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
