Build a chatbot with Lilypad

Lilypad recently released an open-beta Inference API aka “Anura”! It supports a range of powerful text-to-text models such as Llama3, DeepSeek and Qwen2.5. Anura offers a flexible and scalable way to integrate LLMs into applications, making it straight-forward to build AI-powered tools with real-time inference. Whether you're working on chatbots, content generation or research applications, the Lilypad API provides a scalable and efficient solution with ease-of-use API access.
In this guide, we will be building a simple yet functional chatbot using Next.js and Llama3, leveraging Anura's API to generate responses based on user input. By the end, you’ll have a working chatbot and a deeper understanding of how to integrate LLMs into your own applications.
Available models
Our API currently supports a variety of powerful AI models tailored for different tasks, from code generation to multimodal reasoning. Whether you're looking for high-performance text generation or efficient scaling, we have models suited to your needs. We are always exploring and deploying new models to expand our offerings, so if there's a specific model you'd like to see added, let us know!
Qwen2.5 Coder (7B)
DeepSeek R1 (7B)
LLaVA (7B)
OpenThinker (7B)
Phi-4 (14B)
Phi-4 Mini (3.8B)
Qwen2.5 (7B)
DeepScaler (1.5B)
Llama 3.1 (8B)
Mistral (7B)
Getting started
Run the following command to create a new Next.js application:
npx create-next-app lilypad-chatbot
Navigate into the project directory:
cd lilypad-chatbot
Before moving on, let’s generate Anura API keys. Head to the Anura site and sign up. Log in to your account and generate an API key.
Create a .env file in the project root and add your key to the LILYPAD_API_TOKEN variable:
LILYPAD_API_TOKEN=<API_TOKEN>
Client-side components
The Form component serves as the interface between the user and the selected LLM. It manages user input, maintains conversation history, sends requests to the inference API and updates the UI with AI-generated responses.
At the heart of this component is conversation state management. Since LLMs perform better with contextual input, we retain the last six messages (MAX_HISTORY = 6). As this is more of a basic illustrative example, setting a max history prevents excessive memory usage while guaranteeing that the AI has enough recent context to generate relevant responses. Every time a user submits a new message, it’s added to the conversation state and the oldest messages are discarded if necessary.
Once the message is ready, it is sent to the API via a POST request. This request includes the current conversation history, allowing the LLM to generate responses in context. The function extractContent() processes this stream, filtering out incomplete or malformed data and extracting the final AI-generated text.
After processing the response, the conversation state updates and the UI refreshes to show the latest interaction. The input field is also cleared, allowing the user to continue the conversation.
Create a directory named components inside of app. Next we can create the form component we will be using as the main chat section. Create a file named Form.js and add the following code:
"use client";
import { useState } from "react";
const MAX_HISTORY = 6; // Limits conversation history to the last 6 messages to avoid excessive memory usage.
export default function Form() {
const [inputValue, setInputValue] = useState("");
const [conversation, setConversation] = useState([]); // Stores the chat history between the user and the AI.
const [loading, setLoading] = useState(false); // Tracks if the AI is currently generating a response.
const handleSubmit = async (e) => {
e.preventDefault();
setLoading(true);
// Keeps only the last MAX_HISTORY messages to maintain context but prevent unbounded growth.
const updatedConversation = [
...conversation.slice(-MAX_HISTORY), // Trims conversation history before adding new input.
{ role: "user", content: inputValue }, // Appends the latest user message.
];
try {
await new Promise(resolve => setTimeout(resolve, 500)); // Simulates a short delay to mimic request latency.
// Sends conversation history to the backend for inference.
const res = await fetch("/api/run-inference", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: updatedConversation }),
});
const data = await res.json();
const result = extractContent(data); // Extracts the AI response from the streamed API output.
if (!res.ok) throw new Error(data.error || "Failed to fetch response");
// Updates conversation history, ensuring the user message and AI response stay within MAX_HISTORY.
setConversation([...updatedConversation, { role: "assistant", content: result }]);
setInputValue(""); // Clears input field after submission.
} catch (error) {
console.error("Error:", error.message);
alert(`Error: ${error.message}`);
}
setLoading(false);
};
function extractContent(apiResponse) {
const { text } = apiResponse;
// Split response by "data: " but remove empty entries
const jsonStrings = text.split("data: ").filter((entry) => {
try {
const jsonData = JSON.parse(entry.trim());
// Ensure it's the assistant's message by checking for 'choices'
return jsonData.choices?.[0]?.message?.content;
} catch {
return false; // Skip invalid JSON entries
}
});
if (jsonStrings.length === 0) return null;
// Get the last valid assistant response
const finalData = JSON.parse(jsonStrings[jsonStrings.length - 1].trim());
return finalData.choices[0].message.content;
}
return (
<div className="flex flex-col mx-auto text-center items-center w-2/3 justify-center bg-black text-white">
<p className="text-3xl mb-4">llama3.1:8b</p>
<div className="w-full p-6 border border-white rounded-lg bg-gray-900 text-left">
<p className="text-lg font-semibold mb-2">Conversation:</p>
<div className="p-3 bg-gray-800 border border-gray-600 rounded-lg w-full text-white">
{conversation.length === 0 ? (
<p className="text-gray-400">No messages yet. Ask something!</p>
) : (
conversation.map((msg, index) => (
<div key={index} className={`mb-2 ${msg.role === "user" ? "text-blue-300" : "text-green-300"}`}>
<strong>{msg.role === "user" ? "You:" : "AI:"}</strong> {msg.content}
</div>
))
)}
</div>
</div>
<form onSubmit={handleSubmit} className="mx-auto w-full p-6 border border-white rounded-lg mt-4">
{/* Input field for user messages */}
{!loading ? (
<input
type="text"
value={inputValue}
onChange={(e) => setInputValue(e.target.value)}
className="w-full p-3 bg-black border border-white rounded-lg text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-white"
placeholder="Ask me anything..."
required
/>
) : (
// Displays a loading animation while waiting for the response. You can use whatever animation you want
<img src="lp-logo.svg" className="w-6 mx-auto animate-spin" />
)}
{/* Submit button */}
<button
type="submit"
className="w-full mt-4 p-3 font-semibold border border-white rounded-lg transition-all duration-200 ease-in-out bg-white text-black hover:bg-gray-300 disabled:opacity-50 disabled:cursor-not-allowed"
disabled={loading}
>
{loading ? "Thinking..." : "Submit"}
</button>
</form>
{/* Button to clear conversation history */}
{conversation.length > 0 && (
<button
className="w-full mt-4 p-3 font-semibold border border-white rounded-lg transition-all duration-200 ease-in-out bg-white text-black hover:bg-gray-300"
onClick={() => {
setConversation([]); // Resets conversation history.
}}
>
Try Again
</button>
)}
</div>
);
}
The page.js file serves as the entry point for the application, rendering the main chat interface. It imports the Form component from the components directory and displays it inside a centered container.
Inside of page.js, add the code below:
import Form from "./components/Form";
export default function Home() {
return (
<div className="items-center justify-items-center min-h-screen p-8 pb-20 gap-16 sm:p-20 font-[family-name:var(--font-geist-sans)]">
<Form />
</div>
);
}
Server-side component
The route.js file acts as the backend handler for processing user queries and interacting with the Lilypad API. It receives messages from the frontend, sends them to the LLM, processes the streamed response and then returns the final output to the client.
Unlike a standard API call that returns a complete response at once, the Lilypad API responds with a continuous stream of data. This means we need to handle incoming chunks incrementally, decoding them as they arrive. The reader.read() loop ensures that the entire response is collected piece by piece before returning it to the client.
Each request carries the current conversation history (context for the LLM), allowing the model to generate responses with relevant context. The temperature and max_tokens parameters control how the model behaves. The temperature adjusts randomness, while max_tokens ensures the response doesn’t exceed a set length. For more information on the valid parameters for the API, please refer to the Lilypad docs.
Once the response is fully received and processed, it is returned to the client. This allows the Form component to update the UI dynamically, creating a smooth and responsive chat experience.
Inside of the app directory, create an api folder, and within it, a run-inference directory. In this directory, create a route.js file. This file will define the server-side API endpoint responsible for handling chat requests and communicating with the API for AI inference.
The route.js function will:
Extract the conversation history from the request.
Construct the inference request with model parameters.
Send the request to the Lilypad API for processing.
Stream the response chunk by chunk, ensuring efficient handling of large responses.
Return the processed response to the client.
import { NextResponse } from "next/server";
export async function POST(req) {
try {
const { messages } = await req.json();
const API_URL = "https://anura-testnet.lilypad.tech/api/v1/chat/completions";
const API_TOKEN = process.env.LILYPAD_API_TOKEN;
const requestBody = {
model: "llama3.1:8b", // Can use any available model on the API
messages, // Passes the conversation history to maintain context
max_tokens: 2048, // Caps response length to prevent runaway token usage
temperature: 0.7, // Controls randomness—higher values make responses more diverse
};
const response = await fetch(API_URL, {
method: "POST",
headers: {
"Content-Type": "application/json",
"Accept": "text/event-stream", // Enables streaming to return partial results as they are generated
"Authorization": `Bearer ${API_TOKEN}`,
},
body: JSON.stringify(requestBody),
});
if (!response.ok) {
throw new Error(`API request failed with status ${response.status}`);
}
const reader = response.body.getReader();
let result = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
result += new TextDecoder().decode(value); // Decodes streamed chunks as they arrive
}
return NextResponse.json({ text: result });
} catch (error) {
console.error("API Route Error:", error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
Running the chatbot
Now that everything is set up, it's time to run the chatbot and test the interaction with the API. Follow these steps to start the application and begin chatting with the AI:
Start the Development Server
Run npm run dev to start your local server.
Interacting with the Chatbot
Open http://localhost:3000 in your browser and type a message in the input field and press Submit.
The chatbot will:
Send your input to the API.
Process the AI’s response and display it in the chat window.
Maintain a conversation history, ensuring relevant context in responses.
Here is an example of how the chatbot interacts in a conversation:

What’s next?
That’s it! With this guide, you’ve successfully built a basic chatbot using Next.js and Lilypad’s Anura API, integrating real-time AI responses into your app. You’ve learned how to:
Set up a Next.js project and configure Anura API keys.
Build a client-side chat interface that maintains conversation history.
Create a server-side API route to process user input and stream AI responses.
This is just the beginning! You can extend this chatbot by:
Switching models to explore different capabilities.
Adding UI enhancements, such as message streaming instead of displaying responses all at once.
Improving memory handling, such as storing chat history in a database for long-term context.
For more details on available models and additional API features, check out the Lilypad API documentation. If you have any questions or ideas for improvements, feel free to reach out to the the in the Lilypad Discord.
You can check out the source code for this example here!





