Building with AI – A Developer's Diary
Build a Tiny JavaScript Dev Assistant with Gemini Tool Combination
Function calling is useful. Built-in tools are useful. The interesting part is using both in the same request—search, then act in your app.
There's a moment in most AI API tutorials where things get a little disappointing. You hook up function calling, the model calls your function, great—but it only knew what to tell your function based on information it already had from training. No fresh context. No awareness of what changed last week. Just the same static knowledge, now routing through your code.
Gemini's tool combination feature fixes that. You can give the model access to built-in tools (like googleSearch) and your own custom functions at the same time, in the same request. The model can use the built-in tool to gather fresh context, then call your function to act on it—inside your app, with real data.
That's a genuinely different kind of assistant. Not one that guesses from memory, but one that looks something up and then does something useful with it.
Let's build a demo that shows the whole pattern, start to finish, in a single HTML file.
What we're building
The demo is a miniature developer sidekick. The user types a prompt like:
“Find the latest info on Next.js 15 caching changes and save a TODO reminding me to review stale cache handling.”
Gemini then does two things in sequence: it uses Google Search to look up current public information, then calls a custom function named saveTask() to add a structured task directly to the page.
The app is intentionally small—one HTML file, no build step, no framework—because the goal here is understanding the pattern, not the plumbing around it. The activity log is what makes this a useful tutorial rather than just a magic box: you can watch every step of the workflow as it happens, including when the built-in tool fires and when the model requests your custom function.
The tools declaration
Here's where the combination lives. When you call generateContent, you pass a tools array under config. One entry is the built-in googleSearch tool—you just include it and Gemini handles the rest. The second is your custom function declaration.
const tools = [
{ googleSearch: {} },
{
functionDeclarations: [
{
name: "saveTask",
description:
"Persist a task to the user's local dashboard. Call when the user asks to " +
"save, remember, or add a TODO (after you have enough detail).",
parameters: {
type: Type.OBJECT,
properties: {
title: {
type: Type.STRING,
description: "Short task title (max ~50 chars)",
},
note: {
type: Type.STRING,
description: "Full reminder text or research summary to store",
},
},
required: ["title", "note"],
},
},
],
},
];
Notice that tools lives inside config, not as a top-level argument to generateContent. That's an SDK requirement worth noting because it's easy to get wrong.
Steering the model with a system instruction
Left to its own devices, the model might describe a task in prose and call that “saving” it. A short system instruction fixes that:
const systemInstruction =
"You are a research assistant with Google Search. When the user wants a task saved, " +
"you MUST call saveTask with a clear title and note; prose alone does not persist anything.";
This is a small addition with a big effect. It tells the model two things: that it has search capability it should actually use, and that saveTask()—not a paragraph of text—is the only way something actually gets saved. Without guidance like this, the model might satisfy the prompt conversationally and never touch your function.
The loop that makes it work
This is the part that trips people up the first time. Tool use isn't a single round-trip. You send the prompt, inspect what the model returns, handle any function calls, and then send those results back so the model can continue. It's a loop, not a one-shot.
There are a few things worth being deliberate about here. First, we cap the number of API rounds with MAX_TOOL_ROUNDS—a bug or an unexpectedly complex prompt can otherwise cause the model to loop indefinitely. Twelve rounds is generous but finite.
const MAX_TOOL_ROUNDS = 12;
let contents = [{ role: "user", parts: [{ text: userPrompt }] }];
let apiRound = 0;
while (true) {
if (apiRound >= MAX_TOOL_ROUNDS) {
addLog("Stopped: max API rounds exceeded (safety limit).", "error");
break;
}
apiRound++;
addLog(`Calling Gemini (API round ${apiRound})…`, "info");
const result = await ai.models.generateContent({
model: modelName,
contents,
config: {
systemInstruction,
tools,
toolConfig: {
includeServerSideToolInvocations: true,
},
},
});
// Exit early if the prompt was blocked before generation
if (result.promptFeedback?.blockReason) {
addLog(`Blocked: ${result.promptFeedback.blockReason}`, "error");
break;
}
if (!result.candidates?.length) {
addLog("No candidates in response.", "error");
break;
}
const candidate = result.candidates[0];
if (candidate.finishReason && candidate.finishReason !== "STOP") {
addLog(`Finish reason: ${candidate.finishReason}`, "info");
}
// Log any search queries used for grounding
const gm = candidate.groundingMetadata;
if (gm?.webSearchQueries?.length) {
addLog(`Search queries: ${JSON.stringify(gm.webSearchQueries)}`, "info");
}
const response = candidate.content;
if (!response?.parts?.length) {
addLog("No content parts (empty or filtered).", "error");
break;
}
// Replay the full model message verbatim — critical for preserving
// toolCall / toolResponse / thoughtSignature linkage in history
contents.push({ role: "model", parts: response.parts });
let hasFunctionCalls = false;
const functionResponses = [];
for (const part of response.parts) {
if (part.text) {
addLog(part.text, "model");
}
// Built-in tool invocation record (e.g. Google Search) — visible because includeServerSideToolInvocations: true
if (part.toolCall) {
const tc = part.toolCall;
addLog(`toolCall: ${tc.toolType ?? "?"} (id: ${tc.id ?? "n/a"})`, "info");
}
// Server-side result for that toolCall (same id / toolType)
if (part.toolResponse) {
const tr = part.toolResponse;
addLog(`toolResponse: ${tr.toolType ?? "?"} (id: ${tr.id ?? "n/a"})`, "info");
}
// Custom tool: we execute locally and send functionResponse next turn (id must match)
if (part.functionCall) {
hasFunctionCalls = true;
const call = part.functionCall;
addLog(`functionCall: ${call.name} (id: ${call.id ?? "missing"})`, "info");
let functionResult;
if (call.name === "saveTask") {
const args = call.args ?? {};
functionResult = saveTask({
title: String(args.title ?? ""),
note: String(args.note ?? ""),
});
} else {
functionResult = { error: "Unknown function" };
}
functionResponses.push({
functionResponse: {
name: call.name,
id: call.id,
response: functionResult,
},
});
}
}
// No saveTask → final answer (or model chose text only); stop the loop
if (!hasFunctionCalls) {
addLog("No function calls — conversation complete.", "info");
break;
}
// Second leg of tool combination: supply results for each functionCall in one user message
addLog("Sending functionResponse part(s) back to Gemini…", "info");
contents.push({ role: "user", parts: functionResponses });
}
A few things worth calling out in that loop. includeServerSideToolInvocations: true is what makes the googleSearch activity visible as toolCall and toolResponse parts in the response—without it, the model uses search silently and you never see it happen. We're also preserving the full conversation history in contents and pushing the model's message back verbatim, including those server-side parts. That linkage is required; strip those parts out and the multi-turn context breaks. And the custom function—saveTask()—is the only thing allowed to mutate the page. The model requests it; your code runs it.
The custom function
The function itself is simple. It renders a task card, logs the action, and returns a structured result the model can reference in its final reply. In a real app you'd swap this out for createTicket(), saveDraft(), queueJob(), or whatever your application actually needs to do.
function saveTask({ title, note }) {
  const li = document.createElement("li");
  // Build the card with textContent so model-generated text can't inject HTML
  const strong = document.createElement("strong");
  strong.textContent = title;
  const small = document.createElement("small");
  small.textContent = new Date().toLocaleString();
  const span = document.createElement("span");
  span.textContent = note;
  li.append(strong, document.createElement("br"), small, document.createElement("br"), span);
  tasksEl.prepend(li);
  addLog(`Task saved: "${title}"`, "info");
  return { status: "success", message: "Task added to local view" };
}
The return value matters. The model gets it back as a function response and uses it to shape its final reply. Return something meaningful and the model can confirm what it did. Return nothing and it's guessing.
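If you want the model to be able to reference the task later in the conversation, the return payload can carry more detail. Here's a minimal sketch with hypothetical fields (taskId, savedAt, and totalTasks are not required by the API, just a JSON-serializable object):

```javascript
// Variant of saveTask with a richer functionResponse payload.
// Field names are illustrative; the rendering side is omitted for brevity.
let taskCount = 0;

function saveTaskVerbose({ title, note }) {
  taskCount += 1;
  return {
    status: "success",
    taskId: `task-${taskCount}`,        // stable handle the model can quote back
    savedAt: new Date().toISOString(),  // lets the model confirm when it saved
    totalTasks: taskCount,              // lets the model summarize ("you now have 3 tasks")
  };
}
```

The model sees this object verbatim in the function response, so anything you include is fair game for its final reply.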
The complete demo
Here's everything assembled into a single file. Paste in your API key, open it in a browser, and run it.
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Gemini Tool Combination Demo</title>
<!-- Page layout: cards, form controls, monospace log, task list styling -->
<style>
:root {
--primary: #2563eb;
--bg: #f8fafc;
--card-bg: #ffffff;
--text: #0f172a;
--border: #dbe1ea;
}
body {
font-family: Inter, system-ui, -apple-system, sans-serif;
max-width: 900px;
margin: 40px auto;
padding: 0 20px;
background: var(--bg);
color: var(--text);
line-height: 1.6;
}
.card {
background: var(--card-bg);
border: 1px solid var(--border);
border-radius: 14px;
padding: 20px;
margin-bottom: 20px;
box-shadow: 0 1px 3px rgba(0,0,0,0.05);
}
h1, h2 { margin-top: 0; }
.field { margin-bottom: 15px; }
label { display: block; font-weight: 600; margin-bottom: 5px; font-size: 0.9rem; }
input, textarea, button {
font: inherit;
width: 100%;
box-sizing: border-box;
}
input, textarea {
padding: 12px;
border-radius: 10px;
border: 1px solid #cbd5e1;
background: #fff;
}
textarea { min-height: 100px; resize: vertical; }
.actions { display: flex; gap: 10px; }
button {
background: var(--primary);
color: white;
border: 0;
border-radius: 10px;
padding: 12px 20px;
cursor: pointer;
font-weight: 600;
transition: opacity 0.2s;
}
button:hover { opacity: 0.9; }
button.secondary { background: #64748b; }
#log {
background: #0f172a;
color: #e2e8f0;
padding: 14px;
border-radius: 10px;
overflow: auto;
white-space: pre-wrap;
font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace;
font-size: 0.85rem;
max-height: 300px;
}
ul { list-style-type: none; padding: 0; }
li {
background: #f1f5f9;
padding: 12px;
border-radius: 8px;
margin-bottom: 10px;
border-left: 4px solid var(--primary);
}
.status-badge {
display: inline-block;
padding: 2px 8px;
border-radius: 4px;
font-size: 0.75rem;
font-weight: bold;
margin-bottom: 10px;
}
.status-waiting { background: #e2e8f0; color: #475569; }
.status-running { background: #dbeafe; color: #1e40af; }
</style>
</head>
<body>
<!-- What this demo proves: Search (server) + custom function (client) in one Gemini 3 thread -->
<h1>Gemini tool combination demo</h1>
<p>
Demonstrates <strong>Google Search</strong> (built-in, server-side) plus a <strong>custom function</strong>
(<code>saveTask</code>) in one flow, using Gemini 3's
<a href="https://ai.google.dev/gemini-api/docs/tool-combination" target="_blank" rel="noopener">tool context circulation</a>:
<code>includeServerSideToolInvocations: true</code>, full model <code>parts</code> (including
<code>toolCall</code> / <code>toolResponse</code>) echoed in history, then your
<code>functionResponse</code> for the client-side tool.
</p>
<!-- Inputs: API key + prompt; triggers the multi-turn tool flow -->
<div class="card">
<div class="field">
<label for="apiKey">Gemini API Key</label>
<input type="password" id="apiKey" placeholder="Paste your API key here..." />
</div>
<div class="field">
<label for="prompt">What should I research and save?</label>
<textarea id="prompt">Find the latest info about Next.js 15 caching changes and save a TODO reminding me to review stale cache handling.</textarea>
</div>
<div class="actions">
<button id="runBtn">Run Assistant</button>
<button id="clearBtn" class="secondary">Clear All</button>
</div>
</div>
<!-- Scrollable trace: model text, toolCall/toolResponse, functionCall steps -->
<div class="card">
<h2>Activity Log <span id="status" class="status-badge status-waiting">Idle</span></h2>
<pre id="log">Waiting for prompt...</pre>
</div>
<!-- Side effect of the custom tool: tasks the model asked to persist via saveTask -->
<div class="card">
<h2>Saved Tasks</h2>
<ul id="tasks"></ul>
</div>
<script type="module">
// Gemini JS SDK (browser build); Type = schema enums for function declarations
import { GoogleGenAI, Type } from "https://esm.run/@google/genai";
// DOM references
const logEl = document.getElementById("log");
const tasksEl = document.getElementById("tasks");
const promptEl = document.getElementById("prompt");
const apiKeyEl = document.getElementById("apiKey");
const runBtn = document.getElementById("runBtn");
const clearBtn = document.getElementById("clearBtn");
const statusEl = document.getElementById("status");
// Restore API key between refreshes (demo only — don't ship keys in real apps)
const savedKey = localStorage.getItem("gemini_api_key");
if (savedKey) apiKeyEl.value = savedKey;
// Append a timestamped line to the activity log
function addLog(message, type = "info") {
const timestamp = new Date().toLocaleTimeString();
const prefix = type === "error" ? "❌ " : type === "model" ? "🤖 " : "ℹ️ ";
logEl.textContent += `\n[${timestamp}] ${prefix}${message}`;
logEl.scrollTop = logEl.scrollHeight;
}
// Implements the saveTask custom tool: UI update + payload returned in functionResponse
function saveTask({ title, note }) {
  const li = document.createElement("li");
  // Build the card with textContent so model-generated text can't inject HTML
  const strong = document.createElement("strong");
  strong.textContent = title;
  const small = document.createElement("small");
  small.textContent = new Date().toLocaleString();
  const span = document.createElement("span");
  span.textContent = note;
  li.append(strong, document.createElement("br"), small, document.createElement("br"), span);
  tasksEl.prepend(li);
  addLog(`Task saved: "${title}"`, "info");
  return { status: "success", message: "Task added to local view" };
}
// Cap generateContent calls so a bug can't loop forever
const MAX_TOOL_ROUNDS = 12;
async function runDemo() {
const apiKey = apiKeyEl.value.trim();
const userPrompt = promptEl.value.trim();
if (!apiKey) { addLog("Please enter an API key.", "error"); return; }
if (!userPrompt) { addLog("Please enter a prompt.", "error"); return; }
localStorage.setItem("gemini_api_key", apiKey);
logEl.textContent = "Initializing...";
statusEl.textContent = "Running";
statusEl.className = "status-badge status-running";
try {
// API client + Gemini 3 preview (tool combination requires Gemini 3)
const ai = new GoogleGenAI({ apiKey });
const modelName = "gemini-3-flash-preview";
// Tools live under config.tools — NOT top-level on generateContent (SDK requirement).
// Pair: googleSearch (server-side) + functionDeclarations (client-side saveTask).
const tools = [
{ googleSearch: {} },
{
functionDeclarations: [
{
name: "saveTask",
description:
"Persist a task to the user's local dashboard. Call when the user asks to save, remember, or add a TODO (after you have enough detail).",
parameters: {
type: Type.OBJECT,
properties: {
title: {
type: Type.STRING,
description: "Short task title (max ~50 chars)",
},
note: {
type: Type.STRING,
description: "Full reminder text or research summary to store",
},
},
required: ["title", "note"],
},
},
],
},
];
// Steers the model to search when needed and to call saveTask instead of only describing a task
const systemInstruction =
"You are a research assistant with Google Search. When the user wants a task saved, you MUST call saveTask with a clear title and note; prose alone does not persist anything.";
// Multi-turn chat: each model turn (with tool parts) is pushed verbatim, then user sends functionResponse
let contents = [{ role: "user", parts: [{ text: userPrompt }] }];
let apiRound = 0;
// Loop: call model → append model message → if functionCall, run tools + append user functionResponse → repeat
while (true) {
if (apiRound >= MAX_TOOL_ROUNDS) {
addLog("Stopped: max API rounds exceeded (safety limit).", "error");
break;
}
apiRound++;
addLog(`Calling Gemini (API round ${apiRound})…`, "info");
// Single request: system + tools + toolConfig; includeServerSideToolInvocations exposes Search tool parts in the reply
const result = await ai.models.generateContent({
model: modelName,
contents,
config: {
systemInstruction,
tools,
toolConfig: {
// Required for tool context circulation (Search + custom tools in one thread).
includeServerSideToolInvocations: true,
},
},
});
// Early exit if the prompt was blocked before generation
if (result.promptFeedback?.blockReason) {
addLog(`Blocked: ${result.promptFeedback.blockReason}`, "error");
break;
}
if (!result.candidates?.length) {
addLog("No candidates in response.", "error");
break;
}
const candidate = result.candidates[0];
// Non-STOP reasons (e.g. SAFETY) are logged for debugging
if (candidate.finishReason && candidate.finishReason !== "STOP") {
addLog(`Finish reason: ${candidate.finishReason}`, "info");
}
// When Google Search runs, the API may attach queries used for grounding
const gm = candidate.groundingMetadata;
if (gm?.webSearchQueries?.length) {
addLog(`Search queries: ${JSON.stringify(gm.webSearchQueries)}`, "info");
}
const response = candidate.content;
// No parts can happen if output was filtered or empty
if (!response?.parts?.length) {
addLog("No content parts (empty or filtered).", "error");
break;
}
// Must replay the model message as returned (preserves toolCall/toolResponse/thoughtSignature linkage)
contents.push({ role: "model", parts: response.parts });
let hasFunctionCalls = false;
const functionResponses = [];
// Walk all parts: text, server tool trace, and client function calls
for (const part of response.parts) {
if (part.text) {
addLog(part.text, "model");
}
// Built-in tool invocation record (e.g. Google Search) — visible because includeServerSideToolInvocations: true
if (part.toolCall) {
const tc = part.toolCall;
addLog(`toolCall: ${tc.toolType ?? "?"} (id: ${tc.id ?? "n/a"})`, "info");
}
// Server-side result for that toolCall (same id / toolType)
if (part.toolResponse) {
const tr = part.toolResponse;
addLog(`toolResponse: ${tr.toolType ?? "?"} (id: ${tr.id ?? "n/a"})`, "info");
}
// Custom tool: we execute locally and send functionResponse next turn (id must match)
if (part.functionCall) {
hasFunctionCalls = true;
const call = part.functionCall;
addLog(`functionCall: ${call.name} (id: ${call.id ?? "missing"})`, "info");
let functionResult;
if (call.name === "saveTask") {
const args = call.args ?? {};
functionResult = saveTask({
title: String(args.title ?? ""),
note: String(args.note ?? ""),
});
} else {
functionResult = { error: "Unknown function" };
}
functionResponses.push({
functionResponse: {
name: call.name,
id: call.id,
response: functionResult,
},
});
}
}
// No saveTask → final answer (or model chose text only); stop the loop
if (!hasFunctionCalls) {
addLog("No function calls — conversation complete.", "info");
break;
}
// Second leg of tool combination: supply results for each functionCall in one user message
addLog("Sending functionResponse part(s) back to Gemini…", "info");
contents.push({ role: "user", parts: functionResponses });
}
} catch (err) {
addLog(err.message, "error");
console.error(err);
} finally {
// Always return UI to idle after success, error, or break
statusEl.textContent = "Idle";
statusEl.className = "status-badge status-waiting";
}
}
// Wire buttons
runBtn.addEventListener("click", runDemo);
clearBtn.addEventListener("click", () => {
// Reset local UI only (does not clear localStorage API key)
tasksEl.innerHTML = "";
logEl.textContent = "Waiting for prompt...";
addLog("Cleared all tasks and logs.");
});
</script>
</body>
</html>
What's actually happening under the hood
When you run the demo, the sequence looks roughly like this: the model receives the prompt, decides it needs fresh context, invokes googleSearch (the built-in side), incorporates what it finds, determines it should now call saveTask() with specific arguments, and returns that function call request to your code. Your code executes saveTask(), the task appears on the page, and you send the result back so the model can wrap up with a natural language response.
The activity log makes all of this observable in real time. You can watch each API round, the search queries the model used for grounding, the toolCall/toolResponse parts from the built-in tool, the function call request with its arguments, and the final reply. For a tutorial, that visibility is the whole point—understanding how it works beats seeing a polished output by a lot.
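Sketched as data, the contents history after one complete run looks roughly like this. The part shapes are abbreviated: real parts also carry ids and signatures, which is exactly why the loop echoes the model's message back verbatim.

```javascript
// Illustrative shape of the conversation history after one full run.
// Values are placeholders, not actual API output.
const historySketch = [
  { role: "user",  parts: [{ text: "Find the latest info and save a TODO..." }] },
  { role: "model", parts: [
      { toolCall:     { toolType: "GOOGLE_SEARCH" } },  // server-side search ran
      { toolResponse: { toolType: "GOOGLE_SEARCH" } },  // ...and returned results
      { functionCall: { name: "saveTask", args: { title: "...", note: "..." } } },
  ]},
  { role: "user",  parts: [
      { functionResponse: { name: "saveTask", response: { status: "success" } } },
  ]},
  // Final model turn: plain text confirming what was saved.
  { role: "model", parts: [{ text: "Saved the TODO about stale cache handling." }] },
];
```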
Where to take it from here
The pattern in this demo scales well. saveTask() could just as easily be createTicket(), updateProject(), saveDraft(), or queueJob(). Add a second custom function and the model can choose between them based on context. Add structured output for priority or tags and the data quality improves significantly.
Some natural extensions of this pattern in real applications:
- Support tools: check current docs or incident status, then create a ticket or update account metadata
- Internal dashboards: gather fresh context about a project or company, then save notes or tasks in your team app
- Developer assistants: research framework changes or release notes, then create issue drafts, TODOs, or patches
- Sales and ops tools: pull current news or context on a lead, then write back to a CRM or workflow queue
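As a sketch, a second declaration could sit right next to saveTask inside functionDeclarations, and the model would choose between them based on the prompt. Everything here is illustrative (the createTicket name, its fields, the priority enum); the plain uppercase strings are what the SDK's Type enum values should resolve to:

```javascript
// Hypothetical second tool alongside saveTask. With the SDK you'd write
// Type.OBJECT / Type.STRING; the raw string values are equivalent.
const createTicketDeclaration = {
  name: "createTicket",
  description:
    "File a ticket in the user's tracker. Call when the user asks to file, " +
    "report, or open an issue rather than save a personal TODO.",
  parameters: {
    type: "OBJECT",
    properties: {
      title: { type: "STRING", description: "One-line ticket summary" },
      body: { type: "STRING", description: "Ticket details, including research findings" },
      priority: { type: "STRING", enum: ["low", "medium", "high"] },
    },
    required: ["title", "body"],
  },
};
```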
But keep the first version small. The whole charm of this example is that the entire pattern is visible at a glance. Once you understand the loop—prompt, built-in tool use, custom function call, function response, final output—it's easy to build on it.
The takeaway
This is one of those features that sounds abstract until you see it in motion. Once you do, it clicks: of course we want a model that can look something up and act on it in the same workflow. That's what makes the difference between a clever demo and something you'd actually want inside a real app.
Tiny demo. Big idea.