[!CAUTION] Actively Evolving
Flemma is growing fast â new tools, providers, and UI features land regularly. Expect occasional breaking changes while the project matures. Pin a commit if you need a steady target.
Flemma turns Neovim into an AI agent. Give it a task, and it works â calling tools, reading and editing files, running shell commands, and re-sending results back to the model in a fully autonomous loop. You stay in control: every action is visible in the .chat buffer, every tool call can require your approval, and you can take the wheel at any point. But when you trust the model, Flemma gets out of the way and lets it drive.
Streaming conversations, reusable prompt templates, file attachments, cost tracking, and ergonomic commands for Anthropic, OpenAI, and Google Vertex AI.
https://github.com/user-attachments/assets/c4c1ab0d-a83c-4a19-86f4-926f52ee2026
thinking parameter across all providers, with automatic mapping to Anthropic budgets, OpenAI reasoning effort, and Vertex thinking budgets.{{ expressions }}, include() helpers.@./path; MIME detection and provider-aware formatting..chat files are plain text. Commit them, diff them, branch them, share them. No opaque database, no export step â your conversation history lives in version control the moment you save..chat BuffersFlemma works with any plugin manager. With lazy.nvim you only need to declare the plugin â opts = {} triggers require("flemma").setup({}) automatically:
{
"Flemma-Dev/flemma.nvim",
opts = {},
}
For managers that do not wire opts, call require("flemma").setup({}) yourself after the plugin is on the runtime path.
| Requirement | Why it matters |
|---|---|
| Neovim 0.11 or newer | Uses Tree-sitter folding APIs introduced in 0.11 and relies on vim.fs helpers. |
curl |
Streaming is handled by spawning curl with Server-Sent Events enabled. |
| Markdown Tree-sitter grammar | Flemma registers .chat buffers to reuse the Markdown parser for syntax highlighting and folding. |
file CLI (optional but recommended) |
Provides reliable MIME detection for @./path attachments. When missing, extensions are used as a best effort. |
bwrap (optional, Linux) |
Enables filesystem sandboxing for tool execution. Without it, tools run unsandboxed. |
| Provider | Environment variable | Notes |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY |
|
| OpenAI | OPENAI_API_KEY |
Supports GPT-5 family, including reasoning effort settings. |
| Google Vertex AI | VERTEX_AI_ACCESS_TOKEN or service-account credentials |
Requires additional configuration (see below). |
When environment variables are absent Flemma looks for secrets in the Secret Service keyring. Store them once and every Neovim instance can reuse them:
secret-tool store --label="Anthropic API Key" service anthropic key api
secret-tool store --label="OpenAI API Key" service openai key api
secret-tool store --label="Vertex AI Service Account" service vertex key api project_id your-gcp-project
VERTEX_SERVICE_ACCOUNT='{"type": "..."}', or$PATH; Flemma shells out to gcloud auth application-default print-access-token whenever it needs to refresh the token.:Flemma switch vertex gemini-2.5-pro project_id=my-project location=us-central1.Note: If you only supply VERTEX_AI_ACCESS_TOKEN, Flemma uses that token until it expires and skips gcloud.
Configure the plugin:
require("flemma").setup({})
Create a new file that ends with .chat. Flemma only activates on that extension.
Type a message, for example:
@You: Turn the notes below into a short project update.
- Added Vertex thinking budget support.
- Refactored :Flemma command routing.
- Documented presets in the README.
Press Ctrl-] (normal or insert mode) or run :Flemma send. Flemma freezes the buffer while the request is streaming and shows @Assistant: Thinking.... With autopilot enabled (the default), tool calls are executed and re-sent automatically â you only need to intervene when a tool requires manual approval.
When the reply finishes, a floating notification lists token counts and cost for the request and the session.
Cancel an in-flight response with Ctrl-C or :Flemma cancel.
Most AI tools keep the real conversation hidden â in a SQLite file or a JSON log you can't touch. Flemma doesn't. The .chat buffer is the conversation, and nothing exists outside it. What you see is exactly what the model receives. Edit an assistant response to correct a hallucination, delete a tangent, rewrite your own message, paste in a tool result by hand â it all just works because there is no shadow state to fall out of sync. Want to fork a conversation? Duplicate the file. Want version history? You have Git. Switch from GPT to Claude mid-conversation, or turn thinking on for one turn and off for the next â every choice lives in the buffer where you can see and control it.
.chat Buffers```lua
release = {
version = "v25.10-1",
focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]
```
@System: You turn engineering notes into concise changelog entries.
@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below:
{{notes}}
@Assistant:
- Changelog bullets...
- Follow-up actions...
<thinking>
Model thoughts stream here and auto-fold.
</thinking>
flemma.codeblock.parsers.register("yaml", parser_fn). Lua frontmatter also exposes flemma.opt for per-buffer tool selection, approval, and provider parameter overrides.@System:, @You:, or @Assistant:. The parser is whitespace-tolerant and handles blank lines between messages."high"), Anthropic and Vertex AI models stream <thinking> sections; Flemma folds them automatically and keeps dedicated highlights for the tags and body.[!NOTE] Cross-provider thinking. When you switch providers mid-conversation, thinking blocks from the previous provider are visible in the buffer but are not forwarded to the new provider's API. The visible text inside
<thinking>tags is a summary for your reference; the actual reasoning data lives in provider-specific signature attributes on the tag. Only matching-provider signatures are replayed.
| Fold level | What folds | Why |
|---|---|---|
| Level 2 | The frontmatter block | Keep templates out of the way while you focus on chat history. |
| Level 2 | <thinking>...</thinking> |
Reasoning traces are useful, but often secondary to the answer. |
| Level 1 | Each message | Collapse long exchanges without losing context. |
Toggle folds with your usual mappings (za, zc, etc.). The fold text shows a snippet of the hidden content so you know whether to expand it. The initial fold level is configurable via editing.foldlevel (default 1, which collapses thinking blocks).
Between messages, Flemma draws a ruler using the configured ruler.char and highlight. This keeps multi-step chats legible even with folds open.
Inside .chat buffers Flemma defines:
]m / [m â jump to the next/previous message header.im / am (configurable) â select the inside or entire message as a text object. am selects linewise and includes thinking blocks and trailing blank lines, making dam delete entire conversation turns. im skips <thinking> sections so yanking im never includes reasoning traces.<C-]> and <C-c> in normal mode. <C-]> is a hybrid key with three phases: inject approval placeholders, execute approved tools, send the conversation. <M-CR> (Alt-Enter) executes the single tool under the cursor â useful for stepping through pending tools one at a time. Insert-mode <C-]> behaves identically to normal mode but re-enters insert when the operation finishes.Disable or remap these through the keymaps section (see Configuration Reference).
Use the single entry point :Flemma {command}. Autocompletion lists every available sub-command.
| Command | Purpose | Example |
|---|---|---|
:Flemma status [verbose] |
Show runtime status (provider, parameters, autopilot, sandbox, tools) in a scratch buffer. verbose appends the full config dump with Lua highlighting. |
:Flemma status verbose |
:Flemma send [key=value ...] |
Send the current buffer. Optional callbacks run before/after the request. | :Flemma send on_request_start=stopinsert on_request_complete=startinsert! |
:Flemma cancel |
Abort the active request and clean up the spinner. | |
:Flemma switch ... |
Choose or override provider/model parameters. | See below. |
:Flemma import |
Convert Claude Workbench code snippets into .chat format (guide). |
|
:Flemma message:next / :Flemma message:previous |
Jump through message headers. | |
:Flemma tool:execute |
Execute the tool at the cursor position. | |
:Flemma tool:cancel |
Cancel the tool execution at the cursor. | |
:Flemma tool:cancel-all |
Cancel all pending tool executions in the buffer. | |
:Flemma tool:list |
List pending tool executions with IDs and elapsed time. | |
:Flemma autopilot:enable / :...:disable / :...:status |
Toggle autopilot or view its state (status opens the full status buffer). | |
:Flemma sandbox:enable / :...:disable / :...:status |
Toggle sandboxing or view its state (status opens the full status buffer). | |
:Flemma logging:enable / :...:disable / :...:open |
Toggle structured logging and open the log file. | |
:Flemma notification:recall |
Reopen the last usage/cost notification. |
:Flemma switch (no arguments) opens two vim.ui.select pickers: first provider, then model.:Flemma switch openai gpt-5 temperature=0.3 changes provider, model, and overrides parameters in one go.:Flemma switch vertex project_id=my-project location=us-central1 thinking=medium demonstrates long-form overrides. Anything that looks like key=value is accepted; unknown keys are passed to the provider for validation.Define reusable setups under the presets key. Preset names must begin with $; completions prioritise them above built-in providers.
require("flemma").setup({
presets = {
["$fast"] = "vertex gemini-2.5-flash temperature=0.2",
["$review"] = {
provider = "anthropic",
model = "claude-sonnet-4-6",
max_tokens = 6000,
},
},
})
Switch using :Flemma switch $fast or :Flemma switch $review temperature=0.1 to override individual values.
All three providers support extended thinking/reasoning. Flemma provides a single thinking parameter that maps automatically to each provider's native format:
thinking value |
Anthropic (budget) | OpenAI (effort) | Vertex AI (budget) |
|---|---|---|---|
"max" |
32,768 tokens | "max" effort |
32,768 tokens |
"high" (default) |
16,384 tokens | "high" effort |
16,384 tokens |
"medium" |
8,192 tokens | "medium" effort |
8,192 tokens |
"low" |
2,048 tokens | "low" effort |
2,048 tokens |
"minimal" |
128 tokens | "minimal" effort |
128 tokens |
number (e.g. 4096) |
4,096 tokens | closest effort level | 4,096 tokens |
false or 0 |
disabled | disabled | disabled |
Set it once in your config and it works everywhere:
require("flemma").setup({
parameters = {
thinking = "high", -- default: all providers think at maximum
},
})
Or override per-request with :Flemma switch anthropic claude-sonnet-4-6 thinking=medium.
Priority order: Provider-specific parameters (thinking_budget for Anthropic/Vertex, reasoning for OpenAI) take priority over the unified thinking parameter when both are set. This lets you use thinking as the default and override with provider-native syntax when needed.
When thinking is active, the Lualine component shows the resolved level â e.g., claude-sonnet-4-6 (high) or o3 (medium).
| Provider | Defaults | Extra parameters | Notes |
|---|---|---|---|
| Anthropic | claude-sonnet-4-6 |
thinking_budget overrides the unified thinking parameter with an exact token budget (clamped to min 1,024). |
Supports text, image, and PDF attachments. Thinking blocks stream into the buffer. |
| OpenAI | gpt-5 |
reasoning overrides the unified thinking parameter with an explicit effort level ("low", "medium", "high"). |
Cost notifications include reasoning tokens. Lualine shows the reasoning level. |
| Vertex AI | gemini-2.5-pro |
project_id (required), location (default global), thinking_budget overrides with an exact token budget (min 1). |
thinking_budget overrides the unified thinking parameter for Vertex. |
The full model catalogue (including pricing) is in lua/flemma/models.lua. You can access it from Neovim with:
:lua print(vim.inspect(require("flemma.models")))
All three providers support prompt caching. Flemma handles breakpoint placement (Anthropic), cache keys (OpenAI), and implicit caching (Vertex) automatically. The cache_retention parameter controls the strategy where applicable:
| Anthropic | OpenAI | Vertex AI | |
|---|---|---|---|
| Default | "short" (5 min) |
"short" (in-memory) |
Automatic |
| Min. tokens | 1,024â4,096 | 1,024 | 1,024â2,048 |
| Read discount | 90% | 50% | 90% |
When a cache hit occurs, the usage notification shows a Cache: line with read/write token counts. See docs/prompt-caching.md for provider-specific details, caveats, and pricing tables.
Flemma's tool system is what makes it an agent. Models can execute shell commands, read files, write files, and apply edits â and with autopilot, the entire cycle is autonomous: call a tool, get the result, decide what to do next, call another tool, repeat.
**Tool Use:** blocks in its response.flemma:tool status=pending placeholders with an inline preview showing what the tool will do.The result is a fluid back-and-forth: the model proposes actions, you see exactly what each one does, approve them at your own pace, and autopilot picks up where you left off. One prompt can trigger an entire multi-step workflow without losing you in a wall of pending approvals.
With autopilot disabled, the flow is manual: press Ctrl-] to inject review placeholders, again to execute, and again to re-send.
| Tool | Type | Description |
|---|---|---|
bash |
async | Executes shell commands. Configurable shell, working directory, and environment. Supports timeout and cancellation. |
read |
sync | Reads file contents with optional offset and line limit. Relative paths resolve against the .chat file. |
write |
sync | Writes or creates files. Creates parent directories automatically. |
edit |
sync | Find-and-replace with exact text matching. The old text must appear exactly once in the target file. |
By default, file operations (read, write, edit) are auto-approved via the $default preset, while bash requires manual approval. While a tool is pending, Flemma renders a virtual-line preview inside the placeholder showing the tool name and a formatted summary of its arguments â so you can see at a glance that read will open config.lua +0,50 or that bash will run $ make test. Built-in tools ship with tailored preview formatters; custom tools can provide their own via format_preview.
The built-in presets ($readonly, $default) cover common policies; define your own in tools.presets and compose them freely in auto_approve. Override per-buffer via flemma.opt.tools.auto_approve in frontmatter, or set tools.require_approval = false to skip approval entirely. Register your own tools with require("flemma.tools").register() and extend the approval chain with custom resolvers for plugin-level security policies. See docs/tools.md for the full reference on approval presets, per-buffer configuration, custom tool registration, tool previews, and the resolver API.
Autopilot is what turns Flemma from a chat interface into an autonomous agent. It is enabled by default.
When the model responds with tool calls, autopilot takes over: it executes every approved tool, collects the results, and re-sends the conversation â automatically, in a loop, until the model is done or needs your input. One prompt can trigger an entire multi-step workflow: the model reads files to understand a codebase, plans its approach, writes code, runs tests, reads the failures, fixes them, and re-runs â all from a single Ctrl-].
When the model returns multiple tool calls and some require approval, autopilot pauses and places your cursor on the first pending tool. Each pending placeholder shows an inline preview of what the tool will do. Press Alt-Enter to approve and execute the tool under the cursor â the cursor then advances to the next pending tool. Once every tool has a result, autopilot resumes the loop automatically. This sequential flow keeps you in control without breaking your momentum: you review one tool at a time, at your own pace, and the conversation picks back up the moment you're done.
You are always in control. The entire conversation â every tool call, every result, every decision the model makes â is visible in the buffer. You can:
read) and let the model work autonomously.require_approval = true (the default) so autopilot pauses when a tool needs approval. Review the preview, press Alt-Enter to execute, and the loop resumes.tools.autopilot.max_turns, default 100) stops the loop with a warning if exceeded, preventing runaway cost from models that loop without converging.approved flemma:tool block, Flemma detects your changes, skips execution to protect your edits, and warns so you can review. For pending blocks, pasting content is treated as a user-provided result.Toggle autopilot at runtime without changing your config:
:Flemma autopilot:enable â activate for the current session.:Flemma autopilot:disable â deactivate for the current session.:Flemma autopilot:status â open the status buffer and jump to the Autopilot section (shows enabled state, buffer loop state, max turns, and any frontmatter overrides).To disable autopilot globally, set tools.autopilot.enabled = false. See docs/configuration.md for the full option reference.
When sandboxing is enabled (the default), shell commands run inside a read-only filesystem with write access limited to your project directory, the .chat file directory, and /tmp. This prevents a misbehaving model from overwriting dotfiles, deleting system files, or writing outside the project. The sandbox is damage control, not a security boundary â it limits the blast radius of common accidents, not deliberate attacks.
Flemma auto-detects the best available backend. The built-in Bubblewrap backend works on Linux with the bwrap package installed. On platforms without a compatible backend, Flemma silently degrades to unsandboxed execution â no configuration changes needed.
-- The defaults work out of the box on Linux with bwrap installed.
-- Customise the policy to tighten or loosen restrictions:
require("flemma").setup({
sandbox = {
policy = {
rw_paths = { "$CWD" }, -- only the project directory is writable
network = false, -- no network access
},
},
})
Override per-buffer via flemma.opt.sandbox in frontmatter, or toggle at runtime with :Flemma sandbox:enable/disable. See docs/sandbox.md for the full reference on policy options, path variables, custom backends, and security considerations.
Flemma's prompt pipeline supports Lua/JSON frontmatter, inline {{ expressions }}, and an include() helper for composable prompts. Errors surface as diagnostics before the request leaves your editor. Embed local files with @./path syntax â Flemma detects MIME types and formats attachments per-provider. See docs/templates.md for the full reference.
Each completed request emits a floating report that names the provider/model, lists input/output tokens (reasoning tokens are counted under thoughts), and â when pricing is enabled â shows the per-request and cumulative session cost derived from lua/flemma/models.lua. When prompt caching is active, a Cache: line shows read and write token counts. Token accounting persists for the lifetime of the Neovim instance; call require("flemma.session").get():reset() to zero the counters without restarting. pricing.enabled = false suppresses the dollar amounts while keeping token totals.
Notifications are buffer-local â each .chat buffer gets its own notification stack, positioned relative to its window. Notifications for hidden buffers are queued and shown when the buffer becomes visible. Recall the most recent notification with :Flemma notification:recall.
For programmatic access to token usage and cost data, see docs/session-api.md.
Flemma adapts to your colour scheme with theme-aware highlights, line backgrounds, rulers, sign column indicators, and folding. Every visual element is configurable â see docs/ui.md for the full reference.
The bundled Lualine component shows the active model and thinking level in your statusline.
Flemma works without arguments â require("flemma").setup({}) uses sensible defaults (Anthropic provider, thinking = "high", prompt caching enabled). Every option is documented with inline comments in the full configuration reference.
Key defaults:
| Parameter | Default | Description |
|---|---|---|
provider |
"anthropic" |
"anthropic" / "openai" / "vertex" |
thinking |
"high" |
Unified thinking level across providers |
cache_retention |
"short" |
Prompt caching strategy |
max_tokens |
4000 |
Maximum response tokens |
temperature |
0.7 |
Sampling temperature (disabled when thinking is active) |
The repository provides a Nix shell so everyone shares the same toolchain:
nix develop
Inside the shell you gain convenience wrappers:
flemma-fmt â run nixfmt, stylua, and prettier across the repo.flemma-amp â open the Amp CLI, preconfigured for this project.flemma-codex â launch the OpenAI Codex helper.flemma-claude â launch Claude Code for this project.Run the automated tests with:
make test
The suite boots headless Neovim via tests/minimal_init.lua and executes Plenary+Busted specs in tests/flemma/, printing detailed results for each spec so you can follow along.
Other useful Makefile targets:
make lint # Run luacheck on all Lua files
make check # Run lua-language-server type checking
make develop # Launch Neovim with Flemma loaded for local testing
make screencast # Create a VHS screencast
To exercise the plugin without installing it globally, run make develop â it launches Neovim with Flemma on the runtime path and opens a scratch .chat buffer.
[!NOTE] Almost every line of code in Flemma has been authored through AI pair-programming tools (Claude Code as of late, Amp and Aider in the past). Traditional contributions are welcome â just keep changes focused, documented, and tested.
Flemma is for the technical writers, researchers, creators, and tinkerers, for those who occasionally get in hot water and need advice. It's for everyone who wants to experiment with AI.
With autopilot and built-in tools (bash, file read/write/edit), Flemma is a fully autonomous coding agent that lives inside your editor. Give it a task â "refactor this module", "add tests for the auth flow", "find and fix the bug in checkout" â and watch it work: reading files, planning changes, writing code, running tests, iterating on failures. You stay in Neovim the whole time, with full visibility into every step. Flemma is not trying to replace dedicated agents like Claude Code or Codex, but it gives you an agent that speaks your language â Vim buffers, not a separate terminal.
...accidentally pressing <C-R> and refreshing the page midway through a prompt (or <C-W> trying to delete a word)... or Chrome sending a tab to sleep whilst I had an unsaved session... or having to worry about whether files I shared with Claude Workbench were stored on some Anthropic server indefinitely. I can be fast! I can be reckless! I can tinker! I can use my Vim keybindings and years of muscle memory!
If I have an idea, it's a buffer away. Should I want to branch off and experiment, I'd duplicate the .chat file and go in a different direction. Is the conversation getting too long? I'd summarize a set of instructions and start with them in a new .chat file, then share them each time I need a fresh start. Need backups or history? I have Git for that.
.chat buffers in Flemma.There really is no limit to what you can do with Flemma â if you can write it down and reason about it, you can use Flemma to help you with it.
On a personal level, I've used Flemma to generate bedtime stories with recurring characters for my kids, made small financial decisions based on collected evidence, asked for advice on how to respond to difficult situations, consulted (usual disclaimer, blah blah) it for legal advice and much more.
.chat and the first message starts with @You: or @System:..chat file and that the provider supports its MIME type. Use ;type= to override when necessary."high"), Anthropic and OpenAI disable temperature. Set thinking = false if you need temperature control.parameters.vertex.project_id and authentication. Run gcloud auth application-default print-access-token manually to ensure credentials are valid.**Tool Use:** block. Only tools with registered executors can be run â check :lua print(vim.inspect(require("flemma.tools").get_all())).keymaps.enabled = false and register your own :Flemma commands.:Flemma status (or :Flemma sandbox:status) and verify the path is inside rw_paths. Add it to sandbox.policy.rw_paths or disable sandboxing to troubleshoot..chat buffers, ensure each buffer has been saved (unsaved buffers lack __dirname for path resolution).Flemma is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Happy prompting!