neovimcraft

CREATED

2025-02-19

UPDATED

2026-02-27

Flemma 🪶

[!CAUTION] Actively Evolving

Flemma is growing fast – new tools, providers, and UI features land regularly. Expect occasional breaking changes while the project matures. Pin a commit if you need a steady target.

Flemma turns Neovim into an AI agent. Give it a task, and it works – calling tools, reading and editing files, running shell commands, and re-sending results back to the model in a fully autonomous loop. You stay in control: every action is visible in the .chat buffer, every tool call can require your approval, and you can take the wheel at any point. But when you trust the model, Flemma gets out of the way and lets it drive.

Flemma also provides streaming conversations, reusable prompt templates, file attachments, cost tracking, and ergonomic commands for Anthropic, OpenAI, and Google Vertex AI.

https://github.com/user-attachments/assets/c4c1ab0d-a83c-4a19-86f4-926f52ee2026

Autonomous agent loop – Flemma executes approved tool calls and re-sends results automatically, repeating until the task is done or your approval is needed. One keypress can kick off an entire multi-step workflow.
Tool calling – bash, file read/edit/write, with approval policies, parallel execution, and inline previews that show what each tool will do before you approve it. Register your own tools, approval resolvers, and preview formatters.
User at the wheel – every tool call is visible in the buffer with a preview of what it will do. Approve tools one at a time with Alt-Enter, bulk-approve with Ctrl-], or let autopilot handle everything. Pause, inspect, edit, resume at any point.
Multi-provider – Anthropic, OpenAI, and Vertex AI through one unified interface.
Extended thinking – unified thinking parameter across all providers, with automatic mapping to Anthropic budgets, OpenAI reasoning effort, and Vertex thinking budgets.
Template system – Lua/JSON frontmatter, inline {{ expressions }}, include() helpers.
Context attachments – reference local files with @./path; MIME detection and provider-aware formatting.
Usage reporting – per-request and session token totals, costs, and cache metrics.
Filesystem sandboxing – shell commands run inside a read-only rootfs with write access limited to your project directory. Limits the blast radius of common accidents. Auto-detects the best available backend; silently degrades on platforms without one.
Git-trackable conversations – .chat files are plain text. Commit them, diff them, branch them, share them. No opaque database, no export step – your conversation history lives in version control the moment you save.
Theme-aware UI – line highlights, rulers, signs, tool previews, and folding that adapt to your colour scheme.

Installation
Requirements
Quick Start
The Buffer Is the State
Understanding .chat Buffers
Commands and Provider Management
Providers
Tool Calling
Autopilot
Sandboxing
Template System
Usage, Pricing, and Notifications
UI Customisation
Configuration Reference
Developing and Testing
FAQ
Troubleshooting Checklist
License

Installation

Flemma works with any plugin manager. With lazy.nvim you only need to declare the plugin – opts = {} triggers require("flemma").setup({}) automatically:

{
  "Flemma-Dev/flemma.nvim",
  opts = {},
}

For managers that do not wire opts, call require("flemma").setup({}) yourself after the plugin is on the runtime path.

Requirements

Requirement	Why it matters
Neovim 0.11 or newer	Uses Tree-sitter folding APIs introduced in 0.11 and relies on `vim.fs` helpers.
`curl`	Streaming is handled by spawning `curl` with Server-Sent Events enabled.
Markdown Tree-sitter grammar	Flemma registers `.chat` buffers to reuse the Markdown parser for syntax highlighting and folding.
`file` CLI (optional but recommended)	Provides reliable MIME detection for `@./path` attachments. When missing, extensions are used as a best effort.
`bwrap` (optional, Linux)	Enables filesystem sandboxing for tool execution. Without it, tools run unsandboxed.

Provider credentials

Provider	Environment variable	Notes
Anthropic	`ANTHROPIC_API_KEY`
OpenAI	`OPENAI_API_KEY`	Supports GPT-5 family, including reasoning effort settings.
Google Vertex AI	`VERTEX_AI_ACCESS_TOKEN` or service-account credentials	Requires additional configuration (see below).

When environment variables are absent, Flemma looks for secrets in the Secret Service keyring. Store them once (secret-tool prompts for the secret value) and every Neovim instance can reuse them:

secret-tool store --label="Anthropic API Key" service anthropic key api
secret-tool store --label="OpenAI API Key" service openai key api
secret-tool store --label="Vertex AI Service Account" service vertex key api project_id your-gcp-project

1. Create a service account in Google Cloud and grant it the Vertex AI user role.
2. Download its JSON credentials and either:
   - export them via VERTEX_SERVICE_ACCOUNT='{"type": "..."}', or
   - store them in the Secret Service entry above (the JSON is stored verbatim).
3. Ensure the Google Cloud CLI is on your $PATH; Flemma shells out to gcloud auth application-default print-access-token whenever it needs to refresh the token.
4. Set the project/location in configuration or via :Flemma switch vertex gemini-2.5-pro project_id=my-project location=us-central1.

Note: If you only supply VERTEX_AI_ACCESS_TOKEN, Flemma uses that token until it expires and skips gcloud.
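
If you are unsure which credentials Neovim can actually see, a quick check of the environment helps. The snippet below is a minimal sketch, not part of Flemma: the variable names come from the table and steps above, while `credential_report` and its injectable `getenv` argument are hypothetical helpers. It does not query the Secret Service keyring.

```lua
-- Sketch only: reports which providers have credentials in the environment.
-- `credential_report` is a hypothetical helper, not a Flemma API.
local ENV_VARS = {
  anthropic = { "ANTHROPIC_API_KEY" },
  openai = { "OPENAI_API_KEY" },
  vertex = { "VERTEX_AI_ACCESS_TOKEN", "VERTEX_SERVICE_ACCOUNT" },
}

local function credential_report(getenv)
  getenv = getenv or os.getenv
  local report = {}
  for provider, names in pairs(ENV_VARS) do
    report[provider] = false
    for _, name in ipairs(names) do
      local value = getenv(name)
      if value ~= nil and value ~= "" then
        report[provider] = name -- remember which variable satisfied the check
        break
      end
    end
  end
  return report
end

for provider, found in pairs(credential_report()) do
  print(provider, found or "not set (keyring fallback applies)")
end
```

Run it with :luafile or paste it into a scratch buffer; a provider reported as "not set" may still work if its secret lives in the keyring.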

Quick Start

Configure the plugin:
```
require("flemma").setup({})
```
Create a new file whose name ends with .chat. Flemma only activates on that extension.

Type a message, for example:

@You: Turn the notes below into a short project update.
- Added Vertex thinking budget support.
- Refactored :Flemma command routing.
- Documented presets in the README.

Press Ctrl-] (normal or insert mode) or run :Flemma send. Flemma freezes the buffer while the request is streaming and shows @Assistant: Thinking.... With autopilot enabled (the default), tool calls are executed and re-sent automatically – you only need to intervene when a tool requires manual approval.
When the reply finishes, a floating notification lists token counts and cost for the request and the session.

Cancel an in-flight response with Ctrl-C or :Flemma cancel.

The Buffer Is the State

Most AI tools keep the real conversation hidden – in a SQLite file or a JSON log you can't touch. Flemma doesn't. The .chat buffer is the conversation, and nothing exists outside it. What you see is exactly what the model receives. Edit an assistant response to correct a hallucination, delete a tangent, rewrite your own message, paste in a tool result by hand – it all just works because there is no shadow state to fall out of sync. Want to fork a conversation? Duplicate the file. Want version history? You have Git. Switch from GPT to Claude mid-conversation, or turn thinking on for one turn and off for the next – every choice lives in the buffer where you can see and control it.

Understanding `.chat` Buffers

Structure

```lua
release = {
  version = "v25.10-1",
  focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]
```

@System: You turn engineering notes into concise changelog entries.

@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below:
{{notes}}

@Assistant:
- Changelog bullets...
- Follow-up actions...

<thinking>
Model thoughts stream here and auto-fold.
</thinking>

Frontmatter sits on the first line and must be fenced with triple backticks. Lua and JSON parsers ship with Flemma; you can register more via flemma.codeblock.parsers.register("yaml", parser_fn). Lua frontmatter also exposes flemma.opt for per-buffer tool selection, approval, and provider parameter overrides.
Messages begin with @System:, @You:, or @Assistant:. The parser is whitespace-tolerant and handles blank lines between messages.
Thinking blocks appear only in assistant messages. When thinking is enabled (default "high"), Anthropic and Vertex AI models stream <thinking> sections; Flemma folds them automatically and keeps dedicated highlights for the tags and body.

[!NOTE] Cross-provider thinking. When you switch providers mid-conversation, thinking blocks from the previous provider are visible in the buffer but are not forwarded to the new provider's API. The visible text inside <thinking> tags is a summary for your reference; the actual reasoning data lives in provider-specific signature attributes on the tag. Only matching-provider signatures are replayed.
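
To make the message structure concrete, here is a rough sketch of how role markers could be split into messages. It is a simplified illustration, not Flemma's parser: `parse_messages` is a hypothetical name, and frontmatter fences and <thinking> handling are deliberately ignored.

```lua
-- Sketch only: a simplified role splitter, not Flemma's actual parser.
local ROLES = { System = true, You = true, Assistant = true }

local function parse_messages(text)
  local messages = {}
  local current
  for line in (text .. "\n"):gmatch("(.-)\n") do
    local role, rest = line:match("^@(%a+):%s*(.*)$")
    if role and ROLES[role] then
      -- A role marker starts a new message; text after it is the first line.
      current = { role = role, lines = {} }
      messages[#messages + 1] = current
      if rest ~= "" then
        current.lines[#current.lines + 1] = rest
      end
    elseif current then
      current.lines[#current.lines + 1] = line
    end
  end
  for _, message in ipairs(messages) do
    -- Drop trailing blank lines so separators between messages vanish.
    while message.lines[#message.lines] == "" do
      table.remove(message.lines)
    end
    message.content = table.concat(message.lines, "\n")
    message.lines = nil
  end
  return messages
end
```

Because the buffer is the only state, a structure like this can be rebuilt from the text on every send, which is why hand edits to any message simply take effect.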

Folding and layout

Fold level	What folds	Why
Level 2	The frontmatter block	Keep templates out of the way while you focus on chat history.
Level 2	`<thinking>...</thinking>`	Reasoning traces are useful, but often secondary to the answer.
Level 1	Each message	Collapse long exchanges without losing context.

Toggle folds with your usual mappings (za, zc, etc.). The fold text shows a snippet of the hidden content so you know whether to expand it. The initial fold level is configurable via editing.foldlevel (default 1, which collapses the level-2 folds, i.e. frontmatter and thinking blocks, while leaving messages open).
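
The effect of a given fold level follows Vim's general rule that folds with a level higher than 'foldlevel' start closed. The sketch below models that rule against the table above; `closed_folds` is a hypothetical helper for illustration only.

```lua
-- Sketch only: models Vim's rule that a fold is closed when its level is
-- greater than 'foldlevel'. The levels mirror the table above.
local FOLD_LEVELS = { message = 1, frontmatter = 2, thinking = 2 }

local function closed_folds(foldlevel)
  local closed = {}
  for kind, level in pairs(FOLD_LEVELS) do
    closed[kind] = level > foldlevel
  end
  return closed
end

local at_default = closed_folds(1)
print(at_default.message, at_default.frontmatter, at_default.thinking)
```

Under this rule, a fold level of 0 would start with every message collapsed, and 2 would start with everything open.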

Between messages, Flemma draws a ruler using the configured ruler.char and highlight. This keeps multi-step chats legible even with folds open.

Navigation and text objects

Inside .chat buffers Flemma defines:

]m / [m – jump to the next/previous message header.
im / am (configurable) – select the inside or entire message as a text object. am selects linewise and includes thinking blocks and trailing blank lines, making dam delete entire conversation turns. im skips <thinking> sections so yanking im never includes reasoning traces.
Buffer-local mappings for send/cancel default to <C-]> and <C-c> in normal mode. <C-]> is a hybrid key with three phases: inject approval placeholders, execute approved tools, send the conversation. <M-CR> (Alt-Enter) executes the single tool under the cursor – useful for stepping through pending tools one at a time. Insert-mode <C-]> behaves identically to normal mode but re-enters insert when the operation finishes.

Disable or remap these through the keymaps section (see Configuration Reference).
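
If you prefer your own bindings, one possible approach is to turn off the built-in mappings and bind the documented :Flemma sub-commands yourself. This is a sketch under assumptions: keymaps.enabled appears in this README, but the augroup name and the <leader> keys below are arbitrary illustrative choices.

```lua
-- Sketch only: the augroup name and <leader> mappings are illustrative.
require("flemma").setup({
  keymaps = { enabled = false },
})

local group = vim.api.nvim_create_augroup("FlemmaCustomKeymaps", { clear = true })
vim.api.nvim_create_autocmd({ "BufRead", "BufNewFile" }, {
  group = group,
  pattern = "*.chat",
  callback = function(args)
    -- Buffer-local so the mappings only exist in .chat buffers.
    local opts = { buffer = args.buf, silent = true }
    vim.keymap.set("n", "<leader>fs", "<cmd>Flemma send<CR>", opts)
    vim.keymap.set("n", "<leader>fc", "<cmd>Flemma cancel<CR>", opts)
    vim.keymap.set("n", "<leader>ft", "<cmd>Flemma tool:execute<CR>", opts)
  end,
})
```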

Commands and Provider Management

Use the single entry point :Flemma {command}. Autocompletion lists every available sub-command.

Command	Purpose	Example
`:Flemma status [verbose]`	Show runtime status (provider, parameters, autopilot, sandbox, tools) in a scratch buffer. `verbose` appends the full config dump with Lua highlighting.	`:Flemma status verbose`
`:Flemma send [key=value ...]`	Send the current buffer. Optional callbacks run before/after the request.	`:Flemma send on_request_start=stopinsert on_request_complete=startinsert!`
`:Flemma cancel`	Abort the active request and clean up the spinner.
`:Flemma switch ...`	Choose or override provider/model parameters.	See below.
`:Flemma import`	Convert Claude Workbench code snippets into `.chat` format (guide).
`:Flemma message:next` / `:Flemma message:previous`	Jump through message headers.
`:Flemma tool:execute`	Execute the tool at the cursor position.
`:Flemma tool:cancel`	Cancel the tool execution at the cursor.
`:Flemma tool:cancel-all`	Cancel all pending tool executions in the buffer.
`:Flemma tool:list`	List pending tool executions with IDs and elapsed time.
`:Flemma autopilot:enable` / `:...:disable` / `:...:status`	Toggle autopilot or view its state (status opens the full status buffer).
`:Flemma sandbox:enable` / `:...:disable` / `:...:status`	Toggle sandboxing or view its state (status opens the full status buffer).
`:Flemma logging:enable` / `:...:disable` / `:...:open`	Toggle structured logging and open the log file.
`:Flemma notification:recall`	Reopen the last usage/cost notification.

Switching providers and models

:Flemma switch (no arguments) opens two vim.ui.select pickers: first provider, then model.
:Flemma switch openai gpt-5 temperature=0.3 changes provider, model, and overrides parameters in one go.
:Flemma switch vertex project_id=my-project location=us-central1 thinking=medium demonstrates long-form overrides. Anything that looks like key=value is accepted; unknown keys are passed to the provider for validation.
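
The argument shape is easy to picture as positional words followed by key=value pairs. The sketch below is a hypothetical parser written for illustration, not Flemma's command implementation; in particular, the number and boolean coercion is an assumption.

```lua
-- Sketch only: a hypothetical parser for ":Flemma switch" style arguments.
local function parse_switch_args(argline)
  local positional, params = {}, {}
  for token in argline:gmatch("%S+") do
    local key, value = token:match("^([%w_]+)=(.*)$")
    if key then
      -- Coerce obvious booleans and numbers; everything else stays a string.
      if value == "true" then
        params[key] = true
      elseif value == "false" then
        params[key] = false
      else
        params[key] = tonumber(value) or value
      end
    else
      positional[#positional + 1] = token
    end
  end
  return { provider = positional[1], model = positional[2], params = params }
end

local parsed = parse_switch_args("openai gpt-5 temperature=0.3")
print(parsed.provider, parsed.model, parsed.params.temperature)
```

When the model is omitted, as in the vertex example above, the sketch simply leaves `model` as nil.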

Named presets

Define reusable setups under the presets key. Preset names must begin with $; completions prioritise them above built-in providers.

require("flemma").setup({
  presets = {
    ["$fast"] = "vertex gemini-2.5-flash temperature=0.2",
    ["$review"] = {
      provider = "anthropic",
      model = "claude-sonnet-4-6",
      max_tokens = 6000,
    },
  },
})

Switch using :Flemma switch $fast or :Flemma switch $review temperature=0.1 to override individual values.

Providers

Unified thinking

All three providers support extended thinking/reasoning. Flemma provides a single thinking parameter that maps automatically to each provider's native format:

`thinking` value	Anthropic (budget)	OpenAI (effort)	Vertex AI (budget)
`"max"`	32,768 tokens	`"max"` effort	32,768 tokens
`"high"` (default)	16,384 tokens	`"high"` effort	16,384 tokens
`"medium"`	8,192 tokens	`"medium"` effort	8,192 tokens
`"low"`	2,048 tokens	`"low"` effort	2,048 tokens
`"minimal"`	128 tokens	`"minimal"` effort	128 tokens
number (e.g. `4096`)	4,096 tokens	closest effort level	4,096 tokens
`false` or `0`	disabled	disabled	disabled

Set it once in your config and it works everywhere:

require("flemma").setup({
  parameters = {
    thinking = "high",     -- the default level; use "max" for the largest budget
  },
})

Or override per-request with :Flemma switch anthropic claude-sonnet-4-6 thinking=medium.

Priority order: Provider-specific parameters (thinking_budget for Anthropic/Vertex, reasoning for OpenAI) take priority over the unified thinking parameter when both are set. This lets you use thinking as the default and override with provider-native syntax when needed.

When thinking is active, the Lualine component shows the resolved level – e.g., claude-sonnet-4-6 (high) or o3 (medium).
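
The mapping and priority rules above can be pictured with a small resolver for the budget-based providers (Anthropic and Vertex AI). This is a sketch with a hypothetical function name, not Flemma's internals; the token budgets are copied from the table, and provider-side clamping is not modelled.

```lua
-- Sketch only: budgets copied from the table above; `resolve_budget` is a
-- hypothetical helper, not a Flemma API.
local BUDGETS = { max = 32768, high = 16384, medium = 8192, low = 2048, minimal = 128 }

-- Returns the thinking budget in tokens, or nil when thinking is disabled.
local function resolve_budget(params)
  if params.thinking_budget ~= nil then
    return params.thinking_budget -- the provider-specific setting wins
  end
  local thinking = params.thinking
  if thinking == nil then
    thinking = "high" -- documented default
  end
  if thinking == false or thinking == 0 then
    return nil
  end
  if type(thinking) == "number" then
    return thinking
  end
  local budget = BUDGETS[thinking]
  if not budget then
    error("unknown thinking level: " .. tostring(thinking))
  end
  return budget
end

print(resolve_budget({ thinking = "medium" }))
print(resolve_budget({ thinking = "low", thinking_budget = 5000 }))
```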

Provider-specific capabilities

Provider	Defaults	Extra parameters	Notes
Anthropic	`claude-sonnet-4-6`	`thinking_budget` overrides the unified `thinking` parameter with an exact token budget (clamped to min 1,024).	Supports text, image, and PDF attachments. Thinking blocks stream into the buffer.
OpenAI	`gpt-5`	`reasoning` overrides the unified `thinking` parameter with an explicit effort level (`"low"`, `"medium"`, `"high"`).	Cost notifications include reasoning tokens. Lualine shows the reasoning level.
Vertex AI	`gemini-2.5-pro`	`project_id` (required), `location` (default `global`), `thinking_budget` overrides with an exact token budget (min 1).	`thinking_budget` overrides the unified `thinking` parameter for Vertex.

The full model catalogue (including pricing) is in lua/flemma/models.lua. You can access it from Neovim with:

:lua print(vim.inspect(require("flemma.models")))

Prompt caching

All three providers support prompt caching. Flemma handles breakpoint placement (Anthropic), cache keys (OpenAI), and implicit caching (Vertex) automatically. The cache_retention parameter controls the strategy where applicable:

	Anthropic	OpenAI	Vertex AI
Default	`"short"` (5 min)	`"short"` (in-memory)	Automatic
Min. tokens	1,024–4,096	1,024	1,024–2,048
Read discount	90%	50%	90%

When a cache hit occurs, the usage notification shows a Cache: line with read/write token counts. See docs/prompt-caching.md for provider-specific details, caveats, and pricing tables.

Tool Calling

Flemma's tool system is what makes it an agent. Models can execute shell commands, read files, write files, and apply edits – and with autopilot, the entire cycle is autonomous: call a tool, get the result, decide what to do next, call another tool, repeat.

How it works

When you send a message, Flemma includes definitions for available tools in the API request.
If the model decides to use tools, it emits **Tool Use:** blocks in its response.
Flemma categorises each tool call against your approval settings: auto-approved tools execute immediately, while tools requiring review get flemma:tool status=pending placeholders with an inline preview showing what the tool will do.
The cursor moves to the first pending tool. Press Alt-Enter to execute it – the cursor advances to the next pending tool automatically. Repeat until all tools are resolved.
Once every tool has a result, autopilot re-sends the conversation and the cycle continues until the model is done or needs your input again.

The result is a fluid back-and-forth: the model proposes actions, you see exactly what each one does, approve them at your own pace, and autopilot picks up where you left off. One prompt can trigger an entire multi-step workflow without losing you in a wall of pending approvals.

With autopilot disabled, the flow is manual: press Ctrl-] to inject review placeholders, again to execute, and again to re-send.
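
The cycle described in this section can be sketched as a plain loop. The code below is a toy model, not Flemma's code: `model`, `approve`, and `execute` are caller-supplied stand-ins, and it simplifies the real flow by pausing on the first tool call that is not approved instead of running the auto-approved ones first.

```lua
-- Sketch only: a toy agent loop with caller-supplied stand-in functions.
local function run_agent_loop(conversation, model, approve, execute, max_turns)
  max_turns = max_turns or 100 -- mirrors the documented default turn cap
  for turn = 1, max_turns do
    local reply = model(conversation)
    conversation[#conversation + 1] = { role = "assistant", text = reply.text }
    local calls = reply.tool_calls or {}
    if #calls == 0 then
      return "done", turn -- no tool calls: the model has finished
    end
    for _, call in ipairs(calls) do
      if not approve(call) then
        return "needs_approval", turn -- pause until the user decides
      end
    end
    for _, call in ipairs(calls) do
      -- Append each result so the next request carries it back to the model.
      conversation[#conversation + 1] = { role = "tool", name = call.name, text = execute(call) }
    end
  end
  return "turn_limit", max_turns
end
```

The three return values correspond to the three ways the loop ends in practice: the model is done, a tool needs your approval, or the safety cap is reached.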

Built-in tools

Tool	Type	Description
`bash`	async	Executes shell commands. Configurable shell, working directory, and environment. Supports timeout and cancellation.
`read`	sync	Reads file contents with optional offset and line limit. Relative paths resolve against the `.chat` file.
`write`	sync	Writes or creates files. Creates parent directories automatically.
`edit`	sync	Find-and-replace with exact text matching. The old text must appear exactly once in the target file.
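
The "exactly once" rule for edit is worth spelling out, since it is what makes a replacement unambiguous. Below is a minimal sketch of that rule using plain-text matching; `apply_edit` is a hypothetical function for illustration and is not Flemma's edit tool.

```lua
-- Sketch only: illustrates the "exactly once" rule with plain-text matching.
local function count_plain(haystack, needle)
  local count, start = 0, 1
  while true do
    local first, last = haystack:find(needle, start, true) -- plain search, no patterns
    if not first then
      return count
    end
    count = count + 1
    start = last + 1
  end
end

local function apply_edit(content, old_text, new_text)
  if old_text == "" then
    return nil, "old text must not be empty"
  end
  local matches = count_plain(content, old_text)
  if matches ~= 1 then
    return nil, ("expected exactly 1 match, found %d"):format(matches)
  end
  local first, last = content:find(old_text, 1, true)
  return content:sub(1, first - 1) .. new_text .. content:sub(last + 1)
end
```

Rejecting zero or multiple matches means the model has to quote enough surrounding text to pin down a single location before anything is changed.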

By default, file operations (read, write, edit) are auto-approved via the $default preset, while bash requires manual approval. While a tool is pending, Flemma renders a virtual-line preview inside the placeholder showing the tool name and a formatted summary of its arguments – so you can see at a glance that read will open config.lua +0,50 or that bash will run $ make test. Built-in tools ship with tailored preview formatters; custom tools can provide their own via format_preview.

The built-in presets ($readonly, $default) cover common policies; define your own in tools.presets and compose them freely in auto_approve. Override per-buffer via flemma.opt.tools.auto_approve in frontmatter, or set tools.require_approval = false to skip approval entirely. Register your own tools with require("flemma.tools").register() and extend the approval chain with custom resolvers for plugin-level security policies. See docs/tools.md for the full reference on approval presets, per-buffer configuration, custom tool registration, tool previews, and the resolver API.

Autopilot

Autopilot is what turns Flemma from a chat interface into an autonomous agent. It is enabled by default.

When the model responds with tool calls, autopilot takes over: it executes every approved tool, collects the results, and re-sends the conversation – automatically, in a loop, until the model is done or needs your input. One prompt can trigger an entire multi-step workflow: the model reads files to understand a codebase, plans its approach, writes code, runs tests, reads the failures, fixes them, and re-runs – all from a single Ctrl-].

When the model returns multiple tool calls and some require approval, autopilot pauses and places your cursor on the first pending tool. Each pending placeholder shows an inline preview of what the tool will do. Press Alt-Enter to approve and execute the tool under the cursor – the cursor then advances to the next pending tool. Once every tool has a result, autopilot resumes the loop automatically. This sequential flow keeps you in control without breaking your momentum: you review one tool at a time, at your own pace, and the conversation picks back up the moment you're done.

You are always in control. The entire conversation – every tool call, every result, every decision the model makes – is visible in the buffer. You can:

Let it run. Auto-approve trusted tools (e.g., read) and let the model work autonomously.
Supervise. Keep require_approval = true (the default) so autopilot pauses when a tool needs approval. Review the preview, press Alt-Enter to execute, and the loop resumes.
Intervene. Press Ctrl-C at any point to stop everything. Edit the buffer. Change the model's plan. Then press Ctrl-] to continue.

Safety

Turn limit: A configurable safety cap (tools.autopilot.max_turns, default 100) stops the loop with a warning if exceeded, preventing runaway cost from models that loop without converging.
Cancellation: Ctrl-C cancels the active request or tool execution and fully disarms autopilot – no surprises when you next press Ctrl-].
Conflict detection: If you edit the content inside an approved flemma:tool block, Flemma detects your changes, skips execution to protect your edits, and warns so you can review. For pending blocks, pasting content is treated as a user-provided result.

Runtime control

Toggle autopilot at runtime without changing your config:

:Flemma autopilot:enable – activate for the current session.
:Flemma autopilot:disable – deactivate for the current session.
:Flemma autopilot:status – open the status buffer and jump to the Autopilot section (shows enabled state, buffer loop state, max turns, and any frontmatter overrides).

To disable autopilot globally, set tools.autopilot.enabled = false. See docs/configuration.md for the full option reference.
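
As a starting point, a more conservative autopilot setup might look like the fragment below. The option paths are taken from this README; the nesting assumes that dotted paths map onto nested tables, and the value 25 is only an example.

```lua
-- Sketch only: option names come from this README; the table nesting is an
-- assumption based on the dotted paths.
require("flemma").setup({
  tools = {
    require_approval = true, -- documented default: pause on tools needing review
    autopilot = {
      enabled = true,  -- set to false to disable autopilot globally
      max_turns = 25,  -- documented default is 100; a lower cap stops loops sooner
    },
  },
})
```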

Sandboxing

When sandboxing is enabled (the default), shell commands run inside a read-only filesystem with write access limited to your project directory, the .chat file directory, and /tmp. This prevents a misbehaving model from overwriting dotfiles, deleting system files, or writing outside the project. The sandbox is damage control, not a security boundary – it limits the blast radius of common accidents, not deliberate attacks.

Flemma auto-detects the best available backend. The built-in Bubblewrap backend works on Linux with the bwrap package installed. On platforms without a compatible backend, Flemma silently degrades to unsandboxed execution – no configuration changes needed.

-- The defaults work out of the box on Linux with bwrap installed.
-- Customise the policy to tighten or loosen restrictions:
require("flemma").setup({
  sandbox = {
    policy = {
      rw_paths = { "$CWD" },    -- only the project directory is writable
      network = false,          -- no network access
    },
  },
})

Override per-buffer via flemma.opt.sandbox in frontmatter, or toggle at runtime with :Flemma sandbox:enable/disable. See docs/sandbox.md for the full reference on policy options, path variables, custom backends, and security considerations.

Template System

Flemma's prompt pipeline supports Lua/JSON frontmatter, inline {{ expressions }}, and an include() helper for composable prompts. Errors surface as diagnostics before the request leaves your editor. Embed local files with @./path syntax – Flemma detects MIME types and formats attachments per-provider. See docs/templates.md for the full reference.
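
To give a feel for inline expressions, here is a small sketch that resolves dotted names such as {{release.version}} against a table. It is illustrative only: Flemma's real expressions may be richer, and `lookup` and `render` are hypothetical names rather than Flemma functions.

```lua
-- Sketch only: resolves dotted names inside {{ }} against a context table.
local function lookup(context, path)
  local value = context
  for key in path:gmatch("[^%.]+") do
    if type(value) ~= "table" then
      return nil
    end
    value = value[key]
  end
  return value
end

local function render(template, context)
  -- The extra parentheses drop gsub's second return value (the match count).
  return (template:gsub("{{%s*(.-)%s*}}", function(expr)
    local value = lookup(context, expr)
    if value == nil then
      error("undefined template variable: " .. expr)
    end
    return tostring(value)
  end))
end

local context = { release = { version = "v25.10-1", focus = "UI polish" } }
print(render("Summarise {{release.version}} with emphasis on {{ release.focus }}", context))
```

Failing loudly on an undefined name mirrors the behaviour described above, where template errors surface before any request is sent.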

Usage, Pricing, and Notifications

Each completed request emits a floating report that names the provider/model, lists input/output tokens (reasoning tokens are counted under thoughts), and – when pricing is enabled – shows the per-request and cumulative session cost derived from lua/flemma/models.lua. When prompt caching is active, a Cache: line shows read and write token counts. Token accounting persists for the lifetime of the Neovim instance; call require("flemma.session").get():reset() to zero the counters without restarting. pricing.enabled = false suppresses the dollar amounts while keeping token totals.
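
The per-request and session totals boil down to simple accumulation. The sketch below shows the arithmetic with made-up prices and field names; it is not the real flemma.session module, which reads pricing from lua/flemma/models.lua.

```lua
-- Sketch only: prices and field names are hypothetical.
local Session = {}
Session.__index = Session

function Session.new()
  return setmetatable({ input_tokens = 0, output_tokens = 0, cost = 0, requests = 0 }, Session)
end

-- `price` holds hypothetical USD prices per million tokens.
function Session:add(request, price)
  local cost = request.input_tokens / 1e6 * price.input
    + request.output_tokens / 1e6 * price.output
  self.input_tokens = self.input_tokens + request.input_tokens
  self.output_tokens = self.output_tokens + request.output_tokens
  self.cost = self.cost + cost
  self.requests = self.requests + 1
  return cost -- per-request cost; the running total lives on the session
end

function Session:reset()
  self.input_tokens, self.output_tokens, self.cost, self.requests = 0, 0, 0, 0
end
```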

Notifications are buffer-local – each .chat buffer gets its own notification stack, positioned relative to its window. Notifications for hidden buffers are queued and shown when the buffer becomes visible. Recall the most recent notification with :Flemma notification:recall.

For programmatic access to token usage and cost data, see docs/session-api.md.

UI Customisation

Flemma adapts to your colour scheme with theme-aware highlights, line backgrounds, rulers, sign column indicators, and folding. Every visual element is configurable – see docs/ui.md for the full reference.

The bundled Lualine component shows the active model and thinking level in your statusline.

Configuration Reference

Flemma works without arguments – require("flemma").setup({}) uses sensible defaults (Anthropic provider, thinking = "high", prompt caching enabled). Every option is documented with inline comments in the full configuration reference.

Key defaults:

Parameter	Default	Description
`provider`	`"anthropic"`	`"anthropic"` / `"openai"` / `"vertex"`
`thinking`	`"high"`	Unified thinking level across providers
`cache_retention`	`"short"`	Prompt caching strategy
`max_tokens`	`4000`	Maximum response tokens
`temperature`	`0.7`	Sampling temperature (disabled when thinking is active)

Developing and Testing

The repository provides a Nix shell so everyone shares the same toolchain:

nix develop

Inside the shell you gain convenience wrappers:

flemma-fmt – run nixfmt, stylua, and prettier across the repo.
flemma-amp – open the Amp CLI, preconfigured for this project.
flemma-codex – launch the OpenAI Codex helper.
flemma-claude – launch Claude Code for this project.

Run the automated tests with:

make test

The suite boots headless Neovim via tests/minimal_init.lua and executes Plenary+Busted specs in tests/flemma/, printing detailed results for each spec so you can follow along.

Other useful Makefile targets:

make lint          # Run luacheck on all Lua files
make check         # Run lua-language-server type checking
make develop       # Launch Neovim with Flemma loaded for local testing
make screencast    # Create a VHS screencast

To exercise the plugin without installing it globally, run make develop – it launches Neovim with Flemma on the runtime path and opens a scratch .chat buffer.

[!NOTE] Almost every line of code in Flemma has been authored through AI pair-programming tools (Claude Code as of late, Amp and Aider in the past). Traditional contributions are welcome – just keep changes focused, documented, and tested.

FAQ

Flemma is for the technical writers, researchers, creators, and tinkerers, for those who occasionally get in hot water and need advice. It's for everyone who wants to experiment with AI.

With autopilot and built-in tools (bash, file read/write/edit), Flemma is a fully autonomous coding agent that lives inside your editor. Give it a task – "refactor this module", "add tests for the auth flow", "find and fix the bug in checkout" – and watch it work: reading files, planning changes, writing code, running tests, iterating on failures. You stay in Neovim the whole time, with full visibility into every step. Flemma is not trying to replace dedicated agents like Claude Code or Codex, but it gives you an agent that speaks your language – Vim buffers, not a separate terminal.

...accidentally pressing <C-R> and refreshing the page midway through a prompt (or <C-W> trying to delete a word)... or Chrome sending a tab to sleep whilst I had an unsaved session... or having to worry about whether files I shared with Claude Workbench were stored on some Anthropic server indefinitely. I can be fast! I can be reckless! I can tinker! I can use my Vim keybindings and years of muscle memory!

If I have an idea, it's a buffer away. Should I want to branch off and experiment, I'd duplicate the .chat file and go in a different direction. Is the conversation getting too long? I'd summarise a set of instructions and start with them in a new .chat file, then share them each time I need a fresh start. Need backups or history? I have Git for that.

Write countless technical documents: PRDs (Product Requirements Documents), AKM (Architecture Knowledge Management), infrastructure and architecture diagrams with Mermaid, detailed storyboards for LMS (Learning Management System) content, release notes, FRs (Functional Requirements), etc.
Write detailed software design documents using Figma designs as input and the cheap OCR capabilities of Gemini Flash to annotate them, then the excellent reasoning capabilities of Gemini Pro to generate storyboards and interaction flows.
Record video sessions, transcribe them with Whisper, and turn the transcripts into training materials using Flemma.
Generate client-facing documentation from very technical input, stripping it of technical jargon and making it accessible to a wider audience.
Create multiple SOW (Statement of Work) documents for clients.
Keep track of evolving requirements and decisions by maintaining a long history of meeting minutes.
Collect large swaths of emails, meeting minutes, Slack conversations, Trello cards, and distill them into actionable tasks and project plans.
As a tool for other AI agents – generate prompts for Midjourney, Reve, etc. and even prompts that I'd feed to different .chat buffers in Flemma.

There really is no limit to what you can do with Flemma – if you can write it down and reason about it, you can use Flemma to help you with it.

On a personal level, I've used Flemma to generate bedtime stories with recurring characters for my kids, make small financial decisions based on collected evidence, ask for advice on how to respond to difficult situations, consult it for legal advice (usual disclaimer, blah blah), and much more.

Troubleshooting Checklist

Nothing happens when I send: confirm the buffer name ends with .chat and the first message starts with @You: or @System:.
Frontmatter errors: notifications list the exact line and file. Fix the error and resend; Flemma will not contact the provider until the frontmatter parses cleanly.
Attachments ignored: ensure the file exists relative to the .chat file and that the provider supports its MIME type. Use ;type= to override when necessary.
Temperature ignored: when thinking is enabled (default "high"), Anthropic and OpenAI disable temperature. Set thinking = false if you need temperature control.
Vertex refuses requests: double-check parameters.vertex.project_id and authentication. Run gcloud auth application-default print-access-token manually to ensure credentials are valid.
Tool execution doesn't respond: make sure the cursor is on or near the **Tool Use:** block. Only tools with registered executors can be run – check :lua print(vim.inspect(require("flemma.tools").get_all())).
Keymaps clash: disable built-in mappings via keymaps.enabled = false and register your own :Flemma commands.
Sandbox blocks writes: If a tool reports "permission denied" on a path you expect to be writable, run :Flemma status (or :Flemma sandbox:status) and verify the path is inside rw_paths. Add it to sandbox.policy.rw_paths or disable sandboxing to troubleshoot.
Cross-buffer issues: Flemma manages state per-buffer. If something feels off after switching between multiple .chat buffers, ensure each buffer has been saved (unsaved buffers lack __dirname for path resolution).

License

Flemma is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

Happy prompting!
