v100 is my engine for agentic research.
I use it to build, run, study, and evolve autonomous coding agents under real constraints. The core of the project is a Go-based agent runtime with a CLI, TUI, tool safety controls, durable memory, trace replay, benchmarking, eval, policy evolution, and long-running execution paths. Research is one feature of the system, but not the definition of it.
I built v100 to close the loop between idea, execution, observation, and iteration.
In practice, that means:
- running interactive agent sessions against real workspaces
- resuming and replaying runs to understand what happened
- evaluating runs, comparing them, and distilling traces into training data
- evolving agent policies against benchmark suites
- keeping durable memory and retrieval close to the agent runtime
- experimenting with tool use, safety boundaries, and provider routing
- running autonomous research loops when I want the system to execute experiments on its own
The research command is one feature inside that engine. It matters, but it is not the center of gravity of the repo.
The high-level mental model is:
- `cmd/v100/` contains the CLI surface
- `internal/core/` contains the run loop, solvers, tracing, checkpoints, and research orchestration
- `internal/tools/` contains built-in tools the agent can call
- `internal/providers/` wraps model backends behind one interface
- `internal/eval/` contains scoring, analysis, benchmarks, and experiment support
- `internal/memory/` contains durable memory and vector storage
- `internal/ui/` contains the terminal UI pieces
- a few Python files remain for experiment targets, but they are not the center of the system
The CLI surface is fairly broad now. The commands I reach for most often are:
- `v100 run` - start an agent run
- `v100 resume <run_id>` - resume a previous run
- `v100 replay <run_id>` - inspect a run trace as a transcript
- `v100 runs` - browse recent runs
- `v100 memory ...` - inspect and manage durable memory
- `v100 research --config research.toml` - run the autonomous research loop
- `v100 bench run <bench.toml>` - run benchmark suites
- `v100 analyze`, `v100 eval`, `v100 metrics`, `v100 diff`, `v100 verify` - inspect run behavior and outcomes
- `v100 evolve ...` - mutate and benchmark agent policy
- `v100 compress <run_id>` - force-compress long run histories
- `v100 wake ...` - run recurring autonomous wake cycles
Recent work has been concentrated in three areas:
- interactive reliability: fixing CLI confirmation freezes and raw-tty edge cases
- unattended execution: `--continuous` on `run` and `resume` for longer hands-off sessions
- retrieval and external context: ATProto indexing/recall and direct `user_posts` fetching from a user's PDS
That direction matches how I use the tool: longer runs, less babysitting, better recall, better observability.
Prebuilt releases are published on GitHub for Linux, macOS, and Windows. The release page also includes `checksums.txt`.
Installer scripts:
- macOS / Linux:

  ```sh
  curl -fsSL https://raw.githubusercontent.com/tripledoublev/v100/main/scripts/install.sh | bash
  ```

- Windows PowerShell:

  ```powershell
  irm https://raw.githubusercontent.com/tripledoublev/v100/main/scripts/install.ps1 | iex
  ```
If you prefer to build from source:
```sh
go build ./cmd/v100
```

That gives you a local `v100` binary built from the current checkout.
```sh
v100 config init
v100 doctor
```

That writes the default config to the XDG config path and checks the local setup.
```sh
v100 run --provider codex --workspace .
```

Add `--tui` if you want the Bubble Tea interface instead of plain CLI streaming.
If you want unattended multi-step execution:
```sh
v100 run --provider codex --workspace . --continuous
```

To understand what a run did afterwards:

```sh
v100 runs
v100 resume <run_id>
v100 replay <run_id>
v100 metrics <run_id>
```

`v100 research` is the subsystem for autonomous experiment loops.
It lets me define:
- the target file and context for the agent
- the experiment command and metric to parse
- setup and collect hooks for remote execution
- local or provider-backed compute
- round budgets and optional tracking integration
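A config covering those pieces might look roughly like the sketch below. The key names here are illustrative only, not the actual `research.toml` schema:

```toml
# Illustrative sketch — actual key names and layout may differ.
[experiment]
target = "train.py"          # target file the agent works on
context = ["README.md"]      # extra context files
command = "python train.py"  # experiment command to execute
metric = "val_loss"          # metric parsed from the command's output

[hooks]
setup = "scripts/setup.sh"     # setup hook for remote execution
collect = "scripts/collect.sh" # collect hook for results

[compute]
backend = "local"            # local or provider-backed compute

[budget]
rounds = 10                  # round budget for the loop
```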
That is useful when I want the system to drive experiments on its own, but it remains one capability inside the broader engine.
v100 treats tools as a first-class part of the runtime.
- tools are registered centrally and exposed to the model with schemas
- tools can be marked safe or dangerous
- dangerous tools can require confirmation
- reflective steps can be inserted before risky actions
- traces, checkpoints, and replay make tool behavior inspectable after the fact
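The safe/dangerous split with a confirmation gate can be sketched in a few lines of Go. This is not v100's actual API; the `Tool`, `Registry`, and `confirm` names below are invented for illustration:

```go
package main

import "fmt"

// Tool tags an action with a safety level. Illustrative only,
// not v100's real tool type.
type Tool struct {
	Name      string
	Dangerous bool
	Run       func(args string) (string, error)
}

// Registry holds tools and gates dangerous ones behind a confirm hook.
type Registry struct {
	tools   map[string]Tool
	confirm func(name string) bool
}

func NewRegistry(confirm func(string) bool) *Registry {
	return &Registry{tools: map[string]Tool{}, confirm: confirm}
}

func (r *Registry) Register(t Tool) { r.tools[t.Name] = t }

// Call refuses to run a dangerous tool unless the confirm hook approves.
func (r *Registry) Call(name, args string) (string, error) {
	t, ok := r.tools[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	if t.Dangerous && !r.confirm(name) {
		return "", fmt.Errorf("tool %q requires confirmation", name)
	}
	return t.Run(args)
}

func main() {
	// Deny-all confirm hook: every dangerous call is blocked.
	reg := NewRegistry(func(name string) bool { return false })
	reg.Register(Tool{
		Name:      "rm_rf",
		Dangerous: true,
		Run:       func(args string) (string, error) { return "removed " + args, nil },
	})
	_, err := reg.Call("rm_rf", "/tmp/scratch")
	fmt.Println(err) // the dangerous call is blocked
}
```

The point of the pattern is that the safety decision lives in the registry, not in each tool, so the same tool set can run interactively (prompting) or unattended (deny or allow by policy) just by swapping the hook.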
This is one of the main reasons I use the project: I want agent autonomy, but I also want to see exactly how it behaves when the environment gets messy.
The harness supports multiple model backends behind one interface, including:
- Codex
- OpenAI
- Anthropic
- Gemini
- GLM
- MiniMax
- Ollama
- llama.cpp
There is also separate embedding-provider support for retrieval tools, so I do not have to use the same backend for chat and vector indexing.
```
cmd/v100/           CLI commands
internal/core/      loop, solvers, tracing, checkpoints, research
internal/tools/     tool implementations
internal/providers/ provider adapters
internal/eval/      scoring, benchmarks, experiments, analysis
internal/memory/    durable memory and vector stores
internal/ui/        terminal UI components
docs/               architecture notes and issue packs
research.toml       research loop configuration
```
This is not meant to be a polished general-purpose framework in the abstract. It is my working engine for agentic research. I use it to try ideas quickly, keep the sharp edges visible, and evolve the system in public through actual use.
That means the repo sometimes carries a mix of:
- serious runtime and eval infrastructure
- rough-edged experimental features
- tooling that exists because I needed it last week
I think that is the right shape for this project.
The highest-value areas to keep pushing on next are:
- provider and tool integration reliability under long unattended runs
- README and docs alignment so the public surface matches the actual product
- eval and benchmark coverage for new runtime behaviors
- research-loop ergonomics for remote and cloud-backed experiments
- memory and retrieval quality, especially around external context sources
MIT
