Jun 10, 2024

So you want to build an Agent?

I, like many others lately, have found myself pulled into the Generative AI vortex. I kept some distance from this space for a while, only trying out the big chatbots as they came out, but a few months ago I joined a project involving LLMs and needed to learn many things very quickly.

Fast forward, and I found myself getting frustrated with the esoteric nature of Generative AI app development and, to a lesser extent, the Python-centric tooling ecosystem. Python does make sense for many things, as it has a very robust data science and ML ecosystem, but I do not call myself a Pythonista. I’ve been almost exclusively a Go programmer for the better part of a decade, so I wanted to try my hand at building some Generative AI tooling in my language of choice.

I’ve spent a few days exploring the patterns of “Agentic RAG”, which is the latest term for “LLMs using data and tools you provide to perform tasks”. I then spent a free weekend building a tool to create an Agent and/or RAG workflow without requiring any code. I think any developer will agree that after a few years of supporting code you wrote yourself more than a few months ago, writing no code is generally preferable 🙂

So with the goal of making experimentation (and iteration) with Agents and RAG as quick as possible, I wrote a tool called Ragoo.

With Ragoo, you can write a YAML file which defines the various parts of an Agent or RAG workflow, compose those parts together, and run the workflow without touching a programming language.

Pulling data into a Vector DB

importers:
  - name: k8s-files
    type: file
    config:
      directory: /Users/cohix-lab/workspaces/cohix/kubernetes-the-hard-way/docs/
    steps:
      - type: embedder
        ref: ollama/arctic
        action: generate
        params:
          input: $_chunk
        var: embedding

      - type: storage
        ref: duckdb/main
        action: insert.embedding
        params:
          embedding: $embedding
          ref: $_ref
          batch: $_batch
          collection: k8s

This describes an Importer, which pulls documents from a source, chunks them, and then runs a workflow on each chunk. This example runs an Embedder (which generates embeddings, the vector representation of the text), and then inserts the resulting embedding into a database. I’m currently loving DuckDB, and it has built-in vector abilities, so let’s use it.
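To make the chunking step a bit more concrete, here is a minimal Go sketch of one common way to split a document before embedding it: fixed-size word windows with a small overlap. This is not Ragoo's actual chunker (the config above doesn't show how it chunks); `chunkWords` and its `size` and `overlap` parameters are hypothetical names used purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords is a hypothetical helper (not part of Ragoo). It splits text
// into chunks of at most size words, repeating overlap words between
// consecutive chunks so context is not cut off at a chunk boundary.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	if size <= 0 || len(words) == 0 {
		return nil
	}
	// An overlap that is negative or not smaller than size would stall
	// the loop, so fall back to no overlap.
	if overlap < 0 || overlap >= size {
		overlap = 0
	}
	step := size - overlap

	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the last window reached the end of the document
		}
	}
	return chunks
}

func main() {
	doc := "one two three four five six seven"
	for i, c := range chunkWords(doc, 3, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Each chunk would then play the role of `$_chunk` in the embedder step. Chunk size and overlap are two more knobs worth experimenting with, since they affect what the similarity lookup can find later.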

Next up is a RAG workflow:

  - name: k8s-docs
    stages:
      - name: k8s-docs-rag
        steps:
          - type: embedder
            ref: ollama/arctic
            action: generate
            params:
              input: $_input
            var: embedding

          - type: storage
            ref: duckdb/main
            action: lookup.cosine
            params:
              embedding: $embedding
              collection: k8s
              threshold: 0.65
              limit: 2
            var: refs

          - type: importer
            ref: k8s-files
            action: resolve.refs
            params:
              refs: $refs
              seperator: \n
            var: context

          - type: service
            ref: ollama/llama
            action: completion
            params:
              prompt: |
                $context
                ----
                Using the information above, answer the question below in 100 words or less.
                If the answer is not contained entirely within the information provided, reply 'I do not know' without any additional text.
                Only provide an answer to the question, do not summarize all of the information.
                ----
                Question: $_input
            var: _response

This workflow runs the Embedder again (this time on the prompt) and uses the resulting vector to look up Document References in the DuckDB database by cosine similarity. The Importer then resolves the Document References into the files themselves, and a template sends an “augmented prompt” to an LLM (Ollama running Llama3, in this case).
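For intuition about what a cosine lookup with a `threshold` and a `limit` does, here is a small in-memory Go sketch. It is an illustration under assumptions, not how Ragoo or DuckDB implement it: the `Doc` and `Match` types and the `lookup` function are hypothetical, the vectors are toy 3-dimensional values, and treating the threshold as inclusive (`>=`) is a guess.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc and Match are hypothetical types for this sketch only.
type Doc struct {
	Ref       string
	Embedding []float64
}

type Match struct {
	Ref   string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// have different lengths, are empty, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// lookup keeps docs scoring at or above threshold (assumed inclusive),
// sorts them best first, and returns at most limit of them.
func lookup(query []float64, docs []Doc, threshold float64, limit int) []Match {
	var matches []Match
	for _, d := range docs {
		if s := cosine(query, d.Embedding); s >= threshold {
			matches = append(matches, Match{Ref: d.Ref, Score: s})
		}
	}
	sort.Slice(matches, func(i, j int) bool {
		return matches[i].Score > matches[j].Score
	})
	if limit > 0 && len(matches) > limit {
		matches = matches[:limit]
	}
	return matches
}

func main() {
	docs := []Doc{
		{Ref: "a.md", Embedding: []float64{1, 0, 0}},
		{Ref: "b.md", Embedding: []float64{0.8, 0.6, 0}},
		{Ref: "c.md", Embedding: []float64{0, 1, 0}},
		{Ref: "d.md", Embedding: []float64{0.9, 0.1, 0}},
	}
	query := []float64{1, 0, 0}
	// Same values as the workflow above: threshold 0.65, limit 2.
	for _, m := range lookup(query, docs, 0.65, 2) {
		fmt.Printf("%s %.3f\n", m.Ref, m.Score)
	}
}
```

Raising the threshold trades recall for precision: fewer, closer documents end up in the context. The limit caps how much text gets stuffed into the prompt.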

The full example (find it here) imports the documents from Kubernetes the Hard Way and lets you ask questions about them. It works pretty well! One interesting thing I noticed is how much better Llama3 is at answering questions compared to Phi3.

I think this is useful because it lets you easily run experiments with all the different parameters (like cosine similarity threshold, various prompt styles, etc.) without needing to invest a ton of time or effort into learning every nuance of Agentic workflows and the myriad of libraries therein.
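As an illustration of why prompt styles are cheap to swap, the `$context` and `$_input` placeholders in the prompt above can be filled with a few lines of Go. This sketch uses the standard library's `os.Expand` as a stand-in; how Ragoo actually substitutes variables isn't described here, and `render` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// promptTemplate mirrors the prompt from the RAG workflow above.
const promptTemplate = `$context
----
Using the information above, answer the question below in 100 words or less.
If the answer is not contained entirely within the information provided, reply 'I do not know' without any additional text.
Only provide an answer to the question, do not summarize all of the information.
----
Question: $_input`

// render is a hypothetical helper that replaces $name placeholders in tmpl
// with values from vars. Unknown names expand to the empty string, and the
// substituted values themselves are not scanned for further placeholders.
func render(tmpl string, vars map[string]string) string {
	return os.Expand(tmpl, func(name string) string {
		return vars[name]
	})
}

func main() {
	prompt := render(promptTemplate, map[string]string{
		"context": "(resolved document text would go here)",
		"_input":  "How do I bootstrap the etcd cluster?",
	})
	fmt.Println(prompt)
}
```

Trying a different prompt style then means editing only the template text, while the retrieval steps stay the same.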

You can then run the whole thing using the ragoo binary:

go install ./cmd/ragoo/

ragoo ./ragoo.yaml

The application starts up and provides an API endpoint which will run the workflow.

I think Agents are a cool idea, somewhat like a large state machine with super indeterminate state. It’s a fun problem, and can result in some really cool capabilities.

Let me know what you think (connor [at] cohix.network or LinkedIn). I’m going to continue weekend hacking on Ragoo, specifically adding more plugins and looking at adding Tools (and experimenting with tool_use). I also want Ragoo to be useful as a Go package, which would allow you to experiment with an idea using YAML, and then “drop down” into code if you want to take the idea further.