Making LLMs Do More of What You Want

Fri Nov 1, 2024

Once we can make LLMs do what we want, we might want to formalize this and scale it up. We've got

generate :: SystemPrompt -> Prompt -> Response
generateChecked :: (String -> Maybe a) -> SystemPrompt -> Prompt -> ?Int -> Maybe a

as a baseline. And, sure we have a vague sketch of something called generateCSSSelector, which is interesting only in a very narrow sense.

Ok, so what's next?

Suppose that, in addition to being able to do String -> String prompts the way that generate does and String -> Maybe a prompts the way that generateChecked, you want your model to be able to call some set of functions that you want to extend to it.

type Env :: Map Name Function

Compiler writers, are you with me here?

transform :: Env -> String -> Maybe (Function, Args)
transform (env, result) = 
  let parsed = maybeJsonParse result
  in case Just json => if (json.functionName and json.args 
                           and json.functionName in env 
                           and (validArgsFor env[json.functionNameame] json.args))
                       then Just (env[json.functionName], json.args)
                       else Nothing
          Nothing => Nothing

I'd bet Schemers, Clojurers and Common Lispers know where this is going too.

define :: Env -> Name -> (Args -> ResMap)
define (env, name, fn) = assoc env name (fn ,args ,@body)

generateToolCall :: Env -> Prompt -> Maybe (Function, Args)
generateToolCall env prompt = 
   let sysprompt = """
   You are a computer specialist. 
   Your job is translating client requests into tool calls.
   Your client has sent a request to use a tool; return the function 
  call corresponding to the request and no other commentary. Return a value 
  of type `{"functionName" :: string, "args" :: {arg_name: arg value}}`. You 
  have access to the tools: {map #(%k, typeSig %v, docstring %v) env}.
"""
   in generateChecked transform sysprompt prompt

Looking at it from out here, this is almost too trivial to bother writing. But in effect, what we've got is a pluggable, fully generalizable toolkit that gives any sufficiently smart model access to tool capabilities. call really is too trivial to bother writing in the notional language we've got; if I had to I'd say something like call = funcall. Which tells you Everything you need to know if you've worked with enough languages, and also, exactly nothing if you didn't. The big point of flexibility that I'm insisting on here is that you can swap out different environments in order to keep your models restricted to a (hopefully, if you've done your job) known-safe set of function bindings.

Python

So lets step down from the realm of notional pseudocode and grab the snake by the tail and head simultaneously.

def generateToolCall(tools, llm, prompt):
    sysprompt = f'You are a computer specialist. Your job is translating client requests into tool calls. Your client has sent a request to use a tool; return the function call corresponding to the request and no other commentary. Return a value of type `{{"functionName" :: string, "args" :: {{arg_name: arg value}} }}`. You have access to the tools: {tools.list()}.'

    return llm.generate_checked(tools.transform, sysprompt, prompt)

There's the head. You take a tools environment, and an llm, and a prompt describing something that asks for a tool call, and you return the tool call. That definition tells us that tools is going to need the methods list and transform at minimum.

class Tools:
    def __init__(self):
        self.env = {}

You know what's up here. An environment is a dictionary. Duh.

    def define(self, toolName, toolFunction):
        assert (
            toolFunction.__annotations__ is not None
        ), "Interned functions must be annotated"
        assert toolFunction.__doc__, "Interned functions must have docstrings"
        if toolName in self._env:
            return False
        self._env[toolName] = {
            "type": {
                k: v
                for k, v in toolFunction.__annotations__.items()
                if not k == "return"
            },
            "function": toolFunction,
        }
        return True

define is clunkier than I'd like, but I mean, what am I supposed to do here? We take a name and a function (and use Python internal methods to assert that it has type annotations and documentation, because those things make it easier to spit at a model). Realistically, I could give it optional type and description so that you can override the given functions' __annotations__ and __doc__, and I could give __name__ the same treatment so that you could pass in lambdas if you really wanted to, even though they're awful in Python. That's about it though.

Honestly, all this definition is doing is reminding me how much simpler this code would be over in Clojure-land. Where I might still put it eventually.

    def list(self):
        return [
            {
                "name": k,
                "type": {
                    k: v
                    for k, v in v["function"].__annotations__.items()
                    if not k == "return"
                },
                "description": v["function"].__doc__,
            }
            for k, v in self.env.items()
        ]

One of the implied methods from earlier. This is why we need to ensure documentation and type annotations; it gives the target model more info to work with.

    def validate(self, tool_call):
        if (
            "functionName" in tool_call
            and "args" in tool_call
            and tool_call["functionName"] in self._env
        ):
            f = self._env[tool_call["functionName"]]
            if not set(tool_call["args"].keys()).difference(f["type"].keys()):
                return True
        return False

    def transform(self, resp):
        parsed, success = loadch(resp)
        if not success:
            return None, False
        if self.validate(parsed):
            return parsed, True
        return None, False

the original pseudocode transform got split up here. Mostly, because I'm going to be a little paranoid and use validate again inside of call. Still, you can see what's up here.

Transform takes a string, tries to JSON parse it using the loadch function we defined last time. If it fails, we bail. Otherwise, we validate the result. If that succeeds, then we have a valid tool_call that we can call with confidence, assuming we've safely defined the underlying function.

validate itself does exactly what the pseudo implied earlier; we check that it's a dict with a functionName and an args, check that the functionName references something in our env, and that the thing it references has the corresponding argument list. If any of that fails, False, otherwise True.

    def call(self, tool_call):
        if self.validate(tool_call):
            return self.env[tool_call["functionName"]]["function"](**tool_call["args"])
        return None

Bam it's a one-liner. In a lisp-like, this would just be funcall, or possibly not even a function at all, just a pair of parens marking it as something to evaluate. Also, technically, this is a Maybe <whatever type your function returns> (note that we return None in the case that the validate call fails).

Don't take the code too seriously in its' current form. I don't think I'm going to keep it precisely the way it is now, but the interface is there and any changes are likely to be cosmetic or QoL-enabling. Check the docs before building anything out of it.

The Upshot

So what's the point of all this?

>>> from typing import Optional, List
>>> def _screenshot(url: str, selectors: Optional[List[str]] = None) -> None:
    "Takes a url and an optional list of selectors. Takes a screenshot"
    print(f"GOT {url}, {selectors}!")
... ... ... 
>>>
>>> from trivialai import tools, ollama
>>> tls = tools.Tools()
>>> tls.define("screenshot", _screenshot)
True
>>> tls.list()
[{'name': 'screenshot', 'type': {'url': <class 'str'>, 'selectors': typing.Optional[typing.List[str]]}, 'description': 'Takes a url and an optional list of selectors. Takes a screenshot'}]
>>> client = ollama.Ollama("gemma2:2b", "http://localhost:11434/")
>>> tools.generate_tool_call(tls, client, "Take a screenshot of the Google website and highlight the search box")
LLMResult(raw=<Response [200]>, content={'functionName': 'screenshot', 'args': {'url': 'https://www.google.com/', 'selectors': ['#search']}})
>>> res = _
>>> res.content
{'functionName': 'screenshot', 'args': {'url': 'https://www.google.com/', 'selectors': ['#search']}}
>>> tls.call(res.content)
GOT https://www.google.com/, ['#search']!
>>>

There.

If you followed this far, I think you know exactly where I'm going.

As always, I'll let you know how it goes.