This week I spent a fair bit of time watching Microsoft Build recordings–partly because it has some impact on work, and partly because it was brimming with AI stuff I can actually use.
And yes, you need to cherry-pick. Here are a few I'm taking as inspiration:
- Enhancing VS Code extensions with GitHub Copilot (source code) – I’ve been thinking of writing a private VS Code extension to extend Copilot, and this shows how to take editor context and handle it with an LLM.
- Practical End-to-End AI Development using Prompty and AI Studio (docs) – this is promising, since I’ve been managing my prompts in YAML files, and something that normalizes both structure and tooling is quite welcome.
- Guidance: Make your Models Behave (project page) – something I ended up playing with again (more on this later).
The no-code stuff was also pretty impressive, but I can’t really use it in my own projects…
Prompting Phi-3 Successfully
I finally had some success at getting phi3:instruct to do function calling with ollama, which is great because it can be so much faster than llama3 on restricted hardware.
The system prompt structure I arrived at for moderately effective RAG is rather convoluted, though, and I had to cobble it together from various sources:
You are an AI assistant that can help the user with a variety of tasks. You have access to the functions provided by the schema below:
<|functions_schema|>
[
  {
    "name": "duckduckgo",
    "description": "searches the Internet",
    "parameters": [
      { "name": "q", "type": "string" }
    ],
    "required": [ "q" ],
    "returns": [
      { "name": "results", "type": "list[string]" }
    ]
  }
]
<|end_functions_schema|>
When the user asks you a question, if you need to use functions, provide ONLY ALL OF THE function calls, ALL IN ONE PLACE, in the format:
<|function_calls|>
[
{ "name": "function_name", "kwargs": {"kwarg_1": "value_1", "kwarg_2": "value_2"}, "returns": ["output_1"]},
{ "name": "other_function_name", "kwargs": { "kwarg_3": "$output_1$"}, "returns": ["output_2", "output_3"]},
...
]
<|end_function_calls|>
IF AND ONLY IF you don't need to use functions, give your answer in between <|answer|> and <|end_answer|> blocks. For your thoughts and reasoning behind using or not using functions, place ALL OF THEM in between a SINGLE <|thoughts|> and <|end_thoughts|> block, before the <|function_calls|> and <|end_function_calls|> tags, like so:
<|thoughts|>
The user wants X, to do that, I should call the following functions:
1. function_name: Reasoning,
2. function_name_2: Reasoning2,
3. etc.
<|end_thoughts|>
Provide nothing else than the information in the <|function_calls|> & <|end_function_calls|>, <|answer|> & <|end_answer|> and <|thoughts|> & <|end_thoughts|> blocks.
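To try this out from plain Python, something like the following is enough (a minimal sketch using the official ollama client; SYSTEM_PROMPT and the question are just placeholders):
# minimal sketch: send the system prompt above to phi3:instruct via ollama
# (assumes the official ollama Python client; SYSTEM_PROMPT stands in for the prompt shown above)
import ollama

SYSTEM_PROMPT = "..."  # the prompt shown above

response = ollama.chat(
    model="phi3:instruct",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is the tallest building in Lisbon?"},
    ],
)
print(response["message"]["content"])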
phi3 still hallucinates every now and then, but the block approach made it easier to parse the outputs, and chaining function calls using Node-RED contexts is relatively easy.
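Parsing those blocks boils down to a couple of regular expressions. Here's a rough sketch of the idea (extract_block and parse_reply are just illustrative names):
# rough sketch: pull the tagged blocks out of the model reply
import json
import re

def extract_block(text, tag):
    # grab whatever sits between <|tag|> and <|end_tag|>, if anything
    match = re.search(rf"<\|{tag}\|>(.*?)<\|end_{tag}\|>", text, re.DOTALL)
    return match.group(1).strip() if match else None

def parse_reply(text):
    calls = extract_block(text, "function_calls")
    return {
        "thoughts": extract_block(text, "thoughts"),
        "answer": extract_block(text, "answer"),
        # the function calls come back as a JSON list (when the model behaves)
        "calls": json.loads(calls) if calls else [],
    }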
Still, I keep looking for ways to make prompt generation, function chaining and general workflows simpler, especially since this approach doesn't really let me restrict outputs to a predefined set of options or do other things that improve reliability when chaining actions.
Guidance
My prompting woes… prompted me (ahem) to take another look at guidance, which promises to solve a lot of those issues.
However, as it turns out, I can’t really use it for my edge/ARM scenarios yet–this is because ollama support is pretty much half-baked (in all fairness, guidance tries to do token manipulation directly, so it really relies on direct access to the model, not APIs).
But it is interesting for doing general-purpose development, even with local models–there just isn’t really a lot of usable documentation, so it took me a bit to get it to work with Metal on macOS.
Here’s all you need to know, using their minimal sample:
from guidance import gen, select
from guidance.models import Transformers
from torch.backends import mps
from os import environ

MODEL_NAME = environ.get("MODEL_NAME", "microsoft/Phi-3-mini-4k-instruct")

device_map = None
if mps.is_available():
    device_map = "mps"

phi3 = Transformers(MODEL_NAME, device_map=device_map)

# capture our selection under the name 'answer'
lm = phi3 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='answer')}.\n"

# make a choice based on the model's previous selection
if lm["answer"] == "joke":
    lm += "Here is a one-line joke about cats: " + gen("output", stop="\n")
else:
    lm += "Here is a one-line poem about dogs: " + gen("output", stop="\n")
And this is the minimal set of packages I needed to install to have a usable sandbox:
guidance==0.1.15
transformers==4.41.1
sentencepiece==0.2.0
torch==2.3.0
torchvision==0.18.0
accelerate==0.30.1
#litellm
#jupyter
#ipywidgets
I also had another go at using promptflow, but it too is tricky to use with local models.
But I have more fundamental things to solve–for instance, I’m still missing a decent in-process vector database for constrained environments. Can’t wait for sqlite-vec to ship.