Notes on Tiny LLMs

I’ve been updating my site quite frequently over the past few months, but haven’t really had time to experiment much with local LLMs, partly because the landscape is cluttered with brittle code, and partly because I’m not really looking to replace gpt-3.5-turbo now that I can run my own endpoint on Azure (which already powers daily RSS summaries and other simple things I care about).

But it’s worth noting that I’ve been playing with Phi-2 for a bit (I forked a serendipitously simple MLX wrapper to run it on my Macs), and even though it requires a fair amount of additional prompting (it’s a pretty raw model), it handles some general tasks pretty well (like the aforementioned summarization).
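As a rough illustration of the kind of extra prompting a raw model like Phi-2 needs, a minimal template helper might look like the sketch below. The `Instruct:/Output:` framing follows the format suggested in the Phi-2 model card; the helper name and wording are my own, not from any particular wrapper.

```python
def phi2_prompt(task: str, text: str) -> str:
    """Wrap a task and its input in the instruction-style template
    Phi-2 responds to best, since the base model has no chat tuning."""
    return f"Instruct: {task}\n{text}\nOutput:"

# Example: framing a summarization request for an RSS item.
prompt = phi2_prompt(
    "Summarize the following article in two sentences.",
    "MLX is Apple's array framework for machine learning on Apple silicon.",
)
```

The resulting string is then what gets passed to the wrapper's generate call; without this scaffolding the model tends to just continue the input text rather than follow the request.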

And given the way it was trained, I guess I was right when I wrote that long-term curation of model inputs was key, especially when it comes to creating smaller, more optimized models.
