Large Language Models

This page is a collection of notes and links related to large language models (LLMs), their applications, and the underlying technology. It serves as a reference for understanding the current state of LLMs, their capabilities, and their limitations, and is the result of my cleaning up the main page and splitting it into more manageable sections. It is not exhaustive, but it should provide a good starting point for anyone interested in the topic.

Models

Interesting models I’ve come across, off the mainstream beaten path:

Field Category Date Link Notes
Large Language Models Function Calling 2024 functionary

can interpret and execute functions/plugins

Octopus-v2

a model designed for both function calling and on-device inference

Multi-modal Models Models 2023 ml-ferret

a multi-modal model from Apple

Small Language Models 2024 TinyLlama

pretraining of a 1.1B Llama model on 3 trillion tokens.

Tools

Miscellaneous tools and applications that use LLMs, or are related to them in some way:

Field Category Date Link Notes
Agent Systems Agent Framework 2024 pipecat

yet another LLM agent framework

Agent Memory 2025 AgenticMemory

a tool for creating and managing memory in LLMs

Autonomous Agents 2023 Auto-GPT

an attempt to provide ChatGPT with a degree of autonomy

Code Agents 2024 plandex

yet another long-running agent tool for complex coding tasks

Multiagent Simulation TinyTroupe

a multiagent persona simulation

Assistants Code Assistants 2025 aider

a tool that enables pair programming with LLMs to edit code in local git repositories.

tabby

a self-hosted AI coding assistant

ollama-copilot

a proxy that lets you use ollama as a copilot, like GitHub Copilot

anon-kode

a fork of Claude Code for other LLMs

2024 gpt-pilot

a prototype development tool that leverages GPT

llm-vscode

a VSCode extension that uses llm-ls

emacs-copilot

an Emacs extension for using a local LLM

privy

An open-source alternative to GitHub Copilot that runs locally.

2023 localpilot

a MITM proxy that lets you use the GitHub Copilot extension with other LLMs

Desktop Assistants macOSpilot-ai-assistant

An Electron app for macOS

Personal Assistants khoj

an intriguing personal assistant based on local data

Terminal Assistant 2025 tmuxai

a terminal assistant that leverages tmux

Development App Development Platform 2024 dify

an open-source LLM app development platform with a node-based UX

Flow-based 2023 langflow

a node-based GUI for quick iteration of langchain flows

JavaScript Framework 2024 genaiscript

a JavaScript environment for prompt development and structured data extraction for LLMs.

LLM Programming GPTScript

Natural Language Programming against multiple LLMs

Language Server llm-ls

a local language server that leverages LLMs

NLP Toolkit WordLlama

a lightweight NLP toolkit for tasks like fuzzy-deduplication, similarity, and ranking

Prompt Compression LLMLingua

a tool for compressing prompts with minimal loss of information

Workflow Management burr

a tool for creating and managing LLM workflows

Evaluation Evaluation Platform 2025 opik

an open-source platform for evaluating, testing, and monitoring LLM applications.

Model Evaluation 2023 pykoi

a unified interface for data and feedback collection, including model comparisons

PromptTools

self-hostable tools for evaluating LLMs, vector databases, and prompts

Prompt Evaluation ChainForge

a visual programming environment for benchmarking prompts across multiple LLMs

promptfoo

A tool for testing and evaluating LLM prompt quality.

Infrastructure API Compatibility LocalAI

A local, drop-in replacement for the OpenAI API

API Management BricksLLM

an OpenAI gateway in Go to create API keys with rate limits, cost limits and TTLs

Deployment dalai

An automated installer for LLaMA

Distributed Inference 2024 exo

an intriguing P2P clustering solution for running models across several machines

Edge Inference nitro

a self-hosted inference engine for edge computing with an OpenAI API

2023 TinyChatEngine

A local (edge) inference engine in C++ without any dependencies

GPU Optimization 2024 amd_inference

a tool that enables inference on AMD GPUs

Hardware Optimization ipex-llm

a PyTorch extension for Intel hardware

Inference Engines 2023 a1gpt

A C++ implementation of a GPT-2 inference engine

llama.cpp

A C++ port of Facebook’s LLaMA model. Still requires roughly 240GB of (unoptimized) weights, but can run on a 64GB Mac.

minillm

A GPU-focused Python wrapper for LLaMA

llama-rs

A Rust port of llama.cpp

wyGPT

another C++ local inference tool

Model Runner 2025 ramalama

an alternative to Ollama for locally running models

Model Serving 2024 lorax

a framework that allows users to serve thousands of fine-tuned models on a single GPU

Performance Optimization GPTFast

a set of acceleration techniques

Integration Apple Notes notesollama

a plugin for Apple Notes that uses the Accessibility APIs

IRC Bot ollama-bot

a rudimentary IRC bot that communicates with a local instance of ollama

Slack Bot geppetto

a bot for integrating ChatGPT and DALL-E into Slack

Voice Integration pico-cookbook

Recipes for on-device voice AI and local LLM

Interfaces CLI 2023 chatblade

a CLI wrapper for ChatGPT

CLI Markdown 2024 mark

CLI to interact with LLMs using markdown and images

Terminal Interface oterm

a terminal-based interface for LLMs

Text Generation UI koboldcpp

easy-to-use AI text-generation software for GGML and GGUF models, based on llama.cpp

Web Interface open-webui

a web-based interface for LLMs

2023 chatbot-ui

a more or less sensibly designed self-hosted ChatGPT UI

Serve

A containerized solution for using local LLMs via web chat

Knowledge Systems Data Processing 2024 nlm-ingestor

a set of parsers for common file formats

Database RAG korvus

a search SDK that unifies the entire RAG pipeline in a single database query

Document Processing 2025 yek

a tool for importing and chunking text files for RAG

Knowledge Graph RAG 2024 GraphRAG

a data pipeline designed to pre-process knowledge graphs and perform RAG on them

Note-Taking RAG reor

a note taking tool that performs RAG using a local LLM

RAG 2023 embedchain

another framework to create bots from existing datasets

content-chatbot

A way to quickly create custom embeddings off a web site

RAG Framework 2024 R2R

a framework for rapid development and deployment of production-ready RAG systems with SQLite support

llmware

a framework for developing LLM-based applications including Retrieval Augmented Generation

Research Assistant storm

a tool that researches a topic and generates a full-length report with citations

Search Engine Perplexica

a Perplexity AI search engine clone

Search Tools LLocalSearch

a local tool for searching using LLMs

Media Generation Audio Stories fably

A device that tells bedtime stories to kids, using chunked TTS

Data Visualization lida

automatic generation of visualizations and infographics

Image Generation local-image-gen

A GPTScript tool to generate images

Model Management Format Manipulation gguf-tools

a set of tools for manipulating GGUF format files

Layer Visualization NeuralFlow

a Python script for plotting the intermediate layer outputs of Mistral 7B

Low-Level Training llm.c

LLM training in simple, raw C/CUDA

Model Optimization hqq

an implementation of Half-Quadratic Quantization (HQQ)

2023 GPTQ-for-LLaMa

a way to quantize the LLaMA weights to 4-bit precision

Training simple-llama-finetuner

A way to do LoRA adaptation of LLaMA

alpaca-lora

Another way to do LoRA adaptation of LLaMA

Other Resources

Other resources related to LLMs, including articles, papers, and websites:

Field Category Date Link Notes
Large Language Models Algorithms 2024 word-embedding

an implementation of word2vec skip-gram for word embedding.

Applications 2023 Chie

A cross-platform desktop application with chat history and extension support

Copilots Obsidian Copilot

an interesting take on how to use semantic search and OpenSearch’s BM25 implementation

Demos 2024 WhisperFusion

an ensemble setup with WhisperSpeech, WhisperLive and Phi

Frameworks 2025 FlashLearn

a simple interface for incorporating Agent LLMs

agno

a lightweight library for building Multimodal Agents

2023 litellm

a simple, lightweight LLM wrapper

AutoChain

Yet another alternative to langchain

Tanuki

yet another LLM framework using decorators for data validation

griptape

a langchain alternative with slightly better internal coding standards

guidance

Control modern language models more effectively and efficiently than traditional prompting or chaining.

langchain

a composable approach for building LLM applications

llama_index

a data framework for LLM applications

txtai

has spinoffs for chat, workflows for medical/scientific papers, semantic search for developers and semantic search for headlines and story text

llmflows

Yet another alternative to langchain, but with an interesting approach at defining workflows

fabric

a componentized approach to building LLM pipelines

Front-Ends 2024 jan

an open-source ChatGPT alternative that runs 100% offline (uses nitro)

2023 SecureAI-Tools

a self-hosted local inference front-end for chatting with document collections

gpt4all

another self-hosted local inference front-end

Jupyter LLMBook

A VS Code notebook interface for LLMs

jupytee

a Jupyter plugin that can handle code generation and image generation, but not switching models (GPT-4)

genai

a Jupyter plugin that can handle code generation and fixes based on tracebacks

ipython-gpt

a Jupyter plugin that can handle multiple models

Libraries 2025 mlx-lm

a Python package for serving large language models on Apple silicon.

2024 chonkie

a lightweight library for efficient text chunking in RAG applications.

databonsai

a Python library that uses LLMs to perform data cleaning

DataDreamer

library for prompting, synthetic data generation, and training workflows

radients

a vectorization library that can handle more than just text

magentic

decorators to create functions that return structured output from an LLM.

2023 guardrails

a package for validating and correcting the outputs of large language models

MemGPT

a memory management/summarization technique for unbounded context

instructor

a clever library that simplifies invoking OpenAI function calls

simpleaichat

A simple wrapper for the ChatGPT AI

Reference 2024 sqlite-hybrid-search

an example of how to do hybrid (vector and FTS) search with SQLite for RAG

2023 Native JSON Output from GPT-4

tips on how to use OpenAI JSON and function calling

Using LLaMA with M1 Mac

Manual instructions for Apple Silicon

Prompt Engineering Guide

a set of lecture notes and detailed examples of prompting techniques

awesome-decentralized-llm

a collection of LLM resources that operate independently

GPT Prompt Archive

A set of sample base prompts for various LLMs

promptbase

Another set of prompting techniques and detailed examples

2022 awesome-chatgpt-prompts

might be a short-lived resource, but an interesting one

Samples 2025 smolGPT

A minimal PyTorch implementation for training your own small LLM

2024 SimpleTinyLlama

a simple PyTorch-based implementation

devlooper

a program synthesis agent that autonomously fixes its output by running tests

2023 gpt-researcher

a simple agent that does online research on any given topic

David Attenborough narrates your life

A pretty hilarious image-to-description example

LibreChat

A self-hosted ChatGPT alternative

sharepoint-indexing-azure-cognitive-search

provides an example of how to use Graph navigation and Cognitive Search indexing

gpt4all

open-source LLM chatbots

Demystifying Advanced RAG Pipelines

An LLM-powered advanced RAG pipeline built from scratch

Wanderlust OpenAI example using Solara

A simple interactive web shell with some nice features

GPT in 60 Lines of NumPy

a tutorial on how to build a GPT model from scratch

Bash One-Liners for LLMs

a collection of one-liners for various LLMs

Tools 2025 letta

a tool for creating and managing memory-backed agents

Vector Databases 2023 chroma

an embedding database

vectordb

A simple vector database that can run in-process

marqo

A vector database that performs vector generation internally

USearch

A Single-File Vector Search Engine

Workflows danswer

a pretty complete GPT/search integration solution with GitHub, Slack and Confluence/JIRA connectors

Samples Large Language Models 2025 tiny-llm

A tutorial on LLM serving using MLX
