This page is a collection of notes and links related to large language models (LLMs), their applications, and the underlying technology. It serves as a reference for understanding the current state of LLMs, their capabilities, and their limitations; it is the result of my cleaning up the main AI page and splitting it into more manageable sections. It is not exhaustive, but it should provide a good starting point for anyone interested in the topic.
## Models
Interesting models I've come across, off the mainstream beaten path:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Large Language Models | Function Calling | 2024 | functionary | can interpret and execute functions/plugins |
| | | | Octopus-v2 | a model designed for both function calling and on-device inference |
| Multi-modal Models | Models | 2023 | ml-ferret | a multi-modal model from Apple |
| Small Language Models | | 2024 | TinyLlama | pretraining of a 1.1B Llama model on 3 trillion tokens |
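The function-calling entries above (functionary, Octopus-v2) describe models trained to emit a structured call instead of free text: the application advertises JSON schemas for its functions, and the model replies with a function name and JSON-encoded arguments that the application dispatches. A minimal sketch of the application side, loosely following the OpenAI-style tool schema convention; `get_weather` and the registry are hypothetical illustrations, not part of any listed project:

```python
import json

# Tool schema advertised to the model (OpenAI-style convention);
# get_weather is a hypothetical example function.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in implementation; a real tool would call a weather API.
    return f"Sunny in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    fn = REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

# A function-calling model replies with structured output like this
# instead of prose:
model_reply = {"name": "get_weather", "arguments": '{"city": "Berlin"}'}
print(dispatch(model_reply))  # prints "Sunny in Berlin"
```

The interesting part is the round trip: the schema constrains what the model may emit, and the dispatcher is the only place where model output touches real code.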
## Tools
Miscellaneous tools and applications that use LLMs, or are related to them in some way:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Agent Systems | Agent Framework | 2024 | pipecat | yet another LLM agent framework |
| | Agent Memory | 2025 | AgenticMemory | a tool for creating and managing memory in LLMs |
| | Autonomous Agents | 2023 | Auto-GPT | an attempt to provide ChatGPT with a degree of autonomy |
| | Code Agents | 2024 | plandex | yet another long-running agent tool for complex coding tasks |
| | Multiagent Simulation | | TinyTroupe | a multiagent persona simulation |
| Assistants | Code Assistants | 2025 | aider | a tool that enables pair programming with LLMs to edit code in local git repositories |
| | | | tabby | a self-hosted AI coding assistant |
| | | | ollama-copilot | a proxy that allows you to use ollama as a copilot, like GitHub Copilot |
| | | | anon-kode | a fork of Claude-Coder for other LLMs |
| | | 2024 | gpt-pilot | a prototype development tool that leverages GPT |
| | | | llm-vscode | a VSCode extension that uses llm-ls |
| | | | emacs-copilot | an Emacs extension for using a local LLM |
| | | | privy | an open-source alternative to GitHub Copilot that runs locally |
| | | 2023 | localpilot | a MITM proxy that lets you use the GitHub Copilot extension with other LLMs |
| | Desktop Assistants | | macOSpilot-ai-assistant | an Electron app for macOS |
| | Personal Assistants | | khoj | an intriguing personal assistant based on local data |
| | Terminal Assistant | 2025 | tmuxai | a terminal assistant that leverages tmux |
| Development | App Development Platform | 2024 | dify | an open-source LLM app development platform with a node-based UX |
| | Flow-based | 2023 | langflow | a node-based GUI for quick iteration of langchain flows |
| | JavaScript Framework | 2024 | genaiscript | a JavaScript environment for prompt development and structured data extraction for LLMs |
| | LLM Programming | | GPTScript | natural language programming against multiple LLMs |
| | Language Server | | llm-ls | a local language server that leverages LLMs |
| | NLP Toolkit | | WordLlama | a lightweight NLP toolkit for tasks like fuzzy deduplication, similarity, and ranking |
| | Prompt Compression | | LLMLingua | a tool for compressing prompts with minimal loss of information |
| | Workflow Management | | burr | a tool for creating and managing LLM workflows |
| Evaluation | Evaluation Platform | 2025 | opik | an open-source platform for evaluating, testing, and monitoring LLM applications |
| | Model Evaluation | 2023 | pykoi | a unified interface for data and feedback collection, including model comparisons |
| | | | PromptTools | self-hostable tools for evaluating LLMs, vector databases, and prompts |
| | Prompt Evaluation | | ChainForge | a visual programming environment for benchmarking prompts across multiple LLMs |
| | | | promptfoo | a tool for testing and evaluating LLM prompt quality |
| Infrastructure | API Compatibility | | LocalAI | a local, drop-in replacement for the OpenAI API |
| | API Management | | BricksLLM | an OpenAI gateway in Go to create API keys with rate limits, cost limits, and TTLs |
| | Deployment | | dalai | an automated installer for LLaMA |
| | Distributed Inference | 2024 | exo | an intriguing P2P clustering solution for running models across several machines |
| | Edge Inference | | nitro | a self-hosted inference engine for edge computing with an OpenAI API |
| | | 2023 | TinyChatEngine | a local (edge) inference engine in C++ without any dependencies |
| | GPU Optimization | 2024 | amd_inference | a tool that enables inference on AMD GPUs |
| | Hardware Optimization | | ipex-llm | a PyTorch extension for Intel hardware |
| | Inference Engines | 2023 | a1gpt | a C++ implementation of a GPT-2 inference engine |
| | | | llama.cpp | a C++ port of Facebook's LLaMA model; still requires roughly 240GB of (unoptimized) weights, but can run on a 64GB Mac |
| | | | minillm | a GPU-focused Python wrapper for LLaMA |
| | | | llama-rs | a Rust port of llama.cpp |
| | | | wyGPT | another C++ local inference tool |
| | Model Runner | 2025 | ramalama | an alternative to Ollama for locally running models |
| | Model Serving | 2024 | lorax | a framework that allows users to serve thousands of fine-tuned models on a single GPU |
| | Performance Optimization | | GPTFast | a set of acceleration techniques |
| Integration | Apple Notes | | notesollama | a plugin for Apple Notes that uses the Accessibility APIs |
| | IRC Bot | | ollama-bot | a rudimentary IRC bot that communicates with a local instance of ollama |
| | Slack Bot | | geppetto | a bot for integrating ChatGPT and DALL-E into Slack |
| | Voice Integration | | pico-cookbook | recipes for on-device voice AI and local LLMs |
| Interfaces | CLI | 2023 | chatblade | a CLI wrapper for ChatGPT |
| | CLI Markdown | 2024 | mark | a CLI to interact with LLMs using markdown and images |
| | Terminal Interface | | oterm | a terminal-based interface for LLMs |
| | Text Generation UI | | koboldcpp | easy-to-use AI text-generation software for GGML and GGUF models, based on llama.cpp |
| | Web Interface | | open-webui | a web-based interface for LLMs |
| | | 2023 | chatbot-ui | a more or less sensibly designed self-hosted ChatGPT UI |
| | | | Serve | a containerized solution for using local LLMs via web chat |
| Knowledge Systems | Data Processing | 2024 | nlm-ingestor | a set of parsers for common file formats |
| | Database RAG | | korvus | a search SDK that unifies the entire RAG pipeline in a single database query |
| | Document Processing | 2025 | yek | a tool for importing and chunking text files for RAG |
| | Knowledge Graph RAG | 2024 | GraphRAG | a data pipeline designed to pre-process knowledge graphs and perform RAG on them |
| | Note-Taking RAG | | reor | a note-taking tool that performs RAG using a local LLM |
| | RAG | 2023 | embedchain | another framework to create bots from existing datasets |
| | | | content-chatbot | a way to quickly create custom embeddings from a website |
| | RAG Framework | 2024 | R2R | a framework for rapid development and deployment of production-ready RAG systems, with SQLite support |
| | | | llmware | a framework for developing LLM-based applications, including retrieval-augmented generation |
| | Research Assistant | | storm | a tool that researches a topic and generates a full-length report with citations |
| | Search Engine | | Perplexica | a Perplexity AI search engine clone |
| | Search Tools | | LLocalSearch | a local tool for searching using LLMs |
| Media Generation | Audio Stories | | fably | a device that tells bedtime stories to kids, using chunked TTS |
| | Data Visualization | | lida | automatic generation of visualizations and infographics |
| | Image Generation | | local-image-gen | a GPTScript tool to generate images |
| Model Management | Format Manipulation | | gguf-tools | a set of tools for manipulating GGUF format files |
| | Layer Visualization | | NeuralFlow | a Python script for plotting the intermediate layer outputs of Mistral 7B |
| | Low-Level Training | | llm.c | LLM training in simple, raw C/CUDA |
| | Model Optimization | | hqq | an implementation of Half-Quadratic Quantization (HQQ) |
| | | 2023 | GPTQ-for-LLaMa | a way to quantize the LLaMA weights to 4-bit precision |
| | Training | | simple-llama-finetuner | a way to do LoRA adaptation of LLaMA |
| | | | alpaca-lora | another way to do LoRA adaptation of LLaMA |
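The LoRA tools in the Training rows above (simple-llama-finetuner, alpaca-lora) share one core idea: instead of updating a full weight matrix W during fine-tuning, training learns two small matrices A and B whose product is added as a scaled low-rank update, W' = W + (α/r)·BA. A minimal NumPy sketch of just that arithmetic, with toy dimensions; this illustrates the math, it is not an actual fine-tuner:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4   # toy sizes; real models use dimensions in the thousands
W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight

# LoRA factors: B starts at zero, so the adapted model initially matches the base model.
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x):
    # Base projection plus the scaled low-rank update: W @ x + (alpha/r) * B @ A @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # B == 0: output unchanged before training

# "Training" only nudges A and B: r * (d_in + d_out) learned numbers
# instead of d_in * d_out for full fine-tuning.
B += 0.01
print(lora_forward(x).shape)  # (8,)
```

The payoff is the parameter count: for r much smaller than the layer dimensions, the adapter is a tiny fraction of the frozen weights, which is what makes tools like lorax able to serve many adapters over one base model.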
## Other Resources
Other resources related to LLMs, including articles, papers, and websites:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Large Language Models | Algorithms | 2024 | word-embedding | an implementation of word2vec skip-gram for word embedding |
| | Applications | 2023 | Chie | a cross-platform desktop application with chat history and extension support |
| | Copilots | | Obsidian Copilot | an interesting take on how to use semantic search and OpenSearch's BM25 implementation |
| | Demos | 2024 | WhisperFusion | an ensemble setup with WhisperSpeech, WhisperLive, and Phi |
| | Frameworks | 2025 | FlashLearn | a simple interface for incorporating Agent LLMs |
| | | | agno | a lightweight library for building multimodal agents |
| | | 2023 | litellm | a simple, lightweight LLM wrapper |
| | | | AutoChain | yet another alternative to langchain |
| | | | Tanuki | yet another LLM framework, using decorators for data validation |
| | | | griptape | a |
| | | | guidance | control modern language models more effectively and efficiently than traditional prompting or chaining |
| | | | langchain | a composable approach for building LLM applications |
| | | | llama_index | a data framework for LLM applications |
| | | | txtai | has spinoffs for chat, workflows for medical/scientific papers, semantic search for developers, and semantic search for headlines and story text |
| | | | llmflows | yet another alternative to langchain, but with an interesting approach to defining workflows |
| | | | fabric | a componentized approach to building LLM pipelines |
| | Front-Ends | 2024 | jan | an open-source ChatGPT alternative that runs 100% offline (uses nitro) |
| | | 2023 | SecureAI-Tools | a self-hosted local inference front-end for chatting with document collections |
| | | | gpt4all | another self-hosted local inference front-end |
| | Jupyter | | LLMBook | a VS Code notebook interface for LLMs |
| | | | jupytee | a Jupyter plugin that can handle code generation and image generation, but not switching models (GPT-4) |
| | | | genai | a Jupyter plugin that can handle code generation and fixes based on tracebacks |
| | | | ipython-gpt | a Jupyter plugin that can handle multiple models |
| | Libraries | 2025 | mlx-lm | a Python package for serving large language models on Apple silicon |
| | | 2024 | chonkie | a lightweight library for efficient text chunking in RAG applications |
| | | | databonsai | a Python library that uses LLMs to perform data cleaning |
| | | | DataDreamer | a library for prompting, synthetic data generation, and training workflows |
| | | | radients | a vectorization library that can handle more than just text |
| | | | magentic | decorators to create functions that return structured output from an LLM |
| | | 2023 | guardrails | a package for validating and correcting the outputs of large language models |
| | | | MemGPT | a memory management/summarization technique for unbounded context |
| | | | instructor | a clever library that simplifies invoking OpenAI function calls |
| | | | simpleaichat | a simple wrapper for the ChatGPT API |
| | Reference | 2024 | sqlite-hybrid-search | an example of how to do hybrid (vector and FTS) search with SQLite for RAG |
| | | 2023 | Native JSON Output from GPT-4 | tips on how to use OpenAI JSON and function calling |
| | | | Using LLaMA with M1 Mac | manual instructions for Apple Silicon |
| | | | Prompt Engineering Guide | a set of lecture notes and detailed examples of prompting techniques |
| | | | awesome-decentralized-llm | a collection of LLM resources that operate independently |
| | | | GPT Prompt Archive | a set of sample base prompts for various LLMs |
| | | | promptbase | another set of prompting techniques and detailed examples |
| | | 2022 | awesome-chatgpt-prompts | might be a short-lived resource, but an interesting one |
| | Samples | 2025 | smolGPT | a minimal PyTorch implementation for training your own small LLM |
| | | 2024 | SimpleTinyLlama | a simple PyTorch-based implementation |
| | | | devlooper | a program synthesis agent that autonomously fixes its output by running tests |
| | | 2023 | gpt-researcher | a simple agent that does online research on any given topic |
| | | | David Attenborough narrates your life | a pretty hilarious image-to-description example |
| | | | LibreChat | a self-hosted ChatGPT alternative |
| | | | sharepoint-indexing-azure-cognitive-search | provides an example of how to use Graph navigation and Cognitive Search indexing |
| | | | gpt4all | open-source LLM chatbots |
| | | | Demystifying Advanced RAG Pipelines | an LLM-powered advanced RAG pipeline built from scratch |
| | | | Wanderlust OpenAI example using Solara | a simple interactive web shell with some nice features |
| | | | GPT in 60 Lines of NumPy | a tutorial on how to build a GPT model from scratch |
| | | | Bash One-Liners for LLMs | a collection of one-liners for various LLMs |
| | Tools | 2025 | letta | a tool for creating and managing memory-backed agents |
| | Vector Databases | 2023 | chroma | an embedding database |
| | | | vectordb | a simple vector database that can run in-process |
| | | | marqo | a vector database that performs vector generation internally |
| | | | USearch | a single-file vector search engine |
| | Workflows | | danswer | a pretty complete GPT/search integration solution with GitHub, Slack, and Confluence/JIRA connectors |
| Samples | Large Language Models | 2025 | tiny-llm | a tutorial on LLM serving using MLX |
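The vector database entries above (chroma, vectordb, marqo, USearch) all build on the same primitive: store embedding vectors and return the nearest ones by cosine similarity. A minimal in-process sketch of that primitive in NumPy; the class name and sample vectors are invented for illustration, and a real system would get its vectors from an embedding model rather than by hand:

```python
import numpy as np

class TinyVectorDB:
    """An in-process cosine-similarity index: the core primitive behind
    embedding databases (illustrative sketch, not production code)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim))
        self.keys: list[str] = []

    def add(self, key: str, vec: np.ndarray) -> None:
        vec = vec / np.linalg.norm(vec)        # normalize once at insert time
        self.vectors = np.vstack([self.vectors, vec])
        self.keys.append(key)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        q = query / np.linalg.norm(query)
        sims = self.vectors @ q                 # cosine similarity via dot product
        top = np.argsort(-sims)[:k]             # indices of the k highest similarities
        return [(self.keys[i], float(sims[i])) for i in top]

db = TinyVectorDB(dim=4)
db.add("cats", np.array([1.0, 0.0, 0.0, 0.1]))
db.add("dogs", np.array([0.9, 0.1, 0.0, 0.2]))
db.add("tax law", np.array([0.0, 0.0, 1.0, 0.0]))
print(db.search(np.array([1.0, 0.0, 0.0, 0.0]), k=1)[0][0])  # prints "cats"
```

The brute-force dot product is exact but O(n) per query; the dedicated engines in the table trade exactness for speed with approximate indexes (HNSW and similar) once collections grow large.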