This page is a collection of notes and links related to large language models (LLMs), their applications, and the underlying technology. It serves as a reference for understanding the current state of LLMs, their capabilities, and their limitations; it is the result of my cleaning up the main AI page and splitting it into more manageable sections. It is not exhaustive, but it should provide a good starting point for anyone interested in the topic.
## Models
Interesting models I've come across, off the mainstream beaten path:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Large Language Models | Function Calling | 2024 | functionary | can interpret and execute functions/plugins |
| | | | Octopus-v2 | a model designed for both function calling and on-device inference |
| Multi-modal Models | Models | 2023 | ml-ferret | a multi-modal model from Apple |
| Small Language Models | | 2024 | TinyLlama | pretraining of a 1.1B Llama model on 3 trillion tokens |
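The function-calling entries above (functionary, Octopus-v2) describe models trained to emit a structured call instead of free text: the application advertises JSON schemas for its functions, and the model replies with a function name and JSON-encoded arguments that the application dispatches. A minimal sketch of the application side, loosely following the OpenAI-style tool schema convention; `get_weather` and the registry are hypothetical illustrations, not part of any listed project:

```python
import json

# Tool schema advertised to the model (OpenAI-style convention);
# get_weather is a hypothetical example function.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in implementation; a real tool would call a weather API.
    return f"Sunny in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    fn = REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

# A function-calling model replies with structured output like this
# instead of prose:
model_reply = {"name": "get_weather", "arguments": '{"city": "Berlin"}'}
print(dispatch(model_reply))  # prints "Sunny in Berlin"
```

The interesting part is the round trip: the schema constrains what the model may emit, and the dispatcher is the only place where model output touches real code.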
## Tools
Miscellaneous tools and applications that use LLMs, or are related to them in some way:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Agent Systems | Agent Framework | 2024 | pipecat | yet another LLM agent framework |
| | Agent Memory | 2025 | AgenticMemory | a tool for creating and managing memory in LLMs |
| | Autonomous Agents | 2023 | Auto-GPT | an attempt to provide ChatGPT with a degree of autonomy |
| | Code Agents | 2024 | plandex | yet another long-running agent tool for complex coding tasks |
| | Multiagent Simulation | | TinyTroupe | a multiagent persona simulation |
| Assistants | Code Assistants | 2025 | aider | a tool that enables pair programming with LLMs to edit code in local git repositories |
| | | | tabby | a self-hosted AI coding assistant |
| | | | ollama-copilot | a proxy that allows you to use ollama as a copilot, like GitHub Copilot |
| | | | anon-kode | a fork of Claude-Coder for other LLMs |
| | | 2024 | gpt-pilot | a prototype development tool that leverages GPT |
| | | | llm-vscode | a VSCode extension that uses llm-ls |
| | | | emacs-copilot | an Emacs extension for using a local LLM |
| | | | privy | an open-source alternative to GitHub Copilot that runs locally |
| | | 2023 | localpilot | a MITM proxy that lets you use the GitHub Copilot extension with other LLMs |
| | Desktop Assistants | | macOSpilot-ai-assistant | an Electron app for macOS |
| | Personal Assistants | | khoj | an intriguing personal assistant based on local data |
| | Terminal Assistant | 2025 | tmuxai | a terminal assistant that leverages tmux |
| Development | App Development Platform | 2024 | dify | an open-source LLM app development platform with a node-based UX |
| | Flow-based | 2023 | langflow | a node-based GUI for quick iteration of langchain flows |
| | JavaScript Framework | 2024 | genaiscript | a JavaScript environment for prompt development and structured data extraction for LLMs |
| | LLM Programming | | GPTScript | natural language programming against multiple LLMs |
| | Language Server | | llm-ls | a local language server that leverages LLMs |
| | NLP Toolkit | | WordLlama | a lightweight NLP toolkit for tasks like fuzzy deduplication, similarity, and ranking |
| | Prompt Compression | | LLMLingua | a tool for compressing prompts with minimal loss of information |
| | Workflow Management | | burr | a tool for creating and managing LLM workflows |
| Evaluation | Evaluation Platform | 2025 | opik | an open-source platform for evaluating, testing, and monitoring LLM applications |
| | Model Evaluation | 2023 | pykoi | a unified interface for data and feedback collection, including model comparisons |
| | | | PromptTools | self-hostable tools for evaluating LLMs, vector databases, and prompts |
| | Prompt Evaluation | | ChainForge | a visual programming environment for benchmarking prompts across multiple LLMs |
| | | | promptfoo | a tool for testing and evaluating LLM prompt quality |
| Infrastructure | API Compatibility | | LocalAI | a local, drop-in replacement for the OpenAI API |
| | API Management | | BricksLLM | an OpenAI gateway in Go to create API keys with rate limits, cost limits, and TTLs |
| | Deployment | | dalai | an automated installer for LLaMA |
| | Distributed Inference | 2024 | exo | an intriguing P2P clustering solution for running models across several machines |
| | Edge Inference | | nitro | a self-hosted inference engine for edge computing with an OpenAI API |
| | | 2023 | TinyChatEngine | a local (edge) inference engine in C++ without any dependencies |
| | GPU Optimization | 2024 | amd_inference | a tool that enables inference on AMD GPUs |
| | Hardware Optimization | | ipex-llm | a PyTorch extension for Intel hardware |
| | Inference Engines | 2023 | a1gpt | a C++ implementation of a GPT-2 inference engine |
| | | | llama.cpp | a C++ port of Facebook's LLaMA model; still requires roughly 240GB of (unoptimized) weights, but can run on a 64GB Mac |
| | | | minillm | a GPU-focused Python wrapper for LLaMA |
| | | | llama-rs | a Rust port of llama.cpp |
| | | | wyGPT | another C++ local inference tool |
| | Model Runner | 2025 | ramalama | an alternative to Ollama for locally running models |
| | Model Serving | 2024 | lorax | a framework that allows users to serve thousands of fine-tuned models on a single GPU |
| | Performance Optimization | | GPTFast | a set of acceleration techniques |
| Integration | Apple Notes | | notesollama | a plugin for Apple Notes that uses the Accessibility APIs |
| | IRC Bot | | ollama-bot | a rudimentary IRC bot that communicates with a local instance of ollama |
| | Slack Bot | | geppetto | a bot for integrating ChatGPT and DALL-E into Slack |
| | Voice Integration | | pico-cookbook | recipes for on-device voice AI and local LLMs |
| Interfaces | CLI | 2023 | chatblade | a CLI wrapper for ChatGPT |
| | CLI Markdown | 2024 | mark | a CLI to interact with LLMs using markdown and images |
| | Terminal Interface | | oterm | a terminal-based interface for LLMs |
| | Text Generation UI | | koboldcpp | easy-to-use AI text-generation software for GGML and GGUF models, based on llama.cpp |
| | Web Interface | | open-webui | a web-based interface for LLMs |
| | | 2023 | chatbot-ui | a more or less sensibly designed self-hosted ChatGPT UI |
| | | | Serve | a containerized solution for using local LLMs via web chat |
| Knowledge Systems | Data Processing | 2024 | nlm-ingestor | a set of parsers for common file formats |
| | Database RAG | | korvus | a search SDK that unifies the entire RAG pipeline in a single database query |
| | Document Processing | 2025 | yek | a tool for importing and chunking text files for RAG |
| | Knowledge Graph RAG | 2024 | GraphRAG | a data pipeline designed to pre-process knowledge graphs and perform RAG on them |
| | Note-Taking RAG | | reor | a note-taking tool that performs RAG using a local LLM |
| | RAG | 2023 | embedchain | another framework to create bots from existing datasets |
| | | | content-chatbot | a way to quickly create custom embeddings from a website |
| | RAG Framework | 2024 | R2R | a framework for rapid development and deployment of production-ready RAG systems, with SQLite support |
| | | | llmware | a framework for developing LLM-based applications, including retrieval-augmented generation |
| | Research Assistant | | storm | a tool that researches a topic and generates a full-length report with citations |
| | Search Engine | | Perplexica | a Perplexity AI search engine clone |
| | Search Tools | | LLocalSearch | a local tool for searching using LLMs |
| Media Generation | Audio Stories | | fably | a device that tells bedtime stories to kids, using chunked TTS |
| | Data Visualization | | lida | automatic generation of visualizations and infographics |
| | Image Generation | | local-image-gen | a GPTScript tool to generate images |
| Model Management | Format Manipulation | | gguf-tools | a set of tools for manipulating GGUF format files |
| | Layer Visualization | | NeuralFlow | a Python script for plotting the intermediate layer outputs of Mistral 7B |
| | Low-Level Training | | llm.c | LLM training in simple, raw C/CUDA |
| | Model Optimization | | hqq | an implementation of Half-Quadratic Quantization (HQQ) |
| | | 2023 | GPTQ-for-LLaMa | a way to quantize the LLaMA weights to 4-bit precision |
| | Training | | simple-llama-finetuner | a way to do LoRA adaptation of LLaMA |
| | | | alpaca-lora | another way to do LoRA adaptation of LLaMA |
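The LoRA tools in the Training rows above (simple-llama-finetuner, alpaca-lora) share one core idea: instead of updating a full weight matrix W during fine-tuning, training learns two small matrices A and B whose product is added as a scaled low-rank update, W' = W + (α/r)·BA. A minimal NumPy sketch of just that arithmetic, with toy dimensions; this illustrates the math, it is not an actual fine-tuner:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4   # toy sizes; real models use dimensions in the thousands
W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight

# LoRA factors: B starts at zero, so the adapted model initially matches the base model.
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x):
    # Base projection plus the scaled low-rank update: W @ x + (alpha/r) * B @ A @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # B == 0: output unchanged before training

# "Training" only nudges A and B: r * (d_in + d_out) learned numbers
# instead of d_in * d_out for full fine-tuning.
B += 0.01
print(lora_forward(x).shape)  # (8,)
```

The payoff is the parameter count: for r much smaller than the layer dimensions, the adapter is a tiny fraction of the frozen weights, which is what makes tools like lorax able to serve many adapters over one base model.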
## Other Resources
Other resources related to LLMs, including articles, papers, and websites:
| Field | Category | Date | Link | Notes |
|---|---|---|---|---|
| Large Language Models | Algorithms | 2024 | word-embedding | an implementation of word2vec skip-gram for word embedding |
| | Applications | 2023 | Chie | a cross-platform desktop application with chat history and extension support |
| | Copilots | | Obsidian Copilot | an interesting take on how to use semantic search and OpenSearch's BM25 implementation |
| | Demos | 2024 | WhisperFusion | an ensemble setup with WhisperSpeech, WhisperLive, and Phi |
| | Frameworks | 2025 | FlashLearn | a simple interface for incorporating Agent LLMs |
| | | | agno | a lightweight library for building multimodal agents |
| | | 2023 | litellm | a simple, lightweight LLM wrapper |
| | | | AutoChain | yet another alternative to langchain |
| | | | Tanuki | yet another LLM framework, using decorators for data validation |
| | | | griptape | a |
| | | | guidance | control modern language models more effectively and efficiently than traditional prompting or chaining |
| | | | langchain | a composable approach for building LLM applications |
| | | | llama_index | a data framework for LLM applications |
| | | | txtai | has spinoffs for chat, workflows for medical/scientific papers, semantic search for developers, and semantic search for headlines and story text |
| | | | llmflows | yet another alternative to langchain, but with an interesting approach to defining workflows |
| | | | fabric | a componentized approach to building LLM pipelines |
| | Front-Ends | 2024 | jan | an open-source ChatGPT alternative that runs 100% offline (uses nitro) |
| | | 2023 | SecureAI-Tools | a self-hosted local inference front-end for chatting with document collections |
| | | | gpt4all | another self-hosted local inference front-end |
| | Jupyter | | LLMBook | a VS Code notebook interface for LLMs |
| | | | jupytee | a Jupyter plugin that can handle code generation and image generation, but not switching models (GPT-4) |
| | | | genai | a Jupyter plugin that can handle code generation and fixes based on tracebacks |
| | | | ipython-gpt | a Jupyter plugin that can handle multiple models |
| | Libraries | 2025 | mlx-lm | a Python package for serving large language models on Apple silicon |
| | | 2024 | chonkie | a lightweight library for efficient text chunking in RAG applications |
| | | | databonsai | a Python library that uses LLMs to perform data cleaning |
| | | | DataDreamer | a library for prompting, synthetic data generation, and training workflows |
| | | | radients | a vectorization library that can handle more than just text |
| | | | magentic | decorators to create functions that return structured output from an LLM |
| | | 2023 | guardrails | a package for validating and correcting the outputs of large language models |
| | | | MemGPT | a memory management/summarization technique for unbounded context |
| | | | instructor | a clever library that simplifies invoking OpenAI function calls |
| | | | simpleaichat | a simple wrapper for the ChatGPT API |
| | Reference | 2024 | sqlite-hybrid-search | an example of how to do hybrid (vector and FTS) search with SQLite for RAG |
| | | 2023 | Native JSON Output from GPT-4 | tips on how to use OpenAI JSON and function calling |
| | | | Using LLaMA with M1 Mac | manual instructions for Apple Silicon |
| | | | Prompt Engineering Guide | a set of lecture notes and detailed examples of prompting techniques |
| | | | awesome-decentralized-llm | a collection of LLM resources that operate independently |
| | | | GPT Prompt Archive | a set of sample base prompts for various LLMs |
| | | | promptbase | another set of prompting techniques and detailed examples |
| | | 2022 | awesome-chatgpt-prompts | might be a short-lived resource, but an interesting one |
| | Samples | 2025 | smolGPT | a minimal PyTorch implementation for training your own small LLM |
| | | 2024 | SimpleTinyLlama | a simple PyTorch-based implementation |
| | | | devlooper | a program synthesis agent that autonomously fixes its output by running tests |
| | | 2023 | gpt-researcher | a simple agent that does online research on any given topic |
| | | | David Attenborough narrates your life | a pretty hilarious image-to-description example |
| | | | LibreChat | a self-hosted ChatGPT alternative |
| | | | sharepoint-indexing-azure-cognitive-search | provides an example of how to use Graph navigation and Cognitive Search indexing |
| | | | gpt4all | open-source LLM chatbots |
| | | | Demystifying Advanced RAG Pipelines | an LLM-powered advanced RAG pipeline built from scratch |
| | | | Wanderlust OpenAI example using Solara | a simple interactive web shell with some nice features |
| | | | GPT in 60 Lines of NumPy | a tutorial on how to build a GPT model from scratch |
| | | | Bash One-Liners for LLMs | a collection of one-liners for various LLMs |
| | Tools | 2025 | letta | a tool for creating and managing memory-backed agents |
| | Vector Databases | 2023 | chroma | an embedding database |
| | | | vectordb | a simple vector database that can run in-process |
| | | | marqo | a vector database that performs vector generation internally |
| | | | USearch | a single-file vector search engine |
| | Workflows | | danswer | a pretty complete GPT/search integration solution with GitHub, Slack, and Confluence/JIRA connectors |
| Samples | Large Language Models | 2025 | tiny-llm | a tutorial on LLM serving using MLX |
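The vector database entries above (chroma, vectordb, marqo, USearch) all build on the same primitive: store embedding vectors and return the nearest ones by cosine similarity. A minimal in-process sketch of that primitive in NumPy; the class name and sample vectors are invented for illustration, and a real system would get its vectors from an embedding model rather than by hand:

```python
import numpy as np

class TinyVectorDB:
    """An in-process cosine-similarity index: the core primitive behind
    embedding databases (illustrative sketch, not production code)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim))
        self.keys: list[str] = []

    def add(self, key: str, vec: np.ndarray) -> None:
        vec = vec / np.linalg.norm(vec)        # normalize once at insert time
        self.vectors = np.vstack([self.vectors, vec])
        self.keys.append(key)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        q = query / np.linalg.norm(query)
        sims = self.vectors @ q                 # cosine similarity via dot product
        top = np.argsort(-sims)[:k]             # indices of the k highest similarities
        return [(self.keys[i], float(sims[i])) for i in top]

db = TinyVectorDB(dim=4)
db.add("cats", np.array([1.0, 0.0, 0.0, 0.1]))
db.add("dogs", np.array([0.9, 0.1, 0.0, 0.2]))
db.add("tax law", np.array([0.0, 0.0, 1.0, 0.0]))
print(db.search(np.array([1.0, 0.0, 0.0, 0.0]), k=1)[0][0])  # prints "cats"
```

The brute-force dot product is exact but O(n) per query; the dedicated engines in the table trade exactness for speed with approximate indexes (HNSW and similar) once collections grow large.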