Implementation of the HOPE architecture based on:
- Nested Learning: The Illusion of Deep Learning Architectures
- Titans: Learning to Memorize at Test Time
- MIRAS: Memory Is All You Need
HOPE combines core components from the Nested Learning and Titans papers, plus the MIRAS unified framework:
- Self-Modifying Titans: Memory attention with delta rule updates (Eq. 28-29)
- Continuum Memory System (CMS): Multi-frequency FFN chain (Eq. 30-31)
- MIRAS Framework: Unified memory system with configurable attentional bias and retention gates
The self-modifying memory is updated with a delta rule:

```
M_{t+1} = M_t - M_t * k_t * k_t^T - eta * (M_t * k_t - v_t) * k_t^T
```

Where:
- `M_t * k_t * k_t^T` is the forgetting term: it removes the old association stored for key `k_t`
- `eta * (M_t * k_t - v_t) * k_t^T` is the learning term: a gradient-descent step on the L2 loss `0.5 * ||M_t k_t - v_t||^2`
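A minimal PyTorch sketch of this update, purely to illustrate the equation (the repository's version lives in `src/layers/associative_memory.py`; the tensor shapes here are assumptions):

```python
import torch

def delta_rule_update(M, k, v, eta=0.1):
    """One step of the delta-rule update above.

    M: (d_v, d_k) memory matrix, k: (d_k,) key, v: (d_v,) value.
    Assumes k is (approximately) unit-norm.
    """
    pred = M @ k                            # current recall: M_t k_t
    forget = torch.outer(pred, k)           # M_t k_t k_t^T: erase the old association for k_t
    learn = eta * torch.outer(pred - v, k)  # gradient step on 0.5 * ||M_t k_t - v_t||^2
    return M - forget - learn

# Store a value under a key, then recall it.
d_k, d_v = 8, 8
M = torch.zeros(d_v, d_k)
k = torch.nn.functional.normalize(torch.randn(d_k), dim=0)
v = torch.randn(d_v)
M = delta_rule_update(M, k, v, eta=1.0)
print(torch.allclose(M @ k, v, atol=1e-5))  # True: recall returns v
```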
The MIRAS framework unifies sequence models through four design choices:
| Choice | Options | Description |
|---|---|---|
| Memory Architecture | Vector, Matrix, MLP | How memory is structured |
| Attentional Bias | L2, Lp, Huber, KL | Internal memory objective |
| Retention Gate | L2, Lq, KL, Elastic Net | How to retain past state |
| Learning Algorithm | GD, GD+Momentum, Newton | How to update memory |
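To make the "Attentional Bias" column concrete, the helper functions below (illustrative only, not the repository's `src/layers/attentional_bias.py`) show the losses these options refer to, applied to the memory's prediction for a key against the target value:

```python
import torch
import torch.nn.functional as F

def l2_bias(pred, value):
    # L2 attentional bias: the classic squared-error memory objective.
    return 0.5 * (pred - value).pow(2).sum()

def lp_bias(pred, value, p=1.5):
    # Lp attentional bias with 1 < p < 2 (cf. Moneta below): dampens the
    # influence of large residuals relative to L2.
    return (pred - value).abs().pow(p).sum()

def huber_bias(pred, value, delta=1.0):
    # Huber attentional bias (cf. Yaad below): quadratic for small errors,
    # linear for large ones, so outlier values give bounded gradients.
    return F.huber_loss(pred, value, reduction="sum", delta=delta)
```

The retention gate plays the complementary role: it controls how strongly the updated memory is pulled back toward its previous state.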
Three pre-configured MIRAS models:
| Model | Attentional Bias | Retention Gate | Use Case |
|---|---|---|---|
| Moneta | Lp (p in (1,2)) | Lq (q in (1,2)) | Robust to key collisions |
| Yaad | Huber Loss | L2 | Robust to outlier values |
| Memora | L2 | KL-divergence | Soft thresholding |
Install with uv (recommended):

```bash
uv sync
```

or with pip:

```bash
pip install torch
```

Quick start:

```python
import torch

from src.config import HopeSmallConfig
from src.model import HopeForCausalLM
config = HopeSmallConfig(vocab_size=32000)
model = HopeForCausalLM(config)

# Dummy batch for illustration
input_ids = torch.randint(0, 32000, (1, 128))
labels = input_ids.clone()

# Forward pass
outputs = model(input_ids=input_ids, labels=labels)
loss = outputs["loss"]
```

To carry memory state across batches:

```python
from src.model import Hope
model = Hope(config)
memory_states = None
for batch in dataloader:
    logits, memory_states = model(
        batch["input_ids"],
        memory_states=memory_states,
        return_memory=True,
    )
```

Using the MIRAS memory layers directly:

```python
from src.layers import Moneta, Yaad, Memora, MirasMemory
# Pre-configured models
moneta = Moneta(dim=512, num_heads=8, p=1.5, q=1.5)
yaad = Yaad(dim=512, num_heads=8, huber_delta=1.0)
memora = Memora(dim=512, num_heads=8, kl_temperature=1.0)
# Custom configuration
memory = MirasMemory(
    dim_key=64, dim_value=64,
    attentional_bias="huber",       # l2, lp, huber, kl, dot_product
    retention_gate="elastic_net",   # l2, lq, kl, elastic_net, bregman
    learning_rate=0.1,
    retention_strength=0.1,
)
```

Text generation:

```python
generated = model.generate(
    prompt,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
)
```

Model sizes:

| Size | Parameters | dim | layers | heads |
|---|---|---|---|---|
| Small | ~125M | 512 | 8 | 8 |
| Base | ~350M | 768 | 12 | 12 |
| Large | ~760M | 1024 | 24 | 16 |
| XL | ~1.3B | 2048 | 24 | 32 |
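To sanity-check a configuration against the table, you can count parameters directly. This quick check reuses the classes from the quick start; exact totals depend on the configuration defaults:

```python
from src.config import HopeSmallConfig
from src.model import HopeForCausalLM

config = HopeSmallConfig(vocab_size=32000)
model = HopeForCausalLM(config)

# Count trainable parameters; should land near the Small row above.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_params / 1e6:.0f}M parameters")
```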
Training:

```bash
uv run python train.py --model_size small --batch_size 8 --learning_rate 1e-4
```

Options:
- `--optimizer`: adamw, adam_delta, sgd_delta, deep_momentum, muon
- `--lr_scheduler`: cosine, linear, constant
- `--dtype`: float32, float16, bfloat16
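If you prefer a manual loop over `train.py`, a minimal sketch looks like the following. It uses `torch.optim.AdamW` rather than the repository's deep optimizers in `src/optimizers.py`, and the dummy dataloader, batch shapes, and the dict-returning forward pass follow the quick start above; treat the specifics as assumptions:

```python
import torch

from src.config import HopeSmallConfig
from src.model import HopeForCausalLM

config = HopeSmallConfig(vocab_size=32000)
model = HopeForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy data for illustration: random token batches of shape (batch, seq_len).
dataloader = [
    {"input_ids": torch.randint(0, 32000, (2, 128)),
     "labels": torch.randint(0, 32000, (2, 128))}
    for _ in range(4)
]

model.train()
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
    outputs["loss"].backward()  # forward returns a dict with "loss", as in the quick start
    optimizer.step()
```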
Run the tests:

```bash
uv run python test_hope.py
```

Run the example:

```bash
uv run python example.py
```

Project structure:

```
src/
    __init__.py
    config.py                  # Model configurations
    model.py                   # Main Hope model
    optimizers.py              # Deep optimizers (DMGD, Muon, etc.)
    modules/
        __init__.py
        titans.py              # Self-Modifying Titans (MAC, MAG, MAL)
        continuum_memory.py    # CMS and variants
        hope_block.py          # Combined HOPE block
    layers/
        __init__.py
        associative_memory.py  # Delta rule memory
        neural_memory.py       # MLP-based neural memory
        attentional_bias.py    # MIRAS attentional bias (L2, Lp, Huber, KL)
        retention_gates.py     # MIRAS retention gates (L2, Lq, KL, Elastic Net)
        miras_memory.py        # MIRAS models (Moneta, Yaad, Memora)
```
References:

- Nested Learning: The Illusion of Deep Learning Architectures
- Titans: Learning to Memorize at Test Time
- MIRAS: Memory Is All You Need
MIT License