Subquadratic – Introducing SubQ 1.1 Small

Introducing SubQ 1.1 Small

Date: June 16, 2026

The hardest enterprise AI problems share a common shape. They require reasoning over complete artifacts: entire codebases, document collections, contracts, financial filings.

For years, the industry worked around this problem by building retrieval pipelines, chunking strategies, and agentic scaffolding — useful tools, but ultimately workarounds for context limitations of the model architecture. The underlying constraint was attention: compute that scales quadratically with context length, making direct reasoning over large artifacts prohibitively expensive.

SubQ is built to remove that constraint. Today we're releasing the model card for SubQ 1.1 Small — the second iteration of our Subquadratic Sparse Attention (SSA) model, at the smallest size. We are in the process of deploying SubQ 1.1 Small with select design partners and plan to deploy a broader lineup of models ranging from 2M to 12M tokens later in the year.

Key Features

- Near-perfect long-context retrieval up to 12M tokens on the needle-in-a-haystack test, with up to nearly 1,000x attention compute reduction.

- A balance of long-context optimization and general reasoning ability, with strong performance retained across knowledge, coding, and non-coding enterprise agent benchmarks.

- At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.

These results reflect the scaling advantage that SSA's efficiency gains make possible.

Benchmarks

SubQ 1.1 Small was evaluated across five axes, covering long-context retrieval, context-length generalization, knowledge, coding, and long-horizon agentic tasks.

Long-Context Retrieval & Generalization

We selected Needle-In-A-Haystack (NIAH) and Nvidia's RULER test because together they test whether the model can find a single fact buried deep in a large context, and whether it can connect the dots across that context.

NIAH is the precision test. It places one retrievable fact at a controlled depth within a long context and asks the model to return it exactly. SubQ 1.1 Small scores near-perfect at 1M, 2M, 6M, and 12M tokens. The model was trained predominantly at 1M tokens yet the retrieval held near perfectly at 12x that length, despite compressing attention to just 0.13% of relationships. This generalization is a direct consequence of SSA routing attention based on content relevance rather than fixed positional patterns.
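To make the NIAH setup concrete, here is a minimal sketch of how such a test case can be built and scored. Everything in it is an assumption for illustration: the function names, the filler vocabulary, the needle sentence, and the regex-based `toy_model` stand in for a real harness and a real model, and lengths are counted in words rather than tokens.

```python
import random
import re

FILLER_VOCAB = ["river", "stone", "cloud", "field", "lamp", "paper", "wind", "glass"]
NEEDLE = "The magic number is 48213."   # hypothetical needle sentence
EXPECTED = "48213"


def build_haystack(n_words, needle, depth, seed=0):
    """Build about n_words of filler with `needle` inserted at fractional depth.

    depth=0.0 places the needle at the start, depth=1.0 at the end.
    """
    rng = random.Random(seed)
    filler = [rng.choice(FILLER_VOCAB) for _ in range(n_words)]
    pos = int(depth * n_words)
    return " ".join(filler[:pos] + [needle] + filler[pos:])


def toy_model(context):
    """Stand-in for a real model call: finds the needle with a regex."""
    match = re.search(r"magic number is (\d+)", context)
    return match.group(1) if match else ""


def exact_match(answer, expected):
    """NIAH-style scoring: the fact must be returned exactly."""
    return 1.0 if answer.strip() == expected else 0.0


def run_sweep(n_words, depths):
    """Score one context length across several needle depths."""
    return {
        d: exact_match(toy_model(build_haystack(n_words, NEEDLE, d)), EXPECTED)
        for d in depths
    }


scores = run_sweep(n_words=10_000, depths=[0.0, 0.25, 0.5, 0.75, 1.0])
```

A real evaluation would repeat the sweep at each context length (1M, 2M, 6M, and 12M tokens in the post) and replace `toy_model` with an actual model call.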

RULER is the capability test. Its 13 tasks go beyond single-fact lookup to cover multi-hop variable tracing, frequency extraction, and aggregation across the full context, the kind of reasoning that complete-artifact workloads actually require. SubQ 1.1 Small scores 99.12% at 128K.
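As an illustration of multi-hop variable tracing, the sketch below builds a chain of assignments scattered through filler text and resolves which variables end up holding a given value. The line format and helper names are hypothetical and are not RULER's actual prompt template. The point is only that answering requires following several links rather than finding one sentence.

```python
def make_chain(names, value):
    """First variable gets a literal value; each later one copies the previous."""
    lines = [f"VAR {names[0]} = {value}"]
    for prev, cur in zip(names, names[1:]):
        lines.append(f"VAR {cur} = VAR {prev}")
    return lines


def interleave(chain_lines, filler_line, gap):
    """Spread chain lines through filler text, preserving their order."""
    out = []
    for line in chain_lines:
        out.extend([filler_line] * gap)
        out.append(line)
    return out


def resolve(lines):
    """Reference answer: follow every assignment in document order."""
    env = {}
    for line in lines:
        if not line.startswith("VAR "):
            continue  # skip filler lines
        left, right = line[len("VAR "):].split(" = ")
        if right.startswith("VAR "):
            env[left] = env[right[len("VAR "):]]  # one hop along the chain
        else:
            env[left] = int(right)
    return env


def variables_with_value(lines, value):
    """All variable names that resolve to `value`, sorted for comparison."""
    return sorted(name for name, v in resolve(lines).items() if v == value)


chain = make_chain(["A", "B", "C", "D"], 91347)
decoy = make_chain(["X", "Y"], 20555)   # distractor chain with another value
document = interleave(chain + decoy, "The grass is green.", gap=3)
answer = variables_with_value(document, 91347)
```

The decoy chain plays the role of a distractor: a model that only matches the surface pattern `VAR ... =` would return `X` and `Y` as well.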

General Knowledge & Reasoning

SubQ 1.1 Small balances long-context optimization with general reasoning ability without compromise. GPQA Diamond at 85.4% sits just below mid-tier frontier models and well above the smaller tier. LiveCodeBench at 89.7% pass@4 is close to the absolute frontier. AutomationBench Finance at 13% places SubQ 1.1 Small close to the strongest models on that benchmark, ahead of mid-tier and smaller baselines. Absolute scores remain low across all models on this benchmark.
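The post does not say how pass@4 was computed. One widely used definition is the unbiased pass@k estimator introduced with the HumanEval benchmark: generate n samples per problem, count the c samples that pass all tests, and estimate the probability that at least one of k randomly drawn samples passes. A minimal sketch under that assumption, with illustrative function names:

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is 1.0 whenever any draw of k must include a pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average over problems; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly four samples per problem, pass@4 reduces to "did any of the four samples pass", so a problem counts as solved as soon as one attempt succeeds.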

Efficiency

SSA replaces the O(n²) dense attention pass with a learned sparse formulation that scales linearly with context length. SSA's advantage over dense attention grows as context length increases. At 1M tokens, SubQ requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2 on a single attention layer. In practice, this drastically changes the economics of long-context training and inference.
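A quick back-of-the-envelope check shows what linear versus quadratic scaling implies. The sketch assumes, as a simplification not stated in the post, that dense cost is exactly proportional to n² and sparse cost exactly proportional to n, so the dense-to-sparse compute ratio grows in proportion to context length. It then extrapolates from the one reported point (64.5x at 1M tokens). The function names are illustrative.

```python
REF_TOKENS = 1_000_000   # context length of the reported measurement
REF_RATIO = 64.5         # reported dense/sparse compute ratio at REF_TOKENS


def compute_ratio(n_tokens):
    """Dense/sparse compute ratio under the idealized n^2 vs n assumption."""
    return REF_RATIO * n_tokens / REF_TOKENS


def attended_fraction(n_tokens):
    """Share of dense attention work that remains, as a percentage."""
    return 100.0 / compute_ratio(n_tokens)


for n in (128_000, 1_000_000, 2_000_000, 6_000_000, 12_000_000):
    print(f"{n:>10,} tokens: {compute_ratio(n):7.1f}x less compute, "
          f"{attended_fraction(n):.2f}% of dense work")
```

Under these assumptions the ratio at 12M tokens is 774x, or about 0.13% of the dense work, which matches the "0.13% of relationships" figure and is of the same order as the "nearly 1,000x" reduction quoted in the key features. The post does not state that those figures were derived this way, so treat the match as a plausibility check rather than a derivation.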

A full breakdown of the mechanism and how it compares to FlashAttention, DeepSeek sparse attention, and recurrent architectures is in the Technical Report.

SubQ uses 64.5x less compute than dense attention and is 56x faster than FlashAttention-2 at a 1M-token context.

Third-Party Evaluation

The benchmark results above were independently verified by Appen. Link to full report here.

Training

We started with an existing open-weight frontier model, replaced dense attention with SSA, and built long-context capability through staged context extension (262K, 512K, 1M, 2M) followed by roughly one trillion tokens of continued pretraining on naturally long artifacts: books, documents, and repository-scale code.

The strongest lever we found for improving long-context retrieval was long-context continued pretraining, made possible by the efficiency of the SSA algorithm. The 12M generalization result reflects both factors: SSA's selection criterion is independent of absolute position, and the capability to use that generalization reliably develops through training on long data.
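The contrast between content-based and position-based selection can be illustrated with a toy example. This is not the SSA algorithm, whose details are in the Technical Report. It is a generic top-k selection by dot-product score, with hypothetical function names. Note that this toy version scores every key and is therefore not subquadratic itself. It only shows that a content-based criterion picks the same item no matter where it sits, while a fixed local window does not.

```python
def select_by_content(query, keys, k):
    """Indices of the k keys with the highest dot product against the query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_by_position(query_pos, k):
    """Fixed pattern: the k most recent positions up to query_pos."""
    return list(range(max(0, query_pos - k + 1), query_pos + 1))


relevant = [1.0, 0.0]      # key that matches the query
irrelevant = [0.0, 1.0]    # key orthogonal to the query
query = [1.0, 0.0]

short_keys = [relevant] + [irrelevant] * 9
long_keys = [irrelevant] * 1000 + short_keys   # same content, shifted by 1000

short_pick = select_by_content(query, short_keys, k=1)
long_pick = select_by_content(query, long_keys, k=1)
window_pick = select_by_position(query_pos=len(long_keys) - 1, k=4)
```

Here `long_pick` still lands on the relevant key after 1,000 unrelated keys are prepended, whereas the local window around the final position never reaches it.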

Additionally, we ran more than one hundred experiments across six to seven model generations to get the balance of capabilities between long- and short-context tasks right. That kind of iteration is only possible because SSA enabled our team to run multi-million-token experiments as a standard procedure rather than a rare event, making the research loop more efficient.

Use Cases

SubQ is designed for workloads that require reasoning over information distributed across the artifact without fragmentation. Here are just a few of the use cases from our initial research:

- Financial analysis and due diligence. Filings, earnings reports, contracts, and internal records are only meaningful in combination. SubQ reasons across the full collection rather than summarizing each document in isolation.

- Legal and contract work. A contract may define a term on page 2, qualify it on page 12, and carve out an exception on page 46. Retrieval finds the sentence but loses the relationships. SubQ holds the whole document and reasons across it directly.

- Software engineering. Codebases distribute logic across files, modules, and dependencies in ways that short-context models can't hold at once. SubQ loads an entire repository into a single context window, enabling architecture-level reasoning, cross-file refactoring, and dependency tracing in one pass. We believe there will be significant value for long-context models in planning, review, and long-horizon memory within coding.

What's Next

We'll be kicking off with the first cohort of design partners in the next few weeks, with broader rollout through the quarter and general model releases by end of year.
