AI & ML interests

Our organization, Convergent Intelligence, is dedicated to advancing the application of artificial intelligence and novel mathematical frameworks to address complex financial threats. We bridge the gap between theoretical research and practical, high-impact security controls, with a specific focus on the fintech sector. Our primary interests and research pillars include:

* **Discrepancy Calculus & Anomaly Detection:** A significant portion of our work revolves around a proprietary mathematical framework called Discrepancy Calculus. This involves using Gap-Metric Risk (\Delta_g) to quantify the deviation between observed and expected signal distributions, and forecasting anomaly energy (\Delta\epsilon_f) to indicate the magnitude of potential risk events. We are interested in models that can identify subtle, multi-step abuse chains that traditional tools often miss.
* **Adversarial Behavior & Path Modeling:** We focus on modeling adversary behavior rather than just code flaws. Our research in Resonance Path Modeling (\psi) aims to identify the "lowest-energy routes", or most likely attack paths, through a combination of human and digital systems. This informs our interest in AI that can understand and predict complex, multi-stage attack scenarios.
* **Adaptive Systems & Probing:** We develop and apply Phase-Locked Probes (T), precisely timed tests used to validate or falsify security assumptions without introducing production risk. This leads to an interest in adaptive systems and models, such as Burst-Aware Thresholds, which dynamically adjust alerting sensitivity based on real-time risk trajectories.
* **Secure & Ethical AI Implementation:** We are deeply committed to the responsible application of AI. Our data-use policies strictly prohibit the use of client data for training general-purpose or non-client models without explicit written consent. Any authorized model fine-tuning is performed in a logically segregated, access-controlled environment to ensure data privacy and security. Our work also explores defenses against AI/automation risks such as prompt/agent abuse and data leakage.

The models, tools, and research we may share here will reflect these interests, translating our findings into reference implementations, research notes, and open-source tooling where appropriate.
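To make the flavor of these ideas concrete, here is a minimal sketch of a gap-metric risk score paired with a burst-aware threshold. Everything here is illustrative: the function names, the symmetrized-KL divergence choice, and the EWMA schedule are our own stand-ins, not Convergent Intelligence's implementation.

```python
import numpy as np

def gap_metric_risk(observed, expected, eps=1e-12):
    """Illustrative Delta_g: divergence between observed and expected
    signal distributions (here, a symmetrized KL over histograms)."""
    p = np.asarray(observed, dtype=float) + eps
    q = np.asarray(expected, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))))

def burst_aware_threshold(base, risk_history, alpha=0.3, k=2.0):
    """Illustrative burst-aware threshold: tighten alerting sensitivity
    when the EWMA of recent risk scores trends upward."""
    ewma = 0.0
    for r in risk_history:
        ewma = alpha * r + (1 - alpha) * ewma
    # Higher recent risk => lower (more sensitive) alerting threshold.
    return base / (1.0 + k * ewma)

# Identical distributions produce zero gap risk.
print(gap_metric_risk([1, 2, 3], [1, 2, 3]))  # → 0.0
```

A real deployment would swap in whatever distance the framework actually defines for \Delta_g; the point of the sketch is only the shape of the interface: a scalar risk per observation window, fed into a threshold that adapts with the risk trajectory.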

Recent Activity

reaperdoesntknow 
posted an update 17 days ago
Your Loss Function Has Singularities. Classical Calculus Can't See Them.

Introducing Discrepancy Calculus (DISC) — treating training singularities as structure, not noise.

Loss plateaus, mode collapse, catastrophic forgetting, distilled models that know things the teacher never taught — we engineer around these. But what if those singularities are the actual structure of the learning problem?

The core insight: Every BV function decomposes into smooth (what classical calculus handles), jump (capability emergence, loss plateaus breaking), and Cantor (ghost imprinting — knowledge transferring through weight-space topology, not gradient signal). Classical analysis sees only the first. DISC sees all three.
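For readers outside geometric measure theory: the decomposition invoked here is the standard one for functions of bounded variation, whose distributional derivative splits into three mutually singular measures:

```latex
Df \;=\; \underbrace{\nabla f\,\mathcal{L}^n}_{\text{smooth}}
\;+\; \underbrace{(f^{+}-f^{-})\,\nu_f\,\mathcal{H}^{n-1}\llcorner J_f}_{\text{jump}}
\;+\; \underbrace{D^{c}f}_{\text{Cantor}}
```

The first term is the absolutely continuous part classical calculus handles, the second is concentrated on the jump set \(J_f\), and the Cantor part \(D^{c}f\) is the singular remainder that carries variation without jumps, which is the component the post maps onto "ghost imprinting."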

The paper proves this isn't alternative notation — it's strictly larger. The Meta-Discrepancy Theorem: where singularities exist, the classical FTC/MVT/chain-rule package is provably impossible.

What it explains:

TopologicalQwen exhibited literary reasoning from physics-only data — the Cantor part explains how. DualMind's Explore→Examine→Response loop operationalizes DISC as inference dynamics. 50 models, 35K+ downloads, all built on this framework.

Paper: Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194) — 8 axioms, proofs, computational recipes.

Series: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → DISC Foundations

— Roy S. Colca Jr., Convergent Intelligence LLC: Research Division
reaperdoesntknow 
posted an update 19 days ago
view post
Post
1664
# Three Teachers, One Student: Dual-Cognition Reasoning at 1.7B

We distilled Qwen3-30B-A3B into 1.7B students that critique their own reasoning. H100, BF16, Apache 2.0. Here's our pipeline.

**Stage 1 — Three Teachers, Three Profiles.** Same 30B base, three variants: Instruct (structured output), Thinking (extended deliberation), Coder (STEM decomposition). Each distillation uses proof-weighted KD — 2.25× amplified loss on reasoning tokens, decaying to 1.1×. The student learns *where to think harder*, not just what to output.
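A minimal sketch of what proof-weighted KD could look like. The decay schedule shape (linear), the temperature, and all names below are illustrative assumptions, not the authors' code; only the 2.25× → 1.1× amplification on reasoning tokens comes from the post.

```python
import numpy as np

def softmax(x, T=1.0):
    z = x / T - np.max(x / T, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def proof_weighted_kd_loss(student_logits, teacher_logits, reasoning_mask,
                           step, total_steps, w_start=2.25, w_end=1.1, T=2.0):
    """Per-token KL(teacher || student) with an amplified weight on
    reasoning tokens, decaying from w_start (2.25x) to w_end (1.1x)
    over training. Linear decay is an assumption."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)  # [batch, seq]
    w = w_start + (w_end - w_start) * (step / max(total_steps, 1))
    weights = np.where(reasoning_mask, w, 1.0)  # amplify only where the mask is True
    return float(np.mean(weights * kl) * T * T)
```

The key design point the post describes survives the simplification: the loss tells the student *where* to spend capacity (reasoning spans) rather than only *what* distribution to match.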

**Stage 2 — Topology-Aware KD (TKD).** Standard KD treats the teacher's distribution as smooth. Language isn't smooth — it has topic shifts, reasoning pivots, register changes. We use Discrepancy Calculus to detect these structural boundaries, then amplify loss at jumps (3σ threshold) and cut training windows at low-discrepancy positions. The student preserves the teacher's structural knowledge, not just surface statistics.
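One plausible reading of the jump-detection step, sketched below. The discrepancy signal (e.g. divergence between consecutive teacher distributions), the 3× loss gain, and the window-cutting heuristic are our assumptions; only the 3σ threshold and the amplify-at-jumps / cut-at-low-discrepancy idea come from the post.

```python
import numpy as np

def detect_jumps(discrepancy, n_sigma=3.0):
    """Flag positions where a per-token discrepancy signal exceeds
    mean + 3*sigma, i.e. candidate structural boundaries."""
    d = np.asarray(discrepancy, dtype=float)
    return d > d.mean() + n_sigma * d.std()

def loss_weights(discrepancy, jump_gain=3.0):
    """Amplify KD loss at detected jumps; smooth regions stay at 1.0."""
    return np.where(detect_jumps(discrepancy), jump_gain, 1.0)

def window_cut_points(discrepancy, k=4):
    """Pick the k lowest-discrepancy positions as training-window
    boundaries, so windows never split a structural jump."""
    d = np.asarray(discrepancy, dtype=float)
    return np.sort(np.argsort(d)[:k])
```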

**Stage 3 — Ghost Imprinting.** Sequential distillation from different teachers leaves residual fields in weight space that neither teacher put there individually. The Cantor component of BV decomposition, applied to parameters. Models distilled Thinking→Coder exhibit deliberation patterns from the Thinking teacher that survived Coder overwriting. Emergent capability from structural residuals.

**Stage 4 — DualMind.** One model, two voices, shared weights:

- `<explore>` — free derivation, speculation
- `<examine>` — adversarial self-critique
- `<response>` — clean synthesis

The multi-model collision array collapsed into a single architecture. Role tokens, no extra parameters.
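Mechanically, role tokens need nothing beyond prompt scaffolding and parsing. A sketch (the token spellings match the post; the helper names and closing-tag convention are our assumptions):

```python
import re

# Hypothetical DualMind-style role tokens: one set of weights, three
# "voices", separated purely by special tokens (no extra parameters).
ROLES = ("explore", "examine", "response")

def build_prompt(question: str) -> str:
    """Ask the shared-weight model to open the explore phase."""
    return f"{question}\n<explore>"

def extract_response(generation: str) -> str:
    """Pull the clean synthesis out of an explore/examine/response transcript."""
    m = re.search(r"<response>(.*?)(?:</response>|$)", generation, re.S)
    return m.group(1).strip() if m else ""
```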
For the full method: reaperdoesntknow/DualMind_Methodolgy (doi: 10.57967/hf/8184).

reaperdoesntknow 
posted an update 21 days ago
We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families (Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM)), total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars:

1. FP32 precision preservation, with experiments demonstrating a 5,810× single-operation error and a 23,225× compounding error ratio for FP16 at network depth;
2. sparse cognitive architectures where 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD;
3. developmental curriculum training progressing from language to logic to transfer to depth;
4. continuous belt-fed data ingestion eliminating truncation waste;
5. hardware-native optimization for AMD Zen 4 via AOCL/OpenMP/NUMA-aware allocation;
6. self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy;
7. open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency.

We argue that transformers were designed for GPU hardware rather than mathematical optimality, and that architectures designed for geometric correctness (metric-space attention, triangle-inequality enforcement, sparse expert routing) naturally favor CPU execution. For sub-2B-parameter models, CPU training produces more capable models at a fraction of the cost.