new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

May 20

SNLP: Layer-Parallel Inference via Structured Newton Corrections

Autoregressive language models execute Transformer layers sequentially, creating a latency bottleneck that is not removed by conventional tensor or pipeline parallelism. We study whether this layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation and solving it with parallel Newton-style updates. While this view is principled, exact Newton corrections require expensive Jacobian-vector products and naive fixed-point iterations are unstable on trained Transformers. We introduce Structured Newton Layer Parallelism (SNLP), a training and inference framework that replaces exact layer Jacobians with cheap architecture-induced surrogate dynamics. In residual Transformers, this yields Identity Newton (IDN), where the correction reduces to a prefix-sum-like update; in mHC-style architectures, HC Newton (HCN) uses the model's residual mixing matrix. We further introduce SNLP-aware regularization, which trains models to make one or a few structured Newton iterations accurately approximate the sequential forward. Experiments on nanochat-scale Transformers show that SNLP regularization improves layer-parallel compatibility and can also improve standard sequential perplexity, reducing baseline PPL by 4.7%-23.4%. At inference time, SNLP combined with layer fusion and chunkwise decomposition achieves practical wall-clock speedups: on a 0.5B Nanochat model, it reaches 2.3x speedup while still improving PPL by 6.1%. These results suggest that layer-parallel inference is not merely a numerical approximation to sequential execution, but can act as a useful solver-induced inference bias. We also characterize limitations: off-the-shelf pretrained models are less amenable to this procedure, and exact convergence recovers the sequential computation rather than providing monotonic inference-time scaling.

RedHatAI Red Hat AI
·
May 17 1

The Newton Scheme for Deep Learning

We introduce a neural network (NN) strictly governed by Newton's Law, with the nature required basis functions derived from the fundamental classic mechanics. Then, by classifying the training model as a quick procedure of 'force pattern' recognition, we developed the Newton physics-based NS scheme. Once the force pattern is confirmed, the neuro network simply does the checking of the 'pattern stability' instead of the continuous fitting by computational resource consuming big data-driven processing. In the given physics's law system, once the field is confirmed, the mathematics bases for the force field description actually are not diverged but denumerable, which can save the function representations from the exhaustible available mathematics bases. In this work, we endorsed Newton's Law into the deep learning technology and proposed Newton Scheme (NS). Under NS, the user first identifies the path pattern, like the constant acceleration movement.The object recognition technology first loads mass information, then, the NS finds the matched physical pattern and describe and predict the trajectory of the movements with nearly zero error. We compare the major contribution of this NS with the TCN, GRU and other physics inspired 'FIND-PDE' methods to demonstrate fundamental and extended applications of how the NS works for the free-falling, pendulum and curve soccer balls.The NS methodology provides more opportunity for the future deep learning advances.

  • 6 authors
·
Oct 15, 2018

Unsupervised Discovery of Formulas for Mathematical Constants

Ongoing efforts that span over decades show a rise of AI methods for accelerating scientific discovery, yet accelerating discovery in mathematics remains a persistent challenge for AI. Specifically, AI methods were not effective in creation of formulas for mathematical constants because each such formula must be correct for infinite digits of precision, with "near-true" formulas providing no insight toward the correct ones. Consequently, formula discovery lacks a clear distance metric needed to guide automated discovery in this realm. In this work, we propose a systematic methodology for categorization, characterization, and pattern identification of such formulas. The key to our methodology is introducing metrics based on the convergence dynamics of the formulas, rather than on the numerical value of the formula. These metrics enable the first automated clustering of mathematical formulas. We demonstrate this methodology on Polynomial Continued Fraction formulas, which are ubiquitous in their intrinsic connections to mathematical constants, and generalize many mathematical functions and structures. We test our methodology on a set of 1,768,900 such formulas, identifying many known formulas for mathematical constants, and discover previously unknown formulas for pi, ln(2), Gauss', and Lemniscate's constants. The uncovered patterns enable a direct generalization of individual formulas to infinite families, unveiling rich mathematical structures. This success paves the way towards a generative model that creates formulas fulfilling specified mathematical properties, accelerating the rate of discovery of useful formulas.

  • 6 authors
·
Dec 21, 2024

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.

  • 5 authors
·
Mar 4, 2021

Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors -- Vision, Implementation Attempt, and Lessons from AI-Assisted Development

We present the design rationale, implementation attempt, and failure analysis of Eyla, a proposed identity-anchored LLM architecture that integrates biologically-inspired subsystems -- including HiPPO-initialized state-space models, zero-initialized adapters, episodic memory retrieval, and calibrated uncertainty training -- into a unified agent operating system running on consumer hardware. Unlike existing approaches that optimize models for generic helpfulness, Eyla targets identity consistency: the ability to maintain a coherent self-model under adversarial pressure, admit uncertainty, and resist manipulation. We propose the Identity Consistency Score (ICS), a novel benchmark for evaluating this property across LLMs. We then present an honest account of attempting to implement this architecture using AI coding assistants (Claude Code, Cursor) as a non-programmer, documenting a $1,000+ failure that produced a 1.27B parameter model with 86 brain subsystems contributing less than 2% to output. Our analysis identifies five systematic failure modes of AI-assisted development for novel architectures and offers concrete recommendations. To our knowledge, this is the first paper to combine an architectural vision with a documented first-person failure analysis of AI-assisted LLM development, providing lessons for both the AI systems and AI-assisted software engineering communities.

  • 1 authors
·
Mar 9

StableIdentity: Inserting Anybody into Anywhere at First Sight

Recent advances in large pretrained text-to-image models have shown unprecedented capabilities for high-quality human-centric generation, however, customizing face identity is still an intractable problem. Existing methods cannot ensure stable identity preservation and flexible editability, even with several images for each subject during training. In this work, we propose StableIdentity, which allows identity-consistent recontextualization with just one face image. More specifically, we employ a face encoder with an identity prior to encode the input face, and then land the face representation into a space with an editable prior, which is constructed from celeb names. By incorporating identity prior and editability prior, the learned identity can be injected anywhere with various contexts. In addition, we design a masked two-phase diffusion loss to boost the pixel-level perception of the input face and maintain the diversity of generation. Extensive experiments demonstrate our method outperforms previous customization methods. In addition, the learned identity can be flexibly combined with the off-the-shelf modules such as ControlNet. Notably, to the best knowledge, we are the first to directly inject the identity learned from a single image into video/3D generation without finetuning. We believe that the proposed StableIdentity is an important step to unify image, video, and 3D customized generation models.

  • 7 authors
·
Jan 29, 2024 2

Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset

Large-scale sharing of dialogue-based data is instrumental for advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce dataset utility. This work asks how PII can be detected in math tutoring transcripts while preserving their educational utility. To address this challenge, we investigate the "numeric ambiguity" problem and introduce MathEd-PII, the first benchmark dataset for PII detection in math tutoring dialogues, created through a human-in-the-loop LLM workflow that audits upstream redactions and generates privacy-preserving surrogates. The dataset contains 1,000 tutoring sessions (115,620 messages; 769,628 tokens) with validated PII annotations. Using a density-based segmentation method, we show that false PII redactions are disproportionately concentrated in math-dense regions, confirming numeric ambiguity as a key failure mode. We then compare four detection strategies: a Presidio baseline and LLM-based approaches with basic, math-aware, and segment-aware prompting. Math-aware prompting substantially improves performance over the baseline (F1: 0.821 vs. 0.379) while reducing numeric false positives, demonstrating that de-identification must incorporate domain context to preserve analytic utility. This work provides both a new benchmark and evidence that utility-preserving de-identification for tutoring data requires domain-aware modeling.

  • 10 authors
·
Feb 17

Source Known Identifiers: A Three-Tier Identity System for Distributed Applications

Distributed applications need identifiers that satisfy storage efficiency, chronological sortability, origin metadata embedding, zero-lookup verifiability, confidentiality for external consumers, and multi-century addressability. Based on our literature survey, no existing scheme provides all six of these identifier properties within a unified system. This paper introduces Source Known Identifiers (SKIDs), a three-tier identity system that projects a single entity identity across trust boundaries, addressing all six properties. The first tier, Source Known ID (SKID), is a 64-bit signed integer embedding a timestamp with a 250-millisecond precision, application topology, and a per-entity-type sequence counter. It serves as the database primary key, providing compact storage (8 bytes) and natural B-tree ordering for optimized database indexing. The second tier, Source Known Entity ID (SKEID), extends the SKID into a 128-bit Universally Unique Identifier (UUID) compatible value by adding an entity type discriminator, an epoch selector, and a BLAKE3 keyed message authentication code (MAC). SKEIDs enable zero-lookup verification of identifier origin, integrity, and entity type within trusted environments, with a big-endian byte layout that preserves chronological ordering in lexicographic UUID string comparisons. The third tier, Secure SKEID, encrypts the entire SKEID using AES-256 symmetric encryption as a single-block pseudorandom permutation, producing ciphertext indistinguishable from random bytes while remaining compatible with standard UUID data-type parsers in string representation. Deterministic bidirectional transformations connect all three tiers.

  • 1 authors
·
Mar 30

Proteus-ID: ID-Consistent and Motion-Coherent Video Customization

Video identity customization seeks to synthesize realistic, temporally coherent videos of a specific subject, given a single reference image and a text prompt. This task presents two core challenges: (1) maintaining identity consistency while aligning with the described appearance and actions, and (2) generating natural, fluid motion without unrealistic stiffness. To address these challenges, we introduce Proteus-ID, a novel diffusion-based framework for identity-consistent and motion-coherent video customization. First, we propose a Multimodal Identity Fusion (MIF) module that unifies visual and textual cues into a joint identity representation using a Q-Former, providing coherent guidance to the diffusion model and eliminating modality imbalance. Second, we present a Time-Aware Identity Injection (TAII) mechanism that dynamically modulates identity conditioning across denoising steps, improving fine-detail reconstruction. Third, we propose Adaptive Motion Learning (AML), a self-supervised strategy that reweights the training loss based on optical-flow-derived motion heatmaps, enhancing motion realism without requiring additional inputs. To support this task, we construct Proteus-Bench, a high-quality dataset comprising 200K curated clips for training and 150 individuals from diverse professions and ethnicities for evaluation. Extensive experiments demonstrate that Proteus-ID outperforms prior methods in identity preservation, text alignment, and motion quality, establishing a new benchmark for video identity customization. Codes and data are publicly available at https://grenoble-zhang.github.io/Proteus-ID/.

  • 7 authors
·
Jun 30, 2025

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mistake changes in texture or color palette for identity drift or fail to detect geometric exaggerations. This reveals the lack of a style-agnostic framework to evaluate and supervise identity consistency across varying styles and strengths. To address this gap, we introduce StyleID, a human perception-aware dataset and evaluation framework for facial identity under stylization. StyleID comprises two datasets: (i) StyleBench-H, a benchmark that captures human same-different verification judgments across diffusion- and flow-matching-based stylization at multiple style strengths, and (ii) StyleBench-S, a supervision set derived from psychometric recognition-strength curves obtained through controlled two-alternative forced-choice (2AFC) experiments. Leveraging StyleBench-S, we fine-tune existing semantic encoders to align their similarity orderings with human perception across styles and strengths. Experiments demonstrate that our calibrated models yield significantly higher correlation with human judgments and enhanced robustness for out-of-domain, artist drawn portraits. All of our datasets, code, and pretrained models are publicly available at https://kwanyun.github.io/StyleID_page/

  • 6 authors
·
Apr 22 4

I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion

Large Language Models (LLMs) excel in diverse tasks such as text generation, data analysis, and software development, making them indispensable across domains like education, business, and creative industries. However, the rapid proliferation of LLMs (with over 560 companies developing or deploying them as of 2024) has raised concerns about their originality and trustworthiness. A notable issue, termed identity confusion, has emerged, where LLMs misrepresent their origins or identities. This study systematically examines identity confusion through three research questions: (1) How prevalent is identity confusion among LLMs? (2) Does it arise from model reuse, plagiarism, or hallucination? (3) What are the security and trust-related impacts of identity confusion? To address these, we developed an automated tool combining documentation analysis, self-identity recognition testing, and output similarity comparisons--established methods for LLM fingerprinting--and conducted a structured survey via Credamo to assess its impact on user trust. Our analysis of 27 LLMs revealed that 25.93% exhibit identity confusion. Output similarity analysis confirmed that these issues stem from hallucinations rather than replication or reuse. Survey results further highlighted that identity confusion significantly erodes trust, particularly in critical tasks like education and professional use, with declines exceeding those caused by logical errors or inconsistencies. Users attributed these failures to design flaws, incorrect training data, and perceived plagiarism, underscoring the systemic risks posed by identity confusion to LLM reliability and trustworthiness.

  • 8 authors
·
Nov 15, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds

There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.

  • 5 authors
·
Jan 15, 2024 8