What if “Attention Is All You Need”… isn’t entirely true?

At Ankpal, we spend a lot of time thinking about systems – how they behave, how they scale, and more importantly, how they can be improved at a fundamental level.

While most of the world is building on top of AI – agents, wrappers, workflows – we believe it’s equally important to question the foundations themselves.

Because sometimes, the biggest breakthroughs don’t come from optimization…
they come from rethinking the assumptions.


📄 A New Direction in Sequence Modeling

We’re proud to share a recent research paper by our CTO, Gowrav Vishwakarma:

👉 https://arxiv.org/abs/2604.05030?context=cs.AI

This work introduces Phase-Associative Memory (PAM) – a sequence modeling approach that explores an alternative to the dominant transformer architecture.

Co-authored with Christopher J. Agostino, the work brings together perspectives from machine learning, physics, and quantum-inspired semantics.


🧠 Rethinking How We Model Language

Modern AI systems largely rely on a core assumption:

Language can be modeled as patterns in a real-valued vector space, optimized through attention and next-token prediction.

PAM explores a different possibility:

👉 What if meaning is not just about predicting the next token…
👉 but about interpreting a state in a richer mathematical space?

Instead of operating purely in real-valued representations, PAM works in a complex-valued Hilbert space, where:

  • representations carry phase, not just magnitude
  • memory is built through associative accumulation (outer products)
  • retrieval depends on phase alignment, rather than softmax attention
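
To make those three ideas concrete, here is a minimal sketch of a complex-valued outer-product memory in plain Python. It is an illustration only, not the architecture from the paper: the helper names (`fourier_key`, `store`, `retrieve`), the Fourier-style keys, and the 1/dim scaling are all assumptions chosen so the arithmetic is easy to follow.

```python
import cmath
import math

def fourier_key(index, dim):
    """Hypothetical key: unit-magnitude entries whose phase advances by 2*pi*index/dim."""
    return [cmath.exp(2j * math.pi * index * j / dim) for j in range(dim)]

def empty_memory(rows, cols):
    """A rows x cols matrix of complex zeros."""
    return [[0j] * cols for _ in range(rows)]

def store(memory, key, value):
    """Associative accumulation: add the outer product of value and conj(key)."""
    for r, v in enumerate(value):
        for c, k in enumerate(key):
            memory[r][c] += v * k.conjugate()

def retrieve(memory, query):
    """Read-out: memory times query, scaled by 1/dim. No softmax is involved."""
    dim = len(query)
    return [sum(m * q for m, q in zip(row, query)) / dim for row in memory]

DIM = 4
key_a, key_b = fourier_key(1, DIM), fourier_key(2, DIM)
value_a = [1 + 2j, 0.5j]
value_b = [-3 + 0j, 2 - 1j]

memory = empty_memory(len(value_a), DIM)
store(memory, key_a, value_a)
store(memory, key_b, value_b)

# Query phases line up with key_a, so value_a adds up coherently
# while the contribution stored under key_b cancels out.
recalled_a = retrieve(memory, key_a)

# A query aligned with nothing that was stored reads back (numerically) zero.
unmatched = retrieve(memory, fourier_key(3, DIM))

# Rotating the query by 90 degrees rotates the read-out by the same phase.
quarter_turn = cmath.exp(1j * math.pi / 2)
rotated = retrieve(memory, [quarter_turn * k for k in key_a])
```

In this toy setup the keys are exactly orthogonal, so recall is clean. With learned, non-orthogonal keys there would be crosstalk between stored items, and how a real model handles that is outside the scope of this sketch.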

⚙️ Early Results (With Honest Context)

At ~100M parameters on WikiText-103:

  • PAM reaches a validation perplexity of ~30
  • A matched transformer reaches ~27

So no – this is not outperforming transformers.

But here’s what makes it interesting:

  • It currently runs with ~4× computational overhead
  • Yet its perplexity stays within roughly 10% of the transformer’s
  • And does so without heavy optimization or custom kernels

This suggests something deeper:

The underlying representation may be capturing structure efficiently – even in an early, unoptimized state.
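
For readers who want to check the size of the gap themselves, here is a small worked calculation using the rounded perplexities quoted above. It assumes the usual definition of perplexity as the exponential of the mean per-token cross-entropy in nats; the variable names are ours, and the inputs are the approximate figures from this post, not exact values from the paper.

```python
import math

ppl_pam = 30.0          # approximate validation perplexity quoted above for PAM
ppl_transformer = 27.0  # approximate validation perplexity quoted above for the transformer

# Relative gap in perplexity.
relative_gap = (ppl_pam - ppl_transformer) / ppl_transformer

# Same gap expressed as a difference in per-token loss (nats),
# assuming perplexity = exp(mean cross-entropy).
loss_gap_nats = math.log(ppl_pam) - math.log(ppl_transformer)

print(f"relative perplexity gap: {relative_gap:.1%}")    # 11.1%
print(f"per-token loss gap: {loss_gap_nats:.3f} nats")   # 0.105 nats
```

With these rounded inputs the gap comes out at about 11%, or roughly a tenth of a nat per token.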


🔍 Why This Matters

The industry today is heavily focused on scaling:

  • bigger models
  • more data
  • more compute

But there’s a growing question:

Are we scaling the right abstraction?

If language and meaning are fundamentally contextual and non-separable, then representing them purely in real-valued space might be an approximation – not a complete solution.

PAM doesn’t claim to replace transformers.

But it opens a direction:

👉 There may be other mathematical frameworks better suited for modeling language.


🚀 What This Says About Ankpal

This research is independent of Ankpal’s core product – but it reflects something fundamental about how we think as a team.

At Ankpal, we don’t just focus on building solutions.
We value:

  • first-principles thinking
  • questioning established norms
  • and exploring ideas that may not pay off immediately but could have long-term impact

Because innovation doesn’t always come from doing things faster –
sometimes it comes from asking:

“Are we even solving this the right way?”


🌱 Looking Ahead

This is early work.

It’s not optimized.
It’s not scaled.
It’s not production-ready.

But it points toward something worth exploring.

And if there’s even a small chance that language is better modeled beyond real-valued space…

Then there’s a lot more left to discover.


💬 We’d Love to Hear Your Thoughts

If you’re working on:

  • alternative architectures
  • interpretability
  • or new mathematical approaches to AI

we’d love to connect and discuss.

– Gowrav Vishwakarma,
CTO,
Ankpal https://ankpal.com
https://www.linkedin.com/in/gowravvishwakarma/
