Transformers and Large Language Models: Intro to the foundational architecture of Generative AI

July 31, 2025 · 4 min read

More than a billion users later, LLMs have been adopted faster than any technology in history, powered by the experience shift from “searching” for information and being presented with links to getting targeted answers “generated” in milliseconds. With this shift came fast-changing expectations across enterprise software, telco operations and daily productivity.

What is happening, and what is driving this leap into the future?

It’s been one year since Partha Seetala, president of Rakuten Cloud, launched his AI training series, A Comprehensive and Intuitive Introduction to Deep Learning (CIDL).

The first season opened with “An Intuitive Introduction to Neural Networks,” delivering a clear message: in today’s landscape, not understanding AI can seriously hold back your career or your product. What makes Partha’s sessions stand out is how well they balance depth and accessibility, crafted to make complex AI concepts understandable, whether you're an engineer or an executive. They have become required viewing for teams in telco, tech and beyond.

Recently, we discussed key takeaways from season two, which focused on how neural networks process sequence data like text and time-series information. This spanned techniques like embeddings, RNNs, LSTMs, Seq2Seq and attention. (Check out our interview with Partha on the role of these approaches from last week’s Zero-Touch Live.)

Understanding AI model behavior and how to influence it is critical. Equally important is understanding the role architecture plays.

In season three, viewers learn the details behind how Large Language Models (LLMs) work, including how machines compress large volumes of human knowledge into a transformer neural network and present it back in highly targeted ways when queried. Season three will focus not just on the what and how of LLMs and transformers, but also the why: in particular, why their components are structured the way they are.

Episode one is now available and kicks off with the architecture that redefined AI: the transformer, the architecture powering today’s LLMs and enterprise AI systems.

Why transformers matter

Transformers represent a leap in design, introducing parallelism, context awareness and general-purpose learning. They are the foundation of modern LLMs, evolving generative AI from theory to practical deployment and giving us household names like ChatGPT, Gemini and Claude.

Two breakthroughs have been incredibly important:

  • Positional encoding enabled models to process entire sequences in parallel, rather than one word at a time as RNNs do.
  • Self-attention allowed models to dynamically weigh context and meaning for each word/token (i.e., not just memorize, but actually understand).
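The two breakthroughs above can be sketched in a few lines of NumPy. This is a minimal, untrained illustration rather than anything from the course itself: `sinusoidal_pe` follows the standard sine/cosine positional encoding, and `self_attention` computes single-head scaled dot-product attention with the projection weights omitted for brevity; all names and dimensions are illustrative.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Positional encoding: give every position a unique numeric signature."""
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dims: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dims: cosine
    return pe

def self_attention(x):
    """Single-head scaled dot-product self-attention (no learned weights)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # every token scores every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ x                            # context-weighted mix of tokens

tokens = np.random.randn(5, 16)                   # 5 "words", 16-dim embeddings
out = self_attention(tokens + sinusoidal_pe(5, 16))
print(out.shape)  # (5, 16): one context-aware vector per token
```

Because the positional signal is added before attention, the whole sequence can be processed in one parallel matrix operation, with word order preserved in the numbers rather than in sequential processing.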

In telecom especially, AI models won’t be delivered as boxed solutions from vendors. (If you are offered one, be incredibly wary!) Rather, these models are becoming embedded in infrastructure, workflows and especially data. That means engineers must understand how transformers fundamentally work.

It goes back to Partha’s recurring mantra that AI cannot be viewed as a black box or magic.

What to expect in season three

Season three dives into three transformer types:

  • Encoder-only. Used for classification, extractive QA, etc. (e.g., BERT, Electra).
  • Decoder-only. Used for generative tasks, including LLMs (e.g., GPT).
  • Encoder-decoder. Used for translation and generation tasks (e.g., T5, MarianMT, BART).

As in previous seasons, the focus is on intuitive understanding, not just formulas. With this in mind, Partha breaks down each architectural component of the Transformer, including embedding, positional encoding, self-attention, feedforward layers, normalization and stacking:

  • Embedding. Words are turned into dense vectors so the model can “see” them as numbers.
  • Positional encoding. Extra numbers are added to tell the model where each word sits in the sentence.
  • Self-attention. Every word looks at every other word to decide which ones matter most.
  • Feed-forward layers. Simple neural nets give each word a quick, non-linear polish between attention rounds.
  • Normalization. Outputs are scaled and shifted so training stays stable and fast.
  • Stacking. Blocks are piled atop one another to build deeper, more powerful understanding.
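Taken together, those six components map onto a short NumPy sketch of one encoder block, stacked three times. This is a toy sketch under simplifying assumptions (single attention head, random untrained weights, simplified layer norm without learned scale/shift); function and variable names are illustrative, not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalization: rescale each token's vector so training stays stable."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(x, wq, wk, wv, w1, w2):
    """One encoder block: self-attention then feed-forward, each with residual + norm."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v  # self-attention
    x = layer_norm(x + attn)                            # residual + normalization
    ffn = np.maximum(0, x @ w1) @ w2                    # feed-forward "polish" (ReLU)
    return layer_norm(x + ffn)                          # residual + normalization

d, hidden, seq = 16, 64, 5
x = rng.standard_normal((seq, d))      # embedded + position-encoded tokens
for _ in range(3):                     # stacking: 3 blocks, each with its own weights
    shapes = [(d, d)] * 3 + [(d, hidden), (hidden, d)]
    params = [rng.standard_normal(s) * 0.1 for s in shapes]
    x = transformer_block(x, *params)
print(x.shape)  # (5, 16)
```

Each block keeps the same (sequence length, model dimension) shape, which is exactly what makes stacking work: the output of one block is a valid input to the next, and depth compounds the model’s capacity to understand.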

Throughout the season, approaches for training and fine-tuning will be covered, as well as the role of emergent behavior, agent architectures, retrieval-augmented generation (RAG) and reasoning models. This means ultimately expanding focus beyond today’s LLMs.

This course isn’t just for teams building models from scratch. It’s equally valuable for evaluating, tuning and integrating foundation models into real systems. This is especially true in telecom, where alignment with operational data, constraints and intent is essential.

Check out season three today

In telecom and enterprise tech, deploying AI isn’t just about what models can do but about understanding how they work. Season three teaches the architectural fluency to build, adapt and apply transformer models in ways that align with real-world constraints and goals. Episode one is available now with more episodes on the way soon.

Have a question for Partha Seetala or want to see specific topics covered in an upcoming course? Mention him in the comments to start a conversation. And remember to subscribe to the Zero-Touch newsletter to have insights like these sent to your inbox every week.

Partha Seetala
President, Cloud BU + Chief AI Officer, Rakuten Symphony
