
Transformers in NLP – Text Generation, Summarization, and Translation

The Transformer architecture has reshaped NLP by enabling models to generate, summarize, and translate text more effectively than earlier recurrent approaches such as RNNs and LSTMs. Self-attention lets each token draw directly on every other token in the sequence, and because sequences are processed in parallel rather than one step at a time, Transformers have significantly improved the quality, fluency, and speed of text-based applications.

This lecture explores three major applications of Transformers in NLP:

  • Text Generation – Autoregressive models like GPT create human-like text.
  • Summarization – Encoder-decoder models like BART generate concise summaries.
  • Translation – Seq2Seq models like T5 and mT5 enable high-quality translations.

1. Text Generation with Transformers

How Transformers Generate Text

Text generation is typically performed by decoder-only Transformers, such as the GPT family (GPT-2, GPT-3, GPT-4), which use autoregressive language modeling. These models generate text one token at a time, predicting the next token based on previously generated tokens.
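The token-by-token loop described above can be sketched without any model download. This is a minimal illustration, not how GPT is implemented: the probability table `NEXT_TOKEN_PROBS` and the helper `toy_next_token_probs` are hypothetical stand-ins for a trained network, and the toy looks only at the last token, whereas a real Transformer conditions on the entire preceding sequence. Greedy decoding (always taking the most likely token) is one of several possible decoding strategies.

```python
# Hypothetical next-token table standing in for a trained language model.
NEXT_TOKEN_PROBS = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"sat": 0.8, "<eos>": 0.2},
    "dog": {"sat": 0.6, "<eos>": 0.4},
    "sat": {"<eos>": 1.0},
}


def toy_next_token_probs(tokens):
    """Return a next-token distribution given the tokens generated so far.

    Simplification: only the last token is used. A real decoder-only
    Transformer attends to every previous token.
    """
    return NEXT_TOKEN_PROBS.get(tokens[-1], {"<eos>": 1.0})


def greedy_generate(prompt_tokens, max_new_tokens=10):
    """Autoregressive loop: predict a token, append it, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = toy_next_token_probs(tokens)
        next_token = max(probs, key=probs.get)  # greedy: most likely token
        if next_token == "<eos>":               # stop at end-of-sequence
            break
        tokens.append(next_token)               # output becomes next input
    return tokens


print(greedy_generate(["<bos>"]))
```

Each generated token is appended to the sequence and fed back in on the next step, which is what makes the process autoregressive.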

Key Features of Transformer-Based Text Generation

  • Autoregressive Modeling: Uses past tokens to predict the next token.
  • Few-Shot and Zero-Shot Learning: GPT models can often handle a new task from a few examples in the prompt (few-shot) or from an instruction alone (zero-shot), without task-specific fine-tuning.
  • Controllable Output: By using prompts, models can be guided to generate structured text.
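To make the few-shot and prompt-control ideas concrete, here is a small sketch of how a few-shot prompt might be assembled as plain text. The function name `build_few_shot_prompt` and the "Input:/Output:" layout are illustrative assumptions, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt.

    With an empty `examples` list this produces a zero-shot prompt.
    The "Input:/Output:" layout is an illustrative convention only.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

The resulting string can be passed as the prompt to an autoregressive model; the trailing "Output:" steers the model toward completing the pattern set by the examples.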

Example: Using GPT for Text Generation

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import warnings

def generate_text(prompt, max_length=150, model_name="gpt2"):
    """
    Generate text using a pre-trained language model with improved configuration.

    Args:
        prompt (str): Starting text to generate from
        max_length (int): Maximum total length of generated text
        model_name (str): Hugging Face model identifier

    Returns:
        str: Generated text
    """
    # Suppress specific HuggingFace warnings
    warnings.filterwarnings("ignore", category=UserWarning)

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Create text generation pipeline with explicit truncation
    generator = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
        truncation=True,  # Explicitly set truncation
        pad_token_id=tokenizer.eos_token_id  # Set padding token
    )

    # Generate text
    try:
        generated_texts = generator(prompt)
        return generated_texts[0]['generated_text']
    except Exception as e:
        print(f"Error during text generation: {e}")
        return prompt

# Example usage
def main():
    # Different prompts to showcase text generation
    prompts = [
        "Once upon a time,",
        "In a world where technology revolutionized everything,",
        "The curious scientist discovered",
    ]

    # Generate text for each prompt
    for prompt in prompts:
        print(f"Prompt: {prompt}")
        result = generate_text(prompt, max_length=200)
        print(f"Generated Text: {result}\n")

if __name__ == "__main__":
    main()

Output (the text is sampled, so it will differ from run to run):

Prompt: Once upon a time,
Device set to use cpu
Generated Text: Once upon a time, our country is plagued by endless conflicts over territory, including the borders of the US, Europe, and the Middle East.

President Donald Trump was not on the same page at all, with many Democrats citing the Russian annexation of Crimea and the U.S. attack.

In a statement issued Wednesday, White House press secretary Sean Spicer said "there are no issues" over the White House's decision to not issue a travel ban, despite the fact Russia and its Foreign Minister Lavrov had warned of "significant potential" repercussions.

"The decision will have a very significant impact within the context of our country and our security. Those who want to travel should be welcome," he said.

"Our country should be welcoming all those who wish to come to our country, and also be respectful of those Americans who have been murdered or tortured on a foreign soil," the statement continued. "We want to ensure that our border is not taken over by people who

Prompt: In a world where technology revolutionized everything,
Device set to use cpu
Generated Text: In a world where technology revolutionized everything, the internet is becoming a global issue. And while I believe that there will always be more people with more options online, the challenges of moving to digital platforms like Facebook and Twitter, and more freedom from censorship continue to define us.

We'll continue to work to put a human face into our world. But let's not overdo the message by assuming we've built an infinite computer system. For this reason, the Internet is a powerful tool to push us to new heights of understanding about the future of all people.

Today in the world, there is an incredible amount of science to understand how our brains are capable of understanding everything we do, and how we can improve our lives and our ability to live. But we don't have the technology to create a computerized world. The real danger is not technology but it's ignorance: the very idea that you can write a computer program in three days, with the same kind of complexity and

Prompt: The curious scientist discovered
Device set to use cpu
Generated Text: The curious scientist discovered that not only was this an astonishing discovery. "The theory says that if you were to find out if something is wrong with your DNA from inside a person's own person, you would find out," he explained. Dr Steve Bove, director of animal services at the Natural History Museum of London, told BBC News: "This is very much an exciting discovery and a fascinating study. "What is a known way to tell us if something is an animal? "We still don't know very much about these organisms, but here we can tell us much about how they function, even if there were no such thing as a human-like organism yet."

2. Summarization with Transformers

How Transformers Summarize Text

Summarization is commonly handled by encoder-decoder Transformer models, such as:

  • BART (Bidirectional and Auto-Regressive Transformers)
  • T5 (Text-to-Text Transfer Transformer)
  • PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization)

These models first encode the input text, then decode a shorter version while retaining key information.

Types of Summarization

  • Extractive Summarization: Selects key sentences from the input (e.g., BERTSUM).
  • Abstractive Summarization: Generates a new, shorter version in its own words (e.g., BART, T5).
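The extractive approach can be illustrated with a deliberately simple, non-neural sketch: score each sentence by how frequent its words are in the document and keep the top scorers. This is not how BERTSUM works (it scores sentences with a fine-tuned BERT encoder); the function `extractive_summary` and its frequency-based scoring are assumptions chosen only to show what "selecting key sentences" means.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order.

    Scoring is the average document-wide frequency of a sentence's words.
    This is a toy heuristic with no stop-word handling.
    """
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


sample = "Cats sleep. Cats eat fish and cats play. Dogs bark."
print(extractive_summary(sample, num_sentences=1))
```

Every sentence in the result is copied verbatim from the input, which is the defining property of extractive summarization. An abstractive model such as BART instead generates new wording, as in the pipeline example that follows.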

Real-World Applications

  • News Summarization (e.g., Google News)
  • Legal Document Summarization
  • Scientific Paper Summarization (e.g., Semantic Scholar)

Example: Summarization with BART

 
import textwrap
from transformers import pipeline

def demonstrate_bart_summarization():
    # Initialize the BART summarization pipeline
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    
    # Different types of text to summarize
    texts = {
        "Scientific": "Transformers have significantly changed the landscape of Natural Language Processing (NLP) by introducing the self-attention mechanism. Unlike traditional Recurrent Neural Networks (RNNs), transformer models can process entire sequences in parallel, leading to more efficient and effective language understanding and generation.",
        
        "News Article": "A breakthrough in renewable energy technology has been announced by a team of researchers at Stanford University. The new solar panel design increases energy conversion efficiency by 40% compared to current commercial panels. This innovation could potentially reduce solar energy costs and make sustainable power more accessible to communities worldwide.",
        
        "Academic": "The interdisciplinary study of cognitive neuroscience explores the complex relationships between brain functions and cognitive processes. By integrating methodologies from psychology, biology, and computer science, researchers aim to develop comprehensive models of human perception, memory, and decision-making mechanisms."
    }
    
    # Summarization parameters
    summarization_params = {
        "max_length": 50,   # Maximum length of summary
        "min_length": 20,   # Minimum length of summary
        "do_sample": False  # Use deterministic summarization
    }
    
    # Demonstrate summarization for each text
    print("BART Summarization Examples:\n")
    for text_type, text in texts.items():
        print(f"{text_type} Text:")
        print("Original:")
        print(textwrap.fill(text, width=80))
        print("\nSummary:")
        summary = summarizer(text, **summarization_params)[0]['summary_text']
        print(textwrap.fill(summary, width=80))
        print("\n" + "-"*80 + "\n")

# Run the demonstration
demonstrate_bart_summarization() 

Output:

BART Summarization Examples:

Scientific Text:
Original:
Transformers have significantly changed the landscape of Natural Language
Processing (NLP) by introducing the self-attention mechanism. Unlike traditional
Recurrent Neural Networks (RNNs), transformer models can process entire
sequences in parallel, leading to more efficient and effective language
understanding and generation.

Summary:
Transformers can process entire sequences in parallel, leading to more efficient
and effective language understanding and generation. Transformers have
significantly changed the landscape of Natural Language Processing (NLP) by
introducing the self-attention mechanism.

--------------------------------------------------------------------------------

News Article Text:
Original:
A breakthrough in renewable energy technology has been announced by a team of
researchers at Stanford University. The new solar panel design increases energy
conversion efficiency by 40% compared to current commercial panels. This
innovation could potentially reduce solar energy costs and make sustainable
power more accessible to communities worldwide.

Summary:
A breakthrough in renewable energy technology has been announced. The new solar
panel design increases energy conversion efficiency by 40% compared to current
commercial panels.

--------------------------------------------------------------------------------

Academic Text:
Original:
The interdisciplinary study of cognitive neuroscience explores the complex
relationships between brain functions and cognitive processes. By integrating
methodologies from psychology, biology, and computer science, researchers aim to
develop comprehensive models of human perception, memory, and decision-making
mechanisms.

Summary:
Interdisciplinary study of cognitive neuroscience explores the complex
relationships between brain functions and cognitive processes. Researchers aim
to develop comprehensive models of human perception, memory, and decision-making
mechanisms.

--------------------------------------------------------------------------------



3. Translation with Transformers

How Transformers Handle Machine Translation

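Machine translation (MT) uses encoder-decoder architectures, where:

  • The encoder processes the source language.
  • The decoder generates text in the target language.

Popular Transformer models for translation include:

  • T5: A general-purpose text-to-text model; the original checkpoints cover translation from English to German, French, and Romanian.
  • mT5 (Multilingual T5): A multilingual variant of T5 pretrained on text in 101 languages; it needs fine-tuning before it can translate.
  • MarianMT (Helsinki-NLP OPUS-MT): Compact models, each trained for a specific language pair.
  • M2M-100: Facebook’s model that translates directly between 100 languages without relying on English as an intermediate.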
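T5's text-to-text framing means the task is selected by a plain-text prefix on the input rather than by a separate model head. The sketch below builds such an input string; the helper name `t5_translation_input` and the language-code mapping are illustrative assumptions, while the prefix wording follows the convention used by the original T5 checkpoints for their English to German, French, and Romanian translation tasks.

```python
# Target languages the original T5 checkpoints were trained to translate into
# from English. The two-letter keys are a convenience chosen for this sketch.
T5_TARGET_LANGUAGES = {"de": "German", "fr": "French", "ro": "Romanian"}


def t5_translation_input(text, target_lang):
    """Build the prefixed input string that T5 expects for translation."""
    try:
        target_name = T5_TARGET_LANGUAGES[target_lang]
    except KeyError:
        raise ValueError(f"Unsupported target language code: {target_lang!r}")
    return f"translate English to {target_name}: {text}"


print(t5_translation_input("Machine learning is fascinating", "de"))
```

The returned string would then be tokenized and passed to the model like any other input. Swapping the prefix (for example to `summarize: `) switches the task without changing the model.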

Example: Translating Text with MarianMT

 

from transformers import MarianMTModel, MarianTokenizer

def translate_text(text, source_lang, target_lang):
    """
    Translate text between languages using Helsinki-NLP's Marian MT models.
    
    Args:
        text (str): Text to be translated
        source_lang (str): Source language code (e.g., 'en', 'fr', 'es')
        target_lang (str): Target language code
    
    Returns:
        str: Translated text
    """
    try:
        # Format language codes for model name
        source_lang = source_lang.lower()
        target_lang = target_lang.lower()
        
        # Helsinki-NLP publishes one OPUS-MT model per language pair,
        # named opus-mt-{source}-{target} (e.g., opus-mt-fr-es)
        model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
        
        # Load tokenizer and model
        tokenizer = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        
        # Tokenize and translate
        inputs = tokenizer(text, return_tensors="pt", padding=True)
        translated = model.generate(**inputs, max_length=100)
        translated_text = tokenizer.batch_decode(translated, skip_special_tokens=True)[0]
        
        return translated_text
    
    except Exception as e:
        print(f"Detailed error: {type(e).__name__} - {str(e)}")
        
        # If direct translation failed, try using English as pivot
        if source_lang != "en" and target_lang != "en":
            try:
                print("Trying two-step translation via English...")
                # First translate to English
                en_model_name = f"Helsinki-NLP/opus-mt-{source_lang}-en"
                en_tokenizer = MarianTokenizer.from_pretrained(en_model_name)
                en_model = MarianMTModel.from_pretrained(en_model_name)
                
                # Source → English
                en_inputs = en_tokenizer(text, return_tensors="pt", padding=True)
                en_translated = en_model.generate(**en_inputs, max_length=100)
                english_text = en_tokenizer.batch_decode(en_translated, skip_special_tokens=True)[0]
                
                # English → Target
                target_model_name = f"Helsinki-NLP/opus-mt-en-{target_lang}"
                target_tokenizer = MarianTokenizer.from_pretrained(target_model_name)
                target_model = MarianMTModel.from_pretrained(target_model_name)
                
                # Translate the English text into the target language
                target_inputs = target_tokenizer(english_text, return_tensors="pt", padding=True)
                final_translated = target_model.generate(**target_inputs, max_length=100)
                final_text = target_tokenizer.batch_decode(final_translated, skip_special_tokens=True)[0]
                
                return final_text
            except Exception as pivot_error:
                return f"Translation failed: {str(e)}. Pivot translation also failed: {str(pivot_error)}"
        
        return f"Translation failed: {str(e)}"

def main():
    # Translation examples
    translations = [
        {"text": "Hello, how are you?", "source": "en", "target": "fr"},
        {"text": "Bonjour le monde", "source": "fr", "target": "es"},
        {"text": "Machine learning is fascinating", "source": "en", "target": "de"}
    ]
    
    # Perform translations
    for translation in translations:
        result = translate_text(
            translation['text'], 
            translation['source'], 
            translation['target']
        )
        print(f"Original ({translation['source']}): {translation['text']}")
        print(f"Translated ({translation['target']}): {result}\n")

if __name__ == "__main__":
    main()

Output:

Original (en): Hello, how are you?
Translated (fr): Bonjour, comment allez-vous ?

Original (fr): Bonjour le monde
Translated (es): Hola, mundo.

Original (en): Machine learning is fascinating
Translated (de): Maschinelles Lernen ist faszinierend

Advancements in Translation

  • Zero-Shot Translation: Multilingual models can translate between language pairs that never appeared together in their training data.
  • Massive Multilingual Training: Allows translation across 100+ languages.
  • Cross-Lingual Knowledge Sharing: Improves translation quality by leveraging multiple languages in training.

4. Challenges and Future Directions

While Transformers have made breakthroughs, challenges remain:

  • Computational Costs: Large models require high memory and processing power.
  • Bias and Hallucinations: Models can generate inaccurate translations or biased summaries.
  • Handling Low-Resource Languages: Many languages lack sufficient training data.
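The computational-cost point can be made concrete with some back-of-the-envelope arithmetic. The sketch below estimates the memory needed just to hold a model's weights, and counts the entries in a self-attention score matrix, which grows with the square of the sequence length. The function names are hypothetical, and the weight estimate ignores activations, optimizer state, and caches, so real usage is higher.

```python
# Bytes used per parameter for common numeric formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float32"):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def attention_score_entries(seq_len, num_heads=1):
    """Entries in the attention score matrices for one layer.

    Each head compares every token with every other token, so the
    count grows quadratically with sequence length.
    """
    return num_heads * seq_len * seq_len


# GPT-2 small has roughly 124 million parameters
gpt2_params = 124_000_000
print(f"float32 weights: {weight_memory_gib(gpt2_params):.2f} GiB")
print(f"float16 weights: {weight_memory_gib(gpt2_params, 'float16'):.2f} GiB")
print(f"Score entries at 1024 tokens: {attention_score_entries(1024):,}")
print(f"Score entries at 2048 tokens: {attention_score_entries(2048):,}")
```

Doubling the sequence length quadruples the attention score count, which is one reason long documents are expensive to summarize or translate in a single pass.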