Getting Started with Generative AI: Running Your First Text Generation Model in Google Colab

Language models are the backbone of modern AI applications, enabling powerful text generation, understanding, and completion. If you’re looking to experiment with large language models (LLMs), one of the best places to start is the Hugging Face Hub—a platform that hosts a vast collection of pre-trained models across various domains, including text, image, and audio processing.

At the time of writing, Hugging Face offers more than 800,000 models, making it an invaluable resource for developers and AI enthusiasts. Among these, Phi-3-mini is an exciting model. With 3.8 billion parameters, it balances efficiency and performance, making it suitable for devices with limited GPU memory. It even supports quantization, allowing it to run on systems with less than 6GB of VRAM.
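As a concrete illustration of the quantization point, here is a hedged configuration sketch for loading Phi-3-mini in 4-bit precision via the bitsandbytes integration in Transformers. This is not required for the rest of the tutorial (the code below loads the model in full precision), and it assumes a CUDA GPU and the `bitsandbytes` package are installed:

```python
# Sketch only: 4-bit quantized loading to fit under ~6GB of VRAM.
# Assumes `pip install bitsandbytes` and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    quantization_config=quant_config,
)
```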

Loading a Language Model and Tokenizer

Before we generate text, we need to load two key components:

  1. The generative model – This is the AI model responsible for producing text.
  2. The tokenizer – This converts input text into tokens (numeric IDs) that the model can process, and converts generated tokens back into text.
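
To build intuition for what the tokenizer does, here is a purely conceptual sketch using a hand-built toy vocabulary. Real tokenizers, including Phi-3's, use subword vocabularies with tens of thousands of entries, so the IDs below are illustrative only:

```python
# Toy vocabulary mapping words to token IDs (illustrative, not Phi-3's).
toy_vocab = {"create": 0, "a": 1, "funny": 2, "joke": 3, "about": 4, "ducks": 5}

def toy_tokenize(text):
    """Map each lowercase word to its ID in the toy vocabulary."""
    return [toy_vocab[word] for word in text.lower().split()]

def toy_detokenize(ids):
    """Map token IDs back to words."""
    inverse = {i: w for w, i in toy_vocab.items()}
    return " ".join(inverse[i] for i in ids)

ids = toy_tokenize("Create a funny joke about ducks")
print(ids)                  # [0, 1, 2, 3, 4, 5]
print(toy_detokenize(ids))  # create a funny joke about ducks
```

The real tokenizer performs the same round trip (text → IDs → text), just with a learned subword vocabulary instead of whole words.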

For this, we use the Transformers library, a powerful tool developed by Hugging Face. Below is the Python code to load Phi-3-mini in Google Colab:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",  # Use GPU if available
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

When executed, this will download the model, which may take a few minutes depending on your internet speed.

Simplifying Text Generation with Pipelines

Instead of manually handling model inputs and outputs, we can simplify the process using transformers.pipeline. This function encapsulates the model, tokenizer, and text generation logic, making it easy to work with.

from transformers import pipeline

# Create a text generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,  # Use deterministic generation
)

Generating Text with a Prompt

Now, let’s instruct the model to generate a joke about ducks:

# Define the prompt
messages = [
    {"role": "user", "content": "Create a funny joke about ducks."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

And that’s it! You’ve successfully generated text using a powerful language model.

Key Parameters Explained

  • return_full_text: When set to False, only the model’s response is returned (excluding the original prompt).
  • max_new_tokens: Limits the number of tokens generated, preventing excessively long outputs.
  • do_sample: When set to False, the model selects the most probable next token instead of sampling from multiple possibilities. (We explore more sampling techniques in later chapters.)
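
The `do_sample` distinction can be sketched in plain Python. The hypothetical probabilities below stand in for the model's next-token distribution; greedy decoding always takes the argmax, while sampling draws from the distribution:

```python
import random

# Hypothetical next-token probabilities (illustrative values only).
probs = {"duck": 0.6, "goose": 0.3, "swan": 0.1}

def greedy_pick(probs):
    """do_sample=False: deterministically return the most probable token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng=random):
    """do_sample=True: draw a token proportionally to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_pick(probs))  # "duck", every time
print(sample_pick(probs))  # usually "duck", sometimes "goose" or "swan"
```

Greedy decoding gives reproducible output, which is why the tutorial uses it; sampling trades reproducibility for variety.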

Running This in Google Colab

If you don’t have access to a high-end GPU, you can easily run this setup in Google Colab, a free cloud-based Jupyter notebook environment provided by Google. Simply follow these steps:

  1. Open Google Colab (colab.research.google.com).
  2. Create a new notebook.
  3. Enable a GPU runtime (Runtime → Change runtime type → select a GPU), since the code above loads the model with device_map="cuda".
  4. Install the required libraries (if not already installed):
    !pip install transformers torch accelerate
  5. Copy and paste the above code snippets into your notebook cells.
  6. Run the cells and watch the model generate text!