Language models are the backbone of modern AI applications, enabling powerful text generation, understanding, and completion. If you’re looking to experiment with large language models (LLMs), one of the best places to start is the Hugging Face Hub—a platform that hosts a vast collection of pre-trained models across various domains, including text, image, and audio processing.
At the time of writing, Hugging Face offers more than 800,000 models, making it an invaluable resource for developers and AI enthusiasts. Among these, Phi-3-mini is an exciting model. With 3.8 billion parameters, it balances efficiency and performance, making it suitable for devices with limited GPU memory. It even supports quantization, allowing it to run on systems with less than 6GB of VRAM.
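The sub-6GB figure refers to running a quantized variant of the model. As a hedged sketch of how 4-bit quantized loading can look with the Transformers library (this requires the separate bitsandbytes package and a CUDA GPU; exact memory use depends on your setup):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Request 4-bit quantization at load time (needs `bitsandbytes` installed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # do the math in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Quantizing the weights to 4 bits roughly quarters the memory footprint compared to fp16, at a small cost in output quality.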
Before we generate text, we need to load two key components: the model itself, which produces the output text, and its tokenizer, which converts input text into the token IDs the model operates on.
For this, we use the Transformers library, a powerful tool developed by Hugging Face. Below is the Python code to load Phi-3-mini in Google Colab:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",   # place the model on the GPU
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

When executed, this will download the model, which may take a few minutes depending on your internet speed.
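The tokenizer we just loaded maps text to integer IDs and back. As a toy illustration of the idea only (a whitespace tokenizer over a made-up three-word vocabulary; real tokenizers such as Phi-3's use learned subword vocabularies with tens of thousands of entries):

```python
# Toy whitespace tokenizer with a made-up vocabulary, for illustration only
vocab = {"ducks": 0, "love": 1, "bread": 2}
inv_vocab = {i: t for t, i in vocab.items()}

def encode(text):
    # Text in, integer IDs out
    return [vocab[token] for token in text.split()]

def decode(ids):
    # Integer IDs in, text out
    return " ".join(inv_vocab[i] for i in ids)

ids = encode("ducks love bread")
print(ids)          # [0, 1, 2]
print(decode(ids))  # ducks love bread
```

The model itself only ever sees the integer IDs; decoding back to text is the final step of generation.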
Instead of manually handling model inputs and outputs, we can simplify the process using transformers.pipeline. This function encapsulates the model, tokenizer, and text generation logic, making it easy to work with.
from transformers import pipeline
# Create a text generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    max_new_tokens=500,      # cap the length of the response
    do_sample=False,         # greedy, deterministic generation
)

Now, let’s instruct the model to generate a joke about ducks:
# Define the prompt
messages = [
    {"role": "user", "content": "Create a funny joke about ducks."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

And that’s it! You’ve successfully generated text using a powerful language model.
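A note on `do_sample=False`: it makes the pipeline use greedy decoding, which picks the single most probable token at every step, so the same prompt always produces the same joke. With sampling enabled, tokens are instead drawn in proportion to their probabilities, giving varied outputs. A toy illustration of the difference (pure Python, no model involved; the probability table is made up):

```python
import random

# Made-up next-token probabilities, for illustration only
next_token_probs = {"quack": 0.5, "waddle": 0.3, "pond": 0.2}

def greedy_pick(probs):
    # Greedy decoding: always take the most probable token
    return max(probs, key=probs.get)

def sampled_pick(probs, rng):
    # Sampling: draw a token in proportion to its probability
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_pick(next_token_probs))  # quack  (every time)

rng = random.Random(0)
print([sampled_pick(next_token_probs, rng) for _ in range(5)])  # varies with the seed
```

Greedy decoding is a sensible default for reproducible demos; for creative tasks like joke writing, enabling sampling (`do_sample=True`) usually gives more varied results.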
If you don’t have access to a high-end GPU, you can easily run this setup in Google Colab, a free cloud-based Jupyter notebook environment provided by Google. Simply follow these steps: