Recommendation systems have transformed the way we discover products online. By applying machine learning techniques like Word2Vec to user purchase histories, we can generate product embeddings based on which products occur together. This approach lets us recommend products that often appear together in transactions, much as Word2Vec relates words that appear together in sentences.
In this tutorial, we’ll explore how to build a product recommendation system using the Word2Vec algorithm. We’ll train a model on a dataset of customer transactions and use it to suggest products similar to a given item.
For this project, we use a dataset containing purchase histories from various customers. Each transaction (grouped by InvoiceNo) is treated like a sentence, and each product (StockCode) within it is considered a word. By training a Word2Vec model, we can capture product relationships and recommend items based on their embeddings.
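To make the sentence analogy concrete, here is a minimal sketch in plain Python that groups (InvoiceNo, StockCode) rows into one list per invoice. The rows and the helper name `build_sentences` are made up for illustration; the tutorial itself does this grouping with pandas.

```python
from collections import defaultdict

# Hypothetical (InvoiceNo, StockCode) rows, invented for illustration only
rows = [
    ("INV-1", "85123A"),
    ("INV-1", "P-002"),
    ("INV-2", "P-003"),
    ("INV-1", "P-004"),
    ("INV-2", "P-005"),
]


def build_sentences(rows):
    """Group StockCodes by InvoiceNo, keeping row order within each invoice."""
    baskets = defaultdict(list)
    for invoice_no, stock_code in rows:
        baskets[invoice_no].append(stock_code)
    # Each basket plays the role of a sentence; each StockCode is a word
    return list(baskets.values())


sentences = build_sentences(rows)
print(sentences)
# [['85123A', 'P-002', 'P-004'], ['P-003', 'P-005']]
```

Each inner list is what Word2Vec would see as one sentence, so products bought on the same invoice end up as neighbouring words.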
We begin by loading the dataset, which consists of customer transactions:
import pandas as pd
# Load the transaction dataset, reading StockCode as text so every product is a string token
url = "https://raw.githubusercontent.com/bigb0ss/Retail-datasets/refs/heads/master/Online%20Retail.csv"
df = pd.read_csv(url, encoding='ISO-8859-1', dtype={'StockCode': str})
# Keep the relevant columns and drop rows with missing values
df = df[['InvoiceNo', 'StockCode', 'Description', 'CustomerID']].dropna()
# Group transactions by InvoiceNo: one list of StockCodes per invoice
transactions = df.groupby('InvoiceNo')['StockCode'].apply(list).tolist()

Once we have the dataset ready, we can train a Word2Vec model on the transactions. This model learns to represent each product as a vector, allowing us to measure the similarity between products.
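Similarity between two embeddings is usually measured as cosine similarity, which is also what gensim's `most_similar` ranks by. The sketch below computes it by hand; the three-dimensional vectors and product names are invented for illustration and are not taken from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings for three hypothetical products
candle_holder = [0.9, 0.1, 0.0]
tlight_holder = [0.8, 0.2, 0.1]
garden_spade = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score close to 1
closer = cosine_similarity(candle_holder, tlight_holder)
farther = cosine_similarity(candle_holder, garden_spade)
print(closer > farther)  # True
```

Products that tend to share invoices are pushed toward similar directions during training, so their cosine similarity ends up high.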
from gensim.models import Word2Vec
# Train the Word2Vec model
model = Word2Vec(
    transactions, vector_size=32, window=10, negative=10, min_count=1, workers=4
)

Now that the model is trained, we can find products similar to a given item. Let’s say we want recommendations for a product with StockCode = 85123A. We can query the model as follows:
product_id = '85123A'
similar_products = model.wv.most_similar(positive=[product_id], topn=5)
print(similar_products)

This prints a list of (StockCode, similarity) pairs for the five products whose embeddings are closest to that of 85123A. We can then look up their descriptions in the dataset by StockCode.
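One possible way to do that lookup is sketched below. The helper name `recommend_with_descriptions` and its parameters are hypothetical, not part of the original tutorial. It assumes `wv` behaves like gensim's `KeyedVectors` (a `most_similar` method that returns `(key, score)` pairs and raises `KeyError` for unknown keys) and that `catalog` has `StockCode` and `Description` columns. Because the dataset can list several descriptions for one StockCode, this sketch keeps only the first one.

```python
import pandas as pd


def recommend_with_descriptions(product_id, wv, catalog, topn=5):
    """Return a DataFrame of similar products with one description each.

    Hypothetical helper: `wv` is assumed to expose gensim-style
    most_similar(positive=[...], topn=...).
    """
    try:
        similar = wv.most_similar(positive=[product_id], topn=topn)
    except KeyError:
        # Product never appeared in the training transactions
        return pd.DataFrame(columns=["StockCode", "Description"])

    codes = [code for code, _score in similar]
    # Keep the first description seen for each StockCode
    names = catalog.drop_duplicates("StockCode").set_index("StockCode")["Description"]
    found = [code for code in codes if code in names.index]
    return pd.DataFrame(
        {"StockCode": found, "Description": [names[code] for code in found]}
    )
```

With the tutorial's objects, a one-argument `get_recommendations` could be bound as `functools.partial(recommend_with_descriptions, wv=model.wv, catalog=df)`, so that `get_recommendations('85123A')` returns a DataFrame that is empty when the product is unknown.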
To make the recommendations more user-friendly, we can present them in a structured format:
def display_recommendations(product_id):
    # get_recommendations is expected to return a DataFrame of similar products
    recommendations = get_recommendations(product_id)
    if recommendations.empty:
        print("No recommendations found.")
    else:
        print("Recommended Products:")
        print(recommendations.to_string(index=False))

# Example usage
display_recommendations('85123A')

Output:
Recommended Products:
StockCode  Description
21730      GLASS STAR FROSTED T-LIGHT HOLDER
21733      RED HANGING HEART T-LIGHT HOLDER
22189      CREAM HEART CARD HOLDER
22804      CANDLEHOLDER PINK HANGING HEART
71477      COLOUR GLASS. STAR T-LIGHT HOLDER
22804      PINK HANGING HEART T-LIGHT HOLDER
71477      COLOURED GLASS STAR T-LIGHT HOLDER
These recommendations are all closely related household and gift items, mostly candle and T-light holders, which suggests that the model has captured meaningful product relationships.