Building a Product Recommendation System Using Word2Vec Embeddings

Recommendation systems have transformed the way we discover products online. By applying a machine learning technique such as Word2Vec to user purchase histories, we can learn an embedding for each product based on the other products it co-occurs with. These embeddings let us recommend products that often appear together in transactions, in the same way that Word2Vec relates words that appear together in sentences.

In this tutorial, we’ll explore how to build a product recommendation system using the Word2Vec algorithm. We’ll train a model on a dataset of customer transactions and use it to suggest products similar to a given item.

Dataset Overview

For this project, we use a dataset containing purchase histories from various customers. Each transaction (grouped by InvoiceNo) is treated like a sentence, and each product (StockCode) within it is considered a word. By training a Word2Vec model, we can capture product relationships and recommend items based on their embeddings.

We begin by loading the dataset, which consists of customer transactions:

import pandas as pd

# Load the transaction dataset
url = "https://raw.githubusercontent.com/bigb0ss/Retail-datasets/refs/heads/master/Online%20Retail.csv"
df = pd.read_csv(url, encoding='ISO-8859-1')

# Keep the relevant columns and drop rows with missing values
df = df[['InvoiceNo', 'StockCode', 'Description', 'CustomerID']].dropna()

# Store StockCode as strings so every product is a consistent Word2Vec token
df = df.astype({'StockCode': str})

# Group transactions by InvoiceNo: each invoice becomes a list of StockCodes
transactions = df.groupby('InvoiceNo')['StockCode'].apply(list).tolist()
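To see what this grouping produces, here is a minimal sketch on a tiny, made-up DataFrame. The rows are illustrative and not taken from the real dataset. Each invoice becomes one list of StockCode strings, which is the "sentence" format Word2Vec expects.

```python
import pandas as pd

# Hypothetical toy data: two invoices, three purchased items
toy = pd.DataFrame({
    'InvoiceNo': ['536365', '536365', '536366'],
    'StockCode': ['85123A', '71053', '22633'],
})

# Same grouping as in the tutorial: one list of StockCodes per invoice
toy_transactions = toy.groupby('InvoiceNo')['StockCode'].apply(list).tolist()
print(toy_transactions)
# [['85123A', '71053'], ['22633']]
```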

Training the Product Embedding Model

Once the dataset is ready, we can train a Word2Vec model on the transactions. The model learns to represent each product as a dense vector (32 dimensions here, set by vector_size), so that products bought in similar contexts end up close together and we can measure the similarity between them.

from gensim.models import Word2Vec

# Train the Word2Vec model: one "sentence" per invoice, one "word" per StockCode
model = Word2Vec(
    transactions, vector_size=32, window=10, negative=10, min_count=1, workers=4
)
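Similarity between two product vectors is usually measured with cosine similarity, which is also the measure gensim uses when ranking similar items. The sketch below computes it by hand for made-up 3-dimensional vectors. The product names and values are hypothetical, and real embeddings from this model have 32 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three products
candle_holder = [0.9, 0.1, 0.0]
tea_light = [0.8, 0.2, 0.1]
umbrella = [0.0, 0.1, 0.9]

# Products that point in a similar direction score close to 1
print(cosine_similarity(candle_holder, tea_light) > cosine_similarity(candle_holder, umbrella))
# True
```

With the trained model, model.wv.similarity(code_a, code_b) should return the same kind of score for two StockCodes that are in the vocabulary.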

Finding Similar Products

Now that the model is trained, we can find products similar to a given item. Let’s say we want recommendations for a product with StockCode = 85123A. We can query the model as follows:

product_id = '85123A'

similar_products = model.wv.most_similar(positive=[product_id], topn=5)

This returns a list of (StockCode, similarity score) pairs for the products most similar to 85123A based on their embeddings. We can then look up their details in the dataset.
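The display code in the next section calls a get_recommendations helper whose definition is not shown. The following is a minimal sketch of what such a helper could look like, not necessarily the original implementation. The optional wv and catalog parameters are assumptions added here so the function can be tried without the trained model. When they are omitted, it falls back to the model and df objects created earlier.

```python
import pandas as pd

def get_recommendations(product_id, topn=5, wv=None, catalog=None):
    """Return StockCode/Description rows for products similar to product_id."""
    # Assumption: fall back to the tutorial's global `model` and `df`
    wv = model.wv if wv is None else wv
    catalog = df if catalog is None else catalog

    columns = ['StockCode', 'Description']
    # A product the model never saw has no embedding to compare against
    if product_id not in wv.key_to_index:
        return pd.DataFrame(columns=columns)

    # most_similar returns (StockCode, similarity) pairs; keep only the codes
    similar_codes = [code for code, _ in wv.most_similar(positive=[product_id], topn=topn)]

    # Look up the matching rows and drop repeated StockCode/Description pairs
    matches = catalog[catalog['StockCode'].astype(str).isin(similar_codes)]
    return matches[columns].drop_duplicates().reset_index(drop=True)
```

Because only exact StockCode/Description duplicates are dropped, a StockCode that carries several descriptions in the data appears once per description, and rows come back in dataset order rather than similarity order.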

Displaying Recommended Products

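To make the recommendations more user-friendly, we can present them in a structured format. The function below relies on a get_recommendations helper that returns a DataFrame with the StockCode and Description of each similar product: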

def display_recommendations(product_id):
    # Look up similar products and print them as a plain-text table
    recommendations = get_recommendations(product_id)
    if recommendations.empty:
        print("No recommendations found.")
    else:
        print("Recommended Products:")
        print(recommendations.to_string(index=False))

# Example usage
display_recommendations('85123A')

Output:

Recommended Products:

StockCode                         Description
    21730   GLASS STAR FROSTED T-LIGHT HOLDER
    21733    RED HANGING HEART T-LIGHT HOLDER
    22189             CREAM HEART CARD HOLDER
    22804     CANDLEHOLDER PINK HANGING HEART
    71477   COLOUR GLASS. STAR T-LIGHT HOLDER
    22804   PINK HANGING HEART T-LIGHT HOLDER
    71477  COLOURED GLASS STAR T-LIGHT HOLDER


These recommendations are all closely related household and gift items, mostly candle and T-light holders, which suggests that the model has captured meaningful product relationships.