
Semantic Search with Language Models: From Concepts to Code

Traditional keyword-based search engines can only take us so far. What if we could search based on meaning rather than matching exact words? That’s where semantic search comes in, powered by modern language models and embeddings. In this post, we’ll walk through the core concepts, explain how dense retrieval works, and build a semantic search tool using sentence-transformers and FAISS.

What Is Semantic Search?

Semantic search focuses on understanding the intent behind a query and matching it with semantically similar content. Instead of comparing words, we compare embeddings—vector representations of text—generated by a language model.

Think of each sentence as a point in a high-dimensional space. Texts with similar meanings will be closer together in this space.
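This intuition is easy to see with cosine similarity, the standard way to measure how close two embeddings are. The vectors below are made up for illustration (real models produce hundreds of dimensions), but the pattern holds: semantically similar texts score near 1, unrelated texts near 0.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" (made up for illustration; real models use 384+ dims)
movie_a = np.array([0.9, 0.1, 0.0])   # "a film about space travel"
movie_b = np.array([0.8, 0.2, 0.1])   # "a movie set among the stars"
recipe  = np.array([0.0, 0.1, 0.9])   # "how to bake sourdough bread"

print(cosine_similarity(movie_a, movie_b))  # high: similar meaning
print(cosine_similarity(movie_a, recipe))   # low: unrelated topics
```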

Dense Retrieval in Action

Here’s how dense retrieval works in practice:

  1. Text corpus → embeddings: Convert your data (documents, sentences) into vectors using a pre-trained model.
  2. Query → embedding: Convert the search query into a vector.
  3. Vector search: Find the vectors in the dataset that are closest to the query vector.

Let’s build it step-by-step using the Wikipedia summary of Interstellar.

Setup: Tools We’ll Use

Install the libraries:

pip install sentence-transformers faiss-cpu rank-bm25 pandas

Step 1: Prepare the Text

Let’s work with a sample document about Interstellar:

text = """
Interstellar is a 2014 science fiction film directed by Christopher Nolan.
The film stars Matthew McConaughey and Anne Hathaway.
Set in a dystopian future, it follows astronauts searching for a new home for humanity.
Kip Thorne, a theoretical physicist, was a scientific consultant on the film.
Interstellar premiered in 2014 and was praised for scientific accuracy and visual effects.
It grossed over $677 million worldwide.
"""

# Split the document into sentences and drop empty fragments
sentences = [s.strip() for s in text.split('.') if s.strip()]

Step 2: Generate Embeddings

We’ll use sentence-transformers for generating embeddings:

from sentence_transformers import SentenceTransformer
import numpy as np

# A small, fast general-purpose embedding model (384-dimensional vectors)
model = SentenceTransformer('all-MiniLM-L6-v2')
sentence_embeddings = model.encode(sentences)

Step 3: Create a FAISS Search Index

FAISS is a library optimized for fast vector search:

import faiss

# Build a flat (exhaustive) index over L2 distance
dimension = sentence_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(sentence_embeddings).astype('float32'))  # FAISS expects float32
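Under the hood, a flat index does nothing clever: it compares the query against every stored vector and keeps the closest ones. A numpy-only sketch of that same search (on made-up 2-d vectors) makes the mechanics concrete:

```python
import numpy as np

def flat_l2_search(vectors, query, k=3):
    """Brute-force nearest neighbours, mirroring what a flat L2 index does:
    compute the squared L2 distance from the query to every stored vector,
    then return the k smallest."""
    diffs = vectors - query               # broadcast query against all rows
    dists = np.sum(diffs ** 2, axis=1)    # squared L2 distance per row
    idx = np.argsort(dists)[:k]
    return dists[idx], idx

# Toy data: four stored vectors; the query sits closest to row 2
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [5.0, 5.0]], dtype='float32')
query = np.array([0.5, 0.9], dtype='float32')
dists, idx = flat_l2_search(vectors, query, k=2)
print(idx)  # row 2 first, then row 1
```

FAISS implements this far more efficiently (and offers approximate indexes for large corpora), but the ranking logic is the same.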

Step 4: Search by Semantic Meaning

Let’s define a simple search function:

import pandas as pd

def semantic_search(query, k=3):
    # Embed the query with the same model used for the corpus
    query_embedding = model.encode([query])
    distances, indices = index.search(np.array(query_embedding).astype('float32'), k)
    return pd.DataFrame({
        'Text': [sentences[i] for i in indices[0]],
        'Distance': distances[0],
    })

# Try it out:
semantic_search("how accurate was the science")

You’ll get a ranked list of the most relevant sentences—not based on exact word matches but semantic similarity.

Keyword Search Comparison

For comparison, let’s also look at a simple keyword-based method using BM25:

from rank_bm25 import BM25Okapi
import string

def tokenize(text):
    # Lowercase and strip punctuation so tokens match regardless of casing
    return [word.strip(string.punctuation).lower() for word in text.split() if word]

tokenized_corpus = [tokenize(s) for s in sentences]
bm25 = BM25Okapi(tokenized_corpus)

def keyword_search(query, k=3):
    tokenized_query = tokenize(query)
    scores = bm25.get_scores(tokenized_query)
    top_indices = np.argsort(scores)[::-1][:k]  # highest-scoring sentences first
    return pd.DataFrame({
        'Text': [sentences[i] for i in top_indices],
        'Score': [scores[i] for i in top_indices],
    })

# Now compare both methods:
semantic_search("how accurate was the science")
keyword_search("how accurate was the science")

Limitations of Dense Retrieval

  • Not always relevant: If no good answer exists in the data, it still returns something.
  • Exact phrase matching: Sometimes you just need a literal match—dense retrieval may not help.
  • Domain sensitivity: A model trained on web data might not perform well on legal or medical texts.
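The first limitation has a common mitigation: reject hits whose distance exceeds a cutoff, so that queries with no good answer in the corpus return nothing rather than noise. A minimal sketch, using hypothetical search output (the 1.0 cutoff is illustrative; a real threshold must be tuned on your own data):

```python
import numpy as np

def filter_by_distance(distances, indices, max_distance=1.0):
    """Drop hits whose L2 distance exceeds a cutoff.
    The cutoff value here is illustrative, not a recommendation."""
    keep = distances <= max_distance
    return distances[keep], indices[keep]

# Hypothetical output of an index search: one close hit, two distant ones
distances = np.array([0.4, 1.3, 1.7])
indices = np.array([5, 2, 9])
d, i = filter_by_distance(distances, indices)
print(i)  # only index 5 survives
```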

Smart Chunking for Long Docs

Since models can’t handle unlimited text length, we split long documents into chunks:

  • Sentence-level: May be too fine-grained.
  • Paragraph-level: Often the sweet spot.
  • Overlapping chunks: Retain context between chunks.

A simple chunker with overlap looks like this:

def chunk_text(text, chunk_size=3, overlap=1):
    # Split into sentences, then group them into overlapping windows
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    chunks = []
    for i in range(0, len(sentences), chunk_size - overlap):
        chunks.append('. '.join(sentences[i:i + chunk_size]))
    return chunks
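On a toy five-sentence document you can see the overlap in action: with the default chunk_size=3 and overlap=1, each window starts where the previous one's last sentence sits (the helper is repeated here so the snippet runs standalone):

```python
def chunk_text(text, chunk_size=3, overlap=1):
    # Same helper as above, repeated so this snippet runs standalone
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    chunks = []
    for i in range(0, len(sentences), chunk_size - overlap):
        chunks.append('. '.join(sentences[i:i + chunk_size]))
    return chunks

for chunk in chunk_text("One. Two. Three. Four. Five."):
    print(chunk)
# Adjacent chunks share one sentence:
#   One. Two. Three
#   Three. Four. Five
#   Five
```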

Bonus Tip: Fine-Tuning Embeddings

You can further improve results by fine-tuning embedding models using positive and negative query-document pairs:

  • Positive: “Interstellar release date” → “Interstellar premiered in 2014…”
  • Negative: “Interstellar cast” → an unrelated sentence from the same document

The model learns to bring positive pairs closer and push irrelevant ones farther apart.
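The objective behind this kind of fine-tuning can be sketched in numpy: score each query against every document in a batch, and penalize the model unless the true pair (the diagonal) scores highest. This is a simplified, numpy-only illustration of an in-batch contrastive loss, not the actual training code of any library; the toy embeddings are made up.

```python
import numpy as np

def contrastive_loss(query_embs, doc_embs):
    """In-batch contrastive loss: row i of query_embs should match row i of
    doc_embs; every other row in the batch serves as a negative.
    Lower loss means positives score higher than negatives."""
    # Normalise so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T                         # similarity of every query to every doc
    # Softmax cross-entropy with the diagonal (true pairs) as targets
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy embeddings: in good_docs each row matches its query; bad_docs are swapped
queries   = np.array([[1.0, 0.0], [0.0, 1.0]])
good_docs = np.array([[0.9, 0.1], [0.1, 0.9]])
bad_docs  = np.array([[0.1, 0.9], [0.9, 0.1]])
print(contrastive_loss(queries, good_docs) < contrastive_loss(queries, bad_docs))  # True
```

Training minimizes this quantity, which is exactly what pulls positive pairs together and pushes irrelevant ones apart.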

Semantic search opens up a new dimension for information retrieval. You’re no longer bound by keywords—now your queries can be understood in context. And with tools like sentence-transformers and FAISS, it’s easier than ever to build your own intelligent search system.