How Divorce.law Uses Legal AI Embeddings to Power Smarter Case Research

When we built Victoria AI, we faced a problem every legal AI company encounters: general-purpose AI models don't understand law the way lawyers do.

Without enough context, a general-purpose model can read "consideration" as thoughtfulness rather than the exchange of value that makes a contract binding, and "custody" as incarceration rather than parental rights. The language of law is precise, contextual, and domain-specific, and general AI embeddings often fail to capture these distinctions.

This is the story of how we solved that problem for family law.

The Embedding Problem in Legal AI

Before explaining our solution, let's understand what embeddings actually do.

What Are Embeddings?

Embeddings are numerical representations of text that capture meaning. When you search for "child support modification grounds," an embedding model converts that query into a vector (a list of numbers) that represents its semantic meaning.

The AI then compares your query vector against vectors for thousands of documents, finding the ones with similar meanings—not just matching keywords.

This is the foundation of retrieval-augmented generation (RAG), the technique that allows AI to search through documents and provide accurate, sourced answers.
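The vector comparison described above is typically a cosine similarity. The following is a minimal sketch, not the Victoria AI implementation: the three-dimensional vectors and document names are made up for illustration, whereas a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]  # e.g. "child support modification grounds"
docs = {
    "support_modification_case": [0.8, 0.2, 0.1],
    "property_division_case": [0.1, 0.9, 0.2],
    "name_change_petition": [0.0, 0.2, 0.9],
}

ranking = rank_documents(query, docs)
print(ranking)
```

The document whose vector points in nearly the same direction as the query ranks first, even if it shares no keywords with the query text.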

Why General Embeddings Fail in Law

Standard embedding models (like OpenAI's text-embedding-3-large or Google's text-embedding-004) are trained on general internet text. They understand everyday language well, but legal terminology presents problems:

Problem 1: Legal terms have specific meanings

"Discovery" in general English means finding something new

"Discovery" in law is a formal process of exchanging evidence

General embeddings conflate these meanings

Problem 2: Context matters enormously

"Best interests of the child" is a specific legal standard with decades of case law

General embeddings treat it as ordinary English

Critical nuances are lost

Problem 3: Jurisdiction-specific language

Florida's "timesharing" vs. other states' "custody"

California's "community property" vs. other states' "equitable distribution"

General models don't distinguish these variations

When we tested general embeddings on family law queries, we found retrieval accuracy around 64%. That means more than one-third of the time, the AI was pulling irrelevant or marginally relevant cases.

That's not good enough for legal work.

Enter Voyage-law-2

Voyage AI, founded by Stanford professor Tengyu Ma, built something different: embedding models trained specifically on legal text.

Their voyage-law-2 model was trained on billions of tokens of legal documents—case law, statutes, regulations, and legal analysis. The result is an embedding model that understands legal language the way lawyers do.

When voyage-law-2 sees "consideration," it knows you mean contract formation, not politeness. When it sees "custody," it understands you're discussing parental rights, not incarceration.

Why We Chose Voyage-law-2

We evaluated several embedding models, both general-purpose and legal-specific, before selecting Voyage:

Model	Retrieval Accuracy (family law test set)	Latency	Dimensions
OpenAI text-embedding-3-large	64%	45ms	3072
Google text-embedding-004	61%	38ms	768
Cohere embed-v3	67%	42ms	1024
Voyage-law-2	89%	32ms	1024

The difference was dramatic. On our family law test set (queries about child support, custody, alimony, and property division), Voyage-law-2 retrieved relevant cases 89% of the time, versus 67% for the best general-purpose alternative, Cohere embed-v3, and 64% for OpenAI's text-embedding-3-large.

That improvement of 22 to 25 percentage points translates directly to better answers from Victoria AI.
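This post does not spell out how the accuracy figures were computed. One common way to score retrieval is hit rate at k: the share of test queries for which at least one known-relevant case appears in the top k results. The sketch below assumes that metric, and the queries, case ids, and relevance labels are hypothetical.

```python
def hit_rate_at_k(results, relevant, k=5):
    """Fraction of queries with at least one relevant case in the top k results."""
    hits = 0
    for query, ranked_ids in results.items():
        if any(doc_id in relevant[query] for doc_id in ranked_ids[:k]):
            hits += 1
    return hits / len(results)

# Hypothetical retrieval output: query -> ranked case ids.
results = {
    "child support modification": ["case_12", "case_40", "case_07"],
    "alimony duration": ["case_33", "case_02", "case_18"],
    "relocation with minor child": ["case_51", "case_09", "case_27"],
    "pension division": ["case_64", "case_03", "case_70"],
}

# Hypothetical attorney-labeled relevant cases per query.
relevant = {
    "child support modification": {"case_12"},
    "alimony duration": {"case_18"},
    "relocation with minor child": {"case_99"},
    "pension division": {"case_03", "case_64"},
}

score = hit_rate_at_k(results, relevant, k=3)
print(f"hit rate@3: {score:.0%}")
```

Running the same labeled queries through each candidate model and comparing the scores gives a like-for-like comparison of the kind shown in the table.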

How We Use Legal Embeddings in Divorce.law

1. CaseMind: Persistent Case Memory

CaseMind is Victoria's memory system. It remembers every fact about your case across conversations—the husband's income, the wife's position on the house, the children's school schedule, the contested retirement accounts.

Behind the scenes, CaseMind uses Voyage-law-2 embeddings to:

Store facts semantically: When you tell Victoria "The husband earns $8,500/month as a sales manager," CaseMind doesn't just save the text. It creates an embedding that captures the semantic meaning—income, employment, amount, role.

Retrieve relevant facts: When you later ask "What's the husband's income for child support calculation?", CaseMind finds that fact instantly by semantic similarity, even though your query uses different words than the original statement.

Connect related information: CaseMind understands that "husband's W-2" and "respondent's annual salary" refer to related concepts, enabling comprehensive retrieval when drafting financial affidavits.

2. Victoria Co-Counsel: Legal Research

When Victoria researches case law for your motions and briefs, Voyage-law-2 powers the retrieval:

Query understanding: Victoria converts your research question into a legal embedding that captures the precise legal concepts involved.

Case matching: Our database of family law cases (indexed with Voyage-law-2 embeddings) returns the most semantically relevant precedents.

Passage extraction: Within relevant cases, Victoria identifies the specific passages that address your legal question.

Example:

Query: "Florida cases on imputing income to voluntarily underemployed spouse"

General embedding results: Mixed cases about unemployment, voluntary actions, income generally

Voyage-law-2 results: Precise Florida appellate cases addressing voluntary underemployment and income imputation standards

3. Victoria Financial: Document Analysis

When Victoria analyzes financial documents—tax returns, pay stubs, bank statements—embeddings help extract and categorize information:

Document classification: Embeddings identify whether a document is a W-2, 1099, bank statement, or retirement account statement.

Field extraction: Semantic understanding helps locate relevant figures even when document formats vary.

Cross-document correlation: Victoria connects information across documents—matching the employer on a W-2 to deposits in bank statements to verify income.

4. Victoria Discovery: Compliance Verification

Discovery analysis requires understanding what categories of documents were requested and whether productions comply:

Request parsing: Victoria uses legal embeddings to understand discovery request categories semantically.

Production mapping: Each produced document is embedded and matched against request categories.

Gap identification: Semantic analysis identifies what's missing—not just by document type, but by substantive information gaps.
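The mapping and gap steps can be illustrated with a small sketch. It assumes each request category and each produced document has already been embedded. The toy vectors, file names, and the 0.8 similarity threshold are hypothetical, and the sketch only covers gaps at the category level, not the substantive gaps mentioned above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def map_production(request_vecs, document_vecs, threshold=0.8):
    """Assign each document to its closest request category, then list
    the categories that received no document at all."""
    matches = {category: [] for category in request_vecs}
    for doc_id, doc_vec in document_vecs.items():
        best = max(request_vecs, key=lambda c: cosine(request_vecs[c], doc_vec))
        # Documents that are not close to any category stay unassigned.
        if cosine(request_vecs[best], doc_vec) >= threshold:
            matches[best].append(doc_id)
    gaps = [category for category, docs in matches.items() if not docs]
    return matches, gaps

# Hypothetical embeddings of discovery request categories.
requests = {
    "bank statements": [1.0, 0.0, 0.0],
    "tax returns": [0.0, 1.0, 0.0],
    "retirement accounts": [0.0, 0.0, 1.0],
}

# Hypothetical embeddings of produced documents.
produced = {
    "bank_statement_jan.pdf": [0.9, 0.1, 0.0],
    "form_1040_2022.pdf": [0.1, 0.95, 0.05],
    "vacation_photo.jpg": [0.5, 0.5, 0.5],
}

matches, gaps = map_production(requests, produced)
print(gaps)
```

Here the retirement account request has no matching production, so it is flagged as a gap, while the unrelated photo is not forced into any category.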

The Technical Architecture

For those interested in the implementation details:

Embedding Pipeline

Document ingestion: New case facts, documents, and research are chunked into semantic units

Embedding generation: Voyage-law-2 converts each chunk into a 1024-dimensional vector

Index storage: Vectors are stored in our vector database (Pinecone) with metadata

Query processing: User queries are embedded in real-time for similarity search

Retrieval: Top-k similar chunks are retrieved and passed to the language model
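The five steps above can be sketched end to end. This is an illustration under stated assumptions rather than our production code: `toy_embed` is a hypothetical word-count embedder standing in for Voyage-law-2, `InMemoryIndex` stands in for the vector database, and fixed-size word chunking stands in for semantic chunking. In a real deployment the embedding function would call the embedding provider's API instead.

```python
import math

def chunk_text(text, max_words=12):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Hypothetical vocabulary for the toy embedder.
VOCAB = ["income", "salary", "custody", "school", "house", "retirement"]

def toy_embed(texts):
    """Count vocabulary words per text. Not a real semantic model."""
    vectors = []
    for text in texts:
        tokens = text.lower().replace(",", " ").replace(".", " ").split()
        vectors.append([float(tokens.count(word)) for word in VOCAB])
    return vectors

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryIndex:
    """Minimal stand-in for a vector database with metadata."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.records = []  # (vector, chunk, metadata) tuples

    def ingest(self, text, metadata):
        chunks = chunk_text(text)
        for chunk, vector in zip(chunks, self.embed_fn(chunks)):
            self.records.append((vector, chunk, metadata))

    def query(self, question, top_k=2):
        query_vec = self.embed_fn([question])[0]
        ranked = sorted(self.records, key=lambda r: cosine(query_vec, r[0]), reverse=True)
        return [(chunk, meta) for _, chunk, meta in ranked[:top_k]]

index = InMemoryIndex(toy_embed)
index.ingest("The husband reports a salary of $8,500 per month.", {"source": "intake notes"})
index.ingest("The children attend school near the marital house.", {"source": "intake notes"})

top = index.query("What salary does the husband report?", top_k=1)
print(top)
```

The retrieved chunks and their metadata are what would be passed to the language model as context.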

Hybrid Search

We don't rely on embeddings alone. Our retrieval system combines:

Semantic search: Voyage-law-2 embeddings for meaning-based retrieval

Keyword search: BM25 for precise term matching (case numbers, statute citations)

Metadata filtering: Jurisdiction, date range, case type filters

Reranking: A secondary model scores and reorders results for relevance

This hybrid approach achieves 94% retrieval accuracy on our family law benchmark, up from 89% with Voyage-law-2 embeddings alone.
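This post does not say how the semantic and keyword results are merged. One widely used option is reciprocal rank fusion (RRF), shown here as an assumption rather than a description of our system. The sketch takes the two rankings as given (it does not implement BM25 or the embedding search), applies a jurisdiction filter, and leaves out the reranking step. The case ids and metadata are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(semantic_ranking, keyword_ranking, metadata, jurisdiction):
    """Drop results from other jurisdictions, then fuse the two rankings."""
    allowed = {doc_id for doc_id, meta in metadata.items()
               if meta["jurisdiction"] == jurisdiction}
    filtered = [[doc_id for doc_id in ranking if doc_id in allowed]
                for ranking in (semantic_ranking, keyword_ranking)]
    return reciprocal_rank_fusion(filtered)

# Hypothetical case metadata.
metadata = {
    "fl_case_a": {"jurisdiction": "FL"},
    "fl_case_b": {"jurisdiction": "FL"},
    "fl_case_c": {"jurisdiction": "FL"},
    "ca_case_d": {"jurisdiction": "CA"},
}

# Hypothetical outputs of the semantic and keyword searches.
semantic = ["fl_case_a", "ca_case_d", "fl_case_b", "fl_case_c"]
keyword = ["fl_case_b", "ca_case_d", "fl_case_c"]

fused = hybrid_search(semantic, keyword, metadata, "FL")
print(fused)
```

Cases that appear in both rankings rise above a case that only one search found, which is why combining the signals can outperform either search on its own.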

Continuous Improvement

Our embedding pipeline isn't static:

Usage analytics: We track which retrieved documents users actually use

Feedback loops: When attorneys mark results as irrelevant, we use that signal for fine-tuning

Domain expansion: We're continuously adding family-law-specific training data

Results in Practice

Since deploying Voyage-law-2 embeddings, we've measured concrete improvements:

Research accuracy: Victoria's case law citations are relevant 94% of the time (up from 71% with general embeddings)

CaseMind recall: When asked about case facts, Victoria retrieves the correct information 97% of the time

Document analysis: Financial document extraction accuracy improved from 82% to 96%

User satisfaction: Attorneys report spending 60% less time verifying Victoria's research

Why This Matters for Family Law Attorneys

You don't need to understand embeddings to benefit from them. What matters is that Victoria AI:

Finds relevant cases when you're drafting motions

Remembers case facts without you re-explaining

Analyzes documents accurately and completely

Provides reliable research you can trust

The technical infrastructure enables the practical outcome: AI that actually understands family law.

The Future: Family-Law-Specific Embeddings

We're not stopping at Voyage-law-2. Our roadmap includes:

Fine-tuned family law model: We're working with Voyage to train embeddings specifically on family law text—custody cases, support calculations, property division precedents.

State-specific variants: Embedding models that understand jurisdictional nuances—Florida timesharing terminology vs. California custody language.

Firm-specific customization: For larger firms, embeddings trained on their own work product, internal memos, and successful motion templates.

The goal is AI that doesn't just understand law generally, but understands family law specifically—and eventually, your practice specifically.

The Bottom Line

General AI tools treat legal language as ordinary English. That's why they make mistakes that any first-year associate would catch.

By building on Voyage-law-2's legal embeddings, Victoria AI understands family law terminology, retrieves relevant precedents, and remembers case facts the way a trained legal professional would.

The embeddings are invisible to users. What's visible is an AI assistant that actually works for family law—finding the right cases, remembering the right facts, and drafting documents that don't require extensive revision.

That's what domain-specific AI infrastructure makes possible.

Want to see how Victoria AI's legal intelligence works for your practice? [Book a demo](https://divorce.law/book-demo) and experience the difference domain-specific AI makes.