Research

Teaching machines to find what experts are actually looking for

My research sits at the intersection of information retrieval, natural language processing, and explainable AI — using patents as a proving ground for methods that generalize to any large, specialized document collection.

Research Pillars

Six interconnected research themes

Patent Information Retrieval

Designing retrieval and ranking systems that connect a search query to the most relevant prior art across millions of patent documents — the technical core of automated prior-art search.

LLMs for Prior-Art Search

Evaluating and adapting large language models — GPT, Gemini, and open models — for patent retrieval, with a focus on where they help, where they fail, and how to measure both rigorously.

Explainable AI for Patent Text

Building systems that highlight the exact evidence behind a retrieval or classification decision, so patent examiners and analysts can verify, not just trust, an AI's output.

Domain-Specific Embeddings

Fine-tuning and quantizing transformer models such as BERT-for-Patents for efficient, domain-adapted search, so that strong retrieval quality fits within the memory and latency budgets of a production system.

Topic Modeling & Knowledge Discovery

Mapping large technical collections, from health-informatics patents to engineering literature, to surface emerging themes, sub-fields, and structure that no single reader could find manually.

Patent Analytics & Text Mining

Applying sentiment analysis, automated summarization, and natural query understanding to turn dense, unstructured patent text into something an analyst can act on quickly.

Selected Projects

Research in action

Full publication list

Few-Shot Fine-Tuning & Quantized Embeddings for Patent Retrieval

A retrieval pipeline that fine-tunes embeddings with limited labeled data and compresses them through quantization, holding onto retrieval quality while cutting the computational cost.

ICAAI, ACM, 2025
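
To make the quantization idea concrete, here is a minimal sketch of symmetric int8 quantization applied to embedding vectors. It is an illustration under stated assumptions, not the pipeline from the paper: the function names (`quantize_int8`, `dequantize`, `cosine`) and the toy five-dimensional vectors are hypothetical, and a real system would likely use a vectorized library rather than plain Python lists.

```python
import math


def quantize_int8(vec):
    """Symmetric per-vector int8 quantization (illustrative, not the paper's method).

    Returns integer codes in [-127, 127] and the scale needed to map them back.
    """
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy embeddings, made up for illustration only.
query = [0.12, -0.48, 0.33, 0.91, -0.05]
doc = [0.10, -0.52, 0.30, 0.88, 0.02]

q_codes, q_scale = quantize_int8(query)
d_codes, d_scale = quantize_int8(doc)

full = cosine(query, doc)
approx = cosine(dequantize(q_codes, q_scale), dequantize(d_codes, d_scale))

print(f"full-precision cosine: {full:.4f}")
print(f"int8 cosine:           {approx:.4f}")
```

Each component is stored in one byte instead of a four-byte float, which is where the cost saving comes from. Because cosine similarity ignores vector scale, the integer codes can be compared directly without dequantizing first.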

Rethinking Patent Retrieval with Language Models

A study of scalable, efficient patent search with language models, examining where modern LLM-based retrieval architectures outperform classical IR baselines and where they do not.

World Patent Information, Elsevier, 2025

ChatGPT vs. Google Gemini for Prior-Art Search

A head-to-head evaluation of two frontier LLMs on prior-art search using European Search Reports as ground truth, surfacing concrete strengths and failure modes for each model.

SemTech4STLD, ESWC, 2024
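
One common way to score a ranked result list against ground-truth citations is recall at a cutoff plus reciprocal rank. The sketch below shows those two measures. It is not necessarily the protocol used in the study: the metric choice, the function names, and the document identifiers are all assumptions made for illustration, and the ranking is assumed to contain no duplicate identifiers.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical data: citations from one search report, and one model's ranking.
cited_in_report = {"DOC-A", "DOC-B", "DOC-C"}
model_ranking = ["DOC-X", "DOC-B", "DOC-Y", "DOC-A", "DOC-Z"]

r_at_3 = recall_at_k(model_ranking, cited_in_report, k=3)
r_at_5 = recall_at_k(model_ranking, cited_in_report, k=5)
rr = reciprocal_rank(model_ranking, cited_in_report)

print(f"recall@3 = {r_at_3:.3f}, recall@5 = {r_at_5:.3f}, RR = {rr:.3f}")
```

Averaging these per-query scores over a full set of search reports gives one comparable number per model.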

Explainable AI for Highlighting & Searching in Patent Text

An explainability layer for patent search that highlights the specific passages driving a result, making AI-assisted search auditable for examiners rather than a black box.

PatentSemTech, SIGIR, 2023

Topic Models for Health-Informatics Patent Retrieval

An investigation into topic modeling techniques for decoding health-informatics patents, mapping a fast-moving technical domain into structured, searchable themes.

HINT24, Springer LNNS, 2024

Natural Query Understanding for Patent Prior-Art Search

A diagnostic approach to a deceptively simple question — is your search query well-formed? — that improves prior-art search by catching malformed queries before they fail silently.

World Patent Information, Elsevier, 2023

Looking Ahead

A future AI & Knowledge Discovery Lab

As I complete my Ph.D., my research agenda is broadening from patent-specific systems toward a general framework for AI-driven knowledge discovery — applicable wherever organizations sit on top of large, specialized, hard-to-search document collections.

Emerging directions

Retrieval-augmented generation for enterprise and technical knowledge bases.
Rigorous evaluation frameworks for LLM-based search and retrieval.
Cross-lingual and multilingual retrieval for global patent and scientific corpora.
Innovation analytics — using patent and literature data to map technological trends.

Discuss a research collaboration