Research
Teaching machines to find what experts are actually looking for
My research sits at the intersection of information retrieval, natural language processing, and explainable AI — using patents as a proving ground for methods that generalize to any large, specialized document collection.
Research Pillars
Six interconnected research themes
01
Patent Information Retrieval
Designing retrieval and ranking systems that connect a search query to the most relevant prior art across millions of patent documents — the technical core of automated prior-art search.
02
LLMs for Prior-Art Search
Evaluating and adapting large language models — GPT, Gemini, and open models — for patent retrieval, with a focus on where they help, where they fail, and how to measure both rigorously.
03
Explainable AI for Patent Text
Building systems that highlight the exact evidence behind a retrieval or classification decision, so patent examiners and analysts can verify, not just trust, an AI's output.
04
Domain-Specific Embeddings
Fine-tuning and quantizing transformer models such as BERT-for-Patents for efficient, domain-adapted search, compressing strong retrieval models into footprints small enough for production use.
05
Topic Modeling & Knowledge Discovery
Mapping large technical collections, from health-informatics patents to engineering literature, to surface emerging themes, sub-fields, and structure that no single reader could find manually.
06
Patent Analytics & Text Mining
Applying sentiment analysis, automated summarization, and natural query understanding to turn dense, unstructured patent text into something an analyst can act on quickly.
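A minimal sketch of how pillars 01 and 04 fit together: rank documents by the cosine similarity of their dense embeddings, then store those embeddings as int8 values to cut memory four-fold, and check how much of the top-10 ranking survives. It assumes NumPy. It uses random vectors as hypothetical stand-ins for BERT-for-Patents embeddings and applies plain symmetric scalar quantization with one global scale. The function names are illustrative, and this is a generic technique, not the pipeline used in the projects on this page.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric scalar quantization: map floats to int8 using one global scale."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def top_k(query: np.ndarray, docs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-scoring documents by dot product."""
    scores = docs @ query
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
# Hypothetical stand-ins for real embeddings: 1,000 documents, 768 dimensions.
docs = normalize(rng.standard_normal((1000, 768)).astype(np.float32))
query = normalize(rng.standard_normal((1, 768)).astype(np.float32))[0]

q_docs, scale = quantize_int8(docs)
# Dequantize for scoring; the int8 array is what would actually be stored.
approx_docs = q_docs.astype(np.float32) * scale

exact = top_k(query, docs, k=10)
approx = top_k(query, approx_docs, k=10)
overlap = len(set(exact) & set(approx)) / 10

print(f"float32 bytes: {docs.nbytes}, int8 bytes: {q_docs.nbytes}")
print(f"top-10 overlap after quantization: {overlap:.1f}")
```

With real embeddings, per-dimension scales usually lose less accuracy than one global scale, and schemes such as product quantization trade more accuracy for much higher compression.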
Selected Projects
Research in action
Few-Shot Fine-Tuning & Quantized Embeddings for Patent Retrieval
A retrieval pipeline that fine-tunes embeddings with limited labeled data and compresses them through quantization, maintaining retrieval quality while reducing computational cost.
ICAAI, ACM, 2025
Rethinking Patent Retrieval with Language Models
A study of scalable, efficient patent search with language models, examining where modern LLM-based retrieval architectures outperform classical IR baselines, and where they do not.
World Patent Information, Elsevier, 2025
ChatGPT vs. Google Gemini for Prior-Art Search
A head-to-head evaluation of two frontier LLMs on prior-art search using European Search Reports as ground truth, surfacing concrete strengths and failure modes for each model.
SemTech4STLD, ESWC, 2024
Explainable AI for Highlighting & Searching in Patent Text
An explainability layer for patent search that highlights the specific passages driving a result, making AI-assisted search auditable for examiners rather than a black box.
PatentSemTech, SIGIR, 2023
Topic Models for Health-Informatics Patent Retrieval
An investigation into topic modeling techniques for decoding health-informatics patents, mapping a fast-moving technical domain into structured, searchable themes.
HINT24, Springer LNNS, 2024
Natural Query Understanding for Patent Prior-Art Search
A diagnostic approach to a deceptively simple question — is your search query well-formed? — that improves prior-art search by catching malformed queries before they fail silently.
World Patent Information, Elsevier, 2023
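Evaluations such as the ChatGPT vs. Gemini comparison need a concrete scoring rule. One common choice is recall@k: the fraction of documents cited in a query's search report that a system returns in its top k results. It is assumed here for illustration and is not necessarily the metric used in that paper. The sketch below computes mean recall@k over a set of queries. All query and document identifiers are hypothetical placeholders.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant (cited) documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("recall@k is undefined when no documents are cited")
    # Use a set so duplicate ids in the ranking are not double-counted.
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    runs: dict[str, list[str]], qrels: dict[str, set[str]], k: int
) -> float:
    """Average recall@k over every query with at least one cited document.

    runs:  query id -> ranked list of retrieved document ids (system output)
    qrels: query id -> set of document ids cited in that query's search report
    """
    scores = [
        recall_at_k(runs.get(q, []), cited, k) for q, cited in qrels.items() if cited
    ]
    if not scores:
        raise ValueError("no query has any cited documents")
    return sum(scores) / len(scores)


# Hypothetical ground truth and system output for two queries.
qrels = {"q1": {"doc_a", "doc_b"}, "q2": {"doc_c"}}
runs = {
    "q1": ["doc_x", "doc_a", "doc_y", "doc_b"],
    "q2": ["doc_z", "doc_w", "doc_c"],
}

for k in (2, 3, 4):
    print(f"recall@{k} = {mean_recall_at_k(runs, qrels, k):.2f}")
# recall@2 = 0.25
# recall@3 = 0.75
# recall@4 = 1.00
```

One refinement is to restrict `qrels` to citations of a given category before scoring. In a European Search Report, for example, you might keep only the X and Y citations.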
Looking Ahead
A future AI & Knowledge Discovery Lab
As I complete my Ph.D., my research agenda is broadening from patent-specific systems toward a general framework for AI-driven knowledge discovery — applicable wherever organizations sit on top of large, specialized, hard-to-search document collections.
Emerging directions
- Retrieval-augmented generation for enterprise and technical knowledge bases.
- Rigorous evaluation frameworks for LLM-based search and retrieval.
- Cross-lingual and multilingual retrieval for global patent and scientific corpora.
- Innovation analytics — using patent and literature data to map technological trends.