Brhanu Fentaw Znabu

Ph.D. student in Biomedical Engineering at the University of Nebraska–Lincoln, advised by Nicole R. Sexton at the Nebraska Center for Virology.

I build and evaluate genomic foundation models for biological sequence analysis. My current work focuses on domain-adaptive pre-training of large language models (DNABERT-2) for viral host-range prediction and epidemic emergence forecasting, with an emphasis on rigorous leakage-aware evaluation and model interpretability.

News

May 2026 Poster accepted at ISMB 2026 (MLCSB COSI): “Investigating Host-Range Determinants of Flaviviruses Using Foundation Model Embeddings: A Leakage-Aware Evaluation Framework.”
Apr 2026 Nominated by the University of Nebraska–Lincoln for the Google PhD Fellowship 2026 in AI for Health.
Apr 2026 Served as a Poster Judge at the School of Biological Sciences Undergraduate Research Symposium, UNL.
Mar 2026 Competed in the UNL Engineering Pitch Competition with Traversa, a startup building AI-driven infectious disease risk assessment for travelers.
Feb 2026 Poster presentation at the Annual Engineering Symposium, UNL: “Deciphering the host-switching grammar of flaviviruses using foundation model embeddings.”
Oct 2025 Poster presentation at the NCV Annual Virology Symposium: “Cracking the viral code: codon usage and CpG patterns predict host specificity.”
Jun 2025 Started Research Internship at Microsoft Research, Health Futures — Biomedical Platforms and Genomics team, Mountain View, CA.
Apr 2025 Paper on MoSeq-based 3D behavioral profiling in diabetic neuropathy published in Scientific Reports.
Mar 2025 Accepted offer for a Research Internship — Genomics at Microsoft Research, Health Futures group.
2025 Submitted manuscript on leakage-aware evaluation of foundation model embeddings for flavivirus host-range prediction to PLOS Computational Biology.
Aug 2024 Joined the Sexton Lab as a Ph.D. student at the Nebraska Center for Virology, UNL.

Publications

ArboFM study design and analysis pipeline
In Preparation
A genomic foundation model for predicting arbovirus epidemic emergence across RNA virus families
Znabu BF, Sexton NR.
Target: Bioinformatics (Oxford)

ArboFM: domain-adapted DNABERT-2 via continued MLM pre-training on 120K+ arbovirus genome windows (9,299 genomes, 362 species, 6 families). Predicts epidemic emergence with AP = 0.978 (Flaviviridae). Retrospective temporal validation detects Zika, chikungunya, and West Nile epidemic lineages before documented emergence.

Flavivirus dataset curation and leakage-aware evaluation framework
In Preparation
Deciphering the host-switching grammar of flaviviruses using foundation model embeddings: a leakage-aware evaluation framework
Znabu BF, Sexton NR.
Target: PLOS Computational Biology

DNABERT-2 embeddings on 3,031 flavivirus genomes. Quantifies 15-percentage-point performance inflation from phylogenetic leakage. Identifies UpA-containing motifs linked to the OAS/RNaseL innate immunity pathway via genome-localized attribution.
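As a rough illustration of what a leakage-aware split can look like, the sketch below keeps every genome of a given species on the same side of the train/test boundary, so close relatives cannot appear in both sets. Grouping by species label is an assumption made here for illustration; the framework described above may group by clade, sequence identity, or another criterion. The function name `species_grouped_split` and the example records are hypothetical.

```python
import random
from collections import defaultdict

def species_grouped_split(records, test_fraction=0.2, seed=0):
    """Split (genome_id, species) pairs so that no species spans both sets.

    Assumption: species labels are the grouping unit. Whole species are
    assigned to the test set until it holds roughly test_fraction of genomes.
    """
    by_species = defaultdict(list)
    for genome_id, species in records:
        by_species[species].append(genome_id)

    # Shuffle species (not genomes) with a fixed seed for reproducibility.
    species_order = sorted(by_species)
    random.Random(seed).shuffle(species_order)

    target = test_fraction * len(records)
    train, test = [], []
    for sp in species_order:
        bucket = test if len(test) < target else train
        bucket.extend(by_species[sp])
    return train, test

# Hypothetical toy records: (genome_id, species)
records = [("g1", "DENV"), ("g2", "DENV"), ("g3", "ZIKV"),
           ("g4", "WNV"), ("g5", "WNV"), ("g6", "YFV")]
train_ids, test_ids = species_grouped_split(records)
```

Comparing a score obtained with a split like this against one from a purely random genome-level split is one way to estimate how much apparent performance comes from relatedness between training and test genomes.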

Cross-family dataset construction and genome architecture
In Preparation
Cross-family comparison of host-switching grammar in arthropod-borne viruses reveals convergent evolution of family-specific immune evasion strategies
Znabu BF, Sexton NR.
Target: Virus Evolution (Oxford)

Cross-family transfer learning across 4 arbovirus families (6,052 genomes, 124,883 windows). Demonstrates host-switching grammar is predominantly family-specific (Jaccard ≈ 0), supporting convergent evolution.

Flavivirus host range dataset curation workflow
In Preparation
Genome-scale classification of flavivirus host range using composition and codon-usage signatures
Znabu BF, Sexton NR.
Target: Journal of Virology

Species-stratified ML classifiers on 1,285 curated flavivirus genomes with 97 compositional features. Achieves PR-AUC = 1.000 for ISFV vs. dual-host classification. CpG suppression identified as the primary discriminative feature.
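For readers unfamiliar with CpG suppression, it is commonly summarized as an observed/expected ratio: the count of CG dinucleotides, scaled by sequence length, divided by the product of the C and G counts. Values well below 1 indicate suppression. The sketch below computes that standard ratio; it is not the study's feature-extraction code, and the function name `cpg_obs_exp` is a hypothetical helper.

```python
from collections import Counter

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    RNA input is accepted: U is mapped to T before counting.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    counts = Counter(seq)
    c, g = counts["C"], counts["G"]
    if c == 0 or g == 0:
        return 0.0
    # Count CG dinucleotides at every position in the sequence.
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return (cg * n) / (c * g)

ratio = cpg_obs_exp("GCGC")
```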

ViraPredict study design and dengue genome distribution
In Preparation
Probabilistic forecasting of dengue virus evolution using multi-modal foundation models and fitness landscape analysis
Znabu BF.
Target: TBD

ViraPredict: multi-modal framework combining ESM-2 protein language models (650M params), fitness landscape modeling, and LoRA-fine-tuned DNABERT-2 with temporal embeddings on 29,493 curated dengue genomes across 130 countries.

MoSeq experimental design and AR-HMM behavioral analysis pipeline
Published 2025
MoSeq based 3D behavioral profiling uncovers neuropathic behavior changes in diabetic mouse model
Ashiquzzaman A*, Lee E*, Znabu BF*, Sakib AN, Chung G, Kim SS, Kim YR, Kwon H-S, Chung E.
Scientific Reports 15, 15114 (2025)

Gene expression heatmap and signature correlations in LUAD
Interpretable deep learning-based multi-omics integration for prognosis in hepatocellular carcinoma
Znabu BF, Atif Z.
bioRxiv (2026)

Attention-based multi-branch deep learning framework integrating multi-omics data for interpretable survival prediction in hepatocellular carcinoma.

NativeReady pipeline from data sources to public release
NativeReady: an open benchmark and sequence-based triage model for native mass spectrometry suitability
Znabu BF, Atif Z.
bioRxiv (2026)

Experience

Research Intern, Genomics · Jun – Aug 2025
Microsoft Research, Health Futures — Biomedical Platforms and Genomics · Mountain View, CA
Worked with the Biomedical Platforms and Genomics team within Microsoft Health Futures, an interdisciplinary group developing next-generation methods and tools for health and life sciences. Contributed to genomics research and discovery platform development, working with large genomic datasets, language models, and ML model validation within distributed computing environments.
Genomics · Language Models · ML Validation · Azure · Python

PhD Researcher · Aug 2024 – Present
Sexton Lab, University of Nebraska–Lincoln · Nebraska Center for Virology
Building genomic foundation models (ArboFM) for arbovirus epidemic emergence prediction. Developing leakage-aware evaluation frameworks, cross-family transfer learning, and multi-modal forecasting systems for viral evolution.
Foundation Models · DNABERT-2 · ESM-2 · Transfer Learning · PyTorch

Research Assistant · Mar 2021 – Jun 2024
Neurophotonics Lab, GIST · South Korea
Developed unsupervised autoregressive and Hidden Markov models for behavioral time-series analysis. Benchmarked deep learning-based 3D pose estimation against traditional 2D methods.
AR-HMM · DeepLabCut · Behavioral Analysis · MATLAB

About

My research sits at the intersection of machine learning and genomics. I develop foundation models and rigorous evaluation frameworks for biological sequence analysis, with applications in viral emergence prediction, host-range classification, and model interpretability. I work primarily with PyTorch, Hugging Face Transformers, and the SciPy/PyData ecosystem on HPC GPU clusters.

Prior to my PhD, I earned an M.Sc. in Biomedical Science and Engineering from Gwangju Institute of Science and Technology (GIST), South Korea, where I applied unsupervised machine learning (autoregressive models, Hidden Markov models) to 3D behavioral phenotyping in diabetic neuropathy mouse models. I also hold a B.Sc. in Biomedical Engineering from Jimma University, Ethiopia, where I graduated 4th in a class of 180.

When I'm not at the computer, you'll find me exploring coffee shops ☕ or at the gym 🏋.

Awards & Grants

2020 Korean Government Full Scholarship for Master's Study, GIST
2020 Research Grant for Colostomy Device Development, Hawassa University
2018 Best B.Sc. Thesis Award, Ethiopian Science, Technology & Innovation
2018 Graduated with Distinction, Ranked 4th of 180, Jimma University