Brhanu Fentaw Znabu

Ph.D. student in Biomedical Engineering at the University of Nebraska–Lincoln, advised by Nicole R. Sexton at the Nebraska Center for Virology.

I build and evaluate genomic foundation models for biological sequence analysis. My current work focuses on domain-adaptive pre-training of large language models (DNABERT-2) for viral host-range prediction and epidemic emergence forecasting, with an emphasis on rigorous leakage-aware evaluation and model interpretability. I am also exploring discrete diffusion models and conditional generative frameworks for protein variant design, combining protein language model representations with fitness-guided generation.

News

May 2026 Poster accepted at ISMB 2026 (MLCSB COSI): “Investigating Host-Range Determinants of Flaviviruses Using Foundation Model Embeddings: A Leakage-Aware Evaluation Framework.”
Apr 2026 Nominated by the University of Nebraska–Lincoln for the Google PhD Fellowship 2026 in AI for Health.
Apr 2026 Served as a Poster Judge at the School of Biological Sciences Undergraduate Research Symposium, UNL.
Mar 2026 Competed in the UNL Engineering Pitch Competition with Traversa, a startup building AI-driven infectious disease risk assessment for travelers.
Feb 2026 Poster presentation at the Annual Engineering Symposium, UNL: “Deciphering the host-switching grammar of flaviviruses using foundation model embeddings.”
Oct 2025 Poster presentation at the NCV Annual Virology Symposium: “Cracking the viral code: codon usage and CpG patterns predict host specificity.”
Apr 2025 Paper on MoSeq-based 3D behavioral profiling in diabetic neuropathy published in Scientific Reports.
2025 Submitted manuscript on leakage-aware evaluation of foundation model embeddings for flavivirus host-range prediction to PLOS Computational Biology.
Aug 2024 Joined the Sexton Lab as a Ph.D. student at the Nebraska Center for Virology, UNL.

Publications

ArboFM study design and analysis pipeline
In Preparation
A genomic foundation model for predicting arbovirus epidemic emergence across RNA virus families
Znabu BF, Sexton NR.
Target: Bioinformatics (Oxford)

ArboFM: domain-adapted DNABERT-2 via continued masked language modeling (MLM) pre-training on 120K+ arbovirus genome windows (9,299 genomes, 362 species, 6 families). Predicts epidemic emergence with average precision (AP) = 0.978 (Flaviviridae). Retrospective temporal validation detects Zika, chikungunya, and West Nile epidemic lineages before documented emergence.

Flavivirus dataset curation and leakage-aware evaluation framework
In Preparation
Deciphering the host-switching grammar of flaviviruses using foundation model embeddings: a leakage-aware evaluation framework
Znabu BF, Sexton NR.
Target: PLOS Computational Biology

DNABERT-2 embeddings on 3,031 flavivirus genomes. Quantifies 15-percentage-point performance inflation from phylogenetic leakage. Identifies UpA-containing motifs linked to the OAS/RNaseL innate immunity pathway via genome-localized attribution.
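
One common way to limit this kind of leakage is to split by group (for example species or clade) so that closely related genomes never appear on both sides of the split. The sketch below illustrates that general idea only; the function name `group_holdout_split`, the grouping by species label, and the example labels are assumptions for illustration, not the framework described in the manuscript.

```python
import random

def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group label lands in both train and test.

    `groups` holds one label per sample (hypothetical: a species or clade name).
    """
    unique = sorted(set(groups))            # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole groups, at least one, rather than individual samples.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Illustrative labels only; not data from the study.
labels = ["DENV", "DENV", "ZIKV", "WNV", "WNV", "YFV", "JEV"]
train_idx, test_idx = group_holdout_split(labels, test_fraction=0.2, seed=0)
```

Comparing a model's score under a split like this with its score under a purely random split gives one estimate of how much relatedness between train and test inflates performance.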

Cross-family dataset construction and genome architecture
In Preparation
Cross-family comparison of host-switching grammar in arthropod-borne viruses reveals convergent evolution of family-specific immune evasion strategies
Znabu BF, Sexton NR.
Target: Virus Evolution (Oxford)

Cross-family transfer learning across 4 arbovirus families (6,052 genomes, 124,883 windows). Demonstrates host-switching grammar is predominantly family-specific (Jaccard ≈ 0), supporting convergent evolution.
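
For context, the Jaccard index between two sets is the size of their intersection divided by the size of their union, so a value near 0 means the sets share almost nothing. A minimal sketch, assuming the compared items are sets of family-level features such as attributed motifs (the summary does not state exactly which sets were compared, and the motif strings below are placeholders):

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Placeholder motif sets for two hypothetical families; not results from the study.
family_a = {"motif_1", "motif_2", "motif_3"}
family_b = {"motif_4", "motif_5"}
overlap = jaccard(family_a, family_b)  # disjoint sets give 0.0
```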

Flavivirus host range dataset curation workflow
In Preparation
Genome-scale classification of flavivirus host range using composition and codon-usage signatures
Znabu BF, Sexton NR.
Target: Journal of Virology

Species-stratified ML classifiers on 1,285 curated flavivirus genomes with 97 compositional features. Achieves PR-AUC = 1.000 for insect-specific flavivirus (ISFV) vs. dual-host classification. CpG suppression identified as the primary discriminative feature.
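
CpG suppression is commonly summarized as an observed/expected CpG ratio, where values well below 1 indicate depletion. A minimal sketch, assuming the widely used form (CpG count × sequence length) / (C count × G count); the function name `cpg_obs_exp` is hypothetical, and the study's 97 features may be defined differently:

```python
import math

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (n_CG * N) / (n_C * n_G).

    Returns NaN when the sequence has no C or no G, since the ratio is undefined.
    """
    s = seq.upper()
    n = len(s)
    n_c = s.count("C")
    n_g = s.count("G")
    n_cg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    if n_c == 0 or n_g == 0:
        return math.nan
    return n_cg * n / (n_c * n_g)

# Toy sequence for illustration only: one CG in 8 bases with 2 C and 2 G.
ratio = cpg_obs_exp("ACGTACTG")  # 1 * 8 / (2 * 2) = 2.0
```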

ViraPredict study design and dengue genome distribution
In Preparation
Probabilistic forecasting of dengue virus evolution using multi-modal foundation models and fitness landscape analysis
Znabu BF.
Target: TBD

ViraPredict: multi-modal framework combining ESM-2 protein language models (650M params), fitness landscape modeling, and LoRA-fine-tuned DNABERT-2 with temporal embeddings on 29,493 curated dengue genomes across 130 countries.

MoSeq experimental design and AR-HMM behavioral analysis pipeline
Published 2025
MoSeq based 3D behavioral profiling uncovers neuropathic behavior changes in diabetic mouse model
Ashiquzzaman A*, Lee E*, Znabu BF*, Sakib AN, Chung G, Kim SS, Kim YR, Kwon H-S, Chung E.
Scientific Reports 15, 15114 (2025)

Gene expression heatmap and signature correlations in LUAD
Interpretable deep learning-based multi-omics integration for prognosis in hepatocellular carcinoma
Znabu BF, Atif Z.
bioRxiv (2026)

Attention-based multi-branch deep learning framework integrating multi-omics data for interpretable survival prediction in hepatocellular carcinoma.

NativeReady pipeline from data sources to public release
NativeReady: an open benchmark and sequence-based triage model for native mass spectrometry suitability
Znabu BF, Atif Z.
bioRxiv (2026)

Experience

PhD Researcher · Aug 2024 – Present
Sexton Lab, University of Nebraska–Lincoln · Nebraska Center for Virology
Building genomic foundation models (ArboFM) for arbovirus epidemic emergence prediction. Developing leakage-aware evaluation frameworks, cross-family transfer learning, and multi-modal forecasting systems for viral evolution.
Foundation Models · DNABERT-2 · ESM-2 · Transfer Learning · PyTorch
Research Assistant · Mar 2021 – Jun 2024
Neurophotonics Lab, GIST · South Korea
Developed unsupervised autoregressive and Hidden Markov models for behavioral time-series analysis. Benchmarked deep learning-based 3D pose estimation against traditional 2D methods.
AR-HMM · DeepLabCut · Behavioral Analysis · MATLAB

About

My research sits at the intersection of machine learning and genomics. I develop foundation models and rigorous evaluation frameworks for biological sequence analysis, with applications in viral emergence prediction, host-range classification, and model interpretability. My work increasingly spans generative modeling, where I apply discrete diffusion and classifier-free guidance techniques to design protein variants with targeted functional properties. I work primarily with PyTorch, Hugging Face Transformers, and the SciPy/PyData ecosystem on HPC GPU clusters.

Prior to my PhD, I earned an M.Sc. in Biomedical Science and Engineering from Gwangju Institute of Science and Technology (GIST), South Korea, where I applied unsupervised machine learning (autoregressive models, Hidden Markov models) to 3D behavioral phenotyping in diabetic neuropathy mouse models. I also hold a B.Sc. in Biomedical Engineering from Jimma University, Ethiopia, where I graduated 4th in a class of 180.

When I'm not at the computer, you'll find me exploring coffee shops ☕ or at the gym 🏋.

Awards & Grants

2020 Korean Government Full Scholarship for Master's Study, GIST
2020 Research Grant for Colostomy Device Development, Hawassa University
2018 Best B.Sc. Thesis Award, Ethiopian Science, Technology & Innovation
2018 Graduated with Distinction, Ranked 4th of 180, Jimma University