Multi-Transformer DE-Fused MHSA-GRU for Toxicity Prediction

Algorithm Workflow

Our multi-stage deep learning pipeline designed for accurate and reliable toxicity prediction.

Multi-Transformer DE-Fused MHSA-GRU for Toxicity Prediction Workflow
Download Image

Methodology & Approach

Detailed breakdown of our multi-stage deep learning approach

1

Stage 1: Transformer-Based Feature Extraction

ProtBERT embedding extraction

ESM2-T33 embedding extraction

ProtT5-XL-U50 embedding extraction

BioT5 embedding extraction

2

Stage 2: Feature Fusion

Differential Evolution (DE) optimization

Search for optimal weighted fusion of all transformer families

Produce unified multi-model feature vector

3

Stage 3: SHAP Feature Selection

Compute SHAP values for fused features

Select top 500 most informative features

4

Stage 4: Classification

MHSA-GRU (Multi-Head Self Attention GRU)

Final toxicity prediction

Confidence estimation

Key Innovations

Novel contributions and technological advances in our approach

Fusion of four transformer-model families

Differential Evolution for optimal feature integration

SHAP-based feature selection

MHSA-GRU classifier for sequential modeling

Highly interpretable toxicity predictions

Technical Architecture

Deep learning pipeline components and model architecture

Feature Extraction

ProtBERT

Bidirectional transformer pretrained on protein sequences

ESM2-T33

33-layer evolutionary-scale model for deep protein understanding

ProtT5-XL-U50

Large T5-based transformer optimized for protein embeddings

BioT5

Biomedical T5 model adapted for representation

Feature Fusion

Differential Evolution

Optimizes weighted combination of transformer features

Feature Selection

SHAP

Select top 500 most important fused features

Classification

MHSA-GRU

Multi-Head Self Attention GRU classifier

Confidence Estimation

Final toxicity probability score

Model Performance Metrics

96.42%
Accuracy
Overall Correctness
96.90%
Sensitivity
True Positive Rate
98.65%
AUC Score
Area Under Curve
95.88%
Precision
Positive Predictive Value
90.35%
Specificity
True Negative Rate
95.64%
F1-Score
Harmonic Mean

Experience Our Algorithm in Action

Experience our multi-stage deep learning pipeline by testing your own sequences or exploring our sample dataset for toxicity prediction.