Yoann Abriel

2026

Fake News Detection

An NLP fake news detection system with a full pipeline: TF-IDF baselines, BERT/RoBERTa/DeBERTa fine-tuning, cross-dataset evaluation, and SHAP/LIME interpretability analysis. Trained on the LIAR dataset (roughly 12,800 political statements).

This NLP specialization project builds a complete fake news detection system. The pipeline covers exploratory analysis, preprocessing, baseline models (Naive Bayes, Logistic Regression, XGBoost), fine-tuning of transformer models (BERT, RoBERTa, DeBERTa), evaluation on an external dataset to test out-of-distribution generalization, and interpretability analysis with SHAP and LIME. The LIAR dataset contains roughly 12,800 political statements labeled by PolitiFact fact-checkers. The project also includes an ethical bias analysis.
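
As a rough illustration of the baseline stage, the sketch below chains a TF-IDF vectorizer and a logistic regression classifier with scikit-learn. The toy statements, the binary labels, and the hyperparameters are placeholders assumed for this example; they are not taken from the project's notebooks, and LIAR itself uses six truthfulness labels rather than two.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the real project trains on LIAR.
train_texts = [
    "The unemployment rate fell last year according to official figures",
    "Official figures show the budget deficit narrowed this quarter",
    "Scientists secretly admit the moon landing was staged",
    "A secret plan will ban all cars by next month",
]
train_labels = ["true", "true", "false", "false"]

# TF-IDF over unigrams and bigrams feeds a linear classifier.
# ngram_range and max_iter are illustrative choices, not tuned values.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

test_texts = ["Official figures show unemployment fell"]
predictions = baseline.predict(test_texts)           # one label per statement
probabilities = baseline.predict_proba(test_texts)   # one row per statement
```

Keeping the vectorizer and classifier in a single pipeline means the vocabulary is fitted on training text only, which matters when the same model is later scored on an external dataset.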

Challenges

  • Multi-class classification of political statements with contextual nuances
  • Generalizing to unseen external datasets
  • Interpreting model decisions to support trust
  • Detection and analysis of prediction biases

Solutions

  • Progressive pipeline: TF-IDF baseline → BERT/RoBERTa/DeBERTa fine-tuning
  • Cross-dataset evaluation to measure out-of-distribution robustness
  • SHAP and LIME analysis for prediction explainability
  • Ethical bias audit integrated into the evaluation pipeline
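
One way to quantify the out-of-distribution robustness mentioned above is to compare macro-averaged F1 on the in-distribution test split with the same metric on the external dataset. The sketch below is a minimal version of that idea in plain Python; the helper names `macro_f1` and `generalization_gap` and the label lists are hypothetical, and the metric actually used in the project is not stated in the source.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def generalization_gap(in_dist, out_dist):
    """Each argument is a (y_true, y_pred) pair; a positive gap means
    the model scores lower on the external data."""
    return macro_f1(*in_dist) - macro_f1(*out_dist)


# Hypothetical predictions, for illustration only.
in_true = ["true", "false", "true", "false"]
in_pred = ["true", "false", "true", "false"]
out_true = ["true", "false", "true", "false"]
out_pred = ["true", "true", "true", "false"]

gap = generalization_gap((in_true, in_pred), (out_true, out_pred))
```

Macro averaging is chosen here because it weights every class equally, which keeps a skewed label distribution in the external dataset from hiding poor performance on rare classes.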

Results

  • 5 notebooks covering the full pipeline, from EDA to interpretability
  • Comparison of 6+ models (baseline + transformers)
  • SHAP/LIME interpretability analysis on predictions
  • Generalization evaluation on an external dataset

Technologies

Python · PyTorch · BERT · Transformers · SHAP · LIME · Scikit-learn · XGBoost