Data Scientist / AI Engineer
Summary
I’m Mohamed Boulaich, a Data Science & Operations Research engineer with a degree from INSEA. I’m passionate about transforming complex data into smart, actionable insights and designing solutions that actually work in the real world.
I’ve worked on projects ranging from OCR-based document processing pipelines to RAG-powered chatbots, applying software engineering best practices like CI/CD, testing, and clean code. My interests lie at the intersection of machine learning, optimization, and automation: anything that makes systems more intelligent and efficient.
Professional Experience
- Machine Learning Freelancer - OCR for tables
Independent Consultant, Remote
Jul. 2024 – Oct. 2024
- Developed textify-docs, a Python library for text extraction from diverse document formats, focusing on modularity and reusability.
- Built a pipeline that preprocesses images with OpenCV and then extracts text with pytesseract.
- Applied Table Transformer (TATR), based on DETR, to enhance table extraction accuracy.
- Utilized LLMs to identify and structure relevant information.
The main pipeline for extracting information from the “portail marocain des marchés publics”
The core algorithm behind the textify_docs library
- AI Research Intern (PFE) – Multi-Agent Reinforcement Learning
Ai Movement, UM6P, Rabat, Morocco
Feb. 2025 – Aug. 2025
- Designed and implemented a novel QMIX variant leveraging agent contribution masking and regularization, resulting in a compressed model with lower computational overhead for cooperative multi-agent tasks.
- Conducted a state-of-the-art review of MARL algorithms in shared-reward settings and benchmarked them on standard environments using PettingZoo and PyMARL.
The QMIX architecture. Each agent has a local network. A central mixing network, whose weights are generated by state-conditioned hypernetworks (red), combines the individual utilities into a total Q-value while ensuring monotonicity. Original Paper
Architectural comparison. QMIX-Masked introduces a masking layer that selectively prunes agent Q-values before they reach the central mixing network of the standard QMIX architecture.
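The masking idea in the caption above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the internship's implementation: the mixer is reduced to a single linear layer, the hypernetwork is a bare matrix, and the fixed binary `mask` is hypothetical (the source does not say how the mask is produced).

```python
import random

def hyper_weights(state, hyper_matrix):
    """Linear hypernetwork: maps the global state to one mixing weight per agent."""
    return [sum(h * s for h, s in zip(row, state)) for row in hyper_matrix]

def masked_mix(agent_qs, state, hyper_matrix, hyper_bias, mask):
    """Simplified one-layer monotonic mixer applied to masked agent Q-values."""
    # Prune agents first: a 0 in the mask removes that agent's utility.
    masked_qs = [m * q for m, q in zip(mask, agent_qs)]
    # Taking absolute values keeps every mixing weight non-negative, so Q_tot
    # cannot decrease when an individual agent's Q-value increases.
    weights = [abs(w) for w in hyper_weights(state, hyper_matrix)]
    bias = sum(b * s for b, s in zip(hyper_bias, state))
    return sum(w * q for w, q in zip(weights, masked_qs)) + bias

# Toy setup (all values are illustrative assumptions).
rng = random.Random(0)
n_agents, state_dim = 3, 4
hyper_matrix = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(n_agents)]
hyper_bias = [rng.uniform(-1, 1) for _ in range(state_dim)]
state = [rng.uniform(-1, 1) for _ in range(state_dim)]
mask = [1, 0, 1]  # hypothetical mask: the second agent is pruned

q_low = masked_mix([0.2, 0.5, -0.1], state, hyper_matrix, hyper_bias, mask)
q_high = masked_mix([0.7, 0.5, -0.1], state, hyper_matrix, hyper_bias, mask)
q_pruned_changed = masked_mix([0.2, 9.0, -0.1], state, hyper_matrix, hyper_bias, mask)
```

Raising the first agent's Q-value never lowers the mixed value, and changing the pruned agent's Q-value has no effect, which is where the reduced computation would come from.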
- AI Engineering Intern - RAG
Maroc Telecom, Rabat, Morocco
Jun. 2024 – Aug. 2024
- Leveraged the LlamaIndex framework with vector indexing for efficient information retrieval.
- Integrated ChromaDB to store and load embeddings.
- Optimized chatbot responses through prompt engineering, hyperparameter tuning, and tool abstraction.
- Created a user-friendly interface using Streamlit to showcase the chatbot’s functionality.
Streamlit app featuring an interactive chatbot
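The retrieval step behind a RAG chatbot can be illustrated without any framework. The sketch below is not the LlamaIndex or ChromaDB API. It only shows the underlying idea of ranking stored chunks by cosine similarity to a query embedding. The document titles and three-dimensional vectors are made-up placeholders, and `retrieve` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, top_k=2):
    """Return the texts of the top_k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Toy vector store: (chunk text, embedding). Values are illustrative only.
store = [
    ("billing FAQ", [0.9, 0.1, 0.0]),
    ("roaming offers", [0.1, 0.9, 0.1]),
    ("fiber installation", [0.0, 0.2, 0.9]),
]

hits = retrieve([0.8, 0.2, 0.0], store, top_k=1)
```

In a real pipeline the retrieved chunks would be inserted into the prompt sent to the LLM, and a vector database would take the place of the in-memory list.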
- Data Analyst Intern
Higher Planning Commission (HCP), Tangier, Morocco
Jun. 2023 – Jul. 2023
- Cleaned and preprocessed a demographic dataset (handling missing values and normalizing it), then fitted a linear regression to forecast the 2024 population.
- Used the Folium library to create interactive maps visualizing population distributions across regions.
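As a rough illustration of the forecasting step, the sketch below drops missing values and fits a least-squares line to yearly totals. The figures are invented round numbers in arbitrary units, not HCP data, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# (year, population) pairs; None marks a missing observation.
records = [(2019, 100.0), (2020, 102.0), (2021, None), (2022, 106.0), (2023, 108.0)]

# Handle missing values by dropping incomplete rows before fitting.
clean = [(year, pop) for year, pop in records if pop is not None]
years = [year for year, _ in clean]
pops = [pop for _, pop in clean]

intercept, slope = fit_line(years, pops)
forecast_2024 = intercept + slope * 2024
```

Dropping rows is only one way to treat missing values. Interpolation or imputation may be preferable when gaps are frequent.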
Projects
Artistic Neural Style Transfer – PyTorch, Jupyter notebooks, Streamlit (Oct. 2024 - Present)
GitHub Repo
- Implemented Neural Style Transfer, using a VGG19 CNN for feature extraction and Gram matrix computation.
- Developed a Streamlit app allowing users to generate stylized images.
- Fine-tuned hyperparameters to optimize style transfer quality.
Diagram illustrating the Neural Style Transfer process. The content image (bottom left) and style image (top left) are passed through a convolutional neural network. Content and style representations are extracted at different layers. These representations are then used to guide the transformation of a generated image (typically a white noise image), resulting in style reconstructions (top) and content reconstructions (bottom) at various levels of the network. Original Paper
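The style representation mentioned above is the Gram matrix of a layer's feature maps. The sketch below computes it in plain Python for a tiny hand-written feature tensor. The project itself uses PyTorch, so this is only an illustration, and the normalization by channels times positions is one common convention among several.

```python
def gram_matrix(features):
    """Gram matrix of feature maps.

    features: list of C channels, each a flat list of H*W activations.
    Entry (i, j) is the normalized inner product of channels i and j.
    """
    n_channels = len(features)
    n_positions = len(features[0])
    norm = n_channels * n_positions
    return [
        [sum(a * b for a, b in zip(fi, fj)) / norm for fj in features]
        for fi in features
    ]

def style_loss(gram_generated, gram_style):
    """Mean squared difference between two Gram matrices."""
    diffs = [
        (g - s) ** 2
        for row_g, row_s in zip(gram_generated, gram_style)
        for g, s in zip(row_g, row_s)
    ]
    return sum(diffs) / len(diffs)

# Two channels over a 2x2 spatial grid, flattened (illustrative values).
features = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
gram = gram_matrix(features)
```

Because the Gram matrix sums over spatial positions, it captures which channels fire together while discarding where they fire, which is why it works as a style descriptor.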

Accent Detection DL model – PyTorch, Pandas, Scikit-Learn, Librosa (Apr. 2024 - Present)
GitHub Repo
- Reviewed literature on accent recognition methods and deep learning architectures.
- Collected audio recordings from the Speech Accent Archive.
- Preprocessed the dataset using silence trimming and noise reduction.
- Implemented and fine-tuned an ANN to predict accents using MFCCs, achieving 0.84 accuracy, 0.79 F1 score, and 0.80 precision.
Sentiment Analysis On Movie Reviews – spaCy, PySpark, Pandas, Scikit-Learn (May 2024 - Jun. 2024)
GitHub Repo
- Preprocessed text data using tokenization, lemmatization, and vectorization (BoW, TF-IDF).
- Trained classifiers (Logistic Regression, SVM, Naive Bayes), achieving 86% accuracy with SVM.
- Used PySpark to process large-scale data, improving computational efficiency.
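To make the vectorization step concrete, here is a minimal TF-IDF sketch in plain Python. It uses whitespace tokenization and the unsmoothed formula tf × ln(N / df). Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ. The three reviews are invented toy sentences.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

reviews = [
    "the movie was great",
    "the movie was boring",
    "the plot was great fun",
]
vectors = tf_idf(reviews)
```

Terms that occur in every review, such as "the", receive a weight of zero, while a term unique to one review, such as "boring", gets the highest weight in that review.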
Financial Cointegration Analysis | Time Series Analysis, Python, Jupyter Notebooks
Dec. 2023 – Dec. 2023
GitHub Repo
- Applied the Engle-Granger two-step cointegration test to distinguish spurious correlations from genuine long-term relationships.
- Analyzed historical price data from Yahoo Finance using Python’s yfinance library.
Testing cointegration between two financial time series
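The two steps of the Engle-Granger procedure can be sketched on synthetic data as follows. This is a simplified illustration, not the project's notebook. It uses a plain (non-augmented) Dickey-Fuller regression on the residuals, and the threshold of about -3.34 is an approximate 5% critical value for two series, taken here as an assumption. A library such as statsmodels should be used for real analysis.

```python
import math
import random

def ols(xs, ys):
    """Step 1 helper: fit y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def dickey_fuller_t(resid):
    """Step 2 helper: t-statistic of rho in diff(resid)_t = rho * resid_{t-1} + e_t."""
    lagged = resid[:-1]
    diffs = [cur - prev for prev, cur in zip(resid[:-1], resid[1:])]
    sum_ll = sum(l * l for l in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / sum_ll
    sse = sum((d - rho * l) ** 2 for l, d in zip(lagged, diffs))
    std_err = math.sqrt(sse / (len(diffs) - 1) / sum_ll)
    return rho / std_err

# Synthetic pair: x is a random walk, y follows x plus stationary noise.
rng = random.Random(42)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]

# Step 1: estimate the long-run relationship.
intercept, slope = ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Step 2: test whether the residuals are stationary.
t_stat = dickey_fuller_t(residuals)
cointegrated = t_stat < -3.34  # approximate 5% critical value (assumption)
```

Both series are non-stationary on their own, yet the residuals of the regression are stationary, which is what separates a cointegrating relationship from a spurious correlation.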
Student Accommodation Clustering | Python, Scikit-Learn, Folium
Nov. 2022 – Dec. 2022
GitHub Repo
- Developed a student accommodation clustering system to suggest optimal housing options based on proximity to preferred locations.
- Conducted data collection and cleaning tasks, including API integration and handling missing values.
- Implemented machine learning techniques, particularly the K-Means clustering algorithm from scikit-learn, to group accommodation options based on their similarity.
- Performed exploratory data analysis (EDA) with visualizations and maps to gain insights into student preferences and housing patterns.
Optimal housing based on students' preferred locations.
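The grouping step can be illustrated with a bare-bones version of Lloyd's algorithm, which is the core of K-Means. The project relied on scikit-learn's implementation. The code below is only a dependency-free sketch with hypothetical 2D coordinates and hand-picked initial centroids, and it treats coordinates as points on a flat plane.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, n_iter=10):
    """Run a fixed number of Lloyd iterations from the given initial centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        centroids = updated
    return labels, centroids

# Two well-separated groups of accommodation coordinates (illustrative values).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

For real latitude and longitude data over larger areas, a geodesic distance or a projected coordinate system would be more appropriate than raw Euclidean distance.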
Education
- Engineering, Data Science
The National Institute of Statistics and Applied Economics, Rabat, Morocco
2022 - 2025
- SPE, MP*
Higher School Preparatory Classes (CPGE) Moulay Idriss, Fes, Morocco
2021 - 2022
Skills
- Languages: English (TOEFL 82), French (fluent), Arabic (native)
- Statistics: Statistical Inference, Descriptive Statistics, Statistical Learning, Machine Learning Methodology, Traditional Modeling, Generalized Linear Models, Time Series Analysis, Hidden Markov Models, Stochastic Processes, Queuing Theory
- Operations Research: Linear and Integer Programming, Stochastic Optimization, Decomposition (Dantzig-Wolfe, Benders, Column Generation), Metaheuristics (SA, GA, TS), Graph Theory
- Libraries: NumPy, Pandas, Scikit-Learn, PyTorch, Transformers, spaCy, Seaborn
- DevOps/MLOps Tools: Code Versioning, CI/CD (GitHub Actions), Automated Testing (tox, pytest), Docker, Code Quality Tools, Agile/Scrum
- Data Tools: Apache Airflow, Apache Superset