Data Scientist / AI Engineer
- Email: boulaich.mohamed970@gmail.com
- Alternative Email: mboulaich@insea.ac.ma
- LinkedIn: Mohamed Boulaich
- GitHub: BlcMed
Summary
I am a data science student who enjoys applying data-driven methods to real-world problems. My studies have given me skills in statistical analysis, machine learning, and data visualization, and I am eager to put them to work on meaningful data science projects.
Internships
- OCR-Based Data Extraction
Direction Générale des Impôts, Rabat, Morocco
Jun. 2024 – Aug. 2024
- Developed textify-docs, a Python library for text extraction from diverse document formats, focusing on modularity and reusability.
- Built a pipeline that preprocesses document images with OpenCV and then extracts text with pytesseract.
- Applied Table Transformer (TATR), based on DETR, to enhance table extraction accuracy.
- Utilized LLMs to identify and structure relevant information.
The main pipeline for extracting information from the Moroccan public procurement portal (“portail marocain des marchés publics”)
The core algorithm behind the textify_docs library
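The entry only says OpenCV handled image preprocessing. A common step before running Tesseract is global binarization, and the sketch below implements Otsu's threshold selection in plain Python on 8-bit grayscale values to illustrate that idea. It is an assumption about the kind of preprocessing involved, not the textify-docs code; in OpenCV the equivalent call is usually `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the Otsu threshold for 8-bit grayscale values (0-255).

    Pixels <= threshold form one class (e.g. ink), the rest the other
    (e.g. paper). The chosen threshold maximizes between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels with value <= t
    sum_bg = 0     # sum of intensities with value <= t
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map pixels above the threshold to 255 and the rest to 0."""
    return [0 if p <= threshold else 255 for p in pixels]
```

On a cleanly bimodal page (dark text on light paper) any threshold between the two modes separates them, and Otsu picks one automatically instead of relying on a hand-tuned constant.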
- RAG Chatbot Engineering
Maroc Telecom, Rabat, Morocco
Jun. 2024 – Aug. 2024
- Leveraged the LlamaIndex framework and its vector index for efficient information retrieval.
- Integrated ChromaDB to store and load embeddings efficiently.
- Improved chatbot responses through prompt engineering, hyperparameter tuning, and tool abstraction.
- Created a user-friendly interface using Streamlit to showcase the chatbot’s functionality.
Streamlit app featuring an interactive chatbot
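At its core, vector-index retrieval ranks stored chunk embeddings by their similarity to the query embedding and passes the best matches to the LLM as context. The plain-Python sketch below shows that ranking with cosine similarity. It is not the LlamaIndex or ChromaDB API (both handle embedding, storage, and similarity search internally), and the three-dimensional vectors and chunk names are hypothetical stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical "embeddings" for three document chunks.
index = {
    "billing_faq": [0.9, 0.1, 0.0],
    "roaming_faq": [0.1, 0.9, 0.1],
    "internet_faq": [0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.2, 0.0], index, k=2)
```

The retrieved chunk texts would then be inserted into the prompt so the model answers from the indexed documents.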
- Data Analyst Intern
Higher Planning Commission (HCP), Tangier, Morocco
Jun. 2023 – Jul. 2023
- Retrieved 2015–2023 demographic data for various regions and provinces from the HCP statistical database (BDS, Base de Données Statistiques).
- Used linear regression on the collected data to forecast each area's population for 2024.
- Used the Folium library to build interactive maps of population distribution across regions.
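The forecasting step amounts to fitting population = a + b × year by ordinary least squares and evaluating the line at 2024. A minimal plain-Python sketch follows; the yearly figures are made up for illustration and are not HCP data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(xs: Sequence[float], ys: Sequence[float], x_new: float) -> float:
    """Fit the line on (xs, ys) and evaluate it at x_new."""
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * x_new


# Hypothetical population counts for one province, 2015-2023.
years = list(range(2015, 2024))
population = [1_000_000 + 12_000 * (y - 2015) for y in years]
pop_2024 = forecast(years, population, 2024)
```

Centering on the mean year keeps the arithmetic stable even though the raw year values are large.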
Projects
Accent Detection Deep Learning Model – PyTorch, Pandas, Scikit-Learn, Librosa (Apr. 2024 - Present)
GitHub Repo
- Reviewed literature on accent recognition methods and deep learning architectures.
- Collected audio recordings from the Speech Accent Archive.
- Preprocessed the dataset using silence trimming and noise reduction.
- Implemented and fine-tuned an ANN to predict accents using MFCCs, achieving 84.15% accuracy, 0.79 F1 score, and 0.80 precision.
Artistic Neural Style Transfer – PyTorch, Jupyter notebooks, Streamlit (Oct. 2024 - Present)
GitHub Repo
- Implemented neural style transfer, using the VGG19 CNN for feature extraction and Gram matrices to represent style.
- Developed a Streamlit app allowing users to generate stylized images.
- Fine-tuned hyperparameters to optimize style transfer quality.
Diagram illustrating the Neural Style Transfer process. The content image (bottom left) and style image (top left) are passed through a convolutional neural network. Content and style representations are extracted at different layers. These representations are then used to guide the transformation of a generated image (typically a white noise image), resulting in style reconstructions (top) and content reconstructions (bottom) at various levels of the network. (Gatys et al., 2015)
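In Gatys et al.'s formulation, a layer's style is summarized by the Gram matrix of its feature maps: with the activations reshaped to a matrix F of N channels by M = H·W positions, G = F Fᵀ, so G[i][j] is the inner product of channels i and j over all positions. The per-layer style loss is E = Σ(G − A)² / (4 N² M²), where A is the style image's Gram matrix. The plain-Python sketch below computes both; in a PyTorch project the Gram matrix is normally a single matrix multiplication, and the toy activations are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """G = F F^T for a feature map flattened to N rows of M = H*W values."""
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


def layer_style_loss(gen_features: Matrix, style_features: Matrix) -> float:
    """Per-layer style loss sum((G - A)^2) / (4 * N^2 * M^2), as in Gatys et al."""
    n = len(gen_features)      # number of channels N
    m = len(gen_features[0])   # number of spatial positions M = H * W
    g = gram_matrix(gen_features)
    a = gram_matrix(style_features)
    sq = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return sq / (4 * n ** 2 * m ** 2)


# Toy 2-channel feature map over 3 spatial positions (hypothetical values).
f = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
G = gram_matrix(f)
```

Because the Gram matrix sums over positions, it discards where features occur and keeps only which features co-occur, which is why it captures texture and style rather than layout.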
Sentiment Analysis on Movie Reviews – spaCy, PySpark, Pandas, Scikit-Learn (May 2024 - Jun. 2024)
GitHub Repo
- Preprocessed text data using tokenization, lemmatization, and vectorization (BoW, TF-IDF).
- Trained classifiers (Logistic Regression, SVM, Naive Bayes), achieving 86% accuracy with SVM.
- Used PySpark to process large-scale data, improving computational efficiency.
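TF-IDF weights a term by its frequency within a review times the log of how rare it is across reviews. A plain-Python sketch using the unsmoothed variant idf(t) = ln(N / df(t)) follows; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers differ. The example reviews are hypothetical and already tokenized.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document (tf * ln(N / df))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized and lemmatized reviews.
reviews = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie"],
    ["great", "plot"],
]
w = tf_idf(reviews)
```

A term that appears in every review gets weight 0, so common words contribute nothing to the classifier's features.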
Financial Cointegration Analysis | Time Series Analysis, Python, Jupyter Notebooks
Dec. 2023
GitHub Repo
- Applied the Engle-Granger two-step cointegration test to distinguish spurious correlations from genuine long-term relationships between price series.
- Analyzed historical price data from Yahoo Finance using Python’s yfinance library.
Testing cointegration between two financial time series
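The two Engle-Granger steps can be sketched in plain Python: (1) regress one price series on the other by OLS and keep the residuals; (2) run a Dickey-Fuller regression Δe_t = γ e_(t-1) + ε_t on those residuals, where a strongly negative t-statistic for γ suggests stationary residuals, i.e. cointegration. This simplified sketch omits the augmentation lags, and its statistic must be compared with Engle-Granger (MacKinnon) critical values rather than standard Dickey-Fuller tables; in practice `statsmodels.tsa.stattools.coint` runs the full test. The series below are simulated, not Yahoo Finance data.

```python
import math
import random
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def engle_granger(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (hedge ratio beta, Dickey-Fuller t-statistic on the residuals)."""
    # Step 1: cointegrating regression y_t = alpha + beta * x_t + e_t.
    alpha, beta = ols(x, y)
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    # Step 2: regress delta_e_t on e_{t-1} without a constant
    # (OLS residuals already have zero mean).
    lagged = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    sxx = sum(e * e for e in lagged)
    gamma = sum(e * d for e, d in zip(lagged, diff)) / sxx
    sse = sum((d - gamma * e) ** 2 for e, d in zip(lagged, diff))
    sigma2 = sse / (len(diff) - 1)
    t_stat = gamma / math.sqrt(sigma2 / sxx)
    return beta, t_stat


# Simulated pair: x is a random walk and y = 2x + stationary noise,
# so the two series are cointegrated by construction.
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [2.0 * v + rng.gauss(0, 1) for v in x]
beta, t_stat = engle_granger(x, y)
```

Two independent random walks would often show a high correlation too, but their regression residuals wander like a random walk, so the Dickey-Fuller statistic stays close to zero instead of strongly negative.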
Student Accommodation Clustering | Python, Scikit-Learn, Folium
Nov. 2022 – Dec. 2022
GitHub Repo
- Developed a student accommodation clustering system to suggest optimal housing options based on proximity to preferred locations.
- Conducted data collection and cleaning tasks, including API integration and handling missing values.
- Applied the K-Means clustering algorithm from scikit-learn to group accommodation options by similarity.
- Performed exploratory data analysis (EDA) with visualizations and maps to gain insights into student preferences and housing patterns.
Optimal housing based on students’ preferred locations.
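The project uses scikit-learn's `KMeans`; to show what that algorithm does, here is a from-scratch sketch of Lloyd's iterations: assign each listing to its nearest centroid, then move each centroid to the mean of its listings, until assignments stop changing. The deterministic farthest-first seeding and the coordinates are illustrative assumptions; scikit-learn defaults to k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Start at points[0], then repeatedly add the point farthest from
    every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical listing coordinates (e.g. km offsets from a city center).
listings = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
            (5.0, 5.0), (5.4, 4.8), (4.9, 5.5)]
labels, centroids = kmeans(listings, 2)
```

Each centroid summarizes one neighborhood of listings, so nearby options can be compared within a cluster.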
Education
- Engineering, Data Science
The National Institute of Statistics and Applied Economics, Rabat, Morocco
2022 - 2024
- SPE, MP*
Higher School Preparatory Classes (CPGE) Moulay Idriss, Fes, Morocco
2021 - 2022
Skills