Arda Korkmaz

Software Engineer · Applied ML (RAG, NLP, CV) · Bioinformatics & EdTech

About

I’m a software engineer focused on building ML-enabled products that hold up in production, especially in high-stakes domains like clinical bioinformatics and education. My work spans retrieval-augmented generation (RAG) for structured extraction from long documents, scalable backend systems, and computer vision pipelines. I care about judgment under uncertainty: choosing modeling and evaluation approaches that are practical, supportable, and aligned with real user needs.

I’m currently a Software Engineer at Massive Bioinformatics, where I build platform capabilities for analysis pipelines and reporting, and I previously consulted at Codeswitch Software on adaptive learning systems. I completed the UC San Diego Extended Studies Machine Learning Engineering Bootcamp (2025) to strengthen end-to-end ML engineering fundamentals.

Projects & Work Highlights

Retrieval-Augmented Generation for Clinical Data Extraction

Built a RAG system to extract structured clinical information from long, noisy documents. Implemented custom preprocessing, chunking, and retrieval logic, and benchmarked multiple retrieval setups on the structure, relevance, and alignment of model outputs.
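
For readers unfamiliar with the chunking step, here is a minimal sketch of one common approach: fixed-size word windows with overlap, so that a fact straddling a boundary still appears whole in at least one chunk. This is an illustration only, not the project's code; the function name `chunk_words` and the default `size` and `overlap` values are assumptions.

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping windows of at most `size` words.

    Hypothetical helper for illustration; real pipelines often chunk
    by tokens or by document sections instead of whitespace words.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```

Larger overlaps reduce the chance of splitting a relevant passage but increase index size and retrieval redundancy, which is one of the trade-offs a retrieval benchmark can surface.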

Massive Analyser Backend Re-Architecture (Django → FastAPI Microservices)

Rearchitected a Django monolith into FastAPI microservices, improving maintainability and enabling more reliable releases. Implemented coverage-based unit testing and strengthened production-readiness.

Pipeline-to-Desktop App Workflow (Flet + QEMU + Docker)

Built a reusable workflow to package bioinformatics pipelines as self-contained desktop applications, bundling platform-specific native libraries. Enabled secure, offline-friendly execution with real-time monitoring and cross-platform compatibility.

Reporting Toolkits for Clinical Pipelines (PDF + HTML)

Developed a Python PDF report generation toolkit with custom page geometry logic adopted across multiple pipelines. Also built HTML report rendering with Jinja2 templates and backend APIs to populate reports from application data.

Adaptive Learning with Hidden Markov Models (EdTech Consulting)

Implemented a per-user Hidden Markov Model that inferred latent knowledge states from sequential responses and drove adaptive sequencing across learning units, targeting proficiency gaps to improve learning outcomes.
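
To make the idea concrete, the sketch below shows forward filtering in a two-state HMM (not mastered / mastered), the formulation often called Bayesian Knowledge Tracing. It is a simplified illustration under stated assumptions, not the model built for the client: the two-state structure, the no-forgetting transition, the parameter values, and the names `filter_mastery` and `next_unit` are all hypothetical.

```python
def filter_mastery(responses, prior=0.2, p_learn=0.15, p_slip=0.1, p_guess=0.25):
    """Return P(mastered) after each response via HMM forward filtering.

    responses: iterable of booleans (True = correct answer).
    All probabilities are illustrative defaults, not fitted values.
    """
    belief = prior
    history = []
    for correct in responses:
        # Emission likelihood of this response under each hidden state.
        like_mastered = (1 - p_slip) if correct else p_slip
        like_unmastered = p_guess if correct else (1 - p_guess)

        # Bayes update of the mastery belief given the observation.
        numerator = belief * like_mastered
        posterior = numerator / (numerator + (1 - belief) * like_unmastered)

        # Transition step: an unmastered learner may learn; no forgetting assumed.
        belief = posterior + (1 - posterior) * p_learn
        history.append(belief)
    return history


def next_unit(belief_by_unit, threshold=0.95):
    """Pick the unit with the lowest mastery belief below threshold, else None."""
    gaps = {unit: b for unit, b in belief_by_unit.items() if b < threshold}
    return min(gaps, key=gaps.get) if gaps else None


if __name__ == "__main__":
    beliefs = {
        "fractions": filter_mastery([True, True, True])[-1],
        "decimals": filter_mastery([False, True, False])[-1],
    }
    print(beliefs, "->", next_unit(beliefs))
```

Sequencing on the weakest unit is only one possible policy; a real system might also weigh prerequisites between units or recency of practice.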

Clinical Data Web Scraping for Internal Knowledge Base

Built custom scrapers for public clinical data using rotating proxies, dynamic user-agents, and rate-limit strategies to collect genetic and clinical data reliably under access restrictions.

Real-Time Food Detection System (Computer Vision)

Developed a production-ready real-time food detection system to automate restaurant billing by identifying items via computer vision.

Selected Skills

Languages: Python, SQL, Bash, C/C++, Java
Tools: Linux/Unix, Docker, Git, PostgreSQL, Snakemake, QEMU
ML: Deep Learning, NLP, RAG, Computer Vision, Data Mining
Libraries: PyTorch, TensorFlow, Transformers, OpenCV, scikit-learn, pandas, NumPy, FastAPI

Training & Mentorship

Contact

Email: contact@korkmazarda.com
Location: Istanbul, Turkey