Custom ML, neural networks, and real data science
We build production ML for problems where off-the-shelf AI is not enough. Gradient boosting, custom neural networks, computer vision pipelines, fine-tuned foundation models, and RAG systems - shipped with the inference layer and the monitoring around it. Auditable models, real engineering, no LLM hype.
Six things we do under the ML and data science umbrella.
Tabular ML and gradient boosting
XGBoost, LightGBM, CatBoost on raw transactional data. Churn prediction, risk scoring, reactivation probability, fraud detection, demand forecasting. Auditable scores with the features that drove them - not a black box. This is what we use when the problem is structured and the cost of a wrong answer is real money.
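As a rough illustration of what a score "with the features that drove it" can look like, the sketch below packages a score with its largest per-feature contributions. It assumes the contributions were already computed by some attribution method (for example, SHAP values from a boosted model). The function name, field names, feature names, and numbers are all hypothetical.

```python
def explain_score(score, contributions, top_k=3):
    """Bundle a model score with the features that moved it the most.

    `contributions` maps feature name -> signed contribution to the score,
    assumed to come from an attribution method run on the trained model.
    """
    # Rank features by the magnitude of their contribution, largest first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {
        "score": score,
        "top_drivers": [
            {"feature": name, "contribution": value} for name, value in ranked[:top_k]
        ],
    }


# Made-up example values, for illustration only.
payload = explain_score(
    score=0.72,
    contributions={
        "days_since_last_deposit": 0.21,
        "sessions_7d": -0.35,
        "avg_bet": 0.02,
        "support_tickets": 0.10,
    },
    top_k=2,
)
```

Returning the drivers next to the score lets a reviewer see why a given account was flagged without rerunning the model.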
Custom neural networks from scratch
When the problem has temporal structure or representation needs that gradient boosting cannot capture, we build the architecture. PyTorch and TensorFlow. Time series, sequence modeling, signal generation. Backtested before deployment, monitored after.
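One common way to backtest a time series model is a walk-forward split, where every test window sits strictly after its training window. The sketch below is a minimal version under that assumption. The function name and window sizes are illustrative, not a description of any specific project.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Samples are assumed to be sorted by time, oldest first. Each test
    window starts right after its training window, so the model is never
    evaluated on data older than what it was trained on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Slide the whole window forward by one test period.
        start += test_size


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
```

With 10 samples, a training window of 4, and a test window of 2, this produces three folds, the first training on indices 0 to 3 and testing on 4 and 5.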
Computer vision pipelines
Document understanding, OCR, layout analysis, and image classification. Claude Vision and Gemini for general document AI, custom OpenCV and small vision models when the task is constrained and latency matters. We choose the model that fits the constraint, not the model that sounds good.
Foundation model fine-tuning and adaptation
Fine-tuning where the platform supports it (OpenAI, open-weight Llama and Mistral). Prompt engineering, retrieval-augmented generation, and prompt caching where it does not (Claude). We do not pretend fine-tuning is the answer to every domain problem - often RAG with the right chunking and reranking outperforms a fine-tune at a fraction of the cost and risk.
Retrieval-augmented generation and context engineering
Embedding selection, chunking strategy, hybrid search, reranking, and grounded generation. We design the retrieval stack so the LLM has the right context, not all the context. This is usually the highest-leverage move on real client problems before reaching for fine-tuning.
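Hybrid search needs some way to merge a keyword ranking with a vector ranking. Reciprocal rank fusion is one widely used option, sketched below. The document ids are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a keyword index
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

The fused list would then typically go to a reranker, and only the top few chunks are passed to the LLM.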
Production ML serving and monitoring
FastAPI inference services, batch and streaming pipelines, model versioning, feature stores when justified, drift monitoring, and retraining loops. We ship the model and the surrounding system, not a Jupyter notebook the client cannot run.
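Drift monitoring can take many forms. One simple check is the Population Stability Index (PSI) between a feature's training distribution and its live distribution, sketched below. The equal-width binning scheme and the example data are assumptions made for illustration.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a live sample of one feature.

    Bins are equal-width over the baseline range. Live values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]   # stand-in for training data
shifted = [v + 0.3 for v in baseline]      # stand-in for drifted live data
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold that should trigger retraining depends on the model and is best set per feature.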
Four principles that shape every ML project we ship.
Start with the data, not the model
Most ML projects fail at the data layer. We audit account-type flags, consent state, label leakage, time-window correctness, and class imbalance before any model is trained. A clean dataset with a simple model beats a dirty dataset with a sophisticated one.
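Two of these audits are easy to sketch: a time-window check that flags feature rows dated on or after the label cutoff (a common source of leakage), and a class-balance check. The row layout, field names, and dates below are hypothetical.

```python
from datetime import date


def find_leaking_rows(rows, label_cutoff):
    """Return feature rows dated on or after the label cutoff.

    Features must be built only from events before the cutoff. Anything
    later would let the model peek at the period it is meant to predict.
    """
    return [r for r in rows if r["event_date"] >= label_cutoff]


def positive_rate(labels):
    """Share of positive labels, as a quick class-imbalance check."""
    return sum(labels) / len(labels)


# Made-up rows, for illustration only.
rows = [
    {"player_id": 1, "event_date": date(2024, 1, 10)},
    {"player_id": 2, "event_date": date(2024, 2, 3)},
    {"player_id": 3, "event_date": date(2024, 1, 28)},
]
cutoff = date(2024, 2, 1)
leaks = find_leaking_rows(rows, cutoff)
rate = positive_rate([0, 0, 0, 1])
```

Here the row for player 2 falls after the cutoff and would be excluded or investigated before training.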
Choose the simplest model that hits the bar
Gradient boosting on tabular data beats neural networks more often than people admit. We do not reach for transformers when XGBoost solves the problem. The right model is the one your team can audit, deploy, and retrain - not the one with the most-cited paper.
Do not use LLMs where formula-based ML is the correct tool
LLMs hallucinate. If the answer drives money (a risk score, a credit decision, a churn probability), we use auditable ML. We use LLMs where hallucination is recoverable: drafting copy a human will review, summarizing for an analyst, generating explanations of an auditable score.
Ship the model and the system around it
Inference API, monitoring, retraining cadence, rollback path. A model in a notebook is not a deliverable. We do not call something production-ready unless it can run without us.
A good fit if:
- You have a structured prediction problem (churn, risk, fraud, demand, propensity) and off-the-shelf scoring APIs do not understand your data.
- You need a custom architecture because the problem has temporal, multimodal, or domain-specific structure that pre-trained models do not capture.
- You have a large document corpus or a vision pipeline where Claude Vision or GPT-4o alone is too expensive, too slow, or not accurate enough at production scale.
- You want to fine-tune or adapt a foundation model for a domain (legal, medical, finance, gaming) and need someone to scope realistically what fine-tuning will and will not buy you.
- You have a working LLM prototype and need to turn it into a production system with retrieval, evaluation, monitoring, and cost control.
Not a fit if:
- Your problem is fully solved by an off-the-shelf API (OpenAI, Claude, an existing vendor model) and you do not need customization. We will tell you to use it and not bill you for what you do not need.
- You do not have data yet. ML needs labels and history. If you are pre-data, we will build the automation first and the model later.
- You want to fine-tune a foundation model when a well-designed prompt plus RAG would deliver the same outcome cheaper and faster. We will redirect you.
Two cases that show how we actually build models.
Per-player churn and reactivation scoring
Gradient boosting on raw transactions, not pre-aggregated averages. Two scores per player, returned with the features that drove them. Scoring core built in two weeks, pilot running on six-plus months of history.
Read case →
Custom neural network for market signal generation
Custom architecture trained on price and volume data with technical indicator inputs. Entry and exit signals with confidence scores and reasoning, delivered to Telegram. Backtested before deployment.
Read case →
What we use, when it actually fits.
Questions we actually get.
Have a problem an off-the-shelf model cannot solve?
Bring us the data and the target. We will scope honestly - including the cases where you do not need us.
Book a 30-min scoping call