At Maki People, our mission has always been to make hiring both fair and efficient through reliable, science-driven assessments. But as more tests move online, a fundamental question arises: how can we make sure every candidate takes their test under the same conditions?
Cheating undermines trust. It damages the integrity of results and can hurt a company’s reputation. Yet being overly strict can be just as harmful—penalizing honest candidates and creating frustration. The real challenge is finding the balance between accuracy and fairness.
That’s where Maki’s hybrid proctoring technology comes in. It’s a system that combines the precision of computer vision with the reasoning power of large language models (LLMs) to spot genuine cheating while avoiding false positives. The result is a proctoring method that is accurate, explainable, and scalable, giving companies the confidence that their hiring decisions are based on reliable, unbiased results.
The hidden complexity of online cheating
Cheating isn’t always obvious. It can range from someone helping off-camera to a candidate using earbuds, a second screen, or even a phone placed just outside the frame. Sometimes, the signs are subtle - a reflection mistaken for another person, or a photo on the wall misread as a face. The challenge isn’t just detecting these anomalies; it’s interpreting them intelligently.
A fair system needs to understand context. That’s why our approach combines two types of AI models: one that sees and one that reasons.
The eyes: CNNs for visual detection
Convolutional Neural Networks (CNNs) are the workhorses of image analysis. They’re designed to detect patterns - edges, shapes, and objects - and are used in everything from self-driving cars to facial recognition. For our proctoring system, we use YOLO (You Only Look Once), a state-of-the-art CNN architecture that identifies relevant elements in real time with high precision.
At Maki, we use two specialized CNNs: one to detect faces, and another to detect electronic devices such as phones, laptops, and monitors. The reason is simple - detecting a human face and detecting a screen involve very different visual cues. By training separate models for each, we gain accuracy and significantly reduce the risk of false positives.
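To make the dual-detector idea concrete, here is a minimal sketch using the Ultralytics YOLO library. The weight file names and the confidence threshold are illustrative placeholders, not Maki's production configuration; it simply shows how two separate detectors could scan the same webcam frame.

```python
# Minimal sketch of a dual-detector stage, assuming Ultralytics YOLO weights
# fine-tuned separately for faces and for electronic devices.
# "face_detector.pt" and "device_detector.pt" are hypothetical file names.
from ultralytics import YOLO

face_detector = YOLO("face_detector.pt")      # hypothetical face model
device_detector = YOLO("device_detector.pt")  # hypothetical device model

def detect_anomalies(image_path: str, conf: float = 0.5) -> dict:
    """Run both detectors on one webcam frame and summarize what they found."""
    faces = face_detector(image_path, conf=conf)[0]
    devices = device_detector(image_path, conf=conf)[0]
    return {
        "num_faces": len(faces.boxes),
        "device_labels": [devices.names[int(c)] for c in devices.boxes.cls],
    }
```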
The brain: LLMs for contextual reasoning
While CNNs are excellent at identifying what’s in an image, they can’t tell you what’s really happening. They might see “two faces” or a “phone,” but they can’t determine intent. Is that second face a real person or a reflection? Is the phone being used or just sitting idle?
To bridge that gap, we introduced Large Language Models as a reasoning layer on top of the visual detectors. Our LLMs don’t just process object data—they analyze the entire image holistically, reasoning like a human reviewer would.
When a CNN flags something suspicious, the LLM takes over. It examines the image, applies clear rules, and explains its reasoning step by step. This process ensures that each alert comes with an interpretable, traceable justification—something traditional proctoring systems often lack.
We use two LLMs in parallel: one focused on determining if multiple engaged people are in the frame, and another on assessing whether a detected device is actively being used. If no anomalies are detected, the system skips the LLM stage entirely, conserving resources and keeping the process smooth for candidates.
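As a rough illustration of what such reasoning checks could look like, the sketch below sends a flagged frame to a vision-capable chat model with one of two rule sets. The prompts, the model name, and the JSON schema are assumptions made for this example, not Maki's actual prompts or provider.

```python
# Illustrative sketch of the two reasoning checks, assuming a vision-capable
# chat model accessed through the OpenAI Python SDK. Prompts and model name
# are placeholders, not production values.
import base64
import json
from openai import OpenAI

client = OpenAI()

PERSON_RULES = (
    "You review a webcam frame from an online assessment. A second face was "
    "detected. Reason step by step: (1) is it a real person rather than a "
    "reflection, poster, or photo? (2) if real, is the person engaged with "
    "the candidate or just passing by? "
    'Reply as JSON: {"alert": bool, "reasoning": str}.'
)

DEVICE_RULES = (
    "You review a webcam frame from an online assessment. A phone or screen "
    "was detected. Reason step by step whether the device is actively being "
    'used by the candidate. Reply as JSON: {"alert": bool, "reasoning": str}.'
)

def run_check(image_path: str, rules: str) -> dict:
    """Ask the reasoning model to judge one flagged frame against one rule set."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": rules},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return json.loads(response.choices[0].message.content)
```

Because each check returns its reasoning alongside the verdict, every alert carries the traceable justification described above.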
How the system makes decisions
The hybrid method operates in three stages:
- Detection: The CNN scans each image to identify faces and objects that could indicate cheating.
- Reasoning: If something unusual appears, the LLM evaluates it, applying structured logic to decide whether it’s a real concern.
- Decision: The system outputs an alert category and a concise justification.
This layered approach allows Maki’s system to move from raw visual signals to reliable, explainable decisions - a major leap forward from traditional proctoring models.
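Putting the three stages together, a simplified orchestration might look like the sketch below. It reuses the hypothetical helpers from the earlier sketches (detect_anomalies for the CNN stage, run_check for the LLM stage); the alert categories and return format are assumptions for illustration only.

```python
# Sketch of the three-stage flow: detection, reasoning, decision.
# Category names and the return shape are illustrative assumptions.
def review_frame(image_path: str) -> dict:
    # Stage 1 - Detection: the CNNs scan the frame for faces and devices.
    findings = detect_anomalies(image_path)

    # If nothing unusual is found, skip the LLM stage entirely.
    if findings["num_faces"] <= 1 and not findings["device_labels"]:
        return {"category": "clear",
                "justification": "Single face, no devices detected."}

    # Stage 2 - Reasoning: each flagged anomaly gets its own LLM check.
    checks = []
    if findings["num_faces"] > 1:
        checks.append(("multiple_people", run_check(image_path, PERSON_RULES)))
    if findings["device_labels"]:
        checks.append(("device_in_use", run_check(image_path, DEVICE_RULES)))

    # Stage 3 - Decision: keep only anomalies the reasoning layer confirmed.
    confirmed = [(name, res) for name, res in checks if res["alert"]]
    if not confirmed:
        return {"category": "clear",
                "justification": checks[0][1]["reasoning"]}
    name, res = confirmed[0]
    return {"category": name, "justification": res["reasoning"]}
```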
A closer look: when reasoning makes the difference
Consider a case where the camera detects two faces. In most systems, that would immediately trigger an alert. But our hybrid method looks deeper. The LLM checks if the second face is real, using cues like shadows, depth, and texture. It then determines whether that person is actually engaged with the candidate or just passing by. Only if both conditions are met does the system raise an alert.

This reasoning dramatically reduces false positives. It ensures that candidates aren’t unfairly flagged for harmless situations, while real cheating is still caught with high confidence.
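The two-condition rule itself is simple once the reasoning layer has done its work. A toy illustration, assuming the model's judgment is returned with separate fields for each condition (the field names are made up for this example):

```python
# Toy illustration of the two-condition rule for a second detected face.
# Field names ("is_real_person", "is_engaged") are assumptions, not Maki's schema.
def should_alert_on_second_face(judgment: dict) -> bool:
    """Raise an alert only when the face is real AND the person is engaged."""
    return judgment.get("is_real_person", False) and judgment.get("is_engaged", False)
```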
More than technology: building trust in assessment
Proctoring isn’t just a technical problem - it’s a matter of trust. Candidates deserve to know that they’re being evaluated fairly, and companies need confidence that their assessments reflect true ability.
By combining the precision of CNNs with the contextual intelligence of LLMs, Maki delivers a proctoring solution that safeguards both. It upholds test integrity without compromising the candidate experience, demonstrating that fairness and effectiveness can coexist.
In short, our hybrid approach turns monitoring into intelligent oversight - a system that sees clearly, reasons fairly, and earns trust at scale.