How One Bootcamp Built a Code Originality Pipeline

In early 2022, CareerDevs Academy — a 12-week full-stack JavaScript bootcamp with campuses in Austin and Denver — faced a growing problem. Their student body had expanded from 30 to over 200 learners per cohort, and the teaching team was drowning in submissions. More concerning: code that looked too polished, too uniform, or too far beyond a student's demonstrated ability was slipping through the cracks.

"We had a student submit a React project with custom hooks, Redux state management, and a fully mocked test suite on week four," said Mariana Torres, CareerDevs' Director of Curriculum. "That student couldn't explain useState the week before. Something was off."

The bootcamp's approach to code originality was ad hoc: instructors would spot-check submissions, run occasional manual searches on suspicious snippets, and have one-on-one conversations when patterns felt wrong. But as cohorts grew, that model broke. This is the story of how one bootcamp built a systematic code originality pipeline, and what they learned along the way.

Why Bootcamps Need a Different Approach Than Universities

Bootcamps operate under fundamentally different constraints than traditional CS departments. Students pay significant tuition for a compressed, high-stakes learning experience. Many are career-changers with families and mortgages. The pedagogical model assumes rapid skill acquisition through intense repetition and project-based learning.

"We can't treat code originality the same way a university does," explained Torres. "A CS student caught plagiarizing might fail a course. If we fail a bootcamp student on week six, they've lost $12,000 and three months of their life. We have to catch issues early, and we have to make the process educational, not punitive."

CareerDevs also faced a structural challenge unique to bootcamps: their students frequently used online resources as part of the learning process. Stack Overflow snippets, GitHub examples, and tutorial code were explicitly encouraged during the first four weeks. The line between learning and copying was intentionally blurry — until it wasn't.

The Anatomy of the Originality Pipeline

Torres assembled a working group of three senior instructors and one curriculum developer to design what they called the Code Originality Verification Pipeline. The pipeline had three stages, each with a specific purpose and escalation path.

Stage 1: Automated Static Analysis at Submission

Every student submission — whether a daily exercise, weekly project, or capstone — was run through an automated static analysis script. This script didn't check for plagiarism directly. Instead, it flagged submissions with suspicious structural properties.

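```python
#!/usr/bin/env python3
# Simplified version of CareerDevs' submission analyzer.
# Python's ast module only parses Python source; running the same checks on
# JavaScript submissions would require a JavaScript parser instead.
import ast
import io
import sys
import tokenize


def analyze_submission(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        source = f.read()

    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    code_lines = [line for line in source.splitlines() if line.strip()]
    n_lines = len(code_lines) or 1  # avoid dividing by zero on an empty file

    # ast discards comments, so count them from the token stream instead
    comment_count = sum(
        1 for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    )

    # Metrics for suspicious patterns (illustrative definitions, not CareerDevs' exact ones)
    metrics = {
        'function_count': sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        'class_count': sum(isinstance(n, ast.ClassDef) for n in nodes),
        'import_count': sum(isinstance(n, (ast.Import, ast.ImportFrom)) for n in nodes),
        'avg_line_length': sum(len(line) for line in code_lines) / n_lines,
        'comment_ratio': comment_count / n_lines,
        # Rough complexity proxy: count of branches, loops, and boolean operators
        'complexity_score': sum(isinstance(n, (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp))
                                for n in nodes),
        # Distinct names assigned anywhere in the file
        'variable_count': len({n.id for n in nodes
                               if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}),
    }

    # Calculate deviation from expected range
    expected_ranges = {
        'function_count': (1, 5),    # For week 3 assignments
        'import_count': (0, 3),
        'avg_line_length': (20, 60),
        'comment_ratio': (0.05, 0.20)
    }

    flags = []
    for metric, (low, high) in expected_ranges.items():
        value = metrics[metric]
        if value < low or value > high:
            flags.append(f"{metric}: {value:g} (expected {low}-{high})")

    return flags


if __name__ == '__main__':
    for path in sys.argv[1:]:
        print(path, analyze_submission(path))
```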

The static analyzer flagged submissions that deviated significantly from expected patterns for the current week's assignment. A week-2 JavaScript fundamentals project with 15 functions and 30 imports? Flagged. A week-6 React app with zero comments and a single monolithic function? Also flagged.
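The analyzer above hardcodes the week-3 ranges. A natural extension, sketched here under stated assumptions, is to key the expected ranges by week so the same check applies to every assignment. The week-2 and week-6 values are hypothetical, chosen only so that the two examples in the paragraph above trip the check; they are not CareerDevs' actual ranges, and the metric dicts stand in for output from an analyzer like `analyze_submission`.

```python
# Hypothetical per-week expected ranges; the article says ranges were tied to
# the current week's assignment but does not publish the actual values.
EXPECTED_RANGES_BY_WEEK = {
    2: {'function_count': (1, 6), 'import_count': (0, 2)},
    6: {'function_count': (3, 20), 'comment_ratio': (0.05, 0.25)},
}


def flag_for_week(metrics, week):
    """Return (metric, value, low, high) for each metric outside the week's range."""
    ranges = EXPECTED_RANGES_BY_WEEK.get(week, {})
    return [
        (name, metrics[name], low, high)
        for name, (low, high) in ranges.items()
        if name in metrics and not (low <= metrics[name] <= high)
    ]


# The two examples from the text: one trips the upper bounds, one the lower bounds
week2 = flag_for_week({'function_count': 15, 'import_count': 30}, week=2)
week6 = flag_for_week({'function_count': 1, 'comment_ratio': 0.0}, week=6)
print(week2)  # [('function_count', 15, 1, 6), ('import_count', 30, 0, 2)]
print(week6)  # [('function_count', 1, 3, 20), ('comment_ratio', 0.0, 0.05, 0.25)]
```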

"We weren't looking for cheating at this stage," said Torres. "We were looking for submissions that didn't match the learning trajectory. A student who suddenly submits code with a completely different structural signature than their previous work deserves a conversation."

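Comparing a submission with the student's own earlier work is one way to make "a completely different structural signature" concrete. The sketch below assumes each submission has already been reduced to a metric dict and flags any metric whose z-score against that student's history exceeds 3. The z-score approach, the threshold, and the function name are assumptions; the article does not say how CareerDevs measured trajectory.

```python
from statistics import mean, stdev


def trajectory_outliers(history, current, z_threshold=3.0):
    """Metrics where `current` sits far outside the student's own earlier submissions.

    history: list of metric dicts from earlier submissions
    current: metric dict for the new submission
    Returns {metric: z-score} for every metric beyond z_threshold.
    """
    outliers = {}
    for name, value in current.items():
        past = [h[name] for h in history if name in h]
        if len(past) < 2:
            continue  # too little history to estimate a normal spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # A perfectly constant history: treat any change as an outlier
            if value != mu:
                outliers[name] = float('inf')
            continue
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            outliers[name] = round(z, 1)
    return outliers


history = [
    {'function_count': 2, 'import_count': 0},
    {'function_count': 3, 'import_count': 1},
    {'function_count': 2, 'import_count': 1},
]
result = trajectory_outliers(history, {'function_count': 14, 'import_count': 1})
print(result)
```

One caveat with this design: assignments grow from week to week, so a student's raw function count drifts upward even without any copying. Normalizing each metric by the cohort's median for that week before computing z-scores would be a reasonable refinement.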
Stage 2: Similarity Detection Against the Corpus

Submissions that passed Stage 1 were then run through similarity detection against the full corpus of current and past student submissions, plus a curated index of common online resources. CareerDevs used a combination of token-based comparison and AST fingerprinting to catch both exact copies and structurally reorganized code.
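A minimal sketch of the token-based side, assuming JavaScript-like source: identifiers are collapsed to a placeholder so renamed variables stop mattering, and the normalized token streams are compared as sets of overlapping 5-token windows (k-grams) using Jaccard similarity. JPlag itself uses a different matching algorithm (Greedy String Tiling), so treat this as an illustration of the idea rather than of any particular tool; the keyword list is deliberately incomplete.

```python
import re

# Small subset of JavaScript keywords, kept verbatim so structure survives normalization
JS_KEYWORDS = {'function', 'return', 'if', 'else', 'for', 'while', 'const', 'let',
               'var', 'new', 'class', 'import', 'export', 'from', 'true', 'false', 'null'}

# Identifiers, numbers, quoted strings, or any single non-space character
TOKEN_RE = re.compile(r'''[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\S''')


def normalized_tokens(source):
    """Tokenize JS-like source; identifiers become ID, literals become NUM or STR."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in JS_KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] in '_$':
            tokens.append('ID')
        elif tok[0].isdigit():
            tokens.append('NUM')
        elif tok[0] in '"\'':
            tokens.append('STR')
        else:
            tokens.append(tok)
    return tokens


def kgrams(tokens, k=5):
    """Set of all k-token windows."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def token_similarity(a, b, k=5):
    """Jaccard similarity of normalized token k-grams, from 0.0 to 1.0."""
    ga, gb = kgrams(normalized_tokens(a), k), kgrams(normalized_tokens(b), k)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = "function total(items) { let sum = 0; for (const x of items) { sum += x; } return sum; }"
renamed = "function addUp(list) { let acc = 0; for (const n of list) { acc += n; } return acc; }"
unrelated = "const greet = (name) => `Hello, ${name}!`;"
print(token_similarity(original, renamed))    # 1.0: only the names differ
print(token_similarity(original, unrelated))
```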

"The token-based approach catches students who copy-paste and change variable names," explained James Harrington, the senior instructor who implemented the system. "The AST approach catches the more sophisticated attempts — students who restructure the control flow but preserve the essential logic."

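AST fingerprinting can be illustrated the same way. The sketch below uses Python's standard `ast` module, as the Stage 1 analyzer does, so it works on Python source; a JavaScript version would need a JavaScript parser. Each subtree is reduced to its shape (node types and nesting, with names and literal values dropped), and two files are compared by the Dice coefficient of their shape multisets. This is a generic technique, not the custom fuzzy-matching script the team built.

```python
import ast
from collections import Counter


def subtree_shapes(source, min_size=3):
    """Multiset of AST subtree shapes (node types and nesting only).

    Identifier names and literal values never enter a shape, so renaming
    variables leaves the multiset unchanged. Tiny subtrees are skipped
    because they match almost everything.
    """
    shapes = Counter()

    def visit(node):
        child_shapes, size = [], 1
        for child in ast.iter_child_nodes(node):
            child_shape, child_size = visit(child)
            child_shapes.append(child_shape)
            size += child_size
        shape = (type(node).__name__, tuple(child_shapes))
        if size >= min_size:
            shapes[shape] += 1
        return shape, size

    visit(ast.parse(source))
    return shapes


def ast_similarity(a, b):
    """Dice coefficient over the two shape multisets, from 0.0 to 1.0."""
    sa, sb = subtree_shapes(a), subtree_shapes(b)
    total = sum(sa.values()) + sum(sb.values())
    return 2 * sum((sa & sb).values()) / total if total else 0.0


original_py = """
def total(items):
    s = 0
    for x in items:
        s += x
    return s


def biggest(items):
    return max(items)
"""

# Same logic with functions swapped and every name changed
reordered_py = """
def largest(values):
    return max(values)


def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

print(round(ast_similarity(original_py, reordered_py), 2))
```

Reordering functions and renaming variables barely moves the score, because only the top-level module shape changes. Deeper rewrites, such as turning a `for` loop into a `while` loop, do change subtree shapes, so this sketch covers only part of what Harrington describes; fuzzier matching (for example, normalizing equivalent loop forms before comparing) would be needed for the rest.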
The similarity threshold was deliberately set low: any submission with greater than 40% similarity to another source was flagged for instructor review. This generated more false positives than a higher threshold would, but the team prioritized catching edge cases over minimizing review time.
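Applying the 40% rule is the simple part once every corpus comparison has produced a score. A sketch, assuming scores arrive as a mapping from a corpus source ID to a similarity between 0.0 and 1.0; the IDs are hypothetical. The comparison is strict, so a score of exactly 0.40 is not flagged, matching "greater than 40%."

```python
SIMILARITY_THRESHOLD = 0.40  # flag anything strictly above 40%


def matches_for_review(scores, threshold=SIMILARITY_THRESHOLD):
    """Corpus sources whose similarity to one submission exceeds the threshold.

    scores: {source_id: similarity in [0.0, 1.0]} for one submission against every
    corpus entry (past and current students, plus indexed online resources).
    Returned highest-first so the reviewer sees the closest match at the top.
    """
    hits = [(src, s) for src, s in scores.items() if s > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


scores = {'s-101': 0.12, 'tutorial-todo-app': 0.58, 's-087': 0.41, 's-150': 0.40}
print(matches_for_review(scores))
# [('tutorial-todo-app', 0.58), ('s-087', 0.41)]
```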

"We'd rather have 50 flagged submissions to review than miss one case of significant code copying. The review process itself was a teaching opportunity." — James Harrington, Senior Instructor

Stage 3: Instructor Review and Educational Intervention

Flagged submissions were assigned to the student's primary instructor, who conducted a structured review. The instructor would:

  • Review the similarity report alongside the student's previous submissions to assess trajectory
  • Schedule a 15-minute code review conversation with the student
  • Use a standardized rubric to determine whether the similarity represented learning, collaboration, or copying
  • Document the outcome and any required follow-up

The key innovation was the rubric. CareerDevs developed a four-category classification for suspicious submissions:

| Category | Description | Response |
|---|---|---|
| Learning Artifact | Student used tutorial code, adapted it significantly, and can explain the logic | No action; encourage attribution practice |
| Unattributed Adaptation | Student used external code with minimal changes and did not cite sources | Educational conversation about attribution; resubmit with citations |
| Collaboration Overflow | Two or more students submitted substantially similar work beyond the collaboration policy | Policy reminder; individual code review to ensure understanding |
| Direct Copying | Submission is substantially identical to another source with no evidence of learning | Formal meeting with program director; learning plan development |
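When outcomes are recorded for the dashboard, the rubric can be encoded as a small data structure so every review lands in one of the four categories with its prescribed response attached. The sketch below is hypothetical (field names and IDs are illustrative). Note that the category is an input set by the instructor: the rubric is applied through human review, not computed.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Each member's value is the rubric's prescribed response
    LEARNING_ARTIFACT = "No action; encourage attribution practice"
    UNATTRIBUTED_ADAPTATION = "Educational conversation about attribution; resubmit with citations"
    COLLABORATION_OVERFLOW = "Policy reminder; individual code review to ensure understanding"
    DIRECT_COPYING = "Formal meeting with program director; learning plan development"


@dataclass(frozen=True)
class ReviewOutcome:
    student_id: str
    submission_id: str
    week: int
    category: Category  # assigned by the instructor after the code review conversation
    notes: str = ""

    @property
    def response(self) -> str:
        return self.category.value

    @property
    def needs_follow_up(self) -> bool:
        return self.category is not Category.LEARNING_ARTIFACT


outcome = ReviewOutcome("s-042", "week3-dom-project", 3, Category.UNATTRIBUTED_ADAPTATION,
                        notes="Adapted tutorial code; resubmitting with citations")
print(outcome.response)
```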

What the Data Showed After One Cohort

CareerDevs ran the pipeline for one full cohort cycle (12 weeks, 210 students). The results were instructive:

  • Stage 1 (static analysis) flagged 18% of all submissions — 372 of 2,100 total submissions
  • Stage 2 (similarity detection) flagged 7% — 147 submissions
  • Stage 3 (instructor review) categorized 112 as Learning Artifacts, 24 as Unattributed Adaptations, 8 as Collaboration Overflow, and 3 as Direct Copying

"The numbers told us something important," said Torres. "Most flagged submissions were legitimate learning behavior. Students were using resources the way we taught them to. But the 11 submissions that fell into the most serious categories? Those were students we could help."

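The reported figures are internally consistent, which a few lines of arithmetic confirm: 372 of 2,100 submissions is 17.7% (reported as 18%), 147 of 2,100 is exactly 7%, and the four Stage 3 categories sum to the 147 Stage 2 flags, with Learning Artifacts accounting for about 76% of reviewed submissions.

```python
total_submissions = 2100
stage1_flagged = 372
stage2_flagged = 147
stage3_outcomes = {
    'Learning Artifact': 112,
    'Unattributed Adaptation': 24,
    'Collaboration Overflow': 8,
    'Direct Copying': 3,
}

# Every Stage 2 flag received a Stage 3 category
assert sum(stage3_outcomes.values()) == stage2_flagged

print(f"Stage 1 flag rate: {stage1_flagged / total_submissions:.1%}")  # 17.7%
print(f"Stage 2 flag rate: {stage2_flagged / total_submissions:.1%}")  # 7.0%
for category, n in stage3_outcomes.items():
    print(f"{category}: {n} ({n / stage2_flagged:.1%} of reviewed submissions)")
```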
The three Direct Copying cases were particularly revealing. All three students were struggling with core concepts and had resorted to copying entire projects in a panic. Two of the three completed the program after developing individualized learning plans. One withdrew and was offered a deferred enrollment.

"The pipeline caught those students early," Torres noted. "Without it, they would have submitted a copied capstone, graduated without the skills, and had a miserable time on the job market. We would have failed them."

Integration With Existing Tools

CareerDevs built their pipeline using a combination of open-source tooling and commercial services. The static analysis layer used ESLint extended with custom rules for submission patterns. For similarity detection, the team paired JPlag for token-based comparison with a custom AST fuzzy-matching script.

"We looked at several commercial options, including Codequiry for similarity detection," said Harrington. "The API integration model made it easy to slot into our existing submission pipeline. We could send a submission, get back similarity scores against our corpus, and feed those into our Stage 2 workflow without building the infrastructure ourselves."

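One way to keep the similarity layer swappable, whether it is a local JPlag run, the custom AST matcher, or a commercial service, is to hide it behind a small interface. In this sketch `SimilarityBackend`, `run_pipeline`, and the stand-in `FixedScores` class are hypothetical; nothing here reflects any vendor's actual API. It also assumes every submission goes through both automated stages and that Stage 3 review is triggered by Stage 2 matches (the reported counts, 147 flagged and 147 categorized, fit that reading), while Stage 1 flags stay on record for the trajectory view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class SimilarityBackend(Protocol):
    """Hypothetical interface for anything that scores a submission against the corpus."""

    def scores(self, submission_id: str, source: str) -> Dict[str, float]:
        ...


@dataclass
class PipelineResult:
    submission_id: str
    stage1_flags: List[str] = field(default_factory=list)
    stage2_matches: Dict[str, float] = field(default_factory=dict)

    @property
    def needs_instructor_review(self) -> bool:
        # Stage 3 is triggered by Stage 2 matches; Stage 1 flags stay on the
        # student's record for the trajectory conversation.
        return bool(self.stage2_matches)


def run_pipeline(submission_id: str, source: str,
                 static_check: Callable[[str], List[str]],
                 backend: SimilarityBackend,
                 threshold: float = 0.40) -> PipelineResult:
    scores = backend.scores(submission_id, source)
    return PipelineResult(
        submission_id=submission_id,
        stage1_flags=static_check(source),
        stage2_matches={src: s for src, s in scores.items() if s > threshold},
    )


# Usage with a stand-in backend that returns fixed scores
class FixedScores:
    def scores(self, submission_id, source):
        return {'s-087': 0.55, 's-101': 0.10}


result = run_pipeline('s-042', 'function f() {}', lambda src: [], FixedScores())
print(result.stage2_matches, result.needs_instructor_review)
```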
The team also built a lightweight dashboard that gave instructors a unified view of a student's submission history, flags, and review outcomes. This made it possible to identify patterns — a student who was flagged in week 3 for unattributed adaptation and again in week 6 for collaboration overflow would trigger a proactive advising conversation.
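The repeat-flag pattern is easy to query once review outcomes are stored per student. A sketch, assuming outcomes are `(student_id, week, category)` tuples and that "repeat" means follow-up categories in at least two different weeks; both the record shape and the two-week rule are assumptions modeled on the week-3/week-6 example.

```python
from collections import defaultdict

# Rubric categories that call for follow-up (everything except Learning Artifact)
FOLLOW_UP = {'Unattributed Adaptation', 'Collaboration Overflow', 'Direct Copying'}


def students_needing_advising(outcomes, min_weeks=2):
    """Students with follow-up outcomes in at least `min_weeks` different weeks.

    outcomes: iterable of (student_id, week, category) tuples from Stage 3 reviews.
    """
    weeks = defaultdict(set)
    for student_id, week, category in outcomes:
        if category in FOLLOW_UP:
            weeks[student_id].add(week)
    return sorted(s for s, w in weeks.items() if len(w) >= min_weeks)


outcomes = [
    ('s-042', 3, 'Unattributed Adaptation'),
    ('s-042', 6, 'Collaboration Overflow'),
    ('s-087', 4, 'Learning Artifact'),
    ('s-087', 5, 'Unattributed Adaptation'),
]
print(students_needing_advising(outcomes))  # ['s-042']
```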

Lessons for Other Programs

After a year of operating the pipeline, Torres and Harrington distilled their experience into several recommendations for other bootcamps and educational programs considering similar approaches.

Start with pedagogy, not punishment. "The goal of code originality verification is to help students learn, not to catch them cheating," Torres emphasized. "Every point of contact in the pipeline should feel like support, not surveillance."

Calibrate your thresholds. CareerDevs initially set the similarity threshold at 30%, which generated an overwhelming number of false positives. They raised it to 40% after the first month. "We learned that bootcamp code is naturally similar because everyone's solving the same problems with the same tools," said Harrington. "A 30% similarity threshold flagged half the class for some assignments."
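Calibration of this kind amounts to asking, for each candidate threshold, what share of the class would be flagged. The sketch below runs that sweep for one assignment using made-up scores (not CareerDevs data), where each value is a student's highest similarity to any other submission.

```python
def flagged_fraction(max_scores, threshold):
    """Share of students whose closest match exceeds the threshold."""
    if not max_scores:
        return 0.0
    return sum(score > threshold for score in max_scores.values()) / len(max_scores)


def sweep(max_scores, thresholds=(0.30, 0.35, 0.40, 0.50)):
    """Flagged share at each candidate threshold, for one assignment."""
    return {t: flagged_fraction(max_scores, t) for t in thresholds}


# Made-up scores (not CareerDevs data): each student's highest similarity
# to any other submission on one shared to-do app assignment
max_scores = {'s1': 0.33, 's2': 0.45, 's3': 0.29, 's4': 0.22, 's5': 0.38, 's6': 0.27}
for t, share in sweep(max_scores).items():
    print(f"threshold {t:.0%}: {share:.0%} of students flagged")
```

Running a sweep like this on a past assignment before fixing a threshold shows the trade-off directly: in this invented data, 30% flags half the students and 40% flags one.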

Build attribution into your curriculum. The bootcamp added a module in week two on code attribution practices. Students learned how to cite Stack Overflow answers, GitHub repositories, and tutorial code in comments and documentation. This dramatically reduced unintentional plagiarism.
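Attribution comments can also feed back into the pipeline. If the similarity step reports which external resources a submission matched, comparing those against the sources the student cited separates matches worth an attribution conversation from matches that were already credited; the instructor still makes the final call. A sketch, assuming a hypothetical `// Source: <url>` comment convention and URL-keyed match results; the example.com URLs are placeholders.

```python
import re

# Hypothetical convention: "// Source: <url>" above each borrowed snippet
SOURCE_COMMENT = re.compile(r"//\s*Source:\s*(\S+)", re.IGNORECASE)


def uncited_matches(js_source, matched_urls):
    """External resources a similarity check matched that the student did not cite."""
    cited = set(SOURCE_COMMENT.findall(js_source))
    return sorted(url for url in matched_urls if url not in cited)


submission = """
// Source: https://example.com/snippets/debounce (adapted: added a cancel method)
function debounce(fn, ms) { /* student's adapted version */ }
"""
matched = {"https://example.com/snippets/debounce", "https://example.com/tutorials/todo-app"}
print(uncited_matches(submission, matched))
# ['https://example.com/tutorials/todo-app']
```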

Automate the boring parts, but keep humans in the loop. "The static analysis and similarity detection are fine to automate," Torres said. "But the categorization and intervention need human judgment. I've never seen an algorithm that can tell the difference between a student who copied out of laziness and one who copied out of desperation."

"The best plagiarism detection system is the one that turns a potential disciplinary incident into a teaching moment." — Mariana Torres, Director of Curriculum

Frequently Asked Questions

How do you handle false positives in code similarity detection?

CareerDevs' three-stage pipeline is designed to resolve false positives at Stage 3, where instructors have the context and judgment to classify submissions correctly. The team found that most false positives fell into the Learning Artifact category: legitimate adaptation of externally sourced code.

What's the right similarity threshold for bootcamp code?

It depends on the assignment and language. CareerDevs settled on 40% as a starting point but allowed instructors to adjust per-assignment baselines. Algorithmic problem sets (sorting algorithms, data structure implementations) naturally have higher baseline similarity than open-ended projects.
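One way to implement per-assignment baselines is to derive each assignment's threshold from how similar its submissions typically are to each other. The sketch below takes the median pairwise similarity for an assignment, adds a margin, and never drops below the 40% default; the median, the 0.15 margin, and the sample scores are assumptions for illustration.

```python
from statistics import median

DEFAULT_THRESHOLD = 0.40


def assignment_threshold(pairwise_scores, margin=0.15, floor=DEFAULT_THRESHOLD):
    """Raise the threshold for assignments whose typical pair is already similar.

    pairwise_scores: similarity for every pair of submissions to one assignment.
    """
    if not pairwise_scores:
        return floor
    return max(floor, median(pairwise_scores) + margin)


# Illustrative scores: an algorithmic set versus an open-ended project
sorting_scores = [0.50, 0.55, 0.48, 0.62, 0.52]
open_ended_scores = [0.10, 0.18, 0.12, 0.25]
print(round(assignment_threshold(sorting_scores), 2))     # 0.67
print(round(assignment_threshold(open_ended_scores), 2))  # 0.4
```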

Should bootcamps check for AI-generated code?

CareerDevs chose not to add AI detection to their initial pipeline. "We teach students to use AI tools responsibly as part of our curriculum," Torres explained. "The question isn't whether they used AI, but whether they learned from the process." The team plans to revisit this decision as AI tools become more integrated into development workflows.