---
name: Duplicate Code Detector
description: Identifies duplicate code patterns across the codebase and suggests refactoring opportunities

on:
  workflow_dispatch:
  schedule: daily

permissions:
  contents: read
  issues: read
  pull-requests: read

safe-outputs:
  create-issue:
    expires: 2d
    title-prefix: "[duplicate-code] "
    labels: [code-quality, automated-analysis]
    assignees: copilot
    group: true
    max: 3

timeout-minutes: 15
---

# Duplicate Code Detection

Analyze code to identify duplicated patterns using semantic analysis. Report significant findings that require refactoring.

## Task

Detect and report code duplication by:

1. **Analyzing Recent Commits**: Review changes in the latest commits
2. **Detecting Duplicated Code**: Identify similar or duplicated code patterns using semantic analysis
3. **Reporting Findings**: Create a detailed issue if significant duplication is detected (threshold: >10 lines or 3+ similar patterns)

## Context

- **Repository**: ${{ github.repository }}
- **Commit ID**: ${{ github.event.head_commit.id }}
- **Triggered by**: @${{ github.actor }}

## Analysis Workflow

### 1. Changed Files Analysis

Identify and analyze modified files:
- Determine files changed in the recent commits using `git log` and `git diff`
- Focus on source code files (programming language files)
- **Exclude test files** from analysis (files matching patterns: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or located in directories named `test`, `tests`, `__tests__`, or `spec`)
- **Exclude generated files** and build artifacts
- **Exclude workflow files** from analysis (files under `.github/workflows/*`)
- Use code exploration tools to understand file structure
- Read modified file contents to examine changes

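
The test-file and workflow-file exclusions above can be sketched as a small path filter. This is only an illustration of the stated patterns, not a required implementation: the names `is_excluded` and `filter_changed_files` are hypothetical, and generated files and build artifacts are not covered because the rules above do not define patterns for them.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the exclusion rules above.
TEST_FILE_PATTERNS = ("*_test.*", "*.test.*", "*.spec.*", "test_*.*")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_excluded(path: str) -> bool:
    """Return True if the path is a test file or a workflow file."""
    p = PurePosixPath(path)
    # Workflow files live under .github/workflows/.
    if p.parts[:2] == (".github", "workflows"):
        return True
    # Any parent directory named like a test directory excludes the file.
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    # Otherwise match the file name against the test-file patterns.
    return any(fnmatchcase(p.name, pattern) for pattern in TEST_FILE_PATTERNS)


def filter_changed_files(paths: list[str]) -> list[str]:
    """Keep only the paths that should be analyzed."""
    return [path for path in paths if not is_excluded(path)]
```

For example, `filter_changed_files(["src/parser.go", "src/parser_test.go"])` keeps only `src/parser.go`.
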
### 2. Duplicate Detection

Apply analysis to find duplicates:

**Pattern Search**:
- Search for duplication indicators using grep and code search:
  - Similar function signatures
  - Repeated logic blocks
  - Similar variable naming patterns
  - Near-identical code blocks
- Look for functions with similar names across different files
- Identify structural similarities in code organization

**Semantic Analysis**:
- Compare code blocks for logical similarity beyond textual matching
- Identify different implementations of the same functionality
- Look for copy-paste patterns with minor variations

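
As a rough illustration of what "copy-paste patterns with minor variations" means in practice, the sketch below uses token-normalized sliding-window hashing. This is a swapped-in, much cruder technique than the semantic analysis requested above, and every name in it (`normalize`, `find_duplicate_windows`, the `window` parameter) is hypothetical. It replaces every identifier-like token, keywords included, with a placeholder, so two blocks that differ only in naming hash to the same value.

```python
import hashlib
import re
from collections import defaultdict

IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def normalize(line: str) -> str:
    """Collapse whitespace and replace identifier-like tokens with ID.

    Deliberately crude: keywords are replaced too, so only the shape of
    the line (operators, literals, punctuation) survives.
    """
    return IDENTIFIER.sub("ID", " ".join(line.split()))


def find_duplicate_windows(files: dict[str, str], window: int = 5):
    """Map a window hash to every (path, start_line) with that shape."""
    seen = defaultdict(list)
    for path, text in files.items():
        # Keep original 1-based line numbers while skipping blank lines.
        lines = [
            (number, normalize(line))
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(norm for _, norm in lines[i:i + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((path, lines[i][0]))
    # Only shapes that occur more than once are candidates.
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

A hit from a sketch like this is only a candidate: repetitive code inside one file can match itself, and the result still needs the evaluation step that follows.
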
### 3. Duplication Evaluation

Assess findings to identify true code duplication:

**Duplication Types**:
- **Exact Duplication**: Identical code blocks in multiple locations
- **Structural Duplication**: Same logic with minor variations (different variable names, etc.)
- **Functional Duplication**: Different implementations of the same functionality
- **Copy-Paste Programming**: Similar code blocks that could be extracted into shared utilities

**Assessment Criteria**:
- **Severity**: Amount of duplicated code (lines of code, number of occurrences)
- **Impact**: Where duplication occurs (critical paths, frequently called code)
- **Maintainability**: How duplication affects code maintainability
- **Refactoring Opportunity**: Whether duplication can be easily refactored

### 4. Issue Reporting

Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run). Each pattern should get its own issue to enable focused remediation.

**When to Create Issues**:
- Only create issues if significant duplication is found (threshold: >10 lines of duplicated code OR 3+ instances of similar patterns)
- **Create one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Use the `create_issue` tool from safe-outputs MCP **once for each pattern**

**Issue Contents for Each Pattern**:
- **Executive Summary**: Brief description of this specific duplication pattern
- **Duplication Details**: Specific locations and code blocks for this pattern only
- **Severity Assessment**: Impact and maintainability concerns for this pattern
- **Refactoring Recommendations**: Suggested approaches to eliminate this pattern
- **Code Examples**: Concrete examples with file paths and line numbers for this pattern

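
The reporting threshold and the three-issue cap can be read as the following selection logic. This is a minimal sketch under stated assumptions: the `Pattern` type and both function names are hypothetical, and ranking by lines times occurrences is an assumption, since "most significant" is not defined here.

```python
from dataclasses import dataclass

MIN_DUPLICATED_LINES = 10  # report when MORE than 10 lines are duplicated
MIN_OCCURRENCES = 3        # or when there are 3 or more similar instances
MAX_ISSUES = 3             # mirrors `max: 3` under safe-outputs


@dataclass
class Pattern:
    name: str
    duplicated_lines: int
    occurrences: int


def is_significant(p: Pattern) -> bool:
    """Apply the '>10 lines OR 3+ instances' threshold."""
    return (
        p.duplicated_lines > MIN_DUPLICATED_LINES
        or p.occurrences >= MIN_OCCURRENCES
    )


def select_patterns(patterns: list[Pattern]) -> list[Pattern]:
    """Return at most MAX_ISSUES significant patterns, largest first."""
    significant = [p for p in patterns if is_significant(p)]
    # Assumed ranking: total duplicated volume (lines * occurrences).
    significant.sort(
        key=lambda p: p.duplicated_lines * p.occurrences, reverse=True
    )
    return significant[:MAX_ISSUES]
```

Each pattern returned by `select_patterns` would then correspond to one `create_issue` call.
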
## Detection Scope

### Report These Issues

- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components

### Skip These Patterns

- Standard boilerplate code (imports, exports, package declarations)
- Test setup/teardown code (acceptable duplication in tests)
- **All test files** (files matching: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or in `test/`, `tests/`, `__tests__/`, `spec/` directories)
- **All workflow files** (files under `.github/workflows/*`)
- Configuration files with similar structure
- Language-specific patterns (constructors, getters/setters)
- Small code snippets (<5 lines) unless highly repetitive
- Generated code or vendored dependencies

### Analysis Depth

- **Primary Focus**: Files changed in recent commits (excluding test files and workflow files)
- **Secondary Analysis**: Check for duplication with existing codebase
- **Cross-Reference**: Look for patterns across the repository
- **Historical Context**: Consider if duplication is new or existing

## Issue Template

For each distinct duplication pattern found, create a separate issue using this structure:

````markdown
# 🔍 Duplicate Code Detected: [Pattern Name]

*Analysis of commit ${{ github.event.head_commit.id }}*

**Assignee**: @copilot

## Summary

[Brief overview of this specific duplication pattern]

## Duplication Details

### Pattern: [Description]
- **Severity**: High/Medium/Low
- **Occurrences**: [Number of instances]
- **Locations**:
  - `path/to/file1.ext` (lines X-Y)
  - `path/to/file2.ext` (lines A-B)
- **Code Sample**:
  ```[language]
  [Example of duplicated code]
  ```

## Impact Analysis

- **Maintainability**: [How this affects code maintenance]
- **Bug Risk**: [Potential for inconsistent fixes]
- **Code Bloat**: [Impact on codebase size]

## Refactoring Recommendations

1. **[Recommendation 1]**
   - Extract common functionality to: `suggested/path/utility.ext`
   - Estimated effort: [hours/complexity]
   - Benefits: [specific improvements]

2. **[Recommendation 2]**
   [... additional recommendations ...]

## Implementation Checklist

- [ ] Review duplication findings
- [ ] Prioritize refactoring tasks
- [ ] Create refactoring plan
- [ ] Implement changes
- [ ] Update tests
- [ ] Verify no functionality broken

## Analysis Metadata

- **Analyzed Files**: [count]
- **Detection Method**: Semantic code analysis
- **Commit**: ${{ github.event.head_commit.id }}
- **Analysis Date**: [timestamp]
````

## Operational Guidelines

### Security
- Never execute untrusted code or commands
- Only use read-only analysis tools
- Do not modify files during analysis

### Efficiency
- Focus on recently changed files first
- Use semantic analysis for meaningful duplication, not superficial matches
- Stay within timeout limits (balance thoroughness with execution time)

### Accuracy
- Verify findings before reporting
- Distinguish between acceptable patterns and true duplication
- Consider language-specific idioms and best practices
- Provide specific, actionable recommendations

### Issue Creation
- Create **one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Only create issues if significant duplication is found
- Include sufficient detail for coding agents to understand and act on findings
- Provide concrete examples with file paths and line numbers
- Suggest practical refactoring approaches
- Assign issue to @copilot for automated remediation
- Use descriptive titles that clearly identify the specific pattern (e.g., "Duplicate Code: Error Handling Pattern in Parser Module")

**Objective**: Improve code quality by identifying and reporting meaningful code duplication that impacts maintainability. Focus on actionable findings that enable automated or manual refactoring.