Battery Pouch Cell Defect Detection System Using YOLOv11 Lightweight Model

Article information

J. Electrochem. Sci. Technol. 2025;.jecst.2025.00717
Publication date (electronic) : 2025 October 13
doi : https://doi.org/10.33961/jecst.2025.00717
School of Electronic and Electrical Engineering, Kyungpook National University
*CORRESPONDENCE T: +82-10-3520-0379 E: dhc@ee.knu.ac.kr
†These authors contributed equally to this work.
Received 2025 August 22; Accepted 2025 September 18.

Abstract

This study introduces a real-time vision-based defect detection system for lithium-ion battery pouch cells, leveraging a lightweight YOLOv11n architecture. A dataset comprising 15,312 high-resolution images (1024 × 1024 pixels) was constructed from a commercial production line and categorized into five surface classes: pass, leakage, pinhole, swelling, and scratch. To mitigate the dataset's inherent class imbalance (leakage represents only 8.8% of samples), comprehensive data augmentation strategies were employed, including mosaic augmentation (0.8 probability), MixUp (0.2), and random erasing (0.35), to balance the class distribution and strengthen model generalization. The proposed Lightweight YOLOv11n model was developed through systematic optimizations: global channel-width reduction (0.75 ratio), C2K depthwise-separable residual blocks, SPPF-Lite spatial pyramid pooling, C2PSA attention modules, unified 64-channel detection heads, removal of the stride-32 detection branch, magnitude-based weight pruning (threshold: 1×10⁻²), and mixed-precision optimization (FP32 to FP16). Comparative experiments with YOLOv11n and YOLOv12n were conducted on an NVIDIA A100 GPU. Lightweight YOLOv11n attained an F1-score of 0.9791, with precision and recall of 0.9781 and 0.9802 respectively, achieving mAP@0.5 of 0.9883 and mAP@50-95 of 0.8456. The model maintained real-time inference capability with a process duration of 9.43 ms (106.1 theoretical FPS), representing a 48.4% reduction compared to YOLOv12n (18.26 ms) while sacrificing only 0.47 percentage points in F1-score. These results demonstrate that Lightweight YOLOv11n provides a favorable trade-off between detection performance and computational efficiency, supporting its applicability for inline quality inspection on high-speed manufacturing lines.

INTRODUCTION

The widespread implementation of lithium-ion pouch cells across electric vehicle (EV) and energy storage system (ESS) platforms is attributed to their elevated energy density, design adaptability, and optimal thermal characteristics [1,2,3]. Driven by the rapid expansion of the global EV market and energy infrastructure, the demand for high-performance, highly reliable batteries has increased exponentially, making battery quality and reliability critical factors for industry competitiveness.

This study focuses on vision-based defect detection for lithium-ion pouch cells, which offer superior gravimetric energy density and design flexibility through their lightweight aluminum-laminated film packaging compared to cylindrical or prismatic configurations [4]. However, this structural advantage introduces significant vulnerabilities to mechanical damage, including leakage, pinhole formation, scratches, and swelling, particularly during manufacturing processes, material handling, and transportation phases [5]. The flexible packaging architecture inherently increases morphological complexity and surface variation, creating substantial challenges for automated defect pattern recognition and classification algorithms. These structural weaknesses are exacerbated by microscopic manufacturing defects originating from separator fabrication or electrode coating processes, which can severely compromise battery performance, trigger thermal events, or result in catastrophic failure modes, necessitating robust early-stage detection and classification systems for ensuring product reliability and operational safety [6,7]. For real-time, inline vision inspection systems deployed on high-speed production lines, this investigation specifically targets leakage, pinhole, swelling, and scratch defects as the primary anomaly categories based on their distinctive visual characteristics, electrochemical significance, and direct impact on manufacturing quality control. Leakage represents the most critical failure mode, as electrolyte loss through compromised pouch sealing not only accelerates capacity degradation through active material isolation but also creates fire and explosion hazards through flammable vapor accumulation, with studies indicating that even micro-scale leaks can result in 15–25% capacity loss within 500 cycles [8]. 
Pinhole defects, typically ranging from 10–50 μm in diameter, compromise separator integrity and create localized current concentration zones that can initiate lithium plating, dendrite formation, and subsequent thermal runaway events with onset temperatures reduced by 20–30°C compared to intact cells [9]. Swelling manifests as visible surface deformation caused by gas evolution from electrolyte decomposition, side reactions, and thermal abuse, with volumetric expansion exceeding 10% indicating severe internal degradation and posing mechanical stress risks to battery pack housings. Scratches expose underlying electrode materials to ambient moisture and oxygen, creating galvanic corrosion sites that induce localized heating zones with temperatures elevated by 5–15°C above normal operating conditions, ultimately leading to accelerated aging and reduced cycle life. While other defect types such as delamination and contamination also impact cell performance, delamination phenomena require subsurface analysis beyond conventional optical inspection capabilities, and contamination typically lacks distinct visual signatures detectable through surface imaging alone [10,11]. Therefore, the scope of this work is strategically confined to these four visually detectable defect categories that represent both the most tractable targets for real-time vision-based inspection and the most electrochemically consequential failure modes directly correlating with manufacturing yield, product lifetime, and end-user safety in commercial battery applications.

Traditional inspection methods—such as rule-based image processing, binarization, and classical filtering—have been widely used for surface defect detection [12]. However, due to the high variability and unpredictability of pouch cell surface defects, these methods require the manual definition of thousands of rules, making maintenance, scalability, and adaptation to new defect types highly impractical [9]. In addition, deep learning techniques based on classification or segmentation are often limited in this domain due to the ambiguous definition and labeling of defect classes, and they struggle to meet the real-time processing demands of modern manufacturing lines [13]. Notably, in 24/7 automated production environments, the speed, false positive rate, and field robustness of vision systems directly impact defect rates and quality management costs. Moreover, visual inspection by skilled human operators is becoming increasingly unsustainable due to fatigue, subjectivity, the expansion of overseas manufacturing sites, and rising labor costs [14].

To address these limitations, there has been a paradigm shift towards deep learning-based object detection for rapid and automated quality inspection in battery manufacturing. In particular, the YOLO (You Only Look Once) family of real-time object detectors has seen significant adoption across industries such as automotive parts, display panels, and semiconductors, thanks to its unified architecture and high inference speed, which enable efficient detection of multiple object types in a single forward pass [15,16]. Recent advances in the YOLO family (YOLOv5, v7, v8, and especially YOLOv11) have dramatically improved detection accuracy and computational efficiency. Lightweight versions employing pruning, quantization, and knowledge distillation have demonstrated real-time inspection capability even on edge devices with limited resources [17].

In this study, we propose a YOLOv11n-based automated vision inspection system for real-time surface defect detection in lithium-ion pouch cells. YOLOv11n introduces architectural innovations such as the C3k2 block for improved accuracy-to-computation trade-off and the C2PSA (Cross-Stage Partial Spatial Attention) module for enhanced detection of small or overlapping defects. The inspection system utilizes a lightweight YOLOv11n model, with additional channel pruning and parameter reduction to maximize computational efficiency and enable deployment on edge devices with limited memory, power, and processing capabilities. The model is trained on a custom dataset of 15,312 high-resolution (1024 × 1024 pixels) images from a commercial battery line, categorized into five classes (pass, leakage, pinhole, scratches, swelling). The dataset was partitioned without overlap into three subsets: training (70%, 10,718 images), validation (15%, 2,297 images), and testing (15%, 2,297 images). To address class imbalance issues and enhance the model's ability to generalize to diverse defect conditions, Mosaic augmentation was applied during the training phase.

Comprehensive experiments conducted on an NVIDIA A100 GPU demonstrate that the proposed model achieves an F1-score of 0.9791 with only 9.43 ms inference latency, validating its effectiveness for high-throughput manufacturing environments. These results establish the Lightweight YOLOv11n as a practical solution that bridges the gap between academic research and industrial requirements, providing manufacturers with a deployment-ready system capable of maintaining quality standards while meeting the demanding speed constraints of modern battery production lines.

BACKGROUND THEORY

Overview of Pouch Cells

Lithium-ion pouch cells replace the rigid metal can with a laminated Al-polymer film, providing three benefits: reduced mass and volume for higher energy density, a flexible form factor to exploit irregular pack spaces, and high in-plane thermal conductivity for efficient heat removal during high-rate operation.

The same thin-film enclosure, however, lowers the cell's mechanical resilience and exposes the electrodes to external mechanical or chemical stressors. Leakage, pinhole, scratches, and swelling can arise at any stage from winding to tab-welding. Although such anomalies are often sub-millimeter in scale, they can compromise pouch integrity, accelerate electrolyte evaporation, and ultimately trigger internal short circuits or thermal runaway.

Four key structural attributes of pouch cells exert predominant influence on the efficacy of machine-vision inspection systems. First, the lightweight and flexible envelope presents unique challenges, as the film thickness typically ranges between 100 and 200 μm, meaning that even minor indentation or cut marks may breach the barrier layer. Second, the high active-material loading characteristic of pouch cells creates additional complexity, since a larger proportion of the cell volume is electrochemically active, thereby magnifying the impact of local defects on overall capacity and safety. Third, the tool-free form-factor variability, while advantageous for manufacturing flexibility, introduces inspection difficulties because manufacturing lines accommodate multiple cell footprints without retooling, and this versatility also introduces position- and shape-dependent defect patterns. Finally, the enhanced heat dissipation capability of pouch cells, although beneficial for performance, creates optical challenges as it causes the aluminum layer to reflect specular illumination, resulting in strong highlights and shadows that complicate traditional rule-based vision algorithms [18]. These characteristics necessitate inspection systems that combine micrometer-level sensitivity with millisecond-level throughput. Conventional binarization- or edge-filter pipelines lack the adaptability to cope with the morphological diversity and unpredictable spatial distribution of pouch-cell defects, particularly under the mixed lighting conditions found on high-speed production lines. The proposed Lightweight YOLOv11n detector addresses this gap by learning defect-class features directly from high-resolution imagery while maintaining millisecond-level inference latency (9.43 ms per image in our measurements), as detailed in Sections 3 and 4.

Pouch Cell Manufacturing Process

Pouch cell quality and safety hinge on tight process control across four integrated stages.

The first stage, electrode manufacturing, involves coating active-material slurries (e.g., NCM or LFP on Al foil for the cathode; graphite or Si–C on Cu foil for the anode), which are subsequently dried and calendered, with typical defects including coating non-uniformity, pinholes, and surface scratches.

During the second stage of cell assembly, calendered electrodes and polymer separators are stacked or wound, then enclosed in an Al-laminated pouch, where mis-registration, trapped foreign particles, or wrinkled separators can introduce latent short-circuit risks.

The third stage encompasses electrolyte filling and sealing, where electrolyte injection and vacuum sealing complete the mechanical enclosure, and improper wetting, micro-leaks, or heat-seal pinholes formed at this stage directly degrade performance and safety.

Finally, the formation, grading, and final inspection stage involves cells undergoing formation cycling to establish a stable solid electrolyte interphase (SEI), followed by capacity and impedance grading, with vision-based inspection then flagging residual surface anomalies before qualified cells proceed to module/pack integration.

Each stage presents distinct defect mechanisms, underscoring the need for high-resolution, real-time vision inspection such as the proposed Lightweight YOLOv11n system to intercept faults before they propagate downstream.

Conventional Inspection Methods and Their Limitations

Conventional inspection approaches for pouch cells can be grouped into seven distinct categories, each with inherent advantages and limitations.

Visual inspection relies on human operators and remains inexpensive. It is susceptible to fatigue-induced inconsistency and performs poorly at spotting minute or irregular flaws on high-throughput lines.

Mechanical inspection uses pressure or drop tests to gauge structural robustness; however, this approach is destructive, sample-based, and ineffective for superficial defects.

Electrical inspection evaluates open-circuit voltage, internal resistance, capacity, and leakage current to infer internal faults, yet this method slows production and cannot localize surface anomalies.

Ultrasonic inspection detects internal delamination and voids via acoustic reflections and is non-destructive, but remains capital-intensive and offers limited sensitivity to exterior defects.

X-ray inspection provides radiographic images of electrode alignment and stacking uniformity and is highly informative, yet it is slow, costly, and resolution-constrained for sub-millimeter flaws.

Infrared thermography represents a rapid technique that reveals hot spots or latent shorts by monitoring thermal fields, but is prone to false positives and largely insensitive to non-thermal surface damage.

Finally, machine-vision inspection employs deep learning detectors (e.g., CNN- or YOLO-based) to deliver real-time, round-the-clock identification of diverse surface defects, eliminating operator bias, though initial deployment demands high-quality annotated datasets, model optimization expertise, and dedicated compute infrastructure.

These constraints motivate the development of a lightweight, high-accuracy vision system—detailed in Section 3—that overcomes throughput, sensitivity, and cost barriers inherent to traditional methods.

EXPERIMENTAL METHODS

Deep Learning-Based Vision Inspection for Pouch Cells

Recent advances in convolutional neural networks (CNNs), particularly single-stage detectors within the YOLO framework, have revolutionized quality inspection practices across manufacturing, healthcare, transportation, and energy sectors. These technological innovations have established vision-based deep learning as the de facto standard for real-time defect detection applications, delivering substantial improvements in both accuracy and operational speed. Within lithium-ion battery production processes, object detection architectures demonstrate particular efficacy for identifying surface anomalies on pouch cells.

The transition from traditional rule-based inspection methods to deep learning approaches addresses fundamental limitations inherent in conventional systems. Traditional methods require manual definition of extensive rule sets to accommodate defect variability, creating scalability and maintenance challenges that become prohibitive in high-volume production environments. Deep learning models, conversely, learn defect characteristics directly from training data, enabling adaptive recognition of complex patterns and morphological variations that exceed the capabilities of hand-crafted algorithms [19].

Technical Advantages of Deep Learning-Based Defect Detection

Deep learning detectors provide four decisive advantages over manual or heuristic inspection pipelines that directly impact manufacturing efficiency and quality control outcomes.

GPU-accelerated parallel processing architectures enable millisecond-level inference capabilities, supporting continuous monitoring requirements on high-throughput production lines where conventional inspection methods introduce unacceptable bottlenecks.

Hierarchical CNN feature extraction mechanisms capture subtle textural and morphological cues that human operators or traditional algorithms frequently miss, delivering superior sensitivity to minute defects while eliminating subjective interpretation variability inherent in human-based inspection. Model inference operates deterministically across all inspection cycles, guaranteeing consistent pass/fail criteria and thereby enhancing statistical process control reliability and reducing quality management uncertainties.

Automated processing minimizes labor dependencies and eliminates human error propagation pathways, contributing to improved overall equipment effectiveness (OEE) and enabling 24/7 operation without performance degradation due to operator fatigue or shift changes.

Application Methods of Major Deep Learning Models

Deep learning inspection workflows typically encompass three primary categories, differentiated by output granularity requirements and computational constraints.

Object detection approaches simultaneously locate and classify defects within single inference passes. The YOLO family, progressing through YOLOv5, YOLOv8, YOLOv11, and YOLOv12 iterations, offers exceptional speed-to-accuracy ratios that make these architectures particularly suitable for high-throughput pouch-cell inspection. Alternative frameworks such as Faster R-CNN [20–22] provide enhanced localization precision at increased computational cost, while Single Shot Detector (SSD) architectures balance these trade-offs with moderate resource requirements. In parallel, Transformer-based detectors have matured rapidly: the original DETR [23] introduced an end-to-end set-prediction paradigm that removes handcrafted post-processing, with Deformable DETR and DN-DETR addressing early limitations in convergence and feature resolution; most recently, RT-DETR [24] demonstrated that carefully engineered DETR variants can rival, and in some regimes surpass, contemporary YOLO models in real-time settings by eliminating the latency and accuracy penalties of NMS.

Application-oriented studies further show DETR-style models’ competitiveness on manufacturing defects: improved RT-DETR pipelines for printed-circuit boards (PCB) report real-time throughput with high mAP, and domain-adaptable “De-DETR” variants have been proposed to enhance robustness under distribution shift. These advances underscore that Transformer detectors are increasingly practical where accelerator support (e.g., TensorRT) is available. Nevertheless, under stringent edge constraints (limited compute/memory, deterministic low latency, and multi-line inspection budgets), the YOLO family retains clear advantages: compact parameterization, mature deployment stacks, and consistent frame-level latency from a dense one-stage head. Our proposed lightweight YOLO variant leverages these strengths to meet strict real-time (>25 FPS) production requirements while narrowing the accuracy gap to recent DETR/RT-DETR designs, offering a pragmatic balance of efficiency and accuracy for pouch-cell surface defect inspection.

Image classification methodologies determine defective status for entire images or segmented regions, delivering rapid binary or multiclass decisions. Established architectures including ResNet, VGGNet, and EfficientNet provide efficient screening capabilities suitable for preliminary quality assessment stages where detailed localization is not required [25].

Semantic and instance segmentation techniques generate pixel-accurate defect boundaries, proving beneficial for irregular defect morphologies such as fine scratches or complex surface damage patterns. U-Net and fully convolutional network (FCN) architectures achieve state-of-the-art performance on industrial imagery, facilitating quantitative damage assessment and enabling downstream analytics for process optimization [26].

Dataset Construction

The dataset comprises 15,312 high-resolution (1024×1024) images from commercial production lines, categorized as follows:

- Pass: 3,494 images (22.8%)

- Leakage: 1,355 images (8.8%)

- Pinhole: 3,566 images (23.3%)

- Swelling: 3,520 images (23.0%)

- Scratch: 3,377 images (22.1%)
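The class shares listed above can be re-derived from the image counts. The short sketch below does this and also reports a max/min ratio as one simple way to express the imbalance; that ratio is an illustrative measure added here, not a metric used in the study.

```python
# Image counts per class, as listed above.
class_counts = {
    "pass": 3494,
    "leakage": 1355,
    "pinhole": 3566,
    "swelling": 3520,
    "scratch": 3377,
}

total = sum(class_counts.values())

# Share of each class in percent, rounded to one decimal place.
class_share = {name: round(100 * n / total, 1) for name, n in class_counts.items()}

# Ratio between the largest and the smallest class (illustrative imbalance measure).
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

for name, share in class_share.items():
    print(f"{name:9s} {class_counts[name]:5d} images  {share:4.1f}%")
print(f"total     {total:5d} images, max/min ratio = {imbalance_ratio:.2f}")
```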

To address the class imbalance (particularly the underrepresented leakage class) and enhance model robustness, the following augmentation techniques were applied:

- Geometric: rotation (±18°), translation (0.2), scaling (0.4), shearing (8°), perspective (0.0015)

- Photometric: HSV variations (h=0.03, s=0.7, v=0.5)

- Advanced: Mosaic (0.8), MixUp (0.2), Random Erasing (0.35)

All images were resized to 640×640 pixels for computational efficiency.
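For readers who want to reproduce a comparable setup, the augmentation settings above can be collected in a single configuration dictionary. The sketch below is a configuration fragment only. The key names follow Ultralytics training arguments; the assignment of the geometric and photometric values to those keys is our assumption, and the dataset file name in the closing comment is hypothetical.

```python
# Augmentation settings from the list above, written as a plain dictionary.
# Key names follow Ultralytics training arguments; the mapping of the
# geometric and photometric entries to these keys is an assumption.
aug_config = {
    # Geometric
    "degrees": 18.0,        # rotation range, +/- degrees
    "translate": 0.2,       # translation, fraction of image size
    "scale": 0.4,           # scaling gain
    "shear": 8.0,           # shear, degrees
    "perspective": 0.0015,  # perspective distortion
    # Photometric (HSV jitter)
    "hsv_h": 0.03,
    "hsv_s": 0.7,
    "hsv_v": 0.5,
    # Advanced
    "mosaic": 0.8,
    "mixup": 0.2,
    "erasing": 0.35,
    # Input resolution after resizing
    "imgsz": 640,
}

# Probabilities must lie in [0, 1]; a quick sanity check before training.
for key in ("mosaic", "mixup", "erasing"):
    assert 0.0 <= aug_config[key] <= 1.0, key

# Hypothetical usage (not executed here; "pouch.yaml" is a placeholder):
#   from ultralytics import YOLO
#   YOLO("yolo11n.pt").train(data="pouch.yaml", **aug_config)
```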

Fig. 1.

Defect labels training batch image

Design of YOLO-based Defect Detection Model

To satisfy real-time inspection requirements on edge hardware while maintaining detection accuracy for pouch cell surface anomalies, the YOLOv11n architecture was selected as the baseline model due to its optimized parameter count and computational efficiency characteristics [27]. YOLOv12n served as a performance benchmark for comparative evaluation throughout the development process. The selection of YOLOv11n was motivated by its demonstrated superiority in balancing inference speed with detection precision, particularly for small-scale defect identification tasks requiring millisecond-level processing capabilities.

Building upon the YOLOv11n foundation, a comprehensively optimized Lightweight YOLOv11n model was developed through systematic architectural modifications targeting computational efficiency without compromising feature extraction capabilities. The lightweight design philosophy centered on reducing parameter redundancy while preserving the multi-scale feature representation essential for detecting diverse defect morphologies across varying surface conditions.

The first optimization involved implementing global channel-width reduction throughout both backbone and neck components, where feature map channels were systematically reduced by applying a 0.75 reduction ratio to minimize computational overhead. This approach was based on empirical evidence that many intermediate feature channels exhibit low activation patterns and contribute minimally to final detection performance, particularly in domain-specific applications such as pouch cell inspection where feature complexity requirements are more constrained than general object detection tasks.
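The effect of a 0.75 width ratio can be illustrated with a small calculation. The sketch below scales a few channel counts and compares the weight count of a 3×3 convolution before and after scaling. The stage widths and the rounding to a multiple of 8 are assumptions made for illustration; they are not taken from the model definition used in this study.

```python
import math

def scale_channels(channels, ratio=0.75, divisor=8):
    """Scale a channel count and round up to a multiple of `divisor`."""
    return int(math.ceil(channels * ratio / divisor) * divisor)

def conv_params(c_in, c_out, k=3):
    """Weight count of a k x k convolution without bias."""
    return k * k * c_in * c_out

# Hypothetical stage widths, chosen only for illustration.
base_widths = [64, 128, 256]
slim_widths = [scale_channels(c) for c in base_widths]

# Parameter ratio of a 3x3 conv whose input and output are both scaled.
ratio = conv_params(slim_widths[0], slim_widths[0]) / conv_params(base_widths[0], base_widths[0])

print(slim_widths)
print(f"parameter ratio: {ratio:.4f}")
```

Because both the input and the output width of a convolution shrink, the weight count of such a layer scales roughly with the square of the width ratio (0.75² ≈ 0.56).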

Standard C2f modules were replaced with novel C2K depthwise-separable residual blocks that employ a dual-branch architecture where one computational path operates with reduced channel dimensions while maintaining representational capacity through efficient feature routing [28]. The C2K design incorporates depthwise convolutions followed by pointwise operations, reducing computational complexity from O(k²×Cin×Cout) to O(k²×Cin + Cin×Cout), where k represents kernel size, thereby achieving significant FLOP reduction while preserving gradient flow characteristics essential for training stability.
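The complexity expressions above can be checked numerically. The following sketch counts multiply-accumulate operations per output position for a standard convolution and for its depthwise-separable counterpart; the layer size (k=3, 64 input and 64 output channels) is an assumed example, not a layer from the proposed network.

```python
def standard_conv_macs(k, c_in, c_out):
    """Multiply-accumulates per output position for a standard k x k conv."""
    return k * k * c_in * c_out

def depthwise_separable_macs(k, c_in, c_out):
    """Depthwise k x k conv (per channel) followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Illustrative layer size (assumed, not from the paper).
k, c_in, c_out = 3, 64, 64

std = standard_conv_macs(k, c_in, c_out)
dws = depthwise_separable_macs(k, c_in, c_out)
cost_ratio = dws / std  # equals 1/c_out + 1/k**2

print(std, dws, f"{cost_ratio:.4f}")
```

For this example the separable form needs about one eighth of the operations, in line with the ratio 1/Cout + 1/k² that follows from the two expressions.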

The neck architecture was enhanced through SPPF-Lite implementation, which streamlines spatial pyramid pooling by first reducing input channels via 1×1 convolutions before applying sequential k=5 max-pooling operations. This modification captures multi-scale contextual information with minimal computational overhead compared to traditional SPPF modules, addressing the challenge that pouch cell defects manifest across varying spatial scales from micro-pinholes to macro-level swelling patterns.
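Why sequential k=5 pooling yields multi-scale context can be seen in a one-dimensional toy example: chaining stride-1 max pools widens the effective window from 5 to 9 to 13 cells. The sketch below is a simplified illustration in plain Python; the use of three chained pools follows a common SPPF layout and is an assumption here, as is the example signal.

```python
def max_pool_1d(values, k):
    """Stride-1 max pooling with 'same' padding (out-of-range cells are ignored)."""
    half = k // 2
    n = len(values)
    return [max(values[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]

# Arbitrary 1-D signal used only for illustration.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

p1 = max_pool_1d(signal, 5)   # effective window 5
p2 = max_pool_1d(p1, 5)       # effective window 9
p3 = max_pool_1d(p2, 5)       # effective window 13

same_as_k9 = p2 == max_pool_1d(signal, 9)
same_as_k13 = p3 == max_pool_1d(signal, 13)
print(same_as_k9, same_as_k13)
```

Reusing one small pooling kernel several times therefore covers the same windows as separate large kernels, which is what keeps the pooling stage inexpensive.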

Position-Sensitive Attention (PSA) modules were integrated within Cross-Stage Partial (CSP) wrappers [29] to form C2PSA blocks, compensating for information loss introduced by channel reduction and pruning operations. The PSA mechanism reduces Query-Key-Value (QKV) dimensions while fusing attention maps with depthwise-convolution features, enabling the model to recover global contextual cues critical for distinguishing between genuine defects and benign surface variations commonly encountered in manufacturing environments.

Detection head architecture was unified across all output scales to employ fixed 64-channel configurations, eliminating parameter inconsistencies and reducing memory requirements during inference. The large-scale detection branch operating at stride-32 was removed based on analysis indicating that pouch-cell defects primarily occur at medium to small scales, a distribution consistent with prior reports in industrial defect datasets, making the computational overhead of large-scale detection branches unnecessary for this application domain [30]. Practically, removing the stride-32 path prunes its head convolutions and output projections and also eliminates decoding/NMS on that branch, reducing kernel launches and memory traffic; for a 640×640 input, it further cuts about 5% of candidate boxes (20×20 vs. 80×80 and 40×40 maps: 400/(6400+1600+400)≈4.8%), which proportionally lowers post-processing latency while allowing compute to be focused on the stride-8/16 heads where most defects reside.
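The candidate-box arithmetic for a 640×640 input can be reproduced in a few lines. The sketch assumes one candidate per grid cell at each stride, which is a simplification adopted here for illustration.

```python
def grid_cells(image_size, stride):
    """Number of prediction cells for a square input at a given stride."""
    side = image_size // stride
    return side * side

image_size = 640
cells = {stride: grid_cells(image_size, stride) for stride in (8, 16, 32)}

total_cells = sum(cells.values())
removed_fraction = cells[32] / total_cells

print(cells)
print(f"removed by dropping stride 32: {removed_fraction:.1%}")
```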

Magnitude-based weight pruning was applied with a threshold of 1×10⁻² to eliminate parameters contributing minimally to model performance, followed by mixed-precision optimization converting weights from FP32 to FP16 precision while maintaining batch normalization parameters in FP32 format to preserve training stability [31]. These modifications collectively achieved substantial model compression while retaining the multi-scale feature richness required for reliable surface defect detection, thereby enhancing deployability on resource-constrained edge computing platforms without sacrificing detection accuracy or real-time processing capabilities.
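The two compression steps can be illustrated on a toy weight vector without any deep learning framework. The sketch below zeroes weights whose magnitude falls below 1×10⁻² and then rounds the remainder to half precision using the standard library. The weight values are invented, and the strict "less than" comparison at the threshold is an assumption; this is not the pruning code used in the study.

```python
import struct

def prune_by_magnitude(weights, threshold=1e-2):
    """Set weights with |w| below the threshold to zero."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Toy weight vector, invented for illustration.
weights = [0.5, -0.003, 0.02, 0.0099, -0.25, 0.0001]

pruned = prune_by_magnitude(weights)
sparsity = pruned.count(0.0) / len(pruned)
half = [to_fp16(w) for w in pruned]

# Storage per value: 4 bytes in FP32 versus 2 bytes in FP16.
bytes_fp32 = struct.calcsize("<f") * len(pruned)
bytes_fp16 = struct.calcsize("<e") * len(pruned)

print(pruned)
print(f"sparsity = {sparsity:.2f}, {bytes_fp32} B -> {bytes_fp16} B")
```

Zeroing individual weights does not by itself shrink a dense tensor; in this toy example the halving of storage comes from the FP16 conversion alone.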

Fig. 2.

Comparison of YOLOv11n and Lightweight YOLOv11n

Training & Implementation

The proposed YOLO-based models were trained using an NVIDIA A100 GPU (40GB) in the Google Colab environment. The implementation environment and training configuration were systematically optimized to maximize both computational efficiency and detection performance for pouch cell surface defect identification.

The hardware foundation consisted of an NVIDIA A100-SXM4-40GB GPU providing substantial computational resources for deep learning workloads, supported by PyTorch 2.0.1 framework with CUDA 11.8 acceleration and Python 3.10 runtime environment. The Ultralytics YOLOv11 framework (version 11.0.10) was selected for its mature implementation and comprehensive training pipeline optimized for object detection tasks.

Training configuration parameters were carefully selected to balance convergence stability with computational efficiency requirements. The model underwent 100 training epochs with a batch size of 16, enabling effective gradient accumulation while maintaining memory efficiency on the A100 hardware. Input images were processed at 640×640 pixel resolution to optimize the trade-off between detection accuracy and inference speed, ensuring real-time processing capabilities suitable for high-throughput manufacturing environments. The optimization strategy employed Stochastic Gradient Descent (SGD) [32] with momentum coefficient of 0.937, providing robust convergence characteristics suitable for industrial defect detection applications. Learning rate scheduling utilized cosine annealing starting from an initial rate of 0.01, ensuring smooth convergence behavior throughout the extended training period [33].
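The shape of a cosine-annealed schedule over 100 epochs with an initial rate of 0.01 can be sketched as follows. The final learning rate is not stated in the text, so the value of 1×10⁻⁴ used below is an assumption, as is the per-epoch (rather than per-iteration) update.

```python
import math

def cosine_lr(epoch, total_epochs=100, lr0=0.01, lr_final=1e-4):
    """Cosine-annealed learning rate; lr_final is an assumed end value."""
    progress = epoch / total_epochs
    return lr_final + 0.5 * (lr0 - lr_final) * (1.0 + math.cos(math.pi * progress))

schedule = [cosine_lr(e) for e in range(101)]
print(f"epoch 0: {schedule[0]:.5f}, epoch 50: {schedule[50]:.5f}, epoch 100: {schedule[100]:.5f}")
```

The rate falls slowly at first, fastest around the midpoint, and flattens again near the end, which is the smooth convergence behavior referred to above.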

Advanced augmentation strategies specifically targeting object detection performance were applied to maximize training effectiveness. Mosaic augmentation (mosaic=0.8) combined four training images into composite samples, enabling the model to learn complex multi-object detection scenarios while effectively expanding dataset diversity [34]. MixUp augmentation (mixup=0.2) blended image pairs with label interpolation to promote smoother decision boundaries and improved generalization capabilities [35]. Random erasing (erasing=0.35) selectively masked rectangular regions within training images, forcing the model to develop robust feature representations resilient to partial occlusions and manufacturing artifacts [36].

The lightweight model implementation process began with loading pre-trained YOLOv11n weights using torch.load() with explicit map_location specification to ensure proper device allocation and avoid memory conflicts. Magnitude-based pruning was systematically applied to weight tensors using a threshold of 1×10⁻², removing parameters that contribute minimally to model performance while preserving critical feature extraction capabilities. The pruned model underwent precision conversion from FP32 to FP16 format for inference optimization, reducing memory footprint by approximately 50% while maintaining numerical stability through selective retention of FP32 precision for batch normalization layers. The modified state dictionary was preserved using torch.save() to enable deployment and further optimization iterations.

Optimization techniques focused on leveraging mixed precision capabilities throughout both training and inference phases. Mixed precision training was implemented using torch.cuda.amp.GradScaler to enable automatic loss scaling and gradient normalization, preventing gradient underflow while maintaining training stability. Automatic mixed precision inference utilized torch.cuda.amp.autocast context managers to selectively apply FP16 operations where beneficial while preserving FP32 precision for critical computational paths [37]. GPU memory optimization was achieved through systematic cache clearing using torch.cuda.empty_cache() between training batches, preventing memory fragmentation and ensuring consistent performance throughout extended training sessions.

RESULTS

Overall Detection Performance

The experimental evaluation demonstrates that YOLOv12n achieved the highest F1-score (Fig. 8) of 0.9838, followed by Lightweight YOLOv11n (0.9791) and YOLOv11n (0.9746). While the Lightweight model did not achieve the highest absolute accuracy, it delivered superior computational efficiency with only marginal accuracy trade-offs. Throughout this section, accuracy differences are reported in percentage points (pp) and speed differences in relative percent (%).
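As a consistency check, the F1-score of the Lightweight model can be recomputed from the precision (0.9781) and recall (0.9802) reported for it in the abstract, using the standard harmonic-mean definition. The sketch below is illustrative only and is not part of the evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for Lightweight YOLOv11n.
f1_light = f1_score(0.9781, 0.9802)
print(f"{f1_light:.4f}")
```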

Fig. 8.

F1-Confidence Graph

Comprehensive Performance Analysis

Table 1.

Validated Performance Metrics of YOLO Models.

Processing Speed and Throughput Analysis

The process duration measurements enable calculation of theoretical frames per second through the formula FPS = 1000/Process Duration (ms). This yields 98.4 FPS for YOLOv11n, 54.8 FPS for YOLOv12n, and 106.1 FPS for Lightweight YOLOv11n. These theoretical values represent single-batch inference performance under optimal conditions including dedicated GPU resources, no I/O overhead, and no memory transfer delays.
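
The conversion from process duration to theoretical throughput is a one-line calculation, sketched below with the process durations from Table 1. Because those durations are rounded to two decimals, the recomputed FPS values agree with the reported ones only to within about 0.1 FPS.

```python
# Process duration per image in milliseconds, as listed in Table 1.
PROCESS_DURATION_MS = {
    "YOLOv11n": 10.16,
    "YOLOv12n": 18.26,
    "Lightweight YOLOv11n": 9.43,
}


def theoretical_fps(duration_ms: float) -> float:
    """FPS = 1000 / process duration (ms), ignoring I/O and transfer overhead."""
    return 1000.0 / duration_ms


for name, duration in PROCESS_DURATION_MS.items():
    fps = theoretical_fps(duration)
    print(f"{name}: {fps:.1f} FPS, about {fps * 60:.0f} images per minute")
```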

Fig. 3.

Comparison of Precision-Confidence Graphs

The Lightweight model achieves a 48.4% reduction in process duration compared to YOLOv12n (9.43 ms vs. 18.26 ms) and a 7.2% reduction compared to YOLOv11n (9.43 ms vs. 10.16 ms). At the measured inference speeds, the Lightweight model can theoretically process 6,366 images per minute, compared to 5,904 for YOLOv11n and 3,288 for YOLOv12n, demonstrating significant throughput advantages for high-volume production environments.

Fig. 4.

Precision vs Epoch Graph

Fig. 5.

Recall vs Epoch Graph

Accuracy-Efficiency Trade-off Analysis

The performance metrics show that the optimization was effective: the Lightweight model gives up only 0.47 pp of F1-score relative to YOLOv12n while reducing process duration by 48.4% (equivalently, a 93.6% increase in theoretical FPS). Compared to YOLOv11n, the Lightweight model improves F1-score by 0.45 pp while also reducing process duration by 7.2%. This cost-benefit ratio supports the lightweight architecture for industrial deployment where throughput constraints often outweigh marginal accuracy differences.
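
The trade-off figures quoted here follow directly from the F1-scores and process durations in Table 1. The sketch below recomputes them; the helper names are illustrative only.

```python
# F1-score and process duration (ms) as listed in Table 1.
TABLE_1 = {
    "YOLOv11n": {"f1": 0.9746, "ms": 10.16},
    "YOLOv12n": {"f1": 0.9838, "ms": 18.26},
    "Lightweight YOLOv11n": {"f1": 0.9791, "ms": 9.43},
}


def f1_delta_pp(model: str, baseline: str) -> float:
    """F1 difference in percentage points (positive means the model is better)."""
    return (TABLE_1[model]["f1"] - TABLE_1[baseline]["f1"]) * 100.0


def duration_reduction_percent(model: str, baseline: str) -> float:
    """Relative reduction in process duration versus the baseline."""
    old, new = TABLE_1[baseline]["ms"], TABLE_1[model]["ms"]
    return (old - new) / old * 100.0


def fps_increase_percent(model: str, baseline: str) -> float:
    """Relative FPS gain; FPS is inversely proportional to process duration."""
    return (TABLE_1[baseline]["ms"] / TABLE_1[model]["ms"] - 1.0) * 100.0


light = "Lightweight YOLOv11n"
for base in ("YOLOv12n", "YOLOv11n"):
    print(
        f"vs {base}: F1 {f1_delta_pp(light, base):+.2f} pp, "
        f"duration -{duration_reduction_percent(light, base):.1f}%, "
        f"FPS +{fps_increase_percent(light, base):.1f}%"
    )
```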

Multi-Scale Detection Performance

The mAP metrics (Figs. 6 and 7) reveal consistent detection performance at the standard IoU threshold, with all models achieving at least 0.987 for mAP@0.5. YOLOv12n leads marginally at 0.9889, followed closely by Lightweight YOLOv11n at 0.9883, a difference of only 0.06 pp. The mAP@50-95 metric, which averages performance across IoU thresholds from 0.5 to 0.95 in 0.05 increments, shows greater variation, with YOLOv12n achieving 0.8718, YOLOv11n 0.8564, and Lightweight YOLOv11n 0.8456. The 2.62 pp reduction of the Lightweight model relative to YOLOv12n indicates slightly lower localization precision at stricter IoU thresholds, though this limitation remains acceptable for industrial inspection where IoU above 0.5 typically suffices.
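
To make the role of the IoU threshold concrete, the sketch below computes the IoU of one predicted box against one ground-truth box and lists the thresholds in the 0.50 to 0.95 sweep at which that prediction would still count as a match. The two boxes are hypothetical values chosen for illustration; a full mAP computation additionally involves confidence ranking and per-class precision-recall curves, which are omitted here.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds used by mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 10, 10)  # hypothetical defect box
prediction = (2, 0, 12, 10)    # hypothetical prediction, shifted by 2 pixels

overlap = iou(ground_truth, prediction)
accepted_at = [t for t in IOU_THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}; counted as a match at thresholds {accepted_at}")
```

A prediction like this one contributes to mAP@0.5 but is rejected at the stricter thresholds, which is why mAP@50-95 is more sensitive to boundary localization than mAP@0.5.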

Fig. 6.

mAP@50 vs Epoch Graph

Fig. 7.

mAP@50-95 vs Epoch Graph

The approximately 12–14% relative drop from mAP@0.5 to mAP@50–95 across all models (14.4% for Lightweight, 13.2% for YOLOv11n, 11.8% for YOLOv12n, computed from the values in Table 1) suggests that while defect detection remains robust, precise boundary localization presents ongoing challenges, particularly for irregular defects such as scratches and leakage patterns with ambiguous edges.
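
The relative drop is computed as (mAP@0.5 − mAP@50-95) / mAP@0.5. A short sketch using the Table 1 values is given below; since the inputs are rounded to four decimals, the results should be read as approximate.

```python
# (mAP@50, mAP@50-95) as listed in Table 1.
MAP_VALUES = {
    "YOLOv11n": (0.9870, 0.8564),
    "YOLOv12n": (0.9889, 0.8718),
    "Lightweight YOLOv11n": (0.9883, 0.8456),
}


def relative_drop_percent(map50: float, map50_95: float) -> float:
    """Relative decrease from mAP@0.5 to mAP@50-95, in percent."""
    return (map50 - map50_95) / map50 * 100.0


drops = {name: relative_drop_percent(*values) for name, values in MAP_VALUES.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.1f}% relative drop")
```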

Statistical Validation of Results

The consistency of metrics across models provides confidence in experimental validity. YOLOv11n and YOLOv12n both maintain precision exceeding recall, indicating a slightly precision-oriented detection policy. Conversely, the Lightweight model shows recall (0.9802) marginally exceeding precision (0.9781), suggesting a slightly recall-leaning configuration that prioritizes capturing all defects. The standard deviation across the three models is σ(precision) = 0.004 and σ(recall) = 0.005, with a coefficient of variation of ≈0.5%, indicating that all three models operate within a narrow performance band.
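
The spread across the three models can be recomputed from Table 1 with the standard library. The source does not state which estimator was used, so the sketch below reports both the population and the sample standard deviation (an assumption on our part); with only three models, both forms land in the 0.004 to 0.005 range.

```python
from statistics import mean, pstdev, stdev

# Values for YOLOv11n, YOLOv12n, and Lightweight YOLOv11n, as listed in Table 1.
PRECISION = [0.9765, 0.9851, 0.9781]
RECALL = [0.9728, 0.9824, 0.9802]


def spread(values):
    """Population and sample standard deviation plus coefficient of variation (%)."""
    return {
        "population_std": pstdev(values),
        "sample_std": stdev(values),
        "cv_percent": stdev(values) / mean(values) * 100.0,
    }


for label, values in (("precision", PRECISION), ("recall", RECALL)):
    stats = spread(values)
    print(
        f"{label}: population std = {stats['population_std']:.4f}, "
        f"sample std = {stats['sample_std']:.4f}, CV = {stats['cv_percent']:.2f}%"
    )
```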

Performance analysis reveals two distinct operational clusters: YOLOv12n prioritizes accuracy at the cost of speed, while YOLOv11n and its Lightweight model balance both metrics. This clustering validates the architectural decisions in the Lightweight model, which maintains the efficiency characteristics of YOLOv11n while achieving competitive detection performance.

Production Deployment Implications

Based on process duration measurements and assuming 80% operational efficiency, the effective throughput per GPU enables different multi-line configurations. The Lightweight model at 84.8 effective FPS translates to 5,088 cells per minute when operating a single line at full capacity. YOLOv11n achieves 78.7 effective FPS (4,722 cells/minute), while YOLOv12n manages 43.8 effective FPS (2,628 cells/minute).

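For multi-line scalability with a 25 FPS per-line requirement, a single GPU can safely serve three concurrent lines with Lightweight YOLOv11n (84.8/25 ≈ 3.4) or YOLOv11n (78.7/25 ≈ 3.1), but only one line with YOLOv12n (43.8/25 ≈ 1.8). At 25 FPS per line, three lines correspond to 4,500 inspected cells/minute per GPU, compared with 1,500 cells/minute for the single line that YOLOv12n can serve. The Lightweight model also keeps the largest margin above the 75 FPS three-line demand (about 9.8 FPS, versus 3.7 FPS for YOLOv11n). These throughput calculations indicate that the Lightweight model’s efficiency gains translate directly to increased production capacity, with three times as many lines served per GPU as YOLOv12n.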
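
The number of lines a single GPU can serve follows from the effective FPS divided by the per-line requirement, rounded down. The sketch below applies the 80% operational efficiency and 25 FPS per-line figures from the text to the theoretical FPS values reported earlier; the function name is illustrative.

```python
import math

OPERATIONAL_EFFICIENCY = 0.80  # assumed efficiency stated in the text
FPS_PER_LINE = 25              # per-line requirement stated in the text

# Theoretical FPS values reported in the text.
THEORETICAL_FPS = {
    "YOLOv11n": 98.4,
    "YOLOv12n": 54.8,
    "Lightweight YOLOv11n": 106.1,
}


def supported_lines(theoretical_fps: float) -> tuple:
    """Return (effective FPS, number of whole lines one GPU can serve)."""
    effective = theoretical_fps * OPERATIONAL_EFFICIENCY
    return effective, math.floor(effective / FPS_PER_LINE)


capacity = {name: supported_lines(fps) for name, fps in THEORETICAL_FPS.items()}
for name, (effective, lines) in capacity.items():
    print(f"{name}: {effective:.1f} effective FPS, {lines} line(s) at {FPS_PER_LINE} FPS each")
```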

Fig. 9.

Confusion Matrix

Critical Performance Assessment

The Lightweight YOLOv11n successfully achieves its design objectives of balancing detection accuracy with computational efficiency. The theoretical 106.1 FPS substantially exceeds typical industrial requirements of 30–60 FPS, providing ample margin for system integration overhead. The F1-score of 0.9791 surpasses the industry-standard quality threshold of 0.95, ensuring reliable defect detection. The 48.4% processing time reduction compared to YOLOv12n enables deployment on edge devices with limited computational resources, while the accuracy sacrifice remains below 0.5 pp, a negligible trade-off for the substantial efficiency gains.

Fig. 10.

Lightweight – Defect Prediction Results

Despite the high overall detection performance, systematic analysis of failure cases provides valuable insights into model limitations and opportunities for future improvements. Fig. 11 illustrates three representative failure modes encountered during testing, each revealing distinct challenges in the detection pipeline.

Fig. 11.

Failed Cases

Case 1 demonstrates a false negative scenario where a visible leakage defect was not detected by the model. This failure occurred at the image boundary where the defect region was only partially visible, resulting in incomplete visual features that fell below the detection confidence threshold. Such edge-case failures represent approximately 1.2% of test cases and highlight the importance of ensuring complete defect visibility during image acquisition.

Case 2 exhibits a false negative where the model incorrectly predicted a “Pass” classification (confidence 0.83) alongside the correct leakage detection (confidence 0.47). This dual prediction indicates model uncertainty when distinguishing between genuine defects and benign surface artifacts under challenging illumination conditions. The reflective aluminum surface creates specular highlights that can mimic or obscure defect patterns, particularly for subtle leakage manifestations. Such ambiguous cases account for approximately 0.5% of detections.

Case 3 represents a false negative for an extremely small or low-contrast defect that went undetected despite its potential presence on the cell surface. This failure mode is attributed to the 640×640 input resolution, which may be insufficient for resolving micro-scale defects below approximately 10 μm in the original 1024×1024 imagery. The uniform appearance of the cell surface provides minimal textural cues for feature extraction, with such cases representing less than 0.8% of the test set.

Across the entire test set, failure cases collectively represent less than 2.5% of samples. False negatives occur primarily for defects at image boundaries or with dimensions below 10 μm. False positives arise from dust particles, surface reflections, or residue mimicking defect patterns. The confusion matrix in Fig. 9 confirms minimal misclassification between morphologically similar categories, with scratch-pinhole confusion occurring in fewer than 2% of detected defects due to their similar linear geometry.

These failure patterns indicate clear directions for model enhancement: multi-resolution processing architectures could address small-defect detection limitations, boundary-aware augmentation strategies could reduce edge-case failures, and refined attention mechanisms could improve discrimination under challenging lighting conditions. Nevertheless, the low overall failure rate validates the model's suitability for production deployment, where such edge cases can be flagged for manual secondary inspection.

These validated metrics confirm that the Lightweight YOLOv11n provides an optimal balance for high-speed battery manufacturing inspection. The model delivers near-state-of-the-art detection performance while maintaining the computational efficiency necessary for real-time, multi-line deployment in production environments. The successful optimization demonstrates that targeted architectural modifications can achieve significant computational savings without compromising the detection capabilities critical for quality control in battery manufacturing.

CONCLUSION

This study successfully developed and validated a lightweight YOLOv11n-based vision inspection system for real-time surface defect detection in lithium-ion battery pouch cells, achieving an optimal balance between detection accuracy and computational efficiency. The proposed Lightweight YOLOv11n model, optimized through systematic architectural modifications including channel-width reduction, depthwise-separable convolutions, and magnitude-based pruning, achieved an F1-score of 0.9791 while reducing inference time to 9.43 ms—a 48.4% improvement over YOLOv12n. The model demonstrated robust detection capabilities across five defect categories (pass, leakage, pinhole, swelling, scratch) with mAP@0.5 of 0.9883, validating its effectiveness for industrial deployment.

The research contributions extend beyond performance metrics to address critical challenges in battery manufacturing quality control. The lightweight architecture enables deployment on resource-constrained edge devices while maintaining detection accuracy above industry-standard thresholds, directly addressing the scalability requirements of modern production facilities. The comprehensive dataset of 15,312 high-resolution images and the strategic augmentation pipeline provide a foundation for training robust models capable of handling the morphological complexity and class imbalance inherent in pouch cell defect detection. These technical advances translate to tangible manufacturing benefits: reduced false positive rates, consistent 24/7 inspection capability, and elimination of the operator-dependent variability that plagues traditional inspection methods.

Despite these achievements, several limitations warrant acknowledgment and present opportunities for future research. The current model's performance on extremely small defects (<10 μm) remains constrained by the 640×640 input resolution, suggesting potential benefits from multi-resolution processing architectures. The dataset, while substantial, represents a single manufacturing facility’s production characteristics, and cross-facility validation would strengthen generalization claims. Future investigations should explore structured pruning techniques and quantization-aware training to achieve INT8 deployment without accuracy degradation. Additionally, integrating transformer-based attention mechanisms could enhance small defect detection while maintaining real-time performance. Expanding the defect taxonomy to include subsurface anomalies through multi-modal fusion of optical and thermal imaging represents another promising research direction. Finally, developing automated hyperparameter optimization frameworks specific to defect morphology could further improve detection performance across diverse manufacturing conditions, ultimately advancing the reliability and safety of lithium-ion battery production systems.

References

1. Nitta N, Wu F, Lee J. T, Yushin G. Mater. Today 2015;18(5):252–264.
2. Armand M, Tarascon J.-M. Nature 2008;451:652–657.
3. Scrosati B, Garche J. J. Power Sources 2010;195(9):2419–2430.
4. Waldmann T, Hogg B.-I, Wohlfahrt-Mehrens M. J. Power Sources 2018;384:107–124.
5. Andre D, Meiler M, Steiner K, Walz H, Soczka-Guth T, Sauer D. U. J. Power Sources 2011;196(12):5334–5341.
6. Fang X, Luo Q, Zhou B, Li C, Tian L. Sensors 2020;20(18):5136.
7. Phillips A, Ulsh M, Porter J, Bender G. Fuel Cells 2017;17(3):288–298.
8. Ouyang D, Chen M, Huang Q, Weng J, Wang Z, Wang J. Appl. Sci. 2019;9(12):2483.
9. Hoffmann L, Kasper M, Kahn M, Gramse G, Silva G. V, Herrmann C, Kurrat M, Kienberger F. Batteries 2021;7(4):64.
10. Chen Y, Shu Y, Li X, Xiong C, Cao S, Wen X, Xie Z. J. Intell. Fuzzy Syst. 2021;41(3):4327–4335.
11. Mohanty D, Hockaday E, Li J, Hensley D. K, Daniel C. J. Power Sources 2016;312:70–79.
12. Fang X, Liu L, Yang C, Sun Y. IEEE Trans. Instrum. Meas. 2020;69(3):626–644.
13. Lv X, Duan F, Jiang J.-J, Fu X, Gan L. Sensors 2020;20(6):1650.
14. Kujawińska A, Vogt K. MPER 2015;6(2):25–31.
15. Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger IEEE CVPR; 2017. p. 7263–7271.
16. Wang C.-Y, Bochkovskiy A, Liao H.-Y. M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors IEEE CVPR; 2023.
17. Shafiee M. J, Chywl B, Li F, Wong A. arXiv 2017.
18. Huang L.-P, Hsu Q.-C, Liu B.-H, Lin C.-F, Chen C.-H. Metals 2023;13(5):861.
19. Weimer D, Scholz-Reiter B, Shpitalni M. CIRP Annals 2016;65(1):417–420.
20. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation IEEE CVPR; 2014.
21. Girshick R. Fast R-CNN IEEE CVPR; 2015. p. 1440–1448.
22. Ren S, He K, Girshick R, Sun J. IEEE Trans. Pattern Anal. Mach. Intell. 2017;39(6):1137–1149.
23. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. ECCV 2020;2020:213–219.
24. Zhao Y, Lv W, Xu S, Wei J, Wang G, Dang Q, Liu Y, Chen J. DETRs Beat YOLOs on Real-time Object Detection IEEE CVPR; 2024. p. 16965–16974.
25. Tan M, Le Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Proceedings of the 36th International Conference on Machine Learning; 2019. p. 6105–6114.
26. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; 2015. p. 234–241.
27. Jocher G, Jing Q, Chaurasia A. Ultralytics YOLO11, GitHub repository; 2024.
28. Howard A. G, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H. arXiv 2017.
29. Wang C.-Y, Liao H.-Y. M, Wu Y.-H, Chen P.-Y, Hsieh J.-W, Yeh I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN IEEE CVPR; 2020. p. 390–391.
30. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C.-Y, Berg A. C. SSD: Single Shot MultiBox Detector ECCV 2016; 2016.
31. Han S, Pool J, Tran J, Dally W. J. arXiv 2015.
32. Ruder S. arXiv 2017.
33. Loshchilov I, Hutter F. arXiv 2017.
34. Bochkovskiy A, Wang C.-Y, Liao H.-Y. M. arXiv 2020.
35. Zhang H, Cisse M, Dauphin Y. N, Lopez-Paz D. arXiv 2017.
36. Zhong Z, Zheng L, Kang G, Li S, Yang Y. Proceedings of the AAAI conference on artificial intelligence 2020;34(7):13001–13008.
37. Micikevicius P, Narang S, Alben J, Diamos G, Elsen E, Garcia D, Ginsburg B, Houston M, Kuchaiev O, Venkatesh G, Wu H. arXiv 2018.

Fig. 1.

Defect labels training batch image

Fig. 2.

Comparison of YOLOv11n and Lightweight YOLOv11n

Table 1.

Validated Performance Metrics of YOLO Models.

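| Metric                | YOLOv11n | YOLOv12n | Lightweight YOLOv11n |
|-----------------------|----------|----------|----------------------|
| Precision             | 0.9765   | 0.9851   | 0.9781               |
| Recall                | 0.9728   | 0.9824   | 0.9802               |
| F1-Score              | 0.9746   | 0.9838   | 0.9791               |
| mAP@50                | 0.9870   | 0.9889   | 0.9883               |
| mAP@50-95             | 0.8564   | 0.8718   | 0.8456               |
| Process duration (ms) | 10.16    | 18.26    | 9.43                 |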