Course Outline
Performance Concepts and Metrics
- Latency, throughput, power usage, resource utilization
- System vs model-level bottlenecks
- Profiling for inference vs training
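As a rough illustration of the latency and throughput metrics listed above, the sketch below times a callable and reports median latency, a simple 99th-percentile latency, and samples per second. It uses only the Python standard library; `measure` and `fake_inference` are hypothetical names, and the workload is a CPU stand-in rather than a real model. On an accelerator, execution is often asynchronous, so a real measurement would also need a device synchronization before each timer stop.

```python
import statistics
import time


def measure(fn, batch_size, warmup=5, iters=50):
    """Time fn() and return latency percentiles (ms) and throughput (samples/s)."""
    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    samples.sort()
    total = sum(samples)
    # Simple nearest-rank estimate of the 99th percentile.
    p99 = samples[min(len(samples) - 1, int(0.99 * len(samples)))]
    return {
        "p50_ms": statistics.median(samples) * 1e3,
        "p99_ms": p99 * 1e3,
        "throughput": batch_size * iters / total,
    }


def fake_inference():
    # Hypothetical stand-in for a model call; replace with a real inference call.
    sum(i * i for i in range(10_000))


stats = measure(fake_inference, batch_size=8)
print(stats)
```

Reporting a tail percentile alongside the median matters for inference, where occasional slow requests are usually what breaks a latency target.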
Profiling on Huawei Ascend
- Using CANN Profiler and MindInsight
- Kernel and operator diagnostics
- Offload patterns and memory mapping
Profiling on Biren GPU
- Biren SDK performance monitoring features
- Kernel fusion, memory alignment, and execution queues
- Power and temperature-aware profiling
Profiling on Cambricon MLU
- BANGPy and Neuware performance tools
- Kernel-level visibility and log interpretation
- MLU profiler integration with deployment frameworks
Graph and Model-Level Optimization
- Graph pruning and quantization strategies
- Operator fusion and computational graph restructuring
- Input size standardization and batch tuning
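Batch tuning, as listed above, usually means sweeping candidate batch sizes and keeping the one with the best throughput that still meets a latency budget. The sketch below shows that selection logic in plain Python. The function names and the linear cost model (`toy_latency_ms`) are assumptions made for illustration; in practice the latency function would be a real measurement on the target device.

```python
def pick_batch_size(latency_ms, candidates, budget_ms):
    """Return (batch, throughput) with the highest throughput within the budget."""
    best = None
    for batch in candidates:
        lat = latency_ms(batch)
        if lat > budget_ms:
            continue  # this batch size violates the latency budget
        throughput = batch / (lat / 1e3)  # samples per second
        if best is None or throughput > best[1]:
            best = (batch, throughput)
    return best


def toy_latency_ms(batch):
    # Hypothetical cost model: 4 ms fixed overhead plus 0.5 ms per sample.
    return 4.0 + 0.5 * batch


best = pick_batch_size(toy_latency_ms, [1, 2, 4, 8, 16, 32, 64], budget_ms=15.0)
print(best)
```

With this toy model, larger batches amortize the fixed overhead, so the largest batch that fits the 15 ms budget (16) wins.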
Memory and Kernel Optimization
- Optimizing memory layout and reuse
- Efficient buffer management across chipsets
- Kernel-level tuning techniques per platform
Cross-Platform Best Practices
- Performance portability: abstraction strategies
- Building shared tuning pipelines for multi-chip environments
- Example: tuning an object detection model across Ascend, Biren, and MLU
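One way to picture a shared tuning pipeline is a single sweep loop that treats each chip as an interchangeable backend. The sketch below is a minimal version under stated assumptions: each backend is just a Python function from batch size to latency, and the three lambdas use made-up numbers, not measurements of Ascend, Biren, or MLU hardware. No vendor API is called; a real adapter would wrap each platform's own tooling behind the same signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    backend: str
    batch_size: int
    latency_ms: float


# A backend maps a batch size to a latency in milliseconds.
Backend = Callable[[int], float]


def tune(backends: Dict[str, Backend], batch_sizes: List[int]) -> Dict[str, Result]:
    """Run the same sweep on every backend; keep the lowest per-sample latency."""
    best: Dict[str, Result] = {}
    for name, run in backends.items():
        for batch in batch_sizes:
            lat = run(batch)
            cur = best.get(name)
            if cur is None or lat / batch < cur.latency_ms / cur.batch_size:
                best[name] = Result(name, batch, lat)
    return best


# Hypothetical placeholder backends with synthetic cost models.
backends = {
    "ascend": lambda b: 3.0 + 0.40 * b,
    "biren": lambda b: 5.0 + 0.30 * b,
    "mlu": lambda b: 2.0 + 0.60 * b,
}

report = tune(backends, [1, 8, 32])
for result in report.values():
    print(result)
```

Keeping the sweep logic separate from the per-platform adapters means the selection criterion can change in one place without touching any platform-specific code.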
Summary and Next Steps
Requirements
- Experience working with AI model training or deployment pipelines
- Understanding of GPU/MLU compute principles and model optimization
- Basic familiarity with performance profiling tools and metrics
Audience
- Performance engineers
- Machine learning infrastructure teams
- AI system architects
21 hours