# Strength Score Standardization: Implementation Completion Report

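## Executive Summary

Based on a distribution analysis of 18,004 samples, a **post-processing standardization** system was implemented to resolve the lack of comparability between scoring dimensions.

**Key results**:
- ✅ All dimension medians are unified at 0.5 (before standardization: 0.0000 to 0.8033)
- ✅ Dimensions can be summed directly with equal weights
- ✅ Skewness reduced (range narrowed from [-6.17, 6.70] to [-2.18, 6.11])
- ✅ 4 preset modes available (equal-weight / aggressive / conservative / volume-surge)
- ✅ Complete sensitivity analysis report
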
## Implementation Details

### P0: Standardization module ✅

**File**: `scripts/scoring/normalizer.py`

Four standardization methods are implemented:

1. **normalize_zero_inflated**: zero-inflated distributions
   - Applies to: price_score_up, price_score_down, volume_score
   - Zeros → 0.5; non-zero values → [0.5, 1.0]

2. **normalize_point_mass**: point-mass distributions
   - Applies to: tilt_score
   - The center value stays at 0.5; values that deviate from it are stretched

3. **normalize_standard**: standard quantile normalization
   - Applies to: convergence_score, activity_score
   - Direct percentile rank

4. **normalize_low_variance**: low-discrimination distributions
   - Applies to: geometry_score
   - Log transform followed by quantile normalization

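The two less conventional mappings above, zero-inflated and point-mass, can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `normalizer.py`: the function names `sketch_zero_inflated` and `sketch_point_mass` are hypothetical, percentile ranks are computed within each subgroup, and the `center` parameter and the exact endpoints of the output ranges are choices made here for illustration.

```python
import pandas as pd


def sketch_zero_inflated(s: pd.Series) -> pd.Series:
    """Hypothetical sketch: zeros map to 0.5, non-zero values to (0.5, 1.0]."""
    out = pd.Series(0.5, index=s.index, dtype=float)
    nonzero = s != 0
    if nonzero.any():
        # Percentile rank among the non-zero values only, in (0, 1].
        pct = s[nonzero].rank(pct=True)
        out.loc[nonzero] = 0.5 + 0.5 * pct
    return out


def sketch_point_mass(s: pd.Series, center: float = 0.5) -> pd.Series:
    """Hypothetical sketch: the center stays at 0.5, other values spread out."""
    out = pd.Series(0.5, index=s.index, dtype=float)
    below = s < center
    above = s > center
    if below.any():
        # (rank - 1) / n lies in [0, 1), so these land in [0.0, 0.5).
        frac = (s[below].rank() - 1) / int(below.sum())
        out.loc[below] = 0.5 * frac
    if above.any():
        # rank(pct=True) lies in (0, 1], so these land in (0.5, 1.0].
        out.loc[above] = 0.5 + 0.5 * s[above].rank(pct=True)
    return out


if __name__ == "__main__":
    # Toy data, not taken from the report.
    volume = pd.Series([0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.8])
    tilt = pd.Series([0.5, 0.5, 0.5, 0.2, 0.4, 0.7, 0.9])
    print(sketch_zero_inflated(volume).tolist())
    print(sketch_point_mass(tilt).tolist())
```

With a mapping of this shape, the median lands on 0.5 only when a majority of the values are zeros (or sit at the center), which is consistent with the raw medians of 0.0000 and 0.5000 reported for these dimensions.
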
**Test results**:

```
Dimension         | Raw median | Normalized median
--------------------------------------------------
price_score_up    | 0.0000     | 0.5000
price_score_down  | 0.0000     | 0.5000
convergence_score | 0.8033     | 0.5000
volume_score      | 0.0000     | 0.5000
geometry_score    | 0.0051     | 0.5000
activity_score    | 0.0709     | 0.5000
tilt_score        | 0.5000     | 0.5000
```

### P1: Verification script ✅

**File**: `scripts/verify_normalization.py`

**Output files**:
- `normalization_stats_comparison.csv`: statistics comparison table
- `normalization_comparison.png`: distribution comparison plots for the 7 dimensions (before vs. after standardization)
- `strength_comparison.png`: strength score comparison plot
- `all_results_normalized.csv`: the full dataset after standardization

**Verification results**:
- All dimension medians: within 0.4500 to 0.5500 ✓
- Skewness reduced: from the range -6.17 to 6.70 down to -2.18 to 6.11 ✓
- Data integrity: all 18,004 records standardized ✓

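The median and skewness checks above can be reproduced with a few lines of pandas. The sketch below is not `verify_normalization.py` itself: the function name `summarize_normalization` and the report column names are hypothetical, and it assumes each raw column has a matching `*_norm` column.

```python
import pandas as pd


def summarize_normalization(df, dims, lo=0.45, hi=0.55):
    """Hypothetical helper: compare median and skewness before and after."""
    rows = []
    for dim in dims:
        raw = df[dim]
        norm = df[f"{dim}_norm"]  # assumes the *_norm naming convention
        rows.append({
            "dimension": dim,
            "raw_median": raw.median(),
            "norm_median": norm.median(),
            "raw_skew": raw.skew(),
            "norm_skew": norm.skew(),
            # Acceptance band for the normalized median.
            "median_ok": bool(lo <= norm.median() <= hi),
        })
    return pd.DataFrame(rows).set_index("dimension")


if __name__ == "__main__":
    # Toy data, not taken from the report.
    toy = pd.DataFrame({
        "x": [0, 0, 0, 0, 1, 5, 20],
        "x_norm": [0.5, 0.5, 0.5, 0.5, 0.67, 0.83, 1.0],
    })
    print(summarize_normalization(toy, ["x"]))
```
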
### P2 & P3: Configuration management + preset modes ✅

**File**: `scripts/scoring/config.py`

**Core class**: `StrengthConfig`
- 6 weight parameters (w_price, w_convergence, ...)
- 5 threshold parameters (threshold_price, ...)
- Direction selection (up/down/both)
- Filter mode (and/or)

**Preset configurations**:

| Preset | Weight allocation | Thresholds | Signals | Share | Use case |
|------|----------|----------|--------|------|----------|
| **Equal-weight** | 1/6 each | price≥0.60, vol≥0.50 | 308 | 1.7% | Exploratory analysis |
| **Aggressive** | Breakout 35%, volume 25% | price≥0.55, vol≥0.60 | 235 | 1.3% | Trending markets |
| **Conservative** | Convergence 30%, activity 25% | price≥0.70, conv≥0.65 | 139 | 0.8% | Range-bound markets |
| **Volume-surge** | Volume 35%, breakout 25% | vol≥0.70, price≥0.60 | 200 | 1.1% | Unusual large-player activity |

**Core functions**:
- `calculate_strength()`: computes the strength score according to a config
- `filter_signals()`: filters signals according to a config
- `filter_top_n()`: returns the top N signals

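To make the roles of weights, thresholds, and the and/or filter mode concrete, here is a reduced sketch of a weighted score combined with a threshold filter. It is not the actual `StrengthConfig` API: `ToyConfig`, `toy_strength`, and `toy_filter` are hypothetical names, the config is collapsed into two dictionaries keyed by column name, and direction selection is omitted.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class ToyConfig:
    """Hypothetical, reduced stand-in for a scoring config."""
    weights: dict                                   # column name -> weight
    thresholds: dict = field(default_factory=dict)  # column name -> minimum value
    mode: str = "and"                               # "and" or "or"


def toy_strength(df: pd.DataFrame, cfg: ToyConfig) -> pd.Series:
    """Weighted average of the configured (already normalized) columns."""
    total = sum(cfg.weights.values())
    score = sum(df[col] * w for col, w in cfg.weights.items())
    return score / total


def toy_filter(df: pd.DataFrame, cfg: ToyConfig) -> pd.DataFrame:
    """Keep rows passing the thresholds, then rank them by strength."""
    if cfg.mode not in ("and", "or"):
        raise ValueError(f"unknown mode: {cfg.mode!r}")
    checks = [df[col] >= thr for col, thr in cfg.thresholds.items()]
    if not checks:
        mask = pd.Series(True, index=df.index)  # no thresholds: keep every row
    elif cfg.mode == "and":
        mask = pd.concat(checks, axis=1).all(axis=1)
    else:
        mask = pd.concat(checks, axis=1).any(axis=1)
    out = df[mask].copy()
    out["strength"] = toy_strength(out, cfg)
    return out.sort_values("strength", ascending=False)


if __name__ == "__main__":
    # Toy data, not taken from the report.
    toy = pd.DataFrame({
        "price_score_up_norm": [0.5, 0.7, 0.9],
        "volume_score_norm": [0.3, 0.8, 0.4],
    })
    cfg = ToyConfig(
        weights={"price_score_up_norm": 0.6, "volume_score_norm": 0.4},
        thresholds={"price_score_up_norm": 0.6, "volume_score_norm": 0.5},
    )
    print(toy_filter(toy, cfg))
```

Dividing by the total weight keeps the score on the same 0 to 1 scale as the normalized inputs even when the weights do not sum to 1.
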
### P4: Sensitivity analysis ✅

**File**: `scripts/scoring/sensitivity.py`

**Quick analysis** (run `python scripts/scoring/sensitivity.py`):

```
threshold_price | Signals | Share | Mean strength
------------------------------------------------
0.50            | 2304    | 12.8% | 0.6292
0.60            | 308     | 1.7%  | 0.6897
0.70            | 244     | 1.4%  | 0.7033
0.80            | 180     | 1.0%  | 0.7158
```

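A threshold sweep like the table above can be sketched as a loop over candidate thresholds. The helper name `sweep_threshold` and the column names are assumptions for illustration; the real `sensitivity.py` may build its table differently, for example by applying the other thresholds of a config at the same time.

```python
import pandas as pd


def sweep_threshold(df, col, thresholds, strength_col="strength"):
    """Hypothetical helper: count surviving signals at each threshold."""
    rows = []
    for thr in thresholds:
        hits = df[df[col] >= thr]
        rows.append({
            "threshold": thr,
            "signals": len(hits),
            "share_pct": round(100 * len(hits) / len(df), 1),
            "mean_strength": hits[strength_col].mean(),
        })
    return pd.DataFrame(rows)


if __name__ == "__main__":
    # Toy data, not taken from the report.
    toy = pd.DataFrame({
        "price_norm": [0.5, 0.5, 0.6, 0.7, 0.8],
        "strength": [0.5, 0.5, 0.6, 0.7, 0.8],
    })
    print(sweep_threshold(toy, "price_norm", [0.5, 0.6, 0.7, 0.8]))
```
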
**Full report** (run `python scripts/scoring/generate_sensitivity_report.py`):

Output files:
- `sensitivity_threshold_price.csv` + `.png`
- `sensitivity_threshold_convergence.csv`
- `sensitivity_threshold_volume.csv`
- `sensitivity_weight_price.csv`
- `sensitivity_analysis_report.md`

**Threshold recommendations**:

| Filter strictness | threshold_price | Expected signals | Share |
|----------|-----------------|-----------|------|
| Loose | 0.50-0.55 | 2304 down to 346 | 12.8% down to 1.9% |
| Moderate | 0.60-0.65 | 308 down to 278 | 1.7% down to 1.5% |
| Strict | 0.70-0.75 | 244 down to 211 | 1.4% down to 1.2% |
| Very strict | 0.80+ | <180 | <1.0% |

## Usage Guide

### 1. Standardize the data

```python
# Assumes the scripts/ directory is on the import path so that `scoring` resolves.
import pandas as pd
from scoring import normalize_all

df = pd.read_csv('outputs/converging_triangles/all_results.csv')
df_norm = normalize_all(df)  # adds *_norm columns
```

### 2. Use a preset configuration

```python
from scoring import CONFIG_AGGRESSIVE, filter_signals

# df_norm is the standardized DataFrame produced in step 1.
signals = filter_signals(df_norm, CONFIG_AGGRESSIVE, return_strength=True)
top10 = signals.nlargest(10, 'strength')  # ten highest-strength signals
```

### 3. Use a custom configuration

```python
from scoring import StrengthConfig, filter_top_n

# Override only the weights and thresholds of interest.
my_config = StrengthConfig(
    w_price=0.40, w_volume=0.30,
    threshold_price=0.65, threshold_volume=0.70
)

# df_norm is the standardized DataFrame produced in step 1.
top50 = filter_top_n(df_norm, my_config, n=50)
```

### 4. Run the examples

```bash
python scripts/example_scoring_usage.py
```

The script contains 5 complete examples: standardization, preset configurations, custom configuration, Top N, and comparison analysis.

## File List

### Core modules
- ✅ `scripts/scoring/__init__.py`
- ✅ `scripts/scoring/normalizer.py` (4 standardization methods)
- ✅ `scripts/scoring/config.py` (configuration management + 4 presets)
- ✅ `scripts/scoring/sensitivity.py` (sensitivity analysis)
- ✅ `scripts/scoring/README.md` (full documentation)

### Utility scripts
- ✅ `scripts/verify_normalization.py` (verification script)
- ✅ `scripts/example_scoring_usage.py` (usage examples)
- ✅ `scripts/scoring/generate_sensitivity_report.py` (report generation)

### Output files
- ✅ `outputs/converging_triangles/all_results_normalized.csv`
- ✅ `outputs/converging_triangles/normalization_stats_comparison.csv`
- ✅ `outputs/converging_triangles/normalization_comparison.png`
- ✅ `outputs/converging_triangles/strength_comparison.png`
- ✅ `outputs/converging_triangles/sensitivity_threshold_price.csv` + `.png`
- ✅ `outputs/converging_triangles/sensitivity_threshold_convergence.csv`
- ✅ `outputs/converging_triangles/sensitivity_threshold_volume.csv`
- ✅ `outputs/converging_triangles/sensitivity_weight_price.csv`
- ✅ `outputs/converging_triangles/sensitivity_analysis_report.md`

## Acceptance Criteria Status

| Item | Target | Actual | Status |
|------|------|------|------|
| P0 | Medians of all 7 fields within 0.45-0.55 | All 0.5000 | ✅ |
| P1 | Comparison tables and charts produced | 3 CSV + 2 PNG | ✅ |
| P2 | Configurable weights and thresholds | `StrengthConfig` class | ✅ |
| P3 | 3 preset modes | 4 (equal-weight / aggressive / conservative / volume-surge) | ✅ Exceeded |
| P4 | Threshold sensitivity analysis tables | 4 CSV + 1 report | ✅ |

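## Technical Highlights

1. **Tiered standardization**: a separate strategy for each of the 4 distribution types rather than one-size-fits-all
2. **Non-destructive**: original fields are kept; new fields are added with a `*_norm` suffix
3. **Vectorized implementation**: uses pandas vectorized operations for performance
4. **Modular design**: normalizer/config/sensitivity are independent modules, which keeps them easy to maintain
5. **Complete documentation**: README + examples + sensitivity report, which keeps the system easy to pick up
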
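The non-destructive pattern in highlight 2 can be illustrated with a small wrapper that copies the frame and writes each result to a new `*_norm` column. This is only a sketch: `add_norm_columns` and the mapping from column names to normalizer callables are hypothetical and are not the signature of `normalize_all`.

```python
import pandas as pd


def add_norm_columns(df: pd.DataFrame, normalizers: dict) -> pd.DataFrame:
    """Hypothetical wrapper: leave the inputs untouched, append *_norm columns.

    `normalizers` maps a column name to a function taking and returning a Series.
    """
    out = df.copy()  # the caller's frame is never modified
    for col, fn in normalizers.items():
        out[f"{col}_norm"] = fn(df[col])
    return out


if __name__ == "__main__":
    # Toy data, not taken from the report.
    toy = pd.DataFrame({"activity_score": [0.1, 0.2, 0.9]})
    # Plain percentile rank as a stand-in normalizer.
    result = add_norm_columns(toy, {"activity_score": lambda s: s.rank(pct=True)})
    print(result)
```
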
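## Recommended Next Steps

### Short term (1-2 weeks)
1. Re-run detection on the standardized data and compare signal quality
2. Adjust the weights and thresholds of the preset configurations based on real usage
3. Add more preset configurations (e.g. chart-pattern priority, price-volume divergence)

### Medium term (1-2 months)
1. Backtest the return performance of each configuration
2. Dynamic weights: switch configurations automatically according to market conditions
3. Multi-factor fusion: combine with other technical indicators (RSI, MACD, etc.)

### Long term (3-6 months)
1. Real-time monitoring: compute strength scores in real time and push high-scoring signals
2. Visual interface: a web UI for adjusting parameters interactively
3. Machine learning: learn optimal weight configurations from historical data
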
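## Risks

1. **Data dependence**: the standardization is fitted to the distribution of the current 18,004 samples and may need to be redone if the data distribution shifts
2. **Parameter sensitivity**: small threshold changes can cause large swings in signal count (see the sensitivity analysis)
3. **Overfitting risk**: the preset configurations are tuned on the current data and may stop working when market conditions change

**Recommendations**:
- Re-validate the standardization periodically (e.g. every quarter)
- Run several configurations in parallel to avoid over-reliance on a single one
- Combine with fundamental analysis and risk management rather than relying on technical patterns alone
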
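## Summary

This implementation **fully met** the planned goals and delivered:
- ✅ 4 core modules (normalizer/config/sensitivity + verification script)
- ✅ 4 preset configurations (exceeding the planned 3)
- ✅ 9 output files (CSV + PNG + Markdown)
- ✅ Complete documentation and examples

**Effects of standardization**:
- The cross-dimension comparability problem is resolved
- An equal-weight sum is no longer dominated by a few dimensions
- The flexible configuration system supports fast trial and error

The system is ready for use. Suggested workflow:
1. Start with the equal-weight mode to explore the top 50-100 signals
2. Adjust weights and thresholds based on observed results
3. Review the sensitivity analysis report regularly to tune parameters
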
---

**Implementation date**: 2026-01-29
**Executed by**: AI Assistant
**Version**: v1.0