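# Strength Score Normalization System

## Overview

Built on 18,004 samples from converging-triangle pattern detection, this system addresses the problem that **scores are not comparable across dimensions**. It provides:

1. **Stratified normalization**: a separate normalization strategy for each of 4 distribution types (zero-inflated, point-mass, normal, low-discrimination)
2. **Flexible configuration**: configurable weights, thresholds, direction, and filter mode
3. **Preset modes**: 4 preset configurations (equal-weight, aggressive, conservative, volume-focused)
4. **Sensitivity analysis**: measures how parameter changes affect the filter results
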
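## Core Problem

Distribution problems **before normalization**:

| Dimension | Median | Distribution type | Problem |
|------|--------|----------|------|
| price_score_up | 0.0000 | Zero-inflated | Cannot distinguish "no breakout" from "small breakout" |
| price_score_down | 0.0000 | Zero-inflated | Same as above |
| volume_score | 0.0000 | Zero-inflated | Same as above |
| tilt_score | 0.5000 | Point-mass | 75% of values equal 0.5, so there is little discrimination |
| convergence_score | 0.8033 | Normal | Values run large and "swallow" the other dimensions in an equal-weight sum |
| geometry_score | 0.0051 | Low-discrimination | Values are tiny and get "swallowed" in an equal-weight sum |
| activity_score | 0.0709 | Normal | - |

**After normalization**: every dimension has a median of **0.5**, so the dimensions can be summed directly with equal weights.
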
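To make the dominance effect concrete, the short sketch below takes the raw medians from the table above and computes each dimension's share of an equal-weight sum for a hypothetical "median" sample screened in the upward direction. The dictionary and variable names are illustrative and are not part of the scoring module.

```python
# Raw medians copied from the table above. The upward direction is
# assumed, so price_score_down is left out of the sum.
raw_medians = {
    "price_score_up": 0.0000,
    "volume_score": 0.0000,
    "tilt_score": 0.5000,
    "convergence_score": 0.8033,
    "geometry_score": 0.0051,
    "activity_score": 0.0709,
}

total = sum(raw_medians.values())
shares = {name: value / total for name, value in raw_medians.items()}

for name, share in shares.items():
    print(f"{name:18s} {share:6.1%}")

# After normalization every median is 0.5, so each of the 6 dimensions
# contributes the same share to the sum.
normalized_share = 0.5 / (6 * 0.5)
print(f"normalized share per dimension: {normalized_share:.1%}")
```

With the raw medians, convergence_score alone accounts for roughly 58% of the sum and geometry_score for well under 1%. After normalization each dimension contributes one sixth.
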
## File Structure

```
scripts/
├── scoring/                              # Core module
│   ├── __init__.py                       # Module exports
│   ├── normalizer.py                     # Normalization (4 normalization methods)
│   ├── config.py                         # Configuration (presets + custom configs)
│   ├── sensitivity.py                    # Sensitivity analysis
│   └── generate_sensitivity_report.py    # Generates the full sensitivity report
│
├── verify_normalization.py               # Verifies the normalization results
└── example_scoring_usage.py              # Usage examples (5 examples)

outputs/converging_triangles/             # Output directory
├── all_results.csv                       # Raw data
├── all_results_normalized.csv            # Normalized data
├── normalization_stats_comparison.csv    # Statistics comparison
├── normalization_comparison.png          # Comparison chart
├── strength_comparison.png               # Strength score comparison
├── sensitivity_threshold_price.csv       # Sensitivity to the breakout threshold
├── sensitivity_threshold_convergence.csv # Sensitivity to the convergence threshold
├── sensitivity_threshold_volume.csv      # Sensitivity to the volume threshold
├── sensitivity_weight_price.csv          # Sensitivity to the breakout weight
└── sensitivity_analysis_report.md        # Sensitivity analysis report
```

## Quick Start

### 1. Normalize the raw data

```python
import pandas as pd

from scoring import normalize_all

# Load the raw data and keep only rows flagged as valid
df = pd.read_csv('outputs/converging_triangles/all_results.csv')
df = df[df['is_valid'] == True]

# Normalize
df_norm = normalize_all(df)

# df_norm now contains:
# - the original fields: price_score_up, convergence_score, ...
# - the normalized fields: price_score_up_norm, convergence_score_norm, ...
```

### 2. Filter signals with a preset configuration

```python
from scoring import CONFIG_EQUAL, CONFIG_AGGRESSIVE, filter_signals

# Equal-weight mode
signals_equal = filter_signals(df_norm, CONFIG_EQUAL, return_strength=True)
print(f"Equal-weight mode: {len(signals_equal)} signals")

# Aggressive mode (emphasizes breakout 35% + volume 25%)
signals_aggr = filter_signals(df_norm, CONFIG_AGGRESSIVE, return_strength=True)
print(f"Aggressive mode: {len(signals_aggr)} signals")

# Look at the top 10 by strength
top10 = signals_aggr.nlargest(10, 'strength')
```

### 3. Custom configuration

```python
from scoring import StrengthConfig, filter_signals

# Create a custom configuration (the six weights sum to 1.0)
my_config = StrengthConfig(
    name="My config",
    w_price=0.40,           # breakout weight 40%
    w_volume=0.30,          # volume weight 30%
    w_convergence=0.15,
    w_geometry=0.05,
    w_activity=0.05,
    w_tilt=0.05,
    threshold_price=0.65,   # breakout threshold
    threshold_volume=0.70,  # volume threshold (only active when > 0.5)
    direction='up',         # upward breakouts only
)

# Filter
signals = filter_signals(df_norm, my_config, return_strength=True)
```

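How the weights combine the normalized fields is not spelled out in this document. A plausible reading, offered only as a sketch, is a weighted sum of the `*_norm` fields in which `direction` selects `price_score_up_norm` or `price_score_down_norm`. The function `strength_sketch` and the plain-dict row are hypothetical. The weights are the ones from the custom configuration in this section.

```python
# Weights taken from the custom configuration in this section.
WEIGHTS = {
    "price": 0.40,
    "volume": 0.30,
    "convergence": 0.15,
    "geometry": 0.05,
    "activity": 0.05,
    "tilt": 0.05,
}


def strength_sketch(row, weights, direction="up"):
    """Weighted sum of normalized scores (assumed formula, not the module's code)."""
    # The breakout dimension depends on the direction being screened.
    price_field = f"price_score_{direction}_norm"
    return (
        weights["price"] * row[price_field]
        + weights["volume"] * row["volume_score_norm"]
        + weights["convergence"] * row["convergence_score_norm"]
        + weights["geometry"] * row["geometry_score_norm"]
        + weights["activity"] * row["activity_score_norm"]
        + weights["tilt"] * row["tilt_score_norm"]
    )


# One made-up sample: a strong upward breakout on high volume.
sample = {
    "price_score_up_norm": 0.9,
    "price_score_down_norm": 0.5,
    "volume_score_norm": 0.8,
    "convergence_score_norm": 0.6,
    "geometry_score_norm": 0.5,
    "activity_score_norm": 0.5,
    "tilt_score_norm": 0.5,
}

strength_up = strength_sketch(sample, WEIGHTS, direction="up")
strength_down = strength_sketch(sample, WEIGHTS, direction="down")
print(round(strength_up, 3), round(strength_down, 3))
```

Because the weights sum to 1 and every normalized field lies in [0, 1], a score built this way also stays in [0, 1], which keeps strengths comparable across configurations.
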
### 4. Get the top N signals

```python
from scoring import filter_top_n, CONFIG_EQUAL

# Get the 50 signals with the highest strength score
top50 = filter_top_n(df_norm, CONFIG_EQUAL, n=50)

# top50 includes a strength column and is sorted by strength, descending
```

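## Normalization Methods in Detail

### 1. Zero-inflated normalization (normalize_zero_inflated)

**Applies to**: price_score_up, price_score_down, volume_score

**Strategy**:
- Zero values (event did not occur) → 0.5 (neutral baseline)
- Non-zero values (event occurred) → mapped by rank into [0.5, 1.0]

**Rationale**: preserves the qualitative difference between zero and non-zero while keeping the quantitative differences among the non-zero values.
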
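The strategy above can be sketched in a few lines. This is a minimal illustration written with plain Python lists so that it runs without dependencies. The function name `normalize_zero_inflated_sketch` and the exact rank formula (share of non-zero values less than or equal to the current one) are assumptions, and the actual `normalize_zero_inflated` may differ in detail.

```python
from bisect import bisect_right


def normalize_zero_inflated_sketch(values):
    """Map zeros to 0.5 and rank non-zero values into (0.5, 1.0]."""
    nonzero = sorted(v for v in values if v != 0)
    out = []
    for v in values:
        if v == 0:
            # Event did not occur: neutral baseline.
            out.append(0.5)
        else:
            # Percentile rank among the non-zero values only, in (0, 1].
            pct = bisect_right(nonzero, v) / len(nonzero)
            out.append(0.5 + 0.5 * pct)
    return out


raw = [0.0, 0.0, 0.0, 0.02, 0.10, 0.05]
result = normalize_zero_inflated_sketch(raw)
print([round(x, 3) for x in result])  # [0.5, 0.5, 0.5, 0.667, 1.0, 0.833]
```

Ranking only among the non-zero values means the many zeros do not compress the scores of the samples where the event did occur.
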
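### 2. Point-mass normalization (normalize_point_mass)

**Applies to**: tilt_score

**Strategy**:
- Values at or near the center (0.5) → stay at 0.5
- Positive deviations (>0.5) → stretched to [0.5, 1.0]
- Negative deviations (<0.5) → stretched to [0.0, 0.5]

**Rationale**: 75% of the values are exactly 0.5 (symmetric triangles) and are left unchanged. The remaining 25% are stretched according to how far they deviate.
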
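The document does not say how the stretching is done. The sketch below assumes a linear rescaling by the largest deviation on each side of the center, which is only one possible choice (a rank-based mapping would also fit the description). The function name, the `center` default, and the `tol` tolerance are all hypothetical.

```python
def normalize_point_mass_sketch(values, center=0.5, tol=1e-9):
    """Keep the point mass at 0.5 and stretch each side linearly (assumed scheme)."""
    max_up = max((v - center for v in values if v - center > tol), default=0.0)
    max_down = max((center - v for v in values if center - v > tol), default=0.0)
    out = []
    for v in values:
        deviation = v - center
        if deviation > tol:
            # Largest positive deviation maps to 1.0.
            out.append(0.5 + 0.5 * deviation / max_up)
        elif deviation < -tol:
            # Largest negative deviation maps to 0.0.
            out.append(0.5 - 0.5 * (-deviation) / max_down)
        else:
            # Symmetric triangle: stays at the neutral value.
            out.append(0.5)
    return out


tilt = [0.5, 0.5, 0.5, 0.6, 0.55, 0.3]
result = normalize_point_mass_sketch(tilt)
print([round(x, 3) for x in result])  # [0.5, 0.5, 0.5, 1.0, 0.75, 0.0]
```

Each side is scaled independently, so a dataset with small upward tilts and large downward tilts still uses the full [0, 1] range.
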
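### 3. Standard quantile normalization (normalize_standard)

**Applies to**: convergence_score, activity_score

**Strategy**: convert each value directly to its percentile rank in [0, 1]

**Rationale**: the distribution is reasonably well behaved, so a direct ranking is enough.
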
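As an illustration of percentile ranking, the sketch below averages the ranks of tied values, which is also what pandas `Series.rank(pct=True)` does by default. It uses plain lists so it runs without dependencies. The function name is hypothetical, and whether `normalize_standard` handles ties this way is an assumption.

```python
from bisect import bisect_left, bisect_right


def percentile_rank_sketch(values):
    """Percentile rank with tied values sharing their average rank."""
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for v in values:
        lo = bisect_left(ordered, v)    # number of values strictly below v
        hi = bisect_right(ordered, v)   # number of values at or below v
        avg_rank = (lo + 1 + hi) / 2    # average of the 1-based ranks lo+1 .. hi
        out.append(avg_rank / n)
    return out


convergence = [0.80, 0.60, 0.95, 0.80]
result = percentile_rank_sketch(convergence)
print(result)  # [0.625, 0.25, 1.0, 0.625]
```
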
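### 4. Low-discrimination normalization (normalize_low_variance)

**Applies to**: geometry_score

**Strategy**:
1. Apply a log transform to widen the separation among small values
2. Apply quantile normalization

**Rationale**: the values are very low across the board (median 0.005), and the log1p transform is intended to pull small values apart.
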
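The two steps can be sketched as follows, again with plain lists. The function name is hypothetical, and the second step is assumed to be a simple percentile rank. The real `normalize_low_variance` may use a different quantile scheme.

```python
import math
from bisect import bisect_right


def normalize_low_variance_sketch(values):
    """log1p transform followed by a percentile rank (assumed second step)."""
    logged = [math.log1p(v) for v in values]
    ordered = sorted(logged)
    n = len(ordered)
    # Share of samples whose transformed value is at or below this one.
    return [bisect_right(ordered, x) / n for x in logged]


geometry = [0.001, 0.005, 0.0051, 0.02, 0.3]
result = normalize_low_variance_sketch(geometry)
print(result)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

One point worth keeping in mind: because `log1p` is monotonic, a purely rank-based second step gives the same result with or without the transform. The transform changes the output only if the second step depends on the spacing between values, for example min-max scaling.
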
## Preset Configurations

### CONFIG_EQUAL - equal-weight mode

```
Weight per dimension: 1/6 (about 16.7%)
Thresholds: price ≥ 0.60, convergence ≥ 0.50, volume ≥ 0.50 (at 0.50 the volume filter is off)
Use when: exploratory analysis, when it is unclear which dimension matters most
```

### CONFIG_AGGRESSIVE - aggressive mode

```
Weights: breakout 35%, volume 25%, convergence 15%, others 5-10%
Thresholds: price ≥ 0.55, volume ≥ 0.60
Use when: trending markets, prioritizing breakout strength and volume confirmation
```

### CONFIG_CONSERVATIVE - conservative mode

```
Weights: convergence 30%, activity 25%, breakout 15%, others 5-15%
Thresholds: price ≥ 0.70, convergence ≥ 0.65, activity ≥ 0.50
Use when: range-bound markets, prioritizing pattern quality and activity
```

### CONFIG_VOLUME_FOCUS - volume-focused mode

```
Weights: volume 35%, breakout 25%, convergence 15%, others 5-10%
Thresholds: volume ≥ 0.70, price ≥ 0.60
Use when: catching unusual moves by major players, where a clear volume surge is required
```

## Sensitivity Analysis

### Quick analysis

```bash
python scripts/scoring/sensitivity.py
```

Sample output:

```
threshold_price | Signals | Share  | Mean strength
------------------------------------------------
0.50            | 2304    | 12.8%  | 0.6292
0.60            | 308     | 1.7%   | 0.6897
0.70            | 244     | 1.4%   | 0.7033
0.80            | 180     | 1.0%   | 0.7158
```

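A table of this shape comes from sweeping one threshold while holding everything else fixed. The sketch below shows the idea on synthetic rows. It isolates a single price threshold, whereas the module's `analyze_threshold_sensitivity` takes a full config. The function name, the `price_norm` and `strength` keys, and the data are all made up for illustration.

```python
def threshold_sweep_sketch(rows, thresholds):
    """Count surviving signals and their mean strength for each threshold."""
    report = []
    for t in thresholds:
        kept = [r for r in rows if r["price_norm"] >= t]
        mean_strength = sum(r["strength"] for r in kept) / len(kept) if kept else None
        report.append({
            "threshold_price": t,
            "signals": len(kept),
            "share": len(kept) / len(rows),
            "mean_strength": mean_strength,
        })
    return report


# Synthetic rows for illustration only; not taken from all_results.csv.
rows = [
    {"price_norm": 0.50, "strength": 0.55},
    {"price_norm": 0.50, "strength": 0.60},
    {"price_norm": 0.62, "strength": 0.65},
    {"price_norm": 0.75, "strength": 0.70},
    {"price_norm": 0.90, "strength": 0.80},
]

report = threshold_sweep_sketch(rows, [0.50, 0.60, 0.70, 0.80])
for line in report:
    print(line)
```

The sharp drop between 0.50 and 0.60 in the sample output is consistent with the zero-inflated mapping: samples with no breakout sit exactly at 0.5, so any threshold above 0.5 removes all of them at once.
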
### Full analysis report

```bash
python scripts/scoring/generate_sensitivity_report.py
```

This generates:
- sensitivity_threshold_price.csv (plus a PNG chart)
- sensitivity_threshold_convergence.csv
- sensitivity_threshold_volume.csv
- sensitivity_weight_price.csv
- sensitivity_analysis_report.md (summary report)

## Threshold Recommendations

Based on the sensitivity analysis, the recommended settings are:

| Filter strictness | threshold_price | Expected signals | Share |
|----------|-----------------|-----------|------|
| Loose | 0.50-0.55 | 2000-350 | 11-2% |
| Moderate | 0.60-0.65 | 300-280 | 1.7-1.5% |
| Strict | 0.70-0.75 | 240-210 | 1.4-1.2% |
| Very strict | 0.80+ | <180 | <1.0% |

**Volume threshold**:
- ≤ 0.5: no volume filtering (suits range-bound markets)
- 0.60-0.70: moderate volume-surge requirement
- ≥ 0.75: high volume-surge requirement (may be too strict)

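The rule that a volume threshold of 0.5 or below switches the filter off can be illustrated with a small predicate. This is a sketch under assumptions: the function name and the plain-dict row are hypothetical, and the real `filter_signals` also applies other thresholds that are omitted here.

```python
def passes_thresholds_sketch(row, threshold_price, threshold_volume, direction="up"):
    """Apply the price threshold always and the volume threshold only when > 0.5."""
    if row[f"price_score_{direction}_norm"] < threshold_price:
        return False
    # A threshold at or below 0.5 is treated as "do not filter on volume".
    volume_filter_enabled = threshold_volume > 0.5
    if volume_filter_enabled and row["volume_score_norm"] < threshold_volume:
        return False
    return True


# A breakout with no volume surge (zero volume score normalizes to 0.5).
quiet_breakout = {"price_score_up_norm": 0.72, "volume_score_norm": 0.50}

loose = passes_thresholds_sketch(quiet_breakout, 0.60, 0.50)   # volume filter off
strict = passes_thresholds_sketch(quiet_breakout, 0.60, 0.70)  # volume filter on
print(loose, strict)  # True False
```
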
## Usage Examples

See `scripts/example_scoring_usage.py`, which contains 5 examples:

1. Basic normalization
2. Filtering signals with a preset configuration
3. Custom configuration
4. Getting the top N signals
5. Comparing the results of different configurations

Run:

```bash
python scripts/example_scoring_usage.py
```

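## Verifying the Normalization

```bash
python scripts/verify_normalization.py
```

Output:
- A statistics table comparing values before and after normalization
- Distribution comparison charts for the 7 dimensions
- A strength score comparison chart
- all_results_normalized.csv (the normalized data)
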
## API Reference

### Normalization module (normalizer.py)

```python
def normalize_all(df: pd.DataFrame) -> pd.DataFrame:
    """Apply stratified normalization to the 7 score fields in all_results.csv."""

def calculate_strength_equal_weight(df_normalized, direction='up') -> pd.Series:
    """Compute the equal-weight strength score."""
```

### Configuration module (config.py)

```python
class StrengthConfig:
    """Scoring configuration; see the usage below."""

# Create a config object (remaining arguments elided)
config = StrengthConfig(w_price=0.4, threshold_price=0.65, ...)

# Validate the config
config.validate()

# Print a summary
print(config.summary())

def calculate_strength(df_normalized, config) -> pd.Series:
    """Compute the composite strength score according to the config."""

def filter_signals(df_normalized, config, return_strength=False) -> pd.DataFrame:
    """Filter signals according to the config."""

def filter_top_n(df_normalized, config, n=100) -> pd.DataFrame:
    """Select the top N signals by strength score."""
```

### Sensitivity analysis (sensitivity.py)

```python
def analyze_threshold_sensitivity(df, config, param_name, param_range) -> pd.DataFrame:
    """Analyze sensitivity to a threshold parameter."""

def analyze_weight_sensitivity(df, config, weight_name, weight_range) -> pd.DataFrame:
    """Analyze sensitivity to a weight parameter."""

def generate_full_sensitivity_report(df, config, output_dir):
    """Generate the full sensitivity analysis report."""
```

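## Future Improvements

1. **Dynamic weights**: adjust weights automatically to the market regime (bull market vs. range-bound market)
2. **Multi-factor fusion**: combine with other technical indicators (RSI, MACD, etc.)
3. **Backtesting**: backtest the return performance of each configuration on historical data
4. **Real-time monitoring**: compute strength scores in real time and push high-scoring signals
5. **Visual interface**: adjust parameters interactively and preview the results live
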
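## Notes

1. **Filtering is enabled only when the threshold is > 0.5**: for volume_score_norm and geometry_score_norm, a threshold ≤ 0.5 means no filtering
2. **Weights must sum to 1**: in a custom configuration, make sure all weights add up to 1.0
3. **Range of normalized values**: all *_norm fields lie in [0, 1] (the zero-inflated dimensions use only [0.5, 1.0])
4. **Original fields are kept**: normalization does not modify the original fields; it adds new fields with the _norm suffix
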
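For note 2, a floating-point-safe check is preferable to comparing the sum with `== 1.0`. The helper below is a hypothetical sketch and is not the implementation of `config.validate()`, whose behavior is not documented here. The weights are those of the custom configuration in the Quick Start.

```python
import math


def weights_sum_to_one(weights, tol=1e-9):
    """Return True when the weights add up to 1.0 within a small tolerance."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=tol)


custom_weights = {
    "w_price": 0.40,
    "w_volume": 0.30,
    "w_convergence": 0.15,
    "w_geometry": 0.05,
    "w_activity": 0.05,
    "w_tilt": 0.05,
}
ok = weights_sum_to_one(custom_weights)

# Raising one weight without lowering another breaks the constraint.
broken = dict(custom_weights, w_price=0.50)
not_ok = weights_sum_to_one(broken)
print(ok, not_ok)  # True False
```
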
## Troubleshooting

If you run into problems, check:
1. `outputs/converging_triangles/sensitivity_analysis_report.md`
2. `outputs/converging_triangles/normalization_comparison.png`
3. Run `python scripts/example_scoring_usage.py` to see the examples

---

**Version**: 1.0
**Last updated**: 2026-01-29
**Author**: AI Assistant