- Added `--show-details` parameter to `pipeline_converging_triangle.py` for generating detailed charts that display all pivot points and fitting lines.
- Implemented an iterative outlier removal algorithm in `fit_pivot_line` to improve the accuracy of pivot point fitting by eliminating weak points.
- Updated `USAGE.md` to include new command examples for the detailed mode.
- Revised multiple documentation files to reflect recent changes and improvements in the pivot detection and visualization processes.
# Iterative Outlier Removal for Pivot Line Fitting

## Problem Statement

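The current segment-selection algorithm picks some "weak" pivot points for the fit:

- Upper line: a high can be the highest point in its time segment yet still sit clearly below the other highs (e.g. the second point, 5.8 yuan, in the chart).
- Such points pull the upper line down (or the lower line up), so the fitted line disagrees with what a visual inspection suggests.
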
## Solution: Iterative Outlier Removal

### Core Logic

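```
1. Initial fit: linear regression over all pivot points
2. Compute residuals: each point's deviation from the fitted line
3. Identify outliers:
   - Upper line: points clearly below the fitted line = weak highs
   - Lower line: points clearly above the fitted line = weak lows
4. Remove the worst outlier
5. Refit
6. Repeat until convergence
```
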
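Steps 1-4 can be sketched as a single pass in Python. This is a minimal illustration, not the code in `src/converging_triangle.py`: it assumes NumPy's `np.polyfit` for the regression and the (population) standard deviation of the residuals as the scale for `outlier_threshold`; `find_worst_outlier` and its `atol` guard are hypothetical names.

```python
import numpy as np


def find_worst_outlier(x, y, mode="upper", outlier_threshold=1.5, atol=1e-9):
    """Hypothetical helper: one pass of steps 1-4.

    Returns the index of the worst one-sided outlier, or None when no point
    exceeds `outlier_threshold` standard deviations of the residuals.
    """
    if mode not in ("upper", "lower"):
        raise ValueError("mode must be 'upper' or 'lower'")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: least-squares line through all points.
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    # Step 2: one-sided residuals; positive means the "weak" side of the line
    # (below it for the upper line, above it for the lower line).
    residuals = fitted - y if mode == "upper" else y - fitted
    sigma = np.std(residuals)
    if sigma <= atol:
        # Assumed guard: (near-)collinear points have no meaningful outliers.
        return None
    # Steps 3-4: the worst point counts only if it exceeds the threshold.
    worst = int(np.argmax(residuals))
    return worst if residuals[worst] > outlier_threshold * sigma else None


# Made-up data: a falling upper line 10 - 0.5x whose third high is 1.0 too low.
x = np.arange(5.0)
y = np.array([10.0, 9.5, 8.0, 8.5, 8.0])
print(find_worst_outlier(x, y, "upper"))  # 2
print(find_worst_outlier(x, y, "lower"))  # None
```

Only one point is flagged per pass; removing it and refitting (steps 5 and 6) re-evaluates the remaining points against the new line. The same low point is ignored in `lower` mode because it sits on the strong side of a lower line.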
### Algorithm Flowchart

```mermaid
flowchart TD
    A[Input pivot points] --> B[Linear regression]
    B --> C[Compute residuals]
    C --> D{Outliers present?}
    D -->|Yes| E[Remove the largest outlier]
    E --> F{Remaining points >= 3?}
    F -->|Yes| G{Iterations < 3?}
    G -->|Yes| B
    G -->|No| H[Return current fit]
    F -->|No| H
    D -->|No| H
```

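The flowchart can be sketched as a bounded loop. This is an illustration under stated assumptions rather than the project's implementation: `fit_line_with_outlier_removal`, `min_remaining`, and `atol` are hypothetical names; the regression is NumPy's `np.polyfit`; the third return value is assumed to be the indices of the retained points; and the point-count check runs before a removal, which is one reading of the "Remaining points >= 3?" branch.

```python
import numpy as np


def fit_line_with_outlier_removal(x, y, mode="upper", outlier_threshold=1.5,
                                  max_iterations=3, min_remaining=3, atol=1e-9):
    """Hypothetical sketch of the flowchart.

    Returns (slope, intercept, kept), where `kept` holds the indices of the
    points used in the final fit (an assumed meaning for the third value).
    """
    if mode not in ("upper", "lower"):
        raise ValueError("mode must be 'upper' or 'lower'")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) < 2:
        raise ValueError("need at least 2 points to fit a line")
    kept = np.arange(len(x))
    for _ in range(max_iterations):  # G: bounded number of removals
        # B, C: fit the current points and compute one-sided residuals.
        slope, intercept = np.polyfit(x[kept], y[kept], 1)
        fitted = slope * x[kept] + intercept
        residuals = fitted - y[kept] if mode == "upper" else y[kept] - fitted
        sigma = np.std(residuals)
        worst = int(np.argmax(residuals))
        # D: stop when no point exceeds the threshold (or points are collinear).
        if sigma <= atol or residuals[worst] <= outlier_threshold * sigma:
            break
        # F: stop rather than leave fewer than `min_remaining` points.
        if len(kept) - 1 < min_remaining:
            break
        # E: drop the single worst outlier; the next pass refits without it.
        kept = np.delete(kept, worst)
    slope, intercept = np.polyfit(x[kept], y[kept], 1)
    return float(slope), float(intercept), kept


# Made-up data: the third high (index 2) is 1.0 below the line 10 - 0.5x.
x = np.arange(5.0)
y = np.array([10.0, 9.5, 8.0, 8.5, 8.0])
slope, intercept, kept = fit_line_with_outlier_removal(x, y, "upper")
print(kept)  # [0 1 3 4]
```

Checking the count before removing means the loop never drops a point if that would leave fewer than `min_remaining`; the `atol` guard keeps floating-point noise on (near-)collinear points from being mistaken for outliers.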
## Code Changes

### File: [src/converging_triangle.py](src/converging_triangle.py)

Rewrite the `fit_pivot_line` function (lines 230-350):

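```python
from typing import Tuple

import numpy as np


def fit_pivot_line(
    pivot_indices: np.ndarray,
    pivot_values: np.ndarray,
    mode: str = "upper",
    min_points: int = 2,
    outlier_threshold: float = 1.5,  # new: outlier threshold (multiple of the residual standard deviation)
    max_iterations: int = 3,  # new: maximum number of iterations
) -> Tuple[float, float, np.ndarray]:
    """
    Pivot-point line fitting with iterative outlier removal.

    Strategy:
    1. Fit an initial line using all points.
    2. Identify and remove "weak" points that deviate from the fitted line.
    3. Iterate until convergence.

    For the upper line: remove points clearly below the fitted line.
    For the lower line: remove points clearly above the fitted line.
    """
    ...  # body omitted in this plan
```
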
### Key Parameters

| Parameter | Default | Description |
|------|--------|------|
| `outlier_threshold` | 1.5 | A residual above 1.5 standard deviations marks an outlier |
| `max_iterations` | 3 | At most 3 iterations, to avoid over-filtering |
| `min_points` | 3 | Keep at least 3 points for the fit |

### Outlier Criteria

**Upper line (`upper`):**

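```python
# residual = fitted value - actual value
# A positive residual means the point lies below the fitted line (a weak high)
residuals = fitted_values - actual_values
threshold = outlier_threshold * np.std(residuals)
outliers = residuals > threshold  # weak highs
```
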
**Lower line (`lower`):**

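```python
# residual = actual value - fitted value
# A positive residual means the point lies above the fitted line (a weak low)
residuals = actual_values - fitted_values
threshold = outlier_threshold * np.std(residuals)
outliers = residuals > threshold  # weak lows
```
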
### Expected Effect

Taking SZ300278 (shown in the chart) as an example:

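- The second point (5.8 yuan) lies clearly below the fitted line.
- It is flagged as an outlier and removed after the first iteration.
- The final fitted line uses only the remaining 3, more representative, highs.
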
## Test Plan

1. Verify the fix on SZ300278.
2. Compare the charts before and after the change.
3. Make sure that normal pivot points are not over-filtered.

## Documentation Updates

Update [docs/枢轴点分段选择算法详解.md](docs/枢轴点分段选择算法详解.md) with a description of the iterative outlier removal.