Spring Data JPA @OneToMany及FetchMode.SUBSELECT性能问题及踩坑分析

1 背景

项目中需要对外提供分页接口,在经过几次迭代后,BA发现分页接口出现了某些特定查询条件返回timeout的情况,经过排查发现,这是@OneToMany映射与FetchMode.SUBSELECT结合使用时的性能问题。

2. 代码示例

下面给出经过脱敏和简化后的部分项目代码以作说明。

2.1 Entity Info

  1. BucketEntity Info
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
@Entity
@Table(name = "bucket")
public class BucketEntity {

@Id
@Column(name = "bucket_id")
private String id;

@Column(name = "bucket_number")
private String number;

@Column(name = "bucket_description")
private String description;

private String status;

@Column(name = "updated_timestamp")
private Instant timestamp;

@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
@Fetch(value = FetchMode.SUBSELECT)
@JoinColumn(name = "bucket_id", referencedColumnName = "bucket_id", nullable = false, updatable = false)
private List<BucketLocationEntity> bucketLocations;
}
  1. BucketLocationEntity info
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
@Entity
@Table(name = "bucket_location")
public class BucketLocationEntity implements Serializable {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "bucket_location_id")
private Long id;

@Column(name = "bucket_id", insertable = false, updatable = false)
private String bucketId;

private String location;

@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
@Fetch(value = FetchMode.SUBSELECT)
@JoinColumn(name = "bucket_location_id", referencedColumnName = "bucket_location_id", nullable = false, updatable = false)
private List<BucketLocationZipCodeEntity> bucketLocationZipCodes;

@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
@Fetch(value = FetchMode.SUBSELECT)
@JoinColumn(name = "bucket_location_id", referencedColumnName = "bucket_location_id", nullable = false, updatable = false)
private List<BucketSalesAreaEntity> bucketSalesAreas;
}

2.2 Controller code

1
2
3
4
5
6
7
@GetMapping("...")
public PageableResponse<List<BucketDto>> getBucketsByLocation(
String location, Integer pageNumber, Integer pageSize) {

Page<BucketDto> page = bucketService.findBucketsByLocation(location, pageNumber, pageSize);
return transformToPageableResponse(page);
}

2.3 Service code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
@Service
@RequiredArgsConstructor
public class BucketService {

private final BucketRepository bucketRepository;
private final BucketDtoTransformer bucketDtoTransformer;

public Page<BucketDto> findBucketsByLocation(String location, int pageNumber, int pageSize) {
Pageable pageable = PageRequest.of(pageNumber - 1, pageSize);
Page<BucketEntity> page = bucketRepository.findAllByBucketLocations_Location(location, pageable);

......

return page.map(bucketEntity -> {
return bucketDtoTransformer.transform(bucketEntity);
});
}
}

3 运行结果

3.1 paging query bucket sql

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Hibernate: 
select
bucketen0_.bucket_id as bucket1_1_,
bucketen0_.bucket_number as bulket2_1_,
bucketen0_.bucket_description as bucket3_1_,
bucketen0_.status as status14_1_,
bucketen0_.updated_timestamp as updated15_1_
from
bucket bucketen0_
left outer join
bucket_location bucketlo1_
on bucketen0_.bucket_id=bucketlo1_.bucket_id
where
bucketlo1_.location = ? limit ? offset ?

3.2 query bucket_location sql

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Hibernate: 
select
bucketlo0_.bucket_id as bucket1_8_1_,
bucketlo0_.bucket_location_id as bucket1_8_0_,
bucketlo0_.location as location2_8_0_
from
bucket_location bucketlo0_
where
bucketlo0_.bucket_id in (
select
bucketen0_.bucket_id
from
bucket bucketen0_
left outer join
bucket_location bucketlo1_
on bucketen0_.bucket_id=bucketlo1_.bucket_id
where
bucketlo1_.location = ?
)

3.3 query bucket_location_zip_code sql

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Hibernate:
select
zipcode0_.bucket_location_id as bucket5_9_1_,
zipcode0_.valuation_type_code as valuatio2_9_0_,
zipcode0_.handling_type_code as handling3_9_0_,
zipcode0_.handling_type_description as handling4_9_0_
from
bucket_location_zip_code zipcode0_
where
zipcode0_.bucket_location_id in (
select
bucketlo0_.bucket_location_id
from
bucket_location bucketlo0_
where
bucketlo0_.bucket_id in (
select
bucketen0_.bucket_id
from
bucket bucketen0_
left outer join
bucket_location bucketlo1_
on bucketen0_.bucket_id=bucketlo1_.bucket_id
where
bucketlo1_.location = ?
)
)

4 踩坑分析

4.1 慢sql查询分析

根据打印出来的sql可以看出,第一条sql查询bucket数据时是做了limit限制的。但第二条sql查询bucket_location数据时,我们发现limit条件没有出现在子查询语句中,这就导致了一次大数据量的查询。

从第三条sql语句中不难发现,与bucket_location OneToMany关联的bucket_location_zip_code查询中会将所有location与查询条件一致的数据项查询出来,这同样也是一次大数据量的查询。

由此引发的连锁反应会导致分页查询接口经常出现timeout的报错。

4.2 原因分析

当 Hibernate 加载父实体时,它会记住用于加载父实体时的查询语句。当 Hibernate 尝试加载子实体时,父查询将用作子查询。当 Hibernate 考虑 where 条件时,它不会记住父查询中的 limit 和 offset 子句。如果子表有数百万条记录,性能损失将是严重的。

5 解决方案

5.1 方案说明

避免FetchMode.SUBSELECT在查询子实体时出现忽略limit和offset的问题,我们可以将FetchMode改为SELECT,配上@BatchSize注解并将batchSize的值设置为分页接口的MAX_PAGE_SIZE

5.2 代码细节

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
@Entity
@Table(name = "bucket")
public class BucketEntity {

@Id
@Column(name = "bucket_id")
private String id;

@Column(name = "bucket_number")
private String number;

@Column(name = "bucket_description")
private String description;

private String status;

@Column(name = "updated_timestamp")
private Instant timestamp;

@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
@Fetch(value = FetchMode.SELECT)
@BatchSize(size = MAX_PAGE_SIZE)
@JoinColumn(name = "bucket_id", referencedColumnName = "bucket_id", nullable = false, updatable = false)
private List<BucketLocationEntity> bucketLocations;
}

5.3 改进结果

修改之后,我们再次用相同的请求参数调用分页接口。

分页查询的sql和修改之前一样,此处就不贴了。

1
2
3
4
5
6
7
8
9
10
11
Hibernate: 
select
bucketlo0_.bucket_id as bucket1_8_1_,
bucketlo0_.bucket_location_id as bucket1_8_0_,
bucketlo0_.location as location2_8_0_
from
bucket_location bucketlo0_
where
bucketlo0_.bucket_id in (
?, ?, ?, ?, ?
)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Hibernate:
select
zipcode0_.bucket_location_id as bucket5_9_1_,
zipcode0_.valuation_type_code as valuatio2_9_0_,
zipcode0_.handling_type_code as handling3_9_0_,
zipcode0_.handling_type_description as handling4_9_0_
from
bucket_location_zip_code zipcode0_
where
zipcode0_.bucket_location_id in (
select
bucketlo0_.bucket_location_id
from
bucket_location bucketlo0_
where
bucketlo0_.bucket_id in (
?, ?, ?, ?, ?
)
)

可以看到改进之后,sql中的查询条件变为了bucketlo0_.bucket_id in (?, ?, ?, ?, ?) 这样的限定范围的in查询,这样我们就可以通过设置@BatchSize及其value来限定查询的个数,从而达到最有效和最快速查询的目的。