2024.05.04 Galera Physical State Snapshot, Understanding the bug
TIL 2024. 5. 4. 16:29
Physical State Snapshot
https://galeracluster.com/library/documentation/sst-physical.html
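One of the SST methods is the physical method.
The available settings are rsync, xtrabackup, and clone. In general rsync is faster than xtrabackup, but at terabyte scale xtrabackup is faster.
clone was reportedly introduced in version 8.0.22. It is based on a MySQL plugin and is faster than xtrabackup, but it is said to block the donor when DDL runs on it. What exactly does that mean?
rsync is the default setting.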
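As a minimal sketch of where this setting lives, the snippet below renders a `[mysqld]` fragment that sets `wsrep_sst_method`. The helper name `render_sst_config` is hypothetical, and the allowed values are only the three named in these notes; the exact method names a server accepts depend on the distribution and version.

```python
import configparser
import io

# Method names taken from the notes above; a real server may accept
# different or additional names (assumption, check your distribution).
KNOWN_METHODS = {"rsync", "xtrabackup", "clone"}


def render_sst_config(method="rsync"):
    """Return a my.cnf-style fragment selecting the SST method."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown SST method: {method!r}")
    parser = configparser.ConfigParser()
    parser["mysqld"] = {"wsrep_sst_method": method}
    buffer = io.StringIO()
    # Write "key=value" without spaces around the delimiter.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_sst_config("rsync"))
```

Rejecting unknown names early keeps a typo from silently ending up in the config file.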
The Physical State Transfer Method has the following disadvantages:
- These transfers require the joining node to have the same data directory layout and the same storage engine configuration as the donor node. For example, you must use the same file-per-table, compression, log file size and similar settings for InnoDB.
- These transfers are not accepted by servers with initialized storage engines.
- What this means is that when your node requires a state snapshot transfer, the database server must restart to apply the changes. The database server remains inaccessible to the client until the state snapshot transfer is complete, since it cannot perform authentication without the storage engines.
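The first requirement above (same layout and InnoDB configuration on donor and joiner) can be sketched as a simple comparison. This is an illustration only: `layout_mismatches` is a hypothetical helper, the settings are assumed to have been collected already into plain dictionaries (for example from `SHOW VARIABLES`), and the default key list covers just two of the settings the documentation mentions.

```python
# InnoDB variables corresponding to "file-per-table" and "log file size"
# in the quoted text. Compression settings differ by server flavor, so
# they are left for the caller to add (assumption).
DEFAULT_KEYS = ("innodb_file_per_table", "innodb_log_file_size")


def layout_mismatches(donor, joiner, keys=DEFAULT_KEYS):
    """Return {name: (donor_value, joiner_value)} for settings that differ."""
    return {
        key: (donor.get(key), joiner.get(key))
        for key in keys
        if donor.get(key) != joiner.get(key)
    }


if __name__ == "__main__":
    donor = {"innodb_file_per_table": "ON", "innodb_log_file_size": "100663296"}
    joiner = {"innodb_file_per_table": "ON", "innodb_log_file_size": "50331648"}
    print(layout_mismatches(donor, joiner))
```

An empty result means the compared settings agree; it does not prove the data directory layouts are identical.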
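Hmm, so it seems the server has to restart before an SST can be applied, and clients cannot connect while the copy is in progress.
Could this have been the cause of the outage?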
rsync-rsync_wan
https://mariadb.com/kb/en/introduction-to-state-snapshot-transfers-ssts/#rsync-rsync_wan
- The donor node is blocked with a read lock during the SST.
- Because of that, this is the recommended SST method if you do not need to allow the donor node to execute queries during the SST.
The donor holds a read lock while the SST is running.
- Use of this SST method could result in data corruption when using innodb_use_native_aio (the default) if the donor is older than MariaDB 10.3.35, MariaDB 10.4.25, MariaDB 10.5.16, MariaDB 10.6.8, or MariaDB 10.7.4; see MDEV-25975. Starting with those donor versions, wsrep_sst_method=rsync is a reliable way to upgrade the cluster to a newer major version.
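To make the version list above easier to check against a real donor, here is a small sketch. The function name `rsync_sst_at_risk` is hypothetical, the fixed versions are copied from the quoted text, and the handling of release series not listed there is my own assumption: older series are treated as affected, newer ones as unknown.

```python
# First patch release per series that contains the fix, per the quoted text.
FIXED_IN = {(10, 3): 35, (10, 4): 25, (10, 5): 16, (10, 6): 8, (10, 7): 4}


def rsync_sst_at_risk(donor_version, native_aio=True):
    """Return True if the donor predates the MDEV-25975 fix, False if not,
    and None when the series is newer than anything the text covers."""
    if not native_aio:
        # The quoted warning only applies with innodb_use_native_aio enabled.
        return False
    major, minor, patch = donor_version
    series = (major, minor)
    if series in FIXED_IN:
        return patch < FIXED_IN[series]
    if series < min(FIXED_IN):
        # Older than every listed series (assumption: treated as affected).
        return True
    return None  # Not covered by the quoted text.


if __name__ == "__main__":
    print(rsync_sst_at_risk((10, 4, 24)))
    print(rsync_sst_at_risk((10, 6, 8)))
```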
I need to check the related bug.
SSTs and Systemd
- MariaDB's systemd unit file has a default startup timeout of about 90 seconds on most systems. If an SST takes longer than this default startup timeout on a joiner node, then systemd will assume that mysqld has failed to startup, which causes systemd to kill the mysqld process on the joiner node.
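- systemd 236 added the EXTEND_TIMEOUT_USEC notification, which lets a service extend its startup timeout while it is still working, and MariaDB uses it during long SSTs. Therefore, if you are using systemd 236 or later, then you should not need to manually override TimeoutStartSec, even if your SSTs run for longer than the configured value. See MDEV-15607 for more information.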
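The two points above can be turned into a rough check for a joiner host. This is a sketch under simplifying assumptions: `parse_systemd_version` expects the first line of `systemctl --version` output, `joiner_may_be_killed` is a hypothetical helper, and it treats "systemd 236 or later" as sufficient without checking whether the installed MariaDB version supports the timeout extension.

```python
import re


def parse_systemd_version(version_output):
    """Extract the major version from `systemctl --version` output."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized systemctl --version output")
    return int(match.group(1))


def joiner_may_be_killed(systemd_version, timeout_start_sec, expected_sst_sec):
    """Return True if systemd could kill mysqld before the SST finishes."""
    if systemd_version >= 236:
        # Per the quoted text, no manual TimeoutStartSec override is needed.
        return False
    return expected_sst_sec > timeout_start_sec


if __name__ == "__main__":
    version = parse_systemd_version("systemd 219\n+PAM +AUDIT +SELINUX")
    # 90 s is the approximate default mentioned above; 600 s is a made-up estimate.
    print(joiner_may_be_killed(version, timeout_start_sec=90, expected_sst_sec=600))
```

If the check comes back True, the experiment to run is an SST that deliberately outlasts the startup timeout on a test joiner.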
If the SST takes longer than the default startup timeout, systemd kills the mysqld process on the joiner.
Hmm, could this be why MySQL failed to start?
I should run an experiment.
Minimal Cluster Size
Another reason for running three nodes is that, besides the donor and the joiner, one more node is needed to execute queries.
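The arithmetic behind that is small enough to write down. In this sketch `nodes_serving_queries` is a hypothetical helper, and it assumes exactly one joiner and one donor, with the donor unavailable only when the SST method blocks it (as rsync does).

```python
def nodes_serving_queries(cluster_size, donor_blocked=True):
    """Count nodes still able to run queries while one SST is in progress."""
    if cluster_size < 2:
        raise ValueError("an SST needs at least a donor and a joiner")
    # The joiner is always unavailable; the donor only if the method blocks it.
    unavailable = 1 + (1 if donor_blocked else 0)
    return cluster_size - unavailable


if __name__ == "__main__":
    print(nodes_serving_queries(2))  # two nodes, blocking SST
    print(nodes_serving_queries(3))  # three nodes, blocking SST
```

With a blocking method, two nodes leave nothing to serve queries, while three leave one.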