9. Ceph Basics - CRUSH Maps

2022-11-03

Reposted from: https://mp.weixin.qq.com/s?__biz=MzI1MDgwNzQ1MQ==&mid=2247485302&idx=1&sn=00a3a2045797b20983c06b183c935886&chksm=e9fdd282de8a5b94b26b19c4a3b51eede1270c077edaa8899a982d75d4067a48ae0b97e8d870&cur_album_id=1600845417376776197&scene=189#wechat_redirect

Introduction to CRUSH Maps

Official documentation: https://docs.ceph.com/en/latest/rados/operations/crush-map/

The CRUSH algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly store and retrieve data in OSDs with a uniform distribution of data across the cluster. For a detailed discussion of CRUSH, see CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data.

CRUSH maps contain a list of OSDs, a list of ‘buckets’ for aggregating the devices into physical locations, and a list of rules that tell CRUSH how it should replicate data in a Ceph cluster’s pools. By reflecting the underlying physical organization of the installation, CRUSH can model—and thereby address—potential sources of correlated device failures. Typical sources include physical proximity, a shared power source, and a shared network. By encoding this information into the cluster map, CRUSH placement policies can separate object replicas across different failure domains while still maintaining the desired distribution. For example, to address the possibility of concurrent failures, it may be desirable to ensure that data replicas are on devices using different shelves, racks, power supplies, controllers, and/or physical locations.

When you deploy OSDs they are automatically placed within the CRUSH map under a host node named with the hostname for the host they are running on. This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single host failure will not affect availability. For larger clusters, however, administrators should carefully consider their choice of failure domain. Separating replicas across racks, for example, is common for mid- to large-sized clusters.

In short, the CRUSH algorithm decides how your data is distributed across the cluster and what failure-domain strategy is used to keep that data intact.

Viewing CRUSH Map Information

CRUSH map topology details

[root@ceph-node01 ~]# ceph osd crush dump
{
"devices": [ # 设备
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
。。。
],
"types": [ # 容灾级别类型
{
"type_id": 0,
"name": "osd"
},
{
"type_id": 1,
"name": "host"
},
。。。
{
"type_id": 3,
"name": "rack"
},
{
"type_id": 4,
"name": "row"
},
。。。
{
"type_id": 7,
"name": "room"
},
{
"type_id": 8,
"name": "datacenter"
},
。。。
{
"type_id": 11,
"name": "root"
}
],
"buckets": [ # 数据组织形式,bucket可以嵌套,把相同的放一起
{
"id": -1,
"name": "default", # 名称是default
"type_id": 11, # type 的ID
"type_name": "root", # 类型为顶部根
"weight": 44809,
"alg": "straw2",
"hash": "rjenkins1",
"items": [ # 包含3台主机
{
"id": -3, # 主机ID
"weight": 12804,
"pos": 0
},
{
"id": -5,
"weight": 6402,
"pos": 1
},
{
"id": -7,
"weight": 25603,
"pos": 2
}
]
},
{
"id": -2,
"name": "default~hdd",
"type_id": 11,
"type_name": "root",
"weight": 44809,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -4,
"weight": 12804,
"pos": 0
},
{
"id": -6,
"weight": 6402,
"pos": 1
},
{
"id": -8,
"weight": 25603,
"pos": 2
}
]
},
{
"id": -3,
"name": "ceph-node01",
"type_id": 1,
"type_name": "host",
"weight": 12804,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 0,
"weight": 6402,
"pos": 0
},
{
"id": 3,
"weight": 6402,
"pos": 1
}
]
},
{
"id": -4,
"name": "ceph-node01~hdd",
"type_id": 1,
"type_name": "host",
"weight": 12804,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 0,
"weight": 6402,
"pos": 0
},
{
"id": 3,
"weight": 6402,
"pos": 1
}
]
},
。。。
],
"rules": [
{
"rule_id": 0,
"rule_name": "replicated_rule", # 规则名称,默认
"ruleset": 0,
"type": 1, # 这个type也是指上面容灾级别类型
"min_size": 1,
"max_size": 10, # 允许副本范围是多少这里是1-10;
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default" # 使用哪个bucket规则
},
{
"op": "chooseleaf_firstn", # 叶子节点,类型为host
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
],
"tunables": {
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "jewel",
"optimal_tunables": 1,
"legacy_tunables": 0,
"minimum_required_version": "jewel",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 1,
"has_v5_rules": 0
},
"choose_args": {}
}
[root@ceph-node01 ~]#
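
The full dump is fairly long. If jq is available on the node (an assumption, it is not shown in the original output), you can pull out just the part you care about, for example the rule list or a single bucket:

ceph osd crush dump | jq '.rules'
ceph osd crush dump | jq '.buckets[] | select(.name == "default")'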

Viewing rules

[root@ceph-node01 ~]# ceph osd crush rule ls
replicated_rule
[root@ceph-node01 ~]# ceph osd pool get ceph-demo crush_rule
crush_rule: replicated_rule
[root@ceph-node01 ~]#
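
To see the full definition of a rule rather than just its name, ceph osd crush rule dump also accepts a rule name; this is just a quick check, and its output mirrors the rules section of the dump shown earlier:

ceph osd crush rule dump replicated_rule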

Customizing the CRUSH Map Topology

There are two ways to customize the CRUSH map. One is to edit the CRUSH map by hand (official docs: https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/); the other is to modify it through the ceph subcommands (official docs: https://docs.ceph.com/en/latest/rados/operations/crush-map/). Both approaches are briefly covered below.

Lab topology

Manual editing steps

1. Get the CRUSH map

[root@ceph-node01 ceph-deploy]# ceph osd getcrushmap -o crushmap.bin
21
[root@ceph-node01 ceph-deploy]# file crushmap.bin
crushmap.bin: MS Windows icon resource - 8 icons, 1-colors
[root@ceph-node01 ceph-deploy]#

The exported file is binary and cannot be read directly; to inspect it, it has to be decompiled first.

2. Decompile the CRUSH map

[root@ceph-node01 ceph-deploy]# crushtool -d crushmap.bin -o crushmap.txt
[root@ceph-node01 ceph-deploy]# cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph-node01 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.195
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.098
item osd.3 weight 0.098
}
host ceph-node02 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.195
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.098
item osd.1 weight 0.098
}
host ceph-node03 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.391
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.195
item osd.5 weight 0.098
item osd.6 weight 0.098
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.781
alg straw2
hash 0 # rjenkins1
item ceph-node01 weight 0.195
item ceph-node02 weight 0.195
item ceph-node03 weight 0.391
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
[root@ceph-node01 ceph-deploy]#

3. Edit at least one of devices, buckets, and rules

[root@ceph-node01 ceph-deploy]# cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph-node01 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.195
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.098
}
host ceph-node01-ssd {
alg straw2
hash 0 # rjenkins1
item osd.3 weight 0.098
}
host ceph-node02-ssd {
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.098
}
host ceph-node03-ssd {
alg straw2
hash 0 # rjenkins1
item osd.5 weight 0.098
item osd.6 weight 0.098
}
host ceph-node02 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.195
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.098
}
host ceph-node03 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.391
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.195
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.781
alg straw2
hash 0 # rjenkins1
item ceph-node01 weight 0.098
item ceph-node02 weight 0.098
item ceph-node03 weight 0.195
}
root ssd {
# weight 0.781
alg straw2
hash 0 # rjenkins1
item ceph-node01-ssd weight 0.098
item ceph-node02-ssd weight 0.098
item ceph-node03-ssd weight 0.196
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule demo_replicated_rule {
id 1
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type host
step emit
}
# end crush map
[root@ceph-node01 ceph-deploy]#

4. Recompile the CRUSH map

[root@ceph-node01 ceph-deploy]# crushtool -c crushmap.txt -o crushmap-new.bin
[root@ceph-node01 ceph-deploy]#
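
Before injecting the new map, it can be worth dry-running it with crushtool to see which OSDs a few sample inputs would map to under the new rule. A minimal sketch, assuming the new rule ends up with id 1 as in the edited file above; --min-x/--max-x simply pick ten sample values:

crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --min-x 0 --max-x 9 --show-mappings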

5. Set the CRUSH map

[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.78142 root default
-3 0.19537 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
3 hdd 0.09769 osd.3 up 1.00000 1.00000
-5 0.19537 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
4 hdd 0.09769 osd.4 up 1.00000 1.00000
-7 0.39067 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
5 hdd 0.09769 osd.5 up 1.00000 1.00000
6 hdd 0.09769 osd.6 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#
[root@ceph-node01 ceph-deploy]# ceph osd setcrushmap -i crushmap-new.bin
22
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.39198 root ssd
-9 0.09799 host ceph-node01-ssd
3 ssd 0.09799 osd.3 up 1.00000 1.00000
-10 0.09799 host ceph-node02-ssd
4 ssd 0.09799 osd.4 up 1.00000 1.00000
-11 0.19600 host ceph-node03-ssd
5 ssd 0.09799 osd.5 up 1.00000 1.00000
6 ssd 0.09799 osd.6 up 1.00000 1.00000
-1 0.39098 root default
-3 0.09799 host ceph-node01
0 hdd 0.09799 osd.0 up 1.00000 1.00000
-5 0.09799 host ceph-node02
1 hdd 0.09799 osd.1 up 1.00000 1.00000
-7 0.19499 host ceph-node03
2 hdd 0.19499 osd.2 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#

6. View the rules

[root@ceph-node01 ceph-deploy]# ceph osd crush rule ls
replicated_rule
demo_replicated_rule
[root@ceph-node01 ceph-deploy]#

7. Set the pool's rule

[root@ceph-node01 ceph-deploy]# ceph osd pool get ceph-demo crush_rule
crush_rule: replicated_rule
[root@ceph-node01 ceph-deploy]# ceph osd crush rule ls
replicated_rule
demo_replicated_rule
[root@ceph-node01 ceph-deploy]# ceph osd pool set ceph-demo crush_rule demo_replicated_rule
set pool 1 crush_rule to demo_replicated_rule
[root@ceph-node01 ceph-deploy]# ceph osd pool get ceph-demo crush_rule
crush_rule: demo_replicated_rule
[root@ceph-node01 ceph-deploy]#

8. Verify

[root@ceph-node01 ceph-deploy]# rbd create ceph-demo/demo.img --size 10G
[root@ceph-node01 ceph-deploy]# ceph osd map ceph-demo demo.img
osdmap e1375 pool 'ceph-demo' (1) object 'demo.img' -> pg 1.c1a6751d (1.1d) -> up ([3,4,6], p3) acting ([3,4,6], p3)
[root@ceph-node01 ceph-deploy]#

As expected, the object has three replicas, placed on OSDs 3, 4 and 6, which are the OSDs under the ssd root. With the steps above, the separation into distinct hierarchies is complete.
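
To check more than a single object, you can also list every PG in the pool and confirm that the up/acting sets only contain OSDs under the ssd root. A quick sketch; the exact output columns vary by Ceph release:

ceph pg ls-by-pool ceph-demo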

9. Revert the changes

[root@ceph-node01 ceph-deploy]# ceph osd crush rule ls
replicated_rule
demo_replicated_rule
[root@ceph-node01 ceph-deploy]# ceph osd pool set ceph-demo crush_rule replicated_rule
set pool 1 crush_rule to replicated_rule
[root@ceph-node01 ceph-deploy]# crushtool osd setcrushmap -i crushmap.bin
no action specified; -h for help
[root@ceph-node01 ceph-deploy]# ceph osd setcrushmap -i crushmap.bin
23
[root@ceph-node01 ceph-deploy]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 0.78142 root default
-3 0.19537 host ceph-node01
0 hdd 0.09769 osd.0
3 hdd 0.09769 osd.3
-5 0.19537 host ceph-node02
1 hdd 0.09769 osd.1
4 hdd 0.09769 osd.4
-7 0.39067 host ceph-node03
2 hdd 0.19530 osd.2
5 hdd 0.09769 osd.5
6 hdd 0.09769 osd.6
[root@ceph-node01 ceph-deploy]#

Command-line editing steps

1. Add a root bucket

[root@ceph-node01 ceph-deploy]# ceph osd crush add-bucket ssd root
added bucket ssd type root to crush map
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0 root ssd
-1 0.78142 root default
-3 0.19537 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
3 hdd 0.09769 osd.3 up 1.00000 1.00000
-5 0.19537 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
4 hdd 0.09769 osd.4 up 1.00000 1.00000
-7 0.39067 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
5 hdd 0.09769 osd.5 up 1.00000 1.00000
6 hdd 0.09769 osd.6 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#

Here, ssd is the bucket name and root is the bucket type.
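
The general form is: ceph osd crush add-bucket <name> <type>, where <type> must be one of the bucket types listed earlier (root, datacenter, room, rack, host, and so on). As a sketch only, a rack bucket could be added and removed like this (rack01 is a hypothetical name, not part of this lab):

ceph osd crush add-bucket rack01 rack   # hypothetical rack bucket
ceph osd crush remove rack01            # remove it again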

To delete the root bucket:

[root@ceph-node01 ~]# ceph osd crush remove ssd
removed item id -9 name 'ssd' from crush map
[root@ceph-node01 ~]#

2. Add host buckets

[root@ceph-node01 ceph-deploy]# ceph osd crush add-bucket ceph-node01-ssd host
added bucket ceph-node01-ssd type host to crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush add-bucket ceph-node02-ssd host
added bucket ceph-node02-ssd type host to crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush add-bucket ceph-node03-ssd host
added bucket ceph-node03-ssd type host to crush map
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0 host ceph-node03-ssd
-11 0 host ceph-node02-ssd
-10 0 host ceph-node01-ssd
-9 0 root ssd
-1 0.78142 root default
-3 0.19537 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
3 hdd 0.09769 osd.3 up 1.00000 1.00000
-5 0.19537 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
4 hdd 0.09769 osd.4 up 1.00000 1.00000
-7 0.39067 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
5 hdd 0.09769 osd.5 up 1.00000 1.00000
6 hdd 0.09769 osd.6 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#

To delete the host buckets:

[root@ceph-node01 ~]# ceph osd crush remove ceph-node01-ssd
removed item id -10 name 'ceph-node01-ssd' from crush map
[root@ceph-node01 ~]# ceph osd crush remove ceph-node02-ssd
removed item id -11 name 'ceph-node02-ssd' from crush map
[root@ceph-node01 ~]# ceph osd crush remove ceph-node03-ssd
removed item id -12 name 'ceph-node03-ssd' from crush map
[root@ceph-node01 ~]#

3. Set device classes

[root@ceph-node01 ~]# ceph osd crush rm-device-class osd.4
done removing class of osd(s): 4
[root@ceph-node01 ~]# ceph osd crush rm-device-class osd.5
done removing class of osd(s): 5
[root@ceph-node01 ~]# ceph osd crush rm-device-class osd.6
done removing class of osd(s): 6
[root@ceph-node01 ~]# ceph -s
cluster:
id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
health: HEALTH_OK

services:
mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 14h)
mgr: ceph-node01(active, since 5d), standbys: ceph-node03, ceph-node02
osd: 7 osds: 7 up (since 14h), 7 in (since 14h)
rgw: 1 daemon active (ceph-node01)

task status:

data:
pools: 7 pools, 224 pgs
objects: 1.00k objects, 2.6 GiB
usage: 16 GiB used, 784 GiB / 800 GiB avail
pgs: 224 active+clean
[root@ceph-node01 ~]# ceph osd crush set-device-class ssd osd.4
set osd(s) 4 to class 'ssd'
[root@ceph-node01 ~]# ceph osd crush set-device-class ssd osd.5
set osd(s) 5 to class 'ssd'
[root@ceph-node01 ~]# ceph osd crush set-device-class ssd osd.6
set osd(s) 6 to class 'ssd'
[root@ceph-node01 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.39075 root ssd
-10 0.09769 host ceph-node01-ssd
3 ssd 0.09769 osd.3 up 1.00000 1.00000
-11 0.09769 host ceph-node02-ssd
4 ssd 0.09769 osd.4 up 1.00000 1.00000
-12 0.19537 host ceph-node03-ssd
5 ssd 0.09769 osd.5 up 1.00000 1.00000
6 ssd 0.09769 osd.6 up 1.00000 1.00000
-1 0.39067 root default
-3 0.09769 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
-5 0.09769 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
-7 0.19530 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
[root@ceph-node01 ~]#
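
After reassigning classes, you can list the device classes the cluster knows about and which OSDs carry each class. A quick sketch; both subcommands should be available on Luminous and later:

ceph osd crush class ls           # list all device classes
ceph osd crush class ls-osd ssd   # list the OSDs tagged with class ssd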

4. Attach the new host buckets to the root bucket

[root@ceph-node01 ceph-deploy]# ceph osd crush move ceph-node01-ssd root=ssd
moved item id -10 name 'ceph-node01-ssd' to location {root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush move ceph-node02-ssd root=ssd
moved item id -11 name 'ceph-node02-ssd' to location {root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush move ceph-node03-ssd root=ssd
moved item id -12 name 'ceph-node03-ssd' to location {root=ssd} in crush map
[root@ceph-node01 ceph-deploy]#
[root@ceph-node01 ceph-deploy]# ceph osd crush move osd.3 host=ceph-node01-ssd root=ssd
moved item id 3 name 'osd.3' to location {host=ceph-node01-ssd,root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush move osd.4 host=ceph-node02-ssd root=ssd
moved item id 4 name 'osd.4' to location {host=ceph-node02-ssd,root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush move osd.5 host=ceph-node03-ssd root=ssd
moved item id 5 name 'osd.5' to location {host=ceph-node03-ssd,root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd crush move osd.6 host=ceph-node03-ssd root=ssd
moved item id 6 name 'osd.6' to location {host=ceph-node03-ssd,root=ssd} in crush map
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.39075 root ssd
-10 0.09769 host ceph-node01-ssd
3 hdd 0.09769 osd.3 up 1.00000 1.00000
-11 0.09769 host ceph-node02-ssd
4 hdd 0.09769 osd.4 up 1.00000 1.00000
-12 0.19537 host ceph-node03-ssd
5 hdd 0.09769 osd.5 up 1.00000 1.00000
6 hdd 0.09769 osd.6 up 1.00000 1.00000
-1 0.39067 root default
-3 0.09769 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
-5 0.09769 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
-7 0.19530 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#
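
The location passed to ceph osd crush move is a list of type=name pairs describing where in the hierarchy the item should sit, and several levels can be given at once. As a sketch only (rack01 is a hypothetical bucket, not part of this lab):

ceph osd crush add-bucket rack01 rack                      # hypothetical rack bucket
ceph osd crush move rack01 root=default                    # hang the rack under the default root
ceph osd crush move ceph-node02 root=default rack=rack01   # re-parent a host under the rack
ceph osd crush move ceph-node02 root=default               # move it straight back
ceph osd crush remove rack01                               # clean up the hypothetical bucket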

Undoing this is much the same as setting it up: simply move the OSDs back under their original hosts.

[root@ceph-node01 ~]# ceph osd crush move osd.3 host=ceph-node01 root=default
moved item id 3 name 'osd.3' to location {host=ceph-node01,root=default} in crush map
[root@ceph-node01 ~]# ceph osd crush move osd.4 host=ceph-node02 root=default
moved item id 4 name 'osd.4' to location {host=ceph-node02,root=default} in crush map
[root@ceph-node01 ~]# ceph osd crush move osd.5 host=ceph-node03 root=default
moved item id 5 name 'osd.5' to location {host=ceph-node03,root=default} in crush map
[root@ceph-node01 ~]# ceph osd crush move osd.6 host=ceph-node03 root=default
moved item id 6 name 'osd.6' to location {host=ceph-node03,root=default} in crush map
[root@ceph-node01 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0 root ssd
-10 0 host ceph-node01-ssd
-11 0 host ceph-node02-ssd
-12 0 host ceph-node03-ssd
-1 0.78142 root default
-3 0.19537 host ceph-node01
0 hdd 0.09769 osd.0 up 1.00000 1.00000
3 ssd 0.09769 osd.3 up 1.00000 1.00000
-5 0.19537 host ceph-node02
1 hdd 0.09769 osd.1 up 1.00000 1.00000
4 ssd 0.09769 osd.4 up 1.00000 1.00000
-7 0.39067 host ceph-node03
2 hdd 0.19530 osd.2 up 1.00000 1.00000
5 ssd 0.09769 osd.5 up 1.00000 1.00000
6 ssd 0.09769 osd.6 up 1.00000 1.00000
[root@ceph-node01 ~]#

5. Create a rule

[root@ceph-node01 ceph-deploy]# ceph osd crush rule create-replicated ssd-demo ssd host hdd
[root@ceph-node01 ceph-deploy]#
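
The general syntax is: ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> [<device-class>]; the optional last argument restricts the rule to OSDs of that device class. As a sketch, a rule that only picks ssd-class OSDs under the ssd root, one per host, could look like this (ssd-only is a hypothetical rule name):

ceph osd crush rule create-replicated ssd-only ssd host ssd
ceph osd crush rule rm ssd-only   # remove the hypothetical rule again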

To delete a rule:

[root@ceph-node01 ~]# ceph osd crush rule rm ssd-demo
[root@ceph-node01 ~]# ceph osd crush rule ls
replicated_rule
[root@ceph-node01 ~]#

6. Confirm the new rule exists

[root@ceph-node01 ceph-deploy]# ceph osd crush rule ls
replicated_rule
ssd-demo
[root@ceph-node01 ceph-deploy]#

7. Apply the rule to a pool

[root@ceph-node01 ceph-deploy]# ceph osd map ceph-demo demo.img
osdmap e1738 pool 'ceph-demo' (1) object 'demo.img' -> pg 1.c1a6751d (1.1d) -> up ([0,2,1], p0) acting ([0,2,1], p0)
[root@ceph-node01 ceph-deploy]# ceph osd pool set ceph-demo crush_rule ssd-demo
set pool 1 crush_rule to ssd-demo
[root@ceph-node01 ceph-deploy]# ceph osd map ceph-demo demo.img
osdmap e1745 pool 'ceph-demo' (1) object 'demo.img' -> pg 1.c1a6751d (1.1d) -> up ([4,3,6], p4) acting ([0,1,2], p0)
[root@ceph-node01 ceph-deploy]#

Comparing the mappings before and after switching the rule, the same image is now stored on OSDs chosen by a different rule. That is how CRUSH map rules are created by hand through the command line.

Notes on editing the CRUSH map

1. Whether you edit the CRUSH map by exporting, editing and re-importing the file, or through the command line, always keep a backup so you can roll back if something goes wrong.

2. Plan the CRUSH map when the cluster is first built rather than adjusting it later. A running cluster already holds a large amount of data, and changing the CRUSH map triggers large-scale data migration, so plan ahead.

3. By default, both of the methods above leave a hidden pitfall: after an OSD is restarted, the CRUSH map is adjusted automatically, as shown below:

[root@ceph-node01 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.39198 root ssd
-9 0.09799 host ceph-node01-ssd
3 ssd 0.09799 osd.3 up 1.00000 1.00000
-10 0.09799 host ceph-node02-ssd
4 ssd 0.09799 osd.4 up 1.00000 1.00000
-11 0.19600 host ceph-node03-ssd
5 ssd 0.09799 osd.5 up 1.00000 1.00000
6 ssd 0.09799 osd.6 up 1.00000 1.00000
-1 0.39098 root default
-3 0.09799 host ceph-node01
0 hdd 0.09799 osd.0 up 1.00000 1.00000
-5 0.09799 host ceph-node02
1 hdd 0.09799 osd.1 up 1.00000 1.00000
-7 0.19499 host ceph-node03
2 hdd 0.19499 osd.2 up 1.00000 1.00000
[root@ceph-node01 ~]#
[root@ceph-node01 ~]# systemctl restart ceph-osd@3
[root@ceph-node01 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.29399 root ssd
-9 0 host ceph-node01-ssd
-10 0.09799 host ceph-node02-ssd
4 ssd 0.09799 osd.4 up 1.00000 1.00000
-11 0.19600 host ceph-node03-ssd
5 ssd 0.09799 osd.5 up 1.00000 1.00000
6 ssd 0.09799 osd.6 up 1.00000 1.00000
-1 0.48897 root default
-3 0.19598 host ceph-node01
0 hdd 0.09799 osd.0 up 1.00000 1.00000
3 ssd 0.09799 osd.3 up 1.00000 1.00000
-5 0.09799 host ceph-node02
1 hdd 0.09799 osd.1 up 1.00000 1.00000
-7 0.19499 host ceph-node03
2 hdd 0.19499 osd.2 up 1.00000 1.00000
[root@ceph-node01 ~]#

Notice that osd.3, which used to sit under the ceph-node01-ssd bucket, is now back under the default bucket: the CRUSH map was changed dynamically. This happens because whenever an OSD starts (or OSDs are added or removed), it updates its own CRUSH location according to the default behaviour. Official docs: https://docs.ceph.com/en/latest/rados/operations/crush-map/. To disable this, set osd crush update on start = false.

Check the default value of this parameter:

[root@ceph-node01 ceph-deploy]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node01.asok config show |grep osd_crush_update_on_start
"osd_crush_update_on_start": "true",
[root@ceph-node01 ceph-deploy]#
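
On releases with the centralized configuration database (Mimic and later), the same option can usually also be changed at runtime without editing ceph.conf; this is a sketch, so verify the option name against your version first:

ceph config set osd osd_crush_update_on_start false
ceph config get osd osd_crush_update_on_start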

Alternatively, to change it through ceph.conf (the approach used here), add the setting under an [osd] section:

[root@ceph-node01 ceph-deploy]# cat ceph.conf
[global]
。。。
。。。
mon_max_pg_per_osd = 500

[client.rgw.ceph-node01]
rgw_frontends = "civetweb port=80"

[osd]
osd crush update on start = false
[root@ceph-node01 ceph-deploy]#

Push the configuration file to all nodes:

[root@ceph-node01 ceph-deploy]# ceph-deploy --overwrite-conf config push ceph-node01 ceph-node02 ceph-node03
[root@ceph-node01 ceph-deploy]#

Verify the configuration

[root@ceph-node01 ceph-deploy]# crushtool osd setcrushmap -i crushmap-new.bin
no action specified; -h for help
[root@ceph-node01 ceph-deploy]# ceph osd setcrushmap -i crushmap-new.bin
63
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.39198 root ssd
-9 0.09799 host ceph-node01-ssd
3 ssd 0.09799 osd.3 up 1.00000 1.00000
-10 0.09799 host ceph-node02-ssd
4 ssd 0.09799 osd.4 up 1.00000 1.00000
-11 0.19600 host ceph-node03-ssd
5 ssd 0.09799 osd.5 up 1.00000 1.00000
6 ssd 0.09799 osd.6 up 1.00000 1.00000
-1 0.39098 root default
-3 0.09799 host ceph-node01
0 hdd 0.09799 osd.0 up 1.00000 1.00000
-5 0.09799 host ceph-node02
1 hdd 0.09799 osd.1 up 1.00000 1.00000
-7 0.19499 host ceph-node03
2 hdd 0.19499 osd.2 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#
[root@ceph-node01 ceph-deploy]# systemctl restart ceph-osd@3
[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-12 0.39198 root ssd
-9 0.09799 host ceph-node01-ssd
3 ssd 0.09799 osd.3 up 1.00000 1.00000
-10 0.09799 host ceph-node02-ssd
4 ssd 0.09799 osd.4 up 1.00000 1.00000
-11 0.19600 host ceph-node03-ssd
5 ssd 0.09799 osd.5 up 1.00000 1.00000
6 ssd 0.09799 osd.6 up 1.00000 1.00000
-1 0.39098 root default
-3 0.09799 host ceph-node01
0 hdd 0.09799 osd.0 up 1.00000 1.00000
-5 0.09799 host ceph-node02
1 hdd 0.09799 osd.1 up 1.00000 1.00000
-7 0.19499 host ceph-node03
2 hdd 0.19499 osd.2 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#

After restarting the OSD again, the CRUSH map no longer changes.
