CephFS Operations Guide
This is an operations guide for Ceph deployed with Rook.
Operation

Installation (Helm)
Add the Rook Helm repository:
```bash
helm repo add rook-release https://charts.rook.io/release
```
Install the operator:
```bash
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph -f values.yaml
```
Install the cluster:
```bash
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster -f values.yaml
```
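After both charts are installed, it can help to check that the pods in the `rook-ceph` namespace have settled. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```bash
# Reads `kubectl get pods --no-headers` output on stdin and prints the names
# of pods whose STATUS column is neither Running nor Completed.
pods_not_ready() {
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Lists pods in the rook-ceph namespace that still need attention.
# Prints nothing when every pod is Running or Completed.
check_rook_pods() {
  kubectl -n rook-ceph get pods --no-headers | pods_not_ready
}
```

Calling `check_rook_pods` repeatedly until it prints nothing is one way to wait for the operator and cluster pods to come up.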
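Operator configuration
rook-ceph-operator defaults
Pod resource requests and limits:
```yaml
resources:
  requests:
    cpu: 20m
```
Global log level:
```yaml
logLevel: INFO
```
CSI configuration:
RBD provisioner resources:
```yaml
csiRBDProvisionerResource: |
  - name: csi-provisioner
    resource:
      requests:
        cpu: 10m
  ...
```
RBD plugin resources:
```yaml
csiRBDPluginResource: |
  - name: driver-registrar
    resource:
      requests:
        memory: 128Mi
        cpu: 5m
      limits:
        memory: 256Mi
  ...
```
CephFS provisioner and plugin resources (similar format).
NFS provisioner and plugin resources (similar format).
Monitoring:
```yaml
monitoring:
  enabled: true
```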
Cluster configuration
Toolbox:
```yaml
toolbox:
  enabled: true
  resources:
    requests:
      cpu: '10m'
```
Ceph cluster spec:
```yaml
cephClusterSpec:
  dashboard:
    port: 7000
  labels:
    monitoring:
      release: prometheus-stack
  resources:
    mgr:
      requests:
        cpu: "50m"
    mon:
      requests:
        cpu: "100m"
    ...
  removeOSDsIfOutAndSafeToRemove: true
```
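Removing an OSD
Stop the Rook operator:
```bash
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
```
Mark the OSD as out:
```bash
ceph osd out osd.<ID>
```
Make sure the OSD is down:
```bash
kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID> --replicas=0
ceph osd down osd.<ID>
```
Wait for backfilling to finish (all PGs report `active+clean`).
Remove the OSD:
```bash
ceph osd purge <ID> --yes-i-really-mean-it
# purge normally removes the auth key as well; this covers a leftover entry
ceph auth del osd.<ID>
# remove the host bucket from the CRUSH map
ceph osd crush remove <nodeName>
```
Verify:
```bash
ceph osd tree
```
Restart the Rook operator.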
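The waiting step and the final operator restart can be sketched as below. This is an illustration under stated assumptions, not the author's script: the function names are hypothetical, the toolbox deployment is assumed to be named `rook-ceph-tools`, and the parser assumes the one-line text form of `ceph pg stat` (`N pgs: <count> <state>, ...; <usage>`). States that start with `active+clean`, such as `active+clean+scrubbing`, are counted as clean.

```bash
# Reads one line of `ceph pg stat` text on stdin, for example
#   "33 pgs: 30 active+clean, 3 active+remapped+backfilling; 1 GiB data, ..."
# Succeeds only if every PG is in a state that starts with active+clean.
pgs_all_clean() {
  awk '
    NR == 1 {
      sub(/;.*/, "")                  # drop the capacity summary after ";"
      total = $1 + 0                  # leading "N pgs:" count
      sub(/^[0-9]+ pgs: */, "")       # keep only the "count state" pairs
      n = split($0, parts, /, */)
      for (i = 1; i <= n; i++) {
        split(parts[i], f, " ")
        if (f[2] ~ /^active[+]clean/) clean += f[1]
      }
      ok = (total > 0 && clean == total)
    }
    END { exit (ok ? 0 : 1) }
  '
}

# Polls the cluster until all PGs are clean. The 30 second interval is arbitrary.
# If kubectl or ceph fails, the input is empty and the loop keeps waiting.
wait_for_clean() {
  until kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg stat | pgs_all_clean; do
    sleep 30
  done
}

# Scales the operator back up, reversing the first step of the procedure.
restart_operator() {
  kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
}
```

Run `wait_for_clean` before purging the OSD, and `restart_operator` once `ceph osd tree` no longer lists it.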
Disk partitioning
List the available disks:
```bash
sudo fdisk -l
```
Partition a disk:
```bash
sudo fdisk /dev/sda
# Use `n` to create a partition and `w` to write the changes.
```
Cleaning a device
Wipe the partitions:
```bash
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
```
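Because both commands destroy data, and an empty or mistyped `$DISK` is an easy mistake, a small guard can be placed in front of them. The sketch below is an assumption rather than part of the original procedure. The function names are hypothetical, and the wipe commands themselves are the same two shown above.

```bash
# Succeeds only if the argument is non-empty and names an existing block device.
is_block_device() {
  [ -n "$1" ] && [ -b "$1" ]
}

# Wipes the partition table and the first 100 MiB of the given disk,
# but refuses to run unless the argument is a block device.
zap_disk() {
  disk="$1"
  if ! is_block_device "$disk"; then
    echo "refusing to wipe '$disk': not a block device" >&2
    return 1
  fi
  sgdisk --zap-all "$disk"
  dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync
}
```

For example, `zap_disk /dev/sdb` wipes that disk, while `zap_disk ""` or a path to a regular file returns an error without touching anything.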
Exposing the monitoring GUI
Certificate definition (cert-manager):
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ceph-sololude-certificate
  namespace: istio-ingress
spec:
  secretName: ceph-ingress-cert
  commonName: ceph.sololude.com
  dnsNames:
    - ceph.sololude.com
  issuerRef:
    name: sololude-issuer
```
Gateway definition (Istio):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: rook-ceph-dashboard-gw
  namespace: rook-ceph
spec:
  selector:
    app: istio-ingressgateway
  servers:
    - port:
        number: 443
        name: https-ceph
        protocol: HTTPS
      hosts:
        - ceph.sololude.com
      tls:
        mode: SIMPLE
        credentialName: ceph-ingress-cert
    - port:
        number: 80
        name: http-ceph
        protocol: HTTP
      hosts:
        - ceph.sololude.com
```
VirtualService definition (Istio):
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ceph-gateway-vs
  namespace: rook-ceph
spec:
  hosts:
    - ceph.sololude.com
  gateways:
    - rook-ceph-dashboard-gw
  http:
    - route:
        - destination:
            host: rook-ceph-mgr-dashboard
```
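Once the three resources are applied, the route can be checked from outside the cluster and the dashboard login retrieved. This is a hedged sketch: the function names are hypothetical, and it assumes Rook stored the `admin` password in a secret named `rook-ceph-dashboard-password` in the `rook-ceph` namespace, which is not stated in this guide.

```bash
# Decodes a base64 value read from stdin.
# Kubernetes stores secret data base64-encoded.
decode_secret() {
  base64 --decode
}

# Prints the dashboard admin password followed by a newline.
# Assumption: the secret rook-ceph-dashboard-password exists in rook-ceph.
dashboard_password() {
  kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
    -o jsonpath="{['data']['password']}" | decode_secret
  echo
}

# Fetches only the response headers from the exposed host, which is
# enough to see whether the gateway answers and the certificate is accepted.
check_dashboard() {
  curl -sSI https://ceph.sololude.com
}
```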
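Issues and troubleshooting
Service port change:
Set `cephClusterSpec.dashboard.port=7000` in the Helm values.
OSD keyring mismatch:
Retrieve the keyring and resolve the mismatch.
Entity exists but the key does not match:
Delete the old auth entry:
```bash
ceph auth del osd.x
```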
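Before deleting an auth entry, it may be worth confirming that the keys really differ. The sketch below compares the key held by the monitors with the one in a keyring file. The function names are hypothetical, the keyring path is left to the caller because this guide does not state where it lives, and the parser assumes the usual keyring layout with a `key = <value>` line.

```bash
# Prints the value of the first "key = ..." line in keyring text on stdin.
keyring_key() {
  awk '$1 == "key" && $2 == "=" { print $3; exit }'
}

# Compares the cluster's key for osd.<ID> with a local keyring file.
# $1 is the OSD ID, $2 is the path to the keyring file (both caller-supplied).
compare_osd_key() {
  osd_id="$1"
  keyring_file="$2"
  cluster_key="$(ceph auth get-key "osd.${osd_id}")"
  local_key="$(keyring_key < "$keyring_file")"
  if [ "$cluster_key" = "$local_key" ]; then
    echo "osd.${osd_id}: keys match"
  else
    echo "osd.${osd_id}: keys differ" >&2
    return 1
  fi
}
```

If `compare_osd_key` reports a difference, the stale entry is the one to remove with `ceph auth del`.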
Monitoring
Enable monitoring in the Helm values:
```yaml
monitoring:
  enabled: true
```
Add the monitoring label:
```yaml
cephClusterSpec:
  labels:
    monitoring:
      release: prometheus-stack
```
Upgrading
Upgrade Helm itself:
```bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```
Upgrade the release with Helm:
```bash
helm upgrade -n rook-ceph rook-ceph rook-release/rook-ceph -f values.yaml
```
