Prometheus HA: high availability with an M3DB cluster as remote storage


Introduction

This article describes a highly available monitoring setup built on prometheus + keepalived + haproxy + an m3db cluster.

Reposted from CSDN.

This article walks you step by step through building a highly available Prometheus-based monitoring cluster (hands-on focus; concepts are only covered where necessary).

You will end up with a monitoring cluster that has no single point of failure and can monitor physical hosts, OpenStack VMs, OpenStack services, MySQL, memcache, RabbitMQ, and more. The monitoring data is stored remotely, so it survives node failures, and it is aggregatable: data from the multiple monitoring servers is merged into a single, more detailed, fully consistent copy.

As usual, here is the architecture diagram. It is hand-drawn, so bear with it; the pro2 node is configured identically to pro1 and is omitted because the diagram was getting too tangled:

Architecture diagram

A brief explanation follows; if it doesn't all click yet, don't worry, it will once you have built the thing.

Monitored layer

This layer holds the monitored nodes. A production environment will obviously have more than one; each monitored node runs a set of exporters, which are simply data collectors for the various services on that node.

Core monitoring layer

The Prometheus servers pull data from the exporters on each monitored node (alternatively, metrics can be pushed to the pushgateway, and Prometheus scrapes them from there).
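
For example, a short-lived job can push a metric to the pushgateway with plain curl. A minimal sketch; the gateway address matches the config below, and the backup job name is made up:

# Push a single value to the pushgateway; Prometheus then scrapes it from there
echo "backup_last_success_timestamp $(date +%s)" | \
  curl --data-binary @- http://172.27.124.69:9091/metrics/job/backup/instance/controller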

haproxy + keepalived should be familiar by now: together they join the multiple Prometheus servers into a master/backup cluster behind a virtual IP, which is what gives us high availability.

m3coordinator is the component dedicated to reading and writing data in the M3DB cluster; we use it to write to the cluster.

m3query is dedicated to querying M3DB data. It can query aggregated data and understands Prometheus's PromQL query language, which is very convenient; we use it to read from the cluster.

Data persistence layer

Prometheus stores the data it collects in its local tsdb time-series store, which cannot persist large volumes of data and cannot recover it once corrupted; that clearly fails our availability requirements, so I chose M3DB as the persistent database. The top layer is the M3DB cluster, a multi-active database whose cluster state is maintained through etcd. Three etcd nodes is the ideal count; M3DB requires at least three data nodes, with no upper limit.

Time to build it:

Preparation

CentOS 7 (1708) machines × 6:

172.27.124.66 m3db01
172.27.124.67 m3db02
172.27.124.68 m3db03
172.27.124.69 pro01
172.27.124.70 pro02
172.27.124.72 controller

Virtual IP for keepalived × 1:

172.27.124.71

Packages (by the time you read this, many of these will have gone through several releases; update the versions yourself, these are only examples):

prometheus-2.12.0-rc.0.linux-amd64.tar.gz

Download: https://github.com/prometheus/prometheus/releases/tag/v2.12.0

pushgateway-0.9.1.linux-amd64.tar.gz

Download: https://github.com/prometheus/pushgateway/releases/tag/v0.9.1

collectd_exporter-0.4.0.linux-amd64.tar.gz

Download: https://github.com/prometheus/collectd_exporter/releases/tag/v0.4.0

node_exporter-0.18.1.linux-amd64.tar.gz

Download: https://github.com/prometheus/node_exporter/releases/tag/v0.18.1

mysqld_exporter-0.12.1.linux-amd64.tar.gz

Download: https://github.com/prometheus/mysqld_exporter/releases/tag/v0.12.1

rabbitmq_exporter-0.29.0.linux-amd64.tar.gz

Download: https://github.com/kbudde/rabbitmq_exporter/releases/tag/v0.29.0

openstack-exporter-0.5.0.linux-amd64.tar.gz

Download: https://github.com/openstack-exporter/openstack-exporter/releases/tag/v0.5.0

memcached_exporter-0.6.0.linux-amd64.tar.gz

Download: https://github.com/prometheus/memcached_exporter/releases/tag/v0.6.0

haproxy_exporter-0.10.0.linux-amd64.tar.gz

Download: https://github.com/prometheus/haproxy_exporter/releases/tag/v0.10.0

Note: there are far more exporters than the ones listed above; if you need others, check the official list: https://prometheus.io/docs/instrumenting/exporters/

yajl-2.0.4-4.el7.x86_64.rpm

collectd-5.8.1-1.el7.x86_64.rpm

collectd-virt-5.8.1-1.el7.x86_64.rpm

Search for these three packages at http://rpmfind.net/linux/rpm2html/search.php

No packages are listed for keepalived and haproxy; just install them with yum.

m3_0.13.0_linux_amd64.tar.gz

Download: https://github.com/m3db/m3/releases/tag/v0.13.0

If you'd rather not download these one by one, I have bundled everything here: https://download.csdn.net/download/u014706515/11833017

Installation steps:

Copy all the packages to /home/pkg on all six machines. Set every server's hostname and make sure none are duplicated.

On every machine, run the standard three-step prep: disable the firewall, disable SELinux, and install ntp and sync the time (see the sketch below).
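
A sketch of that prep on CentOS 7, assuming the stock firewalld and ntpd; adapt if you use something else:

# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux now and keep it off after reboots
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Install ntp and sync time
yum -y install ntp
systemctl enable ntpd
systemctl start ntpd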

First, install the Prometheus and pushgateway services

Install Prometheus on pro01 and pro02

cd /home/
#Unpack the tarballs
tar -zxvf /home/pkg/prometheus-2.12.0-rc.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/pushgateway-0.9.1.linux-amd64.tar.gz -C /usr/local/

#Rename
mv /usr/local/prometheus-2.12.0-rc.0.linux-amd64/ /usr/local/prometheus
mv /usr/local/pushgateway-0.9.1.linux-amd64/ /usr/local/pushgateway

#Create the prometheus user and grant ownership
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
mkdir -p /var/lib/prometheus-data
chown -R prometheus:prometheus /var/lib/prometheus-data
chown -R prometheus:prometheus /usr/local/prometheus/
chown -R prometheus:prometheus /usr/local/pushgateway

#Write the Prometheus config; mind the inline comments and substitute values for your own environment
cat > /usr/local/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 20s
  scrape_timeout: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    static_configs:
    - targets:
      - 172.27.124.69:9090  # the address of this Prometheus server; pro01 in this example
      labels:
        instance: prometheus

  - job_name: pushgateway
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.69:9091  # the address of this Prometheus server; pro01 in this example
        labels:
          instance: pushgateway

  - job_name: node_exporter
    scrape_interval: 10s
    static_configs:
      - targets:
        - 172.27.124.72:9100  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended

  - job_name: haproxy_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.72:9101  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended

  - job_name: rabbitmq_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.72:9102  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended

  - job_name: collectd_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.72:9103  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended

  - job_name: mysqld_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.72:9104  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended

  - job_name: memcached_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
        - 172.27.124.72:9105  # the address of the monitored host running the exporter; controller here
        labels:
          instance: controller  # a label naming the monitored host; the hostname is recommended
EOF


# Next, register Prometheus as a systemd service
cat > /usr/lib/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus-data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Register the pushgateway service
cat > /usr/lib/systemd/system/pushgateway.service << EOF
[Unit]
Description=pushgateway
After=local-fs.target network-online.target
Requires=local-fs.target network-online.target

[Service]
Type=simple
ExecStart=/usr/local/pushgateway/pushgateway
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

#Enable at boot and start
systemctl daemon-reload
systemctl enable prometheus
systemctl enable pushgateway
systemctl start prometheus
systemctl start pushgateway
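
To confirm both services came up, Prometheus exposes a health endpoint and the pushgateway serves its own metrics:

# Should report that Prometheus is healthy
curl -s http://localhost:9090/-/healthy
# Should print the pushgateway's own metrics
curl -s http://localhost:9091/metrics | head -n 3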

Install haproxy and keepalived on pro01 and pro02

# Install straight from yum; both are common packages available in the stock repos
yum -y install haproxy keepalived

# Configure haproxy; mind the inline comments
echo "
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

listen prometheus-server
  bind 172.27.124.71:9090     # the keepalived virtual IP; any free port works
  balance roundrobin
  option  tcpka
  option  httpchk
  option  httplog
    server pro01 172.27.124.69:9090  check inter 2000 rise 2 fall 5 # pro01's address
    server pro02 172.27.124.70:9090  check inter 2000 rise 2 fall 5 # pro02's address
" > /etc/haproxy/haproxy.cfg


# Configure keepalived; mind the inline comments
echo "
vrrp_script chk_pro_server_state {
    script \"/etc/prometheus/check_pro.sh\"   # path to the health-check script, created below
    interval 5
    fall 2
    rise 2
}

vrrp_instance haproxy_vip {
  state MASTER
  interface enp0s3   # the NIC that will carry the VIP; run 'ip a' and use the interface that holds this machine's IP
  virtual_router_id 71
  priority 100
  accept
  garp_master_refresh 5
  garp_master_refresh_repeat 2
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 123456
  }

  unicast_src_ip 172.27.124.69    # this (master) server's IP; pro01 here
  unicast_peer {
    172.27.124.70    # list of backup servers; just pro02 here
  }

  virtual_ipaddress {
    172.27.124.71   # the virtual IP
  }

  track_script {
    chk_pro_server_state
  }
}
" > /etc/keepalived/keepalived.conf

# Add the health-check script; it only needs to detect a running prometheus process
mkdir -p /etc/prometheus
cat > /etc/prometheus/check_pro.sh << 'EOF'
#!/bin/bash
count=$(ps aux | grep -v grep | grep prometheus | wc -l)
if [ "$count" -gt 0 ]; then
    exit 0
else
    exit 1
fi
EOF

#Make the check script executable
chmod +x /etc/prometheus/check_pro.sh

#Enable at boot and start
systemctl enable haproxy
systemctl enable keepalived
systemctl start haproxy
systemctl start keepalived
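
Once keepalived is running, the VIP should be bound on the current master (pro01, unless it has failed over). A quick check, assuming the enp0s3 interface name from the config above:

# The VIP shows up only on the MASTER node
ip addr show enp0s3 | grep 172.27.124.71
# haproxy should proxy this to a live Prometheus
curl -s http://172.27.124.71:9090/-/healthy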

Install the exporters on the controller node

# Unpack every exporter into /usr/local
tar -zxvf /home/pkg/node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/collectd_exporter-0.4.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/haproxy_exporter-0.10.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/rabbitmq_exporter-0.29.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/memcached_exporter-0.6.0.linux-amd64.tar.gz -C /usr/local/

# Rename
mv /usr/local/node_exporter-0.18.1.linux-amd64/ /usr/local/node_exporter
mv /usr/local/collectd_exporter-0.4.0.linux-amd64/ /usr/local/collectd_exporter
mv /usr/local/mysqld_exporter-0.12.1.linux-amd64/ /usr/local/mysqld_exporter
mv /usr/local/haproxy_exporter-0.10.0.linux-amd64/ /usr/local/haproxy_exporter
mv /usr/local/rabbitmq_exporter-0.29.0.linux-amd64/ /usr/local/rabbitmq_exporter
mv /usr/local/memcached_exporter-0.6.0.linux-amd64/ /usr/local/memcached_exporter


# Install the rpm packages collectd needs
cd /home
rpm -hiv /home/pkg/yajl-2.0.4-4.el7.x86_64.rpm
rpm -hiv /home/pkg/collectd-5.8.1-1.el7.x86_64.rpm
rpm -hiv /home/pkg/collectd-virt-5.8.1-1.el7.x86_64.rpm

# Create the prometheus user and grant ownership
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter
chown -R prometheus:prometheus /usr/local/collectd_exporter
chown -R prometheus:prometheus /usr/local/mysqld_exporter
chown -R prometheus:prometheus /usr/local/haproxy_exporter/
chown -R prometheus:prometheus /usr/local/rabbitmq_exporter/
chown -R prometheus:prometheus /usr/local/memcached_exporter/


# Add a guest user to MySQL (no MySQL? then there's nothing for mysqld_exporter to scrape; go install it first)
mysql -uroot -p{your mysql root password} -e "grant replication client,process on *.* to guest@'%' identified by 'guest';"
mysql -uroot -p{your mysql root password} -e "grant select on performance_schema.* to guest@'%';"

# Write the collectd config (this is what enables OpenStack VM monitoring); mind the inline comments
cat >> /etc/collectd.conf << EOF
LoadPlugin cpu
LoadPlugin memory
LoadPlugin interface
LoadPlugin write_http
LoadPlugin virt
<Plugin cpu>
  ReportByCpu true
  ReportByState true
  ValuesPercentage true
</Plugin>
<Plugin memory>
  ValuesAbsolute true
  ValuesPercentage false
</Plugin>
<Plugin interface>
  Interface "enp0s3"   # the physical NIC behind the OpenStack public network
  IgnoreSelected false
</Plugin>
<Plugin write_http>
  <Node "collectd_exporter">
    URL "http://172.27.124.72:9103/collectd-post" # the IP of the monitored controller node
    Format "JSON"
    StoreRates false
  </Node>
</Plugin>
<Plugin virt>
  Connection "qemu:///system"
  RefreshInterval 10
  HostnameFormat name
  PluginInstanceFormat name
  BlockDevice "/:hd[a-z]/"
  IgnoreSelected true
</Plugin>
EOF
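
Before starting collectd you can have it parse the config in test mode; a quick sanity check:

# -t parses the config and exits; a non-zero exit code means a config error
/usr/sbin/collectd -t -C /etc/collectd.conf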

# Create the mysqld_exporter credentials file
mkdir -p /etc/mysqld_exporter/conf
cat > /etc/mysqld_exporter/conf/.my.cnf << EOF
[client]
user=guest
password=guest
host=172.27.124.72  # your MySQL address
port=3306  # your MySQL port
EOF
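
It is worth confirming the guest credentials actually work before starting the exporter; a one-liner that reuses the same file:

# --defaults-file must come first; a failure here means the grants above didn't take
mysql --defaults-file=/etc/mysqld_exporter/conf/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'Uptime';"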

# Environment variables for rabbitmq_exporter; the user and password must already exist in RabbitMQ (creating them is not shown here)
cat > /etc/sysconfig/rabbitmq_exporter << EOF
RABBIT_USER="guest"
RABBIT_PASSWORD="guest"
OUTPUT_FORMAT="JSON"
PUBLISH_PORT="9102"
RABBIT_URL="http://172.27.124.72:15672"
EOF

# Register the node_exporter service
cat > /usr/lib/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Register the haproxy_exporter service
cat > /usr/lib/systemd/system/haproxy_exporter.service << EOF
[Unit]
Description=Prometheus HAproxy Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter --haproxy.scrape-uri=http://admin:admin@172.27.124.72:8080/haproxy?openstack;csv --web.listen-address=:9101
# This is the monitored haproxy's stats address, not the haproxy on the pro nodes.
# If you also want to monitor the pro01/pro02 haproxy, install haproxy_exporter there
# too and add it as a scrape job in Prometheus.

[Install]
WantedBy=multi-user.target
EOF

# Register the rabbitmq_exporter service
cat > /usr/lib/systemd/system/rabbitmq_exporter.service << EOF
[Unit]
Description=Prometheus RabbitMQ Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
EnvironmentFile=/etc/sysconfig/rabbitmq_exporter
ExecStart=/usr/local/rabbitmq_exporter/rabbitmq_exporter

[Install]
WantedBy=multi-user.target
EOF

# Register the collectd_exporter service
cat > /usr/lib/systemd/system/collectd_exporter.service << EOF
[Unit]
Description=Collectd_exporter
After=network-online.target
Requires=network-online.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/collectd_exporter/collectd_exporter --web.listen-address=:9103 --web.collectd-push-path=/collectd-post
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF


# Register the collectd service
cat > /usr/lib/systemd/system/collectd.service << EOF
[Unit]
Description=Collectd statistics daemon
Documentation=man:collectd(1) man:collectd.conf(5)
After=local-fs.target network-online.target
Requires=local-fs.target network-online.target

[Service]
ExecStart=/usr/sbin/collectd
Restart=on-failure
Type=notify

[Install]
WantedBy=multi-user.target
EOF

# Register the mysqld_exporter service
cat > /usr/lib/systemd/system/mysqld_exporter.service << EOF
[Unit]
Description=Prometheus MySQL Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/etc/mysqld_exporter/conf/.my.cnf \
--web.listen-address=:9104 \
--collect.global_status \
--collect.info_schema.innodb_metrics \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.tablestats \
--collect.global_variables \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--collect.info_schema.tables \
--collect.slave_status

[Install]
WantedBy=multi-user.target
EOF

# Register the memcached_exporter service
cat > /usr/lib/systemd/system/memcached_exporter.service << EOF
[Unit]
Description=Prometheus Memcached Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/memcached_exporter/memcached_exporter --memcached.address=127.0.0.1:11211 --web.listen-address=:9105

[Install]
WantedBy=multi-user.target
EOF

# Enable at boot and start
systemctl daemon-reload
systemctl enable node_exporter
systemctl enable haproxy_exporter
systemctl enable rabbitmq_exporter
systemctl enable collectd
systemctl enable collectd_exporter
systemctl enable mysqld_exporter
systemctl enable memcached_exporter

systemctl start node_exporter
systemctl start haproxy_exporter
systemctl start rabbitmq_exporter
systemctl start collectd
systemctl start collectd_exporter
systemctl start mysqld_exporter
systemctl start memcached_exporter
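
With everything started, each exporter should now answer on its port; a quick spot check:

# 9100=node 9101=haproxy 9102=rabbitmq 9103=collectd 9104=mysqld 9105=memcached
for port in 9100 9101 9102 9103 9104 9105; do
    echo "== port $port =="
    curl -s http://localhost:$port/metrics | head -n 3
done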





Configure the kernel parameters m3db needs on m3db01, m3db02, and m3db03

sysctl -w vm.max_map_count=3000000
sysctl -w vm.swappiness=1
sysctl -n fs.file-max
sysctl -n fs.nr_open
sysctl -w fs.file-max=3000000
sysctl -w fs.nr_open=3000000
ulimit -n 3000000

# Making these permanent is left to you (a sketch follows below).
# You can skip this step, but the flood of warnings gets unbearable.
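
A sketch of making them permanent, via the usual sysctl and limits files:

# Persist the kernel parameters
cat >> /etc/sysctl.conf << EOF
vm.max_map_count = 3000000
vm.swappiness = 1
fs.file-max = 3000000
fs.nr_open = 3000000
EOF
sysctl -p

# Persist the open-files ulimit
cat >> /etc/security/limits.conf << EOF
* soft nofile 3000000
* hard nofile 3000000
EOF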

Install and start the M3DB cluster on m3db01, m3db02, and m3db03

This step also brings up the etcd cluster: m3dbnode embeds etcd, configured via the seedNodes section below.

# Unpack the tarball
cd /home
tar -zxvf /home/pkg/m3_0.13.0_linux_amd64.tar.gz

# Rename
mv /home/m3_0.13.0_linux_amd64 /home/m3db

# Create the config file /home/m3db/m3dbnode.yml with the following content
cat > /home/m3db/m3dbnode.yml << EOF
coordinator:
  listenAddress:
    type: "config"
    value: "0.0.0.0:7201"

  local:
    namespaces:
      - namespace: default  # the unaggregated namespace; required, you get an error without it
        type: unaggregated
        retention: 48h
      - namespace: agg    # the aggregated namespace; I call it agg, name it as you like
        type: aggregated
        retention: 48h
        resolution: 10s

  logging:
    level: info

  metrics:
    scope:
      prefix: "coordinator"
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7203 # until https://github.com/m3db/m3/issues/682 is resolved
    sanitization: prometheus
    samplingRate: 1.0
    extended: none

  tagOptions:
    # Configuration setting for generating metric IDs from tags.
    idScheme: quoted

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

  hostID:
    resolver: config
    value: m3db01       # this machine's hostname; change it per node (m3db02, m3db03)


  config:
    service:
      env: default_env
      zone: embedded
      service: m3db
      cacheDir: /var/lib/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:
            - 172.27.124.66:2379        # etcd endpoints; m3db embeds etcd, so just list the m3db nodes
            - 172.27.124.67:2379
            - 172.27.124.68:2379
    seedNodes:
      initialCluster:
        - hostID: m3db01        # etcd host IDs; etcd is co-located, so the names and IPs below match the m3db nodes
          endpoint: http://172.27.124.66:2380
        - hostID: m3db02
          endpoint: http://172.27.124.67:2380
        - hostID: m3db03
          endpoint: http://172.27.124.68:2380

  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority

  gcPercentage: 100

  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms

  bootstrap:
    bootstrappers:
        - filesystem
        - commitlog
        - peers
        - uninitialized_topology
    commitlog:
      returnUnfulfilledForCorruptCommitLogFiles: false

  cache:
    series:
      policy: lru
    postingsList:
      size: 262144

  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
      calculationType: fixed
      size: 2097152

  fs:
    filePathPrefix: /var/lib/m3db
EOF

# Start m3db once all three nodes are configured.
# Being lazy, I start it with nohup instead of registering a service; a systemd sketch follows if you want one.
# Run on all three nodes
nohup /home/m3db/m3dbnode -f /home/m3db/m3dbnode.yml &
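
If you would rather register it properly, here is a minimal unit in the same style as the earlier ones (a sketch; adjust paths as needed), to run on all three nodes:

cat > /usr/lib/systemd/system/m3dbnode.service << EOF
[Unit]
Description=M3DB node
After=network-online.target

[Service]
Type=simple
LimitNOFILE=3000000
ExecStart=/home/m3db/m3dbnode -f /home/m3db/m3dbnode.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable m3dbnode
systemctl start m3dbnode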

Initialize m3db and create the namespaces from any one m3db node

(A namespace is the same idea as a database; this step creates the databases.)

# Install curl
yum -y install curl

#Initialize the unaggregated placement (a placement is like a domain: it simply describes what the cluster looks like)
#Ignore the big blob of output it returns; that means the call succeeded.
curl -X POST localhost:7201/api/v1/services/m3db/placement/init -d '{
    "num_shards": "512",
    "replication_factor": "3",
    "instances": [
        {
            "id": "m3db01",
            "isolation_group": "us-east1-a",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.66:9000",
            "hostname": "172.27.124.66",
            "port": 9000
        },
        {
            "id": "m3db02",
            "isolation_group": "us-east1-b",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.67:9000",
            "hostname": "172.27.124.67",
            "port": 9000
        },
        {
            "id": "m3db03",
            "isolation_group": "us-east1-c",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.68:9000",
            "hostname": "172.27.124.68",
            "port": 9000
        }
    ]
}'

#Initialize the aggregated placement
curl -X POST localhost:7201/api/v1/services/m3aggregator/placement/init -d '{
    "num_shards": "512",
    "replication_factor": "3",
    "instances": [
        {
            "id": "m3db01",
            "isolation_group": "us-east1-a",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.66:9000",
            "hostname": "172.27.124.66",
            "port": 9000
        },
        {
            "id": "m3db02",
            "isolation_group": "us-east1-b",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.67:9000",
            "hostname": "172.27.124.67",
            "port": 9000
        },
        {
            "id": "m3db03",
            "isolation_group": "us-east1-c",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "172.27.124.68:9000",
            "hostname": "172.27.124.68",
            "port": 9000
        }
    ]
}'



#Create the default unaggregated namespace
curl -X POST http://localhost:7201/api/v1/database/create -d '{
  "type": "cluster",
  "namespaceName": "default",
  "retentionTime": "48h",
  "numShards": "512",
  "replicationFactor": "3"
}'

#Create the agg aggregated namespace
curl -X POST http://localhost:7201/api/v1/database/create -d '{
  "type": "cluster",
  "namespaceName": "agg",
  "retentionTime": "48h",
  "numShards": "512",
  "replicationFactor": "3"
}'
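
Both the placement and the namespaces can be read back from the coordinator API, which makes for an easy check:

# Should list all three instances, each owning shards
curl -s localhost:7201/api/v1/services/m3db/placement
# Should list the default and agg namespaces
curl -s localhost:7201/api/v1/namespace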

Install and start m3coordinator on pro01 and pro02

# Unpack the tarball
cd /home
tar -zxvf /home/pkg/m3_0.13.0_linux_amd64.tar.gz

# Rename
mv /home/m3_0.13.0_linux_amd64 /home/m3db

# Create the config file /home/m3db/m3coordinator.yml with the following content
cat > /home/m3db/m3coordinator.yml << EOF
listenAddress:
  type: "config"
  value: "0.0.0.0:8201"
# m3coordinator can share a host with an m3db node, so don't use port 7201 here or they will clash

logging:
  level: info

metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:8203 # likewise, don't use 7203 here
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

tagOptions:
  idScheme: quoted

clusters:
   - namespaces:
      - namespace: default
        type: unaggregated # the unaggregated namespace; required, you get an error without it
        retention: 48h
      - namespace: agg
        type: aggregated   # the aggregated namespace; I call it agg, name it as you like
        retention: 48h
        resolution: 10s
     client:
       config:
         service:
           env: default_env
           zone: embedded
           service: m3db
           cacheDir: /var/lib/m3kv
           etcdClusters:
             - zone: embedded
               endpoints:      # etcd endpoints; m3db embeds etcd, so list the m3db nodes
                 - 172.27.124.66:2379
                 - 172.27.124.67:2379
                 - 172.27.124.68:2379
       writeConsistencyLevel: majority
       readConsistencyLevel: unstrict_majority
EOF

#Lazy again: plain nohup. Don't copy me; register a proper system service.

nohup /home/m3db/m3coordinator -f /home/m3db/m3coordinator.yml &
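
A quick way to confirm the coordinator is alive is its metrics endpoint, on the port configured above:

curl -s http://localhost:8203/metrics | head -n 3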

Install and start m3query on pro01 and pro02

# Create the config file /home/m3db/m3query.yml with the following content
cat > /home/m3db/m3query.yml << EOF
listenAddress:
  type: "config"
  value: "0.0.0.0:5201"
#m3query can share a host with an m3db node, so don't use port 7201 here or they will clash

logging:
  level: info

metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:5203 # likewise, don't use 7203 here
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

tagOptions:
  idScheme: quoted

clusters:
  - namespaces:
      - namespace: default   # the unaggregated namespace; required, you get an error without it
        type: unaggregated
        retention: 48h
      - namespace: agg       # the aggregated namespace; name it as you like, but keep it consistent with the earlier configs
        type: aggregated
        retention: 48h
        resolution: 10s
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:       # etcd endpoints; m3db embeds etcd, so list the m3db nodes
                - 172.27.124.66:2379
                - 172.27.124.67:2379
                - 172.27.124.68:2379
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
      writeTimeout: 10s
      fetchTimeout: 15s
      connectTimeout: 20s
      writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
      fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
      backgroundHealthCheckFailLimit: 4
      backgroundHealthCheckFailThrottleFactor: 0.5
EOF

#Lazy again: plain nohup. Don't copy me; register a proper system service.

nohup /home/m3db/m3query -f /home/m3db/m3query.yml &

Point the pro nodes at the m3db cluster as their data store

# All it takes is appending a few lines to the config on pro01 and pro02
echo "
remote_read:
  - url: \"http://localhost:5201/api/v1/prom/remote/read\"
    # the read endpoint: use the m3query port
    read_recent: true
remote_write:
  - url: \"http://localhost:8201/api/v1/prom/remote/write\"
    # the write endpoint: use the m3coordinator port
" >> /usr/local/prometheus/prometheus.yml

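Before restarting, it is cheap to validate the appended config with the promtool binary that ships in the Prometheus tarball:

/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
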
# Restart the prometheus service and be patient: m3db takes quite a while to initialize its databases.
# Watch the m3coordinator and m3query logs to see when remote reads and writes begin.
# Run on both pro nodes
systemctl restart prometheus

Verify it works

Verification is simple: after the cluster has run for a while, query data through m3query. If the data comes back, it really was written to the m3db cluster, and m3query has fetched the aggregated result.

# Open this directly in a browser.
# Adjust the timestamps, step, and query key to your own cluster; this is only an example.
# The IP can be pro01's or pro02's, but use the m3query port; the others don't speak PromQL.
http://172.27.124.69:5201/api/v1/query_range?query=mysql_up&start=1570084589.681&end=1570088189.681&step=14&_=1570087504176

Test the high availability yourself: stop one m3db node, then stop either Prometheus node, and check whether monitoring data is still served through the virtual IP.

In my environment, checking http://172.27.124.71:9090/targets is enough.
