Introduction
This article describes a highly available monitoring solution built on Prometheus, keepalived, haproxy, and an M3DB cluster.
Reposted from: CSDN
It walks you step by step through building a Prometheus-based highly available monitoring cluster (hands-on focus; concepts are explained only where necessary).
You will end up with a highly available monitoring cluster with no single point of failure that can monitor physical hosts, OpenStack VMs, OpenStack services, MySQL, Memcached, RabbitMQ, and more. The monitoring data is stored remotely, so it survives node failures, and it can be aggregated: data from multiple Prometheus servers is merged into one richer, fully consistent dataset.
As usual, here is the architecture diagram. It is hand-drawn, so bear with it; pro02 is configured exactly like pro01 and is left out of the diagram to avoid a tangle of lines:
Architecture diagram
A short explanation follows. Don't worry if it doesn't all make sense yet; it will once you have built the thing.
Monitored layer
These are the monitored nodes. In production there will of course be more than one; each monitored node runs a set of exporters, which are simply data collectors for the services on that node.
Core monitoring layer
The Prometheus servers pull data from the exporters on each monitored node (exporters can also push data actively through pushgateway).
haproxy + keepalived should need no introduction: they turn the multiple Prometheus servers into an active/standby cluster, giving the monitoring layer high availability.
m3coordinator is the component that reads and writes data against the M3DB cluster; we use it to write data into M3DB.
m3query is dedicated to querying M3DB. It can query aggregated data and speaks Prometheus's PromQL, which is very convenient; we use it to read data back from the M3DB cluster.
Data persistence layer
Prometheus itself stores the data it collects in a local TSDB (time-series database), which cannot persist large volumes of data and cannot recover once the data is corrupted. That clearly does not meet our high-availability requirements, so I chose M3DB as the persistent store. The top layer is the M3DB cluster, an active-active database that keeps its cluster state in etcd; three etcd nodes are recommended, and M3DB requires at least three data nodes, with no upper limit.
Contents
- Introduction
- Architecture diagram
- Preparation
- Installation steps:
- Install Prometheus on pro01 and pro02
- Install haproxy and keepalived on pro01 and pro02
- Install the exporters on the controller node
- Configure the kernel parameters M3DB needs on m3db01, m3db02, and m3db03
- Install and start the M3DB cluster on m3db01, m3db02, and m3db03
- Initialize m3db and create namespaces from any one m3db node
- Install and start m3coordinator on pro01 and pro02
- Install and start m3query on pro01 and pro02
- Point the pro nodes at the m3db cluster as the data source
- Verify that it works
Let's get started:
Preparation
6 × CentOS 7 (1708) machines:
172.27.124.66 m3db01
172.27.124.67 m3db02
172.27.124.68 m3db03
172.27.124.69 pro01
172.27.124.70 pro02
172.27.124.72 controller
1 × virtual IP for keepalived:
172.27.124.71
Packages (by the time you read this, many of these will have newer releases; update the versions as needed, these are only examples):
prometheus-2.12.0-rc.0.linux-amd64.tar.gz
Download: https://github.com/prometheus/prometheus/releases/tag/v2.12.0
pushgateway-0.9.1.linux-amd64.tar.gz
Download: https://github.com/prometheus/pushgateway/releases/tag/v0.9.1
collectd_exporter-0.4.0.linux-amd64.tar.gz
Download: https://github.com/prometheus/collectd_exporter/releases/tag/v0.4.0
node_exporter-0.18.1.linux-amd64.tar.gz
Download: https://github.com/prometheus/node_exporter/releases/tag/v0.18.1
mysqld_exporter-0.12.1.linux-amd64.tar.gz
Download: https://github.com/prometheus/mysqld_exporter/releases/tag/v0.12.1
rabbitmq_exporter-0.29.0.linux-amd64.tar.gz
Download: https://github.com/kbudde/rabbitmq_exporter/releases/tag/v0.29.0
openstack-exporter-0.5.0.linux-amd64.tar.gz
Download: https://github.com/openstack-exporter/openstack-exporter/releases/tag/v0.5.0
memcached_exporter-0.6.0.linux-amd64.tar.gz
Download: https://github.com/prometheus/memcached_exporter/releases/tag/v0.6.0
haproxy_exporter-0.10.0.linux-amd64.tar.gz
Download: https://github.com/prometheus/haproxy_exporter/releases/tag/v0.10.0
Note: these are far from all the exporters that exist; find the ones you need on the official list: https://prometheus.io/docs/instrumenting/exporters/
yajl-2.0.4-4.el7.x86_64.rpm
collectd-5.8.1-1.el7.x86_64.rpm
collectd-virt-5.8.1-1.el7.x86_64.rpm
Search for these three packages at http://rpmfind.net/linux/rpm2html/search.php
keepalived and haproxy need no downloads here; just install them with yum.
m3_0.13.0_linux_amd64.tar.gz
Download: https://github.com/m3db/m3/releases/tag/v0.13.0
If you would rather not download them one by one, I have bundled them all here: https://download.csdn.net/download/u014706515/11833017
Installation steps:
Copy all the packages to /home/pkg on all six machines. Set the hostname of every server and make sure none of them clash.
On every machine, do the three standard chores: disable the firewall, disable SELinux, and install NTP and synchronize the time (a command sketch follows below).
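For reference, here is one way to do those chores on CentOS 7 (a sketch; run it on every node, substituting that machine's own hostname):
# Example prep commands for one node
hostnamectl set-hostname pro01   # use the correct name on each machine
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
yum -y install ntp
systemctl enable ntpd && systemctl start ntpd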
First, install the Prometheus and pushgateway services.
Install Prometheus on pro01 and pro02
cd /home/
#Unpack the packages
tar -zxvf /home/pkg/prometheus-2.12.0-rc.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/pushgateway-0.9.1.linux-amd64.tar.gz -C /usr/local/
#Rename
mv /usr/local/prometheus-2.12.0-rc.0.linux-amd64/ /usr/local/prometheus
mv /usr/local/pushgateway-0.9.1.linux-amd64/ /usr/local/pushgateway
#Create the prometheus user and set permissions
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
mkdir -p /var/lib/prometheus-data
chown -R prometheus:prometheus /var/lib/prometheus-data
chown -R prometheus:prometheus /usr/local/prometheus/
chown -R prometheus:prometheus /usr/local/pushgateway
#Add the Prometheus configuration; note the inline comments and substitute values for your own environment
cat > /usr/local/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 20s
  scrape_timeout: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.69:9090   # address of this Prometheus server; in this example, pro01
        labels:
          instance: prometheus
  - job_name: pushgateway
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.69:9091   # address of this Prometheus server; in this example, pro01
        labels:
          instance: pushgateway
  - job_name: node_exporter
    scrape_interval: 10s
    static_configs:
      - targets:
          - 172.27.124.72:9100   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
  - job_name: haproxy_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.72:9101   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
  - job_name: rabbitmq_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.72:9102   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
  - job_name: collectd_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.72:9103   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
  - job_name: mysqld_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.72:9104   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
  - job_name: memcached_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 172.27.124.72:9105   # address of the monitored server running the exporter; here, controller
        labels:
          instance: controller   # name of the monitored server; the hostname is recommended
EOF
# Next, register Prometheus as a system service so it can be managed with systemctl
cat > /usr/lib/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus-data
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Register the pushgateway service
cat > /usr/lib/systemd/system/pushgateway.service << EOF
[Unit]
Description=pushgateway
After=local-fs.target network-online.target
Requires=local-fs.target network-online.target
[Service]
Type=simple
ExecStart=/usr/local/pushgateway/pushgateway
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
#Enable at boot and start
systemctl daemon-reload
systemctl enable prometheus
systemctl enable pushgateway
systemctl start prometheus
systemctl start pushgateway
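Before moving on, it is worth checking that both services answer. Prometheus exposes a /-/healthy endpoint and pushgateway serves /metrics (adjust the host if you are checking remotely):
systemctl status prometheus pushgateway --no-pager
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9090/-/healthy   # expect 200 from Prometheus
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9091/metrics     # expect 200 from pushgateway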
Install haproxy and keepalived on pro01 and pro02
# Install straight from yum; both are mainstream packages available in the stock repos
yum -y install haproxy keepalived
# Configure haproxy; note the inline comments
echo "
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
stats socket /var/lib/haproxy/stats
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
listen prometheus-server
bind 172.27.124.71:9090 # this is the virtual IP (VIP); pick any port that does not conflict
balance roundrobin
option tcpka
option httpchk
option httplog
server pro01 172.27.124.69:9090 check inter 2000 rise 2 fall 5 # address of pro01
server pro02 172.27.124.70:9090 check inter 2000 rise 2 fall 5 # address of pro02
" > /etc/haproxy/haproxy.cfg
# Configure keepalived; note the inline comments
echo "
vrrp_script chk_pro_server_state {
script \"/etc/prometheus/check_pro.sh\" # path of the health-check script, created below
interval 5
fall 2
rise 2
}
vrrp_instance haproxy_vip {
state MASTER
interface enp0s3 # NIC that carries the VIP; use 'ip a' to see which interface holds this machine's IP
virtual_router_id 71
priority 100
accept
garp_master_refresh 5
garp_master_refresh_repeat 2
advert_int 1
authentication {
auth_type PASS
auth_pass 123456
}
unicast_src_ip 172.27.124.69 # IP of the master server; pro01 in my setup
unicast_peer {
172.27.124.70 # list of backup server IPs; here there is only pro02
}
virtual_ipaddress {
172.27.124.71 # the virtual IP address
}
track_script {
chk_pro_server_state
}
}
" > /etc/keepalived/keepalived.conf
# Add the health-check script; checking for the prometheus process is enough
mkdir -p /etc/prometheus
cat > /etc/prometheus/check_pro.sh << 'EOF'
#!/bin/bash
count=$(ps aux | grep -v grep | grep prometheus | wc -l)
if [ "$count" -gt 0 ]; then
    exit 0
else
    exit 1
fi
EOF
#Make the check script executable
chmod +x /etc/prometheus/check_pro.sh
#Enable at boot and start
systemctl enable haproxy
systemctl enable keepalived
systemctl start haproxy
systemctl start keepalived
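At this point the VIP should be held by the current keepalived master (pro01 here) and haproxy should forward it to a live Prometheus; a quick check, assuming the addresses used above:
ip a show enp0s3 | grep 172.27.124.71                                      # the VIP should be bound on the master node
curl -s -o /dev/null -w "%{http_code}\n" http://172.27.124.71:9090/graph   # expect 200 through haproxy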
Install the exporters on the controller node
# Unpack all the exporters into /usr/local
tar -zxvf /home/pkg/node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/collectd_exporter-0.4.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/haproxy_exporter-0.10.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/rabbitmq_exporter-0.29.0.linux-amd64.tar.gz -C /usr/local/
tar -zxvf /home/pkg/memcached_exporter-0.6.0.linux-amd64.tar.gz -C /usr/local/
# Rename
mv /usr/local/node_exporter-0.18.1.linux-amd64/ /usr/local/node_exporter
mv /usr/local/collectd_exporter-0.4.0.linux-amd64/ /usr/local/collectd_exporter
mv /usr/local/mysqld_exporter-0.12.1.linux-amd64/ /usr/local/mysqld_exporter
mv /usr/local/haproxy_exporter-0.10.0.linux-amd64/ /usr/local/haproxy_exporter
mv /usr/local/rabbitmq_exporter-0.29.0.linux-amd64/ /usr/local/rabbitmq_exporter
mv /usr/local/memcached_exporter-0.6.0.linux-amd64/ /usr/local/memcached_exporter
# Install the rpm packages collectd needs
cd /home
rpm -hiv /home/pkg/yajl-2.0.4-4.el7.x86_64.rpm
rpm -hiv /home/pkg/collectd-5.8.1-1.el7.x86_64.rpm
rpm -hiv /home/pkg/collectd-virt-5.8.1-1.el7.x86_64.rpm
# Create the prometheus user and set permissions
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter
chown -R prometheus:prometheus /usr/local/collectd_exporter
chown -R prometheus:prometheus /usr/local/mysqld_exporter
chown -R prometheus:prometheus /usr/local/haproxy_exporter/
chown -R prometheus:prometheus /usr/local/rabbitmq_exporter/
chown -R prometheus:prometheus /usr/local/memcached_exporter/
# Create a guest user in MySQL (no MySQL? then there is nothing for the mysqld exporter to scrape; install MySQL first)
mysql -uroot -p{your mysql password} -e "grant replication client,process on *.* to guest@'%' identified by 'guest';"
mysql -uroot -p{your mysql password} -e "grant select on performance_schema.* to guest@'%';"
# Add the collectd configuration (for OpenStack VM monitoring); note the inline comments
cat >> /etc/collectd.conf << EOF
LoadPlugin cpu
LoadPlugin memory
LoadPlugin interface
LoadPlugin write_http
LoadPlugin virt
<Plugin cpu>
ReportByCpu true
ReportByState true
ValuesPercentage true
</Plugin>
<Plugin memory>
ValuesAbsolute true
ValuesPercentage false
</Plugin>
<Plugin interface>
Interface "enp0s3" # name of the physical NIC that carries the OpenStack public network
IgnoreSelected false
</Plugin>
<Plugin write_http>
<Node "collectd_exporter">
URL "http://172.27.124.72:9103/collectd-post" # IP of the monitored node, controller
Format "JSON"
StoreRates false
</Node>
</Plugin>
<Plugin virt>
Connection "qemu:///system"
RefreshInterval 10
HostnameFormat name
PluginInstanceFormat name
BlockDevice "/:hd[a-z]/"
IgnoreSelected true
</Plugin>
EOF
# Add the mysqld exporter credentials file
mkdir -p /etc/mysqld_exporter/conf
cat > /etc/mysqld_exporter/conf/.my.cnf << EOF
[client]
user=guest
password=guest
host=172.27.124.72 #your MySQL address
port=3306 #port
EOF
# Add the environment variables for the rabbitmq exporter; the user and password must already exist in RabbitMQ (creating them is not shown here)
cat > /etc/sysconfig/rabbitmq_exporter << EOF
RABBIT_USER="guest"
RABBIT_PASSWORD="guest"
OUTPUT_FORMAT="JSON"
PUBLISH_PORT="9102"
RABBIT_URL="http://172.27.124.72:15672"
EOF
# Register the node exporter service
cat > /usr/lib/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Register the haproxy exporter service
cat > /usr/lib/systemd/system/haproxy_exporter.service << EOF
[Unit]
Description=Prometheus HAProxy Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter --haproxy.scrape-uri=http://admin:admin@172.27.124.72:8080/haproxy?openstack;csv --web.listen-address=:9101
# The scrape URI points at the monitored haproxy on controller, not the one on the pro nodes.
# If you also want to monitor the haproxy on pro01/pro02, install haproxy_exporter there as well and add it as a Prometheus target.
[Install]
WantedBy=multi-user.target
EOF
# Register the rabbitmq exporter service
cat > /usr/lib/systemd/system/rabbitmq_exporter.service << EOF
[Unit]
Description=Prometheus RabbitMQ Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
EnvironmentFile=/etc/sysconfig/rabbitmq_exporter
ExecStart=/usr/local/rabbitmq_exporter/rabbitmq_exporter
[Install]
WantedBy=multi-user.target
EOF
# Register the collectd exporter service
cat > /usr/lib/systemd/system/collectd_exporter.service << EOF
[Unit]
Description=Collectd Exporter
After=network-online.target
Requires=network-online.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/collectd_exporter/collectd_exporter --web.listen-address=:9103 --web.collectd-push-path=/collectd-post
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Register the collectd service
cat > /usr/lib/systemd/system/collectd.service << EOF
[Unit]
Description=Collectd statistics daemon
Documentation=man:collectd(1) man:collectd.conf(5)
After=local-fs.target network-online.target
Requires=local-fs.target network-online.target
[Service]
ExecStart=/usr/sbin/collectd
Restart=on-failure
Type=notify
[Install]
WantedBy=multi-user.target
EOF
# Register the mysqld exporter service
cat > /usr/lib/systemd/system/mysqld_exporter.service << EOF
[Unit]
Description=Prometheus MySQL Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/etc/mysqld_exporter/conf/.my.cnf \
--web.listen-address=:9104 \
--collect.global_status \
--collect.info_schema.innodb_metrics \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.tablestats \
--collect.global_variables \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--collect.info_schema.tables \
--collect.slave_status
[Install]
WantedBy=multi-user.target
EOF
# Register the memcached exporter service
cat > /usr/lib/systemd/system/memcached_exporter.service << EOF
[Unit]
Description=Prometheus Memcached Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/memcached_exporter/memcached_exporter --memcached.address=127.0.0.1:11211 --web.listen-address=:9105
[Install]
WantedBy=multi-user.target
EOF
# Enable at boot and start
systemctl daemon-reload
systemctl enable node_exporter
systemctl enable haproxy_exporter
systemctl enable rabbitmq_exporter
systemctl enable collectd
systemctl enable collectd_exporter
systemctl enable mysqld_exporter
systemctl enable memcached_exporter
systemctl start node_exporter
systemctl start haproxy_exporter
systemctl start rabbitmq_exporter
systemctl start collectd
systemctl start collectd_exporter
systemctl start mysqld_exporter
systemctl start memcached_exporter
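Every exporter registered in prometheus.yml serves metrics over HTTP, so a quick loop over the ports used above confirms they all came up:
# On controller: each port should return HTTP 200
for port in 9100 9101 9102 9103 9104 9105; do
    echo -n "port $port: "
    curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:$port/metrics
done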
Configure the kernel parameters M3DB needs on m3db01, m3db02, and m3db03
sysctl -w vm.max_map_count=3000000
sysctl -w vm.swappiness=1
sysctl -n fs.file-max
sysctl -n fs.nr_open
sysctl -w fs.file-max=3000000
sysctl -w fs.nr_open=3000000
ulimit -n 3000000
# Making these settings persistent is left to you (see the sketch below).
# You can also skip this step, but you will be flooded with warnings.
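One way to make them persistent (a sketch; the file names are my own choice) is a sysctl drop-in plus a limits.d entry for the open-file limit:
cat > /etc/sysctl.d/99-m3db.conf << EOF
vm.max_map_count = 3000000
vm.swappiness = 1
fs.file-max = 3000000
fs.nr_open = 3000000
EOF
sysctl --system                                   # reload all sysctl drop-ins
cat > /etc/security/limits.d/99-m3db.conf << EOF
* soft nofile 3000000
* hard nofile 3000000
EOF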
Install and start the M3DB cluster on m3db01, m3db02, and m3db03
The etcd cluster comes up as part of this step: M3DB ships an embedded etcd that is seeded via the seedNodes section of the configuration below, so no separate etcd installation is needed.
# Unpack the package
cd /home
tar -zxvf /home/pkg/m3_0.13.0_linux_amd64.tar.gz
# Rename
mv /home/m3_0.13.0_linux_amd64 /home/m3db
# Create the configuration file /home/m3db/m3dbnode.yml with the following content
cat >> /home/m3db/m3dbnode.yml << EOF
coordinator:
  listenAddress:
    type: "config"
    value: "0.0.0.0:7201"
  local:
    namespaces:
      - namespace: default      # unaggregated namespace; required, things break without it
        type: unaggregated
        retention: 48h
      - namespace: agg          # aggregated namespace; I simply call it agg, name yours as you like
        type: aggregated
        retention: 48h
        resolution: 10s
  logging:
    level: info
  metrics:
    scope:
      prefix: "coordinator"
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7203 # until https://github.com/m3db/m3/issues/682 is resolved
    sanitization: prometheus
    samplingRate: 1.0
    extended: none
  tagOptions:
    # Configuration setting for generating metric IDs from tags.
    idScheme: quoted
db:
  logging:
    level: info
  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed
  hostID:
    resolver: config
    value: m3db01               # this machine's hostname
  config:
    service:
      env: default_env
      zone: embedded
      service: m3db
      cacheDir: /var/lib/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:            # etcd endpoints; M3DB embeds etcd, so list the m3db node addresses
            - 172.27.124.66:2379
            - 172.27.124.67:2379
            - 172.27.124.68:2379
    seedNodes:
      initialCluster:
        - hostID: m3db01        # etcd host IDs; etcd runs on the same nodes, so they are also named m3db01 etc., and the IPs below match
          endpoint: http://172.27.124.66:2380
        - hostID: m3db02
          endpoint: http://172.27.124.67:2380
        - hostID: m3db03
          endpoint: http://172.27.124.68:2380
  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004
  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority
  gcPercentage: 100
  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms
  bootstrap:
    bootstrappers:
      - filesystem
      - commitlog
      - peers
      - uninitialized_topology
    commitlog:
      returnUnfulfilledForCorruptCommitLogFiles: false
  cache:
    series:
      policy: lru
    postingsList:
      size: 262144
  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
      calculationType: fixed
      size: 2097152
  fs:
    filePathPrefix: /var/lib/m3db
EOF
# Start m3db once all three m3db nodes have been configured.
# I am lazy and just start it with nohup instead of registering a service; a systemd sketch follows below if you want one.
# Run on all three nodes
nohup /home/m3db/m3dbnode -f /home/m3db/m3dbnode.yml &
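If you prefer not to rely on nohup, a minimal systemd unit along these lines (my own sketch, not shipped with M3) keeps m3dbnode supervised and restarts it on failure:
cat > /usr/lib/systemd/system/m3dbnode.service << EOF
[Unit]
Description=M3DB node
After=network-online.target
[Service]
Type=simple
LimitNOFILE=3000000
ExecStart=/home/m3db/m3dbnode -f /home/m3db/m3dbnode.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable m3dbnode
systemctl start m3dbnode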
Initialize m3db and create namespaces from any one m3db node
(A namespace is essentially a database, so this step is just creating the databases.)
# Install curl
yum -y install curl
#Initialize the unaggregated placement (a placement is roughly a topology description: it says what the cluster looks like)
#Ignore the big blob of JSON that comes back; it means the call succeeded.
curl -X POST localhost:7201/api/v1/services/m3db/placement/init -d '{
"num_shards": "512",
"replication_factor": "3",
"instances": [
{
"id": "m3db01",
"isolation_group": "us-east1-a",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.66:9000",
"hostname": "172.27.124.66",
"port": 9000
},
{
"id": "m3db02",
"isolation_group": "us-east1-b",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.67:9000",
"hostname": "172.27.124.67",
"port": 9000
},
{
"id": "m3db03",
"isolation_group": "us-east1-c",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.68:9000",
"hostname": "172.27.124.68",
"port": 9000
}
]
}'
#Initialize the aggregated placement
curl -X POST localhost:7201/api/v1/services/m3aggregator/placement/init -d '{
"num_shards": "512",
"replication_factor": "3",
"instances": [
{
"id": "m3db01",
"isolation_group": "us-east1-a",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.66:9000",
"hostname": "172.27.124.66",
"port": 9000
},
{
"id": "m3db02",
"isolation_group": "us-east1-b",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.67:9000",
"hostname": "172.27.124.67",
"port": 9000
},
{
"id": "m3db03",
"isolation_group": "us-east1-c",
"zone": "embedded",
"weight": 100,
"endpoint": "172.27.124.68:9000",
"hostname": "172.27.124.68",
"port": 9000
}
]
}'
#Create the default unaggregated namespace
curl -X POST http://localhost:7201/api/v1/database/create -d '{
"type": "cluster",
"namespaceName": "default",
"retentionTime": "48h",
"numShards": "512",
"replicationFactor": "3"
}'
#Create the agg aggregated namespace
curl -X POST http://localhost:7201/api/v1/database/create -d '{
"type": "cluster",
"namespaceName": "agg",
"retentionTime": "48h",
"numShards": "512",
"replicationFactor": "3"
}'
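Before pointing Prometheus at the cluster, it is worth confirming the placement and both namespaces really exist; the coordinator API exposes read endpoints for both (piping through python -m json.tool just pretty-prints the JSON):
# All three instances should be listed and their shards should eventually become AVAILABLE
curl -s localhost:7201/api/v1/services/m3db/placement | python -m json.tool
# Both the default and agg namespaces should appear in the registry
curl -s localhost:7201/api/v1/namespace | python -m json.tool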
Install and start m3coordinator on pro01 and pro02
# Unpack the package
cd /home
tar -zxvf /home/pkg/m3_0.13.0_linux_amd64.tar.gz
# Rename
mv /home/m3_0.13.0_linux_amd64 /home/m3db
# Create the configuration file /home/m3db/m3coordinator.yml with the following content
cat >> /home/m3db/m3coordinator.yml << EOF
listenAddress:
  type: "config"
  value: "0.0.0.0:8201"
  # m3coordinator can run on the same host as m3dbnode, so do not use 7201 here or the ports will conflict
logging:
  level: info
metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:8203 # likewise, do not use 7203
  sanitization: prometheus
  samplingRate: 1.0
  extended: none
tagOptions:
  idScheme: quoted
clusters:
  - namespaces:
      - namespace: default      # unaggregated namespace; required, things break without it
        type: unaggregated
        retention: 48h
      - namespace: agg          # aggregated namespace; I simply call it agg, name yours as you like
        type: aggregated
        retention: 48h
        resolution: 10s
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:        # etcd endpoints; M3DB embeds etcd, so list the m3db node addresses
                - 172.27.124.66:2379
                - 172.27.124.67:2379
                - 172.27.124.68:2379
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
EOF
#Lazy again: started with nohup. Don't copy me; register it as a proper system service.
nohup /home/m3db/m3coordinator -f /home/m3db/m3coordinator.yml &
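A quick way to confirm m3coordinator is up is to check the two ports from its configuration (8201 for the API, 8203 for its own metrics):
ss -tlnp | grep -E '8201|8203'                  # listeners from m3coordinator.yml
curl -s http://localhost:8203/metrics | head    # m3coordinator's own metrics endpoint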
Install and start m3query on pro01 and pro02
# Create the configuration file /home/m3db/m3query.yml with the following content
cat >> /home/m3db/m3query.yml << EOF
listenAddress:
  type: "config"
  value: "0.0.0.0:5201"
  # m3query can run on the same host as m3dbnode, so do not use 7201 here or the ports will conflict
logging:
  level: info
metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:5203 # likewise, do not use 7203
  sanitization: prometheus
  samplingRate: 1.0
  extended: none
tagOptions:
  idScheme: quoted
clusters:
  - namespaces:
      - namespace: default      # unaggregated namespace; required, things break without it
        type: unaggregated
        retention: 48h
      - namespace: agg          # aggregated namespace; I call it agg, but it must match the one created earlier
        type: aggregated
        retention: 48h
        resolution: 10s
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:        # etcd endpoints; M3DB embeds etcd, so list the m3db node addresses
                - 172.27.124.66:2379
                - 172.27.124.67:2379
                - 172.27.124.68:2379
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
      writeTimeout: 10s
      fetchTimeout: 15s
      connectTimeout: 20s
      writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
      fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
      backgroundHealthCheckFailLimit: 4
      backgroundHealthCheckFailThrottleFactor: 0.5
EOF
#Lazy again: started with nohup. Don't copy me; register it as a proper system service.
nohup /home/m3db/m3query -f /home/m3db/m3query.yml &
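m3query can be checked the same way; once some data has been written you can also issue a PromQL instant query against it (assuming it serves the standard /api/v1/query path, alongside the query_range path used in the verification section below):
ss -tlnp | grep -E '5201|5203'                             # listeners from m3query.yml
curl -s 'http://localhost:5201/api/v1/query?query=up' | python -m json.tool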
Point the pro nodes at the m3db cluster as the data source
# All it takes is appending a few lines to the Prometheus configuration on pro01 and pro02
echo "
remote_read:
  - url: \"http://localhost:5201/api/v1/prom/remote/read\"
    # remote read address: this is the m3query port
    read_recent: true
remote_write:
  - url: \"http://localhost:8201/api/v1/prom/remote/write\"
    # remote write address: this is the m3coordinator port
" >> /usr/local/prometheus/prometheus.yml
# Restart the prometheus service and be patient; m3db takes quite a while to initialize its databases.
# The m3coordinator and m3query logs show whether remote reads and writes have started.
# Run on both pro nodes
systemctl restart prometheus
Verify that it works
Verification is simple: after the cluster has run for a while, query data through m3query. If you get results back, the data really was written to the m3db cluster, and the aggregated data can be read back through m3query.
# Open the URL directly in a browser
# Adjust the timestamps, step, and query key to your own environment; this is only an example.
# The IP can be pro01's or pro02's, but the port must be m3query's, otherwise PromQL is not supported
http://172.27.124.69:5201/api/v1/query_range?query=mysql_up&start=1570084589.681&end=1570088189.681&step=14&_=1570087504176
Test the high availability yourself: stop one m3db node, then stop any one of the Prometheus nodes, and check whether monitoring data is still served through the virtual IP (a concrete test follows below).
In my environment, checking http://172.27.124.71:9090/targets is enough.
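A concrete way to run that failover test, assuming pro01 currently holds the VIP:
# On pro01: stop Prometheus; the check script fails and keepalived releases the VIP
systemctl stop prometheus
# On pro02: the VIP should show up here within a few seconds
ip a show enp0s3 | grep 172.27.124.71
# From anywhere: monitoring is still reachable through the VIP (now served by pro02)
curl -s -o /dev/null -w "%{http_code}\n" http://172.27.124.71:9090/targets
# Bring pro01 back afterwards
systemctl start prometheus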