Deploying BlueKing Community Edition 5.1.27 on CentOS 7.7 (Part 2): Standard Deployment

This content is released under the CC 4.0 BY-SA license.

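Preface

The following steps must be run in order to install the BlueKing Community Edition base packages. If a step reports an error or fails, fix the problem as the message indicates and then rerun the same command; the installer supports resuming from where it stopped.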
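
The "run in order, stop on failure, rerun the same command" rule can be sketched as a small wrapper. This is only an illustration: run_in_order is a hypothetical helper name, not part of the BlueKing installer, and the installer still prompts interactively (install path, license agreement) when run this way.

```shell
#!/usr/bin/env bash
# Run the given commands one by one and stop at the first failure.
# The failing command is printed so that the same command can be rerun
# after the underlying problem has been fixed.
run_in_order() {
    local cmd
    for cmd in "$@"; do
        echo ">>> running: $cmd"
        if ! bash -c "$cmd"; then
            echo "!!! failed: $cmd (fix the error, then rerun this same command)" >&2
            return 1
        fi
    done
}

# Hypothetical usage on the install host, with module names from this article:
# cd /data/install && run_in_order "./bk_install paas" "./bk_install cmdb"
```

Because the installer resumes where it stopped, rerunning the failed step by hand is equivalent; the wrapper only makes the stopping point explicit.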

1. Install the PaaS platform and its dependent services: ./bk_install paas

Once this step completes, the PaaS platform can be opened.

[root@JSH-01 ~]# cd /data/install
[root@JSH-01 install]# ./bk_install paas
……
enter a absolute path [/data/bkce]:       # install directory; the default /data/bkce/ is fine
……
------------------------- agreements ---------------------------------
    Dear users, welcome to use the Tencent BlueKing Software. Please
access http://bk.tencent.com/info/#laws to read the Tencent BlueKing
Software License and Service Agreement carefully. You have no right to
install or use the Software and related services unless you have read
and accepted all the terms of this Agreement. By downloading, install-
ing, using or logging in the Software, you shall be deemed to have
read and agreed to be bound by the Agreement above. If you have under-
stood the above content, please enter "yes" to continue installation,
otherwise, please enter "no" to abort. Thank you for your understan-
ding and support of the Tencent BlueKing Software.
----------------------------------------------------------------------
yes/no ?                                 # asks whether you accept the Tencent agreement; enter yes

……
If the steps above reported no errors, you can now access the PaaS platform at http://paas.bk.com:80
Login user: admin
Login password: el3ZQgIN7r

[root@JSH-01 install]# echo $?           # check whether the run reported an error
0

Edit the hosts file on your local machine so that paas.bk.com resolves to the IP address of the machine running NGINX.
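
As a minimal sketch of that hosts change, assuming a Linux or macOS workstation where the hosts file is /etc/hosts and you have root rights: add_hosts_entry is a hypothetical helper, and 192.168.1.8 is simply the host that reports "nginx: RUNNING" in this article's output, so substitute your own NGINX address.

```shell
#!/usr/bin/env bash
# Append "IP NAME" to a hosts file unless NAME is already mapped there.
# Arguments: <ip> <hostname> [hosts_file]; the file defaults to /etc/hosts.
add_hosts_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for the name as a whole field on any non-comment line.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# Hypothetical usage (run as root):
# add_hosts_entry 192.168.1.8 paas.bk.com
```

The same helper covers cmdb.bk.com and job.bk.com in the later steps; calling it twice for the same name leaves the file unchanged.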

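2. Install the configuration platform (CMDB) and its dependent services: ./bk_install cmdb

Once this step completes, the configuration platform can be opened, and the BlueKing business and the sample business are visible.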

[root@JSH-01 install]# ./bk_install cmdb

……

[192.168.1.7]20200416-195900 98   starting zk(ALL) on host: 192.168.1.8
ZooKeeper JMX enabled by default
Using config: /data/bkce/etc/zoo.cfg
Starting zookeeper ... STARTED
[192.168.1.7]20200416-195906 98   starting zk(ALL) on host: 192.168.1.9
ZooKeeper JMX enabled by default
Using config: /data/bkce/etc/zoo.cfg
Starting zookeeper ... STARTED
[192.168.1.7]20200416-195909 98   starting zk(ALL) on host: 192.168.1.7
ZooKeeper JMX enabled by default
Using config: /data/bkce/etc/zoo.cfg
Starting zookeeper ... STARTED
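[192.168.1.7]20200416-195948 102   zk.service.consul cannot be resolved

Error 1: [192.168.1.7]20200416-195948 102 zk.service.consul cannot be resolved

Fix 1: make sure the first line of `/etc/resolv.conf` is `nameserver 127.0.0.1`; add it if it is missing. In the dig output below, the query goes to 192.168.1.1 rather than the local consul and comes back NXDOMAIN.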

[root@JSH-01 install]# dig zk.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> zk.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29234
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zk.service.consul.		IN	A

;; AUTHORITY SECTION:
.			1979	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2020041600 1800 900 604800 86400

;; Query time: 36 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Apr 16 20:22:25 CST 2020
;; MSG SIZE  rcvd: 121

[root@JSH-01 install]# vi /etc/resolv.conf
nameserver 127.0.0.1          # must be the first line
nameserver 192.168.1.1
nameserver 119.29.29.29
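
The manual edit above can also be scripted. The following is a sketch under the assumption that nothing else on the host rewrites /etc/resolv.conf afterwards; ensure_local_nameserver_first is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Make "nameserver 127.0.0.1" the first line of a resolv.conf-style file.
# Argument: [file], defaulting to /etc/resolv.conf. A .bak copy is kept.
ensure_local_nameserver_first() {
    local file="${1:-/etc/resolv.conf}" first tmp
    # Address on the first nameserver line, if any.
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    if [ "$first" = "127.0.0.1" ]; then
        echo "ok: 127.0.0.1 is already the first nameserver"
        return 0
    fi
    cp "$file" "$file.bak"
    tmp=$(mktemp)
    {
        echo "nameserver 127.0.0.1"
        # Drop any later copy of the local entry; keep all other lines in order.
        awk '!($1 == "nameserver" && $2 == "127.0.0.1")' "$file.bak"
    } > "$tmp"
    cat "$tmp" > "$file"   # overwrite in place so owner and mode are kept
    rm -f "$tmp"
    echo "updated: $file (backup at $file.bak)"
}

# Hypothetical usage (run as root):
# ensure_local_nameserver_first
```

Running it a second time only prints the "ok" message, so it is safe to call before every rerun of the installer.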

[root@JSH-01 install]# ./bk_install cmdb    # run it again
……
[192.168.1.7]20200416-203905|134	start gse_dataop ...Done
[192.168.1.8] nginx: RUNNING
[192.168.1.7] redis: RUNNING
[192.168.1.8] zk: RUNNING
[192.168.1.9] zk: RUNNING
[192.168.1.7] zk: RUNNING
[192.168.1.9] mongod: RUNNING
[192.168.1.7] gse_alarm: RUNNING
[192.168.1.7] gse_ops: RUNNING
[192.168.1.7] gse_opts: RUNNING
[192.168.1.7] gse_api: RUNNING
[192.168.1.7] gse_btsvr: RUNNING
[192.168.1.7] gse_data: RUNNING
[192.168.1.7] gse_dba: RUNNING
[192.168.1.7] gse_task: RUNNING
[192.168.1.7] gse_syncdata: RUNNING
[192.168.1.7] gse_procmgr: RUNNING
[192.168.1.7] gse_dataop: RUNNING
[192.168.1.7] cmdb-nginx: RUNNING
[192.168.1.7] server      cmdb_adminserver                 RUNNING   pid 624, uptime 0:01:26
[192.168.1.7] server      cmdb_apiserver                   RUNNING   pid 621, uptime 0:01:26
[192.168.1.7] server      cmdb_auditcontoller              RUNNING   pid 620, uptime 0:01:26
[192.168.1.7] server      cmdb_datacollection              RUNNING   pid 623, uptime 0:01:26
[192.168.1.7] server      cmdb_eventserver                 RUNNING   pid 622, uptime 0:01:26
[192.168.1.7] server      cmdb_hostcontroller              RUNNING   pid 2928, uptime 0:01:10
[192.168.1.7] server      cmdb_hostserver                  RUNNING   pid 615, uptime 0:01:26
[192.168.1.7] server      cmdb_objectcontroller            RUNNING   pid 617, uptime 0:01:26
[192.168.1.7] server      cmdb_proccontroller              RUNNING   pid 628, uptime 0:01:26
[192.168.1.7] server      cmdb_procserver                  RUNNING   pid 2936, uptime 0:01:10
[192.168.1.7] server      cmdb_toposerver                  RUNNING   pid 616, uptime 0:01:26
[192.168.1.7] server      cmdb_webserver                   RUNNING   pid 618, uptime 0:01:26

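If the steps above reported no errors, you can now access the configuration platform at http://cmdb.bk.com:80

Edit the hosts file on your local machine so that cmdb.bk.com resolves to the IP address of the machine running NGINX.

Analysis 1:

This error means that an internal domain name cannot be resolved. Internal domain names are the names ending in ".service.consul" that the BlueKing cluster modules register through consul. They are resolved by the consul process running on each machine, which listens on port 53.

When resolution fails, the first step is to run dig on the machine that reported the error and see whether consul can resolve the name:

$ dig xxx.service.consul @127.0.0.1

@127.0.0.1 tells dig to use 127.0.0.1:53 as the DNS server, that is, the DNS service provided by consul.

Normally you will see records like the ones below. If the same query without @127.0.0.1 (plain dig <domain>) does not return the correct records, /etc/resolv.conf has no nameserver entry for 127.0.0.1; make sure the first line of /etc/resolv.conf is nameserver 127.0.0.1.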

;; ANSWER SECTION:
zk.service.consul.    0    IN    A    10.x.x.x
zk.service.consul.    0    IN    A    10.x.x.x
zk.service.consul.    0    IN    A    10.x.x.x

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)

If you see the following instead, consul is not running properly. In that case, start the consul process with supervisor.

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.2 <<>> zk.service.consul @127.0.0.1
;; global options: +cmd
;; connection timed out; no servers could be reached

If you see the following, with no IP address after "IN A", consul is running but cannot resolve the name.

;; QUESTION SECTION:
;zk.service.consul.        IN    A

;; AUTHORITY SECTION:
consul.            0    IN    SOA    ns.consul. postmaster.consul. 1530849644 3600 600 86400 0

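In that case, proceed as follows:

Run consul monitor and read the log, mainly to confirm that the consul cluster is healthy. Watch for "no cluster leader" in the output.
For a specific domain name such as zk.service.consul, log in to the machine where zk runs, open /data/bkce/etc/consul.d/zk.json, run the check script defined in it, and look at what it returns.

If "no cluster leader" appears, the consul nodes have not formed a cluster and elected a leader:

Check that all consul server nodes are running.
On any consul node, run consul join <another consul node>
Check the node status: consul operator raft list-peers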
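
The three dig outcomes described in this analysis (records returned, connection timed out, empty answer) can be told apart mechanically. The sketch below only pattern-matches the dig text shown above; classify_dig_output and its three labels are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Classify the text of a `dig NAME @127.0.0.1` run.
# Reads dig output on stdin and prints one of: resolved, consul-down, no-record.
classify_dig_output() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'no servers could be reached'; then
        echo "consul-down"      # nothing is answering on 127.0.0.1:53
    elif printf '%s\n' "$out" | grep -q 'ANSWER SECTION'; then
        echo "resolved"         # at least one record came back
    else
        echo "no-record"        # consul answered, but the name has no record
    fi
}

# Hypothetical usage on the host that reported the error:
# dig zk.service.consul @127.0.0.1 2>&1 | classify_dig_output
```

A "consul-down" result points at starting consul with supervisor, while "no-record" points at the consul monitor and check-script steps listed above.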

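3. Install the job platform and its dependent components, and install gse_agent on the BlueKing servers for verification

./bk_install job

Once this step completes, the job platform can be opened and jobs can be run. The hosts also appear under the BlueKing modules in the configuration platform.

Edit the hosts file on your local machine so that job.bk.com resolves to the IP address of the machine running NGINX.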

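4. Deploy the production and test environments: ./bk_install app_mgr

Once this step completes, the successfully activated servers appear under Server Information and Third-Party Service Information in the Developer Center, and SaaS applications (except BlueKing Monitor and Log Search) can be uploaded and deployed.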

[192.168.1.7]20200416-210915 182   exec initdata_appo on 192.168.1.9
[192.168.1.9]20200416-210916 262   update config file: paas_agent_config.yaml
[192.168.1.9]20200416-210916 268   register appo succeded.
[192.168.1.7]20200416-210916 502   create database bksuite_common
[192.168.1.7]20200416-210916 504   add version info to db
[192.168.1.7]20200416-210917 98   starting appo(ALL) on host: 192.168.1.9
[192.168.1.9]20200416-210921 77   activate appo(192.168.1.9) succeded
[192.168.1.9] paas_agent()    paas_agent                       RUNNING   pid 92512, uptime 0:00:08
[192.168.1.9] nginx: RUNNING
[192.168.1.8] paas_agent()    paas_agent                       RUNNING   pid 2454, uptime 0:04:49
[192.168.1.8] nginx: RUNNING
[192.168.1.8] rabbitmq: RUNNING

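If the steps above reported no errors, the production and test environments are now deployed. You can:
 1. deploy the node management app with ./bk_install saas-o bk_nodeman, or
 2. deploy apps through the Developer Center.
To install BlueKing Monitor and Log Search, first install bkdata with ./bk_install bkdata

You have new mail in /var/spool/mail/root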

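5. Install the BlueKing data platform base modules and their dependent services: ./bk_install bkdata

After this module is installed, the SaaS applications BlueKing Monitor and Log Search can be installed and used.

Error 2: shown in the output below.

Fix 2: a Python module is missing.

Run pip install flask-migrate, then run ./bk_install bkdata again.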

[root@JSH-01 install]# ./bk_install bkdata
……
[192.168.1.7]20200416-212347 172   exec initdata_bkdata on 192.168.1.8
[192.168.1.7]20200416-212348 172   exec initdata_bkdata on 192.168.1.9
[192.168.1.9]20200416-212349 104   start to make migration for bkdata ...
[192.168.1.9]20200416-212349 112   on-migrate ... /data/bkce/bkdata/dataapi/on_migrate
[192.168.1.9]20200416-212351 10   init dataserver zk config
[192.168.1.9]20200416-212352 13   create topic
waiting node ready
waiting node ready
waiting node ready
waiting node ready
waiting node ready
waiting node ready
Traceback (most recent call last):
  File "/opt/py27/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/py27/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/bkce/bkdata/dataapi/databus/script/kafka/__main__.py", line 147, in <module>
    ensure_topic("connect-status." + m, 5, replication_factor=2, configs=topic_configs)
  File "/data/bkce/bkdata/dataapi/databus/script/kafka/__main__.py", line 85, in ensure_topic
    topic, error_code))
Exception: Unknown error code during creation of topic `connect-status.jdbc`: 7
[192.168.1.9]20200416-212408 14   create topic failed.
[192.168.1.9]20200416-212409 153   migrate failed for bkdata(dataapi)
[192.168.1.7]20200416-212409 179   Abort
You have new mail in /var/spool/mail/root

Error 3: shown in the output below.

Fix 3: a file is missing; run yum install -y snappy-devel

=================set reserved dataid=================
blueking APP ID: 2
update reserved dataids DONE
[192.168.1.9]20200416-222548 24   handle new reserved dataid
[192.168.1.9]20200416-222550 27   update config_etl of system_disk
[192.168.1.9]20200416-222551 151   migrate bkdata(dataapi) done
[192.168.1.7]20200416-222551 172   exec initdata_bkdata on 192.168.1.7
[192.168.1.7]20200416-222552 502   create database bksuite_common
[192.168.1.7]20200416-222552 504   add version info to db
[192.168.1.7]20200416-222553 98   starting bkdata(databus) on host: 192.168.1.8
[192.168.1.7]20200416-222604 98   starting bkdata(dataapi) on host: 192.168.1.9
[192.168.1.9]20200416-222623 485   going to init snapshot data. this may take a while.
http://databus.bkdata.service.consul:10052/connectors
init data of snapshot and components
add etl connector of 2_tomcat_cache
add etl connector of 2_tomcat_jsp
add etl connector of 2_tomcat_net
E
======================================================================
ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/bkce/bkdata/dataapi/databus/tests.py", line 157, in init_snapshot_config
    snapshot.init()
  File "/data/bkce/bkdata/dataapi/databus/init/snapshot.py", line 84, in init
    init_connector()
  File "/data/bkce/bkdata/dataapi/databus/init/snapshot.py", line 89, in init_connector
    init_snapshot_etl_connector(bk_biz_id)
  File "/data/bkce/bkdata/dataapi/databus/init/snapshot.py", line 200, in init_snapshot_etl_connector
    raise Exception(json_data.get('message'))
Exception: \u6dfb\u52a0\u603b\u7ebf\u4efb\u52a1\u5931\u8d25 \u6dfb\u52a0etl\u4efb\u52a1\u5931\u8d25 2_tomcat_net

----------------------------------------------------------------------
Ran 1 test in 121.150s

FAILED (errors=1)
You have new mail in /var/spool/mail/root

[root@JSH-01 install]# pip install pyecharts-snapshot
Collecting pyecharts-snapshot
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/85/e8/f0a14cc92d89d43e52efb48bca5053c24b53e06e93b43579071476ed0cc6/pyecharts_snapshot-0.2.0-py2.py3-none-any.whl
Collecting pyppeteer>=0.0.25 (from pyecharts-snapshot)
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/b0/16/a5e8d617994cac605f972523bb25f12e3ff9c30baee29b4a9c50467229d9/pyppeteer-0.0.25.tar.gz (1.2MB)
    100% |████████████████████████████████| 1.2MB 7.8MB/s 
pyppeteer requires Python '>=3.5' but the running Python is 2.7.9
You are using pip version 9.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You have new mail in /var/spool/mail/root

To be continued...