
# The Prometheus Stack

## Prometheus + NodeExporter + Grafana + AlertManager

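This article walks through installing Prometheus, NodeExporter, Grafana, and AlertManager to monitor host metrics such as CPU and memory usage.

## Installing Prometheus

The official site lists several installation methods (`https://prometheus.io/docs/prometheus/latest/installation/`); the most common are the precompiled binaries and the Docker image. Here we install from the precompiled binary.

Download the binary package from the official site (`https://prometheus.io/download/`), using the latest version at the time of writing (`2.15.2`), extract it, and check the Prometheus version: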

```
$ wget https://github.com/prometheus/prometheus/releases/download/v2.15.2/prometheus-2.15.2.linux-amd64.tar.gz
$ tar xzvf prometheus-2.15.2.linux-amd64.tar.gz
$ cd prometheus-2.15.2.linux-amd64
$ ./prometheus --version
```
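
Before unpacking a downloaded archive it can be worth checking its integrity. The helper below is a minimal sketch; `verify_sha256` is a hypothetical name, and it assumes the release page also publishes a `sha256sums.txt` file in the usual `sha256sum` format (hash, two spaces, file name).

```shell
# verify_sha256 ARCHIVE SUMS_FILE
# Succeeds only if ARCHIVE matches its entry in SUMS_FILE.
verify_sha256() {
  archive="$1"
  sums="$2"
  name=$(basename "$archive")
  # Keep only the checksum line for this archive, then let sha256sum verify it
  # from the archive's own directory so the relative file name resolves.
  grep " ${name}\$" "$sums" | (cd "$(dirname "$archive")" && sha256sum -c -)
}

# Usage (assumed URL; check the release page for the actual file name):
# wget https://github.com/prometheus/prometheus/releases/download/v2.15.2/sha256sums.txt
# verify_sha256 ./prometheus-2.15.2.linux-amd64.tar.gz ./sha256sums.txt
```

The same helper applies unchanged to the other archives downloaded in this article.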

A binary installation is best managed with systemd. Move the whole `prometheus-2.15.2.linux-amd64` directory under `/usr/local/` and rename it to `prometheus`:

```
$ mv ./prometheus-2.15.2.linux-amd64 /usr/local/prometheus
```

Then create the file `/usr/lib/systemd/system/prometheus.service` with the following content:

```
[Unit]
Description=prometheus
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Run `/usr/local/prometheus/prometheus --help` to see the available startup flags. Here `--storage.tsdb.path` sets the data directory to `/var/lib/prometheus`.

Then start Prometheus and check that it came up:

```
$ systemctl daemon-reload && systemctl enable prometheus
$ systemctl start prometheus && systemctl status prometheus
```

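If startup fails, check the error log with `journalctl -xeu prometheus --no-pager` (journald captures the stdout of services managed by systemd). If it succeeds, the built-in Prometheus UI is available at `http://ip:9090`, where `ip` is the host's address: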

![](/files/-M69E7hbRUTsXkLz7Rln)
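
Besides opening the UI, the service can be probed from the command line. The sketch below polls an HTTP endpoint until it answers; `wait_ready` is a hypothetical helper name, and the example assumes Prometheus is reachable on `localhost:9090` and exposes its `/-/ready` management endpoint.

```shell
# wait_ready URL [TRIES]: poll URL once per second until it returns success.
wait_ready() {
  url="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s hides progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $tries tries: $url" >&2
  return 1
}

# Usage:
# wait_ready http://localhost:9090/-/ready
```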

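## Installing NodeExporter

Download the NodeExporter binary from the official site (`https://prometheus.io/download/#node_exporter`), using the latest version at the time of writing (`0.18.1`), and extract it: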

```
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
$ tar xzvf node_exporter-0.18.1.linux-amd64.tar.gz
```

Again we use `systemd` to manage the service. Move the extracted `node_exporter-0.18.1.linux-amd64` directory under `/usr/local/` and rename it to `node_exporter`:

```
$ mv ./node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
```

Then create the file `/usr/lib/systemd/system/node_exporter.service` with the following content:

```
[Unit]
Description=node_exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Run `/usr/local/node_exporter/node_exporter --help` to see the available startup flags.

Then start `node_exporter` and check that it came up:

```
$ systemctl daemon-reload && systemctl enable node_exporter
$ systemctl start node_exporter && systemctl status node_exporter
```
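
To confirm that the exporter is actually serving data, its `/metrics` page can be fetched directly. The following is a small sketch; `metric_value` is a hypothetical helper that reads the text exposition format on stdin and prints the value of a metric that has no labels. It assumes `node_exporter` is listening on its default port on the local host.

```shell
# metric_value NAME: print the value of the unlabelled metric NAME.
# Comment lines ("# HELP", "# TYPE") and labelled series do not match,
# because their first field differs from NAME.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage:
# curl -s http://localhost:9100/metrics | metric_value node_memory_MemTotal_bytes
```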

### Configuring Prometheus to Scrape NodeExporter

Once `node_exporter` is running, configure `prometheus` to scrape it. `node_exporter` listens on port 9100 by default.

Initially Prometheus has a single scrape target: itself. The `scrape_configs:` section of `/usr/local/prometheus/prometheus.yml` starts out as:

```
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
```

Edit `/usr/local/prometheus/prometheus.yml` and add a new `job` that scrapes `node_exporter`. The complete `scrape_configs:` section then reads:

```
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
    - targets: ['localhost:9100']
      labels:       
        instance: peng01
```

Then restart Prometheus:

```
$ systemctl restart prometheus
```
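
A typo in `prometheus.yml` will prevent the service from coming back up, so it may help to validate the file before restarting. This sketch uses `promtool`, which ships in the same tarball as `prometheus`; `restart_if_valid` is a hypothetical wrapper name, and `PROM_DIR` is assumed to be the install directory used above.

```shell
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"

# restart_if_valid: restart Prometheus only if promtool accepts the config.
restart_if_valid() {
  if "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml"; then
    systemctl restart prometheus
  else
    echo "config invalid; not restarting" >&2
    return 1
  fi
}

# Usage:
# restart_if_valid
```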

Next, open `Status -> Targets` in the Prometheus UI. Both targets above should be listed with state `UP`:

![](/files/-M69E7hdnC-iVCrF95jm)

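Now use Prometheus to look at the host's memory usage. Enter the query below to see the memory usage percentage as a graph. (Note: when this screenshot was taken, node_exporter and Prometheus had already been running for a long time; if yours started recently, the curve will be much shorter.)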

```
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```

![](/files/-M69E7heVBoCkMt04UcF)
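
The same expression can also be evaluated without the UI, through the Prometheus HTTP API. The helper below is a sketch under the assumption that Prometheus is reachable at `http://localhost:9090`; `prom_query` and `PROM_URL` are hypothetical names. It returns the raw JSON response of an instant query.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR: run an instant query and print the JSON response.
prom_query() {
  # -G sends the data as a URL query string; --data-urlencode escapes
  # the PromQL expression so that characters such as "/" and "*" are safe.
  curl -fsG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
# prom_query '(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100'
```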

## Installing Grafana

We install from the rpm package; see the official guide (`https://grafana.com/docs/installation/rpm/`). Download the rpm package, then install it:

```
$ wget https://dl.grafana.com/oss/release/grafana-5.4.2-1.x86_64.rpm
$ yum -y localinstall ./grafana-5.4.2-1.x86_64.rpm
```

After installation, the relevant Grafana files are:

* Environment variable file: `/etc/sysconfig/grafana-server`
* Configuration file: `/etc/grafana/grafana.ini`
* Database file: `/var/lib/grafana/grafana.db`
* Log directory: `/var/log/grafana`

Start Grafana:

```
$ systemctl daemon-reload
$ systemctl start grafana-server && systemctl enable grafana-server
```

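By default Grafana listens on port 3000 (`http://localhost:3000` on the host itself), and the initial username and password are `admin/admin`.

Next, configure a data source for Grafana. See `https://grafana.com/docs/features/datasources/prometheus/` for details.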
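
As an alternative to clicking through the UI, the data source can be created through Grafana's HTTP API. This is a sketch only: it assumes the default `admin/admin` credentials are still in place, that Grafana and Prometheus both run on the local host, and `add_prometheus_datasource` and `GRAFANA_URL` are hypothetical names.

```shell
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# add_prometheus_datasource: register the local Prometheus as a data source.
add_prometheus_datasource() {
  # "access":"proxy" means the Grafana server, not the browser, queries Prometheus.
  curl -fs -u admin:admin -H 'Content-Type: application/json' \
    -X POST "$GRAFANA_URL/api/datasources" \
    -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'
}

# Usage:
# add_prometheus_datasource
```

Change the password and the URLs to match your own setup before using this outside a test machine.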

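## Installing AlertManager

The GitHub repository lists several ways to install AlertManager (<https://github.com/prometheus/alertmanager>).

Here we install from the binary. Download the latest version at the time of writing, `0.20.0` (<https://prometheus.io/download/#alertmanager>), and extract it:

```
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
$ tar xzvf alertmanager-0.20.0.linux-amd64.tar.gz
```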

Again we use `systemd` to manage the service. Move the extracted `alertmanager-0.20.0.linux-amd64` directory under `/usr/local/` and rename it to `alertmanager`:

```
$ mv ./alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
```

Then create the file `/usr/lib/systemd/system/alertmanager.service` with the following content:

```
[Unit]
Description=alertmanager
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager --config.file /usr/local/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Edit `/usr/local/alertmanager/alertmanager.yml` so that it reads as follows (see 【5】 and 【6】):

```
global:
  smtp_smarthost: 'smtp.163.com:25'    # send mail through the 163 mail server
  smtp_from: 'pshizh@163.com'    # sender; use your own 163 address
  smtp_auth_username: 'pshizh@163.com'  # your 163 address, same as above
  smtp_auth_password: 'xxxx'  # password for your 163 mailbox

route:
  group_by: ['example']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'

receivers:
- name: 'email'
  email_configs:
  - to: '527103524@qq.com'    # recipient
```
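
YAML indentation mistakes are easy to make in this file, so it can be checked before the service is started. The sketch below relies on `amtool`, which is shipped in the same tarball as `alertmanager`; `check_alertmanager_config` and `AM_DIR` are hypothetical names, with `AM_DIR` assumed to be the install directory chosen above.

```shell
AM_DIR="${AM_DIR:-/usr/local/alertmanager}"

# check_alertmanager_config: validate alertmanager.yml; non-zero exit on errors.
check_alertmanager_config() {
  "$AM_DIR/amtool" check-config "$AM_DIR/alertmanager.yml"
}

# Usage:
# check_alertmanager_config && echo "config OK"
```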

Start alertmanager:

```
$ systemctl daemon-reload 
$ systemctl enable alertmanager
$ systemctl start alertmanager
$ systemctl status alertmanager
```
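
To test the email path before any Prometheus rule exists, a hand-made alert can be pushed to AlertManager over its HTTP API. This is a sketch under assumptions: AlertManager is listening on its default port 9093 on the local host, and the alert name `TestAlert`, the helper `send_test_alert`, and the variable `AM_URL` are made up for illustration.

```shell
AM_URL="${AM_URL:-http://localhost:9093}"

# send_test_alert: post one firing alert to AlertManager's v2 API.
send_test_alert() {
  curl -fs -H 'Content-Type: application/json' \
    -X POST "$AM_URL/api/v2/alerts" \
    -d '[{"labels":{"alertname":"TestAlert","instance":"peng01"},"annotations":{"summary":"manual test alert"}}]'
}

# Usage:
# send_test_alert
```

If the SMTP settings are correct, a notification for the test alert should reach the configured recipient after the `group_wait` period.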

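## Configuring Alerting Rules for Prometheus

Prometheus only delivers alerts if the `alerting` section of `/usr/local/prometheus/prometheus.yml` lists the AlertManager address as a target (AlertManager listens on `localhost:9093` by default), so check that first. Then, in the same file, add the following to the `rule_files` section: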

```
rule_files:
  - "/usr/local/prometheus/rule_files/memory_alert.yml"
```

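Then create the file `/usr/local/prometheus/rule_files/memory_alert.yml` (creating the `rule_files` directory first if it does not exist) with the following content: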

```
groups:
- name: example        # rule group name; it need not match route.group_by in alertmanager.yml, which lists alert label names
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 20
    for: 1m
    annotations:
      summary: "{{$labels.instance}}: High Memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 20% (current value is:{{ $value }})"
```
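
A rule file can be validated on its own before Prometheus loads it. The following sketch again uses `promtool` from the Prometheus tarball; `check_rules` is a hypothetical helper name and `PROM_DIR` is assumed to be the install directory used earlier.

```shell
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"

# check_rules FILE: validate a rule file; non-zero exit on syntax errors.
check_rules() {
  "$PROM_DIR/promtool" check rules "$1"
}

# Usage:
# check_rules /usr/local/prometheus/rule_files/memory_alert.yml
```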

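Restart Prometheus (`systemctl restart prometheus`) so that it loads the rule. Shortly afterwards the alert email arrives, as shown below: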

![](/files/-M69E7hf2BuVu_xHvazX)

## Reference

【1】<https://prometheus.io/docs/prometheus/latest/installation/>\
【2】<https://www.cnblogs.com/yanyouqiang/p/7240696.html>\
【3】<https://grafana.com/docs/installation/rpm/>\
【4】<https://grafana.com/docs/features/datasources/prometheus/>\
【5】<https://www.cnblogs.com/longcnblogs/p/9620733.html>\
【6】<https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml>

