IP资源耗尽导致pod反复创建失败问题

IP资源耗尽导致pod反复创建失败问题

现象

Pod一直处于ContainerCreating状态

[root@k8scluster2master ~]# kubectl get pod -o wide
NAME                                                   READY     STATUS              RESTARTS   AGE       IP            NODE
abc.fgg-75b769789d-7dmjs                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
abc.fgg-75b769789d-f8pq2                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
abc.fgg-75b769789d-hzwrs                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
abc.fgg-75b769789d-jrzbp                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
abc.sdfdcard-67676989bd-7p7wf                          0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
default-http-backend-5c6d95c48-bpk54                   0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
game2.tgame-7bd6d45df8-n2v6h                           0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
game2.tgame-7bd6d45df8-tbdl7                           0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
ido.ido-57489d4b67-mrxs2                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
ido.ido-57489d4b67-v2rvq                               0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
nginx-ingress-controller-6c9fcdf8d9-dt8b6              0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
shireapp.game2048-d64d84d54-6vvqg                      0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
shireapp.game2048-d64d84d54-qd9v8                      0/1       ContainerCreating   0          1h        <none>        k8scluster2node2
shireapp.game2048-d64d84d54-t9sbz                      0/1       ContainerCreating   0          1h        <none>        k8scluster2node2

实际登录到对应的node上通过docker ps -a 可以看出,pod每次一创建好就自动退出了。

[root@k8scluster2node2 ~]# docker ps -a
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS                              PORTS               NAMES
eaa05cd8d957        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Up 1 second                                             k8s_POD_shireapp.game2048-d64d84d54-t9sbz_default_710c5423-9170-11e8-8f3a-00163e02a461_669
5e5e9a2ddf4a        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Up Less than a second                                   k8s_POD_abc.fgg-75b769789d-jrzbp_default_6fe70cb6-9170-11e8-8f3a-00163e02a461_0
1616c64af8a4        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_tiller-deploy-f9b8476d-g7kmz_kube-system_71a5379c-9170-11e8-8f3a-00163e02a461_0
bf3a2a9ee6a6        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_abc.fgg-75b769789d-7dmjs_default_6fea9885-9170-11e8-8f3a-00163e02a461_0
76a842cfb349        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_shireapp.game2048-d64d84d54-6vvqg_default_70de87f5-9170-11e8-8f3a-00163e02a461_0
e9567a176938        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Up Less than a second                                   k8s_POD_nginx-ingress-controller-6c9fcdf8d9-dt8b6_default_70cf42da-9170-11e8-8f3a-00163e02a461_0
0b473d52f0dd        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_default-http-backend-5c6d95c48-bpk54_default_7008f65d-9170-11e8-8f3a-00163e02a461_0
dc6bba6381f3        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Up Less than a second                                   k8s_POD_tencloud.tenweb-7d8cf8cfcb-72xkw_default_7167e2f7-9170-11e8-8f3a-00163e02a461_0
9576fa02316e        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) 1 second ago                                 k8s_POD_game2.tgame-7bd6d45df8-n2v6h_default_707b5e19-9170-11e8-8f3a-00163e02a461_0
51cb0374901c        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_tencloud.tenweb-7d8cf8cfcb-khkwm_default_713a1d0c-9170-11e8-8f3a-00163e02a461_0
7a04cf4d52f5        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) 1 second ago                                 k8s_POD_ido.ido-57489d4b67-mrxs2_default_701fd1de-9170-11e8-8f3a-00163e02a461_0
20ba60a0a1b5        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_tencloud.tenweb-7d8cf8cfcb-xqb2t_default_716f801c-9170-11e8-8f3a-00163e02a461_0
5ab4ca633497        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) 1 second ago                                 k8s_POD_ido.ido-57489d4b67-v2rvq_default_70b0c326-9170-11e8-8f3a-00163e02a461_0
0a6a308e9563        k8s.gcr.io/pause-amd64:3.1   "/pause"                 4 seconds ago       Exited (0) Less than a second ago                       k8s_POD_abc.fgg-75b769789d-f8pq2_default_7018ea67-9170-11e8-8f3a-00163e02a461_0
bebc2a958740        k8s.gcr.io/pause-amd64:3.1   "/pause"                 8 seconds ago       Exited (0) 3 seconds ago                                k8s_POD_abc.fgg-75b769789d-hzwrs_default_6fe3a9fd-9170-11e8-8f3a-00163e02a461_688
20454492a325        k8s.gcr.io/pause-amd64:3.1   "/pause"                 8 seconds ago       Exited (0) 3 seconds ago                                k8s_POD_abc.sdfdcard-67676989bd-7p7wf_default_70014929-9170-11e8-8f3a-00163e02a461_680

原因

根据kubectl describe查看到的原因,可以看出可用的IP资源已经被消耗殆尽,导致pod无法正常创建启动。

使用kubectl describe pod 可以看到错误的原因:

  Normal   SandboxChanged          2m (x12 over 3m)    kubelet, k8scluster2node2  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  2m (x10 over 3m)    kubelet, k8scluster2node2  Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "abc.fgg-75b769789d-7dmjs_default" network: failed to allocate for range 0: no IP addresses available in range set: 10.244.7.1-10.244.7.254

解决办法

进入到/var/lib/cni/networks/cbr0目录下,执行下面命令可以释放那些可能是kubelet leak的IP资源:

for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z $(docker ps -a | grep $hash | awk '{print $1}') ]; then grep -irl $hash ./; fi; done | xargs rm

IP释放完成之后,pod正常创建启动。