Pods repeatedly fail to be created because the node has run out of IP addresses
Symptom
The pods stay in the ContainerCreating state:
[root@k8scluster2master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
abc.fgg-75b769789d-7dmjs 0/1 ContainerCreating 0 1h <none> k8scluster2node2
abc.fgg-75b769789d-f8pq2 0/1 ContainerCreating 0 1h <none> k8scluster2node2
abc.fgg-75b769789d-hzwrs 0/1 ContainerCreating 0 1h <none> k8scluster2node2
abc.fgg-75b769789d-jrzbp 0/1 ContainerCreating 0 1h <none> k8scluster2node2
abc.sdfdcard-67676989bd-7p7wf 0/1 ContainerCreating 0 1h <none> k8scluster2node2
default-http-backend-5c6d95c48-bpk54 0/1 ContainerCreating 0 1h <none> k8scluster2node2
game2.tgame-7bd6d45df8-n2v6h 0/1 ContainerCreating 0 1h <none> k8scluster2node2
game2.tgame-7bd6d45df8-tbdl7 0/1 ContainerCreating 0 1h <none> k8scluster2node2
ido.ido-57489d4b67-mrxs2 0/1 ContainerCreating 0 1h <none> k8scluster2node2
ido.ido-57489d4b67-v2rvq 0/1 ContainerCreating 0 1h <none> k8scluster2node2
nginx-ingress-controller-6c9fcdf8d9-dt8b6 0/1 ContainerCreating 0 1h <none> k8scluster2node2
shireapp.game2048-d64d84d54-6vvqg 0/1 ContainerCreating 0 1h <none> k8scluster2node2
shireapp.game2048-d64d84d54-qd9v8 0/1 ContainerCreating 0 1h <none> k8scluster2node2
shireapp.game2048-d64d84d54-t9sbz 0/1 ContainerCreating 0 1h <none> k8scluster2node2
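Logging in to the affected node and running docker ps -a shows that each pod's sandbox (pause) container exits almost immediately after it is created: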
[root@k8scluster2node2 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eaa05cd8d957 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Up 1 second k8s_POD_shireapp.game2048-d64d84d54-t9sbz_default_710c5423-9170-11e8-8f3a-00163e02a461_669
5e5e9a2ddf4a k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Up Less than a second k8s_POD_abc.fgg-75b769789d-jrzbp_default_6fe70cb6-9170-11e8-8f3a-00163e02a461_0
1616c64af8a4 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_tiller-deploy-f9b8476d-g7kmz_kube-system_71a5379c-9170-11e8-8f3a-00163e02a461_0
bf3a2a9ee6a6 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_abc.fgg-75b769789d-7dmjs_default_6fea9885-9170-11e8-8f3a-00163e02a461_0
76a842cfb349 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_shireapp.game2048-d64d84d54-6vvqg_default_70de87f5-9170-11e8-8f3a-00163e02a461_0
e9567a176938 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Up Less than a second k8s_POD_nginx-ingress-controller-6c9fcdf8d9-dt8b6_default_70cf42da-9170-11e8-8f3a-00163e02a461_0
0b473d52f0dd k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_default-http-backend-5c6d95c48-bpk54_default_7008f65d-9170-11e8-8f3a-00163e02a461_0
dc6bba6381f3 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Up Less than a second k8s_POD_tencloud.tenweb-7d8cf8cfcb-72xkw_default_7167e2f7-9170-11e8-8f3a-00163e02a461_0
9576fa02316e k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) 1 second ago k8s_POD_game2.tgame-7bd6d45df8-n2v6h_default_707b5e19-9170-11e8-8f3a-00163e02a461_0
51cb0374901c k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_tencloud.tenweb-7d8cf8cfcb-khkwm_default_713a1d0c-9170-11e8-8f3a-00163e02a461_0
7a04cf4d52f5 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) 1 second ago k8s_POD_ido.ido-57489d4b67-mrxs2_default_701fd1de-9170-11e8-8f3a-00163e02a461_0
20ba60a0a1b5 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_tencloud.tenweb-7d8cf8cfcb-xqb2t_default_716f801c-9170-11e8-8f3a-00163e02a461_0
5ab4ca633497 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) 1 second ago k8s_POD_ido.ido-57489d4b67-v2rvq_default_70b0c326-9170-11e8-8f3a-00163e02a461_0
0a6a308e9563 k8s.gcr.io/pause-amd64:3.1 "/pause" 4 seconds ago Exited (0) Less than a second ago k8s_POD_abc.fgg-75b769789d-f8pq2_default_7018ea67-9170-11e8-8f3a-00163e02a461_0
bebc2a958740 k8s.gcr.io/pause-amd64:3.1 "/pause" 8 seconds ago Exited (0) 3 seconds ago k8s_POD_abc.fgg-75b769789d-hzwrs_default_6fe3a9fd-9170-11e8-8f3a-00163e02a461_688
20454492a325 k8s.gcr.io/pause-amd64:3.1 "/pause" 8 seconds ago Exited (0) 3 seconds ago k8s_POD_abc.sdfdcard-67676989bd-7p7wf_default_70014929-9170-11e8-8f3a-00163e02a461_680
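Cause
The events reported by kubectl describe show that the IP addresses available in the node's pod range (10.244.7.1-10.244.7.254) have been used up, so the pods cannot be created and started.
Relevant events from kubectl describe pod: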
Normal SandboxChanged 2m (x12 over 3m) kubelet, k8scluster2node2 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 2m (x10 over 3m) kubelet, k8scluster2node2 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "abc.fgg-75b769789d-7dmjs_default" network: failed to allocate for range 0: no IP addresses available in range set: 10.244.7.1-10.244.7.254
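To confirm the exhaustion on the node itself, one can count the reservations recorded on disk. The following is a minimal sketch, assuming the host-local IPAM plugin keeps one file per allocated address, named after the IP, in /var/lib/cni/networks/cbr0. The helper name count_allocated_ips is hypothetical and not part of any tool.

```shell
# count_allocated_ips [DIR]
# Hypothetical helper: counts the files in DIR whose names look like IPv4
# addresses. Assumption: the IPAM plugin stores one such file per reserved IP.
# Other files in the directory (for example a lock file) are ignored.
count_allocated_ips() {
    dir="${1:-/var/lib/cni/networks/cbr0}"
    ls -1 "$dir" 2>/dev/null | grep -c -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage on the node (prints a single number):
#   count_allocated_ips /var/lib/cni/networks/cbr0
```

A count near the size of the range in the error message (254 addresses, possibly minus a gateway address), combined with far fewer running pods on the node, would be consistent with leaked reservations.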
Solution
Change into the /var/lib/cni/networks/cbr0 directory and run the following command to release the IP addresses that were probably leaked by kubelet:
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z "$(docker ps -a | grep "$hash" | awk '{print $1}')" ]; then grep -irl "$hash" ./; fi; done | xargs -r rm
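Because the command above deletes files immediately, it can help to list the candidates first. The sketch below is a dry run under the same assumption about the directory layout: each reservation file is named after an IP and holds the owning container ID on its first line. Unlike the one-liner, it compares full container IDs instead of 8-character prefixes. The name list_stale_ip_files and the path /tmp/container_ids.txt are hypothetical.

```shell
# list_stale_ip_files DIR IDS_FILE
# Hypothetical helper: prints the path of every reservation file in DIR whose
# recorded container ID (first line of the file) is not listed in IDS_FILE
# (one full container ID per line). Nothing is deleted.
list_stale_ip_files() {
    dir="$1"
    ids_file="$2"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Only consider files whose names consist of digits and dots (IPs);
        # this skips bookkeeping files such as lock or last_reserved_ip.*.
        case "$(basename "$f")" in
            *[!0-9.]*) continue ;;
        esac
        id="$(head -n 1 "$f" | tr -d '\r\n')"
        [ -n "$id" ] || continue
        # Exact, whole-line match against the list of known container IDs.
        grep -q -x -F -- "$id" "$ids_file" || printf '%s\n' "$f"
    done
}

# Usage on the node:
#   docker ps -aq --no-trunc > /tmp/container_ids.txt
#   list_stale_ip_files /var/lib/cni/networks/cbr0 /tmp/container_ids.txt
```

Reviewing the printed paths before removing anything reduces the risk of deleting a reservation that still belongs to a live container.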
Once the IP addresses have been released, the pods are created and start normally.