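Problem:
A process is holding GPU memory, but `kill -9 PID` does not kill it; the process stays in the Running (R) state.
My guess is that it is blocked waiting for data that never arrives.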
➜ ~ alias pg
pg='ps aux | grep $1'
➜ ~ pg 22109
chenkan+ 10350 0.0 0.0 112680 992 pts/24 S+ 21:33 0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox 22109
chenkan+ 22109 4748051 0.1 170587160 233712 ? R 2020 21121023:41 python main.py huawei RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.000125 --lr_steps 10 20 --epochs 25 --batch-size 16 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 --shift --shift_div=4 --shift_place=blockres --tune_from=pretrain_model/uniform_sampling/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth --gpus 0 1
Check which of this user's processes are holding the NVIDIA devices:
➜ ~ alias gpusearch
gpusearch='fuser -v /dev/nvidia*'
➜ ~ gpusearch
...
/dev/nvidia2: chenkangyang 14403 F.... gpustat
              chenkangyang 22109 F.... python
/dev/nvidia3: chenkangyang 14403 F.... gpustat
              chenkangyang 22109 F.... python
...
➜ ~ alias gpuwatch
gpuwatch='watch --color -n1 gpustat -cpu'
➜ ~ gpuwatch
Every 1.0s: gpustat -cpu Sun Jan 3 21:40:04 2021
localhost.localdomain Sun Jan 3 21:40:04 2021 440.33.01
[0] GeForce GTX 1080 Ti | 61'C, 90 % | 10475 / 11178 MB | user_a:python/302(661M) user_a:python/6679(663M) user_b:python/24018(9139M)
[1] GeForce GTX TITAN X | 84'C, 100 % | 7911 / 12212 MB | me:python/30869(7898M)
[2] GeForce GTX TITAN X | 57'C, 0 % | 8467 / 12212 MB |
[3] GeForce GTX TITAN X | 44'C, 0 % | 8517 / 12212 MB |
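Annoyingly, PID 22109 is holding resources on GPU 2 and GPU 3, even though gpustat lists no process against the memory in use there.
kill -9 22109: no change.
Filtering the process list and piping the PIDs to kill: no change either.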
ps aux|grep username|grep python|awk '{print $2}'|xargs kill
Even killing all of the user's processes has no effect:
killall -u chenkangyang
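The `ps | grep | awk` pipeline above also matches the `grep` processes themselves, and plain `kill` sends SIGTERM rather than SIGKILL. A minimal sketch of a stricter filter follows; the helper name `match_pids` is made up for illustration, and it assumes input lines of the form `PID UID COMMAND`, as printed by `ps -eo pid=,uid=,comm=`. It matches on the numeric UID because `ps` may truncate long user names (as in `chenkan+` above).

```shell
# Hypothetical helper: read "PID UID COMMAND" lines on stdin and print
# the PIDs whose UID and command name both match exactly.
match_pids() {
  awk -v uid="$1" -v comm="$2" '$2 == uid && $3 == comm { print $1 }'
}

# Example use (not executed here): send SIGKILL explicitly to every
# python process owned by UID 1035. `xargs -r` (GNU) skips running
# kill when nothing matched.
#   ps -eo pid=,uid=,comm= | match_pids 1035 python | xargs -r kill -9
```

Where procps is installed, `pgrep` and `pkill` cover the same job. None of this helps if the process cannot act on the signal, which is what the status dump below suggests.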
Look at the full process status.
The parent is PID 1, the root-owned init process, and killing that is not an option:
cat /proc/22109/status
Name: python
State: R (running)
Tgid: 22109
Ngid: 0
Pid: 22109
PPid: 1
TracerPid: 0
Uid: 1035 1035 1035 1035
Gid: 1035 1035 1035 1035
FDSize: 256
Groups: 1035
VmPeak: 170767372 kB
VmSize: 170587160 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 340032 kB
VmRSS: 233712 kB
VmData: 168511280 kB
VmStk: 136 kB
VmExe: 2276 kB
VmLib: 1866892 kB
VmPTE: 1044 kB
VmSwap: 0 kB
Threads: 1
SigQ: 56/514832
SigPnd: 0000000000000100
ShdPnd: 0000000000084107
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000180000002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ffffffff
Cpus_allowed_list: 0-31
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 8
nonvoluntary_ctxt_switches: 44680354
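The `SigPnd` and `ShdPnd` fields in the dump above are hexadecimal bitmasks of pending signals, where bit n-1 set means signal n is pending. As a rough sketch, they can be decoded like this; the helper name `decode_sigmask` is made up for illustration, and it assumes the mask fits in a signed 64-bit integer (true for the values shown here).

```shell
# Hypothetical helper: print the signal numbers set in a hex mask
# taken from /proc/PID/status (SigPnd, ShdPnd, SigBlk, ...).
# Runs in a subshell so its variables do not leak into the caller.
decode_sigmask() (
  mask=$(printf '%d' "0x$1")   # hex string -> decimal integer
  sig=1
  out=""
  while [ "$sig" -le 64 ]; do
    # Bit (sig - 1) set means signal number `sig` is in the mask.
    if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
      out="${out:+$out }$sig"
    fi
    sig=$((sig + 1))
  done
  printf '%s\n' "$out"
)

decode_sigmask 0000000000000100   # SigPnd -> 9
decode_sigmask 0000000000084107   # ShdPnd -> 1 2 3 9 15 20
```

With the usual Linux numbering on x86, 9 is SIGKILL and 15 is SIGTERM, so the kill attempts were queued but never acted on. Together with state R and the very high `nonvoluntary_ctxt_switches` count, that would be consistent with the process being stuck inside the kernel (for example in a driver call) rather than ignoring the signal, though the dump alone does not prove it.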
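Reference: https://lists.freebsd.org/pipermail/freebsd-questions/2008-September/182821.html
A zombie process can only be cleared by killing its parent. Here, though, the status shows state R rather than Z and the parent is already PID 1, so that approach does not apply.
In the end: contacted the administrator to reboot the machine.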