NIC Interrupt Binding

Today I want to gripe about a shell script from that article.

# setting up irq affinity according to /proc/interrupts
# 2008-11-25 Robert Olsson
# 2009-02-19 updated by Jesse Brandeburg
#
# > Dave Miller:
# (To get consistent naming in /proc/interrupts)
# I would suggest that people use something like:
# char buf[IFNAMSIZ+6];
#
# sprintf(buf, "%s-%s-%d",
#         netdev->name,
#  (RX_INTERRUPT ? "rx" : "tx"),
#  queue->index);
#
#  Assuming a device with two RX and TX queues.
#  This script will assign: 
#
# eth0-rx-0  CPU0
# eth0-rx-1  CPU1
# eth0-tx-0  CPU0
# eth0-tx-1  CPU1
#
set_affinity()
{
    MASK=$((1<<$VEC))
    printf "%s mask=%X for /proc/irq/%d/smp_affinity\n" $DEV $MASK $IRQ
    printf "%X" $MASK > /proc/irq/$IRQ/smp_affinity
    #echo $DEV mask=$MASK for /proc/irq/$IRQ/smp_affinity
    #echo $MASK > /proc/irq/$IRQ/smp_affinity
}
if [ "$1" = "" ] ; then
echo "Description:"
echo "    This script attempts to bind each queue of a multi-queue NIC"
echo "    to the same numbered core, ie tx0|rx0 --> cpu0, tx1|rx1 --> cpu1"
echo "usage:"
echo "    $0 eth0 [eth1 eth2 eth3]"
fi
 
# check for irqbalance running
IRQBALANCE_ON=`ps ax | grep -v grep | grep -q irqbalance; echo $?`
if [ "$IRQBALANCE_ON" == "0" ] ; then
echo " WARNING: irqbalance is running and will"
echo "          likely override this script's affinitization."
echo "          Please stop the irqbalance service and/or execute"
echo "          'killall irqbalance'"
fi
#
# Set up the desired devices.
#
for DEV in $*
do
for DIR in rx tx TxRx
do
    MAX=`grep $DEV-$DIR /proc/interrupts | wc -l`
    if [ "$MAX" == "0" ] ; then
    MAX=`egrep -i "$DEV:.*$DIR" /proc/interrupts | wc -l`
    fi
    if [ "$MAX" == "0" ] ; then
    echo no $DIR vectors found on $DEV
    continue
    #exit 1
    fi
    for VEC in `seq 0 1 $MAX`
     do
        IRQ=`cat /proc/interrupts | grep -i $DEV-$DIR-$VEC"$"  | cut  -d:  -f1 | sed "s/ //g"`
        if [ -n  "$IRQ" ]; then
        set_affinity
        else
        IRQ=`cat /proc/interrupts | egrep -i $DEV:v$VEC-$DIR"$"  | cut  -d:  -f1 | sed "s/ //g"`
        if [ -n  "$IRQ" ]; then
            set_affinity
        fi
        fi
    done
done
done

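The script above was provided by David Miller, the maintainer of the Linux networking subsystem (its header credits Robert Olsson and Jesse Brandeburg and quotes a naming suggestion from Miller).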

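At first glance this script looks perfectly normal, but on a modern server it hides a big pitfall, and today I fell right into it myself.

Look at this line: MASK=$((1<<$VEC))

This computes the CPU mask. For example, for eth0-0, the first queue of NIC eth0, VEC is 0, so the result is MASK=1: shifting 1 left by 0 bits gives binary 0b1, which is decimal 1.

That looks perfectly reasonable. Plenty of articles online compute the CPU mask the same way: shift 1 left by the queue's index. On a 4-core machine with a 4-queue NIC, the mask for the fourth queue is 1<<3, which is 8, and working backwards gives the masks for the first three queues (4, 2 and 1). Each mask is then written into the smp_affinity file of that queue's interrupt number, like this: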

echo $((1<<3)) > /proc/irq/xx/smp_affinity

This binds interrupt xx to the fourth CPU.
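As a small illustration of the shift rule described above, here is a minimal Python sketch for the 4-core, 4-queue case. The helper name queue_masks is hypothetical and is not part of the original script.

```python
def queue_masks(num_queues):
    """Return (queue, mask) pairs where queue n gets the mask 1 << n."""
    return [(n, 1 << n) for n in range(num_queues)]


for queue, mask in queue_masks(4):
    # Show the mask in binary (one bit per CPU) and in decimal.
    print("queue %d -> binary %s, decimal %d" % (queue, format(mask, "04b"), mask))
```

Each mask has exactly one bit set, and the position of that bit is the index of the CPU the queue is meant to land on.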

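So far everything follows the pattern. But what if the machine has 8 cores and the NIC has 8 queues as well?

$((1<<7)) gives a CPU mask of 128. I wrote 128 into the smp_affinity of interrupt xx and watched what happened: damn it, this was supposed to be bound to the eighth CPU, so how did it end up on the fourth?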

33: 73905753 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-4
34: 0 5596608 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-5
35: 0 0 5590023 0 0 0 0 0 IR-PCI-MSI-edge eth0-6
36: 0 0 0 5574803 0 0 0 0 IR-PCI-MSI-edge eth0-7

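So I went searching again and found this statement: to compute the mask, the first CPU is 00000001 in binary, which is 1 in hexadecimal; the second CPU is 00000010, which is 2 in hexadecimal; and so on, so the eighth CPU is 80.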

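The key point is that the binary value has to be written in hexadecimal. With that in mind, look at the script above again: isn't this a trap? $((1<<n)) yields a decimal number, while smp_affinity is read as hexadecimal. For n from 0 to 3 nothing goes wrong, because 1, 2, 4 and 8 are written the same way in both bases, but beyond 3 the result drifts: 16 is read as 0x16 and 128 as 0x128. (Strictly speaking, the printf "%X" line in the listing does emit hexadecimal; it is the plain echo $MASK form, as in the commented-out lines and the echo $((1<<3)) recipe, that writes decimal.) The upshot on my 8-core machine with an 8-queue NIC was overlapping bindings: queues 0-3 were bound to CPUs 0-3, and queues 4-7 landed on CPUs 0-3 as well.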
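The mismatch described above can be checked with a little arithmetic. The following is only a sketch: it assumes the kernel parses the text written to smp_affinity as a hexadecimal bitmask and ignores bits for CPUs that do not exist, and the helper names cpus_in_mask and compare are hypothetical.

```python
def cpus_in_mask(mask, num_cpus):
    """List the CPU indices whose bit is set in mask, ignoring bits beyond num_cpus."""
    return [cpu for cpu in range(num_cpus) if mask & (1 << cpu)]


def compare(queue, num_cpus=8):
    """Return (decimal_text, cpus_if_read_as_hex, hex_text, cpus_for_hex_text)."""
    mask = 1 << queue
    decimal_text = "%d" % mask  # what `echo $((1<<n))` writes
    hex_text = "%x" % mask      # the same mask written in hexadecimal
    return (decimal_text, cpus_in_mask(int(decimal_text, 16), num_cpus),
            hex_text, cpus_in_mask(int(hex_text, 16), num_cpus))


for queue in range(8):
    print(queue, compare(queue))
```

For queue 7 the decimal text "128" is read as 0x128, which on 8 CPUs leaves only the bits for CPU3 and CPU5; that is consistent with the interrupt showing up on the fourth CPU rather than the eighth. Which of the remaining CPUs actually services the interrupt depends on the kernel and hardware, so treat the lists as the set of allowed CPUs, not a prediction.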

Having found the cause, I rewrote the script myself:

#!/usr/bin/python3
# coding=utf8

import re
import subprocess


def irq():
    # Return (irq number, queue number) pairs read from /proc/interrupts.
    # exp:
    # irq iface
    # 61  0
    # 62  1
    pairs = []
    with open("/proc/interrupts") as fp:
        for line in fp:
            fields = line.split()
            # The queue name (eth0-0, eth0-rx-0, ...) is taken from the last column.
            if len(fields) < 2 or not re.match(r"eth[0-9]-", fields[-1]):
                continue
            # Drop the "ethN-" prefix and any letters, keeping the queue number.
            queue = re.sub(r"[a-zA-Z]", "", re.sub(r"^eth[0-9]-", "", fields[-1]))
            pairs.append((fields[0].rstrip(":"), queue))
    return pairs


def set_irq_affinity(irq_number, mask):
    print("echo %s to /proc/irq/%s/smp_affinity" % (mask, irq_number))
    with open("/proc/irq/%s/smp_affinity" % irq_number, "w") as fp:
        fp.write(mask)


def main(irq_queuenum):
    # If an irqbalance process exists, kill it (pkill returns 0 when it matched one).
    if subprocess.call(["pkill", "irqbalance"]) == 0:
        print("irqbalance was killed")

    # Set irq affinity: the mask is written in hex without the 0x prefix.
    for irq_number, queue in irq_queuenum:
        set_irq_affinity(irq_number, "%x" % (1 << int(queue)))


main(irq())
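After running the script it is worth reading the masks back to confirm the binding. The sketch below is an illustration, not part of the original script: parse_affinity and read_affinity are hypothetical helper names, and it assumes the kernel prints smp_affinity as hexadecimal, split into comma-separated 32-bit groups with the most significant group first.

```python
def parse_affinity(text):
    """Convert smp_affinity text such as "00000000,00000080" into a list of CPU indices."""
    # Removing the commas joins the zero-padded 32-bit groups into one hex number.
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


def read_affinity(irq_number):
    """Read the current CPU list for an IRQ (needs a Linux /proc filesystem)."""
    with open("/proc/irq/%s/smp_affinity" % irq_number) as fp:
        return parse_affinity(fp.read())


print(parse_affinity("80"))                 # mask for the eighth CPU
print(parse_affinity("00000000,00000080"))  # same mask printed in 32-bit groups
```

Calling read_affinity with one of the IRQ numbers printed by the script should return a single-element list whose value matches the queue number if the binding took effect.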

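Conclusion

The main reason I looked into interrupt affinity was to raise the load our NICs can handle and reduce the odds of being knocked out by a flood of small packets. Binding the NIC's queue interrupts to different CPU cores effectively improves small-packet handling capacity. Our front-end servers were taken down by a small-packet attack in the past, so this also counts as one of our defensive measures.

Appendix: references: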

http://blog.netzhou.net/?p=181

http://www.ibm.com/developerworks/cn/linux/l-cn-linuxkernelint/

http://www.igigo.net/archives/231
