Reading the Python argparse source code

2021-03-02 遊戲不存在
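http.server supports the -h flag for displaying help. A well-designed command-line interface like this makes a program much easier to use, so let's study how such command-line tools are implemented.
First, a demonstration: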
python -m http.server -h
usage: server.py [-h] [--cgi] [--bind ADDRESS] [--directory DIRECTORY] [port]

positional arguments:
  port                  Specify alternate port [default: 8000]

optional arguments:
  -h, --help            show this help message and exit
  --cgi                 Run as CGI Server
  --bind ADDRESS, -b ADDRESS
                        Specify alternate bind address [default: all interfaces]
  --directory DIRECTORY, -d DIRECTORY
                        Specify alternative directory [default:current directory]
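The http.server help output tells us:
- The positional argument port sets the port to serve on; its default is 8000.
- --directory/-d sets the directory to serve; its default is the current directory.
Restating the description above as a Python function definition makes the difference between positional and keyword arguments easy to see: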
def http_server(port, cgi=False, bind="all interfaces", directory="current directory"):
    pass
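A function definition and a command-line argument definition differ, though:
- A keyword argument can be written as either a short or a long option: both -b and --bind work.
With questions like this in mind, let's look for answers in the Python source code. The article covers sys.argv, getopt, optparse, and argparse in turn.
Introduction to sys.argv
Write a test script: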
# simple.py

import sys

if __name__ == "__main__":
    print(type(sys.argv), sys.argv)
Run the test script with the following command:
python simple.py 1 a bbb c=2 d@3
<class 'list'> ['simple.py', '1', 'a', 'bbb', 'c=2', 'd@3']
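As shown, sys.argv is a list containing the command-line arguments passed to the Python script, and argv[0] is the script name. This is the starting point for command-line tools: every one of them is built on top of it.
getopt parsing
The d@3 argument in the previous example was a joke. Command-line arguments follow established conventions, which on Unix are implemented by the getopt function; see the wiki entry in the reference links for details. Python provides its own getopt implementation. Let's first see how to use it. Short options use a single - prefix: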
import getopt
args = '-a -b -cfoo -d bar a1 a2'.split()
print(args)  # ['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2']

optlist, args = getopt.getopt(args, 'abc:d:')
print(optlist)  # [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
print(args)  # ['a1', 'a2']
Long options use a double -- prefix:
s = '--condition=foo --testing --output-file abc.def -x a1 a2'
args = s.split()
print(args)  # ['--condition=foo', '--testing', '--output-file', 'abc.def', '-x', 'a1', 'a2']

optlist, args = getopt.getopt(args, 'x', [
    'condition=', 'output-file=', 'testing'])
print(optlist)  # [('--condition', 'foo'), ('--testing', ''), ('--output-file', 'abc.def'), ('-x', '')]
print(args)  # ['a1', 'a2']
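getopt returns the keyword arguments as optlist and the positional arguments as args. A keyword can have a long name such as --condition and a short name such as -c. Why have both forms? My understanding is that a long name is more explicit: a lone letter c gives little hint of its meaning, while the word condition is clear at a glance. A short name is more convenient: it takes a single keystroke, and several options can be combined, as in ls -lah.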
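To make the long/short pairing concrete, here is a minimal sketch using getopt. The option names -o/--output and -v/--verbose and the helper parse are made up for illustration; only getopt.getopt itself comes from the standard library.

```python
import getopt


def parse(argv):
    """Return (output, verbose, positionals) parsed from argv."""
    # "o:" means -o takes a value; "v" is a plain flag.
    # "output=" means --output takes a value; "verbose" is a plain flag.
    opts, rest = getopt.getopt(argv, "o:v", ["output=", "verbose"])
    output, verbose = None, False
    for name, value in opts:
        # getopt does not link the two spellings; the caller does it by hand.
        if name in ("-o", "--output"):
            output = value
        elif name in ("-v", "--verbose"):
            verbose = True
    return output, verbose, rest


short_form = parse(["-v", "-o", "a.txt", "x"])
long_form = parse(["--verbose", "--output=a.txt", "x"])
print(short_form)  # ('a.txt', True, ['x'])
print(long_form)   # ('a.txt', True, ['x'])
```

Both spellings produce the same result, but only because the loop treats them as synonyms; getopt itself has no notion that -o and --output belong together.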
quopri shows how to parse command-line arguments with getopt:
# quopri

import getopt
import sys

try:
    opts, args = getopt.getopt(sys.argv[1:], 'td')
except getopt.error as msg:
    sys.stdout = sys.stderr
    print(msg)
    print("usage: quopri [-t | -d] [file] ...")  # help text
    print("-t: quote tabs")
    print("-d: decode; default encode")
    sys.exit(2)
...

for o, a in opts:  # handle the keyword arguments
    if o == '-t': tabs = 1
    if o == '-d': deco = 1

for file in args:  # handle the positional arguments
    ...
The core of the getopt function looks like this:
# getopt 

def getopt(args, shortopts, longopts = []): 
    opts = []
    longopts = list(longopts)
    while args and args[0].startswith('-') and args[0] != '-': 
        ...
        if args[0].startswith('--'):  
            opts, args = do_longs(opts, args[0][2:], longopts, args[1:])
        else:
            opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])  # args[0][1:] strips the prefix
    return opts, args
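- It takes three parameters: the arguments to parse, the short option definition, and the long option definition. Short options are defined with a string such as abc:d:; long options are defined with a list such as ['condition=', 'output-file=', 'testing'].
- A while loop keeps parsing keyword arguments; whatever remains once they are exhausted is the positional arguments, args.
- Long options are parsed by do_longs, short options by do_shorts.
How short options are parsed: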
def do_shorts(opts, optstring, shortopts, args):
    while optstring != '':  # loop over the characters of the cluster
        opt, optstring = optstring[0], optstring[1:]  # split off the option letter from the remaining characters
        if short_has_arg(opt, shortopts):
            if optstring == '':
                if not args:
                    raise GetoptError(_('option -%s requires argument') % opt,
                                      opt)
                optstring, args = args[0], args[1:]  # consume the next argument: '-d', 'bar'
            optarg, optstring = optstring, ''  # take the remainder as the value: -cfoo
        else:
            optarg = ''
        opts.append(('-' + opt, optarg))  # the value is '' for flags such as '-a', '-b'
    return opts, args

def short_has_arg(opt, shortopts):
    for i in range(len(shortopts)):
        if opt == shortopts[i] != ':':
            return shortopts.startswith(':', i+1)  # is the letter followed by a ':' character?
    raise GetoptError(_('option -%s not recognized') % opt, opt)
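Taking args=['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2'] and shortopts='abc:d:' as the example, the parse proceeds as follows:
- -a and -b take no value: shortopts contains just ab with no : suffix, so short_has_arg returns False and the result is two tuples, ('-a', '') and ('-b', '').
- -cfoo takes a value: shortopts contains c:, so short_has_arg returns True and the result is ('-c', 'foo').
- -d takes a value: shortopts contains d:, so short_has_arg returns True, the following bar is consumed, and the result is ('-d', 'bar').
This makes the meaning of abc:d: clear: each character is an option, and an option that takes a value is followed by a : character. Long options are parsed in a similar way, so I won't go through them here.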
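The walkthrough can be checked with a short script. This is only a sketch: the clustered form -ab and the missing-value case are my own additions to the example, chosen to exercise the while loop in do_shorts and the GetoptError branch.

```python
import getopt

shortopts = "abc:d:"

# "-ab" is a cluster of two flags; do_shorts peels them off one at a time.
opts, rest = getopt.getopt(["-ab", "-cfoo", "-d", "bar", "a1"], shortopts)
print(opts)  # [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
print(rest)  # ['a1']

# "-d" requires a value; with nothing after it, getopt raises GetoptError.
try:
    getopt.getopt(["-d"], shortopts)
    missing = None
except getopt.GetoptError as exc:
    missing = exc.opt  # the option letter that caused the error
print(missing)  # d
```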
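optparse parsing
The quopri source shows that getopt is rather bare-bones: the usage and help text must be printed by hand, and the parsed options are awkward to use because they have to be fetched by position. Next up is optparse; cProfile contains a usage example: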
# cProfile.py

from optparse import OptionParser

usage = "cProfile.py [-o output_file_path] [-s sort] scriptfile [arg] ..."  # Note 1
parser = OptionParser(usage=usage)
parser.allow_interspersed_args = False
parser.add_option('-o', '--outfile', dest="outfile",
    help="Save stats to <outfile>", default=None)  # Note 2
parser.add_option('-s', '--sort', dest="sort",
    help="Sort order when printing to stdout, based on pstats.Stats class",
    default=-1)

(options, args) = parser.parse_args()

runctx(code, globs, None, options.outfile, options.sort)  # Note 3
The result:
python -m cProfile -h
Usage: cProfile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ...

Options:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile=OUTFILE
                        Save stats to <outfile>
  -s SORT, --sort=SORT  Sort order when printing to stdout, based on
                        pstats.Stats class
  -m                    Profile a library module
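optparse compared with getopt:
- Options are more intuitive to use: the value is available as options.outfile (Note 3).
The official documentation, as of this writing, describes optparse as hard to extend and deprecated, and recommends argparse, which grew out of it, as the replacement. We'll study it anyway, because it helps in understanding argparse.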
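For comparison with the getopt version, here is a minimal optparse sketch. The option names and the usage string are made up for illustration, and an explicit argument list is passed to parse_args so the example does not depend on sys.argv.

```python
from optparse import OptionParser

parser = OptionParser(usage="demo [-o FILE] [-v] args...")
parser.add_option("-o", "--output", dest="output", default=None,
                  help="write the result to FILE")
parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                  default=False, help="print progress messages")

# Short and long spellings are declared together, so no manual matching is needed.
options, args = parser.parse_args(["-v", "--output=a.txt", "x"])
print(options.output, options.verbose, args)  # a.txt True ['x']
```

The help text and the -h option come for free, and values are read as attributes instead of being picked out of a list of tuples.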
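optparse module structure
Class diagram of the optparse module:
[Figure: optparse class diagram]
As the diagram shows, the optparse module consists mainly of three classes: OptionParser, Option, and HelpFormatter.
optparse implementation
The usage template for the parser is the three steps below: create the object, add options, then parse the arguments and return the result.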
parser = OptionParser(usage=usage)
parser.add_option('-o', '--outfile', dest="outfile",
    help="Save stats to <outfile>", default=None)  # add more options the same way...
parser.parse_args()  # reads sys.argv automatically; no need to pass it in
Start with how the OptionParser object is created:
class OptionContainer:
    
    def __init__(self, option_class, conflict_handler, description):
        # Initialize the option list and related data structures.
        # This method must be provided by subclasses, and it must
        # initialize at least the following instance attributes:
        # option_list, _short_opt, _long_opt, defaults.
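        self._create_option_list()  # abstract method implemented by subclasses; the abc module may not have existed yet, so the requirement is stated in a comment
        self.option_class = option_class  # the option class; users can extend it
        self.conflict_handler = conflict_handler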
        self.description = description
    
class OptionParser(OptionContainer):
    
    def __init__(self,
                 usage=None,
                 option_list=None,
                 option_class=Option,
                 version=None,
                 conflict_handler="error",
                 description=None,
                 formatter=None,
                 add_help_option=True,
                 prog=None,
                 epilog=None):
        OptionContainer.__init__(
            self, option_class, conflict_handler, description)
        self.usage = usage
        self.version = version
        if formatter is None:
            formatter = IndentedHelpFormatter()  # the default help formatter
        self.formatter = formatter
        self.formatter.set_parser(self)
        
        self._populate_option_list(option_list,
                                   add_help=add_help_option)

        self._init_parsing_state()
The code above creates the OptionParser object; the code below initializes some of its attributes:
def _create_option_list(self):
    self.option_list = []
    self.option_groups = [] 
    self._short_opt = {}            # single letter -> Option instance
    self._long_opt = {}             # long option -> Option instance
    self.defaults = {}              # maps option dest -> default value

def _add_help_option(self):
    self.add_option("-h", "--help",
                    action="help",
                    help=_("show this help message and exit"))

def _populate_option_list(self, option_list, add_help=True):
    ...
    if self.version:
        self._add_version_option()  # the version option is off by default
    if add_help:
        self._add_help_option()  # the help option is added by default

def _init_parsing_state(self):
    # These are set in parse_args() for the convenience of callbacks.
    self.rargs = None  # initialized to None until parse_args() runs
    self.largs = None
    self.values = None
The implementation of add_option, where the familiar long and short options appear:
def add_option(self, *args, **kwargs):
    if isinstance(args[0], str):
        option = self.option_class(*args, **kwargs)  # Option
    
    self.option_list.append(option)
    option.container = self
    
    for opt in option._short_opts:
        self._short_opt[opt] = option  # short options
    for opt in option._long_opts:
        self._long_opt[opt] = option  # long options

    if option.dest is not None:     # option has a dest, we need a default
        if option.default is not NO_DEFAULT:
            self.defaults[option.dest] = option.default
        elif option.dest not in self.defaults:
            self.defaults[option.dest] = None

    return option
Option stores the settings for one option, builds the lists of short and long option strings, and checks that the option is valid:
class Option:

    def __init__(self, *opts, **attrs):
        self._short_opts = []
        self._long_opts = []  # why does Option need both a short and a long list?
        ...
        self._set_opt_strings(opts)

        # Set all other attrs (action, type, etc.) from 'attrs' dict
        self._set_attrs(attrs)

        for checker in self.CHECK_METHODS:
            checker(self)
The prefix decides which list an option string goes into:
def _set_opt_strings(self, opts):
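    for opt in opts:
        if len(opt) < 2:
            raise OptionError("invalid option string %r" % opt, self)  # message abbreviated
        elif len(opt) == 2:
            if not (opt[0] == "-" and opt[1] != "-"):
                raise OptionError("invalid short option string %r" % opt, self)  # message abbreviated
            self._short_opts.append(opt)
        else:
            if not (opt[0:2] == "--" and opt[2] != "-"):
                raise OptionError("invalid long option string %r" % opt, self)  # message abbreviated
            self._long_opts.append(opt)
Option has quite a few check methods; let's look at the action and type checks: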
CHECK_METHODS = [_check_action,
                 _check_type,
                 _check_choice,
                 _check_dest,
                 _check_const,
                 _check_nargs,
                 _check_callback]

def _check_action(self):
    if self.action is None:
        self.action = "store"  # default: store the value as-is
    elif self.action not in self.ACTIONS:
        raise OptionError("invalid action: %r" % self.action, self)

def _check_type(self):  # work out the option's value type
    if self.type is None:
        if self.action in self.ALWAYS_TYPED_ACTIONS:
            if self.choices is not None:
                # The "choices" attribute implies "choice" type.
                self.type = "choice"  # enumeration
            else:
                # No type given?  "string" is the most sensible default.
                self.type = "string"
    else:
        # Allow type objects or builtin type conversion functions
        # (int, str, etc.) as an alternative to their names.
        if isinstance(self.type, type):  # a type object such as int
            self.type = self.type.__name__

        if self.type == "str":
            self.type = "string"

        if self.type not in self.TYPES:
            raise OptionError("invalid option type: %r" % self.type, self)
        if self.action not in self.TYPED_ACTIONS:
            raise OptionError(
                "must not supply a type for action %r" % self.action, self)
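The effect of _check_action and _check_type can be observed directly on Option instances. This is a small sketch that constructs Option objects outside of any parser; the option names are arbitrary.

```python
from optparse import Option, OptionError

# A type object is normalized to its name.
int_type = Option("-n", type=int).type
# "str" is renamed to optparse's own "string".
str_type = Option("-s", type=str).type
# No action given: the action defaults to "store", which implies "string".
default_action = Option("-o").action
default_type = Option("-o").type
# Flag actions carry no type at all.
flag_type = Option("-v", action="store_true").type

print(int_type, str_type, default_action, default_type, flag_type)
# int string store string None

# Supplying a type to an untyped action is rejected at construction time.
try:
    Option("-x", action="store_true", type="int")
    rejected = False
except OptionError:
    rejected = True
print(rejected)  # True
```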
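How the store action is used will be covered when we get to argument parsing. Once the parser object is built, the next step is parsing arguments with parse_args: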
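def parse_args(self, args=None, values=None):
    rargs = sys.argv[1:]  # read the input from sys.argv (simplified: the case where args is None)
    ...
    values = self.get_default_values()  # start from the default values
    ...
    stop = self._process_args([], rargs, values)  # parse the arguments
The familiar branching on argument type: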
def _process_args(self, largs, rargs, values):
    while rargs:  # loop until no arguments remain
        arg = rargs[0]
        if arg == "--":  # explicit end of options
            del rargs[0]
            return
        elif arg[0:2] == "--":
            # process a single long option (possibly with value(s))
            self._process_long_opt(rargs, values)  # handle a long option
        elif arg[:1] == "-" and len(arg) > 1:
            # process a cluster of short options (possibly with
            # value(s) for the last one only)
            self._process_short_opts(rargs, values)  # handle short options
        ...
How short options are parsed:
def _process_short_opts(self, rargs, values):
    arg = rargs.pop(0)
    stop = False
    i = 1
    for ch in arg[1:]:  # parse one character at a time
        opt = "-" + ch
        option = self._short_opt.get(opt)  # look up the matching Option rule
        i += 1                      # we have consumed a character

        if option.takes_value():
            # Any characters left in arg?  Pretend they're the
            # next arg, and stop consuming characters of arg.
            if i < len(arg):
                rargs.insert(0, arg[i:])
                stop = True

            nargs = option.nargs
            if len(rargs) < nargs:
                ...
            elif nargs == 1:
                value = rargs.pop(0)  # extract the option's value
            else:
                value = tuple(rargs[0:nargs])
                del rargs[0:nargs]

        else:                       # option doesn't take a value
            value = None

        option.process(opt, value, values, self)  # let the Option store the value into values
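The character-by-character loop is what lets optparse accept clustered short options. A minimal sketch, with made-up options -a, -b, and -c:

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-a", action="store_true", dest="a", default=False)
parser.add_option("-b", action="store_true", dest="b", default=False)
parser.add_option("-c", dest="c")  # default action "store": takes a value

# "-abcfoo" is read as -a, -b, then -c with the remaining "foo" as its value.
clustered, rest = parser.parse_args(["-abcfoo", "rest"])
print(clustered.a, clustered.b, clustered.c, rest)  # True True foo ['rest']

# The value may also arrive as the next argument.
separate, _ = parser.parse_args(["-c", "foo"])
print(separate.c)  # foo
```

When -c is reached, the leftover characters are pushed back onto rargs as if they were the next argument, which is why both spellings end up on the same code path.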
How Option carries out the action that stores the value:
def process(self, opt, value, values, parser):
    # And then take whatever action is expected of us.
    # This is a separate method to make life easier for
    # subclasses to add new actions.
    return self.take_action(
        self.action, self.dest, opt, value, values, parser)

def take_action(self, action, dest, opt, value, values, parser):
    if action == "store":
        setattr(values, dest, value)  # store the value as-is
    elif action == "store_const":
        setattr(values, dest, self.const)  # store the predefined constant
    elif action == "store_true":
        setattr(values, dest, True)  # store True
    elif action == "store_false":
        setattr(values, dest, False)  # store False
    elif action == "append":
        values.ensure_value(dest, []).append(value)  # collect values into a list
    elif action == "append_const":
        values.ensure_value(dest, []).append(self.const)  # collect the constant into a list
    elif action == "count":
        setattr(values, dest, values.ensure_value(dest, 0) + 1)  # counter; the option can be repeated
    elif action == "callback":  # callbacks are supported
        args = self.callback_args or ()
        kwargs = self.callback_kwargs or {}
        self.callback(self, opt, value, parser, *args, **kwargs)
    elif action == "help":  # help
        parser.print_help()
        parser.exit()
    elif action == "version":  # version
        parser.print_version()
        parser.exit()
    else:
        raise ValueError("unknown action %r" % self.action)
    return 1
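A few of these branches in action, as a hedged sketch. The options -v, -I, and --fast are invented for the example; the action names are the ones handled by take_action above.

```python
from optparse import OptionParser

parser = OptionParser()
# "count": each occurrence adds one (starts from 0 via ensure_value).
parser.add_option("-v", action="count", dest="verbosity")
# "append": each occurrence adds its value to a list.
parser.add_option("-I", action="append", dest="includes")
# "store_const": the presence of the flag stores a fixed constant.
parser.add_option("--fast", action="store_const", const="fast",
                  dest="mode", default="safe")

options, args = parser.parse_args(["-vvv", "-I", "a", "-Ib", "--fast"])
print(options.verbosity, options.includes, options.mode)  # 3 ['a', 'b'] fast
```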
The howto in the reference links covers how to use actions in great detail. Next, let's focus on how the help output is implemented.
# parse
def format_help(self, formatter=None):
        if formatter is None:
            formatter = self.formatter
        result = []
        if self.usage:
            result.append(self.get_usage() + "\n")  # emit the usage line
        if self.description:
            result.append(self.format_description(formatter) + "\n")  # emit the description
        result.append(self.format_option_help(formatter))  # start of the per-option help
        ...
        return "".join(result)

def format_option_help(self, formatter=None):
    formatter.store_option_strings(self)
    result = []
    result.append(formatter.format_heading(_("Options")))  
    formatter.indent()
    if self.option_list:
        result.append(OptionContainer.format_option_help(self, formatter))  # collect the help for each option
        result.append("\n")
    ...
    formatter.dedent()
    return "".join(result[:-1])

def format_option(self, option):
    result = []
    opts = self.option_strings[option]
    opt_width = self.help_position - self.current_indent - 2
    if len(opts) > opt_width:  # emit the option strings on their own line
        opts = "%*s%s\n" % (self.current_indent, "", opts)
        indent_first = self.help_position
    else:                       # start help on same line as opts
        opts = "%*s%-*s  " % (self.current_indent, "", opt_width, opts)
        indent_first = 0
    result.append(opts)
    if option.help:  # emit the option's help text
        help_text = self.expand_default(option)
        help_lines = textwrap.wrap(help_text, self.help_width)
        result.append("%*s%s\n" % (indent_first, "", help_lines[0]))
        result.extend(["%*s%s\n" % (self.help_position, "", line)
                       for line in help_lines[1:]])
    elif opts[-1] != "\n":
        result.append("\n")
    return "".join(result)
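To see the three pieces (usage, description, option help) assembled, format_help can be called directly and its return value inspected as a string. A minimal sketch; the usage and description strings are made up.

```python
from optparse import OptionParser

parser = OptionParser(usage="demo [options] file", description="Demo parser.")
parser.add_option("-o", "--outfile", dest="outfile",
                  help="Save stats to <outfile>")

# format_help() returns the text that -h would print, without exiting.
help_text = parser.format_help()
print(help_text)
```

The output contains the usage line, then the description, then an Options section in which the -o entry is rendered as -o OUTFILE, --outfile=OUTFILE followed by its help string.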
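Having gone through the flow, two questions probably remain:
- Why does Option have two lists, _short_opts and _long_opts?
- How does options.outfile give access to the argument value?
For the first question, look at the code:
# -o and --outfile both refer to the same option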
parser.add_option('-o', '--outfile', dest="outfile",
        help="Save stats to <outfile>", default=None)

def _set_opt_strings(self, opts):
    # opts = ['-o', '--outfile']
    for opt in opts:
        if ..:  # two characters with a single dash, e.g. -o
            self._short_opts.append(opt)
        if ..:  # longer and starting with --, e.g. --outfile
            self._long_opts.append(opt)
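That both spellings lead to one Option object can be verified with a short sketch. It peeks at the private _short_opts and _long_opts attributes shown in the excerpt above, which is fine for exploration but not something to rely on in application code.

```python
from optparse import OptionParser

parser = OptionParser()
opt = parser.add_option("-o", "--outfile", dest="outfile")

# One Option instance holds both spellings, split by prefix.
print(opt._short_opts, opt._long_opts)  # ['-o'] ['--outfile']

# The parser's two lookup tables point at the same object.
same_object = parser.get_option("-o") is parser.get_option("--outfile")
print(same_object)  # True
```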
For the second question, look at the code:
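# the value is stored as an attribute on the values object
setattr(values, dest, value)
...
def parse_args(self, args=None, values=None):
    ...
    # return the values object together with the leftover positional arguments
    return (values, args)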
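A small sketch of what comes back from parse_args, reusing the two cProfile options from earlier with an explicit argument list:

```python
from optparse import OptionParser, Values

parser = OptionParser()
parser.add_option("-o", "--outfile", dest="outfile", default=None)
parser.add_option("-s", "--sort", dest="sort", default=-1)

options, args = parser.parse_args(["-o", "stats.prof", "script.py"])

# The first element is an optparse.Values instance, not a dict;
# each dest becomes an attribute set via setattr.
print(isinstance(options, Values))    # True
print(options.outfile, options.sort)  # stats.prof -1
print(vars(options))                  # {'outfile': 'stats.prof', 'sort': -1}
print(args)                           # ['script.py']
```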
I think the main reason optparse is hard to extend is this piece of code:
def take_action(self, action, dest, opt, value, values, parser):
    if action == "store":
        setattr(values, dest, value)
    elif action == "store_const":
        setattr(values, dest, self.const)
    ....
With this kind of if-else logic, the code becomes hard to maintain once the branches multiply. A design pattern could replace it.
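One possible replacement is a dispatch table keyed by action name. The sketch below is my own illustration of that idea, not code from optparse or argparse; ACTIONS, register, and the handler functions are all hypothetical names.

```python
from types import SimpleNamespace

ACTIONS = {}  # hypothetical registry: action name -> handler function


def register(name):
    """Decorator that records a handler under the given action name."""
    def decorator(func):
        ACTIONS[name] = func
        return func
    return decorator


@register("store")
def store(values, dest, value, const=None):
    setattr(values, dest, value)


@register("store_const")
def store_const(values, dest, value, const=None):
    setattr(values, dest, const)


@register("count")
def count(values, dest, value, const=None):
    setattr(values, dest, getattr(values, dest, 0) + 1)


def take_action(action, values, dest, value=None, const=None):
    """Look the handler up instead of walking an if-else chain."""
    try:
        handler = ACTIONS[action]
    except KeyError:
        raise ValueError("unknown action %r" % action) from None
    handler(values, dest, value, const)


values = SimpleNamespace()
take_action("store", values, "outfile", "a.txt")
take_action("count", values, "verbose")
take_action("count", values, "verbose")
print(values.outfile, values.verbose)  # a.txt 2
```

Adding a new action now means registering one more function; take_action itself never changes.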
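argparse
argparse module structure
In the argparse class diagram, you can see that the design carries over from optparse and the left side is essentially the same; only the right side replaces Option with Action implementations.
[Figure: argparse class diagram]
argparse usage example
The use of argparse in http.server: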
# http.server 

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--cgi', action='store_true',
                   help='Run as CGI Server')
parser.add_argument('--bind', '-b', metavar='ADDRESS',
                    help='Specify alternate bind address '
                         '[default: all interfaces]')
parser.add_argument('--directory', '-d', default=os.getcwd(),
                    help='Specify alternative directory '
                    '[default:current directory]')
parser.add_argument('port', action='store',
                    default=8000, type=int,
                    nargs='?',
                    help='Specify alternate port [default: 8000]')
args = parser.parse_args()

if args.cgi:
    handler_class = CGIHTTPRequestHandler
else:
    handler_class = SimpleHTTPRequestHandler
test(HandlerClass=handler_class, port=args.port, bind=args.bind)
The usage template is the same as for optparse: create the object, add arguments, parse arguments.
parser = argparse.ArgumentParser()
parser.add_argument('--cgi', action='store_true',
                   help='Run as CGI Server')
args = parser.parse_args()
# args.port, args.bind
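The same template can be tried without starting a server. This sketch copies a subset of the http.server arguments and passes explicit lists to parse_args; prog="server.py" is set by hand so the example does not depend on how the script is launched.

```python
import argparse

parser = argparse.ArgumentParser(prog="server.py")
parser.add_argument("--cgi", action="store_true", help="Run as CGI Server")
parser.add_argument("--bind", "-b", metavar="ADDRESS",
                    help="Specify alternate bind address")
# nargs="?" makes the positional optional, so the default can apply.
parser.add_argument("port", nargs="?", type=int, default=8000,
                    help="Specify alternate port [default: 8000]")

given = parser.parse_args(["--cgi", "-b", "127.0.0.1", "9000"])
print(given.cgi, given.bind, given.port)  # True 127.0.0.1 9000

empty = parser.parse_args([])
print(empty.cgi, empty.bind, empty.port)  # False None 8000
```

Note that type=int converts the port from the string on the command line, which mirrors the keyword-argument defaults in the http_server function sketched at the start of the article.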
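argparse action implementation
To keep the article short, we'll focus on the action part of the implementation, which uses a registry pattern to solve the if-else problem: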
class _ActionsContainer(object):

    def __init__(self,
                 description,
                 prefix_chars,
                 argument_default,
                 conflict_handler):
        super(_ActionsContainer, self).__init__()
        # set up registries
        self._registries = {}  # the registry

        # register actions
        self.register('action', None, _StoreAction)  # register the action classes
        self.register('action', 'store', _StoreAction)
        
        
    def _pop_action_class(self, kwargs, default=None):
        action = kwargs.pop('action', default)
        return self._registry_get('action', action, action)  # look up the matching action class
    
    def add_argument(self, *args, **kwargs):
        
        # create the action object, and add it to the parser
        action_class = self._pop_action_class(kwargs)
        action = action_class(**kwargs)  # create the action object
    
    def parse_args(self, args=None, namespace=None):
        for action in self._actions:
            if action.dest is not SUPPRESS:
                if not hasattr(namespace, action.dest):
                    if action.default is not SUPPRESS:
                        setattr(namespace, action.dest, action.default)  # seed the namespace with each action's default

# usage
parser.add_argument('--cgi', action='store_true',
                   help='Run as CGI Server')
args = parser.parse_args()
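Because the lookup goes through a registry, a new action can be plugged in without touching argparse itself. The sketch below is an assumption-laden illustration: UpperAction and the action name "upper" are invented, while argparse.Action and the parser's register method are the pieces shown in the excerpt above.

```python
import argparse


class UpperAction(argparse.Action):
    """Hypothetical custom action: store the value upper-cased."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values.upper())


parser = argparse.ArgumentParser()
# Add the class to the 'action' registry under a made-up name.
parser.register("action", "upper", UpperAction)
parser.add_argument("--name", action="upper")

args = parser.parse_args(["--name", "alice"])
print(args.name)  # ALICE
```

Passing the class itself, as in action=UpperAction, works as well; the registry only adds the convenience of referring to it by a string name.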
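Summary
To wrap up with a brief summary:
- Command-line argument parsing always follows the same three steps: create the parser, add parsing rules, and parse the arguments.
Tips
While analyzing the parsing code I noticed a property of slicing: slice indices never go out of range: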
>>> b =[1,2]
>>> b[1:]
[2]
>>> 
>>> b[3:]  # safe
[]
>>> b[3]  # raises IndexError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
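Python handles meaningless slice indices gracefully: an index that is too large (greater than the actual length of the string) is replaced by the string's length, and when the upper bound is smaller than the lower bound (the slice's left value is greater than its right value) an empty string is returned. (Quoted from python-tutorial3.)
Internationalization support via gettext is also worth bookmarking for later: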
try:
    from gettext import gettext, ngettext
except ImportError:
    def gettext(message):
        return message

_ = gettext

class BadOptionError (OptParseError):
    
    ...

    def __str__(self):
        return _("no such option: %s") % self.opt_str
Reference links
https://docs.python.org/zh-cn/3/library/getopt.html
https://en.wikipedia.org/wiki/Getopts
https://docs.python.org/zh-cn/3/howto/argparse.html
http://www.pythondoc.com/pythontutorial3/index.html
It happens to be Spring Festival, so here is a little Easter egg for everyone:
def happyNiuYear():
 print("牛年大吉大利! "*3)

happyNiuYear()