What Is It Like to Write Python Inside Rust Code?

2020-12-11 CSDN

Author | Mara Bos, senior Rust engineer

Translator | Arvin  Editor | 屠敏

Header image | CSDN, downloaded from 東方 IC

The translated article follows:

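About a year ago, I published a Rust crate called inline-python (https://crates.io/crates/inline-python), which lets you easily mix some Python into your Rust code using the python!{ .. } macro. In this series, I'll walk through the process of developing this crate from scratch.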

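Preview

If you're not familiar with the inline-python crate, this is what it lets you do:

    fn main() {
        let who = "world";
        let n = 5;
        python! {
            for i in range('n):
                print(i, "Hello", 'who)
            print("Goodbye")
        }
    }

It allows you to embed Python code directly between lines of Rust code, and even to use Rust variables directly inside the Python code.

We'll start with a much simpler case than this, and then gradually work our way up to this result (and more!).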

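Running Python code

First, let's look at how we can run Python code from Rust. Let's try to make this first simple example work:

    fn main() {
        println!("Hello ...");
        run_python("print(\"... World!\")");
    }

We could implement run_python by using std::process::Command to run the python executable and pass it the Python code, but if we want to be able to define and read back Python variables, we're better off starting with the PyO3 library.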
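For comparison, the subprocess route mentioned above might look roughly like the sketch below. This is not what inline-python does: the helper names python_command and run_python_subprocess are made up for illustration, and the code assumes an interpreter named python3 is available on the PATH.

```rust
use std::io;
use std::process::Command;

/// Build (but don't run) the command `python3 -c <code>`.
/// Assumption: the interpreter is called `python3` and is on the PATH.
fn python_command(code: &str) -> Command {
    let mut cmd = Command::new("python3");
    cmd.arg("-c").arg(code);
    cmd
}

/// Hypothetical subprocess-based `run_python`: spawns the interpreter,
/// waits for it to finish, and reports whether it exited successfully.
fn run_python_subprocess(code: &str) -> io::Result<bool> {
    let status = python_command(code).status()?;
    Ok(status.success())
}
```

Calling run_python_subprocess("print(\"... World!\")") would print from a separate process. All that comes back to Rust is an exit status, so there is no convenient way to hand variables back and forth, which is the limitation that motivates using PyO3 instead.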

PyO3 gives us Rust bindings for Python. It nicely wraps the Python C API, letting us interact with all kinds of Python objects directly from Rust. (It even lets you write Python libraries in Rust, but that's a different topic.)

Its Python::run function does exactly what we need. It takes the Python code as a &str, and lets us use two optional PyDicts to define any variables in scope. Let's give it a try:

    fn run_python(code: &str) {
        let py = pyo3::Python::acquire_gil(); // Acquire the 'global interpreter lock', as Python is not thread-safe.
        py.python().run(code, None, None).unwrap(); // No locals, no globals.
    }

    $ cargo run
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.29s
         Running `target/debug/scratchpad`
    Hello ...
    ... World!

And there it is: it works!

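Rule-based macros

Writing Python inside a string is not the most convenient approach, so let's try to improve on that. Macros allow us to define custom syntax within Rust, so let's try one:

    fn main() {
        println!("Hello ...");
        python! {
            print("... World!")
        }
    }

Macros are usually defined with macro_rules!, which lets you define a macro using advanced "find and replace" rules based on things like tokens and expressions. (See the chapter on macros in the Rust Book for an introduction to macro_rules!; all the details of Rust macros can be found in The Little Book of Rust Macros.)

Macros defined by macro_rules! cannot execute any code at compile time; they only apply pattern-based replacement rules. That is a great fit for things like vec![], and even lazy_static!{ .. }, but it is not powerful enough for things like parsing and compiling regular expressions (e.g. regex!("a.*b")).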

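In the matching rules of a macro, we can match on expressions, identifiers, types, and many other things. Since "valid Python code" is not one of the options, we'll just make our macro accept anything at all: a bunch of raw tokens.

    macro_rules! python {
        ($($code:tt)*) => { ... }
    }

(See the resources linked above for details on how macro_rules! works.)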
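To get a feel for what a tt fragment matches, here is a small sketch. The count_tts! macro and the tts_in_example function are made up for this illustration (they are not part of inline-python); the macro simply counts the token trees it is handed.

```rust
/// Illustration only: counts the top-level token trees passed to it.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Number of token trees in a two-line Python snippet.
fn tts_in_example() -> usize {
    count_tts!(
        print("... World!")
        print("Bye.")
    )
}
```

Here tts_in_example() evaluates to 4: print, the parenthesized group, print, and another parenthesized group. A delimited group counts as a single token tree no matter how much it contains, and the line break between the two statements does not show up anywhere.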

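An invocation of our macro should result in run_python(".."), with a string literal containing all of the Python code. We're in luck: there is a built-in macro called stringify! that puts things into a string for us, so we don't have to start from scratch.

    macro_rules! python {
        ($($code:tt)*) => {
            run_python(stringify!($($code)*));
        }
    }

The result:

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.32s
         Running `target/debug/scratchpad`
    Hello ...
    ... World!

Exactly the result we were hoping for!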

But what happens if we have more than one line of Python code?

    fn main() {
        println!("Hello ...");
        python! {
            print("... World!")
            print("Bye.")
        }
    }

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.31s
         Running `target/debug/scratchpad`
    Hello ...
    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: Py(0x7f1c0a5649a0, PhantomData) }', src/main.rs:9:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Unfortunately, that failed.

To debug this, we need to print the PyErr properly, and also show the exact Python code we're passing to Python::run:

    fn run_python(code: &str) {
        println!("-----");
        println!("{}", code);
        println!("-----");
        let py = pyo3::Python::acquire_gil();
        if let Err(e) = py.python().run(code, None, None) {
            e.print(py.python());
        }
    }

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.27s
         Running `target/debug/scratchpad`
    Hello ...
    -----
    print("... World!") print("Bye.")
    -----
      File "<string>", line 1
        print("... World!") print("Bye.")
                            ^
    SyntaxError: invalid syntax

Clearly, both lines of Python code ended up on the same line, which is invalid syntax in Python.

And now we've hit the biggest problem we have to overcome: stringify! messes up the whitespace.

Whitespace and tokens

Let's take a closer look at stringify!:

    fn main() {
        println!("{}", stringify!(
            a 123 b c
            x ( y + z )
            // comment
            ...
        ));
    }

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.21s
         Running `target/debug/scratchpad`
    a 123 b c x(y + z) ...

Not only did it remove all unnecessary whitespace, it also removed the comment. That's because it works on tokens rather than on the original source text: a, 123, b, and so on.

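The first thing rustc does is split the source code into tokens. This makes the rest of the parsing easier, since it no longer has to deal with individual characters like 1, 2, 3, but only with tokens such as "integer literal 123". Whitespace and comments are also gone after tokenization, as they are meaningless to the compiler.

stringify!() is a way to convert a bunch of tokens back into a string, but it works on a "best effort" basis: it converts the tokens back to text, and only inserts spaces around tokens where needed (to avoid turning b, c into bc).

So this is a dead end. Rustc has carelessly thrown away our precious whitespace, which matters a great deal in Python.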

We could try to guess which spaces have to be replaced by newlines, but indentation is definitely going to be a problem:

    fn main() {
        let a = stringify!(
            if False:
                x()
            y()
        );
        let b = stringify!(
            if False:
                x()
                y()
        );
        dbg!(a);
        dbg!(b);
        dbg!(a == b);
    }

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.20s
         Running `target/debug/scratchpad`
    [src/main.rs:12] a = "if False : x() y()"
    [src/main.rs:13] b = "if False : x() y()"
    [src/main.rs:14] a == b = true

These two snippets of Python code have different meanings, but stringify! gives us the same result for both.

Before giving up, let's try the other kind of macro.

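Procedural macros

Rust's procedural macros are another way to define macros. While macro_rules! can only define "function-style macros" (the ones invoked with a !), procedural macros can also define custom derive macros (e.g. #[derive(Stuff)]) and attribute macros (e.g. #[stuff]).

Procedural macros are implemented as compiler plugins. You write a function that gets access to the token stream as the compiler sees it, can do whatever it wants with it, and then has to return a new token stream for the compiler to use (or, for a custom derive, to add to the existing code):

    #[proc_macro]
    pub fn python(input: TokenStream) -> TokenStream {
        todo!()
    }

A token stream. That isn't great, because we need the source text, not just the tokens. It doesn't look promising yet, but let's continue anyway; maybe the extra flexibility of procedural macros will solve the problem.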

Since procedural macros run Rust code during compilation, they need to live in a separate proc-macro crate, which is compiled before anything that uses it can be compiled.

    $ cargo new --lib python-macro
         Created library `python-macro` package

In python-macro/Cargo.toml:

    [lib]
    proc-macro = true

In Cargo.toml:

    [dependencies]
    python-macro = { path = "./python-macro" }

Let's start with an implementation that just panics (todo!()) after printing the TokenStream:

    // python-macro/src/lib.rs
    extern crate proc_macro;
    use proc_macro::TokenStream;

    #[proc_macro]
    pub fn python(input: TokenStream) -> TokenStream {
        dbg!(input.to_string());
        todo!()
    }

    // src/main.rs
    use python_macro::python;

    fn main() {
        println!("Hello ...");
        python! {
            print("... World!")
            print("Bye.")
        }
    }

    $ cargo r
       Compiling python-macro v0.1.0
       Compiling scratchpad v0.1.0
    error[E0658]: procedural macros cannot be expanded to statements
     --> src/main.rs:5:5
      |
    5 | /     python! {
    6 | |         print("... World!")
    7 | |         print("Bye.")
    8 | |     }
      | |_____^
      |
      = note: see issue #54727 <https://github.com/rust-lang/rust/issues/54727> for more information
      = help: add `#![feature(proc_macro_hygiene)]` to the crate attributes to enable

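Whoa, what happened here?

The error Rust gives us says "procedural macros cannot be expanded to statements", plus something about enabling "hygiene". Macro hygiene is the wonderful feature of Rust macros that keeps them from accidentally "leaking" any names to the outside world (or the other way around). If a macro expands to code that uses a temporary variable named x, that variable is kept separate from any variable x appearing in the code outside the macro.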
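Hygiene is easiest to see with macro_rules!, where it is stable. In the sketch below (the macro and function names are invented for this example), the macro introduces its own local variable x, yet the x passed in by the caller still refers to the caller's variable.

```rust
/// Illustration only: adds a macro-local `x` to the caller's expression.
macro_rules! add_hidden_x {
    ($e:expr) => {{
        let x = 100; // this `x` belongs to the macro expansion
        $e + x
    }};
}

fn hygiene_demo() -> i32 {
    let x = 1; // the caller's `x`
    // The `x` in the argument still means the caller's `x` (1), even though
    // the expansion declares another `x` (100) right before using it.
    add_hidden_x!(x)
}
```

hygiene_demo() returns 101. Without hygiene, the macro's x would shadow the caller's x and the result would be 200.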

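However, this feature isn't stable yet for procedural macros. As a result, procedural macros are not allowed anywhere except as a separate item (e.g. at file scope, but not inside a function).

It turns out there is a very horrible yet fascinating workaround for this, which we'll discover later. For now, let's just enable the experimental feature #![feature(proc_macro_hygiene)] and continue our adventure.

(If you're reading this in the future, when proc_macro_hygiene has been stabilized: you could have skipped the last few paragraphs. ^^)

    $ sed -i '1i#![feature(proc_macro_hygiene)]' src/main.rs
    $ cargo r
       Compiling scratchpad v0.1.0
    [python-macro/src/lib.rs:6] input.to_string() = "print(\"... World!\") print(\"Bye.\")"
    error: proc macro panicked
     --> src/main.rs:6:5
      |
    6 | /     python! {
    7 | |         print("... World!")
    8 | |         print("Bye.")
    9 | |     }
      | |_____^
      |
      = help: message: not yet implemented

    error: aborting due to previous error

    error: could not compile `scratchpad`.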

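Our procedural macro panicked as expected, after showing us the input it received, as a string:

    print("... World!") print("Bye.")

As expected, the whitespace has been thrown away again. :(

Time to give up.

Or... maybe there is a way to work around this.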

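Reconstructing whitespace

Although rustc only works with tokens while parsing and compiling, it somehow still knows exactly where to report errors. There are no newlines in the tokens, yet it still knows that our error happened on lines 6 through 9. How does it do that?

It turns out that tokens carry quite a bit of information. They contain a Span, which holds the start and end location of the token in the source file. A Span can tell us in which file, and at which line and column number, a token starts and ends.

If we can get hold of that information, we can reconstruct the whitespace by putting spaces and newlines between the tokens so that they match their line and column information.

The functions that provide this information are not stable yet, and are gated behind #![feature(proc_macro_span)]. Let's enable it and see what we get: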

    #![feature(proc_macro_span)]

    extern crate proc_macro;
    use proc_macro::TokenStream;

    #[proc_macro]
    pub fn python(input: TokenStream) -> TokenStream {
        for t in input {
            dbg!(t.span().start());
        }
        todo!()
    }

    $ cargo r
       Compiling python-macro v0.1.0
       Compiling scratchpad v0.1.0
    [python-macro/src/lib.rs:9] t.span().start() = LineColumn {
        line: 7,
        column: 8,
    }
    [python-macro/src/lib.rs:9] t.span().start() = LineColumn {
        line: 7,
        column: 13,
    }
    [python-macro/src/lib.rs:9] t.span().start() = LineColumn {
        line: 8,
        column: 8,
    }
    [python-macro/src/lib.rs:9] t.span().start() = LineColumn {
        line: 8,
        column: 13,
    }

Awesome! We got some data.

But there are only four tokens. It turns out that ("... World!") shows up here as a single token rather than three ((, "... World!", and )). If we look at the documentation of TokenStream, we see that it doesn't give us a stream of tokens, but of token trees. Apparently, Rust's lexer already matches up parentheses (as well as braces and brackets), and gives us not just a linear list of tokens but a tree of tokens. The tokens inside the parentheses are found as descendants of a single Group token.
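The shape of that tree can be modeled with plain data. The sketch below uses a made-up Tree enum as a simplified stand-in for proc_macro::TokenTree (the real type has more variants and carries spans), just to show why iterating over the top level yields four items while a recursive walk finds all eight tokens.

```rust
/// Simplified, hypothetical stand-in for a token tree.
enum Tree {
    Leaf(&'static str),
    Group { open: char, close: char, inner: Vec<Tree> },
}

/// The input `print("... World!") print("Bye.")` as a tree.
fn example_input() -> Vec<Tree> {
    vec![
        Tree::Leaf("print"),
        Tree::Group { open: '(', close: ')', inner: vec![Tree::Leaf("\"... World!\"")] },
        Tree::Leaf("print"),
        Tree::Group { open: '(', close: ')', inner: vec![Tree::Leaf("\"Bye.\"")] },
    ]
}

/// Recursively collect every token, treating the delimiters as tokens too.
fn flatten(trees: &[Tree], out: &mut Vec<String>) {
    for t in trees {
        match t {
            Tree::Leaf(text) => out.push(text.to_string()),
            Tree::Group { open, close, inner } => {
                out.push(open.to_string());
                flatten(inner, out);
                out.push(close.to_string());
            }
        }
    }
}
```

example_input().len() is 4, matching the four positions printed above, while flatten collects 8 entries: each print, each opening parenthesis, each string literal, and each closing parenthesis.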

Let's modify our procedural macro to recursively go over all the tokens inside groups (and improve the output a bit):

    #[proc_macro]
    pub fn python(input: TokenStream) -> TokenStream {
        print(input);
        todo!()
    }

    fn print(input: TokenStream) {
        for t in input {
            if let TokenTree::Group(g) = t {
                println!("{:?}: open {:?}", g.span_open().start(), g.delimiter());
                print(g.stream());
                println!("{:?}: close {:?}", g.span_close().start(), g.delimiter());
            } else {
                println!("{:?}: {}", t.span().start(), t.to_string());
            }
        }
    }

    $ cargo r
       Compiling python-macro v0.1.0
       Compiling scratchpad v0.1.0
    LineColumn { line: 7, column: 8 }: print
    LineColumn { line: 7, column: 13 }: open Parenthesis
    LineColumn { line: 7, column: 14 }: "... World!"
    LineColumn { line: 7, column: 26 }: close Parenthesis
    LineColumn { line: 8, column: 8 }: print
    LineColumn { line: 8, column: 13 }: open Parenthesis
    LineColumn { line: 8, column: 14 }: "Bye."
    LineColumn { line: 8, column: 20 }: close Parenthesis

Just what we expected. Great!

Now, to reconstruct the whitespace, we need to insert newlines whenever we're not yet on the right line, and spaces whenever we're not yet in the right column. Let's see how that works out:

    #![feature(proc_macro_span)]

    extern crate proc_macro;
    use proc_macro::{TokenTree, TokenStream, LineColumn};

    #[proc_macro]
    pub fn python(input: TokenStream) -> TokenStream {
        let mut s = Source {
            source: String::new(),
            line: 1,
            col: 0,
        };
        s.reconstruct_from(input);
        println!("{}", s.source);
        todo!()
    }

    struct Source {
        source: String,
        line: usize,
        col: usize,
    }

    impl Source {
        fn reconstruct_from(&mut self, input: TokenStream) {
            for t in input {
                if let TokenTree::Group(g) = t {
                    let s = g.to_string();
                    self.add_whitespace(g.span_open().start());
                    self.add_str(&s[..1]); // the '[', '{' or '('.
                    self.reconstruct_from(g.stream());
                    self.add_whitespace(g.span_close().start());
                    self.add_str(&s[s.len() - 1..]); // the ']', '}' or ')'.
                } else {
                    self.add_whitespace(t.span().start());
                    self.add_str(&t.to_string());
                }
            }
        }

        fn add_str(&mut self, s: &str) {
            // Let's assume for now s contains no newlines.
            self.source += s;
            self.col += s.len();
        }

        fn add_whitespace(&mut self, loc: LineColumn) {
            while self.line < loc.line {
                self.source.push('\n');
                self.line += 1;
                self.col = 0;
            }
            while self.col < loc.column {
                self.source.push(' ');
                self.col += 1;
            }
        }
    }

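    $ cargo r
       Compiling python-macro v0.1.0
       Compiling scratchpad v0.1.0






            print("... World!")
            print("Bye.")
    error: proc macro panicked

That seems to work, but what's up with all those extra newlines and spaces? Checking against the source file, it is actually correct: the first token starts at line 7, column 8, so it correctly put print on line 7 at column 8. The locations we get are the exact locations in the .rs file.

The extra newlines at the start are not a problem (empty lines have no effect in Python). They even have a nice side effect: when Python reports an error, the line number it reports will match the line number in the .rs file.

The 8 spaces, however, are a problem. Although the Python code inside our python!{..} is properly indented relative to the surrounding Rust code, the Python code we extract should start at indentation level "zero". Otherwise, Python will complain about invalid indentation.

Let's subtract the column number of the first token from all column numbers: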

            start_col: None,
    // <snip>
        start_col: Option<usize>,
    // <snip>
            let start_col = *self.start_col.get_or_insert(loc.column);
            let col = loc.column.checked_sub(start_col).expect("Invalid indentation.");
            while self.col < col {
                self.source.push(' ');
                self.col += 1;
            }
    // <snip>

    $ cargo r
       Compiling python-macro v0.1.0
       Compiling scratchpad v0.1.0






    print("... World!")
    print("Bye.")
    error: proc macro panicked

That's exactly the result we wanted!
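The same logic can be tried out on stable Rust without proc_macro at all. The sketch below replaces real tokens with a made-up Tok struct that holds a start line, a start column, and the token text (the positions are the ones printed earlier), and applies the same add_whitespace and add_str steps, including the start_col adjustment. The names Tok, reconstruct, and example_tokens are hypothetical helpers for this illustration only.

```rust
/// Hypothetical stand-in for a token: where it starts, and its text.
struct Tok {
    line: usize,   // 1-based line in the .rs file
    column: usize, // 0-based column in the .rs file
    text: &'static str,
}

struct Source {
    source: String,
    line: usize,
    col: usize,
    start_col: Option<usize>,
}

impl Source {
    fn add_str(&mut self, s: &str) {
        // As in the text: assume `s` contains no newlines.
        self.source += s;
        self.col += s.len();
    }

    fn add_whitespace(&mut self, line: usize, column: usize) {
        while self.line < line {
            self.source.push('\n');
            self.line += 1;
            self.col = 0;
        }
        // The first token's column becomes indentation level zero.
        let start_col = *self.start_col.get_or_insert(column);
        let col = column.checked_sub(start_col).expect("Invalid indentation.");
        while self.col < col {
            self.source.push(' ');
            self.col += 1;
        }
    }
}

/// Rebuild source text from token positions.
fn reconstruct(tokens: &[Tok]) -> String {
    let mut s = Source { source: String::new(), line: 1, col: 0, start_col: None };
    for t in tokens {
        s.add_whitespace(t.line, t.column);
        s.add_str(t.text);
    }
    s.source
}

/// The eight tokens and positions from the output shown earlier.
fn example_tokens() -> Vec<Tok> {
    vec![
        Tok { line: 7, column: 8, text: "print" },
        Tok { line: 7, column: 13, text: "(" },
        Tok { line: 7, column: 14, text: "\"... World!\"" },
        Tok { line: 7, column: 26, text: ")" },
        Tok { line: 8, column: 8, text: "print" },
        Tok { line: 8, column: 13, text: "(" },
        Tok { line: 8, column: 14, text: "\"Bye.\"" },
        Tok { line: 8, column: 20, text: ")" },
    ]
}
```

reconstruct(&example_tokens()) yields six empty lines followed by the two print calls, each starting at column zero. The leading empty lines are kept on purpose so that line numbers in Python error messages still line up with the .rs file.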

Now we only need to turn this string into a string literal token and wrap run_python(); around it:

        TokenStream::from_iter(vec![
            TokenTree::from(Ident::new("run_python", Span::call_site())),
            TokenTree::Group(Group::new(
                Delimiter::Parenthesis,
                TokenStream::from(TokenTree::from(Literal::string(&s.source))),
            )),
            TokenTree::from(Punct::new(';', Spacing::Alone)),
        ])

Ugh. Working with token trees directly is painful, especially when building trees and streams from scratch.

If only there were a way to simply write out the Rust code we want to generate. As it happens, there is: the quote! macro from the quote crate.

        let source = s.source;
        quote!( run_python(#source); ).into()

Now let's test it with our original run_python function:

    #![feature(proc_macro_hygiene)]
    use python_macro::python;

    fn run_python(code: &str) {
        let py = pyo3::Python::acquire_gil();
        if let Err(e) = py.python().run(code, None, None) {
            e.print(py.python());
        }
    }

    fn main() {
        println!("Hello ...");
        python! {
            print("... World!")
            print("Bye.")
        }
    }

    $ cargo r
       Compiling scratchpad v0.1.0
        Finished dev [unoptimized + debuginfo] target(s) in 0.31s
         Running `target/debug/scratchpad`
    Hello ...
    ... World!
    Bye.

Finally, success!

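Packaging it as a crate

Now let's turn this into a reusable library:

Remove fn main, rename main.rs to lib.rs, give the crate a good name such as inline-python, make run_python public, change the run_python call in quote!() to ::inline_python::run_python, and add pub use python_macro::python; to re-export the python! macro from this crate.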

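What's next

There is probably a lot left to improve and plenty of bugs left to discover, but at least we can now run snippets of Python in between our lines of Rust.

The biggest problem for now is that this isn't very useful yet, since no data can (easily) cross the Rust-Python border.

In part 2, we'll look at how to make Rust variables available to the Python code.

Update: While waiting for part 2, there is also a part 1A. It doesn't improve our python!{} macro, but it goes into some details people have asked me about. Specifically, it covers:

Why you would want to use Python inside Rust like this; syntax issues, such as using Python's single-quoted strings; and the option of using Span::source_text, which didn't actually exist yet when I first wrote this code.

Original article: https://blog.m-ou.se/writing-python-inside-rust-1/

This article was translated by CSDN. Please credit the source when reposting.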
