C++ folly庫解讀(一) Fbstring —— 一個完美替代std::string的庫(下)

2021-02-14 紅宸笑
builtin_unreachable && assume(0)字符串拷貝
同初始化，也是根據不同的字符串類型，調用不同的函數：
  fbstring_core(const fbstring_core& rhs) {
    assert(&rhs != this);
    switch (rhs.category()) {
      case Category::isSmall:
        copySmall(rhs);
        break;
      case Category::isMedium:
        copyMedium(rhs);
        break;
      case Category::isLarge:
        copyLarge(rhs);
        break;
      default:
        folly::assume_unreachable();
    }
  }
copySmalltemplate <class Char>
inline void fbstring_core<Char>::copySmall(const fbstring_core& rhs) {
  // Just write the whole thing, don't look at details. In
  // particular we need to copy capacity anyway because we want
  // to set the size (don't forget that the last character,
  // which stores a short string's length, is shared with the
  // ml_.capacity field).

  ml_ = rhs.ml_;
}
正如注釋中所說，雖然 small strings 的情況下，字符串存儲在 small中，但是我們只需要把 ml直接賦值即可，因為在一個 union 中。
copyMediumtemplate <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyMedium(
    const fbstring_core& rhs) {
  // Medium strings are copied eagerly. Don't forget to allocate
  // one extra Char for the null terminator.
  auto const allocSize = goodMallocSize((1 + rhs.ml_.size_) * sizeof(Char));
  ml_.data_ = static_cast<Char*>(checkedMalloc(allocSize));
  // Also copies terminator.
  fbstring_detail::podCopy(
      rhs.ml_.data_, rhs.ml_.data_ + rhs.ml_.size_ + 1, ml_.data_);
  ml_.size_ = rhs.ml_.size_;
  ml_.setCapacity(allocSize / sizeof(Char) - 1, Category::isMedium);
}
medium strings 是 eager copy，所以就是"深拷貝"：
賦值 size、capacity、category.copyLargetemplate <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyLarge(
    const fbstring_core& rhs) {
  // Large strings are just refcounted
  ml_ = rhs.ml_;
  RefCounted::incrementRefs(ml_.data_);
}
large strings 的 copy 過程很直觀，因為是 COW 方式：
incrementRefs 和內部調用 fromData 這兩個個函數值得看一下：
static RefCounted* fromData(Char* p) {
      return static_cast<RefCounted*>(static_cast<void*>(
          static_cast<unsigned char*>(static_cast<void*>(p)) -
          getDataOffset()));
}

static void incrementRefs(Char* p) {
  fromData(p)->refCount_.fetch_add(1, std::memory_order_acq_rel);
}
因為 ml中指向的是 RefCounted 的 data[1]，所以我們需要通過 fromData 來找到 data_所屬的 RefCounted 的地址。我把 fromData 函數內的運算拆開：
static RefCounted * fromData(Char * p) {
      // 轉換data_[1]的地址
      void* voidDataAddr = static_cast<void*>(p);
      unsigned char* unsignedDataAddr = static_cast<unsigned char*>(voidDataAddr);

     // 獲取data_[1]在結構體的偏移量再相減，得到的就是所屬RefCounted的地址
      unsigned char* unsignedRefAddr = unsignedDataAddr - getDataOffset();

      void* voidRefAddr = static_cast<void*>(unsignedRefAddr);
      RefCounted* refCountAddr = static_cast<RefCounted*>(voidRefAddr);

      return refCountAddr;
}
值得關注的是如何轉換不同類型結構體的指針並做運算，這裡的做法是 ：Char* -> void* -> unsigned char* -> 與size_t做減法 -> void * -> RefCounted*
析構~fbstring_core() noexcept {
    if (category() == Category::isSmall) {
      return;
    }
    destroyMediumLarge();
}
如果是 small 類型，直接返回，因為利用的是棧空間。否則，針對 medium 和 large，調用 destroyMediumLarge。FOLLY_MALLOC_NOINLINE void destroyMediumLarge() noexcept {
  auto const c = category();
  FBSTRING_ASSERT(c != Category::isSmall);
  if (c == Category::isMedium) {
    free(ml_.data_);
  } else {
    RefCounted::decrementRefs(ml_.data_);
  }
}
medium ：free 動態分配的字符串內存即可。large : 調用 decrementRefs，針對共享字符串進行操作。static void decrementRefs(Char * p) {
  auto const dis = fromData(p);
  size_t oldcnt = dis->refCount_.fetch_sub(1, std::memory_order_acq_rel);
  FBSTRING_ASSERT(oldcnt > 0);
  if (oldcnt == 1) {
    free(dis);
  }
}
邏輯也很清晰：先對引用計數減 1，如果本身就只有 1 個引用，那麼直接 free 掉整個 RefCounted。
COW最重要的一點，也是 large strings 獨有的，就是 COW. 任何針對字符串寫的操作，都會觸發 COW，包括前面舉過的[]操作，例如：
non-const operator[](size_type pos "")我們舉個例子，比如non-const operator[](size_type pos "")
non-const operator[](size_type pos "")reference operator[](size_type pos "") {
    return *(begin() + pos);
}

iterator begin() {
    return store_.mutableData();
}
來重點看下 mutableData() :
Char* mutableData() {
  switch (category()) {
  case Category::isSmall:
    return small_;
  case Category::isMedium:
    return ml_.data_;
  case Category::isLarge:
    return mutableDataLarge();
  }
  fbstring_detail::assume_unreachable();
}

template <class Char>
inline Char* fbstring_core<Char>::mutableDataLarge() {
  FBSTRING_ASSERT(category() == Category::isLarge);
  if (RefCounted::refs(ml_.data_) > 1) { // Ensure unique.
    unshare();
  }
  return ml_.data_;
}
同樣是分三種情況。small 和 medium 直接返回字符串的地址，large 會調用 mutableDataLarge()，可以看出，如果引用數大於 1，會進行 unshare 操作 ：
void unshare(size_t minCapacity = 0);

template <class Char>
FOLLY_MALLOC_NOINLINE inline void fbstring_core<Char>::unshare(
    size_t minCapacity) {
  FBSTRING_ASSERT(category() == Category::isLarge);
  size_t effectiveCapacity = std::max(minCapacity, ml_.capacity());
  auto const newRC = RefCounted::create(&effectiveCapacity);
  // If this fails, someone placed the wrong capacity in an
  // fbstring.
  FBSTRING_ASSERT(effectiveCapacity >= ml_.capacity());
  // Also copies terminator.
  fbstring_detail::podCopy(ml_.data_, ml_.data_ + ml_.size_ + 1, newRC->data_);
  RefCounted::decrementRefs(ml_.data_);
  ml_.data_ = newRC->data_;
  ml_.setCapacity(effectiveCapacity, Category::isLarge);
  // size_ remains unchanged.
}
基本思路：
對原有的共享字符串減少一個引用 decrementRefs，這個函數在上面的析構小節裡分析過。設置 ml_的 data、capacity、category.注意此時還不會設置 size，因為還不知道應用程式對字符串進行什麼修改。non-const 與 const大家可能注意到了，上面的 at 和[]強調了 non-const，這是因為 const-qualifer 針對這兩個調用不會觸發 COW ，還以[]為例：
// C++11 21.4.5 element access:
const_reference operator[](size_type pos "") const {
    return *(begin() + pos);
}

const_iterator begin() const {
    return store_.data();
}

// In C++11 data() and c_str() are 100% equivalent.
const Char* data() const { return c_str(); }

const Char* c_str() const {
    const Char* ptr = ml_.data_;
    // With this syntax, GCC and Clang generate a CMOV instead of a branch.
    ptr = (category() == Category::isSmall) ? small_ : ptr;
    return ptr;
}
可以看出區別，non-const 版本的 begin()中調用的是 mutableData()，而 const-qualifer 版本調用的是 data() -> c_str()，而 c_str()直接返回的字符串地址。
所以，當字符串用到[]、at 且不需要寫操作時，最好用 const-qualifer.
我們拿 folly 自帶的benchmark 工具[1]測試一下：
#include "folly/Benchmark.h"
#include "folly/FBString.h"
#include "folly/container/Foreach.h"

using namespace std;
using namespace folly;

BENCHMARK(nonConstFbstringAt, n) {
  ::folly::fbstring str(
      "fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
      "performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
      "and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
      "cooperate with it to achieve significant improvements in speed and memory usage.");
  FOR_EACH_RANGE(i, 0, n) {
    char &s = str[2];
    doNotOptimizeAway(s);
  }
}

BENCHMARK_DRAW_LINE();

BENCHMARK_RELATIVE(constFbstringAt, n) {
  const ::folly::fbstring str(
      "fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
      "performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
      "and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
      "cooperate with it to achieve significant improvements in speed and memory usage.");
  FOR_EACH_RANGE(i, 0, n) {
    const char &s = str[2];
    doNotOptimizeAway(s);
  }
}
int main() { runBenchmarks(); }
結果是 constFbstringAt 比 nonConstFbstringAt 快了 175%
============================================================================
delve_folly/main.cc                             relative  time/iter  iters/s
============================================================================
nonConstFbstringAt                                          39.85ns   25.10M
-
constFbstringAt                                  175.57%    22.70ns   44.06M
============================================================================
Reallocreserve、operator+等操作，可能會涉及到內存重新分配，最終調用的都是 memory/Malloc.h 中的 smartRealloc：
inline void* checkedRealloc(void* ptr, size_t size) {
  void* p = realloc(ptr, size);
  if (!p) {
    throw_exception<std::bad_alloc>();
  }
  return p;
}

/**
 * This function tries to reallocate a buffer of which only the first
 * currentSize bytes are used. The problem with using realloc is that
 * if currentSize is relatively small _and_ if realloc decides it
 * needs to move the memory chunk to a new buffer, then realloc ends
 * up copying data that is not used. It's generally not a win to try
 * to hook in to realloc() behavior to avoid copies - at least in
 * jemalloc, realloc() almost always ends up doing a copy, because
 * there is little fragmentation / slack space to take advantage of.
 */
FOLLY_MALLOC_CHECKED_MALLOC FOLLY_NOINLINE inline void* smartRealloc(
    void* p,
    const size_t currentSize,
    const size_t currentCapacity,
    const size_t newCapacity) {
  assert(p);
  assert(currentSize <= currentCapacity &&
         currentCapacity < newCapacity);

  auto const slack = currentCapacity - currentSize;
  if (slack * 2 > currentSize) {
    // Too much slack, malloc-copy-free cycle:
    auto const result = checkedMalloc(newCapacity);
    std::memcpy(result, p, currentSize);
    free(p);
    return result;
  }
  // If there's not too much slack, we realloc in hope of coalescing
  return checkedRealloc(p, newCapacity);
}
從注釋和代碼看為什麼函數起名叫smartRealloc :
如果(the currentCapacity - currentSize) _ 2 > currentSize，即 currentSize < 2/3 _ capacity，說明當前分配的內存利用率較低，此時認為如果使用 realloc 並且 realloc 決定拷貝當前內存到新內存，成本會高於直接 malloc(newCapacity) + memcpy + free(old_memory)。其他__builtin_expect給編譯器提供分支預測信息。原型為：
long __builtin_expect (long exp, long c)
表達式的返回值為 exp 的值，跟 c 無關。 我們預期 exp 的值是 c。例如下面的例子，我們預期 x 的值是 0，所以這裡提示編譯器，只有很小的機率會調用到 foo()
if (__builtin_expect (x, 0))
  foo ();
再比如判斷指針是否為空：
if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);
在 fbstring 中也用到了builtin_expect，例如創建 RefCounted 的函數 (FOLLY_LIKELY 包裝了一下builtin_expect)：
#if __GNUC__
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) (__builtin_expect(b, t))
#else
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) b
#endif

//  Likeliness annotations
//
//  Useful when the author has better knowledge than the compiler of whether
//  the branch condition is overwhelmingly likely to take a specific value.
//
//  Useful when the author has better knowledge than the compiler of which code
//  paths are designed as the fast path and which are designed as the slow path,
//  and to force the compiler to optimize for the fast path, even when it is not
//  overwhelmingly likely.

#define FOLLY_LIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 1)
#define FOLLY_UNLIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 0)

static RefCounted* create(const Char* data, size_t* size) {
  const size_t effectiveSize = *size;
  auto result = create(size);
  if (FOLLY_LIKELY(effectiveSize > 0)) {       // __builtin_expect
    fbstring_detail::podCopy(data, data + effectiveSize, result->data_);
  }
  return result;
}
從彙編角度來說，會將可能性更大的彙編緊跟著前面的彙編，防止無效指令的加載。可以參考：
likely() and unlikely()[2]What is the advantage of GCC's \_\_builtin_expect in if else statements?[3]CMOV 指令conditional move，條件傳送。類似於 MOV 指令，但是依賴於 RFLAGS 寄存器內的狀態。如果條件沒有滿足，該指令不會有任何效果。
CMOV 的優點是可以避免分支預測，避免分支預測錯誤對 CPU 流水線的影響。詳細可以看這篇文檔：amd-cmovcc.pdf[4]
fbstring 在一些場景會提示編譯器生成 CMOV 指令，例如：
const Char* c_str() const {
  const Char* ptr = ml_.data_;
  // With this syntax, GCC and Clang generate a CMOV instead of a branch.
  ptr = (category() == Category::isSmall) ? small_ : ptr;
  return ptr;
}
builtin_unreachable && assume(0)如果程序執行到了builtin_unreachable 和assume(0) ，那麼會出現未定義的行為。例如**builtin_unreachable 出現在一個不會返回的函數後面，而且這個函數沒有聲明為**attribute\_\_((noreturn))。例如6.59 Other Built-in Functions Provided by GCC給出的例子 :
void function_that_never_returns (void);

int g (int c)
{
  if (c)
    {
      return 1;
    }
  else
    {
      function_that_never_returns ();
      __builtin_unreachable ();
    }
}
如果不加__builtin_unreachable ();，會報error: control reaches end of non-void function [-Werror=return-type]
folly 將 builtin_unreachable 和assume(0) 封裝成了assume_unreachable：
[[noreturn]] FOLLY_ALWAYS_INLINE void assume_unreachable() {
  assume(false);
  // Do a bit more to get the compiler to understand
  // that this function really will never return.
#if defined(__GNUC__)
  __builtin_unreachable();
#elif defined(_MSC_VER)
  __assume(0);
#else
  // Well, it's better than nothing.
  std::abort();
#endif
}
在 fbstring 的一些特性場景，比如 switch 判斷 category 中用到。這是上面提到過的 mutableData() :
Char* mutableData() {
  switch (category()) {
    case Category::isSmall:
      return small_;
    case Category::isMedium:
      return ml_.data_;
    case Category::isLarge:
      return mutableDataLarge();
  }
  folly::assume_unreachable();
}
jemalloc大致的算法和原理可以參考 facebook 的這篇博客 ：Scalable memory allocation using jemalloc[7]find 算法使用的簡化的 Boyer-Moore 算法，文檔說明是在查找成功的情況下比 std::string 的 find 快 30 倍。benchmark 代碼在FBStringBenchmark.cpp[8]
我自己測試的情況貌似是搜索長字符串的情況會更好些。
判斷大小端// It's MSVC, so we just have to guess ... and allow an override
#ifdef _MSC_VER
# ifdef FOLLY_ENDIAN_BE
  static constexpr auto kIsLittleEndian = false;
# else
  static constexpr auto kIsLittleEndian = true;
# endif
#else
  static constexpr auto kIsLittleEndian =
  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#endif
__BYTE_ORDER__ 為預定義宏：[9]，值是ORDER_LITTLE_ENDIAN、ORDER_BIG_ENDIAN、ORDER_PDP_ENDIAN中的一個。
一般會這麼使用：
/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
c++20 引入了std::endian[10]，判斷會更加方便。
（完）
參考資料[1]benchmark 工具: https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md
[2]likely() and unlikely(): https://kernelnewbies.org/FAQ/LikelyUnlikely
[3]What is the advantage of GCC's __builtin_expect in if else statements?: https://stackoverflow.com/questions/7346929/what-is-the-advantage-of-gccs-builtin-expect-in-if-else-statements
[4]amd-cmovcc.pdf: https://www.cs.tufts.edu/comp/40-2011f/readings/amd-cmovcc.pdf
[5]官網 : http://jemalloc.net/
[6]API 文檔: http://jemalloc.net/jemalloc.3.html
[7]Scalable memory allocation using jemalloc: https://engineering.fb.com/2011/01/03/core-data/scalable-memory-allocation-using-jemalloc/
[8]FBStringBenchmark.cpp: https://github.com/facebook/folly/blob/master/folly/test/FBStringTestBenchmarks.cpp.h#L120
[9]BYTE_ORDER為預定義宏：: https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
[10]std::endian: https://en.cppreference.com/w/cpp/types/endian
C++ folly庫解讀(一) Fbstring —— 一個完美替代std::string的庫(下)

相關焦點

C++:32---IO庫

C++機器學習庫介紹

Facebook folly代碼庫學習心得

C++機器學習庫介紹 | 文末送書

C++標準庫和std命名空間

跟我學C++中級篇——STL的學習

json for modern c++的使用

跟我學C++中級篇——STL中的字符串

C++之字符串類學習總結

在C++中將string轉換為char數組

實現32位簡單版Windows和Linux雙平臺的C++運行庫

c++11新特性,所有知識點都在這了!

功能強大的 C++ redis 客戶端庫增加至 acl 項目中

C++算法庫

史上最快的字符串格式化庫{fmt}及std::format

C++庫文件說明及使用方法

C/C++中字符串與數字轉換

c++ fstream + string處理大數據

C++17之std::optional

C++17新特性介紹:std::optional