同初始化,也是根據不同的字符串類型,調用不同的函數:
fbstring_core(const fbstring_core& rhs) {
assert(&rhs != this);
switch (rhs.category()) {
case Category::isSmall:
copySmall(rhs);
break;
case Category::isMedium:
copyMedium(rhs);
break;
case Category::isLarge:
copyLarge(rhs);
break;
default:
folly::assume_unreachable();
}
}
copySmalltemplate <class Char>
inline void fbstring_core<Char>::copySmall(const fbstring_core& rhs) {
// Just write the whole thing, don't look at details. In
// particular we need to copy capacity anyway because we want
// to set the size (don't forget that the last character,
// which stores a short string's length, is shared with the
// ml_.capacity field).
ml_ = rhs.ml_;
}正如注釋中所說,雖然 small strings 的情況下,字符串存儲在 small中,但是我們只需要把 ml直接賦值即可,因為在一個 union 中。
copyMediumtemplate <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyMedium(
const fbstring_core& rhs) {
// Medium strings are copied eagerly. Don't forget to allocate
// one extra Char for the null terminator.
auto const allocSize = goodMallocSize((1 + rhs.ml_.size_) * sizeof(Char));
ml_.data_ = static_cast<Char*>(checkedMalloc(allocSize));
// Also copies terminator.
fbstring_detail::podCopy(
rhs.ml_.data_, rhs.ml_.data_ + rhs.ml_.size_ + 1, ml_.data_);
ml_.size_ = rhs.ml_.size_;
ml_.setCapacity(allocSize / sizeof(Char) - 1, Category::isMedium);
}medium strings 是 eager copy,所以就是"深拷貝":
賦值 size、capacity、category.copyLargetemplate <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyLarge(
const fbstring_core& rhs) {
// Large strings are just refcounted
ml_ = rhs.ml_;
RefCounted::incrementRefs(ml_.data_);
}large strings 的 copy 過程很直觀,因為是 COW 方式:
incrementRefs 和內部調用 fromData 這兩個個函數值得看一下:
static RefCounted* fromData(Char* p) {
return static_cast<RefCounted*>(static_cast<void*>(
static_cast<unsigned char*>(static_cast<void*>(p)) -
getDataOffset()));
}
static void incrementRefs(Char* p) {
fromData(p)->refCount_.fetch_add(1, std::memory_order_acq_rel);
}因為 ml中指向的是 RefCounted 的 data[1],所以我們需要通過 fromData 來找到 data_所屬的 RefCounted 的地址。我把 fromData 函數內的運算拆開:
static RefCounted * fromData(Char * p) {
// 轉換data_[1]的地址
void* voidDataAddr = static_cast<void*>(p);
unsigned char* unsignedDataAddr = static_cast<unsigned char*>(voidDataAddr);
// 獲取data_[1]在結構體的偏移量再相減,得到的就是所屬RefCounted的地址
unsigned char* unsignedRefAddr = unsignedDataAddr - getDataOffset();
void* voidRefAddr = static_cast<void*>(unsignedRefAddr);
RefCounted* refCountAddr = static_cast<RefCounted*>(voidRefAddr);
return refCountAddr;
}值得關注的是如何轉換不同類型結構體的指針並做運算,這裡的做法是 :Char* -> void* -> unsigned char* -> 與size_t做減法 -> void * -> RefCounted*
析構~fbstring_core() noexcept {
if (category() == Category::isSmall) {
return;
}
destroyMediumLarge();
}
如果是 small 類型,直接返回,因為利用的是棧空間。否則,針對 medium 和 large,調用 destroyMediumLarge。FOLLY_MALLOC_NOINLINE void destroyMediumLarge() noexcept {
auto const c = category();
FBSTRING_ASSERT(c != Category::isSmall);
if (c == Category::isMedium) {
free(ml_.data_);
} else {
RefCounted::decrementRefs(ml_.data_);
}
}
medium :free 動態分配的字符串內存即可。large : 調用 decrementRefs,針對共享字符串進行操作。static void decrementRefs(Char * p) {
auto const dis = fromData(p);
size_t oldcnt = dis->refCount_.fetch_sub(1, std::memory_order_acq_rel);
FBSTRING_ASSERT(oldcnt > 0);
if (oldcnt == 1) {
free(dis);
}
}邏輯也很清晰:先對引用計數減 1,如果本身就只有 1 個引用,那麼直接 free 掉整個 RefCounted。
COW最重要的一點,也是 large strings 獨有的,就是 COW. 任何針對字符串寫的操作,都會觸發 COW,包括前面舉過的[]操作,例如:
non-const operator[](size_type pos "")我們舉個例子,比如non-const operator[](size_type pos "")
non-const operator[](size_type pos "")reference operator[](size_type pos "") {
return *(begin() + pos);
}
iterator begin() {
return store_.mutableData();
}來重點看下 mutableData() :
Char* mutableData() {
switch (category()) {
case Category::isSmall:
return small_;
case Category::isMedium:
return ml_.data_;
case Category::isLarge:
return mutableDataLarge();
}
fbstring_detail::assume_unreachable();
}
template <class Char>
inline Char* fbstring_core<Char>::mutableDataLarge() {
FBSTRING_ASSERT(category() == Category::isLarge);
if (RefCounted::refs(ml_.data_) > 1) { // Ensure unique.
unshare();
}
return ml_.data_;
}同樣是分三種情況。small 和 medium 直接返回字符串的地址,large 會調用 mutableDataLarge(),可以看出,如果引用數大於 1,會進行 unshare 操作 :
void unshare(size_t minCapacity = 0);
template <class Char>
FOLLY_MALLOC_NOINLINE inline void fbstring_core<Char>::unshare(
size_t minCapacity) {
FBSTRING_ASSERT(category() == Category::isLarge);
size_t effectiveCapacity = std::max(minCapacity, ml_.capacity());
auto const newRC = RefCounted::create(&effectiveCapacity);
// If this fails, someone placed the wrong capacity in an
// fbstring.
FBSTRING_ASSERT(effectiveCapacity >= ml_.capacity());
// Also copies terminator.
fbstring_detail::podCopy(ml_.data_, ml_.data_ + ml_.size_ + 1, newRC->data_);
RefCounted::decrementRefs(ml_.data_);
ml_.data_ = newRC->data_;
ml_.setCapacity(effectiveCapacity, Category::isLarge);
// size_ remains unchanged.
}基本思路:
對原有的共享字符串減少一個引用 decrementRefs,這個函數在上面的析構小節裡分析過。設置 ml_的 data、capacity、category.注意此時還不會設置 size,因為還不知道應用程式對字符串進行什麼修改。non-const 與 const大家可能注意到了,上面的 at 和[]強調了 non-const,這是因為 const-qualifer 針對這兩個調用不會觸發 COW ,還以[]為例:
// C++11 21.4.5 element access:
const_reference operator[](size_type pos "") const {
return *(begin() + pos);
}
const_iterator begin() const {
return store_.data();
}
// In C++11 data() and c_str() are 100% equivalent.
const Char* data() const { return c_str(); }
const Char* c_str() const {
const Char* ptr = ml_.data_;
// With this syntax, GCC and Clang generate a CMOV instead of a branch.
ptr = (category() == Category::isSmall) ? small_ : ptr;
return ptr;
}可以看出區別,non-const 版本的 begin()中調用的是 mutableData(),而 const-qualifer 版本調用的是 data() -> c_str(),而 c_str()直接返回的字符串地址。
所以,當字符串用到[]、at 且不需要寫操作時,最好用 const-qualifer.
我們拿 folly 自帶的benchmark 工具[1]測試一下:
#include "folly/Benchmark.h"
#include "folly/FBString.h"
#include "folly/container/Foreach.h"
using namespace std;
using namespace folly;
BENCHMARK(nonConstFbstringAt, n) {
::folly::fbstring str(
"fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
"performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
"and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
"cooperate with it to achieve significant improvements in speed and memory usage.");
FOR_EACH_RANGE(i, 0, n) {
char &s = str[2];
doNotOptimizeAway(s);
}
}
BENCHMARK_DRAW_LINE();
BENCHMARK_RELATIVE(constFbstringAt, n) {
const ::folly::fbstring str(
"fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
"performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
"and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
"cooperate with it to achieve significant improvements in speed and memory usage.");
FOR_EACH_RANGE(i, 0, n) {
const char &s = str[2];
doNotOptimizeAway(s);
}
}
int main() { runBenchmarks(); }結果是 constFbstringAt 比 nonConstFbstringAt 快了 175%
============================================================================
delve_folly/main.cc relative time/iter iters/s
============================================================================
nonConstFbstringAt 39.85ns 25.10M
-
constFbstringAt 175.57% 22.70ns 44.06M
============================================================================
Reallocreserve、operator+等操作,可能會涉及到內存重新分配,最終調用的都是 memory/Malloc.h 中的 smartRealloc:
inline void* checkedRealloc(void* ptr, size_t size) {
void* p = realloc(ptr, size);
if (!p) {
throw_exception<std::bad_alloc>();
}
return p;
}
/**
* This function tries to reallocate a buffer of which only the first
* currentSize bytes are used. The problem with using realloc is that
* if currentSize is relatively small _and_ if realloc decides it
* needs to move the memory chunk to a new buffer, then realloc ends
* up copying data that is not used. It's generally not a win to try
* to hook in to realloc() behavior to avoid copies - at least in
* jemalloc, realloc() almost always ends up doing a copy, because
* there is little fragmentation / slack space to take advantage of.
*/
FOLLY_MALLOC_CHECKED_MALLOC FOLLY_NOINLINE inline void* smartRealloc(
void* p,
const size_t currentSize,
const size_t currentCapacity,
const size_t newCapacity) {
assert(p);
assert(currentSize <= currentCapacity &&
currentCapacity < newCapacity);
auto const slack = currentCapacity - currentSize;
if (slack * 2 > currentSize) {
// Too much slack, malloc-copy-free cycle:
auto const result = checkedMalloc(newCapacity);
std::memcpy(result, p, currentSize);
free(p);
return result;
}
// If there's not too much slack, we realloc in hope of coalescing
return checkedRealloc(p, newCapacity);
}從注釋和代碼看為什麼函數起名叫smartRealloc :
如果(the currentCapacity - currentSize) _ 2 > currentSize,即 currentSize < 2/3 _ capacity,說明當前分配的內存利用率較低,此時認為如果使用 realloc 並且 realloc 決定拷貝當前內存到新內存,成本會高於直接 malloc(newCapacity) + memcpy + free(old_memory)。其他__builtin_expect給編譯器提供分支預測信息。原型為:
long __builtin_expect (long exp, long c)表達式的返回值為 exp 的值,跟 c 無關。 我們預期 exp 的值是 c。例如下面的例子,我們預期 x 的值是 0,所以這裡提示編譯器,只有很小的機率會調用到 foo()
if (__builtin_expect (x, 0))
foo ();再比如判斷指針是否為空:
if (__builtin_expect (ptr != NULL, 1))
foo (*ptr);在 fbstring 中也用到了builtin_expect,例如創建 RefCounted 的函數 (FOLLY_LIKELY 包裝了一下builtin_expect):
#if __GNUC__
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) (__builtin_expect(b, t))
#else
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) b
#endif
// Likeliness annotations
//
// Useful when the author has better knowledge than the compiler of whether
// the branch condition is overwhelmingly likely to take a specific value.
//
// Useful when the author has better knowledge than the compiler of which code
// paths are designed as the fast path and which are designed as the slow path,
// and to force the compiler to optimize for the fast path, even when it is not
// overwhelmingly likely.
#define FOLLY_LIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 1)
#define FOLLY_UNLIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 0)
static RefCounted* create(const Char* data, size_t* size) {
const size_t effectiveSize = *size;
auto result = create(size);
if (FOLLY_LIKELY(effectiveSize > 0)) { // __builtin_expect
fbstring_detail::podCopy(data, data + effectiveSize, result->data_);
}
return result;
}從彙編角度來說,會將可能性更大的彙編緊跟著前面的彙編,防止無效指令的加載。可以參考:
likely() and unlikely()[2]What is the advantage of GCC's \_\_builtin_expect in if else statements?[3]CMOV 指令conditional move,條件傳送。類似於 MOV 指令,但是依賴於 RFLAGS 寄存器內的狀態。如果條件沒有滿足,該指令不會有任何效果。
CMOV 的優點是可以避免分支預測,避免分支預測錯誤對 CPU 流水線的影響。詳細可以看這篇文檔:amd-cmovcc.pdf[4]
fbstring 在一些場景會提示編譯器生成 CMOV 指令,例如:
const Char* c_str() const {
const Char* ptr = ml_.data_;
// With this syntax, GCC and Clang generate a CMOV instead of a branch.
ptr = (category() == Category::isSmall) ? small_ : ptr;
return ptr;
}
builtin_unreachable && assume(0)如果程序執行到了builtin_unreachable 和assume(0) ,那麼會出現未定義的行為。例如**builtin_unreachable 出現在一個不會返回的函數後面,而且這個函數沒有聲明為**attribute\_\_((noreturn))。例如6.59 Other Built-in Functions Provided by GCC給出的例子 :
void function_that_never_returns (void);
int g (int c)
{
if (c)
{
return 1;
}
else
{
function_that_never_returns ();
__builtin_unreachable ();
}
}如果不加__builtin_unreachable ();,會報error: control reaches end of non-void function [-Werror=return-type]
folly 將 builtin_unreachable 和assume(0) 封裝成了assume_unreachable:
[[noreturn]] FOLLY_ALWAYS_INLINE void assume_unreachable() {
assume(false);
// Do a bit more to get the compiler to understand
// that this function really will never return.
#if defined(__GNUC__)
__builtin_unreachable();
#elif defined(_MSC_VER)
__assume(0);
#else
// Well, it's better than nothing.
std::abort();
#endif
}在 fbstring 的一些特性場景,比如 switch 判斷 category 中用到。這是上面提到過的 mutableData() :
Char* mutableData() {
switch (category()) {
case Category::isSmall:
return small_;
case Category::isMedium:
return ml_.data_;
case Category::isLarge:
return mutableDataLarge();
}
folly::assume_unreachable();
}
jemalloc大致的算法和原理可以參考 facebook 的這篇博客 :Scalable memory allocation using jemalloc[7]find 算法使用的簡化的 Boyer-Moore 算法,文檔說明是在查找成功的情況下比 std::string 的 find 快 30 倍。benchmark 代碼在FBStringBenchmark.cpp[8]
我自己測試的情況貌似是搜索長字符串的情況會更好些。
判斷大小端// It's MSVC, so we just have to guess ... and allow an override
#ifdef _MSC_VER
# ifdef FOLLY_ENDIAN_BE
static constexpr auto kIsLittleEndian = false;
# else
static constexpr auto kIsLittleEndian = true;
# endif
#else
static constexpr auto kIsLittleEndian =
__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#endif__BYTE_ORDER__ 為預定義宏:[9],值是ORDER_LITTLE_ENDIAN、ORDER_BIG_ENDIAN、ORDER_PDP_ENDIAN中的一個。
一般會這麼使用:
/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__c++20 引入了std::endian[10],判斷會更加方便。
(完)
參考資料[1]benchmark 工具: https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md
[2]likely() and unlikely(): https://kernelnewbies.org/FAQ/LikelyUnlikely
[3]What is the advantage of GCC's __builtin_expect in if else statements?: https://stackoverflow.com/questions/7346929/what-is-the-advantage-of-gccs-builtin-expect-in-if-else-statements
[4]amd-cmovcc.pdf: https://www.cs.tufts.edu/comp/40-2011f/readings/amd-cmovcc.pdf
[5]官網 : http://jemalloc.net/
[6]API 文檔: http://jemalloc.net/jemalloc.3.html
[7]Scalable memory allocation using jemalloc: https://engineering.fb.com/2011/01/03/core-data/scalable-memory-allocation-using-jemalloc/
[8]FBStringBenchmark.cpp: https://github.com/facebook/folly/blob/master/folly/test/FBStringTestBenchmarks.cpp.h#L120
[9]BYTE_ORDER為預定義宏:: https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
[10]std::endian: https://en.cppreference.com/w/cpp/types/endian