七叶笔记 » golang编程 » Go 新版泛型使用：80余行代码构建一个哈希表

Go 新版泛型使用：80余行代码构建一个哈希表

分类: golang编程 | 浏览: 277

2018 年，我使用 Go 语言实现了一个玩具性质的哈希表 (1)，以便学习 Go 的 map 等数据类型如何工作。这个版本只支持字符串作为 key 以及 value。

两年后的 2020 年 6 月，Go 团队发布了一篇题为《泛型的下一步 (1) 》的文章，提供了一个新版的泛型草案设计，它基于扩展 Go 现有的接口，而不是添加 contract 等新概念来实现。如果你还没看过，我强烈建议你至少浏览一下新的设计草案文档 (2)。我不是专家，只能以我有限的经验和时间来谈论这个设计。

这篇文章将分享如何将我的玩具 hashtable 移植到新的泛型设计。如果你想跳过介绍并直接查看泛型代码，请随时跳到第二部分 ” 泛型哈希表 ” 章节。

非泛型哈希表

我 2018 年开始的初步设计版本，只支持字符串键和值。

Table 类型是这个包的基础。它内部使用 slice 存储键/值字符串对，其中 slice 内的 hashtable buckets 数量由一个整数 m 决定。

一个较小的 m 意味着较少的桶将被创建，但是每个存储在 Table 中的键有较高的可能性与其他键共享一个桶，从而减慢了查找速度
更大的 m 意味着将创建更多的桶，因此存储在 Table 中的每个键与其他键共享一个桶的可能性较低，从而加快了查找速度。

kv 类型是一个小帮手，用于简洁地存储键/值字符串对。

 // Package hashtable implements a basic hashtable for string key/value pairs.  package hashtable  
 // A Table is a basic hashtable.  type Table struct {   m int   table kv  }  
 // A kv stores key/value data in a Table.  type kv struct {   Key, Value string  }  
 // New creates a Table with m internal buckets.  func New(m int) *Table {   return &Table{   m: m,   table: make([][]kv, m),   }  }

这个哈希表支持两种操作：

Get: 确定一个键是否存在于哈希表中，返回对应的值（如果找到），以及一个布尔值，表示对象是否存在。
Insert：在哈希表中插入一个新的键/值对，覆盖同一键的任何先前的值。

这两个操作都需要一个哈希函数，它可以接受一个输入字符串，并返回一个整数，表示键值可能存在的桶。

 // hash picks a hashtable index to use to store a string with key s.  func (t *Table) hash(s string) int {   h := fnv.New32   h.Write(byte(s))   return int(h.Sum32) % t.m  }

我选择了 hash/fnv32 作为一个简单的、非加密的哈希函数，它可以返回一个整数。然后通过计算模数运算 hash % t.m，我们可以确保得到的整数返回我们的一个哈希表桶的索引。

下面是 Get 操作对应的代码。

 // Get determines if key is present in the hashtable, returning its value and  // whether or not the key was found.  func (t *Table) Get(key string) (string, bool) {   // Hash key to determine which  bucket  this key's value belongs in.   i := t.hash(key)  
  for j, kv := range t.table[i] {   if key == kv.Key {   // Found a match, return it!   return t.table[i][j].Value, true   }   }  
  // No match.   return "", false  }

Table.Get 对输入的 key 进行哈希处理，以确定存储键的值位于哪个桶。一旦确定了 bucket，它就会遍历该 bucket 中的所有 key/value 对，如果 key 与该 bucket 中的某个 key 匹配，则返回该 bucket 的值和 boolean true。

如果输入的键与该桶中的键匹配，返回该桶的值和布尔值 true。
如果没有匹配，返回一个空字符串和布尔值 false。

接下来，我们来看看 Insert。

 // Insert inserts a new key/value pair into the Table.  func (t *Table) Insert(key, value string) {   i := t.hash(key)  
  for j, kv := range t.table[i] {   if key == kv.Key {   // Overwrite previous value for the same key.   t.table[i][j].Value = value   return   }   }  
  // Add a new value to the table.   t.table[i] = append(t.table[i], kv{   Key: key,   Value: value,   })  }

Table.Insert 也需要对输入 key 进行哈希处理，以确定应该使用哪个桶来插入键/值对。当迭代一个桶中的键/值对时，我们可能会发现一个匹配的 key 已经存在。

如果输入的 key 与该 bucket 中的 key 匹配，则用新的值覆盖 key 的值。
如果没有匹配，则在 bucket 的 key/value pair slice 中添加一个新条目。

搞定了！我们已经创建了一个非常基本的哈希表，可以用来处理键/值字符串对。

 // 8 buckets ought to be plenty.  t := hashtable.New(8)  t.Insert("foo", "bar")  t.Insert("baz", "qux")  
 v, ok := t.Get("foo")  fmt.Printf("t.Get(%q) = (%q, %t)", "foo", v, ok)  // t.Get("foo") = ("bar", true)

让我们把现有的这段代码移植到新的 Go 泛型设计中去。

泛型哈希表

我们的目标是利用现有的 hashtable 代码，使其能支持任意键/值对类型。但我们有一个约束：我们的 hashable 中的 key 必须与预先声明的类型约束 comparable 相匹配，以便我们可以做等值比较。

编辑：原本这段代码使用了 type K, V comparable，但这是不是必须的。感谢 Brad Fitzpatrick 和 @nemetroid 指出，type K comparable, V interface{} 就足够了。

 // Package hashtable implements a basic hashtable for generic key/value pairs.  package hashtable  
 // A Table is a basic generic hashtable.  type Table(type K comparable, V interface{})  struct  {   // hash is a function which can hash a key of type K with t.m.   hash func(key K, m int) int  
  m int   table kv  }  
 // A kv stores generic key/value data in a Table.  type kv(type K comparable, V interface{}) struct {   Key K   Value V  }  
 // New creates a table with m internal buckets which uses the specified hash  // function for an input type K.  func New(type K comparable, V interface{})(m int, hash func(K, int) int) *Table(K, V) {   return &Table(K, V){   hash: hash,   m: m,   // Note the parentheses around "kv(K, V)"; these are required!   table: make((kv(K, V)), m),   }  }

凡是需要泛型的地方，都需要新的类型参数列表，因此这些顶层类型和函数都必须有 K 和 V 的类型参数列表，用于 K 的类型必须是 comparable，任何类型都可以用于 V，如 interface{} 所示。

在写这段代码的时候，学会了下面一些神奇的操作。

注意，哈希函数 func(K, int) int 现在是传递给 New 的第二个参数。这是必要的，因为我们必须知道如何对任何给定的泛型进行哈希。我本可以用 Hash int 约束或类似的方式创建一个新的接口，但我希望我的哈希表能与内置的 Go 类型（如 string 和 int）一起工作，而你没法在这些类型上定义方法。
我花了一点时间来弄清楚创建 Table.table 时 make 调用的正确括号用法。我最初的尝试使用了 make(kv(K, V))，这对增加的类型参数是行不通的。

是时候实现 Get:

 // Get determines if key is present in the hashtable, returning its value and  // whether or not the key was found.  func (t *Table(K, V)) Get(key K) (V, bool) {   // Hash key to determine which bucket this key's value belongs in.   // Pass t.m so t.hash can perform the necessary operation "hash % t.m".   i := t.hash(key, t.m)  
  for j, kv := range t.table[i] {   if key == kv.Key {   // Found a match, return it!   return t.table[i][j].Value, true   }   }  
  // No match. The easiest way to return the zero-value for a generic type   // is to declare a temporary variable as follows.   var zero V   return zero, false  }

一个定义在泛型上的方法，必须在其接受者中声明相关的通用类型参数。现在，Get 可以接受任何类型的 K，并返回任何类型的 V，同时用 bool 表示是否找到了值。

除了修改后的方法接收器和一些 K 和 V 类型之外，这看起来和典型的 Go 代码差不多。

这里有一个稍显棘手的问题是如何处理一个泛型的零值 (1)。链接的问题建议像我们在这里所做的那样，通过声明 var zero V，但也许在未来可以有一个更简单的选项来做这件事。我个人很希望看到返回 _、false 或类似的选项，作为泛型和非泛型 Go 的选项。

#the-zero-value

我们继续说说 Insert。

 // Insert inserts a new key/value pair into the Table.  func (t *Table(K, V)) Insert(key K, value V) {   i := t.hash(key, t.m)  
  for j, kv := range t.table[i] {   if key == kv.Key {   // Overwrite previous value for the same key.   t.table[i][j].Value = value   return   }   }  
  // Add a new value to the table.   t.table[i] = append(t.table[i], kv(K, V){   Key: key,   Value: value,   })  }

为了使这段代码成为泛型代码，只需要做很少的修改。

方法接收器现在是 *Table(K, V)，而不是 *Table。
输入参数现在是 (key K, value V) 而不是 (key, value string)
kv{} struct 现在必须声明为 kv(K, V){}。

这就是所有需要做的，我们现在有了一个泛型的哈希表类型，它可以接受任何实现 comparable 类型约束的键和值。

泛型哈希表的用法

为了测试这段代码，我决定创建两个并行的哈希表，作为字符串和整数类型之间的索引和反向索引。

 t1 := hashtable.New(string, int)(8, func(key string, m int) int {   h := fnv.New32   h.Write(byte(key))   return int(h.Sum32) % m  })  
 t2 := hashtable.New(int, string)(8, func(key int, m int) int {   // Good enough!   return key % m  })

在调用泛型构造函数 New 时，我们指定泛型 K 和 V 的类型参数，例如 t1 是 Table(string, int)，意思是 K=string，V=int；t2 则相反：Table(int, string)，因为 int 和 string 都符合 comparable 类型约束，所以完全没问题。

为了对泛型进行散列，我们必须提供一个可以对 K 和 t.m 进行操作的散列函数，以产生一个 int 输出。对于 t1，我们重新使用原始例子中的 hash/fnv 哈希。至于 t2，对于我们的演示来说，一个模数运算似乎就足够了。

我明白，在大多数情况下，Go 编译器应该能够在 hashtable.New 这样的调用站点推断出 K 和 V 等泛型的正确类型，但我可能会继续用显式的方式写它们一段时间，以习惯这种设计。

现在我们已经创建了索引和反向索引的哈希表，让我们来填充它们。

 strs := string{"foo", "bar", "baz"}  for i, s := range strs {   t1.Insert(s, i)   t2.Insert(i, s)  }

t1 中的每一个键/值对都会被镜像为 t2 中的值/键。最后，我们可以迭代已知的字符串和索引（以及一个永远不会被发现的附加值），以显示我们的泛型代码的作用。

 for i, s := range append(strs, "nope!") {   v1, ok1 := t1.Get(s)   log.Printf("t1.Get(%v) = (%v, %v)", s, v1, ok1)  
  v2, ok2 := t2.Get(i)   log.Printf("t2.Get(%v) = (%v, %v)\n\n", i, v2, ok2)  }

我们的演示程序的输出如下。

 t1.Get(foo) = (0, true)  t2.Get(0) = (foo, true)  
 t1.Get(bar) = (1, true)  t2.Get(1) = (bar, true)  
 t1.Get(baz) = (2, true)  t2.Get(2) = (baz, true)  
 t1.Get(nope!) = (0, false)  t2.Get(3) = (, false)

写完收工，我们已经在 Go 语言中实现了一个泛型哈希表。

为了更好地理解新的泛型设计草案，我还有不少实验要做。如果你喜欢这篇文章并想更多了解泛型，请查看 Go 官方说明 (1) 和新的泛型设计草案文档。

如果你有问题或评论，请随时在 Twitter 或 Gophers Slack 上通过 @mdlayher 联系我。我很可能在不久的将来也会在 Twitch 上直播一些 Go 泛型内容。

奖励：一个泛型哈希函数

在实现我的泛型 hashtable 时，我在 Gophers Slack 上与 #performance 的一些网友讨论了如何才能访问内置 Go map 使用的运行时的泛型哈希功能。

Gophers Slack 的 @zeebo 提出了这个有趣的、彪悍的、出色的解决方案。

 func hash(type A comparable)(a A) uintptr {   var m interface{} = make(map[A]struct{})   hf := (*mh)(*(*unsafe.Pointer)(unsafe.Pointer(&m))).hf   return hf(unsafe.Pointer(&a), 0)  }  
 func main {   fmt.Println(hash(0))   fmt.Println(hash(false))   fmt.Println(hash("why hello there"))  }  
 ///////////////////////////  /// stolen from runtime ///  ///////////////////////////  
 // mh is an inlined combination of runtime._type and runtime.maptype.  type mh struct {   _ uintptr   _ uintptr   _ uint32   _ uint8   _ uint8   _ uint8   _ uint8   _ func(unsafe.Pointer, unsafe.Pointer) bool   _ *byte   _ int32   _ int32   _ unsafe.Pointer   _ unsafe.Pointer   _ unsafe.Pointer   hf func(unsafe.Pointer, uintptr) uintptr  }

这段代码滥用了这样一个功能，即 Go 接口实际上是一个运行时类型数据的元组和一个类型的指针。通过访问该指针，并使用 unsafe 将其转换为运行时的 map 表示（它有一个散列函数字段），我们可以创建一个泛型的散列函数，用于我们自己的代码中。

很酷，对吧？

80余行代码的例子见：（可在线运行）

本文英文链接：

哔哩哔哩「会员购」在流量回放上的探索
新项目用 Rust 还是 Go ？
Go语言如何实现stop the world？
为什么我放弃使用 Kotlin 中的协程？
基准测试表明, Async Python 远不如同步方式
谈谈PHP8新特性Attributes

本文由高可用架构翻译，技术原创及架构实践文章，欢迎通过公众号菜单「联系我们」进行投稿。

高可用架构

改变互联网的构建方式

Go 泛型 80

七叶笔记

Go 新版泛型使用：80余行代码构建一个哈希表

相关文章

更多编程视频和电子书关注公众号

最近发表

标签列表