Rustc Reading Club: Learning rustc_resolve by Starting from an Error

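The official Rust community recently launched an activity called the Rustc Reading Club, initiated by Niko, the leader of the compiler team. The website is here: https://rust-lang.github.io/rustc-reading-club/

Unfortunately, the first session on November 4 was so popular that it hit Zoom's 100-participant limit. The host, Niko, could not get in himself, so the session was cancelled... Let's wait and see how the official team follows up; I am still looking forward to officially organized events.

Zhang Handong (张汉东) of the Rust Chinese community quickly followed the official activity and organized a Rustc source-reading event in the community. The first session was held today (November 7). In it I followed Wu Aoxiang's (吴翱翔) approach, started from a single error, and learned part of the logic of rustc_resolve, so I decided to write a blog post to summarize it.

[Small ad] The next session, on the afternoon of November 14, will be led by Liu Yifei (刘翼飞), who will take everyone through the code related to type inference. If you are interested, download Feishu, register a personal account, and scan the QR code to join: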

[Image: QR code for the Rust Chinese community (Rust 中文社群)]

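Preparation

Back to the topic. Before reading the Rustc source code we need to do some preparation: clone the Rust repository and set up an IDE (that said, the stable release of CLion still does not support remote development, and the EAP has all kinds of bugs...). For details, see the official guide (https://rustc-dev-guide.rust-lang.org/getting-started.html). Working through this chapter is enough: https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html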

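Starting from an error

The main subject of this reading session is rustc_resolve, which, as the name suggests, handles name resolution. More detail is available here: https://rustc-dev-guide.rust-lang.org/name-resolution.html

Opening lib.rs in rustc_resolve, we find that this file alone is close to 4000 lines, so reading it straight through is not realistic. Wu Aoxiang suggested an approach: start from one of the most common errors, "the name xx is defined multiple times", and follow that thread through the relevant code.

This is a good technique: when you do not know where to start, construct a scenario, enter at a single point, and work outward from that point until you have covered the code.

Enough talk. Let's first search rustc_resolve for the place where this error appears:

[Screenshot: searching for this error]

As luck would have it, it is right in lib.rs of rustc_resolve, so we jump there and find that it is indeed the error we are looking for: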

let msg = format!("the name `{}` is defined multiple times", name);

let mut err = match (old_binding.is_extern_crate(), new_binding.is_extern_crate()) {
    (true, true) => struct_span_err!(self.session, span, E0259, "{}", msg),
    (true, _) | (_, true) => match new_binding.is_import() && old_binding.is_import() {
        true => struct_span_err!(self.session, span, E0254, "{}", msg),
        false => struct_span_err!(self.session, span, E0260, "{}", msg),
    },
    _ => match (old_binding.is_import(), new_binding.is_import()) {
        (false, false) => struct_span_err!(self.session, span, E0428, "{}", msg),
        (true, true) => struct_span_err!(self.session, span, E0252, "{}", msg),
        _ => struct_span_err!(self.session, span, E0255, "{}", msg),
    },
};

The enclosing function happens to be named report_conflict. Perfect!
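To make the decision table in that match easier to see, here is a minimal standalone sketch. BindingFlags and conflict_error_code are hypothetical names invented for illustration and are not part of rustc; only the error codes and the shape of the match come from the snippet above.

```rust
/// Hypothetical stand-in for the two predicates that `report_conflict`
/// queries on each binding (not a rustc type).
#[derive(Clone, Copy)]
struct BindingFlags {
    is_extern_crate: bool,
    is_import: bool,
}

/// Mirrors the nested `match` above and returns the error code it selects.
fn conflict_error_code(old: BindingFlags, new: BindingFlags) -> &'static str {
    match (old.is_extern_crate, new.is_extern_crate) {
        (true, true) => "E0259",
        (true, _) | (_, true) => {
            if new.is_import && old.is_import {
                "E0254"
            } else {
                "E0260"
            }
        }
        _ => match (old.is_import, new.is_import) {
            (false, false) => "E0428",
            (true, true) => "E0252",
            _ => "E0255",
        },
    }
}
```

Read this way, two plain items with the same name (neither an import nor an extern crate) end up in the E0428 arm, which is the case we follow for the rest of this post.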

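Let's see where this function is called:

[Screenshot: report_conflict]

Apart from its definition, the function is called twice. One call is a recursive call inside the function itself, which we ignore; the other is in build_reduced_graph.rs, so let's follow it:

[Screenshot: build_reduced_graph.rs]

Here it is called from the define method, which matches expectations, so we seem to be in the right place.

This code first converts the def passed in into a NameBinding via the to_name_binding method. Let's see what that involves:

[Screenshot: NameBinding]

NameBinding is a struct that records the definition of a value, type, or module. We boldly guess that kind is the kind of the binding. ambiguity is not clear yet, so we set it aside, and likewise expansion (if you have read the rustc-dev-guide you roughly know it relates to hygienic macro expansion; we ignore it for now). span was also unfamiliar at first; clicking through, the Span type carries some machinery related to incremental compilation, but it is essentially a source location used for diagnostics, so we set it aside too. Finally, vis presumably represents visibility.

Next we click into ResolverArenas to see what it does: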

/// Nothing really interesting here; it just provides memory for the rest of the crate.
#[derive(Default)]
pub struct ResolverArenas<'a> {
    ...
}

Right, nothing worth attention here; it only provides memory, so we ignore it.

Let's go back to the define method above:

impl<'a> Resolver<'a> {
    /// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
    /// otherwise, reports an error.
    crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
    where
        T: ToNameBinding<'a>,
    {
        let binding = def.to_name_binding(self.arenas);
        let key = self.new_key(ident, ns);
        if let Err(old_binding) = self.try_define(parent, key, binding) {
            self.report_conflict(parent, ident, ns, old_binding, &binding);
        }
    }
    ...
}

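The second statement, let key = self.new_key(ident, ns);, is nothing special either: it builds a key from the ident (the identifier) and the namespace ns it lives in. The value stored under that key should then be the binding created above.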
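Because the key combines the identifier with the namespace, the same identifier can be defined once per namespace without any conflict. A small sketch in ordinary Rust (the names Meters and value are made up for illustration): a struct with named fields lives only in the type namespace, so a function with the same name in the value namespace does not collide with it.

```rust
// Lives in the type namespace only: a struct with named fields
// does not introduce a constructor function.
struct Meters {
    value: u32,
}

// Lives in the value namespace, so it does not collide with the struct.
#[allow(non_snake_case)]
fn Meters(value: u32) -> Meters {
    // `Meters { .. }` is a struct expression, looked up in the type namespace.
    Meters { value }
}
```

Calling Meters(7) picks the function, while the return type and the struct expression pick the struct. Two structs named Meters, on the other hand, would share one key and hit the conflict path.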

Then try_define is called, and if it returns Err, report_conflict is called. Let's step into try_define (no need to read it closely yet):

// Define the name or return the existing binding if there is a collision.
crate fn try_define(
    &mut self,
    module: Module<'a>,
    key: BindingKey,
    binding: &'a NameBinding<'a>,
) -> Result<(), &'a NameBinding<'a>> {
    let res = binding.res();
    self.check_reserved_macro_name(key.ident, res);
    self.set_binding_parent_module(binding, module);
    self.update_resolution(module, key, |this, resolution| {
        if let Some(old_binding) = resolution.binding {
            if res == Res::Err {
                // Do not override real bindings with `Res::Err`s from error recovery.
                return Ok(());
            }
            match (old_binding.is_glob_import(), binding.is_glob_import()) {
                (true, true) => {
                    if res != old_binding.res() {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsGlob,
                            old_binding,
                            binding,
                        ));
                    } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                        // We are glob-importing the same item but with greater visibility.
                        resolution.binding = Some(binding);
                    }
                }
                (old_glob @ true, false) | (old_glob @ false, true) => {
                    let (glob_binding, nonglob_binding) =
                        if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                    if glob_binding.res() != nonglob_binding.res()
                        && key.ns == MacroNS
                        && nonglob_binding.expansion != LocalExpnId::ROOT
                    {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsExpanded,
                            nonglob_binding,
                            glob_binding,
                        ));
                    } else {
                        resolution.binding = Some(nonglob_binding);
                    }
                    resolution.shadowed_glob = Some(glob_binding);
                }
                (false, false) => {
                    return Err(old_binding);
                }
            }
        } else {
            resolution.binding = Some(binding);
        }

        Ok(())
    })
}

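It looks long, so let's take it piece by piece.

The first statement, let res = binding.res();, is already puzzling: what is res? result? response? Neither. Clicking through all the way down, we find that it is short for resolution: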

/// The resolution of a path or export.
///
/// For every path or identifier in Rust, the compiler must determine
/// what the path refers to. This process is called name resolution,
/// and `Res` is the primary result of name resolution.
///
/// For example, everything prefixed with `/* Res */` in this example has
/// an associated `Res`:
///
/// ```
/// fn str_to_string(s: & /* Res */ str) -> /* Res */ String {
///     /* Res */ String::from(/* Res */ s)
/// }
///
/// /* Res */ str_to_string("hello");
/// ```
///
/// The associated `Res`s will be:
///
/// - `str` will resolve to [`Res::PrimTy`];
/// - `String` will resolve to [`Res::Def`], and the `Res` will include the [`DefId`]
///   for `String` as defined in the standard library;
/// - `String::from` will also resolve to [`Res::Def`], with the [`DefId`]
///   pointing to `String::from`;
/// - `s` will resolve to [`Res::Local`];
/// - the call to `str_to_string` will resolve to [`Res::Def`], with the [`DefId`]
///   pointing to the definition of `str_to_string` in the current crate.
//
#[derive(Clone, Copy, PartialEq, Eq, Encodable, Decodable, Hash, Debug)]
#[derive(HashStable_Generic)]
pub enum Res<Id = hir::HirId> {
    ...
}

So this statement fetches the resolution of the binding we just created. Moving on:

self.check_reserved_macro_name(key.ident, res);
self.set_binding_parent_module(binding, module);

First, check_reserved_macro_name on the first line:

crate fn check_reserved_macro_name(&mut self, ident: Ident, res: Res) {
    // Reserve some names that are not quite covered by the general check
    // performed on `Resolver::builtin_attrs`.
    if ident.name == sym::cfg || ident.name == sym::cfg_attr {
        let macro_kind = self.get_macro(res).map(|ext| ext.macro_kind());
        if macro_kind.is_some() && sub_namespace_match(macro_kind, Some(MacroKind::Attr)) {
            self.session.span_err(
                ident.span,
                &format!("name `{}` is reserved in attribute namespace", ident),
            );
        }
    }
}

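Nothing special here either: it checks whether an attribute macro is being defined under one of the reserved names cfg or cfg_attr. We skip it for now.

Now set_binding_parent_module on the second line: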

fn set_binding_parent_module(&mut self, binding: &'a NameBinding<'a>, module: Module<'a>) {
    if let Some(old_module) = self.binding_parent_modules.insert(PtrKey(binding), module) {
        if !ptr::eq(module, old_module) {
            span_bug!(binding.span, "parent module is reset for binding");
        }
    }
}

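Hmm... this records the module that the binding belongs to, and treats re-registering the same binding under a different module as a compiler bug. Nothing special either, so we skip it.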
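The pattern in set_binding_parent_module, a map keyed by the address of the binding plus a pointer-equality check on the previous value, can be sketched outside the compiler. Everything below (Binding, Module, ParentModules, set_parent) is a hypothetical miniature, not rustc's PtrKey or its real types, and it returns false where rustc would call span_bug!.

```rust
use std::collections::HashMap;
use std::ptr;

/// Hypothetical stand-ins for rustc's arena-allocated binding and module.
struct Binding {
    name: &'static str,
}
struct Module {
    name: &'static str,
}

/// Records the parent module of each binding, keyed by the binding's address.
struct ParentModules<'a> {
    map: HashMap<*const Binding, &'a Module>,
}

impl<'a> ParentModules<'a> {
    fn new() -> Self {
        ParentModules { map: HashMap::new() }
    }

    /// Returns `false` if the binding was already registered under a different module.
    fn set_parent(&mut self, binding: &'a Binding, module: &'a Module) -> bool {
        match self.map.insert(binding as *const Binding, module) {
            // `insert` hands back the previous value; it must be the very same module.
            Some(old_module) => ptr::eq(old_module, module),
            None => true,
        }
    }
}
```

Keying by address means two bindings that look identical but live at different addresses are tracked separately, which is presumably why the compiler wraps the reference in PtrKey instead of hashing the binding's contents.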

Continuing down, we reach the main part. Let's step into update_resolution first:

[Screenshot: update_resolution]

Here we focus only on:

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

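These two lines should be the main logic.

First, self.resolution is called. Let's step in:

[Screenshot: resolution]

This in turn calls resolutions:

[Screenshot: resolutions]

Here we find another piece of logic. Let's look at the comment on the field:

[Screenshot: module populate]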

It turns out that a module's resolutions are computed lazily. build_reduced_graph_external is presumably the part that computes them; we skip it for now, treat it as a black box, and explore it later.
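The lazy computation can be pictured with a flag that is checked and cleared on first access. This is only a sketch under my own assumptions: LazyModule, populate_on_access, and populate are illustrative names, and populate stands in for whatever build_reduced_graph_external really does.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical miniature of a module whose name table is filled on first access.
struct LazyModule {
    populate_on_access: Cell<bool>,
    resolutions: RefCell<HashMap<String, u32>>,
}

impl LazyModule {
    fn new_external() -> Self {
        LazyModule {
            populate_on_access: Cell::new(true),
            resolutions: RefCell::new(HashMap::new()),
        }
    }

    /// Stand-in for the expensive step of loading the module's items.
    fn populate(&self) {
        self.resolutions.borrow_mut().insert("Vec".to_string(), 1);
    }

    /// Fills the table the first time it is requested, then just returns it.
    fn resolutions(&self) -> &RefCell<HashMap<String, u32>> {
        if self.populate_on_access.get() {
            self.populate_on_access.set(false);
            self.populate();
        }
        &self.resolutions
    }
}
```

The flag is cleared before populating, so a nested access during population would not trigger the work a second time.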

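Now back to the earlier code:

[Screenshot: resolution]

In the resolution method we obtain all resolutions of the current module, check whether key exists, create a new entry if it does not, and return that resolution.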
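This get-or-create step is the classic map entry pattern. A minimal sketch with the standard library, assuming a plain HashMap instead of rustc's arena-backed table; Ns, Resolution, Table, and this resolution function are simplified stand-ins, not the compiler's definitions.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Simplified namespace tag (rustc has more namespaces than this).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Ns {
    Type,
    Value,
}

/// Simplified resolution slot: `None` means nothing is bound yet.
#[derive(Default)]
struct Resolution {
    binding: Option<&'static str>,
}

type Table = HashMap<(&'static str, Ns), RefCell<Resolution>>;

/// Returns the slot for `(name, ns)`, creating an empty one if it is missing.
fn resolution<'t>(table: &'t mut Table, name: &'static str, ns: Ns) -> &'t RefCell<Resolution> {
    table.entry((name, ns)).or_default()
}
```

A freshly created slot has binding set to None, which is exactly the state the else branch of try_define relies on.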

Back to the caller:

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

After obtaining resolution, the closure f that was passed in is called. Back in try_define, let's look at the else branch first:

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
    } else {
        resolution.binding = Some(binding);
    }

    Ok(())
})

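If the binding of the returned resolution is None (the case where the resolution method just created a new resolution because none existed), the resolution's binding is set to the current binding and Ok is returned. The logic is fairly simple.

Now let's see how rustc handles the case where a binding already exists: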

let res = binding.res();

...

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        if res == Res::Err {
            // Do not override real bindings with `Res::Err`s from error recovery.
            return Ok(());
        }
        ...

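If the res obtained earlier is itself Err, we return immediately, so that an existing real binding is not overwritten. Let's look at the comment on Err:

[Screenshot: Res::Err]

We can ignore this part. Moving on: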

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            (true, true) => {
                if res != old_binding.res() {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsGlob,
                        old_binding,
                        binding,
                    ));
                } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                    // We are glob-importing the same item but with greater visibility.
                    resolution.binding = Some(binding);
                }
            }
            ...

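If both the new and the old binding are glob imports, we check whether the current res and the previous res are the same. If they differ, there is an ambiguity, and the resolution's binding is set to an ambiguity binding. If the two res are the same, we compare visibility: if the new binding is more visible, it replaces the old one.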
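From the user's side, this is why two glob imports that bring in the same name do not fail by themselves; the ambiguity only becomes an error when the bare name is actually used. A small sketch in ordinary Rust (the module and function names are made up for illustration):

```rust
mod metric {
    pub fn unit() -> &'static str {
        "metre"
    }
    pub fn metric_only() -> u32 {
        100
    }
}

mod imperial {
    pub fn unit() -> &'static str {
        "foot"
    }
    pub fn imperial_only() -> u32 {
        12
    }
}

// Both globs bring in a `unit`, and the two refer to different items.
use imperial::*;
use metric::*;

fn describe() -> String {
    // A bare `unit()` here would be rejected as ambiguous,
    // so the two conflicting names are spelled with their full paths.
    format!(
        "{} x{}, {} x{}",
        metric::unit(),
        metric_only(),
        imperial::unit(),
        imperial_only()
    )
}
```

Names that only one of the globs provides (metric_only, imperial_only) stay usable without a path.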

You may be wondering what a glob import is. A short interlude:

fn import_kind_to_string(import_kind: &ImportKind<'_>) -> String {
    match import_kind {
        ImportKind::Single { source, .. } => source.to_string(),
        ImportKind::Glob { .. } => "*".to_string(),
        ImportKind::ExternCrate { .. } => "<extern crate>".to_string(),
        ImportKind::MacroUse => "#[macro_use]".to_string(),
    }
}

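With that it should be clear: a glob import is the use path::* form. I won't explain further.

Back to the main thread: this part deals with use, so we can skim it and continue: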

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (old_glob @ true, false) | (old_glob @ false, true) => {
                let (glob_binding, nonglob_binding) =
                    if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                if glob_binding.res() != nonglob_binding.res()
                    && key.ns == MacroNS
                    && nonglob_binding.expansion != LocalExpnId::ROOT
                {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsExpanded,
                        nonglob_binding,
                        glob_binding,
                    ));
                } else {
                    resolution.binding = Some(nonglob_binding);
                }
                resolution.shadowed_glob = Some(glob_binding);
            }
            ...

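This part handles the case of one glob import and one non-glob binding. The rule is simple: the non-glob binding takes priority. There is one exception: if the name is in the macro namespace, the two bindings resolve to different things, and the non-glob binding comes from a macro expansion, the result is ambiguous (Rust macros are hygienic), and, as above, the binding is set to an ambiguity binding.

This logic involves knowledge of macros, so we treat it as a black box and skip it. It is enough to know that non-glob bindings take priority and shadow glob imports, which matches our coding experience and ergonomics.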
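The "non-glob shadows glob" rule is easy to observe in everyday code. A small sketch (the module and function names are made up for illustration):

```rust
mod defaults {
    pub fn greeting() -> &'static str {
        "hello from the glob"
    }
    pub fn farewell() -> &'static str {
        "bye from the glob"
    }
}

use defaults::*;

// A non-glob definition with the same name takes priority over the glob import.
fn greeting() -> &'static str {
    "hello from the local item"
}
```

Here greeting() resolves to the local function, while farewell() still comes from the glob import; no error or ambiguity is reported for either.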

Finally, the simplest part:

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (false, false) => {
                return Err(old_binding);
            }
            ...

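If neither name was introduced by a glob import, then two identical names appear in the same namespace (note that what is being resolved here is not local variable names, so duplicates are not allowed). That is an error, which is returned to the caller, our define method, which then reports it: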

/// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
/// otherwise, reports an error.
crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
where
    T: ToNameBinding<'a>,
{
    let binding = def.to_name_binding(self.arenas);
    let key = self.new_key(ident, ns);
    if let Err(old_binding) = self.try_define(parent, key, binding) {
        self.report_conflict(parent, ident, ns, old_binding, &binding);
    }
}
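To tie the walkthrough together, here is a toy model of the define / try_define flow. It is a sketch under heavy simplifications and not rustc's implementation: ToyResolver, Binding, and Ns are invented for illustration, a u32 stands in for Res, and visibility, macros, Res::Err, and shadowed_glob are all left out.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Ns {
    Type,
    Value,
}

/// Toy binding: `res` identifies what the name refers to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Binding {
    res: u32,
    is_glob: bool,
    ambiguous: bool,
}

#[derive(Default)]
struct ToyResolver {
    table: HashMap<(&'static str, Ns), Binding>,
    errors: Vec<String>,
}

impl ToyResolver {
    /// Defines the name, or returns the existing binding on a real collision.
    fn try_define(&mut self, name: &'static str, ns: Ns, new: Binding) -> Result<(), Binding> {
        match self.table.entry((name, ns)) {
            // Nothing bound yet: just store the new binding.
            Entry::Vacant(slot) => {
                slot.insert(new);
            }
            Entry::Occupied(mut slot) => {
                let old = *slot.get();
                match (old.is_glob, new.is_glob) {
                    // Two globs naming different things: remember the ambiguity.
                    (true, true) => {
                        if old.res != new.res {
                            slot.insert(Binding { ambiguous: true, ..old });
                        }
                    }
                    // The non-glob binding wins over the glob one.
                    (true, false) => {
                        slot.insert(new);
                    }
                    (false, true) => {}
                    // Two non-glob bindings: a genuine duplicate definition.
                    (false, false) => return Err(old),
                }
            }
        }
        Ok(())
    }

    /// Defines the name and records an error message on conflict.
    fn define(&mut self, name: &'static str, ns: Ns, new: Binding) {
        if self.try_define(name, ns, new).is_err() {
            self.errors
                .push(format!("the name `{}` is defined multiple times", name));
        }
    }
}
```

Defining Foo once in Ns::Type and once in Ns::Value is fine in this model; defining it a second time in Ns::Type as a non-glob binding is the only path that produces the error message.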

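Summary

That's it: we have now read the logic behind "the name xx is defined multiple times" that we set out to understand.

Some questions remain open for further exploration:

  1. What happens after a binding is marked as an ambiguity?
  2. How are a module's resolutions computed? In other words, what does the build_reduced_graph_external that we skipped do?
  3. Why do conflicts caused by macro expansion need special treatment?

You can keep exploring along these questions. Comments are welcome, or join the Rust Chinese community to discuss and learn Rust together~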