Out of the three headline C++20 features (modules, coroutines and the third one), modules are, in my opinion, by far the most important for daily use. Modules aim to replace the legacy header system, inherited from C and based on primitive textual inclusion, with a more scalable, hermetic and fine-grained system.
There has been slow but steady progress on implementing modules in various compilers and build systems. I recently read a blog post “import CMake; C++20 Modules” and, among other things, learned that Clang 16 supports modules out of the box. So I decided to give it a try and build {fmt} as a module. This post is a summary of initial efforts.
Thanks to Daniela Engert, who has done heroic work modularizing {fmt} (reported in her talk “A (short) Tour of C++ Modules”), there were very few issues when compiling it as a module with clang.
The main class of issues can be illustrated with the following example (godbolt):
// test.cc
module;
export module test;
export {
template <typename T>
class Classy {
public:
void funky();
};
template <typename T>
void Classy<T>::funky() {
}
// ...
}
When compiled with clang, it gives the following error:
$ clang++-16 -std=c++20 --precompile -x c++-module test.cc
test.cc:14:17: error: cannot export 'funky' as it is not at namespace scope
void Classy<T>::funky() {
     ~~~~~~~~~~~^
The error can be fixed by moving the definition of the member function either to the class body or outside of the export block, e.g.:
export {
template <typename T>
class Classy {
public:
void funky() {}
};
// ...
}
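The other option, keeping the definition out of line but placing it after the export block, might look like the sketch below. The `DEMO_MODULE`, `DEMO_BEGIN_EXPORT` and `DEMO_END_EXPORT` names are hypothetical and not taken from {fmt} or from this post; they are only there so the same code can be compiled as an ordinary header-style translation unit. `funky` returns a value here purely to make the behavior observable.

```cpp
// Hypothetical macros (an assumption for this sketch): when DEMO_MODULE is
// defined, i.e. inside a module interface unit after `export module test;`,
// they open and close an export block. Otherwise they expand to nothing.
#ifdef DEMO_MODULE
#  define DEMO_BEGIN_EXPORT export {
#  define DEMO_END_EXPORT }
#else
#  define DEMO_BEGIN_EXPORT
#  define DEMO_END_EXPORT
#endif

DEMO_BEGIN_EXPORT
template <typename T>
class Classy {
 public:
  int funky();  // declaration only; no definition inside the export block
};
DEMO_END_EXPORT

// Out-of-line member definition placed after the export block is closed,
// so it is not itself the subject of an export declaration.
template <typename T>
int Classy<T>::funky() {
  return 42;
}

// Illustration-only helper: instantiates Classy<int> and calls the member.
inline bool demo() {
  Classy<int> c;
  return c.funky() == 42;
}
```

Since the class template itself is exported, importers should still be able to call `funky()`; only the position of the definition relative to the export block changes.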
This and other issues, which are too obscure to discuss here, are now solved, and {fmt} can be compiled out of the box with clang:
git clone https://github.com/fmtlib/fmt.git
cd fmt
clang++-16 -std=c++20 --precompile -x c++-module src/fmt.cc -I include
clang++-16 -std=c++20 -c fmt.pcm
Right now it is a manual process but the plan is to add CMake support later.
The above commands produce two files: fmt.pcm, which is a precompiled module interface, and fmt.o, which is a regular object file.
Once the module is built, it can be consumed as follows:
// example.cc
import fmt;
int main() {
  fmt::print("Hello, modules!\n");
}
clang++-16 -std=c++20 -fprebuilt-module-path=. fmt.o example.cc -o example
./example
As expected, this prints
Hello, modules!
Unfortunately, it doesn’t give a measurable build speedup compared to using the lightweight core API. Looking at the compilation time trace, we can see that much of the time is spent generating and optimizing code.
But there is almost no code in example.cc! It looks like clang is ignoring the extern template and recompiles templates instead of using explicit instantiations from fmt.o.
To confirm this I put together a simple repro (godbolt). It consists of two files: foo.cxx, which defines a module with a function template and its explicit instantiation, and main.cxx, which calls this instantiation.
foo.cxx:
module;
#include <iostream>
export module foo;
export template <typename T>
void hello_world(T val) {
  std::cout << val;
}
extern template void hello_world(char);
template void hello_world(char);
main.cxx:
import foo;
int main() {
  hello_world('x');
}
The following commands build the module and generate assembly for main.cxx:
clang++-16 -std=c++20 --precompile -x c++-module foo.cxx
clang++-16 -std=c++20 -c foo.pcm
clang++-16 -std=c++20 -fprebuilt-module-path=. -S main.cxx
Inspecting main.s shows that the hello_world<char> instantiation is emitted there:
.text
.file "main.cxx"
.section .text._ZW3foo11hello_worldIcEvT_,"axG",@progbits,void hello_world@foo<char>(char),comdat
.weak void hello_world@foo<char>(char) # -- Begin function void hello_world@foo<char>(char)
.p2align 4, 0x90
.type void hello_world@foo<char>(char),@function
void hello_world@foo<char>(char): # @void hello_world@foo<char>(char)
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movb %dil, %al
movb %al, -1(%rbp)
movq std::cout@GOTPCREL(%rip), %rdi
movsbl -1(%rbp), %esi
callq std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char)@PLT
addq $16, %rsp
popq %rbp
.cfi_def_cfa %rsp, 8
retq
This happens regardless of whether we use extern template or not in foo.cxx.
For comparison, compiling the following program:
foo.h:
#include <iostream>
template <typename T>
void hello_world(T val) {
  std::cout << val;
}
extern template void hello_world(char);
#include "foo.h"
int main() {
  hello_world('x');
}
doesn’t generate hello_world<char>.
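The header version above shows only the explicit instantiation declaration. For the program to link, some translation unit has to provide the matching explicit instantiation definition. The sketch below puts both halves into one file to show how they pair up. The split into foo.h and a foo.cc is an assumption (the post doesn't show such a file), and `capture_hello` is a hypothetical helper added only to make the output checkable.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// --- what foo.h contains ---
template <typename T>
void hello_world(T val) {
  std::cout << val;
}

// Explicit instantiation declaration: tells includers that
// hello_world<char> is instantiated elsewhere, so they need not emit it.
extern template void hello_world(char);

// --- what a hypothetical foo.cc would contain ---
// Explicit instantiation definition: the one place where code for
// hello_world<char> is generated. It must follow the declaration above
// when both appear in the same translation unit.
template void hello_world(char);

// Illustration-only helper: calls hello_world('x') with std::cout
// temporarily redirected into a string and returns what was written.
inline std::string capture_hello() {
  std::ostringstream out;
  std::streambuf* saved = std::cout.rdbuf(out.rdbuf());
  hello_world('x');
  std::cout.rdbuf(saved);  // restore the original buffer
  return out.str();
}
```

With the two parts in separate files, a translation unit that includes only foo.h would reference hello_world<char> as an external symbol and leave code generation to foo.cc, which is the behavior the module build is expected to match.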
Going back to the {fmt} case, we can see that the object file consuming the module is ~200x larger than the one consuming headers:
-rw-rw-r-- 1 vagrant vagrant 2624 Apr 10 23:46 test-header.o
-rw-rw-r-- 1 vagrant vagrant 542072 Apr 10 19:41 test-module.o
Almost all of the extra size comes from unnecessary template instantiations.
This suggests that we can expect a substantial build speed improvement once clang stops emitting them, either by recognizing extern template or by some other means.
Happy modularizing!
Last modified on 2023-04-10