TARAXA
|
Each rule (guideline, suggestion) can have several parts:
new
Some rules are hard to check mechanically, but they all meet the minimal criteria that an expert programmer can spot many violations without too much trouble. We hope that "mechanical" tools will improve with time to approximate what such an expert programmer notices. Also, we assume that the rules will be refined over time to make them more precise and checkable.
Topics:
const
Code clarity and performance. You don't need to write error handlers for errors caught at compile time.
This example fails to achieve what it is trying to achieve (because overflow is undefined) and should be replaced with a simple static_assert
:
Or better still just use the type system and replace Int
with int32_t
.
Alternative formulation: Don't postpone to run time what can be done well at compile time.
Non-const
global variables hide dependencies and make the dependencies subject to unpredictable changes.
Who else might modify data
?
Global constants are useful.
The rule against global variables applies to namespace scope variables as well.
Alternative: If you use global (more generally namespace scope) data to avoid copying, consider passing the data as an object by reference to const
. Another solution is to define the data as the state of some object and the operations as member functions.
Warning: Beware of data races: If one thread can access nonlocal data (or data passed by reference) while another thread executes the callee, we can have a data race. Every pointer or reference to mutable data is a potential data race.
You cannot have a race condition on immutable data.
(Simple) Report all non-const
variables declared at namespace scope.
It's the simplest and gives the cleanest semantics.
Since std::map
and string
have all the special functions, no further work is needed.
This is known as **"the rule of zero"**.
(Not enforceable) While not enforceable, a good static analyzer can detect patterns that indicate a possible improvement to meet this rule. For example, a class with a (pointer, size) pair of member and a destructor that delete
s the pointer could probably be converted to a vector
.
The semantics of the special functions are closely related, so if one needs to be non-default, the odds are that others need modification too.
Given that "special attention" was needed for the destructor (here, to deallocate), the likelihood that copy and move assignment (both will implicitly destroy an object) are correct is low (here, we would get double deletion).
This is known as **"the rule of five"** or **"the rule of six"**, depending on whether you count the default constructor.
If you want a default implementation of a default operation (while defining another), write =default
to show you're doing so intentionally for that function. If you don't want a default operation, suppress it with =delete
.
Compilers enforce much of this rule and ideally warn about any violation.
Relying on an implicitly generated copy operation in a class with a destructor is deprecated.
(Simple) A class should have a declaration (even a =delete
one) for either all or none of the special functions.
If there is any doubt whether the caller or the callee owns an object, leaks or premature destruction will occur.
Consider:
Who deletes the returned X
? The problem would be harder to spot if compute returned a reference. Consider returning the result by value (use move semantics if the result is large):
Alternative: Using a "smart pointer", such as unique_ptr
(for exclusive ownership) and shared_ptr
(for shared ownership). However, that is less elegant and often less efficient than returning the object itself, so use smart pointers only if reference semantics are needed.
Alternative: Sometimes older code can't be modified because of ABI compatibility requirements or lack of resources. In that case, mark owning pointers using owner
from the guideline support library:
This tells analysis tools that res
is an owner. That is, its value must be delete
d or transferred to another owner, as is done here by the return
.
owner
is used similarly in the implementation of resource handles.
Every object passed as a raw pointer (or iterator) is assumed to be owned by the caller, so that its lifetime is handled by the caller. Viewed another way: ownership transferring APIs are relatively rare compared to pointer-passing APIs, so the default is "no ownership transfer."
See also: Using smart pointers, Use of smart pointer arguments.
delete
of a raw pointer that is not an owner<T>
. Suggest use of standard-library resource handle or use of owner<T>
.reset
or explicitly delete
an owner
pointer on every code path.new
or a function call with an owner
return value is assigned to a raw pointer or non-owner
reference.Use appropriate smart pointer and avoid the waste of resources.
Shared pointer allocate reference counting even ownership is never transported or copied.
Alternative: If ownership owns only one object in a time, then use **std::unique_ptr
** even that ownership can or will be moved to other object. Using std::unique_ptr is the simplest way to avoid leaks. It is reliable, it makes the type system do much of the work to validate ownership safety, it increases readability, and it has zero or near zero run-time cost.
Alternative: If ownership can be duplicated to other object then use **std::shared_ptr
**.
Note: In both cases prefer creation via std::make_unique
or std::make_shared
. If you first make an object and then give it to a shared_ptr constructor, you (most likely) do one more allocation (and later deallocation) than if you use std::make_shared
because the reference counts must be allocated separately from the object. Use make_unique
just for convenience and consistency with std::shared_ptr
. std::unique_ptr
shall be the default pointer type, std::shared_ptr
shall be used only if needed.
C++ Core Guidelines: Rules for Smart Pointers
If you perform two explicit resource allocations in one statement, you could leak resources because the order of evaluation of many subexpressions, including function arguments, is unspecified.
void fun(shared_ptr<Widget> sp1, shared_ptr<Widget> sp2);
This fun
can be called like this:
This is exception-unsafe because the compiler may reorder the two expressions building the function's two arguments. In particular, the compiler can interleave execution of the two expressions: Memory allocation (by calling operator new
) could be done first for both objects, followed by attempts to call the two Widget
constructors. If one of the constructor calls throws an exception, then the other object's memory will never be released!
This subtle problem has a simple solution: Never perform more than one explicit resource allocation in a single expression statement. For example:
The best solution is to avoid explicit allocation entirely use factory functions that return owning objects:
Write your own factory wrapper if there is not one already.
Accepting a smart pointer to a widget
is wrong if the function just needs the widget
itself. It should be able to accept any widget
object, not just ones whose lifetimes are managed by a particular kind of smart pointer. A function that does not manipulate lifetime should take raw pointers or references instead.
operator->
or operator*
) that is copyable but the function only calls any of: operator*
, operator->
or get()
. Suggest using a T*
or T&
instead.operator->
or operator*
) that is copyable/movable but never copied/moved from in the function body, and that is never modified, and that is not passed along to another function that could do so. That means the ownership semantics are not used. Suggest using a T*
or T&
instead.Readability. Minimize resource retention.
Note: C++17 also adds if
and switch
initializer statements. These require C++17 support.
Readability and safety.
As an optimization, you may want to reuse a buffer as a scratch pad, but even then prefer to limit the variable's scope as much as possible and be careful not to cause bugs from data left in a recycled buffer as this is a common source of security bugs.
Flag recycled variables.
Macros are a major source of bugs. Macros don't obey the usual scope and type rules. Macros don't obey the usual rules for argument passing. Macros ensure that the human reader sees something different from what the compiler sees. Macros complicate tool building.
Even if we hadn't left a well-known bug in SQUARE
there are much better behaved alternatives; for example:
Scream when you see a macro that isn't just used for source control (e.g., #ifdef
)
Unnamed constants embedded in expressions are easily overlooked and often hard to understand:
No, we don't all know that there are 12 months, numbered 1..12, in a year. Better:
Better still, don't expose constants:
Flag literals in code. Give a pass to 0
, 1
, nullptr
, \n
, ""
, and others on a positive list.
Readability. Minimize surprises: nullptr
cannot be confused with an int
. nullptr
also has a well-specified (very restrictive) type, and thus works in more scenarios where type deduction might do the wrong thing on NULL
or 0
.
Consider:
Flag uses of 0
and NULL
for pointers. The transformation may be helped by simple program transformation.
The T{e}
construction syntax makes it explicit that construction is desired. The T{e}
construction syntax doesn't allow narrowing. T{e}
is the only safe and general expression for constructing a value of type T
from an expression e
. The casts notations T(e)
and (T)e
are neither safe nor general.
For built-in types, the construction notation protects against narrowing and reinterpretation
The integer to/from pointer conversions are implementation defined when using the T(e)
or (T)e
notations, and non-portable between platforms with different integer and pointer sizes.
When unambiguous, the T
can be left out of T{e}
.
std::vector
and other containers were defined before we had {}
as a notation for construction. Consider:
How do we get a vector
of 10 default initialized int
s?
vector<int> v3(10); // ten elements with value 0
The use of ()
rather than {}
for number of elements is conventional (going back to the early 1980s), hard to change, but still a design error: for a container where the element type can be confused with the number of elements, we have an ambiguity that must be resolved. The conventional resolution is to interpret {10}
as a list of one element and use (10)
to distinguish a size.
This mistake need not be repeated in new code. We can define a type to represent the number of elements:
The main problem left is to find a suitable name for Count
.
Flag the C-style (T)e
and functional-style T(e)
casts.
For efficiency and correctness, you nearly always want to capture by reference when using the lambda locally. This includes when writing or calling parallel algorithms that are local because they join before returning.
The efficiency consideration is that most types are cheaper to pass by reference than by value.
The correctness consideration is that many calls want to perform side effects on the original object at the call site (see example below). Passing by value prevents this.
Unfortunately, there is no simple way to capture by reference to const
to get the efficiency for a local call but also prevent side effects.
Here, a large object (a network message) is passed to an iterative algorithm, and is it not efficient or correct to copy the message (which may not be copyable):
This is a simple three-stage parallel pipeline. Each stage
object encapsulates a worker thread and a queue, has a process
function to enqueue work, and in its destructor automatically blocks waiting for the queue to empty before ending the thread.
Flag a lambda that captures by reference, but is used other than locally within the function scope or passed to a function by reference. (Note: This rule is an approximation, but does flag passing by pointer as those are more likely to be stored by the callee, writing to a heap location accessed via a parameter, returning the lambda, etc. The Lifetime rules will also provide general rules that flag escaping pointers and references including via lambdas.)
Pointers and references to locals shouldn't outlive their scope. Lambdas that capture by reference are just another place to store a reference to a local object, and shouldn't do so if they (or a copy) outlive the scope.
const
and non-local contextIt's confusing. Writing [=]
in a member function appears to capture by value, but actually captures data members by reference because it actually captures the invisible this
pointer by value. If you meant to do that, write this
explicitly.
This is under active discussion in standardization, and may be addressed in a future version of the standard by adding a new capture mode or possibly adjusting the meaning of [=]
. For now, just be explicit.
this
(whether explicitly or via default capture)Using in-class member initializers lets the compiler generate the function for you. The compiler-generated function can be more efficient.
(Simple) A default constructor should do more than just initialize member variables with constants.
To avoid unintended conversions.
If you really want an implicit conversion from the constructor argument type to the class type, don't use explicit
:
(Simple) Single-argument constructors should be declared explicit
. Good single argument non-explicit
constructors are rare in most code based. Warn for all that are not on a "positive list".
Avoid used-before-set errors and their associated undefined behavior. Avoid problems with comprehension of complex initialization. Simplify refactoring.
No, i = 7
does not initialize i
; it assigns to it. Also, i
can be read in the ...
part. Better:
The always initialize rule is deliberately stronger than the an object must be set before used language rule. The latter, more relaxed rule, catches the technical bugs, but:
The always initialize rule is a style rule aimed to improve maintainability as well as a rule protecting against used-before-set errors.
Here is an example that is often considered to demonstrate the need for a more relaxed rule for initialization
This cannot trivially be rewritten to initialize i
and j
with initializers. Note that for types with a default constructor, attempting to postpone initialization simply leads to a default initialization followed by an assignment. A popular reason for such examples is "efficiency", but a compiler that can detect whether we made a used-before-set error can also eliminate any redundant double initialization.
At the cost of repeating cond
we could write:
Assuming that there is a logical connection between i
and j
, that connection should probably be expressed in code:
Obviously, what we really would like is a construct that initialized n variables from a tuple
. For example:
Today, we might approximate that using tie()
:
This may be seen as an example of the immediately initialize from input exception below.
Creating optimal and equivalent code from all of these examples should be well within the capabilities of modern C++ compilers (but don't make performance claims without measuring; a compiler may very well not generate optimal code for every example and there may be language rules preventing some optimization that you would have liked in a particular case).
This rule covers member variables.
The compiler will flag the uninitialized m6
because it is a const
, but it will not catch the lack of initialization of m3
. Usually, a rare spurious member initialization is worth the absence of errors from lack of initialization and often an optimizer can eliminate a redundant initialization (e.g., an initialization that occurs immediately before an assignment).
Complex initialization has been popular with clever programmers for decades. It has also been a major source of errors and complexity. Many such errors are introduced during maintenance years after the initial implementation.
If you are declaring an object that is just about to be initialized from input, initializing it would cause a double initialization. However, beware that this may leave uninitialized data beyond the input – and that has been a fertile source of errors and security breaches:
The cost of initializing that array could be significant in some situations. However, such examples do tend to leave uninitialized variables accessible, so they should be treated with suspicion.
When feasible use a library function that is known not to overflow. For example:
Don't consider simple variables that are targets for input operations exceptions to this rule:
In the not uncommon case where the input target and the input operation get separated (as they should not) the possibility of used-before-set opens up.
A good optimizer should know about input operations and eliminate the redundant operation.
Using an uninitialized
or sentinel value is a symptom of a problem and not a solution:
Now the compiler cannot even simply detect a used-before-set. Further, we've introduced complexity in the state space for widget: which operations are valid on an uninit
widget and which are not?
Sometimes, a lambda can be used as an initializer to avoid an uninitialized variable:
or maybe:
See also: Use lambdas for complex initialization
const
argument can be assumed to be a write into the variable.It nicely encapsulates local initialization, including cleaning up scratch variables needed only for the initialization, without needing to create a needless nonlocal yet nonreusable function. It also works for variables that should be const
but only after some initialization work.
If at all possible, reduce the conditions to a simple set of alternatives (e.g., an enum
) and don't mix up selection and initialization.
Hard. At best a heuristic. Look for an uninitialized variable followed by a loop assigning to it.
To minimize confusion and errors. That is the order in which the initialization happens (independent of the order of member initializers).
(Simple) A member initializer list should mention the members in the same order they are declared.
Makes it explicit that the same value is expected to be used in all constructors. Avoids repetition. Avoids maintenance problems. It leads to the shortest and most efficient code.
How would a maintainer know whether mJ
was deliberately uninitialized (probably a poor idea anyway) and whether it was intentional to give mS
the default value ""
in one case and qqq
in another (almost certainly a bug)? The problem with mJ
(forgetting to initialize a member) often happens when a new member is added to an existing class.
Alternative: We can get part of the benefits from default arguments to constructors, and that is not uncommon in older code. However, that is less explicit, causes more arguments to be passed, and is repetitive when there is more than one constructor:
An initialization explicitly states that initialization, rather than assignment, is done and can be more elegant and efficient. Prevents "use before set" errors.
Readability. Error avoidance. Named casts are more specific than a C-style or functional cast, allowing the compiler to catch some errors.
The named casts are:
static_cast
const_cast
reinterpret_cast
dynamic_cast
std::move
// move(x)
is an rvalue reference to x
std::forward
// forward(x)
is an rvalue reference to x
gsl::narrow_cast
// narrow_cast<T>(x)
is static_cast<T>(x)
gsl::narrow
// narrow<T>(x)
is static_cast<T>(x)
if static_cast<T>(x) == x
or it throws narrowing_error
The example was synthesized from real-world bugs where D
used to be derived from B
, but someone refactored the hierarchy. The C-style cast is dangerous because it can do any kind of conversion, depriving us of any protection from mistakes (now or in the future).
When converting between types with no information loss (e.g. from float
to double
or int64
from int32
), brace initialization may be used instead.
This makes it clear that the type conversion was intended and also prevents conversions between types that might result in loss of precision. (It is a compilation error to try to initialize a float
from a double
in this fashion, for example.)
reinterpret_cast
can be essential, but the essential uses (e.g., turning a machine address into pointer) are not type safe:
reinterpret_cast
.static_cast
between arithmetic types.It makes a lie out of const
. If the variable is actually declared const
, the result of "casting away `const`" is undefined behavior.
Sometimes, you may be tempted to resort to const_cast
to avoid code duplication, such as when two accessor functions that differ only in const
-ness have similar implementations. For example:
Instead, prefer to share implementations. Normally, you can just have the non-const
function call the const
function. However, when there is complex logic this can lead to the following pattern that still resorts to a const_cast
:
Although this pattern is safe when applied correctly, because the caller must have had a non-const
object to begin with, it's not ideal because the safety is hard to enforce automatically as a checker rule.
Instead, prefer to put the common code in a common helper function – and make it a template so that it deduces const
. This doesn't use any const_cast
at all:
You may need to cast away const
when calling const
-incorrect functions. Prefer to wrap such functions in inline const
-correct wrappers to encapsulate the cast in one place.
Sometimes, "cast away `const`" is to allow the updating of some transient information of an otherwise immutable object. Examples are caching, memoization, and precomputation. Such examples are often handled as well or better using mutable
or an indirection than with a const_cast
.
Consider keeping previously computed results around for a costly operation:
Here, GetVal()
is logically constant, so we would like to make it a const
member. To do this we still need to mutate cache
, so people sometimes resort to a const_cast
:
Fortunately, there is a better solution: State that cache
is mutable even for a const
object:
An alternative solution would to store a pointer to the cache
:
That solution is the most flexible, but requires explicit construction and destruction of *cache
(most likely in the constructor and destructor of X
).
In any variant, we must guard against data races on the cache
in multi-threaded code, possibly using a std::mutex
.
const_cast
s.If you use member functions, you need two. Unless you use a nonmember function for (say) ==
, a == b
and b == a
will be subtly different.
When to use a normal, friend, or member function overload?
In most cases, the language leaves it up to you to determine whether you want to use the normal/friend or member function version of the overload. However, one of the two is usually a better choice than the other. When dealing with binary operators that don't modify the left operand (e.g. operator+), the normal or friend function version is typically preferred, because it works for all parameter types (even when the left operand isn't a class object, or is a class that is not modifiable). The normal or friend function version has the added benefit of "symmetry", as all operands become explicit parameters (instead of the left operand becoming *this and the right operand becoming an explicit parameter). When dealing with binary operators that do modify the left operand (e.g. operator+=), the member function version is typically preferred. In these cases, the leftmost operand will always be a class type, and having the object being modified become the one pointed to by *this is natural. Because the rightmost operand becomes an explicit parameter, there's no confusion over who is getting modified and who is getting evaluated. Unary operators are usually overloaded as member functions as well, since the member version has no parameters.
The following rules of thumb can help you determine which form is best for a given situation:
Not everything can be overloaded as a friend function The assignment (=), subscript ([]), function call (()), and member selection (->) operators must be overloaded as member functions, because the language requires them to be.
Not everything can be overloaded as a member function Overloading the I/O operators, we overloaded operator<< for our Point class using the friend function method.
Flag member operator functions.