Thursday, 27 October 2022

Is Rust sacrificing 3U safety for stability?

My original post is too malign or pejorative or something, apparently imputing dark motives when I mean only to highlight the ostensible appearance. So I try again.

[This rant is a work in Progress and will receive regular updates]

TL;DR: This is a general call for prioritisation of breaking changes over preservation of unexpected undetectable unsafe or undefined behaviour in Rust, and for full transparency with respect to such faults. (And maybe there is and I haven't seen it).

Will Rust sacrifice 3U safety for stability?

3U safety refers to safety from unexpected undetectable unsafe behaviours (or, if you prefer, unexpected undetectable undefined behaviours).

The utility of C and other ancient software development tools harks back to those ages when they could be compared to that dog walking on its hind legs: The wonder isn't that it does it well, but that it does it at all. And in comparison to what preceded them, they did it well.

But it's no longer enough.

The instability of such tools has been a major hindrance to the development and maintenance of secure software.

Youthfully optimistic C developers blithely consider C to be a portable macro assembler. The experience that comes with time teaches them that it isn't. 

C compilers have been secretly throwing away increasing chunks of code for decades, and not just in simple cases like throwing away sub-statement code by remembering that a variable is already in a register, or throwing away whole statements like function calls that zero a sensitive buffer (because it wasn't read-from again), any new compiler edition might throw away whole blocks of code that you inserted for security checks, to fix bugs -- and not tell you. 

This is not an attack on C compiler authors or maintainers but is to illustrate (due to insufficient congruence of priorities) the shifting sands on which much software has been, and is being, built.

Will Rust go the same way?

Stability or Speed?

In a much-contested view (especially by compiler authors) D. J. Bernstein controversially wrote in 2015:

...gcc and clang both feel entitled to arbitrarily change the behavior of "undefined" programs. Pretty much every real-world C program is "undefined" according to the C "standard", and new compiler "optimizations" often produce new security holes in the resulting object code, as illustrated by

and many other examples. Crypto code isn't magically immune to this--- one can easily see how today's crypto code audits will be compromised by tomorrow's compiler optimizations, even if the code is slightly too complicated for today's compilers to screw up.

If you want the short version of the dispute, C compilers take advantage of what to developers is an uncontrollably increasing collection of what is termed "undefined behaviour" in order to make speed optimizations because... speed is obviously what you want, and has been one of the main metrics in the compiler wars. (Smallness of output being the other main metric).

It plays out like this: There is an ever-growing bunch of rules that developers must adhere to, and if they don't, the compiler can do what it likes, and it's your fault. When we say do what it likes we mean things like: ignore your code. And compiler writers stick to that like glue, even when it makes it exceedingly difficult to write secure software, and when the introduction of new undefined behaviour makes existing secure software suddenly become insecure when compiled with a new compiler. 

You have the responsibility to look at the compiler changes list and put all the right flags in your makefile to get the old behaviour. Flags that the old compiler won't recognize. How often have you seen a makefile that assembled CFLAGS flags based on the compiler version?

Now the language designers and compiler authors are right because they make the rules, but the fight to produce correct and safe software becomes more and more like a fight, and every edition of the compiler tools behaves more and more like a fully automatic foot-gun - all the better to shoot yourself in the foot with, and the bullets you fire at security problems are optimised away.

Clearly, compiler authors and compiler users have conflicting views on what the compiler should do. And if you don't believe me, read this sorry tale of the assertions that were optimized away.

I don't exaggerate. New tools will produce worse code. And the compiler doesn't warn you. Because in your makefile you forgot to turn on the new warning flag that didn't exist when you wrote the makefile.

Simply recompiling previously working, secure code with a newer version of the compiler can introduce security vulnerabilities. While the new behaviour can be disabled with a flag, existing makefiles do not have that flag set, obviously. And since no warning is produced, it is not obvious to the developer that the previously reasonable behaviour has changed.

I think I want a compiler that warns me when it throws away lines of code that I took the trouble to write and debug, but what do I know? It's really hard. But so is having your code silently yanked away.

Consider what happens when the default code optimisation settings change in the next release of the compiler. As one chap puts it:

Value range propagation now assumes that the this pointer of C++ member functions is non-null. This eliminates common null pointer checks but also breaks some non-conforming code-bases (such as Qt-5, Chromium, KDevelop). As a temporary work-around -fno-delete-null-pointer-checks can be used. Wrong code can be identified by using -fsanitize=undefined.

http://blog.fefe.de/?ts=a9de792d 

A more even-handed treatment of what compiler authors are trying to achieve is given here: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7 

It all comes down to a persistent and ever-escalating mismatch of expectations. The scarcely known obligation upon software developers is to be aware of the increasing instances of the increasing definitions of undefined behaviour that exist in existing debugged safe and working, and of the changes to default optimisations, so that with each compiler release they must remove the last year's best practice and replace it with this year's best practice.

And our current state is where:

...the root cause of approximately 70% of security vulnerabilities that Microsoft fixes and assigns a CVE (Common Vulnerabilities and Exposures) are due to memory safety issues. ​

This is despite mitigations including intense code review, training, static analysis, and more.​

While many experienced programmers can write correct systems-level code, it’s clear that no matter the amount of mitigations put in place, it is near impossible to write memory-safe code using traditional systems-level programming languages at scale.

we-need-a-safer-systems-programming-language

Let's expand upon those mitigations. Everyone is compiling C with -Werror and -Wall, but it's not enough. Many look harder by paying good time and money for Coverity, Klocwork, Black Duck, Code Sonar, etc, and trying every C compiler they can get their hands on for maximum warnings, and the information on every change that they can make.

If the compiler authors insist that they are correct (as they may) then the software developers need new tools. For me, and for the author of that previous quote, Rust is that tool.

From C, we have had one long round of compiler-imposed instability in pursuit of speed, and another more recent self-imposed round of instability for safety.

So to answer the question heading this section: 

If you want stability (I mean such as for security bugs to stay fixed) the answer is clear: do not update your compiler tools.

If you want speed, you need the latest compiler tools. Your programs will be very fast, but you will not know what they are doing. Of course, there are ways to find out, but the answer will vary from one compiler edition to the next.

In the question of stability or speed, the C compiler maintainers chose speed for too many decades.

What about Rust?

Stability or Safety?

Don't think that Rust isn't optimising your code. It is. The most popular Rust implementation is based on Clang/LLVM and has various bugs originating in the optimising compiler backend, throwing away code that Rust wants to keep.

But Rust makes certain guarantees of safety, in the same way that C promises to throw away apparently arbitrary multi-line chunks of code. 

You can see the failure list of what are termed Unsound Bugs: Today I see 65 open and 307 closed. If I exclude C bugs from that list then I see 10 open and 86 closed. (I maybe misusing the bug filter system).





Wednesday, 26 October 2022

Is Rust sacrificing 3U safety for stability?

[This rant is a work in Progress and will receive regular updates]
TL;DR: This is a general call for prioritisation of breaking changes over preservation of unexpected undetectable unsafe or undefined behaviour in Rust, and for full transparency with respect to such faults. (And maybe there is and I haven't seen it).
  
I was reading the discussion at Surprising soundness trouble around `PollFn` (preceding Zulip discussion, subsequent github issue) and I was appalled.

As a meta-observation, and speaking as someone arguing for use of Rust in organisation projects (and as I am new to Rust, and maybe I misunderstood the whole thing) but:

Seeing people apparently argue to preserve all three U in the the 3U (unexpected undetectable undefined) behaviour undermines the glorious promises of Rust safety, and the claims of the supposed impossibilities of writing various kinds of bugs in Rust.

This very much damages the Rust cause, and that is something that also ought also to be considered along with the issue of introducing safe but breaking changes for existing users, because new users are coming to Rust for safety, and there will be more new users than existing users. (And most existing users also came for safety).

Those who want stability over safety will stay where they are.

The over-caution about breaking changes turns these expectations on their head. How do you think users feel: Yeah, we didn't introduce the safety of this breaking change 'cos a very few of you might need to make a patch your work and recompile to fix an actual bug, so we left it unexpectedly undetectably unsafe as a favour to you.

Because that is what this looks like.

So while everyone and their dog is now compiling C with -Werror and -Wall and literally begging to get as many breaking changes as they can, and looking yet harder by paying good time and money for Coverity and Klocwork and Black Duck etc, and trying every C compiler they can get their hands on for maximum warnings, people are arguing that Rust should to cover it all up. 

Because that is what it looks like. 

I cannot comprehend the mindset behind that. I'd love to know what some of you been smoking so that I can make sure I never ingest any of it.

All the "nobody is writing such bugs" claims, are just begging for it to come up 10 years later in the post-mortem of a severe exploit, yet we just had a lengthy post-mortem discussed on Zulip because somebody wrote such bugs, and some poor chap spent a week trying to find the cause. 

And we're more worried about "breaking changes" than actual breakage? The promise of Rust was that it should have been impossible to write that bug.

And I don't think much about the idea of simply mentioning such risk in a note at the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.

It's having this sort of secretly-documented unexpected undetectable undefined behaviour regularly foisted on us by new optimisations in the C language compilers (introducing new bugs and UB in old code in the process) that drive us to Rust in the first place. That's the sort of breaking change we don't like. We really want safety. If we didn't we wouldn't be spending millions and billions across the board between us on rewriting and retooling for Rust.

It honestly looks like I'm watching TLA-sponsored exploits being embedded into Rust. I can't account for it in any other way.

I just say that this is what it looks like, and it is very damaging to the image of Rust, precisely because it could be very damaging to compiled code, and given a useful combination of gadgets, also damaging to the systems using them, and those using the systems.

I'm sure this isn't the only case, but I daren't look. I'm trying to make a case for Rust based on its promises of safety, and it is a real conflict to know that I might undercover an apparent conspiracy to not only keep the failures of such guarantees hidden, but even to maintain those failures as failures!

I beg in the name of transparency and accountability, that whatever rules need changing are changed, so that awareness of the failure of Rust safety guarantees is paramount:

  • There is a specific public list of any bug or flaw which could accidentally permit unexpected undetectable undefined or unsafe behaviour, along with a collection of instances where it has been discovered.
    There is: see: https://github.com/rust-lang/rust/labels/I-unsound
  • Clippy detection be implemented rapidly, even if there is no fix (or no agreed fix), with the clippy message containing a link to the issue, even if it is unfixable.
  • Breaking fixes must be adopted within a small fixed timescale if non-breaking fixes aren't adopted

I also suggest that if there is a strong case to retain these unsafe potentialities, it is not overriding enough to block the fix. If there is a strong case, then maybe potentially unsafe code can continue to be combined and compiled and released by those with such needs, by use of a strongly frowned upon compiler flag to deselect the safety of the breaking fixes.

But this practice of presenting the preservation of unexpected undetectable unsafe behaviours as some kind of unqualified benefit to those who are fleeing that sort of behaviour in other languages should stop. We undergo the expense and inconvenience and business risk to get guarantees of safety, and not guarantees of stability, which if we are honest, we know are better had by doing absolutely nothing.

With ever-increasing adoption, if Rust or libs are breaking the safety guarantees, the best time to fix them is yesterday, not maybe someday.