
Culture War Roundup for the week of August 5, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


The CrowdStrike incident report is up.

As far as documents go, it shows that CrowdStrike's competence is... horrific.

Finding 1.

This means that when the sensor wanted to make a detection decision based on the IPC Template Type, the sensor code would supply 20 different input sources to the Content Interpreter. However, the definition of the IPC Template Type in the Template Type Definitions file stated that it expected 21 input fields. This definition resulted in Template Instances in Channel File 291 that expected to operate on 21 inputs. This mismatch was not detected during development of the IPC Template Type. The test cases and Rapid Response Content used to test the IPC Template Type did not trigger a fault during feature development or during testing of the sensor 7.11 release

What this says is that they did not test supplying the IPC Template Type to the sensor at all, or check how many parameters the IPC Template Type actually produces? What kind of nonsense is this?
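A minimal sketch of the kind of build-time check that would catch a 20-versus-21 mismatch, assuming both counts are visible to the compiler. Every name here (`TEMPLATE_FIELD_COUNT`, `kSensorInputs`, the field names) is hypothetical, and 3 fields stand in for the 21 in the report to keep it short; this is not CrowdStrike's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the number of fields the template definition declares. */
#define TEMPLATE_FIELD_COUNT 3

/* Hypothetical: the input sources the sensor code actually supplies. */
static const char *const kSensorInputs[] = {
    "pipe_name",
    "process_id",
    "access_mask",
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* C11: the build fails if the definition and the supplied inputs
 * ever disagree on the number of fields. */
_Static_assert(ARRAY_LEN(kSensorInputs) == TEMPLATE_FIELD_COUNT,
               "sensor input count does not match template definition");

/* Number of inputs the sensor supplies. */
size_t sensor_input_count(void) {
    return ARRAY_LEN(kSensorInputs);
}

/* Name of one input source; the index must be in range. */
const char *sensor_input_name(size_t index) {
    assert(index < TEMPLATE_FIELD_COUNT);
    return kSensorInputs[index];
}
```

Dropping one entry from `kSensorInputs` without touching `TEMPLATE_FIELD_COUNT` turns the mismatch into a compile error instead of a latent runtime fault.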

2. A runtime array bounds check was missing for Content Interpreter input fields on Channel File 291 Findings: The Rapid Response Content for Channel File 291 instructed the Content Interpreter to read the 21st entry of the input pointer array. However, the IPC Template Type only generates 20 inputs. As a result, once Rapid Response Content was delivered that used a non-wildcard matching criterion for the 21st input, the Content Interpreter performed an out-of-bounds read of the input array. This is not an arbitrary memory write issue and has been independently reviewed.

(hey can you prevent autoformatting for quotes it's really annoying that I can't exactly quote the doc)

So they didn't do the one-liner test of checking the array's inputs? I know in C you can't do this because arrays do not contain their own length as a variable, but a C++ vector accessed through at() would have caught this error (I guess in the kernel it's C or bust?). Congrats on using the root of all evil, the regex. So the regex created some interesting behavior on the (invalid) 21st input because of an OUT OF BOUNDS ARRAY access, oh boy.
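For reference, the missing runtime check is roughly this in C: carry the length next to the pointer and refuse to index past it. This is a sketch under assumptions; `struct input_view` and `input_at` are made-up names, not anything from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the inputs handed to the interpreter: a pointer
 * plus the number of valid entries, since a bare pointer has no length. */
struct input_view {
    const char *const *items;
    size_t count;
};

/* Returns the input at `index`, or NULL when the index is out of range,
 * instead of reading past the end of the array. */
const char *input_at(struct input_view view, size_t index) {
    assert(view.items != NULL || view.count == 0);
    if (index >= view.count) {
        return NULL;
    }
    return view.items[index];
}
```

With 20 entries in the view, `input_at(view, 20)` (the 21st input) returns NULL and the caller can reject the content rather than dereference garbage.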

  3. Template Type testing should cover a wider variety of matching criteria Findings: Both manual and automated testing were performed during the development of the IPC Template Type. This testing was focused on functional validation of the Template Type including the correct flow of security-relevant data through it, and evaluation of that data to generate appropriate detection alerts based on criteria created in development test cases. Automated testing leveraged internal and external tooling to create the required security-relevant data needed to exercise the IPC Template Type under all supported Windows versions within a broad subset of the expected operational use cases. For automated testing, a static set of 12 test cases was selected to be representative of broader operational expectations and to validate the creation of telemetry and detection alerts. Part of this testing included defining a channel file for use within the test cases. The selection of data in the channel file was done manually and included a regex wildcard matching criterion in the 21st field for all Template Instances, meaning that execution of these tests during development and release builds did not expose the latent out-of-bounds read in the Content Interpreter when provided with 20 rather than 21 inputs.

Automated testing somehow doesn't include having 21 valid inputs in your 21-parameter function? Man, now that's some brainpower. ChatGPT can write tests better than that.

12 test cases, which didn't seem to include any invalid inputs? Where's your input validation? Where's the array bounds checking?
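Here is a rough sketch of why a wildcard in the last field can hide the bug from a test suite. It assumes a wildcard criterion matches without ever reading its input, which is my reading of the report and not something it spells out; `struct criterion`, `evaluate`, and the NULL-means-wildcard convention are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum match_result { MATCH, NO_MATCH, MISSING_INPUT };

/* Hypothetical criterion: a NULL pattern stands for "wildcard". */
struct criterion {
    size_t field_index;
    const char *pattern; /* NULL = wildcard, otherwise exact match */
};

/* A wildcard matches without looking at its input, so a missing field
 * is only noticed when the criterion is a non-wildcard one. */
enum match_result evaluate(struct criterion c,
                           const char *const *inputs, size_t input_count) {
    assert(inputs != NULL);
    if (c.pattern == NULL) {
        return MATCH; /* never touches inputs[c.field_index] */
    }
    if (c.field_index >= input_count) {
        return MISSING_INPUT; /* the bounds check that was absent */
    }
    return strcmp(inputs[c.field_index], c.pattern) == 0 ? MATCH : NO_MATCH;
}
```

A suite where every criterion on the last field is a wildcard only ever takes the first branch for that field. One test case with a non-wildcard pattern on it is enough to reach the bounds check (or, without the check, the out-of-bounds read).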

  4. The Content Validator contained a logic error Findings: The Content Validator evaluated the new Template Instances. However, it based its assessment on the expectation that the IPC Template Type would be provided with 21 inputs. This resulted in the problematic Template Instance being sent to the Content Interpreter.

As expected: NO INPUT VALIDATION.

CLOWNSTRIKE indeed.

  5. Template Instance validation should expand to include testing within the Content Interpreter Findings: Newly released Template Types are stress tested across many aspects, such as resource utilization, system performance impact and detection volume. For many Template Types, including the IPC Template Type, a specific Template Instance is used to stress test the Template Type by matching against any possible value of the associated data fields to identify adverse system interactions. A stress test of the IPC Template Type with a test Template Instance was executed in our test environment, which consists of a variety of operating systems and workloads. The IPC Template Type passed the stress test and was validated for use, and a Template Instance was released to production as part of a Rapid Response Content update. However, the Content Validator-tested Template Instance did not observe that the mismatched number of inputs would cause a system crash when provided to the Content Interpreter by the IPC Template Type.

Basically they didn't do integration testing.

Something like

IPCTemplateType a = IPCTemplateType.new(1, 2, 3, 4, 5, 6, 7)
ContentInterpreter b = FunctionThatBreaks(a)

literally would have instant crashed.

They tested each thing independently, making a fake Template Type for the Content Interpreter rather than using a real generated one.

OK, I know integration testing is hard and gets exponentially complicated quickly, but you can do basic tests by generating a single instance and then checking it.

Or here's a billion-dollar idea: just turn on a goddamn Windows machine locally with your patch before sending it out. This patch broke ~100% of the Windows machines it came across, so you just needed to have done 1 manual patch of 1 fucking machine locally to have discovered this bug.

  6. Template Instances should have staged deployment Findings: Each Template Instance should be deployed in a staged rollout.

Basic procedure for every large org, and it wasn't followed for something this big? CLOWNSTRIKE continues.

I understand that when you have 100 customers a delayed rollout literally does nothing, but at around 1000 customers it does, and at the scale CrowdStrike was operating at, delayed rollouts are basically mandatory.
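For anyone who hasn't built one, the core of a staged rollout can be as small as the sketch below: hash each host into one of 100 buckets and only ship to buckets under the current percentage. The function names and the choice of FNV-1a are mine, purely for illustration; nothing here describes how CrowdStrike deploys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a over the four bytes of a host id; any stable hash would do. */
static uint32_t hash_host(uint32_t host_id) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h ^= (host_id >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* True when the host falls inside the first `percent` of 100 buckets. */
bool in_rollout(uint32_t host_id, unsigned percent) {
    assert(percent <= 100);
    return hash_host(host_id) % 100u < percent;
}
```

Because the bucket for a host never changes, raising the percentage from 1 to 10 to 100 only ever adds hosts, so a crash seen at 1% stops the update before it reaches everyone else.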

OK, the rest of the doc is mostly meaningless corporate jargon, but boy, this wasn't your normal fuckup; this was a fuckup of epically stupid programming oversight. Multiple errors that an absolute novice should have figured out, and which the most basic of tests would have found.

What the fuck is wrong with Clownstrike?

Petty programming nitpicks that don't matter, but still:

in C you can't do this because arrays do not contain their own length

Arrays do (at compile time, so if you have the array type, sizeof will return the actual size); it's just that they decay to pointers if you do almost anything with them, like passing one to a function.
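A small sketch of the difference, with made-up function names. The parameter is written as a pointer on purpose, because a parameter declared as `int inputs[20]` is adjusted to exactly that type anyway.

```c
#include <assert.h>
#include <stddef.h>

#define INPUT_COUNT 20

/* The array type itself knows its size at compile time. */
_Static_assert(sizeof(int[INPUT_COUNT]) == INPUT_COUNT * sizeof(int),
               "array type carries its extent");

/* In the callee the array has decayed: sizeof sees only a pointer. */
size_t size_seen_by_callee(const int *inputs) {
    assert(inputs != NULL);
    return sizeof inputs;
}

/* In the scope where the array is declared, the element count is
 * still recoverable from sizeof. */
size_t element_count_in_owner(void) {
    int inputs[INPUT_COUNT] = {0};
    return sizeof inputs / sizeof inputs[0];
}
```

This is why C APIs that take arrays usually take a separate length argument as well: once the array has crossed a function boundary, the length is gone unless you pass it along.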

literally would have instant crashed.

Accessing data outside of an array is undefined behaviour and often won't crash if it's just 1 access outside of the end, it'll just fetch garbage instead. You'd have to build the program with an "undefined behavior sanitizer" that detects stuff like that, but I don't know if that's compatible with running in the windows kernel.

Arrays do (at compile time, so if you have the array type, sizeof will return the actual size); it's just that they decay to pointers if you do almost anything with them, like passing one to a function.

I am but a humble C++ programmer who hasn't used arrays except in a class for so long that I forgot they only decay to a pointer in certain cases.

Accessing data outside of an array is undefined behaviour and often won't crash if it's just 1 access outside of the end, it'll just fetch garbage instead. You'd have to build the program with an "undefined behavior sanitizer" that detects stuff like that, but I don't know if that's compatible with running in the windows kernel.

I figured the UB would have resulted in a null-pointer exception every time, though. Yes, a UB sanitizer is probably unworkable in a kernel program; I don't write kernel code.

Yes, a UB sanitizer is probably unworkable in a kernel program

A kernel program is not that different from any other.

I think the problem would be that the kernel would need to support the memory sanitizer. As long as your kernel module only touched memory allocated by its own functions, then I guess in theory you could run a memory sanitizer without kernel support, but your kernel module would be pretty useless. The problem is that if the kernel gives you a buffer, how does a memory sanitizer with no knowledge of the kernel know that the buffer is safe to read or write? Apparently Windows does have support for KASAN, so vendor support should make it workable (https://www.microsoft.com/en-us/security/blog/2023/01/26/introducing-kernel-sanitizers-on-microsoft-platforms/), though I don't use Windows so I don't know how well it works. Also, I guess you could just have a userspace test harness, but for something like this you probably need some kind of final test with the module running in the kernel.

In this case it is, with completely different rules about stdlib usage, memory allocation, what can and cannot be paged etc.

I am but a humble C++ programmer who hasn't used arrays except in a class for so long that I forgot they only decay to a pointer in certain cases.

To be fair, this is a wart in C's design. Nobody serious (in particular nobody that does kernel level programming) uses array arguments because decay is inconsistent. The last time I saw the topic discussed it was in the context of Linus bollocking someone over it.

Having the out-of-bounds entry be zeroes in testing and garbage in real life is also a way it can pass in test but fail when deployed. Imagine it is a struct, and the function accessing it checks whether one value is true and just stops processing if the value is false. It wouldn't crash in testing, but once deployed, depending on the check, it could have a very high probability of crashing. A boolean check usually boils down to some kind of comparison with 0, so if the value is stored in 8 bits or even 32 bits then it's very likely to be nonzero.
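A toy simulation of that effect. It deliberately allocates 21 slots so the read is well-defined C, with slot 20 standing in for "whatever happens to sit after the 20-entry array"; all names are hypothetical. For scale: if the stray value were uniformly random (real garbage usually isn't), an 8-bit slot would be zero 1 time in 256 and a 32-bit slot 1 time in 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOGICAL_COUNT 20

/* Simulation only: 21 slots are really allocated so the read below is
 * well-defined; slot 20 plays the memory just past the array. */
struct simulated_memory {
    uint32_t slots[LOGICAL_COUNT + 1];
};

/* Hypothetical consumer that keeps processing only when the extra slot
 * is non-zero, mirroring the boolean check described above. */
bool keeps_processing(const struct simulated_memory *m) {
    assert(m != NULL);
    return m->slots[LOGICAL_COUNT] != 0;
}

/* Fills every slot, including the stand-in neighbour, with one byte value. */
void fill_memory(struct simulated_memory *m, unsigned char byte) {
    assert(m != NULL);
    memset(m->slots, byte, sizeof m->slots);
}
```

Filled with zeroes, as a fresh test environment often is, the consumer stops early and nothing bad happens; filled with any non-zero pattern, it carries on into the path that would fault.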