Home / News / The case of the crash on a null pointer even though we checked it for null

The case of the crash on a null pointer even though we checked it for null

Here’s a sarcastic version of the provided text:
“A colleague was investigating a crash. The stack at the point of the crash looks like this:

Contoso!winrt::Impl::consume_Windows_Foundation_Collections_IVectorView
    contoso!winrt::Impl::consume_Windows_Foundation_Collections_IVectorView<
      winrt::Windows::Foundation::Collections::IVectorView<int>,
      int>::Size+0x30
    contoso!winrt::Impl::Contoso::implementation::Widget::
    InitializeNodesAsync$_ResumeCoro$1+0x2bc
    contoso!wil::details::coro::apartment_resumer::resume_apartment_callback+0x28
    combase!CRemoteUnknown::DoCallback+0x34
    combase!CRemoteUnknown::DoNonreentrantCallback+0x48
    rpcrt4!Invoke+0x64
    rpcrt4!InvokeHelper+0x130
    rpcrt4!Ndr64StubWorker+0x6cc
    rpcrt4!Ndr

A colleague was investigating a crash. The stack at the point of the crash looks like this:

contoso!winrt::impl::consume_Windows_Foundation_Collections_IVectorView<
            winrt::Windows::Foundation::Collections::IVectorView<int>,
            int>::Size+0x30
contoso!winrt::Contoso::implementation::Widget::
            InitializeNodesAsync$_ResumeCoro$1+0x2bc
contoso!wil::details::coro::apartment_resumer::resume_apartment_callback+0x28
combase!CRemoteUnknown::DoCallback+0x34
combase!CRemoteUnknown::DoNonreentrantCallback+0x48
rpcrt4!Invoke+0x64
rpcrt4!InvokeHelper+0x130
rpcrt4!Ndr64StubWorker+0x6cc
rpcrt4!NdrStubCall3+0xdc
combase!CStdStubBuffer_Invoke+0x6c
combase!ObjectMethodExceptionHandlingAction<⟦...⟧>+0x48
combase!DefaultStubInvoke+0x2b8
combase!SyncServerCall::StubInvoke+0x40
combase!StubInvoke+0x170
combase!ServerCall::ContextInvoke+0x3c4
combase!ReentrantSTAInvokeInApartment+0x1fc
combase!ComInvokeWithLockAndIPID+0xcc4
combase!ThreadDispatch+0x514
combase!ThreadWndProc+0x1b4
user32!UserCallWinProcCheckWow+0x180
user32!DispatchMessageWorker+0x130

If we look at the point of the fault:

0:001>  r
Last set context:
 x0=0000000000000000   x1=00000053af4fdf78   x2=0000017b4f7e6380   x3=0000000000000000
 x4=6597e92abf947185   x5=857194bf2ae99765   x6=857194bf2ae99765   x7=0000017b4f700000
 x8=00007ff85b11ce40   x9=0000000000000001  x10=0000006700000000  x11=0000000000000000
x12=fffffffffff00000  x13=0000000000000000  x14=0000000000000000  x15=000000000000000b
x16=00007ff8d4fd44a0  x17=ffff68553f08a1c6  x18=00007ff8d0bc0000  x19=0000017b4c834de0
x20=0000000000000000  x21=00007ff8d45cd188  x22=00007ff8d45cd188  x23=00007ff8d45cd150
x24=0000017b31196930  x25=00000053af4fdfa0  x26=00000053af4fe580  x27=0000000000000010
x28=0000000000000000   fp=00000053af4fdf90   lr=00007ff85af448cc   sp=00000053af4fdf60
 pc=00007ff85af448f0  psr=80001040 N--- EL0
contoso!winrt::impl::consume_Windows_Foundation_Collections_IVectorView<
    winrt::Windows::Foundation::Collections::IVectorView<int>,
    int>::Size+0x30:
00007ff8`5af448f0 f9400008 ldr         x8,[x0]

we see that we are crashing on a null pointer (x0).

The function is a C++/WinRT consume_, which is just a projection of the underlying COM call. The COM call is performed by reading the vtable pointer from the object, reading the function pointer from the vtable, and then calling the function.

The fact that we are reading from x0 (the inbound and outbound parameter slot) means that this is almost certainly reading the vtable pointer from the object: We want the COM pointer in x0 for the outbound call, so the obvious thing to do is to leave it there while you read the vtable pointer from it.

We can confirm this by reading the disassembly.

0:001> u .-30 .
contoso!winrt::impl::consume_Windows_Foundation_Collections_IVectorView<
    winrt::Windows::Foundation::Collections::IVectorView<int>,
    int>::Size+0x30:
00007ff8`5af448c0 stp         fp,lr,[sp,#-0x10]!   ; build stack frame
00007ff8`5af448c4 mov         fp,sp
00007ff8`5af448c8 bl          contoso!__security_push_cookie (00007ff8`5ab91050)
00007ff8`5af448cc sub         sp,sp,#0x20
00007ff8`5af448d0 mov         w8,#0x1E4
00007ff8`5af448d4 ldr         x0,[x0]              ; fetch the COM pointer
00007ff8`5af448d8 str         wzr,[sp,#0x18]
00007ff8`5af448dc str         w8,[sp]
00007ff8`5af448e0 adrp        x8,contoso!`string'+0x10 (00007ff8`5b11c000)
00007ff8`5af448e4 add         x8,x8,#0xE40
00007ff8`5af448e8 stp         x8,xzr,[sp,#8]
00007ff8`5af448ec add         x1,sp,#0x18
00007ff8`5af448f0 ldr         x8,[x0]              ; read the vtable

(The other code is recording the line number and file name for diagnostic purposes.)

Okay, so we are calling IVectorView<T>::Size on a null pointer.

Let’s see whose idea that is.

Here’s the caller:

winrt::IAsyncAction Widget::InitializeNodesAsync()
{
    auto lifetime = get_strong();
    std::optional<winrt::IVectorView<int32_t>> numbers;
    co_await winrt::resume_background();
    CallWithRetry([&] {
        numbers = GetMagicNumbers();
    });

    if (numbers == nullptr)
    {
        co_return;
    }

    co_await winrt::resume_foreground(m_uithread);

    std::vector<winrt::Node> nodes;
    nodes.reserve((*numbers).Size()); // ← CRASH HERE

The crash inside consume_ tells us that (*numbers) is null. Let’s see if we can confirm that in the debugger.

First, we have to find numbers.

0:001> dv /V
@x19              @x19        __coro_frame_ptr = 0x0000017b`4c834de0
00000000`00000088 @x25+0x0590         lifetime = struct winrt::com_ptr<
                                                     winrt::Contoso::implementation::Widget>
00000000`000000e8 @x25+0x0590            nodes = class std::vector<int>
00000000`000000d8 @x25+0x0590          numbers = class std::optional<winrt::Windows::Foundation::
                                                     Collections::IVectorView<int> >
⟦ ... ⟧

The debugger says that numbers is at @x25+0x0590, and that this calculates out to 00000000`000000d8, which is nonsense. So we can’t really trust that calculation.

Let’s see what the code uses. We disassemble backward from the return address.

0:001> k2
Child-SP          RetAddr           Call Site
00000053`af4fdf60 00007ff8`5b059c8c contoso!winrt::impl::consume_Windows_Foundation_Collections_IVectorView<
                                        winrt::Windows::Foundation::Collections::IVectorView<int>,
                                        int>::Size+0x30
00000053`af4fdfa0 00007ff8`5abc5108 contoso!winrt::Contoso::implementation::Widget::
                                        InitializeNodesAsync$_ResumeCoro$1+0x2bc

We read the return address from the function one deeper on the stack, giving us 00007ff8`5b059c8c.

00007ff8`5b059c84 add  x0,x19,#0xD8 // ← setting up the call to consume_
00007ff8`5b059c88 bl   contoso!winrt::impl::consume_Windows_Foundation_Collections_IVectorView<
                           winrt::Windows::Foundation::Collections::IVectorView<int>,
                           int>::Size
00007ff8`5b059c8c uxtw x1,w0

From the disassembly, we see that the compiler stored the IVector part of the numbers at offset 0xD8 from the coroutine frame, which is in x19.

We can pluck the coroutine frame from the dv output, or we can ask the debugger to restore the nonvolatile registers for us (which includes x19):

0:001> .frame /c 2
0:001> dps @x19+0xd8
0000017b4c834eb8  0000000000000000 // <<<<< the stored IVector 0000017b4c834ec0 0000017b4ff78400

We can ask the debugger for the layout of the std::optional so we can see the full numbers.

I copied the type name from the earlier dv output.

0:001> dt contoso!std::optional<winrt::Windows::Foundation::Collections::IVectorView<int> >
   +0x000 _Dummy           : std::_Nontrivial_dummy_type
   +0x000 _Value           : winrt::Windows::Foundation::Collections::IVectorView<int>
   +0x008 _Has_value       : Bool

Okay, so the value is kept at offset zero, and the _Has_value is at offset 8.

We can eyeball from the earlier dps command that the _Value is nullptr, and the _Has_value is false. (Little-endian means that the single bool is in the least significant byte of the 8-byte value.)

Or we can ask the debugger to interpret it for us.

0:001> dt contoso!std::optional<winrt::Windows::Foundation::Collections::IVectorView<int> > @x19+0xd8
   +0x000 _Dummy           : std::_Nontrivial_dummy_type
   +0x000 _Value           : winrt::Windows::Foundation::Collections::IVectorView<int>
   +0x008 _Has_value       : 0

Okay, so the numbers has no value.

But wait, our code checked for that!

        if (numbers == nullptr)
        {
            co_return;
        }

Why didn’t that work?

Because that’s not how std::optional works.

The std::optional is a sum type of T with a special value called std::nullopt. If you compare a std::optional against anything that isn’t std::nullopt, then you are checking if the std::optional has a value that matches the value you are comparing against.

std::optional holds compared with
std::nullopt Y
std::nullopt true false
X false X == Y

For the purposes of comparison, an empty std::optional is treated as if it had the value std::nullopt, which is a value distinct from the value values of T.

Therefore, writing if (numbers == nullptr) means “if numbers has a value that is equal to nullptr“.

But in this case, numbers has no value, so the comparison fails, and we fall through.

Then we dereference the *numbers, which is specified to retrieve the wrapped value, and it is undefined behavior if the numbers has no value.

In our case, the numbers indeed has no value, so we have entered the world of undefined behavior. In practice, what happens is that we read whatever is in _Value, and we saw in the debugger, that _Value holds a null pointer. We then try to call Size() on a null pointer and crash.

One fix is to change the test from if (numbers == nullptr) to if (!numbers.has_value()) to ask whether the numbers is empty.

But this is working too hard.

The use of std::optional*<T> was itself unnecessary. There is already a natural empty value for IVector, namely nullptr. So we can declare numbers as an IVector, which default-initializes to nullptr, and then check whether it is still nullptr after we try (and possibly fail) to get a value.

winrt::IAsyncAction Widget::InitializeNodesAsync()
{
    auto lifetime = get_strong();
    winrt::IVectorView<int32_t> numbers; // remove std::optional
    co_await winrt::resume_background();
    CallWithRetry([&] {
        numbers = GetMagicNumbers();
    });

    if (numbers == nullptr)
    {
        co_return;
    }

    co_await winrt::resume_foreground(m_uithread);

    std::vector<winrt::Node> nodes;
    nodes.reserve(numbers.Size()); // remove the *

    ⟦...⟧

This change also covers the case where GetMagicNumbers succeeds but returns a null pointer.

In practice, GetMagicNumbers never returns a null pointer because it knows that the empty set is not the same as no set at all. The original code was testing against something that never happens.

The post The case of the crash on a null pointer even though we checked it for null appeared first on The Old New Thing.

Tagged:

Leave a Reply

Your email address will not be published. Required fields are marked *