Wednesday, December 16, 2009

Driver to Service communication- Don’t reinvent the wheel!

Most of us have at one time or another implemented "something" for communication between a user mode service and a kernel mode driver. Things are pretty straightforward when the communication is triggered from the service. For example, the service wants to send a list of files to be excluded from whatever magic the driver is doing. Now, let's talk about communication originating from the driver. What then? Those who have implemented something like this know that it is not as straightforward as the previous case. Of course, if you are writing a file system minifilter, then you are good. Just call FltSendMessage and you are done.

So, let's see if we can make use of the filter manager's communication port functionality to send and receive messages in our driver even if our driver is not a FS minifilter. Let's try to divide the functionality provided by filter manager in two categories: FS filtering and communication. If you observe carefully, both of them are independent of each other. However, the initial call to register the driver with the filter manager does check some registry entries that are specific to a FS minifilter; for example the altitude. Anyways, I tried to write a driver that received process notifications using the Process manager callbacks. My objective was to be able to receive the callback and send the information to the user mode service. So, I started off by using the INF of nullfilter minifilter sample.

Note that the driver that I tried to write was a "no stack" driver- meaning that it was not attached to network stack or storage stack, etc. Still need to figure out if that will work. Anyways, coming back to the driver, I cooked up the FLT_REGISTRATION structure in the following fashion:

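CONST FLT_REGISTRATION registrationData =
{
    sizeof (FLT_REGISTRATION),   // Size
    FLT_REGISTRATION_VERSION,    // Version
    0,                           // Flags
    NULL,                        // ContextRegistration
    NULL,                        // OperationRegistration (no I/O callbacks)
    FooUnload,                   // FilterUnloadCallback
    NULL,                        // InstanceSetupCallback
    NULL,                        // InstanceQueryTeardownCallback
    NULL,                        // InstanceTeardownStartCallback
    NULL,                        // InstanceTeardownCompleteCallback
    NULL,                        // GenerateFileNameCallback
    NULL,                        // NormalizeNameComponentCallback
    NULL,                        // NormalizeContextCleanupCallback
};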


If you observe carefully, the only non-NULL member in this structure (apart from the first two) is FooUnload, where I have registered the unload routine. The idea of using this kind of registration structure was to tell the filter manager that we are not interested in any FS I/O callbacks (create, read, write, etc.) or in attaching to any volumes. Note that the 7th member (used to specify the instance setup callback) is NULL, but this does not mean that the filter manager will not create instances of this filter for the volumes. To tell the filter manager not to attach to any volume, you need to register an instance setup callback and return STATUS_FLT_DO_NOT_ATTACH from it every time it is called. I had other reasons for letting the filter manager create the instances for me, so I set it to NULL. The next interesting thing to note is that you must specify an unload routine in the registration structure. And this is where things change a bit. The signature of the unload routine for a regular driver is different from that of a minifilter. So, you need to use the signature required by the filter manager model and simply do all the required cleanup in this unload routine.

Next step is to call FltRegisterFilter which will register the driver with the filter manager. And then you can simply create the communication port using the FltCreateCommunicationPort call. And then? Simply start firing requests as you do in the minifilter.

Couple of points to ponder and explore:

  1. What altitude should we ask from MS for this kind of a filter? We are not going to be layered between other drivers in the FS stack. Well, it doesn't really matter what altitude you get, because altitudes only define the order in which you see I/O requests relative to other filters. So, simply ask for an altitude in the activity monitor range.
  2. What happens if we want to attach to a storage stack? The filter manager might not be available that soon. Well, think of it this way: you need to communicate with your service only when the service is up. Right? So, you can delay registering your driver with the filter manager by "somehow" detecting that the user mode environment is ready. Or, if it is acceptable for your driver's functionality, you can simply put a dependency on fltmgr.sys in your driver's registry entry.
  3. Implications of LoadOrderGroup and driver type.

Will try out more experiments for the above points and post them soon.

However, there is an alternate way of achieving this. Make a common communication driver that implements this functionality and exports various communication functions to be used by the other drivers that you develop. And I like this one. This export driver can load after the filter manager is up. But your driver can load even before that. And anyways, you can be sure that your communication driver is up when you "actually" want to communicate with your service. Just remember to set the start type of your export driver appropriately.

Thursday, April 2, 2009

FltLockUserBuffer locks the buffer in CORRECT process context- HOW?

The WDK documentation says "The caller can be running in any process context. FltLockUserBuffer automatically locks the buffer in the correct process context." Remember that in our legacy filters we used to call MmProbeAndLockPages in the correct process context to lock the pages? Well, the fact is that we can do the same thing that FltLockUserBuffer does, to lock the pages properly, no matter in which process context we are.

Anyways, let's see how FltLockUserBuffer locks the buffer of a process properly even if it is called from a different process context.

Let's see a portion of the disassembly for FltLockUserBuffer and it becomes obvious as to how it is able to lock the pages in correct process context.


 

f84cc355 ff15b4df4cf8    call dword ptr [fltMgr!_imp__IoThreadToProcess (f84cdfb4)]
f84cc35b 50              push eax
f84cc35c ff36            push dword ptr [esi]
f84cc35e ff1568df4cf8    call dword ptr [fltMgr!_imp__MmProbeAndLockProcessPages (f84cdf68)]
f84cc364 834dfcff        or dword ptr [ebp-4],0FFFFFFFFh
f84cc368 807de700        cmp byte ptr [ebp-19h],0


 

In the disassembly, you can see 2 functions that make things obvious: IoThreadToProcess and MmProbeAndLockProcessPages.

So, what does each function do? Well, IoThreadToProcess returns a PEPROCESS given a PETHREAD. And if you remember, the FLT_CALLBACK_DATA structure already has a member 'Thread' which identifies the thread that initiated the I/O. So, from this thread, the target process is found using the IoThreadToProcess function. The next function is MmProbeAndLockProcessPages. As the name suggests, it locks the pages of a particular process.

Let's see a portion of the disassembly of MmProbeAndLockProcessPages.

8059c5d8 e85bb0f5ff        call nt!KeStackAttachProcess (804f7638)
8059c5dd c745e401000000    mov dword ptr [ebp-1Ch],1
8059c5e4 8975fc            mov dword ptr [ebp-4],esi
8059c5e7 ff7514            push dword ptr [ebp+14h]
8059c5ea ff7510            push dword ptr [ebp+10h]
8059c5ed ff7508            push dword ptr [ebp+8]
8059c5f0 e8c19bf6ff        call nt!MmProbeAndLockPages (805061b6)


 

As you can see, MmProbeAndLockProcessPages internally calls KeStackAttachProcess to attach to the target process for which the pages have to be locked. Once it gets attached, it then calls MmProbeAndLockPages to lock the pages!


 

Simple and sweet! :)

Monday, March 23, 2009

Vectored Exception Handling

Vectored exception handling (VEH) provides a generic mechanism so that the developer does not have to rely on language-dependent keywords for exception handling. Windows provides the following APIs to use VEH:

  1. AddVectoredExceptionHandler
  2. RemoveVectoredExceptionHandler
  3. AddVectoredContinueHandler
  4. RemoveVectoredContinueHandler

All the vectored exception handlers are called before the structured exception handlers (SEH). A programmer can choose to be first or last in the chain of vectored handlers when calling AddVectoredExceptionHandler. The exception handlers can return EXCEPTION_CONTINUE_EXECUTION or EXCEPTION_CONTINUE_SEARCH depending on whether they have fixed the problem or not. The VEH mechanism also allows the programmer to add a continue handler, which is called in case of unhandled exceptions. The continue handler is called after the global unhandled exception filter returns EXCEPTION_CONTINUE_SEARCH.

A VEH example can be seen in the MSDN. For more details on VEH, check http://msdn.microsoft.com/en-us/magazine/cc301714.aspx

Sunday, February 22, 2009

Injecting Code using CreateRemoteThread.

We all know that we try to do those things that we are barred from… :) A normal human tendency! Or maybe a weird human tendency!

So let's first put down what we want to achieve:

  1. We want a way to inject some code in a process (of course I am not talking about our own process).
  2. How do we make it execute?

One line answers:

  1. We can load a DLL in the process to inject the code.
  2. We need to create a thread which will execute that piece of code.


 

Let's enumerate as to what we have that we can make use of:

  1. CreateRemoteThread API to create a thread in a process.
  2. LoadLibrary API to load a DLL.


 

CreateRemoteThread has a parameter which takes the address of the function to be executed when the thread starts. Can we somehow combine points 1 & 2 enumerated above? Yes, we can pass the address of LoadLibrary in CreateRemoteThread. But there is a catch! Don't just write LoadLibrary in the following manner:

CreateRemoteThread( …, LoadLibrary, …);

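This is incorrect. You need to get the address of LoadLibrary in kernel32.dll using GetProcAddress and pass that address in CreateRemoteThread. Note that LoadLibrary itself is a macro, so the name you ask GetProcAddress for is LoadLibraryA or LoadLibraryW.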

Next step is to tell LoadLibrary the name of the DLL which needs to be loaded. But remember one thing, you can't use a string allocated in your process in the process where the DLL needs to be loaded. So you need to allocate memory in the target process using VirtualAllocEx. And then copy the path of the DLL in the allocated space using WriteProcessMemory.

Up to this point we have discussed how to inject the code, by using CreateRemoteThread, LoadLibrary, GetProcAddress, VirtualAllocEx and WriteProcessMemory.

But where should we place the code in the DLL so that it gets executed? After all, if you make some functions in the DLL and export them, the target process will not call them anyways. Let's think of the first function that gets called in the DLL when it is loaded. It is DllMain with notification type DLL_PROCESS_ATTACH. You can place the code to be executed in DllMain under the case for DLL_PROCESS_ATTACH.

Simple! Isn't it? But this method faces a problem in Vista. This is because the CreateRemoteThread API does not work across sessions. So all processes that run in a different session cannot be targeted using this method.

Monday, February 16, 2009

System call hooking - I

In this post I will be telling you about system call hooking by patching the System Service Descriptor Table (SSDT). But before that let's see what SSDT actually is, who uses it and other BLAH! BLAH!

SSDT is created during the initialization of NTOSKRNL and is used by KiSystemService() to look up entry points of native APIs. KiSystemService is the handler for INT 2Eh/ SYSENTER. Now let us take a look at the structure of this table.

typedef struct SERVICE_DESCRIPTOR_TABLE {
    PNTPROC pServiceTable;    // Array of entry points
    PULONG  pdwCounterTable;  // Array of usage counters
    ULONG   dwServiceLimit;   // No. of table entries
    PUCHAR  pArgumentTable;   // Array of byte counts
} SERVICE_DESCRIPTOR_TABLE, *PSERVICE_DESCRIPTOR_TABLE;

There are two service descriptor tables present in the system: KeServiceDescriptorTable and KeServiceDescriptorTableShadow. Ntoskrnl exports the KeServiceDescriptorTable but not the shadow one. Both of them are defined as an array of SERVICE_DESCRIPTOR_TABLE. Something like SERVICE_DESCRIPTOR_TABLE KeServiceDescriptorTable[NoOfTables].

In both KeServiceDescriptorTable and KeServiceDescriptorTableShadow, the first table (KeServiceDescriptorTable[0] and KeServiceDescriptorTableShadow[0]) contains addresses of Nt APIs.

Let's suppose that we have to hook NtCreateFile. Obviously, we need a mechanism to replace the address of the entry point of NtCreateFile with address of our hook function. Let's do it step by step.

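Step 1: Finding the entry for any given function in the SSDT (KeServiceDescriptorTable). The entry for any function 'func' is given by KeServiceDescriptorTable[0].pServiceTable[*(PULONG)((PUCHAR)func + 1)]. This works on 32 bit x86 because 'func' here is the exported Zw stub (for example ZwCreateFile), which begins with a one-byte "mov eax, <service index>" opcode, so the four bytes at offset 1 are the index into the table.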

Step 2: Read the value present in the entry corresponding to NtCreateFile and save it.

Step 3: Change the value for that entry with address of your hook function.

Apart from the above steps, you need to clear the write protect (WP) bit in CR0 before writing to the table, and set it again afterwards.
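
No kernel patching code here, but the index lookup in Step 1 is easier to see on a toy model. The following is a user-mode sketch in plain C, assuming a stub that starts with the one-byte B8 opcode (mov eax, imm32). The stub bytes, the index 0x25, the table size and all function names are made up for illustration; a plain array stands in for pServiceTable.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*service_fn)(void);

/* Hypothetical stand-ins for the original service and the hook. */
static int real_service(void) { return 1; }
static int hook_service(void) { return 2; }

/* Step 1: read the little-endian 32-bit immediate that follows the B8
   opcode at the start of the stub. Returns -1 for any other opcode. */
static int64_t service_index_from_stub(const uint8_t *stub)
{
    if (stub[0] != 0xB8)
        return -1;
    uint32_t index = (uint32_t)stub[1]
                   | ((uint32_t)stub[2] << 8)
                   | ((uint32_t)stub[3] << 16)
                   | ((uint32_t)stub[4] << 24);
    return (int64_t)index;
}

/* Steps 2 and 3: save the current entry and put the hook in its place.
   Returns the saved entry, or NULL if the index is out of range. */
static service_fn swap_entry(service_fn *table, size_t limit,
                             int64_t index, service_fn hook)
{
    if (index < 0 || (uint64_t)index >= limit)
        return NULL;
    service_fn saved = table[index];
    table[index] = hook;
    return saved;
}
```

The range check against the table size plays the role of dwServiceLimit in the real structure.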

I intentionally did not put the code here! :)

Sunday, February 15, 2009

Hooking has its disadvantages!

I actually wanted to write on how to hook the SSDT. But then I realized: why not first warn you why you should not do a thing, and only then go on to tell you how to do that wrong thing? Many of us think that once we hook the SSDT we become the master and can do almost anything on the system. But at what cost do we do this? Do we go on to make the system unstable? Or does it not really make a difference because we get our work done, and as others already say that Windows crashes anyways, so why should I care much? In my opinion, the stability of the system should not be compromised. And hooking makes the system unstable.

I will tell you two main things that I can think of because of which you should not hook:

  1. The most obvious reason that comes to our mind is: Hooking is not supported on 64 bit Windows.
  2. Drivers that hook the SSDT cannot unload.
  • If your driver unloads while some function in your driver is still executing, you can be sure of a BSOD. Assume that a call coming from ntdll.dll reaches your hook function in the driver, and your hook then calls the original function. Let's say that before the original function returns to your hook function, your driver unloads. The pages that previously held your driver's code are no longer valid code. And when the original function returns into them, you will get a BSOD.
  • The chaining of multiple hooks can cause problems. Let's say a driver hooks a function 'F' and changes the entry in SSDT for function 'F' to point to a hook function 'H1' and saves the address of original function 'F'. Let's say a second driver now comes and hooks the same function 'F'. This time, however, SSDT will contain the address of 'H1'. The second driver reads that address and saves it and then goes on to set the address of its own hook function 'H2' in the SSDT. Till now everything is fine. But what if the first driver decides to unload? It will simply restore the address that it had previously read and saved. Now, SSDT will once again point to the original function 'F'. One problem that comes up is that the hook of the second driver stops getting called. The second (and more serious) problem arises when the second driver decides to unload after the first driver. It patches the SSDT back with what it had read and saved. So, it sets the address of 'H1' in SSDT. Now when the call for function 'F' comes, the OS tries to call function 'H1' because SSDT contains its address. But the function 'H1' does not exist anymore and hence you get a BSOD.
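
The chaining problem can be replayed on a toy model. The following is a user-mode sketch in plain C, not kernel code: a single function pointer stands in for the SSDT entry of 'F', and every name in it (toy_driver, toy_hook, toy_unload, slot_is_dangling) is made up for illustration. The hooks do not forward to the saved address, since only the save and restore order matters here.

```c
#include <stddef.h>

typedef int (*service_fn)(int);

static int original_f(int x) { return x; }      /* function 'F' */
static int hook_h1(int x)    { return x + 1; }  /* first driver's hook 'H1' */
static int hook_h2(int x)    { return x + 2; }  /* second driver's hook 'H2' */

/* One table entry, standing in for the SSDT slot of 'F'. */
static service_fn ssdt_slot = original_f;

typedef struct {
    service_fn hook;   /* this driver's hook routine */
    service_fn saved;  /* whatever was in the slot when it hooked */
    int loaded;        /* 1 while the driver's code is still present */
} toy_driver;

static void toy_hook(toy_driver *d)
{
    d->saved = ssdt_slot;   /* read and save the current entry */
    ssdt_slot = d->hook;    /* install our own hook */
    d->loaded = 1;
}

static void toy_unload(toy_driver *d)
{
    ssdt_slot = d->saved;   /* blindly restore what was saved */
    d->loaded = 0;
}

/* Returns 1 if the slot points at a hook whose driver has unloaded. */
static int slot_is_dangling(const toy_driver *drivers, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (ssdt_slot == drivers[i].hook && !drivers[i].loaded)
            return 1;
    return 0;
}
```

Hooking with the first driver and then the second, and unloading in that same order, leaves the slot pointing at hook_h1 after its driver is gone. That is the state in which the real system would bugcheck.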

Having said this, I will now go on to tell you how to actually patch the SSDT in a new post.


 

Saturday, February 14, 2009

I don’t want to play in ‘Safe mode’

Safe mode is the state of the Windows OS in which it is expected to run in case you face any problems while running in normal mode. The Windows OS is selective in starting the drivers in safe mode. Except for the boot time drivers, it chooses not to start any driver when a user chooses to boot the OS in safe mode. But what if you have a boot time driver? How do you prevent your driver from being loaded?

Actually it is pretty simple to detect in your driver whether the system is being booted in safe mode. The kernel exports a variable of type PULONG called InitSafeBootMode. Just declare the variable as extern PULONG InitSafeBootMode. In your driver, you can check the value it points to (*InitSafeBootMode). If it is 0, the system has NOT been booted in safe mode. A value greater than 0 indicates that the system is booting in safe mode. There are three types of safe mode boot: Minimal (SAFEBOOT_MINIMAL), Network (SAFEBOOT_NETWORK) and DS Repair (SAFEBOOT_DSREPAIR).

It is always a good practice to check this value in your driver and decide whether you are prepared to run in safe mode or not.
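
One way to keep that decision in a single place is a small policy function. The following is only a sketch in plain C: the three constants mirror the values the WDK headers define under the SAFEBOOT_ names, while the start_decision type, the decide_start function and the policy itself (run reduced in network safe mode, stay out otherwise) are assumptions made up for illustration. In a driver, the argument would be *InitSafeBootMode.

```c
/* Same values as the WDK definitions; repeated here so the sketch
   builds outside the WDK. */
#define SAFEBOOT_MINIMAL   1
#define SAFEBOOT_NETWORK   2
#define SAFEBOOT_DSREPAIR  3

typedef enum {
    START_NORMALLY,   /* not in safe mode */
    START_REDUCED,    /* safe mode, but we still do a minimal job */
    DO_NOT_START      /* safe mode, fail DriverEntry instead */
} start_decision;

/* Hypothetical policy: only network safe mode is tolerated. */
static start_decision decide_start(unsigned long safe_boot_mode)
{
    switch (safe_boot_mode) {
    case 0:
        return START_NORMALLY;
    case SAFEBOOT_NETWORK:
        return START_REDUCED;
    default:
        return DO_NOT_START;
    }
}
```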

Queued Spin Locks- Spinning with discipline!

Most of you know about spin locks and yet many of you might not know about queued spin locks. Queued spin locks are available from Windows XP onwards. They are more efficient than the traditional spin locks. There are two advantages that queued spin locks offer:

  1. If multiple threads request the same lock, the threads are queued in the order of their request.
  2. Queued spin locks test and set a variable that is local to the current CPU, and hence generate less bus traffic.
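
Both points are the idea behind MCS-style queue locks. The following is a user-mode sketch of that general technique in portable C11, not the kernel's implementation; the names qlock, qnode, qlock_acquire and qlock_release are made up for illustration. Each waiter spins only on a flag in its own node, and the releaser hands the lock to the next node in the queue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-waiter node; plays the role the KLOCK_QUEUE_HANDLE plays for the caller. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

/* The lock itself is just a pointer to the last waiter in the queue. */
typedef struct {
    _Atomic(qnode *) tail;
} qlock;

static void qlock_init(qlock *l)
{
    atomic_init(&l->tail, NULL);
}

static void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind the previous waiter */
        while (atomic_load(&me->locked)) {
            /* spin on our own node only */
        }
    }
}

static void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL) {
            /* a waiter has swapped the tail but not linked itself yet */
        }
    }
    atomic_store(&succ->locked, false);            /* hand over in request order */
}
```

Because every waiter spins on its own node, a release touches only the next waiter's flag instead of a variable that all CPUs are polling.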


 

How to use queued spin locks?


 

KSPIN_LOCK            qsl;
KLOCK_QUEUE_HANDLE    qh;

KeInitializeSpinLock( &qsl );
KeAcquireInStackQueuedSpinLock( &qsl, &qh );

// Do some work… Don't take too long.
// Remember that you are holding a spin lock

KeReleaseInStackQueuedSpinLock( &qh );


 

As opposed to KeAcquireSpinLock, the caller of KeAcquireInStackQueuedSpinLock does not need to store the current IRQL. This is automatically stored in the KLOCK_QUEUE_HANDLE structure when a call to KeAcquireInStackQueuedSpinLock is made.

Ah! One more thing: Don't ever use KeAcquireSpinLock and KeAcquireInStackQueuedSpinLock on the same spin lock!

Do you really need to hook it?

An evergreen question: 'How should I hook...? I want to monitor/ block...' How many of us really look things up before asking this question? Hooking has been a classic old way of doing things, but times change and so do the techniques. Hooking is powerful, but it is bad. Use hooking when you have no other alternative. I have seen many people use hooking just out of ignorance; they don't know that a documented way exists to do the same thing.

Remember the classic old tool Regmon? The initial versions of Regmon used system call hooking. But then, as newer techniques with better features got introduced, Regmon started using them. Regmon uses system call hooking up to Windows XP and uses the registry callback mechanism from Windows 2003 onwards.


 

I like to give the example of registry filtering for making people understand when they should use hooking. Let's take 3 tasks:

  1. Monitoring registry calls.
  2. Blocking registry operations.
  3. Modifying data for a registry operation.


 

Let's first take the example of a person who has just started Windows programming and has become aware of the concept of hooking. The first thing that might come to his mind is that all three tasks are do-able by system call hooking of the registry operations. But is it really so? He needs to ask himself two things: Which versions of the OS (NT/ 2K/ XP/ 2K3, Vista/ 2K8) do I need to support, and on which platforms (x86/ x64/ IA64)? If that answer includes x64 or IA64, his idea of system call hooking has fallen apart by now, because system call hooking is not supported on 64 bit Windows. A special component of the OS called PatchGuard prevents system call hooking on 64 bit Windows. But be very sure that on NT and 2K you have no mechanism other than hooking for doing any of the three tasks.

Before we go any further, let me just tell you that there is a registry callback mechanism in Windows starting from Windows XP. But how useful it actually is depends on the functionality that we want to achieve. Anyways, let's take each task one by one.

  1. Monitoring registry calls: Just by the look of it, we feel that if a registry callback mechanism is present, the bare minimum support that it should provide is plain notification-like callbacks, with both a pre-operation callback and a post-operation callback. But life is not so good! The registry callback mechanism of Windows XP is like an engineering project that a student tries to get over with without thinking much about the required functionality. The point is that the registry callback mechanism in Windows XP does not even provide post callbacks for all operations. So what now? Does the developer have an option? Yes, of course. He should use system call hooking for Windows XP. But what about Windows XP x64? I have seen many people ask this question. But the fact is that Windows XP x64 is actually not Windows XP from the inside; it's just Windows XP from the outside. Windows XP x64 is built from the same source code as Windows 2003, and hence is different from Windows XP x86. So, all the functionality that Windows 2003 x64 has, Windows XP x64 also has. Now, what about Windows 2003, Vista and 2008? The registry callback mechanism in 2003 is better than in XP but not as good as in Vista. But this task of monitoring most of the registry calls can be achieved in both OSes by the callback mechanism.
  2. Blocking registry operations: As such the blocking of registry calls can be achieved by callback mechanism in XP, 2003 and Vista/ 2008. However, since the monitoring functionality itself cannot be achieved properly in XP, hooking still remains the option for XP. For 2003 and Vista/ 2008, the callback mechanism does a good job.
  3. Modifying data for a registry operation: This is one task that gets a little tricky and questionable. For XP, hooking still remains the technique for achieving this. For Vista/ 2008, the callback mechanism supports it properly. However, the problem comes for Windows 2003. You get stuck in both the techniques: The callback mechanism does not support modification of parameters (though the documentation said that the parameters could be modified. But that is wrong!) And if you choose hooking you won't be able to support Windows 2003 64 bit. Here comes a need for business decision! What does a customer require and what is do-able?

The above example is one of the simplest ones. However, in reality I have seen that some security products really need to hook. But again, they fail to provide that functionality in their 64 bit versions because the hooking mechanism fails there. Again, I have seen some people who do not want to do things in a documented fashion; they WANT to hook! This approach is not good. There can be an endless discussion on hooking vs. the documented way, but I believe that the decision should be taken based on the functionality required and the built-in features available from the OS to do it in a documented fashion. Hooking is interesting, but don't always put in a hook, because that may make you a crook!