Azure Kinect DK Part 1: Displaying depth and IMU data with C#
The Azure Kinect DK is a developer kit touted as the ultimate artificial intelligence sensor for computer vision and speech models. Released in March 2020, it’s the successor to the older Kinect devices and has plenty of resources available for C developers, but not a lot of information out there for C# developers.
This series of tutorials aims to fill that void by showing you how to connect and utilise the sensor features progressively in 3 parts:
Part 1 (this tutorial): We’ll create a WPF app, import the Microsoft.Azure.Kinect.Sensor NuGet package, connect to the camera and output its sensor data before creating a composite view of the colour and depth cameras.
Part 2: We’ll add the Microsoft.Azure.Kinect.BodyTracking SDK, detect bodies and show joint information overlaid onto the colour camera output.
Part 3: We’ll connect the app to an Azure Custom Vision cloud instance and detect soft drink brands and their distance from the camera in real time.
Resources
All of the source code is available on GitHub and you can follow along with the video below.
Primer
There are a couple of quick concepts to discuss before we jump into the code. Firstly, the device itself and the sensor offset.
If you look at the image here you’ll see that the 12 MP RGB video camera (3) sits on the front-left of the device whilst the 1 MP depth sensor (1) sits perfectly centred on the device.
In order to combine the two outputs into a single video stream we need to do some offset transformations; luckily, the SDK gives us some handy tools for that.
Secondly, we’ll be using the accelerometer and gyroscope (4), but only to output their data; we won’t be using them in these tutorials for AI purposes. We also won’t be using the 7-microphone array (2). I may do a tutorial using AI speech models in future if there’s enough call for it. Hit me up on Twitter if you’re interested in AI with speech and the Kinect DK.
We’ll be creating a simple C# WPF app with an MVVM pattern, so if you’re unsure about data binding, dependency properties and the like, it might be worth brushing up by reading through Microsoft’s dependency properties overview documentation.
Finally, we’ll be getting the camera output in a background thread and marshaling it back to the UI thread using the UI SynchronizationContext so it might be worth having a look at that if you’re not familiar with multithreaded apps in WPF.
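If that pattern is new to you, here’s roughly the shape of it in isolation. This is a minimal sketch rather than part of the tutorial code (the class and method names are just for illustration):

using System;
using System.Threading;
using System.Threading.Tasks;

class SyncContextSketch
{
    // Call this from the UI thread so SynchronizationContext.Current is the UI context.
    public static void RunBackgroundWork()
    {
        SynchronizationContext uiContext = SynchronizationContext.Current;

        Task.Run(() =>
        {
            // Heavy work happens here, off the UI thread.
            string result = DateTime.Now.ToLongTimeString();

            // Send runs the delegate on the UI thread and blocks until it completes;
            // Post is the fire-and-forget alternative.
            uiContext.Send(_ => Console.WriteLine($"Back on the UI thread with: {result}"), null);
        });
    }
}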
Let’s set up the project and view
Fire up Visual Studio and create a new WPF App (.NET Core). Our app is going to have a simple 2x2 grid where the video output (Image) spans two rows in the first column, and the second column contains a ComboBox for our camera selection and a DataGrid for our IMU data output.
Open your MainWindow.xaml file and change the grid so that it matches the markup below:
<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto" />
        <RowDefinition Height="*" />
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="Auto" />
    </Grid.ColumnDefinitions>

    <Image Grid.Row="0" Grid.RowSpan="2" Grid.Column="0" MinWidth="300" MinHeight="300" />

    <StackPanel Margin="5" Grid.Row="0" Grid.Column="1">
        <Label Content="Camera source:" />
        <ComboBox />
    </StackPanel>

    <DataGrid Margin="5" Grid.Column="1" Grid.Row="1" />
</Grid>
Don’t worry that we’ve not bound any of the controls yet; we’ll do that shortly. Next we need to create a view model which will house our data and be responsible for communicating with the camera. Since we’re using an MVVM pattern we’ll make it implement INotifyPropertyChanged.
Create a new class named KinectViewModel and implement the interface. Your code should look something like this:
public class KinectViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;
}
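Throughout this tutorial we’ll raise PropertyChanged with explicit property-name strings to keep each step obvious. If you prefer less repetition, a common alternative (purely a convenience, nothing the Kinect SDK requires) is a small helper inside the view model that uses CallerMemberName; a sketch:

// Lives inside KinectViewModel; needs "using System.Runtime.CompilerServices;".
// The compiler fills in the calling member's name, so a property setter can
// simply call OnPropertyChanged() with no magic strings.
protected void OnPropertyChanged([CallerMemberName] string propertyName = null)
{
    PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}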
Setting up the camera selection combo box
The job of our ComboBox is to allow the user to switch between camera modes. We’ll use an enum in code and a simple class to represent the options. Add these to the bottom of your KinectViewModel.cs file:
public enum OutputType
{
    Colour,
    Depth,
    IR
}

public class OutputOption
{
    public string Name { get; set; }
    public OutputType OutputType { get; set; }
}
We need to set up some bindable properties for the camera selection ComboBox. Add the following to your view model:
private OutputOption _selectedOutput;
public OutputOption SelectedOutput
{
    get => _selectedOutput;
    set
    {
        _selectedOutput = value;
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("SelectedOutput"));
    }
}

public ObservableCollection<OutputOption> Outputs { get; set; }
We need to set our view model as the view’s data context and also instantiate the values for our ComboBox. Open up MainWindow.xaml.cs and add the following:
private readonly KinectViewModel _viewModel;

public MainWindow()
{
    InitializeComponent();

    _viewModel = new KinectViewModel();
    DataContext = _viewModel;
}
Now add a constructor to the KinectViewModel to set up the camera options:
public KinectViewModel()
{
    Outputs = new ObservableCollection<OutputOption>
    {
        new OutputOption { Name = "Colour", OutputType = OutputType.Colour },
        new OutputOption { Name = "Depth", OutputType = OutputType.Depth },
        new OutputOption { Name = "IR", OutputType = OutputType.IR }
    };

    SelectedOutput = Outputs.First();
}
Finally, update your ComboBox in the MainWindow.xaml file to bind to your new properties:
<ComboBox ItemsSource="{Binding Path=Outputs}" SelectedItem="{Binding Path=SelectedOutput}" DisplayMemberPath="Name" />

If you fire up the application now you’ll see that the ComboBox is working.
Connecting to the Kinect and starting background threads
Next up we need to connect to the Kinect. As I mentioned earlier, we’ll be getting information from the Kinect sensors on background threads, so that’s what we’ll set up next. In order to kick off these threads and set up the camera properly we’ll rely on the Loaded event of the main window, and we’ll use the Closed event to tidy up.
Add Loaded and Closed event handlers to the Window element in MainWindow.xaml:
<Window ... Loaded="Window_Loaded" Closed="Window_Closed">
Now in the main window we’re going to use those event hooks to call the StartCamera() and StopCamera() methods on the view model. Add the following code to MainWindow.xaml.cs:
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    _viewModel.StartCamera();
}

private void Window_Closed(object sender, EventArgs e)
{
    _viewModel.StopCamera();
}
Add the Microsoft.Azure.Kinect.Sensor SDK NuGet package to your project. We’re going to need four new methods in our view model for handling the camera:
StartCamera() - responsible for initialising and connecting to the Kinect sensor.
StopCamera() - responsible for tearing down the resources used and stopping the Kinect sensor.
ImuCapture() - responsible for querying the sensor information of the Kinect device on a background thread.
CameraCapture() - responsible for capturing the imagery from the device cameras on a background thread.
Create the stubs for these camera methods in your KinectViewModel.cs:
internal void StopCamera() { }

internal void StartCamera() { }

private void ImuCapture() { }

private void CameraCapture() { }
In order to retain state we’re going to need to keep hold of a few variables. The Azure Kinect DK has the ability to chain multiple sensors together to get a more holistic view of the surroundings; however, we’re only going to be working with one device in this tutorial, so we’ll keep a reference to that single Device instance.
Because we’ll be firing up two new threads for the IMU and camera data, tidying up those threads can get a bit messy, so they’ll loop only while a variable (_applicationIsRunning) is set to true. This allows us to stop the application from the main thread and give the background threads time to finish their current loop before closing the app.
In order to ensure we’re updating our dependency properties on the UI thread, we’ll also need to keep a reference to the SynchronizationContext for the UI thread.
Add the following variables to your view model:
private bool _applicationIsRunning = true;
private Device _device;
private SynchronizationContext _uiContext;
You’ll also want a using statement for the Kinect SDK:
using Microsoft.Azure.Kinect.Sensor;
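For reference, the rest of the view model code in this tutorial also leans on a handful of framework namespaces, so your using block will probably end up looking roughly like this (the bitmap-related imaging namespaces belong in the extension class file we create later):

using System;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Linq;
using System.Threading;                // SynchronizationContext
using System.Threading.Tasks;          // Task.Run
using System.Windows;                  // Application, MessageBox
using System.Windows.Media;            // ImageSource
using Microsoft.Azure.Kinect.Sensor;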
Now let’s jump into connecting to the sensor itself. Update your StartCamera() method as follows:
internal void StartCamera()
{
    if (Device.GetInstalledCount() == 0)
    {
        Application.Current.Shutdown();
    }

    _device = Device.Open();

    var configuration = new DeviceConfiguration
    {
        ColorFormat = ImageFormat.ColorBGRA32,
        ColorResolution = ColorResolution.R1080p,
        DepthMode = DepthMode.WFOV_2x2Binned,
        SynchronizedImagesOnly = true,
        CameraFPS = FPS.FPS30
    };

    _device.StartCameras(configuration);
    _device.StartImu();

    _uiContext = SynchronizationContext.Current;

    Task.Run(() => { ImuCapture(); });
    Task.Run(() => { CameraCapture(); });
}
This code first checks whether any devices are connected at all before opening the default device and retaining a reference to it with the _device = Device.Open() call.
Next we start the cameras with our desired configuration. We’re asking for the colour camera to run at 1080p (30 frames per second) with a 32-bit colour format, and for the depth and passive IR cameras to capture at 512x512 with a wide field of view. The SynchronizedImagesOnly field ensures that the images from both cameras are synchronised.
We’re also asking the device to start spitting out IMU data with the _device.StartImu() call, keeping a reference to the UI SynchronizationContext, and then running the ImuCapture() and CameraCapture() methods on background threads.
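For context, the depth mode and colour resolution are the main knobs you might want to turn later. A couple of alternative configurations are sketched below; the resolution and frame-rate figures are from the hardware specs as I understand them, so double-check them against the official documentation before relying on them:

// Narrower but denser depth (640x576), lighter colour stream.
var narrowConfig = new DeviceConfiguration
{
    ColorFormat = ImageFormat.ColorBGRA32,
    ColorResolution = ColorResolution.R720p,
    DepthMode = DepthMode.NFOV_Unbinned,
    SynchronizedImagesOnly = true,
    CameraFPS = FPS.FPS30
};

// Wide field of view at full depth resolution (1024x1024); this mode is
// limited to 15 frames per second.
var wideUnbinnedConfig = new DeviceConfiguration
{
    ColorFormat = ImageFormat.ColorBGRA32,
    ColorResolution = ColorResolution.R1080p,
    DepthMode = DepthMode.WFOV_Unbinned,
    SynchronizedImagesOnly = true,
    CameraFPS = FPS.FPS15
};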
Before we jump into our ImuCapture() method implementation, let’s finish off the StopCamera() implementation, which should set _applicationIsRunning to false, give the background threads a second to finish off, then stop the sensors and dispose of the device:
internal void StopCamera()
{
    _applicationIsRunning = false;
    Task.WaitAny(Task.Delay(1000));
    _device.StopImu();
    _device.StopCameras();
    _device.Dispose();
}
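The fixed one-second delay works, but it’s a guess. A slightly sturdier variant, sketched below, keeps the Task references returned by Task.Run in StartCamera() and waits on them with a timeout (the _imuTask and _cameraTask fields are hypothetical; they aren’t part of the tutorial code, and this version would replace the one above if you go this route):

internal void StopCamera()
{
    _applicationIsRunning = false;

    // _imuTask and _cameraTask are hypothetical fields holding the Tasks returned
    // by Task.Run in StartCamera(). Wait for both background loops to notice the
    // flag and exit, but give up after a couple of seconds rather than hanging.
    Task.WaitAll(new[] { _imuTask, _cameraTask }, TimeSpan.FromSeconds(2));

    _device.StopImu();
    _device.StopCameras();
    _device.Dispose();
}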
Collecting IMU data
The DataGrid we created earlier will host our IMU output. To feed it we’re going to need a class to hold the data and a new bindable collection on our view model. Create a new class to hold each row of IMU data:
public class CameraDetailItem : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    public string Name { get; set; }

    private string _value;
    public string Value
    {
        get => _value;
        set
        {
            _value = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("Value"));
        }
    }
}
Now add an observable collection to your ViewModel:
public ObservableCollection<CameraDetailItem> CameraDetails { get; set; } = new ObservableCollection<CameraDetailItem>();
We’re basically going to take every piece of data from the IMU sample, insert it into our observable collection if it doesn’t already exist, and update its value. Let’s create a helper method to do that, which will only add to the observable collection on the UI thread but will always invoke the PropertyChanged event to tell our view to update itself:
private void AddOrUpdateDeviceData(string key, string value)
{
    var detail = CameraDetails.FirstOrDefault(i => i.Name == key);

    if (detail == null)
    {
        detail = new CameraDetailItem { Name = key, Value = value };
        _uiContext.Send(x => CameraDetails.Add(detail), null);
    }

    detail.Value = value;
    PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("CameraDetails"));
}
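As a quick sanity check of the helper, you could also surface a static detail or two, for example the device serial number, which (if memory serves) the SDK exposes as Device.SerialNum:

// One-off detail, reported once after the device has been opened and _uiContext
// has been set (e.g. at the end of StartCamera()).
AddOrUpdateDeviceData("Serial number", _device.SerialNum);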
Now let’s complete our ImuCapture() method. The method needs to call the device’s GetImuSample() and update each of the values we want to report on using our helper method. We’ll also round off a few of the values so that the display isn’t rapidly changing with long float values:
private void ImuCapture()
{
    while (_applicationIsRunning)
    {
        try
        {
            var imu = _device.GetImuSample();

            AddOrUpdateDeviceData("Accelerometer: Timestamp", imu.AccelerometerTimestamp.ToString(@"hh\:mm\:ss"));
            AddOrUpdateDeviceData("Accelerometer: X", (Math.Round(imu.AccelerometerSample.X, 1)).ToString());
            AddOrUpdateDeviceData("Accelerometer: Y", (Math.Round(imu.AccelerometerSample.Y, 1)).ToString());
            AddOrUpdateDeviceData("Accelerometer: Z", (Math.Round(imu.AccelerometerSample.Z, 1)).ToString());
            AddOrUpdateDeviceData("Gyro: Timestamp", imu.GyroTimestamp.ToString(@"hh\:mm\:ss"));
            AddOrUpdateDeviceData("Gyro: X", (Math.Round(imu.GyroSample.X, 0)).ToString());
            AddOrUpdateDeviceData("Gyro: Y", (Math.Round(imu.GyroSample.Y, 0)).ToString());
            AddOrUpdateDeviceData("Gyro: Z", (Math.Round(imu.GyroSample.Z, 0)).ToString());
            AddOrUpdateDeviceData("Temperature: ", (Math.Round(imu.Temperature, 1)).ToString());
        }
        catch (Exception ex)
        {
            _applicationIsRunning = false;
            MessageBox.Show($"An error occurred: {ex.Message}");
        }
    }
}
The last thing we need to do is update our DataGrid in the XAML to use our new collection as its data source. Update your DataGrid to the following:
<DataGrid Margin="5" Grid.Column="1" Grid.Row="1" ItemsSource="{Binding CameraDetails}"/>
Getting the colour and IR camera output
The next thing we need to do is start getting output from our cameras. We’re going to rely on the user’s ComboBox selection to determine which camera output to display, and we’ll update an ImageSource property on each loop of our CameraCapture() method.
Start by creating the image source in the ViewModel:
private ImageSource _bitmap;
public ImageSource CurrentCameraImage => _bitmap;
It doesn’t need a change-notifying setter at this point, as we’ll be raising the PropertyChanged event for it ourselves inside CameraCapture().
In our CameraCapture() routine we’ll be calling the device’s GetCapture() method, which returns a Capture object consisting of three sensor Image objects containing the buffered pixels and metadata for the IR, colour and depth cameras.
It’s important to note that this Image object isn’t the same as the image classes in System.Drawing or System.Windows.Media.Imaging, so we’ve got a bit of work to do to convert it into something usable in our view. Let’s create a helper class to convert the SDK Image object to a BitmapSource for our view:
public static class Extensions
{
    public static BitmapSource CreateBitmapSource(this Image image, double dpiX = 300, double dpiY = 300)
    {
        PixelFormat pixelFormat;

        using (Image reference = image.Reference())
        {
            switch (reference.Format)
            {
                case ImageFormat.ColorBGRA32:
                    pixelFormat = PixelFormats.Bgra32;
                    break;
                case ImageFormat.Depth16:
                case ImageFormat.IR16:
                    pixelFormat = PixelFormats.Gray16;
                    break;
                default:
                    throw new AzureKinectException($"Pixel format cannot be converted to a BitmapSource");
            }

            // BitmapSource.Create copies the unmanaged memory, so there is no need to keep
            // a reference after we have created the BitmapSource objects.
            unsafe
            {
                using (var pin = reference.Memory.Pin())
                {
                    BitmapSource source = BitmapSource.Create(
                        reference.WidthPixels,
                        reference.HeightPixels,
                        dpiX,
                        dpiY,
                        pixelFormat,
                        /* palette: */ null,
                        (IntPtr)pin.Pointer,
                        checked((int)reference.Size),
                        reference.StrideBytes);

                    return source;
                }
            }
        }
    }
}
Note that the extension uses an unsafe block, so you’ll need to enable unsafe code in your project for it to compile (Allow unsafe code in the project properties, or AllowUnsafeBlocks in the .csproj). Now that we have a helper method, we can create the loop for our CameraCapture() method, which runs on a background thread:
private void CameraCapture()
{
    while (_applicationIsRunning)
    {
        try
        {
            using (var capture = _device.GetCapture())
            {
                switch (SelectedOutput.OutputType)
                {
                    case OutputType.Depth:
                        PresentDepth(capture);
                        break;
                    case OutputType.IR:
                        PresentIR(capture);
                        break;
                    case OutputType.Colour:
                    default:
                        PresentColour(capture);
                        break;
                }

                PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("CurrentCameraImage"));
            }
        }
        catch (Exception ex)
        {
            _applicationIsRunning = false;
            MessageBox.Show($"An error occurred: {ex.Message}");
        }
    }
}
Just a quick rundown of this code. Again, while the application is running we get the Capture object by calling _device.GetCapture(). We then check which video output is currently selected in our ComboBox before passing the capture to a specific method for each type. Finally, the PropertyChanged event is always invoked to update our UI.
Our colour and IR images are quite straightforward: we simply need to convert them to a BitmapSource and we can use them straight away. Go ahead and add the two presentation methods below, along with a stub for the PresentDepth() method, which we’ll complete shortly:
private void PresentColour(Capture capture)
{
    _uiContext.Send(x =>
    {
        _bitmap = capture.Color.CreateBitmapSource();
        _bitmap.Freeze();
    }, null);
}

private void PresentIR(Capture capture)
{
    _uiContext.Send(x =>
    {
        _bitmap = capture.IR.CreateBitmapSource();
        _bitmap.Freeze();
    }, null);
}

private void PresentDepth(Capture capture)
{
}
As you can see, we simply update our bitmap source on the UI thread in both methods using the extension method we created earlier. We just need to wire up the image source in our XAML and we should be able to start seeing colour and IR images. Change your Image element in MainWindow.xaml as follows:
<Image Source="{Binding CurrentCameraImage}" Grid.Row="0" Grid.RowSpan="2" Grid.Column="0" MinWidth="300" MinHeight="300"/>
Final step - outputting depth
The final (and best) step is to combine the depth camera output with the colour camera output to create a colour-coded video image whose colour changes depending on distance from the camera.
I briefly mentioned earlier that to do this we need to do some transformations on the captured image. Luckily for us, the SDK gives us a class that helps with this: the Transformation class. It has methods for converting the colour image to the depth camera perspective and vice versa, as well as methods for converting the depth image to a point cloud and to a custom image type, which we’ll use in the next tutorial for skeleton tracking.
Go ahead and create a couple more variables in your ViewModel for retaining the bits we need:
private Transformation _transformation;
private int _colourWidth;
private int _colourHeight;
We need to capture these references when we start the camera. In our StartCamera() method, right after the call to _device.StartImu(), add the following code:
var calibration = _device.GetCalibration(configuration.DepthMode, configuration.ColorResolution);
_transformation = calibration.CreateTransformation();
_colourWidth = calibration.ColorCameraCalibration.ResolutionWidth;
_colourHeight = calibration.ColorCameraCalibration.ResolutionHeight;
Now all we need to do is combine the depth and colour capture images in the PresentDepth() method we stubbed out earlier. We’re going to loop through every pixel in the output and shade it depending on its distance from the camera:
We’ll show the full colour pixel if it’s less than 1 metre away.
If the pixel is between 1m-1.2m we’ll shade it red,
1.2m-1.5m we’ll shade it green,
1.5m - 2m we’ll shade it blue,
over 2m we’ll black out the pixel.
The full code for this method is below; I’ll step through it afterwards:
using (Image outputImage = new Image(ImageFormat.ColorBGRA32, _colourWidth, _colourHeight))
using (Image transformedDepth = new Image(ImageFormat.Depth16, _colourWidth, _colourHeight, _colourWidth * sizeof(UInt16)))
{
    // Transform the depth image to the colour camera perspective.
    _transformation.DepthImageToColorCamera(capture, transformedDepth);

    _uiContext.Send(x =>
    {
        // Get the transformed pixels (colour camera perspective but depth pixels).
        Span<ushort> depthBuffer = transformedDepth.GetPixels<ushort>().Span;

        // Colour camera pixels.
        Span<BGRA> colourBuffer = capture.Color.GetPixels<BGRA>().Span;

        // What we'll output.
        Span<BGRA> outputBuffer = outputImage.GetPixels<BGRA>().Span;

        // Create a new image with data from the depth and colour image.
        for (int i = 0; i < colourBuffer.Length; i++)
        {
            // Start with the colour pixel; it stays untouched if the depth is less than 1 metre.
            outputBuffer[i] = colourBuffer[i];
            var depth = depthBuffer[i];

            if (depth == 0) // No depth data for this pixel.
            {
                outputBuffer[i].R = 0;
                outputBuffer[i].G = 0;
                outputBuffer[i].B = 0;
            }

            if (depth >= 1000 && depth < 1200) // 1m - 1.2m: shade red.
            {
                outputBuffer[i].R = Convert.ToByte(255 - (255 / (depth - 999)));
            }

            if (depth >= 1200 && depth < 1500) // 1.2m - 1.5m: shade green.
            {
                outputBuffer[i].G = Convert.ToByte(255 - (255 / (depth - 1199)));
            }

            if (depth >= 1500 && depth < 2000) // 1.5m - 2m: shade blue.
            {
                outputBuffer[i].B = Convert.ToByte(255 - (255 / (depth - 1499)));
            }

            if (depth >= 2000) // Over 2m: black out the pixel.
            {
                outputBuffer[i].Value = 0;
            }
        }

        _bitmap = outputImage.CreateBitmapSource();
        _bitmap.Freeze();
    }, null);
}
Whoa, ok, that’s a lot I know. Let’s break it down…
First we prepare two new images in our using statement:
outputImage is the image we’ll create the bitmap source from; it has the same dimensions as the colour camera output.
transformedDepth is the image we’re going to transform the raw depth data into, so that the depth pixels line up with the colour camera’s dimensions.
Next we perform the transform on the depth image using our transformation object with the DepthImageToColorCamera() method.
Using _uiContext.Send() we jump back onto the UI thread and create three memory spans (a memory-safe representation of a contiguous region of memory):
depthBuffer - each element contains the depth, in millimetres, of the corresponding colour image pixel, or 0 when the depth isn’t known.
colourBuffer - each element is a pixel copied directly from the colour camera output.
outputBuffer - our memory-safe access to the outputImage object, which we’ll fill with pixels from the colourBuffer, manipulated according to the depth from the depthBuffer.
Now in our for loop we step through every pixel in the colour camera output, copying it to our outputBuffer.
We pull the corresponding transformed depth camera pixel value into a variable named depth which represents the distance in millimetres.
Because of the offset of the camera lenses, some pixels will have a depth value of 0, so we set the colour of the outputBuffer pixel to black.
Finally, for each of our distance bands we boost the R, G or B channel of the pixel according to its distance.
Since outputBuffer has been our memory safe access to outputImage we can now call our helper method from earlier to set the BitmapSource for our view.
We’re done. Fire up the application again and you should be able to see depth output when you select the depth option from the combo box. Here’s a picture of me holding a 1m spirit level so you can see the output:
What’s next?
If you’ve managed to stick through the tutorial to here then well done. It’s a long one!
In the next installment we’ll be expanding the app with two new options using the body tracking SDK to track bodies and skeletons. See you next time!