
Live Video Inferencing Part 3: DetectNet

Our goal: to create a ROS node that receives Raspberry Pi CSI camera images, runs object detection on them, and publishes the result as a message we can view with rqt_image_view.

Object Detection

We will be generating bounding boxes around objects detected in the image. Unlike image classification, the model can detect multiple objects per image.

Prerequisites

Completed the previous post here

Live Video Inferencing (DetectNet)

If you have completed the previous tutorial, this code will look very familiar.

First, let's put down the DetectNet node code.

Create a file named node_imagetaker_detectnet.cpp in your <catkin-workspace-location>/<package-name>/src/ directory

Add the following to it; we will describe what each portion of the code does.

Let’s add our needed imports

#include <iostream>
#include <ctime>
#include <sstream>
#include <ros/ros.h>

#include <image_transport/image_transport.h>
#include <cv_bridge/cv_bridge.h>

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgcodecs.hpp>

#include <sensor_msgs/image_encodings.h>
#include <sensor_msgs/Image.h>

// the API to our object detection model
#include <jetson-inference/detectNet.h>
// allows us to use the GPU for image operations
#include <jetson-utils/cudaMappedMemory.h>

#include <jetson-utils/loadImage.h>

#include "image_converter.h"

#include <unordered_map>

using namespace cv;
using namespace std;
  • Add our globals
// globals
detectNet* net = NULL;

// image converter object
imageConverter* cvt = NULL;

// where we will publish our results to
ros::Publisher* result_pub    = NULL;
  • Add our image callback that will be called when we receive a message from the camera publisher.
  • This is where the image will be passed into the AI model for inferencing. We will also overlay the resulting bounding box here.
// input image subscriber callback
void img_callback( const sensor_msgs::ImageConstPtr& input )
{
    ROS_INFO ("Received Image");
    
    // convert sensor_msgs [rgb] to OpenCV [bgr]
    cv_bridge::CvImagePtr cv_ptr;
    cv_bridge::CvImagePtr cv_ptr_flip; // pointer for flipped image
    try
    {
      cv_ptr = cv_bridge::toCvCopy(input, sensor_msgs::image_encodings::BGR8);
      cv_ptr_flip = cv_bridge::toCvCopy(input, sensor_msgs::image_encodings::BGR8);
    }
    catch (cv_bridge::Exception& e)
    {
      ROS_ERROR("cv_bridge exception: %s", e.what());
      return;
    }
    
    // we are doing a 180 deg flip since
    // my camera is upside down
    const int img_flip_mode_ = -1;
    // flip the image
    cv::flip(cv_ptr->image, cv_ptr_flip->image, img_flip_mode_);
    
    // convert converted image back to a sensor_msgs::ImagePtr
    // for use with nvidia / other ML algorithms
    sensor_msgs::ImagePtr flippedImage = cv_ptr_flip->toImageMsg();

    // convert the image TO reside on the GPU
    // the converting TO and converting FROM use the SAME function name
    // with different signatures
    if( !cvt || !cvt->Convert(flippedImage) )
    {
        ROS_ERROR (
            "failed to convert %ux%u %s image",
            flippedImage->width,
            flippedImage->height,
            flippedImage->encoding.c_str()
        );
        return;
    }
    else {
        ROS_INFO (
            "Converted %ux%u %s image",
            flippedImage->width,
            flippedImage->height,
            flippedImage->encoding.c_str()
        );
    }
    
    // run object detection on the image
    detectNet::Detection* detections = NULL;

    /*
    enum OverlayFlags
    {
        OVERLAY_NONE       = 0, // < No overlay.
        OVERLAY_BOX        = (1 << 0),  //< Overlay the object bounding boxes
        OVERLAY_LABEL      = (1 << 1), //< Overlay the class description labels
        OVERLAY_CONFIDENCE = (1 << 2), //< Overlay the detection confidence values
    };
    */
    // detect AND render all the possible info, i.e.
    // bounding box, label, confidence
    const uint32_t flags = detectNet::OVERLAY_BOX | detectNet::OVERLAY_LABEL | detectNet::OVERLAY_CONFIDENCE;
    const int numDetections = net->Detect(
        cvt->ImageGPU(),
        cvt->GetWidth(),
        cvt->GetHeight(),
        &detections,
        flags
    );

    // if an error occurred
    if( numDetections < 0 )
    {
        ROS_ERROR(
            "failed to run object detection on %ux%u image",
            flippedImage->width,
            flippedImage->height
        );
        return;
    }
    
    /*
    // if no detections
    if( numDetections == 0 )
    {
        ROS_INFO(
            "detected %i objects in %ux%u image",
            numDetections,
            flippedImage->width,
            flippedImage->height
        );
        return;
    }
    */
    
    
    // if detection ran successfully, send out the message
    // (numDetections may be zero if nothing was found)
    if( numDetections >= 0 )
    {
        ROS_INFO(
            "detected %i objects in %ux%u image",
            numDetections,
            flippedImage->width,
            flippedImage->height
        );
        
        
        // get our image with overlays back from the GPU
        // the converting TO and converting FROM use the SAME function name
        // with different signatures
        sensor_msgs::Image outputImage;
        // outputImage.header.stamp = ros::Time::now();
        if(!cvt || !cvt->Convert(outputImage, sensor_msgs::image_encodings::BGR8)) {
            ROS_ERROR ("failed to convert outputImage image back");
            return;
        }
        else {
            ROS_INFO (
                "Converted %ux%u %s image back",
                outputImage.width,
                outputImage.height,
                outputImage.encoding.c_str()
            );
        }
        
        // save the output image to png
        sensor_msgs::ImagePtr out_img_ptr;
        cv_bridge::CvImagePtr out_cv_img_ptr;
        try
        {
            out_img_ptr = boost::make_shared<sensor_msgs::Image>(outputImage);

            out_cv_img_ptr = cv_bridge::toCvCopy(
                out_img_ptr,
                sensor_msgs::image_encodings::BGR8
            );

            // publish image result
            result_pub->publish(outputImage);
            // save the image with the bounding box
            // saveImage(&out_cv_img_ptr->image);
        }
        catch (cv_bridge::Exception& e)
        {
            ROS_ERROR("cv_bridge exception: %s", e.what());
            return;
        }
    }
}
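
If you want to log or post-process the individual detections instead of only overlaying them, each entry in the detections array carries its class ID, confidence, and box coordinates. Here is a minimal sketch (using the detectNet::Detection fields from jetson-inference) that could go right after the Detect() call:

    // inspect each detection returned by net->Detect()
    for( int n = 0; n < numDetections; n++ )
    {
        const detectNet::Detection* det = &detections[n];
        ROS_INFO(
            "object %i: class #%u (%s), confidence %.2f, box (%.0f, %.0f, %.0f, %.0f)",
            n,
            det->ClassID,
            net->GetClassDesc(det->ClassID),
            det->Confidence,
            det->Left, det->Top, det->Right, det->Bottom
        );
    }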
  • Finally, our main function, where we define our subscribers and publishers and initialize our node
int main (int argc, char **argv) {
    ros::init(argc, argv, "imagetaker_detectnet");
    
    ros::NodeHandle nh;
    ros::NodeHandle private_nh("~");
    
    std::string class_labels_path;
    std::string prototxt_path;
    std::string model_path;

    // you can change this to another model you have,
    // e.g. rosrun <package-name> imagetaker_detectnet _model_name:=pednet
    // "ssd-mobilenet-v2" is better than "pednet"!
    std::string model_name = "ssd-mobilenet-v2";
    
    float mean_pixel = 0.0f;
    
    // If the desired objects aren't being detected in the video feed or
    // you're getting spurious detections, try decreasing or increasing
    // the detection threshold (default value is 0.5)
    float threshold  = 0.5f;
    
    private_nh.param<std::string>(
        "model_name",
        model_name,
        "ssd-mobilenet-v2"
    );
    
    // determine which built-in model was requested
    detectNet::NetworkType model = detectNet::NetworkTypeFromStr(model_name.c_str());

    if( model == detectNet::CUSTOM )
    {
        ROS_ERROR(
            "invalid built-in pretrained model name '%s', defaulting to pednet", 
            model_name.c_str()
        );
        model = detectNet::PEDNET;
    }

    // create network using the built-in model
    net = detectNet::Create(model);
    
    if( !net )
    {
        ROS_ERROR("failed to load detectNet model");
        return 0;
    }
    
    
    /*
    * create the class labels parameter vector
    */
    std::hash<std::string> model_hasher;  // hash the model path to avoid collisions on the param server
    std::string model_hash_str = std::string(
        net->GetModelPath()) + std::string(net->GetClassPath()
    );
    const size_t model_hash = model_hasher(model_hash_str);

    ROS_INFO("model hash => %zu", model_hash);
    ROS_INFO("hash string => %s", model_hash_str.c_str());

    // obtain the list of class descriptions
    std::vector<std::string> class_descriptions;
    const uint32_t num_classes = net->GetNumClasses();

    for( uint32_t n=0; n < num_classes; n++ ) {
        class_descriptions.push_back(net->GetClassDesc(n));
    }

    // create the key on the param server
    std::string class_key = std::string("class_labels_") + std::to_string(model_hash);
    // private_nh.setParam(class_key, class_descriptions);

    // populate the vision info msg
    std::string node_namespace = private_nh.getNamespace();
    ROS_INFO("node namespace => %s", node_namespace.c_str());
    
    /*
    * create an image converter object
    */
    cvt = new imageConverter();

    if(!cvt)
    {
        ROS_ERROR("failed to create imageConverter object");
        return 0;
    }
    
    
    /*
    * advertise publisher topics
    */
    ros::Publisher result_publish = private_nh.advertise<sensor_msgs::Image>("image_detections", 2);
    // note: result_pub points to this local publisher; it remains valid
    // because main() does not return until ros::spin() exits
    result_pub = &result_publish;
    
    /*
    * subscribe to image topic
    */
    ros::Subscriber img_sub = nh.subscribe(
        "/csi_cam_0/image_raw",
        5,
        img_callback
    );

    /*
    * wait for messages
    */
    ROS_INFO("imagetaker node initialized, waiting for messages");

    ros::spin();

    return 0;
    
}
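
Note that the threshold variable above is declared but never handed to the network, so changing it has no effect as written. If you want it to apply, a minimal sketch (assuming the detectNet::Create() overload that accepts a threshold for built-in models, which jetson-inference provides) would replace the creation lines with:

    // read the threshold from the parameter server,
    // e.g. rosrun <package-name> imagetaker_detectnet _threshold:=0.6
    private_nh.param<float>("threshold", threshold, 0.5f);

    // create the network with the requested detection threshold
    net = detectNet::Create(model, threshold);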

The full code is available [here](https://github.com/neil-okikiolu/jetson-nano-ros-vision-ai/blob/master/src/node_imagetaker_detectnet.cpp)

Then add the following to your <catkin-workspace-location>/<package-name>/CMakeLists.txt

add_executable(imagetaker_detectnet
    src/node_imagetaker_detectnet.cpp
    src/image_converter.cpp
)
target_link_libraries(imagetaker_detectnet ${catkin_LIBRARIES} jetson-inference)
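
If catkin_make cannot find the jetson-inference headers or libraries on your system, your CMakeLists.txt may also need something along these lines (a sketch, assuming jetson-inference and jetson-utils were installed system-wide with their CMake config files, as in NVIDIA's ros_deep_learning package):

# locate the jetson-inference and CUDA dependencies
find_package(jetson-utils REQUIRED)
find_package(jetson-inference REQUIRED)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})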

Verify

Run catkin_make to compile your code and make sure no errors show up

If any errors show up, leave a comment and I will try to help you with it

In a terminal run

roscore

In another terminal run

# we are running at 1280x720 @ 24 FPS for now
roslaunch jetson_csi_cam jetson_csi_cam.launch sensor_id:=0 width:=1280 height:=720 fps:=24
# if your camera is in CSI port 1, change sensor_id to 1
  • NOTE: you may see a warning about a camera calibration file, you can ignore it

In a third terminal run

# to check if we are getting images from the camera
rostopic list
# you should see /csi_cam_0/camera_info or /csi_cam_1/camera_info
rostopic hz /csi_cam_<0 or 1>/camera_info

You should see something like:

average rate: 1.937
    min: 0.516s max: 0.516s std dev: 0.00000s window: 2
average rate: 1.938
        min: 0.515s max: 0.516s std dev: 0.00048s window: 4
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00056s window: 6
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00048s window: 8
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00047s window: 10
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00045s window: 12
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00053s window: 14
...

In the third terminal, press Ctrl+C to stop rostopic hz; now we will run our ROS node. To specify a different model, pass the _model_name parameter as shown in the comment in main()

# run the node (uses ssd-mobilenet-v2 by default)
rosrun <package-name> imagetaker_detectnet
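
If rqt_image_view is not already running, start it from a fourth terminal:

# opens a GUI window that can display any sensor_msgs/Image topic
rqt_image_view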

In rqt_image_view, click the drop-down and select imagetaker_detectnet/image_detections to see the object detection results in real time!

Successful DetectNet Inference
If you can see an image like this, then you have successfully completed this tutorial.

This was a long one, thanks for staying to the end. Come back later for more image inferencing.

If you have any tips or questions feel free to leave a comment!

Now we have learned how to use the pre-made model classes (segNet, imageNet, detectNet).

Next time we will make our own custom C++ model class for poseNet (pose detection).

We will do a deep dive into NVIDIA's TensorRT and image processing with CUDA.

See you then!

Notes

  • the first time you run the node, it has to optimize the model with TensorRT, so startup will take a couple of minutes
    • just wait
    • on your next run, the model is cached and it will not take long to start up
  • use rqt_image_view to receive images in real time
    • if you are running the nano remotely, use SSH with the -X flag for forwarding the UI
    • e.g. ssh -X mynano@192.123.456.789

Full source code available on GitHub

Locating Objects with DetectNet