
Live Video Inferencing (ImageNet)

Our Goal: to create a ROS node that receives Raspberry Pi CSI camera images, runs inference on each image (e.g. classification, segmentation, or object detection), and then publishes the result as a message we can view using rqt_image_view.

Pre-requisites

Hardware:

  • Jetson Nano Dev kit with Jetpack 4.3 installed
  • Raspberry Pi Camera V2

Software:

  • ROS Melodic - install instructions here
  • ROS vision-opencv
    # install ros vision-opencv
    sudo apt-get install ros-melodic-vision-opencv
    
  • Nvidia jetson-inference - install instructions here
    • NOTE: make sure to download at least one of each model type, i.e. one imagenet-type model, one detectnet-type model, etc.
    • If you want to run the model download tool again:
    cd <jetson-inference path>/tools/
    vi download-models.sh # or any text editor of your choice
    # change 'BUILD_INTERACTIVE=$1' to 'BUILD_INTERACTIVE="YES"' and save
    ./download-models.sh
    
  • Nvidia jetson-utils (courtesy of wagonhelm)
    # install jetson-utils
    sudo apt-get install libglew-dev
    git clone https://github.com/dusty-nv/jetson-utils.git
    cd jetson-utils
    mkdir build && cd build
    cmake ..
    make
    sudo make install
    

Get Nvidia Jetson CSI camera launcher for ROS

  • Follow the instructions here
  • NOTE before you run catkin_make, make sure to:
    • download the vision_opencv repository from here
      cd <path to vision_opencv download>/cv_bridge/include
      mkdir -p <catkin workspace location>/src/cv_bridge/include
      mv cv_bridge <catkin workspace location>/src/cv_bridge/include
      # make sure you have the 'cv_bridge' folder in <catkin workspace location>/src/cv_bridge/include
      # such that your folder structure is <catkin workspace location>/src/cv_bridge/include/cv_bridge (yes, there are two 'cv_bridge' folders)
      
    • change the following tag in <catkin workspace location>/src/jetson_csi_cam/jetson_csi_cam.launch
      <!-- REMOVE THIS XML TAG -->
      <env name="GSCAM_CONFIG" value="nvcamerasrc sensor-id=$(arg sensor_id) ! video/x-raw(memory:NVMM),
          width=(int)$(arg width), height=(int)$(arg height), format=(string)I420, framerate=(fraction)$(arg fps)/1 ! 
          nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR" />
      
      <!-- ADD THIS XML TAG -->
          <env name="GSCAM_CONFIG" value="nvarguscamerasrc sensor-id=$(arg sensor_id) ! video/x-raw(memory:NVMM), 
          width=(int)$(arg width), height=(int)$(arg height), format=(string)NV12, framerate=(fraction)$(arg fps)/1 ! 
          nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert" />
      

Verify

Run catkin_make and make sure no errors show up

If any errors show up, leave a comment and I will try to help you with it

In a terminal run

roscore

In another terminal run

# we are running at 1280x720 @ 24 FPS for now
roslaunch jetson_csi_cam jetson_csi_cam.launch sensor_id:=0 width:=1280 height:=720 fps:=24
# if your camera is in csi port 1 change sensor_id to 1
  • NOTE: you may see a warning about a camera calibration file; you can ignore it

In a third terminal run

# to check if we are getting images from the camera
rostopic list
# you should see /csi_cam_0/camera_info or /csi_cam_1/camera_info
rostopic hz /csi_cam_<0 or 1>/camera_info

You should see something like:

average rate: 1.937
        min: 0.516s max: 0.516s std dev: 0.00000s window: 2
average rate: 1.938
        min: 0.515s max: 0.516s std dev: 0.00048s window: 4
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00056s window: 6
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00048s window: 8
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00047s window: 10
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00045s window: 12
average rate: 1.937
        min: 0.515s max: 0.517s std dev: 0.00053s window: 14
...

If you see output like that, then you have set everything up correctly. Let's move on!

ImageNet

Let’s add our needed imports

#include <ros/ros.h>

#include <image_transport/image_transport.h>

// allows us to use openCV with our ROS images
#include <cv_bridge/cv_bridge.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgcodecs.hpp>

// ROS image message
#include <sensor_msgs/image_encodings.h>
#include <sensor_msgs/Image.h>

// classification message
#include <vision_msgs/Classification2D.h>
#include <vision_msgs/VisionInfo.h>

// our API for making inferences using imageNet
#include <jetson-inference/imageNet.h>

// helps us create overlays
#include <jetson-utils/cudaFont.h>
#include <jetson-utils/cudaMappedMemory.h>

// help us load images onto the GPU
#include "image_converter.h"
#include "image_ops.h"

#include <unordered_map>

using namespace cv;
using namespace std;

  • add our globals
// globals
// our inference object
imageNet* net = NULL;
// a helper object used to interact with the GPU
// e.g put images on the GPU
imageConverter* cvt = NULL;

// our ros publisher that will publish images with inference results on them
ros::Publisher* result_pub    = NULL;
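
The callback below leans heavily on the imageConverter helper declared in image_converter.h (it ships with the full source on GitHub). For orientation, here is a sketch of the interface this node assumes; it is inferred from the call sites below, so the actual header may differ in details:

// interface sketch of image_converter.h, inferred from this node's call sites
// (the real header in the repo may differ)
#include <cstdint>
#include <string>
#include <sensor_msgs/Image.h>

class imageConverter
{
public:
    imageConverter();
    ~imageConverter();

    // copy a ROS image message into GPU-accessible memory,
    // converting it to the float RGBA layout imageNet expects
    bool Convert( const sensor_msgs::ImageConstPtr& input );

    // copy the (possibly overlaid) GPU image back out
    // into a ROS image message with the requested encoding
    bool Convert( sensor_msgs::Image& msg_out, const std::string& encoding );

    // raw pointer to the image in GPU memory
    float* ImageGPU() const;

    uint32_t GetWidth() const;
    uint32_t GetHeight() const;
};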
  • add our image callback that will be called when we receive a message from the camera publisher
// input image subscriber callback
void img_callback( const sensor_msgs::ImageConstPtr& input )
{
    ROS_INFO ("Received Image");
    
    // convert sensor_msgs[rgb] to opencv[bgr]
    cv_bridge::CvImagePtr cv_ptr;
    // cv_bridge::CvImagePtr cv_ptr_flip; // pointer for flipped image
    try
    {
        cv_ptr = cv_bridge::toCvCopy(
            input,
            sensor_msgs::image_encodings::BGR8
        );
        // extract the image we are going to flip to
        // cv_ptr_flip = cv_bridge::toCvCopy(
        //     input,
        //     sensor_msgs::image_encodings::BGR8
        // );
    }
    catch (cv_bridge::Exception& e)
    {
        ROS_ERROR("cv_bridge exception: %s", e.what());
        return;
    }
    
    // we are doing a 180 deg flip since
    // my camera is upside down
    const int img_flip_mode_ = -1;
    // flip the image [if your camera is upside down]
    // cv::flip(cv_ptr->image, cv_ptr_flip->image, img_flip_mode_);
    
    // convert converted image back to a sensor_msgs::ImagePtr
    // for use with nvidia / other ML algorithms
    
    // comment this line if you used the flipped image
    sensor_msgs::ImagePtr flippedImage = cv_ptr->toImageMsg();
    
    // uncomment this line if you used the flipped image
    // sensor_msgs::ImagePtr flippedImage = cv_ptr_flip->toImageMsg();

    // convert the image TO reside on GPU
    // the converting TO and converting FROM are the SAME function name
    // with different signatures
    if( !cvt || !cvt->Convert(flippedImage) )
    {
        ROS_ERROR (
            "failed to convert %ux%u %s image",
            flippedImage->width,
            flippedImage->height,
            flippedImage->encoding.c_str()
        );
        return;
    }
    else {
        ROS_INFO (
            "Converted %ux%u %s image",
            flippedImage->width,
            flippedImage->height,
            flippedImage->encoding.c_str()
        );
    }
    
    // classify the image
    float confidence = 0.0f;
    const int img_class = net->Classify(
        cvt->ImageGPU(),
        cvt->GetWidth(),
        cvt->GetHeight(),
        &confidence
    );
    
    
    // verify the output
    if( img_class >= 0 )
    {
        ROS_INFO(
            "classified image, %2.5f%% %s (class=%i)",
            confidence * 100.0f,
            net->GetClassDesc(img_class),
            img_class
        );
        
        
        // use font to draw the class description
        cudaFont* font = cudaFont::Create(adaptFontSize(flippedImage->width));

        if( font != NULL )
        {
            char overlay_str[512];
            sprintf(
                overlay_str,
                "%2.3f%% %s",
                confidence * 100.0f,
                net->GetClassDesc(img_class)
            );

            // use cuda to write the image class and its confidence
            // on the output image
            font->OverlayText(
                (float4*)cvt->ImageGPU(),
                cvt->GetWidth(),
                cvt->GetHeight(),
                (const char*)overlay_str,
                10,
                10,
                make_float4(255, 255, 255, 255),
                make_float4(0, 0, 0, 100)
            );
        }

        // wait for GPU to complete work
        CUDA(cudaDeviceSynchronize());
    }
    else
    {
        // an error occurred if the output class is < 0
        ROS_ERROR(
            "failed to classify %ux%u image",
            flippedImage->width,
            flippedImage->height
        );
    }
    
    
    // populate the output message
    sensor_msgs::Image msg;

    // get our image back from the GPU
    if( !cvt->Convert(msg, sensor_msgs::image_encodings::BGR8) ) {
        return;
    }
    
    sensor_msgs::ImagePtr out_img_ptr;
    cv_bridge::CvImagePtr out_cv_img_ptr;
    try
    {
        out_img_ptr = boost::make_shared<sensor_msgs::Image>(msg);

        out_cv_img_ptr = cv_bridge::toCvCopy(
            out_img_ptr,
            sensor_msgs::image_encodings::BGR8
        );

        // publish image result
        result_pub->publish(msg);
    }
    catch (cv_bridge::Exception& e)
    {
        ROS_ERROR("cv_bridge exception: %s", e.what());
        return;
    }
}
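
If your camera is mounted upside down, the commented-out flip lines above combine like this; it is the same logic as the snippets scattered through the callback, just collected in one place:

// inside img_callback(), with the 180-degree flip path enabled
cv_bridge::CvImagePtr cv_ptr      = cv_bridge::toCvCopy(input, sensor_msgs::image_encodings::BGR8);
cv_bridge::CvImagePtr cv_ptr_flip = cv_bridge::toCvCopy(input, sensor_msgs::image_encodings::BGR8);

// flip code -1 flips around both axes, i.e. a 180-degree rotation
const int img_flip_mode_ = -1;
cv::flip(cv_ptr->image, cv_ptr_flip->image, img_flip_mode_);

// hand the flipped copy to the rest of the pipeline
sensor_msgs::ImagePtr flippedImage = cv_ptr_flip->toImageMsg();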
  • finally our main function
int main (int argc, char **argv) {
    // our node name
    ros::init(argc, argv, "imagetaker_imagenet");
    
    ros::NodeHandle nh;
    // for command line parameters
    ros::NodeHandle private_nh("~");
    
    // can change this to another model you have
    // Below are some models we have downloaded
    std::string model_name = "googlenet";
    // std::string model_name = "resnet-18";
    
    // use this so we can pass parameters via command line e.g
    // rosrun <package-name> imagetaker_imagenet _model_name:=googlenet
    private_nh.param<std::string>(
        "model_name",
        model_name,
        "googlenet"
    );
    
    // determine which built-in model was requested
    imageNet::NetworkType model = imageNet::NetworkTypeFromStr(
        model_name.c_str()
    );

    if(model == imageNet::CUSTOM)
    {
        ROS_ERROR(
            "invalid built-in pretrained model name '%s', defaulting to googlenet",
            model_name.c_str()
        );
        model = imageNet::GOOGLENET;
    }

    // create network using the built-in model
    net = imageNet::Create(model);
    
    if(!net)
    {
        ROS_ERROR("failed to load imageNet model");
        return 0;
    }
    
    
    /*
     * create the class labels parameter vector
     */
    // hash the model path to avoid collisions on the param server
    std::hash<std::string> model_hasher;
    std::string model_hash_str =
        std::string(net->GetModelPath())+ std::string(net->GetClassPath());

    const size_t model_hash = model_hasher(model_hash_str);

    ROS_INFO("model hash => %zu", model_hash);
    ROS_INFO("hash string => %s", model_hash_str.c_str());

    // obtain the list of class descriptions
    std::vector<std::string> class_descriptions;
    const uint32_t num_classes = net->GetNumClasses();

    for( uint32_t n=0; n < num_classes; n++ )
    {
        class_descriptions.push_back(net->GetClassDesc(n));
    }

    /*
    * create image converters
    */
    cvt = new imageConverter();

    if(!cvt)
    {
        ROS_ERROR("failed to create imageConverter object");
        return 0;
    }
    
    /*
    * advertise publisher topics
    */
    ros::Publisher result_publish = private_nh.advertise<sensor_msgs::Image>("image_classifications", 2);
    result_pub = &result_publish;


    /*
    * subscribe to image topic
    * if your camera is on csi port 1,
    * subscribe to "/csi_cam_1/image_raw"
    */
    ros::Subscriber img_sub = nh.subscribe(
        "/csi_cam_0/image_raw",
        5,
        img_callback
    );

    /*
    * wait for messages
    */
    ROS_INFO("imagenet node initialized, waiting for messages");

    ros::spin();

    return 0;
}
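
One more note: the vision_msgs includes at the top are never actually used in this minimal node. If you also want the result as structured data rather than pixels, here is a hedged sketch of publishing a vision_msgs/Classification2D from the callback. The classify_pub publisher is an assumption, not part of the node above; you would advertise it in main() the same way as result_pub:

// hypothetical extra publisher, advertised in main() like result_pub:
//   ros::Publisher classify_pub =
//       private_nh.advertise<vision_msgs::Classification2D>("classification", 2);

// then in img_callback(), after a successful Classify():
vision_msgs::Classification2D cls_msg;
// reuse the camera frame's timestamp so results line up with images
cls_msg.header.stamp = input->header.stamp;

vision_msgs::ObjectHypothesis hypothesis;
hypothesis.id    = img_class;     // numeric class index from imageNet
hypothesis.score = confidence;    // confidence in [0, 1]
cls_msg.results.push_back(hypothesis);

classify_pub.publish(cls_msg);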

  • add this to your CMakeLists.txt
add_executable(imagetaker_imagenet
    src/node_imagetaker_imagenet.cpp
    src/image_converter.cpp
    src/image_ops.cpp
)
target_link_libraries(imagetaker_imagenet ${catkin_LIBRARIES} jetson-inference)
  • compile node_imagetaker_imagenet.cpp using
catkin_make
  • run the node using
rosrun <package-name> imagetaker_imagenet _model_name:=googlenet
# substitute googlenet for any imagenet model you have downloaded
  • the first time you run the node, it has to optimize the model (jetson-inference builds a TensorRT engine), so startup can take a few minutes
    • just wait
    • on subsequent runs the optimized model is cached, so startup will be much faster
  • use rqt_image_view to view the inference results in real time
    • if you are running the Nano remotely, use SSH with the -X flag to forward the UI
    • e.g. ssh -X mynano@192.123.456.789
  • in rqt_image_view, click the drop-down and select /imagetaker_imagenet/image_classifications
  • you should see the live camera image with the predicted class and its confidence overlaid in the top-left corner
[Image: Successful ImageNet inference]

If you can see a similar image, then you have successfully completed this tutorial.

This was a long one, thanks for staying until the end. Come back later for more on object detection and segmentation.

Also, this is my first time using ROS or CUDA, so if you have any tips, feel free to leave a comment!

Full source code is available on GitHub

If you run into any errors, look around the repo, as it has all the configurations, etc.