A fun usage of robotics: “Spy” robot

Lawrence Zou
Jul 8, 2021


Robots hold great potential, especially when you can control one right from the comfort of your desktop, “spying” on others whenever you desire… :)

But before we can enjoy the sense of exploring, we must first make the robot.

The idea is to build a robot whose movement is controlled from a remote keyboard, with a camera streaming element that lets you see through the robot’s “lenses” without having to follow it around (without this, you wouldn’t be able to “spy” :) ).

Overview of the parts to this project:

  • Building the robot
  • Robot remote keyboard control program
  • Camera streaming
  • Finding a way to connect remotely to the Raspberry Pi — using SSH

Hardware

I decided to use the combination of BrickPi and Raspberry Pi, which allows for a more capable and sophisticated approach to controlling LEGO NXT robotics. Compared to the standard LEGO Mindstorms computer/“brain”, this lets you play around with extra gadgets and software, such as attaching a camera and programming robotic movements in Python, as you will see below.

The video is captured using a Raspberry Pi Camera

Building the chassis

It is a version of the tank drive design: two NXT motors are connected together for movement, and the Raspberry Pi/BrickPi unit is mounted on top (images are below).

Since the camera is not a LEGO NXT add-on but a Raspberry Pi Camera, I simply mounted it by sliding it between two pins.

Front View of Robot with Camera
Bird’s-eye view

Software

Robot remote keyboard control program

Libraries

I used the BrickPi library to connect to and control the NXT motors attached to the BrickPi board. For key presses, I started with the Keyboard module, but it did not allow for remote interactions. To solve this issue, I switched to curses.
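
Here is a minimal sketch (not the original script) of reading arrow keys with curses. Because curses reads from the attached terminal rather than the physical keyboard, it also works inside a remote SSH session:

```python
import curses

def main(stdscr):
    stdscr.scrollok(True)          # let output scroll instead of erroring
    stdscr.addstr("Press arrow keys, q to quit\n")
    while True:
        key = stdscr.getch()       # blocks until a key arrives from the terminal
        if key == curses.KEY_UP:
            stdscr.addstr("forward\n")
        elif key == curses.KEY_DOWN:
            stdscr.addstr("back\n")
        elif key == ord("q"):
            break

curses.wrapper(main)               # handles terminal setup and cleanup
```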

Features

The robot can move forward, back, left, and right, controlled by the up, down, left, and right arrow keys respectively. There is also a speed-change feature: press “w” to increase and “s” to decrease the speed, in steps of 25 per key press. Since the maximum and minimum speeds of the NXT motors are 255 and -255 respectively, once you reach either threshold the program notifies you that the speed cannot be changed any further.
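
A rough sketch of that speed logic (the function name and messages are mine, not from the original code):

```python
MAX_SPEED = 255   # NXT motor power ranges from -255 to 255
STEP = 25         # change per key press

def change_speed(speed, key):
    """Return the new speed and a status message for the terminal."""
    if key == ord("w"):
        speed = min(speed + STEP, MAX_SPEED)
    elif key == ord("s"):
        speed = max(speed - STEP, -MAX_SPEED)
    if abs(speed) == MAX_SPEED:
        return speed, "Cannot change speed any further"
    return speed, "Speed: %d" % speed
```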

Issues

One problem was that the key-press handling was quite sensitive, which caused excessive text to be displayed. To deal with this, I wrote a function that checks whether the previous key is the same as the current key pressed, and if so, it does not print the same string to the terminal again.
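
In sketch form, the idea looks something like this (the names are illustrative):

```python
last_key = None

def print_once(stdscr, key, text):
    """Print only when the pressed key differs from the previous one."""
    global last_key
    if key != last_key:
        stdscr.addstr(text + "\n")
        last_key = key
```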

Camera streaming and transferring data over Internet protocol

Transmission Control Protocol (TCP)

TCP provides reliable, ordered delivery of data packets between machines. We need it to send the image frames captured on the Raspberry Pi to the remote computer you are controlling from.

The process works as follows: the server binds, or assigns, the address (the host/IP address and port number) that the socket should recognize and listen on. The client script is then executed, connecting to that IP address and port number, which should be the server’s. Once the server accepts the connection, the two devices can exchange information. All of this behaviour can be performed with the Python socket module.
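
A minimal sketch of that bind/listen/accept handshake; the port number and client-side address are placeholders:

```python
import socket

# Server side (runs on the Raspberry Pi)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 8485))   # listen on all interfaces, port 8485
server.listen(1)                 # wait for one client
conn, addr = server.accept()     # blocks until the client connects

# Client side (runs on the viewing computer); use the Pi's actual address
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("192.168.1.50", 8485))
```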

I made two scripts: one for the client, the machine viewing the camera stream (any other computer you have available), and one for the server, the device that captures the image data and sends it to the client (the BrickPi).

On the server’s end, we use OpenCV to capture and read the video (image frames). To send data with Python sockets, it must be a bytes-like object, so we serialize the image (convert the object into a byte stream) using the pickle library. We then prepend a header built with the Python struct module’s pack() function, which encodes the payload length according to some format, in this case an unsigned long long; this way the client knows exactly how many bytes belong to each frame, and when “unpacked”, the information remains unaltered and consistent. This data is now ready to send to the client.
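
Put together, the server loop might look roughly like this, assuming conn is the accepted socket from the sketch above:

```python
import cv2
import pickle
import struct

cap = cv2.VideoCapture(0)                 # the Raspberry Pi camera
while True:
    ok, frame = cap.read()                # grab one image frame
    if not ok:
        break
    data = pickle.dumps(frame)            # serialize the frame into bytes
    header = struct.pack("Q", len(data))  # 8-byte unsigned long long: payload length
    conn.sendall(header + data)           # length header followed by the frame
```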

In the client script, we retrieve the data sent from the server and append it to a byte-string buffer. The first 8 bytes are the message (the header); the rest is video. The two parts can be separated using string slicing and the payload size (the size of the packed header format). We need the message portion because it holds the index at which the video frame ends: it is unpacked with struct.unpack(), using the same format used for packing, and its value tells us how much data to fetch. Everything from the beginning of the buffer up to the point indicated by the message is the video frame; the remaining bytes are kept for the next frame. We deserialize the frame data with pickle so it is back to a normal image, then call the OpenCV function imshow to display it on the desktop in a pop-up window. Note that waitKey() must also be called for the image to display; OpenCV looks for this call before showing the window.
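
In sketch form, with client being the connected socket from earlier:

```python
import cv2
import pickle
import struct

payload_size = struct.calcsize("Q")        # 8 bytes for an unsigned long long
buffer = b""
while True:
    while len(buffer) < payload_size:      # read until the header is complete
        buffer += client.recv(4096)
    frame_len = struct.unpack("Q", buffer[:payload_size])[0]
    buffer = buffer[payload_size:]
    while len(buffer) < frame_len:         # read until the whole frame arrives
        buffer += client.recv(4096)
    frame = pickle.loads(buffer[:frame_len])  # back to a normal image
    buffer = buffer[frame_len:]            # keep leftover bytes of the next frame
    cv2.imshow("Robot camera", frame)
    if cv2.waitKey(1) == ord("q"):         # waitKey lets OpenCV render the window
        break
```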

Visit the link (also at the bottom of this page) for the code provided on GitHub.

Issues with the stream

I noticed the stream was a bit laggy because TCP is a lossless protocol: every packet has to be received before a frame can be displayed. Using a lossy protocol like UDP, where dropped packets are simply forgotten instead of held in a buffer, may help the continuity of the stream.
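
For reference, the UDP side of that trade-off is only a few lines; the address, port, and frame_bytes below are placeholders, and note that a single datagram tops out around 64 KB, so raw frames would need compression or chunking:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frame_bytes = b"..."   # a serialized (and ideally compressed) frame
# Fire-and-forget: if this datagram is dropped, the stream simply moves on.
sock.sendto(frame_bytes, ("192.168.1.60", 8486))
```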

Remotely controlling using SSH

SSH (Secure Shell) allows users to connect remotely to a machine while keeping the shared data protected. You are effectively logging into a terminal environment on the remote machine.

To execute the movement scripts on the robot from my desktop, I took advantage of the OpenSSH client built into Windows (if it is not there, you may need to install it) and the SSH server available on the Raspberry Pi (you will need to enable it using raspi-config, as it is disabled by default).
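
As a rough outline of the commands involved (the username, hostname, and script name are placeholders):

```
# On the Raspberry Pi, enable the SSH server once (it is disabled by default):
sudo raspi-config        # Interface Options -> SSH -> Enable

# From the remote desktop, log in and launch the control script:
ssh pi@raspberrypi.local
python3 robot_control.py
```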

Big ideas/lesson learned

  • Learning techniques to meet needs/desires — understanding what it is I want to solve and forming questions to address those specific unknowns, all building towards a greater goal. For example, when I needed a way to transfer data between devices, I first asked: how do we retrieve data from the Internet/other servers around the world, and how does a device send data to a specific device so it is accepted and received? I learned that Internet protocols (like HTTPS, TCP, UDP, etc.) are what power these processes; more specifically, they go through steps to carry out the transportation of data (see image below). I wondered how I could implement this in code, and this led me to discovering the Python socket module.
Server-client TCP Model (Image Source: https://commons.wikimedia.org/wiki/File:InternetSocketBasicDiagram_zhtw.png)
  • Troubleshooting to find and understand the root cause of a problem. There were many instances of this; one example is when the Keyboard module didn’t work. I knew the issue had to be related to connecting from the remote environment, as the program worked fine on the local machine. It turns out the Keyboard module only responds to the local keyboard, whereas the curses library sets up a terminal environment and reads keys from there.

Future improvements

  • Improving the camera frame rate — the current method is really laggy… it could be better with a lower image quality, so fewer pixels/less data would need to be transferred (see the sketch after this list)
  • Finding a way to send remote keyboard readings to the robot (similar to the camera streaming and transporting of bytes) — to simplify things, so SSH wouldn’t be needed
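
As a sketch of the first idea, OpenCV can JPEG-compress each frame before it is sent, which shrinks the payload dramatically compared to pickling the raw array (the quality value here is arbitrary):

```python
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
# Encode to JPEG at quality 50: far fewer bytes cross the network,
# at the cost of some image sharpness.
ok, encoded = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 50])
data = encoded.tobytes()   # send this instead of pickle.dumps(frame)
```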

Conclusion

All in all, it is a nice contraption that brings together a variety of technical pieces to make a fun and “sneaky” robot.

Links:

GitHub Source Code can be found here.
