Most of those solutions need additional hardware or rely on classical computer vision techniques. Such approaches are fast, but they make it hard to add custom gestures, let alone motion-based ones. The answer we found is MediaPipe Hands, which we used for this project. As a proof of concept for the stated idea, a Tello quadcopter was used as the UAV.

This drone has an open Python SDK, which greatly simplified the development of the program. However, its technical limitations do not (yet) allow gesture recognition to run on the drone itself, so a regular PC or Mac was used for that purpose. The video stream from the drone and the commands to the drone are transmitted over regular WiFi, so no additional equipment was needed.

To keep the program structure as plain as possible and make it easy to add gestures, the architecture is modular, with a control module and a gesture recognition module.

Figure 2: Scheme showing the overall project structure and how video stream data from the drone is processed.

The application is divided into two main parts: gesture recognition and drone controller.

Those are independent instances that can be easily modified, for example to add new gestures or change the movement speed of the drone. The video stream is passed to the main program, which is a simple script that initialises the modules, sets up the connections, and runs the while-true loop typical of hardware control. Each frame of the video stream is passed to the gesture recognition module. Once the ID of the recognised gesture is obtained, it is passed to the control module, which sends the corresponding command to the UAV.
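The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class names, the `COMMANDS` table, and the dictionary-style frames are all hypothetical stand-ins, and a finite list of frames replaces the real while-true loop over the drone's video stream.

```python
from typing import Iterable, List, Optional

# Hypothetical gesture-ID-to-command table; the real IDs and commands may differ.
COMMANDS = {0: "forward", 1: "stop", 2: "land"}


class GestureRecognizer:
    """Stand-in for the gesture recognition module (assumed interface)."""

    def recognize(self, frame: dict) -> Optional[int]:
        # A real module would detect keypoints and classify them.
        # Here each fake frame already carries its gesture ID, if any.
        return frame.get("gesture_id")


class DroneController:
    """Stand-in for the control module; records commands instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def handle(self, gesture_id: int) -> None:
        command = COMMANDS.get(gesture_id)
        if command is not None:  # unknown gesture IDs are ignored
            self.sent.append(command)


def run(frames: Iterable[dict], recognizer: GestureRecognizer,
        controller: DroneController) -> None:
    # Stands in for the while-true loop that reads frames from the drone.
    for frame in frames:
        gesture_id = recognizer.recognize(frame)
        if gesture_id is not None:
            controller.handle(gesture_id)
```

Because the two modules only meet at a gesture ID, either one could be swapped out (a different classifier, a different drone) without touching the other.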

Alternatively, the user can control the drone from the keyboard in a more classical manner. The gesture recognition module is itself divided into a keypoint detector and a gesture classifier. It is exactly this combination of the MediaPipe keypoint detector with a custom gesture classification model that distinguishes this gesture recognition system from most others.

Utilizing MediaPipe Hands is a winning solution not only in terms of speed, but also in flexibility. MediaPipe already has a simple gesture recognition calculator that can be inserted into the pipeline. However, we needed a more powerful solution with the ability to quickly change the structure and parameters of the recognizer.

To that end, a custom neural network was created to classify gestures, with 4 fully connected layers and 1 softmax layer for classification. This simple structure takes a vector of 2D coordinates as input and outputs the ID of the classified gesture. Instead of using cumbersome segmentation models with a more algorithmic classification process, a simple neural network can easily handle such tasks. More importantly, new gestures can be added easily, because retraining the model takes much less time than reworking an algorithmic approach.
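To make the shape of such a classifier concrete, here is a dependency-free sketch of the forward pass: four fully connected layers followed by a softmax. The layer widths, the ReLU activations, the count of 8 gesture classes, and the random untrained weights are all assumptions chosen for illustration; the input size of 42 corresponds to MediaPipe Hands' 21 landmarks with an x and a y value each. A real model would be built and trained in a deep learning framework.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense(x: List[float], layer: Layer, relu: bool) -> List[float]:
    """One fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Create randomly initialised (untrained) layers for the given sizes."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def classify(model: List[Layer], x: List[float]) -> Tuple[int, List[float]]:
    """Return (gesture ID, class probabilities) for one keypoint vector."""
    for i, layer in enumerate(model):
        is_last = i == len(model) - 1
        x = dense(x, layer, relu=not is_last)
    probs = softmax(x)
    return probs.index(max(probs)), probs


# 42 inputs, three hidden layers, 8 output classes: 4 dense layers in total.
# Hidden widths and the number of classes are assumed, not taken from the project.
model = build_model([42, 32, 16, 16, 8])
```

Calling `classify(model, vector)` on a 42-element vector yields a gesture ID and a probability per class; with untrained weights the prediction is of course meaningless.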

The main characteristic of the dataset is that every sample is a vector of x, y coordinates, recorded with small tilts and different positions of the hand during data collection.
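As an illustration of what one such sample might look like, the sketch below flattens a list of (x, y) hand landmarks into a single vector. MediaPipe Hands reports 21 landmarks per hand, so the result has 42 values. Making the coordinates relative to the first landmark (the wrist) and scaling them by the largest absolute value is an assumed preprocessing step, added here because it makes the vector less sensitive to where the hand sits in the frame; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def to_feature_vector(landmarks: Sequence[Tuple[float, float]]) -> List[float]:
    """Flatten (x, y) landmarks into one vector, relative to the first point."""
    base_x, base_y = landmarks[0]
    # Shift every point so the first landmark becomes the origin.
    relative = [(x - base_x, y - base_y) for x, y in landmarks]
    flat = [coord for point in relative for coord in point]
    # Scale by the largest magnitude; skip scaling if all points coincide.
    scale = max(abs(coord) for coord in flat)
    if scale == 0:
        return flat
    return [coord / scale for coord in flat]
```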

Due to the simple structure of the model, excellent accuracy can be obtained with a small number of training examples per gesture. After several experiments, it turned out that fewer than 100 new examples were enough for good recognition of new gestures.

The best part about Tello is that it has a ready-made Python API that lets us move the drone without explicitly controlling the motor hardware. We just need to map each gesture ID to a command. To remove unnecessary movements caused by false detections, which occur even with such a precise model, a special buffer was created that stores the last N gestures. This helps to remove glitches and inconsistent recognition. The fundamental goal of this project is to demonstrate the superiority of the keypoint-based gesture recognition approach compared to classical methods.
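The gesture buffer and the gesture-to-command mapping mentioned above could look something like the following sketch. The majority-vote rule, the `min_share` threshold, the default buffer size, and the contents of `COMMANDS` are assumptions for illustration; the text only states that the last N gestures are stored.

```python
from collections import Counter, deque
from typing import Deque, Optional

# Hypothetical mapping from gesture ID to a drone command name.
COMMANDS = {0: "forward", 1: "stop", 2: "up", 3: "land"}


class GestureBuffer:
    """Keeps the last `size` gesture IDs and reports a gesture only when stable."""

    def __init__(self, size: int = 10, min_share: float = 0.7) -> None:
        self._size = size
        self._min_share = min_share
        self._items: Deque[int] = deque(maxlen=size)  # oldest entries drop off

    def add(self, gesture_id: int) -> None:
        self._items.append(gesture_id)

    def stable_gesture(self) -> Optional[int]:
        """Return the dominant gesture ID, or None if recognition is inconsistent."""
        if len(self._items) < self._size:
            return None  # not enough history yet
        gesture_id, count = Counter(self._items).most_common(1)[0]
        if count / self._size >= self._min_share:
            return gesture_id
        return None
```

With this design a single misclassified frame cannot trigger a movement: a command from `COMMANDS` would only be looked up once `stable_gesture()` returns an ID.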

You can create your own combinations of gestures or rewrite an existing one without collecting massive datasets or manually setting up a recognition algorithm. Pressing the button and an ID key instantly saves the vector of detected points to the overall dataset.
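One simple way to implement this kind of logging is to append each sample as a row of a CSV file, with the gesture ID in the first column and the coordinates after it. The file layout and the function names below are assumptions for illustration, not the project's actual storage format.

```python
import csv
from typing import List, Sequence, Tuple


def append_sample(path: str, gesture_id: int, features: Sequence[float]) -> None:
    """Append one labelled keypoint vector to the dataset file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([gesture_id, *features])


def load_samples(path: str) -> List[Tuple[int, List[float]]]:
    """Read the dataset back as (gesture ID, feature vector) pairs."""
    samples = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row:  # skip blank lines
                samples.append((int(row[0]), [float(v) for v in row[1:]]))
    return samples
```

Appending rather than rewriting means new gestures can be recorded at any time without touching the samples already collected.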

This new dataset can be used to retrain the classification network to add new gestures for detection. For now, there is a notebook that can be run on Google Colab or locally. Retraining the network-classifier takes about 1-2 minutes on a standard CPU instance.

This project was created to give a push to the area of gesture-controlled drones.

The novelty of the approach lies in the ability to add new gestures or change old ones quickly.

This is made possible thanks to MediaPipe Hands. It is incredibly fast, reliable, and ready out of the box, which makes gesture recognition both fast and flexible to change. We will also keep track of MediaPipe updates, especially those that add flexibility in creating custom calculators for our own models and lower the barriers to creating them.



