S.lang AI Development

About S.lang AI

S.lang AI is a cutting-edge, next-generation AI/ML software solution designed to break barriers in the world of sign language translation. Unlike many existing solutions available today, S.lang AI is a trailblazer in its field, providing a bidirectional translation experience, taking sign language communication to a whole new level.

Traditionally, sign language translation tools have been limited to a unidirectional approach, converting sign gestures into written or spoken text. While these solutions have undoubtedly been transformative for the deaf and hard-of-hearing communities, there has been a crucial missing link in the process - enabling the translation of spoken or written text back into sign language. S.lang AI bridges this gap with unparalleled proficiency.

Here is What to Expect

- Translation: S.lang AI boasts an unprecedented two-way translation capability, allowing users to easily switch between spoken or written language and sign language with seamless accuracy. This groundbreaking feature opens up new opportunities for inclusive communication, facilitating natural conversations between signing and non-signing users.

- AI/ML Algorithms: Our software leverages the power of state-of-the-art Artificial Intelligence and Machine Learning algorithms. Through extensive training on vast sign language datasets, S.lang AI can accurately interpret and generate sign gestures with remarkable precision, ensuring a high level of fluency and authenticity in both directions.

- User Interface: We believe that accessibility is paramount. S.lang AI comes with an intuitive, user-friendly interface that is easy to navigate for both seasoned sign language users and newcomers alike. The interface supports various input methods, including text, speech, and even image recognition for sign gestures, making it a versatile tool suitable for various scenarios.

- Learning: As an AI/ML-based solution, S.lang AI continues to learn and improve over time. Regular updates and improvements based on user feedback and advancements in the field of AI and sign language research ensure that the software remains at the forefront of technology.

How does it Work?

- Captures Video Frames: The software begins by capturing real-time video frames from an input source, such as a webcam or a pre-recorded video. OpenCV, a popular open-source computer vision library, provides the necessary tools to access video streams and individual frames.

- Hand Detection and Tracking with MediaPipe: MediaPipe, developed by Google, is a powerful framework for building multimodal applied machine learning pipelines. This software utilizes MediaPipe's Hand Tracking module to detect and track hands within each video frame. This process involves identifying key landmarks on the hand, such as fingertips, knuckles, and wrist, using a machine learning-based hand pose estimation model. The model is capable of robustly tracking hands even under various orientations, lighting conditions, and occlusions.

- Hand Gesture Classification: Once MediaPipe successfully tracks the hand landmarks, the software extracts relevant hand features and encodes them into a suitable representation for classification. These features may include the spatial positions of fingers, angles between joints, or any other relevant information that characterizes different hand gestures.

- Translation and Output: Once the classifier determines the most probable hand gesture from the extracted features, S.lang translates it into the corresponding letter or symbol in the chosen sign language. The output is then displayed on the screen for the other user to comprehend.

- Text to Gesture: If the other user would like to communicate back in sign language, they simply input a prompt or text, and S.lang generates unique images corresponding to that text. What makes this possible is our trained GAN model, which pits two neural networks against each other to generate new, realistic instances of data. Read more about it in the GAN section below.

- Real-Time Performance: One of the key strengths of the software is its real-time performance. By leveraging efficient algorithms and optimizations provided by OpenCV and MediaPipe, the application can process video frames rapidly, providing near-instantaneous hand gesture recognition and translation. A rough, end-to-end sketch of this pipeline is shown below.
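
The following is a minimal sketch of how these steps can fit together with OpenCV and MediaPipe. It is illustrative rather than the production code: the classifier file path, the label list, and the feature layout are assumptions, and a real deployment would load the model actually trained for S.lang.

```python
# Sketch of the gesture-to-text pipeline: OpenCV capture -> MediaPipe hand
# tracking -> landmark features -> classifier -> on-screen text.
import cv2
import mediapipe as mp
import numpy as np
import joblib

mp_hands = mp.solutions.hands

# Static ASL letters (J and Z are excluded here because they require motion).
LABELS = list("ABCDEFGHIKLMNOPQRSTUVWXY")

# Hypothetical path to a previously trained gesture classifier (sklearn-style).
clf = joblib.load("gesture_classifier.joblib")

def landmark_features(hand_landmarks):
    """Flatten the 21 hand landmarks into a wrist-relative, scale-normalized vector."""
    pts = np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark])
    pts -= pts[0]                                    # wrist becomes the origin
    scale = np.linalg.norm(pts, axis=1).max() or 1.0
    return (pts / scale).flatten()                   # 63-dimensional feature vector

cap = cv2.VideoCapture(0)                            # webcam, or a video file path
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            feats = landmark_features(results.multi_hand_landmarks[0])
            letter = LABELS[int(clf.predict([feats])[0])]   # assumes integer class labels
            cv2.putText(frame, letter, (30, 60),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        cv2.imshow("S.lang", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```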

Training/Testing Accuracy

The results for the first training and testing iteration were promising: training accuracy came back at 100% while testing accuracy came back at about 99.22%, indicating a near-perfectly trained model without significant overfitting.

Landmark Drawing

When running the main script, landmarks are drawn to outline only the fingers of the user's hand. This allows S.lang to recognize different gestures without focusing on other elements within the image.
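
As an illustration, this finger-only overlay can be produced with MediaPipe's built-in drawing helpers. The snippet below is a sketch that assumes `frame` and `results` come from a capture loop like the one shown earlier.

```python
# Sketch: draw only the hand skeleton (finger landmarks and their connections)
# on the current frame, using MediaPipe's drawing utilities.
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

def draw_hand_skeleton(frame, results):
    """Overlay the detected finger landmarks and the lines joining them."""
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                frame,                        # BGR image from OpenCV
                hand_landmarks,               # 21 detected hand landmarks
                mp_hands.HAND_CONNECTIONS)    # connections between finger joints
    return frame
```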

Rapid Classification

Further trials showed that gesture-to-text translation was performed rapidly, with minimal latency issues.

Motion Tracking and Classification

As shown in the demo video, we were also able to account for gestures that require movement, such as the letters 'J' and 'Z'. Next, we plan to continue training the model on more gestures and to engineer text-to-gesture translation using DALL-E-style models for text-to-image generation.
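
A common way to handle motion-based letters is to classify a short window of landmark frames rather than a single frame. The sketch below only illustrates that idea; the window length and the sequence classifier `seq_clf` are assumptions, not the project's actual implementation.

```python
# Sketch: buffer a sliding window of landmark feature vectors so that
# motion-based gestures (e.g. 'J', 'Z') can be classified from the trajectory.
from collections import deque
import numpy as np

WINDOW = 30                          # assumed window: roughly one second at 30 fps
buffer = deque(maxlen=WINDOW)

def update_and_classify(feats, seq_clf):
    """Append the latest 63-d feature vector; classify once the window is full."""
    buffer.append(feats)
    if len(buffer) == WINDOW:
        sequence = np.concatenate(buffer)         # flatten the whole trajectory
        return seq_clf.predict([sequence])[0]     # any sequence-aware classifier
    return None                                   # not enough frames yet
```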

Text to Gesture using GAN

What is a GAN model?

A GAN (Generative Adversarial Network) is a type of AI model used to generate new data that resembles a given dataset. Its main components are the generator (which creates new data) and the discriminator (which tries to distinguish real data from the training set from fake data produced by the generator). The two components are trained together in a competitive process until the generator becomes proficient at producing realistic data such as images, video, or speech.
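
As a concrete illustration of that generator/discriminator interplay, here is a heavily simplified PyTorch sketch of one adversarial training step on flattened 64x64 gesture images. The network sizes, image resolution, and learning rates are assumptions chosen for brevity, not S.lang's actual architecture.

```python
# Minimal GAN sketch (PyTorch): a generator maps random noise to a fake gesture
# image, a discriminator scores real vs. fake, and the two train against each other.
import torch
import torch.nn as nn

NOISE_DIM, IMG_PIXELS = 100, 64 * 64          # assumed latent size and image size

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_PIXELS), nn.Tanh())    # fake image with values in [-1, 1]

discriminator = nn.Sequential(
    nn.Linear(IMG_PIXELS, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())          # probability that the input is real

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    """One adversarial step on a batch of flattened real gesture images."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator to tell real images from generated ones.
    fake_images = generator(torch.randn(batch, NOISE_DIM)).detach()
    d_loss = (loss_fn(discriminator(real_images), real_labels) +
              loss_fn(discriminator(fake_images), fake_labels))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the generator to fool the discriminator.
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, NOISE_DIM))), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```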

What is its function in S.lang?

When it comes to translating from text to gesture, we found that capturing and storing videos and photos of every gesture for later reference was both time- and space-consuming. To combat this, we use a GAN so the model can learn the underlying gesture patterns and eventually generate new gesture images on its own, without the need for supervised training.
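
For the text-to-gesture direction specifically, the generator also needs to know which letter or word is requested. The sketch below illustrates one conditional-GAN-style way to do that; `cond_generator`, the one-hot label encoding, and the output size are hypothetical and not S.lang's released code.

```python
# Sketch: generate a gesture image for a requested letter by feeding the
# generator random noise concatenated with a one-hot label for that letter.
import torch

LETTERS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
NOISE_DIM = 100

def letter_to_image(letter, cond_generator):
    """Return an assumed 64x64 gesture image tensor for the requested letter."""
    one_hot = torch.zeros(1, len(LETTERS))
    one_hot[0, LETTERS.index(letter.upper())] = 1.0
    noise = torch.randn(1, NOISE_DIM)
    latent = torch.cat([noise, one_hot], dim=1)      # condition on the letter
    return cond_generator(latent).reshape(64, 64)    # values in [-1, 1]
```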

Overview of the GAN model workflow

What's Next?

In the coming months, we expect to deliver a full-scale model that works on desktop or phone and translates bidirectionally.

To achieve dual translation, we want to incorporate AI image generation that can create images from a prompt or direct text specified by the user.

In terms of security, we will follow strict protocols to provide a safe and appropriate communication environment for users, along with the addition of multi-language selection.