AI-Powered English Speaking Tutor For Learners

Concepts
- Key Features
Design: Back-of-the-Napkin Sketches
- Design Considerations
Implementation: A Balance Between Quatity and Speed
- Challenges & Solutions
Takeaways & Next Steps

In my last post, An AI-Powered App That Streamlines Your Speaking Test Prep Workflow, I introduced an AI assistant designed to aid in English speaking test preparation, seamlessly integrating with Notion.

Today, I’m excited to share a significant upgrade — a versatile AI-powered chat application that enables near-real-time English conversation with immediate AI feedback. This app is set to revolutionize the way English learners practice speaking, bridging the gap between traditional methods and modern technology by offering an experience akin to having a personal English-speaking tutor at your fingertips.

Concepts

Unlike my previous app, where the AI took on a more passive role, offering feedback after the user’s responses to specific test questions, this new app is far more interactive and dynamic. Here, the AI prompts users to delve deeper into their responses, encouraging elaboration and exploration of various perspectives on a given topic. This shift from passive feedback to active engagement is designed to push learners to improve their conversational skills in a more natural and effective manner.

Key Features

Voice Recording and Timing: Users can record their responses directly within the app, allowing for precise timing and control over their practice sessions.
Instant AI Feedback: Once the user submits a response, the AI immediately processes the input and provides feedback, making the practice feel more like a real conversation.
Text-Based Conversation Display: Both the user’s input and the AI’s responses are displayed in text boxes, allowing for easy reference and review.
Speech Synthesis: The AI’s responses are not just displayed but also spoken out loud using text-to-speech technology, simulating a real-life conversation partner.
Conversation Logging: All interactions are saved, giving learners the opportunity to review their conversations and track their progress over time.

These features work together to create a learning experience that is both immersive and efficient, helping learners improve their English speaking skills in a context that mimics real-world interactions.

Design: Back-of-the-Napkin Sketches

When I first conceived the idea for this app, I knew that the user interface needed to be intuitive and straightforward. I started with quick sketches, focusing on the layout and flow of the GUI. My primary concern was ensuring that users could easily navigate the app without feeling overwhelmed. The placement of buttons, for example, was crucial to avoid any confusion or misclicks, which could disrupt the learning experience.

Design Considerations

Simplicity: The interface had to be clean and uncluttered, allowing users to focus on the task at hand without unnecessary distractions.
Accessibility: Buttons and text fields were placed strategically to make the app user-friendly, even for those who might not be tech-savvy.
Feedback Loops: Visual and audio cues were incorporated to keep users engaged and informed of their progress, reinforcing a sense of achievement after each interaction.

low-fidelity sketches: 1 — Low-fidelity sketches: 1

low-fidelity sketches: 2 — Low-fidelity sketches: 2

Implementation: A Balance Between Quatity and Speed

With my IELTS speaking test fast approaching, time was of the essence. I needed an app that was functional and reliable, but I also had to work quickly. This led me to choose Tkinter, Python's built-in GUI library, which I had already used in my previous project. By leveraging my existing codebase, I was able to get the app up and running in a matter of days.

One of the new tools I introduced in this app was pyttsx3, a text-to-speech library that allowed the AI to "speak" its responses, further enhancing the realism of the interaction. Despite the time constraints, I was able to implement a system that not only works efficiently but also delivers a high-quality user experience.

Challenges & Solutions

Thread Management: One of the biggest challenges was ensuring that the app remained responsive while the AI was generating speech. This was resolved by implementing threading, allowing the text-to-speech engine to run concurrently with the rest of the app.
Real-Time Feedback: Achieving near-real-time feedback required careful optimization of both the AI and the GUI, ensuring that the user's experience remained smooth and uninterrupted.
Conversation History Management: To manage conversation history as the conversation progressed without overwhelming the OpenAI API, I implemented a system that only sent the last few exchanges to the API, reducing the message size and keeping the response times quick.

the completed app: 1 — The completed app: 1

the completed app: 2 — The completed app: 2

Takeaways & Next Steps

While this app is a significant step forward, I recognize that there's still room for improvement. The current version, while effective, is not truly real-time in the sense that a more sophisticated web-based app could be. Given more time, I plan to develop a JavaScript version using Next.js, which would offer a more dynamic and visually appealing user interface, along with advanced features.

However, I'm proud of what I've accomplished within the given timeframe. The app is fast, reliable, and most importantly, it's fun to use. It has already given me a boost of confidence as I prepare for my IELTS speaking test.

Beyond the immediate goal of improving my English speaking skills, this app represents another self-initiated challenge to disrupt conventional approaches by combining creativity with technology - something I make a point of doing and have been enjoying immensely. It's a reflection of my passion for pushing boundaries and exploring new ways to solve problems.

Moving forward, I'll continue to refine this app, and perhaps even integrate it into a broader suite of tools designed to help English learners around the world.