In this second part, I expanded the initial version of the AI assistant by adding new functionalities and optimizing its design to offer an enhanced user experience.
New Functionalities
Building on the initial prototype, I have added several key features that greatly improve the app’s overall functionality:
- Permanent storage with MongoDB
- CRUD operations for threads (conversations)
- Display of existing threads in a sidebar
- Ability to stop and replay AI responses
- Responsive design for better accessibility across devices
- Basic user authentication
Key Considerations
MongoDB vs. RDBMS
Choosing the right database was a crucial decision for the app, as it directly affects scalability, performance, and ease of development. Here’s why I opted for MongoDB over a traditional relational database management system (RDBMS):
- Flexible Schema: MongoDB’s document-based, flexible-schema structure is well suited to storing conversations and messages, which often vary in size and structure. Each conversation (thread) can be stored as a single document, with its messages held as an array of nested documents. In contrast, an RDBMS enforces a fixed schema, which makes it harder to evolve the data model and typically means splitting threads and messages into separate tables linked by foreign keys.
- Ease of Scaling: MongoDB is designed to scale horizontally through sharding, which distributes data across multiple servers. This matters as the user base and data grow. An RDBMS typically scales vertically, which can be limiting, and scaling it horizontally often requires a more complex architecture.
- Performance: An app built around real-time interaction needs fast reads and writes. Because a thread and its messages live in one document, loading a conversation is a single indexed lookup rather than a join across normalized tables, which can become costly in an RDBMS as the data grows.
- Real-Time Data: MongoDB’s change streams let the app subscribe to inserts and updates as they happen (they require a replica set or sharded cluster). This is useful for keeping conversation threads updated without refreshing the entire app, something that is usually harder to achieve efficiently in a traditional relational database.
In summary, MongoDB’s flexibility, scalability, and performance made it the best fit for the app, especially given the nature of the data and the need for future growth.
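To make the single-document layout concrete, here is a minimal sketch of what a thread document could look like. The field names (title, messages, updatedAt) and the helper functions are my own illustrative assumptions, not the app's actual schema. The only MongoDB-specific parts are the standard $push and $set update operators.

```typescript
// Hypothetical document shape: one thread per document, messages embedded.
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
  createdAt: Date;
}

interface ThreadDoc {
  _id: string; // assumed string id; MongoDB defaults to an ObjectId
  title: string;
  messages: ChatMessage[];
  updatedAt: Date;
}

// Build a new, empty thread document.
function createThread(id: string, title: string, now: Date = new Date()): ThreadDoc {
  return { _id: id, title, messages: [], updatedAt: now };
}

// Build the update document that appends one message to a thread.
// $push adds to the embedded array; $set refreshes the timestamp.
function appendMessageUpdate(message: ChatMessage) {
  return {
    $push: { messages: message },
    $set: { updatedAt: message.createdAt },
  };
}

// Apply the same change in memory without mutating the original thread,
// for example to update the UI before the database write completes.
function appendMessage(thread: ThreadDoc, message: ChatMessage): ThreadDoc {
  return {
    ...thread,
    messages: [...thread.messages, message],
    updatedAt: message.createdAt,
  };
}
```

With the official Node.js driver, the object returned by appendMessageUpdate could be passed as the second argument to updateOne, with a filter on _id as the first. Appending a message then touches a single document, with no second table involved.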
High-quality Transcription
Initially, I used react-speech-recognition to record and transcribe user speech. However, the quality of transcription was subpar, with poor handling of capitalization and punctuation.
To address this, I switched to OpenAI's Audio API, which provides high-quality transcription with proper capitalization and punctuation. This change significantly improved the accuracy and readability of the transcriptions, making conversations smoother and more coherent.
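For reference, a transcription request to the Audio API can be sketched roughly as follows. This is a simplified illustration rather than the app's actual code: the transcribe function, the FetchLike type, and the recording.webm file name are assumptions, and the model is assumed to be whisper-1. The fetch implementation is injectable so the function can be exercised without a network call.

```typescript
// Minimal fetch-like signature so a fake can be injected for testing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: FormData }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Send recorded audio to the transcription endpoint and return the text.
async function transcribe(
  audio: Blob,
  apiKey: string,
  fetchImpl: FetchLike = fetch
): Promise<string> {
  // The endpoint expects multipart form data with the file and a model name.
  const form = new FormData();
  form.append("file", audio, "recording.webm"); // placeholder file name
  form.append("model", "whisper-1"); // assumed model choice

  const res = await fetchImpl(TRANSCRIPTION_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed with status ${res.status}`);
  }

  // The default JSON response carries the transcript in a "text" field.
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

In a real deployment this call belongs on the server side (for example, behind an API route), so that the API key is never shipped to the browser.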
The Upgraded Version
Compared to the initial version, the app is now considerably more functional. Users can store conversations, manage threads, and stop or replay AI responses, all within a responsive and intuitive interface.
Next Step
There is still more to optimize and enhance. The next step is to allow users to register and select different contexts for the AI assistant. This will enable the assistant to perform a variety of tasks, tailoring its responses and behavior to specific user needs.