Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency.
Sakib Shahriar, Brady D. Lund, Nishith Reddy Mannuru, Muhammad Arbab Arshad, Kadhim Hayawi, Ravi Varma Kumar Bevara, Aashrith Mannuru, Laiba Batool
Published in: CoRR (2024)
Keyphrases
- comprehensive evaluation
- multimodal interfaces
- text to speech
- audio visual
- systematic evaluation
- language acquisition
- multimodal interaction
- computer vision
- spoken language
- speech recognition
- language learning
- text to speech synthesis
- programming language
- english text
- human language
- multi modal
- real time
- natural language
- multi stream
- specification language
- human computer interaction
- language processing
- spoken dialog systems
- speech synthesis
- language generation
- linguistic knowledge
- human communication
- english language
- foreign language
- speech signal
- user interface
- multimedia
- information retrieval