Publication: Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images.