The present experiment investigated the effect of three different presentation modes on children's vocabulary learning with a self-guided multimedia programme. Participants were 135 third- and fourth-grade children who read a short English-language story presented by a computer programme. For 12 key (previously unknown) words in the story, children received verbal annotations (a written translation), visual annotations (a picture representing the word), or both. Recall of word translations was better for children who received only verbal annotations than for children who received visual and verbal annotations simultaneously or visual annotations only. The results are consistent with previous research on cognitive load in e-learning environments and indicate that children's learning processes are hindered by their limited working memory capacity. This finding poses a challenge for multimedia programmes that are designed for children and rely on self-regulated learning.