The rise of e-books, the ongoing digitisation of written library materials and the advancement of speech technology have reached a stage at which library materials and e-books can be read aloud to customers in synthetic speech, and paper books (either published or still in print) can be delivered in audio form. The user environment of Digar, the digital archive of the Estonian National Library, includes a special reading machine capable of producing an audio version of electronic texts in Estonian (books, magazines, etc.). The Elisa Raamat application provides access to more than 2500 Estonian e-books, which can be read on the screen of a smartphone or tablet and also listened to. The speech server of the Institute of the Estonian Language offers the text-to-speech system Vox populi as a public service: anyone can upload a text of interest (an article, paper, subtitle file, e-book, etc.) and have it converted into an audio file. The present study describes these systems and also addresses various issues of text processing and pronunciation, as well as the reflection of text structure in synthetic speech. The quality of automatic reading depends largely on how adequately abbreviations, numbers and other non-letter sequences in the input are converted into words in the correct morphological form, and on how closely the output pronunciation of foreign names matches that of the source language. The article also discusses a special text pre-processing module, which helps with more complex text structures and character sequences (e.g. geographic coordinates, sports results, numeral inflection). In addition, book reading requires the text structure to be rendered as accurately as possible. The study therefore also analyses audio books to capture the essence of human prosodic phrasing, the use of different pauses and the marking of reported speech in spoken delivery.