So we will have to write 5,300 image descriptions
While preparing this year’s book edition of the technology diary, I count for the first time: the technology diary contains around 5,600 images. Of these, 300 have an image description that screen readers can read out. For 4,700 of them the alt tag just says “image”, and the remaining 600 probably don’t even have that.
I expected something like this, because in the first seven or eight years of the technology diary we hardly paid any attention to image descriptions. A poor excuse: the old Tumblr editor didn’t make it easy; you had to add the image descriptions by hand in the HTML view of the post. When something is that cumbersome, it also suggests that it can’t be very important anyway. But really, we should have known better.
I don’t know what the reasons were for the other authors of the technology diary. The main reason I didn’t think of it earlier is that I’ve only been seeing image descriptions myself for a few years (first on Twitter, now on Mastodon). I noticed, firstly, that many people want them and, secondly, that as a sighted person you can use them too: for example, when I don’t understand what a picture shows. Or when the text in an image is unreadably small on the phone (like in the screenshots in this post). Or when I need background information because I don’t recognize the people depicted, whether due to my prosopagnosia or my ignorance. Or when I take the train through a dead zone and instead of the pictures I see only the image descriptions.
I expected an unfavorable ratio of described to undescribed images, but that the technology diary is missing this many descriptions surprises me. It also means that adding the image descriptions afterwards will take a very long time. Since posts written in the old Tumblr editor (before 2023) can only be edited in that editor, we have to open each post, switch to the HTML view, and insert the alt tag with the image description by hand in the right place. While writing the description, you can’t see the picture. It’s exactly the same in the new Tumblr editor, where the input field inconveniently covers the image. So you definitely need a second screen on which the image stays visible; I use my cell phone for this.
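For posts from the old editor, the manual fix boils down to adding an `alt` attribute to each `img` tag in the HTML view. A sketch of what that edit looks like (the filename is made up for illustration, and the exact markup Tumblr generates will differ):

```html
<!-- Before: screen readers announce this only as "image" -->
<img src="kugelschreiber.jpg">

<!-- After: the alt attribute carries the description -->
<img src="kugelschreiber.jpg"
     alt="Four-color ballpoint pen in original packaging, next to a postcard and a bag of refills">
```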
Because of this inconvenience, and because recently there has been great progress in image description using image analysis and large language models, I am now trying out several tools that supposedly do this automatically. I didn’t do any extensive research into which one to use, it’s just the first few Google hits on the topic.
I test with the last picture used in the technology diary, from this article by Oliver Laumann.
[Image]
Oliver wrote the following description of the image:
“Four-color ballpoint pen in original packaging, back of a postcard with the text ‘I left the original refills in because they were original, but here are new ones. Greetings, Kathrin’, small zip-lock bag with four different colored ballpoint pen refills”.
It is certainly not easy to describe this image helpfully and correctly if you are not Oliver Laumann but a machine. Still, the difficulty of the task seems to me fairly representative of the other pictures in the technology diary.
With the “Free AI Image Alt Text Generator” I can choose between several languages:
[Screenshot]
There are also different description styles:
[Screenshot]
I choose “Casual”. The result is not very helpful:
[Screenshot]
Things don’t get any better in the “Academic” style:
[Screenshot]
The other description styles also lead to only minimal variations of “Something with colored pencils, perfect for creative projects!”
The “AI Alt Text Generator” does not offer me a language selection in the test version and suggests: “A package of colored pencils with a note on it.” At least the handwritten note was recognized here. However, the description does not contribute anything to understanding the article.
The “Alt Text Generator” says: “a box of pens and a package of paper”.
Some other tools apparently use the same technique internally and also say “a box of pens and a package of paper”.
I consider it possible that there are paid services that generate better image descriptions, but within the limits of my patience I can’t find a way to try them out for free. And I don’t want to create a test account only to find out that they can’t do it either.
Perhaps image descriptions are just a bridging technology, and soon screen readers will be able to recognize and describe the images themselves. Here I have a thought that I’ll record because I so rarely think it: an image description by a human, ideally one with knowledge about the image that isn’t contained in the image itself, will always be better than an automatically generated one. At the same time, I distrust this idea on principle, because “no technology will ever do anything as well as a human” is a statement that has often proved historically wrong. In summary: I have no idea. But let this ignorance of 2024 be recorded.
In any case, we will have to come up with and add the 5,300 missing descriptions ourselves. That won’t happen in time for this year’s book edition. But the technology diary is a long-term project.
(Kathrin Passig)