

Many tasks in organizations are performed in a desktop environment. Users’ interactions in such an environment can be recorded by taking a screenshot whenever an action happens, which yields an interaction log. From the images associated with each record, it is possible to detect which activity was performed and which activities were enabled. Extracting this information results in a translucent event log, which is valuable as input for dedicated process-mining techniques. The results can be used to analyze human-computer interactions or to create bots for robotic process automation. However, current techniques for extracting information on enabled activities rely on template matching, which is rigid and sensitive to variations. To solve this issue, we present our modular framework, ActivityGen. ActivityGen detects and labels graphical user interface elements by also considering additional information. It uses more advanced techniques to overcome the limitations of previous approaches, can extract information without a user’s input, and can be adjusted to a user’s needs. Compared with state-of-the-art techniques, it detects graphical user interface elements more accurately and labels them faster, more robustly, and in a more domain-oriented way.
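
To make the notion of a translucent event log concrete, the following minimal sketch shows one possible in-memory representation: each event records the performed activity together with the set of activities that were enabled when it happened. This is an illustration only, not ActivityGen's implementation. The names `TranslucentEvent` and `to_translucent_event`, the field layout, and the sample labels are all hypothetical, and the sketch assumes that the performed activity always counts as enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslucentEvent:
    """One record of a translucent event log (hypothetical layout)."""
    case_id: str          # identifier of the recorded session
    timestamp: str        # ISO 8601 time at which the screenshot was taken
    activity: str         # the activity that was actually performed
    enabled: frozenset    # labels of all activities enabled at that moment


def to_translucent_event(case_id, timestamp, performed, detected_labels):
    """Combine a performed activity with the labels detected in its screenshot.

    Assumption: the performed activity is always treated as enabled, so it is
    added to the enabled set even if the detector missed it.
    """
    enabled = frozenset(detected_labels) | {performed}
    return TranslucentEvent(case_id, timestamp, performed, enabled)


# Illustrative data only; these labels do not come from the paper.
log = [
    to_translucent_event("session-1", "2024-01-01T09:00:00",
                         "Open File", ["Open File", "New File", "Exit"]),
    to_translucent_event("session-1", "2024-01-01T09:00:05",
                         "Save", ["Print", "Close"]),
]
```

In the second event the hypothetical detector did not report "Save", yet the enabled set still contains it, so every event in the log satisfies the assumption stated above.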