This paper describes a method for enabling Intelligent Multimedia Interactive Systems based on ontological interaction with video clips shown on ubiquitous devices such as computer monitors, mobile phones, or tablets. We use a layered representation based on semantic texton forests to obtain spatiotemporal object attributes. The interface is created by extracting object information from the video through human-based computation. This yields richer attribute semantics and helps bridge the semantic gap between the words describing an image and its visual features. Users can navigate and manipulate objects displayed in the video by associating them with semantic attributes and comments, which are evaluated through data and sentiment extraction. Folksonomy tags are extracted from users' comments and used in a dynamically driven system (Folksodriven). We present example applications of the proposed method, such as advertising embedded in the objects displayed in a video and an interface for video navigation based on objects of interest.