A key characteristic of video data is its spatial and temporal semantics, so a video model must capture the properties of objects and their relationships in time and space. Allen's 13 temporal interval relationships are often used to formulate queries involving temporal relationships among video frames. For spatial relationships, most approaches are based on projecting objects onto a two- or three-dimensional coordinate system. However, very few attempts have been made to represent formally the spatio-temporal relationships of objects contained in video data, or to formulate queries with spatio-temporal constraints. The purpose of our work is to design a model for specifying the spatio-temporal relationships among objects in video sequences. The model describes the spatial relationships among objects in each frame of a given video scene, together with the temporal relationships, for that frame, among the temporal intervals that measure the duration of these spatial relationships. It also models the temporal composition of an object, which reflects the evolution of the object's spatial relationships over subsequent frames in the video scene and in the entire video sequence. Our model also provides an effective and expressive way to specify distances among objects in digital video completely and precisely. The model serves as a basis for the annotation of raw video.
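To make the temporal side concrete, the following is a minimal sketch of how Allen's 13 interval relationships could be computed for two intervals, each taken to be the span of frames over which some spatial relationship holds. It is an illustration only, not the model proposed here: the `Interval` type, the `allen_relation` function, and the half-open frame-index convention are assumptions introduced for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical frame interval: start inclusive, end exclusive."""
    start: int
    end: int


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    if a.start >= a.end or b.start >= b.end:
        raise ValueError("intervals must have positive length")

    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"

    # From here on the two intervals share at least one frame.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


# Example: "object A is left of object B" holds on frames 0..9, and
# "object A touches object B" holds on frames 10..19.
left_of = Interval(0, 10)
touches = Interval(10, 20)
relation = allen_relation(left_of, touches)
```

Under the half-open convention assumed above, the first interval ends exactly where the second begins, so `relation` is `"meets"`. A different convention (for example, inclusive end frames) would shift the boundary between "before" and "meets".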