The mother’s arm is moving upwards: how are we able to see in her gesture the tender sweetness leading her to caress the cheek of her sleeping child, or the inner violence preparing to strike the soldier’s cheek? How can we foresee, in the murderer’s heavy hand movement, the slight faltering that reveals his fragile uncertainty in finishing his task? How can we measure the hesitation of the woman’s arms closing to protect her crying child, in resonant dialogue with the bodies moving around her?

Understanding, measuring and predicting the qualities of movement imply a dynamic cognitive relation with a complex, non-linearly stratified temporal dimension. Movements are hierarchically nested: a gesture sequence has a layered structure, from high-level layers down to more and more local components (as in language, from syntax to phonemes), where every layer influences and is influenced by every other (bottom-up/top-down). Every layer is characterized by a different temporal dimension: a proper rhythm, from macro to micro temporal scales of action. This organization applies not only to action execution but also to action observation, and it underlies the unique human ability to understand and predict conspecific gestural qualities. Human skill in understanding and predicting gestural qualities, and in attempting to influence one another’s actions, depends on the capacity to create intercrossing relations between these different temporal and spatial layers through feedforward/feedback connections and bidirectional causalities, with the body as a timekeeper coordinating different internal, mental and physiological clocks.

In 1973, Johansson showed that the human visual system can perceive the movement of a human body from a limited number of moving points. This landmark study laid the scientific foundations of current motion capture technologies. Recent studies have shown that the information contained in such a limited number of moving points does not concern only the activity performed, but can also provide hints about more complex cognitive and affective phenomena: for example, Pollick (2001) showed that participants can infer emotional categories from point-light representations of everyday actions. Studies using naturalistic images and videos have established how fluent we are in body language (de Gelder, 2016).
Very few studies consider the temporal dynamics of the stimulus, how affective qualities may be perceived faster than other qualities (Meeren et al., 2016), and how such qualities may be interlinked and change over time. In other words, time is a crucial variable for these processes. The relevant time intervals are those of human perception and prediction, i.e., a human time, which integrates time at the neural level up to time at the level of narrative structures and content organization. Current technologies either do not deal with such a human time or do so in a rather empirical way: motion capture technologies are most often limited to the computation of kinematic measures whose time frame is usually too short for an effective perception and prediction of complex phenomena. While considerable effort is being spent on improving such technologies towards more accurate and more portable systems (e.g., wearable and wireless), these developments are incremental with respect to a conceptual and technological paradigm that remains unchanged. Furthermore, most systems for gesture recognition or for the analysis of emotional content from movement data streams adopt time processing windows whose duration is fixed and usually empirically determined. EnTimeMent proposes a radical change of paradigm and technology in human movement analysis, where the time frame for analysis is grounded in novel neuroscientific, biomechanical, psychological, and computational evidence, and dynamically adapted to the human time governing the phenomena under investigation. This is obtained through innovative, scientifically grounded and time-adaptive technologies that can operate at multiple time scales in a multi-layered approach, transforming the current generation of motion capture and movement analysis systems and endowing them with a completely novel functionality: a novel generation of time-aware multisensory motion perception and prediction systems.
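To make the contrast concrete, the toy Python sketch below (an illustration of ours, not EnTimeMent technology) segments a one-dimensional speed signal in two ways: with conventional fixed-duration windows, and with a hypothetical time-adaptive rule that closes a window only once a fixed amount of movement "energy" has accumulated, so that window length follows the dynamics of the movement itself:

```python
import numpy as np

def fixed_windows(n_samples, win):
    """Conventional fixed-duration analysis windows."""
    return [(start, min(start + win, n_samples))
            for start in range(0, n_samples, win)]

def adaptive_windows(speed, energy_per_window):
    """Toy time-adaptive alternative: close a window once enough movement
    'energy' (integrated speed) has accumulated, so windows shrink during
    fast movement and stretch during near-stillness."""
    bounds, start, acc = [], 0, 0.0
    for i, v in enumerate(speed):
        acc += abs(v)
        if acc >= energy_per_window:
            bounds.append((start, i + 1))
            start, acc = i + 1, 0.0
    if start < len(speed):
        bounds.append((start, len(speed)))  # remainder
    return bounds

# Synthetic speed profile: a fast burst followed by near-stillness.
speed = np.concatenate([np.full(50, 2.0), np.full(150, 0.2)])
print(fixed_windows(len(speed), 50))   # four equal 50-sample windows
print(adaptive_windows(speed, 25.0))   # short windows in the burst, long ones after
```

A real time-adaptive system would of course be driven by the neuroscientific and biomechanical evidence discussed above rather than by a single energy threshold; the sketch only illustrates how window boundaries can become a function of the data instead of being fixed in advance.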
The goal of EnTimeMent is to develop the scientific knowledge and the enabling technologies for such a transformation. EnTimeMent will enable novel forms of human-machine interaction (human-machine affective synchronization) and human-to-human entrainment experiences, primarily concerned with the non-verbal, embodied and immersive, active and affective dimensions of qualitative gesture. For instance, novel machine learning techniques inspired by deep networks (Bengio, 2015) might afford the detection of patterns in motion, representing motion at gradually increasing levels of abstraction, using Empirical Mode Decomposition (Huang et al., 1998) to automatically extract the local characteristics and time scales of the data. This will effectively reduce the originally high-dimensional and redundant raw sensor observations to something more manageable for the inference of, for example, emotions. The experimental platform will integrate signals from sensors capturing movement in relation to respiration, heart rate, muscle activity, sound, and brain activity across multiple time scales. By these means, the main technological breakthrough of EnTimeMent will be to promote novel perspectives on understanding, measuring and predicting the qualities of movement (at individual and group level) in motion capture, multisensory interfaces, wearables, affective and IoT technologies.
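Empirical Mode Decomposition itself can be illustrated with a deliberately minimal Python sketch of the sifting procedure of Huang et al. (1998); the fixed number of sifting passes and the crude endpoint handling are simplifying assumptions of ours, not the algorithm's standard stopping criteria:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope(x, t, extrema_idx):
    # Cubic-spline envelope through the extrema; the two end points are
    # added as extra knots so the spline covers the whole signal (a crude
    # but common way to handle the boundary problem).
    idx = np.concatenate(([0], extrema_idx, [len(x) - 1]))
    return CubicSpline(t[idx], x[idx])(t)

def sift_imf(x, n_sift=8):
    """Extract one Intrinsic Mode Function with a fixed number of sifting passes."""
    t = np.arange(len(x), dtype=float)
    h = np.asarray(x, dtype=float).copy()
    for _ in range(n_sift):
        maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema: h is a residual trend, not an IMF
        mean_env = 0.5 * (_envelope(h, t, maxima) + _envelope(h, t, minima))
        h = h - mean_env  # subtract the local mean, keeping the oscillation
    return h

def emd(x, max_imfs=4):
    """Decompose x into IMFs (fastest oscillations first) plus a residual."""
    imfs, residual = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        imf = sift_imf(residual)
        imfs.append(imf)
        residual = residual - imf
        if np.ptp(residual) < 1e-10:
            break
    return imfs, residual

# A movement-like signal: a slow 0.3 Hz sway plus a faster 5 Hz tremor.
t = np.linspace(0, 10, 1000)
signal = np.sin(2 * np.pi * 0.3 * t) + 0.3 * np.sin(2 * np.pi * 5 * t)
imfs, residual = emd(signal, max_imfs=2)
# imfs[0] captures mostly the fast 5 Hz component.
```

Mature implementations add proper sifting stopping criteria and boundary mirroring; the point here is only that each IMF carries its own characteristic time scale, extracted from the data rather than fixed in advance.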


The measurable objectives of EnTimeMent can be summarised as follows:


Scientific objectives:

Objective 1: To propose and empirically validate a neuro-cognitive model of the multiple, mutually interactive time scales that contribute to human perception of gesture qualities and action prediction, assuming an embodied cognitive experience of gestural qualities (in time and through time).

Objective 2: To develop computational models grounded on the neuro-cognitive model (Obj.1) for the automated detection, measurement, and prediction of movement qualities at individual as well as group level across different time scales.  

Objective 3: To analyse music performers’ movements synchronized at different temporal scales in ecological music performance, to inspire the design of computational models (Obj. 2) and their applications in the use case scenarios (Obj. 7), and to develop and validate a movement sonification framework that enhances the perception and communication of movement across different temporal scales.

Objective 4: To design controlled and ecological experiments to iteratively refine, validate, and evaluate the proposed conceptual framework and computational models, leveraging several test-beds that afford addressing prediction in different scenarios. Benefiting from an interdisciplinary research consortium, these scenarios will range from action prediction in a controlled laboratory setting, to prediction in dyadic human-human and human-robot interaction, to prediction in small-group interaction.

Technological objectives:

Objective 5: Developing computational methods for the analysis and prediction of movement qualities from behavioral signals, based on multi-layer parallel processes at non-linearly stratified temporal dimensions. As a result, we expect the project to produce a library of software modules for the EnTimeMent platform, providing the tools needed to enable automatic measurement, prediction, and entrainment of movement qualities at both the individual and group level.

Objective 6: Supporting experiments and proofs-of-concept through a hardware and software platform (the EnTimeMent experimental platform), towards the experimentation of novel generations of time-aware motion perception and prediction systems, by enabling: (i) real-time and synchronized data capture from multiple and different input devices; (ii) automated measurement, processing, and prediction of qualities of movement exploiting the theoretical conceptual framework, and in particular the multiple non-linearly stratified temporal dimensions; (iii) delivery of multisensory feedback through multiple output devices, including sonification. The platform needs to be scalable to different scenarios, ranging from controlled laboratory environments to ecological, realistic settings. Besides common input devices such as video cameras and wireless on-body microphones, the EnTimeMent platform will support commercial motion capture systems as well as prototypes, Inertial Measurement Units, and devices for real-time physiological data acquisition (e.g., EMG and Functional Near-Infrared Spectroscopy [fNIRS]).

Objective 7: Developing three proofs-of-concept exploiting both the envisaged platform and libraries in three different real-world scenarios: healing and support of everyday life in disabled persons, human interaction with non-anthropomorphic robots (living architecture), and sensory-motor entrainment and commitment in sport, dance and music. The three proofs-of-concept are selected to cover a broad variety of real-life contexts, so that they can fully demonstrate the functionalities of the developed technologies. The expected result is to demonstrate the effectiveness of the technology from the perspective of its broad future exploitation.

Community-building objectives:

Objective 8: Building a novel interdisciplinary scientific community leading the transformation of the current generation of motion capture systems into a novel generation of time-aware motion perception and prediction systems. Disciplines include computer science and engineering, human movement and sport sciences, biomechanics, neuroscience, cognitive science, psychology, and the performing arts. Computer science can help ground conceptual models in strong quantitative evidence, while the interdisciplinary framework will ground computational methods on stronger theoretical bases. EnTimeMent will strengthen interactions between these research areas by means of cross-disciplinary collaborations and public community-building initiatives (e.g., in existing as well as new conferences, workshops, and scientific contests) aiming at identifying common research challenges and at addressing them through a shared research methodology.

Objective 9: Building an interdisciplinary community of scientists and technologists around the core concept of EnTimeMent, that is, technologies that foster time-aware motion perception and prediction at individual as well as group level, by involving key stakeholders such as startup incubators and innovation hubs. Such a community will, on the one hand, address a natural but strongly innovative evolution of the motion capture and movement analysis research area dealing with adaptive and predictive interfaces, and, on the other hand, cut across technologies used in other areas, for example affective computing, sport technologies, therapy and rehabilitation, IoT, domotics, gaming, and entertainment.

Objective 10: Building a community of stakeholders, including companies and SMEs, interested in the concept of time-aware movement perception and prediction technologies and willing to bring such technologies into their services or products (e.g., companies willing to deploy innovative technological solutions for the EnTimeMent application scenarios). Such a community would enable the project to achieve a concrete societal and economic impact. Community-building actions will include public initiatives (e.g., hackathons, contests, user workshops) aiming at attracting, engaging and integrating the diverse types of stakeholders, including incubation ecosystems for SMEs, for a rich exchange and convergence between the project and external stakeholders.

The success of the project with respect to the objectives listed above will be assessed through studies involving selected user communities exposed to the project’s enabling technologies in the selected scenarios. Assessment will concern the effectiveness of the technology-supported activities; to this aim, a detailed collection of success measures will be identified and defined. Incubators and innovation hubs for ICT SMEs, and stakeholders in the fields of disability and sports technologies, will be active members of the Stakeholder Panel and will play active roles in co-organizing community-building initiatives; two major hospitals will concretely support the experimentation of the healing and chronic pain scenarios in everyday life, providing access to real user populations within a careful management of ethical issues.

EnTimeMent scenarios

Scenario 1: Healing with multiple times

Research question addressed: How can multi-time EnTimeMent technologies improve everyday communicative and prosocial skills in severely disabled persons with low or no verbal capabilities?

Narrative description: Giorgio is a 4-year-old child with a severe case of dystonic cerebral palsy. As a consequence, he is not able to control his movements, feed himself or effectively communicate with others, which threatens his learning and participation. His mother Francesca struggles with her child’s condition, which she has not yet fully accepted. Therapists know the situation is extremely complex, with the family living far from the hospital, in an area where social assistance and services are less effective.

In cases as severe as Giorgio’s, in which a complex rehabilitation project takes place while the person is still growing up, the healing process becomes a life project, with multiple aims and goals unfolding in multiple temporal dimensions. The rehabilitation team has set up a “home-care unit” that allows the family to live autonomously while Giorgio is monitored by the EnTimeMent technology, which reads Giorgio’s bodily communicative signals at multiple temporal scales in order to:

at a lower temporal scale, enhance movement control by interpreting bodily sensorimotor and muscular signals to regulate and better tune care and assistance, and improve emotion regulation by decoding affective bodily signals and providing the most appropriate responses and care;

at a middle temporal scale, foster attention and motivation by interpreting bodily signals of vigilance, focus and interest, improve control of bodily movements while learning daily-life activities such as drinking, and support emotion regulation by interpreting residual bodily intentional communication;

at a higher temporal scale, foster planning and control of complex activities such as play and social interaction by decoding participative signals, and enhance metacognitive processes by reading resilience signals.


Scenario 2: Chronic Musculoskeletal Pain Management with multiple times

Research question addressed: Can the modeling and enhancement of movement at diverse temporal scales in chronic musculoskeletal pain facilitate self-management and functional activity?

Narrative description: Since developing chronic back pain, Paul’s attention is constantly on pain and on fearing increased pain or damage. He no longer understands what his body can and cannot do, nor can he learn from experiencing movement. When he tackles physical tasks, his concern is about pain, so he rushes them, feels no satisfaction at completing them, and often feels defeated by increased pain. His central nervous system is ‘stuck’ on pain. The newly released EnTimeMent motion capture system captures his movements at different temporal scales through the day and sonifies them. Each temporal scale of movement carries a particular meaning. The low-level temporal dynamics of Paul’s taking a book from a shelf tell him how far he stretches and how much more he could move, constraining his anxiety. Longer temporal aspects of his movement, outside his awareness, represent task achievement (e.g., washing his car) and remind him to take regular breaks before pain builds up. Even longer temporal dynamics, over weeks and months, show him progress in his capabilities, while increased fluidity of his movement reflects reduced anxiety and increased confidence. This progress sustains him during setbacks and higher-pain days, as it reminds him how to build activity up again, while his central nervous system processes normal messages about movement. We will investigate new computational models that capture these different temporal dynamics of movement, along with the multiple temporal phenomena characterising them and their interactions (kinematics, muscle activity, respiration, brain activation), and transform them into sound enriched by metaphorical musical structures that, through embodiment, enhances the awareness of movement capability and provides a path for taking back control from pain. Such computational models will also aim to capture even lower temporal dynamics, including protective and avoidant movement responses (habits outside awareness) at physiological and neural levels that maintain pain.
We will aim to develop new metrics of progress (e.g., an individual motor signature, as in Słowiński et al., 2016) based on these different temporal scales. Such computational models can inform the design of rehabilitation for chronic pain self-management supported by the novel multi-temporal, multi-modal EnTimeMent motion capture technology. We will also explore how it can be used to facilitate communication between the person with chronic pain, her home carer and healthcare staff, supporting the learning of management skills.
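As a purely illustrative sketch of what "metrics at different temporal scales" could mean computationally (the specific features and window choices below are our assumptions, not project results), a single speed stream might be summarised as:

```python
import numpy as np

def multiscale_summary(speed, fs):
    """Summarise one movement stream (speed samples at fs Hz) at three
    nested temporal scales. The chosen scales and features are
    illustrative assumptions, not the project's actual feature set."""
    dt = 1.0 / fs
    # Short scale (sub-second): movement smoothness as mean absolute jerk.
    accel = np.diff(speed) / dt
    smoothness = float(np.mean(np.abs(np.diff(accel) / dt)))
    # Middle scale (seconds): fraction of time spent actively moving.
    active = speed > 0.1 * speed.max()
    activity_ratio = float(active.mean())
    # Long scale (the whole recording): linear trend of per-second mean speed.
    per_sec = speed[: len(speed) // fs * fs].reshape(-1, fs).mean(axis=1)
    trend = float(np.polyfit(np.arange(len(per_sec)), per_sec, 1)[0])
    return {"smoothness": smoothness,
            "activity_ratio": activity_ratio,
            "trend": trend}
```

On a slowly accelerating synthetic trace, for example, the long-scale trend comes out positive while the short-scale jerk term stays near zero, mirroring how Paul's weekly progress and his moment-to-moment smoothness would be read at different scales.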


Scenario 3: Dancing with multiple times

Research question addressed: How are the individual and group motor signatures of dancers characterized at multiple time scales during improvisation with each other and with non-anthropomorphic partners?

Narrative description: A group of dancers engaged in a performance with non-anthropomorphic dancing partners perform joint improvisation sequences, playing with social roles (leader, follower, improviser), emotional gesture qualities (hesitant, aggressive, etc.) and the physical/ghost presence of their dancing partners. The dancers perform in an architectural space that can be interactively manipulated by sound (interactive movement sonification, to enhance communication at multiple temporal scales) or light (e.g., in the light or in the dark), and that can even be structurally transformed by means of interactive mobile scenery (e.g., a technologically equipped theatre stage simulating a future living architecture). Using the EnTimeMent motion analysis system, biological and social signals are recorded at multiple time scales, from milliseconds (e.g., EMG) to seconds (e.g., motion capture) to minutes (e.g., breathing), allowing the computation of the Individual Motor Signature (IMS) of each dancer and the Group Motor Signature (GMS), together with their dynamics at their respective temporal scales, as well as entrainment, saliency, and prediction of qualities. Applying the new Curiosity-based EnTimeMent Learning Algorithm (CELA), relevant individual and social performance qualities are extracted simultaneously. Concrete instances of this group interaction scenario will be developed, from dance to sports, fitness, wellness, and entertainment group activities.
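One possible reading of the Individual Motor Signature can be sketched as follows (Słowiński et al., 2016, characterise it via the distribution of velocity profiles compared with the earth mover's distance; the code below is an illustrative simplification, not the published method):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def motor_signature(speed, bins=30, v_max=3.0):
    """Individual Motor Signature as a normalised histogram of speeds
    (a simplified reading of the velocity-profile distributions used by
    Słowiński et al., 2016)."""
    hist, _ = np.histogram(np.clip(speed, 0.0, v_max), bins=bins,
                           range=(0.0, v_max), density=True)
    return hist

def signature_distance(speed_a, speed_b):
    """Compare two movers via the earth mover's (Wasserstein) distance
    between their speed samples."""
    return wasserstein_distance(speed_a, speed_b)

# Two hypothetical dancers with different movement styles.
rng = np.random.default_rng(0)
slow_dancer = np.abs(rng.normal(0.5, 0.2, 5000))  # hesitant, low-speed mover
fast_dancer = np.abs(rng.normal(1.5, 0.4, 5000))  # energetic, high-speed mover
print(signature_distance(slow_dancer, fast_dancer))  # large: different signatures
print(signature_distance(slow_dancer, slow_dancer))  # 0.0: identical signatures
```

A Group Motor Signature could analogously be computed over the pooled speed samples of all dancers; CELA and the multi-scale dynamics are deliberately left out of this sketch.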