Given that you would need an extremely large labeled dataset to start working with actions, behaviours, or body language directly, the only realistic way to start would be with a language model. (And while these are sometimes decent, they’re absolutely terrible at other times, and there’s generally no way to tell which is which.)