Publication: Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition.