A Transformer-Based Multimodal Framework for Enhanced Autism Spectrum Disorder Diagnosis

1) We propose a multimodal framework that integrates medical imaging and clinical textual data to improve ASD identification performance.
2) We introduce a classification token mechanism into the Vision Transformer to enhance feature representation and the final classification decision.
3) We apply hyperparameter optimization techniques to improve model efficiency, generalization, and overall performance.
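
To make the classification token mechanism in contribution 2 concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It prepends a single [CLS] vector to a sequence of image patch embeddings and clinical text embeddings, applies one self-attention step, and reads the fused representation from the [CLS] position. The fusion by concatenation, the function names (`self_attention`, `fuse_with_cls`), the embedding width, the token counts, the identity query/key/value projections, and the zero-initialised [CLS] vector are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the token vectors themselves
    (identity projections); a trained model would learn separate projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return outputs


def fuse_with_cls(cls_token, image_patches, text_tokens):
    """Prepend the [CLS] token to both modalities and return its attended output."""
    sequence = [cls_token] + image_patches + text_tokens
    attended = self_attention(sequence)
    return attended[0]  # position 0 holds the [CLS] token


# Hypothetical sizes: 4 image patch embeddings, 3 text token embeddings, width 8.
rng = random.Random(0)
DIM = 8
image_patches = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
text_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
cls_token = [0.0] * DIM  # learnable in practice; zero-initialised here (assumption)

fused = fuse_with_cls(cls_token, image_patches, text_tokens)
print(len(fused))  # width of the fused [CLS] representation
```

Because the [CLS] vector attends over every image and text token, its output summarises both modalities in one fixed-size vector that a classification head can consume. With the zero initialisation assumed here, the attention weights are uniform, so the output is simply the mean of the sequence; a trained [CLS] embedding and learned projections would weight the tokens unevenly.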