Recent studies have explored collaborative Transformer-based inference in edge computing (EC), but they often overlook the mobility of users and edge devices, which can undermine reliability. This paper aims to minimize inference latency in mobile edge computing (MEC) while accounting for heterogeneity in mobility, computation, and communication. We propose a task partitioning model based on the GPipe scheme for Transformer-based inference. We then formulate the task partitioning and offloading problem under computation-resource and mobility constraints, and decompose it into a bin-packing problem and an integer optimization problem. To solve these subproblems, we introduce the Distributed Aggregated Competition Algorithm (DACA). Extensive simulations and testbed experiments demonstrate that the proposed algorithm substantially reduces inference latency across heterogeneous mobile edge devices and networks.
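To make the GPipe-style partitioning concrete, the following is a minimal, illustrative Python sketch of the two core ideas it relies on: splitting consecutive Transformer layers into contiguous pipeline stages and splitting a batch into micro-batches that flow through those stages. The layer costs, the greedy balancing heuristic, and all function names here are hypothetical assumptions for illustration only; they are not the paper's actual partitioning model or the DACA algorithm.

```python
# Illustrative sketch of GPipe-style pipeline partitioning for Transformer inference.
# The cost values and the greedy heuristic are assumptions, not the paper's method.

from typing import List


def partition_layers(layer_costs: List[float], num_stages: int) -> List[List[int]]:
    """Greedily split consecutive layers into `num_stages` contiguous stages
    so that each stage's total cost is close to the per-stage average."""
    target = sum(layer_costs) / num_stages
    stages, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        remaining_layers = len(layer_costs) - i - 1
        remaining_stages = num_stages - len(stages) - 1
        # Close the stage once it reaches the target, but keep enough layers
        # so every remaining stage still gets at least one layer.
        if acc >= target and remaining_stages > 0 and remaining_layers >= remaining_stages:
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages


def micro_batches(batch: List, num_micro: int) -> List[List]:
    """Split a batch into micro-batches that are pipelined through the stages."""
    size = max(1, len(batch) // num_micro)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


if __name__ == "__main__":
    # Hypothetical per-layer costs for a 12-layer Transformer split across 3 devices.
    costs = [1.0, 1.2, 1.1, 1.0, 1.3, 1.2, 1.1, 1.0, 1.4, 1.2, 1.1, 1.0]
    print(partition_layers(costs, num_stages=3))   # e.g. [[0..4], [5..8], [9..11]]
    print(micro_batches(list(range(8)), num_micro=4))
```

In an MEC setting, each stage produced by such a partition would be assigned to one edge device, and the micro-batches would be what keeps downstream devices busy while upstream ones process later inputs; how stages are sized and mapped under mobility and resource constraints is precisely what the formulated optimization problem and DACA address.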