Why is my neural network slower on MPS (Apple Silicon) than on CPU?

I am training on a MacBook with an M2 chip.

The code runs correctly on both the CPU and the GPU (MPS), but it is much slower on the GPU. I moved both my data and my model to the GPU, and that seemed to work.

I timed my code. When the following function "train" is called, the training loop inside it runs extraordinarily slowly.

def train(net, device, train_features, train_labels, test_features, test_labels,
          num_epochs, learning_rate, weight_decay, batch_size):
    
    train_ls, test_ls = [], []
    # Build the mini-batch iterator over the training set
    train_iter = d2l.load_array((train_features, train_labels), batch_size, device)
    # Adam optimizer with weight decay
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate, weight_decay=weight_decay)
    for epoch in range(num_epochs):
        for X, y in train_iter:
            optimizer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            optimizer.step()
        # Record the training log-RMSE once per epoch
        train_ls.append(log_rmse(net, train_features, train_labels))
    return train_ls, test_ls
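
For reference, here is a minimal sketch of how a timing like the one described above could be taken so that the CPU and GPU numbers are comparable. The helper name time_call and its sync parameter are hypothetical and not part of my original script. The idea is that GPU work may be queued asynchronously, so the clock should only be read after the device has finished.

```python
import time


def time_call(fn, sync=None, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to fn().

    `sync` is an optional zero-argument callable (an assumption of this
    sketch) that blocks until queued device work has finished. It is called
    before the clock starts and again before it stops, so that asynchronous
    work launched by fn() is included in the measurement.
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain any earlier work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for work launched by fn() to complete
        best = min(best, time.perf_counter() - start)
    return best
```

On the CPU, sync can be left as None. On MPS, I would expect to pass torch.mps.synchronize (available in recent PyTorch releases) as sync, and wrap one epoch of the loop in train as fn.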