Performance issue on Macbook Pro M1

Question

System information

The script can be found below.
MacBook Pro M1 (Mac OS Big Sur 11.5.1)
TensorFlow installed from: source
TensorFlow version: 2.5 with Metal support
Python version: 3.9
GPU model and memory: MacBook Pro M1, 16 GB

Installation steps for TensorFlow with Metal support: https://developer.apple.com/metal/tensorflow-plugin/

I am trying to train a model on a MacBook Pro M1, but performance is very poor and training does not work properly: a single epoch takes a ridiculously long time.

Code to reproduce this behavior:

import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Embedding, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Model configuration
additional_metrics = ['accuracy']
batch_size = 128
embedding_output_dims = 15
loss_function = BinaryCrossentropy()
max_sequence_length = 300
num_distinct_words = 5000
number_of_epochs = 5
optimizer = Adam()
validation_split = 0.20
verbosity_mode = 1

# Disable eager execution
tf.compat.v1.disable_eager_execution()

# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
print(x_train.shape)
print(x_test.shape)

# Pad all sequences
padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>
padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>

# Define the Keras model
model = Sequential()
model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
model.add(LSTM(10))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)

# Give a summary
model.summary()

# Train the model
history = model.fit(padded_inputs, y_train, batch_size=batch_size, epochs=number_of_epochs, verbose=verbosity_mode, validation_split=validation_split)

# Test the model after training
test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')

I have noticed the same problem with LSTM layers.

This issue has also been reported to Keras, but they could not debug it.

Keras issue: https://github.com/keras-team/keras/issues/15003

tensorflow-metal


Posted by

OriAlpha


I tried for a few hours. Because training was so slow, I only trained for 1 epoch. Here is the log:
2021-07-26 23:09:28.130352: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.185390: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.217406: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.229984: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Epoch 1/1
20000/20000 [==============================] - loss: 0.5489 - accuracy: 0.6923
--- 6894.8485770225524902 seconds ---
A single epoch takes around 2 hours. That's a nightmare.

—
OriAlpha


Answer 1

It is not fair to archive the TensorFlow repo before fixing the issues in the code.

Posted by

OriAlpha


Answer 2

Hi @OriAlpha, we recommend upgrading to macOS 12.0 for the best support and performance of the Metal plugin. I tried the attached script with macOS 12.0 on an M1 machine and tensorflow-metal==0.1.2 (I recommend updating to the latest Metal plugin version), and got the following performance. Please let us know if that helps.

2021-08-24 23:20:50.927094: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
157/157 [==============================] - 46s 271ms/step - loss: 0.6877 - accuracy: 0.5416 - val_loss: 0.6579 - val_accuracy: 0.6034
Epoch 2/5
157/157 [==============================] - 38s 243ms/step - loss: 0.5634 - accuracy: 0.7459 - val_loss: 0.4508 - val_accuracy: 0.8192
Epoch 3/5
157/157 [==============================] - 38s 244ms/step - loss: 0.4140 - accuracy: 0.8303 - val_loss: 0.3805 - val_accuracy: 0.8410
Epoch 4/5
157/157 [==============================] - 38s 245ms/step - loss: 0.3474 - accuracy: 0.8609 - val_loss: 0.4135 - val_accuracy: 0.8380
Epoch 5/5
157/157 [==============================] - 39s 251ms/step - loss: 0.3075 - accuracy: 0.8814 - val_loss: 0.3535 - val_accuracy: 0.8554
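The two logs in this thread use different progress counters: the original script disables eager execution, and its log appears to count training samples (20000/20000), while the log above counts batches (157/157). A quick worked comparison, as a sketch under stated assumptions: it takes 25000 as the size of the Keras IMDB training split, assumes Keras keeps floor(n × (1 − validation_split)) samples for training, and uses 38 s as the typical epoch time from the log above.

```python
import math

# Values from the original script and the two logs in this thread.
TRAIN_REVIEWS = 25_000                 # size of the Keras IMDB training split (assumption)
VALIDATION_PERCENT = 20                # validation_split = 0.20, as an integer to avoid float rounding
BATCH_SIZE = 128                       # batch_size in the script
SLOW_EPOCH_S = 6894.8485770225524902   # single epoch reported on Big Sur
FAST_EPOCH_S = 38.0                    # typical epoch in the macOS 12.0 log

# Samples left for training after holding out the validation split.
train_samples = TRAIN_REVIEWS * (100 - VALIDATION_PERCENT) // 100
# Batches per epoch; the final partial batch still counts as one step.
steps_per_epoch = math.ceil(train_samples / BATCH_SIZE)

print("training samples:", train_samples)
print("steps per epoch:", steps_per_epoch)
print(f"slow epoch: {SLOW_EPOCH_S / 3600:.2f} h, {SLOW_EPOCH_S / steps_per_epoch:.1f} s/step")
print(f"fast epoch: {FAST_EPOCH_S / steps_per_epoch * 1000:.0f} ms/step")
print(f"slow/fast ratio: {SLOW_EPOCH_S / FAST_EPOCH_S:.0f}x")
```

With these inputs it prints 20000 training samples and 157 steps, which matches both counters, about 1.92 h (43.9 s per step) for the slow epoch, about 242 ms per step for the fast one (in line with the 243ms/step shown above), and a ratio of roughly 181x.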

Posted by

Frameworks Engineer

Do I have to upgrade to macOS 12.0 to fix this problem? 12.0 is currently still in beta.

—
Shaohan


Answer 3

I saw the same issue: over 7000 seconds per epoch and a lot of warning messages. Then I tried with tf.device("/gpu:0"), and each epoch took about 38 seconds. However, when I then tried with tf.device("/cpu:0"), each epoch took only about 7 seconds. So GPU performance is still awful.
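That CPU-versus-GPU comparison can be reproduced with a minimal sketch like the one below. Assumptions: TensorFlow 2.x running eagerly (the default, unlike the original script, which disables eager execution); random token IDs stand in for IMDB so nothing needs downloading; build_model and time_one_epoch are hypothetical helper names. tf.device pins the ops created inside its block; on a machine without a visible GPU, TensorFlow's soft device placement (on by default in TF 2.x) should fall back to the CPU, so the comparison is only meaningful where the Metal GPU is visible.

```python
import time

import numpy as np
import tensorflow as tf

VOCAB = 5000      # num_distinct_words in the original script
SEQ_LEN = 300     # max_sequence_length in the original script
N_SAMPLES = 2_048 # small so the sketch runs quickly; the original trains on 20000


def build_model():
    """Same layer stack as the original script: Embedding -> LSTM(10) -> Dense."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(SEQ_LEN,), dtype="int32"),
        tf.keras.layers.Embedding(VOCAB, 15),
        tf.keras.layers.LSTM(10),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def time_one_epoch(device, x, y, batch_size=128):
    """Build the model and train one epoch with ops pinned to `device`; return seconds."""
    with tf.device(device):
        model = build_model()
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        return time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical random data standing in for the padded IMDB reviews.
    x = rng.integers(0, VOCAB, size=(N_SAMPLES, SEQ_LEN), dtype=np.int32)
    y = rng.integers(0, 2, size=(N_SAMPLES,)).astype("float32")
    for device in ("/cpu:0", "/gpu:0"):
        print(device, f"{time_one_epoch(device, x, y):.1f} s")
```

Building the model inside the device scope keeps its variables on the same device as the training ops, which is closer to what pinning the whole run means than wrapping only the fit call.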

I have not yet found a neural net architecture where the M1 GPU is faster than the CPU. For matrix multiplication, the GPU can be 9x faster, but this does not carry over to network training.
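The matrix-multiplication comparison can be sketched the same way. Assumptions: TensorFlow 2.x in eager mode; time_matmul is a hypothetical helper; calling .numpy() on the last result is used to wait for queued device work before the clock stops (this assumes in-order execution). Timing a large square product next to a tiny one, roughly the size of the per-timestep products of an LSTM with 10 units at batch size 128, is one way to probe why a matmul speedup might not carry over to this model; any conclusion drawn from it is a hypothesis, not a diagnosis.

```python
import time

import tensorflow as tf


def time_matmul(device, rows, inner, cols, repeats=10):
    """Average seconds per (rows x inner) @ (inner x cols) product on `device`."""
    with tf.device(device):
        a = tf.random.normal((rows, inner))
        b = tf.random.normal((inner, cols))
        tf.matmul(a, b).numpy()  # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()  # wait for the queued products to finish (assumes in-order execution)
        return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    shapes = {
        "large 2048x2048 @ 2048x2048": (2048, 2048, 2048),
        "LSTM-sized 128x10 @ 10x40": (128, 10, 40),  # batch 128, 10 units, 4 gates
    }
    for label, (m, k, n) in shapes.items():
        for device in ("/cpu:0", "/gpu:0"):
            print(f"{label:30s} {device}: {time_matmul(device, m, k, n) * 1e3:.3f} ms")
```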

Based on other threads and on the comment above by an Apple engineer, it looks like the Apple team doesn't even realize how bad their TensorFlow speed is.

MacBook Air M1 (Mac OS 12 beta)
TensorFlow version: 2.5 with Metal support
Python version: 3.8
GPU model and memory: MacBook Air M1, 16 GB

Posted by

cantab


Answer 4

I have the exact same problem!! I started noticing really long training times for a simple BLSTM and decided to test the code above. I'm also using a MacBook Air M1 (Mac OS 12 beta), TensorFlow 2.5 with Metal support, Python 3.9, and 16 GB of memory. This completely undermines my work! Apple should do something!

Posted by

mrt77


Answer 5

Yep, for me both CPU and GPU performance are not good at all. A relatively simple CNN took about 7 minutes to train on a free Google Colab instance (with a K80), while the same model took about 30 minutes on the GPU and 42 minutes on the CPU with TF 2.6 on my Mac mini M1 (16 GB).

I have seen multiple posts from people experiencing the same issue, and the suggested solution always seems to be upgrading to macOS 12.0 or using the CPU (for smaller batch sizes), neither of which seems to fix the issue in most cases.

I would really expect Apple to come up with some solution to this. It has been a year since the M1 was released, and I am paying for third-party notebooks when I would expect a machine marketed as optimized for ML to run TF at least as fast as a free Colab notebook.

Posted by

10686142


Answer 6

Hello, I am still getting the same issue today, in 2022.

It seems the problem has never been solved... I will be starting a class on TensorFlow soon, and getting something this slow is just awful.

I have no choice but to use Google Colab.

Posted by

joe44


Answer 7

Any update as of 22/12/2022?

Posted by

chuongmep


Answer 8

Same issue in 2023

Posted by

TorbenXD


Answer 9

Apple doesn't seem to care about providing support for its M-series laptops. If your tasks are GPU-heavy ML workloads, use NVIDIA GPUs; those are the best and work out of the box.

Posted by

OriAlpha


Answer 10

I am having the same issue (4/14/2023). Not to mention that I still get the warning to use the legacy Adam optimizer (from keras.optimizers import Adam as AdamLegacy) to make my binary classifier work. Is there any update I should be aware of?
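Importing Adam under the alias AdamLegacy only renames the current optimizer; it does not select a different implementation. In TF 2.11 to 2.15 the legacy optimizer that the M1/M2 warning refers to lives at tf.keras.optimizers.legacy.Adam. A minimal sketch, assuming a TF 2.x install; make_adam is a hypothetical helper, and the fallback branch assumes (not verified here) that Keras 3 based releases (TF 2.16+) keep the legacy name but raise ImportError when it is constructed.

```python
import tensorflow as tf


def make_adam(learning_rate=1e-3):
    """Return the legacy Adam when this TF version provides it, else the current Adam."""
    legacy = getattr(tf.keras.optimizers, "legacy", None)
    if legacy is not None:
        try:
            return legacy.Adam(learning_rate=learning_rate)
        except ImportError:
            # Assumption: Keras 3 (TF 2.16+) keeps the name but refuses to build it.
            pass
    return tf.keras.optimizers.Adam(learning_rate=learning_rate)


# Tiny binary classifier compiled with whichever Adam make_adam() returned.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=make_adam(), loss="binary_crossentropy", metrics=["accuracy"])
print(type(model.optimizer).__module__, type(model.optimizer).__name__)
```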

Posted by

erezkatz


Answer 11

Also, I don't see a distribution for tensorflow-metal==0.12.0 (the latest version is 0.8.0). Where can I get it?
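The Frameworks Engineer's reply above mentions tensorflow-metal==0.1.2, not 0.12.0, so the version in question is older than 0.8.0. A minimal sketch for checking which versions are actually installed, using only the standard library (importlib.metadata, Python 3.8+); installed_version and release_tuple are hypothetical helpers, and release_tuple only handles plain dotted numbers such as 0.1.2 (use a full version parser for pre-releases).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(v):
    """Turn a plain dotted release such as '0.1.2' into (0, 1, 2) for comparison."""
    return tuple(int(part) for part in v.split("."))


for dist in ("tensorflow-macos", "tensorflow-metal", "tensorflow"):
    print(f"{dist}: {installed_version(dist)}")

# Compare numerically, not as strings: 0.1.2 is older than 0.8.0,
# while a 0.12.0 release would be newer.
print(release_tuple("0.1.2") < release_tuple("0.8.0"))   # True
print(release_tuple("0.12.0") > release_tuple("0.8.0"))  # True
```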

Posted by

erezkatz


Answer 12

Same issue with TensorFlow on the newest system: macOS 14.0 Beta (23A5286i). Please help us, dear Apple!

Posted by

quner


Answer 13

October 2023 and the issue is still there. After upgrading to macOS Sonoma, I can't get tensorflow-metal to behave well with a batch size of 128. I used to run at 64 just fine (it was speedy); now, with larger batches, I do see some (not great) performance improvements, but the model overfits with large batch sizes. I have read blogs with all sorts of suggestions, such as reverting to an older version of TF for Mac (I don't want to do that). One suggestion I saw in some posts is to disable the GPU altogether. Has anyone had any success with that?
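For the suggestion of disabling the GPU altogether, a minimal sketch, assuming TensorFlow 2.x with the Metal plugin installed: tf.config.set_visible_devices with an empty list hides every GPU from TensorFlow, so all ops run on the CPU. It must be called before TensorFlow initializes its devices (otherwise it raises a RuntimeError), so it belongs at the very top of the training script. Whether this is faster depends on the model, as the mixed CPU and GPU timings reported earlier in this thread show.

```python
import tensorflow as tf

# Hide all GPUs (including the Metal device) from TensorFlow for this process.
# This must run before any op or model touches a device.
tf.config.set_visible_devices([], "GPU")

print("Physical GPUs:", tf.config.list_physical_devices("GPU"))
print("Visible GPUs:", tf.config.get_visible_devices("GPU"))

# Anything built and run after this point is placed on the CPU.
x = tf.random.normal((4, 3))
print(tf.matmul(x, x, transpose_b=True).device)
```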

Posted by

erezkatz


Answer 14

Hey team, any update on this? I'm still having the issue with the following environment:
absl-py==1.3.0 aio-pika==8.2.3 aiofiles==22.1.0 aiogram==2.23.1 aiohttp==3.8.3 aiormq==6.4.0 aiosignal==1.3.1 APScheduler==3.9.1.post1 astunparse==1.6.3 async-timeout==4.0.2 attrs==22.1.0 Babel==2.9.1 bert-serving-client==1.10.0 bidict==0.22.1 boto3==1.26.136 botocore==1.29.136 CacheControl==0.12.11 cachetools==5.2.1 certifi==2023.7.22 cffi==1.15.1 charset-normalizer==2.1.1 click==8.1.3 cloudpickle==2.2.0 colorclass==2.2.2 coloredlogs==15.0.1 colorhash==1.2.1 confluent-kafka==1.9.2 cryptography==41.0.7 cycler==0.11.0 dask==2022.10.2 dnspython==2.3.0 docopt==0.6.2 fbmessenger==6.0.0 fire==0.5.0 flatbuffers fonttools==4.38.0 frozenlist==1.3.3 fsspec==2022.11.0 future==0.18.3 gast==0.2.1 google-auth==2.16.0 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 greenlet==3.0.3 grpcio==1.51.1 h5py==3.10.0 httptools==0.5.0 humanfriendly==10.0 idna==3.4 jmespath==1.0.1 joblib==1.2.0 jsonpickle==2.2.0 jsonschema==4.16.0 keras Keras-Preprocessing==1.1.2 kiwisolver==1.4.4 libclang==15.0.6.1 locket==1.0.0 magic-filter==1.0.9 Markdown==3.4.1 MarkupSafe==2.1.2 matplotlib==3.5.3 mattermostwrapper==2.2 msgpack==1.0.4 multidict==5.2.0 networkx==2.6.3 numpy==1.23.5 oauthlib==3.2.2 opt-einsum==3.3.0 packaging pamqp==3.2.0 partd==1.3.0 Pillow==9.4.0 pip==22.3.1 pluggy==1.0.0 prompt-toolkit==3.0.28 protobuf psycopg2-binary==2.9.5 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pydot==1.4.2 PyJWT==2.6.0 pykwalify==1.8.0 pymongo==4.0.1 pyparsing==3.0.9 pyrsistent==0.19.3 python-crfsuite==0.9.8 python-dateutil==2.8.2 python-engineio==4.3.4 python-socketio==5.7.2 pytz==2022.7.1 pytz-deprecation-shim==0.1.0.post0 PyYAML==6.0.1 pyzmq==25.0.0 questionary==1.10.0 randomname==0.1.5 rasa rasa-sdk redis==4.5.3 regex==2022.10.31 requests==2.28.2 requests-oauthlib==1.3.1 requests-toolbelt==0.10.1 rocketchat-API==1.28.1 rsa==4.9 ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.7 s3transfer==0.6.0 sanic==21.12.2 Sanic-Cors==2.0.1 sanic-jwt==1.8.0 sanic-routing==0.7.2 scikit-learn==1.1.3 scipy==1.12 sentry-sdk==1.11.1 setuptools==65.6.3 six sklearn-crfsuite==0.3.6 slack-sdk==3.19.5 SQLAlchemy==1.4.46 tabulate==0.9.0 tarsafe==0.0.3 tensorboard==2.9 tensorboard-data-server tensorboard-plugin-wit==1.8.1 tensorflow-macos==2.9 tensorflow-metal==0.5.0 tensorflow-addons==0.18.0 tensorflow-estimator==2.9 tensorflow-hub==0.13.0 tensorflow-io-gcs-filesystem==0.36.0 tensorflow-text termcolor==2.2.0 terminaltables==3.1.10 threadpoolctl==3.1.0 toolz==0.12.0 tqdm==4.64.1 twilio==7.14.2 typeguard==2.13.3 typing_extensions==4.4.0 typing-utils==0.1.0 tzdata==2022.7 tzlocal==4.2 ujson==5.7.0 urllib3==1.26.14 uvloop==0.17.0 wcwidth==0.2.6 webexteamssdk==1.6.1 websockets==10.4 Werkzeug==2.2.2 wheel==0.38.1 wrapt==1.14.1 yarl==1.8.2

Posted by

JuanAmay
