TensorFlow on M1 MacBook Pro: error when model.fit executes

Question

It doesn't matter whether I install Miniforge or Mamba, directly or through Homebrew: whenever I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, or even a simple Sequential model, I always get this error.

Is there any workaround for this? I'd appreciate any help, thanks!

2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67   filtered_tb = _process_traceback_frames(e.__traceback__)
     68   # To get the full stack trace, call:
     69   # tf.debugging.disable_traceback_filtering()
---> 70   raise e.with_traceback(filtered_tb) from None
     71 finally:
     72   del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
  app.launch_new_instance()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
  app.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
  self.io_loop.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
  self.asyncio_loop.run_forever()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
  self._run_once()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
  handle._run()
...

File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
  history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
  return fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
  tmp_logs = self.train_function(iterator)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
  return step_function(self, iterator)
......

File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
  outputs = model.train_step(data)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
  self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
  self.apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
  return super().apply_gradients(grads_and_vars, name=name)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
  iteration = self._internal_apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
  return tf.__internal__.distribute.interim.maybe_merge_call(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
  distribution.extended.update(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
  return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x28edf1f90
	 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Posted by

ppobar

Same question here. Is it because of the system version? I'm on macOS 13.0.1 Ventura.

—
zhouxiaoliang
Were you able to find a workaround for this? I'm having exactly the same problem now.

—
CWcx

Answer 1

I dropped back to the following versions: tensorflow-macos==2.9 and tensorflow-metal==0.5.0. I was using tensorflow-macos==2.11 and tensorflow-metal==0.7.0 and just couldn't get things to work. After dropping back, I was able to use the GPU and all my validations worked. I'll check back later to see whether a more current version works.

Posted by

dweilert

@dweilert Your solution works for me (macOS 13.0.1, M1 Max, 64 GB)! Thanks a lot!

—
bseo
I am new to this. Can you please post a step-by-step description of how you did it? I tried to downgrade using pip install and everything crashed. Thanks.

—
BBond
Thank you so much, I was able to solve this problem easily.

—
BaeSumok
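Since every fix in this thread hinges on exact package versions, it can help to confirm what is actually installed in the environment your notebook uses, before and after downgrading. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); it assumes the pip distribution names tensorflow-macos and tensorflow-metal, and conda-only packages such as tensorflow-deps are probably not visible to it:

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names discussed in this thread; conda-only metapackages
# such as tensorflow-deps have no pip metadata, so they are left out
PACKAGES = ("tensorflow-macos", "tensorflow-metal")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter as the Jupyter kernel; if it reports 2.11.0 and 0.7.0, you are on the combination most posters here had trouble with.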

Answer 2

I don't fully understand the error message, but here is what I did to get TensorFlow working on my MacBook Pro (M1 Pro); hopefully it helps. First, I removed all my previous Anaconda installations (see the tutorial at https://docs.anaconda.com/anaconda/install/uninstall/). Then I followed every single step in the link you attached (https://developer.apple.com/metal/tensorflow-plugin/). Make sure you use Miniconda instead of Anaconda; whether you use the bash or the graphical installer should not matter. I tried Anaconda and it didn't work well because of package conflicts. I was able to run the sample model in Step 4 without a problem.

Posted by

rollingneuron

Answer 3

I have the same error. I used Miniconda and still get it. It happens in model.fit: everything before that looks normal, but as soon as I run model.fit I get many warnings followed by an error.

NOT_FOUND: could not find registered platform with id: 0x282f9b6f0
2022-12-10 23:22:22.325619: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x282f9b6f0

NotFoundError                             Traceback (most recent call last)
Cell In[27], line 13
     11 loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
     12 model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
---> 13 model.fit(x_train, y_train, epochs=5, batch_size=64)

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67   filtered_tb = _process_traceback_frames(e.__traceback__)
     68   # To get the full stack trace, call:
     69   # tf.debugging.disable_traceback_filtering()
---> 70   raise e.with_traceback(filtered_tb) from None
     71 finally:
     72   del filtered_tb

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Node: 'StatefulPartitionedCall_212'
could not find registered platform with id: 0x282f9b6f0
	 [[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_25966]

Posted by

MLTracer

Answer 4

I am facing exactly the same problem. When I install TensorFlow and test it by running the "MNIST dataset" code example, everything is fine except the model.fit call. I hope someone can help solve this.

Posted by

Cmffff

Answer 5

Yes, I'm facing the same problem. I guess you are using tensorflow-macos 2.11.0 and tensorflow-metal 0.7.0.

From my understanding, the problem is the 'conda install -c apple tensorflow-deps' step from the instructions at https://developer.apple.com/metal/tensorflow-plugin/. That step still installs tensorflow-deps 2.9.0 (https://anaconda.org/apple/tensorflow-deps/files). I had trouble even running tensorflow-macos 2.10.0 on my Mac and had to downgrade to tensorflow-macos 2.9 (to match tensorflow-deps 2.9.0). Also, according to the conda link, Apple has not published a tensorflow-deps 2.11.0 yet. Hopefully this gets fixed soon.

Posted by

Imperfect8747

Actually, tensorflow-macos 2.10.0 and tensorflow-metal 0.6.0 work fine with tensorflow-deps 2.9.0.

—
Alexander2022
tensorflow-deps 2.9.0 only declares "grpcio >=1.37.0,<2.0", "h5py >=3.6.0,<3.7", "numpy >=1.22.3,<1.23.0" and "python", so the problem is most likely something else.

—
Alexander2022
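To check whether an environment actually satisfies the numpy pin that tensorflow-deps 2.9.0 declares (numpy >=1.22.3,<1.23.0, as listed in the comment above), a small stdlib-only comparison is enough. This is a sketch: parse_version is a hypothetical helper that only understands plain dotted release numbers and ignores suffixes such as rc1; the packaging library's version handling is more robust for real use.

```python
import re

# numpy >=1.22.3,<1.23.0, the pin declared by tensorflow-deps 2.9.0
NUMPY_PIN = ((1, 22, 3), (1, 23, 0))


def parse_version(text):
    """Parse the leading dotted release number, e.g. '1.22.4' -> (1, 22, 4).

    Anything after the numeric part (such as 'rc1') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def within_pin(installed, lower, upper):
    """True if lower <= installed < upper, comparing version tuples."""
    return lower <= parse_version(installed) < upper


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        ok = within_pin(numpy.__version__, *NUMPY_PIN)
        print(f"numpy {numpy.__version__} satisfies the tensorflow-deps pin: {ok}")
```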

Answer 6

Same error!!

M1 Max

macOS Ventura 13.1, tensorflow-metal 0.7.0, tensorflow-macos 2.11.0

Metal device set to: Apple M1 Max
systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2022-12-11 18:48:12.915462: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-11 18:48:12.915489: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2022-12-11 18:48:13.971037: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/5
/Users/macstudio/miniconda/envs/tf/lib/python3.10/site-packages/keras/backend.py:5585: UserWarning: "sparse_categorical_crossentropy received from_logits=True, but the output argument was produced by a Softmax activation and thus does not represent logits. Was this intended?"
  output, from_logits = _get_logits(
2022-12-11 18:48:19.160047: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-11 18:48:20.283908: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x12c5c26d0
2022-12-11 18:48:20.283938: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x12c5c26d0

Posted by

Alexander2022

Here is what is working on my M1 Pro. Hope that helps.

System: macOS Ventura 13.0.1 (22A400)

Conda list:
tensorboard               2.10.1    pypi_0    pypi
tensorflow-deps           2.9.0     0         apple
tensorflow-estimator      2.10.0    pypi_0    pypi
tensorflow-macos          2.10.0    pypi_0    pypi
tensorflow-metal          0.6.0     pypi_0    pypi

—
rollingneuron

Answer 7

Hello all, I was also facing the same problem; then I installed the versions recommended at https://developer.apple.com/metal/tensorflow-plugin/. For tensorflow-macos that is currently 2.9, and for tensorflow-metal it is currently 0.5. With those, I was able to use my GPU. I hope this helps.

Posted by

utkusaglm

For tensorflow-macos, it is currently 2.9. For tensorflow-metal, it is currently 0.5. All working now! Great! Push this to the top!

—
KuiWangCAM

Answer 8

You need to use the tensorflow-metal version 0.5.0. See the version table on https://developer.apple.com/metal/tensorflow-plugin/.

Install the proper version with:

python -m pip install tensorflow-metal==0.5.0

Posted by

jirihybek
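The tensorflow-macos / tensorflow-metal pairings reported in this thread can be captured in a small lookup, which makes it harder to pair one package with the wrong release of the other. This is a sketch: the table only records what posters here reported, and it is an assumption that it agrees with Apple's official version table, which remains the authority and may have been updated since. The names REPORTED_METAL_FOR_TF, metal_for and pip_requirements are hypothetical.

```python
# (tensorflow-macos major, minor) -> tensorflow-metal version, as reported
# by posters in this thread; not a copy of Apple's official table.
REPORTED_METAL_FOR_TF = {
    (2, 9): "0.5.0",   # reported working by several posters
    (2, 10): "0.6.0",  # reported working with tensorflow-deps 2.9.0
    (2, 11): "0.7.0",  # model.fit fails with the default (new) optimizer
}


def metal_for(tf_macos_version):
    """Return the tensorflow-metal version paired with a tensorflow-macos
    version in this thread, or None if the thread says nothing about it."""
    major, minor = (int(part) for part in tf_macos_version.split(".")[:2])
    return REPORTED_METAL_FOR_TF.get((major, minor))


def pip_requirements(tf_macos_version):
    """Build the two pinned requirement strings to hand to pip."""
    metal = metal_for(tf_macos_version)
    if metal is None:
        raise ValueError(f"no reported tensorflow-metal pairing for {tf_macos_version}")
    return [f"tensorflow-macos=={tf_macos_version}", f"tensorflow-metal=={metal}"]
```

For example, pip_requirements("2.9") gives ["tensorflow-macos==2.9", "tensorflow-metal==0.5.0"], the same pins used in the next answer.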

Answer 9

In my case, I had to specify the versions:

python -m pip install tensorflow-macos==2.9
python -m pip install tensorflow-metal==0.5.0

Posted by

gyoung-seok

This solved the issue for me. Thanks!

—
arpit2735
This worked for me.

—
kennethwang
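Using python -m pip rather than a bare pip runs pip for that specific interpreter, so the packages land in the active conda environment instead of some other Python on the PATH. The same idea from inside a notebook can be sketched with sys.executable, which points at the kernel's own interpreter. The helper names below are hypothetical, and install defaults to a dry run that only prints the command:

```python
import subprocess
import sys


def pip_install_command(*requirements):
    """Build '<this python> -m pip install ...' for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def install(*requirements, dry_run=True):
    """Print the command (the default) or actually run it with dry_run=False."""
    cmd = pip_install_command(*requirements)
    if dry_run:
        print("Would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if pip fails
    return cmd


if __name__ == "__main__":
    install("tensorflow-macos==2.9", "tensorflow-metal==0.5.0")
```

Once the printed command looks right, call install(..., dry_run=False), then restart the Jupyter kernel so the newly installed versions are actually imported.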

Answer 10

Hi @ppobar

I assume you are seeing this with the latest wheels, tensorflow-macos==2.11 and tensorflow-metal==0.7.0? In that case, this most probably comes from a recent change on the TensorFlow side: version 2.11 ships a new optimizer API that enables JIT compilation by default (https://blog.tensorflow.org/2022/11/whats-new-in-tensorflow-211.html). This forces the optimizer op down an XLA path that the pluggable device architecture has not implemented yet, and because it cannot fall back to supported operations, it crashes inelegantly. The current workaround is to use the older optimizer API, the one used up to TF 2.10, by importing it from the legacy module of the optimizers. Concretely, using the Adam optimizer as an example, change

from tensorflow.keras.optimizers import Adam

to

from tensorflow.keras.optimizers.legacy import Adam

This should restore the previous behavior while support for the XLA path is being worked on. Let me know if this solves the issue for you! If it doesn't, could you tell me which macOS, tensorflow-macos and tensorflow-metal versions you are seeing this on, and provide a script I can use to reproduce the issue?

Posted by

Frameworks Engineer

Hi, I have updated to tensorflow-macos==2.11 and tensorflow-metal==0.7.0, and changing to the legacy optimizer solved the issue for me. Thank you.

—
Alexander2022
It works for me, thanks.

—
WangZongZheng
This solution only partially works for me: the model runs, but not on the GPU, so it is very slow.

—
KemiDG
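If one script has to run on both sides of the 2.11 change, the optimizer can be chosen from the installed TensorFlow version. A minimal sketch, assuming that 2.11 is the first release needing the legacy path (as described in the answer above) and that later releases behave the same way, which may stop being true once tensorflow-metal implements the XLA path. Passing the string "adam" to compile, as in the code in Answer 3, appears to select the new optimizer on 2.11, so an explicit instance is used here. The helper names are hypothetical.

```python
def needs_legacy_optimizer(tf_version):
    """True for TensorFlow 2.11 and later, where the default Keras optimizers
    are JIT-compiled with XLA, the path tensorflow-metal 0.7.0 lacks.

    Assumption: releases after 2.11 need the same workaround.
    """
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) >= (2, 11)


def make_adam(learning_rate=0.001):
    """Return an Adam optimizer that avoids the XLA path on TF 2.11+."""
    import tensorflow as tf  # imported here so the version check runs without TF

    if needs_legacy_optimizer(tf.__version__):
        return tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate)
    return tf.keras.optimizers.Adam(learning_rate=learning_rate)
```

Usage: model.compile(optimizer=make_adam(), loss=loss_fn, metrics=["accuracy"]) in place of optimizer="adam".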

Answer 12

@Frameworks Engineer I can confirm that switching to "from tensorflow.keras.optimizers.legacy import Adam" fixes the XLA problem, and TF 2.11 works fine, so there is no need to downgrade TensorFlow. Thank you!

Posted by

Pergamon

Answer 13

Thank you so much, everyone.

I had the latest wheels (tensorflow-macos==2.11 and tensorflow-metal==0.7.0), so I tried from tensorflow.keras.optimizers.legacy import Adam, as @Frameworks Engineer suggested. Although it helped, some model.fit calls still failed.

So in the end I went back to tensorflow-macos==2.9 and tensorflow-metal==0.5.0, as many of you suggested, and now everything is working fine.

Posted by

ppobar

Answer 14

I just followed the suggestion above (downgrading to tensorflow-macos==2.9 and tensorflow-metal==0.5.0) and it works! Thank you all!

Posted by

bseo

Answer 15

Reinstall Miniforge3 with Python 3.9, then:
1. conda create --prefix ./env python=3.8
   conda activate ./env
2. conda install -c apple tensorflow-deps
3. python -m pip install tensorflow-macos==2.9
4. python -m pip install tensorflow-metal==0.5.0
5. Run the sample script available at https://developer.apple.com/metal/tensorflow-plugin/

This worked for me. Double-check the versions.

Posted by

INFINITY1105
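After step 5, a quick way to confirm that the interpreter is native and that TensorFlow actually sees the Metal GPU (a model that runs but is very slow, as described in a comment on Answer 10, suggests it does not) is a short report. A sketch; it assumes a native Apple silicon Python reports arm64 from platform.machine() while one running under Rosetta reports x86_64, and it leaves the TensorFlow fields as None when TensorFlow is not installed. environment_report is a hypothetical helper.

```python
import platform


def environment_report():
    """Summarize the interpreter and the GPUs TensorFlow can see."""
    report = {
        "python": platform.python_version(),
        # 'arm64' for a native Apple silicon Python, 'x86_64' under Rosetta
        "machine": platform.machine(),
        "tensorflow": None,
        "gpus": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        return report  # TensorFlow is not installed in this interpreter
    report["tensorflow"] = tf.__version__
    report["gpus"] = [device.name for device in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

An empty gpus list with tensorflow-metal installed points at a version mismatch or a non-native Python rather than at the model code.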

Answer 16

Any update on when this issue will be fixed, i.e. when tensorflow-macos versions newer than 2.9 will be supported? I am using a Keras model that requires Keras 2.10, not 2.9.0.

Posted by

marlonmin
