Thanks for the timely response.
However, the recommended Legacy optimizer API fix did not correct the issue.
I interactively re-ran the updated script using the Legacy optimizer.
The script succeeds up to the final step, including the creation of the Pluggable TensorFlow device.
It fails in the first Epoch when it attempts to fit the model to the training data, i.e.:
>>> model.fit(x_train, y_train, epochs=5, batch_size=64)
Note the UserWarning:
UserWarning: "
sparse_categorical_crossentropyreceived
from_logits=True, but the
output argument was produced by a Softmax activation and thus does not represent logits. Was this intended?
Then note:
OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc2eb58dc70
HERE IS A FULL DUMP OF THE SESSION:
2023-01-21 17:12:37.085285: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> from tensorflow.keras.optimizers.legacy import Adam
>>> cifar = tf.keras.datasets.cifar100
>>> (x_train, y_train), (x_test, y_test) = cifar.load_data()
>>> model = tf.keras.applications.ResNet50(
... include_top=True,
... weights=None,
... input_shape=(32, 32, 3),
... classes=100,)
2023-01-21 17:14:51.596506: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon RX Vega 64
systemMemory: 32.00 GB
maxCacheSize: 3.99 GB
2023-01-21 17:14:51.597304: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-01-21 17:14:51.597346: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
>>> loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
>>> model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
>>> model.fit(x_train, y_train, epochs=5, batch_size=64)
Epoch 1/5
/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/backend.py:5585: UserWarning: "`sparse_categorical_crossentropy` received `from_logits=True`, but the `output` argument was produced by a Softmax activation and thus does not represent logits. Was this intended?
output, from_logits = _get_logits(
2023-01-21 17:18:33.774397: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-01-21 17:18:36.989723: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc2eb58dc70
2023-01-21 17:18:36.989759: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc2eb58dc70
............. (same as above)
2023-01-21 17:18:38.299162: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc2eb58dc70
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:
Detected at node 'StatefulPartitionedCall_212' defined at (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/Users/jnevin/venv-metal/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_212'
could not find registered platform with id: 0x7fc2eb58dc70
[[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_23355]
>>>