tensorflow-metal

Huge memory leakage issue with tf.keras.models.predict()

Comparison between MAC Studio M1 Ultra (20c, 64c, 128GB RAM) vs 2017 Intel i5 MBP (16GB RAM) for the subject matter i.e. memory leakage while using tf.keras.models.predict() for saved model on both machines: MBP-2017: First prediction takes around 10MB and subsequent calls ~0-1MB MACSTUDIO-2022: First prediction takes around 150MB and subsequent calls ~70-80MB. After say 10000 such calls o predict(), while my MBP memory usage stays under 10GB, MACSTUDIO climbs to ~80GB (and counting up for higher number of calls). Even using keras.backend.clear_session() after each call on MACSTUDIO did not help. Can anyone having insight on TensorFlow-metal and/or MAC M1 machines help? Thanks, Bapi

tensorflow-metal

Posted

by

karbapi

Tensorflow on M1 Macbook Pro, error when model fit executes

It doesn't matter if I install miniforge or mamba, directly or through brew, when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error. Is there any workaround on this? I'll appreciate any help, thanks! 2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz 2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled. 2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 NotFoundError Traceback (most recent call last) Cell In[25], line 2 1 model = create_model() ----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64); File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs) 67 filtered_tb = _process_traceback_frames(e.traceback) 68 # To get the full stack trace, call: 69 # tf.debugging.disable_traceback_filtering() ---> 70 raise e.with_traceback(filtered_tb) from None 71 finally: 72 del filtered_tb File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 50 try: 51 ctx.ensure_initialized() ---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 53 inputs, attrs, num_outputs) 54 except core._NotOkStatusException as e: 55 if name is not None: NotFoundError: Graph execution error: Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last): File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in app.launch_new_instance() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance app.start() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start self.io_loop.start() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start self.asyncio_loop.run_forever() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever self._run_once() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once handle._run() ... File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module> history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64); File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler return fn(*args, **kwargs) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit tmp_logs = self.train_function(iterator) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function return step_function(self, iterator) ...... File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step outputs = model.train_step(data) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step self.optimizer.minimize(loss, self.trainable_variables, tape=tape) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize self.apply_gradients(grads_and_vars) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients return super().apply_gradients(grads_and_vars, name=name) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients iteration = self._internal_apply_gradients(grads_and_vars) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients return tf.__internal__.distribute.interim.maybe_merge_call( File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn distribution.extended.update( File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var return self._update_step_xla(grad, var, id(self._var_key(var))) Node: 'StatefulPartitionedCall_4' could not find registered platform with id: 0x28edf1f90 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Posted

by

ppobar

Error regarding numpy version after installing tensor flow-macos

I tried installing tensorflow by following the guideline at https://developer.apple.com/metal/tensorflow-plugin/ inside of a conda environment. The installation works fine, but when trying to run the verification script at the end of the guide, it fails at import tensorflow as tf and I get the following error: RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf ImportError: numpy.core._multiarray_umath failed to import ImportError: numpy.core.umath failed to import Traceback (most recent call last): File "/Users/username/Desktop/test1.py", line 5, in <module> import tensorflow as tf ... The version of numpy installed by running conda install -c apple tensorflow-deps is 1.22.3, but the error seems to suggest that this is not compatible with tensorflow. Any advice on how to fix that would be much appreciated!

tensorflow-metal

Posted

by

Mac4Ever

Keras with tensorflow-metal freezes during training with image augmentation

I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness) The system appears to use both GPU as well as CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), CPU appears to be idle, no warnings appear, and training completes successfully. Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1 Sample code: img_augmentation = Sequential( [ layers.RandomFlip(), layers.RandomBrightness(factor=0.2), layers.RandomContrast(factor=0.2) ], name="img_augmentation", ) inputs = layers.Input(shape=(384, 384, 3)) x = img_augmentation(inputs) model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet') model.trainable = False x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output) x = tf.keras.layers.BatchNormalization()(x) top_dropout_rate = 0.2 x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x) outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x) newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species") reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001) optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9) newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy']) history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr]) During training with image augmentation, Jupyter prints the following warnings while training the first epoch: WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op. ... During training with image augmentation, Terminal keeps spamming the following warning: 2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. ... Any suggestions?

Posted

by

Cardu6lis

tensorflow-macos 3.12 with python 3.11

Tensorflow 3.12 supports python 3.11. However, the tensorflow-macos package does not provide a build for python 3.11. I assume this is an oversight in the build pipeline. From https://pypi.org/project/tensorflow-macos/#files the current builds are: tensorflow_macos-2.12.0-cp310-cp310-macosx_12_0_x86_64.whl tensorflow_macos-2.12.0-cp310-cp310-macosx_12_0_arm64.whl tensorflow_macos-2.12.0-cp39-cp39-macosx_12_0_x86_64.whl tensorflow_macos-2.12.0-cp39-cp39-macosx_12_0_arm64.whl tensorflow_macos-2.12.0-cp38-cp38-macosx_12_0_x86_64.whl tensorflow_macos-2.12.0-cp38-cp38-macosx_12_0_arm64.whl I would like to request that: tensorflow_macos-2.12.0-cp311-cp311-macosx_12_0_x86_64.whl tensorflow_macos-2.12.0-cp311-cp311-macosx_12_0_arm64.whl are added.

tensorflow-metal

Posted

by

andreas_madsen

There are some installation problems such as 'official tensorflow-metal'.

System information OS Platform and Distribution: macOS 13.3.1 (MacBook Pro M2) Python Version(Default): Python 3.11.3 Official Installation Guide: https://developer.apple.com/metal/tensorflow-plugin/ I just followed the installation guide. (base) haesunglee@MacBookPro14-haesunglee ~ % python3 -c 'import tensorflow as tf; print(tf.__version__) RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf ImportError: numpy.core._multiarray_umath failed to import ImportError: numpy.core.umath failed to import RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf ImportError: numpy.core._multiarray_umath failed to import ImportError: numpy.core.umath failed to import RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf ImportError: numpy.core._multiarray_umath failed to import ImportError: numpy.core.umath failed to import Traceback (most recent call last): File "/Users/haesunglee/tensorflow_metal.py", line 1, in <module> import tensorflow as tf File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module> from tensorflow.python.tools import module_util as _module_util File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 42, in <module> from tensorflow.python import data File "/Users/haesunglee/miniconda/lib/python3.10/site- from tensorflow.python.data.experimental.ops.data_service_ops import distribute File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 22, in ... ... ... ... packages/tensorflow/python/framework/sparse_tensor.py", line 25, in <module> from tensorflow.python.framework import constant_op File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 25, in <module> from tensorflow.python.eager import execute File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 21, in <module> from tensorflow.python.framework import dtypes File "/Users/haesunglee/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py", line 37, in <module> _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type() TypeError: Unable to convert function return value to a Python type! The signature was () -> handle (base) haesunglee@MacBookPro14-haesunglee ~ % (base) haesunglee@MacBookPro14-haesunglee ~ %

tensorflow-metal

Posted

by

StephenLee

MacBook Pro M2 Max: 32GB vs 64GB RAM for Machine Learning and Longevity

Hi everyone, I'm a Machine Learning Engineer, and I'm planning to buy the MacBook Pro M2 Max with a 38-core GPU variant. I'm uncertain about whether to choose the 32GB RAM or 64GB RAM option. Based on my research and use case, it seems that 32GB should be sufficient for most tasks, including the 4K video rendering I occasionally do. However, I'm concerned about the longevity of the device, as I'd like to keep the MacBook up-to-date for at least five years. Additionally, considering the 38-core GPU, I wonder if 32GB of unified memory might be insufficient, particularly when I need to train Machine Learning models or run docker or even kubernetes cluster. I don't have any budget constraints, as the additional $400 cost isn't an issue, but I want to make a wise decision. I would appreciate any advice on this matter. Thanks in advance!

Posted

by

Aditya-ai

AdamW optimiser crashes on tf-macos v2.12.0

Hi, Apologies if this is not the right place for bug reports in tf-macos, I am happy to move this issue elsewhere if so! The problem occurs in tensorflow-macos v2.12.0 when attempting to call model.compile() with the AdamW optimiser. A warning is thrown, telling us that there is a known slowdown when using v2.11+ optimizers, and the backend attempts to fall back to a legacy version. However, AdamW does not exist in legacy versions, which eventually leads to an "unknown optimizer" error. See below for a MWE. I am running on MacBook M1 Pro. Cheers, Ste ## ## Imports ## import sys import tensorflow as tf from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense from tensorflow.keras.optimizers import AdamW ## ## Report versions ## print(f"Python version is: {sys.version}") ## --> Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ] print(f"TF version is: {tf.__version__}") ## --> TF version is: 2.12.0 print(f"Keras version is: {tf.keras.__version__}") ## --> Keras version is: 2.12.0 ## ## Create a very simple model ## x_in = Input(1) x = Dense(10)(x_in) model = Model(x_in, x) ## ## Compile model with AdamW optimizer ## model.compile(optimizer=AdamW(learning_rate=1e-3, weight_decay=1e-2)) which outputs: It is worth noting that tf.keras.optimizers.legacy.AdamW does not exist, and so we cannot simply use this.

tensorflow-metal

Posted

by

smenary

Custom shuffle layer leaks memory when run on Apple M1 GPU with tensorflow-metal

See related issue on tensorflow github: https://github.com/tensorflow/tensorflow/issues/60616

tensorflow-metal

Posted

by

sirno

tensorflow-metal model runs EXTREMELY slow

I have the same problem in my LSTM model on APPLE M2. I follow the progress of https://developer.apple.com/metal/tensorflow-plugin/. to set up my environment, and the model run extremely slow. How to fix it? Also, i got this respond while i running the model: Failed to get CPU frequency: 0 Hz

tensorflow-metal

Posted

by

Daphne_0915

Failed to get CPU frequency: 0 Hz

I am performing a grid search over a parameter grid and train the model with different combinations of hyperparameters. I am receiving the following Warning: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz Why is that and what can I do to fix it? Thank you very much. Here is the code: def grid_search(model_name): ... elif model_name == 'LSTM': def build_model(units, activation, dropout, layers): model = Sequential() model.add(LSTM(units=units, kernel_initializer="normal", activation=activation, return_sequences=True, input_shape=(2, 1152), recurrent_dropout=0)) model.add(Dropout(dropout)) for i in range(layers): if i != layers-1: model.add(LSTM(units=units, kernel_initializer="normal", activation=activation, return_sequences=True,recurrent_dropout=0)) model.add(Dropout(dropout)) elif i == (layers-1): model.add(LSTM(units=units, kernel_initializer="normal", activation=activation, recurrent_dropout=0)) model.add(Dropout(dropout)) model.add(Dense(units=6, kernel_initializer="normal", activation=activation)) model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]) return model param_grid = {'units': [200, 300, 400], 'activation': ['tanh'], 'dropout': [0, 0.2, 0.4, 0.6], 'layers': [0, 5]} group_kfold = GroupKFold(n_splits=len(np.unique(groups_train))) model = KerasClassifier(model=build_model, units=param_grid['units'], activation=param_grid['activation'], dropout=param_grid['dropout'], layers=param_grid['layers']) grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=group_kfold) X_test, X_train, y_test, y_train = raw_dataassigner(model_name) (X_train, y_train) = shuffle(X_train, y_train) with tf.device('/cpu:0'): grid_result = grid_search.fit(X_train, y_train, groups=groups_train) print(f'Best score ({grid_search.best_score_}) for {model_name} model achieved with parameters: ', grid_search.best_params_) means = grid_result.cv_results_['mean_test_score'] stds = grid_result.cv_results_['std_test_score'] params = grid_result.cv_results_['params'] for mean, stdev, param in zip(means, stds, params): print("%f (%f) with: %r" % (mean, stdev, param)) grid_search('LSTM')

Posted

by

hefl99

AdamW crashes on tf-macos (tf-nightly-macos 2.14.0-dev20230523)

I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673]. When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fallback to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version 2.12, this lead to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown - however, after calling model.compile(), the attribute model.optimizer is set to string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string. Expected behaviour: correctly compile the model with either a v2.11+ optimiser without slowdown, or a legacy-compatible implementation of the AdamW optimizer. Then the model will train correctly with a valid AdamW optimizer when calling model.fit(). Note: a warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist It would be nice to be able to either use modern optimizers, or have a legacy-compatible version of AdamW, since weight-decay is an important tool in modern ML research, and currently cannot be used on mac. Standalone code to reproduce the issue ##===========## ## Imports ## ##===========## import sys import tensorflow as tf import numpy as np from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense from tensorflow.keras.optimizers import AdamW ##===================## ## Report versions ## ##===================## # # Expected outputs: # Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ] # TF version is: 2.14.0-dev20230523 # Numpy version is: 1.23.2 # print(f"Python version is: {sys.version}") print(f"TF version is: {tf.__version__}") print(f"Numpy version is: {np.__version__}") ##==============================## ## Create a very simple model ## ##==============================## # # Expected outputs: # Model: "model_1" # _________________________________________________________________ # Layer (type) Output Shape Param # # ================================================================= # Layer_in (InputLayer) [(None, 2)] 0 # # Layer_hidden (Dense) (None, 10) 30 # # Layer_out (Dense) (None, 2) 22 # # ================================================================= # Total params: 52 (208.00 Byte) # Trainable params: 52 (208.00 Byte) # Non-trainable params: 0 (0.00 Byte) # _________________________________________________________________ # x_in = Input(2 , dtype=tf.float32, name="Layer_in" ) x = x_in x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu" )(x) x = Dense(2 , dtype=tf.float32, name="Layer_out" , activation="linear")(x) model = Model(x_in, x) model.summary() ##===================================================## ## Compile model with MSE loss and AdamW optimizer ## ##===================================================## # # Expected outputs: # WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`. # WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`. # model.compile( loss = "mse", optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2) ) ##===========================## ## Generate some fake data ## ##===========================## # # Expected outputs: # X shape is (100, 2), Y shape is (100, 2) # dataset_size = 100 X = np.random.normal(size=(dataset_size, 2)) X = tf.constant(X, dtype=tf.float32) Y = np.random.normal(size=(dataset_size, 2)) Y = tf.constant(Y, dtype=tf.float32) print(f"X shape is {X.shape}, Y shape is {Y.shape}") ##===================================## ## Fit model to data for one epoch ## ##===================================## # # Expected outputs: # --------------------------------------------------------------------------- # AttributeError Traceback (most recent call last) # Cell In[9], line 51 # 1 ##===================================## # 2 ## Fit model to data for one epoch ## # 3 ##===================================## # (...) # 48 # • mask=None # 49 # # ---> 51 model.fit(X, Y, epochs=1) # File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs) # 67 filtered_tb = _process_traceback_frames(e.__traceback__) # 68 # To get the full stack trace, call: # 69 # `tf.debugging.disable_traceback_filtering()` # ---> 70 raise e.with_traceback(filtered_tb) from None # 71 finally: # 72 del filtered_tb # File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator) # 13 try: # 14 do_return = True # ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope) # 16 except: # 17 do_return = False # AttributeError: in user code: # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function * # return step_function(self, iterator) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function ** # outputs = model.distribute_strategy.run(run_step, args=(data,)) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step ** # outputs = model.train_step(data) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step # self.optimizer.minimize(loss, self.trainable_variables, tape=tape) # AttributeError: 'str' object has no attribute 'minimize' model.fit(X, Y, epochs=1)

Posted

by

smenary

Jax-metal - whisper-jax

Testing out https://developer.apple.com/metal/jax/ mainly for trying to run whisper-jax (https://github.com/sanchit-gandhi/whisper-jax/tree/main) on my M2. The jax-metal plugin seems to install without issues, and the basic test code runs fine. However, the jax-whisper code fails when trying to encode a file with the following error: error: failed to legalize operation 'mhlo.convolution' /Users/pere/jax-metal/lib/python3.10/site-packages/whisper_jax/layers.py:1236:0: note: see current operation: %111 = "mhlo.convolution"(%110, <<UNKNOWN SSA VALUE>>) {batch_group_count = 1 : i64, dimension_numbers = #mhlo.conv<[b, 0, f]x[0, i, o]->[b, 0, f]>, feature_group_count = 1 : i64, lhs_dilation = dense<1> : tensor<1xi64>, padding = dense<1> : tensor<1x2xi64>, precision_config = [#mhlo<precision DEFAULT>, #mhlo<precision DEFAULT>], rhs_dilation = dense<1> : tensor<1xi64>, window_reversal = dense<false> : tensor<1xi1>, window_strides = dense<1> : tensor<1xi64>} : (tensor<1x3000x80xf32>, tensor<3x80x384xf32>) -> tensor<1x3000x384xf32>

tensorflow-metal

Posted

by

PerEK

Jax-metal not recognising GPU

Intel GPU here. I just installed Jax-metal, and it runs fine. However, when I try the following code, it returns the RuntimeError you see below: jax.device_put(jnp.ones(1), device=jax.devices('gpu')[0]) RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: interpreter,cpu

tensorflow-metal

Posted

by

dkga

Couldn't install tensor flow-text

To satisfy the dependency of keras_nlp for LLM work, I tried to install tensor flow-text but it doesn't install: python -m pip install tensorflow-text ERROR: Could not find a version that satisfies the requirement tensorflow-text (from versions: none) ERROR: No matching distribution found for tensorflow-text

tensorflow-metal

Posted

by

ramkoppu

Jax-metal on M2 Pro does not recognize GPU

Build and installed Jax and Jax-metal following instructions on a M2Pro Mac-mini from here - https://developer.apple.com/metal/jax/ However, the following check seems to suggest XLA using CPU and not GPU. >>> from jax.lib import xla_bridge >>> print(xla_bridge.get_backend().platform) cpu Has anyone got it working to dump GPU? Thanks in advance!

Posted

by

shibd

Metal JAX device appears as `cpu`

Hello, I'm interested in trying the new JAX Metal plug-in and followed the steps in https://developer.apple.com/metal/jax/. Upon installation, I don't see any difference between the backend device detected by JAX and a pure CPU setup: >>> import jax >>> jax.devices() [CpuDevice(id=0)] >>> jax.devices()[0].platform 'cpu' >>> jax.devices()[0].device_kind 'cpu' >>> jax.devices()[0].client.platform 'cpu' >>> jax.devices()[0].client.runtime_type 'tfrt' Is this really using a Metal backend? How can I determine for sure? Thank you!

Posted

by

pcuenca

jax-metal: jax.numpy.tril() and jax.numpy.triu() do nothing

jax.numpy.tril() and jax.numpy.triu() don't do anything with the metal backend, they incorrectly return the input array. How to reproduce: jnp.triu(jnp.ones((4, 4))) returns an array of all ones. Same for jnp.tril().

tensorflow-metal

Posted

by

crowsonkb

Memory Leak Using TensorFlow-Metal

Hi, I've found a memory leak issue when using the tensorFlow-metal plugin for running a deep learning model on a Mac with the M1 chip. Here are the details of my system: System Information MacOS version: 13.4 TensorFlow (macos) version: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616 TensorFlow-Metal Plugin Version: 0.8, 1.0.0, 1.0.1 Model Details I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamic Input, which I resize the images in a Resizing layer. Moreover, the data is passed to the model through a data generator class, using model.fit(). Problem Description When I train this model using the GPU on M1 Mac, I observe a continuous increase in memory usage, leading to a memory leak. This memory increase is more prominent with larger image inputs. For smaller images or average sizes (1024x128), the increase is smaller, but continuous, leading to a memory leak after several epochs. On the other hand, when I switch to using the CPU for training (tf.config.set_visible_devices([], 'GPU')), the memory leak issue is resolved, and I observe normal memory usage. In addition, I've tested the model with different sizes of images and various layer configurations. The memory leak appears to be present only when using the GPU, indeed. I hope this information is helpful in identifying and resolving the issue. If you need any further details, please let me know. The project code is private, but I can try to provide it with pseudocode if necessary.

tensorflow-metal

Posted

by

arthurflor23

recommendedMaxWorkingSetSize - is there a way to use all of our unified memory for GPU / Metal ?

It seems like Apple Silicon (even M1/M2 MAX) devices can only use a certain percentage of their total unified memory for GPU/Metal. This seems to be a limitation related to: recommendedMaxWorkingSetSize Which is quite odd because even M1 Mac Mini's or Macbook Airs run totally fine with 8GB of total memory for both the OS and GPU so why limit this in the first place? Also seems like false advertising to me from Apple by not clearly stating this limitation. I am asking this in regards to the following open source project (but of course more software will be impacted by the same limitation): https://github.com/ggerganov/llama.cpp/pull/1826 another resource I've found: https://developer.apple.com/videos/play/tech-talks/10580/?time=546 If anyone has any ideas on how these limitations can be overcome and how to get apps to use more Memory for GPU (Meta)l I (and the open source community) would be truly grateful! thanks in advance!

Posted

by

Jake88

Posts under tensorflow-metal tag