tensorflow-metal

TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

127 Posts
Post not yet marked as solved
14 Replies
6.3k Views
System information (script can be found below):
MacBook Pro M1, macOS Big Sur (11.5.1)
TensorFlow installed from: source
TensorFlow version: 2.5 with Metal support
Python version: 3.9
GPU model and memory: MacBook Pro M1, 16 GB

Steps needed for installing TensorFlow with Metal support: https://developer.apple.com/metal/tensorflow-plugin/

I am trying to train a model on a MacBook Pro M1, but the performance is very poor and training doesn't work properly. It takes a ridiculously long time just for a single epoch.

Code needed for reproducing this behavior:

    import tensorflow as tf
    from tensorflow.keras.datasets import imdb
    from tensorflow.keras.layers import Embedding, Dense, LSTM
    from tensorflow.keras.losses import BinaryCrossentropy
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Model configuration
    additional_metrics = ['accuracy']
    batch_size = 128
    embedding_output_dims = 15
    loss_function = BinaryCrossentropy()
    max_sequence_length = 300
    num_distinct_words = 5000
    number_of_epochs = 5
    optimizer = Adam()
    validation_split = 0.20
    verbosity_mode = 1

    # Disable eager execution
    tf.compat.v1.disable_eager_execution()

    # Load dataset
    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
    print(x_train.shape)
    print(x_test.shape)

    # Pad all sequences
    padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>
    padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>

    # Define the Keras model
    model = Sequential()
    model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
    model.add(LSTM(10))
    model.add(Dense(1, activation='sigmoid'))

    # Compile the model
    model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)

    # Give a summary
    model.summary()

    # Train the model
    history = model.fit(padded_inputs, y_train, batch_size=batch_size, epochs=number_of_epochs, verbose=verbosity_mode, validation_split=validation_split)

    # Test the model after training
    test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
    print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')

I have noticed this same problem with LSTM layers. Also, this issue has been reported to Keras, and they can't debug it. Keras issue: https://github.com/keras-team/keras/issues/15003
Post marked as solved
20 Replies
35k Views
Following the instructions at https://developer.apple.com/metal/tensorflow-plugin/ I got as far as python -m pip install tensorflow-macos and it responded "ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none) ERROR: No matching distribution found for tensorflow-macos" I'd be grateful for any suggestions
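A "from versions: none" reply means pip found no release compatible with the interpreter that ran it. As a minimal sketch (the function name `describe_interpreter` is hypothetical, and which of these values is at fault here is not established by the post), the following collects the facts worth checking or including in a report:

```python
import platform
import sys


def describe_interpreter():
    """Collect interpreter facts that influence which wheels pip accepts."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "machine": platform.machine(),     # e.g. 'arm64' or 'x86_64'
        "system": platform.system(),       # 'Darwin' on macOS
        "mac_ver": platform.mac_ver()[0],  # empty string when not on macOS
        "executable": sys.executable,      # which Python actually ran pip
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running it with the same `python` used for `python -m pip install` shows whether that interpreter is the one you expect.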
Post not yet marked as solved
11 Replies
6.1k Views
Hello, I cannot predict with my model on Apple M1. I get an error:

    Traceback (most recent call last):
      File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/__main__.py", line 154, in <module>
        agent.run()
      File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/training.py", line 213, in run
        losses = self._train(sample)
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
        result = self._call(*args, **kwds)
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
        return self._stateless_fn(*args, **kwds)
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
        return graph_function._call_flat(
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
        return self._build_call_outputs(self._inference_function.call(
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
        outputs = execute.execute(
      File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    ResourceApplyAdamWithAmsgrad: CPU
    ReadVariableOp: GPU CPU
    _Arg: GPU CPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
      readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      adam_2_adam_update_6_resourceapplyadamwithamsgrad_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      adam_2_adam_update_6_resourceapplyadamwithamsgrad_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      adam_2_adam_update_6_resourceapplyadamwithamsgrad_vhat (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      ReadVariableOp (ReadVariableOp)
      Exp/ReadVariableOp (ReadVariableOp)
      ReadVariableOp_1 (ReadVariableOp)
      actor/ReadVariableOp (ReadVariableOp)
      actor/Exp/ReadVariableOp (ReadVariableOp)
      actor/ReadVariableOp_1 (ReadVariableOp)
      actor_critic/actor/ReadVariableOp (ReadVariableOp)
      actor_critic/actor/Exp/ReadVariableOp (ReadVariableOp)
      actor_critic/actor/ReadVariableOp_1 (ReadVariableOp)
      Adam_2/Adam/update_6/ResourceApplyAdamWithAmsgrad (ResourceApplyAdamWithAmsgrad) /job:localhost/replica:0/task:0/device:GPU:0
      [[{{node ReadVariableOp}}]] [Op:__inference__train_4206]
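The colocation debug info lists, for each op type, the device types that have a kernel for it. A small sketch that extracts the op types limited to CPU from such a listing; the helper names and the regular expression are illustrative assumptions, and the input string is copied from the trace above:

```python
import re

# Excerpt copied from the colocation debug info in the post above.
DEBUG_INFO = (
    "ResourceApplyAdamWithAmsgrad: CPU  "
    "ReadVariableOp: GPU CPU  "
    "_Arg: GPU CPU"
)

# An op name, a colon, then one or more device types separated by single spaces.
OP_DEVICES = re.compile(r"(\w+): ((?:GPU|CPU)(?: (?:GPU|CPU))*)")


def supported_devices(debug_info):
    """Map each op type to the list of device types that support it."""
    return {op: devices.split() for op, devices in OP_DEVICES.findall(debug_info)}


def cpu_only_ops(debug_info):
    """Return the op types whose only supported device type is CPU."""
    table = supported_devices(debug_info)
    return sorted(op for op, devices in table.items() if devices == ["CPU"])


print(cpu_only_ops(DEBUG_INFO))  # → ['ResourceApplyAdamWithAmsgrad']
```

In this trace the only CPU-only op is the Adam update with AMSGrad, while its variables were assigned to the GPU. That points at the optimizer variant as the likely source of the placement conflict, although the post itself does not confirm it.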
Post not yet marked as solved
4 Replies
4.7k Views
I have followed all instructions in this tutorial to install tensorflow-macos and tensorflow-metal on my Apple 16 GB M1 Mac mini (2020) running Big Sur 11.5.1. All installations completed without error, but I am still getting the following traceback when trying to use TensorFlow:

    Traceback (most recent call last):
      File "train_model.py", line 3, in <module>
        import tensorflow as tf
      File "/Users/thesis/miniforge3/envs/thesis_env/lib/python3.8/site-packages/tensorflow/__init__.py", line 440, in <module>
        _ll.load_library(_plugin_dir)
      File "/Users/thesis/miniforge3/envs/thesis_env/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 153, in load_library
        py_tf.TF_LoadLibrary(lib)
    tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/thesis/miniforge3/envs/thesis_env/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 6): Symbol not found: _TF_AllocateOutput
      Referenced from: /Users/thesis/miniforge3/envs/thesis_env/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib
      Expected in: flat namespace

I have seen similar issues reported related to the "Symbol not found: " portion of the traceback, but the answer given there (basically, to double-check that the versions of tensorflow-macos and the tensorflow-metal plugin match) does not apply to me: both are current and were installed today. If I run the terminal commands for installation again, I get messages informing me that all requirements are already satisfied. A fix would be appreciated. I need to begin training a DL model for my graduate thesis project, and I purchased the computer in part for this particular project.
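Since the usual advice for this error is to compare the two package versions, a short sketch that reports what is actually installed in the active environment may help when posting follow-ups. The helper name `installed_versions` is hypothetical; `importlib.metadata` is part of the standard library from Python 3.8.

```python
from importlib import metadata

# Distribution names as used on PyPI in the posts above.
PACKAGES = ("tensorflow-macos", "tensorflow-metal")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports versions; which pairs are compatible has to be checked against the plugin's release notes.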
Post not yet marked as solved
41 Replies
29k Views
Device: MacBook Pro 16 M1 Max, 64 GB, running macOS 12.0.1.

I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps:

Setup: Xcode CLI / Homebrew / Miniforge
Conda env: Python 3.9.5

    conda install -c apple tensorflow-deps
    python -m pip install tensorflow-macos
    python -m pip install tensorflow-metal
    brew install libjpeg
    conda install -y matplotlib jupyterlab

In Jupyter Lab, I try to execute this code:

    from tensorflow.keras import layers
    from tensorflow.keras import models

    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.summary()

The code executes, but I get this warning, indicating that no GPU acceleration can be used as it defaults to a 0 MB GPU:

    Metal device set to: Apple M1 Max
    2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across a bunch of posts around here related to the same issue but with no solid fix. I created a new question because I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be of much help.
Post not yet marked as solved
6 Replies
6.2k Views
Hi, I installed sklearn successfully and ran the MNIST toy example successfully. Then I started to run my project. The funny thing is that everything seems fine at the start (at least no ImportError occurs), but when I made some changes to my code and tried to run all cells again (I use Jupyter Lab), an ImportError occurred:

    ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
      Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
      Reason: tried:
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
        '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file),
        '/usr/local/lib/liblapack.3.dylib' (no such file),
        '/usr/lib/liblapack.3.dylib' (no such file)

Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code can be run again, as if by magic. Does anyone know how to solve this problem permanently and make sklearn more stable?
Post not yet marked as solved
9 Replies
9.7k Views
Hi everyone, I found that GPU performance is not as good as I expected (as slow as a turtle), and I want to switch from GPU to CPU, but the mlcompute module cannot be found, which is weird. The same code takes 156 s per epoch on Colab vs. 40 minutes per epoch on my computer (Jupyter Lab). I only used a small dataset (a few thousand data points), and each epoch only has 20 batches. I am so disappointed; it seems like the "powerful" GPU is a joke. I am using macOS 12.0.1 and tensorflow-macos 2.6.0. Can anyone tell me why this happens?
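One way to run on the CPU without the mlcompute module is to hide the GPU from TensorFlow with `tf.config.set_visible_devices([], 'GPU')`, a call that also appears in another post in this listing. A minimal sketch, assuming it is run before TensorFlow has touched the GPU; the wrapper name `force_cpu_only` is hypothetical:

```python
def force_cpu_only():
    """Hide all GPUs from TensorFlow; return True if the call was made."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow is not installed in this environment

    # Must run before any op or model has initialized the GPU device,
    # otherwise TensorFlow raises a RuntimeError.
    tf.config.set_visible_devices([], "GPU")
    return True


if __name__ == "__main__":
    print("GPU hidden:", force_cpu_only())
```

Calling this at the very top of a notebook, before building the model, keeps the rest of the code unchanged when comparing CPU and GPU epoch times.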
Post not yet marked as solved
6 Replies
3.0k Views
Hi, I have started experimenting with using my MBP with M1 Pro (10 CPU cores / 16 GPU cores) for TensorFlow. Two things were odd/noteworthy:

First, I've compared training models in a TensorFlow environment with tensorflow-metal, running the code with either with tf.device('gpu:0'): or with tf.device('cpu:0'):, as well as in an environment without the tensorflow-metal plugin. Specifying the device as CPU in tf-metal almost always leads to much longer training times compared to specifying the GPU, but also compared to running in the standard (non-Metal) environment. Also, the GPU was running at quite high power despite telling TF to use the CPU. Is this intended or expected behaviour? If so, it would be preferable to use the non-Metal environment when not benefitting from a GPU.

Secondly, at small batch sizes, the GPU power in system stats increases with the batch size, as expected. However, when changing the batch size from 9 to 10 (this appears to be a hard step specifically at this number), GPU power drops by about half and training time doubles. Increasing the batch size beyond 10 leads again to a gradual increase in GPU power; on my model the same GPU power as at batchsize=9 is reached only at about batchsize=50. This makes GPU acceleration with batch sizes from 10 to about 50 rather useless. I've noticed this behavior on several models, which makes me think this is general tf-metal behaviour. As a result, I've only been able to benefit from GPU acceleration at a batch size of 9 and above 100. Once again, is this intended or to be expected?
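To make a batch-size observation like this reproducible by others, it helps to report wall-clock times per batch size alongside GPU power. A minimal timing sketch, assuming a user-supplied callable `run_epoch(batch_size)` that trains one epoch; both that callable and the helper name `time_batch_sizes` are hypothetical:

```python
import time


def time_batch_sizes(run_epoch, batch_sizes, repeats=3):
    """Return {batch_size: best wall-clock seconds} for run_epoch(batch_size)."""
    results = {}
    for size in batch_sizes:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_epoch(size)
            timings.append(time.perf_counter() - start)
        results[size] = min(timings)  # best of N reduces warm-up noise
    return results


if __name__ == "__main__":
    # Stand-in workload; replace with a function that trains one epoch.
    def fake_epoch(batch_size):
        sum(i * i for i in range(1000 * batch_size))

    for size, seconds in time_batch_sizes(fake_epoch, [8, 9, 10, 11]).items():
        print(f"batch_size={size}: {seconds:.4f}s")
```

Sweeping sizes around the reported step (8 through 12) and again around 50 would show whether the drop is tied to the batch size itself.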
Post not yet marked as solved
16 Replies
17k Views
Hi all, I'm using my new M1 Pro with the latest macOS 12.1 and I'm experiencing issues with installing TensorFlow. I've created an environment and have it activated. I tried conda install -c apple tensorflow-deps, but it returned:

    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.

    PackagesNotFoundError: The following packages are not available from current channels:

      tensorflow-deps

    Current channels:

      https://conda.anaconda.org/apple/osx-64
      https://conda.anaconda.org/apple/noarch
      https://repo.anaconda.com/pkgs/main/osx-64
      https://repo.anaconda.com/pkgs/main/noarch
      https://repo.anaconda.com/pkgs/r/osx-64
      https://repo.anaconda.com/pkgs/r/noarch

    To search for alternate channels that may provide the conda package you're looking for, navigate to https://anaconda.org and use the search bar at the top of the page.

    Note: you may need to restart the kernel to use updated packages.

Did anyone have the same issue, and is there any advice on how to address it? Thanks, Andrew
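The channel URLs in this output end in osx-64, which is conda's platform subdirectory for Intel macOS; a conda built for Apple silicon searches osx-arm64 instead. A short sketch that pulls the subdirectories out of the listed channels (the helper name is hypothetical). That tensorflow-deps is published only for osx-arm64 is an assumption here, though it would explain the PackagesNotFoundError:

```python
from urllib.parse import urlparse

# Channel URLs copied from the conda output in the post above.
CHANNELS = [
    "https://conda.anaconda.org/apple/osx-64",
    "https://conda.anaconda.org/apple/noarch",
    "https://repo.anaconda.com/pkgs/main/osx-64",
    "https://repo.anaconda.com/pkgs/main/noarch",
    "https://repo.anaconda.com/pkgs/r/osx-64",
    "https://repo.anaconda.com/pkgs/r/noarch",
]


def channel_subdirs(urls):
    """Return the sorted set of platform subdirectories (last path segment)."""
    return sorted({urlparse(url).path.rstrip("/").rsplit("/", 1)[-1] for url in urls})


print(channel_subdirs(CHANNELS))  # → ['noarch', 'osx-64']
```

If osx-arm64 is missing from the result, the conda installation itself is most likely an x86_64 build running under Rosetta, and an arm64 installer such as Miniforge for Apple silicon would be the thing to try.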
Post marked as solved
6 Replies
1.6k Views
I'm running an example from the TF site and getting different results from CPU and GPU. The results from the GPU are obviously wrong (second image). Why? If I execute the code under with tf.device('/cpu:0'), it works as expected, but slower. It's sufficient to execute these lines on the CPU to fix the issue:

    with tf.device('/cpu:0'):
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

Source code: https://www.tensorflow.org/tutorials/generative/dcgan
My complete results: https://disk.yandex.ru/d/E-hU5dpffOmkLg
Post not yet marked as solved
9 Replies
3.7k Views
I am comparing my M1 MBA with my 2019 16" Intel MBP. The M1 MBA has tensorflow-metal, while the Intel MBP has TF directly from Google. Generally, the same programs run 2-5 times FASTER on the Intel MBP, which presumably has no GPU acceleration. Is there anything I could have done wrong on the M1? Here is the start of the Metal run:

    Metal device set to: Apple M1
    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB
    2022-01-19 04:43:50.975025: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-01-19 04:43:50.975291: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    2022-01-19 04:43:51.216306: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Epoch 1/10
    2022-01-19 04:43:51.298428: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Post not yet marked as solved
3 Replies
5k Views
I'm using my 2020 Mac mini with M1 chip, and this is the first time I've tried to use it for convolutional neural network training. I installed Python (version 3.8.12) using miniforge3, and TensorFlow following these instructions, but I am still facing a GPU problem when training a 3D U-Net. Here's part of my code; I'm hoping to receive some suggestions to fix this.

    import tensorflow as tf
    from tensorflow import keras
    import json
    import numpy as np
    import pandas as pd
    import nibabel as nib
    import matplotlib.pyplot as plt
    from tensorflow.keras import backend as K
    from tensorflow.python.client import device_lib  # needed by get_available_devices()

    # check available devices
    def get_available_devices():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos]

    print(get_available_devices())

Output:

    Metal device set to: Apple M1
    ['/device:CPU:0', '/device:GPU:0']
    2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Prediction code:

    X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
    #tf.device('/device:GPU:0')  # Have tried this line, doesn't work
    #tf.debugging.set_log_device_placement(True)  # Have tried this line, doesn't work
    patch_pred = model.predict(X_norm_with_batch_dimension)

Error:

    InvalidArgumentError: 2 root error(s) found.
      (0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
         [[model/conv3d/Conv3D/_4]]
      (1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
    0 successful operations.
    0 derived errors ignored.

The code is executable on Google Colab but can't run locally on the Mac mini with Jupyter Notebook. The NHWC tensor format problem might indicate that I'm using my CPU to execute the code instead of the GPU. Is there any way to optimise GPU usage to train the network in TensorFlow?
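The error refers to NHWC, TensorFlow's name for channels-last layout. For 3D data that means (batch, depth, height, width, channels), as opposed to channels-first (batch, channels, depth, height, width). Whether this model was built channels-first is an assumption the post does not confirm. As a minimal sketch with hypothetical helper names, the axis permutation between the two layouts looks like this:

```python
def channels_last_axes(ndim):
    """Axis order that moves axis 1 (channels) to the end, e.g. for np.transpose."""
    if ndim < 3:
        raise ValueError("expected at least (batch, channels, spatial...)")
    return (0, *range(2, ndim), 1)


def channels_first_to_last(shape):
    """(batch, channels, *spatial) -> (batch, *spatial, channels)."""
    return tuple(shape[axis] for axis in channels_last_axes(len(shape)))


# Illustrative 3D patch with 4 channels; the sizes are made up.
print(channels_last_axes(5))                          # → (0, 2, 3, 4, 1)
print(channels_first_to_last((1, 4, 160, 160, 16)))   # → (1, 160, 160, 16, 4)
```

With NumPy, `np.transpose(x, channels_last_axes(x.ndim))` applies the same permutation to the input array; the model's layers would also need `data_format='channels_last'` for the shapes to line up.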
Post not yet marked as solved
5 Replies
2.4k Views
I am aware this question has been asked before, but no resolutions have worked for me. When I try to import TensorFlow in my Python 3.9 environment I get the following error:

    uwewinter@Uwes-MBP % python3
    Python 3.9.10 (main, Jan 15 2022, 11:40:53)
    [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow
    2022-02-09 21:30:01.701794: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
    zsh: abort      python3

I have the newest versions of tensorflow-macos and tensorflow-metal installed:

    uwewinter@Uwes-MBP % pip3 list | grep tensorflow
    tensorflow-estimator           2.7.0
    tensorflow-macos               2.7.0
    tensorflow-metal               0.3.0

macOS is the latest:

    uwewinter@Uwes-MBP % sw_vers
    ProductName: macOS
    ProductVersion: 12.2
    BuildVersion: 21D49

The Mac is a 2021 MBP:

    uwewinter@Uwes-MBP % sysctl hw.model
    hw.model: MacBookPro18,3
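One thing worth ruling out for an "already registered" error is that more than one copy of the Metal plugin is visible to the interpreter; that this is the cause here is an assumption, not something the post confirms. The sketch below searches every `sys.path` entry for the plugin file, using the relative path `tensorflow-plugins/libmetal_plugin.dylib` that appears in a traceback earlier in this listing; the helper name is hypothetical.

```python
import os
import sys

# Relative location of the plugin inside site-packages, as seen in an
# earlier traceback in this listing.
PLUGIN_RELATIVE_PATH = os.path.join("tensorflow-plugins", "libmetal_plugin.dylib")


def find_metal_plugins(search_paths=None):
    """Return the distinct plugin files reachable from the given path entries."""
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        if not entry:
            continue  # skip the empty entry that stands for the current directory
        candidate = os.path.join(entry, PLUGIN_RELATIVE_PATH)
        if os.path.isfile(candidate):
            real = os.path.realpath(candidate)
            if real not in found:
                found.append(real)
    return found


if __name__ == "__main__":
    for path in find_metal_plugins():
        print(path)
```

More than one line of output would mean two installations are on the import path, for example a user-level site-packages alongside an environment.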
Post not yet marked as solved
11 Replies
8.9k Views
I'm using a MacBook Air 13" and pursuing an Artificial Intelligence course. I'm facing a huge problem with Jupyter Notebook after installing TensorFlow: the kernel keeps dying, and I have literally tried the solutions in every article/resource on Google. Nothing seems to fix the issue. It began only when I started to run code for a convolutional neural network. Please help me fix this issue and understand why it's not getting fixed. At the moment, I can only think of trading the MacBook for a Windows laptop, but that will be very hard as I have not had hands-on experience with a Windows laptop. Hope to hear back soon. Thanks, Keshav Lal Seth
Post not yet marked as solved
4 Replies
1.8k Views
Hi, I cannot predict with my model on Apple M1 Pro. I get an error:

    Traceback (most recent call last):
      File "transformer_pe.py", line 605, in <module>
        bgru_main()
      File "transformer_pe.py", line 400, in bgru_main
        main(traindataSetPath, weightPath, batchSize, maxLen, vectorDim, layers, dropout)
      File "transformer_pe.py", line 358, in main
        model.fit(x_train, y_train, batch_size=batchSize, epochs=10)
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
        tmp_logs = self.train_function(iterator)
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
        result = self._call(*args, **kwds)
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
        return self._stateless_fn(*args, **kwds)
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
        return graph_function._call_flat(
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
        return self._build_call_outputs(self._inference_function.call(
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
        outputs = execute.execute(
      File "/Users/icey/miniforge3/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/bert_block/encoder_0/multiheadattention/query/einsum/Einsum/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/bert_block/encoder_0/multiheadattention/query/einsum/Einsum/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    ResourceApplyAdaMax: CPU
    ReadVariableOp: GPU CPU
    _Arg: GPU CPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
      model_bert_block_encoder_0_multiheadattention_query_einsum_einsum_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      adamax_adamax_update_2_resourceapplyadamax_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      adamax_adamax_update_2_resourceapplyadamax_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      model/bert_block/encoder_0/multiheadattention/query/einsum/Einsum/ReadVariableOp (ReadVariableOp)
      Adamax/Adamax/update_2/ResourceApplyAdaMax (ResourceApplyAdaMax) /job:localhost/replica:0/task:0/device:GPU:0
      [[{{node model/bert_block/encoder_0/multiheadattention/query/einsum/Einsum/ReadVariableOp}}]] [Op:__inference_train_function_3353]

    (py38) icey@IceydeMacBook-Pro 20220204code % python hello.py
    Metal device set to: Apple M1 Pro
    systemMemory: 32.00 GB
    maxCacheSize: 10.67 GB
Post not yet marked as solved
10 Replies
7.2k Views
I know you already support TensorFlow with two packages: tensorflow-macos and tensorflow-metal. However, to work with NLP, users also need the tensorflow-text package. Could your developers build a dedicated tensorflow-text-macos version to support this demand? Best regards,
Post not yet marked as solved
8 Replies
2k Views
We ran into an issue where a more complex model fails to converge on the M1 Max GPU, while it converges on its CPU and on non-M1 machines. Performance is the same on CPU and GPU for models with a single RNN, but once we use two RNNs the GPU fails to converge.

That said, the example below is based on data that is nonsensical for the model architecture used, but we can observe the same behavior here as in our production models (which for obvious reasons we cannot share here). Mainly:

- The loss goes down to the bottom of the e-06 precision in all cases except when we use two RNNs on GPU.
- During training we often see the e-07 precision level; in the double-RNN GPU condition the results do not go that low, sometimes reaching only the e-05 level.
- For our production data, the double RNN on GPU results in a loss of 1.0 that basically stays the same from the first epoch, while the other conditions often reach the 0.2 level with a clear learning curve.
- In the production model, increasing the number of LSTM cells made the divergence more visible (in this synthetic data that does not happen).
- The more complex the model is (after the RNN layers), the more visible the issue.

Suspected issues:

- Different precision used in CPU and GPU training: we had to decrease the data values a lot to make the effect visible (if you work with raw data, all approaches seem to produce comparable results).
- Somehow the vanishing gradient problem is more pronounced on GPU, as indicated by worse performance as the complexity of the model increases.

Please let me know if you need any further details.

Software stack: macOS 12.1, tf 2.7, metal 0.3; also tested on tf 2.8.

Sample syntax:

    # TEST CONDITIONS:
    # conditions with issue: 1,2
    gpu = 1         # 0 CPU, 1 GPU
    model_size = 2  # 1 single RNN, 2 double RNN

    # PARAMETERS
    LSTM_Cells = 64
    epochs = 300
    batch = 128

    import numpy as np
    import pandas as pd
    import sys
    from sklearn import preprocessing

    #"""
    if 'tensorflow' in sys.modules:
        print("tensorflow uploaded")
        del sys.modules["tensorflow"]
        #del tf
        import tensorflow as tf
    else:
        print("tensorflow not uploaded")
        import tensorflow as tf

    if gpu == 1:
        pass
    else:
        tf.config.set_visible_devices([], 'GPU')

    #print("GPUs:", tf.config.list_physical_devices('GPU'))
    print("GPUs:", tf.config.list_logical_devices('GPU'))
    #print("CPUs:", tf.config.list_physical_devices('CPU'))
    print("CPUs:", tf.config.list_logical_devices('CPU'))
    #"""

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense

    url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
    column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']
    dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t', sep=' ', skipinitialspace=True).dropna()

    scaler = preprocessing.StandardScaler().fit(dataset)
    X_scaled = scaler.transform(dataset)
    X_scaled = X_scaled * 0.001

    # Large Values
    #x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1,2,2)
    #y_train = np.array(dataset[['MPG','Displacement']]).reshape(-1,2,2)

    # Small Values
    x_train = np.array(X_scaled[:,2:]).reshape(-1,2,2)
    y_train = np.array(X_scaled[:,:2]).reshape(-1,2,2)

    #print(dataset)
    print(x_train.shape)
    print(y_train.shape)
    #print(weight.shape)  # 'weight' is not defined in this snippet

    train_data = tf.data.Dataset.from_tensor_slices((x_train[:,:,:8], y_train)).cache().shuffle(x_train.shape[0]).batch(batch).repeat().prefetch(tf.data.experimental.AUTOTUNE)

    if model_size == 2:
        #"""
        # MINIMAL NOT WORKING
        encoder_inputs = tf.keras.Input(shape=(x_train.shape[1],x_train.shape[2]))
        encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells,return_sequences = True, return_state=True)
        encoder_l1_outputs = encoder_l1(encoder_inputs)
        encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
        encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
        dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
        dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
        dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
        dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
        flat = tf.keras.layers.Flatten()(dense_2)
        dense_5 = tf.keras.layers.Dense(2*2)(flat)
        reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
        model = tf.keras.models.Model(encoder_inputs, reshape_output)
        #"""
    else:
        #"""
        # WORKING
        encoder_inputs = tf.keras.Input(shape=(x_train.shape[1],x_train.shape[2]))
        encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells,return_sequences = True, return_state=True)
        encoder_l1_outputs = encoder_l1(encoder_inputs)
        dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
        dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
        dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
        dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
        flat = tf.keras.layers.Flatten()(dense_2)
        dense_5 = tf.keras.layers.Dense(2*2)(flat)
        reshape_output = tf.keras.layers.Reshape([2,2])(dense_5)
        model = tf.keras.models.Model(encoder_inputs, reshape_output)
        #"""

    print(model.summary())

    loss_tf = tf.keras.losses.MeanSquaredError()
    model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
    model.fit(train_data, epochs = epochs, steps_per_epoch = 3)
Post not yet marked as solved
18 Replies
4k Views
Upgrading tensorflow-macos and tensorflow-metal not only breaks Conv2D with the groups argument, it also makes training unable to finish. Today, after upgrading tensorflow-macos to 2.9.0 and tensorflow-metal to 0.5.0, my notebook can no longer make progress after training for around 16 minutes. I tested 4 times. It would happily run around 17 to 18 epochs, each taking around 55 seconds. After that, it just stopped making progress. I checked Activity Monitor; both CPU and GPU usage were 0 at that point. I accidentally found that there are a lot of kernel faults in the Console app. The last one before I force-killed the process:

    IOReturn IOGPUDevice::new_resource(IOGPUNewResourceArgs *, struct IOGPUNewResourceReturnData *, IOByteCount, uint32_t *): PID 68905 likely leaking IOGPUResource (count=200000)

PID 68905 is in fact the training process. I have observed this kind of issue for several months, but it was not as frequent, and I could restart my notebook and train successfully. No luck today. I hope Apple engineers can find the cause and fix it.
Post not yet marked as solved
3 Replies
1.6k Views
Hello everyone! I recently tried installing TensorFlow following this guide: https://developer.apple.com/metal/tensorflow-plugin/ on my M1 Pro MacBook running macOS Monterey 12.4. However, I'm faced with the following error message when importing:

    File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py:29, in <module>
         26 from tensorflow.python.lib.core import _pywrap_bfloat16
         27 from tensorflow.python.util.tf_export import tf_export
    ---> 29 _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
         32 @tf_export("dtypes.DType", "DType")
         33 class DType(_dtypes.DType):
         34   """Represents the type of the elements in a `Tensor`.
         35
         36   `DType`'s are used to specify the output data type for operations which
       (...)
         46   See `tf.dtypes` for a complete list of `DType`'s defined.
         47   """

    TypeError: Unable to convert function return value to a Python type! The signature was
        () -> handle

I've checked that tensorflow-deps is at version 2.9.0, tensorflow-macos at 2.9.2, and tensorflow-metal at 0.5.0, with numpy at its latest version, 1.22.4, all in my env. Does anyone know what's up?
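The traceback path starts with /Library/Frameworks/Python.framework, which is a system-wide Python installation rather than an environment directory, so it is worth checking whether the module that failed was loaded from the active environment at all. A minimal sketch with a hypothetical helper name; the environment prefix used in the example is made up for illustration:

```python
import os
import sys


def is_inside_prefix(path, prefix=sys.prefix):
    """Return True if `path` lies under the environment rooted at `prefix`."""
    path = os.path.normpath(path)
    prefix = os.path.normpath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:
        return False  # e.g. mixing absolute and relative paths


# Path copied from the traceback above; the prefix is a made-up example.
TRACEBACK_PATH = (
    "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/"
    "site-packages/tensorflow/python/framework/dtypes.py"
)
print(is_inside_prefix(TRACEBACK_PATH, "/Users/me/miniforge3/envs/tf"))  # → False
```

Inside the notebook, `is_inside_prefix(tensorflow.__file__)` cannot be evaluated when the import fails, but `print(sys.executable, sys.prefix)` in the same kernel shows which installation is actually running.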