tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.


Posts under tensorflow-metal tag

128 Posts
Post not yet marked as solved
1 Reply
365 Views
Hi, there seems to be a difference in behavior when running inference on a trained Keras model using the model `__call__` method vs. using the `predict` or `predict_on_batch` methods. This only happens when using the GPU for inference, and it seems that for certain sequences of operations and float types the 'relu' activation doesn't work as expected and appears to do nothing. I can replicate the problem with the following code (it only fails with the 'relu' activation and the tf.float16 and tf.float32 types, while it works fine with tf.float64).

```python
import tensorflow as tf
import numpy as np

DATA_LENGTH = 16
DENSE_WIDTH = 16
BATCH_SIZE = 8
DTYPE = tf.float32
ACTIVATION = 'relu'

def TestModel():
    inputs = tf.keras.Input(DATA_LENGTH, dtype=DTYPE)
    u = tf.keras.layers.Dense(DENSE_WIDTH, activation=ACTIVATION, dtype=DTYPE)(inputs)
    # u = tf.maximum(u, 0.0)
    output = u*tf.constant(1.0, dtype=DTYPE)
    model = tf.keras.Model(inputs, output, name="TestModel")
    return model

model = TestModel()
model.compile()

x = np.random.uniform(size=(BATCH_SIZE, DATA_LENGTH)).astype(DTYPE.as_numpy_dtype)

with tf.device('/GPU:0'):
    out_gpu_call = model(x, training=False)
    out_gpu_predict = model.predict_on_batch(x)

with tf.device('/CPU:0'):
    out_cpu_call = model(x, training=False)
    out_cpu_predict = model.predict_on_batch(x)

print(f'\nDTYPE {DTYPE}, ACTIVATION: {ACTIVATION}')
print("\tMean Abs. Difference GPU (__call__ vs. predict):", np.mean(np.abs(out_gpu_call - out_gpu_predict)))
print("\tMean Abs. Difference CPU (__call__ vs. predict):", np.mean(np.abs(out_cpu_call - out_cpu_predict)))
print("\tMean Abs. Difference GPU-CPU __call__:", np.mean(np.abs(out_gpu_call - out_cpu_call)))
print("\tMean Abs. Difference GPU-CPU predict():", np.mean(np.abs(out_gpu_predict - out_cpu_predict)))
```

The code above produces, for example, the following output:

```
DTYPE <dtype: 'float32'>, ACTIVATION: relu
	Mean Abs. Difference GPU (__call__ vs. predict): 0.1955472
	Mean Abs. Difference CPU (__call__ vs. predict): 0.0
	Mean Abs. Difference GPU-CPU __call__: 1.3573299e-08
	Mean Abs. Difference GPU-CPU predict(): 0.1955472
```

And the results for the GPU are:

```
out_gpu_call
<tf.Tensor: shape=(8, 16), dtype=float32, numpy=
array([[0.1496982 , 0.        , 0.        , 0.73772687, 0.26131183,
        0.27757105, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.4164225 , 1.0367445 , 0.        , 0.5860609 ,
        0.        ], ...

out_gpu_predict
array([[ 1.49698198e-01, -3.48425686e-01, -2.44667321e-01,  7.37726867e-01,
         2.61311829e-01,  2.77571052e-01, -2.26729304e-01, -1.06500387e-01,
        -3.66294265e-01, -2.93850392e-01, -4.51043218e-01,  4.16422486e-01,
         1.03674448e+00, -1.39347658e-01,  5.86060882e-01, -2.05334812e-01], ...
```

Upon inspection of the results it seems that the problem is that the 'relu' activation is not setting the values < 0 to 0 when calling predict_on_batch. When uncommenting the `# u = tf.maximum(u, 0.0)` line after the Dense layer there is no difference between the two calls (as should be expected). It also happens that removing the multiplication by a constant after the Dense layer, `output = u*tf.constant(1.0, dtype=DTYPE)`, makes the problem disappear (even when leaving the `# u = tf.maximum(u, 0.0)` line commented).

This is running with the following setup:
MacBook Pro, Apple M2 Max chip, macOS Sonoma 14.2
tf version 2.15.0
tensorflow-metal 1.1.0
Python 3.10.13
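Editor's note: a possible workaround while GPU inference is affected, assuming the discrepancy is limited to the fused Dense + relu path under the Metal backend. This sketch is not from the original post; it only reuses the shapes and `predict_on_batch` usage from the repro above.

```python
import tensorflow as tf

# Assumption: applying the activation as a separate op avoids the fused
# Dense+relu path that appears to misbehave on the Metal GPU backend.
def TestModelExplicitRelu(data_length=16, dense_width=16, dtype=tf.float32):
    inputs = tf.keras.Input(data_length, dtype=dtype)
    u = tf.keras.layers.Dense(dense_width, activation=None, dtype=dtype)(inputs)
    u = tf.nn.relu(u)  # explicit ReLU instead of activation='relu'
    output = u * tf.constant(1.0, dtype=dtype)
    return tf.keras.Model(inputs, output, name="TestModelExplicitRelu")

# Alternatively, pin just the predict call to the CPU until the issue is resolved:
# with tf.device('/CPU:0'):
#     out = model.predict_on_batch(x)
```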
Posted by vvaldes. Last updated.
Post not yet marked as solved
0 Replies
454 Views
Hello, I use a Mac Pro M2 with 16 GB. This is my code; it is very basic.

```python
# Imports implied by the snippet (not shown in the original post):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=50, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, batch_size=16)

train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

train_predict = scaler.inverse_transform(train_predict)
y_train = scaler.inverse_transform(y_train)
test_predict = scaler.inverse_transform(test_predict)
y_test = scaler.inverse_transform(y_test)
```

When I try to execute this code, Anaconda gives the following output:

```
I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
```

I can't find any solution; could you help me? Thank you
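Editor's note: the lines prefixed with `I` above are informational messages printed when the Metal plugin registers the GPU, not errors by themselves (the "0 MB memory" figure is how PluggableDevices are reported). A minimal check that the device is actually usable, as a hedged sketch that is not from the original post:

```python
import tensorflow as tf

# Lists PluggableDevice GPUs registered by tensorflow-metal; expect one entry on an M2.
print(tf.config.list_physical_devices('GPU'))

# A tiny op placed on the GPU to confirm the device can execute kernels.
with tf.device('/GPU:0'):
    print(tf.reduce_sum(tf.random.normal((1000, 1000))))
```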
Posted. Last updated.
Post not yet marked as solved
0 Replies
556 Views
Hello, I got a brand new MacBook M3 Pro and am trying to configure TensorFlow with GPU support. I followed the instructions provided at https://developer.apple.com/metal/tensorflow-plugin/ step by step. Unfortunately, even after creating/recreating the environment and installing/uninstalling TensorFlow, the problem is not getting resolved: Python crashes. I cannot get past that point to try a Jupyter notebook. Here is the error and the versions in the "tf" environment. I already spent the entire Saturday yesterday and so far no progress. Can someone tell me what is going on?

```
Python 3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-01-07 11:44:04.893581: F tensorflow/c/experimental/stream_executor/stream_executor.cc:743] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
[1]    1797 abort      /opt/homebrew/bin/python3
```

```
❯ python -m pip list | grep tensorflow
tensorflow                   2.15.0
tensorflow-estimator         2.15.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-macos             2.15.0
tensorflow-metal             1.1.0
❯ python --version
Python 3.11.7
```

OS is Sonoma 14.2.1. Thanks, Sohail
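Editor's note: one frequently reported cause of this abort (an assumption here, not confirmed in the thread) is having both the `tensorflow` and `tensorflow-macos` wheels installed in the same environment, so the Metal platform gets registered twice at import time. The pip listing above shows both packages at 2.15.0. A hedged cleanup sketch:

```bash
# Assumption: keeping a single TensorFlow wheel (plus tensorflow-metal) avoids
# registering the "METAL" platform twice. Version pins follow the post above.
python -m pip uninstall -y tensorflow tensorflow-macos tensorflow-metal
python -m pip install tensorflow-macos==2.15.0 tensorflow-metal==1.1.0
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```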
Posted. Last updated.
Post not yet marked as solved
0 Replies
428 Views
Here is my environment:
python==3.9.0
tensorflow==2.9.0
os==Sonoma 14.2 (23C64)

Error:

```
Translated Report (Full Report Below)

Process:               Python [10330]
Path:                  /Library/Frameworks/Python.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.9.0 (3.9.0)
Code Type:             X86-64 (Translated)
Parent Process:        Python [8039]
Responsible:           Terminal [779]
User ID:               501

Date/Time:             2023-12-30 22:31:38.4916 +0530
OS Version:            macOS 14.2 (23C64)
Report Version:        12
Anonymous UUID:        F7E462E7-6380-C3DA-E2EC-5CF01A61D195
Sleep/Wake UUID:       50F32A2D-8CFA-4117-8048-D9CF76E24F26

Time Awake Since Boot: 29000 seconds
Time Since Wake:       2193 seconds

System Integrity Protection: enabled

Notes:                 PC register does not match crashing frame (0x0 vs 0x10CEDE6D9)

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 4 Illegal instruction: 4
Terminating Process:   exc handler [10330]

Error Formulating Crash Report:
PC register does not match crashing frame (0x0 vs 0x10CEDE6D9)

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   _cpu_feature_guard.so         0x10cede6d9 _GLOBAL__sub_I_cpu_feature_guard.cc + 9
1   dyld                          0x2026a3fca invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const + 182
2   dyld                          0x2026e5584 invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 133
3   dyld                          0x2026d9913 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 543
4   dyld                          0x20268707f dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 249
5   dyld                          0x2026d8adc dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 176
6   dyld                          0x2026db104 dyld3::MachOFile::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, bool&) block_pointer) const + 116
7   dyld                          0x2026e52ba dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 390
8   dyld                          0x2026a0cfc dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 222
9   dyld                          0x2026a65cb dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const + 21
10  dyld                          0x2026a0ef1 dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 181
11  dyld                          0x2026a4040 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_1::operator()() const + 98
12  dyld                          0x2026a0f87 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 93
13  dyld                          0x2026bdc65 dyld4::APIs::dlopen_from(char const*, int, void*) + 935
14  _ctypes.cpython-39-darwin.so  0x10ac20962 py_dl_open + 162
15  Python                        0x10b444f2d cfunction_call + 125
16  Python                        0x10b40625d _PyObject_MakeTpCall + 365
17  Python                        0x10b4dc8fc call_function + 876
18  Python                        0x10b4d9e2b _PyEval_EvalFrameDefault + 25371
19  Python                        0x10b4dd563 _PyEval_EvalCode + 2611
20  Python                        0x10b4069b1 _PyFunction_Vectorcall + 289
21  Python                        0x10b4060b5 _PyObject_FastCallDictTstate + 293
22  Python                        0x10b406c98 _PyObject_Call_Prepend + 152
23  Python                        0x10b4601e5 slot_tp_init + 165
24  Python                        0x10b45b699 type_call + 345
...

Thread 1:: com.apple.rosetta.exceptionserver
0   runtime                       0x7ff7fffaf294 0x7ff7fffab000 + 17044

Thread 2:: /Reaper
0   ???                           0x7ff8aa35ea78 ???
1   libsystem_kernel.dylib        0x7ff819da46fa kevent + 10
2   libzmq.5.dylib                0x10bf038f6 zmq::kqueue_t::loop() + 278
3   libzmq.5.dylib                0x10bf31a59 zmq::worker_poller_base_t::worker_routine(void) + 25
4   libzmq.5.dylib                0x10bf7854c thread_routine(void*) + 300
5   libsystem_pthread.dylib       0x7ff819ddf202 _pthread_start + 99
6   libsystem_pthread.dylib       0x7ff819ddabab thread_start + 15

Thread 3:: /0
0   ???                           0x7ff8aa35ea78 ???
1   libsystem_kernel.dylib        0x7ff819da46fa kevent + 10
2   libzmq.5.dylib                0x10bf038f6 zmq::kqueue_t::loop() + 278
3   libzmq.5.dylib                0x10bf31a59 zmq::worker_poller_base_t::worker_routine(void) + 25
4   libzmq.5.dylib                0x10bf7854c thread_routine(void*) + 300
5   libsystem_pthread.dylib       0x7ff819ddf202 _pthread_start + 99
6   libsystem_pthread.dylib       0x7ff819ddabab thread_start + 15

Thread 4:
0   ???                           0x7ff8aa35ea78 ???
1   libsystem_kernel.dylib        0x7ff819da46fa kevent + 10
2   select.cpython-39-darwin.so   0x10ab95dc3 select_kqueue_control + 915
3   Python                        0x10b40f11f method_vectorcall_FASTCALL + 335
4   Python                        0x10b4dc86c call_function + 732
5   Python                        0x10b4d9d72 _PyEval_EvalFrameDefault + 25186
6   Python                        0x10b4dd563 _PyEval_EvalCode + 2611
7   Python                        0x10b4069b1 _PyFunction_Vectorcall + 289
...
```
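Editor's note, as a possible lead not stated in the post: the report shows `Code Type: X86-64 (Translated)`, i.e. an Intel build of Python 3.9 running under Rosetta, and the SIGILL happens inside `_cpu_feature_guard` while TensorFlow is being imported. Stock x86-64 TensorFlow wheels use AVX instructions, which Rosetta 2 does not translate. A hedged check and fix is to switch to a native arm64 Python with the macOS wheels:

```bash
# Check whether the interpreter is native arm64 or an Intel build under Rosetta.
python -c "import platform; print(platform.machine())"   # expect 'arm64', not 'x86_64'

# Assumption: recreating the environment with an arm64 Python and the macOS wheels
# avoids the AVX instructions that trigger SIGILL under Rosetta.
python3 -m venv ~/venv-metal && source ~/venv-metal/bin/activate
python -m pip install -U pip
python -m pip install tensorflow-macos tensorflow-metal
```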
Posted. Last updated.
Post not yet marked as solved
0 Replies
487 Views
I'm using an Anaconda environment:
Tensorflow-macos 2.15
Keras 2.15
Python 3.11.5
macOS 14.1 on an M2

I guess the problem is with PyCharm, because the code runs, but the error is: Cannot find reference 'keras' in 'imported module tensorflow | __init__.py'. Previously I built a model on plain MNIST and it works, but it has the same problem. I have tried different references and versions of Python. I've changed environments at least 3 times and it doesn't work.
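Editor's note: since the code runs, this looks like a static-analysis warning rather than a runtime failure; PyCharm's inspector sometimes cannot resolve the lazily exposed `tf.keras` attribute. A hedged import style that the IDE usually resolves (not from the original post, and the exact fix may depend on the PyCharm version):

```python
import tensorflow as tf
from tensorflow import keras          # explicit import often satisfies the inspector
from tensorflow.keras import layers   # same idea for submodules

# Quick runtime sanity check that keras is importable regardless of the IDE warning.
model = keras.Sequential([layers.Dense(10)])
print(model)
```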
Posted by toniX. Last updated.
Post not yet marked as solved
2 Replies
560 Views
Hello, I followed the instructions provided here: https://developer.apple.com/metal/tensorflow-plugin/ and while trying to run the example I am getting the following error:

```
NotFoundError: dlopen(/Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
  Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
  Expected in: <FFF31651-3926-3E79-A442-143B7156FB13> /Users/nedimhadzic/venv-metal/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
```

tensorflow: 2.15.0
tensorflow-metal: 1.0.0
macOS: 14.2.1
Intel CPU and AMD Radeon Pro 5500M

Any idea? Regards, Nedim
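Editor's note, as a hedged observation not confirmed in the thread: a `Symbol not found` error from `libmetal_plugin.dylib` usually indicates that the tensorflow-metal wheel was built against a different TensorFlow ABI than the installed `tensorflow` wheel (here 2.15.0 paired with tensorflow-metal 1.0.0). One thing to try is reinstalling the pair so the versions match the compatibility table on the plugin page; the exact pin below is an assumption:

```bash
# Assumption: pairing tensorflow 2.15 with the tensorflow-metal release current at the
# time of the post resolves the ABI mismatch; verify the pairing against
# https://developer.apple.com/metal/tensorflow-plugin/
python -m pip install -U tensorflow==2.15.0 tensorflow-metal==1.1.0
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

Note also that the machine in the post is an Intel Mac with an AMD GPU; support there depends on the x86_64 tensorflow-metal wheels, so the pinned versions above may not be available for that platform.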
Posted by nedo99. Last updated.
Post not yet marked as solved
0 Replies
492 Views
Hi, I think many of us would love to be able to use our GPUs for JAX on the new Apple Silicon devices, but currently the jax-metal plugin is, for all intents and purposes, broken. Is it still under active development? Is there a planned release for a new version? Thanks!
Posted by BVJ. Last updated.
Post not yet marked as solved
2 Replies
800 Views
Running the sample Python keras-ocr example on an M3 Max returns incorrect results if tensorflow-metal is installed.

Code example: https://keras-ocr.readthedocs.io/en/latest/examples/using_pretrained_models.html
Note: https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg was not found, so that line was commented out.

Without tensorflow-metal (correct results):

```
['toodstande', 's', 'somme', 'srny', 'squadron', 'ds', 'quentn', 'snhnen', 'bnpnone', 'sasne', 'taing', 'yeoms', 'sry', 'the', 'royal', 'wessex', 'yeomanry', 'regiment', 'yeomanry', 'wests', 'south', 'the', 'now', 'recruiting', 'arm', 'blon', 'wxybsqipsacomodn', 'email', '438300', '01722']
['banana', 'union', 'no', 'no', 'software', 'patents']
```

With tensorflow-metal (incorrect results):

```
['sddoooo', '', 'eamnooss', 'xynrr', 'daanues', 'idd', 'innee', 'iiiinus', 'tnounppanab', 'inla', 'ppnt', 'mmnooexyy', 'yyr', 'ehhtt', 'laayvyoorr', 'xeseww', 'rinamoevy', 'tnemiger', 'yrnamoey', 'sstseww', 'htuwlos', 'fefeahit', 'wwoniia', 'turceedrr', 'ymmrira', 'atate', 'prasbyxwr', 'liamme', '00338803144', '22277100']
['annnaab', 'noolinnu', 'oon', 'oon', 'wttffoos', 'sttneettaap']
```

Logs with tensorflow-metal (incorrect results):

```
(.venv) <REDACTED> % pip3 install -U tensorflow-metal
Collecting tensorflow-metal
  Using cached tensorflow_metal-1.1.0-cp311-cp311-macosx_12_0_arm64.whl.metadata (1.2 kB)
Requirement already satisfied: wheel~=0.35 in ./.venv/lib/python3.11/site-packages (from tensorflow-metal) (0.42.0)
Requirement already satisfied: six>=1.15.0 in ./.venv/lib/python3.11/site-packages (from tensorflow-metal) (1.16.0)
Using cached tensorflow_metal-1.1.0-cp311-cp311-macosx_12_0_arm64.whl (1.4 MB)
Installing collected packages: tensorflow-metal
Successfully installed tensorflow-metal-1.1.0
(.venv) <REDACTED> % python3 keras-ocr-bug.py
Looking for <REDACTED>/.keras-ocr/craft_mlt_25k.h5
2023-12-16 22:05:05.452493: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Max
2023-12-16 22:05:05.452532: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 64.00 GB
2023-12-16 22:05:05.452545: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 24.00 GB
2023-12-16 22:05:05.452591: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-12-16 22:05:05.452609: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:From <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow/python/util/dispatch.py:1260: resize_bilinear (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.BILINEAR...)` instead.
Looking for <REDACTED>/.keras-ocr/crnn_kurapan.h5
2023-12-16 22:05:07.526354: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
1/1 [==============================] - 1s 855ms/step
2/2 [==============================] - 1s 140ms/step
['sddoooo', '', 'eamnooss', 'xynrr', 'daanues', 'idd', 'innee', 'iiiinus', 'tnounppanab', 'inla', 'ppnt', 'mmnooexyy', 'yyr', 'ehhtt', 'laayvyoorr', 'xeseww', 'rinamoevy', 'tnemiger', 'yrnamoey', 'sstseww', 'htuwlos', 'fefeahit', 'wwoniia', 'turceedrr', 'ymmrira', 'atate', 'prasbyxwr', 'liamme', '00338803144', '22277100']
['annnaab', 'noolinnu', 'oon', 'oon', 'wttffoos', 'sttneettaap']
```

Logs with valid results, without tensorflow-metal:

```
(.venv) <REDACTED> % pip3 uninstall tensorflow-metal
Found existing installation: tensorflow-metal 1.1.0
Uninstalling tensorflow-metal-1.1.0:
  Would remove:
    <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow-plugins/*
    <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow_metal-1.1.0.dist-info/*
Proceed (Y/n)? Y
  Successfully uninstalled tensorflow-metal-1.1.0
(.venv) <REDACTED> % python3 keras-ocr-bug.py
Looking for <REDACTED>/.keras-ocr/craft_mlt_25k.h5
WARNING:tensorflow:From <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow/python/util/dispatch.py:1260: resize_bilinear (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.BILINEAR...)` instead.
Looking for <REDACTED>/.keras-ocr/crnn_kurapan.h5
1/1 [==============================] - 7s 7s/step
2/2 [==============================] - 1s 71ms/step
['toodstande', 's', 'somme', 'srny', 'squadron', 'ds', 'quentn', 'snhnen', 'bnpnone', 'sasne', 'taing', 'yeoms', 'sry', 'the', 'royal', 'wessex', 'yeomanry', 'regiment', 'yeomanry', 'wests', 'south', 'the', 'now', 'recruiting', 'arm', 'blon', 'wxybsqipsacomodn', 'email', '438300', '01722']
['banana', 'union', 'no', 'no', 'software', 'patents']
```
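Editor's note: a hedged workaround sketch, not from the original post, for keeping tensorflow-metal installed while forcing this particular script back onto the CPU path that produced correct OCR output:

```python
import tensorflow as tf

# Hide the Metal GPU from TensorFlow for this process only, so keras-ocr inference
# falls back to the CPU kernels without uninstalling tensorflow-metal.
tf.config.set_visible_devices([], 'GPU')

import keras_ocr  # imported after hiding the GPU; pipeline construction is unchanged

pipeline = keras_ocr.pipeline.Pipeline()
# predictions = pipeline.recognize(images)  # as in the keras-ocr tutorial
```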
Posted. Last updated.
Post not yet marked as solved
0 Replies
399 Views
My environment: TensorFlow 2.14, tensorflow-metal 1.1, M3 Max.

I am working on a GAN full of residual sums and concatenations. It trains correctly when using the CPU only. However, if I enable the GPU, it fails with:

```
loc("mps_slice_1"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":359:0)): error: 'mps.slice' op failed: length value 32 does not fit within the dimension size (33) with start value (32)
/AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:2133: failed assertion `Error: MLIR pass manager failed'
```

Some customizations I guess might be related to the error:
tf.bitwise.bitwise_xor, tf.concat, tf.pad in custom layers
numpy.random in train steps

Another debugging hint I found is that the "32" is the number of channels of my model's conv layers, and it changes as I change the number of channels. Does anyone know what is wrong? Thank you so much
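Editor's note, a speculative debugging sketch based only on the hint that the failing `mps.slice` length tracks the conv channel count; nothing below is from the thread, and `XorPadBlock` is a hypothetical stand-in for the poster's custom layer. Pinning the suspect ops to the CPU while leaving the rest of the GAN on the GPU can at least narrow down which op triggers the MLIR assertion:

```python
import tensorflow as tf

class XorPadBlock(tf.keras.layers.Layer):
    """Hypothetical stand-in for a custom layer using bitwise_xor / concat / pad."""

    def call(self, x):
        # Assumption: x is an NHWC feature map. Running only this block on the CPU
        # keeps the rest of the model on the Metal GPU while avoiding the
        # mps.slice shape mismatch, if this block is indeed the trigger.
        with tf.device('/CPU:0'):
            mask = tf.bitwise.bitwise_xor(tf.cast(x > 0, tf.int32), 1)
            y = tf.concat([x, tf.cast(mask, x.dtype)], axis=-1)
            return tf.pad(y, [[0, 0], [1, 1], [1, 1], [0, 0]])
```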
Posted by tf_noob. Last updated.
Post not yet marked as solved
0 Replies
355 Views
```python
x = tf.Variable(tf.ones(3))
x[1].assign(5)
```

The above code results in:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceStridedSliceAssign: CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  ref (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0

Op: ResourceStridedSliceAssign
Node attrs: ellipsis_mask=0, Index=DT_INT32, T=DT_FLOAT, shrink_axis_mask=1, end_mask=0, begin_mask=0, new_axis_mask=0
Registered kernels:
  device='XLA_CPU_JIT'; Index in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN, DT_INT4, DT_UINT4]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]
  device='CPU'; T in [DT_STRING]
  device='CPU'; T in [DT_RESOURCE]
  device='CPU'; T in [DT_VARIANT]
  device='CPU'; T in [DT_QINT8]
  device='CPU'; T in [DT_QUINT8]
  device='CPU'; T in [DT_QINT32]
  device='CPU'; T in [DT_FLOAT8_E5M2]
  device='CPU'; T in [DT_FLOAT8_E4M3FN]
 [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign
```

I am starting to regret my MacBook purchase. There are so many issues with tensorflow-metal:
ADAM is slow
Inconsistent values with CPU
And now this. I saw a post regarding this, but it was one year old. So, are MacBooks not even good for learning anymore?
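Editor's note, a hedged workaround sketch that is not from the thread: the error says the Metal plugin registers no GPU kernel for `ResourceStridedSliceAssign`, so either let TensorFlow fall back to the CPU for that op via soft device placement, or run the sliced assignment under an explicit CPU scope:

```python
import tensorflow as tf

# Option 1: allow ops without a GPU kernel to fall back to the CPU instead of raising.
# (This may be enough for the colocation error above, but is untested here.)
tf.config.set_soft_device_placement(True)

x = tf.Variable(tf.ones(3))
x[1].assign(5)

# Option 2: pin just the assignment to the CPU explicitly.
with tf.device('/CPU:0'):
    x[1].assign(5)
```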
Posted. Last updated.
Post not yet marked as solved
2 Replies
574 Views
I created a macOS 14 VM using https://github.com/s-u/macosvm which uses the Virtualization framework. I want to check if I can use paravirtualized graphics for TensorFlow workloads. I followed the steps from https://developer.apple.com/metal/tensorflow-plugin/ but when I run the script from step 4 ("Verify"), I get a segmentation fault (see below). Did anyone try to get this kind of GPU compute in a VM and succeed?

```
/Users/teuf/venv-metal/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
2023-11-20 07:41:11.723578: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple Paravirtual device
2023-11-20 07:41:11.723620: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 10.00 GB
2023-11-20 07:41:11.723626: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 0.50 GB
2023-11-20 07:41:11.723700: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-20 07:41:11.723968: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
zsh: segmentation fault  python3 ./tensorflow-test.py
```

```
Thread 0 Crashed:: Dispatch queue: metal gpu stream
0   MPSCore                          0x1999598f8 MPSDevice::GetMPSLibrary_DoNotUse(MPSLibraryInfo const*) + 92
1   MPSCore                          0x19995c544 0x199927000 + 218436
2   MPSCore                          0x19995c908 0x199927000 + 219400
3   MetalPerformanceShadersGraph     0x1fb696a58 0x1fb583000 + 1129048
4   MetalPerformanceShadersGraph     0x1fb6f0cc8 0x1fb583000 + 1498312
5   MetalPerformanceShadersGraph     0x1fb6ef2dc 0x1fb583000 + 1491676
6   MetalPerformanceShadersGraph     0x1fb717ea0 0x1fb583000 + 1658528
7   MetalPerformanceShadersGraph     0x1fb717ce4 0x1fb583000 + 1658084
8   MetalPerformanceShadersGraph     0x1fb6edaac 0x1fb583000 + 1485484
9   MetalPerformanceShadersGraph     0x1fb7a85e0 0x1fb583000 + 2250208
10  MetalPerformanceShadersGraph     0x1fb7a79f0 0x1fb583000 + 2247152
11  MetalPerformanceShadersGraph     0x1fb6602b4 0x1fb583000 + 905908
12  MetalPerformanceShadersGraph     0x1fb65f7b0 0x1fb583000 + 903088
13  libmetal_plugin.dylib            0x1156dfdcc invocation function for block in metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 164
14  libdispatch.dylib                0x18e79b910 _dispatch_client_callout + 20
15  libdispatch.dylib                0x18e7aacc4 _dispatch_lane_barrier_sync_invoke_and_complete + 56
16  libmetal_plugin.dylib            0x1156dfd14 metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 108
17  libmetal_plugin.dylib            0x115606634 metal_plugin::MPSStatelessRandomUniformOp<float>::ProduceOutput(metal_plugin::OpKernelContext*, metal_plugin::Tensor*) + 876
18  libmetal_plugin.dylib            0x115607620 metal_plugin::MPSStatelessRandomOpBase::Compute(metal_plugin::OpKernelContext*) + 620
19  libmetal_plugin.dylib            0x1156061f8 void metal_plugin::ComputeOpKernel<metal_plugin::MPSStatelessRandomUniformOp<float>>(void*, TF_OpKernelContext*) + 44
20  libtensorflow_framework.2.dylib  0x10b807354 tensorflow::PluggableDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) + 148
21  libtensorflow_framework.2.dylib  0x10b7413e0 tensorflow::(anonymous namespace)::SingleThreadedExecutorImpl::Run(tensorflow::Executor::Args const&) + 2100
22  libtensorflow_framework.2.dylib  0x10b70b820 tensorflow::FunctionLibraryRuntimeImpl::RunSync(tensorflow::FunctionLibraryRuntime::Options, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) + 420
23  libtensorflow_framework.2.dylib  0x10b715668 tensorflow::ProcessFunctionLibraryRuntime::RunMultiDeviceSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, std::__1::function<absl::lts_20230125::Status (tensorflow::ProcessFunctionLibraryRuntime::ComponentFunctionData const&, tensorflow::ProcessFunctionLibraryRuntime::InternalArgs*)>) const + 1336
24  libtensorflow_framework.2.dylib  0x10b71a8a4 tensorflow::ProcessFunctionLibraryRuntime::RunSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) const + 848
25  libtensorflow_cc.2.dylib         0x2801b5008 tensorflow::KernelAndDeviceFunc::Run(tensorflow::ScopedStepContainer*, tensorflow::EagerKernelArgs const&, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, tsl::CancellationManager*, std::__1::optional<tensorflow::EagerFunctionParams> const&, std::__1::optional<tensorflow::ManagedStackTrace> const&, tsl::CoordinationServiceAgent*) + 572
26  libtensorflow_cc.2.dylib         0x28016613c tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::lts_20230125::InlinedVector<tensorflow::TensorHandle*, 4ul, std::__1::allocator<tensorflow::TensorHandle*>> const&, std::__1::optional<tensorflow::EagerFunctionParams> const&, tsl::core::RefCountPtr<tensorflow::KernelAndDevice> const&, tensorflow::GraphCollector*, tsl::CancellationManager*, absl::lts_20230125::Span<tensorflow::TensorHandle*>, std::__1::optional<tensorflow::ManagedStackTrace> const&) + 452
27  libtensorflow_cc.2.dylib         0x2801708ec tensorflow::ExecuteNode::Run() + 396
28  libtensorflow_cc.2.dylib         0x2801b0118 tensorflow::EagerExecutor::SyncExecute(tensorflow::EagerNode*) + 244
29  libtensorflow_cc.2.dylib         0x280165ac8 tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 2580
30  libtensorflow_cc.2.dylib         0x2801637a8 tensorflow::DoEagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 416
31  libtensorflow_cc.2.dylib         0x2801631e8 tensorflow::EagerOperation::Execute(absl::lts_20230125::Span<tensorflow::AbstractTensorHandle*>, int*) + 132
```
Posted by teuf. Last updated.
Post not yet marked as solved
0 Replies
391 Views
Hi, I've found that training loss is often unable to converge when training a model on an M2 Max or M1 GPU. After finding many other requests in this forum and no answer from Apple or anyone else resolving the issue, I tried to find the package combinations that fail and the ones that work. The issue seems to happen with certain TF (tensorflow), TFM (tensorflow-metal), and batch size combinations. The most recent combination that seems to work in every situation is:

Tensorflow 2.12
Tensorflow-metal 0.8.0

They must be installed with pip, not conda-forge, like this:

```
pip install tensorflow-macos==2.12
pip install tensorflow-metal==0.8.0
```

Every other recent combination failed to get its training loss to converge. Here is the code to reproduce the issue. Sometimes the divergence appears clearly with only 10 epochs, and sometimes the epoch count must be increased up to 30 to see it more clearly.

```python
# Imports implied by the snippet (not shown in the original post):
import tensorflow as tf
import pandas as pd
from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

epochs = 20
batch_size = 128

with tf.device('gpu:0'):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same', input_shape=train_images.shape[1:]),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])

    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    history = model.fit(train_images, train_labels, epochs=epochs, batch_size=batch_size,
                        validation_data=(test_images, test_labels))

pd.DataFrame(history.history).plot(subplots=(['loss','val_loss'],['accuracy','val_accuracy']),
                                   layout=(1,2), figsize=(15,5));
```

Here are the results for some combinations.

| TF     | TF Metal | Batch Size | Epochs | Training Loss Convergence |
|--------|----------|------------|--------|---------------------------|
| 2.14.0 | 1.1.0    | 128        | 10     | NO                        |
| 2.14.0 | 1.1.0    | 512        | 10     | YES                       |
| 2.13.0 | 1.0.0    | 128        | 20     | NO                        |
| 2.13.0 | 1.0.0    | 512        | 20     | YES                       |
| 2.12.0 | 1.0.0    | 128        | 20     | NO                        |
| 2.12.0 | 1.0.0    | 512        | 30     | NO                        |
| 2.12.0 | 0.8.0    | 128        | 20     | YES                       |
| 2.12.0 | 0.8.0    | 512        | 30     | YES                       |

As an example, the post includes the loss and accuracy curves for TF 2.14 / TFM 1.1.0 with batch size 128, where the training loss (blue line) goes up; for TF 2.12 / TFM 1.0.0 with batch size 128, where the training loss also goes up; and for the combination that works as expected, TF 2.12 / TFM 0.8.0 with batch size 128.

So, Apple, can you please fix it in the next release? I also suggest that before publishing a release, you implement a simple automated testing procedure that trains some models like this with various batch sizes and epoch counts and analyzes the loss history to detect major training loss divergence.

Thank you, best regards
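Editor's note: a minimal sketch of the kind of automated divergence check suggested above. It is not from the post; it assumes `history` comes from the `model.fit` call in the repro, and the window and tolerance values are arbitrary.

```python
import numpy as np

def training_loss_diverged(history, window=5, tolerance=1e-3):
    """Flag a run whose training loss trends upward over the last `window` epochs."""
    loss = np.asarray(history.history['loss'])
    if len(loss) < 2 * window:
        return False
    # Compare the mean of the final window against the preceding window.
    recent = loss[-window:].mean()
    earlier = loss[-2 * window:-window].mean()
    return recent > earlier + tolerance

assert not training_loss_diverged(history), \
    "training loss diverged for this TF/TFM/batch-size combination"
```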
Posted. Last updated.
Post not yet marked as solved
1 Reply
344 Views
https://developer.apple.com/metal/tensorflow-plugin/
In the "Verify" step, I pasted the code into the terminal, but it gives the following error: zsh: parse error near `,'
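Editor's note, a likely explanation offered as an assumption since the exact command isn't shown: the verification snippet on that page is Python, so pasting it directly at the zsh prompt makes the shell try to parse it and fail at the first comma. Saving it to a file and running it with Python avoids that:

```bash
# verify.py holds the Python verification snippet; the body below is a minimal
# stand-in for the page's script, not a copy of it.
cat > verify.py <<'EOF'
import tensorflow as tf
# Confirm the Metal GPU is visible to TensorFlow.
print(tf.config.list_physical_devices('GPU'))
EOF

python verify.py
```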
Posted by abharwal. Last updated.
Post not yet marked as solved
0 Replies
324 Views
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation tf_bert_for_sequence_classification/bert/embeddings/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node tf_bert_for_sequence_classification/bert/embeddings/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
```
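Editor's note, a hedged workaround sketch; the post shows no surrounding code, so the model construction below is hypothetical. The colocation failure indicates that an op in the BERT embedding path only has a CPU kernel under the Metal plugin; enabling soft device placement before building the model lets TensorFlow fall back to the CPU for that op instead of raising:

```python
import tensorflow as tf

# Allow ops without a Metal GPU kernel to fall back to the CPU.
tf.config.set_soft_device_placement(True)

# Hypothetical model construction matching the op names in the error message;
# the original post does not show how the model was created.
from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
```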
Posted by dhruv2295. Last updated.
Post not yet marked as solved
0 Replies
470 Views
Working Environment
MacBook Pro 14" with M2 Pro chip
macOS Sonoma 14.0
Python 3.11.4
tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0

Issue Description
Hi there! I ran into an issue when working with Keras' TextVectorization preprocessing layer.

```python
text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x['title']))
```

The inputs are string contents. And here is the traceback:

```
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
/Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
      1 # with tf.device('/CPU:0'):
      2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
    423 def adapt(self, data, batch_size=None, steps=None):
    424     """Computes a vocabulary of string terms from tokens in a dataset.
    425
    426     Calling `adapt()` on a `TextVectorization` layer is an alternative to
   (...)
    471     argument is not supported with array inputs.
    472     """
--> 473     super().adapt(data, batch_size=batch_size, steps=steps)

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
    256 with data_handler.catch_stop_iteration():
    257     for _ in data_handler.steps():
--> 258         self._adapt_function(iterator)
    259     if data_handler.should_sync:
    260         context.async_wait()

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152     filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153     raise e.with_traceback(filtered_tb) from None
    154 finally:
    155     del filtered_tb

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53   # Convert any objects of type core_types.Tensor to Tensor.
     54   inputs = [
     55       tensor_conversion_registry.convert(t)
     56       if isinstance(t, core_types.Tensor)
     57       else t
     58       for t in inputs
     59   ]
---> 60   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     61                                       inputs, attrs, num_outputs)
     62 except core._NotOkStatusException as e:
     63   if name is not None:

NotFoundError: Graph execution error:

Detected at node StringSplit/stack defined at (most recent call last):
...
No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}} (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0" .
Registered:
  device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
  device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64]
  device='CPU'; Tdim in [DT_INT32]
  device='CPU'; Tdim in [DT_INT64]
	 [[StringSplit/stack]] [Op:__inference_adapt_step_71204]
```

I have to explicitly specify the CPU to make it work:

```python
with tf.device('/CPU:0'):
    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))
```

I have referred to this post: https://developer.apple.com/forums/thread/700108
Posted. Last updated.
Post not yet marked as solved
2 Replies
436 Views
Hi, I've been going over this tutorial on autoencoders: https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection (notebook link: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb). When I downloaded and ran the notebook locally on my M2 Pro Max, the results were dramatically different and the plots were way off. (The post includes the plot from the working notebook and the local plot.) I checked every moving piece, and the difference seems to be in the output of the autoencoder, these lines:

```python
encoded_data = autoencoder.encoder(normal_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()
```

(The working notebook output and the local output are shown as images in the post.) The overall results are:

Notebook:
Accuracy = 0.944
Precision = 0.9941176470588236
Recall = 0.9053571428571429

Local:
Accuracy = 0.44
Precision = 0.0
Recall = 0.0

I'm using a Mac M2 Pro Max, Python 3.10.12, Tensorflow 2.14.0. Can anyone help? Thanks a lot in advance.
Posted. Last updated.
Post not yet marked as solved
0 Replies
378 Views
`print("Hello") import tensorflow as tf` I have an error during installing tensorflow "Process finished with exit code 132 (interrupted by signal 4: SIGILL)" Mac air 2022 M2 14.1 | Tensorflow latest version | Python version 3.11.5 Who can help me please? I have tried different variants of tensorflow (for Mac, for cpu and other versions). Also I have tried anaconda and miniconda but I can't. Process finished with exit code 132 (interrupted by signal 4: SIGILL)
Posted by toniX. Last updated.
Post not yet marked as solved
0 Replies
311 Views
I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also used Anaconda and Miniconda, but I can't get past: Process finished with exit code 132 (interrupted by signal 4: SIGILL)
Posted by toniX. Last updated.