ML Compute

Why is the case that every operator is supported by the ANE but the model still runs on GPU

Hi everyone, Wondering if you know how the device decide which compute unit (GPU, CPU or ANE) to use when compute units are set to ALL? I'm working on optimizing a GPT2 model to run on ANE. I ran the performance report for the existing model and the report showed me operators not supported by ANE. Then I went onto remove these operators and converted the model to CoreML again. This time the performance report showed that every operator is supported by ANE but the device still prefers GPU when the compute units are set to ALL and perfers CPU when the compute units are set to CPU and ANE. ALL CPU and ANE Does anyone know why? Thank you in advance!

Posted

by

dcdcdc123.

Last updated

.

How to fully apply parallel computing on CPU and GPU of M1max

Project is based on python3.8 and 3.9, containing some C and C++ source How can I do parallel computing on CPU and GPU of M1max In deed, I buy Mac m1max for the strong GPU to do quantitative finance, for which the speed is extremely important. Unfortunately, cuda is not compatible with Mac. Show me how to do it, thx. Are Accelerate(for CPU) and Metal(for GPU) can speed up any source by building like this: Step 1: download source from github Step 2: create a file named "site.cfg"in this souce file, and add content: [accelerate] libraries=Metal, Acelerate, vecLib Step 3: Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build Step 4: pip3 install . or python3 setup.py install ? (I am not sure which method to apply) 2、how is the compatibility of such method? I need speed up numpy, pandas and even a open souce project, such as https://github.com/microsoft/qlib 3、just show me the code 4、when compiling C++, C source, a lot of errors were reported, which gcc and g++ to choose? the default gcc installed by brew is 4.2.1, which cannot work. and I even tried to download gcc from the offical website of ARM， still cannot work. give me a hint. thx so much urgent

Posted

by

jefftang.

Last updated

.

Is there any way to make the model run on GPU / Neural Engine?

Hi folks, I'm working on converting a GPT2 model to coreml with KV caching enabled. I have a GPT2 model runinng on GPU with static input shape It seems once I enable flexible shape (i.e. either range shape or enumerated shape), the model will be run on CPU according to the performance report. I can see new operators being added ( get_shape and general_slice ) and it is not supported by GPU / ANE Wondering if there's any way to get around this to get the model running on GPU / ANE? How does the machine decide whether to run the model on GPU / Neural Engine? Thanks!

Posted

by

dcdcdc123.

Last updated

.

Action Classifier basing on VNDetectHumanBodyPose3DRequest data

Hi, I am developing a fitness app that detects technique mistakes during workout. Can we use 3D data from VNDetectHumanBodyPose3DRequest with ML model?

Posted

by

patrykogon.

Last updated

.

Core ML Model performance far lower on iOS 17 vs iOS 16 (iOS 17 not using Neural Engine)

Hello, I posted an issue on the coremltools GitHub about my Core ML models not performing as well on iOS 17 vs iOS 16 but I'm posting it here just in case. TL;DR The same model on the same device/chip performs far slower (doesn't use the Neural Engine) on iOS 17 compared to iOS 16. Longer description The following screenshots show the performance of the same model (a PyTorch computer vision model) on an iPhone SE 3rd gen and iPhone 13 Pro (both use the A15 Bionic). iOS 16 - iPhone SE 3rd Gen (A15 Bioinc) iOS 16 uses the ANE and results in fast prediction, load and compilation times. iOS 17 - iPhone 13 Pro (A15 Bionic) iOS 17 doesn't seem to use the ANE, thus the prediction, load and compilation times are all slower. Code To Reproduce The following is my code I'm using to export my PyTorch vision model (using coremltools). I've used the same code for the past few months with sensational results on iOS 16. # Convert to Core ML using the Unified Conversion API coreml_model = ct.convert( model=traced_model, inputs=[image_input], outputs=[ct.TensorType(name="output")], classifier_config=ct.ClassifierConfig(class_names), convert_to="neuralnetwork", # compute_precision=ct.precision.FLOAT16, compute_units=ct.ComputeUnit.ALL ) System environment: Xcode version: 15.0 coremltools version: 7.0.0 OS (e.g. MacOS version or Linux type): Linux Ubuntu 20.04 (for exporting), macOS 13.6 (for testing on Xcode) Any other relevant version information (e.g. PyTorch or TensorFlow version): PyTorch 2.0 Additional context This happens across "neuralnetwork" and "mlprogram" type models, neither use the ANE on iOS 17 but both use the ANE on iOS 16 If anyone has a similar experience, I'd love to hear more. Otherwise, if I'm doing something wrong for the exporting of models for iOS 17+, please let me know. Thank you!

Posted

by

mrdbourke.

Last updated

.

How to use GPU in Tensorflow?

Im using my 2020 Mac mini with M1 chip and this is the first time try to use it on convolutional neural network training. So the problem is I install the python(ver 3.8.12) using miniforge3 and Tensorflow following this instruction. But still facing the GPU problem when training a 3D Unet. Here's part of my code and hoping to receive some suggestion to fix this. import tensorflow as tf from tensorflow import keras import json import numpy as np import pandas as pd import nibabel as nib import matplotlib.pyplot as plt from tensorflow.keras import backend as K #check available devices def get_available_devices(): local_device_protos = device_lib.list_local_devices() return [x.name for x in local_device_protos] print(get_available_devices()) Metal device set to: Apple M1 ['/device:CPU:0', '/device:GPU:0'] 2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0) #tf.device('/device:GPU:0') #Have tried this line doesn't work #tf.debugging.set_log_device_placement(True) #Have tried this line doesn't work patch_pred = model.predict(X_norm_with_batch_dimension) InvalidArgumentError: 2 root error(s) found. (0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format. [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]] [[model/conv3d/Conv3D/_4]] (1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format. [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]] 0 successful operations. 0 derived errors ignored. The code is executable on Google Colab but can't run on Mac mini locally with Jupyter notebook. The NHWC tensor format problem might indicate that Im using my CPU to execute the code instead of GPU. Is there anyway to optimise GPU to train the network in Tensorflow?

Posted

by

MW_Shay.

Last updated

.

Tensorflow on M1 Macbook Pro, error when model fit executes

It doesn't matter if I install miniforge or mamba, directly or through brew, when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error. Is there any workaround on this? I'll appreciate any help, thanks! 2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz 2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled. 2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90 NotFoundError Traceback (most recent call last) Cell In[25], line 2 1 model = create_model() ----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64); File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs) 67 filtered_tb = _process_traceback_frames(e.traceback) 68 # To get the full stack trace, call: 69 # tf.debugging.disable_traceback_filtering() ---> 70 raise e.with_traceback(filtered_tb) from None 71 finally: 72 del filtered_tb File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 50 try: 51 ctx.ensure_initialized() ---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 53 inputs, attrs, num_outputs) 54 except core._NotOkStatusException as e: 55 if name is not None: NotFoundError: Graph execution error: Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last): File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in app.launch_new_instance() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance app.start() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start self.io_loop.start() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start self.asyncio_loop.run_forever() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever self._run_once() File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once handle._run() ... File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module> history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64); File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler return fn(*args, **kwargs) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit tmp_logs = self.train_function(iterator) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function return step_function(self, iterator) ...... File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step outputs = model.train_step(data) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step self.optimizer.minimize(loss, self.trainable_variables, tape=tape) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize self.apply_gradients(grads_and_vars) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients return super().apply_gradients(grads_and_vars, name=name) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients iteration = self._internal_apply_gradients(grads_and_vars) File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients return tf.__internal__.distribute.interim.maybe_merge_call( File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn distribution.extended.update( File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var return self._update_step_xla(grad, var, id(self._var_key(var))) Node: 'StatefulPartitionedCall_4' could not find registered platform with id: 0x28edf1f90 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Posted

by

ppobar.

Last updated

.

LiDAR dev

I am looking for an experienced LiDAR programmer for our room surveying app, any recommendations? Thanks

ML Compute

Posted

by

Captures-4.

Last updated

.

Tensorflow Metal Malfunctioning Completely

I am just starting to learn neural networks. If I run my code and try to fit a simple trigonometric function, the model builds a good-looking function. If I pip install tensorflow-metal and run, I get a straight line not resembling the non-linear function at all. if I uninstall metal, everything works again. Which suggests there is something wrong with metal. Any help would be appreciated. I would use the metal acceleration for the next steps in my project. Thank you

Posted

by

Yash11.

Last updated

.

AdamW crashes on tf-macos (tf-nightly-macos 2.14.0-dev20230523)

I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673]. When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fallback to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version 2.12, this lead to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown - however, after calling model.compile(), the attribute model.optimizer is set to string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string. Expected behaviour: correctly compile the model with either a v2.11+ optimiser without slowdown, or a legacy-compatible implementation of the AdamW optimizer. Then the model will train correctly with a valid AdamW optimizer when calling model.fit(). Note: a warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist It would be nice to be able to either use modern optimizers, or have a legacy-compatible version of AdamW, since weight-decay is an important tool in modern ML research, and currently cannot be used on mac. Standalone code to reproduce the issue ##===========## ## Imports ## ##===========## import sys import tensorflow as tf import numpy as np from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense from tensorflow.keras.optimizers import AdamW ##===================## ## Report versions ## ##===================## # # Expected outputs: # Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ] # TF version is: 2.14.0-dev20230523 # Numpy version is: 1.23.2 # print(f"Python version is: {sys.version}") print(f"TF version is: {tf.__version__}") print(f"Numpy version is: {np.__version__}") ##==============================## ## Create a very simple model ## ##==============================## # # Expected outputs: # Model: "model_1" # _________________________________________________________________ # Layer (type) Output Shape Param # # ================================================================= # Layer_in (InputLayer) [(None, 2)] 0 # # Layer_hidden (Dense) (None, 10) 30 # # Layer_out (Dense) (None, 2) 22 # # ================================================================= # Total params: 52 (208.00 Byte) # Trainable params: 52 (208.00 Byte) # Non-trainable params: 0 (0.00 Byte) # _________________________________________________________________ # x_in = Input(2 , dtype=tf.float32, name="Layer_in" ) x = x_in x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu" )(x) x = Dense(2 , dtype=tf.float32, name="Layer_out" , activation="linear")(x) model = Model(x_in, x) model.summary() ##===================================================## ## Compile model with MSE loss and AdamW optimizer ## ##===================================================## # # Expected outputs: # WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`. # WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`. # model.compile( loss = "mse", optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2) ) ##===========================## ## Generate some fake data ## ##===========================## # # Expected outputs: # X shape is (100, 2), Y shape is (100, 2) # dataset_size = 100 X = np.random.normal(size=(dataset_size, 2)) X = tf.constant(X, dtype=tf.float32) Y = np.random.normal(size=(dataset_size, 2)) Y = tf.constant(Y, dtype=tf.float32) print(f"X shape is {X.shape}, Y shape is {Y.shape}") ##===================================## ## Fit model to data for one epoch ## ##===================================## # # Expected outputs: # --------------------------------------------------------------------------- # AttributeError Traceback (most recent call last) # Cell In[9], line 51 # 1 ##===================================## # 2 ## Fit model to data for one epoch ## # 3 ##===================================## # (...) # 48 # • mask=None # 49 # # ---> 51 model.fit(X, Y, epochs=1) # File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs) # 67 filtered_tb = _process_traceback_frames(e.__traceback__) # 68 # To get the full stack trace, call: # 69 # `tf.debugging.disable_traceback_filtering()` # ---> 70 raise e.with_traceback(filtered_tb) from None # 71 finally: # 72 del filtered_tb # File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator) # 13 try: # 14 do_return = True # ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope) # 16 except: # 17 do_return = False # AttributeError: in user code: # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function * # return step_function(self, iterator) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function ** # outputs = model.distribute_strategy.run(run_step, args=(data,)) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step ** # outputs = model.train_step(data) # File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step # self.optimizer.minimize(loss, self.trainable_variables, tape=tape) # AttributeError: 'str' object has no attribute 'minimize' model.fit(X, Y, epochs=1)

Posted

by

smenary.

Last updated

.

Unable to use GPU

Hello, I am trying to use gpu for machine learning task from apple using "mps" as device for GPU but it is not working. I am using PyTorch Stable version. How can I use MacBook GPU for machine learning tasks?

Posted

by

mishrr.

Last updated

.

Ai controlled i-devices dev help

hey if i wanted to create an app that takes screenshots from an apple device (and any app within) to give context to an ai so the ai can then respond. Then the app parses the response then executes commands on behalf of the ai/user, how would I do so with the rule that "screenshots/captures are not allowed within other apps"? Want to stay within bounds of the rules in place. Possibilities: Ai assistant, Ai pals, passive automation

Posted

by

Sirmeowmeow.

Last updated

.

can xcode simulate coreml for ios17

I have export a quantization model with ct.convert whose "minimum_deployment_target=ct.target.iOS17,", can I run it without a iphone ?

ML Compute

Posted

by

feyman1999.

Last updated

.

Maximize memory read bandwidth on M1 Ultra/M2 Ultra

I am in the process of developing a matrix-vector multiplication kernel. While conducting performance evaluations, I've noticed that on M1/M1 Pro/M1 Max, the kernel demonstrates an impressive memory bandwidth utilization of around 90%. However, when executed on the M1 Ultra/M2 Ultra, this figure drops to approximately 65%. My suspicion is that this discrepancy is attributed to the dual-die architecture of the M1 Ultra/M2 Ultra. It's plausible that the necessary data might be stored within the L2 cache of the alternate die. Could you kindly provide any insights or recommendations for mitigating the occurrence of on-die L2 cache misses on the Ultra chips? Additionally, I would greatly appreciate any general advice aimed at enhancing memory load speeds on these particular chips.

Posted

by

lshzh.

Last updated

.

Integrate ARKit/SceneKit with React Native

I've been working on an app that combines CoreML and ARKit/SceneKit to detect and measure some objects, with success. Now I need to make it available to a React Native app, and I'm trying this approach here: https://github.com/riteshakya037/react-native-native-module where I can navigate and instantiate the view controller. The problem occurs when my view gets called. I have errors at the sceneView, not being loaded. Is there a way to use it without the Storyboard? For now it seems the incompatibility.

Posted

by

nailsonlandim.

Last updated

.

Comparing Performance: Inference CoreML Model with CoreMLTools in Python VS CoreML with Swift in a MacOS Application

Hello fellow developers, I am currently developing an application involving machine learning models, specifically CoreML models, and I have encountered an intriguing issue that I am hoping to get some insights on. In my current scenario, I'm planning to create a simple application with minimal UI, possibly using PyQT or similar tools. Therefore, I'm seeking a way to utilize NeuralEngine and GPU for CoreML model inference in Python. I discovered the 'predict' API in CoreMLTools which allows for model inference, but I'm unsure if its performance is on par with that of a properly built MacOS application using Swift and Neural Engine. Can anyone provide insights into whether there's a considerable difference in inference performance between these two methods? Is the performance of CoreMLTools 'predict' API comparable to that of a full-fledged Swift MacOS application leveraging the Neural Engine? Any clarification or guidance on this matter would be greatly appreciated. Thanks!

Posted

by

beeble123.

Last updated

.

Train Tensorflow models using Neural Engine on M2 chip

Hi! I've just got a MacBook Air with M2 chip. I'm doing some research on ML and I was wondering if it is possible to use Neural Engine as an accelerator for training. If it is possible, where can I find ressources on how to enable it and how to check it? Thank you!

ML Compute

Posted

by

gn48.

Last updated

.

Jax-metal on M2 Pro does not recognize GPU

Build and installed Jax and Jax-metal following instructions on a M2Pro Mac-mini from here - https://developer.apple.com/metal/jax/ However, the following check seems to suggest XLA using CPU and not GPU. >>> from jax.lib import xla_bridge >>> print(xla_bridge.get_backend().platform) cpu Has anyone got it working to dump GPU? Thanks in advance!

Posted

by

shibd.

Last updated

.

Getting ValueError: Categorical Cross Entropy loss layer input (Identity) must be a softmax layer output.

I am working on the neural network classifier provided on the coremltools.readme.io in the updatable->neural network section(https://coremltools.readme.io/docs/updatable-neural-network-classifier-on-mnist-dataset). I am using the same code but I get an error saying that the coremltools.converters.keras.convert does not exist. But this I know can be coreml version issue. Right know I am using coremltools version 6.2. I converted this model to mlmodel with .convert only. It got converted successfully. But I face an error in the make_updatable function saying the loss layer must be softmax output. Even the coremlt package API reference there I found its because the layer name is softmaxND but it should be softmax. Now the problem is when I convert the model from Keras sequential model to coreml model. the layer name and type change. And the softmax changes to softmaxND. Does anyone faced this issue? if I execute this builder.inspect_layers(last=4) I get this output [Id: 32], Name: sequential/dense_1/Softmax (Type: softmaxND) Updatable: False Input blobs: ['sequential/dense_1/MatMul'] Output blobs: ['Identity'] [Id: 31], Name: sequential/dense_1/MatMul (Type: batchedMatmul) Updatable: False Input blobs: ['sequential/dense/Relu'] Output blobs: ['sequential/dense_1/MatMul'] [Id: 30], Name: sequential/dense/Relu (Type: activation) Updatable: False Input blobs: ['sequential/dense/MatMul'] Output blobs: ['sequential/dense/Relu'] In the make_updatable function when I execute builder.set_categorical_cross_entropy_loss(name='lossLayer', input='Identity') I get this error ValueError: Categorical Cross Entropy loss layer input (Identity) must be a softmax layer output.

Posted

by

anaamrasool.

Last updated

.

MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPS graph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash: Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578. I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster/allow initialization to occur on the GPU. Here's my code for building the graph including both methods of weight initialization: func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) { let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil) let labelPlaceholder = graph.placeholder(shape: [1], name: nil) // This works for inference but not training let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)! let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil) // This works for inference and training // let weights = [Float](repeating: 1, count: 2) // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil) variables += [weightTensor] let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil) let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil) return (inputPlaceholder, labelPlaceholder, output, loss) } And to run the graph I have the following in my sample view controller: override func viewDidLoad() { super.viewDidLoad() var variables: [MPSGraphTensor] = [] let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables) let gradients = graph.gradients(of: loss, with: variables, name: nil) let learningRate = graph.constant(0.001, dataType: .float32) var updateOps: [MPSGraphOperation] = [] for (key, value) in gradients { let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil) let assign = graph.assign(key, tensor: updates, name: nil) updateOps += [assign] } let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!) let executionDesc = MPSGraphExecutionDescriptor() executionDesc.completionHandler = { (resultsDictionary, nil) in for (key, value) in resultsDictionary { var output: [Float] = [0] value.mpsndarray().readBytes(&output, strideBytes: nil) print(output) } } let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2]) let input = MPSNDArray(device: Self.device, descriptor: inputDesc) var inputArray: [Float] = [1, 2] input.writeBytes(&inputArray, strideBytes: nil) let source = MPSGraphTensorData(input) let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1])) var labelArray: [Float] = [1] labelMPSArray.writeBytes(&labelArray, strideBytes: nil) let label = MPSGraphTensorData(labelMPSArray) // This runs inference and works // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc) // // commandBuffer.commit() // commandBuffer.waitUntilCompleted() // This trains but does not work graph.encode( to: commandBuffer, feeds: [inputPlaceholder: source, labelPlaceholder: label], targetTensors: [], targetOperations: updateOps, executionDescriptor: executionDesc) commandBuffer.commit() commandBuffer.waitUntilCompleted() } And a few other relevant variables are created at the class scope: let graph = MPSGraph() static let device = MTLCreateSystemDefaultDevice()! static let commandQueue = device.makeCommandQueue()! How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Posted

by

noahmartin.

Last updated

.

Posts under ML Compute tag