ML Compute


Accelerate training and validation of neural networks using the CPU and GPUs.

ML Compute Documentation

Posts under ML Compute tag

44 Posts
Post not yet marked as solved
0 Replies
545 Views
Hi everyone, does anyone know how the device decides which compute unit (GPU, CPU, or ANE) to use when the compute units are set to ALL? I'm working on optimizing a GPT-2 model to run on the ANE. I ran the performance report for the existing model and it showed operators not supported by the ANE. I then went on to remove these operators and converted the model to Core ML again. This time the performance report showed that every operator is supported by the ANE, but the device still prefers the GPU when the compute units are set to ALL, and prefers the CPU when the compute units are set to CPU and ANE. [Performance report screenshots: ALL; CPU and ANE.] Does anyone know why? Thank you in advance!
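One way to take the GPU out of the picture while investigating is to load the converted package from Python with the compute units restricted to CPU and ANE. A minimal sketch follows; the package path and input name are hypothetical stand-ins, not taken from the post:

import numpy as np
import coremltools as ct

# Hypothetical package path and input name, for illustration only.
model = ct.models.MLModel("gpt2_ane.mlpackage",
                          compute_units=ct.ComputeUnit.CPU_AND_NE)  # GPU excluded
tokens = np.zeros((1, 128), dtype=np.int32)
outputs = model.predict({"input_ids": tokens})
print(list(outputs.keys()))

Whether an op actually lands on the ANE is still decided by the framework at load/predict time; this only removes the GPU from the set of candidates.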
Post not yet marked as solved
1 Reply
646 Views
Hi folks, I'm working on converting a GPT-2 model to Core ML with KV caching enabled. I have the GPT-2 model running on the GPU with a static input shape. It seems that once I enable a flexible shape (i.e. either a range shape or an enumerated shape), the model runs on the CPU according to the performance report. I can see new operators being added (get_shape and general_slice), and they are not supported by the GPU / ANE. Is there any way to get around this so the model runs on the GPU / ANE? How does the machine decide whether to run the model on the GPU / Neural Engine? Thanks!
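For reference, this is roughly how the three shape modes look at conversion time. The tensor names and sizes below are hypothetical, and the convert call is commented out because it needs the traced model from the post; the coremltools flexible-shape guidance generally favors enumerated shapes over a range dimension for performance, though whether that keeps a given model off the CPU is model-dependent.

import numpy as np
import coremltools as ct

# Hypothetical GPT-2 token input; names and sizes are illustrative.
static_input = ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)

enumerated_input = ct.TensorType(
    name="input_ids",
    shape=ct.EnumeratedShapes(shapes=[(1, 64), (1, 128), (1, 256)],
                              default=(1, 128)),
    dtype=np.int32,
)

range_input = ct.TensorType(
    name="input_ids",
    shape=(1, ct.RangeDim(lower_bound=1, upper_bound=256, default=128)),
    dtype=np.int32,
)

# mlmodel = ct.convert(traced_gpt2,               # traced model from the post
#                      inputs=[enumerated_input],
#                      convert_to="mlprogram",
#                      compute_units=ct.ComputeUnit.ALL)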
Post not yet marked as solved
0 Replies
925 Views
Hello, I posted an issue on the coremltools GitHub about my Core ML models not performing as well on iOS 17 vs iOS 16, but I'm posting it here just in case.

TL;DR
The same model on the same device/chip performs far slower (doesn't use the Neural Engine) on iOS 17 compared to iOS 16.

Longer description
The following screenshots show the performance of the same model (a PyTorch computer vision model) on an iPhone SE 3rd gen and an iPhone 13 Pro (both use the A15 Bionic).

iOS 16 - iPhone SE 3rd Gen (A15 Bionic): iOS 16 uses the ANE and results in fast prediction, load, and compilation times.
iOS 17 - iPhone 13 Pro (A15 Bionic): iOS 17 doesn't seem to use the ANE, so the prediction, load, and compilation times are all slower.

Code To Reproduce
The following is the code I'm using to export my PyTorch vision model (using coremltools). I've used the same code for the past few months with sensational results on iOS 16.

# Convert to Core ML using the Unified Conversion API
coreml_model = ct.convert(
    model=traced_model,
    inputs=[image_input],
    outputs=[ct.TensorType(name="output")],
    classifier_config=ct.ClassifierConfig(class_names),
    convert_to="neuralnetwork",
    # compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL
)

System environment:
Xcode version: 15.0
coremltools version: 7.0.0
OS (e.g. macOS version or Linux type): Linux Ubuntu 20.04 (for exporting), macOS 13.6 (for testing in Xcode)
Any other relevant version information (e.g. PyTorch or TensorFlow version): PyTorch 2.0

Additional context
This happens across "neuralnetwork" and "mlprogram" type models; neither uses the ANE on iOS 17, but both use the ANE on iOS 16. If anyone has a similar experience, I'd love to hear more. Otherwise, if I'm doing something wrong when exporting models for iOS 17+, please let me know. Thank you!
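In case it helps comparisons, here is a minimal, self-contained sketch of the mlprogram/FP16 conversion path mentioned in the post. The torchvision model, input name, class labels, and deployment target are stand-ins for illustration, not the poster's actual model:

import torch
import torchvision
import coremltools as ct

# Stand-in for the poster's PyTorch vision model (hypothetical choice).
torch_model = torchvision.models.mobilenet_v3_small(weights=None).eval()
example = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(torch_model, example)

# Preprocessing (scale/bias) is omitted for brevity.
image_input = ct.ImageType(name="image", shape=example.shape)
class_names = [str(i) for i in range(1000)]  # placeholder labels

coreml_model = ct.convert(
    model=traced_model,
    inputs=[image_input],
    classifier_config=ct.ClassifierConfig(class_names),
    convert_to="mlprogram",                     # ML Program instead of "neuralnetwork"
    compute_precision=ct.precision.FLOAT16,     # FP16, the default for ML Programs
    compute_units=ct.ComputeUnit.ALL,
    minimum_deployment_target=ct.target.iOS16,  # illustrative
)
coreml_model.save("VisionClassifierFP16.mlpackage")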
Post not yet marked as solved
1 Reply
561 Views
When attempting to load an mlmodel and run it on the CPU/GPU, you can pass the ComputeUnit you'd like to use when creating the model:

model = ct.models.MLModel('mymodel.mlmodel', ct.ComputeUnit.CPU_ONLY)

The documentation for coremltools 7.0 says:

compute_units: coremltools.ComputeUnit
coremltools.ComputeUnit.ALL: Use all compute units available, including the neural engine.
coremltools.ComputeUnit.CPU_ONLY: Limit the model to only use the CPU.
coremltools.ComputeUnit.CPU_AND_GPU: Use both the CPU and GPU, but not the neural engine.
coremltools.ComputeUnit.CPU_AND_NE: Use both the CPU and neural engine, but not the GPU. Available only for macOS >= 13.0.

However, coremltools 7.0 (and the previous versions I've tried) now seems to ignore that hint and only runs my models on the ANE. The same model, when loaded into Xcode and run through a performance test with CPU only, happily runs on the CPU as selected in the Xcode performance tool. Is there a way in Python to get our models to run on different compute units?
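Since the coremltools Python API does not report which unit executed a prediction, one rough proxy is to load the same model under each ComputeUnit (passed as a keyword argument) and compare prediction latency; a model genuinely pinned to CPU_ONLY usually shows a clearly different timing profile. A sketch, with a hypothetical input dictionary:

import time
import numpy as np
import coremltools as ct

MODEL_PATH = "mymodel.mlmodel"  # path from the post
x = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}  # hypothetical input name/shape

for units in (ct.ComputeUnit.CPU_ONLY,
              ct.ComputeUnit.CPU_AND_GPU,
              ct.ComputeUnit.CPU_AND_NE,
              ct.ComputeUnit.ALL):
    model = ct.models.MLModel(MODEL_PATH, compute_units=units)
    model.predict(x)                 # warm-up
    t0 = time.perf_counter()
    for _ in range(20):
        model.predict(x)
    print(units, f"{(time.perf_counter() - t0) / 20 * 1e3:.2f} ms per prediction")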
Post not yet marked as solved
0 Replies
429 Views
I am looking for an experienced LiDAR programmer for our room-surveying app. Any recommendations? Thanks!
Post not yet marked as solved
1 Reply
601 Views
I am just starting to learn neural networks. If I run my code and try to fit a simple trigonometric function, the model builds a good-looking function. If I pip install tensorflow-metal and run the same code, I get a straight line that doesn't resemble the non-linear function at all. If I uninstall tensorflow-metal, everything works again, which suggests there is something wrong with the Metal plugin. Any help would be appreciated, as I would like to use Metal acceleration for the next steps in my project. Thank you!
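A minimal sketch of the kind of comparison described, with a hypothetical architecture and synthetic data; run it once in an environment with tensorflow-metal installed and once without, and compare the fit:

import numpy as np
import tensorflow as tf

# Toy non-linear target: y = sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 1000, dtype="float32").reshape(-1, 1)
y = np.sin(x)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, batch_size=64, verbose=0)

# A healthy fit should bring the mean absolute error well below 0.1.
print("MAE:", float(np.mean(np.abs(model.predict(x, verbose=0) - y))))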
Post not yet marked as solved
1 Reply
448 Views
I have exported a quantized model with ct.convert using minimum_deployment_target=ct.target.iOS17. Can I run it without an iPhone?
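For what it's worth, a converted .mlpackage can be exercised from Python on a Mac via coremltools, with no iPhone involved, provided the host macOS is recent enough to support the op set implied by the deployment target (for ct.target.iOS17 that generally means a correspondingly recent macOS). A sketch with hypothetical file and input names:

import numpy as np
import coremltools as ct

# Hypothetical: a quantized model previously exported with
# minimum_deployment_target=ct.target.iOS17.
model = ct.models.MLModel("quantized_model.mlpackage",
                          compute_units=ct.ComputeUnit.ALL)

# Hypothetical input name and shape; inspect model.get_spec() for the real ones.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
print(model.predict({"input": x}))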
Post not yet marked as solved
0 Replies
582 Views
I am in the process of developing a matrix-vector multiplication kernel. While conducting performance evaluations, I've noticed that on M1/M1 Pro/M1 Max, the kernel demonstrates an impressive memory bandwidth utilization of around 90%. However, when executed on the M1 Ultra/M2 Ultra, this figure drops to approximately 65%. My suspicion is that this discrepancy is attributed to the dual-die architecture of the M1 Ultra/M2 Ultra. It's plausible that the necessary data might be stored within the L2 cache of the alternate die. Could you kindly provide any insights or recommendations for mitigating the occurrence of on-die L2 cache misses on the Ultra chips? Additionally, I would greatly appreciate any general advice aimed at enhancing memory load speeds on these particular chips.
Post marked as solved
1 Reply
912 Views
I've been working on an app that combines Core ML and ARKit/SceneKit to detect and measure some objects, with success. Now I need to make it available to a React Native app, and I'm trying the approach here: https://github.com/riteshakya037/react-native-native-module where I can navigate to and instantiate the view controller. The problem occurs when my view gets called: I get errors because the sceneView is not being loaded. Is there a way to use it without the Storyboard? For now it seems to be an incompatibility.
Post not yet marked as solved
0 Replies
654 Views
Hello fellow developers, I am currently developing an application involving machine learning models, specifically Core ML models, and I have encountered an intriguing issue that I am hoping to get some insights on. In my current scenario, I'm planning to create a simple application with minimal UI, possibly using PyQt or similar tools, so I'm seeking a way to utilize the Neural Engine and GPU for Core ML model inference from Python. I discovered the predict API in coremltools, which allows for model inference, but I'm unsure whether its performance is on par with that of a properly built macOS application using Swift and the Neural Engine. Can anyone provide insights into whether there's a considerable difference in inference performance between these two methods? Is the performance of the coremltools predict API comparable to that of a full-fledged Swift macOS application leveraging the Neural Engine? Any clarification or guidance on this matter would be greatly appreciated. Thanks!
Post not yet marked as solved
6 Replies
3k Views
Built and installed JAX and jax-metal on an M2 Pro Mac mini following the instructions here - https://developer.apple.com/metal/jax/ However, the following check seems to suggest that XLA is using the CPU and not the GPU:

>>> from jax.lib import xla_bridge
>>> print(xla_bridge.get_backend().platform)
cpu

Has anyone got it to report the GPU? Thanks in advance!
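A small check that can help tell whether the Metal plugin was picked up at all; these are standard JAX calls, and the exact device string printed depends on the jax-metal version:

import jax
import jax.numpy as jnp

print(jax.default_backend())  # "cpu" here means the plugin was not used
print(jax.devices())          # lists the devices JAX can see

# Force a computation to complete on the default device as a smoke test.
x = jnp.ones((2048, 2048))
y = (x @ x).block_until_ready()
print(y.shape)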
Post not yet marked as solved
4 Replies
1.6k Views
I initially raised this issue in the TensorFlow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673]. When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fall back to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version, 2.12, this led to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown; however, after calling model.compile(), the attribute model.optimizer is set to the string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string.

Expected behaviour: correctly compile the model with either a v2.11+ optimizer without slowdown, or a legacy-compatible implementation of the AdamW optimizer. Then the model will train correctly with a valid AdamW optimizer when calling model.fit(). Note: the warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist.

It would be nice to be able to either use modern optimizers, or have a legacy-compatible version of AdamW, since weight decay is an important tool in modern ML research and currently cannot be used on Mac.

Standalone code to reproduce the issue:

##===========##
##  Imports  ##
##===========##

import sys
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import AdamW

##===================##
##  Report versions  ##
##===================##
#
# Expected outputs:
# Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ]
# TF version is: 2.14.0-dev20230523
# Numpy version is: 1.23.2
#

print(f"Python version is: {sys.version}")
print(f"TF version is: {tf.__version__}")
print(f"Numpy version is: {np.__version__}")

##==============================##
##  Create a very simple model  ##
##==============================##
#
# Expected outputs:
# Model: "model_1"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  Layer_in (InputLayer)       [(None, 2)]               0
#
#  Layer_hidden (Dense)        (None, 10)                30
#
#  Layer_out (Dense)           (None, 2)                 22
#
# =================================================================
# Total params: 52 (208.00 Byte)
# Trainable params: 52 (208.00 Byte)
# Non-trainable params: 0 (0.00 Byte)
# _________________________________________________________________
#

x_in = Input(2, dtype=tf.float32, name="Layer_in")
x = x_in
x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu")(x)
x = Dense(2, dtype=tf.float32, name="Layer_out", activation="linear")(x)
model = Model(x_in, x)
model.summary()

##===================================================##
##  Compile model with MSE loss and AdamW optimizer  ##
##===================================================##
#
# Expected outputs:
# WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`.
# WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`.
#

model.compile(
    loss      = "mse",
    optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2)
)

##===========================##
##  Generate some fake data  ##
##===========================##
#
# Expected outputs:
# X shape is (100, 2), Y shape is (100, 2)
#

dataset_size = 100
X = np.random.normal(size=(dataset_size, 2))
X = tf.constant(X, dtype=tf.float32)
Y = np.random.normal(size=(dataset_size, 2))
Y = tf.constant(Y, dtype=tf.float32)
print(f"X shape is {X.shape}, Y shape is {Y.shape}")

##===================================##
##  Fit model to data for one epoch  ##
##===================================##
#
# Expected outputs:
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# Cell In[9], line 51
#       1 ##===================================##
#       2 ## Fit model to data for one epoch ##
#       3 ##===================================##
#    (...)
#      48 #  • mask=None
#      49 #
# ---> 51 model.fit(X, Y, epochs=1)
#
# File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
#      67 filtered_tb = _process_traceback_frames(e.__traceback__)
#      68 # To get the full stack trace, call:
#      69 # `tf.debugging.disable_traceback_filtering()`
# ---> 70 raise e.with_traceback(filtered_tb) from None
#      71 finally:
#      72 del filtered_tb
#
# File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
#      13 try:
#      14 do_return = True
# ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
#      16 except:
#      17 do_return = False
#
# AttributeError: in user code:
#
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function *
#         return step_function(self, iterator)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function **
#         outputs = model.distribute_strategy.run(run_step, args=(data,))
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step **
#         outputs = model.train_step(data)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step
#         self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
#
#     AttributeError: 'str' object has no attribute 'minimize'

model.fit(X, Y, epochs=1)
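Until the fallback is fixed, one workaround sometimes used is to pin a legacy optimizer that does exist and emulate the decay with L2 kernel regularization. This is only a sketch, not a fix for the tf-macos issue itself, and L2 regularization in the loss is not mathematically identical to AdamW's decoupled weight decay:

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.regularizers import L2

# Same toy architecture as above, with L2 penalties standing in for weight decay.
x_in = Input(2, dtype=tf.float32, name="Layer_in")
x = Dense(10, activation="relu", kernel_regularizer=L2(1e-2), name="Layer_hidden")(x_in)
x = Dense(2, activation="linear", kernel_regularizer=L2(1e-2), name="Layer_out")(x)
model = Model(x_in, x)

# tf.keras.optimizers.legacy.Adam exists, so no broken fallback is triggered.
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=1e-3))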
Post not yet marked as solved
0 Replies
819 Views
Hello, I have my largely iOS app running using Mac Catalyst, but I need to limit what Macs will be able to install it from the Mac App Store based on the GPU Family like MTLGPUFamily.mac2. Is that possible? Or I could limit it to Apple Silicon using the Designed for iPad target, but I would prefer to use Mac Catalyst instead of Designed for iPad. Is it possible to limit Mac Catalyst installs to Apple Silicon Macs? Side question: what capabilities are supported by MTLGPUFamily.mac2? I can't find it. My main interest is in CoreML inference acceleration. Thank you.
Post not yet marked as solved
0 Replies
817 Views
I am performing a grid search over a parameter grid and training the model with different combinations of hyperparameters. I am receiving the following warning:

W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Why is that, and what can I do to fix it? Thank you very much. Here is the code:

def grid_search(model_name):
    ...
    elif model_name == 'LSTM':
        def build_model(units, activation, dropout, layers):
            model = Sequential()
            model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                           return_sequences=True, input_shape=(2, 1152), recurrent_dropout=0))
            model.add(Dropout(dropout))
            for i in range(layers):
                if i != layers-1:
                    model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                   return_sequences=True, recurrent_dropout=0))
                    model.add(Dropout(dropout))
                elif i == (layers-1):
                    model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                   recurrent_dropout=0))
                    model.add(Dropout(dropout))
            model.add(Dense(units=6, kernel_initializer="normal", activation=activation))
            model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
            return model

        param_grid = {'units': [200, 300, 400],
                      'activation': ['tanh'],
                      'dropout': [0, 0.2, 0.4, 0.6],
                      'layers': [0, 5]}
        group_kfold = GroupKFold(n_splits=len(np.unique(groups_train)))
        model = KerasClassifier(model=build_model, units=param_grid['units'],
                                activation=param_grid['activation'],
                                dropout=param_grid['dropout'], layers=param_grid['layers'])
        grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=group_kfold)
        X_test, X_train, y_test, y_train = raw_dataassigner(model_name)
        (X_train, y_train) = shuffle(X_train, y_train)
        with tf.device('/cpu:0'):
            grid_result = grid_search.fit(X_train, y_train, groups=groups_train)
        print(f'Best score ({grid_search.best_score_}) for {model_name} model achieved with parameters: ',
              grid_search.best_params_)
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))

grid_search('LSTM')
Post not yet marked as solved
5 Replies
1.4k Views
Hey, if I wanted to create an app that takes screenshots from an Apple device (and any app within it) to give context to an AI so the AI can respond, and the app then parses the response and executes commands on behalf of the AI/user, how would I do so given the rule that "screenshots/captures are not allowed within other apps"? I want to stay within the bounds of the rules in place. Possibilities: AI assistant, AI pals, passive automation.
Post marked as solved
1 Reply
1.4k Views
When I run the performance test on a Core ML model, it shows predictions are 834% faster running on the Neural Engine than on the GPU. It also shows that 100% of the model can run on the Neural Engine. [Performance report screenshots: Neural Engine; GPU only.] But when I set the compute units to all:

let config = MLModelConfiguration()
config.computeUnits = .all

and profile, it shows that the Neural Engine isn't used at all. Well, other than loading the model, which takes 25 seconds when allowed to use the Neural Engine versus less than a second when not allowing the Neural Engine. The difference in speed is the difference between the app being too slow to even release versus quite reasonable performance. I have a lot of work invested in this, so I am really hoping that I can get it to run on the Neural Engine. Why isn't it actually running on the Neural Engine when it shows that it is supported and I have the compute units set to allow the Neural Engine?
Post marked as solved
1 Reply
1.2k Views
I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster / allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]
    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reductionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
}

And to run the graph I have the following in my sample view controller:

override func viewDidLoad() {
    super.viewDidLoad()

    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)
    let learningRate = graph.constant(0.001, dataType: .float32)

    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
    }

    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, nil) in
        for (key, value) in resultsDictionary {
            var output: [Float] = [0]
            value.mpsndarray().readBytes(&output, strideBytes: nil)
            print(output)
        }
    }

    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)

    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)

    // This runs inference and works
    // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
    // commandBuffer.commit()
    // commandBuffer.waitUntilCompleted()

    // This trains but does not work
    graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}

And a few other relevant variables are created at the class scope:

let graph = MPSGraph()
static let device = MTLCreateSystemDefaultDevice()!
static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?
Post marked as solved
3 Replies
2.4k Views
Hi! I've just got a MacBook Air with an M2 chip. I'm doing some research on ML and I was wondering if it is possible to use the Neural Engine as an accelerator for training. If it is possible, where can I find resources on how to enable it and how to check that it is being used? Thank you!
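As far as I know, the Neural Engine is not exposed to third-party frameworks as a training backend; on Apple silicon, accelerated training typically goes through the GPU via tensorflow-metal (or PyTorch's MPS backend), while the ANE is used for Core ML inference. A minimal sketch for checking that the Metal-backed GPU is visible to TensorFlow, assuming tensorflow-macos and tensorflow-metal are installed:

import tensorflow as tf

# Should list a GPU device when tensorflow-metal is installed.
print(tf.config.list_physical_devices("GPU"))

# Tiny op placed on the GPU as a smoke test (soft placement falls back to CPU if none is found).
with tf.device("/GPU:0"):
    a = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(a @ a).numpy())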