ML Compute

How to fully apply parallel computing on the CPU and GPU of the M1 Max

My project is based on Python 3.8 and 3.9 and contains some C and C++ source. How can I do parallel computing on the CPU and GPU of the M1 Max? I bought an M1 Max Mac for its strong GPU to do quantitative finance, where speed is extremely important. Unfortunately, CUDA is not compatible with the Mac. Please show me how to do it, thanks.

1. Can Accelerate (for the CPU) and Metal (for the GPU) speed up any source if I build it like this?

Step 1: Download the source from GitHub.
Step 2: Create a file named "site.cfg" in the source directory and add this content:
[accelerate]
libraries = Metal, Accelerate, vecLib
Step 3: In Terminal, run: NPY_LAPACK_ORDER=accelerate python3 setup.py build
Step 4: Run pip3 install . or python3 setup.py install (I am not sure which method to apply).

2. How compatible is this method? I need to speed up NumPy, pandas, and even an open-source project such as https://github.com/microsoft/qlib.

3. Just show me the code.

4. When compiling the C and C++ source, a lot of errors were reported. Which gcc and g++ should I choose? The default gcc installed by brew is 4.2.1, which does not work. I even tried downloading gcc from the official ARM website, and it still does not work. Please give me a hint.

Thanks so much. This is urgent.
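The site.cfg step described in the post can be sketched in Python. This only illustrates the file format, using the standard library's configparser; the section name and library list are copied from the post, the helper names are hypothetical, and whether a given package's build actually honors these entries (the Metal entry in particular) is not confirmed here.

```python
import configparser
import io


def build_site_cfg(libraries):
    """Return site.cfg text with an [accelerate] section listing `libraries`."""
    config = configparser.ConfigParser()
    config["accelerate"] = {"libraries": ", ".join(libraries)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def read_libraries(site_cfg_text):
    """Parse site.cfg text and return the library names from [accelerate]."""
    config = configparser.ConfigParser()
    config.read_string(site_cfg_text)
    raw = config["accelerate"]["libraries"]
    return [name.strip() for name in raw.split(",")]


# Library list taken from the post; not verified against any build system.
site_cfg_text = build_site_cfg(["Metal", "Accelerate", "vecLib"])
print(site_cfg_text)
```

Reading the file back with read_libraries is a quick way to catch a misspelled library name before starting a long build.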

Posted by jefftang

How to use GPU in Tensorflow?

I'm using my 2020 Mac mini with the M1 chip, and this is my first time using it for convolutional neural network training. I installed Python (version 3.8.12) using miniforge3 and TensorFlow following this instruction, but I still face a GPU problem when training a 3D U-Net. Here's part of my code; I'm hoping to receive some suggestions to fix this.

import tensorflow as tf
from tensorflow import keras
import json
import numpy as np
import pandas as pd
import nibabel as nib
import matplotlib.pyplot as plt
from tensorflow.keras import backend as K
from tensorflow.python.client import device_lib  # needed by get_available_devices()

# Check available devices
def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

print(get_available_devices())

Metal device set to: Apple M1
['/device:CPU:0', '/device:GPU:0']
2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
#tf.device('/device:GPU:0')  # Tried this line; it doesn't work
#tf.debugging.set_log_device_placement(True)  # Tried this line; it doesn't work
patch_pred = model.predict(X_norm_with_batch_dimension)

InvalidArgumentError: 2 root error(s) found.
(0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
[[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
[[model/conv3d/Conv3D/_4]]
(1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
[[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
0 successful operations.
0 derived errors ignored.

The code runs on Google Colab but fails locally on the Mac mini in a Jupyter notebook. The NHWC tensor format error might indicate that I'm executing the code on the CPU instead of the GPU. Is there any way to make TensorFlow use the GPU to train the network?
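The error text says the CPU Conv3D kernel only accepts a channels-last layout, which suggests the model or its input uses a channels-first layout. A minimal sketch, assuming that is the case (the post does not show the model definition): the hypothetical helper below computes the axis permutation that moves a channels-first 3D batch (NCDHW) to channels-last (NDHWC). The example shape is made up for illustration.

```python
def layout_permutation(source, target):
    """Return the axis order that rearranges a `source` layout into `target`.

    Layouts are strings with one letter per axis, e.g. "NCDHW". Output axis i
    is taken from input axis result[i], the convention used by transpose.
    """
    if len(set(source)) != len(source) or sorted(source) != sorted(target):
        raise ValueError(f"incompatible layouts: {source!r} and {target!r}")
    return tuple(source.index(axis) for axis in target)


def permute_shape(shape, perm):
    """Apply an axis permutation to a shape tuple."""
    return tuple(shape[i] for i in perm)


# Hypothetical channels-first batch: 1 sample, 4 channels, 160x160x16 volume.
perm = layout_permutation("NCDHW", "NDHWC")
print(perm)
print(permute_shape((1, 4, 160, 160, 16), perm))
```

The resulting tuple could be passed as the axes argument of np.transpose (or as perm to tf.transpose) on the input array; the model itself would also need to be built with a channels-last data format for this to help.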

Posted by MW_Shay

Tensorflow on M1 Macbook Pro, error when model fit executes

It doesn't matter whether I install miniforge or mamba, directly or through brew: when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error. Is there any workaround for this? I'd appreciate any help, thanks!

2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # tf.debugging.disable_traceback_filtering()
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51     ctx.ensure_initialized()
---> 52     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                         inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55     if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
    self.io_loop.start()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
    handle._run()
  ...
  File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
    history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
    tmp_logs = self.train_function(iterator)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
    return step_function(self, iterator)
  ......
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
    outputs = model.train_step(data)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
    self.apply_gradients(grads_and_vars)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
    return super().apply_gradients(grads_and_vars, name=name)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
    iteration = self._internal_apply_gradients(grads_and_vars)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
    return tf.__internal__.distribute.interim.maybe_merge_call(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
    distribution.extended.update(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
    return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x28edf1f90
[[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Posted by ppobar

Getting ValueError: Categorical Cross Entropy loss layer input (Identity) must be a softmax layer output.

I am working through the neural network classifier example in the updatable -> neural network section of coremltools.readme.io (https://coremltools.readme.io/docs/updatable-neural-network-classifier-on-mnist-dataset). I am using the same code, but I get an error saying that coremltools.converters.keras.convert does not exist. I know this may be a coremltools version issue; right now I am using coremltools version 6.2. I converted the model to an mlmodel using just .convert, and the conversion succeeded. However, I then hit an error in the make_updatable function saying that the loss layer input must be a softmax output. From the coremltools API reference, I found that this happens because the layer is softmaxND when it should be softmax. The problem is that when I convert the Keras sequential model to a Core ML model, the layer names and types change, and softmax becomes softmaxND. Has anyone faced this issue?

If I execute builder.inspect_layers(last=4), I get this output:

[Id: 32], Name: sequential/dense_1/Softmax (Type: softmaxND)
Updatable: False
Input blobs: ['sequential/dense_1/MatMul']
Output blobs: ['Identity']
[Id: 31], Name: sequential/dense_1/MatMul (Type: batchedMatmul)
Updatable: False
Input blobs: ['sequential/dense/Relu']
Output blobs: ['sequential/dense_1/MatMul']
[Id: 30], Name: sequential/dense/Relu (Type: activation)
Updatable: False
Input blobs: ['sequential/dense/MatMul']
Output blobs: ['sequential/dense/Relu']

In the make_updatable function, when I execute builder.set_categorical_cross_entropy_loss(name='lossLayer', input='Identity'), I get this error:

ValueError: Categorical Cross Entropy loss layer input (Identity) must be a softmax layer output.

Posted by anaamrasool

Train Tensorflow models using Neural Engine on M2 chip

Hi! I've just got a MacBook Air with an M2 chip. I'm doing some research on ML, and I was wondering whether it is possible to use the Neural Engine as an accelerator for training. If it is possible, where can I find resources on how to enable it and how to check that it is being used? Thank you!

Posted by gn48

Kernel_Task using all my CPU when using extended monitor at random times

Hi guys, I am using a 16-inch 2019 MacBook Pro with an Intel Core i7. When I use an external monitor, my CPU usage spikes and kernel_task uses more than 700% CPU. I am using the Apple-manufactured HDMI extender to connect to the monitor. As soon as I disconnect the monitor, everything works fine. I am running Ventura 13.3.1.

Posted by crazybiker_AJ

MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I hit a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster and would allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]
    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
}

And to run the graph, I have the following in my sample view controller:

override func viewDidLoad() {
    super.viewDidLoad()

    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)
    let learningRate = graph.constant(0.001, dataType: .float32)
    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
    }

    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, _) in
        for (key, value) in resultsDictionary {
            var output: [Float] = [0]
            value.mpsndarray().readBytes(&output, strideBytes: nil)
            print(output)
        }
    }

    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)

    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)

    // This runs inference and works
    // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
    //
    // commandBuffer.commit()
    // commandBuffer.waitUntilCompleted()

    // This trains but does not work
    graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}

A few other relevant variables are created at class scope:

let graph = MPSGraph()
static let device = MTLCreateSystemDefaultDevice()!
static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Posted by noahmartin

Confusion Matrix in CreateML

Hi, can someone tell me whether it's possible, and if so how, to see the confusion matrix for a Create ML model? Thank you!

I am in the process of developing a matrix-vector multiplication kernel. While conducting performance evaluations, I've noticed that on M1/M1 Pro/M1 Max, the kernel demonstrates an impressive memory bandwidth utilization of around 90%. However, when executed on the M1 Ultra/M2 Ultra, this figure drops to approximately 65%. My suspicion is that this discrepancy is attributed to the dual-die architecture of the M1 Ultra/M2 Ultra. It's plausible that the necessary data might be stored within the L2 cache of the alternate die. Could you kindly provide any insights or recommendations for mitigating the occurrence of on-die L2 cache misses on the Ultra chips? Additionally, I would greatly appreciate any general advice aimed at enhancing memory load speeds on these particular chips.
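The post does not say how the utilization figures were computed. A minimal sketch of one common approach, assuming float32 data and that the matrix, input vector, and output vector each cross the memory bus once: divide the bytes moved by the kernel time, then divide by the chip's peak bandwidth. The function names, matrix size, and timing below are hypothetical; 400 GB/s is Apple's published peak figure for the M1 Max.

```python
def matvec_bytes_moved(rows, cols, bytes_per_element=4):
    """Bytes for one y = A @ x pass: read A and x once, write y once."""
    return bytes_per_element * (rows * cols + cols + rows)


def bandwidth_utilization(rows, cols, elapsed_seconds, peak_gb_per_s,
                          bytes_per_element=4):
    """Fraction of peak memory bandwidth achieved by one matvec pass."""
    bytes_moved = matvec_bytes_moved(rows, cols, bytes_per_element)
    achieved_gb_per_s = bytes_moved / elapsed_seconds / 1e9
    return achieved_gb_per_s / peak_gb_per_s


# Hypothetical measurement: an 8192 x 8192 float32 matrix in 0.75 ms.
rows, cols = 8192, 8192
utilization = bandwidth_utilization(rows, cols, elapsed_seconds=0.00075,
                                    peak_gb_per_s=400.0)
print(f"utilization: {utilization:.1%}")
```

Computing the figure the same way on both chip families, with each chip's own peak bandwidth, makes it easier to tell whether the drop on the Ultra parts is real or an artifact of the measurement.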

Posted by lshzh

Can Xcode simulate Core ML for iOS 17?

I have exported a quantized model with ct.convert using minimum_deployment_target=ct.target.iOS17. Can I run it without an iPhone?

Posted by feyman1999

Tensorflow Metal Malfunctioning Completely

I am just starting to learn neural networks. If I run my code and try to fit a simple trigonometric function, the model produces a good-looking fit. If I pip install tensorflow-metal and run the same code, I get a straight line that does not resemble the non-linear function at all. If I uninstall tensorflow-metal, everything works again, which suggests that something is wrong with tensorflow-metal. Any help would be appreciated, since I would like to use Metal acceleration for the next steps in my project. Thank you.

Posted by Yash11
