Machine Learning

How to monitor Neural Engine usage on M1 macs?

I'm now running Tensorflow models on my Macbook Air 2020 M1, but I can't find a way to monitor the Neural Engine 16 cores usage to fine tune my ML tasks. The Activity Monitor only reports CPU% and GPU% and I can't find any APIs available on Mach include files in the MacOSX 11.1 sdk or documentation available so I can slap something together from scratch in C. Could anyone point me in some direction as to get a hold of the API for Neural Engine usage. Any indicator I could grab would be a start. It looks like this has been omitted from all sdk documentation and general userland, I've only found a ledger_tag_neural_footprint attribute, which looks memory related, and that's it.

Posted

by

rgolive

Supported classes in the built-in Sound Classifier model

Where can I find a comprehensive list of all the classes that the built in Sound Classifier model supports?

Posted

by

jacobbmoncur

Memoji, AI

I wish there was a tool to create a Memoji from a photo using AI 📸➡️👨 It is a pity there are no tools for artists

Machine Learning

Posted

by

Lebizhor

Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

Device: MacBook Pro 16 M1 Max, 64GB running MacOS 12.0.1. I tried setting up GPU Accelerated TensorFlow on my Mac using the following steps: Setup: XCode CLI / Homebrew/ Miniforge Conda Env: Python 3.9.5 conda install -c apple tensorflow-deps python -m pip install tensorflow-macos python -m pip install tensorflow-metal brew install libjpeg conda install -y matplotlib jupyterlab In Jupyter Lab, I try to execute this code: from tensorflow.keras import layers from tensorflow.keras import models model = models.Sequential() model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.Flatten()) model.add(layers.Dense(64, activation='relu')) model.add(layers.Dense(10, activation='softmax')) model.summary() The code executes, but I get this warning, indicating no GPU Acceleration can be used as it defaults to a 0MB GPU. Error: Metal device set to: Apple M1 Max 2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>) Anyone has any idea how to fix this? I came across a bunch of posts around here related to the same issue but with no solid fix. I created a new question as I found the other questions less descriptive of the issue, and wanted to comprehensively depict it. Any fix would be of much help.

Posted

by

suprateembanerjee

Sklearn is unstable on Apple Silicon

Hi, I installed skearn successfully and ran the MINIST toy example successfully. then I started to run my project. The finning thing everything seems good at the start point (at least no ImportError occurs). but when I made some changes of my code and try to run all cells (I use jupyter lab) again, ImportError occurs..... ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file) then I have to uninstall scipy, sklearn, etc and reinstall all of them. and my code can be ran again..... Magically I hate to say, anyone knows how to permanently solve this problem? make skearn more stable?

Posted

by

dkjdjdfdskln

M1 GPU is extremely slow, how can I enable CPU to train my NNs?

Hi everyone, I found that the performance of GPU is not good as I expected (as slow as a turtle), I wanna switch from GPU to CPU. but mlcompute module cannot be found, so wired. The same code ran on colab and my computer (jupyter lab) take 156s vs 40 minutes per epoch, respectively. I only used a small dataset (a few thousands of data points), and each epoch only have 20 baches. I am so disappointing and it seems like the "powerful" GPU is a joke. I am using 12.0.1 macOS and the version of tensorflow-macos is 2.6.0 Can anyone tell me why this happens?

Posted

by

dkjdjdfdskln

Why i enabled Metal API in `encode` function but my Coreml custom layer still run on CPU

I implement a custom pytorch layer on both CPU and GPU following [Hollemans amazing blog] (https://machinethink.net/blog/coreml-custom-layers ). The cpu version works good, but when i implemented this op on GPU it cannot activate "encode" function. Always run on CPU. I have checked the coremltools.convert() options with compute_units=coremltools.ComputeUnit.CPU_AND_GPU, but it still not work. This problem also mentioned in https://stackoverflow.com/questions/51019600/why-i-enabled-metal-api-but-my-coreml-custom-layer-still-run-on-cpu and https://developer.apple.com/forums/thread/695640. Any idea on help this would be grateful. System Information mac OS: 11.6.1 Big Sur xcode: 12.5.1 coremltools: 5.1.0 test device: iphone 11

Posted

by

stx-000

[Apple M1]: I got No registered 'AddN' OpKernel for 'GPU' devices compatible with node while training my model

Hi! GPU acceleration lacks of M1 GPU support (only with this specific model), getting this message when trying to run a trained model on GPU: NotFoundError: Graph execution error: No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}} (OpKernel was found, but attributes didn't match) Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0" . Registered: device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT] device='GPU'; T in [DT_FLOAT] device='DEFAULT'; T in [DT_INT32] device='CPU'; T in [DT_UINT64] device='CPU'; T in [DT_INT64] device='CPU'; T in [DT_UINT32] device='CPU'; T in [DT_UINT16] device='CPU'; T in [DT_INT16] device='CPU'; T in [DT_UINT8] device='CPU'; T in [DT_INT8] device='CPU'; T in [DT_INT32] device='CPU'; T in [DT_HALF] device='CPU'; T in [DT_BFLOAT16] device='CPU'; T in [DT_FLOAT] device='CPU'; T in [DT_DOUBLE] device='CPU'; T in [DT_COMPLEX64] device='CPU'; T in [DT_COMPLEX128] device='CPU'; T in [DT_VARIANT] [[model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_train_function_300451]

Posted

by

sm_96

Error loading a model of CoreML

Hello, I'm new using CoreML and I'm trying to do a test app with the models that already exist. I'm having next error at the moment to classifier the image: [coreml] Failed to get the home directory when checking model path. I would like to receive your help to solve this error. Thanks.

Posted

by

Alejoortga

Export of model preview broken? CreateML and Xcode

Im on the recent version of MacOs and I recently trained a Style Transfer model using CreateML. I used the preview tab of CreateML to preview my model with a video (as well as an image), however when I press the button to export or share the result from the neural network none are exported. The modal window appears but doesnt save after the progress bar shows up for the conversion I tried converting the CoreML model file into a CoreML package, however when I tried exporting the preview it crashed and switched tabs to the package information section. I've been having this issue with all three export buttons on the model preview section of both the CreateML application and Xcode. Is this happening to anyone else? Ive also tried using the coremltools package for Python to extract a preview, however documentation for Style Transfer networks doesnt exist for loading videos with that package. The style transfer network only takes an input of images, so its unclear where a video file can be loaded.

Posted

by

trzroy

Keras with tensorflow-metal freezes during training with image augmentation

I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness) The system appears to use both GPU as well as CPU (as indicated by Activity Monitor). Also, warnings appear both in Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), CPU appears to be idle, no warnings appear, and training completes successfully. Versions: python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1 Sample code: img_augmentation = Sequential( [ layers.RandomFlip(), layers.RandomBrightness(factor=0.2), layers.RandomContrast(factor=0.2) ], name="img_augmentation", ) inputs = layers.Input(shape=(384, 384, 3)) x = img_augmentation(inputs) model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet') model.trainable = False x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output) x = tf.keras.layers.BatchNormalization()(x) top_dropout_rate = 0.2 x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x) outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x) newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species") reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001) optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9) newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy']) history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr]) During training with image augmentation, Jupyter prints the following warnings while training the first epoch: WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op. WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op. ... During training with image augmentation, Terminal keeps spamming the following warning: 2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. 2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation. ... Any suggestions?

Posted

by

Cardu6lis

Can I run CatBoost/XGBoost on my GPU(s) on my Mac?

I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times. I have a Mac with an AMD Radeon Pro 5600M and an Intel UHD Graphics 630 GPUs, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system. Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of? Thank you.

Posted

by

Yesterdays

MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'

Hi, I am training an adversarial auto encoder using PyTorch 2.0.0 on Apple M2 (Ventura 13.1), with conda 23.1.0 as manager. I encountered this error: /AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch' /Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown To my knowledge, the code broke down when running self.manual_backward(loss["g_loss"]) this block: g_opt.zero_grad() self.manual_backward(loss["g_loss"]) g_opt.step() The same code run without problems on linux distribution. Any thoughts on how to fix it are highly appreciated!

Posted

by

RayXC

MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPS graph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash: Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578. I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster/allow initialization to occur on the GPU. Here's my code for building the graph including both methods of weight initialization: func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) { let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil) let labelPlaceholder = graph.placeholder(shape: [1], name: nil) // This works for inference but not training let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)! let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil) // This works for inference and training // let weights = [Float](repeating: 1, count: 2) // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil) variables += [weightTensor] let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil) let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil) return (inputPlaceholder, labelPlaceholder, output, loss) } And to run the graph I have the following in my sample view controller: override func viewDidLoad() { super.viewDidLoad() var variables: [MPSGraphTensor] = [] let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables) let gradients = graph.gradients(of: loss, with: variables, name: nil) let learningRate = graph.constant(0.001, dataType: .float32) var updateOps: [MPSGraphOperation] = [] for (key, value) in gradients { let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil) let assign = graph.assign(key, tensor: updates, name: nil) updateOps += [assign] } let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!) let executionDesc = MPSGraphExecutionDescriptor() executionDesc.completionHandler = { (resultsDictionary, nil) in for (key, value) in resultsDictionary { var output: [Float] = [0] value.mpsndarray().readBytes(&output, strideBytes: nil) print(output) } } let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2]) let input = MPSNDArray(device: Self.device, descriptor: inputDesc) var inputArray: [Float] = [1, 2] input.writeBytes(&inputArray, strideBytes: nil) let source = MPSGraphTensorData(input) let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1])) var labelArray: [Float] = [1] labelMPSArray.writeBytes(&labelArray, strideBytes: nil) let label = MPSGraphTensorData(labelMPSArray) // This runs inference and works // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc) // // commandBuffer.commit() // commandBuffer.waitUntilCompleted() // This trains but does not work graph.encode( to: commandBuffer, feeds: [inputPlaceholder: source, labelPlaceholder: label], targetTensors: [], targetOperations: updateOps, executionDescriptor: executionDesc) commandBuffer.commit() commandBuffer.waitUntilCompleted() } And a few other relevant variables are created at the class scope: let graph = MPSGraph() static let device = MTLCreateSystemDefaultDevice()! static let commandQueue = device.makeCommandQueue()! How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Posted

by

noahmartin

ANE-Optimized Layer Norm Fails on ANE

In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S). The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU: B,C,S = 1,768,512 g,b = 1, 0 @mb.program(input_specs=[mb.TensorSpec(shape=(B,C,1,S)),]) def ln_prog(x): gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist() beta = (torch.ones((C), dtype=torch.float32) * b).tolist() return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y") However it fails when run on the Neural Engine, giving results that are scaled by an incorrect value. Should this work on the Neural Engine?

Posted

by

smpanaro

Create ML activity training problem

Hi everyone! I’m trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training. Even with 2 classes result is the same First I’d thought that there is a problem with my data but when I switched “left” label to “right” and vice versa the results were the same: only “left”-labeled data get non-zero precision and recall.

Posted

by

corle

-[_MTLCommandBuffer addCompletedHandler:]:867:

failed assertion `Completed handler provided after commit call'. how to clear this error any. when i run with cpu i am getting storage error so i tried with GPU. partial code #PositionalEncoding class PositionalEncoding(nn.Module): def init(self, d_model, max_len, dropout_prob=0.1): super(PositionalEncoding, self).init() self.dropout = nn.Dropout(p=dropout_prob) # Create positional encoding matrix pe = torch.zeros(max_len, d_model) position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1) div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)) # Pad div_term with zeros if necessary div_term_padded = torch.zeros(d_model) div_term_padded[:div_term.size(0)] = div_term pe[:, 0::2] = torch.sin(position * div_term_padded[0::2]) pe[:, 1::2] = torch.cos(position * div_term_padded[1::2]) pe = pe.unsqueeze(0).transpose(0, 1) self.register_buffer('pe', pe) def forward(self, x): x = x + self.pe[:x.size(0), :] return self.dropout(x) #transformermodel class class TransformerModel(nn.Module): def init(self, input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len): super(TransformerModel, self).init() self.device = device self.hidden_size = hidden_size self.d_model = d_model self.num_heads = num_heads #self.embedding = nn.Embedding(input_size, d_model).to(device) self.embedding = nn.Linear(input_size, d_model).to(device) self.pos_encoder = PositionalEncoding(d_model, max_len, dropout_prob).to(device) self.transformer_encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, hidden_size, dropout_prob).to(device) self.transformer_encoder = nn.TransformerEncoder(self.transformer_encoder_layer, num_layers).to(device) self.decoder = nn.Linear(d_model, output_size).to(device) self.to(device) # Ensure the model is on the correct device def forward(self, x): #x = x.long() x = x.transpose(0, 1) # Transpose the input tensor to match the expected shape for the transformer x = x.squeeze() # Remove the extra dimension from the input tensor x = self.embedding(x) # Apply the input embedding x = self.pos_encoder(x) # Add positional encoding x = self.transformer_encoder(x) # Apply the transformer encoder x = self.decoder(x[:, -1, :]) # Decode the last time step's output to get the final prediction return x #train transformer model class def train_transformer_model(train_X_scaled, train_y, input_size, d_model, hidden_size, num_layers, output_size, learning_rate, num_epochs, num_heads, dropout_prob, device, n_accumulation_steps=32): train_X_tensor = torch.from_numpy(train_X_scaled).float().to(device) train_y_tensor = torch.from_numpy(train_y).float().unsqueeze(1).to(device) # Create the dataset and DataLoader train_data = TensorDataset(train_X_tensor, train_y_tensor) train_loader = DataLoader(train_data, batch_size=8, shuffle=True) # Compute the maximum length of the input sequences max_len = train_X_tensor.size(1) # Create the model model = TransformerModel(input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len).to(device) q = 0.5 criterion = lambda y_pred, y_true: quantile_loss(q, y_true, y_pred) optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) for epoch in range(1, num_epochs + 1): model.train() print(f"Transformer inputs shape: {train_X_tensor.shape}, targets shape: {train_y_tensor.shape}") for epoch in range(1, num_epochs + 1): model.train() print(f"transformer Epoch {epoch}/{num_epochs}") for i, (batch_X, batch_y) in enumerate(train_loader): batch_X = batch_X.to(device) print("transformer batch_X shape:", batch_X.shape) batch_y = batch_y.to(device) print("transformer batch_Y shape:", batch_y.shape) optimizer.zero_grad() batch_X = batch_X.transpose(0, 1) train_pred = model(batch_X.squeeze(0)).to(device) print("train_pred=",train_pred) loss = criterion(train_pred, batch_y).to(device) loss.backward() # Gradient accumulation if (i + 1) % n_accumulation_steps == 0: optimizer.step() optimizer.zero_grad() print(f"transformer Epoch {epoch}/{num_epochs}, Step {i+1}/{len(train_loader)}, Loss: {loss.item():.6f}") return model

Posted

by

sugumar0107

Divergent output between PyTorch and converted CoreML Huggingface Bert model

This issue has already been raised a few times in the coremltools repo (here, here, and here). I'm reposting here because this may be an issue in CoreML itself. In short, converting Huggingface's Bert implementation from PyTorch to CoreML results in significantly different model outputs. This test was originally posted in one of the linked issues: import numpy as np import torch from transformers import AutoTokenizer, AutoModel import coremltools as ct MODEL_NAME = "bert-base-uncased" sentences = ["This is a test."] tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) model = AutoModel.from_pretrained(MODEL_NAME, torchscript=True).eval() encoded_input = tokenizer(sentences, return_tensors='pt') traced_model = torch.jit.trace(model, tuple(encoded_input.values())) scripted_model = torch.jit.script(traced_model) model = ct.convert(scripted_model, source="pytorch", inputs=[ct.TensorType(name="input_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32), ct.TensorType(name="token_type_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32), ct.TensorType(name="attention_mask", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32)], convert_to="mlprogram", compute_units=ct.ComputeUnit.CPU_ONLY) with torch.no_grad(): pt_out = scripted_model(**encoded_input) cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()} pred_coreml = model.predict(cml_inputs) np.testing.assert_allclose(pt_out[0].detach().numpy(), pred_coreml["hidden_states"], atol=1e-5, rtol=1e-4) Running this shows that the model outputs are highly divergent: Max absolute difference: 7.901174 Max relative difference: 3424.6594 By contrast, running the same test with Huggingface's Distilbert implementation (distilbert-base-uncased) shows a much smaller difference in output: Max absolute difference: 0.00523943 Max relative difference: 45.603153 Again, I'm not totally sure that this is an issue in CoreML, but it would be great to be able to run Bert based models with CoreML!

Posted

by

ggordonhall

MacBook Pro M2 Max: 32GB vs 64GB RAM for Machine Learning and Longevity

Hi everyone, I'm a Machine Learning Engineer, and I'm planning to buy the MacBook Pro M2 Max with a 38-core GPU variant. I'm uncertain about whether to choose the 32GB RAM or 64GB RAM option. Based on my research and use case, it seems that 32GB should be sufficient for most tasks, including the 4K video rendering I occasionally do. However, I'm concerned about the longevity of the device, as I'd like to keep the MacBook up-to-date for at least five years. Additionally, considering the 38-core GPU, I wonder if 32GB of unified memory might be insufficient, particularly when I need to train Machine Learning models or run docker or even kubernetes cluster. I don't have any budget constraints, as the additional $400 cost isn't an issue, but I want to make a wise decision. I would appreciate any advice on this matter. Thanks in advance!

Posted

by

Aditya-ai

CoreML model only runs on CPU

Hey guys, I converted a T5-base (encoder/decoder) model to a CoreML model using https://github.com/huggingface/exporters (which are using coremltools under the hood). When creating a performance report for the decoder model within XCode it shows that all compute units are mapped to the CPU. This is also the experience I have when profiling the model (GPU and ANE are not used). I was under the impression that CoreML would divide up the layers and run those that can run on the GPU / ANE, but maybe I misunderstood. Is there anything I can do to get this to not run on the CPU exclusively?

Posted

by

seboslaw

Posts under Machine Learning tag