Machine Learning


Create intelligent features and enable new experiences for your apps by leveraging powerful on-device machine learning.

Posts under Machine Learning tag

81 Posts
Post not yet marked as solved
5 Replies
5.8k Views
I'm running TensorFlow models on my MacBook Air (2020, M1), but I can't find a way to monitor usage of the 16 Neural Engine cores to fine-tune my ML tasks. Activity Monitor only reports CPU% and GPU%, and I can't find any APIs in the Mach include files of the macOS 11.1 SDK, or any documentation, that would let me slap something together from scratch in C. Could anyone point me in the right direction to get hold of an API for Neural Engine usage? Any indicator I could grab would be a start. It looks like this has been omitted from all SDK documentation and general userland; the only thing I've found is a ledger_tag_neural_footprint attribute, which looks memory-related, and that's it.
Post not yet marked as solved
1 Reply
928 Views
I wish there were a tool to create a Memoji from a photo using AI 📸➡️👨 It's a pity there are no such tools for artists.
Post not yet marked as solved
41 Replies
29k Views
Device: MacBook Pro 16 M1 Max, 64GB, running macOS 12.0.1. I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps:

Setup: Xcode CLI / Homebrew / Miniforge

Conda env: Python 3.9.5

conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
brew install libjpeg
conda install -y matplotlib jupyterlab

In Jupyter Lab, I try to execute this code:

from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

The code executes, but I get this output, which seems to indicate that no GPU acceleration can be used as it defaults to a 0MB GPU:

Metal device set to: Apple M1 Max
2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across a bunch of posts here about the same issue, but with no solid fix. I created a new question because I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be much appreciated.
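As a quick sanity check (my own minimal sketch, not part of the original setup), you can ask TensorFlow which devices it actually registered; with tensorflow-metal installed correctly the Metal GPU should appear here, and the 0 MB figure in the log may just be how the PluggableDevice reports memory rather than a sign that acceleration is off:

import tensorflow as tf

# List the devices TensorFlow has registered; a working tensorflow-metal
# install should report one GPU device alongside the CPU.
print(tf.config.list_physical_devices())
print("GPUs:", tf.config.list_physical_devices('GPU'))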
Post not yet marked as solved
6 Replies
6.3k Views
Hi, I installed sklearn successfully and ran the MNIST toy example without problems. Then I started to run my own project. The funny thing is that everything seems fine at the start (at least no ImportError occurs), but when I make some changes to my code and try to run all cells again (I use Jupyter Lab), an ImportError occurs:

ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
  Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
  Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)

Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code runs again... magically, I hate to say. Does anyone know how to permanently solve this problem and make sklearn more stable?
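A small diagnostic sketch (my addition, not something from the failing run): checking which BLAS/LAPACK build NumPy and SciPy were actually linked against can show whether the environment is mixing packages from different channels, which is a common cause of a missing liblapack.3.dylib:

import numpy as np
import scipy

# Print the versions and the BLAS/LAPACK configuration each package was built with;
# mismatched builds (e.g. pip wheels mixed in with conda packages) often explain
# dlopen failures like the one above.
print(np.__version__, scipy.__version__)
np.show_config()
scipy.show_config()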
Post not yet marked as solved
9 Replies
9.9k Views
Hi everyone, I found that GPU performance is not as good as I expected (as slow as a turtle), so I want to switch from GPU to CPU, but the mlcompute module cannot be found, which is so weird. The same code takes 156 s per epoch on Colab versus 40 minutes per epoch on my computer (Jupyter Lab). I only used a small dataset (a few thousand data points), and each epoch has only 20 batches. I am so disappointed; it seems like the "powerful" GPU is a joke. I am using macOS 12.0.1 and tensorflow-macos 2.6.0. Can anyone tell me why this happens?
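As far as I can tell, the old tensorflow.python.compiler.mlcompute module only existed in the pre-release tensorflow_macos fork and is gone now that tensorflow-macos uses the tensorflow-metal PluggableDevice. A minimal sketch of how I force CPU execution instead (my own workaround, not an official recommendation) is simply to hide the GPU from TensorFlow before building the model:

import tensorflow as tf

# Hiding the GPU makes TensorFlow place every op on the CPU for this process.
# This must run before any op has touched the GPU.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())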
Post not yet marked as solved
1 Reply
792 Views
I implemented a custom PyTorch layer on both CPU and GPU following Hollemans' amazing blog (https://machinethink.net/blog/coreml-custom-layers). The CPU version works fine, but when I implemented the op on the GPU, the "encode" function is never activated; it always runs on the CPU. I have checked the coremltools.convert() options with compute_units=coremltools.ComputeUnit.CPU_AND_GPU, but it still does not work. This problem is also mentioned in https://stackoverflow.com/questions/51019600/why-i-enabled-metal-api-but-my-coreml-custom-layer-still-run-on-cpu and https://developer.apple.com/forums/thread/695640. Any help on this would be appreciated.

System information:
macOS: 11.6.1 Big Sur
Xcode: 12.5.1
coremltools: 5.1.0
Test device: iPhone 11
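For reference, a minimal sketch of the conversion call I'm describing (the traced model and input shape here are placeholders, and this leaves out the custom-op registration that the blog covers):

import coremltools as ct

# traced_model: a torch.jit.trace()'d module (placeholder for my real model).
# compute_units only declares which units Core ML may use; it doesn't by itself
# force the custom layer's Metal encode() path to be chosen at runtime.
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input", shape=(1, 3, 256, 256))],
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
)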
Post not yet marked as solved
4 Replies
1.8k Views
Hi! GPU acceleration fails with this specific model because one of its ops apparently has no M1 GPU kernel; I get this message when trying to run a trained model on the GPU:

NotFoundError: Graph execution error:
No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}} (OpKernel was found, but attributes didn't match)
Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"
Registered:
  device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT]
  device='GPU'; T in [DT_FLOAT]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_VARIANT]
 [[model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_train_function_300451]
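A small sketch of how I've been checking where ops end up (my own debugging step, not part of the original report): turning on device placement logging shows which nodes TensorFlow assigns to the Metal GPU and which fall back to the CPU.

import tensorflow as tf

# Log the device chosen for every op; useful for spotting which node
# (here apparently an int64 AddN) has no GPU kernel registered.
tf.debugging.set_log_device_placement(True)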
Post not yet marked as solved
12 Replies
4.0k Views
Hello, I'm new to Core ML and I'm building a test app with the models that already exist. When I try to classify an image, I get the following error: [coreml] Failed to get the home directory when checking model path. I would appreciate your help in solving this error. Thanks.
Post not yet marked as solved
4 Replies
2.1k Views
I'm on the most recent version of macOS, and I recently trained a Style Transfer model using Create ML. I used the preview tab of Create ML to preview my model with a video (as well as an image); however, when I press the button to export or share the result from the neural network, nothing is exported. The modal window appears but doesn't save after the progress bar for the conversion shows up. I tried converting the Core ML model file into a Core ML package, but when I tried exporting the preview it crashed and switched tabs to the package information section. I've been having this issue with all three export buttons on the model preview section of both the Create ML application and Xcode. Is this happening to anyone else? I've also tried using the coremltools package for Python to extract a preview; however, documentation for Style Transfer networks doesn't cover loading videos with that package. The style transfer network only takes images as input, so it's unclear where a video file could be loaded.
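Since the model itself only takes images, the workaround I've been sketching (entirely my own assumption; the input and output names below depend on the exported model, so check model.get_spec() for the real ones) is to run the .mlmodel frame by frame with coremltools and reassemble the frames afterwards:

import coremltools as ct
from PIL import Image

# Load the exported style transfer model; "image" and "stylizedImage" are guesses
# for the input/output feature names of my particular export.
model = ct.models.MLModel("StyleTransfer.mlmodel")

frame = Image.open("frame_0001.png").resize((512, 512))
result = model.predict({"image": frame})
result["stylizedImage"].save("stylized_0001.png")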
Post not yet marked as solved
3 Replies
1.5k Views
I am trying to train an image classification network in Keras with tensorflow-metal. The training freezes after the first 2-3 epochs if image augmentation layers are used (RandomFlip, RandomContrast, RandomBrightness). The system appears to use both the GPU and the CPU (as indicated by Activity Monitor), and warnings appear in both Jupyter and Terminal (see below). When the image augmentation layers are removed (i.e. we only rebuild the head and feed images from disk), the CPU appears to be idle, no warnings appear, and training completes successfully.

Versions: Python 3.8, tensorflow-macos 2.11.0, tensorflow-metal 0.7.1

Sample code:

import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential

img_augmentation = Sequential(
    [
        layers.RandomFlip(),
        layers.RandomBrightness(factor=0.2),
        layers.RandomContrast(factor=0.2)
    ],
    name="img_augmentation",
)
inputs = layers.Input(shape=(384, 384, 3))
x = img_augmentation(inputs)
model = tf.keras.applications.EfficientNetV2S(include_top=False, input_tensor=x, weights='imagenet')
model.trainable = False
x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = tf.keras.layers.BatchNormalization()(x)
top_dropout_rate = 0.2
x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = tf.keras.layers.Dense(179, activation="softmax", name="pred")(x)
newModel = Model(inputs=model.input, outputs=outputs, name="EfficientNet_DF20M_species")
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.9, patience=2, verbose=1, min_lr=0.000001)
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, momentum=0.9)
newModel.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
history = newModel.fit(x=train_ds, validation_data=val_ds, epochs=30, verbose=2, callbacks=[reduce_lr])

During training with image augmentation, Jupyter prints the following warnings while training the first epoch:

WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
...

During training with image augmentation, Terminal keeps spamming the following message:

2023-02-21 23:13:38.958633: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.958920: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959071: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959115: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
2023-02-21 23:13:38.959359: I metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
...

Any suggestions?
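One workaround I've been experimenting with (my own sketch, not a confirmed fix) is to take the random-augmentation layers out of the model and apply them in the tf.data pipeline instead, pinned to the CPU, so that only the deterministic EfficientNet part goes through the Metal plugin. The model would then be built directly on the plain inputs, without the img_augmentation block.

import tensorflow as tf
from tensorflow.keras import layers, Sequential

img_augmentation = Sequential(
    [layers.RandomFlip(), layers.RandomBrightness(0.2), layers.RandomContrast(0.2)],
    name="img_augmentation",
)

def augment(images, labels):
    # Force the random ops onto the CPU so they never hit the Metal stateless-random kernels.
    with tf.device("/CPU:0"):
        return img_augmentation(images, training=True), labels

# train_ds is the same batched dataset passed to newModel.fit(...) above.
train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)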
Post not yet marked as solved
1 Reply
1.8k Views
I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times. I have a Mac with an AMD Radeon Pro 5600M and an Intel UHD Graphics 630 GPUs, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system. Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of? Thank you.
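For context, here is a minimal sketch of how GPU training is normally requested in these libraries (my understanding is that these switches target CUDA builds, so whether they do anything at all with AMD/Intel GPUs on macOS is exactly my question):

from xgboost import XGBClassifier
from catboost import CatBoostClassifier

# XGBoost: the 'gpu_hist' tree method is its GPU training path.
xgb_model = XGBClassifier(tree_method="gpu_hist")

# CatBoost: task_type='GPU' selects GPU training; devices picks which card.
cb_model = CatBoostClassifier(task_type="GPU", devices="0")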
Post not yet marked as solved
2 Replies
1.2k Views
Hi, I am training an adversarial autoencoder using PyTorch 2.0.0 on an Apple M2 (Ventura 13.1), with conda 23.1.0 as the package manager. I encountered this error:

/AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3967: failed assertion `destination kernel width and filter kernel width mismatch'
/Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

To my knowledge, the code breaks down when running self.manual_backward(loss["g_loss"]) in this block:

g_opt.zero_grad()
self.manual_backward(loss["g_loss"])
g_opt.step()

The same code runs without problems on a Linux distribution. Any thoughts on how to fix it are highly appreciated!
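A minimal sketch of the workaround I've been testing (my own assumption, not a confirmed fix for this assertion): enabling the CPU fallback for operations the MPS backend can't handle, and double-checking that MPS is really the device in use.

import os
# Must be set before torch is imported; unsupported MPS ops then fall back to the CPU.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("Using device:", device)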
Post marked as solved
1 Reply
1.2k Views
I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I hit a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster / allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]

    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reductionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
}

And to run the graph I have the following in my sample view controller:

override func viewDidLoad() {
    super.viewDidLoad()

    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)

    let learningRate = graph.constant(0.001, dataType: .float32)
    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
    }

    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, nil) in
        for (key, value) in resultsDictionary {
            var output: [Float] = [0]
            value.mpsndarray().readBytes(&output, strideBytes: nil)
            print(output)
        }
    }

    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)

    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)

    // This runs inference and works
    // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
    // commandBuffer.commit()
    // commandBuffer.waitUntilCompleted()

    // This trains but does not work
    graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}

And a few other relevant variables are created at the class scope:

let graph = MPSGraph()
static let device = MTLCreateSystemDefaultDevice()!
static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?
Post not yet marked as solved
2 Replies
702 Views
In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B, C, 1, S). The coremltools documentation makes it sound like the layer_norm MIL op would support this natively. In fact, the following code works on CPU:

import torch
from coremltools.converters.mil import Builder as mb

B, C, S = 1, 768, 512
g, b = 1, 0

@mb.program(input_specs=[mb.TensorSpec(shape=(B, C, 1, S)),])
def ln_prog(x):
    gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist()
    beta = (torch.ones((C,), dtype=torch.float32) * b).tolist()
    return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y")

However, it fails when run on the Neural Engine, giving results that are scaled by an incorrect value. Should this work on the Neural Engine?
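For completeness, this is roughly how I've been comparing compute units (a sketch under my own assumptions; the input name "x" comes from the program argument above and the output name "y" from the layer_norm var, so adjust if your converted model reports different names):

import coremltools as ct
import numpy as np

# Convert the same MIL program twice: once restricted to the CPU and once allowed
# to use all compute units (which is where the Neural Engine can be picked).
model_cpu = ct.convert(ln_prog, convert_to="mlprogram", compute_units=ct.ComputeUnit.CPU_ONLY)
model_all = ct.convert(ln_prog, convert_to="mlprogram", compute_units=ct.ComputeUnit.ALL)

x = np.random.rand(1, 768, 1, 512).astype(np.float32)
out_cpu = model_cpu.predict({"x": x})["y"]
out_all = model_all.predict({"x": x})["y"]
print("max abs difference:", np.max(np.abs(out_cpu - out_all)))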
Post not yet marked as solved
1 Reply
1k Views
Hi everyone! I'm trying to train an activity classification model with 3 classes. The problem is that only one class has precision and recall > 0 after training. Even with 2 classes the result is the same. At first I thought there was a problem with my data, but when I swapped the "left" label with "right" and vice versa, the results were the same: only "left"-labeled data gets non-zero precision and recall.
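For what it's worth, here is the kind of quick check I ran on the data (a hypothetical sketch; the CSV path and column name are placeholders for however your labels are stored):

from collections import Counter
import csv

# Count how many samples carry each activity label, to rule out a class imbalance
# or a labeling mix-up before blaming the classifier itself.
with open("activity_labels.csv") as f:
    labels = [row["label"] for row in csv.DictReader(f)]
print(Counter(labels))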
Post not yet marked as solved
1 Reply
1.1k Views
I'm hitting: failed assertion `Completed handler provided after commit call'. How do I clear this error? When I run on the CPU I get a storage error, so I tried the GPU. Partial code:

import math
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# PositionalEncoding
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len, dropout_prob=0.1):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout_prob)
        # Create positional encoding matrix
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        # Pad div_term with zeros if necessary
        div_term_padded = torch.zeros(d_model)
        div_term_padded[:div_term.size(0)] = div_term
        pe[:, 0::2] = torch.sin(position * div_term_padded[0::2])
        pe[:, 1::2] = torch.cos(position * div_term_padded[1::2])
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)

# TransformerModel class
class TransformerModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len):
        super(TransformerModel, self).__init__()
        self.device = device
        self.hidden_size = hidden_size
        self.d_model = d_model
        self.num_heads = num_heads
        #self.embedding = nn.Embedding(input_size, d_model).to(device)
        self.embedding = nn.Linear(input_size, d_model).to(device)
        self.pos_encoder = PositionalEncoding(d_model, max_len, dropout_prob).to(device)
        self.transformer_encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, hidden_size, dropout_prob).to(device)
        self.transformer_encoder = nn.TransformerEncoder(self.transformer_encoder_layer, num_layers).to(device)
        self.decoder = nn.Linear(d_model, output_size).to(device)
        self.to(device)  # Ensure the model is on the correct device

    def forward(self, x):
        #x = x.long()
        x = x.transpose(0, 1)  # Transpose the input tensor to match the expected shape for the transformer
        x = x.squeeze()  # Remove the extra dimension from the input tensor
        x = self.embedding(x)  # Apply the input embedding
        x = self.pos_encoder(x)  # Add positional encoding
        x = self.transformer_encoder(x)  # Apply the transformer encoder
        x = self.decoder(x[:, -1, :])  # Decode the last time step's output to get the final prediction
        return x

# Train transformer model
def train_transformer_model(train_X_scaled, train_y, input_size, d_model, hidden_size, num_layers, output_size, learning_rate, num_epochs, num_heads, dropout_prob, device, n_accumulation_steps=32):
    train_X_tensor = torch.from_numpy(train_X_scaled).float().to(device)
    train_y_tensor = torch.from_numpy(train_y).float().unsqueeze(1).to(device)

    # Create the dataset and DataLoader
    train_data = TensorDataset(train_X_tensor, train_y_tensor)
    train_loader = DataLoader(train_data, batch_size=8, shuffle=True)

    # Compute the maximum length of the input sequences
    max_len = train_X_tensor.size(1)

    # Create the model
    model = TransformerModel(input_size, hidden_size, num_layers, d_model, num_heads, dropout_prob, output_size, device, max_len).to(device)

    q = 0.5
    criterion = lambda y_pred, y_true: quantile_loss(q, y_true, y_pred)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"Transformer inputs shape: {train_X_tensor.shape}, targets shape: {train_y_tensor.shape}")

    for epoch in range(1, num_epochs + 1):
        model.train()
        print(f"transformer Epoch {epoch}/{num_epochs}")
        for i, (batch_X, batch_y) in enumerate(train_loader):
            batch_X = batch_X.to(device)
            print("transformer batch_X shape:", batch_X.shape)
            batch_y = batch_y.to(device)
            print("transformer batch_Y shape:", batch_y.shape)
            optimizer.zero_grad()
            batch_X = batch_X.transpose(0, 1)
            train_pred = model(batch_X.squeeze(0)).to(device)
            print("train_pred=", train_pred)
            loss = criterion(train_pred, batch_y).to(device)
            loss.backward()
            # Gradient accumulation
            if (i + 1) % n_accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
            print(f"transformer Epoch {epoch}/{num_epochs}, Step {i+1}/{len(train_loader)}, Loss: {loss.item():.6f}")

    return model
Post not yet marked as solved
0 Replies
547 Views
This issue has already been raised a few times in the coremltools repo (here, here, and here). I'm reposting here because this may be an issue in Core ML itself. In short, converting Huggingface's Bert implementation from PyTorch to Core ML results in significantly different model outputs. This test was originally posted in one of the linked issues:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
import coremltools as ct

MODEL_NAME = "bert-base-uncased"
sentences = ["This is a test."]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, torchscript=True).eval()

encoded_input = tokenizer(sentences, return_tensors='pt')
traced_model = torch.jit.trace(model, tuple(encoded_input.values()))
scripted_model = torch.jit.script(traced_model)

model = ct.convert(scripted_model,
                   source="pytorch",
                   inputs=[ct.TensorType(name="input_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
                           ct.TensorType(name="token_type_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
                           ct.TensorType(name="attention_mask", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32)],
                   convert_to="mlprogram",
                   compute_units=ct.ComputeUnit.CPU_ONLY)

with torch.no_grad():
    pt_out = scripted_model(**encoded_input)

cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = model.predict(cml_inputs)

np.testing.assert_allclose(pt_out[0].detach().numpy(), pred_coreml["hidden_states"], atol=1e-5, rtol=1e-4)

Running this shows that the model outputs are highly divergent:

Max absolute difference: 7.901174
Max relative difference: 3424.6594

By contrast, running the same test with Huggingface's Distilbert implementation (distilbert-base-uncased) shows a much smaller difference in output:

Max absolute difference: 0.00523943
Max relative difference: 45.603153

Again, I'm not totally sure that this is an issue in Core ML, but it would be great to be able to run Bert-based models with Core ML!
Post not yet marked as solved
2 Replies
4.1k Views
Hi everyone, I'm a Machine Learning Engineer, and I'm planning to buy the MacBook Pro M2 Max with the 38-core GPU variant. I'm uncertain whether to choose the 32GB RAM or 64GB RAM option. Based on my research and use case, it seems that 32GB should be sufficient for most tasks, including the 4K video rendering I occasionally do. However, I'm concerned about the longevity of the device, as I'd like to keep the MacBook up to date for at least five years. Additionally, considering the 38-core GPU, I wonder if 32GB of unified memory might be insufficient, particularly when I need to train machine learning models or run Docker or even a Kubernetes cluster. I don't have any budget constraints, as the additional $400 cost isn't an issue, but I want to make a wise decision. I would appreciate any advice on this matter. Thanks in advance!
Post not yet marked as solved
0 Replies
1.1k Views
Hey guys, I converted a T5-base (encoder/decoder) model to a Core ML model using https://github.com/huggingface/exporters (which uses coremltools under the hood). When creating a performance report for the decoder model within Xcode, it shows that all compute units are mapped to the CPU. This is also my experience when profiling the model (the GPU and ANE are not used). I was under the impression that Core ML would divide up the layers and run those that can on the GPU / ANE, but maybe I misunderstood. Is there anything I can do to get this to not run exclusively on the CPU?
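For completeness, this is how I'm loading the converted package when profiling (a minimal sketch; the file name is a placeholder), explicitly allowing all compute units so the CPU-only behaviour isn't just a side effect of how the model was loaded:

import coremltools as ct

# Load the exported decoder and let Core ML schedule it on CPU, GPU, or ANE;
# ct.ComputeUnit.ALL should already be the default, but being explicit rules out
# a loading option as the reason everything lands on the CPU.
decoder = ct.models.MLModel("t5_decoder.mlpackage", compute_units=ct.ComputeUnit.ALL)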