ML Compute

MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash:

    Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster and would allow initialization to occur on the GPU. Here's my code for building the graph, including both methods of weight initialization:

    func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
        let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
        let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

        // This works for inference but not training
        let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
        let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

        // This works for inference and training
        // let weights = [Float](repeating: 1, count: 2)
        // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

        variables += [weightTensor]

        let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
        let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil)

        return (inputPlaceholder, labelPlaceholder, output, loss)
    }

And to run the graph I have the following in my sample view controller:

    override func viewDidLoad() {
        super.viewDidLoad()

        var variables: [MPSGraphTensor] = []
        let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
        let gradients = graph.gradients(of: loss, with: variables, name: nil)
        let learningRate = graph.constant(0.001, dataType: .float32)

        var updateOps: [MPSGraphOperation] = []
        for (key, value) in gradients {
            let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
            let assign = graph.assign(key, tensor: updates, name: nil)
            updateOps += [assign]
        }

        let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
        let executionDesc = MPSGraphExecutionDescriptor()
        executionDesc.completionHandler = { (resultsDictionary, _) in
            for (key, value) in resultsDictionary {
                var output: [Float] = [0]
                value.mpsndarray().readBytes(&output, strideBytes: nil)
                print(output)
            }
        }

        let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
        let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
        var inputArray: [Float] = [1, 2]
        input.writeBytes(&inputArray, strideBytes: nil)
        let source = MPSGraphTensorData(input)

        let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
        var labelArray: [Float] = [1]
        labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
        let label = MPSGraphTensorData(labelMPSArray)

        // This runs inference and works
        // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
        //
        // commandBuffer.commit()
        // commandBuffer.waitUntilCompleted()

        // This trains but does not work
        graph.encode(
            to: commandBuffer,
            feeds: [inputPlaceholder: source, labelPlaceholder: label],
            targetTensors: [],
            targetOperations: updateOps,
            executionDescriptor: executionDesc)
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
    }

And a few other relevant variables are created at the class scope:

    let graph = MPSGraph()
    static let device = MTLCreateSystemDefaultDevice()!
    static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Posted by noahmartin.

Mac Catalyst Required Device Capabilities

Hello, I have my largely iOS app running using Mac Catalyst, but I need to limit what Macs will be able to install it from the Mac App Store based on the GPU Family like MTLGPUFamily.mac2. Is that possible? Or I could limit it to Apple Silicon using the Designed for iPad target, but I would prefer to use Mac Catalyst instead of Designed for iPad. Is it possible to limit Mac Catalyst installs to Apple Silicon Macs? Side question: what capabilities are supported by MTLGPUFamily.mac2? I can't find it. My main interest is in CoreML inference acceleration. Thank you.

Posted by 3DTOPO.

Failed to get CPU frequency: 0 Hz

I am performing a grid search over a parameter grid, training the model with different combinations of hyperparameters. I am receiving the following warning:

    W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Why is that and what can I do to fix it? Thank you very much. Here is the code:

    def grid_search(model_name):
        ...
        elif model_name == 'LSTM':
            def build_model(units, activation, dropout, layers):
                model = Sequential()
                model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                               return_sequences=True, input_shape=(2, 1152), recurrent_dropout=0))
                model.add(Dropout(dropout))
                for i in range(layers):
                    if i != layers-1:
                        model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                       return_sequences=True, recurrent_dropout=0))
                        model.add(Dropout(dropout))
                    elif i == (layers-1):
                        model.add(LSTM(units=units, kernel_initializer="normal", activation=activation,
                                       recurrent_dropout=0))
                        model.add(Dropout(dropout))
                model.add(Dense(units=6, kernel_initializer="normal", activation=activation))
                model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
                return model

            param_grid = {'units': [200, 300, 400],
                          'activation': ['tanh'],
                          'dropout': [0, 0.2, 0.4, 0.6],
                          'layers': [0, 5]}

        group_kfold = GroupKFold(n_splits=len(np.unique(groups_train)))
        model = KerasClassifier(model=build_model, units=param_grid['units'],
                                activation=param_grid['activation'],
                                dropout=param_grid['dropout'], layers=param_grid['layers'])
        grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=group_kfold)

        X_test, X_train, y_test, y_train = raw_dataassigner(model_name)
        (X_train, y_train) = shuffle(X_train, y_train)

        with tf.device('/cpu:0'):
            grid_result = grid_search.fit(X_train, y_train, groups=groups_train)

        print(f'Best score ({grid_search.best_score_}) for {model_name} model achieved with parameters: ', grid_search.best_params_)
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))

    grid_search('LSTM')

Posted by hefl99.
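
For a sense of how large the search in the LSTM post above is: its parameter grid expands to 3 * 1 * 4 * 2 = 24 hyperparameter combinations, and each one is fitted once per cross-validation fold. The following is a minimal sketch in plain Python that enumerates them, assuming only the `param_grid` values quoted in the post. `expand_grid` is a hypothetical helper written for illustration; it is not how scikit-learn's `GridSearchCV` works internally, and it says nothing about the cause of the CPU frequency warning.

```python
from itertools import product

# Grid values copied from the post above.
param_grid = {
    'units': [200, 300, 400],
    'activation': ['tanh'],
    'dropout': [0, 0.2, 0.4, 0.6],
    'layers': [0, 5],
}


def expand_grid(grid):
    """Hypothetical helper: return every combination in the grid as a list of dicts."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    # product() yields one tuple per combination, in the same key order.
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
print(len(combinations))
print(combinations[0])
```

Multiplying the number of combinations by the number of folds gives the total number of model fits, which is a quick way to estimate how long a run on the CPU is likely to take.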

CoreML not using Neural Engine even though it should

When I run the performance test on a CoreML model, it shows predictions are 834% faster running on the Neural Engine than on the GPU. It also shows that 100% of the model can run on the Neural Engine.

[Screenshot: GPU only]

But when I set the compute units to all:

    let config = MLModelConfiguration()
    config.computeUnits = .all

and profile, it shows that the Neural Engine isn't used at all. Well, other than loading the model, which takes 25 seconds when allowed to use the Neural Engine versus less than a second when not allowed to use it.

The difference in speed is the difference between the app being too slow to even release and quite reasonable performance. I have a lot of work invested in this, so I am really hoping that I can get it to run on the Neural Engine. Why isn't it actually running on the Neural Engine when the report shows that it is supported and I have the compute units set to allow the Neural Engine?

Posted by 3DTOPO.

Kernel_Task using all my CPU when using extended monitor at random times

Hi guys, I am using a 2019 16-inch MacBook Pro with an Intel Core i7. When I am using an external monitor, my CPU spikes and kernel_task uses more than 700% CPU. I am using the Apple-manufactured HDMI extender to connect to the monitor. As soon as I disconnect the monitor, everything works fine. I am running Ventura 13.3.1.

Posted by crazybiker_AJ.
