Cannot assign a device for operation ReadVariableOp

Hello, I cannot run prediction with my model on an Apple M1. I get this error:

Traceback (most recent call last):
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/__main__.py", line 154, in <module>
    agent.run()
  File "/Users/martin/Documents/Projects/rl-toolkit/rl_toolkit/training.py", line 213, in run
    losses = self._train(sample)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/martin/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_2_adam_update_6_resourceapplyadamwithamsgrad_vhat (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ReadVariableOp (ReadVariableOp)
  Exp/ReadVariableOp (ReadVariableOp)
  ReadVariableOp_1 (ReadVariableOp)
  actor/ReadVariableOp (ReadVariableOp)
  actor/Exp/ReadVariableOp (ReadVariableOp)
  actor/ReadVariableOp_1 (ReadVariableOp)
  actor_critic/actor/ReadVariableOp (ReadVariableOp)
  actor_critic/actor/Exp/ReadVariableOp (ReadVariableOp)
  actor_critic/actor/ReadVariableOp_1 (ReadVariableOp)
  Adam_2/Adam/update_6/ResourceApplyAdamWithAmsgrad (ResourceApplyAdamWithAmsgrad) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node ReadVariableOp}}]] [Op:__inference__train_4206]

  • I have the same problem. tensorflow.config.set_soft_device_placement(True) should solve such a problem, but it did not.
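    For reference, a minimal sketch of how soft device placement is usually enabled (this is generic TensorFlow usage, not a confirmed fix; as noted above, it did not help here). It has to run before any ops execute:

    import tensorflow as tf

    # Let ops without a GPU kernel fall back to the CPU instead of failing.
    tf.config.set_soft_device_placement(True)

    # ... build, compile and train the model as usual ...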


Replies

Hi @markub3327, the issue is a colocation error in TensorFlow caused by the missing operation ResourceApplyAdamWithAmsgrad in the Metal plugin. Thanks for providing a reproducible case; we will take a look and post an update here.


ResourceApplyAdamWithAmsgrad: CPU       <== this Op is currently not supported in Metal plugin

ReadVariableOp: GPU CPU 

_Arg: GPU CPU 
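
Until that op is supported, a generic way to see which devices the plugin exposes and where each op ends up being placed (a diagnostic sketch only, not a fix for the missing kernel):

import tensorflow as tf

# Print the device chosen for every op; must be called before any ops run.
tf.debugging.set_log_device_placement(True)

# On an M1 with tensorflow-metal installed this typically lists CPU:0 and GPU:0.
print(tf.config.list_physical_devices())

# Ask TensorFlow to fall back to the CPU for ops that have no GPU kernel.
tf.config.set_soft_device_placement(True)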

Hi - I have the same issue when applying certain Data Augmentation layers - RandomFlip works but RandomZoom does not. I'm using tf.keras.layers.experimental.preprocessing.*** for my layers.
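
For reference, a minimal sketch of the kind of augmentation block that reproduces this for me (the exact arguments are illustrative, not my real values):

import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),  # trains fine on the GPU
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.1),           # hits the colocation error
])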

Is there any update on a fix?

Iain

  • Did you ever find a solution for the data augmentation layers?


Hi! I get the same error when trying to fine-tune EfficientNetB7. Any updates on this issue?

M1 Max, 32 GB, Monterey 12.1

InvalidArgumentError: Cannot assign a device for operation sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdaMax: CPU 
ReadVariableOp: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_efficientnetb7_stem_conv_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adamax_adamax_update_resourceapplyadamax_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp (ReadVariableOp) 
  Adamax/Adamax/update/ResourceApplyAdaMax (ResourceApplyAdaMax) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/efficientnetb7/stem_conv/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_56534]

M1 Max 64GB, Monterey 12.0.1

This is a strange error. To resolve it, I added the expected input shape to the first layer.

# Failing:
# Possibly unrelated:  the `with` block was added as the following code didn't run on GPU:
with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
    ])

The above caused the following error:

# The following error was received - but after adding the input shape it worked fine
# Colocation group had the following types and supported devices: 
# Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
# RngReadAndSkip: CPU 
# _Arg: GPU CPU 

# Colocation members, user-requested devices, and framework assigned devices, if any:
#   model_sequential_2_random_flip_2_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
#   model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip) 

#          [[{{node model/sequential_2/random_flip_2/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_12915]

Adding the input_shape argument seems to have resolved the issue, and the model is now training.

# Working
with tf.device('/cpu:0'):
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal',input_shape=(IMG_SIZE[0],IMG_SIZE[1],3)),
    tf.keras.layers.RandomRotation(0.2),
    ])

I also had this configuration:

tf.config.set_soft_device_placement(True) 
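
An alternative I have seen suggested elsewhere (a sketch only, assuming train_ds is a tf.data.Dataset of (image, label) batches; I have not benchmarked it) is to apply the same layers inside the input pipeline instead of inside the model, so the random ops run on the data-loading side:

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal', input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3)),
    tf.keras.layers.RandomRotation(0.2),
])

# Augment while the batches are still in the tf.data pipeline.
train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE)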

@Frameworks Engineer

Is there any update on when tensorflow-metal will implement these missing operations?

I am using:

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I am getting the following error which I assume is the same issue as above.

Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReadVariableOp: GPU CPU
ResourceApplyAdam: CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_dense_matmul_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_m (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_resourceapplyadam_v (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/dense/MatMul/ReadVariableOp (ReadVariableOp)
  Adam/Adam/update/ResourceApplyAdam (ResourceApplyAdam) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/dense/MatMul/ReadVariableOp}}]] [Op:__inference_train_function_809]

Hi, I'm using an Apple M1 Pro on Monterey 12.3.1, with

tensorflow-macos         2.8.0
tensorflow-metal         0.4.0

I'm getting the same problem after adding an augmentation layer to my model.

It cost me about 3 hours to figure out.

My solution is to manually assign the GPU using with tf.device("/gpu:0"):, like this:

with tf.device("/gpu:0"):
    model.compile(loss="categorical_crossentropy", optimizer='adma', metrics=['accuracy'])
    history = model.fit(train_ds,epochs=epochs,validation_data=val_ds, use_multiprocessing=True)

You don't need to change or add any other code; it's just a single line.

It worked like a charm for me, you guys can try it!!!

I hope it is helpful to anyone who comes across this strange problem.

  • The same versions and code almost worked for me, but it needs to be changed to:

    with tf.device("/gpu:0"): model.compile(loss="categorical_crossentropy", optimizer='adma', metrics=['accuracy']) history = model.fit(train_ds,epochs=epochs,validation_data=val_ds, use_multiprocessing=True)

    Rationale: we want to compile the model to use the GPU but keep the CPU for image preprocessing. Moving model.fit outside the tf.device block allows both to be used as specified.


A workaround is to uninstall tensorflow-metal: pip uninstall tensorflow-metal

history = model.fit(
    train_ds,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1,
    validation_data=val_ds,
    use_multiprocessing=True
)

I have used this code and I am getting this error on my MacBook Air M1 (2020):

InvalidArgumentError                      Traceback (most recent call last)
Input In [57], in <cell line: 1>()
----> 1 history=model.fit(
      2     train_ds,
      3     epochs=EPOCHS,
      4     batch_size=BATCH_SIZE,
      5     verbose=1,
      6     validation_data=val_ds,
      7     use_multiprocessing=True
      8 )

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RngReadAndSkip: CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_5_sequential_4_random_flip_1_stateful_uniform_full_int_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip (RngReadAndSkip)
  sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int_1/RngReadAndSkip (RngReadAndSkip)

	 [[{{node sequential_5/sequential_4/random_flip_1/stateful_uniform_full_int/RngReadAndSkip}}]] [Op:__inference_train_function_11336]

Solved: A simple workaround is to uninstall tensorflow-metal. It is not yet mature on M1 Macs and is still under development. Removing it can also speed up your processing, since tensorflow-metal does not yet use the GPU to its full capacity. The errors you are getting are due to gaps in the tensorflow-metal plugin.

pip uninstall tensorflow-metal

And don't forget to update tensorflow-macos to the latest version (currently 2.11.0):

pip install --upgrade tensorflow-macos

Prerequisites:

  • Python 3.8-3.10

  • macOS version 12.0 or later
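
After the uninstall, a quick sanity check (a minimal sketch) that TensorFlow now only sees the CPU and reports the upgraded version:

import tensorflow as tf

# With tensorflow-metal removed there should be no GPU device in this list.
print(tf.config.list_physical_devices())

# Should print the upgraded tensorflow-macos version, e.g. 2.11.0.
print(tf.__version__)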

Hope it helps!!

  • Confirmed that this is still an issue with M2 Macs and that this solution works. Workbooks have sped up quite a bit as mentioned.


I had the same error on an M2 Mac, but none of the above suggestions worked for me. The error has to do with the data augmentation layers in the Keras Sequential model.

Sample code producing errors:

data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal"),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.1),
    ]
)
model_1 = tf.keras.models.Sequential([
  data_augmentation,
  tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3).......
.......
.......
.....

Instead, I moved the data augmentation into the ImageDataGenerator:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                    rotation_range=0.2, # rotate the image slightly to up to 20%
                                    zoom_range=0.2, # zoom into the image up to 20%
                                    width_shift_range=0.2, # shift the image width ways up to 20%
                                    height_shift_range=0.2, # shift the image height ways up to 20%
                                    shear_range=0.2, # shear the image up to 20%
                                    horizontal_flip=True # flip the image on the horizontal axis
                                        )
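
For completeness, a sketch of how this generator would feed the model (the directory path, image size, and batch size here are placeholders for illustration):

train_ds = train_datagen.flow_from_directory(
    'data/train/',            # hypothetical folder with one subfolder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')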

Building the model:

model_1 = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(filters=32, 
                         kernel_size=(3,3)....
.......
......
.....

Another *****

I encountered this problem when I was trying to run example code from a notebook on Tensorflow-Probability: https://www.tensorflow.org/probability/examples/Probabilistic_Layers_Regression

Specifically, Case 5: Functional Uncertainty includes an item not previously seen in the notebook:

tf.keras.backend.set_floatx('float64')

If I change that to

tf.keras.backend.set_floatx('float32')

Then the notebook runs. Not well: the numerical stability problems that prompted the authors to use float64 show up. But it runs.

I don't know what this means, but it seems like the missing "ResourceApplyAdamWithAmsgrad" op is not the issue here.