Yeah, I guess that's the culprit. I'm not sure ClearML and wandb were designed to work together, so we are probably interfering with each other. Can you try removing the wandb model save callback and running again with output_uri=True?
Also, I'd be happy to learn of your use-case that uses both clearml and wandb. Is it for eval purposes or anything else?
Sure I will try that. Does ClearML have a specific Stable Baselines 3 framework tag or should I try with just PyTorch?
I don't see SB3 here so PyTorch would be best: https://clear.ml/docs/latest/docs/integrations/libraries
This run, with no output_uri specified, produces artifacts.
I don't explicitly call torch.save()
Hi AnxiousSeal95, the models are saved both with a Weights & Biases callback and through Stable Baselines 3 model.save. Yes, it makes sense to me that files local to the Docker container can't be downloaded. But when setting output_uri to True, no models appear in the UI at all, which seems strange.
Hi MysteriousSeahorse54 How are you saving the models? torch.save() ? If you're not specifying output_uri=True it makes sense that you can't download as they are local files 🙂
And when you put output_uri = True, does no model appear in the UI at all?
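For reference, a minimal sketch of what setting output_uri at task creation could look like. `Task.init` and its `output_uri` argument are ClearML's; the helper names `resolve_output_uri` and `start_task`, the example project and task names, and the bucket URI are hypothetical and only there for illustration.

```python
def resolve_output_uri(destination=None):
    """Return the value to pass as Task.init(output_uri=...).

    True asks ClearML to upload to its default files server; a string such
    as "s3://my-bucket/models" (hypothetical) selects that storage instead.
    """
    return True if destination is None else str(destination)


def start_task(project_name, task_name, destination=None):
    """Create a ClearML task whose saved models are uploaded, not left local."""
    # Imported lazily so resolve_output_uri works without clearml installed.
    from clearml import Task

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=resolve_output_uri(destination),
    )


# Example call (hypothetical names; needs a configured ClearML server):
# task = start_task("rsTest", "ppo-output-uri-check")
```

Without output_uri, model files are registered with their local path only, which matches the "visible but not downloadable" behavior described in this thread.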
Manual logging has the same behavior. When the output destination is not set, the model artifacts are saved but can't be downloaded: they are saved in the Docker container in which the run executed and not on the fileserver. When output_uri is set, the artifacts don't appear at all.
Could you try to see if it does work when you log those manually?
https://clear.ml/docs/latest/docs/clearml_sdk/model_sdk#manually-logging-models
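A rough sketch of manual logging for this kind of setup, assuming the checkpoints are `.zip` files named after their timestep (Stable Baselines 3 appends `.zip` when `model.save` is given a path without an extension). `OutputModel` and `update_weights` are ClearML SDK calls; the helper names `checkpoints_by_timestep` and `log_checkpoint` and the directory layout are assumptions made for the example.

```python
from pathlib import Path


def checkpoints_by_timestep(model_dir):
    """List (timestep, path) for files named like 100000.zip, lowest first."""
    found = []
    for path in Path(model_dir).glob("*.zip"):
        # Skip files whose names are not plain numbers, e.g. model.zip.
        if path.stem.isdigit():
            found.append((int(path.stem), path))
    return sorted(found)


def log_checkpoint(task, timestep, path):
    """Register one checkpoint file as an output model of the given task."""
    # Imported lazily so the helper above works without clearml installed.
    from clearml import OutputModel

    output_model = OutputModel(task=task, framework="PyTorch")
    # With output_uri set on the task, the file is uploaded to that destination.
    output_model.update_weights(weights_filename=str(path), iteration=timestep)
    return output_model


# Example (hypothetical directory; `task` comes from Task.init):
# for step, ckpt in checkpoints_by_timestep("models/<run id>"):
#     log_checkpoint(task, step, ckpt)
```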
Sure, I am just trying to get the saved model weights. Logging scalars works fine. I am using Stable Baselines 3 and PyTorch.
Thanks! I couldn't find it either, but better to ask and be sure. Trying the run with manual logging now
Sure, here is a snippet.
`
run = wandb.init(project="rsTest", sync_tensorboard=True)
# add tensorboard logging to the model
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=f"runs/{run.id}",
            learning_rate=args.learning_rate,
            batch_size=args.batch_size,
            n_steps=args.n_steps,
            n_epochs=args.n_epochs,
            device='cpu')
# create wandb callback
wandb_callback = WandbCallback(model_save_freq=1000,
                               model_save_path=f"models/{run.id}",
                               verbose=2,
                               )
# variable for how often to save the model
time_steps = 100000
for i in range(25):
    # add the reset_num_timesteps=False argument to the learn function to prevent the model from resetting the timestep counter
    # add the tb_log_name argument to the learn function to log the tensorboard data to the correct folder
    model.learn(total_timesteps=time_steps, callback=wandb_callback, progress_bar=True, reset_num_timesteps=False, tb_log_name=f"runs/{run.id}")
    # save the model to the models folder with the run id and the current timestep
    model.save(f"models/{run.id}/{time_steps*(i+1)}")
`
The part I don't understand is that when output_uri is not set, the model artifacts show up, but when it is set, they don't.
Hmm, can you give a small code snippet of the save code? Are you using wandb-specific code? If so, it makes sense that we don't save it, as we only intercept torch.save() and not wandb function calls.
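To illustrate what "intercepting" one save function but not another means, here is a toy sketch of the general monkey-patching pattern. It is not ClearML's actual code: `fake_torch` and `fake_other` are stand-in objects invented for the example, and `intercept` is a hypothetical helper.

```python
import functools
import types

# Stand-ins for two libraries; neither is the real torch or wandb.
fake_torch = types.SimpleNamespace(save=lambda obj, path: f"torch wrote {path}")
fake_other = types.SimpleNamespace(save=lambda obj, path: f"other wrote {path}")

captured_paths = []


def intercept(module, attr):
    """Replace module.attr with a wrapper that records the target path."""
    original = getattr(module, attr)

    @functools.wraps(original)
    def wrapper(obj, path, *args, **kwargs):
        captured_paths.append(path)
        return original(obj, path, *args, **kwargs)

    setattr(module, attr, wrapper)


intercept(fake_torch, "save")      # only this save function is patched

fake_torch.save({"w": 1}, "a.pt")  # recorded by the wrapper
fake_other.save({"w": 1}, "b.pt")  # not recorded: this function was never patched
```

Only calls that go through the patched function are seen by the tracker, so a file written through a different library's save path would go unnoticed.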
Can you give me a bit more info on what exactly you're trying to log and what framework you're using?