Regarding pending pipelines: please make sure a free agent is bound to the queue you wish to run the pipeline in. You can check queue information by accessing the INFO section of the controller (as in the first screenshot), then by pressing on the queue you should see the worker status. There should be at least one worker with a blank "CURRENTLY EXECUTING" entry.

(and get rid of the wait).
Hi @<1694157594333024256:profile|DisturbedParrot38> ! If you want to override the parameter, you could add a DiscreteParameterRange to hyper_parameters when calling HyperParameterOptimizer. The DiscreteParameterRange should have just one value: the value you want to override the parameter with.
You could try setting the parameter to an empty string in order to mark it as cleared
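For example, a minimal sketch of that (the base task ID, the Args/my_param name and the values below are placeholders for your own setup):

from clearml.automation import HyperParameterOptimizer, DiscreteParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the task whose parameter you want to override
    hyper_parameters=[
        # a single-value DiscreteParameterRange effectively pins/overrides the parameter
        DiscreteParameterRange("Args/my_param", values=["fixed_value"]),
        # or use an empty string to mark it as cleared:
        # DiscreteParameterRange("Args/my_param", values=[""]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
)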
DeliciousKoala34 can you upgrade to clearml==1.8.0? The issue should be fixed now.
Hi ApprehensiveSeahorse83 ! Looks like this is a bug. We will fix it ASAP
Hi JumpyDragonfly13 ! Try using get_task_log instead of download_task_log.
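If you are going through the APIClient, a rough sketch of what that could look like (the task ID is a placeholder, and the exact shape of the response may differ):

from clearml.backend_api.session.client import APIClient

client = APIClient()
# fetch the console log of a task via the events.get_task_log endpoint
res = client.events.get_task_log(task="<your_task_id>")
for entry in res.events:
    print(entry)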
Hi SoreHorse95 ! I think that the way we interact with hydra doesn't account for overrides. We will need to look into this. In the meantime, do you also have some sort of stack trace or similar?
Hi NonchalantGiraffe17 ! Thanks for reporting this. It would be easier for us to check if there is something wrong with ClearML if we knew the number and sizes of the files you are trying to upload (content is not relevant). Could you maybe provide those?
Perfect! Can you please provide the sizes of the files of the other 2 chunks as well?
The ${step.id} reference is the most viable way to reference that step imo @<1633638724258500608:profile|BitingDeer35>
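For context, a rough sketch of how such a reference could be wired into a PipelineController step (project, task and step names below are placeholders):

from clearml.automation.controller import PipelineController

pipe = PipelineController(name="my_pipeline", project="examples", version="1.0.0")
pipe.add_step(name="train", base_task_project="examples", base_task_name="train task")
pipe.add_step(
    name="hpo",
    parents=["train"],
    base_task_project="examples",
    base_task_name="hpo task",
    # reference the 'train' step's task ID using the ${step_name.id} syntax
    parameter_override={"General/base_task_id": "${train.id}"},
)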
Hi @<1571308010351890432:profile|HurtAnt92> ! Yes, you can create intermediate datasets. Just batch your datasets, for each batch create new child datasets, then create a dataset that has as parents all of these resulting children.
I'm surprised you get OOM though, we don't load the files in memory, just the name/path of the files + size, hash etc. Could there be some other factor that causes this issue?
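In case it helps, a rough sketch of the batching idea (dataset names, project and batch size are placeholders, and the exact finalize/upload calls may need adjusting for your setup):

from clearml import Dataset

all_parent_ids = ["<dataset_id_1>", "<dataset_id_2>"]  # placeholder: the full list of source dataset IDs
batch_size = 100
intermediate_ids = []

for i in range(0, len(all_parent_ids), batch_size):
    batch = all_parent_ids[i:i + batch_size]
    # each intermediate dataset only references a batch of parents
    child = Dataset.create(
        dataset_name=f"intermediate_{i // batch_size}",
        dataset_project="my_project",
        parent_datasets=batch,
    )
    child.finalize()
    intermediate_ids.append(child.id)

# the final dataset has all intermediate datasets as parents
final = Dataset.create(
    dataset_name="merged",
    dataset_project="my_project",
    parent_datasets=intermediate_ids,
)
final.finalize()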
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! Note that the Datasets section is created only if you get the dataset with an alias. Are you sure that number_of_datasets_on_remote != 0?
If so, can you provide a short snippet that would help us reproduce? The code you posted looks fine to me, not sure what the problem could be.
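For reference, a minimal sketch of getting a dataset with an alias (names are placeholders); the alias is what makes the dataset show up under the task's Datasets section:

from clearml import Dataset

ds = Dataset.get(
    dataset_name="my_dataset",      # placeholder name
    dataset_project="my_project",   # placeholder project
    alias="my_dataset_alias",       # the alias creates the Datasets entry on the task
)
print(ds.id)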
Hi @<1633638724258500608:profile|BitingDeer35> ! You could attach the configuration using set_configuration_object inside a pre_execute_callback (the callback is passed via the pre_execute_callback argument when adding the step).
Basically, you would have something like:
def pre_callback(pipeline, node, params):
    # "my_config" and config are placeholders for your own configuration name/content
    node.job.task.set_configuration_object(name="my_config", config_text=config)
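Continuing the snippet above, the callback could then be wired up like this (pipeline, project and task names are placeholders):

from clearml.automation.controller import PipelineController

config = "param_a: 1\nparam_b: 2"  # placeholder configuration content

pipe = PipelineController(name="my_pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    # pre_callback is the function defined above; it runs right before the step is launched
    pre_execute_callback=pre_callback,
)
pipe.start()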
FierceHamster54: initing the task before the execution of the file like in my snippet is not sufficient?
It is not, because os.system spawns a whole different process than the one you initialized your task in, so no patching is done on the framework you are using. Child processes need to call Task.init because of this, unless they were forked, in which case the patching is already done.
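As a rough sketch of what that means in practice (file, project and task names are placeholders):

# launcher script: has its own task, then spawns training.py in a brand new process
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="launcher")
os.system("python training.py")

# training.py must call Task.init itself (e.g. at the top of the file),
# because the process spawned by os.system gets no automatic framework patching:
#
#     from clearml import Task
#     task = Task.init(project_name="examples", task_name="training")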
But the training.py has already a ClearML task created under the hood since its integratio...
Do you have any STATUS REASON under the INFO section of the controller task?
Hi @<1676400486225285120:profile|GracefulSquid84> ! Each step is indeed a clearml task. You could try using the step ID. Just make sure you pass the ID to the HPO step (you can do that by simply returning the Task.current_task().id).
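A rough sketch of that idea with pipeline components (all names are placeholders, and the HPO step body is stubbed out):

from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["train_task_id"])
def train_step():
    # ... training happens here ...
    # return this step's own task ID so it can be handed to the HPO step
    return Task.current_task().id

@PipelineDecorator.component()
def hpo_step(base_task_id):
    # base_task_id is the ID returned by train_step; use it as the HPO base task
    print("optimizing base task", base_task_id)

@PipelineDecorator.pipeline(name="hpo_pipeline", project="examples", version="1.0.0")
def pipeline_logic():
    train_task_id = train_step()
    hpo_step(base_task_id=train_task_id)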
Hi @<1610083503607648256:profile|DiminutiveToad80> ! You need to somehow serialize the object. Note that we try different serialization methods and default to pickle if none work. If pickle doesn't work then the artifact can't be uploaded by default. But there is a way around it: you can serialize the object yourself. The recommended way to do this is using the serialization_function argument in upload_artifact. You could try using something like dill, which can serialize more object types than the standard pickle.
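A minimal sketch of that (project, task and artifact names are placeholders, and dill is assumed to be installed); when retrieving the artifact you will likely need to pass a matching deserialization function to its get():

import dill
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")  # placeholder names

class NotPicklable:
    # stand-in for an object the default pickle-based upload can't handle
    pass

task.upload_artifact(
    name="my_object",
    artifact_object=NotPicklable(),
    # dill is used instead of the default pickle-based serialization
    serialization_function=lambda obj: dill.dumps(obj),
)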
@<1657556312684236800:profile|ManiacalSeaturtle63> can you share how you are creating your pipeline?
Do you want to remove/add steps from the pipeline after it has run, basically? If that is the case, then it is theoretically possible, but we don't expose any methods that would allow you to do that.
What you would need to do is modify all the pipeline configuration entries you find in the CONFIGURATION section (see the screenshot). Not sure that is worth the effort; I would simply create another version of the pipeline with the added/removed steps.

Or if you ran it via an IDE, what is the interpreter path?
because I think that what you are encountering now is an NCCL error
are you running this locally or are you enqueueing the task (controller)?
@<1626028578648887296:profile|FreshFly37> can you please screenshot this section of the task? Also, what does your project's directory structure look like?
Hi RoundMole15 ! Are you able to see a model logged when you run this simple example?
from clearml import Task
import torch.nn.functional as F
import torch.nn as nn
import torch

class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        s...
Hi @<1543766544847212544:profile|SorePelican79> ! You could use the following workaround:
from clearml import Task
from clearml.binding.frameworks import WeightsFileHandler
import torch

def filter_callback(
    callback_type: WeightsFileHandler.CallbackType,
    model_info: WeightsFileHandler.ModelInfo,
):
    print(model_info.__dict__)
    if (
        callback_type == WeightsFileHandler.CallbackType.save
        and "filter_out.pt" in model_info.local_model_path
    ):
        # returning None skips logging this model
        return None
    return model_info
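To make the filter take effect, the callback also has to be registered; assuming the snippet above, something like:

# register the callback so it runs before ClearML logs a saved/loaded model
WeightsFileHandler.add_pre_callback(filter_callback)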