If the task is running remotely and the parameters are populated, the local run parameters will not be used; instead, the parameters already on the task will be used. This is because we want to allow users to change these parameters in the UI if they want to, so the parameters in the code are ignored in favor of the ones in the UI.
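A small illustration of that behavior with `task.connect` (the defaults here are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")
params = {"lr": 0.001, "batch_size": 32}  # hypothetical local defaults
# Locally: these values are recorded on the task.
# Remotely: connect() returns the values already stored on the task
# (e.g. anything edited in the UI), overriding the defaults in code.
params = task.connect(params)
```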
Hi @<1539417873305309184:profile|DangerousMole43> ! You need to mark the task you want to upload an artifact to as running. You can use `task.mark_started(force=True)` to do so, then mark it back as completed using `task.mark_completed(force=True)`.
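Roughly like this (the task ID and artifact are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")
task.mark_started(force=True)
task.upload_artifact(name="my_artifact", artifact_object={"foo": "bar"})
task.flush(wait_for_uploads=True)  # make sure the upload finishes first
task.mark_completed(force=True)
```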
What clearml sdk version are you using?
Hello MotionlessCoral18. I have a few questions that might help us find out why you experience this problem:
Is there any chance you are running the program in offline mode? Is there any other message being logged that might help? The error messages might include `Action failed`, `Failed sending`, `Retrying, previous request failed`, or `contains illegal schema`.
Are you able to connect to the backend at all from the program that is trying to get the dataset?
Thank you!
Hi @<1569496075083976704:profile|SweetShells3> ! Can you reply with some example code showing how you tried to use `pl.Trainer` with `launch_multi_node`?
Hi! Can you please provide us with code that would help us reproduce this issue? Is it just downloading from GCP?
That is a clear bug to me. Can you please open a GH issue?
btw, to avoid clutter you could also archive runs you don't need anymore
Hi @<1543766544847212544:profile|SorePelican79> ! You could use the following workaround:
```python
from clearml import Task
from clearml.binding.frameworks import WeightsFileHandler
import torch


def filter_callback(
    callback_type: WeightsFileHandler.CallbackType,
    model_info: WeightsFileHandler.ModelInfo,
):
    print(model_info.__dict__)
    if (
        callback_type == WeightsFileHandler.CallbackType.save
        and "filter_out.pt" in model_info.local_model_path
    ):
        # returning None skips tracking/uploading this checkpoint
        # (the original snippet was truncated here; this is the assumed intent)
        return None
    return model_info


WeightsFileHandler.add_pre_callback(filter_callback)
```
Hi @<1581454875005292544:profile|SuccessfulOtter28> ! You could take a look at how the HPO was built using optuna: None . Basically: you should create a new class which inherits from `SearchStrategy`. This class should convert ClearML hyperparameters to parameters Ray Tune understands, then create a `Tuner` and run the Ray Tune hyperparameter optimization. The `Tuner` will optim...
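A minimal, hypothetical skeleton of such a strategy. Everything here is a sketch under assumptions: the constructor mirrors `SearchStrategy`'s signature, the `_to_ray_space` helper is invented, and it assumes `UniformParameterRange`-style parameter objects:
```python
from clearml.automation.optimization import SearchStrategy


class RayTuneStrategy(SearchStrategy):
    """Hypothetical bridge between ClearML HPO and Ray Tune (sketch only)."""

    def __init__(self, base_task_id, hyper_parameters, objective_metric,
                 execution_queue, num_concurrent_workers, **kwargs):
        super().__init__(
            base_task_id=base_task_id,
            hyper_parameters=hyper_parameters,
            objective_metric=objective_metric,
            execution_queue=execution_queue,
            num_concurrent_workers=num_concurrent_workers,
            **kwargs
        )
        # convert ClearML parameter definitions into a Ray Tune search space
        self._param_space = self._to_ray_space(hyper_parameters)

    @staticmethod
    def _to_ray_space(hyper_parameters):
        from ray import tune
        space = {}
        for param in hyper_parameters:
            # assumes UniformParameterRange-style objects; discrete or
            # log-uniform parameter types would need their own mapping
            space[param.name] = tune.uniform(param.min_value, param.max_value)
        return space

    def process_step(self):
        # here you would create a ray.tune.Tuner with self._param_space,
        # launch trials as ClearML jobs, and report objective values back
        ...
```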
Hi @<1702492411105644544:profile|YummyGrasshopper29> ! Parameters can belong to different sections, so you should prepend the section name to `some_parameter`. You likely want `${step2.parameters.kwargs/some_parameter}`.
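For example (a hypothetical step; this assumes `some_parameter` was passed to `step2` through `function_kwargs`, which places it in the `kwargs` section):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="step3",
    function=step_three,  # hypothetical step function
    # include the section ("kwargs") when referencing step2's parameter
    function_kwargs={"value": "${step2.parameters.kwargs/some_parameter}"},
)
```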
Hi @<1547752791546531840:profile|BeefyFrog17> ! Are you getting any exception trace when you are trying to upload your artifact?
With that said, can I run another thing by you related to this? What do you think about a PR that adds the functionality I originally assumed `schedule_function` was for? By this I mean: adding a new parameter (this wouldn't change anything about `schedule_function` or how `.add_task()` currently behaves) that also takes a function, but the function expects to get a task_id when called. This function is run at runtime (when the task scheduler would normally execute the scheduled task) and use ...
Hi @<1546303293918023680:profile|MiniatureRobin9> ! When it comes to pipelines from functions/other tasks, this is not really supported. You could, however, cut the execution short when your step is being run by evaluating the return values from other steps.
Note that you should be able to skip steps if you are using pipelines from decorators; see the sketch below.
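A minimal sketch of that (step and pipeline names are hypothetical): decorated steps are invoked as regular function calls inside the pipeline body, so ordinary Python control flow decides whether a step runs at all:
```python
from clearml import PipelineDecorator


@PipelineDecorator.component(return_values=["result"])
def step_one():
    # hypothetical: return None to signal the rest can be skipped
    return None


@PipelineDecorator.component(return_values=["final"])
def step_two(result):
    return result * 2


@PipelineDecorator.pipeline(name="pipeline", project="examples", version="1.0.0")
def pipeline_logic():
    result = step_one()
    # plain control flow: step_two is simply never launched
    if result is not None:
        step_two(result)
```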
Hi UnevenDolphin73 ! We were able to reproduce the issue. We'll ping you once we have a fix as well 👍
@<1523701083040387072:profile|UnevenDolphin73> are you composing the code you want to execute remotely by copy pasting it from various cells in one standalone cell?
What OS are you running the scripts on, Abed?
Hi @<1523701168822292480:profile|ExuberantBat52> ! During local runs, tasks are not run inside the specified Docker container. You need to run your steps remotely: first create a queue, then run a `clearml-agent` instance bound to that queue. You also need to specify the queue in `add_function_step`. Note that the controller itself can still be run locally if you wish.
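A sketch of what this could look like (queue name, image, and step function are all placeholders):
```python
from clearml import PipelineController

# start an agent bound to the queue in Docker mode, e.g.:
#   clearml-agent daemon --queue docker_queue --docker

pipe = PipelineController(name="pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="train",
    function=train_func,                              # hypothetical step function
    execution_queue="docker_queue",                   # queue the agent listens on
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # image the step runs in
)
```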
You're welcome! Feel free to write here again if you believe this might be a ClearML problem
Hi @<1570583237065969664:profile|AdorableCrocodile14> ! `get_local_copy` will always copy/download external files to a folder. To get the external files, there is a property on the dataset called `link_entries`, which returns a list of `LinkEntry` objects. Each object has a `link` attribute pointing to an external file (in this case, your local paths prefixed with `file://`).
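For example (the dataset ID is a placeholder):
```python
from clearml import Dataset

ds = Dataset.get(dataset_id="<your_dataset_id>")
for entry in ds.link_entries:
    print(entry.link)  # e.g. file:///original/path/to/file
```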
The only exception is the models, if I'm not mistaken, which are stored locally by default.
MotionlessCoral18 If you provide the model as a hyperparam, then I believe you should query its value by calling https://clear.ml/docs/latest/docs/references/sdk/task/#get_parameters or https://clear.ml/docs/latest/docs/references/sdk/task/#get_parameter
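For example (the section/parameter name `General/model_path` is an assumption; use whichever section your hyperparameter actually lives in):
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")  # hypothetical task id
model_path = task.get_parameter("General/model_path")
```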
Could you try adding `region` under `credentials` as well?
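In `clearml.conf`, that would look roughly like this (all values are placeholders; this block sits under the `sdk` section):
```
aws {
    s3 {
        credentials: [
            {
                bucket: "my-bucket"
                key: "<access_key>"
                secret: "<secret_key>"
                region: "us-east-1"
            }
        ]
    }
}
```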
Hi @<1523711002288328704:profile|YummyLion54> ! By default, we don't upload the models to our file server, so in the remote run we will try to pull the file from your local machine, which will fail most of the time. Specify the `upload_uri` to the `api.files_server` entry in your `clearml.conf` if you want to upload it to the ClearML server, or any s3/gs/azure link if you prefer a cloud provider.
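One way to do this from code is via `output_uri` at task creation (project/task names are placeholders):
```python
from clearml import Task

# output_uri=True uploads models to the file server configured under
# api.files_server; an explicit URI such as "s3://bucket/models" also works
task = Task.init(project_name="examples", task_name="train", output_uri=True)
```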
FierceHamster54 I understand. I'm not sure why this happens then 😕 . We will need to investigate this properly. Thank you for reporting this and sorry for the time wasted training your model.
@<1531445337942659072:profile|OddCentipede48> Looks like this is indeed not supported. What you could do is return the ID of the task that returns the models, then use `Task.get_task` and get the model from there. Here is an example:
```python
from clearml import PipelineController


def step_one():
    from clearml import Task
    from clearml.binding.frameworks import WeightsFileHandler
    from clearml.model import Framework

    WeightsFileHandler.create_output_model(
        "obj", "file...
```
@<1554638160548335616:profile|AverageSealion33> looks like hydra pulls the config relative to the script's directory, not the current working directory. The pipeline controller actually creates a temp file in `/tmp` when it pulls the step, so the script's directory will be `/tmp`, and when searching for `../data`, hydra will search in `/`. The `.git` likely caused your repository to be pulled, so your repo structure was created in `/tmp`, which caused the step to run correctly...
@<1554638160548335616:profile|AverageSealion33> Can you run the script with `HYDRA_FULL_ERROR=1`? Also, what if you run the script without clearml? Do you get the same error?
Hi @<1578918167965601792:profile|DistinctBeetle43> ! This is currently not possible. A different task will be created for each instance