LOL 🙂
Make sure that when you train the model, or create it manually, you set the default "output_uri":
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
Are they ephemeral or later used by other Tasks, execution etc ?
For example: configuration files are specific to a single execution, and someone will edit them.
Initial weights files are something that multiple executions might need, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions/tasks etc.
It seems like you treat these files as "configurations", is that right ?
how would I get an agent to launch in the same instance of my clearml server
Actually that is my point: you do not have to spin the agent on the clearml-server instance. We added the services agent as part of the docker-compose for easier deployment; that said, you can always manually SSH to the server, or run on any other machine, like you would spin up any other clearml-agent.
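For example, on any machine with clearml-agent installed and a configured clearml.conf, something like:
clearml-agent daemon --queue services --docker
(the queue name here is just an example)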
Does that make sense ?
So it sounds as if for some reason calling Task.init inside a notebook on your jupyterhub is not detecting the notebook.
Is there anything special about the jupyterhub deployment ? how is it deployed ? is it password protected ? is this reproducible ?
@<1546303293918023680:profile|MiniatureRobin9>
, not the pipeline itself. And that's the last part I'm looking for.
Good point, any chance you want to PR this code snippet ?
def add_tags(self, tags):
# type: (Union[Sequence[str], str]) -> None
"""
Add Tags to this pipeline. Old tags are not deleted.
When executing a Pipeline remotely (i.e. launching the pipeline from the UI/enqueuing it), this method has no effect.
:param tags: A li...
Is this reproducible with the hpo example here:
https://github.com/allegroai/clearml/tree/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/examples/optimization/hyper-parameter-optimization
What's your clearml version? (And is it possible you verify with the latest version?)
Hi ElegantCoyote26 , yes I did 🙂
It seems cometml registers their default callback logger for you, that's it.
I have also tried with type hints and it still broadcasts to string. Very weird...
Type hints are ignored; it's the actual value you pass that is important:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    ...

@PipelineDecorator.pipeline(name='custom pipeline logic', project='examples', version='0.0.5')
def executing_pipeline(pickle_url, mock_parameter='mock'):
    da...
JitteryCoyote63 to filter out archived tasks (i.e. exclude them):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
Hi OutrageousGrasshopper93
which framework are you using? trains-agent will pull the correct torch based on the CUDA version it detects, but there is no such thing for TF. In the default venv mode, trains-agent creates a new venv for the experiment (not conda) and everything is installed there. If you need conda, you have to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49 The safest way to control CUDA dri...
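For reference, a sketch of the relevant section in your trains.conf, key names per the linked default conf:
agent {
    package_manager {
        # supported values: pip / conda
        type: conda,
    }
}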
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them?
This seems like an old SDK no?
Hi CleanPigeon16
You need to be able to access the machine running the agent, usually the default port will be 10022.
If you need further debug messages, add --debug at the beginning of the clearml-session command:
clearml-session --debug ...
To get all the debug prints, please upgrade to clearml-session==0.3.3
I see them run reliably (not killed), are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
so you have a repo with poetry that some users update and some do not?
All working on the same branch ?
PompousParrot44 please try to reply on the thread, so we do not create a mess in the main channel 🙂
What's the "working directory" in the execution section? Do you have package "test" in the installed packages?
I would expect that after calling Task.enqueue(exit=True), the local task is closed and no processes related to it are running
Ohh my apologies, I did not understand that.
Are you saying that locally you call task.execute_remotely(exit_process=True) and it does not leave the local process ?
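i.e. a minimal sketch of what I mean (queue name is hypothetical):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote me")
# everything up to this call runs locally; the task is then enqueued
# for an agent, and with exit_process=True the local process exits here
task.execute_remotely(queue_name="default", exit_process=True)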
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
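i.e. something like this in the agent's clearml.conf (bucket is hypothetical):
sdk {
    development {
        # models/artifacts from tasks running with this conf are uploaded here
        default_output_uri: "s3://my-bucket/models"
    }
}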
So the way it works: when you run a component, the return value together with the entire function execution is cached. Basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs ❌
you just need to do:
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
This will return the URL of the uploaded images (i.e. S3 bucket)
which means if this is cached you will get it...
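Putting it together, a sketch of how those lines sit inside a component (component name and argument are hypothetical):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['images_url'], cache=True)
def upload_images(img_dir: str):
    from clearml import Task
    from clearml.automation.controller import PipelineDecorator
    # upload the folder as an artifact (the call from the snippet above)
    PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
    # return the artifact URL, so cached runs still pass it downstream
    return Task.current_task().artifacts['images'].url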
Hi YummyFish22
Looks like the task does not have a "Task.init" call in the main script (or an import of clearml)? could that be the case?
And when running get, the files on the parent dataset will be available as links.
BTW: if you call get_mutable_copy() the files will be copied, so you can work on them directly (if you need)
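i.e. something like (project/name are hypothetical):
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
# read-only cached copy; files from parent versions show up as links
folder = ds.get_local_copy()
# or a real writable copy, if you need to edit the files in place
mutable_folder = ds.get_mutable_copy(target_folder="./my_dataset_copy")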
ConvolutedSealion94 Let me try to explain how it works, I hope this will help in debugging.
There are two different entities here:
Clearml-server: in this context the clearml server acts as a control-plane. It stores the configuration of the different endpoints, models, preprocessing code etc. It does NOT perform any compute or serving.
clearml-serving-inference: https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L77 . This ...
PompousBeetle71 you can check this example:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_torch_distributed.py
I think it should help. If you want a more manual approach, you can check the Popen subprocesses here:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_subprocess.py
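The manual version is basically (a sketch; worker.py and the rank argument are hypothetical):
import subprocess
import sys

from clearml import Task

task = Task.init(project_name="examples", task_name="manual distributed")
# spawn the worker subprocesses yourself; each one can report into
# the same Task (see the linked example for the full pattern)
workers = [
    subprocess.Popen([sys.executable, "worker.py", "--rank", str(rank)])
    for rank in range(4)
]
for worker in workers:
    worker.wait()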
Hi JoyousElephant80
Another possibility would be to run a process somewhere that periodically polls ClearML Server for tasks that have recently finished
this is the easiest way to implement what you are after, and have full control over the logic itself.
Basically you inherit from the Monitor class
And implement the callback function:
https://github.com/allegroa...
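Roughly (a sketch based on the slack_alerts monitoring example; method names per that example, double-check against the linked code):
from clearml.automation.monitor import Monitor

class MyMonitor(Monitor):
    def process_task(self, task):
        # called for every recently finished task the monitor picks up;
        # put your notification / trigger logic here
        print("finished:", task.id)

MyMonitor().monitor(pool_period=60.0)  # poll the server every 60 seconds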
Need - in my CI, the url used is https but I need the ssh url to be used. I see that we can pass repo to Task.create but not Task.init
Are you cloning an existing Task, or creating a new one ?