It looks like you're running on different machines and the file your code is looking for is not available on the other machine
How do you normally mount the ssh keys?
It can run Docker containers and it can run over K8s
Hi ScrawnyLion96,
I think it handles some data like worker stats. It's required for the server to run. What do you mean by Redis getting fuller and fuller?
Hi @<1567321739677929472:profile|StoutGorilla30>, this is a good question. I would assume that the CLI tool uses API calls under the hood. I think you can either look at the code and see what is being sent, or simply run the CLI commands from your code.
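For the second option, a minimal sketch of shelling out to a CLI command from Python; the specific clearml-data command here is only an illustration, substitute whatever CLI call you want to automate:
```python
# Minimal sketch: run a ClearML CLI command from code and capture its output.
# The "clearml-data list" command below is just an illustration.
import subprocess

result = subprocess.run(
    ["clearml-data", "list", "--id", "<dataset_id>"],  # placeholder dataset id
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```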
WackyRabbit7, isn't this what you need?
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#get_running_nodes
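For reference, a rough sketch of how it might be used, assuming pipe is the PipelineController you already built and started:
```python
# Rough sketch of using get_running_nodes() from the linked reference.
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
# ... pipe.add_step(...) / pipe.start(...) ...

print(pipe.get_running_nodes())  # names of the pipeline nodes currently executing
```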
Is it your own server installation or are you using the SaaS?
I see, thanks for the input!
I think the serving engine IP depends on how you set it up
Hi VirtuousFish83,
You can do it using the API directly; tasks.get_all is what you're looking for:
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
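A minimal sketch of calling it through the Python APIClient; the filter fields below are just examples, see the linked reference for the full set of supported arguments:
```python
# Minimal sketch of tasks.get_all through the Python APIClient.
from clearml.backend_api.session.client import APIClient

client = APIClient()
tasks = client.tasks.get_all(
    project=["<project_id>"],   # placeholder project id
    status=["completed"],
    order_by=["-last_update"],  # newest first
    page=0,
    page_size=50,
)
for t in tasks:
    print(t.id, t.name)
```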
In the UI, check under the Execution tab in the experiment view and scroll to the bottom. You will find a field called "OUTPUT"; what is in there? Select an experiment that is giving you trouble.
How are you reporting / generating them now?
Hi @<1649221394904387584:profile|RattySparrow90>, events and console logs are logged to Elasticsearch so they can be fetched. Debug samples are also events, so they are saved to Elasticsearch as well (a link to the debug sample is saved in the event itself).
It is suggested to keep a dedicated Elasticsearch instance for ClearML.
How did the tasks fail?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , that's an interesting point. Please open a Github feature request for this. To circumvent this you can add Task.init to that code as well
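Something like this minimal sketch; the project/task names are just examples:
```python
# Minimal sketch of the workaround: call Task.init at the top of the script
# so ClearML tracks it as well.
from clearml import Task

task = Task.init(project_name="examples", task_name="my standalone script")
# ... rest of the original code ...
```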
Hi @<1724960468822396928:profile|CumbersomeSealion22> , can you provide a log of such a run?
Hi JitteryCoyote63, can I assume you can SSH into the machine directly?
Hi @<1570220858075516928:profile|SlipperySheep79> , you need to apply the same setting on the machine that is running the agent. clearml.conf files are local and apply the settings only on the machine they're sitting on. In the Scale/Enterprise licenses there are configuration vaults that take care of this.
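For illustration, this is the kind of settings that live in each machine's local clearml.conf; the keys below are the standard structure, the values are placeholders, and whatever setting you changed locally must also exist on the agent machine:
```
# Illustrative structure of a local clearml.conf
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        access_key: "<access_key>"   # placeholder
        secret_key: "<secret_key>"   # placeholder
    }
}
sdk {
    # e.g. storage credentials, development settings, etc.
}
```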
Hi PunyWoodpecker71,
It's best to run the pipeline controller in the services queue, because the assumption is that the controller doesn't require much compute power, as opposed to the steps, which can be resource-intensive (depends on the pipeline, of course).
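For example, a sketch of that setup; all queue, project and task names below are examples:
```python
# Sketch: the controller runs on the lightweight "services" queue while
# each step is sent to its own compute queue.
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train model",
    execution_queue="gpu",        # the heavy lifting happens on a compute queue
)
pipe.start(queue="services")      # the controller itself only orchestrates
```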
Could be. If it's not picking up what you expect, then it means something is misconfigured
Hi @<1643060831954407424:profile|ScrawnyMole16> , you can export your report to PDF and share it with your colleagues 🙂
Hi SmugTurtle78, I think you can set it up as follows (or something similar):
```python
pipe.add_step(
    name="stage_train",
    parents=["stage_process"],
    base_task_project="examples",
    base_task_name="Pipeline step 3 train model",
    parameter_override={"General/dataset_task_id": "${stage_process.id}"},
)
```
Note that in parameter_override I take a task id from a previous step and insert it into the configuration/parameters of the current step. Is that what you're looking for?
Hi EnormousCormorant39,
> is there a way to enqueue the dataset add command on a worker
Can you please elaborate a bit on this? Do you want to create some sort of trigger action to add files to a dataset?
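If that's the goal, something along these lines with TriggerScheduler might work. This is a rough sketch from memory, so please double-check the argument names against the SDK documentation; all names and the queue are examples:
```python
# Rough sketch of a dataset trigger; verify argument names against the docs.
from clearml.automation import TriggerScheduler

def on_new_dataset(task_id):
    # e.g. create a child dataset version and add the new files here
    print(f"dataset version {task_id} triggered this callback")

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    name="add-files-on-new-version",   # illustrative trigger name
    schedule_function=on_new_dataset,
    trigger_project="examples",        # watch datasets under this project
)
trigger.start_remotely(queue="services")
```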
Hi @<1811208768843681792:profile|BraveGrasshopper38>, you can do anything programmatically that you can do via the webUI. I suggest opening dev tools (F12) and checking what is being sent in the network tab when you create credentials.
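If you'd rather skip the network tab, a rough sketch of what it might look like through the Python APIClient; I believe the endpoint is auth.create_credentials, but please verify against what the network tab actually shows:
```python
# Rough sketch; verify the endpoint name against the network tab.
from clearml.backend_api.session.client import APIClient

client = APIClient()
resp = client.auth.create_credentials()
print(resp)  # the response should contain the new access_key / secret_key pair
```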
Hi IrritableJellyfish76 , it looks like you need to create the services queue in the system. You can do it directly through the UI by going to Workers & Queues -> Queues -> New Queue
Hi @<1523701949617147904:profile|PricklyRaven28>, note that steps in a pipeline are special tasks with a hidden system tag, so I think you might want to enable searching hidden tasks in your query.
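For example, a sketch of fetching them with search_hidden enabled; the project name is a placeholder:
```python
# Sketch: include hidden tasks (such as pipeline steps) in the query by
# enabling search_hidden in the task filter.
from clearml import Task

steps = Task.get_tasks(
    project_name="examples",
    task_filter={"search_hidden": True},
)
for t in steps:
    print(t.id, t.name)
```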
In that case you are correct. If you want to have a 'central' source of data then Datasets would be the suggested approach. Regarding your question on adding data, you would always have to create a new child version and append new data to the child.
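A minimal sketch of that flow; the project/dataset names and the path are examples:
```python
# Minimal sketch of appending data as a new child version.
from clearml import Dataset

parent = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files(path="/data/new_files")  # only new/changed files are added on top of the parent
child.upload()
child.finalize()
```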
Also, maybe squashing the dataset (see Dataset.squash in the SDK reference) might be relevant to you.
Hi @<1856144866401062912:profile|VirtuousHorse94> , there was a hotfix released, try pip install -U clearml-agent and run again 🙂
Hi @<1742355077231808512:profile|DisturbedLizard6>, for this you have the clearml-task CLI.
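For example, an illustrative invocation; all flag values are placeholders:
```bash
# Illustrative clearml-task invocation (flag values are placeholders):
clearml-task --project examples --name my-remote-run \
             --script train.py --queue default
```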
Hi @<1577468638728818688:profile|DelightfulArcticwolf22> , what email did you use? Can you try again now?