To me it looks as if somebody were going into the UI and hitting abort on the task, but that's definitely not the case.
Any time I run the agent locally via:
clearml-agent daemon --queue services --services-mode --cpu-only --docker --foreground
It works without fail, so I've tried removing the clearml mount from agent-services in docker-compose.yml:
CLEARML_WORKER_ID: "clearml-services"
# CLEARML_AGENT_DOCKER_HOST_MOUNT: "/opt/clearml/agent:/root/.clearml"
SHUTDOWN_IF_NO_ACCESS_KEY: 1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
# - /opt/c...
Just user abort by the looks of things:
This should be the full log cleaned
Thanks @<1523701087100473344:profile|SuccessfulKoala55> - Yeah I found that allegroai/clearml-agent-services:latest was running clearml-agent==1.1.1. Tried plugging various other images into docker-compose.yml & restarting to see if versions clearml-agent==1.6.1 or clearml-agent==1.7.0 would fix the issue but no luck unfortunately 😕
Hi @<1523701070390366208:profile|CostlyOstrich36>
We've got quite a bit of sensitive info in the logs - I'll see what I can grab
Does this help at all? (I can go a lil further back, just scanning through for any potential sensitive info!)
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for getting back to me!
What do you mean by " the pipeline page doesn't show anything at all."? are you running the pipeline ? how ?
This (see attached screenshot below) is the pipeline page for "big_pipe" specified in the snippet above. I think I understand the issue though - without PipelineDecorator.component being top level, the SDK is unable to see each of the nodes?
Basically a Pipeline is a Task (of a specific...
This snippet works as expected in terms of computing the results and using caching where specified but the pipeline page doesn't show anything at all.
Moving append_string anywhere else in the script results in the following error:
File "/home/user/miniconda3/envs/clearml/lib/python3.10/site-packages/clearml/automation/controller.py", line 3544, in wrapper
_node = cls._singleton._nodes[_node_name].copy()
KeyError: 'append_string'
Using None & ...
Excellent, that makes complete sense. I thought we'd be restricted to creating pipeline A via the PipelineController but I guess we could still use PipelineDecorator with something along the lines of your example. Probably not much need for nested components this way! Still learning the clearml way of doing things but this is a massive help, thank you so much!
Nice one, thank you very much!
Literally just opened that page as your message came through!
Great to know I'm going in the right direction, I'll give it a go, thanks!
Cool I'll let you know how it goes 🙂
Took some time to get extra_vm_bash_script set up properly but that's done the job!
Thank you again for the help!
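As a rough illustration of what an extra_vm_bash_script value might look like for this kind of setup, the sketch below composes a short bash snippet that downloads a binary and marks it executable. The helper name build_extra_vm_bash_script, the URL, and the install path are all hypothetical placeholders, and it assumes the binary is reachable over HTTPS from the instance; the actual script used in this thread is not shown.

```python
import shlex


def build_extra_vm_bash_script(binary_url, install_path="/usr/local/bin/mytool"):
    """Return a bash snippet that downloads a binary and makes it executable.

    Both arguments are illustrative placeholders, not values from the thread.
    """
    # Quote the values so unusual characters cannot break the shell command.
    url = shlex.quote(binary_url)
    dest = shlex.quote(install_path)
    lines = [
        "set -e",  # stop on the first failing command
        f"curl -fsSL {url} -o {dest}",
        f"chmod +x {dest}",
    ]
    return "\n".join(lines)


script = build_extra_vm_bash_script("https://example.com/mytool")
```

The resulting string would then be pasted into (or assigned to) the extra_vm_bash_script setting of the autoscaler configuration.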
That's all set up and handy to use locally but I can't see that there's any ClearML support for it. Nor can I think of any way of getting the binary into an instance spun up by my auto scaler...
Maybe there's something I can change in the aws autoscaler example
I managed to get it running with:
task.set_packages('./package/requirements.txt')
where one of the lines in ./package/requirements.txt points to the package within our repo. E.g.:
-e git+
I'll try pointing it directly to the package, that would be much easier to work with!
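For reference, the working variant described above (clone, then point set_packages at a requirements file) could be sketched roughly like this. The function name clone_with_requirements and its parameters are made up for illustration, the task ID and queue name would come from your own setup, and the ClearML import is deferred so the snippet loads even where the SDK isn't installed.

```python
def clone_with_requirements(source_task_id, requirements_file, queue_name):
    """Clone a task, replace its package list, and enqueue the clone.

    Hypothetical helper: the names and arguments are illustrative only.
    """
    # Imported here so the module can be loaded without ClearML installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(source_task=source, name=f"{source.name} (local package)")
    # set_packages is given a requirements file path here, not a directory.
    cloned.set_packages(requirements_file)
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Something like clone_with_requirements("<task id>", "./package/requirements.txt", "<queue>") would mirror the call that worked here.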
Similar error if I set the package after cloning the task:
Task.clone(...)
task.set_packages('./package')
File "/home/ec2-user/miniconda3/envs/ml/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1414, in set_packages
with open(packages) as f:
IsADirectoryError: [Errno 21] Is a directory: './ml'
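The traceback shows set_packages calling open() on whatever path it is given, so a directory fails. A small guard like the one below, run before the call, would catch that earlier. This is only a sketch: resolve_requirements_file is a hypothetical helper, and it assumes the package directory contains a requirements.txt.

```python
from pathlib import Path


def resolve_requirements_file(path):
    """Return a requirements file path, accepting either a file or a directory.

    Hypothetical helper: a directory is assumed to hold a requirements.txt.
    """
    candidate = Path(path)
    if candidate.is_dir():
        # Fall back to the conventional file name inside the directory.
        candidate = candidate / "requirements.txt"
    if not candidate.is_file():
        raise FileNotFoundError(f"No requirements file found at {candidate}")
    return str(candidate)
```

The returned path can then be handed to task.set_packages(...) in place of the bare directory.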
We are cloning an existing task (pipeline). Adding Task.add_requirements("./path/to/package") before Task.clone(...) gives:
2023-02-22 14:08:31,508 - clearml.task - WARNING - Requirement ignored, Task.add_requirements() must be called before Task.init()
Followed by this further down:
with Path(package_name).open() as requirements_txt:
File "/home/ec2-user/miniconda3/envs/ml/lib/python3.10/site-packages/pathlib2/__init__.py", line 1548, in open
return io.o...
Hi @<1523701087100473344:profile|SuccessfulKoala55> , we're using the pre-built EC2 AMI I believe. We did an update on this a few weeks back so hopefully still up to date! Here's the version info from the app:
WebApp: 1.9.2-317 • Server: 1.9.2-317 • API: 2.23
I noticed that workers.set_runtime_properties isn't in the API reference either?
Missed your message there @<1544853721739956224:profile|QuizzicalFox36> - I've been receiving the same warning with the docker flag provided too unfortunately.
@<1523701087100473344:profile|SuccessfulKoala55> , that's a shame but thank you for the clarification. I'm not sure how much of an option the scale/enterprise tiers are for us right now so it sounds like we'll need to re-think this a little bit. Just to double check - there's no other way of us limiting the number of workers on a d...