JitteryCoyote63 nice hack 🙂
how come it is not automatically logged as console output ?
The main question I have is why is the ALB not passing the request, I think you are correct it never reaches the serving server at all, which leads me to think the ALB is "thinking" the service is down or is not responding, wdyt?
JitteryCoyote63 Is this an Ignite feature ? what is the expectation ? (I guess the ClearML Logger just inherits from the base ignite logger)
Could it be someone deleted the file? this is inside the temp venv folder but it should not get there
Would be very cool if you could include this use case!
I totally think we should, any chance you can open an Issue, so this feature is not lost?
Can you see it on the console ?
In the main pipeline I want to work with the secondary pipeline and other functions decorated with PipelineDecorator. Does ClearML allow this? I have not been able to get it to work.
Usually when we think about pipelines of pipelines, the nested pipeline is just another Task you are running in the DAG (where the target queue is the services queue).
When you say nested pipelines with decorators, what exactly do you have in mind ?
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
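Roughly, a sketch of what I mean (a hypothetical example; the names and project are made up, and per this thread it may not actually be supported):

```python
# hedged sketch of the nested-pipeline idea; project/names are made up
try:
    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.pipeline(name="inner", project="demo", version="0.1")
    def inner_pipeline(x):
        return x + 1

    @PipelineDecorator.pipeline(name="outer", project="demo", version="0.1")
    def outer_pipeline(x):
        # calling one pipeline-decorated function from inside another
        return inner_pipeline(x)
except ImportError:
    # fallback so the sketch still runs without clearml installed
    def inner_pipeline(x):
        return x + 1

    def outer_pipeline(x):
        return inner_pipeline(x)
```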
Ohh... so would it make sense to add "helper_functions" so that a function will be available in the step's context ?
Or maybe we need a new way to support a "standalone" decorator?! Currently, to actually "launch" the function step you have to call it from the "pipeline" main logic function, but, at least in theory, one could do without the Pipeline itself...
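Something like this is what I had in mind (a sketch of the proposed helper_functions idea; nothing here is final API, and the names are made up):

```python
def normalize(values):
    # plain helper the step would need at runtime
    m = max(values) or 1
    return [v / m for v in values]

try:
    from clearml.automation.controller import PipelineDecorator

    # sketch of the proposal: helper_functions would pack `normalize`
    # into the step's standalone execution context
    @PipelineDecorator.component(return_values=["scaled"],
                                 helper_functions=[normalize])
    def preprocess(values):
        return normalize(values)
except (ImportError, TypeError):
    # fallback if clearml is missing or the argument is not supported
    def preprocess(values):
        return normalize(values)
```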
Why do you ask? is your server sluggish ?
BTW: if you only need the git diff you can just copy it from the UI into a txt file and do: git apply <copied-diff.txt>
EnviousStarfish54 Notice that you can configure it on the agent machine only, so in development you are not "wasting" storage when uploading debug checkpoints/models 🙂
trains-agent RC (which they tell me will be out tomorrow) will have a switch to do that, just so it is easier 🙂
unless the domain is different?
Imagine that you are working with both GitHub and Bitbucket, for example: if you are using git-ssh then git will know which of the domains to send the key to. Currently there is a single user/pass entry, so all domains will get the same credentials. But I think this is a rare use case.
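For reference, that single credential entry lives in the agent section of clearml.conf (trains.conf on older setups); the values here are placeholders:

```hocon
agent {
    # one user/pass pair, applied to every git domain the agent clones from
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```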
I'll try to go with this option, I think it's actually perfect for my needs
Great!
Thanks @<1634001106403069952:profile|DefeatedMole42>
A follow up, (1) how are you spinning the agent ? (2) could it be the docker image "ultralytics/yolov5" does not have Bash as entry point ?
you can force that with
@PipelineDecorator.component(return_values=['int'], cache=False,
                             task_type='training',
                             docker="ultralytics/yolov5",
                             docker_args="--entrypoint /bin/bash",
                             pa...
Hi JitteryCoyote63 you can, but obviously you should be careful: they might both try to allocate more GPU memory than the HW actually has.
TRAINS_WORKER_NAME=machine_gpu0A trains-agent daemon --gpus 0 --queue default --detached
TRAINS_WORKER_NAME=machine_gpu0B trains-agent daemon --gpus 0 --queue default --detached
Hi LudicrousParrot69
Not sure I follow, is this pyfunc running remotely ?
Or are you looking for interfacing with previously executed Tasks ?
Are you getting the error from boto failing to launch additional ec2 instances ?
SteadyFox10 TRAINS_CONFIG_FILE or CLEARML_CONFIG_FILE
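For example, a minimal sketch of pointing a run at an alternate config file via that variable (the file and script names are made up):

```python
import os
import subprocess
import sys

# point the child process at an alternate config file; the SDK reads this
# variable before falling back to the default ~/clearml.conf
env = dict(os.environ,
           CLEARML_CONFIG_FILE=os.path.expanduser("~/clearml-dev.conf"))

# hypothetical training script launched with the override
# subprocess.run([sys.executable, "train.py"], env=env, check=True)
print(env["CLEARML_CONFIG_FILE"])
```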
ClumsyElephant70 the odd thing is the error here:
docker: Error response from daemon: manifest for nvidia/cuda:latest not found: manifest unknown: manifest unknown.
I would imagine it will be with "nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu18.04" but the error is saying "nvidia/cuda:latest"
How could that be ?
Also, can you manually run the same command (i.e. docker run --gpus device=0 --rm -it nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu18.04 bash)?
Hi @<1570220858075516928:profile|SlipperySheep79>
I think this is more complicated than one would expect. But as a rule of thumb, console logs and metrics are the main ones. I hope it helps? Maybe sort by number of iterations in the experiment table ?
BTW: probable better to ask in channel
Where is the log of the Task that is created for the component?
The error you are getting is the Pipeline itself throwing an error because the component Task failed; the question is why the component Task failed
Just to make sure, the first two steps are working ?
Maybe it has to do with the fact the "training" step specifies a docker image, could you try to remove it and check?
BTW: A few pointers
The return_values argument is used to specify multiple returned objects stored individually, not the type of the object. If there is a single object, no need to specify it.
The parents argument is optional; the pipeline optimizes execution based on inputs, for example in your code, all pipeline comp...
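E.g. a small sketch (names are made up): a step returning two objects, so two names in return_values:

```python
def split_dataset(n):
    # two returned objects -> two names in return_values, stored individually
    items = list(range(n))
    return items[: n // 2], items[n // 2:]

try:
    from clearml.automation.controller import PipelineDecorator

    split_step = PipelineDecorator.component(
        return_values=["train_set", "test_set"], cache=True
    )(split_dataset)
except ImportError:
    # fallback so the sketch still runs without clearml installed
    split_step = split_dataset
```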
I think perhaps it came across as way more passive-aggressive than I was intending.
Dude, you are awesome for saying that! no worries 🙂 we try to assume people have the best intention at heart (the other option is quite depressing 🙂 )
I've been working on a Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way