fatal: could not read Username for '': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repo code containing the pipeline logic (this is the exact same behaviour as pipeline v1's execute_remotely(), which is now the default). Could it be that before, you executed the pipeline logic locally?
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
no, at least not yet, someone definitely needs to do that though haha
Currently all the unit tests are internal (the hardest part is providing a server they can run against and verifying the results, hence the challenge)
For example, if ClearML would offer a TestSession that is local and does not communicate with any backend
Offline mode? It stores everything into a folder, then zips it; you can access the target folder or the zip file and verify all the data/states
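A minimal sketch of that flow (project/task names are made up; Task.set_offline and Task.import_offline_session are the relevant calls):

from clearml import Task

# record everything locally instead of talking to a server
Task.set_offline(offline_mode=True)

task = Task.init(project_name="tests", task_name="unit-test-run")
task.get_logger().report_scalar("accuracy", "val", value=0.9, iteration=0)
task.close()  # zips the session under ~/.clearml/cache/offline/

# later, the zip can be inspected directly, or imported into a server:
# Task.import_offline_session("~/.clearml/cache/offline/<session-id>.zip")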
Hi DeliciousKoala34
Happened when cloning and running a task on an agent on a different machine. I
Sounds like a torch internal issue, can you send the full log of the remote Task?
Hi MuddySquid7
Hmmm what would be the use case ? (I mean how are we using Vertex ?)
Hi StaleKangaroo85 which trains version are you using? Also, which trains-server are you using?
Only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Actually, the way it works currently (and we might change it if there is a better way): every time you call PipelineDecorator.component, a new component is stored on the Pipeline Task, which is later translated into a DAG graph and table (the next version will have a very nice UI to display / edit them).
The idea is first to have a representation of the p...
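To illustrate (a minimal sketch; the project/pipeline names are made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def load_data():
    # each decorated function registers a component on the Pipeline Task
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["total"])
def process(data):
    return sum(data)

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def run_pipeline():
    print(process(load_data()))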
Hi DeliciousBluewhale87
Yes that should have worked, can you verify the task status ?
print(Task.get_task(...).get_status())
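i.e. something along these lines (replace the placeholder ID with your task's):

from clearml import Task

task = Task.get_task(task_id="<your-task-id>")
print(task.get_status())  # e.g. "queued", "in_progress", "completed", "failed"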
Check the log: the container has torch 1.13.0 but the task requires torch==1.13.1
Now, the torch packages inside those nvidia prepackaged containers are compiled a bit differently. What I suspect happens is that the torch wheel from pytorch is not compatible with this container. Easiest fix: change the task requirements to torch==1.13.0
Wdyt?
Yes this seems like it is stuck, could you test with the demo server ?
(basically remove the clearml.conf and it will connect automatically)
GrittyHawk31
what are you getting when you are running:
docker ps
and what are you getting with:
netstat -natp | grep LISTEN
JitteryCoyote63 virtualenv v20 is supported; pip v21 needs the latest trains/trains-agent RC.
ReassuredTiger98 you mean when calling clearml-init? Or the default value?
MotionlessCoral18 I think there is a fix in the latest clearml-agent RC 1.4.0rc0, can you test and update if you are still having this issue?
Hi RotundHedgehog76
I think it should work out of the box, I mean in the end both spin up Jupyter notebooks, which is what ClearML interacts with. Are you getting any errors?
because a step can be constructed with multiple sub-components, but not all of them might be added to the UI graph.
Just to make sure I fully understand: when we decorate with @sub_node we want that to also appear in the UI graph (and have its own Task / metrics etc.), correct?
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? Do they need their own environment?
What would be the easiest pipeline interface to run them locally? (I would love it if we could support this workflow; it seems you are not alone in this approach, and of course you can always use them remotely, i.e. clone the pipeline and launch it on an agent.)
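For reference, the decorator interface does let you execute a pipeline locally; a minimal sketch, assuming run_pipeline is a function decorated with @PipelineDecorator.pipeline:

from clearml.automation.controller import PipelineDecorator

# run each step as a local sub-process instead of launching it on an agent
PipelineDecorator.run_locally()
# or, for debugging, run everything in the current process:
# PipelineDecorator.debug_pipeline()

run_pipeline()  # hypothetical pipeline entry point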
In Azure VMSS, there is a method called "Custom Data", which is basically a way of passing things to be executed.
I know that it is on the to-do list to add an "azure_autoscaler", which is basically a sibling of the aws_autoscaler,
with the same idea of the "custom data" as the initial bash script:
You can check here:
https://github.com/allegroai/clearml/blob/4a2099b53c09d1feaf0e079092c9e075b43df7d2/clearml/automation/aws_auto_scaler.py#L54
Those variables are not passed to the remote instance; they are used by the aws autoscaler to launch it, so there is no need to pass them.
I think the easiest is to add them to the "extra_vm_bash_script" as well, something like the sketch below:
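(a sketch; the exported variable names are made up, not part of the autoscaler API)

# appended to the instance startup script (user-data), so the exports
# are in place before the agent starts on the new instance
extra_vm_bash_script = "\n".join([
    "export MY_SERVICE_TOKEN=abc123",   # hypothetical values the tasks need
    "export MY_REGION=eu-west-1",
])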
MelancholyChicken65 found it! Thank you for finding this issue.
I'm hoping to get an update soon 🙂
Well, that depends on you: what did you write there so you know it is the best one? The file name? Did you add some metric?
Even before we had a chance to properly notify everyone 🙂
Thank you! All the details will follow in a dedicated post, for the time being, I can say that pushing a model with pre/post processing python code and full scalable inference solution has never been easier
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
Okay, I think I understand, but I'm missing something. It seems you call get_parameters from the old API; is your code actually calling get_parameters? The trains-agent runs the code externally, so whatever happens inside the agent should have no effect on the code. So who exactly is calling task.get_parameters, and, well, why? :)
some dependencies will sometimes require different pip versions.
None 🙂 maybe setuptools, but not a pip version
(pip is just a utility to install packages; it will not be a dependency of one)
These both point to an nvidia docker runtime installation issue.
I'm assuming that in both cases you cannot run the docker manually either, which is essentially what the agent will have to do ...
The docker itself does not have the host configured.