BTW: the agent will resolve pytorch based on the installed CUDA version.
it overwrites the previous run?
It will overwrite the previous run if:
- it is under 72h from the last execution
- no artifact/model was created
You can control it with "reuse_last_task_id=False" passed to Task.init
Task name itself is Not unique in the system, think of it as a short description
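e.g. a minimal sketch (project/task names here are just placeholders):
```python
from clearml import Task

# Force a brand new task even if the previous run is <72h old and produced no artifacts
task = Task.init(
    project_name='examples',        # placeholder
    task_name='my training task',   # placeholder; task names are not unique
    reuse_last_task_id=False,       # never overwrite the previous run
)
```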
Make sense ?
Hi TrickySheep9
Long story short, clearml-session fully supports k8s (using k8s glue)
The --remote-gateway alongside ports mode will basically allow you to set up a k8s service so that every session registers with a specific port, so k8s does the ingress for you and routes the SSH connection to the pod itself; everything else is tunneled over the original SSH connection.
Make sense ?
Have to get the glue set up, which I couldn't fully understand, so that's a different topic
I suggest using the apply template setup (basically you provide a Job/Service template, and it uses that to setup k8s jobs based on the Tasks coming in from the specific queue)
same: Not Found (#404)
May I suggest DMing it to me (so it is not public)?
But it's running in docker mode and it is trying to ssh into the host machine and failing
It is Not sshing into the machine, it is sshing directly into the container.
Notice the port it is sshing to is 10022, which is mapped into the container
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's already on a branch by SuccessfulKoala55) to address the bug ElegantKangaroo44 found. Should be in the RC next week
But what should I do? It does not work, it says incorrect password as you can see
How are you spinning up the agent machine ?
Basically port 10022 from the host (agent machine) is routed into the container, but it still needs to be open on the host machine. Could it be that it is behind a firewall? Are you (the client side running clearml-session) on the same network as the machine running the agent ?
SillyPuppy19 are you aborting the experiment, or are you trying to protect against a crash? Is it like a callback functionality you are looking for?
Thanks MagnificentPig49 !
BitingKangaroo95 can you post here the entire console output of clearml-session (including full command line) ?
Thank you for saying! 🙂
SillyPuppy19 I think this is a great idea, basically having the ability to have a callback function called before aborting/exiting the process.
Unfortunately today abort gives the process 2 seconds to gracefully quit and then kills it. It was not designed to just send an abort signal, as these, more often than not, will not actually terminate the process.
Any chance I can ask you to open a GitHub issue and suggest the callback feature? I have a feeling a few more users ...
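Until such a callback exists, a generic Python workaround (not a ClearML API, and assuming the abort reaches your process as a SIGTERM) is to register a signal handler that does quick cleanup inside the grace period:
```python
import signal
import sys

def on_abort(signum, frame):
    # Keep this fast: only ~2 seconds before the process is killed
    print("Abort received, flushing state...")
    sys.exit(0)

# Assumption: the abort arrives as SIGTERM; adjust if your setup differs
signal.signal(signal.SIGTERM, on_abort)
```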
Hmm do you host it somewhere? Is it pre-installed on the container?
DisgustedDove53 , TrickySheep9
I'm all for it!
I can think of two options here: (1) use the k8s glue + apply template with ports mode, see the discussion https://clearml.slack.com/archives/CTK20V944/p1628091020175100
(2) create an interface (queue) to launch an arbitrary job on the k8s cluster, with the full pod definition on the Task. This will allow clearml-session to set everything up from the get-go.
How would you interface with the k8s operator, and what exactly will it do?
(BTW: the reas...
SSH is used to access the actual container; all other communication is tunneled on top of it. What exactly is the reason to bind to 0.0.0.0 ? Maybe it could be a flag that you set, but I'm not sure what the scenario is and what we are solving, thoughts?
Hi ThoughtfulBadger56
If I clone and enqueue the cloned task on the webapp, does the clearml server execute the whole cmd above?
You mean the agent will execute it? Do you have Task.init inside your code ?
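i.e. the cloned/enqueued task only hooks into the agent if the script itself calls Task.init, something like (names are placeholders):
```python
from clearml import Task

# Task.init is what lets the agent pick up the cloned task and run your code
task = Task.init(project_name='examples', task_name='my experiment')  # placeholder names
```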
agentservice...
Not related, the agent-services job is to run control jobs, such as pipelines and HPO control processes.
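For example, a pipeline controller is exactly the kind of lightweight control job that usually runs on the services agent; a minimal sketch (project/step names are placeholders, assuming an existing task to clone as the step):
```python
from clearml.automation import PipelineController

# The controller itself is lightweight control logic; the heavy steps run on other queues
pipe = PipelineController(name='my pipeline', project='examples', version='1.0')  # placeholders
pipe.add_step(
    name='step_1',
    base_task_project='examples',    # placeholder
    base_task_name='training task',  # placeholder: an existing task to clone
    execution_queue='default',       # where the actual step runs
)
pipe.start(queue='services')  # the control logic runs on the services queue
```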
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B  [1:55 PM]
GiganticTurtle0 the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify ?
Martin.B  [1:55 PM]
Spoke too soon, sorry 🙂 issue is reproducible, give me a minute here
Alejandro C  [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
... training script was set to upload every epoch. Seems like this resulted in a torrent of metrics being uploaded.
oh that makes sense, so basically you were bombarding the server with requests, ending up with a kind of denial of service
Awesome ! thank you so much!
1.0.2 will be out in an hour
Hi CrookedAlligator14
or is underlying data also accessible?
What do you mean by "underlying data" ?
Here is a nice hack for you:
```python
Task.add_requirements(
    package_name='carla',
    package_version="> 0 ; python_version < '2.7'  # this hack disables the pip install"
)
```
This will essentially make sure the agent will skip the installation of the package, but at least you will know it is there.
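For context, a minimal sketch of where this call typically goes (project/task names are placeholders): as far as I recall, Task.add_requirements should be called before Task.init so the agent picks it up.
```python
from clearml import Task

# Sketch: call add_requirements before Task.init so it is registered with the task
Task.add_requirements(
    package_name='carla',
    package_version="> 0 ; python_version < '2.7'  # this hack disables the pip install"
)
task = Task.init(project_name='examples', task_name='carla experiment')  # placeholder names
```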
Right, you need to pass "repo" and direct it to the repository path
(BTW, what's the clearml version?)
Hi @<1695969549783928832:profile|ObedientTurkey46>
Why do tags only show on a version level, but not on the dataset-level? (see images)
Tags on a dataset apply to "all the dataset versions", i.e. they help someone locate datasets (think locating projects as an analogy). Dataset version tags are tags on a specific version of the dataset, helping users locate that specific version. Does that make sense ?
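A hedged sketch with placeholder names; my understanding is that tags passed at creation time attach to the newly created version, while dataset-level tags group all versions together:
```python
from clearml import Dataset

# Assumption: tags passed here are version-level tags (they mark this specific version)
ds = Dataset.create(
    dataset_project='examples',             # placeholder
    dataset_name='my_dataset',              # placeholder
    dataset_tags=['raw', '2024-snapshot'],  # placeholder tag values
)
```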
Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen inside the docker itself. One of the changes is system_site_packages, as inside the docker we want the new venv to inherit everything from the docker's system-installed packages.
Make sense ?
I prepared my own image and want to use this venv
No worries, it creates a "transparent" venv, it uses everything from the docker (the penalty of creating a new venv is negligible 🙂, you end up with the exact same set of packages)