
you can put task.execute_remotely() to create it in draft mode. I've taken to configuring defaults to run things very quickly just in case I forget, though (e.g. a placeholder string for the dataset, bail out early if it's not changed, or just do one epoch on a small subset of samples, etc).
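e.g. a rough sketch of what I mean (names and defaults here are made up):
```python
from clearml import Task

task = Task.init(project_name="demo", task_name="quick-safety-defaults")

# cheap-by-default parameters, so an accidental local run costs nothing
params = {
    "dataset_id": "PLACEHOLDER",  # bail out early if never overridden
    "epochs": 1,
    "subset_size": 128,
}
params = task.connect(params)

# Running locally: stops here and leaves the task in draft mode (no queue given).
# Running under an agent: this is a no-op and execution continues below.
task.execute_remotely()

if params["dataset_id"] == "PLACEHOLDER":
    raise SystemExit("dataset_id not set; refusing to train on defaults")
```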
I think you’d have to run the cleanup service. That seems to be what controls deletion, based on archived status and some other temporal filters.
oh I see. you're talking about the agent-services, not a separate agent in a container.
yup, I've got the same thing going there.
fwiw...
for me, HOST_IP is 0.0.0.0 and the other "HOSTS" env vars don't contain "http" in them.
and my server is publicly reachable, not sure if that matters either.
ah, thank you for the clarity. A quarterly release schedule makes sense, it's about what I've observed.
Let me know if I can be of any assistance in early testing!
Yup if you scroll through the logs in the console, near the top (post config dump), you’ll see a git clone and checkout to the specific hash.
PS You can actually change this parameter in an experiment’s configuration if it is in draft mode.
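and if I remember the SDK right, you can do the same from code while it's still a draft (the ID and commit here are placeholders):
```python
from clearml import Task

# placeholders; edits like this only stick while the task is in draft mode
task = Task.get_task(task_id="abc123")
task.set_script(commit="0123abc")  # pin execution to a specific commit
```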
probably, but the syntax would be that of a git diff, so it’d be a touch clunky if you ask me
Are you trying to avoid local development?
Oh yes, I see. Yeah, no ML here actually (I'm doing the testing infra for endpoints), but it's certainly an issue when there is.
How does clearml-session avoid it? I guess only if autoscaling is used (one worker per machine)?
Oh neat! I want to take a look at this. Only a few more weeks at the client but it’d be nice to reduce the complexity of the software stack if I can before handoff.
Can you please elaborate on the latter point? My JupyterHub is fully containerized and allows users to select their own containers (from a list I built) at launch, and to launch multiple containers at the same time; not sure I follow how toes are stepped on.
yeah, let's step through this; I'm having her execute these steps as we speak.
create a task with the new project name. it's created as a draft. can see it in the UI under the new project.
pipeline script is updated with the new project name. execute the script to create the pipeline. now see it in the UI under this new project name. nothing hidden.
the pipeline is running, with the queue set to default (serviced by only one container with an agent in it, clearml-agent==1.5.2). abort it. everything is still ...
If you can hit the endpoint with curl, you for sure can hook it up to many frontend frameworks.
Personal recs: gradio, streamlit
Abstract the interaction into a function call, and wrap it all in some UI elements using python.
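e.g. a minimal gradio sketch (the endpoint URL and response shape are assumptions):
```python
import requests
import gradio as gr

ENDPOINT = "http://localhost:8080/predict"  # hypothetical serving endpoint

def predict(text: str) -> str:
    # the same call you'd make with curl, wrapped in a function
    resp = requests.post(ENDPOINT, json={"input": text})
    resp.raise_for_status()
    return resp.json()["output"]  # assumed response shape

# wrap the function call in UI elements
gr.Interface(fn=predict, inputs="text", outputs="text").launch()
```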
if you commit but do not push, the metadata tells clearml that it needs to pull a non-existent commit. any changes you made on top may be saved as a diff, but they'd fail to apply.
for clearml to work on un-pushed commits, it'd have to wait for a push to register a new diff target, which can become a problem (what if you have multiple remotes? which one will it wait for?) so rather, it assumes it can access the most recent commit from your remote repo, and records this as the "base" upon whi...
I tried mounting a config file (in the structure of the one on github but with just the relevant s3 section) into the agent-services container at /root/clearml.conf
and after restarting the container, it seems to have had an impact. thank you!
When I inspect the console of the task I'm trying to run, I see there's a call to `cp /tmp/clearml.conf ~/default_clearml.conf` in the docker command, and that the volume `/tmp/clearml.conf` is picked up from the host at some custom-named file ...
dug deeper. if I'm to make a guess: `/root/clearml.conf` -> used on startup of agent-services as a template of sorts to create `.clearml_agent.<id>.cfg` on demand -> this task-specific file is mounted to `/tmp/clearml_default.conf` in a new container (docker-in-docker because of the socket mounted to agent-services) -> used to execute the task
I ran into something similar during deployment. Hopefully this helps with your debugging: if the agent was launched separately from the rest of the stack, it may not have proper docker-DNS resolution to None. (e.g. if in the same docker-compose, perhaps you didn't add the backend network field, or if it was launched separately through docker run without an explicit external network defined)
if the agent's on the same machine, try docker network connect to add...
credentials for the server to do things with s3 will be in /opt/clearml/apiserver.conf.
in the clearml GitHub, search for a file named cleanup_service.py (or something to that effect)
For reproducibility, it kind of makes sense though. The existence of the file is contingent on the worker cloning the source code. I'm sure things can be done to maintain state differently but I personally adapted to the git-based workflow for managing files pretty quickly.
though yes I will admit I had the same thought first: why must I run it each time?
Beware: squash merges will ruin the ability to reproduce the experiment at that time since the git commit will be lost (presuming th...
I'm guessing this is done through code-server?
I'm currently rolling a JupyterHub instance (multi-user, with code-server inside) on the same machine as clearml-server. That’s where tasks are executed, etc., so it's an all-browser dev env.
It sounds like there’s an option to basically bypass this latter step and just use clearml’s credentialing to accomplish much the same thing? Am I understanding clearml-session correctly?
namely, I'm very interested in testing this unmerged feature, will be trying to leverage it as soon as possible
None
but isn't that just the same as running the agent in daemon mode? that's what I was hoping James could do
ah, that's a shame it's under Enterprise only. no wonder I missed it.
I'm helping train my friend @<1798162804348293120:profile|FlutteringSeahorse49> on clearml to assist with his astrophysics research, and his university has a slurm cluster. So we're trying to figure out if we can launch an agent process on the cluster to pull work from the clearml queue (fwiw: containers are not supported on their cluster).
you're basically asking to sample from a distribution where not all parameters are mutually independent.
the short answer is no: this is not directly supported. optuna needs each hyperparam to be independent, so it's up to you to handle the dependencies between parameters yourself, unfortunately.
your solution of defining them independently and then using num_layers to potentially ignore other parameters is a valid one.
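a minimal sketch of that approach (ranges are illustrative and the objective is a stand-in):
```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # suggest everything independently, then let num_layers decide
    # which per-layer parameters actually get used
    num_layers = trial.suggest_int("num_layers", 1, 4)
    layer_sizes = [
        trial.suggest_int(f"layer_{i}_size", 16, 256)
        for i in range(4)  # always suggested, so the search space stays fixed
    ]
    used_sizes = layer_sizes[:num_layers]  # sizes beyond num_layers are ignored
    # ... build and train a model with used_sizes, return the validation loss
    return sum(used_sizes) / len(used_sizes)  # placeholder objective

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
```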
thank you!
I'll add a volume mount to the services-agent container, and from what I understand that will become the template it uses?
is this the structure of the file?
None
or is it the "dot" syntax (like what shows up in the console when the task executes / your snippet)?
my approach was to spin up an EC2 and run the deployment there from within the EBS volume mount.
I symlinked /opt/clearml to /mnt/xvda/clearml to minimize docker-compose changes. been working out fine so far.
with aws-cdk, the deployment steps can be automated (format the volume, clone a repo with the config, etc). I can link you to a resource that may help with that if you're interested.
you could also take the route of NOT specifying num_layers, and instead write your own code to create a set of viable layer designs to choose from and pass that as a parameter, so optuna selects from a countable set instead of suggesting integer values.
the downside of this is the lack of gradient information in the optimization process
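e.g., roughly (the designs and objective here are made up):
```python
import optuna

# a hypothetical, hand-built set of viable layer designs
LAYER_DESIGNS = {
    "small":  [64, 32],
    "medium": [128, 64, 32],
    "large":  [256, 128, 64, 32],
}

def objective(trial: optuna.Trial) -> float:
    # optuna picks from a countable set instead of suggesting raw integers
    design = trial.suggest_categorical("layer_design", list(LAYER_DESIGNS))
    layer_sizes = LAYER_DESIGNS[design]
    # ... build/train a model with layer_sizes and return the validation loss
    return float(sum(layer_sizes))  # placeholder objective

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)
```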
@<1798162804348293120:profile|FlutteringSeahorse49> wants to start HPO though, so the desire is to deploy agents to listen to queues on the slurm cluster (perhaps the controller runs on his laptop).
would that still make sense?
waiting now to see if they disappear.
any problems you may have spotted with the versions used?
project hasn't disappeared just yet. but it's happened twice now
the dataset, task, and pipeline were under the same project name. I'm seeing what happens if the dataset project name is different (f"{project_name}_data"). which project would get deleted... the dataset one, or the project of the task that kicked it off?
and the answer is...
the project is preserved, the dataset's project hidden.
so ... empty dataset names due to a small typo in parameter override + the choice for the dataset to have the same project name as the task that created it (...