if you have 2 agents serving the same queue and then send 2 tasks to that queue, each agent should take one task
But if you queue sequentially, i.e. send one task, wait for it to finish, and then queue the next: then it is random which agent will take the task. It can be the same one as for the previous task
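For example, something like this (a minimal sketch; the project/task/queue names are placeholders):
from clearml import Task

template = Task.get_task(project_name='my_project', task_name='my_template_task')
for i in range(2):
    cloned = Task.clone(source_task=template, name=f'run_{i}')
    Task.enqueue(cloned, queue_name='my_queue')
With two agents listening on 'my_queue', each agent should pick up one of the two tasks.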
Are you saying that you have 1 agent running a task and 1 agent sitting idle while there is a task waiting in the queue and no one is processing it??
Should I get all the workers,
then go through them and count how many are in my queue of interest?
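Something like this should work (a rough sketch, assuming the worker entries returned by workers.get_all() expose their queue assignments in a queues field):
from clearml.backend_api.session.client import APIClient

client = APIClient()
workers = client.workers.get_all()
# count the workers that are serving the queue of interest
count = sum(1 for w in workers if any(q.name == 'my_queue' for q in (w.queues or [])))
print(f'{count} worker(s) serving my_queue')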
in that case yes. What happens in docker mode is:
you run a clearml agent, which then receives a task
creates a container
installs another agent inside that container
then runs that second agent inside the container
that second agent then pulls the task and does the usual build/install
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
needs to be set on that second agent somehow ...
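One way to get it into the container (a sketch, adjust to your setup) is to pass it through the host agent's clearml.conf, e.g. via extra_docker_arguments:
agent {
    extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true"]
}
That way every container the agent spins up gets the variable, and the agent running inside it skips the python env install.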
my code looks like this:
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                    help='train config file')
parser.add_argument('-t', '--train-times', type=int, default=1,
                    help='train the same model several times')
parser.add_argument('--dataset_dir', help='path to folder containing the preped dataset.', required=True)
parser.add_argument('--backup', action='s...
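For reference, ClearML should pick the argparse arguments up automatically as long as Task.init() is called in the same script, something like this (project/task names are placeholders):
import argparse
from clearml import Task

task = Task.init(project_name='my_project', task_name='train')

parser = argparse.ArgumentParser()
parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml', help='train config file')
args = parser.parse_args()
# the parsed arguments show up under Configuration > Hyperparameters > Args in the UI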
I don't use submodules so I don't really know how they behave with ClearML
Are the uncommitted changes in untracked files?
In other words: clearml will only save uncommitted changes from files that are tracked by your local git repo
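If the changes are in new, untracked files, one possible workaround (my assumption, not an official ClearML feature) is to mark them as intent-to-add so they show up in git diff:
git add --intent-to-add path/to/new_file.py
After that, git diff includes the new file's content, so it should end up in the task's uncommitted changes.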
Looks like your issue is not that ClearML is not tracking your changes, but more that your Configuration is being overwritten.
This often happens to me. The way I debug this is to put a lot of print statements along the code to track when the Configuration is overwritten and narrow down why. The print statements will show up in the Console tab.
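For example (a sketch; 'my_config' and the names are placeholders). Keep in mind that when the task runs under an agent, task.connect() hands back the values stored on the server/UI, which override your local defaults:
from clearml import Task

task = Task.init(project_name='my_project', task_name='debug_config')

my_config = {'lr': 0.001, 'batch_size': 32}
print('before connect:', my_config)
my_config = task.connect(my_config)  # under an agent, values come back from the server/UI
print('after connect:', my_config)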
You don't need an agent on your local machine.
You want an agent running on the GPU machine.
Local code will create an experiment in the ClearML Server, then run up to the line execute_remotely()
then stop
Once the local code stops, the ClearML Server takes over and enqueues the experiment to the prescribed queue
The agent on the GPU machine sees there is an experiment in its queue, then pulls it and executes it. This time, the clearml lib magic will make the code on the GPU machine, launched by the agent, run...
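In code it looks roughly like this (the queue name is a placeholder):
from clearml import Task

task = Task.init(project_name='my_project', task_name='train')
# everything up to here runs locally and registers the experiment on the server
task.execute_remotely(queue_name='gpu_queue', exit_process=True)
# from here on, the code only runs on the agent that pulled the task from 'gpu_queue'
# ... actual training code ...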
Feels like Docker/Kubernetes are a better fit for that purpose ...
nope, we are self-hosted in Azure
oh ..... did not know about that ...
please provide the full logs and error message.
@<1523701087100473344:profile|SuccessfulKoala55> Thanks. Managed to get it working now with
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/zscaler.crt
(Ubuntu system)
Ok I think I found the issue. I had to point the file server to azure storage:
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server: ""
    credentials {"access_key": "REDACTED", "secret_key": "REDACTED"}
}
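For reference, the overall shape is something like this (account/container names are placeholders, and the exact azure:// URI format may depend on your ClearML version; the Azure storage credentials themselves go in the sdk section):
api {
    files_server: "azure://mystorageaccount.blob.core.windows.net/clearml-artifacts"
}
sdk {
    azure.storage {
        containers: [
            {
                account_name: "mystorageaccount"
                account_key: "REDACTED"
                container_name: "clearml-artifacts"
            }
        ]
    }
}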
something like this?
thanks for all the pointers! I will try to have a good play around
interesting, the issue happens with a mamba venv. Now I use a native python venv and it is detected correctly
that format is correct as I can run pip install -r requirements.txt
using the exact same file
Ok. Found the solution.
The important thing is to use this:
Task.add_requirements("requirements.txt")
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
And not:
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
task.add_requirements("requirements.txt")
but then it is still missing a bunch of libraries in the Task (that succeeded) > Execution > INSTALLED PACKAGES
So when I do a clone of that task and try to run the clone, the task fails because it is missing python packages 😞
is task.add_requirements("requirements.txt") redundant?
Does ClearML always look for a requirements.txt in the repo root?
following your example, if the seeds are hard coded in the code, then the git hash will detect whether changes happened and whether the step needs to be run or not
how does it work if I create my pipeline from code? Does the task get the git repo state when first run and use the commit hash and uncommitted changes as a "signature"?
To me the whole point of having a pipeline is to have a system that "knows" the previous state and makes "smart" decisions on what should run and what not. If it's just about if/then/else, then the code already handles all that.
And what I struggle a bit with is finding docs on how it determines the existing state and how it decides what to run, thus the initial question
maybe I will play around a bit and ask more specific questions .... It's just that I cannot find much documentation on how the pipeline caching works (which is the main point of a pipeline?)
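From what I can tell so far, the relevant knob when building a pipeline from code is the per-step cache flag, something like this (project/task names are placeholders):
from clearml import PipelineController

pipe = PipelineController(name='my_pipeline', project='my_project', version='1.0.0')
pipe.add_step(
    name='preprocess',
    base_task_project='my_project',
    base_task_name='preprocess_template',
    cache_executed_step=True,  # reuse the previous run when code + parameters are unchanged
)
pipe.start()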
and in train.py, I have task.add_requirements("requirements.txt")
if you are on github.com, you can use a fine-grained PAT (personal access token) to limit access to the minimum. Although the token will be tied to an account, it's quite easy to swap in another one from another account.
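If the agent needs that token to clone the repo, it can go in the agent's clearml.conf (values are placeholders):
agent {
    git_user: "my-github-username"
    git_pass: "github_pat_XXXXXXXX"
}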