Three options:
1. In your code: `Task.init(..., output_uri='s3://.../')` (see the sketch below)
2. Configure a default output_uri to be used by all tasks: https://github.com/allegroai/clearml/blob/64042f6c4fdaaf15b6c5f816f2fbf50f89c313e2/docs/clearml.conf#L156
3. In the UI, after you clone a Task, under the Execution tab: "Output" > "Destination"
In all cases output_uri can be:
- /mnt/share/folder (if you have a shared folder between all machines)
- http://trains-server:8081/
- gs://bucket
- azure://bucket/
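For example, a minimal sketch of option 1 (the project/task names and the bucket path `s3://my-bucket/artifacts/` are just placeholders):

```python
from clearml import Task

# Everything this task outputs (artifacts, models) will be uploaded to the given destination.
task = Task.init(
    project_name='examples',
    task_name='output-uri-demo',
    output_uri='s3://my-bucket/artifacts/',  # placeholder; could also be gs://, azure://, or /mnt/share/folder
)
```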
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
"Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs."
Actually I am as well; this is Kubernetes doing the resource scheduling, and Kubernetes actually decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time).
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and Memory Protection:
Notice you can control ...
Does StorageManager.upload and upload_artifact use the same methods?
Yes they both use StorageManager.upload
Is the only difference the task being async?
Two differences:
1. The upload being async
2. Registering the artifact on the experiment
StorageManager will only upload, whereas upload_artifact will make sure the file is registered as an artifact on the experiment, together with all of the artifact's properties.
What will I do to fix my problem?
What is the problem? We just proved the upload speed is just fine.
BTW: server-side vault is in progress, hopefully will be available in the upcoming releases :)
upload_artifact will actually do two things:
1. Upload the file to the trains-server
2. Register it as an artifact on the experiment
What did you mean by "register the artifact manually"? You still need to upload the file to the trains-server (so it is later accessible)
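To make the difference concrete, a rough sketch (the project/task names, file paths, and the destination bucket are placeholders):

```python
from clearml import Task, StorageManager

task = Task.init(project_name='examples', task_name='artifact-demo')

# StorageManager only uploads the file; nothing is registered on the experiment.
remote_url = StorageManager.upload_file(
    local_file='/tmp/localfile',
    remote_url='s3://my-bucket/files/localfile',  # placeholder destination
)

# upload_artifact uploads (asynchronously) AND registers the file as an artifact
# on the experiment, together with its properties.
task.upload_artifact(name='test', artifact_object='/tmp/localfile')
task.flush(wait_for_uploads=True)  # wait until the background upload completes
```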
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check whether it works if you add the credentials in the profile page?
I'm assuming TF was not part of the original requirements, and was automatically pulled by one of the packages, hence the latest version ....
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there, we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense?
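One way around it (a sketch, assuming pandas is simply missing from the remote run's detected requirements; project/task names are placeholders) is to declare the requirement explicitly before Task.init:

```python
from clearml import Task

# Ensure pandas is installed on the remote machine even if it is not
# detected from the local imports. Must be called before Task.init().
Task.add_requirements('pandas')

task = Task.init(project_name='examples', task_name='remote-artifact-read')
```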
Yes, TrickySheep9, use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Is that normal or a possible bug?
This sounds like xgboost's internal format; it makes sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
Hmm interesting, will pass it along to FE 🙂
3. That is nice! I wonder if this is built into the graph library
PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.
JitteryCoyote63 I found it 🙂
Are you working in docker mode or venv mode?
But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
That depends, how would the sibling packages get to a remote machine?
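For reference, a minimal sketch of the sys.path approach mentioned above (the folder layout and the package name `sibling_lib` are hypothetical):

```python
import sys
from pathlib import Path

# Assume the script lives in <repo>/scripts/ and the sibling package in <repo>/sibling_lib/.
repo_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(repo_root))

import sibling_lib  # hypothetical sibling package, now importable on the remote machine as well
```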
In my understanding requests still go through `clearml-server`, which configuration I left
DefiantHippopotamus88 actually this is Not correct.
clearml-server only acts as a control plane; no actual requests are routed to it. It is used to sync model state, stats etc., and is not part of the request processing flow itself.
"curl: (56) Recv failure: Connection reset by peer"
This actually indicates port 9090 is not being listened to...
What's the final docker-compose you are usi...
Yes, that's the reason. Basically there is a background thread analyzing the code; at the end of the execution, if it is still running (hence the question regarding execution time), we give it an extra 10 seconds to come up with answers, otherwise we terminate it so the code won't get stuck. Makes sense to you?
That's the theory, I still see it is not there
feature request: tell me what gets passed along each edge of the pipeline graph
Nice! please feel free to add to GH issue 🙂
Hmm yes that is odd, let me see if I can reproduce
Hi SparklingHedgehong28
What would be the use for an "end of docker hook"? Is this like an abort callback? Completion?
instance protection
Do you mean like when an instance just died (like spot in AWS)?
😞 DilapidatedDucks58 how exactly are you "relaunching/continue" the execution? And what exactly are you setting?
Yea the "-e ." seems to fit this problem the best.
👍
It seems like whatever I add to `docker_bash_setup_script` is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export test in the `docker_bash_setup_script`:
export MY...
is everything on the same network?