I think you are correct, None values should be listed as empty values, not the string "None".
What's the clearml version you are using? And could you retest with the latest RC?
is the base Task a file or a notebook ?
Hi, is there a possibility to use one GPU card with 2 agents concurrently?
RoundMosquito25 / EnviousPanda91
You need to change the WORKER_ID (no two workers can share the same ID):
CLEARML_WORKER_ID="machine:gpu01" clearml-agent daemon ....
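For example, a minimal sketch of two agents sharing GPU 0 (queue name, worker IDs and the extra flags are illustrative, not from this thread):
# illustrative: two agents on the same GPU, each with a unique worker ID
CLEARML_WORKER_ID="machine:gpu0-a" clearml-agent daemon --queue default --gpus 0 --detached
CLEARML_WORKER_ID="machine:gpu0-b" clearml-agent daemon --queue default --gpus 0 --detached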
what format should I specify it in?
requirements.txt format e.g. ["package >= 1.2.3"]
Would this enforce that package on various components
This is a per-component control, so you can have different packages / containers based on the component
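For example, a minimal sketch assuming the decorator-based pipeline API (package name and docker image are just illustrative):
from clearml.automation.controller import PipelineDecorator

# illustrative: each component pins its own packages / container,
# overriding the auto-detected requirements for that component only
@PipelineDecorator.component(
    packages=["pandas >= 1.2.3"],   # requirements.txt style entries
    docker="python:3.9-slim",       # optional per-component container
)
def preprocess(csv_path: str):
    import pandas as pd
    return pd.read_csv(csv_path).dropna()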
Would it then no longer capture import statements?
This replaces the auto-detected packages, but obviously the auto-detection fails to pick up your internal repo package, which is the main issue here.
How is "internal package" installed, in o...
Hi @<1600661428556009472:profile|HighCoyote66>
However, we need to allocate resources to ourselves manually, using an srun command or sbatch
Long story short, there is a full SLURM integration: you push a job into the ClearML queue and it produces a SLURM job that uses the agent to set up the venv/container and run your Task. However, this is only part of the enterprise version 😞
You can however do the following (notice this is ...
EnviousStarfish54 something is also off in the git detection, it has no remote address, it just says "origin"
Any chance you have no git server ?
Regarding the installed packages, any chance you can send some sample code for me to debug ?
Hi HandsomeCrow5 hmm interesting use case,
we have seen html reports as artifacts, then you can press "download" and it should open in another tab, what would you expect on "debug samples" ?
with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.
This is odd, could you send the full Task log?
So the way it works: when you run a component, the return value together with the entire function execution is cached. Basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs ❌
you just need to do:
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
This will return the URL of the uploaded images (i.e. S3 bucket)
which means if this is cached you will get it...
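Putting it together, a minimal sketch of such a component (the component name, the cache flag and img_dir are illustrative):
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True)  # illustrative component
def generate_images(img_dir: str) -> str:
    # ... code that writes images into img_dir ...
    # upload explicitly, then return the artifact URL so that even a cached
    # run hands the next step a valid link to the stored images
    PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
    return Task.current_task().artifacts['images'].url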
Hi FierceHamster54
Thanks for bringing it up 🙂
... in terms of secret management / key-value stores
Currently the open-source version does not include Vault support (i.e. secret management); this was added to the enterprise version a few releases ago. As far as I understand it is a per user/project/company granularity feature (i.e. company-wide settings merging with project settings merging with user-specific ones).
Is this what you are looking for or am I missing something ?
I'm not sure I'm the right person to answer that, but yes my understanding is that this is a Scale/Enterprise tier feature, at least for the time being.
Hi @<1624941407783358464:profile|GrievingTiger47>
I think you should try to contact the sales guys here: None
Task.running_locally()
Should do the trick
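For example, a minimal sketch (project/task names are illustrative):
from clearml import Task

task = Task.init(project_name="examples", task_name="demo")  # illustrative names
if Task.running_locally():
    # runs only when executed manually, not when a clearml-agent runs the Task
    print("running locally")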
MysteriousBee56 Edit your ~/trains.conf, changing
api_server: http://localhost:8008
to
api_server: http://192.168.1.11:8008
and obviously do the same for the web & files servers
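i.e. something like this in ~/trains.conf (a sketch assuming the default ports; the IP is the example from above):
api {
    # illustrative values, adjust to your server's address
    api_server: http://192.168.1.11:8008
    web_server: http://192.168.1.11:8080
    files_server: http://192.168.1.11:8081
}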
I'll make sure we fix the trains-agent to output an error message instead of trying to silently keep accessing the API server
Getting your machine IP:
just run: ifconfig | grep 'inet addr:'
Then you should see a bunch of lines; pick the one that does not start with 127 or 172.
Then to verify, run: ping <my_ip_here>
FranticCormorant35 DeterminedCrab71 please continue the discussion in this thread
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that, as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, which is why it "misses" the failed test (because, as mentioned, all exceptions are caught). This should be fixed in the next RC
This looks good to me...
I will have to look into it, because it should not download it...
Is there any documentation on versioning for Datasets?
You mean how to select the version name ?
HappyLion37 did you check the https://github.com/allegroai/trains/tree/master/examples/services/hyper-parameter-optimization ?
You can very quickly get it distributed as well
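If it helps, a minimal sketch of the optimizer (the base task id, parameter name, metric names and queue are all illustrative):
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",   # the template experiment to clone
    hyper_parameters=[
        UniformIntegerParameterRange("General/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",            # any agent listening on this queue picks up the clones
    max_number_of_concurrent_tasks=4,
)
optimizer.start()
optimizer.wait()
optimizer.stop()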
the Task scheduler itself is a Task. What we did is add a new parameter section on the Task (via the task.connect call), so that we can later clone it, modify the values, and use the new values at runtime
(Task.connect will put the data from the Task/UI back into the dict when the agent is running the Scheduler)
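For example, a minimal sketch (section name and values are illustrative):
from clearml import Task

task = Task.init(project_name="services", task_name="my scheduler")  # illustrative names
params = {"schedule_hour": 3, "target_queue": "default"}
task.connect(params, name="scheduler_args")
# when a clearml-agent runs a clone of this Task, `params` is filled with
# whatever was edited in the UI, and the code below sees the new values
print(params["target_queue"])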
Does that make sense?
Should work out of the box, maybe the only thing to notice is that you will get a Task for every local_rank 0 process
does that make sense ?
That means I need to pass a single zip file to the path argument in add_files, right?
Actually the opposite: you pass a folder (of files) to add_files. add_files then remembers the files' location (and pre-calculates the hash of each file's content). When you call upload, it will actually compress the files that changed into a zip file (or several, depending on the chunk size) and upload them to the destination (as specified in the upload call...
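i.e. a minimal sketch (project/dataset names and the folder path are illustrative):
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="images")  # illustrative names
ds.add_files(path="./my_images_folder")   # pass the folder; file hashes are pre-calculated here
ds.upload()                               # only changed files are compressed and uploaded
ds.finalize()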
Is the error consistent? Meaning, does it happen with other integer values as well?
Hi @<1533620191232004096:profile|NuttyLobster9>
First nice workaround!
Second could you send the full log? When the venv is skipped then pytorch resolving should be skipped as well, and no error should be raised...
And lastly, could you also send the log of the task that executed correctly (the one you cloned)? Because you are correct, it should have been the same
RipeGoose2 yes that will work 🙂
That said, we should probably fix the S3 credentials popup 😉
this is very odd, can you post the log?
Hi RipeGoose2
Can you try with the latest from git?
pip install -U git+