IntriguedRat44 If the monitoring only shows a single GPU (the selected one), it means it is reading the correct CUDA_VISIBLE_DEVICES (this is how it knows you are only using the selected GPU rather than all of them).
There is nothing else in the code that will change the OS environment.
Could you print os.environ['CUDA_VISIBLE_DEVICES'] while running the code to verify?
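For example, a minimal check (just a sketch; where exactly you print it inside your training script is up to you):
import os
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))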
CrookedWalrus33 this is odd I tested the exact same code.
I suspect something with the environment maybe?
What's the python version / OS? Also, can you send a full pip freeze?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
Yes this is odd, it should add the content-type of the file (for example "application/x-tar"), but you are getting N...
LOL, okay, I'm not sure we can do something about that one.
You should probably increase the storage on your instance 🙂
Found it
GiganticTurtle0 you are 🧨 ! thank you for stumbling across this one as well.
Fix will be pushed later today 🙂
- Could you explain how I can reproduce the missing jupyter notebook (i.e. the ipykernel_launcher.py)
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value in a section name called "new section"
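A minimal sketch of the difference (my_params / my_config are just placeholder names):
from clearml import Task

task = Task.init(project_name='examples', task_name='config sections')

# flat key/value pairs show up as hyper-parameters under the "new section" section
my_params = {'lr': 0.001, 'batch_size': 32}
task.connect(my_params, name='new section')

# the whole dict is stored as a configuration object named "my_section_name"
my_config = {'model': {'layers': 4, 'dropout': 0.1}}
task.connect_configuration(my_config, name='my_section_name')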
RipeGoose2 you can put it before/after the Task.init; the idea is for you to set it before any of the real training starts.
As for not affecting anything:
Try adding the callback and just have it return None (which means the model logging step is skipped). Let me know if this one works.
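If it helps, a rough sketch of what I mean, assuming you are registering a WeightsFileHandler pre-callback (adjust to whatever callback hook you are actually using):
from clearml.binding.frameworks import WeightsFileHandler

def skip_model_logging(operation_type, model_info):
    # returning None skips logging this model checkpoint
    return None

WeightsFileHandler.add_pre_callback(skip_model_logging)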
RipeGoose2 That sounds familiar. Could you test with the latest RC?
pip install trains==0.16.4rc0
SoreDragonfly16 could you reproduce the issue?
What's your OS? trains versions?
I made a custom image for the VMSS nodes, which is based on Ubuntu and has multiple CUDA versions installed, as well as conda and docker pre-installed.
This is very cool, any reason for not using docker for the multiple CUDA versions?
And still a difference between A/B, one detecting the repo, the other does not?
Actually with
base-task-id
it uses the cached venv, thanks for this suggestion! Seems like this is equivalent to cloning via UI.
exactly !
But “cloning” via UI runs an exact copy of the code/config, not a variant.
You can override the commit/branch and get the latest ...
run exp, tweak code/configs in IDE, or tweak configs via CLI, have it re-run in the exact same venv (with no install overhead etc.)
So you can actually launch it remotely directly from the code:
...
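Leaving the elided snippet aside, a minimal sketch of launching remotely from code (the queue name is just an example):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# stops local execution here and enqueues this exact code/config for an agent
task.execute_remotely(queue_name='default', exit_process=True)
# ... the actual training code below only runs on the remote agent ...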
I can read them programmatically using tensorboard and then log them using the clearml logger,
StaleButterfly40 this will be a great script to put somewhere (I'm sure you are not the only one with this problem). Maybe put it as a GitHub issue ? wdyt ?
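For reference, a rough sketch of that approach, assuming the standard tensorboard EventAccumulator (the log dir path is a placeholder):
from clearml import Task
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

task = Task.init(project_name='examples', task_name='import tb events')
logger = task.get_logger()

ea = EventAccumulator('/path/to/tb/logdir')  # placeholder path
ea.Reload()
for tag in ea.Tags().get('scalars', []):
    for event in ea.Scalars(tag):
        logger.report_scalar(title=tag, series=tag, value=event.value, iteration=event.step)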
Hmm I see, add this for example
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
basically
would allow blocking the machine from being scaled-in when
Oh this is what I was missing 🙂 That makes sense to me!
So what you are saying is that when the AWS autoscaler agent is launching a Task, inside the container you will set the "protection flag", and when the Task ends, you will unset the "protection flag".
Is that correct?
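If that is the flow, a rough sketch of what the flag itself could look like, assuming AWS scale-in protection via boto3 (instance id and group name are placeholders):
import boto3

asg = boto3.client('autoscaling')

def set_scale_in_protection(instance_id, group_name, protected):
    # protected=True blocks the autoscaling group from terminating this instance mid-Task
    asg.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ProtectedFromScaleIn=protected,
    )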
Hi DeterminedToad86
I just verified on a clean sagemaker instance, everything should just work, see here: https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
Yes, if you have more than one file (either notebook or python script) then you must have a git repo, in order to run the task using the Agent.
how did you try to restart them?
Yes, but how did you restart the agent on the remote machine?
Hi @<1560798754280312832:profile|AntsyPenguin90>
The image itself is uploaded in a background process, flush just triggers the starting of the process.
Could it be that it is showing a few seconds after?
Hi TrickySheep9
Hmm I think you are correct, exit remotely will not work inside a jupyter notebook because it will not be able to close it.
I was just revising workflows that might be similar, wdyt?
https://clearml.slack.com/archives/CTK20V944/p1620506210463400?thread_ts=1614234125.066600&cid=CTK20V944
WickedGoat98 are you running the agent with --gpus ?
What’s interesting to me (as a ClearML newbie) is it’s clearly compiling that wheel using my host machine (MacOS).
Hmm kind of, and kind of not.
If you take a look at the Tasks created (regardless of how they are created: pipeline, manually, etc.), you have a list of python packages required by the code, as they are detected at runtime (i.e. when the code was first executed, on the development machine). When creating a Pipeline controller (runner), the pipeline Tasks are just lists, ...
looks like a great idea, I'll make sure to pass it along and that someone replies 🙂
If this is the case, why not have the stream process call the REST API, then move forward with the result? This way it scales out of the box. The main "conceptual" difference is that the REST API is used internally, and the upside is that the event-stream processing becomes part of the application layer, not tied to the compute cost of the model, wdyt?
Basically what I want is a
clearml-session
 but with a docker container running JupyterHub instead of JupyterLab.
I missed that 🙂
The idea of clearml-session
is to launch a container with jupyterlab (or vscode) on a remote machine, and connect the user's machine (i.e. the machine executing the clearml-session
CLI) directly into the container.
Replacing the jupyterlab with JupyterHub would be meaningless here, because the idea is that it spins an instance (contai...
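For context, the usual single-user flow is something like this (the docker image name is just an example, assuming the standard --docker / --queue options):
clearml-session --docker nvidia/cuda:11.6.2-runtime-ubuntu20.04 --queue default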
BroadSeaturtle49 agent RC is out with a fix:
pip3 install clearml-agent==1.5.0rc0
Let me know if it solved the issue
WickedGoat98 what's the clearml version you are using?
ReassuredTiger98 could you provide more information ? (versions, scenario. etc.)
Hmm that is odd. Let me take a look and ask the guys. Thank you for quickly testing the RC! I'm hoping a new RC with a fix will be there tomorrow, if we can quickly replicate