Hi CostlyOstrich36, that's correct.
It's a local deployment. I was only presented with a username field, without needing to enter a password. Once I'm in, I don't see an option in my profile to set a password either. There's also no LDAP integration, for example.
The problem was resolved by doing a git push. Somehow the git diff didn't capture the change to requirements.txt in the project. I can't reproduce the issue after this either.
Hi AgitatedDove14, that's what I am trying to figure out as well. The task has nothing to do with torch, and requirements.txt doesn't contain any torch packages either.
Hi, I can't seem to find the source. In what kinds of situations will it try to install torch outside of the user requirements?
Hi.
We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = s3://ecs.ai:443/bucketname
This time round the client gave this error.
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: 'http://ecs.ai/bucketname/.clearml.test'.
It's quite apparent that whatever ClearML passes to boto3 ends up as an HTTP call instead of HTTPS, which is wrong.
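For reference, this is roughly the clearml.conf S3 block we're testing against (placeholder keys); I'm assuming the secure flag is what decides whether boto3 gets an http or https endpoint:
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "ecs.ai:443"     # our ECS S3 endpoint
                    key: "ACCESS_KEY"      # placeholder
                    secret: "SECRET_KEY"   # placeholder
                    secure: true           # assumption: forces https for this endpoint
                    multipart: false
                }
            ]
        }
    }
}
```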
Hi. If we disable the API service, how will it affect the system? And how do we disable it?
Hi HelpfulDeer76 , I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
Hi,
Basically, I run this block first and then end the script:
```
task = Task.init(project_name="afro-nmt", task_name=args.taskname, continue_last_task=args.taskid)
Logger.current_logger().report_scalar(title="BLEU", series="JW300", value=args.jwbleu, iteration=args.lastiter)
```
Then I run another script, with a different series:
```
task = Task.init(project_name="afro-nmt", task_name=args.taskname, continue_last_task=args.taskid)
Logger.current_logger().report_scalar(title="BLEU", series="SS900", value=arg...
```
It didn't work as expected.
```
task init
task report iter 10
task init
task report iter 10
```
The second task pushed the reporting iteration to 20 instead.
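For context, a stripped-down version of the second script with what I assume is the right knob (set_initial_iteration) to stop the offset from being added:
```python
from clearml import Task, Logger

task = Task.init(project_name="afro-nmt", task_name="eval", continue_last_task="TASK_ID")  # placeholder id
# Assumption: reset the iteration offset so reported iterations are absolute,
# instead of continuing from the previous run's last iteration.
task.set_initial_iteration(0)
Logger.current_logger().report_scalar(title="BLEU", series="SS900", value=0.0, iteration=10)
```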
Hi TimelyPenguin76, I am adding a debug sample to an existing task using the above method. What should I put for the iteration? I don't want to overwrite existing ones, but I don't know what the last count is. This applies to both scalar and media reporting.
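Something along these lines is what I have in mind, assuming get_last_iteration gives me the last recorded count:
```python
from clearml import Task

task = Task.get_task(task_id="TASK_ID")  # placeholder id of the existing task
last_iter = task.get_last_iteration()    # assumption: last iteration already recorded on the task
task.get_logger().report_media(
    title="debug", series="sample", iteration=last_iter + 1, local_path="sample.png"
)
```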
It would make sense on a very large resource cluster. Unfortunately we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and not help with efficiency. Or would you have a suggestion?
Ok, let me check this out first thing on Monday, thanks AgitatedDove14 .
Ok that works. thanks.
Can this issue be solved with vault? It doesn't make sense to expose secrets like that.
Previously we had similar issues when we switched the images used in the agent. You might want to check on that.
Thanks. This appears to be solely for the web UI and API. What if I want to orchestrate on K8s?
Thanks TimelyPenguin76 , let me try it out now.
Hi, when I tried ip:port, it references the right host and bucket... BUT... the file is not found on the ECS S3, even though I can see from the logs that it states Completed model upload to s3://ecs.ai:80/clearml-models/artifacts/ ...
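One way to double-check (placeholder credentials; same endpoint the upload log points at) would be to list the objects directly with boto3 and see whether the artifact is actually there:
```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ecs.ai:80",     # same host:port the upload log reports
    aws_access_key_id="ACCESS_KEY",      # placeholder
    aws_secret_access_key="SECRET_KEY",  # placeholder
)
resp = s3.list_objects_v2(Bucket="clearml-models", Prefix="artifacts/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```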
I did notice that .clearml_agent.xxxxx.cfg does not exist in the tmp folder.
And out of curiosity, what did you think we were talking about? Because I didn't see anywhere else that might print the secrets.
Thought this looked familiar.
https://clearml.slack.com/archives/CTK20V944/p1635323823155700?thread_ts=1635323823.155700&cid=CTK20V944
From an efficiency perspective, we should be pulling data as we feed it into training. That said, it's always a good idea to uncompress large zip files and store them as smaller ones, so you can batch-pull them for training.
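For example, with ClearML datasets I would expect something along these lines, assuming part/num_parts behave the way I think they do:
```python
from clearml import Dataset

dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")  # placeholders
# Assumption: pull the dataset in smaller parts instead of one huge zip,
# so training can start consuming data while later parts are still downloading.
for part in range(4):
    chunk_path = dataset.get_local_copy(part=part, num_parts=4)
    # ... feed the files under chunk_path into the training loader ...
```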
And is there any roadmap on this? The organisation is firm on SSH auth. This could end up making it impossible to use ClearML for remote execution.
I also think it makes sense that when you perform certain definitive CI actions like publish, it would support running some custom scripts.
Ok thanks.
The first stage is a rank0 PyTorch script. The downstream stages are rankN scripts; they wait for the IP address of the first stage. But the first stage doesn't return, it simply waits for the rankN scripts to connect to it, so in this setup the rankN scripts never start. It's probably necessary to have just a single stage.
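A minimal sketch of what I mean by a single stage (hypothetical script, single-node assumption): spawn rank0 and all rankN processes from one task, so rank0 never blocks a separate pipeline step:
```python
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Every rank joins the same process group; rank 0 acts as the rendezvous point.
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # single-node assumption
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... per-rank training code ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4  # assumed number of ranks
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```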
If I were to start a single rank0 task and subsequent rankN tasks, it would be rather messy on the ClearML dashboard. Best to have either a single ClearML application...
Ok. That brings me back to the spawned pod. At this point, clearml-agent and its config would be a contributing factor. Is the absence of /tmp/.clearml_agent.xxxxxx.cfg an issue?