SubstantialElk6

117 Questions, 310 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

282 × Eureka!

Answers 310

0 Hi, I Was Using The K8S Glue And It Worked Fine On One Project But Didn't Work On Another. At The Point Just Before A Git Clone Was Executed, I Get The Error

I'm using the k8s Glue.

3 years ago

0 Hi, I'm Attempting To Upgrade My Clearml Server On Offline Env. I Wish To Retain All Existing Data. Can I Check If It Suffice To Just Docker-Compose Down --Remove-Orphans Replace Clearml-Server:Latest And Clearml-Agent-Services:Latest With Latest Pull.

It's 0.17-63.
It doesn't appear on the profile page.

3 years ago

0 I Had A Good Look At All The Introduction Video On Youtube And Had Some Questions. Context: If We Are Going To Deploy And Maintain Clearml Servers Our Self In Azure:

I also think it makes sense that when you do certain definitive CI actions like publish, it would support running some custom scripts.

one year ago

0 Hi I Saw This On The Clearml-Agent Docs But Other Than The Docker Image, I'm Not Sure How To Integrate This With Clearml Py And Clearml-Server. Please Advise.

python k8s_glue_example.py --queue gpu --namespace default
Traceback (most recent call last):
File "k8s_glue_example.py", line 86, in <module>
  main()
File "k8s_glue_example.py", line 80, in main
  namespace=args.namespace,
File "/home/administrator/clearml-agent-k8s/venv/lib/python3.6/site-packages/clearml_agent/helper/base.py", line 239, in __call__
  cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'base_pod...

3 years ago

0 Hi, I Was Trying Out The Steps On This (

Hi, yes, I'm still getting the SSL errors. It looks like some incompatibility with the OS SSL libraries.

3 years ago

0 Hi, How Can I Pass An Env Variable To The Docker That's Running The Agent When I Run This? I'm Having Issues With The Agent's Git Clone Where It Requires Sslverification To Be Disabled. Clearml-Agent Daemon --Gpus 0 --Queue Gpu --Docker --Foreground

Sorry, I take that back. I just realised that this argument only works when running the agent; when you enqueue a task to this agent, the argument is not passed on to the container that the agent spawns.

The same issue applies to the docker image. It reverts to nvidia/cuda:10.1-runtime-ubuntu18.04 despite me setting something else.

3 years ago

0 Hi, Trying To Understand Clearml-Session. I Have An Agent Running On A Machine Monitoring A Queue Then I Ran Clearml-Session --Queue Myqueu --Docker Torch-Image. The Clearml Session Ended Up Tunneling Into The Physical Machine That My Agent Is Running

Hi, I was expecting to see the container rather than the actual physical machine. For example, in the file panel on the left of the Jupyter panel, I see the file contents of the physical machine. I was expecting this to be the container's file system.

3 years ago

0 I Just Getting This In My Agent Run Task. Would Appreciate If Someone Can Advise Where I Externalrequirement Is Pointing At.

Yes, I got that too. This happens when I run the client code on the same machine as the clearml-agent, so I'm wondering if sharing the same clearml.conf causes that problem. Is there a way to specify the clearml.conf location instead of defaulting to ~/clearml.conf?

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

I managed to find out why. The docker image I'm using does not run as the root user, hence the error. But I'm wondering why this is required, since docker best practices do indicate that we should use a non-root user on production images.

3 years ago

Hi, it is missing --docker on the agent. Thanks! The dynamic GPU option is only available with the Enterprise version, right?

3 years ago

0 Hi, How Can I Make A Stage In A Clearml Pipeline Non-Blocking? The Scenario Is That Stages Downstream Needed Runtime Info From The First Stage, However The First Stage Needs To Continue Running To Act As A Monitor For The Other Downstream Stages.

Yes, it is! But ClearML didn't support multi-node training out of the box in a way that streamlines the process, so we are trying to figure out a way to do it.

one year ago

0 Hi, My Devsecops Team Has Raised Some Issues Of Us Deploying Clearml For Use. In Particular, They Are Not Happy With Docker.Sock Configuration As It Would Potentially Expose The Entire Cluster To Unauthorised View. Can We Do Without It?

Hi, thanks. How about the agent: does its docker mode or k8s mode require docker.sock to be exposed?

3 years ago

0 Hi, How Might I Use The Sdk To Pull Parameters Of The Agent's Clearml.Conf Into My Code During Runtime? For Example, If I Wish To Pull The Configuration For Aws.S3.Credentials.Key And Aws.S3.Credentials.Secret?

thanks, let me try that.

3 years ago

0 Hi, I'm Using The K8S Glue And Have A Few Questions.

I think the default action of the clearml-agent k8s glue when running a task is to create a virtual env and install the dependencies. So I'm just checking how to change that behaviour so it uses the globally installed packages instead.

3 years ago

0 Hi, We Are Having An Interesting Issue Here. We Serve Many Users And Each User Has Their Own Credentials In Accessing The Private Git Repo. We Can't Seem To Find A Way For The End User To Pass In Their Git Credentials When They Run Their Codes In Both Age

No i didn't indicate this particular issue on the git issue. Only the apply template.yml is on the issue.

3 years ago

0 Hi, I'm Getting This Long Error When Running

[root@2c7498711bef elasticsearch]# curl `
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...

3 years ago
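
The cluster health output above can also be checked programmatically. The following is a minimal sketch, assuming the JSON has already been fetched as text; the `summarize_health` helper is hypothetical, and only the fields visible in the snippet above are used (the truncated fields are left out).

```python
import json

# Health fields copied from the snippet above; the truncated trailing
# fields are intentionally omitted.
health_json = """
{
  "cluster_name": "clearml",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 4,
  "active_shards": 4,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 8
}
"""


def summarize_health(raw: str) -> str:
    """Return a one-line summary of an Elasticsearch cluster health response.

    Hypothetical helper for illustration only.
    """
    health = json.loads(raw)
    return (
        f"{health['cluster_name']}: status={health['status']}, "
        f"active_shards={health['active_shards']}, "
        f"unassigned_shards={health['unassigned_shards']}"
    )


print(summarize_health(health_json))
```

A "red" status together with a non-zero `unassigned_shards` count is the combination worth flagging when scripting such a check.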

0 Hi, We Recently Upgraded Clearml To 1.1.1-135 . 1.1.1 . 2.14. The Task Init Is

Hi, when I tried ip:port, it references the right host and bucket. But the file is not found on the ECS S3, even though I can see from the logs that it states Completed model upload to s3://ecs.ai:80/clearml-models/artifacts/ ...

3 years ago

0 Hi, Can I Choose Not Print The Clearml-Agent Config Logs In The Console? Reason Is We Are Passing Credentials Via Env Var To The K8S Glue And Its Being Displayed In The Console As ...

Hi, this is the setup.

client
from clearml import Task, Logger
task = Task.init(project_name='DETECTRON2', task_name='Train', task_type='training')
task.set_base_docker("quay.io/fb/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser")
task.execute_remotely(queue_name="single_gpu", exit_process=True)
k8s_glue_example.py spawned a pod and starts running.

ClearML UI -> Experiment -> Results -> Console.
` At the top it will pri...

3 years ago
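
The image-plus-`--env` string passed to `set_base_docker` in the setup above can be assembled from a dictionary rather than typed by hand. This is only a sketch: `build_base_docker` is a hypothetical helper, not part of ClearML, and the values are the placeholders from the snippet above.

```python
from typing import Dict


def build_base_docker(image: str, env: Dict[str, str]) -> str:
    """Join an image name and env vars into 'image --env K=V ...' form.

    Hypothetical helper; it only builds the string and does not call ClearML.
    """
    parts = [image]
    for name, value in env.items():
        parts.append(f"--env {name}={value}")
    return " ".join(parts)


docker_cmd = build_base_docker(
    "quay.io/fb/detectron2:v3",
    {
        "GIT_SSL_NO_VERIFY": "true",
        "TRAINS_AGENT_GIT_USER": "testuser",
        "TRAINS_AGENT_GIT_PASS": "testuser",
    },
)
print(docker_cmd)
```

The resulting string is identical to the one in the setup above, so it could be passed to `task.set_base_docker(docker_cmd)` in the same way.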

0 Hi, I Started My Agent Using. Clearml-Agent Daemon --Gpus 0 --Queue Gpu --Docker --Foreground, With The Following Parameters In Clearml.Conf.

Hi,
It did, nvidia/cuda:10.1-runtime-ubuntu18.04.

So if i need to set this every time, what is the following config for? And how do i pass in new env parameters?
` default_docker: {
    # default docker image to use when running in docker mode
    image: "dockerrepo/mydocker:custom"

    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `

3 years ago

0 Hi, I Have A Future Roadmap Question On Clearml-Datasets. The Current Implementation Works Well For Small Datasets But It's Rather Ineffective For Very Large Datasets. For Example, Let's Say I Have 10 Million Images Just For The Training Dataset, And My T

This one can be solved with shared cache + pipeline step, refreshing the cache in the shared cache machine.

Would you have an example of this in your code blogs to demonstrate this utilisation?

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

It's actually in your documentation. It has been removed since 0.17, apparently.
https://allegro.ai/clearml/docs/docs/release_notes/ver_0_17.html#clearml-agent-0-17-2

And these are my logs. It tried to install something and encountered a permission denied error. It wouldn't have if it had obeyed force_repo_requirements_txt.

1620664917916 Kahs-MacBook-Pro.local info ClearML Task: created new task id=024a421c0e174650a1c7ff64af756c26 ClearML results page: `
1620664920359 Kahs-MacBook-Pro.local info ClearML Mon...

3 years ago

0 Sorry Folks Too Many Questions - If I Have A Project (And I Set The Output Uri In It While Creating, To A S3 Folder) How Can I Ensure That A Experiment (Task) That I Run On My Local Outputs The Model To The Uri?

Hi, I was reading this thread and wondered in which versions of clearml-server and clearml-agent this took effect?

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

Which clearml.conf is it referring to? I'm executing on my client, and the task is then remotely executed by the agent. Both of them have a ~/clearml.conf.

3 years ago

0 Hi, Is There A Command I Can Use To Generate A Report That Can

ok thanks! will try it out.

3 years ago

0 Hi, I Am Trying To Use Clearml-Data To Upload My Data To S3, Which Is Password Protected. How Should I Indicate The Credentials After I Set --Storage S3://.... ?

Got that, thanks. Just to better understand: when clearml-data uploads my recursive folder of image data, it converts it into a compressed form with a different folder structure than the original dataset.

When my software pulls the data, I'm returned a str. How would we manipulate the data from there?

3 years ago
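
One way to work with the returned str, assuming it is a path to a local folder holding the extracted files (as is the case when it comes from `Dataset.get(...).get_local_copy()` in the ClearML SDK), is to walk it with `pathlib`. This is a minimal sketch; the `list_images` helper and the suffix set are illustrative assumptions, not part of ClearML.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(local_copy: str) -> List[Path]:
    """Recursively collect image files under the dataset's local folder.

    `local_copy` is the str returned when the dataset is pulled
    (assumed here to be a directory path).
    """
    root = Path(local_copy)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each returned `Path` can then be opened with whatever image loader the training code already uses.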

0 Hi, Several Changes Occurred Recently And I Would Like To Know If There's A Way To Verbose Catch All The Printout That Happening Within A K8S Glue Spawned Pod. We Have An Issue Where All Of Our New Remote_Execution Tasks Are Stuck In The 'Pending' Stage.

OK. Any idea what can go on between setting up the clearml-agent and initialising the clearml-agent itself? Does the clearml-agent try to communicate with any internet address? From another perspective, it looks like a long timeout issue. I happen to be deploying on a disconnected on-premise setup.

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

It would make sense on a very large resource cluster. Unfortunately, we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and would not help with efficiency. Or would you have a suggestion?

3 years ago

0 Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'm Getting

The problem was resolved by doing a git push. Somehow the git diff didn't capture the difference in requirements.txt in the project. I can't reproduce the issue after this either.

3 years ago

I can't seem to find the version number on the clearml web app. Is there a specific way?

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Hi Jake, thanks for the suggestion, let me try it out.

3 years ago
