[root@2c7498711bef elasticsearch]# curl ...
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...
Hi, nice read. Your permalink is wrong though; here's the right one:
https://cpatrickalves.com/mlops-what-it-is-and-why-does-it-matter
Hi, this is the setup.
Client:

from clearml import Task, Logger

task = Task.init(project_name='DETECTRON2', task_name='Train', task_type='training')
task.set_base_docker("quay.io/fb/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser")
task.execute_remotely(queue_name="single_gpu", exit_process=True)
k8s_glue_example.py spawned a pod, which started running.
ClearML UI -> Experiment -> Results -> Console.
At the top it will pri...
Hi,
I'm running on a Dell ECS storage appliance, which offers S3 compatibility.
Yes, http://ECS.ai is the DNS name of the server.
ClearML-models is the bucket.
Let me try with ip:port.
My assumption is that the agent will have pulled that off the client's clearml.conf.
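For reference, this is the sort of S3 section I assume gets picked up from the client's clearml.conf (host, key and secret below are placeholders; the bucket matches ours):

sdk {
    aws {
        s3 {
            credentials: [
                {
                    # non-AWS endpoint, e.g. the ECS appliance as ip:port
                    host: "<ip>:<port>"
                    bucket: "clearml-models"
                    key: "<access_key>"
                    secret: "<secret_key>"
                    multipart: false
                    secure: true
                }
            ]
        }
    }
}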
Hi, for both of them, args.lastiter is the exact same value. But when plotted out, they are actually 2 iterations apart.
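In case it matters, this is roughly how I could pin the x value when reporting, so both tasks land on the same iteration (a sketch; lastiter stands in for our args.lastiter):

from clearml import Logger

lastiter = 100  # stand-in for args.lastiter from our script

# Report at an explicit iteration instead of relying on the auto-detected
# counter, which can end up a couple of iterations apart between tasks.
Logger.current_logger().report_scalar(
    title="loss", series="train", value=0.123, iteration=lastiter
)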
I think the default action of the clearml-agent k8s glue when running a task is to create a virtual env and install the dependencies. So I'm just checking how to change that behaviour to use the global (system) packages instead.
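If I understand the docs correctly, something like this in the agent's clearml.conf should do it (hedged; I haven't verified it with the k8s glue yet):

agent {
    package_manager {
        # reuse the packages already installed in the system python
        # instead of installing everything into a fresh venv
        system_site_packages: true
    }
}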
Hi SuccessfulKoala55, just wondering how I can follow up on this.
The server is running only the ClearML components. Could you advise on the ELB part: how should we optimise it?
Hi, it looks like the entire http://clear.ml domain has been offline for more than 12 hours. The main pages and documentation are inaccessible as well.
Oh, this means I have been using the latest agent, which is v1.0.0. The problems were still there.
Can this issue be solved with vault? It doesn't make sense to expose secrets like that.
I see. Can I take it that when the client uses task.execute_remotely(queue_name="1gpu", exit_process=True), none of the content in its clearml.conf will be used except for the api section, and ClearML simply uses whatever is on the agent side?

api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
    # Credentials are generated using the webapp, ...
    # Override with os environment: ...
Yes! I definitely think this is important, and hopefully we will see something there
(or at least in the docs)
Hi AgitatedDove14, any updates in the docs to demonstrate this yet?
It's hard to tell, but the agent change was a significant one. Unless Python versions have something to do with it.
I used an nvcr PyTorch image and instructed ClearML to inherit the global dependencies. No need to install torch, and it works well.
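For reference, this is roughly the client-side setup (the image tag is just an example; it assumes the agent is configured to use system site packages, as above):

from clearml import Task

task = Task.init(project_name='DETECTRON2', task_name='Train', task_type='training')
# NGC PyTorch image: torch is already inside the container, so nothing
# needs to be reinstalled when the agent inherits the global packages.
task.set_base_docker("nvcr.io/nvidia/pytorch:21.03-py3")
task.execute_remotely(queue_name="single_gpu", exit_process=True)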
Do you mean this? Removing containers section: [{'image': 'clearml-agent:latest"', 'env': [{'name': 'PIP_INDEX_URL', 'value': ' '},
I'm also noticing a lot of this while the k8s glue is running:
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
I see, I understand better now. Thanks.
Hi, thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
I have yet to go through ClearML Agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push afterwards, when I clone the task, isn't the ClearML agent unable to pull that script from the git repo?
Hi, I don't think the clearml agent actually ran at that point in time. All I can see in the pod is:
apt install of the libpthread-stubs, libx11, libxau and libxcb1 packages, then pip install of clearml-agent. After the above are successful, the pod just hangs there.
Thanks. Gonna try that out. But I hit another snag: strangely, the agent is not creating the right venv. This is what the agent created:
pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject...
Hi, so this means that if I want to use Kubernetes, I would have to 'manually' install multiple agents on all the worker nodes?
Which clearml.conf is it referring to? I'm executing on my client, and the task is then remotely executed by the agent. Both of them have a ~/clearml.conf.
Is there any way to see an error log from that?
Got that, thanks. Just to understand better: when clearml-data uploads my recursive folder of image data, it converts it into a compressed form with a different folder structure than the original dataset.
When my software pulls the data, I'm returned a str. How would we manipulate the data from there?
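To make the question concrete, this is what I'm doing now. My understanding is that the returned str is a local path to an extracted copy of the dataset, so it can be walked like any directory (the project/name below are placeholders):

from pathlib import Path
from clearml import Dataset

dataset = Dataset.get(dataset_project="DETECTRON2", dataset_name="images")
local_path = Path(dataset.get_local_copy())  # get_local_copy() returns a str

# Walk the extracted folder like a normal directory tree.
for image_file in local_path.rglob("*.jpg"):
    print(image_file)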
I'm also beginning to think this is related to https://clearml.slack.com/archives/CTK20V944/p1620664770492400. Previously, when I set force_repo_requirements_txt=true and system_site_packages: true, it seemed to work. Upgrading to v1.0.2 seems to change things.
If we run all the rank 0 and rank n tasks individually, it defeats the purpose of using ClearML.