Hi RotundHedgehog76
Notice that the "queued" refers to the state of the Task, as well as the tag
We tried to enqueue the stopped task at the particular queue and we added the particular tag
What do you mean by specific queue ? Will this trigger on any Queued Task with the 'particular-tag' ?
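For reference, a minimal sketch of what such a trigger could look like with clearml.automation (the exact kwargs are my assumption based on the TriggerScheduler interface, and the Task ID/queue names are placeholders, so verify against your clearml version):
```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3.0)
trigger.add_task_trigger(
    name="on-queued-tagged-task",            # hypothetical trigger name
    trigger_tags=["particular-tag"],         # fire only on Tasks carrying this tag
    trigger_on_status=["queued"],            # fire only when a Task enters the "queued" state
    schedule_task_id="<task-id-to-launch>",  # placeholder for the Task to clone & run
    schedule_queue="default",
)
trigger.start()
```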
this is not the case as all the scalars report the same iterations
MassiveHippopotamus56 could it be the machine statistics? (i.e. cpu/gpu etc. these are considered scalars as well...)
BTW: CloudyHamster42 I think this issue was discussed on GitHub, and the final "verdict" was that we should have an option to split/combine graphs on the UI side (i.e. similar to the "smoothing" or wall-time axis etc.)
Hi WackyRabbit7 ,
Running in Docker mode gives you greater flexibility in terms of environment control, from switching cuda versions to pre-compiled packages that are needed (think apt-get) etc. Specifically for DL, if you are using multiple tensorflow versions, they are notorious for compiling against a specific CUDA version, and the only easy way to switch between them would be different dockers. If you are a PyTorch user, then you are in luck, they have all the pytorch ver...
This means that if something happens with the k8s node the pod runs on,
Actually if the pod crashed (the pod, not the Task) k8s should re-spin it, no?
I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.
From the k8s perspective, if the task ended (failed/completed) it always returns with exit code 0, i.e. success, because the agent was able to spin the Task. We do not want Tasks with exceptions to litter the k8s with endless r...
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console ?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
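If you want to actively test whether events reach the server, a quick sketch (this assumes Task.init was already called in the process):
```python
from clearml import Task

task = Task.current_task()  # the Task already initialized in this process
# Block until all pending metric events are flushed to the server;
# connection retries (if any) should show up on the console
task.flush(wait_for_uploads=True)
```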
Hi DrabCockroach54
I think the Kubernetes integration (k8s glue) is not part of the open-source features, and is only available as an enterprise feature 😞
RoundMosquito25 good news, no need to open any ports 🙂
Basically the agents are always polling the server for "jobs"; the connection is an http/s request from them to the server, so all connections are outgoing connections. Firewall is intact 🙂
Hmm seems like everything is working, can you check in the UI if you see the serving session ID in the DevOps project? Maybe there are two, and you configured one and the docker-compose is running another ?
Hi ConvolutedSealion94
Just making sure, you spun up the docker-compose of clearml-serving as well ?
yes, I do, I added a
auxiliary_cfg
and I saw it immediately both in CLI and in the web ui
How many Tasks do you see in the UI in the DevOps project with the system tag SERVING-CONTROL-PLANE ?
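You could also check programmatically; a sketch, assuming task_filter accepts a system_tags field (worth verifying against your clearml version):
```python
from clearml import Task

serving_sessions = Task.query_tasks(
    project_name="DevOps",
    task_filter={"system_tags": ["SERVING-CONTROL-PLANE"]},  # assumed filter field
)
print(serving_sessions)  # list of matching Task IDs - you'd expect exactly one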
TBH our Preprocess class has an import in it that points to a file that is not part of the preprocess.py so I have no idea how you think this can work.
ConvolutedSealion94 actually you can add an entire folder as preprocessing, including multiple files
See example des...
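Roughly, the folder could look like this (file and function names here are just for illustration, check the clearml-serving examples for the exact Preprocess interface):
```python
# preprocess_folder/preprocess.py
# 'helpers' is another file (helpers.py) shipped in the same preprocessing folder
from helpers import normalize

class Preprocess(object):
    def preprocess(self, body, state, collect_custom_statistics_fn=None):
        # Turn the raw request payload into model input
        return normalize(body["data"])

    def postprocess(self, data, state, collect_custom_statistics_fn=None):
        # Turn the model output into a serializable response
        return {"result": list(data)}
```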
AttractiveCockroach17 could it be Hydra actually kills these processes?
(I'm trying to figure out if we can fix something with the hydra integration so that it marks them as aborted)
clearml-task
seems to not allow me to pass the
run
argument without a value
EnviousStarfish54 did you try --args run=True
I'm assuming run is a boolean of a sort ?
a. The submitted job would automatically download data from an internal data repository, but it will be time-consuming if the data is re-downloaded every time. Does ClearML cache the data somewhere?
What do you mean by the agent will download the data ? Are you referring to Dataset ?
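If you mean a ClearML Dataset, then yes, there is local caching: get_local_copy() downloads once per machine and reuses the cached copy afterwards. Something like (project/name below are placeholders):
```python
from clearml import Dataset

# The returned folder is a cached local copy;
# calling this again on the same machine will not re-download the data
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_folder = dataset.get_local_copy()
```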
So I wonder - why should an agent be related to a specific user's credentials? Is the right way to go about this is to create a "fake user" for the sake of the agent?
Very true, you have to have credentials for the trains-agent so it can "report" to the trains-server. That said, the creator of the Task (i.e. the person who cloned it) will be registered as the "user" in the UI.
I would recommend creating an "agent" user and putting its credentials on the trains-agent machine (the same way...
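For reference, the credentials go under the api section of the trains.conf on the agent machine, roughly like this (all values below are placeholders):
```
api {
    web_server: "http://trains-server:8080"
    api_server: "http://trains-server:8008"
    credentials {
        access_key: "AGENT-USER-ACCESS-KEY"
        secret_key: "AGENT-USER-SECRET-KEY"
    }
}
```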
FrothyShark37 any chance you can share snippet to reproduce?
Sorry, you are correct this is where the json is created:
https://github.com/huggingface/transformers/blob/040283170cd559b59b8eb37fe9fe8e99ff7edcbc/src/transformers/feature_extraction_utils.py#L470
The other links are the functions calling it. Make sense ?
Won't it be too harsh to have a system-wide restriction like that ?
Hi @<1576381444509405184:profile|ManiacalLizard2>
If you make sure all server access is via a host name (i.e. instead of IP:port, use host_address:port), you should be able to replace it with a cloud host on the same port
I see..
Generally speaking, if that is the case, I would think it might be better to use docker mode; it offers a way more stable environment, regardless of the host machine running the agent. Notice there is no need to use custom containers, as the agent will basically run the venv process, only inside a container, allowing you to reuse off-the-shelf containers.
If you were to add this, where would you put it? I can use a modified version of
clearml-agent
Yep, that would b...
is an implementation of this kind interesting for you, or do you suggest to fork?
You mean adding a config map storing a default trains.conf for the agent?
Hey SarcasticSparrow10 see here 🙂
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#upgrading
Maybe this one?
https://github.com/allegroai/clearml/issues/448
I think it is already there (i.e. 1.1.1)
Hi BeefyHippopotamus73
I checked the template task and the list of "Installed Packages" indeed does not have one of my required packages in the list.
Basically the "installed packages" is auto populated based on the directly imported packages n your code base.
Could it be you do not have import snowflake-connector-python
and this is a derivative package (i.e. required by a different package)
BTW: when you clone your Task in the UI you can edit and add the missing packages,...
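Alternatively, you can force it from code; a minimal sketch (package name assumed to be snowflake-connector-python, project/task names are placeholders):
```python
from clearml import Task

# Must be called *before* Task.init() so the package ends up in "Installed Packages"
Task.add_requirements("snowflake-connector-python")
task = Task.init(project_name="examples", task_name="snowflake-job")
```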
DeliciousSeal67
are we talking about the agent failing to install the package ?
JitteryCoyote63 so now everything works as expected ?
Could you send the "installed packages" section of the Task that was created in the notebook ?
LovelyHamster1 what do you mean by "assume the permissions of a specific IAM Role" ?
In order to spin an ec2 instance (aws autoscaler) you have to have correct credentials, to pass those credentials you must create a key/secret pair to pass to the autoscaler. There is no direct support for IAM Role. Make sense ?
MysteriousBee56 yes, please change the trains code!!! Yippee! If you think someone else can benefit, feel free to PR :)
Regarding the double entry, that seems like an odd bug, how can I reproduce it?
The problem is not really for the agents to wait (this is easily solved by an additional high-priority queue); the problem is whether you will have a "free" agent... you see my point ?