AgitatedDove14

49 Questions, 8126 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

25 × Eureka!

Answers 8126

0 Hi!

Ohh, I see now: the force SSH did not replace the user in the SSH link (only if the original was http), right?

4 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

What's the exact error you are getting?
(Maybe this is a privilege error on the cache folder. Which folders is it using? You can see them in the configuration as well.)

5 years ago

0 Hello All. I'M Generating An Outputmodel In One Task And Using It As An Inputmodel For Another Task. Since There'S Already A Timestamp On The Model Creation Date, Is There A Way To Get The Date From The Inputmodel?

Hi @<1545216070686609408:profile|EnthusiasticCow4>

is there a way to get the date from the InputModel?

You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
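
A minimal sketch of the suggestion above. It assumes the record returned by the private `_get_model_data()` call carries a `created` attribute; that is an assumption about the backend record, not a documented API. The helper name `model_created_at` is made up for illustration.

```python
from datetime import datetime
from typing import Any, Optional


def model_created_at(model: Any) -> Optional[datetime]:
    """Return the creation timestamp of a ClearML model, if one is available.

    Relies on the private `_get_model_data()` accessor mentioned above.
    ASSUMPTION: the returned record exposes a `created` attribute.
    """
    data = model._get_model_data()
    return getattr(data, "created", None)


# Usage (needs a configured ClearML setup; the model ID is a placeholder):
#   from clearml import InputModel
#   print(model_created_at(InputModel(model_id="<model-id>")))
```

Because this goes through a private method, it may break between versions, which is why exposing the field properly would be the better long-term fix.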

2 years ago

0 Hi Anyone

(I'll make sure we reply on the issue as well later)

4 years ago

0 Hi All. I Am Struggling With Integrating Plots Into My Task. Without The Plotting Code, The Task Never Completes The Execution And Seems To Hang. Also, The Plots Are Not Visible In The Plots Tab. I Am Running A For Loop For Different Models And Attemptin

Hi TenseOstrich47, what are the matplotlib and clearml versions you are using?

4 years ago

0 Hi, I Am Trying To Setup An Auto Scaler, But I Am Getting The Following Dependency Error:

Hi SkinnyPanda43
Can you attach the full log?
The clearml-agent is installed before your requirements.txt, so at least in theory it should not collide

2 years ago

0 Hi, I'M Using The Autoscaler And Getting The Error

Hi CloudySwallow27

This error occurs randomly during training (in other words training does successfully start).

What's the clearml-agent version you are using, and the clearml version?

3 years ago

0 Hi, I Was Wondering If Anyone Had A Similar Problem And How You Fixed It? My Code Fails On

DrabSwan66
Did you set "docker_install_opencv_libs: true" in your clearml.conf on the host machine ?
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L150
Just making sure, you are running clearml-agent in docker mode, correct?
What's the container you are using ?

4 years ago

0 Hi! I Was Wondering Regarding This Issue:

Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)

4 years ago

0 How Can I Tell Clearml To Ignore Certain Submodules Existing In The Project? My Projects Consists Of Multiple Git Submodules And It Is Rather Annoying That The Task Always Tries To Fetch All Submodules, When They Are Not Even Necessary. I Don'T Know How I

Omg that's a lot of submodules!
It has nothing to do with what the task sees: if you are inside a git repo, you will have to clone it on the remote machine. Let me check the code, maybe there is a workaround

one year ago

0 [Injecting Secrets Into A Clearml Agent / Accessing

The remaining problem is that this way, they are visible in the ClearML web UI which is potentially unsafe / bad practice, see screenshot below.

Ohhh that makes sense now, thank you 🙂
Assuming these are one-time credentials for every agent, you can add these arguments to "extra_docker_arguments" in clearml.conf
Then make sure they are also listed in hide_docker_command_env_vars, which should cover the console log as well
https://github.com/allegroai/clearml-agent/blob/26e6...

3 years ago

0 Hello, Does Anybody Know What Triggers A New Model To Be Added In A Project (Working In Pytorch) ? I'M New To Trains And Adding It To My Script Generated A Huge Amount Of Models (Almost 1 Per Datapoint I Would Say) And It Would Also Prompt

Hi MiniatureShells8
The torch.save triggers the model creation.
If you are using the same filename, then the same model in the system will be used.
New filenames will create new models.
What exactly is your use case ?
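
A small sketch of the filename rule described above. The helper `checkpoint_path` and the file names are hypothetical; the point is only that the path handed to `torch.save` decides whether one model entry is reused or a new one is created.

```python
from pathlib import Path


def checkpoint_path(folder: str, epoch: int, reuse_name: bool = True) -> Path:
    """Build the path that will be passed to torch.save().

    reuse_name=True  -> one fixed filename, so the same model entry is reused.
    reuse_name=False -> a new filename per epoch, so every save is a new model.
    """
    name = "model.pt" if reuse_name else f"model_epoch_{epoch}.pt"
    return Path(folder) / name


# Usage inside a training loop (assumes torch is installed):
#   torch.save(model.state_dict(), checkpoint_path("checkpoints", epoch))
```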

5 years ago

0 Hi. After Upgrading Clearml To Latest Version, Got This Error From My Pipeline (Windows10, Configured And Running Tensorflowod For Tf 2.3.):

but instead, they cannot be run if the files they produce, were not committed.

The thing with git, if you have new files and you did not add them, they will not appear in the git diff, hence missing when running from the agent. Does that sound like your case?
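
One way to check for this before enqueuing is to list the untracked files, since those are exactly the ones `git diff` never reports. A minimal sketch, assuming `git` is on the PATH; the function names are illustrative only.

```python
import subprocess
from typing import List


def untracked_from_porcelain(status_output: str) -> List[str]:
    """Extract untracked paths from `git status --porcelain` output.

    Untracked entries are prefixed with "?? "; they never show up in `git diff`.
    """
    return [line[3:] for line in status_output.splitlines() if line.startswith("?? ")]


def untracked_files(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and list files that were never `git add`-ed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return untracked_from_porcelain(result.stdout)
```

If `untracked_files()` returns anything the code depends on, adding those files with `git add` makes them part of the recorded diff.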

4 years ago

0 Hi! I Am Using The Modelcheckpoint Callback From Tensorflow To Save The Best Model. When The Experiment Finishes If I Go On The Server To Experiment > Artifacts > Output Model I Can See The Model And Subsequently By Clicking On It The Weights. How Can I

Yes, that sounds like the issue, is the file actually there ?

4 years ago

0 Hi Team! Is There A Way To Make Clearml’S Aws Autoscaler And Queues Resource-Aware Please? I.E. If We Can Say, As We Enqueue Our Job, How Much Ram Or Gpu-Ram Or Even Gpus It Needs, Have The Scheduler/Autoscaler Dispatch The Job To Instances That Are Of Th

Yes you can 🙂 (though not on the open-source version)

2 years ago

0 What Is

PipelineController works with default image, but it incurs overhead 4-5 min

You can try to spin the "services" queue without docker support, if there is no need for containers it will accelerate the process.

Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.

This error is about failing to clone the pipeline code repo, how is that connected to changing the container ?!
Can you provide the full log?

4 years ago

0 Hey Guys, I Believe

Hi CluelessElephant89

hey guys, I believe

clearml-agent-services

isn't necessary right?

Generally speaking, yes you are correct 🙂
Specifically, this is the "services" queue agent, running your pipeline logic, services etc.
But it is not a must to get the server to work, and you can also spin it on a different host

4 years ago

0 Hi All, I'Ve Successfully Run A Task Locally, And Now I'M Trying To Clone It And Send It To A Queue. It Looks Like The Environment Is Built Successfully, But It Hangs Here:

I managed to set up my (Windows) laptop as a worker and reproduce the issue.

Any insight on how we can reproduce the issue?

one year ago

0 Hey Guys, Sorry For The Rapid Fire Questions In The Past Few Days. I Have Another Issue Though. I Initially Ran A Task, Directly From A Repo. It Succesfully Installed The Requirements From The Requirements File In The Repo And Ran The Task Without Any Iss

It runs into the above error when I clone the task or reset it.

from here:

AssertionError: ERROR: --resume checkpoint does not exist

I assume the "internal" code state changed, and it is now looking for a file that does not exist. How would your code state change? In other words, why would it look for the file only when cloning? Could it be that you put the state on the Task and then cloned it (i.e. cloned the exact same dict), so the newly cloned Task "thinks" it is resuming?
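
If that is what happens, a defensive check before resuming would avoid the assertion. This is only a sketch: it assumes the script keeps its resume path under a `resume` key in a parameters dict, which is a guess about the user's code, and `resolve_resume_checkpoint` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_resume_checkpoint(params: Mapping[str, object]) -> Optional[str]:
    """Return the checkpoint to resume from, or None to start fresh.

    ASSUMPTION: the resume path is stored under params["resume"]. A cloned
    task copies that value even when the file does not exist on the new machine.
    """
    path = params.get("resume")
    if isinstance(path, str) and path and os.path.isfile(path):
        return path
    return None
```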

3 years ago

0 Hello, Is It Possible For The Clearml-Agent In Docker Mode To Not Pull A Specific Docker Image, But To Build One From The Experiment Repository Using The Dockerfile And .Dockerignore Of The Experiment Repository?

Docker would recognise that image locally and just use it right? I won’t need to update that image often anyway

Correct 🙂

3 years ago

0 Hi, Is It Possible To Re-Use Task-Id, But Keep The Old Execution Tab ? (Git Diff Specifically).

Is there a way to connect to the task without initiating a new one without overriding the execution?

You can, but not with automagic, you can manually send metrics/logs...
Does that help? or do we need the automagic?
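
A rough sketch of the manual route: fetch the existing task and push scalars through its logger. The helper name `report_scalars` and the metric names are placeholders; `task` is expected to be a clearml `Task` obtained with `Task.get_task`.

```python
from typing import Any, Iterable, Tuple


def report_scalars(task: Any, title: str, series: str,
                   points: Iterable[Tuple[int, float]]) -> int:
    """Manually send scalar values to an already existing task.

    Returns the number of points reported.
    """
    logger = task.get_logger()
    count = 0
    for iteration, value in points:
        logger.report_scalar(title=title, series=series,
                             value=value, iteration=iteration)
        count += 1
    return count


# Usage (the task ID is a placeholder):
#   from clearml import Task
#   task = Task.get_task(task_id="<existing-task-id>")
#   report_scalars(task, "loss", "train", [(0, 0.9), (1, 0.7)])
```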

3 years ago

0 Hi, I Try To Execute Pipeline With Pipelinecontroller And Define It Like This: Pipe = Pipelinecontroller(

Hi @<1523719753099644928:profile|ImmenseMole52>

but tasks of this pipeline don't inherit docker and packages, why? I want to build or pull one docker and env for all pipeline steps only once, so how can I do it?

You have to specify the docker image for the pipeline Tasks. By default it will not assume it is the same as the pipeline controller's, so basically just pass:

pipe.add_function_step(
        name="load_data",
        function=load_data,
        function_kwargs={"config": conf...
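
Since the snippet above is cut off, here is a separate, hedged sketch of the same idea: pass the image explicitly through the `docker` argument of `add_function_step`. The wrapper `add_step_with_image` is hypothetical, and `pipe` is expected to be a clearml `PipelineController`.

```python
from typing import Any, Callable


def add_step_with_image(pipe: Any, name: str, function: Callable[..., Any],
                        docker_image: str, **step_kwargs: Any) -> None:
    """Add a function step that runs in an explicitly chosen container.

    The image is not inherited from the controller, so it is passed per step.
    Any other add_function_step arguments can be forwarded via step_kwargs.
    """
    pipe.add_function_step(name=name, function=function,
                           docker=docker_image, **step_kwargs)
```

Calling the wrapper with the same image for every step gives all steps one shared container image.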

9 months ago

0 Good Morning, I'M Wondering If Someone Has Any Advice/Experience Configuring Clearml-Agent To Include Private Packages From Aws Codeartifact? So Far I Know I Have To Edit The

What do you have under the "installed packages" section? Also you can configure the agent to use poetry to restore the environment (instead of pip)

4 years ago

0 Hey Guys Trying To Save A Model Via The Outputmodel.Update_Weights Function I Get The Following Error:

Oh, that makes sense. Basically, here is what you should do:

Task.init(... output_uri='

')
output_model.update_weights(register_uri=model_path)

It will automatically create a unique target folder / file under the output_uri destination to store your model
(btw: passing the register_uri basically says: "I already uploaded the model there, just store the link", i.e. it does not upload the model)
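
The two modes can be put side by side in a short sketch. `store_model` is a hypothetical wrapper; `output_model` is expected to be a clearml `OutputModel`, and the paths are whatever the caller supplies.

```python
from typing import Any, Optional


def store_model(output_model: Any, local_file: Optional[str] = None,
                existing_uri: Optional[str] = None) -> None:
    """Either upload local weights or only register an already uploaded file."""
    if (local_file is None) == (existing_uri is None):
        raise ValueError("pass exactly one of local_file or existing_uri")
    if existing_uri is not None:
        # Only the link is stored; nothing is uploaded.
        output_model.update_weights(register_uri=existing_uri)
    else:
        # The file is uploaded to the task's output_uri destination.
        output_model.update_weights(weights_filename=local_file)
```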

2 years ago

0 Question About Pipelines - So The Default For Pipeline Tasks That Are Executed Remotely Is To Execute On The

Hi WackyRabbit7
The services agent (i.e. the agent running there) spins multiple Tasks at once (as opposed to a regular agent, which runs one task at a time).

how can I give this agent git access?

in the docker-compose you can configure the git credentials (user/pass or user/key it is the same).
https://github.com/allegroai/clearml-server/blob/d0e2313a24eb1248ebf0ddf31bf589de0d675562/docker/docker-compose.yml#L137

3 years ago

0 Hello, Is It Possible To Run Trains Offline Where There'S No Http Connection Between The Node Running The Job And Where The Web Ui Runs? I See In Your Diagram The Connection Between Training Machine And Trains Server (Which Contains The Web Ui) Is Over Ht

The import process actually creates a new Task every import, that said if you take a look here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1733
you can pass a pre-existing Task ID to "import_task" https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1653

5 years ago

0 Hi, With The Upcoming Version Of Hydra It Seems The Binding Breaks. Specifically In The

Ohh sorry you will also need to fix the

def _patched_task_function

The parameter order is important as the partial call relies on it.

My bad no need for that 🙂

4 years ago

0 Hi

Almost forgot, pipeline screenshot 🙂

5 years ago

0 Hi There! Can Anybody Help Me With Specifying The 'Platform' For A Model In Clearml-Serving. I Am Using The K8S Clearml-Serving Setup (Version 1.3.1). I Already Tried A Bunch Of Variants Like

I think the real issue is that I am not able to specify a platform for the model,

there is no need to specify it, remove it from the config.pbtxt - the clearml-serving will automatically add the background

one year ago

0 Hello Guys, I Have 4 Workers (2 In Default And 2 In Service Queue On Same Machine) And Running A Cron Job Of Data Preparation.It Works Well For About 3 Days But After That Tasks Are Getting Failed By Their Own With Given Below Error.Can Anyone Help Me O

Hello guys, i have 4 workers (2 in default and 2 in service queue on same machine)

Hi @<1526734437587357696:profile|ShaggySquirrel23>
I think what happens is that one agent deletes its cfg file when it is done, but at least in theory each one should have its own cfg
One last request: can you try with the agent's latest RC version, 1.5.3rc2?

2 years ago
