Can you fix it locally, just to verify?
I cannot test it at the moment, hence my question.
JuicyFox94 any chance you can blindly approve?
Merged, is it working for you now?
Yes! Thanks so much for the quick turnaround
My pleasure 🙂
BTW: did you see this (it seems like the same bug?!)
https://github.com/allegroai/clearml-helm-charts/blob/0871e7383130411694482468c228c987b0f47753/charts/clearml-agent/templates/agentk8sglue-configmap.yaml#L14
In order to work with SSH cloning, one has to manually install openssh-client into the Docker image, it looks like.
Correct, you have to have SSH inside the container so that git can use it.
You can always install it with the following setup inside your agent's clearml.conf:

extra_docker_shell_script: ["apt-get install -y openssh-client", ]
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L145
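For context, a minimal sketch of how that sits inside the agent section of clearml.conf (running apt-get update first is my assumption, so the package index exists):

agent {
    # shell commands executed inside the docker container before the experiment starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y openssh-client"]
}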
Hi MelancholyElk85
I have a strong déjà vu feeling. The credentials are OK. How do I solve this? And if you need the full log, how do I share it without sharing private information? I'm fed up with this shit
Is this coming from the agent?
I don't think so. It is solved by installing openssh-client in the Docker image, or by adding a deploy token to the cloning URL in the Web UI.
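For reference, a deploy-token clone URL usually looks something like this (host, project path, and token values are placeholders):

https://my-token-user:my-deploy-token@gitlab.example.com/group/project.git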
You can also have the token (token == password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19
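As a sketch, that would sit in the agent section of clearml.conf (the values here are placeholders):

agent {
    # git credentials the agent uses when cloning
    git_user: "my-token-user"
    git_pass: "my-deploy-token"
}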
I execute the clearml-session with the --docker flag.
This is to control the Docker image the agent will spin up for you (think of the dev environment you want to work in, like the NVIDIA PyTorch container that already has everything you need)
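For example, something along these lines (the image tag is just an illustration):

clearml-session --docker nvcr.io/nvidia/pytorch:23.03-py3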
DilapidatedDucks58 I see ...
This might be more complicated than one would imagine. A simple solution might be to store a snapshot of the values every time we reach a new maximum; a quick hack might be to add it as text on one of the task's parameters or properties (which we can later add to the table as a custom column), see the sketch below.
wdyt?
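A rough sketch of that hack, assuming the check runs inside your own training loop (the property names and helper function are illustrative, not an existing API):

from clearml import Task

task = Task.current_task()
best_value = float("-inf")

def maybe_snapshot(value, snapshot_text):
    # store a snapshot as user properties whenever a new maximum is reached;
    # user properties can later be shown as a custom column in the experiment table
    global best_value
    if value > best_value:
        best_value = value
        task.set_user_properties(best_value=str(value), best_snapshot=snapshot_text)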
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; this is the bottleneck of the upload speed of your machine, regardless of what the target is (file-server, S3, etc...)
Not sure I follow, you mean to launch it on the Kubernetes cluster from the ClearML UI?
(like the clearml-k8s-glue ?)
MysteriousBee56 that is so weird ... last one, I promise 🙂

docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"
ohh right, my bad:

docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && pip install trains-agent && echo done"
Hi PompousBeetle71 , this actually fits with other feedback we received.
And for that reason it is already being worked on! 🙂
I have a few questions as we are designing the new interface.
I think our biggest question was: are projects like folders?
That is, can I have experiments in a project, but also sub-projects?
Or are parent projects a way to introduce hierarchy into the mess, meaning a project has either experiments in it or sub-projects, but not both?
(obviously in both cases...
Hi JitteryCoyote63
What do you have in agent.cuda_version?
(you can see it printed at the beginning of the log)
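If the detection is wrong, you should also be able to force it in clearml.conf; a one-line sketch with an example value:

agent.cuda_version = "10.1"  # override the auto-detected CUDA version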
@<1523701079223570432:profile|ReassuredOwl55> did you try adding ./path/to/package manually?
You can also do that from code:
from clearml import Task

# notice you need to call Task.add_requirements before Task.init
Task.add_requirements("./path/to/package")
task = Task.init(...)
WickedGoat98 this is awesome! Let me know how I could help 🙂
BTW: I checked regarding the plot comparison; this is a backend issue due to the size of the plot, and I was told a fix will be deployed in a day or two.
I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?
I'm assuming the upload is an HTTP upload (e.g. to the default files server)?
If this is the case, the main issue is that we do not have callbacks on HTTP uploads to report progress (which I would love a PR for, but this is actually a "requests" issue)
I think we had a draft somewhere, but I'm not sure ...
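The usual workaround with requests is a file-like wrapper that reports how many bytes were read; a minimal sketch of the idea (class and callback names are assumptions, not an existing clearml API):

import requests

class ProgressReader(object):
    # wraps a file object and reports bytes read, so a streamed upload can show progress
    def __init__(self, fileobj, total_size, callback):
        self._f = fileobj
        self._total = total_size
        self._sent = 0
        self._cb = callback

    def read(self, size=-1):
        chunk = self._f.read(size)
        self._sent += len(chunk)
        self._cb(self._sent, self._total)  # e.g. print a percentage
        return chunk

# usage sketch: requests streams any object exposing read()
# with open("model.pt", "rb") as f:
#     requests.put(upload_url, data=ProgressReader(f, total_size, my_callback))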
This is part of a bigger process which takes quite some time and resources; I hope I can try this soon, if it will help get to the bottom of this.
No worries, if you have another handle on how/why/when we lose the current Task, please share 🙂
Is this reproducible with the hpo example here:
https://github.com/allegroai/clearml/tree/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/examples/optimization/hyper-parameter-optimization
What's your clearml version? (And is it possible for you to verify with the latest version?)
Hi @<1536881167746207744:profile|EnormousGoose35>
Could we just share the entire project instead of the workspace?
You mean allow access to a project between workspaces ?
If the answer is yes, then unfortunately the SaaS version (app.clear.ml) does not really support this level of RBAC; it is part of the enterprise version, which assumes a large organization with the need for that kind of access limit.
What is the use case ? Why not just share the entire workspace ?
Hey WickedGoat98
I found the bug: it is due to the fact that the numpy data (passed to plotly) contains both datetime and NaN, and plotly.js does not like it. I'll make sure this is fixed; in the meantime you can just remove the first row (it contains the NaN):

df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
ReassuredTiger98 after 20 hours, was it done uploading?
What do you see in the Task resource monitoring? (Notice there is a network_tx_mbs metric that, according to this, should be 0.152)
task.wait_for_status()  # blocks until the task reaches a final state
task.reload()  # refresh the local object with the latest state from the server
task.artifacts["output"].get()
Hi ShakyJellyfish91
It seems clearml is using a single connection, which makes the download take a long time
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does max_connections=10 mean 10 concurrent connections?
I called task.wait_for_status() to make sure the task is done
This is the issue. I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object
in clearml.conf we could have:

azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
Then in AzureContainerConfigurations:

class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...
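A rough sketch of how from_config might pick up the new key (assuming the configuration object exposes a get(key, default) accessor; the existing container parsing is elided):

    @classmethod
    def from_config(cls, configuration):
        # fall back to None, i.e. keep the library's default connection count
        max_connections = configuration.get("max_connections", None)
        container_configs = ...  # existing parsing logic stays as-is
        return cls(container_configs=container_configs, max_connections=max_connections)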