Thanks TroubledJellyfish71, I managed to locate the bug (and indeed it's the new aarch64 package support)
I'll make sure we push an RC in the next few days; until then, as a workaround, you can put the full link (http) to the torch wheel
BTW: 1.11 is the first torch version to support aarch64, so if you request a lower torch version, you will not encounter the bug
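For example, a direct-wheel line in the requirements looks roughly like this (the wheel URL below is illustrative; pick the one matching your Python/OS/arch from the PyTorch wheel index):
```
# requirements.txt - hypothetical direct link instead of a plain "torch==1.11.0"
torch @ https://download.pytorch.org/whl/cpu/torch-1.11.0%2Bcpu-cp38-cp38-linux_x86_64.whl
```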
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/__init__.py#L31
PRs are always appreciated :)
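A rough sketch of the kind of change meant here (the env-var name is made up; the linked file is where the config folders are resolved):
```python
import os
from pathlib import Path

# hypothetical: allow overriding the offline folder via an env var,
# falling back to a hard-coded default (TRAINS_OFFLINE_DIR is a made-up name)
default_offline_dir = Path.home() / ".trains" / "offline"
offline_dir = Path(os.environ.get("TRAINS_OFFLINE_DIR", str(default_offline_dir)))
```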
CheerfulGorilla72 as I understand there were some delays with the current release, so it is going to be out this week. The one after that includes this feature and, as far as I understand, should be out mid-Dec.
So it makes sense it installs v8.0.1
(maybe originally you provided no version and it installed the latest one)
This is basically pip doing the package version resolving
The pip cache, git cache and venvs cache are all supported, you just need to map the folders.
If you do not want to spin up a PVC with an NFS mount, you can just mount an S3 bucket with s3fs as part of the container extra bash script:
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140
S3 FUSE filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
WDYT?
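A minimal sketch of what that could look like in clearml.conf (bucket name, mount point and install command are hypothetical; extra_docker_shell_script is the key linked above):
```
agent {
    # runs inside the container before the task starts
    extra_docker_shell_script: [
        "apt-get update && apt-get install -y s3fs",
        "mkdir -p /mnt/shared-cache",
        "s3fs my-cache-bucket /mnt/shared-cache -o iam_role=auto",
    ]
}
```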
but the debug samples and monitored performance metric show a different count
Hmm, could you expand on what you are getting, and what you are expecting to get?
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
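For example (values are placeholders):
```
export CLEARML_AGENT_GIT_USER=my_git_user_here
export CLEARML_AGENT_GIT_PASS=my_git_password_here
docker-compose up -d
```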
ShinyLobster84
fatal: could not read Username for '…': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repo code containing the pipeline logic (this is the exact same behaviour as pipeline v1 execute_remotely(), which is now the default). Could it be that before, you executed the pipeline logic locally?
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
Is this a common case? Maybe we should change the run_pipeline_steps_locally argument default to False?
(The idea of run_pipeline_steps_locally=True is that it makes it easier to debug the entire pipeline on the same machine)
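For context, a minimal sketch of where that argument sits (hedged: the exact signature and defaults may differ between clearml versions):
```python
from clearml.automation.controller import PipelineDecorator

# sketch only: run_pipeline_steps_locally is the argument discussed above
@PipelineDecorator.pipeline(
    name="demo pipeline", project="examples", version="0.1",
    run_pipeline_steps_locally=True,  # True: run all steps on this machine for debugging
)
def my_pipeline():
    ...
```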
LazyLeopard18 well done on locating the issue.
Yes, Docker on Windows is a bit flaky...
multiple machines and reporting to the same task.
Out of curiosity, how do you launch it on multiple machines?
reporting to the same task.
So the "funny" think is, they all report on on top (overwriting) the other...
In order for them to report individually, it might be that you need multiple Tasks (i.e. one per machine)
Maybe we could somehow prefix the cpu/network etc. with the rank?! Or should it be a different "title", wdyt?
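A quick sketch of the per-machine Task idea (the RANK env var is whatever your launcher provides; project/task names are made up):
```python
import os
from clearml import Task

rank = int(os.environ.get("RANK", "0"))

# one Task per machine, so reports no longer overwrite each other
task = Task.init(project_name="examples", task_name=f"worker-rank-{rank}")
task.get_logger().report_scalar(
    title=f"rank_{rank}/cpu",  # rank-prefixed title, as suggested above
    series="usage", value=42.0, iteration=0,
)
```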
or do you mean the machine I ran the experiment on locally?
Yes this one
Now I'm just wondering if I could remove the PIP install at the very beginning, so it starts straightaway
AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
does exactly that 🙂 BTW, I would just set the venv cache, which means it will be able to restore the entire thing (even if you have changed the requirements):
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
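For reference, the relevant block in clearml.conf looks roughly like this (treat it as a sketch; the values mirror the defaults in the linked file):
```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
}
```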
Containers are not running
? But you are running docker-compose, how come no containers are running?
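A quick way to check (plain docker-compose commands, nothing ClearML-specific):
```
docker-compose ps                # lists each service and whether it is Up or Exited
docker-compose logs --tail=50    # recent logs, to see why a service may have exited
```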
If this is the case, there is nothing you need to change, just provide the docker image (no need to pass packages).
sudo curl -L "https://github.com/docker/compose/releases/download/<version>/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
Hi IrritableGiraffe81
I have a package called feast[redis] in my requirements.txt file.
This means feast is installing additional packages; once the agent is done installing everything, it basically calls pip freeze and stores back all the packages, including versions.
Now the question is, how come redis is not installed.
Notice that the Task already has the auto-detected packages (it basically ignores requirements.txt, as it is often not full, missing, or just wrong)
...
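To illustrate what the extras syntax expands to (version numbers are hypothetical):
```
# requirements.txt entry:
#   feast[redis]
# the [redis] extra tells pip to also install feast's optional redis dependencies,
# so the stored pip-freeze list should contain both, e.g.:
#   feast==0.18.1
#   redis==4.1.0
```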
Hi @<1558624430622511104:profile|PanickyBee11>
You mean this is not automatically logged? Do you have a callback that logs it in HF?
What's interesting to me (as a ClearML newbie) is it's clearly compiling that wheel using my host machine (MacOS).
Hmm kind of, and kind of not.
If you take a look at the Tasks created (regardless of how they are created: pipeline, manually, etc.), you have a list of python packages required by the code, as they are detected at runtime (i.e. when the code was first executed, on the development machine). When creating a Pipeline controller (runner), the pipeline Tasks are just lists, ...
DeliciousBluewhale87 could you restart the pod, ssh to the host, and make sure the folder /opt/clearml/agent exists and there is no *.conf file in it?
[Assuming the above is what you are seeing]
What I "think" is happening is that the Pipeline creates it's own Task. When the pipeline completes, it closes it's own Task, basically making any later calls to Tasl.current_task() return None, because there is no active Task. I think this is the reason that when you are calling process_results(...) you end up with None.
For a quick fix, you can do:
```
pipeline = Pipeline(...)
MedianPredictionCollector.process_results(pipeline._task)
```
Maybe we should...
GrotesqueDog77 when you say "the second issue", do you mean the fact that both step 1 and step 2 should have access to the same filesystem?
no need for it actually
GrittyHawk31
what are you getting when you are running:
docker ps
and what are you getting with:
netstat -natp | grep LISTEN
...I'm not sure I follow; clearml-task is designed to always be used so that, at the end, the agent will be running the Task. What am I missing?