AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Why Does My Task Execution Freeze After Pip Installation (Running Agent In Foreground Mode)? The Task Is:

Yep 🙂

2 years ago

0 Why Does My Task Execution Freeze After Pip Installation (Running Agent In Foreground Mode)? The Task Is:

AdventurousButterfly15 this one is quite self-contained:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py

So I guess pip install finished working
But the task is evidently not being executed.

This is very odd ... you can run the agent with debugging with --debug --foreground to see all the outputs and logs

2 years ago

0 Why Does My Task Execution Freeze After Pip Installation (Running Agent In Foreground Mode)? The Task Is:

Yes it seems so 😞

2 years ago

0 Hi All, Where Does The Installed Packages List Populate From In The Task Viewer?

Hi EnchantingOstrich20
You mean how does clearml get it there?
At runtime it analyzes the code you are running, looking for imports, then checks the versions you have actively used (i.e. the active venv / python) and lists them there.
You can also override those in code, or edit them after you clone the task and before you enqueue it for remote execution
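
To make the import-scanning idea above concrete, here is a minimal sketch. It is only an illustration and not how clearml actually implements it; `find_imports` and `resolve_versions` are hypothetical helper names. It parses a script for top-level imports and looks up the version installed in the active environment:

```python
import ast
from importlib import metadata


def find_imports(source: str) -> set:
    """Return the top-level module names imported by the given source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def resolve_versions(modules) -> dict:
    """Map each module name to the installed distribution version, or None."""
    versions = {}
    for name in sorted(modules):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # stdlib module, or the import name differs from the package name
            versions[name] = None
    return versions


example = "import os\nimport numpy as np\nfrom clearml import Task\n"
print(resolve_versions(find_imports(example)))
```

Note that this naive lookup assumes the import name equals the package name, which is not always true (a real implementation needs a mapping for those cases).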

2 years ago

0 Hi! I’M Running An Experiment As Follows:

Now I’m just wondering if I could remove the PIP install at the very beginning, so it starts straightaway

AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 does exactly that 🙂 BTW, I would just set the venv cache, which means it will be able to restore the entire thing (even if you have changed the requirements)
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
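
A small sketch of how the environment variable could be set when launching the agent from Python. The helper name `build_agent_launch` and the queue name `default` are assumptions for illustration only; the environment variable is the one mentioned above:

```python
import os


def build_agent_launch(queue: str, skip_env_install: bool = True):
    """Return (command, env) for starting a clearml-agent daemon in the foreground."""
    env = dict(os.environ)
    if skip_env_install:
        # Ask the agent to skip the python environment installation step
        env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    command = ["clearml-agent", "daemon", "--queue", queue, "--foreground"]
    return command, env


# "default" is a placeholder queue name
command, env = build_agent_launch("default")
print(" ".join(command))
# To actually start the agent: subprocess.run(command, env=env, check=True)
```

The function only builds the command and environment, so you can inspect both before actually starting anything.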

2 years ago

0 Hi, I Am Considering Making Automated Backups Of My Clearml-Server Using Amazon Ebs Snapshots. Should I Be Concerned With The Same Problem Described Here >

Hi JitteryCoyote63
So the main issue is backing up the elastic & mongo DB while they are running; once they are backed up/restored, the server will spin up as is. (Let me check regarding Redis, it might be that since it is used for caching there is no need to actually back up the content, only the configuration)

3 years ago

0 Hi! I Am Running A Code From Repository, Which Is Cloned By The Following Command:

EnviousPanda91 this seems like a specific issue with the clearml-task cli, could that be ?
Can you send a full clearml-task command-line to test ?

one year ago

0 I Have A Question About The Clearml Self Hosted Instance, I Notice There Is Elastic Search, Mondodb, And Redis In The Helm Chart Are These Required Or Can We Bring Our Own? I'M Wondering What Happens If I Were To Host The Instance And One Of These Were

I'm wondering what happens if i were to host the instance and one of these were to go down from time to time in production, as the deployments provided by the helm chart are not redundant.

Long story short, it will break the clearml-server. Please do not take them down; if you do need to do that, also take down the clearml-server. The python clients will wait until it is up again, so no session would be destroyed

2 years ago

0 Hi, I Try To Optimize My Hyperparamters With

Hi ConvincingSwan15

For the train.py do I need a setup.py file in my repo to work correctly with the agent ? For now it is just the path to train.py

I'm assuming the train.py is part of the repository, no?
If it is, how come the agent after cloning the repository cannot find it ?
Could it be it was accidentally not added to the git repo ?

3 years ago

0 Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'M Getting

just to check. Does the k8s glue install torch by default?

SubstantialElk6 what do you mean the glue installs torch ?
The glue will take a Task from the queue and create a k8s job (basically use the same docker, and inside the docker get the agent to execute the requested Task). Where would the "torch" come into play?

3 years ago

0 Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'M Getting

SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task ?
(This is what the agent uses to set up the environment inside the docker, running as a pod)

3 years ago

0 Hey, I’M Getting A Lot Of These

CourageousKoala93 when you call Task.close() it will mark the task as completed, there is no need to do that manually. The idea with mark_completed is that you can forcefully change the state if needed, or externally stop the task and mark it completed. Make sense?

2 years ago

0 Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'M Getting

SubstantialElk6 "Execution Tab" scroll down you should have "Installed Packages" section, what do you have there?

3 years ago

0 Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'M Getting

Nice SubstantialElk6 !
BTW: you can configure your clearml client to store the changes from the latest pushed commit (and not the default, which is the latest local commit)
see store_code_diff_from_remote: in clearml.conf:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/docs/clearml.conf#L150

3 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

That didn’t give useful info, it was that docker was not installed on the agent machine x)

JitteryCoyote63 you mean "docker" was not installed and it did not throw an error ?

3 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Let's assume the host has a folder for all users for persistent storage, for example '/mnt/user_data/', and you have a user named 'myuser' and a matching subfolder '/mnt/user_data/myuser'
Then we can do:
clearml-session ... --docker "my_docker_image -v /mnt/user_data/:/host_mount/" --user-folder "/host_mount/myuser"
BTW: The next time you call clearml-session these will become the default parameters, so no need to change anything 🙂
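
If you need to generate this per user, the two arguments can be composed with a few lines of Python. This is just a sketch: `session_command` is a hypothetical helper, it only covers the `--docker` and `--user-folder` arguments from the command above, and any other clearml-session arguments you use would need to be added:

```python
import shlex


def session_command(image: str, host_root: str, user: str,
                    mount_point: str = "/host_mount"):
    """Compose a clearml-session call that mounts a per-user persistent folder."""
    # Mount the shared host folder into the container
    docker_arg = f"{image} -v {host_root.rstrip('/')}/:{mount_point}/"
    # Point the session at the user's subfolder inside the mount
    user_folder = f"{mount_point}/{user}"
    return ["clearml-session", "--docker", docker_arg, "--user-folder", user_folder]


cmd = session_command("my_docker_image", "/mnt/user_data", "myuser")
print(shlex.join(cmd))
```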

3 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Alright I have a followup question then: I used the param --user-folder “~/projects/my-project”, but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?

Yes you must make sure the docker can mount a persistent folder for you to work on.
Let me check what's the easiest way to do that

3 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Yes docker was not installed in the machine

Okay make sense, we should definitely check that you have docker before starting the daemon 😉

Ok, it would be nice to have a --user-folder-mounted that does the linking automatically

It might be misleading if you are running on k8s cluster, where one cannot just -v mount volume...
What do you think?

3 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

I still see things being installed when the experiment starts. Why does that happen?

This only means no new venv is created, it basically means install in "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

Try this one 🙂
HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

I'm trying to achieve a workflow similar to the one

You mean running everything on a single machine (manually)?

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

I want to be able to install the venv in multiple servers and start the "simple" agents in each one on them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task

ExcitedFish86 Oh if this is the case:
in your clearml.conf:
agent.package_manager.type: conda
agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://git...

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

the hack doesn't work if conda is not installed

Of course conda needs to be installed, it is using a pre-existing conda env, no?! what am I missing

Ideally it would just pull an experiment from a dedicated HPO queue and run it inplace

And the assumption is the code is also there ?

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

in which I can just spawn an ad-hoc worker

Can you elaborate on what you would do with it? Like an OS environment variable to disable the entire setup itself ? Will it clone the code base ?

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

Oh if this is the case you can probably do
import os
import subprocess
from time import sleep
from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Look up the queue to pull Tasks from
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # Pop the next Task from the queue; wait and retry if the queue is empty
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # Mark the Task as started before launching it
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

ExcitedFish86 this is a general "dummy agent" that pulls Tasks and executes them (no env created, no code cloned, as you suggested)

how does this work with HPO?

The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple machine support. In order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...

2 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

That depends on the HPO algorithm. Basically they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on the previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
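
To illustrate the "concurrent jobs" limit, here is a toy simulation. It is not the actual optimizer code; `schedule` is a hypothetical name, and the assumption that exactly one job finishes per step is only there to keep the example small. It shows that new jobs are pushed only when a slot frees up:

```python
from collections import deque


def schedule(candidates, max_concurrent, steps):
    """Simulate pushing jobs so that at most `max_concurrent` are in flight."""
    pending = deque(candidates)
    running = []
    history = []
    for _ in range(steps):
        # Toy model: the oldest running job finishes at each step
        if running:
            running.pop(0)
        # Top up the queue only up to the concurrency limit
        while pending and len(running) < max_concurrent:
            running.append(pending.popleft())
        history.append(list(running))
    return history


history = schedule(candidates=[1, 2, 3, 4, 5], max_concurrent=2, steps=4)
for step, jobs in enumerate(history, start=1):
    print(f"step {step}: running {jobs}")
```

With a Bayesian strategy the list of candidates would not be known up front; each new candidate would be chosen from the results of the jobs that already finished.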

2 years ago

0 Hi Im Getting This Error And I Have No Idea How To Solve It, Please Help

Question - why is this the expected behavior?

It is 🙂 I mean the original python version is stored, but pip does not support replacing the python version. It is doable with conda, but then you have to use conda for everything...

2 years ago

0 Hi Everyone, I’M Getting An Error During Model Upload To S3. The Error Shows Up In The Console Like Below And I Don’T See Any Uploaded Objects In S3:

Hi ScantChimpanzee51
btw: this seems like an S3 internal error
https://github.com/boto/s3transfer/issues/197

2 years ago

0 What Sort Of Integration Is Possible With Clearml And Sagemaker? On The Page

So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.

Yeah I think that for some reason it fails to detect that this is actually a jupyter notebook (not really sure why). Thank you for double checking on the container !!

one year ago
