DefeatedOstrich93 can you verify lightning is actually only stored once?
GentleSwallow91 notice this part:
Hi Martin. Sorry - missed your reply.
Yeap, I am aware that docker_internal_mounts is inside the agent section.
'-v', '/tmp/ssh-XXXXXXnfYTo5/agent.8946:/tmp/ssh-XXXXXXnfYTo5/agent.8946', '-e', 'SSH_AUTH_SOCK=/tmp/ssh-XXXXXXnfYTo5/agent.8946',
It is creating a copy of the ssh folder and setting the SSH_AUTH_SOCK env to it. You can just map the entire ssh folder automatically by un-setting SSH_AUTH_SOCK before running the agent:
SSH_AUTH_SOCK= clearml-agent ...
These are the prerequisites for the docker service installed on the host machine (where the agent is running).
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
Could not locate channel name 'gg_clearml'
CheerfulGorilla72 these are the permissions:
https://github.com/allegroai/clearml/blob/427b98270cc846b5d7e4af49f9732e3eb8d7d3ae/examples/services/monitoring/slack_alerts.py#L13
channels:join channels:read chat:write
My use case is when I have a merge request for a model modification: I need to provide several pieces of information for our Quality Management System, one of which is to show that the experiment is a success and the model has some improvement over the previous iteration.
Sounds like a good approach 🙂
Obviously I don't want the reviewer to see all ...
Maybe publish the experiment and move it to a dedicated folder? Then even if they see all other experiments, they are under "development" p...
Hi @<1523703472304689152:profile|UpsetTurkey67>
I circumvented the problem by putting timestamp in task name, but I don't think this is necessary.
Just pass reuse_last_task_id=False to Task.init, it will never try to reuse them 🙂
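Something like this (a minimal sketch, project/task names are placeholders):

from clearml import Task

# reuse_last_task_id=False forces a brand-new task on every run,
# so there is no need to put a timestamp in the task name
task = Task.init(
    project_name="examples",    # placeholder
    task_name="my_experiment",  # placeholder
    reuse_last_task_id=False,
)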
TenseOstrich47 it's based on a free "index", so the first index not in use will be captured. But if you remove agents, the order will change, e.g. if you take down worker #1, the next worker you spin up will be #1 because that index is no longer taken.
So could it be that pip install --no-deps .
is the missing piece?
what happens if you add to the installed packages "/opt/keras-hannd" ?
Ok, so it doesn't follow the exact same rules as Task.init?
Correct
I was afraid all the logs and outputs of a hyperparameter optimization task would be deleted just because no artifacts were created.
Should not happen 🙂
Okay, the type is inferred from the default value of the function step itself, that means that both:
data_frame = step_one(pickle_url, extra=1337)
and
data_frame = step_one(pickle_url, 1337)
will pass extra as int.
That said, if the default value of the argument is missing, it will revert to str.
In order to use the type hints as casting hints, we actually need to improve task.connect to support the type casting (the values are stored as strings).
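For reference, a minimal sketch of such a step with the decorator syntax (the body is illustrative, not from this thread):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data_frame"])
def step_one(pickle_url, extra=1337):
    # "extra" defaults to an int, so the controller casts the incoming
    # value to int whether it is passed positionally or by keyword;
    # with no default value it would revert to str
    import pandas as pd  # imports go inside the component body
    data_frame = pd.read_pickle(pickle_url)
    return data_frame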
Ohh, the controller task itself holds the artifacts ?
I think the main difference is that I see value in having access to the raw format within the cloud vendor, and not only having it as an archive
I see, it does make sense.
Two options, one, as you mentioned use the ClearML StorageManager to upload the files, then register them as external links with Dataset.
Two, I know the enterprise tier has HyperDatasets, which are essentially what you describe, with version control over the "metadata" and "raw storage" on the GCP, including the ab...
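For option one, a minimal sketch (bucket path and file names are made up):

from clearml import Dataset, StorageManager

# upload the raw file yourself so it keeps its native format on GCS
remote_url = StorageManager.upload_file(
    local_file="data/sample.csv",                # hypothetical local file
    remote_url="gs://my-bucket/raw/sample.csv",  # hypothetical bucket path
)

# then register the uploaded copy as an external link on the dataset
dataset = Dataset.create(dataset_name="raw_data", dataset_project="examples")
dataset.add_external_files(source_url=remote_url)
dataset.upload()
dataset.finalize()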
I thought about the fact that maybe we need to write everything in one place
It will be in the same place, under the main Task
Should work out of the box
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow you can also pass repo="."
which will load + detect the repo in the execution environment and automatically fill it in
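e.g. a minimal sketch with the decorator syntax:

from clearml import PipelineDecorator

# repo="." means: detect the repository of the current working directory
# at execution time and register it for the component
@PipelineDecorator.component(repo=".")
def my_component():
    print("runs with the auto-detected repo")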
it will only do that if the OOM killer is enabled
true, but you will still get OOM (I believe). I think the main issue is that even from inside the container, when you query the memory, you see the entire machine's memory... I'm not sure what we can do about that
The full docker-compose logs?
Actually unless you specifically detached the matplotlib automagic, any plt.show() will be automatically reported.
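For example, a minimal sketch (project/task names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib demo")

plt.plot([1, 2, 3], [4, 5, 6])
plt.title("auto-reported plot")
plt.show()  # captured by the matplotlib automagic and reported to the task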
BTW: which clearml version are you using ?
(I remember there was a change in the last one, or the one before, deferring the config loading until it is accessed)
Quick update, I found the issue, working on a fix 🙂
ShortElephant92 yep, this is definitely enterprise feature 🙂
But you can configure user/pass on the open source, and even store the passwords hashed if you need.
I will TIAS, but maybe it's worthwhile to also mention whether it has to be an absolute path or if a relative path is fine too!
Good point! (absolute but you can use ~, and I "think" also $ENV )
however setting up the interpreter on PyCharm is different on mac for some reason, and the video just didn't match what I see
MiniatureCrocodile39 Are you running on a remote machine (i.e. PyCharm + remote ssh) ?
I am logging debug images via Tensorboard (via the add_image function), however apparently these debug images are not collected within the fileserver,
ZanyPig66 what do you mean not collected to the file server? are you saying the TB add_image is not automatically uploading images? or that you cannot access the files on your files server?
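For context, a minimal sketch of the flow being discussed (assuming PyTorch's SummaryWriter, placeholder names):

import numpy as np
from torch.utils.tensorboard import SummaryWriter
from clearml import Task

task = Task.init(project_name="examples", task_name="tb debug images")
writer = SummaryWriter()

# add_image calls are intercepted by the TB automagic binding and the
# images are uploaded as debug samples (by default to the fileserver)
img = np.random.randint(0, 255, (3, 64, 64), dtype=np.uint8)
writer.add_image("debug/sample", img, global_step=0)
writer.close()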
Hi CrookedWalrus33
the python version is auto-detected and registered at "manual execution" time (i.e. when you run your code on your machine).
That said, this is a suggestion for the agent, and only if it can actually find the matching Python version will it use it; otherwise it will use whatever is available (i.e. look through the PATH environment for a matching pythonX.Y executable).
The easiest way to support this would be to just make sure the python binary's path is added to the PATH env.
Does...
Hi DrabCockroach54
... and no logs for python script.
what do you mean by "no logs" , is it clearml logs? or k8s pod logs ?
Ohh I see now, okay there are two entries on an artifact: the actual artifact (a link to a file somewhere) and the text preview of the artifact. I think the "preview" is the issue
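A minimal sketch of the two pieces (the preview string is just an illustration, names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact preview")

# each artifact stores the object itself (a link to a file somewhere)
# plus a text preview shown in the UI; the preview can be set explicitly
task.upload_artifact(
    name="results",
    artifact_object={"accuracy": 0.9},
    preview="accuracy: 0.9",
)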