Hmm, you can delete the artifact with:
task._delete_artifacts(artifact_names=['my_artifact'])
However this will not delete the file itself.
To delete the file itself I would do:
remote_file = task.artifacts['delete_me'].url
h = StorageHelper.get(remote_file)
h.delete(remote_file)
task._delete_artifacts(artifact_names=['delete_me'])
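Putting it together, a minimal sketch of the same idea, assuming a recent clearml version and a task holding an artifact named 'delete_me' (the task id is a placeholder, and _delete_artifacts is a private API that may change):
```python
from clearml import Task
from clearml.storage.helper import StorageHelper

# attach to the task that holds the artifact (the id here is a placeholder)
task = Task.get_task(task_id='aabb12')

# resolve the remote file behind the artifact and delete it from storage
remote_file = task.artifacts['delete_me'].url
helper = StorageHelper.get(remote_file)
helper.delete(remote_file)

# finally remove the artifact entry from the task itself
task._delete_artifacts(artifact_names=['delete_me'])
```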
Maybe we should have a proper interface for that? wdyt? what's the actual use case?
No worries, just found it. Thanks!
I'll make sure to follow up on the GitHub issue for better visibility 🙂
If the manual execution (i.e. PyCharm) was working, it should have stored it on the Pipeline Task.
JitteryCoyote63
somehow it continues from the previous iterations, not sure yet if it's coming from my code, ignite or clearml
ClearML will automatically continue reporting from the previous iteration (i.e. if the last iteration before continuing the Task was 100, then the next report with iteration=0 will actually be logged as 101)
task.set_initial_iteration(engine.state.iteration)
Basically it is called automatically by ClearML (obviously only when you continue an aborted Task)
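To make that concrete, a minimal sketch of overriding the automatic offset, assuming the run is resumed with continue_last_task and the iteration is restored from your own checkpoint (project/task names are placeholders):
```python
from clearml import Task

# continue a previously aborted run (names here are placeholders)
task = Task.init(project_name='examples', task_name='my-training',
                 continue_last_task=True)

# override the automatic offset, e.g. with the iteration restored from a
# checkpoint (with ignite this would typically be engine.state.iteration)
last_iteration = 100
task.set_initial_iteration(last_iteration)
```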
Okay, what you can do is the following:
assuming you want to launch task id aabb12
The actual slurm command will be:
trains-agent execute --full-monitoring --id aabb12
You can test it on your local machine as well.
Make sure the trains.conf is available in the slurm job
(use trains-agent --config-file to point to a globally shared one)
What do you think?
Hi SmarmyDolphin68
You have two options:
1. Automatically upload the models during training: pass output_uri to Task.init. For example, output_uri=True will upload to the clearml-server, output_uri='s3://bucket/folder' will upload to S3, etc.
2. Manually upload a model that you have locally: https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/examples/reporting/model_config.py#L37
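A quick sketch of both options, assuming a PyTorch model and placeholder names (bucket, paths and framework are just examples):
```python
from clearml import Task, OutputModel

# Option 1: automatic upload - any model the framework saves is uploaded
# to the destination given in output_uri
task = Task.init(project_name='examples', task_name='train',
                 output_uri='s3://bucket/folder')  # or output_uri=True for the clearml-server

# Option 2: manually register and upload a model file you already have locally
output_model = OutputModel(task=task, framework='PyTorch')
output_model.update_weights(weights_filename='/path/to/model.pt')
```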
HealthyStarfish45 if I understand correctly, the trains-agent is running as a daemon (i.e. automatically pulling jobs and executing them). The only caveat is that cancelling the daemon will cause the Task it is currently executing to be canceled as well.
Other than that, sounds great!
HealthyStarfish45
Is there a way to tell a worker that it should not take new tasks? If there is such a feature, one could avoid the race condition
Still undocumented, but yes, you can tag it as disabled.
Let me check exactly how.
Simple git clone on that repo works well
On the machine running the trains-agent ?
(this is the part that is not in the background, so if the epoch is short it might have an effect)
basically
would allow blocking the machine from being scaled-in when
Oh this is what I was missing 🙂 That makes sense to me!
So what you are saying is: when the AWS autoscaler agent launches a Task, inside the container you will set the "protection flag", and when the Task ends you will unset the "protection flag".
Is that correct?
mostly by using Task.create instead of Task.init.
UnevenDolphin73, now I'm confused. Task.create is not meant to be used as a replacement for Task.init; it is there so you can manually create an additional Task (not the Task of the current process). How are you using it?
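For context, a minimal sketch of the intended use of each call (project, repo and script names are placeholders):
```python
from clearml import Task

# Task.init binds the *current* process to a Task (auto-logging, console capture, etc.)
task = Task.init(project_name='examples', task_name='my-experiment')

# Task.create only registers an *additional* Task entry, e.g. one you intend to
# enqueue for an agent later - it does not track the current process
extra_task = Task.create(project_name='examples', task_name='to-run-later',
                         repo='https://github.com/me/my-repo.git',
                         script='train.py')
```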
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
I think the main thing we need to...
Hi RipeGoose2
Yes, the slider feature is definitely on the to-do list (a lot of users asked for it).
Unfortunately other than actually PR-ing to the UI repo, there is no easy way to add customization (If you have an idea on how we could have an easy interface, that would be great.)
I'll check what the status is with the slider, maybe we will be lucky enough to see it in the next update 🙂
Is there any way to debug these sessions through clearml? Thanks!
Yes, this is a real problem, AWS does not make it easy to get that data...
Can you check the AWS console and see what you have there?
In theory this should have worked.
Maybe you are missing some escaping in the "extra_vm_bash_script"?
I'm hoping the console output will tell us
RattySeagull0 I think you are correct, python 3.6 is what is installed inside the docker. Is it important to have 3.7? You might need another docker image (or change the installation script and install python 3.7 inside)
This is odd... can you post the entire trigger code ?
also what's the clearml version?
are you referring to extra_docker_shell_script ?
Correct
the thing is that this runs before you create the virtual environment, so then in the new environment those settings are no longer there
Actually that is better, because this is exactly what we need to set up pip before it is used. So instead of passing --trusted-host
just do:
extra_docker_shell_script: ["echo "[global] \n trusted-host = pypi.python.org pypi.org files.pythonhosted.org YOUR_S...
EcstaticGoat95 any chance you have an idea on how to reproduce? (even 1 out of 6 is a good start)
NastyOtter17 can you provide some more info ?
Yes, in the UI, clone or reset the Task, then you can edit the installed packages section under the Execution tab
You do not need the cudatoolkit package, this is automatically installed if the agent is using conda as package manager. See your clearml.conf for the exact configuration you are running
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
Thanks JitteryCoyote63 !
Any chance you want to open a GitHub issue with the exact details, or fix it with a PR?
(I just want to make sure we fix it as soon as we can 🙂 )
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?
os.environ['TRAINS_PROC_MASTER_ID'] = args.trains_id
it should be '1:'+args.trains_id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
Also str(randint(1, sys.maxsize))
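For completeness, a minimal sketch of the corrected line with its imports (the task id is a placeholder for what args.trains_id would hold, and the randint string is the unique id suggested above, for wherever your setup needs one):
```python
import os
import sys
from random import randint

trains_id = 'aabb12'  # placeholder: normally args.trains_id from your argparse parser

# as noted above, the value should be prefixed with '1:' rather than the bare task id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(trains_id)

# a unique id string, as suggested above, if one is required elsewhere
unique_id = str(randint(1, sys.maxsize))
```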
OK, I got it by modifying the .conf file and putting the credentials on the node
Nice! 🙂
Regarding the missing packages, you might want to test with:
force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
Or if you have a full venv you'd like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
BTW:
What is the missing package?
I don't know how I would be able to get the description and name?
Good point, how about doing that in code? Then you have all the information and you can store it as JSON / pickle next to the data folder.
wdyt?
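For example, a minimal sketch of that idea (folder layout and field names are just placeholders):
```python
import json
from pathlib import Path

data_folder = Path('./my_dataset')  # hypothetical data folder
metadata = {
    'name': 'my_dataset',
    'description': 'raw images collected for the June experiment',  # placeholder text
}

# store the metadata right next to the data itself
with open(data_folder.parent / 'my_dataset_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
```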