Is it possible to get the folder with the artifacts/models?
You can directly get the artifacts/models URL and then deduce the folder:
```python
from clearml import Task

task = Task.get_task('my_task_id')
print(task.artifacts['my artifact'].url)
```
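For example, deducing the containing folder could be as simple as stripping the last path component (a minimal sketch; assumes the URL is path-like):
```python
import os

# the folder is everything up to the artifact file name
folder = os.path.dirname(task.artifacts['my artifact'].url)
print(folder)
```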
That should work 🙂
BTW, you might play around with "clearml-agent execute --id <task_id_here>"
This will basically clone the code, create a venv with the python packages, apply uncommitted changes and run the actual code. This could be a replacement for your bash script. (Notice it means you need to clone the Task in the UI first; then you can change parameters, then run the agent manually in SLURM and it will take the params from the UI.)
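If you prefer to do the cloning step from code instead of the UI, something like this should work (a sketch; the task id and the new name are placeholders):
```python
from clearml import Task

# clone an existing task so its parameters can be edited before execution
source = Task.get_task(task_id='my_task_id')
cloned = Task.clone(source_task=source, name='clone for SLURM run')
print(cloned.id)  # pass this id to: clearml-agent execute --id <id>
```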
Where did you add the Task.init call ?
Hi @<1544853721739956224:profile|QuizzicalFox36>
Sure, just change the ports in the docker-compose
Any updates on the trigger and schedule docs?
I think examples are already pushed, docs still in progress.
BTW: pipeline v2 examples are also out:
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/full_custom_pipeline.py
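For a flavor of what the scheduler interface looks like (a minimal sketch based on the examples above; the task id and queue names are placeholders):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# re-launch an existing task every day at 07:30 on the 'default' queue
scheduler.add_task(
    schedule_task_id='my_task_id',
    queue='default',
    hour=7,
    minute=30,
)
scheduler.start_remotely(queue='services')
```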
Why do you ask? is your server sluggish ?
Maybe the only thing to worry about is making sure the IP address is stable, so if k8s replaces the node, you do not have to reconfigure the clients 🙂
Hi JuicyDog96
The easiest way at the moment (apologies for the lack of RestAPI documentation, it is coming :) is actually the code (full docstring docs):
https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
You can access it all with an easy Pythonic interface, for example:
```python
from trains.backend_api.session.client import APIClient

client = APIClient()
tasks = client.tasks.get_all()
```
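get_all also accepts server-side filters; for instance (a sketch; the filter values are placeholders based on the v2.8 services linked above):
```python
# only completed tasks, newest first
tasks = client.tasks.get_all(
    status=['completed'],
    order_by=['-last_update'],
)
print([t.name for t in tasks])
```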
So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
```
ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
```
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker running out of space on the main disk (`/var/`) where it stores all the images and temp file systems
This will cause your code to fail, as any runtime change to the container file system will raise this out-of-disk-space error
If you set the package_manager to poetry then it will only use the lock files
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L53
If you clear the "Installed Packages" section, it will just use the "requirements.txt" in the repository itself.
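If you want the equivalent of clearing that section from code, something like this might work (a sketch; assumes a clearml version that has `Task.set_packages`, and the task id is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id='my_task_id')
# an empty list clears "Installed Packages", so the agent falls back to
# the repository's requirements.txt (or poetry lock files, per agent config)
task.set_packages([])
```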
What's the specific use case, and the problem we are trying to solve?
I can't seem to find a difference between the two, why would matplotlib get listed and pandas does not... Any other package that is missing?
BTW: as an immediate "hack", add the following before your Task.init call:
```python
Task.add_requirements("pandas")
```
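In context it would look something like this (a sketch; the project/task names are placeholders):
```python
from clearml import Task

# must be called before Task.init for the requirement to be registered
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="my task")
```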
SmarmySeaurchin8 check the logs, maybe you can find something there
Yep, this will run the pipeline controller itself on the clearml-server (or any other machine running clearml-agent services mode)
you can also check
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
Which will stop a local execution of a Task and re-launch it on a remote machine
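Usage is roughly (a sketch; the queue name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# stops the local run here and enqueues the task for a remote agent
task.execute_remotely(queue_name="default", exit_process=True)
```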
Where do you store those ?
Hi ReassuredTiger98
So let's assume we call:
```python
logger.report_image(title='training', series='sample_1', iteration=1, ...)
```
And we report every iteration (keeping the same title.series names). Then in the UI we could iterate back on the last 100 images (back in time) for this title / series.
We could also report a second image with:
```python
logger.report_image(title='training', series='sample_2', iteration=1, ...)
```
which means that for each one we will have 100 past images to review ( i.e. same ti...
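Putting it together, the reporting loop would look something like this (a sketch; the image file paths are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="image reporting")
logger = task.get_logger()

for i in range(100):
    # same title/series every iteration; the UI keeps the recent history per series
    logger.report_image(title='training', series='sample_1', iteration=i, local_path='sample_1.png')
    logger.report_image(title='training', series='sample_2', iteration=i, local_path='sample_2.png')
```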
Hi @<1523701868901961728:profile|ReassuredTiger98>
Anyone here with any idea why my service tasks get aborted when going to sleep?
I think I understand the issue: clearml==1.4.0
try running with the latest clearml (1.10.x)
It will keep pinging the backend "I'm alive", so the backend does not think this process is dead (which I suspect is what happened; after 2 hours the backend basically set the Task to aborted because it "thought" it was killed)
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting ? which clearml version are you using ?!
also tried manually adding `leap==0.4.1` in the task UI, which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like `Task.get_task("task_id_here").get_base_docker()` ?
Now a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and allow users to more easily create Tasks from code.
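For reference, the decorator interface looks roughly like this (a sketch; the names, project, and values are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data'])
def step_one():
    # each component runs as its own Task, no manual Task.init needed
    return [1, 2, 3]

@PipelineDecorator.pipeline(name='my pipeline', project='examples', version='1.0')
def my_pipeline():
    data = step_one()
    print(data)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    my_pipeline()
```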
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload ...
For setting up the trains-server I would recommend the docker-compose; it is very easy to set up, and you just need a single fixed compute instance. Details: https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
With regards to the "low prio clusters", are you asking how they could be connected with the trains-agent, or whether running code that uses trains will work on them?
```python
param = {'arg': value}
task.connect(param, section='new section')
# create pipeline here
pipeline
```
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumption
I'm assuming from the leftover processes?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
Hi AbruptCow41
I just want them to be able to write in them without them appearing either in their clearml.conf or in their environment variables.
So where would they put them ? (or is it pre baked into the docker?)
But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
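For comparison, the sys.path approach would be something like this (a sketch; `sibling_lib` and the relative layout are hypothetical):
```python
import os
import sys

# make the sibling package importable from this script's location
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
import sibling_lib  # hypothetical sibling package
```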
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes, I see your point, that does make sense (btw, removing the type hints will solve the issue)
Regardless, let me make sure this is solved
EFS get downloaded to the k8 pod local volume?
EFS is an Amazon service that mounts a persistent FS into EC2 instances. I believe they have support for k8s as a service as well, which would make it kind of like a PV, only as a service.
Does that make sense ?