Thanks @RoughSeaturtle43
server certificate verification failed. CAfile: none CRLfile: none
Oh I see, this is an HTTPS issue inside the container, you need to mount your self-signed certificate.
Add something like this to your agent.conf:
extra_docker_arguments: ["-v", "/path/to/cert.pem:/etc/ssl/certs/myca.pem"]
Funny enough I’m running into a new issue now.
Sorry, my bad, I thought you knew 😉 yes, it probably should be packages=["clearml==1.1.6"]
BTW: do you have any imports inside the pipeline function itself? If you do not, there is no need to pass "packages" at all, it will just add clearml.
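For reference, a minimal sketch of what that looks like (assuming the decorator-based pipeline API; add_function_step takes the same packages argument):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["n_rows"], packages=["pandas", "clearml==1.1.6"])
def count_rows(csv_path):
    # the import lives inside the step, so it is declared in "packages"
    import pandas as pd
    return len(pd.read_csv(csv_path))
If the function body had no imports of its own, "packages" could be dropped entirely and only clearml would be added.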
So without the flush I got the error apparently at the very end of the script -
Yes... it's a Python thing: at interpreter shutdown, background threads can get killed in a random order, so when one of them needs a background thread that has already died you get this error. It basically means the work needs to be done in the calling thread.
This actually explains why calling Flush solved the issue.
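For reference, a minimal sketch of flushing explicitly from the main thread before the script exits (project/task names here are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="flush-at-exit")
# ... training / reporting code ...
# flush from the calling thread while the background workers are still alive,
# so pending reports and uploads are not lost at interpreter shutdown
task.flush(wait_for_uploads=True)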
Nice!
os.environ['CLEARML_PROC_MASTER_ID'] = ''
Nice catch! (I'm assuming you also called Task.init somewhere before, otherwise I do not think this was necessary)
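Roughly what that looks like, as a sketch (the assumption being that clearing the variable in the subprocess makes Task.init create its own task instead of re-attaching to the master process's task; names are placeholders):
import os
from clearml import Task

# clear the master-process marker so this subprocess is not treated as a child
os.environ['CLEARML_PROC_MASTER_ID'] = ''
task = Task.init(project_name="examples", task_name="subprocess-task")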
I think I solved it by deleting the project and running the base_task once before the hyperparameter optimization
So is it working now? Is everything there?
Hi PompousBeetle71
Try this one, let me know if it helped:
import logging
logging.getLogger('trains.frameworks').setLevel(logging.ERROR)
I want to schedule bulk tasks to run via agents, so I'm running Task.create
I see, that makes sense.
especially when dealing with submodules,
BTW: the submodule diff should always get stored, can you provide some error logs from the failing cases?
Before manually modifying the diff:
If you have local commits (i.e. un-pushed) this might fail the diff apply, in that case you can set the following in your clearml.conf:
store_code_diff_from_remote: true
https://github.com/allegroai/clear...
Hi EnchantingOstrich20
How does ClearML get it there?
At runtime it analyzes the code you are running, looking for imports, then checks the versions you actually used (i.e. in the active venv / python) and lists them there.
You can also override those in code, or edit them after you clone the task and before you enqueue it for remote execution.
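A minimal sketch of overriding a detected package in code (the package name/version and task names below are placeholders; Task.add_requirements has to be called before Task.init):
from clearml import Task

# pin a specific requirement so it replaces the auto-detected entry
Task.add_requirements("torch", "1.13.1")
task = Task.init(project_name="examples", task_name="override-packages")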
Thanks ShakyJellyfish91 this really helps to narrow it down!
Let me see what I can find
Maybe permissions?!
you can test it manually by installing pynvml
and running:
from pynvml.smi import nvidia_smi
nvsmi = nvidia_smi.getInstance()
nvsmi.DeviceQuery('memory.free, memory.total')
it seems it's following the path of the script I'm using for task.create, e.g.:
The folder it should run in is the script path you are passing (i.e. "script=ep_fn,").
A wrong path would imply that it is not finding the correct repository, is that the case?
Can you see the repo itself ? the commit id ?
*Actually looking at the code, when you call Task.create(...) it will always store the diff from the remote server.
Could that be the issue?
To edit the Task's diff:
task.update_task(dict(script=dict(diff='DIFF TEXT HERE')))
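Putting it together, a rough sketch (project, repo URL and script path are placeholders):
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="bulk-task",
    repo="https://github.com/org/repo.git",
    branch="main",
    script="path/to/entry_point.py",
)
# overwrite (or clear) the stored uncommitted diff before enqueueing
task.update_task(dict(script=dict(diff='')))
Task.enqueue(task, queue_name="default")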
the easiest way possible would be if I could just somehow run the task and let the LSF manage the environment
You mean let the LSF set the conda/venv ? or do you also mean to get the code-base, changes etc ?
can you get the agent to execute the task in the current conda env without setting up a new environment?
Wouldn't that break easily ? Is this a way to avoid dockers, or a specific use case ?
is there any other way to get a task from the queue running locally in the current conda env?
You mean including cloning the code etc. but not installing any python packages ?
(currently I think the implementation expects that if the download completed, it was successful)
Failing when passing the diff to the git command...
It should be the last line (or almost the last) of the log, is it there? Also, from the log it seems you are using trains 0.14.3; try with trains 0.15 and let me know if you are still missing packages.
Hi @SuccessfulRaven86
I'm assuming this relates to the SaaS service.
API calls are a way to measure usage; basically metric reports are bunched into a single call, agent pings / queries are API calls, and so on and so forth.
For how many hours did you have training tasks reporting data? How many agents were running? And so on.
If you create an initial code base maybe we can merge it?
the parameter datatypes are not being changed when loading them up.
These are the auto-logged parameters, inside YOLO, correct?
Just to make sure, you can actually see the value None in the UI, is that correct? (if everything works as expected, you should see an empty string there)
Are Kwargs supported in functions decorated as a pipeline component?
They are, but I think the main issue is the casting; without prior knowledge, everything will be a string.
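A sketch of working around that, under the assumption that type hints on the component signature are used for casting (anything arriving only through **kwargs has no declared type, so expect strings):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["total"])
def scale(value: float = 1.0, repeat: int = 2, **kwargs):
    # annotated arguments can be cast back; kwargs values are cast manually
    factor = float(kwargs.get("factor", 1))
    return value * repeat * factor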
The issue is upload progress reporting for http uploads (object storage will report upload progress). Basically the http upload is a POST with urllib that does not support upload callbacks for progress reporting. If you have an idea here, we will gladly add it (as you mentioned, it can be quite annoying to have to open the network manager to verify the upload is progressing).
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff
Hi StickyWhale51
I think this issue is due to some internal race condition, anyhow I think we have an RC out solving it, can you try with:
pip install clearml==1.2.0rc2
Hi SmallDeer34
Can you see it in TB ? and if so where ?