Hi ZanySealion18,
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
If you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a filename does the same thing, but this is more readable)
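For context, a minimal sketch (not ClearML's internals) of what "use just your requirements.txt" means in practice — the `load_requirements` helper here is hypothetical, just to show cleaning the file into lines:

```python
# Hypothetical helper: read a requirements.txt into clean requirement lines.
def load_requirements(path):
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    # drop blank lines and comments
    return [ln for ln in lines if ln and not ln.startswith("#")]

# With ClearML you would then do something like (needs a clearml setup):
# from clearml import Task
# Task.force_requirements_env_freeze(requirements_file="requirements.txt")
```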
doing some extra "services"
what do you mean by "services"? (from the system perspective, any Task executed by an agent running in "services-mode" is a service; there are no actual limitations on what it can do 🙂)
Sure thing :)
BTW could you maybe PR this argument (marked out) so that we know for next time?
Can I assume that if we have two agents spinning the same experiment, your code will take it from there?
Is this true ?
hey, that worked! what library is being used that reads that configuration?
It's passed to boto3, but the Python interface and the AWS CLI use different configurations, I guess, because otherwise it should have worked...
Hmm, maybe we should add a check once the download is done, comparing the expected file size to the actual file size, and if they differ we should re-download?
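The idea above, as a hedged sketch (this is not ClearML's implementation; `download_fn` is a hypothetical callable that writes the file to `path`):

```python
import os

# Sketch: after a download finishes, compare expected vs. actual size
# and retry the download if they differ.
def download_with_size_check(download_fn, path, expected_size, retries=3):
    for _ in range(retries):
        download_fn(path)
        if os.path.getsize(path) == expected_size:
            return True
    return False
```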
Ohh ignore the YAML
maybe I should use explicit reporting instead of Tensorboard
It will do just the same 🙂
there is no method for setting last iteration, which is used for reporting when continuing the same task. maybe I could somehow change this value for the task?
Let me double check that...
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...
That is a very good point
but for the metrics, I explicitly pass th...
Create one experiment (I guess in the scheduler)
task = Task.init('test', 'one big experiment')
Then make sure the scheduler creates the "main" process as a subprocess (basically the default behavior).
Then the subprocess can call Task.init and it will get the scheduler's Task (i.e. it will not create a new task). Just make sure they all call Task.init with the same task name and the same project name.
could you send the entire log here?
i.e. from the "docker-compose" command line and onward
Sure thing. Anyhow, we will fix this bug, so in the next version there will be no need for a workaround (the workaround will still hold, so you won't need to change anything).
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming the certificate is installed on your local machine but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
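For reference, this is where the setting sits in a minimal clearml.conf (standard HOCON layout; only disable verification if you trust the server, e.g. a self-signed certificate):

```
# ~/clearml.conf
api {
    verify_certificate: false
}
```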
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
Hi ColossalDeer61 ,
Xxx is the module where my main experiment script resides.
So I think there are two options,
Assuming you have a similar folder structure:
- main_folder
-- package_folder
-- script_folder
--- script.py
Then if you set the "working directory" in the execution section to "." and the entry point to "script_folder/script.py", your code could do:
from package_folder import ABC
2. After cloning the original experiment, you can edit the "installed packages", and ad...
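Option 1 can be sandboxed locally to confirm the import resolves — a self-contained sketch that recreates the folder layout from the example above in a temp directory (the `ABC = 'hello'` content is just a placeholder):

```python
import os
import sys
import tempfile

# Recreate main_folder/package_folder and main_folder/script_folder,
# then verify that `from package_folder import ABC` works once
# main_folder (the "working directory") is on sys.path.
root = tempfile.mkdtemp()
main_folder = os.path.join(root, "main_folder")
os.makedirs(os.path.join(main_folder, "package_folder"))
os.makedirs(os.path.join(main_folder, "script_folder"))
with open(os.path.join(main_folder, "package_folder", "__init__.py"), "w") as f:
    f.write("ABC = 'hello'\n")

sys.path.insert(0, main_folder)  # what working directory "." gives the script
from package_folder import ABC
```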
Hi, what is host?
The IP of the machine running the ClearML server
Ohh I see, makes total sense. I'm assuming the code base itself can do both 🙂
Sorry that was a reply to:
Otherwise I could simply create these tasks with Task.init,
Hi JitteryCoyote63 , I have to admit, we have not thought of this scenario... what's the exact use case to clone a Task and change the type?
Obviously you can always change the task type; a bit of a hack, but it should work: task._edit(type='testing')
I'm assuming the reason it fails is that the docker network is only available inside the specific docker-compose. This means when you spin up another docker-compose, they do not share the same names. Just replace it with a host name or IP and it should work. Notice this has nothing to do with ClearML or serving; these are docker network configurations.
Another option is that the download fails (e.g. missing credentials on the client side, i.e. in clearml.conf)
JitteryCoyote63 you mean (notice no brackets): task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
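A tiny sketch of that normalization (the `all_req_lines` content here is made up; per the discussion, update_requirements accepts either a text blob or a list of lines):

```python
# Joining a list of requirement lines into a single text blob with '\n'
# is the safest form to pass along.
all_req_lines = ["clearml>=1.0", "numpy==1.24.0"]
requirements_text = "\n".join(all_req_lines)
# task.update_requirements(requirements_text)  # needs a live Task object
```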
Seems like a credentials error.
Do you have everything set up correctly in your ~/clearml.conf?
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait 🙂
task = Task.get_task('task_id_here')
task.mark_started(force=True)
task.upload_artifact(..., wait_on_upload=True)
task.mark_completed()
What's the error you are getting ?
(open the browser web developer, see if you get something on the console log)
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
Yes, or at least credentials and API...
Maybe inside your code you can later copy the model into a fixed location?
This way you have the model in the model repository and a copy in a fixed location (StorageManager can upload to a specific bucket/folder with the same credentials you already have)
Would that work?
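The "copy into a fixed location" idea, sketched with a local copy (the helper name is hypothetical; for a remote target, ClearML's StorageManager.upload_file with the same credentials does the equivalent upload):

```python
import os
import shutil

# Hypothetical helper: keep a copy of the latest checkpoint at a fixed
# path, in addition to whatever the model repository stores.
def copy_to_fixed_location(model_path, fixed_dir):
    os.makedirs(fixed_dir, exist_ok=True)
    dest = os.path.join(
        fixed_dir, "latest_model" + os.path.splitext(model_path)[1]
    )
    shutil.copy2(model_path, dest)
    return dest

# Remote analogue (needs clearml + credentials):
# from clearml import StorageManager
# StorageManager.upload_file(local_file=model_path,
#                            remote_url="s3://bucket/folder/model.pt")
```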