Hi ZanySealion18,
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
If you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a filename does the same thing, but this is more readable)
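For context, a minimal sketch (not ClearML's internals) of what "use just your requirements.txt" means in practice — the `load_requirements` helper here is hypothetical, just to show cleaning the file into lines:

```python
# Hypothetical helper: read a requirements.txt into clean requirement lines.
def load_requirements(path):
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    # drop blank lines and comments
    return [ln for ln in lines if ln and not ln.startswith("#")]

# With ClearML you would then do something like (needs a clearml setup):
# from clearml import Task
# Task.force_requirements_env_freeze(requirements_file="requirements.txt")
```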
doing some extra "services"
what do you mean by "services"? (from the system perspective, any Task executed by an agent running in "services-mode" is a service; there are no actual limitations on what it can do 🙂)
Sure thing :)
BTW could you maybe PR this argument (marked out) so that we know for next time?
Can I assume that if we have two agents spinning the same experiment, your code will take it from there?
Is this true ?
hey, that worked! what library is being used that reads that configuration?
It's passed to boto3, but the Python interface and the AWS CLI use different configurations, I guess, because otherwise it should have worked...
Hmm, maybe we should add a check once the download is done, comparing the expected file size to the actual file size, and if they differ we should re-download?
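The idea above, as a hedged sketch (this is not ClearML's implementation; `download_fn` is a hypothetical callable that writes the file to `path`):

```python
import os

# Sketch: after a download finishes, compare expected vs. actual size
# and retry the download if they differ.
def download_with_size_check(download_fn, path, expected_size, retries=3):
    for _ in range(retries):
        download_fn(path)
        if os.path.getsize(path) == expected_size:
            return True
    return False
```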
Ohh ignore the YAML
maybe I should use explicit reporting instead of Tensorboard
It will do just the same 🙂
there is no method for setting last iteration, which is used for reporting when continuing the same task. maybe I could somehow change this value for the task?
Let me double check that...
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...
That is a very good point
but for the metrics, I explicitly pass th...
Create one experiment (I guess in the scheduler)
task = Task.init('test', 'one big experiment')
Then make sure the scheduler creates the "main" process as a subprocess (basically the default behavior).
Then the subprocess can call Task.init and it will get the scheduler's Task (i.e. it will not create a new task). Just make sure they all call Task.init with the same task name and the same project name.
could you send the entire log here?
i.e. from the "docker-compose" command line and onward
Sure thing. Anyhow, we will fix this bug, so in the next version there will be no need for a workaround (the workaround will still hold, so you won't need to change anything).
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming the certificate is installed on your local machine but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
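For reference, this is where the setting sits in a minimal clearml.conf (standard HOCON layout; only disable verification if you trust the server, e.g. a self-signed certificate):

```
# ~/clearml.conf
api {
    verify_certificate: false
}
```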
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
Hi ColossalDeer61 ,
Xxx is the module where my main experiment script resides.
So I think there are two options,
Assuming you have a similar folder structure:
- main_folder
-- package_folder
-- script_folder
--- script.py
Then if you set the "working directory" in the execution section to "." and the entry point to "script_folder/script.py", your code could do:
from package_folder import ABC
2. After cloning the original experiment, you can edit the "installed packages", and ad...
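Option 1 can be sandboxed locally to confirm the import resolves — a self-contained sketch that recreates the folder layout from the example above in a temp directory (the `ABC = 'hello'` content is just a placeholder):

```python
import os
import sys
import tempfile

# Recreate main_folder/package_folder and main_folder/script_folder,
# then verify that `from package_folder import ABC` works once
# main_folder (the "working directory") is on sys.path.
root = tempfile.mkdtemp()
main_folder = os.path.join(root, "main_folder")
os.makedirs(os.path.join(main_folder, "package_folder"))
os.makedirs(os.path.join(main_folder, "script_folder"))
with open(os.path.join(main_folder, "package_folder", "__init__.py"), "w") as f:
    f.write("ABC = 'hello'\n")

sys.path.insert(0, main_folder)  # what working directory "." gives the script
from package_folder import ABC
```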
Hi, what is host?
The IP of the machine running the ClearML server
Ohh I see, makes total sense. I'm assuming the code base itself can do both 🙂
Sorry that was a reply to:
Otherwise I could simply create these tasks with Task.init,
Hi JitteryCoyote63 , I have to admit, we have not thought of this scenario... what's the exact use case to clone a Task and change the type?
Obviously you can always change the task type; a bit of a hack, but it should work: task._edit(type='testing')
I'm assuming the reason it fails is that the docker network is only available inside the specific docker-compose. This means when you spin up another docker-compose, they do not share the same names. Just replace it with a host name or IP and it should work. Notice this has nothing to do with ClearML or serving; these are docker network configurations.
Another option is that the download fails (e.g. missing credentials on the client side, i.e. in clearml.conf)
JitteryCoyote63 you mean (notice no brackets): task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
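A tiny sketch of that normalization (the `all_req_lines` content here is made up; per the discussion, update_requirements accepts either a text blob or a list of lines):

```python
# Joining a list of requirement lines into a single text blob with '\n'
# is the safest form to pass along.
all_req_lines = ["clearml>=1.0", "numpy==1.24.0"]
requirements_text = "\n".join(all_req_lines)
# task.update_requirements(requirements_text)  # needs a live Task object
```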
Seems like a credentials error.
Do you have everything set up correctly in your ~/clearml.conf?
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait 🙂
task = Task.get_task('task_id_here')
task.mark_started(force=True)
task.upload_artifact(..., wait_on_upload=True)
task.mark_completed()
What's the error you are getting ?
(open the browser web developer, see if you get something on the console log)
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
Yes, or at least credentials and API...
Maybe inside your code you can later copy the model into a fixed location?
This way you have the model in the model repository and a copy in a fixed location (StorageManager can upload to a specific bucket/folder with the same credentials you already have)
Would that work?
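The "copy into a fixed location" idea, sketched with a local copy (the helper name is hypothetical; for a remote target, ClearML's StorageManager.upload_file with the same credentials does the equivalent upload):

```python
import os
import shutil

# Hypothetical helper: keep a copy of the latest checkpoint at a fixed
# path, in addition to whatever the model repository stores.
def copy_to_fixed_location(model_path, fixed_dir):
    os.makedirs(fixed_dir, exist_ok=True)
    dest = os.path.join(
        fixed_dir, "latest_model" + os.path.splitext(model_path)[1]
    )
    shutil.copy2(model_path, dest)
    return dest

# Remote analogue (needs clearml + credentials):
# from clearml import StorageManager
# StorageManager.upload_file(local_file=model_path,
#                            remote_url="s3://bucket/folder/model.pt")
```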