Which would mean the error is because of a company firewall/self-signed certificate.
The easiest solution: disable the SSL certificate check for ClearML.
Create the ~/clearml.conf manually:
# disable SSL certificate check
api.verify_certificate: False
copy paste the credentials section from the UI
it should look something like:
api {
# web_server on port 8080
web_server: " "
# Notice: 'api_server' is the api server (default port 8008), not the web server.
api_server: ...
GiganticTurtle0 found it, fix will be pushed tomorrow 🙂
Hi SmugTurtle78
Unfortunately there is no actual filtering for these logs, because they are so important for debugging and visibility. I have to ask, what's the use case to remove some of them ?
Thanks JumpyPig73
Yeah this would explain it ... (if hydra is setting something else we can tap into that as well)
MagnificentSeaurchin79
"requirements.txt" is ignored if the Task has an "installed packages" section (i.e. not completely empty).
Task.add_requirements('pandas') needs to be called before Task.init() (I'll make sure there is a warning if called after).
DefiantHippopotamus88 you can create a custom endpoint and do that, but it will be running in the same instance. Is this what you are after? Notice that Triton actually supports it already; you can check the pytorch example.
Hi CleanPigeon16
Put the specific git into the "installed packages" section
It should look like: ... git+ ... (No need for the specific commit, you can just take the latest)
I assume clearml has some period of time after which it shows this message. Am I right?
Yes you are 🙂
is this configurable?
It is 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
It means it should work in ~/clearml.conf, no?
Yes exactly
I was hoping to be able to set the default server-wide
I think this type of server-wide default is not supported in the open-source version.
But in most cases, setting it up on the clearml-agents is probably the important thing. BTW: you can also set it with the OS environment variable CLEARML_DEFAULT_OUTPUT_URI
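A minimal sketch of the environment-variable route, assuming it runs before the process that should pick up the default. The helper name and the bucket URL are hypothetical; only the variable name CLEARML_DEFAULT_OUTPUT_URI comes from the message above.

```python
import os

# Hypothetical destination; replace it with your own bucket or file server URL.
DEFAULT_OUTPUT_URI = "s3://my-bucket/clearml-output"


def ensure_default_output_uri(uri: str = DEFAULT_OUTPUT_URI) -> str:
    """Set CLEARML_DEFAULT_OUTPUT_URI unless it is already defined.

    Returns the value that is in effect afterwards, so an existing
    setting (for example one exported in the shell) is never overwritten.
    """
    return os.environ.setdefault("CLEARML_DEFAULT_OUTPUT_URI", uri)
```

Calling ensure_default_output_uri() at the top of a launcher script keeps any value already exported on the agent machine and only fills in the fallback when nothing is set.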
Ohh so even easier:
print(client.workers.get_all())
Hi MelancholyElk85
Can I manually delete .zip files with datasets in the .clearml/cache/storage_manager/datasets directory?
Yes, you can. I "think" the .zip is stored for easier access, but you can delete it; as long as the "extracted" folder exists, it should be fine.
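If you want to script that cleanup, here is a rough sketch using only the standard library. The function name is made up for illustration, and it assumes the .zip files sit directly in the cache directory; it only touches top-level .zip files and leaves every folder (including the extracted ones) alone.

```python
from pathlib import Path


def remove_cached_dataset_zips(cache_dir, dry_run=True):
    """Delete top-level .zip files in cache_dir and leave folders untouched.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the list of .zip files it would remove.
    """
    cache_dir = Path(cache_dir).expanduser()
    zips = sorted(p for p in cache_dir.glob("*.zip") if p.is_file())
    if not dry_run:
        for zip_path in zips:
            zip_path.unlink()
    return zips


# Example (the "~/" prefix is an assumption about where the cache lives):
# remove_cached_dataset_zips("~/.clearml/cache/storage_manager/datasets", dry_run=False)
```

Running it once with dry_run=True first shows what would be removed before anything is actually deleted.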
ProudMosquito87 I think this is what you are looking for: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L101
Hmm is "model_monitoring_eps" another version of the model and it does not have all the properties of the "original" one?
Hmm, are you getting the warning on the client side, or in the clearml-server?
Check the log, the container has torch 1.13.0 but the task requires torch==1.13.1
Now, the torch packages inside those nvidia prepackaged containers are compiled a bit differently. What I suspect happens is that the torch wheel from pytorch is not compatible with this container. Easiest fix: change the task requirements to 1.13
Wdyt ?
SoggyFrog26 there is a full pythonic interface, why don't you use this one instead, much cleaner 🙂
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
I think the easiest and safest way for you is to actually have full control over the AMI, and recreate once from scratch.
Basically any ubuntu/centos + docker and docker-compose should do the trick, wdyt ?
/opt/clearml/data/fileserver is on the host machine, and it is mounted into the container at /mnt/fileserver
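To make the mapping concrete, here is a small sketch that translates a path seen inside the container back to the host path. It assumes the container mount point is /mnt/fileserver and the host directory is /opt/clearml/data/fileserver, as described above; the helper name and the example file path are hypothetical.

```python
from pathlib import PurePosixPath

# Host directory and container mount point, as described in the message above.
HOST_ROOT = PurePosixPath("/opt/clearml/data/fileserver")
CONTAINER_ROOT = PurePosixPath("/mnt/fileserver")


def container_to_host(path: str) -> str:
    """Translate a path under the container mount to its host location.

    Raises ValueError if the path is not under the container mount point.
    """
    relative = PurePosixPath(path).relative_to(CONTAINER_ROOT)
    return str(HOST_ROOT / relative)
```

For example, a file the container sees as /mnt/fileserver/examples/model.pkl would live at /opt/clearml/data/fileserver/examples/model.pkl on the host.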
ThickDove42 Windows conda python3.6 was exactly what I was using,
started the jupyter with:
"python -m jupyter notebook"
Then opened / created a new notebook, everything worked.
Tested on latest clearml 0.17.2
Maybe it's something with the path to the repo that breaks it? Because obviously the issue is it is looking in the wrong folder.
Hmm... That's what happens with the exception of None/'' if type is str... There is no way to differentiate in the UI.
This is why we opted for this behavior: type=str will "cast" everything to str so you always get str, while not specifying a type will leave the variable as is... If you have an idea on how to support both, feel free to suggest 🙂
I have to assume that I do not know the dataset ID
Sorry I mean:
datasets = Dataset.list_datasets(dataset_project="some_project")
for d in datasets:
d["version"] = Dataset.get(dataset_id=d["id"]).version
wdyt?
So could you re-explain, assuming my pipeline object is created by pipeline = PipelineController(...)?
pipe.add_step(name='stage_train', parents=['stage_process', ], monitor_artifact=['my_created_artifact'], base_task_project='examples', base_task_name='pipeline step 3 train model', parameter_override={'General/dataset_task_id': '${stage_process.id}'})
This will put the artifact names "my_created_artifact" from the step Tas...
So are you saying the large file size download is the issue? (i.e. network issues)
MysteriousBee56
Well, we don't want to ask for sudo permission automatically, and usually setups do not change, but you can definitely call this one before running the agent 😉
sudo chmod 777 -R ~/.trains/