I think my problem is that I am launching an experiment with python3.9 and I expect it to run in the agent with python3.8. The inconsistency is on my side, I should fix it and create the task with python3.8 with:
task.data.script.binary = "python3.8"
task._update_script(convert_task.data.script)
Or use python:3.9 when starting the agent
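For the second option, the agent command I have in mind is roughly this (queue name is just an example):
clearml-agent daemon --queue default --docker python:3.9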
I have a mental model of the clearml-agent as a module that spins up my code somewhere, and the python version running my code should not depend on the python version running the clearml-agent (especially for experiments running in containers)
Should I open an issue in github clearml-agent repo?
Just tried, still the same issue
then print(Task.get_project_object().default_output_destination) still shows the old value
Yes, perfect!!
This works well when I run the agent in virtualenv mode (i.e. removing --docker)
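For reference, this is roughly how I start it in each mode (queue name is just an example):
# virtualenv mode, works for me
clearml-agent daemon --queue default
# docker mode, where I hit the issue
clearml-agent daemon --queue default --docker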
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I’d maintain a client python package that takes care of the pre/post-processing of each request, so that I only send the raw data to the inference service and post-process the raw model output it returns. But I understand why it might be desirable for users to have these steps happen on the server. What is challenging in this context? Defining how t...
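Roughly the kind of client-side wrapper I have in mind, just to illustrate the split (all names and payload shapes here are hypothetical):
import requests

class InferenceClient:
    # hypothetical client package: pre/post-processing stays on the client side
    def __init__(self, endpoint_url):
        self.endpoint_url = endpoint_url

    def predict(self, raw_input):
        payload = self._preprocess(raw_input)  # client-side preprocessing
        response = requests.post(self.endpoint_url, json=payload)
        response.raise_for_status()
        return self._postprocess(response.json())  # client-side postprocessing

    def _preprocess(self, raw_input):
        return {"data": raw_input}  # placeholder transformation

    def _postprocess(self, raw_output):
        return raw_output.get("predictions")  # placeholder transformation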
Hi CostlyOstrich36 , most of the time I want to compare two experiments in the DEBUG SAMPLES section, so if I click on one sample to enlarge it I cannot see the others. Also, once I close the panel, the iteration number is not updated
I could delete the files manually with sudo rm (sudo is required, otherwise I get Permission Denied)
These images are actually stored there and I can access them via the url shared above (the one written in the pop up message saying that these files could not be deleted)
What I put in the clearml.conf is the following:
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [""]
agent.python_binary = python3.8
I can also access these files directly if I enter the url in the browser
SuccessfulKoala55 They do have the right filepath, eg:https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png
AgitatedDove14 I have a machine with two gpus and one agent per GPU. I provide the same trains.conf to both agents, so they use the same directory for caching venvs. Can it be problematic?
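If sharing the cache is a problem, I guess I could give each agent its own config file with its own venvs directory, something along these lines (paths are just examples, and I'm assuming the agent.venvs_dir setting from the agent config):
# clearml_gpu0.conf
agent.venvs_dir = ~/.clearml/venvs-builds-gpu0
# clearml_gpu1.conf
agent.venvs_dir = ~/.clearml/venvs-builds-gpu1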
I am using an old version of the aws autoscaler, so the instance has the following user data executed:
echo "{clearml_conf}" >> /root/clearml.conf
...
python -m clearml_agent --config-file '/root/clearml.conf' daemon --detached --queue '{queue}' --docker --cpu-only
super, thanks SuccessfulKoala55 !
Will from clearml import Task raise an error if no clearml.conf exists? Or only when features that actually require the server (such as Task.init) are called?
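To make the question concrete, this is the situation I have in mind (project/task names are just examples):
# does this line alone already need a clearml.conf?
from clearml import Task

# ...or is the config only required here, when the SDK actually needs the server?
task = Task.init(project_name="demo", task_name="no-conf-test")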
BTW, is there any specific reason for not upgrading to clearml?
I just didn't have time so far 🙂
I managed to do it by using logger.report_scalar, thanks!
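For reference, this is roughly what that looks like (title/series/values are just examples):
from clearml import Task

task = Task.init(project_name="demo", task_name="scalar-report-example")
logger = task.get_logger()
for iteration, loss in enumerate([0.9, 0.7, 0.5]):
    # report one scalar series under a single plot title
    logger.report_scalar(title="loss", series="train", value=loss, iteration=iteration)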
Thanks! With this I’ll probably be able to reduce the cluster size to be on the safe side for a couple of months at least :)
but according to the disk graphs, the OS disk is being used, but not the data disk
Seems like it just went unresponsive at some point
But you might want to double check
AgitatedDove14 https://clear.ml/docs/latest/docs/apps/clearml_session/#running-in-docker in the docs there is a --docker option, that’s what confuses me, since the agent should always run in docker mode
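Concretely, I mean something like this from that docs page (image name is just an example):
clearml-session --queue default --docker python:3.8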
SuccessfulKoala55 For the last 2 hours I have been getting 504 errors and I cannot ssh into the machine. AWS reports that the instance health checks fail. Is it safe to restart the instance?
There’s a reason for the ES index max size
Does ClearML enforce a max index size? What typically happens when that limit is reached?
SuccessfulKoala55 I am looking for ways to free some space and I have the following questions:
- Is there a way to break down all the documents to identify the biggest ones?
- Is there a way to delete several :monitor:gpu and :monitor:machine time series?
- Is there a way to downsample some time series (eg. loss)?
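For the first question, I guess I can at least check which Elasticsearch indices are the biggest directly, something like this (assuming the ES port from the default docker-compose setup is reachable):
curl -s 'http://localhost:9200/_cat/indices?v&s=store.size:desc'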