SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e. I suspect that the loss, which is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that.
Here I have to do it for each task; is there a way to do it for all tasks at once?
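Something like this rough sketch is what I have in mind for counting directly in Elasticsearch (the index pattern, port and field names are assumptions based on how the server seems to store scalar events, so treat them as guesses):

from elasticsearch import Elasticsearch

# Connect to the Elasticsearch instance backing the trains/clearml server (port is an assumption).
es = Elasticsearch(hosts=["http://localhost:9200"])

# Count events for a single series; uncomment the "task" term to restrict the count to one task.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"metric": "loss"}},        # metric name used here is just an example
                # {"term": {"task": "<task-id>"}},   # hypothetical task id filter
            ]
        }
    }
}

# Index pattern for scalar events is an assumption; check the actual indices on your server.
result = es.count(index="events-training_stats_scalar-*", body=query)
print(result["count"])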
But I would need to reindex everything, right? Is that an expensive operation?
I should also rename the /opt/trains/data/elastic_migrated_2020-08-11_15-27-05 folder to /opt/trains/data/elastic before running the migration tool, right?
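For the record, the rename I have in mind is just a directory move, along these lines (a sketch only, assuming the trains server is stopped first):

from pathlib import Path

# Paths taken from the message above; this assumes the server containers are stopped.
migrated = Path("/opt/trains/data/elastic_migrated_2020-08-11_15-27-05")
target = Path("/opt/trains/data/elastic")

if migrated.exists() and not target.exists():
    migrated.rename(target)  # same filesystem, so this is a metadata-only move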
The part where I'm lost is: why would you need the path to the temp venv the agent creates/uses?
Let's say my task calls a bash script, and that bash script calls another Python program. I want that last Python program to be executed with the environment that was created by the agent for this specific task.
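Something along these lines is what I mean (the script name is hypothetical): the task's entry point, which already runs inside the agent's venv, forwards its own interpreter to the bash script so the nested Python program can reuse it:

import os
import subprocess
import sys

# This is the script the agent runs inside the venv it created for the task.
# sys.executable points at that venv's Python interpreter, so we forward it to
# the bash script, which can then call "$TASK_PYTHON" other_program.py.
env = dict(os.environ, TASK_PYTHON=sys.executable)
subprocess.run(["bash", "my_pipeline.sh"], env=env, check=True)  # my_pipeline.sh is hypothetical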
AgitatedDove14 I eventually found a different way of achieving what I needed
We would be super happy to be able to document experiments (a new tab in the experiments UI) with a markdown editor!
Is there any logic on the server side that could change the iteration number?
BTW, I monkey-patched ignite's global_step_from_engine function to print the iteration, and passed the modified function to ClearMLLogger.attach_output_handler(…, global_step_transform=patched_global_step_from_engine(engine)). It prints the correct iteration number when ClearMLLogger.OutputHandler.__call__ is called.
def __call__(self, engine: Engine, logger: ClearMLLogger, event_name: Union[str, Events]) -> None:
    if not isinstance(logger, ClearMLLogger):
        ...
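For reference, a simplified sketch of what such a patched transform could look like (the import path of global_step_from_engine may differ between ignite versions):

from ignite.handlers import global_step_from_engine  # older ignite versions expose it under ignite.contrib.handlers

def patched_global_step_from_engine(engine):
    # Wrap ignite's original transform so every lookup also prints the step it returns.
    original_transform = global_step_from_engine(engine)

    def wrapper(_engine, event_name):
        step = original_transform(_engine, event_name)
        print(f"global_step_transform -> {step} for event {event_name}")
        return step

    return wrapper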
Ok, I guess I’ll just delete the whole loss series. Thanks!
I think my problem is that I am launching the experiment with python3.9 while expecting it to run in the agent with python3.8. The inconsistency is on my side; I should fix it and create the task with python3.8 with:
task.data.script.binary = "python3.8"
task._update_script(convert_task.data.script)
Or use python:3.9 when starting the agent
I have a mental model of the clearml-agent as a module to spin my code somewhere, and the Python version running my code should not depend on the Python version running the clearml-agent (especially for experiments running in containers).
Should I open an issue in the clearml-agent GitHub repo?
Just tried, still the same issue
then print(Task.get_project_object().default_output_destination) still prints the old value
Yes, perfect!!
This works well when I run the agent in virtualenv mode (i.e. removing --docker).
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I'd maintain a client Python package that takes care of the pre/post-processing of each request, so that I only send raw data to the inference service and post-process the raw model output it returns. But I understand why it might be desirable for users to have these steps happen on the server. What is challenging in this context? Defining how t...
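To illustrate what I mean by a client package, here is a purely hypothetical sketch (the endpoint URL and function names are made up):

import requests

INFERENCE_URL = "http://my-inference-service:8080/predict"  # hypothetical endpoint

def preprocess(request_payload: dict) -> dict:
    # Turn the user-facing request into the raw payload the model expects.
    return {"inputs": request_payload["values"]}

def postprocess(raw_output: dict) -> dict:
    # Turn the raw model output back into something meaningful for the caller.
    return {"prediction": raw_output["outputs"][0]}

def predict(request_payload: dict) -> dict:
    raw_input = preprocess(request_payload)
    response = requests.post(INFERENCE_URL, json=raw_input, timeout=10)
    response.raise_for_status()
    return postprocess(response.json())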
Hi CostlyOstrich36, most of the time I want to compare two experiments in the DEBUG SAMPLES section, so if I click on one sample to enlarge it I cannot see the others. Also, once I close the panel, the iteration number is not updated.
These images are actually stored there, and I can access them via the URL shared above (the one written in the pop-up message saying that these files could not be deleted).
What I put in the clearml.conf is the following:
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [" "]
agent.python_binary = python3.8
SuccessfulKoala55 They do have the right file path, e.g.: https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png
AgitatedDove14 I have a machine with two GPUs and one agent per GPU. I provide the same trains.conf to both agents, so they use the same directory for caching venvs. Could that be problematic?
I am using an old version of the AWS autoscaler, so the instance has the following user data executed:
echo "{clearml_conf}" >> /root/clearml.conf
...
python -m clearml_agent --config-file '/root/clearml.conf' daemon --detached --queue '{queue}' --docker --cpu-only
super, thanks SuccessfulKoala55 !
Will from clearml import Task raise an error if no clearml.conf exists? Or only when features that actually require the server to be defined (such as Task.init) are called?
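A quick way to check would be something like this sketch, run on a machine without a clearml.conf (project/task names are placeholders, and the comment about the import is an assumption to verify):

# Run this where no clearml.conf exists to see where the failure actually happens.
from clearml import Task  # the import itself should not need server configuration (assumption)

try:
    task = Task.init(project_name="sanity-check", task_name="no-conf")  # placeholder names
except Exception as err:
    print(f"Task.init is where the missing configuration surfaces: {err}")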
BTW, is there any specific reason for not upgrading to clearml?
I just didn't have time so far 🙂
Thanks! With this I’ll probably be able to reduce the cluster size to be on the safe side for a couple of months at least :)
but according to the disk graphs, the OS disk is being used, but not the data disk