NastyFox63, ask SuccessfulKoala55 tomorrow. I think there is a way to change the default settings even with the current version.
(I.e. increase the default 100 entries limit)
Nicely found @<1595587997728772096:profile|MuddyRobin9> !
Hi MistakenDragonfly51
I'm trying to set
default_output_uri
in
This should be set either on your client side, or on the worker machine (running the clearml-agent).
Make sense ?
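If it helps, a minimal sketch of the relevant clearml.conf section (the URI below is just a placeholder):
```
# clearml.conf on the client machine, or on the machine running clearml-agent
sdk {
    development {
        # output models / artifacts from tasks started on this machine are uploaded here
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```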
GorgeousSeagull44 I think this should have worked (basically replacing all the links in MongoDB with the new IP)
assuming you have hparams.my_param, my suggestion is:
```python
from clearml import Task
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="solver/config", config_name="config")
def train(hparams: DictConfig):
    task = Task.init(hparams.task_name, hparams.tag)
    overrides = {'my_param': hparams.value}
    task.connect(overrides, name='overrides')
    # when running remotely this will print the value we put in "overrides/my_param"
    print(overrides['my_param'])
    # now we actually use overrides['my_param']
```
Make sense ?
That experiment says it's completed, does it mean that the autoscaler is running or not?
Not running, it will be "running" if actually being executed
- Yes, the challenge is mostly around defining the interface. Regarding packaging, I'm thinking a similar approach to the pipeline decorator, wdyt?
- ClearML agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment; at the end it will be kubectl that users will have to call in order to spin up the containers with the agents. Maybe a simple CLI to do that for you?
one of the two experiments for the worker that is running both experiments
So this is the actual bug ? I need some more info on that, what exactly are you seeing?
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
Docker would recognise that image locally and just use it right? I won't need to update that image often anyway
Correct 🙂
Hi RoundMole15
What exactly triggers the "automagic" logging of the model and weights?
A framework save call, for example torch.save or joblib.dump
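For example, a minimal sketch (project / task names here are just placeholders):
```python
from clearml import Task
import torch

task = Task.init(project_name="examples", task_name="auto model logging")

model = torch.nn.Linear(10, 2)
# torch.save is patched by Task.init, so this checkpoint is automatically
# picked up and registered as an output model of the task
torch.save(model.state_dict(), "model.pt")
```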
I've pulled my simple test project out of jupyter lab and the same problem still exists,
What is "the same problem" ?
from clearml import TaskTypes
That will only work if you are using the latest from GitHub, I guess the example code was modified before a stable release ...
. I can't find any actual model files on the server though.
What do you mean? Do you see the specific models in the web UI? is the link valid ?
Yes, that seems to be the case. That said, they should have different worker IDs, agent-0 and agent-1 ...
What's your trains-agent version ?
Seems like the server returned a 400 error; verify that you are working with your trains-server and not the demo server :)
Hi ElegantCoyote26
sometimes the agents load an earlier version of one of my libraries.
I'm assuming some internal package that is installed from a wheel file, not a direct git repo+commit link?
Actually, looking at the code, when you call Task.create(...) it will always store the diff from the remote server.
Could that be the issue?
To edit the Task's diff:
task.update_task(dict(script=dict(diff='DIFF TEXT HERE')))
DAG which get scheduled at given interval and
Yes exactly what will be part of the next iteration of the controller/service
an example achieving what i propose would be greatly helpful
Would this help?
```python
from trains.automation import TrainsJob

job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()

job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()
```
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
Were you able to pass the 'clearml-init' configuration? It verifies your credentials against the API server
I'm sorry, I mean if the queue name is not provided to the agent, the agent will look for the queue with the "default" tag. If you are specifying the queue name, there is no need to add the tag.
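For reference, spinning an agent on an explicitly named queue looks something like this (queue name is a placeholder):
```
clearml-agent daemon --queue my_queue
```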
Is it working now?
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using
ssh -A
and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh folder into the container for git to use
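Forcing SSH cloning is a single flag in the agent's clearml.conf, roughly like this (a sketch, not a full config):
```
agent {
    # convert http(s) git URLs to SSH so the mounted ~/.ssh keys are used for cloning
    force_git_ssh_protocol: true
}
```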
Ba...
Should I map the poetry cache volume to a location on the host?
Yes, this will solve it! (maybe we should have that automatically if using poetry as package manager)
Could you maybe add a github issue, so we do not forget ?
Meanwhile you can add the mapping here:
https://github.com/allegroai/clearml-agent/blob/bd411a19843fbb1e063b131e830a4515233bdf04/docs/clearml.conf#L137
extra_docker_arguments: ["-v", "/mnt/cache/poetry:/root/poetry_cache_here"]
Hi AstonishingRabbit13
now I'm training yolov5 and I want to save all the info (model and metrics) with clearml to my bucket..
The easiest thing (assuming you are running YOLOv5 with python train.py) is to add the following env variable:
CLEARML_DEFAULT_OUTPUT_URI=" " python train.py
Notice that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
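A sketch of that section of clearml.conf, with placeholder project / path values:
```
sdk {
    google.storage {
        # default credentials used for all gs:// links (values are placeholders)
        project: "my-gcp-project"
        credentials_json: "/path/to/credentials.json"
    }
}
```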
My model files are also there, just placed in some usual non-shared linux directory.
So this is the issue: how would the container get to these models? You either need to mount the folder into the container,
or push them to the ClearML model repo with the OutputModel class, does that make sense ?
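Something along these lines should do it (project / task names and the file path are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="register model")

# register a local weights file with the ClearML model repository;
# it is uploaded to the task's output destination (files server or your bucket)
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="/path/to/model.pt")
```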
GiddyTurkey39 I think I need some more details, what exactly is the scenario here?