logger.report_scalar("loss-train", "train", iteration=0, value=100)
logger.report_scalar("loss=test", "test", iteration=0, value=200)
Notice that the title of the graph is its unique id, so if you send scalars with the same "title" they will show on the same graph.
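For reference, a complete runnable version could look like this (project/task names are hypothetical); with the same title and two series, both curves end up on one graph:
```
from clearml import Task

# hypothetical project/task names - a minimal sketch
task = Task.init(project_name="examples", task_name="scalar-demo")
logger = task.get_logger()

# same title ("loss") => both series are drawn on the same graph
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
```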
JitteryCoyote63 Great to hear 🙂
BTW: Would it be possible to extend Task.init with a force_reuse that would enforce reusing these tasks?
You can pass continue_last_task=True
I think it should be equivalent to what you suggest
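A minimal sketch, assuming the task name matches the previous run (names are hypothetical):
```
from clearml import Task

# continue_last_task=True re-opens the previous task with the same
# project/task name instead of creating a new one
task = Task.init(
    project_name="examples",
    task_name="my-task",
    continue_last_task=True,
)
```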
Hi MortifiedDove27
I think you can resize the plot area in the UI (try to drag the horizontal separator)
Thank you @<1719524641879363584:profile|ThankfulClams64> for opening the GitHub issue, hopefully we will be able to reproduce it and fix it quickly
Hi @<1727497172041076736:profile|TightSheep99>
Yes it can, it will upload the meta-data as well as the files (it will also do de-dup and will not upload files that already exist in the dataset, based on the hash of the file content)
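A minimal sketch of that flow (dataset names, parent id, and local path are hypothetical):
```
from clearml import Dataset

# files whose content hash already exists in the parent version are
# de-duplicated and will not be uploaded again
ds = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    parent_datasets=["<parent-dataset-id>"],
)
ds.add_files("/path/to/data")
ds.upload()
ds.finalize()
```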
Thanks JitteryCoyote63 let me double check if there is a reason for that (there might be one, not sure)
Hi @<1715900760333488128:profile|ScaryShrimp33>
hi everyone! I'm trying to save my model's weights to storage. And I can't do it.
See example here: None
or
task.update_output_model(model_path="/path/to/model.pt")
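In context, a minimal sketch (project/task names and the weights path are hypothetical):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="save-weights")
# ... training happens, weights are saved locally ...
# register the local weights file as the task's output model
task.update_output_model(model_path="/path/to/model.pt")
```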
ShaggyHare67 are you saying the problem is trains fails discovering the packages in the manual execution?
Sure thing 🙂
BTW: ReassuredTiger98 this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically let's follow up on your setup:
Machine X: agent listening to queues A, B_machine_a (notice we have two agents here)
Machine Y: agent listening to queue B_machine_b
Now we (the users) will push our jobs into queues A and B
Now we have a service that does the following:
- see if we have a job in queue B
- check if machine Y is working...
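A rough sketch of how such a service could start, assuming the queue/worker names from the setup above; the exact filters and response fields are assumptions, not a tested implementation:
```
from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()

def queue_has_pending_jobs():
    # tasks waiting in any queue have status "queued";
    # filtering to a specific queue is left out for brevity
    pending = Task.get_tasks(task_filter={"status": ["queued"]})
    return len(pending) > 0

def machine_is_busy(worker_prefix):
    # workers report the task they are currently executing (empty when idle)
    return any(
        getattr(worker, "task", None)
        for worker in client.workers.get_all()
        if worker.id.startswith(worker_prefix)
    )
```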
Yes... I think that this might be a bit too much automagic, even for clearml 🙂
Thanks BoredHedgehog47!
And yes, if the Task.init() call was only in main.py, then the TB inside the subprocess (train.py) would, as you perceived, not be captured.
Did you by any chance test calling Task.init in both main.py and train.py?
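For illustration, a minimal two-file sketch (names are hypothetical); when Task.init runs inside the subprocess it re-attaches to the already-initialized task:
```
# main.py - a minimal sketch (names are hypothetical)
from clearml import Task
import subprocess

task = Task.init(project_name="examples", task_name="parent-run")
subprocess.run(["python", "train.py"], check=True)
```
```
# train.py - calling Task.init here re-attaches to the same task,
# so the TensorBoard output of the subprocess is captured as well
from clearml import Task

task = Task.init(project_name="examples", task_name="parent-run")
```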
Hi GrittyCormorant73
When I archive the pipeline and go into the archive and delete the pipeline, the artifacts are not deleted.
Which clearml-server version are you using? The artifact delete was only recently added
Thanks!
fyi: This section is not necessary if you have a clearml.conf file in ~/
Task.set_credentials(
    api_host="",
    web_host="",
    files_host="",
    key='********************',
    secret='***********************'
)
Let me check the code for a min
What should have happened is the experiments should have been pending (i.e. in a queue)
(Not sure why they are not).
You can manually send them for execution: right click on an experiment in the table, select enqueue, and select the default queue (this will be the one the trains-agent pulls from, by default).
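If you prefer doing it programmatically, a minimal sketch (the task id is hypothetical):
```
from clearml import Task

# enqueue an existing experiment into the default queue
task = Task.get_task(task_id="<task-id>")
Task.enqueue(task, queue_name="default")
```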
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough, the model has to live in RAM and then be moved to the GPU (...
Hi RipeGoose2
I just tested the hydra example; it seems to work when you add the offline mode right after the import:
```
from clearml import Task
Task.set_offline(True)
```
Make sure you have the S3 credentials in your agent's clearml.conf :
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L210
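The relevant part of clearml.conf looks roughly like this (values are placeholders; see the linked example file for the full list of fields):
```
sdk {
    aws {
        s3 {
            key: "<AWS_ACCESS_KEY>"
            secret: "<AWS_SECRET_KEY>"
            region: "<region>"
        }
    }
}
```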
It should move you directly into the queue pages.
Let me double check (working on the community server)
UnevenDolphin73 I have a suspicion we have a few terms mixed:
hyperparameters:
These are essentially key/value pairs.
When you call Task.connect(dict_with_params), clearml will flatten the dict and you end up with key/value pairs.
configuration objects:
These are actually blobs of text that the UI will show as-is.
When you call my_local_file = task.connect_configuration("path/to/config/file", name=name),
the entire content of the config file is stored on the Task object itself.
Back to the use case, instead ...
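Putting the two side by side, a minimal sketch (names/paths are hypothetical):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="params-vs-config")

# hyperparameters: the dict is flattened into key/value pairs
params = {"lr": 0.001, "batch_size": 32}
task.connect(params)

# configuration object: the file content is stored as a text blob on the task
my_local_file = task.connect_configuration("path/to/config/file", name="my-config")
```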
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of a snapshot of the EBS volume, then waits until it is finished, then restarts the clearml-server, wdyt?
I'm pretty sure there is a nice way, let me check something
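Something along those lines could work as a starting point; a rough sketch (the volume id is hypothetical, and the status filter / compose invocation may need adjusting):
```
import subprocess
import boto3
from clearml import Task

VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical EBS volume id

# any tasks still running or waiting in a queue?
busy = Task.get_tasks(task_filter={"status": ["in_progress", "queued"]})
if not busy:
    subprocess.run(["docker-compose", "down"], check=True)

    ec2 = boto3.client("ec2")
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID, Description="clearml-server backup")
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    subprocess.run(["docker-compose", "up", "-d"], check=True)
```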
Hi @<1695969549783928832:profile|ObedientTurkey46>
Why do tags only show on a version level, but not on the dataset-level? (see images)
Tags of datasets are tags on "all the dataset versions", i.e. they help someone locate datasets (think locating projects as an analogy). Dataset version tags are tags on a specific version of the dataset, helping users locate a specific version. Does that make sense?
It seems you are getting 401 Unauthorized; is this the same domain? I'm assuming the issue is that the logged-in cookie is not sent?
Why can I only call import_model
import_model actually creates a new Model object in the system.
InputModel(id) will "load" a model based on the model id.
Make sense?
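A minimal sketch (the model id is hypothetical):
```
from clearml import InputModel

# "load" an existing model from the system by its id
model = InputModel(model_id="<model-id>")
weights_path = model.get_weights()  # downloads and returns a local path to the weights
```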
Should work out of the box, as long as the task was started. You can forcefully start the task with:
task.mark_started()
See if this helps