JitteryCoyote63 so now everything works as expected ?
Will they get ordered ascending or descending?
Good point, I'll check the docs... but I think they do not specify
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
From the code it seems the order is not guaranteed.
You can however pass '-last_update' to order_by, which will give you the latest updated first:
```
task_filter = {
    'page_size': 2,
    'page': 0,
    'order_by': ['last_metrics.{}.{}'.format(title, series), '-last_update']
}
Task.get_tasks(...
```
I'm helping train my friend on ClearML to assist with his astrophysics research
If that's the case, what you can do is use the agent inside your sbatch script (fully open source). This means the sbatch script becomes `clearml-agent execute --id <task_id_here>`. This will set up the environment, monitor the job, and still allow you to launch it from slurm, wdyt?
OutrageousSheep60 so this should work, no?
```
ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
```
Notice the chunk size is the maximum size (in bytes) per chunk, so it should basically be very large
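For context, a minimal sketch of the full flow around that call; the dataset name, project, and local folder below are placeholders:
```python
from clearml import Dataset

# create a new dataset version (names and paths are just examples)
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="/data/my_local_folder")

# upload with no compression and a very large chunk size, as suggested above
ds.upload(output_url="gs://<BUCKET>/", compression=0, chunk_size=100000000000)
ds.finalize()
```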
this issue when trying to set up on our remote machines
You mean setting up the trains-server on a remote machine?
that might be it.
Is the web UI working properly ?
What ports are you using?
okay so the error should have been:
trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://<IP>:8008 ?
Not https nor 8010 ?!
curl seems okay, but this is odd https://<IP>:8010
it should be http://<IP>:8008
Could you change and test?
(meaning change the trains.conf and run `trains-agent list`)
the only port configurations that will work are 8080 / 8008 / 8081
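For reference, a minimal sketch of the relevant api section in trains.conf, assuming the default ports (replace <IP> with your server address):
```
api {
    web_server: http://<IP>:8080
    api_server: http://<IP>:8008
    files_server: http://<IP>:8081
}
```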
Thank you MuddyCrab47 !
Regarding model versioning:
All models are logged automatically by trains (no need to specify it, as long as you are using one of the automagically connected frameworks: PyTorch/Keras/TF/SKlearn)
You can see how it looks on the demoapp:
https://demoapp.trains.allegro.ai/projects/5371015f43f043b1b4ad7203c1ff4a95/models
Regarding Dataset management, we have a simple workflow demonstrated below, basically using artifacts as dataset storage, with very easy int...
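Roughly, the artifacts-as-dataset idea looks like this (using the trains package of that era; the same calls exist in clearml). Project, task, and artifact names are placeholders:
```python
from trains import Task

# "dataset" task: upload a local folder as an artifact
dataset_task = Task.init(project_name="datasets", task_name="raw data v1")
dataset_task.upload_artifact(name="dataset", artifact_object="/data/raw_folder")
dataset_task.close()

# training task: fetch the artifact and get a local copy of the folder
task = Task.init(project_name="examples", task_name="train")
data_task = Task.get_task(project_name="datasets", task_name="raw data v1")
local_folder = data_task.artifacts["dataset"].get_local_copy()
```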
JitteryCoyote63 no you should not (unless you already have the Task.init call in your code). clearml-data adds the Task.init call at the beginning of the code in the entry point.
This means you should be able to call Task.current_task() and get back the object.
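A minimal sketch of what that looks like from inside the entry point; nothing here is specific to your code:
```python
from clearml import Task

# when the Task.init call was injected for us, current_task() returns that Task
task = Task.current_task()
if task is not None:
    print(task.id, task.name)
```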
What do you have under the "uncommitted changes" on the Task that was created?
UnevenDolphin73 `clearml.config.get_remote_task_id()` will return the Task ID, not the Task object. In order to get automagic to work, one h...
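If all you need is the object for that ID (without the automagic hooks), something like this should work:
```python
from clearml import Task
from clearml.config import get_remote_task_id

# turn the remote task ID into a Task object
task = Task.get_task(task_id=get_remote_task_id())
```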
Hi JoyousElephant80
Another possibility would be to run a process somewhere that periodically polls ClearML Server for tasks that have recently finished
this is the easiest way to implement what you are after, and have full control over the logic itself.
Basically you inherit from the Monitor class
And implement the callback function:
https://github.com/allegroa...
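Very roughly, and based on the monitoring example in the repo, the subclass would look something like this; the polling period and the printout are just placeholders:
```python
from clearml.automation.monitor import Monitor

class FinishedTaskAlert(Monitor):
    # callback invoked for every newly detected task matching the monitor's filters
    def process_task(self, task):
        print("Recently finished task:", task.id, task.name)

monitor = FinishedTaskAlert()
monitor.monitor(pool_period=60.0)  # poll the server every 60 seconds
```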
I believe a process is still running in the background. Is it expected? (v0.17.4)
Yes it is expected.
Basically it reports that the resource monitoring did not detect any "iterations"/"steps" reporting, so instead of reporting resources based on iterations it reports based on time. Make sense ?
Task status change to "completed" is set after all artifacts upload is completed.
JitteryCoyote63 that seems like the correct behavior for your scenario
JitteryCoyote63 are you suggesting it happens ?
(obviously it should not 🙂 )
Hi CooperativeFox72 trains 0.16 is out, did it solve this issue? (btw: you can upgrade trains to 0.16 without upgrading the trains-server)
containing the Extension module
Not sure I follow, what is the Extension module? What were you running manually that is not just `pip install /opt/keras-hannd`?
Hi @<1603198134261911552:profile|ColossalReindeer77>
Hello! does anyone know how to do HPO when your parameters are in a Hydra config?
Basically hydra parameters are overridden with "Hydra/param"
(this is equivalent to the "override" option of hydra in CLI)
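For illustration, a sketch of how those "Hydra/..." names plug into the optimizer; the base task ID, parameter paths, and metric names are all placeholders:
```python
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange, RandomSearch,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the Task that uses Hydra
    hyper_parameters=[
        # "Hydra/..." prefixed names override the matching Hydra config entries
        UniformParameterRange("Hydra/optimizer.lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("Hydra/trainer.batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```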
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's on a branch already by SuccessfulKoala55 ) to fix the bug ElegantKangaroo44 found. should be RC next week
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker but had no credentials to do that, your solution is exactly what was needed to be done
Hi VexedCat68
(sorry I just saw the message)
I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step
So to do so you can do:
```
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and subtree of this node) we return False
    ...
    # we decided to skip so we return False
    return False

pipe.add_step(name='...
```
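Putting it together, a sketch of how the callback attaches to a step; the project, task, and parameter names below are made up for illustration:
```python
from clearml import PipelineController

def skip_if_low_score(a_pipeline, a_node, current_param_override):
    # hypothetical condition: skip this step when an upstream "score" is too low
    if float(current_param_override.get("General/score", 0)) < 0.5:
        return False  # False skips this node and its subtree
    return True

pipe = PipelineController(name="conditional_pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    parents=["evaluate"],
    parameter_override={"General/score": "${evaluate.parameters.General/score}"},
    pre_execute_callback=skip_if_low_score,
)
pipe.start(queue="services")
```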
DAG which gets scheduled at a given interval and
Yes, exactly what will be part of the next iteration of the controller/service
an example achieving what I propose would be greatly helpful
Would this help?
```
from trains.automation import TrainsJob

job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()

job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()
```
Okay, this is more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e. UI) queue,
sets up the slurm job,
and puts it in a pending queue (so you know the job is waiting in the slurm scheduler).
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help and set up a slurm glue in a similar manner
what do you think?
It's stored on the Task, you can see it under the Execution tab in the UI