Hi DilapidatedDucks58,
Are you running in docker or venv mode?
Do the workers share a folder on the host machine?
It might be a syncing issue (not directly related to the trains-agent, but to the fact that you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support 🙂
- Could we add a comparison feature directly from the search results (Dashboard view -> search -> highlight some experiments for comparison)?
Totally forgot about the global search feature, hmm I'm not sure the webapp is in the correct "state" for that, i.e. I think the selection only works in "table view", which is the "all experiments" flat table
- Could we add a filter on the project name in the "All Experiments" project?
You mean "filter by project" ?
Could we ad...
Hi @<1561885921379356672:profile|GorgeousPuppy74>
Please use threads to ask questions, so we keep everything tidy
(and if you can, please remove your first message and merge it with the one above, then edit that one, for better readability)
regarding the issue, you need to have clearml.conf in your Home folder, and I'm assuming this is /root/,
not /home/ubuntu/.
Also not sure why you need to expose ports...
Hi WackyRabbit7
the services agent (or the agent running there) is spinning multiple Tasks (as opposed to a regular agent, where it is one task at a time).
how can I give this agent git access?
in the docker-compose you can configure the git credentials (user/pass or user/key, it is the same).
https://github.com/allegroai/clearml-server/blob/d0e2313a24eb1248ebf0ddf31bf589de0d675562/docker/docker-compose.yml#L137
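For reference, a minimal sketch of the relevant section (env var names are taken from the linked compose file; the values here are placeholders you need to replace):

agent-services:
  environment:
    CLEARML_AGENT_GIT_USER: my-git-user              # placeholder: your git user
    CLEARML_AGENT_GIT_PASS: my-personal-access-token # placeholder: password or access token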
Hi UnevenDolphin73
Can one compare experiments/tasks from different projects?
Yes, the easiest way is to go to the parent project ("all projects" if they have no common parent), then search for the specific Tasks (i.e. filter or use the search bar), then multi-select them.
wdyt?
oh dear ...
ScrawnyLion96 let me check with front-end guys 😞
Hi TrickyRaccoon92
Are you sure plotly (the front-end module displaying the plots in the UI) supports it?
Hi SteadySeagull18
However, it seems to be entirely hanging here in the "Running" state.
Did you set an agent to listen to the "services" queue?
Someone needs to run the pipeline logic itself; it is sometimes part of the clearml-server deployment, but not a must
CleanPigeon16 , just making sure, docker is installed and configured on the host machine (i.e. Azure machine)?
Hi CheerfulGorilla72
see
Notice all posts on that channel are @ channel 🙂
he said it was something in the nginx config though
That makes sense 🙂
I think what you are looking for is clearml-agent daemon
https://clear.ml/docs/latest/docs/clearml_agent
https://clear.ml/docs/latest/docs/getting_started/video_tutorials/agent_remote_execution_and_automation
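e.g. something like (the queue name is just an example):

clearml-agent daemon --queue default --docker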
TenseOstrich47 you can actually enter this script as part of the extra_docker_shell_script
This will be executed at the beginning of each Task inside the container, and as long as the execution time is under 12h, you should be fine. wdyt?
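For example, a minimal clearml.conf sketch (the shell lines are placeholders; put your own script's commands there):

agent {
    # executed inside the container before the Task starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y some-package"]
}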
PanickyMoth78 'tensorboard_logger' is an old, deprecated package that was meant to create TB events without TB; it was created before TB was a separate package. Long story short, it is not supported. That said, if you just run the same code and replace tensorboard_logger with tensorboard, you should see all scalars in the UI
background:
ClearML logs TB events as they are created, in real time; tensorboard_logger is not TB, it creates events and dumps them directly into a TB-equivalent event file
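For example, a minimal sketch of the swap (project/task names and values are placeholders):

# instead of tensorboard_logger's configure()/log_value(),
# use the standard SummaryWriter - ClearML picks its events up automatically
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="training")
writer = SummaryWriter("runs/exp1")
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)  # appears under Scalars in the UI
writer.close()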
CourageousLizard33 if the two series are on the same graph, just click on the series in the legend, you can enable/disable it, and the scale will adjust automatically.
Regarding grouping, this is a feature that can be turned off. The idea is that we split the tag to title/series... So if you have the same prefix you get to group the TF scalars on the same graph, otherwise they will be on a different title graph. That said, you can force it to have a series per graph like in TB. Makes sense?
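To illustrate the tag split (a sketch; the names are placeholders) - the prefix becomes the graph title and the suffix the series:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_scalar("loss/train", 0.3, 1)  # graph "loss", series "train"
writer.add_scalar("loss/val", 0.4, 1)    # same graph "loss", series "val"
writer.add_scalar("accuracy", 0.9, 1)    # no prefix - its own graph
writer.close()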
well I do not think you set your pytorch lightning to use cuda:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
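i.e. per the warning, something like (a minimal sketch; `model` stands in for your LightningModule):

from lightning.pytorch import Trainer

trainer = Trainer(accelerator="gpu", devices=1)  # actually use the available CUDA device
trainer.fit(model)  # assumption: `model` is your LightningModule instance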
You might also be able to find out exactly what needs to be pickled using the f_code of the function (but that's limited to the CPython implementation).
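For example (a sketch; the function and variables are made up):

def make_fn():
    data = [1, 2, 3]
    def helper(x):
        return x + sum(data)
    return helper

fn = make_fn()
# fn.__code__ is the function's code object (f_code is the same object on a frame)
print(fn.__code__.co_freevars)  # ('data',) - closure variables that would need pickling
print(fn.__code__.co_names)     # ('sum',) - global names the function references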
Nice!
We are using k8s glue to spawn the job. ...
I think this is actual network latency, nothing to do with the jobs, could it be the server is very far away?
What happens when you manually start a Task from your machine ?
Is the latency fixed? Is it just when starting a new Task?
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
But the provided command is missing the url target for the curl so it is not complete.
Not sure I followed. Did you specify "NEW_ADDRESS"?
or is it that in both cases the URL is localhost?
should be the full path, or just the file name?
just the file name, this is basically fname matching
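i.e. standard wildcard matching applied to the name only, along the lines of:

import fnmatch

print(fnmatch.fnmatch("model_best.pkl", "model_*"))  # True
print(fnmatch.fnmatch("model_best.pkl", "*.ckpt"))   # False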
This is sitting on top of the serving engine itself, acting as a control plane.
Integration with GKE is being worked on (basically KFServing as the serving engine)
What do you mean by a custom queue?
In the queues page you have a plus button, this will just create a new queue
WickedGoat98
The trains-agent-services docker is always CPU, the idea is put long lasting services there (like the auto cleanup or slack integration or HPO etc.)
To spin an agent with GPU on any machine (regardless of where the trains-server is) you can check the trains-agent readme.
https://github.com/allegroai/trains-agent#running-the-trains-agent
The difference is whether you are only supplying "minute", or are also passing hour/day etc.
See the examples:
Every 15 minutes:
add_task(task_id='1235', queue='default', minute=15)
Every hour on minute 20 of the hour (i.e. 00:20, 01:20 ...):
add_task(task_id='1235', queue='default', hour=1, minute=20)
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Hi UnevenOstrich23
--cpu-mode means no GPUs are passed to the Tasks it executes.
--services-mode means that instead of the agent running a single job at a time, it will spin up as many jobs as you need on the same machine
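For example (a sketch; verify the exact flags with clearml-agent daemon --help on your version):

clearml-agent daemon --queue services --services-mode --cpu-only --docker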
is there a way to increase the size of the text input for fields or a better way to handle lists?
No 😞
Maybe an easier way is to use connect_configuration instead? It will take an entire dict and store it as text (the format is HOCON, which is YAML/JSON compatible, which means it is hard to break when editing)
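A minimal sketch (the dict contents and names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")
config = {"things": ["a", "b", "c"], "nested": {"lr": 0.001}}
# returns the (possibly UI-edited) configuration; shows up as editable text in the UI
config = task.connect_configuration(config, name="my config")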
and I've made a script to edit it to our needs as part of the installation process. Thanks Martin!
My pleasure, btw: there is no actual need to configure all the clearml.conf values. It will actually take the defaults from the clearml package itself. This means you only need something like:
api {
    # server config here
}
sdk.aws.s3 {
    # minio config here
}