Hi DangerousDragonfly8
You mean you want to trigger something when users archive a Task ?
Assuming you have `hparams.my_param`, my suggestion is:
```
import hydra
from omegaconf import DictConfig
from clearml import Task


@hydra.main(config_path="solver/config", config_name="config")
def train(hparams: DictConfig):
    task = Task.init(hparams.task_name, hparams.tag)
    overrides = {'my_param': hparams.value}
    # connect the dict so the value can be overridden from the UI
    task.connect(overrides, name='overrides')
    # in remote this will print the value we put in "overrides/my_param"
    print(overrides['my_param'])
    # now we actually use overrides['my_param']
```
Make sense?
hardware monitoring etc.
This is averaged and sent only every 30 seconds, so not a lot of calls.
I just saw that I went through the first 200k API calls rather fast, so that is how I rationalized it.
Yes, that kind of makes sense
Once every 2000 steps, which is every few seconds. So in theory those ~20 scalars should be batched since they are reported more or less at the same time. It's a bit odd that the API calls added up so quickly anyway.
The default flush is ever...
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that :) What exactly are you looking for?
Hmm ElegantKangaroo44, low memory might explain the behavior
BTW: 1==stop request, 3=Task Aborted/Failed
Which makes sense if it crashed on low memory...
Have a grid view (e.g. 3 plots per line instead of just one)
1. Yes, the plots are resizable: move the cursor to the separating line and drag :)
2. Check the "group by" section, they can be split per series (like in TB)
Hmm, so VSCode running locally, connected to the remote machine over SSH?
(I'm trying to figure out how to replicate the setup for testing)
Hi FloppyDeer99
Since this thread is a bit old, I might have missed something :)
Are we saying the links are not working in the UI ?
(notice the links themselves are generated by the clearml package, so if there was a bug, still not sure here, then old links will remain invalid until manually fixed) Can you verify that the latest clearml generates working links?
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there. Basically it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section only needs to contain the parts that are important (like cache location, credentials, etc.)
Wow, thank you very much. And how would I bind my code to task?
you mean the code that creates pipeline Tasks ?
(remember the pipeline itself is a Task in the system, basically if your pipeline code is a single script it will pack the entire thing )
I simplified the code, just so I could test it, this one seems to work, feel free to add the missing argparser parts :)
```
from argparse import ArgumentParser
from trains import Task

model_snapshots_path = 'mnt/trains'
task = Task.init(project_name='examples', task_name='test argparser', output_uri=model_snapshots_path)
logger = task.get_logger()


def main(args):
    print('Got args: %s' % args)


if __name__ == '__main__':
    parent_parser = ArgumentParser(add_help=False)
    parent_parser....
```
if the file is untracked by git, it is not saved by clearml
Yep :)
Does clearml-agent install the repo with `pip install -e .`?
It is supported, but the path to the repo cannot be absolute (as it will probably be something else in the agent env)
You can add "git+https://github.com ...." to the "installed packages". The root path of your repository is always added to the PYTHONPATH when the agent executes it, so in theory there is no need to install it wi...
No worries, I'll see if I can replicate it anyhow
So currently there is a limit (from the elasticsearch) of about 10k samples (anything above that is subsampled)
In the new version we are adding a "maximize" button; in full screen you will have the raw data, including all ???k samples. Sounds good?
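To give a feel for what "subsampled" means here, a minimal sketch of evenly spaced subsampling down to a fixed cap. This is only an illustration: the function name `subsample` and the even-spacing strategy are assumptions, not the server's actual algorithm.

```python
def subsample(points, max_points=10_000):
    """Return at most max_points items, evenly spaced over the input.

    Hypothetical illustration only; the real server-side logic may differ.
    """
    n = len(points)
    if n <= max_points:
        # Under the cap: nothing is dropped
        return list(points)
    step = n / max_points
    # Pick one item per evenly sized stride, starting with the first point
    return [points[int(i * step)] for i in range(max_points)]
```

With 25k reported points and the default cap, this keeps 10k of them and drops the rest, which is why the plot can look smoother than the raw data.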
Or use python:3.9 when starting the agent
This is probably the best solution :)
The 'on-premise' server fails to connect to the ClearML server because of the VPN I think
I think you are correct.
You can quickly test it: try to run `curl http://local-server:8008` and see if that works
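If curl is not available on that machine, the same check can be done from Python with the standard library only. A minimal sketch; the helper name `can_reach` and the default timeout are assumptions:

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return True if any HTTP response comes back from url, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still reachable
        return True
    except OSError:
        # DNS failure, refused connection, timeout, etc.
        return False
```

Calling `can_reach("http://local-server:8008")` from the on-premise machine should return False if the VPN is blocking the connection.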
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel
Can you expand?
Out of curiosity, what ended up being the issue?
Hi RoundMosquito25
What do you mean by "local commits" ?
we have a separate cache
Why? they can share
That makes sense...
Basically in the open-source version the approach is everyone sees everything for maximum transparency (and also ease of use). I know there are access-roles in the paid tier and vault for exactly these types of things...
Where do you currently save them? and how do you pass them to the remote machine ?
Notice there is a scroll_id there, you might need to call the API multiple times until you scroll over all the events
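A rough sketch of that scroll loop. Here `fetch_page` is a hypothetical callable standing in for whatever API call you use; it is assumed to take the previous scroll_id (None on the first call) and return a dict with "events" and "scroll_id" keys, which may not match the real response shape exactly:

```python
def fetch_all_events(fetch_page):
    """Call fetch_page repeatedly until a page comes back with no events.

    fetch_page(scroll_id) is a hypothetical wrapper around the events API.
    """
    events = []
    scroll_id = None
    while True:
        page = fetch_page(scroll_id)
        batch = page.get("events", [])
        if not batch:
            # An empty page means we scrolled past the last event
            break
        events.extend(batch)
        scroll_id = page.get("scroll_id")
        if scroll_id is None:
            # No scroll_id returned, so there is nothing more to request
            break
    return events
```

If only the first page is read, anything past it will look like it is missing.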
could that be it?
I was just able to reproduce with "localhost"
ClearML automatically gets these reported metrics from TB. Since you mentioned you see the scalars, I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?
Hi @<1560798754280312832:profile|AntsyPenguin90>
The image itself is uploaded in a background process, flush just triggers the start of that process.
Could it be that it is showing a few seconds after?
Hi CheerfulGorilla72
see
Notice all posts on that channel are @ channel :)
I have the problem that "debug samples" are not shown anymore after running many iterations.
ReassuredTiger98 could you expand on it? What do you mean by "not shown anymore" ?
Can you see other reports ?
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
Seems like the network inside the running code cannot access localhost (even though you have `--network=host`). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
You can switch to docker-mode for better control over cuda drivers, or use conda and specify cudatoolkit (this feature will be part of the next RC, meanwhile it will install the cudatoolkit based on the global cuda_version).