Hi GiddyPeacock64
If you already have K8s set up and are already using ClearML, then in your Kubeflow YAML run:
trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker container. Just make sure you pass the environment variables with the ClearML configuration, see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
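A minimal sketch of what that container entry could look like, written as a plain Python dict. The image name and credential values are placeholders, and the env variable names follow the docker-compose file linked above, so treat them as assumptions:
```
# Hypothetical container entry for the Kubeflow pod spec.
# Image and credential values are placeholders; env variable names follow the linked docker-compose.
import yaml  # PyYAML, only used here to print the dict as YAML

container = {
    "name": "clearml-step",
    "image": "your-docker-image:latest",  # placeholder base image with your dependencies
    "command": ["trains-agent", "execute", "--id", "<task_id>", "--full-monitoring"],
    "env": [
        {"name": "CLEARML_API_HOST", "value": "http://<api-server>:8008"},
        {"name": "CLEARML_API_ACCESS_KEY", "value": "<access_key>"},
        {"name": "CLEARML_API_SECRET_KEY", "value": "<secret_key>"},
    ],
}
print(yaml.safe_dump(container, sort_keys=False))
```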
, but are you suggesting sending the requests to Triton frame-by-frame?
Yes! The Triton backend will do the auto-batching, and in an enterprise deployment the gRPC load balancer will split it across multiple GPU nodes
Give me a minute, I'll check something
Yes, as long as the client is served from http://app.something.com it will look for the api server at http://api.something.com
LovelyHamster1 NICE!
Hi @<1523701949617147904:profile|PricklyRaven28>
I'm trying to figure out if I have a way to report pipeline-step artifact paths in the main pipeline task (so I don't need to dig into steps to find the artifacts).
Basically this is the monitor_artifacts argument
None
:param monitor_artifacts: Optional, log the step's artifacts on the pipeline ...
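A minimal sketch of how it could be used (project, task, and artifact names here are placeholders):
```
# Sketch: any artifact named "model_stats" registered by the step will also be
# logged on the main pipeline Task, so there's no need to open the step Task
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.add_step(
    name="train_step",
    base_task_project="examples",        # placeholder project
    base_task_name="training task",      # placeholder base task
    monitor_artifacts=["model_stats"],   # artifact name created inside the step
)
pipe.start()
```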
OmegaConf is the configuration, the overrides are in the Hyperparameters "Hydra" section
None
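A minimal sketch of a Hydra script, assuming a conf/config.yaml exists next to it; with a Task.init call the composed OmegaConf shows up as the configuration, and the command-line overrides appear under the Hyperparameters "Hydra" section:
```
# Sketch of a Hydra entry point (config_path/config_name are assumptions about your layout)
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    task = Task.init(project_name="examples", task_name="hydra demo")  # placeholder names
    print(OmegaConf.to_yaml(cfg))  # full composed configuration

if __name__ == "__main__":
    main()
```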
CrookedWalrus33 can you test what happens if you pass the credentials in the global scope as well, i.e. here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80
OddShrimp85 you can see the full configuration at the top of the Task log. What do you have there? Also what is the clearml python version?
Hi UpsetBlackbird87
This is an Optuna decision on how many trials to run concurrently.
You limited it to 100, but remember Optuna does a Bayesian optimization process, where it decides on the best set of arguments based on the performance of the previous set. This means it will first try X trials, then decide on the next batch.
That said, you can add a pruner to Optuna specifying how it should start
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
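A minimal sketch with plain Optuna (the objective here is just a stand-in for your own training code):
```
# Sketch: attach a MedianPruner to the study; n_startup_trials controls how many
# trials run before pruning kicks in, n_warmup_steps how many reported steps are ignored
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    # ... train, call trial.report(value, step) per epoch, and return the final metric ...
    return lr  # placeholder objective value

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10),
)
study.optimize(objective, n_trials=100)
```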
DrabCockroach54 notice here there is no aarch64 wheel for anything other than python 3.5...
(and in both cases only py 3.5/3.6 builds, everything else will be built from code)
https://pypi.org/project/pycryptodome/#files
Sure
BTW: clearml-agent will mount your host .ssh into the docker to /root/.ssh by default.
So no need to do that manually
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker container but had no credentials to do so; your solution is exactly what was needed.
-e
:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
Is this the correct link to the repo and a valid commit ID?
Can you post a few more lines from the agent's log?
Something is failing to install, I'm just not sure what.
Hi @<1572395181150310400:profile|DeterminedHare56>
Yes, Slack is not the best for knowledge sharing, but it is the easiest for users to communicate over, and the easiest to set up and scale.
Specifically, you can find the historical log of the Slack channel here: None
We hoped Google would index it, but it seems this is still not working as expected; if you have any input on how to improve it, that would be great.
MysteriousBee56 when you run the trains-agent with --foreground, before it starts the docker it prints the full command line, could you send it please?
I can't figure out where the extra ' came from...
Also could you send the trains.conf file?
(feel free to redact any confidential information)
TrickyFox41 are you saying that if you add Task.init in the code it works, but when you are calling "clearml-task" it does not work? (In both cases editing the Args/overrides?)
Actually if you can send the full log of the Task that would be great
(with older clearml versions though…).
Yes, we added a content-type header for the files when uploading to S3 (so it is easier for users to serve them back). But it seems the Python 3.5 casting from Path to str breaks the mimetype call...
EnviousPanda91 this seems like a specific issue with the clearml-task CLI, could that be?
Can you send a full clearml-task command line to test?
I'm already at 300MB of usage with just 15 tasks
Wow, what do you have there? I would try to download the console logs and see what size you are getting, this is the only thing that makes sense, wdyt?
BTW: to get the detailed size for scalars, maximize the plot (otherwise you are getting "subsampled" data)
Hi @<1724960468822396928:profile|CumbersomeSealion22>
As soon as I refactor my project into multiple folders, where on top-level I put my pipeline file, and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
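A minimal sketch of pointing a pipeline component at the git repository that holds its extra files; the repo URL, branch, queue name, and the helper import are all placeholders, and the repo/repo_branch arguments are available in recent clearml versions:
```
# Sketch: repo/repo_branch tell the agent which repository to clone for this step,
# so helper modules living next to the step's code can be imported remotely
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    return_values=["accuracy"],
    repo="https://github.com/your-org/your-repo.git",  # placeholder repo with the step's files
    repo_branch="main",
    execution_queue="default",
)
def evaluate_step(model_path):
    # imports inside the component are resolved on the remote machine,
    # inside the cloned repository above
    from my_package.eval import evaluate  # hypothetical helper module from that repo
    return evaluate(model_path)
```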
The agent ip? Generally what's the expected pattern to deploy and scale this for multiple models?
Yes the agent's IP, and with multiple agents one would probably use k8s for the nodes, then configure ingress. This is the next step for the clearml-serving, adding support for KFServing or manually configuring the ingress. wdyt?
I think you have it on the Workers and Queues page; when you click on the worker you can see its details
Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors
This means your avg is already a scalar (i.e. not a vector) which means you can (as you said) have the alert based on that
Yes, there is no real limit, I think the only requirement is docker v19+
If there was an SSL issue it should log to console right?
Correct, and the agent is able to report, so I'm assuming the configuration is correct
@<1724960464275771392:profile|DepravedBee82> could you try to put the clearml import + Task.init at the top of your code?
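A minimal sketch, with placeholder project/task names:
```
# Calling Task.init before any other import lets ClearML patch the frameworks
# (TensorBoard, matplotlib, etc.) you load afterwards
from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment")  # placeholder names

# ... the rest of your imports and training code follow ...
```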
Hi @<1536518770577641472:profile|HighElk97>
Is there a way to change the smoothing algorithm?
Just like with TB, this is front-end, not really something you can control ...
That said, you can report a smoothed value (i.e. via python) as an additional series, wdyt?
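A minimal sketch, where my_metric_stream and the smoothing factor are placeholders:
```
# Sketch: report the raw value and an exponentially smoothed copy as two series
# of the same scalar graph
from clearml import Task

task = Task.init(project_name="examples", task_name="smoothed scalars")  # placeholder names
logger = task.get_logger()

alpha = 0.9        # smoothing factor (assumption, tune to taste)
smoothed = None
for iteration, raw_value in enumerate(my_metric_stream):  # my_metric_stream is your own data
    smoothed = raw_value if smoothed is None else alpha * smoothed + (1 - alpha) * raw_value
    logger.report_scalar(title="loss", series="raw", value=raw_value, iteration=iteration)
    logger.report_scalar(title="loss", series="smoothed", value=smoothed, iteration=iteration)
```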