can you see these metrics on TB?
Hi HappyDove3
task.set_script is a great way to add the info (assuming the .git is missing)
Are you running it using PyCharm? (If so use the clearml pycharm plugin, it basically passes the info from your local git to the remote machine via OS environment)
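Regarding task.set_script above, here is a minimal sketch of setting the repo info manually; the project/task names, repository URL, branch and entry point below are placeholders, not taken from your setup:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="manual repo info")  # hypothetical names

# if the .git folder is missing, set the repository details on the Task manually
task.set_script(
    repository="https://github.com/user/repo.git",  # placeholder repository URL
    branch="main",                                   # placeholder branch
    working_dir=".",
    entry_point="train.py",                          # placeholder entry point
)
```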
and the agent default runtime mode is docker correct?
Actually the default is venv mode. To run in docker mode, add --docker to the command line.
So I could install all my system dependencies in my own docker image?
Correct. Inside the docker it will inherit all the preinstalled packages, but it will also install any missing ones (based on the Task requirements, i.e. the "installed packages" section).
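If it helps, a minimal sketch of pointing a Task at your own docker image (the project/task names and image name are placeholders); an agent running in docker mode starts from that image and only installs whatever is listed under "installed packages" and missing from it:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="custom docker image")  # hypothetical names

# placeholder image name; the agent (in docker mode) will run the Task inside this image
task.set_base_docker("my_registry/my_image:latest")
```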
Also what is the purpose of the
aws
block in the clearml.c...
LazyTurkey38, ohh I think you are correct 🙂
it should be:
# patch the Task and actually send it for execution
if Task.running_locally():  # this will verify all auto repo detection and python is done.
    task.close()  # so that we can edit the task
    task.reset()
    # update the repo
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
wdyt?
I want in my CI tests to reproduce a run in an agent
you mean to run it on the CI machine ?
because the env changes and some things break in agents and not locally
That should not happen, no? Maybe there is a bug that needs fixing on clearml-agent ?
Hi @<1523711619815706624:profile|StrangePelican34>
if I am trying to deploy 100 models on a GPU that can handle 5 concurrently,
The main limitation is Triton's ability to dynamically load / unload models. We know Nvidia is adding this capability, but I think it is still not out; once they support it, it should be transparent.
It will be released shortly with the new RC :)
This depends on how you spun up the server; basically, as long as you configure the clients (i.e. python clients) correctly, there is no issue.
But the auto-generated configuration might be off (in the UI, when you create credentials, it tells clearml-init where the server is and the ports)
I would actually recommend subdomains if this is possible
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#sub-domain-configuration
wdyt?
Hi RipeGoose2
I think it "should" take of uploading the artifacts as well (they are included in the zip file created by the offline package)
Notice that the "default_output_uri" on the remote machine is meaningless as it stored them locally anyhow. It will only have an effect on the machine that actually imports the offline session.
Make sense ?
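For reference, a rough sketch of the offline flow, assuming the usual Task offline API (the project/task names and path below are placeholders); output_uri / default_output_uri only takes effect on the machine doing the import:
```python
from clearml import Task

# on the machine without connectivity
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")  # hypothetical names
# ... training runs here; artifacts/models end up inside the offline session zip ...

# later, on the machine that imports the session (this is where output_uri matters)
Task.import_offline_session("/path/to/offline_session.zip")  # placeholder path
```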
Hi @<1541954607595393024:profile|BattyCrocodile47>
Can you trigger a pre-existing Pipeline via the ClearML REST API?
Yes
I'd want to have a Lambda function trigger the Pipeline for a batch without needing to have all the Pipeline code in the lambda function.
The easiest is to use the clearml SDK, which is basically clone / enqueue (notice that a pipeline is also a kind of Task). See here: [None](https://github.com/allegroai/clearml/blob/3ca6900c583af7bec18792a4a92592b94ae80cac/example...
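Roughly, the Lambda side could look like this sketch (the task id and queue name are placeholders):
```python
from clearml import Task

# the pre-existing pipeline controller is itself a Task, so clone it and enqueue the clone
pipeline_task = Task.get_task(task_id="<PIPELINE_TASK_ID>")  # placeholder id
new_run = Task.clone(source_task=pipeline_task, name="pipeline run triggered by lambda")
Task.enqueue(new_run, queue_name="services")  # placeholder queue name
```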
For reporting the console logs you can use: logger.report_text("my log line here", print_console=False)
https://github.com/allegroai/clearml/blob/b4942321340563724bc16f60ea5dd78c9161778d/clearml/logger.py#L120
WhimsicalLion91
What would you say the use case for running an experiment with iterations
That could be loss value per iteration, or accuracy per epoch (iteration is just a name for the x-axis in a sense; this is equivalent to a time series)
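For example, a minimal sketch of reporting a per-iteration scalar (project/task names, series and values are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar reporting")  # hypothetical names
logger = task.get_logger()

for iteration in range(100):
    loss = 1.0 / (iteration + 1)  # dummy value
    logger.report_scalar(title="loss", series="train", value=loss, iteration=iteration)
```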
Make sense?
Notice that if you are using TB, everything you report to the TB will appear as well 🙂
You can however change the prefix, and you can always have access to these links.
Any reason for controlling the exact output destination ?
(BTW: You can manually upload via StorageManager, and then register the uploaded link)
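As a rough sketch (bucket, file and project names are placeholders, and using OutputModel.update_weights(register_uri=...) to register the already-uploaded link is one possible way, not necessarily the only one):
```python
from clearml import Task, StorageManager, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")  # hypothetical names

# upload the file yourself to the exact destination you want
uri = StorageManager.upload_file(
    local_file="model.pt",                        # placeholder local file
    remote_url="s3://my_bucket/models/model.pt",  # placeholder destination
)

# register the already-uploaded link on the Task
OutputModel(task=task).update_weights(register_uri=uri)
```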
Hi JitteryCoyote63 ,
The easiest would probably be to list the experiment folder, and delete its content.
I might be missing a few things but the general gist should be:
from trains.storage import StorageHelper
h = StorageHelper('s3://my_bucket')
files = h.list(prefix='s3://my_bucket/task_project/task_name.task_id')
for f in files:
    h.delete(f)
Obviously you should have the right credentials 🙂
Why does ClearML hide the dataset task from the main WebUI?
Basically you have the details from the Dataset page, why should it be mixed with the others ?
If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some
.datasets
hidden sub-project.
This may be a request for a "Dataset" tab under the project; why you would need the Dataset Task itself is the main question.
Not all dataset objects are equal, and perhap...
Yes. Because my old
has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution.
Ohh now I remember... following this line, can I assume these files are reused, i.e. this is not "per instance"? I have to admit I have a feeling this is a very unique use case, and maybe the "old" way Datasets were shown is better suited?
No, I mean why does it show up in the task view (see attached image), forcing me to clic...
Interesting!
Wouldn't Dataset (class) be a good solution ?
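For instance, a small sketch of packaging local files with the Dataset class and pulling them on the remote machine (the project, dataset name and paths are placeholders):
```python
from clearml import Dataset

# locally: pack the files needed for remote execution
ds = Dataset.create(dataset_name="aux_files", dataset_project="my_project")  # placeholder names
ds.add_files("local_config_folder/")  # placeholder path
ds.upload()
ds.finalize()

# on the remote machine: fetch a cached local copy of the files
folder = Dataset.get(dataset_name="aux_files", dataset_project="my_project").get_local_copy()
```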
Yes RipeGoose2 you are totally correct 🙂 If you want the models to be auto uploaded in the offline session, you have to pass output_uri (or default_output_uri).
is there a built in programmatic way to adjust
development.default_output_uri
?
How about: In your Task.init(output_uri='...')
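i.e. something along these lines (the bucket is a placeholder):
```python
from clearml import Task

task = Task.init(
    project_name="examples",             # hypothetical names
    task_name="training run",
    output_uri="s3://my_bucket/models",  # placeholder destination; output models are uploaded here
)
```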
The base task is self-contained, i.e. it downloads training/eval data directly and has direct access to it
I think this is the main issue, how come it does not catch it? Are you using argparser ?
For now we've monkey-patched it to our usecase:
LOL, that's a cool hack
That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in
Datasets
tabs, but appear as normal tasks within the project)
So what would be a "perfect" solution here?
I think I'm missing the point on why it became an issue in the first place.
Notice that in new versions Dataset will be registered on the Tasks that use them (they are already...
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
A definite maybe, they may or may not be used, but we'd like to keep that option
The precursor to the question is the idea of storing local files as "input artifacts" on the Task, which means that if the Task is cloned the links go with it. Let's assume for a second this is the case, how would you upload these artifacts in the first place?
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically you can just run the following (it will do everything automatically) (replace the <TASK_ID_HERE> with the actual one)
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
if you have an automation process, then you should have the Task object, no?
then you have task.id
What am I missing here?
YEYYYYYYyyyyyyyyyyyyyyyyyy