AbruptHedgehog21 looking at the error, it seems like you are out of storage 🙂
Ohh then use the AWS autoscaler, basically it's what you want: spin up an EC2 instance and set up an agent there, then if the EC2 goes down (for example if it is a spot instance), it will spin it up again automatically with the running Task on it.
wdyt?
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
PanickyMoth78 I think I understand what you are saying, but it is hard to see if there is a "bug" here or a feature...
Can you post the full code of the pipeline?
I thought there would be some hooks for deploying, where the integration with k8s was also taken care of automatically.
Hi ObedientToad56
Yes you are correct: basically right now you have a docker-compose spinning everything, even though, for example, you can also spin up a standalone container (mostly for debugging).
We are working on a k8s helm chart so the deployment is easier, it will be based on this docker-compose:
https://github.com/allegroai/clearml-serving/blob/main/docker/docker-comp...
Notice that the StorageManager has default configuration here:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L76
Then a per-bucket credentials list, with details:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L81
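For reference, a sketch of what that per-bucket section can look like in your trains.conf / clearml.conf (bucket name and keys below are placeholders, assuming the standard sdk.aws.s3 layout):

sdk {
    aws {
        s3 {
            # default credentials, used for any bucket not listed below
            key: "DEFAULT_ACCESS_KEY"
            secret: "DEFAULT_SECRET_KEY"

            credentials: [
                {
                    # per-bucket override (placeholder values)
                    bucket: "my-bucket"
                    key: "BUCKET_SPECIFIC_ACCESS_KEY"
                    secret: "BUCKET_SPECIFIC_SECRET_KEY"
                },
            ]
        }
    }
}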
SoggyBeetle95 is this secret a per Task secret, or is it for the agent itself (I.e. for all Tasks the agent will spin)?
Ohh, if this is the case then it kind of makes sense to store it on the Task itself. Which means the Task object will have to store it, and then the UI will display it :(
I think the actual solution is a vault, per user, which would allow users to keep their credentials on the server and have the agent pass those to the Task when it spins it, based on the user. Unfortunately the vault feature is only available in the paid/enterprise version (with RBAC etc.).
Does that make sense?
SoggyBeetle95 the question is, where does clearml store these arguments, and the answer is on the Task object (from there the agent will take them and apply them to the docker execution). Now since all users see all the tasks, they also see these arguments. Wdyt?
SoggyBeetle95 maybe it makes sense to configure the agent with access-all credentials? Wdyt
SoggyBeetle95 you can configure the credentials in the clearml.conf
running on the agent machines:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L320
(I'm assuming these are storage credentials)
If you need general purpose env variables, you can add them here:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
with ["-e", "MY_VAR=MY_VALUE"]
Task.init(..., output_uri='s3://...')
Thanks Martin, so does it mean I won't be able to see the data hosted on the S3 bucket in the ClearML dashboard under the Datasets tab after registering it?
Sure you can, let's assume you have everything in your local /mnt/my/data
you can just add this folder with add_files
then upload to your S3 bucket with upload(output_uri="s3://...", ...)
make sense ?
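To make the flow concrete, a minimal sketch assuming the local folder /mnt/my/data and a placeholder bucket name:

from clearml import Dataset

# create a new dataset entry (names are placeholders)
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")

# register the local folder content
dataset.add_files(path="/mnt/my/data")

# upload the files to your S3 bucket and close the dataset version
dataset.upload(output_uri="s3://my-bucket/datasets")
dataset.finalize()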
SuperiorDucks36 from code ? or UI?
(You can always clone an experiment and change the entire thing, the question is how will you get the data to fill in the experiment, i.e. repo / arguments / configuration etc)
There is a discussion here, I would love to hear another angle.
https://github.com/allegroai/trains/issues/230
CleanPigeon16 Coming very soon, we are adding a few features for the pipeline, and this one will also be included :)
Hi @<1561885941545570304:profile|PunyKangaroo87>
What do you mean by store data locally?
Like clearml-data? I.e. Dataset?
You can always use file:///root/path/folder as destination, this will store everything into the local folder, is that it?
Not sure why, but for some reason it seems it is failing to analyze the code, hence the warning and no packages...
Any other hints on your setup that might help to better understand the root cause ? maybe home folder with unicode characters ? python installed in a specific way?
I see TightElk12
You can always set up the OS environment variables CLEARML_API_HOST, CLEARML_WEB_HOST and CLEARML_FILES_HOST with the correct configuration, or you can simply set CLEARML_NO_DEFAULT_SERVER=1 which will prevent any usage of the default demo server. wdyt?
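For example (server address below is a placeholder, using the default ClearML ports):

export CLEARML_API_HOST="http://my-clearml-server:8008"
export CLEARML_WEB_HOST="http://my-clearml-server:8080"
export CLEARML_FILES_HOST="http://my-clearml-server:8081"
# or, to prevent any use of the default demo server:
export CLEARML_NO_DEFAULT_SERVER=1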
Our remote machine is Windows 10
JumpyDragonfly13 seems like Windows 10 + Docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100
However, this one should be a feature to work on, and should be fairly easy to implement.
Feel free to add it as a GitHub issue 🙂
The main challenge is understanding what needs to be added as "uncommitted changes"
Generally speaking, for exactly that reason, if you are passing a list of files or a folder, it will actually zip them and upload the zip file. Specifically for pipelines it should be similar. BTW I think you can change the number of parallel upload threads in StorageManager, but as you mentioned it is faster to zip into one file. Make sense?
Ohh! I see now
@<1526371965655322624:profile|NuttyCamel41> the backend: "pytorch" setting is not really supported because it does not use the optimized Triton engine (which is the reason to run the Triton server)
In order to use PyTorch you need to convert the model to TorchScript and then deploy it, see the example here:
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/examples/pytor...
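For reference, a minimal sketch of the TorchScript conversion step (using a torchvision ResNet purely as a stand-in for your own trained model):

import torch
import torchvision

# load / build your trained model (placeholder model here)
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# trace the model with a representative input and save the TorchScript file
example_input = torch.randn(1, 3, 224, 224)
scripted_model = torch.jit.trace(model, example_input)
scripted_model.save("model.pt")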
Hi IrritableGiraffe81
Yes it deploys all ClearML (including web).
ClearML-serving unfortunately is a bit more complicated to spin up, as it needs actual compute nodes.
That said we are working on making it a lot easier 🙂
I think what happened is that you are running it on the host machine (not inside the docker)
I probably missed a "
somewhere
You can try just pulling the "metric" section of the Task, but I cannot imagine the network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests ?)
You can try a direct API call for all the Tasks together: Task._query_tasks(task_ids=[IDS here], only_fields=['last_metrics'])
you can also get it flattened with: task.get_parameters()
Type in both cases is string
this? ids = [t.id for t in top_task]
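Putting it together, a sketch assuming top_task is a list of Task objects you already fetched (Task._query_tasks is a private helper and the exact shape of the returned entries is an assumption, so adjust field access as needed):

from clearml import Task

# collect the task ids from the tasks you already have
ids = [t.id for t in top_task]

# single backend call, fetching only the last reported metrics for all tasks
tasks_metrics = Task._query_tasks(task_ids=ids, only_fields=['last_metrics'])
for t in tasks_metrics:
    print(t.id, t.last_metrics)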
Hi LazyTurkey38
Configuring these folders will be pushed later today 🙂
Basically you'll have in your clearml.conf
agent {
docker_internal_mounts {
sdk_cache: "/clearml_agent_cache"
apt_cache: "/var/cache/apt/archives"
ssh_folder: "/root/.ssh"
pip_cache: "/root/.cache/pip"
poetry_cache: "/root/.cache/pypoetry"
vcs_cache: "/root/.clearml/vcs-cache"
venv_build: "/root/.clearml/venvs-builds"
pip_download: "/root/.clearml/p...
Hi PerplexedWalrus3
you should get something like the following on the console:
ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a
2021-07-25 13:59:09 ClearML results page:
2021-07-25 13:59:16
What are you seeing?