Can you post the toml file? Maybe the answer is there
TenseOstrich47 you can actually enter this script as part of the extra_docker_shell_script
This will be executed at the beginning of each Task inside the container, and as long as the execution time is under 12h, you should be fine. wdyt?
Correct:
extra_docker_shell_script: ["apt-get install -y awscli", "aws codeartifact login --tool pip --repository my-repo --domain my-domain --domain-owner 111122223333"]
you can also set the agent.package_manager.extra_index_url, but since this is dynamic,...
You are correct, since this is dynamic there is no need to set the extra_index_url configuration in clearml.conf; the additional bash script will configure pip directly. Make sense?
Hi AgitatedTurtle16 could you verify you can access the API server with curl?
That seems like the k8s routing, can you try the web server curl?
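Something along these lines should do it, assuming the default ports of a docker-compose deployment (replace <server-ip> with your host):
# API server (default port 8008)
curl http://<server-ip>:8008/debug.ping
# Web server (default port 8080)
curl http://<server-ip>:8080/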
Also, I just wanted to say thanks for the tool! I'm managing a small data science practice and it's going to be really nice to have a view of all of the experiments we've got and know our GPU utilization, all without having to give every data scientist access to each box where the workflows are run. Incredibly stoked.
♥ ❤ ♥
Hi ShallowArcticwolf27
However, the AMI for version 0.16.1 has the following docker-compose file
I think we moved the docker-compose yaml when we upgraded from trains to clearml. Any reason you are installing the old docker-compose?
I don't have the compose file, or at least can't seem to find it in /opt
you can manually take down all dockers with:
docker ps
then docker stop <container id> for each container id
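Or, as a one-liner that stops every running container (assuming you really want all of them down):
docker stop $(docker ps -q)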
JitteryCoyote63 you mean? (notice no brackets)
task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
Could you post what you see under "installed packages" in the UI?
Try:
task.update_requirements('\n'.join([".", ]))
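A minimal sketch of both call styles (project/task names and the extra requirement line are illustrative):
from clearml import Task

task = Task.init(project_name="examples", task_name="requirements demo")

# single string: install the local package from the working directory
task.update_requirements(".")

# or a newline-joined list of requirement lines
all_req_lines = [".", "numpy==1.21.0"]
task.update_requirements("\n".join(all_req_lines))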
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
JitteryCoyote63 yes, this is definitely a pip bug... can you test with the latest pip version, maybe it was fixed? (i.e. git+https:// link)
With env caching enabled, it won't reinstall this private dependency, right?
It will, local packages (".") and git packages are always reinstalled even if using venv caching, exactly for that reason 🙂
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what PEP has to say about that ... (Basically this is not a clearml issue but a pip one...)
error in my-package setup command:
Okay this seems like an error in the setup.py you have in the "mypackage" folder
and when you remove the "." line does it work?
oh dear 🙂 if that's the case I think you should open an issue on pypa/pip, I'm not sure what we can do other than that ...
If we have the time maybe we could PR a fix?!
GrittyKangaroo27 any chance you can open a GitHub issue so this is not forgotten?
(btw: I think 1.1.6 is going to be released later today, then we will have a few RCs with improvements on the pipeline; I will make sure we add that as well)
ContemplativeCockroach39 unfortunately not directly as part of clearml 🙂
I can recommend the Nvidia Triton serving (I'm hoping we will have the out-of-the-box integration soon)
meanwhile you can manually run it, see docs:
https://developer.nvidia.com/nvidia-triton-inference-server
Docker image here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
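A minimal run command for that image (the model repository path is a placeholder; 8000/8001/8002 are Triton's default HTTP/gRPC/metrics ports):
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:20.12-py3 \
  tritonserver --model-repository=/models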
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow you can also pass repo="."
which will load + detect the repo in the execution environment and automatically fill it in
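A sketch with the decorator-based pipeline API (the component body is illustrative, and assumes a clearml version that supports the repo argument):
from clearml import PipelineDecorator

# repo="." -> the agent detects the current repository and clones it for this component
@PipelineDecorator.component(repo=".", return_values=["doubled"])
def my_step(x):
    return x * 2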
Hi QuaintPelican38
Can you ssh to {instance_public_ip_address}:10022 (something like ssh -p 10022 user@IP_HERE)?
Basically just getting the password prompt means you are okay.
I suspect that you have some AWS security definition (firewall) that prevents a direct access to the instance, could that be?
Is there any better way to avoid the upload of some artifacts of pipeline steps?
How would you pass "huge datasets (some GBs)" between different machines without storing it somewhere?
(btw, I would also turn on component caching, so if this is the same code with the same arguments the pipeline step is reused instead of re-executed all over again; see the sketch below)
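For example, with the decorator-based pipeline API (component body is illustrative):
from clearml import PipelineDecorator

# cache=True: if the component code and its arguments are unchanged,
# the previously stored output is reused instead of re-executing the step
@PipelineDecorator.component(cache=True)
def preprocess(dataset_id):
    ...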
Makes sense to add it to docker run by default if GPUs are mentioned in agent.
I think this is an arch thing, --privileged is not needed on the Ubuntu flavor; that said, you can always have it if you add it here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
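i.e. roughly this in clearml.conf (the flag list is just an example):
agent {
    # extra flags appended to every docker run the agent launches
    extra_docker_arguments: ["--privileged"]
}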
clearml-agent daemon --gpus 0 --queue default --docker
But docker still sees all GPUs.
Yes, --gpus should be enough, are you sure regarding the --privileged flag?
One thing though - I am running agent on behalf of a regular user.
Oh that might be a credentials / docker service issue (i.e. the user might not have the ability to run a docker with --gpus, but as you mentioned, that seems like an arch thing 🙂)
The latest image seems to require host drivers 460+
try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
Why can we even change the pip version in the clearml.conf?
LOL mistakes learned the hard way 🙂
Basically, too many times in the past pip versions were a bit broken, which is fine if they are used manually and users can reinstall a different version, but horrible when you have an automated process like the agent, so we added a "freeze version" option, with greater control. Make sense?
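i.e. something like this in clearml.conf (the version spec is just an example):
agent {
    package_manager {
        # pin/cap the pip version the agent installs into the execution environment
        pip_version: "<20.2"
    }
}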
Do you think this is better? (the API documentation is coming directly from the Python docstring, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
Optional[Sequence[Union[str, Dataset]]]
None, a list of strings, or a list of Dataset objects
(each entry is a parent; multiple parents are supported)
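A quick sketch of mixing the two (ids and names are placeholders):
from clearml import Dataset

parent_a = Dataset.get(dataset_id="<parent-a-id>")

child = Dataset.create(
    dataset_name="merged-dataset",
    dataset_project="examples",
    # Dataset objects and dataset-id strings can be mixed; each entry becomes a parent
    parent_datasets=[parent_a, "<parent-b-id>"],
)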