Could you post what you see under "installed packages" in the UI?
Try: task.update_requirements('\n'.join([".", ]))
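A minimal sketch of that call in context (project and task names are illustrative):
from clearml import Task

task = Task.init(project_name="examples", task_name="local-package-task")
# Replace the recorded requirements with a single "." entry so the agent
# installs the local package (the repository root) instead of a pinned list
task.update_requirements('\n'.join(["."]))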
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
JitteryCoyote63 yes, this is definitely a pip bug... can you test with the latest pip version, maybe it was fixed? (i.e. git+https:// link)
With env caching enabled, it won't reinstall this private dependency, right?
It will, local packages (".") and git packages are always reinstalled even if using venv caching, exactly for that reason 🙂
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what pep has to say about that ... (Basically this is not a clearml issue but a pip one...)
error in my-package setup command:
Okay this seems like an error in the setup.py you have in the "mypackage" folder
and when you remove the "." line does it work?
oh dear 🙁 if that's the case I think you should open an Issue on pypa/pip, I'm not sure what we can do other than that ...
If we have the time maybe we could PR a fix?!
GrittyKangaroo27 any chance you can open a GitHub issue so this is not forgotten ?
(btw: I think 1.1.6 is going to be released later today, then we will have a few RCs with improvements on the pipeline; I will make sure we add that as well)
ContemplativeCockroach39 unfortunately not directly as part of clearml 🙁
I can recommend the Nvidia triton serving (I'm hoping we will have the out-of-the-box integration soon)
meanwhile you can manually run it, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
docker here
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow you can also pass repo="."
which will load + detect the repo in the execution environment and automatically fill it in
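A minimal sketch of passing repo="." to a component (the decorator-based API usage, function name, and package list are illustrative):
from clearml.automation.controller import PipelineDecorator

# repo="." tells the component to detect and attach the repository found
# in the execution environment's current working directory
@PipelineDecorator.component(repo=".", packages=["pandas"])
def preprocess(csv_path: str):
    import pandas as pd
    return pd.read_csv(csv_path)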
Hi QuaintPelican38
Can you ssh to {instance_public_ip_address}:10022 (something like ssh -p 10022 user@IP_HERE)?
Basically just getting the password prompt means you are okay.
I suspect that you have some AWS security definition (firewall) that prevents a direct access to the instance, could that be?
Is there any better way to avoid the upload of some artifacts of pipeline steps?
How would you pass "huge datasets (some GBs)" between different machines without storing it somewhere?
(btw, I would also turn on component caching, so if this is the same code with the same arguments the pipeline step is reused instead of re-executed all over again)
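A minimal sketch of enabling component caching (assuming the decorator-based pipeline API; the function body is illustrative):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True)
def prepare_data(dataset_id: str):
    # With cache=True, rerunning the pipeline with the same code and the same
    # dataset_id reuses the stored step output instead of re-executing the step
    from clearml import Dataset
    return Dataset.get(dataset_id=dataset_id).get_local_copy()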
Makes sense to add it to docker run by default if GPUs are mentioned in agent.
I think this is an arch thing, --privileged is not needed on ubuntu flavor, that said you can always have it if you add it here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
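For reference, a minimal sketch of that clearml.conf entry (under the agent section; add --privileged only if you actually need it):
agent {
    # extra arguments appended to the docker run command launched by the agent
    extra_docker_arguments: ["--privileged"]
}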
clearml-agent daemon --gpus 0 --queue default --docker
But docker still sees all GPUs.
Yes --gpus should be enough, are you sure regarding the --privileged flag?
One thing though - I am running agent on behalf of a regular user.
Oh, that might be a credentials / docker service issue (i.e. the user might not have the ability to run a docker with --gpus), but as you mentioned, that seems like an arch thing 🙂
The latest image seems to require driver version 460+ on the host
try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
Why can we even change the pip version in the clearml.conf?
LOL, mistakes learned the hard way 🙂
Basically, too many times in the past pip versions were a bit broken, which is fine when pip is used manually and users can reinstall a different version, but horrible when you have an automated process like the agent. So we added a "freeze version" option to give greater control. Make sense?
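For example, a minimal sketch of pinning the pip version in clearml.conf (the version specifier is illustrative):
agent {
    package_manager {
        # freeze the pip version the agent installs into its virtual environments
        pip_version: "<22.3"
    }
}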
Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
Optional[Sequence[Union[str, Dataset]]]
None, a list of strings, or a list of Dataset objects (each one is a parent, supporting multiple parents)
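A minimal sketch of passing parents to Dataset.create (names and dataset IDs are illustrative):
from clearml import Dataset

child = Dataset.create(
    dataset_name="merged-dataset",
    dataset_project="examples",
    # parents can be Dataset objects or dataset ID strings, mixed freely
    parent_datasets=[Dataset.get(dataset_id="abc123"), "def456"],
)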
Bottom line: the driver version on the host machine does not support the CUDA version you have in the docker container
I think it would make sense to have one task per run to make the comparison on hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it? I want to make sure we solve this issue 🙂
It's a running number because PL is creating the same TB file for every run
Hi LovelyHamster1
That is a good point; since the Pipeline kind of assumes the tasks are already in the system, it clones them (leaving you with the original Draft Task).
I think we should add a flag to the pipeline so that if the Task is in draft mode it will use it (instead of cloning it). Since it seems your pipeline is quite straightforward, I'm not sure you actually need the pipeline controller class; you can perform the entire thing manually, see example here: https://github.com/allegroai/clea...
GiddyPeacock64 Are you sending the jobs from the JupyterLab Kale extension?
EDIT:
Is the pipeline step itself calling Task.init?
We already have the feature-store to save all the data, that's why I don't need to save it (just a reference to the dataset version).
that makes sense, so why don't you point to the feature store ?
I can have different steps of the pipeline running on different machines. But this is not my use case.
if they are running on the same machine you can basically return a path to the local storage or change the output_uri to the local storage; this will cause them to get serialized to the l...
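A minimal sketch of a step that writes to local storage and passes only the path onward (assuming the decorator-based pipeline API; paths and names are illustrative):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def extract_features(out_dir: str = "/data/shared") -> str:
    import os
    path = os.path.join(out_dir, "features.csv")
    # ... write the features file to `path` here ...
    # return only the path, so the next step on the same machine reads it locally
    return path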
I could merge some steps, but as I may want to cache them in the future, I prefer to keep them separate
Makes total sense, my only question (and sorry if I'm dwelling too much in it) is how would you pass the data between step 2 to step 3, if this is a different process on the same machine ?
My bad, you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
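For clarity, a minimal sketch of where that line lives in clearml.conf (under the agent section):
agent {
    # pass the environment variable into the container so the agent inside it
    # skips creating a new python virtual environment
    extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
}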