
Hi @<1661542579272945664:profile|SaltySpider22> I'm not sure I understand the answer to my parallel question
I have no idea what string reference could be used when steps come from Task?
Oh I see, you are correct, when it comes to Tasks the assumption is you are passing strings (with selectors on the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization as you have with pipelines from decorators / functions. The reason for that is that the Task itself is standalone, there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the o...
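For context, here is a minimal sketch (project and task names are illustrative, not from this thread) of passing plain strings with "${...}" selectors to a step built from an existing Task:

from clearml import PipelineController

pipe = PipelineController(name="example-pipeline", project="examples", version="0.0.1")

# "step_a" is an existing, standalone Task cloned as a pipeline step
pipe.add_step(
    name="step_a",
    base_task_project="examples",
    base_task_name="prepare data",
)

# values are passed as plain strings; the "${...}" selectors are resolved
# by the controller at runtime (no serialization/deserialization involved)
pipe.add_step(
    name="step_b",
    parents=["step_a"],
    base_task_project="examples",
    base_task_name="train model",
    parameter_override={
        "General/dataset_url": "${step_a.artifacts.dataset.url}",
        "General/epochs": "10",
    },
)

pipe.start()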
When looking at the worker details, it says "No queues currently assigned to this worker"
Yes, I think we should have better information there, the "AWS service" is not directly pulling jobs from any specific queue, hence nothing there. It is "listening" to queues and launching machines, those machines will be listening to the queue. I wonder if it is just easier to also make sure it is listed as "assigned" to those queues. wdyt?
Hi @<1576381444509405184:profile|ManiacalLizard2>
If you make sure all server access is via a host name (i.e. instead of IP:port, use host_address:port), you should be able to replace it with cloud host on the same port
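For example, a sketch of the idea (host names below are placeholders, not your exact setup): the client-side clearml.conf api section would point at host names rather than IPs:

api {
    # resolvable host names instead of hard-coded IPs, so the backend can later
    # move (e.g. to a cloud host) without touching client configurations
    web_server: "http://clearml.example.com:8080"
    api_server: "http://clearml.example.com:8008"
    files_server: "http://clearml.example.com:8081"
}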
FreshParrot56 we could add this capability, but the main caveat is that if your version depends on multiple parent versions you still need to download and extract all the parent versions, which means that when you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
That would be great! Might have to use
2>/dev/null
in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have set up sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
Gitlab has support for S3 based cache btw.
This might still be considered "slow" compared to local-disk/cluster mount
Would adding support for some sort of post task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only pre-task script, for setup)
Basically if I pass an arg with a default value of False, which is a bool, it'll run fine originally, since it just accepted the default value.
I think this is the nargs="?", is that right?
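For reference, a small hedged sketch of the argparse pattern being discussed (argument names are illustrative):

import argparse

parser = argparse.ArgumentParser()
# plain bool default: running without the flag simply keeps False
parser.add_argument("--use-cache", default=False, type=bool)
# nargs="?" variant: passing "--verbose" with no value falls back to const
parser.add_argument("--verbose", nargs="?", default=False, const=True, type=bool)
args = parser.parse_args()
# note: type=bool treats any non-empty string (including "False") as True,
# which is part of why injected string values can behave unexpectedly
print(args.use_cache, args.verbose)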
So the original looks good, could it be you tried to clone a Task that was executed with an agent with pip, and then pushed into an agent running conda?
Sure thing!
BTW: not sure if it helps, but the SaaS version integrates with Genesis Cloud. I know they provide cheap GPUs, might be worth checking
See here:
https://download.pytorch.org/whl/torch_stable.html
cu110/* has no torch 1.3.1, only 1.7.0
Hi @<1600299043865497600:profile|MagnificentSeaurchin90>
Any chance you can provide more info on the error?
if I want to compare two experiments the scalar plots do not load ( loading forever ).
I'm assuming the issue is the Plots tab? or is it the Scalars? what do you have in the Plots? can you send an image of the single experiment ?
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Are there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
you could also use:
https://github.com/allegroai/clearml/blob/ce7e77a00e869a2690f31cbc578636ce88bc4613/docs/clearml.conf#L188
and set up the clearml.conf on the user's machine to automatically log the environment variables at run time (stored under the Configuration tab).
Then the agent will pull these same variables at execution time and set them
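If I remember the key correctly (treat this as an assumption and double-check against the linked clearml.conf), it would look roughly like:

sdk {
    development {
        # log selected OS environment variables at run time
        # (prefix wildcards such as "AWS_*" are supported)
        log_os_environments: ["AWS_*", "CUDA_VERSION", "MY_APP_*"]
    }
}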
Hi @<1523702000586330112:profile|FierceHamster54>
Nope, nothing to worry about.
That said, do notice the open-source file server is not secure. This does not mean it will spill data on the server, but it does mean you should probably put it behind a VPN or use S3/GCP/Azure if it is open to the public internet
yeah I tend to agree... keep me posted when you find the root cause
However, the pipeline experiment is not visible in the project experiment list.
I mean press on the "full details" in the pipeline page
Hi TrickySheep9
You should probably check the new https://github.com/allegroai/clearml-server-helm-cloud-ready helm chart
So on the EC2 instance (with the agent running), just install prior to running the agent:
apt-get install poppler-utils
Very Cool!
BTW guys, are you using task.models[] to continue from the last checkpoint, or is it task.artifacts[]?
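For reference, a hedged sketch (the task ID and artifact name are illustrative) of pulling the last checkpoint back from a previous run:

from clearml import Task

prev_task = Task.get_task(task_id="<previous_task_id>")

# output models are the checkpoints the previous run registered
last_checkpoint = prev_task.models["output"][-1]
local_weights = last_checkpoint.get_local_copy()

# alternatively, if the checkpoint was uploaded as an artifact:
# local_weights = prev_task.artifacts["checkpoint"].get_local_copy()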
So net-net does this mean it's behaving as expected,
It is as expected.
If no "Installed Packages" are listed, then it cannot pull a cached venv (because requirements.txt is not a full env, and it never analyzed it)).
It does however create a venv cache based on it (after installing it)
The Clone of this Task (i.e. right click on the UI clone experiment, enqueue it, Will use the cached copy becuase the full packages are listed in the "Installed Packages" section of the Task.
Make sens...
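One way to make sure the full package list ends up in "Installed Packages" (treat the helper name as an assumption and verify against your clearml version) is to force a full pip freeze before Task.init:

from clearml import Task

# assumption: records the full `pip freeze` output in "Installed Packages"
# instead of the analyzed requirements; must be called before Task.init
Task.force_requirements_env_freeze(force=True)
task = Task.init(project_name="examples", task_name="train")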
Actually you cannot breakpoint at "atexit" calls (or at least it doesn't work with my gdb)
But I would add a few prints here:
https://github.com/allegroai/clearml/blob/aa4e5ea7454e8f15b99bb2c77c4599fac2373c9d/clearml/task.py#L3166
Hi PanickyAnt52
hi, is there a way to get back the pipeline object when given a pipeline id?
Yes basically this is a specific type of Task, anything you stored on it can be accessed via the Task object, i.e. pipeline_task=Task.get_task(pipeline_id)
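A minimal sketch, assuming all you have is the pipeline's task ID:

from clearml import Task

pipeline_task = Task.get_task(task_id="<pipeline_id>")

# anything stored on the pipeline run is reachable through the Task object
print(pipeline_task.get_parameters())             # pipeline arguments
print(pipeline_task.get_configuration_objects())  # stored configuration objects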
I'm curious, how would you use it?
BTW: since pipeline is also a Task you can have a pipeline launch a step that is a pipeline by its own
Hi EnviousStarfish54
docker on Windows with nvidia runtime support is only available via WSL (I think)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
Hi SparklingHedgehong28
What would be the use for an "end of docker hook"? Is this like an abort callback? Completion?
instance protection
Do you mean like when the instance just died (like spot in AWS)?
GiganticTurtle0 can you please add a github issue with feature request to clearml-agent? I think this is a great use case!