
Notice the pipeline step/Task at execution is not aware of the pipeline context
Do we support GPUs in a) docker mode b) k8s glue?
yes on both
Is there a good reference to get started with k8s glue?
A few folks here already set it up. Do you have a k8s cluster with GPU support?
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: task.set_initial_iteration(0)
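Something like this is what I had in mind (a minimal sketch; the project/task names and the continue_last_task flag are just illustrative assumptions):
from clearml import Task

# reconnect to the previously executed task instead of creating a new one
task = Task.init(
    project_name="examples",          # placeholder project name
    task_name="resumable training",   # placeholder task name
    continue_last_task=True,
)

# report iterations starting from 0 instead of continuing from the last reported one
task.set_initial_iteration(0)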
Ok, so it doesn't follow the exact same rules as Task.init?
Correct
I was afraid all the logs and outputs of a hyperparameter optimization task would be deleted just because no artifacts were created.
Should not happen 🙂
Is there any contingency plan for an agent to continue running a task without reading the repository on the GitLab server?
Not sure what can be done ... any suggestions ?
At runtime, can I ask the agent to use some cached repository?
Sometimes you will have it (the agent stores a cached copy), but I would hardly count on it (and it might be at different states on different machines...)
... (due to regular maintenance service, something I cannot control).
Maybe let "th...
GiganticTurtle0 is there any git redundancy on your network? Maybe you could configure a fallback server?
Hi SubstantialElk6
I can't see that it was removed, could you send the full log?
SubstantialElk6 could you add a GitHub issue to set the direct URL for the vscode as a parameter to the clearml-session?
We already have --vscode-version
we could either extend it to include a direct url, or add a new argument.
wdyt ?
JitteryCoyote63 yes this is very odd, seems like a PyPI flop?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
Could that be the proper way to install ?
https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md#3-install-wheels-for-linux
Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, this is why it is re-downloading (I'll make sure the agent can sort it out, because this is Nvidia's version and in reality it should be a perfect match)
I prepared my own image and want to use this venv
No worries, it creates a "transparent" venv, it uses everything from the docker (the penalty of creating a new venv is negligible 🙂 , you end up with the exact same set of packages)
Hello guys, I have 4 workers (2 in the default queue and 2 in the service queue on the same machine)
Hi @<1526734437587357696:profile|ShaggySquirrel23>
I think what happens is one agent is deleting its cfg file when it is done, but at least in theory each one should have its own cfg
One last request: can you try with the agent's latest RC version 1.5.3rc2?
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
Are there any services OOB like this?
On the open-source side, I can't recall any, but it should probably be easy to write. The paid tier might have an offering though, not sure 🙂
Hi QuaintJellyfish58
This is odd, this "undefined" project is also marked as "Example" which would explain why you cannot delete it, but not how you ended up with one
Any idea on what changed on your server ?
overrides -> "kubectl run --overrides "
template -> "kubectl apply -f template.yaml"
But how do you specify the data, the hyperparameters, and the input/output models to use when the agent runs the experiment?
They are autodetected if you are using Argparse / Hydra / python-fire / etc.
The first time you are running the code (either locally or with an agent), it will add the hyper parameter section for you.
That said, you can also provide it as part of the clearml-task command with --args
(btw: clearml-task --help will list all the options, https://clear.ml/docs/...
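For example, a minimal sketch of the argparse case (project/task names and the arguments themselves are placeholders):
from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--batch-size", type=int, default=32)

# Task.init hooks argparse, so once parse_args() runs the arguments
# show up under the task's hyperparameter section automatically
task = Task.init(project_name="examples", task_name="argparse autodetect")
args = parser.parse_args()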
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo but this is the gist):
StorageManager.upload_file(local_file=separated_file_posix_path, remote_url=remote_file_path + str(separated_file_posix_path.relative_to(files_rgb)))
Notice that you need to provide the full upload URL (including path and file name to be used on your GS storage)
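For example (bucket name and paths are placeholders, just to show that remote_url must include the destination path and file name):
from clearml import StorageManager

StorageManager.upload_file(
    local_file="/data/files_rgb/img_001.png",
    remote_url="gs://my-bucket/datasets/files_rgb/img_001.png",
)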
This is done in the background while accessing the cache, so it should not have any slowdown effect
Hi @<1720249421582569472:profile|NonchalantSeaanemone34>
Sorry I missed this message. Yeah, the reason it's not working is because the way the returned value is stored and passed is by using 'pickle', and unfortunately Python's pickle does not support storing lambda functions...
https://docs.python.org/3/library/pickle.html#id8
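For example, this is roughly what happens under the hood when a lambda is returned:
import pickle

try:
    pickle.dumps(lambda x: x + 1)  # lambdas have no importable qualified name
except (pickle.PicklingError, AttributeError) as err:
    print("cannot pickle a lambda:", err)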
Correct, which makes sense if you have a stochastic process and you are looking for the best model snapshot. That said I guess the default use case would be min/max (and not the global variant)
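If it helps, a minimal sketch of where that choice is made, assuming the clearml.automation.HyperParameterOptimizer API (the base task id, metric names and parameter range are placeholders):
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

# 'min'/'max' track the last reported value of the objective, while the global
# variants (if available in your version) track the best value reported over the run
optimizer = HyperParameterOptimizer(
    base_task_id="<base task id>",
    hyper_parameters=[UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",  # or "min_global" for the best snapshot overall
)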
OutrageousGrasshopper93 could you send an example of the two links from the artifacts (one local one remote) ?
@<1547390422483996672:profile|StaleElk72> when you go to the dataset in the UI, and press on "Full Details" then go to the Artifacts tab, what is the link you see there?
Hi @<1547028116780617728:profile|TimelyRabbit96>
It should process the new request A (this is a multi-threaded / async implementation)
Is this consistent with what you are seeing ?
Is it possible to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from the trains PIP package) on Machine C to do so?
Nothing, pure magic 🙂
Hi FunnyTurkey96
Which pip version are you using? Basically pip changed the dependency resolver after 20.1
Change: https://github.com/allegroai/clearml-agent/blob/aede6f4bac71c8fc56e7cf982318a48527953a3c/docs/clearml.conf#L57 to pip_version: "<20.2"
See if that helps
It should be fairly easy to write such a daemon:
from time import time
from datetime import datetime
from clearml.backend_api.session.client import APIClient

client = APIClient()
timestamp = time() - 60 * 60 * 2  # last 2 hours
tasks = client.tasks.get_all(
    status=["in_progress"],
    only_fields=["id"],
    order_by=["-last_update"],
    page_size=100,
    page=0,
    created=[">{}".format(datetime.utcfromtimestamp(timestamp))],
)
...
references:
https://clear.ml/...