Hi SmugDog62
My guess is that there's an issue with the git repo detector.
Seems like you are correct
What are you getting on the execution tab?
Is the repo correct?
Do you see the notebook in the uncommitted changes?
In that case you should probably mount the .ssh folder from the host file-system into the docker container, for example:
docker run -v /home/user/.ssh:/root/.ssh ...
WickedGoat98 the above assumes you are running the docker manually, if you are using a docker-compose.yml file the same mount should be added to the docker-compose.yml
Hi GreasyPenguin14
It looks like you are trying to delete a Task that does not exist
Any chance the cleanup service is misconfigured (i.e. accessing the incorrect server)?
https://github.com/allegroai/clearml/issues/199
Seems like this has already been supported for a while now...
You mean parameters of the pipeline? Is this a pipeline from Tasks or from a function decorator?
In theory it should not, in practice you could run out of space while running the experiment itself...
You can always cleanup everything from time to time (maybe worth a flag?)
Ohh! I see now
@<1526371965655322624:profile|NuttyCamel41> the "pytorch" backend is not really supported because it does not use the optimized Triton engine (which is the reason to run the Triton server)
In order to use pytorch you need to convert it to TorchScript and then deploy, see example here:
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/examples/pytor...
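Something along these lines should do the conversion (a minimal sketch using a stand-in torchvision resnet18; swap in your own trained model and input shape):
import torch
import torchvision

# A stand-in model - replace with your own trained model
model = torchvision.models.resnet18().eval()
example_input = torch.randn(1, 3, 224, 224)

# Trace the model into TorchScript so the optimized Triton backend can serve it
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")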
RoundMosquito25 do notice the agent is pulling the code from the remote repo, so you do need to push the local commits, but ClearML will take care of the uncommitted changes for you. Make sense?
Hi HappyDove3
task.set_script is a great way to add the info (assuming the .git is missing)
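For example, a rough sketch (the repository URL, branch, commit and entry point below are just placeholders, fill in your own):
from clearml import Task

task = Task.init(project_name="examples", task_name="manual repo info")
# Manually attach the git info that could not be auto-detected (all values are placeholders)
task.set_script(
    repository="git@github.com:user/repo.git",
    branch="main",
    commit="0123abcd",
    working_dir=".",
    entry_point="train.py",
)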
Are you running it using PyCharm? (If so use the clearml pycharm plugin, it basically passes the info from your local git to the remote machine via OS environment variables)
It runs directly but leads to the above error with clearml
Both manually (i.e. calling Task.init and running it without an agent) and with an agent? Same exact behavior?
You should manually remove the cudatoolkit from the installed packages section in the UI, then try to send it to the agent and see if it works. The question is how it ended up there in the first place
Hi MelancholyBeetle72
You mean the venv creation takes the bulk of the time, or is it something else?
Yep, the automagic only kicks in with Task.init... The main difference and the advantage of using a Dataset object is that the underlying Task resides in a specific structure that is used when searching based on project/name/version, but other than that, it should just work
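As a rough sketch of what I mean (project/dataset names and paths here are just placeholders):
from clearml import Dataset

# Register a new dataset version under a project/name
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("data/")
ds.upload()
ds.finalize()

# Later anyone can fetch it by project/name (latest version by default)
local_path = Dataset.get(dataset_project="examples", dataset_name="my_dataset").get_local_copy()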
I also found that you should have a deterministic ordering before you apply a fixed seed
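Something like this is what I mean (assuming the samples are listed from disk):
import glob
import random

files = sorted(glob.glob("data/*.png"))  # listing order is not guaranteed, so sort first
random.seed(42)                          # the fixed seed only helps once the input order is deterministic
random.shuffle(files)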
Not sure I follow ?
Yup, I updated this in my local clearml.conf... Or should I be updating this elsewhere as well?
On the agent's machine, you should update the default_output_uri. Make sense?
Hi BoredHedgehog47 I'm assuming the nginx on the k8s ingress is refusing the upload to the files server
JuicyFox94 wdyt?
Hi FlutteringWorm14
Is there some way to limit that?
What do you mean by that? Are you referring to the Free tier?
VictoriousPenguin97 basically spin down serverA (this should flush all DBs), then copy /opt/clearml to the new server and spin it up with docker-compose. As long as the new server is on the same address as the previous one, everything should work out of the box
Hi FierceHamster54
Dataset download is already multi-threaded
But yes, get_local_copy() is thread / process safe
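For example, something like this should be fine (dataset names and worker count are placeholders):
from concurrent.futures import ThreadPoolExecutor
from clearml import Dataset

def fetch(name):
    # get_local_copy() uses the shared local cache, so concurrent calls are safe
    return Dataset.get(dataset_project="examples", dataset_name=name).get_local_copy()

with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(fetch, ["ds_a", "ds_b", "ds_c"]))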
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
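Roughly something like this, as two separate runs (names and paths are placeholders; switch the credentials in clearml.conf / the CLEARML_API_* env vars between the two steps):
from clearml import Dataset

# Step 1 - run with credentials pointing at the local server
local_copy = Dataset.get(
    dataset_project="examples", dataset_name="my_dataset"
).get_mutable_local_copy("/tmp/ds_copy")

# Step 2 - after switching credentials to the hosted server, re-register the files
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("/tmp/ds_copy")
ds.upload()
ds.finalize()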
VexedCat68 yes 🙂 you can also pass the parent folder and it will zip the entire folder (subfolders included) into a single artifact
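For example, a quick sketch (artifact name and folder path are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="folder artifact")
# Passing a folder path - the folder, subfolders included, is zipped into a single artifact
task.upload_artifact(name="dataset_folder", artifact_object="data/")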
This task is picked up by the first agent; it runs the DDP launch script for itself and then creates clones of itself with task.create_function_task() and passes its address as an argument to the function
Hi UnevenHorse85
Interesting use case, just for my understanding, the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct ?
passes its address as an argument to the function
This seems like a great solution.
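Just to make sure we are talking about the same thing, a rough sketch of what I understood (function body, queue name and address are placeholders):
from clearml import Task

def worker(master_addr, rank):
    # Placeholder for the real DDP bootstrap each clone would run
    print(f"connecting to {master_addr} as rank {rank}")

task = Task.init(project_name="examples", task_name="ddp launcher")
master_addr = "10.0.0.1:29500"  # placeholder - the launcher's own address

# Create a draft Task per worker and enqueue it so other agents pick it up
for rank in range(1, 4):
    worker_task = task.create_function_task(
        worker, func_name=f"worker_{rank}", task_name=f"ddp worker {rank}",
        master_addr=master_addr, rank=rank,
    )
    Task.enqueue(worker_task, queue_name="default")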
the queu...
The file is never touched; nowhere in the process is that file deleted.
it should never have gotten there, this is not the git repo folder, it is one level above...
but I still need the load balancer ...
No, you are good to go. As long as someone registers the pods' IPs automatically on a DNS service (local/public) you can use the registered address instead of the IP itself (obviously with the port suffix)
Thanks for your support
With pleasure!