So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do:pip install "my_package"
It will set "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense ?
Hmm should not make a diff.
Could you verify it still doesn't work with TF 2.4 ?
Then running by using the
, am I right?
yep
I have put the
--save-period
while running Yolov5 and ClearML does not save the weight per epoch that I have trained. Why is this happened?
But do you still see it in the clearml UI ? do you see the models logged in the clearml UI ?
Hi @<1542316991337992192:profile|AverageMoth57>
Not sure I follow how the integration what you have in mind regarding Gerrit integration None
Sounds interesting ...
wdyt?
ssh: Could not resolve hostname
: Name or service not known
@<1542316991337992192:profile|AverageMoth57> so is this the main issue? this seems unrelated to the Gerrit thing, just missing configuration of the .ssh on the agent machine, is that correct?
Hi @<1552101458927685632:profile|FreshGoldfish34>
self-hosted, you mean the open source ? if so, then yes totally free ๐
That said I would recommend to have the server inside your VPN, just in case from a security perspective
BTW: there is a full Pipeline class that does everything for you, example here:
https://github.com/allegroai/clearml/tree/master/examples/pipeline
Just run once (from your python console / pycharm etc.):
https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py
Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses the cleaml-agents to launch a session on one of the machines in the cluster.
In the remote session itself it install jupyterlab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over ssh and creates tunnel to these services.
Hi JumpyDragonfly13
I don't know why I'm gettingย
172.17.0.2
I think it (the remote jupyter Task) fails to get the correct IP address of the server.
You can manually correct it by going to the DevOps project, look for the runnig Task there, then under Configuration/Properties change external_address
to the actual IP 10.19.20.15
Once that is done, re-run the clearml-session
, it will suggest to connect to the running session, it should work....
BTW:
I'd like...
Basically run the 'agentin virtual environment mode JumpyDragonfly13 try this one (notice no --docker flag)
clearml-agent daemon --queue interactive --create-queue Then from the "laptop" try to get a remote session with:
clearml-session `
UnevenDolphin73 FYI: clearml-data is documented , unfortunately only in GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Hi IntriguedRat44
Sorry, I missed this message...
I'm assuming you are running in manual mode (i.e. not through the agent), in that case we do not change the CUDA_VISIBLE_DEVICES.
What do you see in the resource monitoring? Is it a single GPU or multiple GPUs?
(Check the :monitor:gpu in the Scalar tab under results,)
Also what's the Trains/ClearML version you are suing and the OS ?
IntriguedRat44 If the monitoring only shows a single GPU (the selected one) it means it reads the correct CUDA_VISIBLE_DEVICES (this is how it knows that you are only using a selected GPU not all of them).
There is nothing else in the code that will change the OS environment.
Could you print os.environ['CUDA_VISIBLE_DEVICES'] while running the code to verify ?
IntriguedRat44 could I ask you to open a GitHub issue on it?
I really do not want it to slip through our fingers...
(BTW: meanwhile I was not able to reproduce it, what's the OS / nvidia drivers you are using )?
IntriguedRat44 how do I reproduce it ?
Can you confirm that marking out the Task.init(..) call will fix it ?
Interesting... TrickyRaccoon92 could it be the validation phase was creating a new Tensorboard file ?
Hi @<1523707131994312704:profile|CrabbyKoala94>
I wanted to use method Task.reset() or Task.delete() however none of that seems to be able to delete
only
the logs in the "console" section in the UI.
So Task.reset
will reset the entire outputs of the Task (and the status), as you noticed. Why would you want to just remove the logs?
You can disable the auto logs altogether if you really want to, see Task.init [auto_connect_streams](https://github.com/allegroai/cl...
I want to be able to delete only the logs since they are taking a lot of space in my case.
I see... I do not think this is possible ๐
You can disable the auto logging though ... pass auto_connect_streams=False
to Task.init
SmugDog62 so on plain vanilla Jupyter/lab everything seems to work.
What do you think is different in your setup ?
@<1545216070686609408:profile|EnthusiasticCow4>git+ssh://
will be converted automatically to git+https
if you have user/pass ocnfigured in your clearml.conf on the agent machine.
More over, git packages are always installed After all other packages are installed (because pip cannot resolve the requirements inside the git repo in time)
Are these experiments logged too (with the train-valid curves, etc)?
Yes every run is log as a new experiment (with it's own set of HP). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with new set of HP an dputs them into the execution queue, then trains-agent
pulls them from the queue and starts executing them. You can have multiple trains-agent
on as many machines as you like with specific GPUs etc. each one ...
Well that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init
and when you are done call Task.close
. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent
which could help you spin those experiments on as many machines as you l...
Hi SourSwallow36
What do you man by Log each experiment separately ? How would you differentiate between them?