Is it possible to perform debugging operations with PyCharm integration using a remote session?
Sure, use clearml-session: it will open an SSH connection to the remote machine, then you can use PyCharm over it
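For example (a minimal sketch, assuming an agent is listening on a queue named "default" and the docker image is just a placeholder): `clearml-session --queue default --docker nvidia/cuda:11.6.2-runtime-ubuntu20.04` should spin up the remote environment and print the SSH connection details to use from PyCharm.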
GiddyTurkey39 can you ping the server address?
(just making sure, this should be the IP of the server, not 'localhost')
This means that you guys internally catch the argparse object somehow, right?
Correct 🙂 this is how you get the type-checking / casting abilities, and a few other perks
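For context, a minimal sketch of the auto-connect behavior (project/task names are just placeholders):
```python
from clearml import Task
import argparse

# Task.init patches argparse, so parsed arguments are captured automatically
task = Task.init(project_name='example', task_name='argparse demo')

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.01)
args = parser.parse_args()  # values are logged (and can be overridden on remote runs)
```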
I imagine that these phantom dependencies will prevent parallelization. Is there a workaround?
Yes, they might... The workaround might be a bit ugly, but copy-pasting the functions and changing their names should work.
BTW: I'll check when the next RC is scheduled; maybe it will already contain a fix 🤞
SmarmySeaurchin8 regarding the original question: `task.set_project(project_id)`
`Task.get_projects()` to get all the project names/ids
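Roughly like this (a sketch; the task id and project name are hypothetical placeholders):
```python
from clearml import Task

task = Task.get_task(task_id='<task-id>')
# look up the target project id by name, then move the task into it
projects = Task.get_projects()
target = next(p for p in projects if p.name == 'My Project')
task.set_project(target.id)
```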
Let's try:
`echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay empty? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded, add the following: `task = Task.init('example', 'experiment', output_uri='<files-server-or-storage-url>')`
(You can obviously point it to any other http/S3/GS/Azure storage)
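For instance (a sketch; the bucket and paths are placeholders):
```python
from clearml import Task

# models (and artifacts) will be uploaded to this destination automatically
task = Task.init(
    project_name='example',
    task_name='experiment',
    output_uri='s3://my-bucket/models',  # or 'gs://...', 'azure://...', or your file server URL
)
```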
Thanks VexedCat68 !
This is a great example, maybe PR it to the clearml-serving repo? wdyt?
CooperativeSealion8
when it first asks me to enter my full name
Where? In the web UI?
Hi LovelyHamster1
That is a good point. I think the safest / most robust way is to configure both to use the same DNS name(s), so both (internal/external) are accessible.
Some background: the URL on the artifact is basically standalone; once registered on the Task, the UI will not replace it but will use it as-is (the UI has no "understanding" of which server it is on, it will just fetch the file).
Are you also using a different port on the load balancer?
(because the easiest fix is on your external ...
that machine will be able to pull and report multiple trials without restarting
What do you mean by "pull and report multiple trials"? Spawn multiple processes with different parameters?
If this is the case: the internals of the optimizer could be synced to the Task so you can access them, but this is basically the internal representation, which is optimizer-dependent. Which one did you have in mind?
Another option is to pull Tasks from a dedicated queue and use the LocalClearMLJob ...
Hi YummyFish22
Looks like the task does not have a Task.init call in the main script (or an import of clearml)? Could that be the case?
Hi SpotlessFish46 ,
Is the artifact already in S3?
Is S3 configured as the default files_server in the trains.conf?
You can always use the StorageManager to upload to wherever you want and register the URL on the artifact.
You can also programmatically change the artifact destination to S3, then upload the artifact as usual.
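Something along these lines, as a sketch (bucket name and file paths are hypothetical, and exact link-registration behavior may vary by clearml version):
```python
from clearml import Task
from clearml.storage import StorageManager

task = Task.init(project_name='example', task_name='s3 artifacts')

# option 1: upload with StorageManager, then register the resulting URL
url = StorageManager.upload_file(
    local_file='model.pkl',
    remote_url='s3://my-bucket/artifacts/model.pkl',
)
task.upload_artifact(name='model', artifact_object=url)  # remote URLs are registered as links

# option 2: change the task's output destination, then upload as usual
task.output_uri = 's3://my-bucket/artifacts'
task.upload_artifact(name='stats', artifact_object={'acc': 0.9})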
What would be the best match for you?
GitLab has support for S3-based cache, btw.
This might still be considered "slow" compared to a local disk / cluster mount
Would adding support for some sort of post task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only a pre-task script, for setup)
By default the agent will add the root of the git repository to the PYTHONPATH, so that you can import...
I'm getting: `hydra_core == 1.1.1`
What's the setup you have? python version, OS, Conda yes/no?
The latest RC (0.17.5rc6) moved all logging into a separate subprocess to improve speed with PyTorch dataloaders
The reasoning is that most likely simultaneous processes would fail on the GPU due to the memory limit
Do you mean the Task already exists, or do you want to create a Task from the code?
MagnificentSeaurchin79 you can delay it with: `task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)`
RipeGoose2 you can put it before/after the Task.init call; the idea is for you to set it before any of the real training starts.
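For example (a minimal sketch; project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name='example', task_name='experiment')
# delay the resource monitor's switch to iteration-based reporting by 30 minutes
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```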
As for not affecting anything:
Try to add the callback and just have it return None (which means skipping the model logging process). Let me know if this one works.
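If the callback in question is the framework weights-file hook, a minimal sketch (assuming that is the hook meant here) might be:
```python
from clearml.binding.frameworks import WeightsFileHandler

# pre-save callback: returning None tells clearml to skip logging this model
def skip_model_logging(operation_type, model_info):
    return None

WeightsFileHandler.add_pre_callback(skip_model_logging)
```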
it knows it's a notebook and automatically adds the notebook as an artifact, right?
correct
and the uncommitted changes become the notebook converted to a script?
correct
In one case I am seeing actual git diff coming in instead of the notebook.
It might be that there are both a git repository and a notebook, and the git diff is shown before the notebook is detected and shown instead? (there is a watchdog refreshing the notebook every 30 sec or so)
I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of a snapshot of the EBS volume, waits until it is finished, then restarts the clearml-server. wdyt?
I'm pretty sure there is a nice way, let me check something
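In the meantime, a rough sketch of the approach you described (the volume id and compose directory are placeholders):
```python
import subprocess
import boto3
from clearml import Task

EBS_VOLUME_ID = 'vol-0123456789abcdef0'   # hypothetical volume id
COMPOSE_DIR = '/opt/clearml'              # wherever docker-compose.yml lives

# are any tasks still running or waiting in a queue?
active = Task.get_tasks(task_filter={'status': ['in_progress', 'queued']})
if not active:
    subprocess.run(['docker-compose', 'down'], cwd=COMPOSE_DIR, check=True)

    ec2 = boto3.client('ec2')
    snapshot = ec2.create_snapshot(VolumeId=EBS_VOLUME_ID,
                                   Description='clearml-server backup')
    ec2.get_waiter('snapshot_completed').wait(SnapshotIds=[snapshot['SnapshotId']])

    subprocess.run(['docker-compose', 'up', '-d'], cwd=COMPOSE_DIR, check=True)
```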
Hi UnsightlySeagull42
How can I reproduce this behavior?
Are you getting all the console logs?
Is it only the TensorBoard output that is missing?
p.s. you should remove this line 🙂 `extra_index_url: ["git@github.com:salimmj/xxxx"]`
The AWS autoscaler will work with IAM roles, as long as you have them configured on the machine itself. For SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) you need to select the instance type as well (basically the same as EC2). What do you mean by using the k8s glue, like inheriting and implementing the same mechanism but for SageMaker instead of kubectl?
Can you see it on the console?