Then I ssh into the remote machine using the ngrok hostname and tunnel the port for Jupyter
` Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
Warning: Permanently added '<CENSORED>' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/jevgenimartjushev/.ssh/id_rsa': <CENSORED>
SSH tunneling failed, retrying in 3 seconds `
For others who haven't heard about ngrok: Ngrok exposes local servers behind NATs and firewalls to the public internet over secure tunnels.
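Roughly, that tunneling step corresponds to something like the sketch below, written with the sshtunnel package instead of the plain ssh CLI; the host, port, username, and key path are placeholders, not the actual values ngrok assigns:
` from sshtunnel import SSHTunnelForwarder

# ngrok exposes the remote sshd on a public host:port pair (placeholders below)
tunnel = SSHTunnelForwarder(
    ("0.tcp.ngrok.io", 12345),
    ssh_username="ubuntu",
    ssh_pkey="/Users/me/.ssh/id_rsa",
    remote_bind_address=("127.0.0.1", 8888),  # Jupyter running on the remote machine
    local_bind_address=("127.0.0.1", 8888),   # reachable locally as http://localhost:8888
)
tunnel.start() `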
it is missing in the CLI, but I was able to set external_ssh_port and external_address in the GUI. It was certainly a step forward, but it still failed
I guess that's because ngrok is not like a Dynamic DNS
in the far future, automatically; in the near future, more like semi-manually
like replace a model in staging Seldon with this model from ClearML; push this model to prod Seldon, but in shadow mode
we are just entering the research phase for a centralized serving solution. The main reasons against clearml-serving with Triton are: 1) no support for Kafka, 2) no support for shadow deployments (both of these are supported by Seldon, which is currently the best-looking option for us)
I already added to the task: Workaround: Remove limit_execution_time from scheduler.add_task
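As a minimal sketch of the workaround (the template task id, queue, and schedule here are placeholders, not the real setup):
` from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<template-task-id>",  # placeholder
    queue="default",
    hour=0, minute=30,
    # limit_execution_time=2.0,  # removing this argument is the workaround
)
scheduler.start() `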
I tried this, but it didn't help:
` input_models = current_task.models["input"]
if len(input_models) == 1:
    input_model_as_input = {"name": input_models[0].name, "type": ModelTypeEnum.input}
    response = current_task.send(DeleteModelsRequest(
        task=current_task.task_id,
        models=[input_model_as_input]
    )) `
I am not registering a model explicitly in apply_model. I guess it is done automatically when I do this:
` output_models = train_task_with_model.models["output"]
model_descriptor = output_models[0]
model_filename = model_descriptor.get_local_copy() `
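If explicit registration were needed instead, one possible route (only a sketch, assuming the InputModel / Task.connect mechanism; the model id is a placeholder) would look like:
` from clearml import InputModel, Task

task = Task.current_task()
model = InputModel(model_id="<model-id>")  # placeholder id
task.connect(model)                        # registers the model as the task's input
model_filename = model.get_local_copy() `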
clearml==1.5.0
WebApp: 1.5.0-192 Server: 1.5.0-192 API: 2.18
log: [2021-09-09 11:22:09,339] [8] [WARNING] [clearml.service_repo] Returned 400 for tasks.dequeue in 2ms, msg=Invalid task id: id=28d2cf5233fe41399c255950aa8b8c9d, company=d1bd92a3b039400cbafc60a7a5b1e52b
I think they appeared when I had a lot of HPO tasks enqueued and not started yet, and then I decided to either Abort or Archive them - I don't remember anymore
for the tasks that are not deleted, the log is different: [2021-09-09 12:19:07,718] [8] [WARNING] [clearml.service_repo] Returned 400 for tasks.dequeue in 4ms, msg=Invalid task id: status=stopped, expected=queued
this does not prevent enqueuing and running new tasks; it's rather an eyesore
no new unremovable entries have appeared (although I haven't tried)
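Presumably the stale entries could also be cleaned out through the API, roughly like the sketch below (the queue id is a placeholder, and I'm not certain this works for entries whose tasks were already deleted):
` from clearml.backend_api.session.client import APIClient

client = APIClient()
# remove the stale entry reported in the log above from its queue
client.queues.remove_task(
    queue="<queue-id>",
    task="28d2cf5233fe41399c255950aa8b8c9d",
) `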
or, somehow, we can centralize the storage of S3 credentials (i.e. on clearml-server) so that clients can access S3 through the server
AgitatedDove14 I did exactly that.
the task is running, but there is no log output from fil-profiler (when run fully locally, it does some logging at the very beginning)
yes, but note that I'm not talking about the VS Code instance set up by clearml-session, but about a local one. I'll do another test to determine whether VS Code from clearml-session suffers from the same problem
yeah, I missed the fact that I'm running it not by opening the remote Jupyter in the browser, but by connecting to the remote Jupyter with local VS Code
also, I tried running the notebook directly in the remote Jupyter - I see the correct uncommitted changes
so I assume it's somehow related to the remote connection from VS Code
` Adding venv into cache: /root/.clearml/venvs-builds/3.8
Running task id [aa2aca203f6b46b0843699d1da373b25]:
[.]$ /root/.clearml/venvs-builds/3.8/bin/python -u '/root/.clearml/venvs-builds/3.8/code/-m filprofiler run catboost_train.py' `
Basically, my problem is that it returns an empty result. In the same code I can get the dataset by its ID, and I can get the task (which created the dataset) using Task.get_tasks() (without mentioning the ID explicitly)
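To make the contrast concrete, the working calls look roughly like this (the id, project, and task names are placeholders); the query that does not reference the ID is the one that comes back empty:
` from clearml import Dataset, Task

# works: fetch the dataset by its known id
ds = Dataset.get(dataset_id="<dataset-id>")

# works: find the task that created the dataset, without knowing its id
tasks = Task.get_tasks(project_name="<project>", task_name="<dataset-task-name>")

# the id-less dataset query (e.g. something along these lines) is what returns nothing
# results = Dataset.list_datasets(dataset_project="<project>", partial_name="<dataset-name>") `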