Edit the cloned version and enqueue it?
Are you using matplotlib? Could it be the binding checking if matplotlib exists? Could it be that you are running it with DEBUG on (i.e. global log level debug)?
can you bump me to that thread?
https://clearml.slack.com/archives/CTK20V944/p1630610430171200
I realise I'll need to catalogue all the dataset ids created by ppl separately on a spreadsheet.
Okay, this part I missed: why would you need to add an additional "catalog" when you have the UI?
Hi @FloppySwallow46
Not sure I follow, could you explain ?
Hi @MammothParrot39
By default you have the last 100 iterations there (not sure why you are only seeing the last 3), but this is configurable.
Hi @BattyCrocodile47
Did you check None ?
You are not supposed to do 2,3,4
After (1) you should just run `ssh root@localhost -p 8022` and provide the password that is written in the CLI.
(Note: pass `--public-ip` if your remote machine is using a public IP you can access.)
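As a rough sketch of the connection step above, the snippet below only assembles the `ssh` command quoted in the message (user `root`, host `localhost`, port `8022`). The helper name `build_ssh_command` is made up for illustration, and nothing is executed.

```python
import shlex
from typing import List


def build_ssh_command(user: str = "root", host: str = "localhost", port: int = 8022) -> List[str]:
    """Assemble the ssh invocation quoted in the message (hypothetical helper)."""
    return ["ssh", f"{user}@{host}", "-p", str(port)]


command = build_ssh_command()
# Print a copy-pasteable command line; the password is the one shown by the CLI.
print(shlex.join(command))
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later without worrying about shell quoting.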
Do you have a video showing the use case for clearml-session?
I totally think we should, I'll pass it along
what is the difference between vscode via clearml-session and vscode via remote ssh extension ?
Nice! remote vscode is usually thought of as SSH, basically you have your vscode running on your machine, and using SSH vscode automatically connects to the remote machine.
Clearml-Session also adds a new capability: VSCode inside your browser, where the VSCode itself as well...
I'm guessing this is done through code-server?
correct
I'm currently rolling a JupyterHub instance (multiuser, with codeserver inside) on the same machine as clearml-server. That's where tasks are executed etc. so, all browser dev env.
Yeah, the idea with clearml-session each user can self serve themselves the container that works best for them. With a jupyterhub they start to step on each other's toes very quickly ...
Can you please elaborate on the latter point? My JupyterHub's fully containerized and allows users to select their own containers (from a list I built) at launch, and launch multiple containers at the same time, so I'm not sure I follow how toes are stepped on.
Definitely a great start. Usually it breaks on memory / GPU memory, where too many containers on the same machine are eating each other's GPU RAM (which cannot be virtualized).
I guess only if autoscaling is used (one worker one machine)?
yes, basically depending on how you set autoscaling / k8s integration
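To make the GPU-memory point above concrete, here is a minimal sketch with made-up numbers: it sums the memory each container asks for per GPU and reports the cards that would be oversubscribed. The function name and the figures are illustrative assumptions, not measurements from any real setup.

```python
from typing import Dict, List


def oversubscribed_gpus(requests_mib: Dict[int, List[int]], capacity_mib: int) -> List[int]:
    """Return the GPU indices whose containers ask for more memory than the card has."""
    return sorted(
        gpu for gpu, requests in requests_mib.items()
        if sum(requests) > capacity_mib
    )


# Made-up numbers: two 16 GiB GPUs, each shared by several containers.
requests = {0: [6000, 6000, 6000], 1: [4000, 8000]}
print(oversubscribed_gpus(requests, capacity_mib=16384))
```

Because GPU RAM cannot be virtualized, the third container on GPU 0 in this example would typically fail with an out-of-memory error rather than slow down.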
exactly! it is very cool to see it in action, and it really works very well, kudos to these guys
Hi @FranticLobster21
Like this?
https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f[…]ation/hyper-parameter-optimization/hyper_parameter_optimizer.py
In that case, I think it is stuck on a previous Node, I can't think of any other reason.
Do you have something else on the same PV that was lost, like the API server configuration?
This will mount the trains-agent machine's hosts file into the docker
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait
Hi @SilkyFlamingo57
It is not taking a new pull from Git repository.
When you say it's not trying to get the latest, are you referring to a new run of the pipeline where the component is not pulling the latest from the branch? Is that the issue?
When you click on the component Task details (i.e. right hand side panel "Full details"), what's the commit ID you have?
Lastly, is the component running on the same machine as the prev...
Could it be the credentials are actually incorrect? because it seems like you can access the server? (I assume you were able to browse to it and generate credentials. right?)
Seems like someone is sitting in the middle and rerouting the request (maybe both https and port)?!
According to you the VPN shouldn't be a problem right?
Correct, as long as all parties are on the same VPN it should work. All the connections are always http, so basically trivial communication.
Basically try with the latest RC
pip install trains==0.15.2rc0
PompousParrot44 Enterprise licensing pricing is usually custom tailored to the size of the company and based on usage. If you are interested, feel free to leave details in the "contact us" form on the website, and someone from sales will contact you shortly after.
Yea the "-e ." seems to fit this problem the best.
It seems like whatever I add to `docker_bash_setup_script` is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export test
docker_bash_setup_script
` export MY...
assuming you have `hparams.my_param`
my suggestion is:
`
@hydra.main(config_path="solver/config", config_name="config")
def train(hparams: DictConfig):
    task = Task.init(hparams.task_name, hparams.tag)
    overrides = {'my_param': hparams.value}
    task.connect(overrides, name='overrides')
    # in remote this will print the value we put in "overrides/my_param"
    print(overrides['my_param'])
    # now we actually use overrides['my_param']
`
Make sense?
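The idea behind the suggestion above can be illustrated without any library. This is only a conceptual stand-in for what the message describes (the remote run seeing the value stored under "overrides/my_param"), not how `task.connect` is implemented. The function `apply_overrides` and the values are hypothetical.

```python
from typing import Any, Dict


def apply_overrides(local: Dict[str, Any], section: str, stored: Dict[str, Any]) -> Dict[str, Any]:
    """Update `local` in place with values stored under "<section>/<key>" (stand-in, not ClearML code)."""
    for key in local:
        stored_key = f"{section}/{key}"
        if stored_key in stored:
            local[key] = stored[stored_key]
    return local


overrides = {"my_param": 0.1}          # value taken from the Hydra config locally
stored = {"overrides/my_param": 0.5}   # hypothetical value edited before the remote run
apply_overrides(overrides, "overrides", stored)
print(overrides["my_param"])
```

The point of the pattern is that the rest of the training code reads `overrides['my_param']` rather than `hparams.value`, so an edited value is the one actually used.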
I think I found something, let me see if I can reproduce it
SuperiorDucks36, is the domain name "rz-s-git"? This does not seem like a valid domain.
EDIT:
Is it a local domain on your network?
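A quick way to spot this kind of name is to check whether it has any dot at all. The sketch below is a simple heuristic under that assumption (the helper `is_single_label` is made up for illustration). It does not perform any DNS lookup.

```python
def is_single_label(hostname: str) -> bool:
    """True when the name has no dot, so it typically relies on local DNS or a hosts file."""
    return "." not in hostname.strip().rstrip(".")


print(is_single_label("rz-s-git"))
print(is_single_label("github.com"))
```

A single-label name like this can still work inside a network, which is why whether it resolves on the machine running the agent is the relevant question.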