Hmm, you are correct
Which means this is some conda issue; basically, when installing from the env file, conda is not resolving the correct pytorch version.
Not sure why... Could you try to upgrade conda?
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like `Task.get_task("task_id_here").get_base_docker()`?
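For example, a minimal sketch (the task ID below is a placeholder):
```python
# Minimal sketch: query the base docker image/arguments configured for a task.
from clearml import Task

task = Task.get_task(task_id="task_id_here")  # placeholder task ID
print(task.get_base_docker())  # docker image (and arguments) set for the task
```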
A Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
It might be that the file upload was broken?
Still not supported
That would be great! Might have to use `2>/dev/null` in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have set up sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
This is what I just used:
```python
import os
from argparse import ArgumentParser

from tensorflow.keras import utils as np_utils
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation, Dense, Softmax
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from clearml import Task

parser = ArgumentParser()
parser.add_argument('--output-uri', type=str, required=False)
args = ...
```
How so? Installing a local package should work, what am I missing?
It completed after the max_job limit (10)
Yep, this is Optuna "testing the water"
I could improve the cost-efficiency of my provisioned GCP A100 instances
But their pricing is linear; if you do not need an A100, get a cheaper instance, no?
WickedGoat98 Actually the fileserver replied, so it all looks fine to me.
Try to run the text example again, see if you are still getting the fileserver error.
BeefyHippopotamus73 this error seems like it is coming from boto3, are you sure the credentials are properly configured and that you have read permission?
Okay. And `110` means 11.1 and not 11.0? (edited)
110 means 11.0. The odd thing is, it actually installed 11.1, and from the pytorch website this is exactly how they suggest installing with conda...
Let me know if forcing the CUDA version changes anything
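If it helps, a rough sketch of one way to force it, assuming clearml-agent honors the CUDA_VERSION environment variable override (the queue name and version value are placeholders):
```python
# Sketch: override the CUDA version the agent resolves instead of relying on
# auto-detection. Assumption: clearml-agent reads the CUDA_VERSION variable.
import os
import subprocess

env = dict(os.environ, CUDA_VERSION="11.1")  # placeholder version
subprocess.run(["clearml-agent", "daemon", "--queue", "default"], env=env, check=True)
```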
The problem is that clearml installs `cudatoolkit=11.0` but `cudatoolkit=11.1` is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.
Hmm, could you test with clearml-agent 0.17.2? Making sure this actually solves the problem
ComfortableShark77 it seems clearml-serving is trying to upload data to a different server (not download the model).
I'm assuming this has to do with CLEARML_FILES_HOST and missing credentials. It has nothing to do with downloading the model (that, as you posted, will be from the s3 bucket).
Does that make sense?
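A quick sketch of what I mean, assuming the standard ClearML environment variables are used to point the SDK at the files server (the URL and keys below are placeholders):
```python
# Sketch: make sure uploads target the intended files server with valid credentials.
# Set these before clearml is initialized; all values here are placeholders.
import os

os.environ["CLEARML_FILES_HOST"] = "https://files.your-clearml-server.example"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"
```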
RoughTiger69
1. Move the files locally (i.e. based on the example, move folder `b` into folder `a`)
2. Create a new version with two parents ('a' and 'b')
3. Sync the local root folder ('a' in your case)
Only the meta-data should change (because the referenced files are already in one of the datasets). wdyt?
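Something along these lines, as a rough sketch (dataset name, project, parent IDs and the local path are placeholders):
```python
# Sketch of the flow above: new dataset version with two parents, then sync
# the local root folder; files already referenced by a parent are not re-uploaded.
from clearml import Dataset

new_version = Dataset.create(
    dataset_name="a_plus_b",             # placeholder name
    dataset_project="datasets",          # placeholder project
    parent_datasets=["<dataset_a_id>", "<dataset_b_id>"],
)

# After moving folder 'b' into folder 'a' locally:
new_version.sync_folder(local_path="/path/to/a")  # placeholder path

new_version.upload()
new_version.finalize()
```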
WittyOwl57 what about vm.max_map_count?
```
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
```
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac
AntsySeagull45 kudos on sorting it out!
Quick note: trains-agent will try to run the python version specified by the original Task, i.e. if you were running python3.7 it will first look for python3.7, and if it is not there it will run the default python3. This allows a system with multiple python versions to run exactly the python version you had on your original machine. The fact that it was trying to run python2 is quite odd; one explanation I can think of is if the original e...
GreasyPenguin14 what do you mean by "but I do I get the... "?
Configuring git user/pass will allow you to launch Tasks from private repositories on the services queue (the agent is part of the docker-compose).
That said, this is not a must; worst case you'll get an error when git fails to clone your repo :)
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to `/etc/security/limits.conf`
Yep, seems like an elastic memory issue, but I think the helm chart takes care of it.
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
But in credentials creation it still shows 8008. Are there any other places in docker-compose.yml where the port should be changed from 8008 to 8011?
I think there is a way to "tell" it what to put there, not sure:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#configuration-files
I hope it can run on the same day too.
Fix should be in the next RC
What happened in the server configuration that all of a sudden you have zero ports open?
Hi ReassuredTiger98
Are you referring to the UI? (As far as I understand there was an improvement, but generally speaking it still needs the users to have the S3 credentials in the UI client, not the backend.)
Or are you asking about the cleanup service?