CheekyFox58 what do you have in the Plots tab?
Hi CrookedAlligator14
or is the underlying data also accessible?
What do you mean by "underlying data"?
Hi @<1618056041293942784:profile|GaudySnake67>
Task.create is designed to create an external Task, not one from the current running process. Task.init is for creating a Task from your current code, and this is why it has all the auto_connect parameters. Does that make sense?
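For illustration, a minimal sketch of the difference (project, task names, and the repo URL are placeholders, not from this thread):
import logging
from clearml import Task

# Task.init: creates (and auto-logs) a Task from the code that is currently running
task = Task.init(project_name="examples", task_name="current run")

# Task.create: registers an external Task that is not this running process,
# e.g. pointing at a repo/script for an agent to execute later
external = Task.create(
    project_name="examples",
    task_name="external task",
    repo="https://github.com/user/repo.git",
    script="train.py",
)
logging.info("registered external task %s", external.id)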
I have a question regarding running code on a remote machine: each time I run the code, I see the console in the ClearML server start downloading all the libraries I used in the code, and when I run another piece of code the same thing happens. Why does it have to download all the libraries again, many times?
I'm assuming you are referring to the installation; the downloaded Python packages are cached.
You can turn on full caching by uncommenting the following line:
https://github.com/alleg...
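For reference, the relevant section in clearml.conf looks roughly like this (exact defaults may differ; in the shipped file the path line is the one to uncomment):
agent {
    venvs_cache: {
        # maximum number of cached virtual environments
        max_entries: 10
        # minimum free space (GB) required to keep caching
        free_space_threshold_gb: 2.0
        # uncommenting this line enables full venv caching
        path: ~/.clearml/venvs-cache
    }
}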
it's not implemented right,
I think we forgot to add it as an argument (the models query supports it, but it is not passed to the call)
What do you have under the "Installed Packages" section? Also, you can configure the agent to use poetry to restore the environment (instead of pip).
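Roughly, the poetry switch in clearml.conf looks like this (a sketch; check your installed default conf for the exact section):
agent {
    package_manager {
        # "pip" (default), "conda", or "poetry"
        type: poetry
    }
}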
Hi @<1570220858075516928:profile|SlipperySheep79>
I think this is more complicated than one would expect, but as a rule of thumb, console logs and metrics are the main ones. Hope that helps. Maybe sort by number of iterations in the experiment table?
BTW: probably better to ask in the channel
Hi DilapidatedDucks58
trains-agent tries to resolve the torch package based on the specific CUDA version inside the docker (or on the host machine if used in virtual-env mode). It seems to fail finding the specific version "torch==1.6.0.dev20200421+cu101"
I assume this version was automatically detected by trains when running manually. If this version came from a private artifactory, you can add it to the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L...
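Roughly, the trains.conf addition would look like this (the index URL is a placeholder for your artifactory):
agent {
    package_manager {
        # extra pip package indexes to search (e.g. a private artifactory)
        extra_index_url: ["https://artifactory.example.com/api/pypi/pypi/simple"]
    }
}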
MysteriousBee56 that is so weird ... last one, I promise 🙂
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"
Yep 🙂
Also maybe worth changing the entry point of the agent docker to always create a queue if it is missing?
GreasyPenguin14 yes there is 🙂
https://github.com/allegroai/clearml/issues/209
Set the environment variable CLEARML_NO_DEFAULT_SERVER=1
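For example, setting it from Python before clearml is imported (a shell export works just as well):
import os

# make sure this is set before clearml is imported / Task.init is called
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

from clearml import Task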
Could you verify the Task.init call is inside the main function and Not the global scope? We have noticed some issues with global scope calls in some cases
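A minimal sketch of the recommended structure (project/task names are placeholders):
from clearml import Task

def main():
    # Task.init inside main(), not at the module's global scope
    task = Task.init(project_name="examples", task_name="demo")
    # ... the rest of the training code ...

if __name__ == "__main__":
    main()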
try these values:
import os
from clearml import Task

os.environ.update({
    'CLEARML_VCS_COMMIT_ID': '<commit_id>',
    'CLEARML_VCS_BRANCH': 'origin/master',
    'CLEARML_VCS_DIFF': '',
    'CLEARML_VCS_STATUS': '',
    'CLEARML_VCS_ROOT': '.',
    'CLEARML_VCS_REPO_URL': '<repo_url>',
})
task = Task.init(...)
Maybe it's the Azure upload that has a weird size bug?!
That is correct.
Obviously once it is in the system, you can just clone/edit/enqueue it.
Running it once is a means to populate the trains-server.
Make sense?
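As a rough sketch of that flow with the clearml API (the task ID, parameter name, and queue name are placeholders):
from clearml import Task

# clone the populated task, tweak it, and enqueue it for an agent to run
template = Task.get_task(task_id="<task_id>")
cloned = Task.clone(source_task=template, name="cloned experiment")
cloned.set_parameter("General/learning_rate", 0.001)
Task.enqueue(cloned, queue_name="default")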
GrittyHawk31 by default any user can log in (i.e. no need for a password); if you want user/password access:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#web-login-authentication
Notice there is no need to have anything else in the apiserver.conf, just the user/pass section; everything else will just be the default values.
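For reference, that section looks roughly like this (username/password are placeholders):
auth {
    # fixed users mode: web login with username/password
    fixed_users {
        enabled: true
        users: [
            {
                username: "jane"
                password: "12345678"
                name: "Jane Doe"
            }
        ]
    }
}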
So the agent installed okay. It's the specific Task that the agent is failing to create the environment for, correct?
If this is the case, what do you have in the "Installed Packages" section of the Task (see under the Execution tab)?
What does spin mean in this context?
This line:
docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up
But these have different task IDs but the same endpoints (from looking through the tabs),
so I am not sure why they are here and not somewhere else
You can safely ignore them for the time being π
but is it true that I can have multiple models on the same docker instance with different endpoints?
Yes! This is exactly the idea (and again I'm not sure ...
still, it is a ChatGPT interface, correct?
Actually, no. And we will change the wording on the website so it is more intuitive to understand.
The idea is you actually train your own model (not chatgpt/openai) and use that model internally, which means everything is done inside your organisation, from data through training and ending with deployment. Does that make sense ?
Hi WickedGoat98
Regardless of the ingress configuration (which it seems you have the hang of), the API instance itself needs to be configured with a persistent volume (the web / file servers do not need direct access to the API server).
Can you get the API to run properly?
Regarding the trains-agent:
once you have the API/Web/File server configured, you can configure it like the trains-agent-services is configured inside the docker-compose (e.g. set the environment variable with the c...
Also, can the image not be pulled from dockerhub but used from the local build instead?
If you have your docker configured to pull from a local artifactory, then the agent will do the same 🙂 (it is calling the docker command just like you do)
agent.default_docker.arguments: "--mount type=bind,source=$DATA_DIR,target=/data"
Notice that you are using the default docker arguments in the example.
If you want the mount to always be there, use extra_docker_arguments:
https://github.com/...
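Roughly, in clearml.conf (the host path is a placeholder):
agent {
    # arguments added to every docker run the agent launches
    extra_docker_arguments: ["--mount", "type=bind,source=/path/on/host,target=/data"]
}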
Hmm, can you try an additional configuration? Next to "secure: true" in your clearml.conf, can you add "verify: false"?
can we also put the path to the CA?
Yes :)
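A sketch of the relevant clearml.conf section, assuming a self-hosted S3-compatible endpoint (host and credentials are placeholders):
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "minio.example.com:9000"
                    key: "<access_key>"
                    secret: "<secret_key>"
                    secure: true
                    # a path to a CA bundle, or false to skip certificate verification
                    verify: "/path/to/ca.pem"
                }
            ]
        }
    }
}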
Hi @<1727497172041076736:profile|TightSheep99>
I think you are correct! It will use the internal per-file upload retry but does not let you control it.
Could you please open a GitHub issue so that we do not forget to add it?
data["encoded_lengths"]
This makes no sense to me; data is a numpy array, not a pandas DataFrame...
EnviousStarfish54
plt.show will capture the figure; if you call it multiple times, it will add a running number to the figure itself (because the figure might change, and you might want the history).
If you call plt.imshow, it's the equivalent of a debug image, hence it will be shown in the debug-samples tab as an image.
Make sense?
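A minimal sketch of the behavior described above (project/task names are placeholders):
import numpy as np
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib capture")

# a regular figure shown with plt.show() is captured as a plot;
# calling plt.show() again adds a running number to the figure
plt.plot([0, 1, 2], [10, 20, 15])
plt.show()

# plt.imshow() is treated as a debug image and appears in the debug-samples tab
plt.imshow(np.random.rand(64, 64))
plt.show()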
I'm looking into the savefig issue; meanwhile you can disable the popup by adding the following at the top of your code:
import matplotlib
matplotlib.rcParams['backend'] = 'agg'
import matplotlib.pyplot
matplotlib.pyplot.switch_backend('agg')