But I do not have anything linked correctly, since I rely on conda to install cuda/cudnn for me
From the log it installed: cudatoolkit==11.1.1
based on the CUDA it found on the host machine: agent.cuda_version = 110
But for some reason it installed PyTorch from the conda "pytorch" channel without CUDA support.
Hi Team, I'm currently trying to install ClearML-Server on a PowerPC server with RedHat 7.
You are a brave man LividCrab90 !
Are there Dockerfiles for the ClearML-Server stack somewhere ?
The main issue is replacing the DB containers; do you have elastic/mongo/redis builds for PowerPC ?
By default the agent will add the root of the git repository into the PYTHONPATH, so that you can import...
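For illustration, a minimal sketch of what this enables (the repo path and module names are hypothetical):
```python
# what the agent effectively does before running your script:
import sys
sys.path.insert(0, "/path/to/git/repo/root")  # hypothetical checkout location

# so any script in the repo can import top-level packages directly, e.g.:
# from utils import helpers  # assuming <repo-root>/utils/helpers.py exists
```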
SteepDeer88
Try the following:
```python
Task.add_requirements("pycocotools-windows", '; platform_system == "Windows"')
Task.add_requirements("pycocotools", '; platform_system != "Windows"')
Task.init(...)
```
You should see in your "installed packages" something like:
```
pycocotools-windows ; platform_system == "Windows"
pycocotools ; platform_system != "Windows"
```
ScantMoth28 where are you seeing this warning ?
I'm wondering what happens if I were to host the instance and one of these were to go down from time to time in production, as the deployments provided by the helm chart are not redundant.
Long story short, it will break the clearml-server. Please do not take them down; if you do need to do that, take down the clearml-server as well. The Python clients will wait until it is up again, so no session would be destroyed.
Hi JealousParrot68
This is the same as:
https://clearml.slack.com/archives/CTK20V944/p1627819701055200
and,
https://github.com/allegroai/clearml/issues/411
There is something odd happening in the files-server: it replaces the header (i.e. guessing the content of the stream) and this breaks the download (what happens is the clients automatically ungzip the csv).
We are working on a hotfix for the issue (BTW: if you are using object-storage / shared folders, this will not happen)
Nicely found @<1595587997728772096:profile|MuddyRobin9> !
from task pick-up to "git clone" is now ~30s, much better.
This is "spent" calling apt update && update install && pip install clearml-agent
if you have those preinstalled it should be quick
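A sketch of what to preinstall in the base docker image (package names assume a Debian/Ubuntu base):
```bash
# bake these into the image so the agent skips them on every task pick-up
apt-get update && apt-get install -y git python3-pip
python3 -m pip install clearml-agent
```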
though as far as I understand, the recommendation is still to not run workers-in-docker like this:
If you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes, sure, that's the way to go.
BTW: you can also just add -e "
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"
to the docker args (under the Execution tab) to override the setting of the docker.
You can also add "export" to the docker startup bash script section (do not add "#!/bin/bash", just the actual script) to get a list of all the environment variables inside the docker, just in case.
Hi OddShrimp85
right place to ask about clearml serving.
It is 🙂
I did not manage to get clearml serving to work with my own clearml server and triton setup.
Yes it should have been updated already, apologies.
Until we manage to sync the docs, what seems to be your issue, maybe we can help here?
Any plans to add an unpublished state for clearml-serving?
Hmm OddShrimp85 do you mean like a flag, not being served ?
Should we use archive ?
The publish state basically locks the Task/Model so they cannot be changed. Should we enable unlocking (i.e. un-publish)? wdyt?
By the way, will downloading still happen if the dataset is available in the cache folder?
If it is cached, then there is no need to re-download 🙂
Hi @<1523701304709353472:profile|OddShrimp85>
Do you mean Dataset.get_local_copy() ?
When you set up the pod, make sure you mount the clearml local cache folder to the PV
basically /root/.clearml/cache/
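For reference, a minimal sketch of fetching a local copy (the dataset name/project are hypothetical); if it is already cached, nothing is re-downloaded:
```python
from clearml import Dataset

# hypothetical dataset identifiers
dataset = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")
local_path = dataset.get_local_copy()  # resolved from the local cache when available
print(local_path)
```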
Scheduled training is what I'm looking forward to
I'll try to remember to update here once we push it to the GitHub repo, feedback is always appreciated 🙂
If in the next two weeks you hear nothing, please ping here to make sure I did not forget 🙂
Is the clearml server a worker I can serve models on?
The serving is done by one of the clearml-agents.
Basically you spin an agent, then this agent is spinning the model serving engine container (fully managed).
(1) install and run clearml-agent, (2) run the clearml-serving CLI to configure and spin up the serving engine
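A sketch of step (1), assuming a dedicated queue named "serving":
```bash
# spin an agent in docker mode, listening on the (hypothetical) "serving" queue
clearml-agent daemon --queue serving --docker
```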
Could you disable the Windows anti-virus firewall and test?
Hi ComfortableHorse5
Yes, this is more of a suggestion that you should write them using the platform capabilities. The UI implementation is being worked on, as well as a few helper classes; I think you'll be able to see a few in the next release 🙂
Seems like an okay clearml.conf file
Notice this is the error: 404
Can you curl to this address? Are you sure you have httpS and not http? Was the DNS configured?
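A quick connectivity check (the address is a placeholder; 8008 is the default ClearML API port):
```bash
# hypothetical server address; adjust scheme/host/port to your deployment
curl -k https://my-clearml-server:8008/debug.ping
```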
No worries 🙂 glad to hear it worked out 🙂
Assuming this is a followup on:
https://clearml.slack.com/archives/CTK20V944/p1626184974199700?thread_ts=1625407069.458400&cid=CTK20V944
This depends on how you set it with clearml-serving:
```
clearml-serving --endpoint my_model_entry
curl <serving-engine-ip>:8000/v2/models/my_model_entry/versions/1
```
Hi BattyLizard6
Not that I'm aware of, which TF version are you using, and which clearml version?
Nice debugging experience
Kudos on the work !
BTW, I feel weird adding an issue on their GitHub, but someone should; this generic setup will break all sorts of things ...
EnviousStarfish54 regarding the file server: you have one built into the trains-server, and this will be the default location to store all artifacts. You can also use external solutions like S3, GS, Azure etc.
Regarding the models, any model store / load is automatically logged as long as you are using one of the supported frameworks (TF Keras PyTorch scikit learn)
If you want your model to be automatically uploaded, just add output_uri:
```
task = Task.init('examples', 'model', output_uri='http://trai...
```
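The same parameter works for the external object storage mentioned above; a sketch with a hypothetical bucket:
```python
from clearml import Task

# hypothetical bucket; gs:// and azure:// URIs work the same way
task = Task.init(project_name='examples', task_name='model',
                 output_uri='s3://my-bucket/models')
```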
Hi DisturbedWalrus17
This is a bit of a hack, but will work:
```python
from clearml.backend_interface.metrics.events import UploadEvent

# keep up to 10 uploaded debug samples of history per reported series
UploadEvent._file_history_size = 10
```
Maybe we should expose it somewhere, what do you think?