😞 CooperativeFox72 please see if you can send a code snippet to reproduce the issue. I'd be happy to solve it ...
okay, just so I understand, this is what you have on your client that can connect with the server:
api {
    api_server:
    web_server:
    files_server:
    credentials {"access_key": "KEY", "secret_key": "SECRET"}
}
Hi TrickyRaccoon92 , yes the examples folder is a special case, I'm not sure you can directly delete it.
Can you archive individual experiments in it ?
Hi LovelyHamster1
That is a good point, I think the safest / most robust way is to configure both to use the same DNS name(s) so both (internal/external) are accessible.
Some background: the URL on the artifact is basically standalone. Once registered on the Task, the UI will not replace it but use it as is (the UI has no "understanding" of which server it is, it will just fetch the file).
Are you also using a diff port on the load balancer ?
(because the easiest fix is on your external ...
SharpDove45 FYI:
if you set the environment variable CLEARML_NO_DEFAULT_SERVER=1 , it will make sure never to default to the demo server
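Something like this, for example (a minimal sketch; set it before clearml loads its configuration, exporting it in the shell or in the agent's environment works the same way):
```python
# minimal sketch: prevent clearml from ever falling back to the demo server
# (set the variable before clearml reads its configuration)
import os
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

from clearml import Task  # imported only after the variable is set
```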
(also I'm a bit new to this world, what's wrong with OpenShift?)
It's the most difficult Kubernetes flavor to work with 🙂
We've already tried that but it didn't really change ...
Can you provide the full log? As well as how you created the pods?
Could you test if this is working:
https://github.com/allegroai/clearml/blob/master/examples/reporting/matplotlib_manual_reporting.py
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler, and achieve the same behavior, no ?
DeliciousSeal67 the agent will use the "installed packages" section in order to install packages for the code. If you clear the entire section (you can do that in the UI or programmatically) then it will revert to requirements.txt
Make sense ?
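If you want the programmatic route, something along these lines might do it (a rough sketch, assuming a recent clearml SDK that exposes Task.set_packages; the task id is a placeholder):
```python
# rough sketch: clear the "installed packages" section so the agent
# falls back to the repo's requirements.txt
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder id of the draft/cloned task
task.set_packages([])                      # empty list clears the section
```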
OutrageousGrasshopper93 is "--gpus all" working ?
ReassuredTiger98
will it then be used by the clearml-agent
Yes, I think that in order to make it work, you have to make sure that the agent is also running with TRAINS_LOG_ENVIRONMENT=MYVAR*
Notice that you can use a wildcard or have a list of VARIABLES you allow either clearml or the agent to monitor / change.
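For example, a minimal sketch (the project/task names are placeholders; remember the agent side needs the same variable set in its own environment):
```python
# minimal sketch: log / allow overriding environment variables matching MYVAR*
# (set it before clearml is imported, to be safe)
import os
os.environ["TRAINS_LOG_ENVIRONMENT"] = "MYVAR*"   # wildcard, or e.g. "MYVAR1,MYVAR2"

from clearml import Task
task = Task.init(project_name="examples", task_name="env var logging")  # placeholder names
```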
Okay, progress.
What are you getting when running the following from the git repo folder:
git ls-remote --get-url origin
The problem is that clearml installs cudatoolkit=11.0 but cudatoolkit=11.1 is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.
Hmm, could you test with clearml-agent 0.17.2 ? Making sure this actually solves the problem
Hmm, and you are getting an empty list for this one:
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/"
Hi @<1716987924207112192:profile|CostlyOctopus40>
is opensearch supported in ClearML instead of Elasticsearch ? please shed some light on that
Long story short, maybe?! but this is not officially supported.
We only support Elasticsearch; the OpenSearch fork is not officially supported, and since we keep using more advanced Elastic features, the API might not stay compatible in the future.
Out of curiosity, why are you using opensearch?
Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
Then run the daemon with the additional --debug argument, basically:
clearml-agent --debug daemon --foreground ...
Once the agent is running please send the Task's log from your console 🙂
The problem comes from ClearML, which thinks it starts from iteration 420 and then adds the iteration number again (421), so it starts logging from 420+421=841
JitteryCoyote63 Is this the issue ?
JitteryCoyote63 that makes total sense!!
The reporting subprocess is not being updated with the new value! Let me check how we can pass it along...
I might have an idea, could you test with:
```python
from clearml import Task
Task._report_subprocess_enabled = False
...
# real code here
```
ohh AbruptHedgehog21 if this is the case, why don't you store the model with torch.jit.save and use Triton to run the model ?
See example:
https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch
(BTW: if you want a full custom model serve, in this case you would need to add torch to the list of python packages)
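For reference, a minimal export sketch (the toy model, input shape and file name are placeholders, not from this thread):
```python
# minimal sketch: save a TorchScript model that Triton's PyTorch backend can serve
import torch
import torch.nn as nn

# placeholder model - substitute your trained network here
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2)).eval()

example_input = torch.randn(1, 16)
scripted = torch.jit.trace(model, example_input)   # or torch.jit.script(model)
torch.jit.save(scripted, "model.pt")
```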
we will try to use Triton, but it’s a bit hard with a transformer model.
Yes ...
All extra packages we add in serving)
So it should work. You can also run your preprocess class manually from your own machine (for debugging): if you pass it a local file (basically the model file downloaded from the UI), it should work.
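Roughly like this (a rough sketch only; it assumes a preprocess.py laid out like the clearml-serving examples, and method names/signatures may differ between versions):
```python
# rough local-debug sketch for a clearml-serving Preprocess class
# (the file path and request body below are made-up placeholders)
from preprocess import Preprocess

p = Preprocess()
p.load("/path/to/model_downloaded_from_ui.pt")   # local copy of the model file
request_body = {"input": [[0.1, 0.2, 0.3]]}      # hypothetical request payload
data = p.preprocess(request_body, state={}, collect_custom_statistics_fn=None)
print(data)
```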
it. But it’s maybe not the best solution
Yes... it is not; separating the pre/post to a CPU instance and letting Triton do the GPU serving is a lot more effici...
WackyRabbit7 I guess we are discussing this one on a diff thread 🙂 but yes, should totally work, that's the idea
(without having to execute it first on Machine C)
Someone somewhere has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code:
task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like running a single step), then continue on another machine.
Notice that switching between cpu...
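To make the flow concrete, a minimal sketch (project, task and queue names are placeholders):
```python
# minimal sketch: run locally until execute_remotely(), then continue on an agent
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# everything up to this call runs locally - handy as a quick sanity check
task.execute_remotely(queue_name='default')

# from this point on the code runs only on the machine the agent assigns
# train_model()  # placeholder for the actual training code
```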
How can I get the loaded model in the Preprocess class in ClearML Serving?
ComfortableShark77
You mean your preprocess class needs a python package or is it your own module ?
So currently there is a limit (from Elasticsearch) of about 10k (anything above that is subsampled)
In the new version we are adding a "maximize" button, then in the full screen view you will have the raw data including all ???k samples. Sounds good?
https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console
Hmm, try to set this one before spinning up the agent
Windows:
set PYTHONIOENCODING=:replace
Inside Colab:
os.environ["PYTHONIOENCODING"] = ":replace"