Reputation
Badges 1
25 × Eureka!The latest image seems to require drivers on the host 460+
try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
Hi JitteryCoyote63 , I cannot reproduce it... when I call set initial iteration 0, it does what I'm expecting, and resend the scalar. I tested with the clearml ignite example, any thoughts on how I can reproduce?
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning toΒ
/etc/security/limits.conf
Yep seems like elastic memory issue, but I think the helm chart takes care of it,
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
Hi CleanPigeon16
Yes there is, when you are cloning the pipeline in the UI, go to the Configuration/Pipeline/continue_pipeline and change it to True
Hi @<1546303293918023680:profile|MiniatureRobin9> could it be the pipeline logic is created via the clrarml-task CLI? If this is the case, I think this is an edge case we should fix. Basically it creates a Task instead of pipeline, which in.essence only effects the UI. To solve it, just run the pipeline locally, notice that by default when you start it, it will actually stop the local run and relaunch itself on an agent.
Also, could you open a GitHub issue so we add a flag for it?
python k8s_glue_example.py --helpTo get all the commands for configurations
You should probably pass a few :)
We actually added a specific call to stop the local execution and continue remotely , see it here: https://github.com/allegroai/trains/blob/master/trains/task.py#L2409
Hi @<1664079296102141952:profile|DangerousStarfish38>
You mean spin the agent on multiple Windows machines? Yes that is supported, I think that it is limited to venv (i.e. not docker) mode, but other than that should work out of the box
I can add files to the data set, even after I finish the experiment?
Correct
https://clear.ml/docs/latest/docs/clearml_data#creating-a-dataset
https://clear.ml/docs/latest/docs/guides/data%20management/data_man_cifar_classification
https://github.com/allegroai/clearml/blob/master/docs/datasets.md#create-dataset-from-code
(If you are running the trains-agent with the exact same command, I (think) you will get the same worker_id in which you will end up with something similar to what you describe)
To solve it add TRAINS_WORKER_NAME="new_unique_name" trains-agent ...
I think we resolve it automatically, but based on your description it looks like we use the same worker name/id multiple times ...
Hi EnviousStarfish54
Color coding on the entire UI is stored per user (I think that on your local cookies, but I might be wrong). Anyhow any title/series combination will have the select color regardless of the project.
This way you can configure once that loss is red and accuracy is green, etc.
EnviousStarfish54 Yes i'm not sure what happens there we will have to dive deeper, but now that you got us a code snippet to reproduce the issue it should not be very complicated to fix (I hope π€ )
ShallowCat10 so you mean like meta-data on top of the image? or another level of title series ?
because, the iteration field itself is an integer...
Sorry found the code on the Task, duh π
` # get_ipython().magic('pip install clearml')
import clearml
from clearml import Task
task = Task.init(project_name='examples', task_name='test param', reuse_last_task_id=False)
param = {
'tuple_double_quotes_r': (r"value\blah", 1),
'tuple_double_quotes': ("value\blah", 1),
'tuple_single_quotes': ('value\blah', 1),
"double_quotes_r": r"value\blah",
'double_quotes': "value\blah",
'single_quotes': 'value\blah'
...
BoredGoat1
Hmm, that means it should have worked with Trains as well.
Could you run the attached script, see if it works?
DeliciousBluewhale87 Yes I think so, do notice that you might end up with maximum of 12 pods.
You can also do the following with max 10 nodes: (notice --queue can always get a list of nodes it will pull based on the order of the queues)python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10
My bad I wrote refresh and then edited it to the correct "reload" π
BattyLion34 the closest I can think of the is monitoring class that can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example wil...
Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0 this is why it is re-downloading (I'll make sure the agent can sort it out, becuase this is Nvidia's version in reality it should be a perfect match)
Hi JitteryCoyote63 a few implementation details on the services-mode, because I'm not certain I understand the issue.
The docker-agent (running in services mode) will pick a Task from the services queue, then it will setup the docker for it spin it and make sure the Task starts running inside the docker (once it is running inside the docker you will see the service Task registered as additional node in the system, until the Task ends) once that happens the trains-agent will try to fetch the...
yes ...
What's your use case for passing an empty dict ? (meaning how would one use it later)
Are there any services OOB like this?
On the open-source, I can't recall any but will probably be easy to write. Paid tier might have an offering though, not sure π
PYTHONPATH is still not working as expected
inside your code if you do :import os print("PYTHONPATH", os.environ["PYTHONPATH"])what are you getting?
I reached over 1M API calls in about one week using clearml-serving
Oh that makes sense now π
If I remember correctly, adding an additional model to a signal clearml-serving instance should not actually change the number of API calls, they are mostly affected by the number of clearml-serving / containers and not in the number of models.
remote repository
's lock file.
Which file is that? the poetry lock of the internal VCS lock (the agent itself)
How do you run theΒ
clearml-agent
Β in docker mode
clearml-agent --docker
See here:
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
Maybe the configuration file changed?
None
The logic is if the name and project are the same, and there are no artifacts/models, and the last time it was created was under 72 hours, reuse the Task
Check the links that are generated in the ui when you upload an artifact or model