RipeGoose2 you can put it before/after the Task.init, the idea is for you to set it before any of the real training starts.
As for it not affecting anything:
Try adding the callback and just have it return None (which means skipping the model log process). Let me know if this one works.
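For reference, a minimal sketch of what I mean, assuming the WeightsFileHandler pre-callback hook (exact import path/signature may differ by clearml version):
```
# a minimal sketch, assuming WeightsFileHandler.add_pre_callback exists
# in your clearml version; returning None skips logging that model
from clearml import Task
from clearml.binding.frameworks import WeightsFileHandler

def skip_model_logging(operation_type, model_info):
    # returning None tells clearml to skip the model log process
    return None

WeightsFileHandler.add_pre_callback(skip_model_logging)
task = Task.init(project_name='example', task_name='no model logging')
```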
Hi NastyFox63
What do you mean not all of them are shown?
Do they have different series/titles? Are they plots or scalars? How are you reporting them?
I see now.
Let's assume you know which snapshot that was:
```
prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second-from-last checkpoint
checkpoint_url = prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# loop over prev_scalars and report them with logger.report_scalar
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training
```
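The "for loop" part could look something like this (a sketch; it assumes get_reported_scalars() returns a {title: {series: {'x': [...], 'y': [...]}}} mapping, so double-check against your clearml version):
```
# sketch of the replay loop; assumes prev_scalars is
# {title: {series: {"x": [...], "y": [...]}}}
for title, series_dict in prev_scalars.items():
    for series, values in series_dict.items():
        for x, y in zip(values["x"], values["y"]):
            logger.report_scalar(title=title, series=series, value=y, iteration=int(x))
```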
Basically just change the helm yaml: `queue: my_second_queue_name_here`
Hi @<1561885921379356672:profile|GorgeousPuppy74>
Please use threads to ask questions, so we keep everything tidy
(and if you can, please merge your first message into this one and remove the original, for better readability)
Regarding the issue, you need to have clearml.conf in your Home folder; I'm assuming this is /root/,
not /home/ubuntu/.
Also not sure why you need to expose ports...
you can also just create a venv and run the tests there (with the latest python package)?
dataset catalogue as advertised.
Creating the Dataset on ClearML is the catalog: you can move datasets around, put them in sub-folders, add tags, add meta-data, search, etc. I think this qualifies as a dataset catalog, no?
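For example, a quick sketch of those catalog operations (project/dataset names here are made up):
```
from clearml import Dataset

# create a dataset under a sub-folder-like project path (names are examples)
ds = Dataset.create(dataset_project="catalog/raw", dataset_name="my-dataset")
ds.add_files("./data")
ds.add_tags(["raw", "v1"])
ds.upload()
ds.finalize()

# search the "catalog" later by project/tags
found = Dataset.list_datasets(dataset_project="catalog/raw", tags=["raw"])
```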
EnviousStarfish54 are those scalars reported?
If they are, you can just do:
```
task_reporting = Task.init(project_name='project', task_name='report')
tasks = Task.get_tasks(project_name='project', task_name='partial_task_name_here')
for t in tasks:
    metrics = t.get_last_scalar_metrics()
    # task_reporting.get_logger().report_... the collected metrics as you see fit
```
So if everything works you should see the "my_package" package in the "installed packages".
The assumption is that if you do `pip install "my_package"`,
it lists "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense?
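For example, a minimal setup.py sketch for such a "my_package" (hypothetical package name):
```
# hypothetical setup.py for "my_package"; declaring pandas here is what lets
# pip pull it automatically when the agent runs `pip install my_package`
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["pandas"],
)
```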
Hi WickedGoat98 ,
I think you are correct 🙂
I would guess it is something with the ingress configuration (i.e. ConfigMap)
SmarmyDolphin68
Debug Samples tab and not the Plots,
Are you doing plt.imshow?
Also make sure you have report_image=False when calling report_matplotlib_figure
(if it is True it will upload it as an image to "Debug Samples").
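Something like this (a minimal sketch):
```
# minimal sketch: report a matplotlib figure to the Plots tab
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="example", task_name="matplotlib plot")
fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])
task.get_logger().report_matplotlib_figure(
    title="My Plot",
    series="series A",
    figure=fig,
    iteration=0,
    report_image=False,  # True would upload it as an image to "Debug Samples"
)
```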
I think you are correct 🙂 Let me make sure we add that (docstring and documentation).
Okay, found the issue. To disable SSL verification globally, add the following env variable: `CLEARML_API_HOST_VERIFY_CERT=0` (I will make sure we fix the actual issue with the config file).
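If you prefer setting it from Python, a sketch (make sure it runs before clearml initializes its API session):
```
import os

# must be set before clearml creates its session
os.environ["CLEARML_API_HOST_VERIFY_CERT"] = "0"

from clearml import Task
task = Task.init(project_name="example", task_name="no-ssl-verify")
```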
Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors
This means your avg is already a scalar (i.e. not a vector) which means you can (as you said) have the alert based on that
Should be under Profile -> Workspace (Configuration Vault)
GiganticTurtle0 I know the UI optimizes the display so it does not push all the parameters at once, but loads them based on the scroll. Are you saying there is a bug in that logic? If so, how do I reproduce it?
But from your other answer, I think I'm understanding that you *can* have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said, I do not understand how a single Task does not utilize the CPU; I was under the impression it is run...
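e.g., a sketch that spawns four agents on the same queue from Python (queue name is an example; this just wraps the clearml-agent CLI):
```
# sketch: launch 4 clearml-agent daemons on one machine, all listening
# to the same queue ("my_queue" is an example name)
import subprocess

for _ in range(4):
    subprocess.Popen(["clearml-agent", "daemon", "--queue", "my_queue", "--detached"])
```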
Hi GrittyHawk31
but it could not connect to the grafana dashboard through port 3000, is there any particular reason for that? I may have missed something.
Did you run the full docker-compose.yml?
Are you able to curl the endpoints?
Task.enqueue will execute immediately; I need to execute the task at a specific time
Oh I see what you mean, trigger -> scheduled (cron alike) -> Task executed.
Is that correct?
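If that's the case, a sketch with clearml's TaskScheduler (task id / queue names are placeholders):
```
# sketch: cron-like scheduling of an existing task (ids/queues are placeholders)
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# clone & enqueue 'template_task_id' every day at 07:30
scheduler.add_task(schedule_task_id="template_task_id", queue="default", hour=7, minute=30)
scheduler.start_remotely(queue="services")
```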
Hi VexedCat68
Are we talking YouTube videos? Docs? Courses?
Yes, I can communicate with the server; I managed to put tasks in the queue and retrieve them, as well as run tasks with metrics reporting.
Through the UI or Python code?
UnevenDolphin73 you mean the clearml-server helm chart?
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if running with docker compose you can pass an argument to the clearml triton container and use shared memory. You can do the same with the helm chart.
(ignoring still having to fix the problem with LazyEvalWrapper return values).
The fix will be pushed after the weekend 🙂
such as displaying the step execution DAG in the PLOTS tab.
Wait, what are you getting on the DAG plot? I think we "should" be able to see all the steps.
Oh, if this is the case you can probably do:
```
import os
import subprocess
from time import sleep  # needed for the polling sleep below

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...
```
Okay, we have something 🙂
To your clearml.conf add:
```
agent.docker_preprocess_bash_script = [
    "su root",
    "cp -f /root/*.conf ~/",
]
```
Let's see if that works.