GiganticTurtle0
If there are several tasks running concurrently, which task should
Task.current_task()
return? (
How could you have that ?
Per process, there is one Main current Task (until you close it).
Are you referring to a pipeline with multiple steps ?
If this is the case, task.current_task will return the Task of the component (if executed form the component) and the pipeline (if called from the pipeline logic function).
Notice we added the ability to s...
oh, then this is user/pass (pass is the same as app key / secret)
None
Hmm that is odd, can you send an email to support@clear.ml ?
Hi @<1544853721739956224:profile|QuizzicalFox36>
http:/34.67.35.46:8081/...
notice there is a / missing in the link, how is that possible? it should be http://
Hi ColossalDeer61 ,
My question is about existing monitors in the trains-server (preferably the web UI)
So the idea is you run the code once, it creates a Task in the system and verifies the Slack credentials are working. then you can enqueue it in the "services", and voila, you have a monitoring service running, that you can control from the UI and creates alerts to Slack. unfortunately there is no built-in way to achieve that in the UI. but it should not take more than a few minute...
The function
a delete request with a
raise_on_errors=False
flag.
Are you saying we should expose raise_on_errors it to _delete_artifacts() function itself?
If so, sure seems logic to me, any chance you want to PR it? (please just make sure the default value is still False so we keep backwards compatibility)
wdyt?
But you can get that directly, Task.get_task(...).artifacts[name].url , no? Am I missing something?
Hi IntriguedRat44
Sorry, I missed this message...
I'm assuming you are running in manual mode (i.e. not through the agent), in that case we do not change the CUDA_VISIBLE_DEVICES.
What do you see in the resource monitoring? Is it a single GPU or multiple GPUs?
(Check the :monitor:gpu in the Scalar tab under results,)
Also what's the Trains/ClearML version you are suing and the OS ?
Failing when passing the diff to the git command...
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on, in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope if you will).
Try to import from src.testagentcomponent import step_one also in the global pipeline script (not just inside the function)
(as i see the services worker is only in the services-queue, and not my default queue (where my other servers/workers are)
So basically the service-mode is just a flag passed to the agent, and the services queue is the name of the queue it will pull from.
If i want a normal worker also
You can just add another section to the docker-compose, or run it manually after you spin the docker-compose.
LazyFox65 wdyt ?
BitingKangaroo95 can you post here the entire console output of clearml-session (including full command line) ?
What's the python, torch, clearml version?
Any chance this can be reproducible ?
What's the full error trace/stack you are getting?
Can you try to debug it to where exactly it fails here?
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/import_bind.py#L48
RoughTiger69 wdyt?
clearml python version: 1.91
could you upgrade to 1.9.3 and try?
Minio is on the same server and the 9000 and 9001 ports are open for tcp
just to be clear, the machine that runs your clearml code can in fact access the minio on port 9000 ?
I tested with the latest and everything seems to work as expected.
BTW: regrading "bucket-name" , make sure it complies with the S3 standard, as a test try to change it to just "bucket" bi hyphens
. Can I get gpu usage over time frame via API also?
task.get_reported_scalarsBut this will get you All the scalars, I think the next version of the server supports asking a specific one as well.
How are you implementing the alert monitoring?
Is is a stateless process starting every X min, or is it a state-full process running and monitoring ?
EnviousStarfish54 thanks again for the reproducible code, it seems this is a Web UI bug, I'll keep you updated.
SmallDeer34 No worries, I'm happy to hear the issue disappeared 🙂
Hi TrickySheep9
So basically the idea is you can quickly code a scheduler with your own logic, then launch is on the "services queue" to run basically forever 🙂
This could be a good example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py
Hi @<1668065560107159552:profile|VivaciousPenguin20>
I think you are looking at the wrong experiment, this is a 3 year old experiment ? this does not seem to be your currently executed experiment, right?
Hi @<1547028031053238272:profile|MassiveGoldfish6>
What is the use case? the gist is you want each component to be running on a different machine. and you want to have clearml do the routing of data and logic between.
How would that work in your use case?
Hi DrabOwl94
I think that if I understand you correctly you have a Lot of chunks (which translates to a lot of links to small 1MB files, because this is how you setup the chunk size). Now apparently you have reached the maximum number of chunks per specific Dataset version (at the end this meta-data is stored in a document with limited size, specifically 16MB).
How many chunks do you have there?
(In other words what's the size of the entire dataset in MBs)
that clearml-agent needs to be installed from system python mentioned anywhere in the docs, if not I suggest it gets added.
You are right, I will check and fix if not 🙂
Thank you so much for helping.
My pleasure
GiganticTurtle0 what's the Dataset Task status?
Can you do the following
Clone the Task you previously sent me the installed packages of, then enqueue the cloned task to the queue the agent with the conda.
Then send me the full log of the task that the agent run
GreasyPenguin14 I think this is what you are looking forTask.get_project_id('project_name')
No worries, just wanted to make sure it doesn't slip away 😉
I can install clearml and clearml-agemt and run the worker inside a docker
oh I see, you should install it inside a docker, then mount the docker socket so it can spin sibling containers , ans lastly make sure the mounts are correct with this env variable:
None
Hi GreasyRaven35
You should set the output_uri, in Task init, it will auto upload the model, and register the remote location URLtask = Task.init(..., output_uri=True)You can also specify a target bucket, if you configured credentials (e.g. output_uri=" s3://bucket ")
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...