You can always get the running task (the pipeline in your case) with Task.current_task().task_id
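A minimal sketch (assuming the clearml package; on older versions import from trains instead):
```
from clearml import Task

# the task currently running in this process (the pipeline controller, in your case)
print(Task.current_task().task_id)
```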
BTW, what about running trains-agent in docker mode? That can solve all your CUDA issues.
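For example, as a sketch (the queue name is a placeholder):
```
trains-agent daemon --queue default --docker
```
In docker mode the task runs inside a container, so the CUDA version comes from the image instead of the host environment.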
Hi SmugSnake85 , Did you upload the data with clearml-data?
Hi RattySeagull0,
Can you try quoting the GPU numbers? Like --gpus "0,1"?
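So the full daemon command would look something like this (the queue name is a placeholder):
```
clearml-agent daemon --queue default --gpus "0,1"
```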
Looks like the same issue as https://github.com/allegroai/trains-agent/issues/35
Hi GiganticTurtle0,
My favorite is ps -ef | grep clearml-agent, and then kill -9 <agent pid>
In your configuration, you have agent.package_manager.type = conda. Can you try running it with conda?
After changing your ~/clearml.conf file, you need to restart the agent.
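A sketch of a restart (the queue name is a placeholder):
```
clearml-agent daemon --stop
clearml-agent daemon --queue default --detached
```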
Hi UnevenDolphin73,
If the EC2 instance is up and running but no clearml-agent is running, something in the user data script failed.
Can you share the logs from the instance (you can send in DM if you like)?
The report_scalar feature creates a plot of a single data point (or single iteration).
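A minimal sketch of how it's called (assuming a task was already initialized with Task.init; title, series and value are placeholders):
```
from clearml import Logger

# each call reports one scalar point at the given iteration
Logger.current_logger().report_scalar(
    title="accuracy", series="validation", value=0.93, iteration=0
)
```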
UnevenDolphin73, that's how I would use it. With it you can compare results between tasks. You can also add it to the project view and filter with it too:
You can configure environment variables in your docker-compose file, but what is your scenario? Maybe there are other solutions.
SmugTurtle78, by the end of this week.
Hi LazyFish41,
You can use agent.docker_init_bash_script to execute any command at the startup of the docker container, so you can use it to install the Python version you want.
You can set the Python version used when creating the virtual environment and launching the experiment with agent.python_binary.
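A sketch of how both options could look in clearml.conf (the apt package and binary name are assumptions, adjust to the version you need):
```
agent {
    # commands executed at the startup of every docker container
    docker_init_bash_script: [
        "apt-get update",
        "apt-get install -y python3.8",
    ]
    # python binary used when creating the virtual environment
    python_binary: "python3.8"
}
```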
ArrogantBlackbird16, is file.py the file that contains the Task.init call?
Not sure I’m getting the flow. If you just want to create a template task in the system, then clone and enqueue it, you can use task.execute_remotely(queue_name="my_queue", clone=True). Can this solve the issue?
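A minimal sketch (project, task and queue names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="template")
# clone this task as a template, enqueue the clone, and keep the local process alive
task.execute_remotely(queue_name="my_queue", clone=True, exit_process=False)
```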
Hi TeenyFly97,
With task.close() the task will do a full shutdown process. This includes repo detection, logs, metrics and artifacts flush, and more. The task will not be the running task anymore, and you can start a new task.
With task.mark_stopped(), the task logs will be flushed and the task will mark itself as stopped, but it will not perform the full shutdown process, so current_task() will still return this task.
For example (a minimal sketch; the project and task names are placeholders):
```
from trains import Task

task = Task.init(project_name="examples", task_name="demo")
task.mark_stopped()  # logs flushed, task marked stopped; current_task() still returns it
```
ArrogantBlackbird16, can you send a toy example so I can reproduce it on my side?
Maybe I missed something here, but does each process also create an agent to run the task with?
Hi RoughTiger69,
There is a pipeline example that gets its input from previous tasks in the pipeline. Can this help?
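Roughly this pattern, as a sketch (the project, task and artifact names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
pipe.add_step(name="step_one", base_task_project="examples", base_task_name="step 1")
pipe.add_step(
    name="step_two",
    parents=["step_one"],
    base_task_project="examples",
    base_task_name="step 2",
    # feed an artifact produced by the previous step into this step's parameters
    parameter_override={"General/dataset_url": "${step_one.artifacts.dataset.url}"},
)
pipe.start()
```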
Hi TrickySheep9, not sure I get that. Do you want to remove the warning?
You can use this description as the preview. Can this help?
task.upload_artifact(name='artifact name', artifact_object=artifact_object, preview="object description")
Hi LazyFish41, you can specify the pip version in the agent’s configuration file: https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L57 and the ClearML agent will install that pip version.
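A sketch of the relevant entry (the version constraint is just an example):
```
agent {
    package_manager {
        # pip version the agent installs in each virtual environment
        pip_version: "<20.2"
    }
}
```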
Hi SarcasticSnake58,
The connection to the server is defined in the api section of your ~/clearml.conf file. I think you don’t have such a file in your running container, so you are directed to the demoapp server.
If you are running the container from your local machine, you can start the container with your local ~/clearml.conf by adding -v "~/clearml.conf":"/root/clearml.conf" to the docker command. You can also connect to the container and create a new configuration file with clearml-init.
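A sketch of the mount (the image name is a placeholder):
```
docker run -v ~/clearml.conf:/root/clearml.conf <your-image>
```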
So if you have a ~/trains.conf file, work with it; if you don't, work offline?
For data versioning you can use the ClearML Data management.
It's done with the CLI; after an easy installation you are ready to go. You can view a full example, including the installation, in this link - https://github.com/allegroai/clearml/blob/master/docs/datasets.md
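A sketch of the CLI flow (the project, dataset name and folder are placeholders):
```
clearml-data create --project "my project" --name "my dataset"
clearml-data add --files ./data
clearml-data close
```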
Every task in ClearML includes the git repo, the changes and the full running environment.
You have some more cool things you can use (like pipelines, HPO, the ClearML Task CLI and more); you can find all of them here - ht...
Yes, you could also use the container’s SETUP SHELL SCRIPT and run a command to install your Python version (e.g. sudo apt install python3.8).
TrickyFish46, I tried https://clear.ml/ and http://app.clear.ml and both work for me. Which one is causing you issues?
Not sure I’m getting the whole system, but regarding:
I want to have a CI/CD pipeline that, upon Engineer A's commit, ensures that the pipeline is re-deployed, such that when Engineer B uses it as a template, it’s definitely the latest version of the code and process
You can configure your task to take the latest commit from a branch, so each new commit is picked up automatically.
I am using MinIO currently with trains=0.16.4; will this still be a valid option for ClearML (>0.17.0)?
Yes, you can still use it (and many other options, including local and cloud storage).
Hi, what storage is ClearML using? I am considering self-hosting ClearML on a cloud machine, while keeping the data storage on our own machine (to reduce network cost)
You can choose the one that works for you (shared folder, S3, GS, Azure, HTTP).
You can add something like:
```
import os

from trains import Task  # import added so the snippet is self-contained
from trains.backend_config.defs import LOCAL_CONFIG_FILES

# no local configuration file -> work in offline mode
if not os.path.exists(LOCAL_CONFIG_FILES[0]):
    Task.set_offline(offline_mode=True)
```
Notice that LOCAL_CONFIG_FILES is a list.
If the conf file doesn't exist, this turns on offline mode (the task will be saved locally).