Ohh, if this is the case you might also consider using offline mode, so there is no need for a backend:
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
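For reference, a minimal sketch of the offline flow (project/task names and the session zip path are placeholders; the actual zip path is printed when the task closes):
```python
from clearml import Task

# run fully offline - nothing is sent to a backend
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline test')
# ... training code, everything is recorded into a local session folder ...
task.close()

# later, when a server is available, import the recorded session:
# Task.import_offline_session('/path/to/session.zip')
```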
Oh that makes sense:
```python
import os

# create a child process using os.fork()
pid = os.fork()
if pid > 0:
    # pid greater than 0 means this is the parent process
    print("I am parent process:")
    print("Process ID:", os.getpid())
    print("Child's process ID:", pid)
else:
    # pid equal to 0 means this is the created child process
    print("\nI am child process - this is still fully auto logged")
    print("Process ID:", os.getpid())
    print("Parent's process ID:", os.getppid())
```
task.mark_completed()
You have that at the bottom of the script - never call it on yourself, it will kill the actual process.
So what is going on: you are marking your own process for termination, it then terminates itself and leaves the interpreter, and this is the reason for the errors you are seeing.
The idea of mark_* is to forcefully mark an external Task.
By just completing your process with exit code 0 (i.e. no error), the Task will be marked as completed anyhow, no need to call...
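For example, forcefully marking some other Task (a sketch; the task ID is a placeholder):
```python
from clearml import Task

# mark_* is meant for an external Task, e.g. one stuck in "running"
stuck_task = Task.get_task(task_id='<external-task-id>')
stuck_task.mark_completed()
```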
JealousParrot68 yes this seems like a correct description.
The main diff between 1 & 2 is what the actual data is: if this is training/testing data, then a Dataset would make sense; if this is part of a preprocessing pipeline, then artifacts make more sense. (Notice we added pipeline step caching on top of the artifacts, so you can reuse steps if they have the same parameters/code, which means you can clone a pipeline and rerun it without repeating unnecessary data processing - see the sketch below.)
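A rough sketch of the step-caching option (all names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name='prep pipeline', project='examples', version='1.0')
# cache_executed_step=True reuses the step's artifacts when the
# parameters/code are unchanged, so a cloned pipeline skips the work
pipe.add_step(
    name='preprocess',
    base_task_project='examples',
    base_task_name='preprocess step',
    cache_executed_step=True,
)
pipe.start()
```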
Hi @<1635088270469632000:profile|LividReindeer58>
You mean the clearml.conf?
You can do:
from clearml.config import config_obj
you should have the entire configuration file as an object (dict interface)
fyi: under the hood it uses pyHOCON
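Something along these lines (the key below is just an example):
```python
from clearml.config import config_obj

# dict-like access into the parsed clearml.conf (pyHOCON under the hood)
print(config_obj.get('api.api_server'))
```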
repeat it until they are all dead 🙂
Hi NonsensicalSeaanemone47
I'm assuming you mean k8s as compute cluster?
If so, then yes, ClearML adds priority scheduling on top of your existing k8s cluster. It also allows you to reuse images, as k8s spins up the base container image and then, inside the container, the agent sets up the experiment's environment (clones the code, applies the diff, installs missing Python packages, etc.)
It also gives visibility into the executed pods.
Make sense ?
btw: I'm assuming that args is not the argparse object, as argparse is automatically "connected"?
EnviousStarfish54 regarding the file server, you have one built into the trains-server, and this will be the default location to store all artifacts. You can also use external solutions like S3 / GS / Azure etc.
Regarding the models, any model store / load is automatically logged as long as you are using one of the supported frameworks (TF, Keras, PyTorch, scikit-learn).
If you want your model to be automatically uploaded, just add output_uri:
task = Task.init('examples', 'model', output_uri='http://trai...
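A complete version of that call (the files-server address is an assumption, replace with your own):
```python
from clearml import Task

# stored models will be automatically uploaded to output_uri
task = Task.init(
    project_name='examples',
    task_name='model',
    output_uri='http://my-trains-server:8081',  # assumed server address
)
```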
So agents on different nodes will probably require images with different CUDA versions.
That makes sense SarcasticSquirrel56
I would edit the helm chart (or deploy manually) based on a selector that will select the different nodes/GPUs and assign the correct containers (i.e. matching CUDA versions to the different GPUs / drivers).
BTW: you can also play around with the k8s glue, which dynamically spins up pods based on ClearML Tasks.
wdyt?
So the naming is a by-product of the many TB files created (one per experiment); if you add different naming to the TB files, then this is what you'll see in the UI. Make sense ?
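For illustration, a minimal sketch (assuming PyTorch's TB writer; directory names are arbitrary):
```python
from torch.utils.tensorboard import SummaryWriter

# each writer creates its own TB event file; the directory naming
# is what ends up distinguishing the plots in the UI
train_writer = SummaryWriter(log_dir='runs/train')
val_writer = SummaryWriter(log_dir='runs/val')
train_writer.add_scalar('loss', 0.1, 1)
val_writer.add_scalar('loss', 0.2, 1)
```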
I would expect that after calling task.execute_remotely(exit_process=True), the local task is closed and no processes related to it are running
Ohh my apologies, I did not understand that.
Are you saying that locally you call task.execute_remotely(exit_process=True) and it does not leave the local process?
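The expected pattern is something like this (project/task/queue names are examples):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# enqueue this task for an agent, then leave the local process
task.execute_remotely(queue_name='default', exit_process=True)
# nothing below this line should run locally
```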
Specifically for this one, this is the auto-generated docstring from the actual code, so feel free to open a PR against:
https://github.com/allegroai/clearml/blob/e53a76b713910adaf87578c69e86f8154d4ab4c1/clearml/logger.py#L152
Hi RoundMosquito25
Hmm I remember this is tricky ... What's the clearml version? also where is the line you had to hack ?
To be honest, I'm not sure I have a good explanation on why ... (unless on some scenarios an exception was thrown and caught silently and caused it)
unless the domain is different?
Imagine that you are working with both GitHub and Bitbucket, for example: if you are using git-ssh, then git will know which of the domains to send the key to. Currently there is a single user/pass entry, so all domains will get the same credentials. But I think this is a rare use case.
Will they get ordered ascending or descending?
Good point, I'll check the docs... but I think they do not specify
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
From the code it seems the order is not guaranteed.
You can however pass order_by=['-last_update'], which will give you the most recently updated first:
```python
task_filter = {
    'page_size': 2,
    'page': 0,
    # sort by a specific metric first, then by most recently updated
    'order_by': ['last_metrics.{}.{}'.format(title, series), '-last_update']
}
# project_name here is just an example
tasks = Task.get_tasks(project_name='examples', task_filter=task_filter)
```
Hi GiganticTurtle0
Is there a simple way to make `Task.init` compatible with `Dask.distributed` client?
Please tell me more 🙂
I think Dask is trying to pickle your Task object (which is not picklable).
You can however create the Task once with Task.init in the main process,
pass the Task ID to the child processes, and then use Task.init(..., continue_last_task=task_id_here).
wdyt?
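Roughly this pattern (a sketch assuming a local Dask client; project/task names are placeholders):
```python
from clearml import Task
from dask.distributed import Client

def worker_fn(task_id, value):
    # re-attach to the existing Task inside the Dask worker process
    task = Task.init(project_name='examples', task_name='dask run',
                     continue_last_task=task_id)
    task.get_logger().report_scalar('metric', 'value', value, iteration=0)
    return value * 2

if __name__ == '__main__':
    task = Task.init(project_name='examples', task_name='dask run')
    client = Client()
    # pass the Task ID (a plain string, picklable) instead of the Task object
    result = client.submit(worker_fn, task.id, 21).result()
```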
The log is missing, but the Kedro logger prints to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
FiercePenguin76 in the Task's Execution tab, under "script path", change it to "-m filprofiler run catboost_train.py".
It should work (assuming the "catboost_train.py" is in the working directory).
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
JitteryCoyote63 could you test with rc3 ?
@<1523701083040387072:profile|UnevenDolphin73> it's looking for any of the files:
None
Yes, in tandem with the experiments (because they constantly log to the server).
That said, with 0.16 we added offline mode, so you can run in offline mode, then import the experiment into the system.
SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task ?
(This is what the agent uses to set up the environment inside the docker, running as a pod.)
Sure, set the OS environment variable `CLEARML_NO_DEFAULT_SERVER=1`
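e.g. before importing clearml:
```python
import os
# make sure no default server fallback is used when clearml.conf is missing
os.environ['CLEARML_NO_DEFAULT_SERVER'] = '1'

from clearml import Task
```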