Can you send the console output of this entire session please?
I'm not sure how often it updates though
Essentially the example provided just prints out IDs to the log file,
What do you mean?
Hi PompousParrot44
Well this kind of control is tricky. If you don't mind processes "fighting over cpu" you can just spin two trains-agents in cpu-mode. It will work as long as they have a different TRAINS_WORKER_NAME
The other option (might be a bit of an overkill) is to use K8s, which will set the CPU % for the entire agent.
What do you think?
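A minimal sketch of the first option (two CPU-mode agents on one machine), just to show the idea. The worker names and queue name are made up, and the `trains-agent daemon --queue ... --cpu-only` command line is my assumption of how you would normally start the agent, so adjust it to your setup:

```python
import os
import subprocess


def agent_env(worker_name, base_env=None):
    """Copy the environment and give this agent its own TRAINS_WORKER_NAME."""
    env = dict(os.environ if base_env is None else base_env)
    env["TRAINS_WORKER_NAME"] = worker_name
    return env


def agent_command(queue):
    # Assumed CLI invocation: daemon mode, CPU only, pulling from `queue`.
    return ["trains-agent", "daemon", "--queue", queue, "--cpu-only"]


def launch_agents(worker_names, queue):
    """Start one agent process per (unique) worker name."""
    return [
        subprocess.Popen(agent_command(queue), env=agent_env(name))
        for name in worker_names
    ]


# Example call (hypothetical names):
# launch_agents(["cpu-worker-a", "cpu-worker-b"], queue="default")
```

The two processes will still compete for the same cores; the only thing this guarantees is that each one registers under a different worker name.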
you need to set
CLEARML_DEFAULT_BASE_SERVE_URL:
So it knows how to access itself
The problem is due to tight security on this k8s cluster: the k8s pod cannot reach the public file server URL which is associated with the dataset.
Understood, that makes sense, if this is the case then the path_substitution feature is exactly what you are looking for
@<1651395720067944448:profile|GiddyHedgehong81> just to be clear, Dataset.get_local_copy returns a path to your files,
You have to manually add the additional path to the specific files you need to use. It does not know that in advance.
That was the initial issue you had, and I assume it is the same one here. Does that make sense?
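Something along these lines, as a rough sketch. The helper names and the example file path are made up for illustration; the only ClearML calls are `Dataset.get` and `get_local_copy`, and the import is done lazily so the path helper can be used on its own:

```python
import os


def resolve_dataset_file(dataset_root, relative_path):
    """Join the dataset folder with the path of one specific file inside it."""
    full_path = os.path.join(dataset_root, relative_path)
    if not os.path.exists(full_path):
        raise FileNotFoundError(full_path)
    return full_path


def load_dataset_file(dataset_id, relative_path):
    """Fetch the local copy of a dataset and return the path of one file in it."""
    from clearml import Dataset  # imported here so the helper above has no clearml dependency

    # get_local_copy() returns the folder, not the individual files
    dataset_root = Dataset.get(dataset_id=dataset_id).get_local_copy()
    return resolve_dataset_file(dataset_root, relative_path)


# Example call (hypothetical id and path):
# csv_path = load_dataset_file("<dataset_id>", os.path.join("data", "train.csv"))
```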
And command is a list instead of a single str
"command list", you mean the command argument ?
When using the UI with regex to search for experiments, due to the greedy nature of the search, it consistently pops up the "ERROR Fetch Experiments failed" window when starting to use groups in regex (that is, parentheses of any kind).
hmm that is a good point (i.e. only on enter it would actually search)
Could it be updated so that if an invalid regex pattern is given, it simply highlights the search bar in red (or similar) rather than stop us while writing the search pattern?
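To illustrate the behaviour being asked for (validate first, only search when the pattern compiles), here is a small sketch. It uses Python's `re` module as a stand-in for whatever regex engine the UI and server actually use, and the function names are hypothetical:

```python
import re


def validate_search_pattern(pattern):
    """Return (is_valid, error_message) for a regex typed by the user."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return False, str(exc)
    return True, ""


def search_experiments(names, pattern):
    """Filter names by regex; return None instead of failing on an incomplete pattern."""
    is_valid, _ = validate_search_pattern(pattern)
    if not is_valid:
        # The caller can highlight the search bar and keep the previous results.
        return None
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]
```

With this split, a half-typed group such as `(train` is reported as invalid rather than triggering an error popup, and the search only runs once the closing parenthesis is there.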
...
Oh if this is the case you can probably do
` import os
import subprocess
from time import sleep
from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
    # pull the next pending task from the queue, wait if it is empty
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # mark the task as started before launching it
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...
Thanks EnviousStarfish54
Let me check if I can reproduce it
And is Task.init called on all processes ?
Thanks EnviousStarfish54 we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under trains, at least until they are moved (hopefully end of Sept).
Hi @<1533619725983027200:profile|BattyHedgehong22>
Can you elaborate? What do you mean by params file?
Is this something like:
Task.current_task().connect_configuration('my_conf.json', name="my conf file")
I was trying to do exactly as you mentioned, setting the environment variable before any trains import, but it didn't work
In your entry point script (even if you do not call trains / Task.init), add:
import os
os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'
import trains
Then when you actually import trains, everything is already set and it will not read the configuration again.
Make sense ?
Hi @<1523701295830011904:profile|CluelessFlamingo93>
What do you mean? what's the difference between ClearML server and self hosted? both are self hosted no?
Hi @<1523701304709353472:profile|OddShrimp85>
Is there anywhere I could get a chart that can work with a lower version of k8s? Or any other methods?
I think the solution is to install it manually from the helm chart (basically take it out and build a Job YAML), wdyt?
Hmm interesting, will pass it along to FE
3. That is nice! I wonder if this is built into the graph library
Hmm so the Task.init should be called on the main process, this way the subprocess knows the Task is already created (you can call Task.init twice to get the task object). I wonder if we somehow can communicate between the sub processes without initializing in the main one...
Hi VexedElephant56
Yes it is:
Define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
(if running in docker mode add -e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 as container args)
https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_env_var
Hi GracefulDog98
Any guess why the password is "incorrect" for me?
Basically the clearml-session CLI needs to be able to access (SSH) into the host (clearml-agent) machine,
is that possible?
Yep, basically this will query the Task and get the last one:
https://github.com/allegroai/clearml/blob/ca70f0a6f6d52054a095672dc087390fabf2870d/clearml/task.py#L729
Notice task_filter allows you to do all sorts of filtering
https://github.com/allegroai/clearml/blob/ca70f0a6f6d52054a095672dc087390fabf2870d/clearml/task.py#L781
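As a rough sketch of that kind of filtering: the snippet below asks for the single most recently updated task in a project. The helper names are made up, and the filter keys (`order_by`, `page_size`, `page`, `status`) are my assumption of what the backend accepts through `task_filter`, so double check them against the code linked above:

```python
def latest_task_filter(statuses=None):
    """Build a task_filter asking only for the most recently updated task."""
    task_filter = {"order_by": ["-last_update"], "page_size": 1, "page": 0}
    if statuses:
        # optional: restrict to specific task states, e.g. ["completed"]
        task_filter["status"] = list(statuses)
    return task_filter


def get_latest_task(project_name, task_name=None, statuses=None):
    """Return the most recently updated matching Task, or None if nothing matches."""
    from clearml import Task  # imported here so the filter helper has no clearml dependency

    tasks = Task.get_tasks(
        project_name=project_name,
        task_name=task_name,
        task_filter=latest_task_filter(statuses),
    )
    return tasks[0] if tasks else None


# Example call (hypothetical project name):
# task = get_latest_task("my_project", statuses=["completed"])
```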
Thanks @<1630377234361487360:profile|RoughSeaturtle43>
server certificate verification failed. CAfile: none CRLfile: none
Oh I see this is an https issue inside the container, you need to mount your self signed certificate
add something like this to your agent.conf:
extra_docker_arguments: ["-v", "/path/to/cert.pem:/etc/ssl/certs/myca.pem"]
GiddyTurkey39
BTW: you can always add the missing package via code: Task.add_requirements('torch', optional_version)
WARNING:root:Could not lock cache folder /home/ronslos/.clearml/venvs-cache: [Errno 11] Resource temporarily unavailable
Hi @<1549927125220331520:profile|ZealousHare78>
Could it be you are also working on the same machine? Are you running the agent in docker mode or venv mode?
no, at least not yet, someone definitely needs to do that though haha
Currently all the unit tests are internal (the hardest part is providing a server they can run against and verify the results, hence the challenge)
For example, if ClearML would offer a TestSession that is local and does not communicate with any backend
Offline mode? It stores everything into a folder, then zips it; you can access the target folder or the zip file and verify all the data/states
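A minimal sketch of how a test could use this, under a few assumptions: `run_offline` and `list_offline_session_files` are hypothetical helper names, the project/task names are up to you, and what exactly ends up inside the zip is not something I am listing here, so the helper only returns the file names for you to inspect:

```python
import zipfile


def run_offline(project_name, task_name):
    """Run a task with no backend communication and return its offline folder."""
    from clearml import Task  # imported here so the zip helper has no clearml dependency

    Task.set_offline(offline_mode=True)
    task = Task.init(project_name=project_name, task_name=task_name)
    folder = task.get_offline_mode_folder()
    # ... run the code under test here ...
    task.close()
    return folder


def list_offline_session_files(zip_path):
    """Return the sorted file names stored inside an offline session zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())
```

A unit test can then assert on the returned folder or on the zip listing instead of querying a server.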