BTW: draft means they are in edit mode, i.e. before execution; then they should be queued (i.e. pending), then running, then completed.
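For example, something like this (a minimal sketch, project/task names and the queue are assumptions) pushes a draft Task through those states:

from clearml import Task

# assuming a Task that is still in draft (edit) mode; draft maps to the "created" status
draft_task = Task.get_task(project_name="examples", task_name="my_draft_task")
print(draft_task.get_status())  # "created" while still in draft

Task.enqueue(draft_task, queue_name="default")  # now queued (i.e. pending) until an agent picks it up
print(draft_task.get_status())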
You do not need the cudatoolkit package; it is automatically installed if the agent is using conda as the package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
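The relevant part of clearml.conf looks roughly like this (values here are illustrative, not a recommendation):

agent {
    package_manager: {
        # when set to conda, the agent resolves cudatoolkit for you
        type: conda,
    }
    # 0 means auto-detect the CUDA/cuDNN version from the host
    cuda_version: 0
    cudnn_version: 0
}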
BTW: the same holds for tagging multiple experiments at once
Hi @<1559711593736966144:profile|SoggyCow20>
How did you configure the clearml.conf? See an example here:
None
Is this code running inside the Task that is your data processing? Assuming it does, check this code; it will fetch the pipeline and then the Task you need:
from clearml import Task

previous_task = Task.get_task(
    project_name=Task.current_task().get_project_name(),  # same project as the current Task
    task_name="process_dataset",  # use the "process_dataset" name from the pipeline step
    task_filter={'status': ['completed']},  # only completed runs
)
Notice it uses the current Task's project to make sure you are looking for a component running under the same pipeline.
Any chance you actually run the second script with Popen (i.e. calling Python as a subprocess)?
Hi @<1784754456546512896:profile|ConfusedSealion46>
The ClearML server takes up so much memory, especially for Elasticsearch
Yeah that depends on how many metrics/logs you have there, but you really have to have at least 8GB RAM
Did you delete old experiments?
If I try to connect a dictionary of type dict[str, list] with task.connect, when retrieving this dictionary with ...
Wait, this should work out of the box, do you have any specific example?
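For example, something like this (a minimal sketch, the dictionary content is made up) should round-trip fine:

from clearml import Task

task = Task.init(project_name="examples", task_name="connect dict of lists")

config = {"layers": [64, 128, 256], "tags": ["a", "b"]}
config = task.connect(config, name="my_config")  # values become editable in the UI / overridable by the agent
print(config)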
Oh I see your point, that makes sense. It should check the state of the Task and force it to aborted so it can be re-enqueued; the issue with reset is that it will clear the previous run execution, which I think we do not want. Wdyt?
Hi @<1523708920831414272:profile|SuperficialDolphin93>
The error seems like nvml fails to initialize inside the container; you can test it with nvidia-smi and check if that works.
Regarding the CUDA version, ClearML Serving inherits from the Triton container. Could you try to build a new one with the latest Triton container (I think 25)? The docker compose is in the clearml-serving git repo. Wdyt?
Yep, found it, the --name is marked as required and the argparser throws an error ...
I'll make sure this is fixed as well
NastySeahorse61 I would try opening it in incognito mode (i.e. no cookies etc.). Did you also change the address of the server?
Hi MotionlessSeagull22
Hmm, I'm not sure this is possible in the UI.
You can compare multiple experiments and view the images as thumbnails next to each other, but the full view will be a single image...
You can, however, right-click on the image and get a direct link, then open it in a new tab ... :(
Hi @<1545216070686609408:profile|EnthusiasticCow4>
ClearML's auto-detection is based on the actually imported packages, not on the requirements.txt of your entire Python environment. This is why some of them are missing.
That said, you can always manually add them:
from clearml import Task

Task.add_requirements("hydra-colorlog")  # optionally pin a version: Task.add_requirements("hydra-colorlog", "1.2.0")
task = Task.init(...)

(notice add_requirements has to be called before Task.init)
is there a way to increase the size of the text input for fields or a better way to handle lists?
No
Maybe an easier way would be to use connect_configuration instead? It will take an entire dict and store it as text (the format is HOCON, which is YAML/JSON compatible, which means it is hard to break when editing).
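Something along these lines (a minimal sketch, the dict content is just an example):

from clearml import Task

task = Task.init(project_name="examples", task_name="config example")

config = {
    "train_files": ["a.csv", "b.csv"],
    "params": {"batch_size": 32},
}
# stored as a single editable configuration object (HOCON text) in the UI,
# and the (possibly edited) version is returned when running via an agent
config = task.connect_configuration(config, name="my_config")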
callbacks.append(
    tensorflow.keras.callbacks.TensorBoard(
        log_dir=str(log_dir),
        update_freq=tensorboard_config.get("update_freq", "epoch"),
    )
)

Might be! What's the actual value you are passing there?
Also, can you right-click on the image and save it on your machine? See if it is cropped, or if it is just a UI issue.
CurvedHedgehog15 there is no need for:
task.connect_configuration(
    configuration=normalize_and_flat_config(hparams),
    name="Hyperparameters",
)
Hydra is automatically logged for you, no?!
It has to be alive so all the "child nodes" can report to it...
import os
from trains import Task

os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))

Should be: <class 'trains.task.Task'>
Correct, and that also means the code that runs is not auto-magically logged.
Notice that you have to have the Task already started by the master process.
BoredHedgehog47 can you test this one? Is it close to your code ?
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call task.execute_remotely(...), which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
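Roughly something like this (a minimal sketch; the queue name and the train() function are placeholders, not part of the example above):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
params = task.connect({"lr": 0.001})

# everything above runs locally, so you can verify arguments/configuration;
# this call stops the local process and enqueues the Task for an agent
task.execute_remotely(queue_name="default", exit_process=True)

# from here on the code only runs on the agent
train(params)  # hypothetical training function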