CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?
Can I delete logs from existing experiments on the ClearML server?
Only by resetting the Task (which would delete everything), or deleting the Task itself.
You can also disable the auto console log and report manually:
Task.init(..., auto_connect_streams=False)
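For reference, a minimal sketch of that approach (the project/task names here are placeholders):
```python
from clearml import Task

# disable automatic stdout/stderr capture, then report only what you choose
task = Task.init(project_name="examples", task_name="manual-logging",
                 auto_connect_streams=False)
logger = task.get_logger()
logger.report_text("only explicitly reported lines end up on the server")
```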
Great to hear it got solved. BTW network drives are supported but you have to make sure the mount file system supports locks (NFS does)
We'd be using https in production
Nice 🙂
@<1687653458951278592:profile|StrangeStork48> , I was reading this thread trying to understand what exactly is the security concern/fear here, and I'm not sure I fully understand. Any chance you can elaborate ?
Are you doing `from keras import ...` or `from tensorflow.keras import ...` ?
And this is with the latest pycharm plugin 1.1.0 ?
Example use case:
```python
an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],
    # here we define the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('General/layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('General/layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('General/batch_size', values=[...
```
Hi LovelyHamster1 ,
you mean totally ignore the "installed packages" section, and only use the requirements.txt ?
Hi @<1523701523954012160:profile|ShallowCormorant89>
This means the system did not detect any "iteration" reporting (think scalars), and since it needs a time-series axis for the monitoring it just uses seconds from start
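If it helps, a minimal sketch of explicit iteration reporting (project/title/series names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="iteration-reporting")
logger = task.get_logger()
for i in range(100):
    # reporting with an explicit iteration gives the UI its time-series x-axis
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
```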
So I checked the code, and the Pipeline constructor internally calls Task.init, which means that after you construct the pipeline object, Task.current_task() should return a valid object....
let me know what you find out
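Something like this should confirm it (pipeline name/project are placeholders):
```python
from clearml import Task, PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

# the Pipeline constructor calls Task.init internally,
# so the current task should now be a valid object
assert Task.current_task() is not None
```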
and this link on its own works?
if it does, open your browser dev tools (Ctrl+Shift+I on Chrome, I think); I'm assuming you will see a few CORS errors or the like, paste them here
What do you have in the artifacts of this task id: 4a80b274007d4e969b71dd03c69d504c
So when the agent fires up it gets the hostname, which you can then get from the API.
I think it uses something like Python's socket.gethostname(), which is OS agnostic
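A quick sketch of pulling those hostnames back from the API (assuming the standard APIClient):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# each registered worker id is derived from the machine's hostname
for worker in client.workers.get_all():
    print(worker.id)
```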
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
Hi WorriedParrot51
Let me shed some light on this complicated mechanism, because this is not very straight forward.
Basically the agent signals the trains package that it should ignore the code calls and use a specific Task in the backend (i.e. in manual mode the trains package logs the data into the trains-server; in agent mode (remote mode) it does the opposite and takes the data from the trains-server "into" the code)
Specifically, just like in manual mode, calling argparse.parse is be...
Seems like something is not working with the server, i.e. it cannot connect with one of the dockers.
May I suggest carefully going through all the steps here, making sure nothing was missed:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
Especially number (4)
ElegantCoyote26 It means we need to have a keras logger that logs everything to trains, then we need to hook it automatically.
Do you feel like PR-ing the logger (the hooking I can take care of 🙂 )?
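To make the idea concrete, a rough sketch of what such a logger could look like (purely illustrative; the class name and wiring are assumptions):
```python
from tensorflow import keras
from trains import Task

class TrainsKerasLogger(keras.callbacks.Callback):
    """Illustrative callback: push Keras metrics to the trains server every epoch."""

    def __init__(self):
        super().__init__()
        self._logger = Task.current_task().get_logger()

    def on_epoch_end(self, epoch, logs=None):
        for name, value in (logs or {}).items():
            self._logger.report_scalar(title=name, series="train",
                                       value=float(value), iteration=epoch)
```
The automatic hooking would then amount to injecting this callback into model.fit() behind the scenes.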
Hi @<1691620877822595072:profile|FlutteringMouse14>
In the latest project I created, Hydra conf is not logged automatically.
Any chance the Task.init call is not on the main script (where the Hydra is) ?
[Assuming the above is what you are seeing]
What I "think" is happening is that the Pipeline creates it's own Task. When the pipeline completes, it closes it's own Task, basically making any later calls to Tasl.current_task() return None, because there is no active Task. I think this is the reason that when you are calling process_results(...) you end up with None.
For a quick fix, you can dopipeline = Pipeline(...) MedianPredictionCollector.process_results(pipeline._task)
Maybe we should...
Hi @<1724960468822396928:profile|CumbersomeSealion22>
As soon as I refactor my project into multiple folders, where on top-level I put my pipeline file, and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
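For example, assuming a recent clearml version where the component decorator accepts repo arguments (the repo URL, branch, and helper module below are placeholders):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(
    repo="https://github.com/your-org/your-tasks-repo.git",  # placeholder repo URL
    repo_branch="main",
)
def preprocess(data_path: str):
    # modules from the repo are available on the remote machine
    from my_tasks.utils import load_table  # hypothetical helper inside the repo
    return load_table(data_path)
```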
Then by default it is the home folder (~/.clearml) that is missing free space
if in the "installed packages" I have all the packages installed from the requirements.txt then I guess I can clone it and use "installed packages"
After the agent finished installing the "requirements.txt" it will put back the entire "pip freeze" into the "installed packages", this means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen as we cannot expect everyone to constantly freeze versions)
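Related sketch, if you want clearml to store your requirements.txt instead of the auto-detected packages (call it before Task.init; project/task names are placeholders):
```python
from clearml import Task

# store the given requirements file as the Task's "installed packages"
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(project_name="examples", task_name="pinned-env")
```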
My problem...
but not as a component (using the decorator)
Hmm yes, I think that component calling component as an external component is not supported yet
(basically the difference is , is it actually running as a function, or running on a different machine as another pipeline component)
I noticed that when a pipeline step returns an instance of a class, it tries to pickle.
Yes this is how the serialization works, when we pass data from one node to another (by design it supports multiple mach...
No, TB (Tensorboard) is not enabled.
That explains it 🙂 did you manage to get it working ?
Hi @<1637624975324090368:profile|ElatedBat21>
I think that what you want is:
Task.add_requirements("unsloth", "@ git+
")
task = Task.init(...)
after you do that, what are you seeing in the Task "Installed Packages" ?
So first, yes, I totally agree. This is why clearml-serving has a dedicated statistics module that creates histograms over time; we then push them into Prometheus and connect Grafana to it for dashboards and alerts.
To be honest, I would just use it instead of reporting manually, wdyt?
One example is a node that resizes the images. This node receives a Dataset as input and iterates over each image, resizes it, and outputs a new Dataset, which is used in the next node downstream in the Pipeline.
I agree, this sounds like a "function" rather than a job, so better suited for Kedro.
organization structure, and see for yourself (this pipeline has two nodes, train_model and predict)
Interesting! let me dive into that and ...