Nice!
btw: clone=True means creating a copy of the running Task, but basically there is no need for that. With clone=False, it will stop the running process and launch it on the remote host, logging everything into the original Task.
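For reference, a minimal sketch of how this is typically invoked, assuming the call under discussion is Task.execute_remotely (the project and queue names here are just placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# clone=False: stop the local process and continue running on the agent,
# logging everything into this same (original) Task.
# clone=True would create and enqueue a copy of the Task instead.
task.execute_remotely(queue_name="default", clone=False, exit_process=True)
```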
See: Add an experiment hyperparameter, and add gpu: True
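If you are adding it from code rather than the UI, a hedged sketch using Task.connect (the parameter dict is illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="gpu flag")

# Connect a dict so "gpu" shows up as an editable experiment hyperparameter
params = {"gpu": True}
task.connect(params)
```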
Hi, if you don't mind having a look too,
With pleasure :)
According to the above I was expecting the config to be auto-magically updated with the new YAML config I edited in the UI; however, it seems like an additional step is required... probably connect_dict? Or am I missing something?
Notice the OmegaConf section description: "Full OmegaConf YAML configuration. This is a read-only section, unless 'Hydra/_allow_omegaconf_edit_' is set to True"
By default it will alw...
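As a sketch, one way that flag could be flipped from code, assuming the generic Task.set_parameter API applies to the Hydra section (the flag name is the one quoted in the description above):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hydra run")

# Mark the OmegaConf section as editable so UI overrides are applied
task.set_parameter("Hydra/_allow_omegaconf_edit_", True)
```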
RipeGoose2 That sounds familiar. Could you test with the latest RC?
pip install trains==0.16.4rc0
So if I do this in my local repo, will it mess up my git state, or should I do it in a fresh directory?
It will install everything fresh into the target folder (including venv and code + uncommitted changes)
yup, it's there in draft mode so I can get the latest git commit when it's used as a base task
Yes that seems to be the problem, if it is in draft mode, you have no outputs...
Did you mean --detached ?
Oops, yes, sorry, you are correct, it should be --detached
Hmm CourageousLizard33, seems you stumbled on a weird bug.
This piece of code only tries to get the username of the current UID, but since you are running inside a docker container you probably set the environment UID, while there is no "actual" user with that UID in /etc/passwd, so it cannot be resolved.
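To illustrate the failure mode (this is not the attached fix), a minimal sketch assuming the lookup goes through Python's pwd module:
```python
import os
import pwd

# Inside a container started with an arbitrary --user UID there may be no
# matching /etc/passwd entry, so the lookup below raises KeyError.
try:
    username = pwd.getpwuid(os.getuid()).pw_name
except KeyError:
    # Fall back to environment hints or a placeholder instead of crashing.
    username = os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"

print(username)
```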
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
It manages the scheduling process, so there is no need to package your code or worry about building dockers, etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs in the execution queue and the limit of your budget (and obviously spins down machines that are idle).
Regarding the project name:
set_project will support project_name in the next version. In the meantime:
project_id = [p.id for p in Task.get_projects() if p.name == project_name][0]
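Put together, a hedged sketch of the current workaround (the task lookup and project name are placeholders; set_project taking a project id follows the statement above):
```python
from clearml import Task

project_name = "examples"
task = Task.get_task(task_id="<task-id>")  # the task you want to move

# Until set_project accepts project_name, resolve the project id by name first
project_id = [p.id for p in Task.get_projects() if p.name == project_name][0]
task.set_project(project_id)
```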
in clearml.conf we could have:
```
azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
```
Then in AzureContainerConfigurations:
```
class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...
```
Basically you create the Task and make sure the "Dataset" is attached to it:
```
task = Task.init(...)
dataset = Dataset.create(task=task)
dataset.add_files(...)
```
This will make sure the code is attached to the Dataset
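For completeness, a hedged sketch of that flow end-to-end (the upload/finalize calls are assumed from the standard Dataset API; names and paths are placeholders):
```python
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="dataset build")

# Create the Dataset attached to the running Task, as in the snippet above
dataset = Dataset.create(task=task)
dataset.add_files("/path/to/local/data")

# Upload the files and close this dataset version
dataset.upload()
dataset.finalize()
```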
I'm thinking of a few plots in my current in-house tooling which are slightly different than the standard charts we look at. For example a custom parallel coordinate chart that can use aggregations, categorical variables, etc.
This can be done by comparing experiments, then checking the Hyper-Parameters tab and selecting the graph view from the drop-down at the top
So my question, in general, pertains to whether I would need to get better at JavaScript if I were to make those changes. My guess is ...
CloudySwallow27 okay essentially this defs file is kind of a user "secret vault" for access credentials, is that correct?
By the way, will downloading still happen if the dataset is available in the cache folder?
If it is cached, then there is no need to re-download
OddAlligator72 I like this idea.
The single thing I'm not sure about is the "function entry point"
Why would one do that? Meaning, why wouldn't you have a proper Python entry point?
The reason I'm reluctant is that you might have calls/functions/variables in the global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
A simple script entry point seems trivial to launch and debug locally.
What do you think? What woul...
Thanks SolidSealion72 !
Also, I found out that adding "pool.join()" after pool.close() seems to solve the issue in the minimal example.
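As a minimal sketch of the pattern being described (the worker function and sizes are illustrative):
```python
from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)
    results = pool.map(work, range(10))
    pool.close()
    # Waiting for the worker processes to actually exit before the script
    # ends is what seems to avoid the hang described above.
    pool.join()
    print(results)
```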
This is interesting, I'm pretty sure it has something to do with the subprocess not "closing" properly (or too fast or something)
Let me see if I can reproduce
GreasyPenguin14 GrittyKangaroo27 the new release contains a fix, could you verify it solves the issue in your scenario as well? (There is now a smart timeout to detect the inconsistent state, which means the close/exit procedure might be delayed by ~10 sec instead of hanging in these specific rare scenarios.)
Hi ShallowCat10
What's the TB you are using?
Is this example working correctly for you?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorboard_pr_curve.py
Hi CluelessFlamingo93
I think the latest clearml-agent 1.5.1 fixed that issue (this is basically pip trying to "protect" you from mismatch packages)
can you upgrade your clearml-agent and test?
pip3 install clearml-agent==1.5.1
Hi NastyOtter17
"Project" is so ambiguous
LOL yes, this is something GCP/GS is using:
https://googleapis.dev/python/storage/latest/client.html#module-google.cloud.storage.client
I think perhaps it came across as way more passive-aggressive than I was intending.
Dude, you are awesome for saying that! No worries, we try to assume people have the best intention at heart (the other option is quite depressing)
I've been working on an Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way
I find it quite difficult to explain these ideas succinctly, did I make any sense to you?
Yep, I think we are totally on the same wavelength
However, it also seems to be not too prescriptive,
One last question, what do you mean by that?
Hi MelancholyElk85
However, when I clone the pipeline from web UI and launch it once again, it works. Is there a way to bypass this?
In both cases, are you seeing different behavior on the same machine running the agent (i.e. cloning from the UI vs. from code)?