
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
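If that's the goal, a minimal sketch of the idea (project/task names and the training loop are placeholders, assuming a standard clearml setup): report the loss with the number of images seen as the iteration value, so runs with different batch sizes line up on the same x-axis.
```python
from clearml import Task

# placeholder project / task names
task = Task.init(project_name="examples", task_name="compare-by-images-seen")
logger = task.get_logger()

images_seen = 0
for batch in data_loader:            # data_loader / train_step are assumed to exist
    loss = train_step(batch)
    images_seen += len(batch)
    # use "images seen" (not the step index) as the iteration,
    # so two experiments with different batch sizes can be compared directly
    logger.report_scalar(title="loss", series="train", value=loss, iteration=images_seen)
```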
this results, at the end of an experiment, in an object being saved under a given name, regardless of whether the name was dynamic or not?
Yes, at the end the name of the artifact is what it will be stored under (obviously if you reuse the name you basically overwrite the previous artifact)
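For example (a minimal sketch; the artifact name here is a hypothetical dynamically built string):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-naming")

# whatever string the name resolves to at upload time is what the artifact
# is stored (and later retrieved) under
artifact_name = "results_fold_3"   # hypothetical dynamic name
task.upload_artifact(name=artifact_name, artifact_object={"accuracy": 0.91})

# re-uploading with the same name overwrites the previous artifact
task.upload_artifact(name=artifact_name, artifact_object={"accuracy": 0.93})
```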
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
How can I make it such that any update to the upstream database
What do you mean "upstream database"?
Hi FrothyShark37
is the task scheduler only accessible through the SDK?
yes, in the open source version this is strictly code based. I know the enterprise tier has a UI for it, but in terms of features I believe this is equivalent
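Roughly like this (a minimal sketch, assuming a recent clearml version; the task ID and queue names are placeholders):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()

# re-enqueue an existing task every day at 03:00 on the "default" queue
scheduler.add_task(
    schedule_task_id="<task-id>",   # placeholder
    queue="default",
    hour=3,
    minute=0,
)

# run the scheduler itself as a service (or call scheduler.start()
# to keep it running in the current process)
scheduler.start_remotely(queue="services")
```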
HealthyStarfish45
No, it should work 🙂
LittleShrimp86 what do you have in the Configuration Tab of the Cloned Pipeline?
(I think it has an empty configuration -> which means an empty DAG, so it does nothing and exits)
GrievingTurkey78 short answer no 😞
Long answer: the files are stored as differential sets (think change sets relative to the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (GCS), not on Google Drive. Notice that the default storage for clearml-data is the clearml-server; that said, you can always mix and match (even between versions).
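If it helps, a minimal sketch of the mix-and-match part via the SDK (project/dataset names, the parent ID and the GCS bucket are placeholders):
```python
from clearml import Dataset

# new version on top of an existing parent; only the changed files
# end up in this version's compressed archive
ds = Dataset.create(
    dataset_project="examples",
    dataset_name="my-dataset",
    parent_datasets=["<parent-dataset-id>"],   # placeholder
)
ds.add_files(path="data/")

# push the zipped change-set to GCS object storage instead of the
# default clearml fileserver (bucket path is a placeholder)
ds.upload(output_url="gs://my-bucket/clearml-datasets")
ds.finalize()
```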
Hi PungentLouse55
it depends on the trains-server version you are running.
If the trains-server version is >= 0.16 then you have to add the "Args/" prefix. If you are running an older version, you should not add any prefix.
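For example, when overriding an argparse argument on a cloned task (a minimal sketch; the project/task/queue names and the "lr" argument are placeholders):
```python
from clearml import Task

template = Task.get_task(project_name="examples", task_name="train")  # placeholders
cloned = Task.clone(source_task=template, name="train lr=0.01")

# trains-server >= 0.16: argparse parameters live in the "Args" section,
# so the key needs the "Args/" prefix; on older servers use plain "lr"
cloned.set_parameters({"Args/lr": 0.01})

Task.enqueue(task=cloned, queue_name="default")
```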
Thanks SmallDeer34, I think you are correct: the 'output' model is returned properly, but 'input' models are returned as model names, not model objects.
Let me check something
if the first task failed - then the remaining tasks are not scheduled for execution, which is what I expect.
agreed
I'm just surprised that if the first task is aborted instead by the user,
How is that different from failed? The assumption is that if a component depends on another one it needs its output; if it does not, then they can run in parallel. What am I missing?
PompousBeetle71 cool, next RC will have the argparse exclusion feature :)
IrateBee40 I think I have an idea what's wrong, https
could it be there is some firewall in the middle intercepting the network, and without installing the SSL certificate the SSL connection is failing?
EcstaticGoat95 I can see the experiment but I cannot access the notebook (I get "Binder inaccessible")
Is this the exact script as here? https://clearml.slack.com/archives/CTK20V944/p1636536308385700?thread_ts=1634910855.059900&cid=CTK20V944
If this is GitHub/GitLab/Bitbucket what I'm thinking is just a link opening an iframe / tab with the exact entry point script / commit.
What do you think?
ReassuredTiger98 All that said, how about opening an Issue on GitHub (feature request)? if we get a bit of support from users, we could definitely add it
task._wait_for_repo_detection()
You can use the above, to wait until repository & packages are detected
(If this is something users need, we should probably make it a "public function" )
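i.e. something along these lines (a minimal sketch; note the leading underscore, it is not an official public API at the moment):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="repo-detection")  # placeholders

# repository / package detection runs in the background;
# this blocks until it has finished
task._wait_for_repo_detection()

# from here on the task already holds the detected repository and package list
```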
Hi @<1533982060639686656:profile|AdorableSeaurchin58>
Notice the scalars and console logs are stored in the Elasticsearch DB, this is usually under /opt/clearml/data/elastic_7
however if I want multiple machines syncing with the optimizer, for pulling the sampled hyper parameters and reporting results, I can't see how it would work
I have to admit, this is where I'm losing you.
I thought you wanted to avoid the agent, since you wanted to run everything locally, wasn't that the issue ?
Maybe there is some background missing here, let me see if I can explain how the optimizer works.
In your actual training code you have something like: `params = {'lr': 0.3, ...`
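...and the pattern continues roughly like this (a minimal sketch, not your exact code; train() and the parameter values are placeholders): you connect the dict to the task, and when the optimizer launches a trial the connected values are overridden with the sampled ones before your code reads them.
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="trainable")  # placeholders

# defaults used when running the script by itself
params = {"lr": 0.3, "batch_size": 64}

# connect() registers the dict as hyperparameters; when the optimizer
# (or anyone cloning the task) enqueues a trial, the stored values are
# injected back into this dict at runtime
params = task.connect(params)

train(lr=params["lr"], batch_size=params["batch_size"])  # train() is assumed to exist
```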
So just to be clear - the file server has nothing to do with the storage?
Think of it as a quick and dirty "minio", storing files and serving them over http. If you have minio (or any object storage) you can replace it altogether 🙂
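For example, you can point a task's uploads at your own object storage instead of the built-in fileserver (a minimal sketch; the MinIO/S3 URI is a placeholder):
```python
from clearml import Task

task = Task.init(
    project_name="examples",            # placeholders
    task_name="custom-storage",
    # artifacts / models go to your own object storage (S3 / GCS / Azure / MinIO)
    # instead of the built-in fileserver
    output_uri="s3://my-minio:9000/clearml-artifacts",
)
```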
, is the team open to PRs from external people?
Yes please do! PRs are welcomed! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do 🙂
PanickyMoth78 thank you for the mock code, I can verify it reproduces the issue. It seems that for some reason (bug) when the same function is called multiple times it "collects" parents, hence the odd graph.
BTW: if you want to see exactly what is passed to the step you can press on the step's full_details, and see the hyperparameter section.
I'll make sure we fix this bug in the next RC.
I started running it again and it seems to have passed the phase where it failed last time
Yey!
Yes it is a common case....
I have the feeling ShinyLobster84 WackyRabbit7 you are not alone in this one 🙂 let me make sure we change the default value to False, so the code looks cleaner
set the following:
CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is that the agent will automatically mount the .ssh of the host into the container, so that if you are using SSH to clone git you have credentials; in your case it also mounts the configuration, hence the failure to log in.
I will make sure we add it to the configuration file, so it is more visible
Glad to hear that! 🙂
Oh that is odd... let me check something
One more question: in the second log the trains agent is configured with Conda, while in the first it is configured with pip, or at least that is what it looks like. Can you confirm?
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said, notice that the mutable copy is fetched to a target folder you specify; in that case you should definitely delete it after usage. Wdyt?
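Something like this (a minimal sketch; the dataset ID, target folder and process() are placeholders):
```python
import shutil
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")                      # placeholder
local_path = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")

process(local_path)   # your own code, assumed to exist

# this copy lives outside the managed cache, so clean it up yourself
shutil.rmtree(local_path)
```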