Hi CheerfulGorilla72
see
Notice that all posts in that channel are @channel 🙂
I've tried setting up a ClearML application on OpenShift
First, my condolences 🙂 OpenShift ...
Second, you need to make sure that each container (i.e. ELK/Mongo etc.) has its own PV for persistent storage. I'm assuming this is the root cause of the error.
Does that make sense?
Okay, I'll pass to front-end, see what they can do about it.
The issue itself is changing the default user:
```
USER appuser
WORKDIR /home/appuser
```
Any reason for it?
I am creating this user
Please explain; I think this is the culprit...
Are you inheriting from their Dockerfile?
Do you mean it recently became part of the enterprise version?
I do not think so, but it seems the support in the open-source version is more like a PoC:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Hi RipeGoose2
Just to clarify, the issue with the HTML stuck in the cache is a UI thing: basically the webapp needs to tell the browser not to cache the artifacts. It has nothing to do with how the artifacts are created.
Regardless, we love improvements, so feel free to mess around with the code and PR once you get something useful 😉
Specifically, this is where the HTML conversion happens:
https://github.com/allegroai/clearml/blob/9d108d855f784e1fe7f5691d3b7bf3be64576218/clearml/backend_in...
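Since the fix belongs on the serving side, here is a minimal sketch of what "telling the browser not to cache" means, using Python's standard http.server purely for illustration (this is an assumption, not the ClearML webapp or fileserver code): every response carries headers that forbid reusing a stored copy, so a regenerated HTML artifact is always fetched fresh.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Static file handler that tells browsers never to reuse a cached copy."""

    def end_headers(self) -> None:
        # Added to every response (including errors) just before the header block closes.
        self.send_header("Cache-Control", "no-cache, no-store, must-revalidate")
        self.send_header("Pragma", "no-cache")  # for HTTP/1.0 caches
        self.send_header("Expires", "0")        # for proxies that ignore Cache-Control
        super().end_headers()


def make_server(port: int = 0) -> HTTPServer:
    # port=0 lets the OS pick a free port; files are served from the working directory.
    return HTTPServer(("127.0.0.1", port), NoCacheHandler)
```

Running make_server().serve_forever() from an artifacts directory would serve it with caching disabled; the webapp itself would need the equivalent headers on its artifact responses.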
Could be nice to write some automation
I call Task.init after I import tensorflow (and thus tensorboard?)
That should have worked...
Can you manually add a TB report before calling the opennmt function?
(I want to verify that Task.init is indeed catching the TB calls; my theory is that somewhere inside opennmt we lose the TB reports.)
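A minimal sketch of such a manual report, assuming TensorFlow 2's tf.summary API; the function name and log directory are hypothetical. Call it after Task.init and right before the opennmt function: if the scalar shows up in the Task's scalars, TB capture works and the reports are being lost inside opennmt.

```python
def report_sanity_scalar(logdir: str = "runs/tb_sanity") -> None:
    """Write a single scalar through TensorBoard's TF2 summary API.

    Hypothetical helper. Intended order:
        task = Task.init(...)        # first
        report_sanity_scalar()       # then this manual TB report
        # ...then call the opennmt entry point
    """
    import tensorflow as tf  # TF2 assumed; imported lazily so defining this is cheap

    writer = tf.summary.create_file_writer(logdir)
    with writer.as_default():
        tf.summary.scalar("sanity/manual_report", 1.0, step=0)
    writer.flush()
```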
For local testing, we have added a
ScantChimpanzee51, there is already an environment variable for that; you can just set CLEARML_OFFLINE_MODE
🙂
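For local testing without touching the code, a sketch that launches a script with the variable set only for the child process. The value "1" and the script path are assumptions for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with CLEARML_OFFLINE_MODE switched on (value "1" assumed)."""
    env = dict(os.environ if base is None else base)
    env["CLEARML_OFFLINE_MODE"] = "1"
    return env


def run_offline(script: str) -> int:
    # Only the child process sees the variable; the caller's environment is untouched.
    return subprocess.run([sys.executable, script], env=offline_env()).returncode
```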
By the way, if we don’t wrap other calls in is_offline(), we get errors like “DateTime object is not serializable”, but that’s a secondary issue.
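A sketch of wrapping backend-only calls so they become no-ops in offline mode. In ClearML the predicate would be Task.is_offline; the decorator name is hypothetical, and which calls need wrapping is left to you.

```python
import functools
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def skip_when_offline(is_offline: Callable[[], bool]):
    """Decorator factory (hypothetical): wrapped calls return None while offline."""

    def decorator(fn: Callable[..., T]) -> Callable[..., Optional[T]]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> Optional[T]:
            if is_offline():  # evaluated per call, so toggling the mode later is respected
                return None
            return fn(*args, **kwargs)

        return wrapper

    return decorator

# Usage with ClearML (not executed here):
#   @skip_when_offline(Task.is_offline)
#   def push_summary(...): ...
```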
I think this was fixed; can you verify with the latest RC 1.7.3rc0? If this still happens, can you share the code?
However, this results in the process getti...
It might be broken for me: as I said, the program works without offline mode, but with offline mode it gets interrupted and shows the results from above.
How could I reproduce this issue?
But of course there might be another issue in between. Any idea how to debug it?
I think I missed this one; what exactly is the issue?
If the only issue is this line: task.execute_remotely(..., exit_process=True)
It has to finish the static analysis of the entire repository (which usually happens in the background, but now we have to wait for it). If the repo is large, this could actually take 20 seconds (depending on the CPU/drive of the machine itself).
We are using k8s glue to spawn the job. ...
I think this is actual network latency and has nothing to do with the jobs. Could it be that the server is very far away?
What happens when you manually start a Task from your machine ?
Is the latency fixed? Is it just when starting a new Task?
Wait I might be completely off.
Is it this line that "hangs"?
task.execute_remotely(..., exit_process=True)
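To tell a slow execute_remotely (e.g. waiting for the repository analysis) apart from network latency, a small timing sketch may help. It is a generic helper, not a ClearML feature, and the queue name in the comment is hypothetical. Whether the closing line prints depends on how execute_remotely ends the process: the finally block runs on SystemExit but not on a hard os._exit().

```python
import time
from contextlib import contextmanager
from typing import Callable


@contextmanager
def log_duration(label: str, log: Callable[[str], None] = print):
    """Log when a block starts and how long it ran, even if it raises SystemExit."""
    start = time.perf_counter()
    log(f"[{label}] started at {time.strftime('%H:%M:%S')}")
    try:
        yield
    finally:
        # Runs on normal exit, exceptions and SystemExit; not on os._exit().
        log(f"[{label}] finished after {time.perf_counter() - start:.1f}s")

# Usage (not executed here):
#   with log_duration("execute_remotely"):
#       task.execute_remotely(queue_name="default", exit_process=True)
```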
Hi Guys,
I hear you, and I know this is planned, but it was probably bumped down in priority.
I know the main issue is the "Execution" tab comparison; the rest is not a problem.
Maybe a quick hack: only compare the first 10 in the Execution tab, and remove the limit on the others? (The main issue with the Execution tab is the git-diff / installed-packages comparison, which is quite taxing on the FE.)
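A sketch of the proposed hack, only to pin down the logic; all names are hypothetical, and the real comparison view lives in the web UI rather than in Python.

```python
from typing import List, NamedTuple, Sequence


class ComparisonPlan(NamedTuple):
    execution_ids: List[str]   # git diff / installed packages: expensive, so capped
    other_ids: List[str]       # every other tab: no limit in this proposal
    execution_truncated: bool  # lets the UI tell the user only N are shown


def plan_comparison(selected: Sequence[str], execution_limit: int = 10) -> ComparisonPlan:
    """Hypothetical: cap only the Execution tab, compare everything elsewhere."""
    ids = list(selected)
    return ComparisonPlan(ids[:execution_limit], ids, len(ids) > execution_limit)
```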
Thoughts ?
PompousParrot44 unfortunately not yet 😞
But the gist is:
MongoDB stores experiment data (i.e. execution parameters, git ref etc.)
Elasticsearch stores results (i.e. metrics, console logs, debug image links etc.)
Does that help?
We are always looking for additional talented people 😉 DM me...
So I wonder: why should an agent be tied to a specific user's credentials? Is the right way to go about this to create a "fake user" for the sake of the agent?
Very true, you have to have credentials for the trains-agent so it can "report" to the trains-server. That said, the creator of the Task (i.e. the person who cloned it) will be registered as the "user" in the UI.
I would recommend creating an "agent" user and putting its credentials on the trains-agent machine (the same way...
Hi @<1539055479878062080:profile|FranticLobster21>
hey, how do I use local files as dependencies?
You mean like a repository?
Can I specify in the task which local files I use that should be packaged?
In a git repo?
Basically, the agent can do one of two things: either replicate a single script, or clone a git repo plus its uncommitted changes.
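A rough sketch of the information the git-repo case depends on (remote URL, commit, and uncommitted diff), using plain git commands. This is an illustration, not how clearml-agent actually collects it; note that git diff HEAD covers staged and unstaged changes to tracked files, but not untracked files.

```python
import subprocess
from typing import Dict, Optional


def _git(repo: str, *args: str) -> Optional[str]:
    """Run a git command inside repo; None if git is missing or the command fails."""
    try:
        out = subprocess.run(["git", *args], cwd=repo, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout


def capture_git_state(repo: str) -> Optional[Dict[str, Optional[str]]]:
    """Remote, commit and uncommitted diff; None when repo is not a git work tree.

    None corresponds to the "single script" case.
    """
    if (_git(repo, "rev-parse", "--is-inside-work-tree") or "").strip() != "true":
        return None
    remote = _git(repo, "remote", "get-url", "origin")
    commit = _git(repo, "rev-parse", "HEAD")
    return {
        "remote": remote.strip() if remote else None,
        "commit": commit.strip() if commit else None,
        "diff": _git(repo, "diff", "HEAD") or "",  # tracked files only
    }
```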
It should be the last line (or nearly) of the log. Is it there? Also, from the log it seems that you are using trains 0.14.3; try with trains 0.15 and let me know if you are still missing packages.
CooperativeFox72, I would think the easiest would be to configure it globally in clearml.conf (rather than adding more arguments to the already packed Task.init) 🙂
I'm with you on 60 messages being way too much...
Could you open a GitHub issue on it so we do not forget?
Hi RipeGoose2
I just tested the hydra example; it seems to work when you set offline mode right after the import:
```python
from clearml import Task
Task.set_offline(True)
```
In regard to the YAML, how would you pass data? Like the pipeline-from-tasks example?
The imports are inside the functions because each function becomes a stand-alone job running on a remote machine, not the entire pipeline code. This also automatically picks the packages to be installed on the remote machine. Makes sense?
You mean to design the entire pipeline from YAML?
(this assumes your Tasks know how to process links to artifacts)
Is this what you are after?
(BTW: any reason for working with YAML files instead of coding it?)
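For the YAML-driven design, a sketch assuming the YAML has already been loaded into a dict (e.g. with yaml.safe_load) and that steps are listed parents-first. It only builds keyword arguments for PipelineController.add_step; the spec layout, project/task names and the General/dataset_url parameter are illustrative, loosely following the shape of the pipeline-from-tasks example.

```python
from typing import Any, Dict, List

# Hypothetical spec, as a YAML file might load. Names are illustrative.
SPEC: Dict[str, Any] = {
    "project": "examples",
    "steps": [
        {"name": "stage_data", "task": "pipeline step 1 dataset artifact"},
        {
            "name": "stage_process",
            "task": "pipeline step 2 process dataset",
            "parents": ["stage_data"],
            # The downstream Task receives a link to the upstream artifact.
            "parameters": {"General/dataset_url": "${stage_data.artifacts.dataset.url}"},
        },
    ],
}


def to_add_step_kwargs(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the spec into keyword-argument dicts for PipelineController.add_step()."""
    seen = set()
    steps = []
    for step in spec["steps"]:
        parents = step.get("parents", [])
        missing = [p for p in parents if p not in seen]
        if missing:  # parents must be declared earlier in the list
            raise ValueError(f"step {step['name']!r} has unknown or later parents {missing}")
        steps.append({
            "name": step["name"],
            "base_task_project": spec["project"],
            "base_task_name": step["task"],
            "parents": parents or None,
            "parameter_override": step.get("parameters", {}),
        })
        seen.add(step["name"])
    return steps
```

Each resulting dict could then be passed as pipe.add_step(**kwargs) on a PipelineController instance before calling pipe.start(); as noted above, this still assumes the Tasks know how to process the artifact links they receive.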