BTW: in your code, you should probably replace
dataset_task = Task.get_task(task_id=dataset.id)
with:
dataset_task = dataset._task
Any chance you actually run the second script with Popen (i.e. calling the python as a subprocess) ?
we have a separate cache
Why? they can share
is there a built in programmatic way to adjust development.default_output_uri?
How about: In your Task.init(output_uri='...')
I'll try to go with this option, I think it's actually perfect for my needs
Great!
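A minimal sketch of the Task.init(output_uri='...') option mentioned above. The helper name task_init_kwargs, the MY_OUTPUT_URI environment variable, and the bucket URL are all hypothetical and only here for illustration; the one thing taken from the thread is that Task.init accepts an output_uri argument.

```python
import os


def task_init_kwargs(project_name, task_name, default_output_uri=None):
    """Build keyword arguments for Task.init, choosing output_uri in code.

    MY_OUTPUT_URI is a hypothetical environment variable (an assumption for
    this sketch); if it is unset, default_output_uri is used instead.
    """
    output_uri = os.environ.get("MY_OUTPUT_URI", default_output_uri)
    kwargs = {"project_name": project_name, "task_name": task_name}
    if output_uri:
        # Only pass output_uri when one was actually chosen, so the
        # configured default still applies otherwise.
        kwargs["output_uri"] = output_uri
    return kwargs


# Usage (needs the clearml package and a configured server):
# from clearml import Task
# task = Task.init(**task_init_kwargs("examples", "train", "s3://my-bucket/models"))
```

Passing the value per Task this way avoids touching the shared configuration file, which is probably why it fits a per-script need.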
Can you test with the credentials also in the global section
key: "************"
secret: "********************"
Also what's the clearml python package version?
In the documentation it warns about .close():
"Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough, this means you do not need to automatically Add/Log/Track things into the Task in the current process.
This does Not mean you cannot access the Task or its artifacts
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
Hi @<1541229812243238912:profile|PoisedMoth54>
We should probably add a better interface (please feel free to open a github issue on the interface) until then
dataset._task.connect_configuration(configuration="path/to/file", name="my config")
So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
Yeah I think that for some reason it fails to detect that this is actually a Jupyter notebook (not really sure why). Thank you for double checking on the container!!
Hi @<1643423185791619072:profile|DashingCentipede5>
Notice that you called "start_locally", it tries to run the code locally inside your Jupyter notebook, it assumes everything including code already exists, is that your case?
- In a notebook, create a method and decorate it with fastai.script's @call_parse.
Any chance you have a very simple code/notebook to reference (this will really help in fixing the issue)?
Any chance your code needs more than the main script, but it is Not in a git repo? Because the agent supports either single script file, or a git repo with multiple files
Hi @<1575656665519230976:profile|SkinnyBat30>
Streamlit apps are backend run (i.e. the python code drives the actual web app)
This means running your Task's code and exposing the Streamlit web app (i.e. over HTTP).
This is fully supported with ClearML, but unfortunately only in the paid tiers.
You can however run your Task with an agent, make sure the agent's machine is accessible and report the full IP+URL as a hyper-parameter or property, and then use that to access your streaml...
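A rough sketch of the "report the address as a hyper-parameter" idea, assuming the task object exposes set_parameter(name, value) as ClearML Task objects do. The helper name report_app_url and the parameter name "General/app_url" are hypothetical; 8501 is Streamlit's default port.

```python
def report_app_url(task, host, port=8501, path=""):
    """Store the web app address on the task so it is visible in the UI.

    `task` is expected to be a ClearML Task (anything with a compatible
    set_parameter method works). "General/app_url" is a made-up parameter
    name chosen for this sketch.
    """
    url = "http://{}:{}/{}".format(host, port, path.lstrip("/"))
    task.set_parameter("General/app_url", url)
    return url


# Usage on the agent's machine (needs clearml installed and configured):
# from clearml import Task
# task = Task.init(project_name="apps", task_name="streamlit demo")
# report_app_url(task, host="10.0.0.5")
```

The host is passed in explicitly because the address that is reachable from your browser is not necessarily the one the machine reports for itself.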
and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a User spins the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...
` task = Task.init(...)
# assume model checkpoint
if task.models['output']:
    # get the latest checkpoint
    model_file_or_path = task.models['output'][-1].get_local_copy()
    # load the model checkpoint
# run training code `
RoughTiger69 Would the above work for you?
Ohh sorry I missed that and answered on the original message, nvm, all is well now
I think the ClearmlLogger is kind of deprecated ...
Basically all you need is Task.init at the beginning; the default tensorboard logger will be caught by clearml
(Also can you share the clearml.conf, without actual creds)
The problem comes from ClearML thinking it starts from iteration 420 and then adding the iteration number again (421), so it starts logging from 420+421=841
JitteryCoyote63 Is this the issue ?
After you call task.set_initial_iteration(0), what do you get with task.get_initial_iteration()? Is it 0?
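The arithmetic behind the 841 can be written out as a tiny model. This is only an illustration of the offset being added to every reported iteration, not ClearML's actual implementation; the function name is hypothetical.

```python
def reported_iteration(initial_iteration, local_iteration):
    """Model of the behavior described above: the stored initial
    iteration is added as an offset to each iteration the code reports."""
    return initial_iteration + local_iteration


# Continued run: the offset is 420 and the training loop itself also
# resumes counting at 421, so the offset is applied twice.
print(reported_iteration(420, 421))  # 841

# With the offset reset to 0, the loop's own counter is used as-is.
print(reported_iteration(0, 421))  # 421
```

This is why checking that task.get_initial_iteration() returns 0 after the reset is a useful first test.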
CourageousLizard33 Are you using the docker-compose to set up the trains-server?
Hi ZippyAlligator65
You mean like env vars?
Ohh yes, if you deleted the token then you have to recreate the clearml.conf
BTW: no need to generate a token, it will last
Hi AbruptWorm50
The second "epoch loss" is the scalar for the "validation" process (see the "validation: epoch loss" series; the "validation" part is actually the TF file/folder prefix, added automatically)
Makes sense?
How can I make it such that any update to the upstream database
What do you mean "upstream database"?
Hi DeliciousBluewhale87
I think you are correct, there is no way to pass it.
As TimelyPenguin76 mentioned you can either set a default output_uri on the agent's config file, or edit the created Task in the UI.
What is the specific use case ? Maybe we should add this ability, wdyt?
AttractiveCockroach17
Can you print the configuration to console when you start the run (you will get a local print and then later the remote print), are they the same? Are the 3 runs the same (local / remote print)?
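One way to make the local and remote prints easy to compare, sketched under the assumption that the configuration is available as a plain dict. The helper names and the short hash are choices made for this sketch and are not part of ClearML.

```python
import hashlib
import json


def config_fingerprint(config):
    """Return (pretty_text, short_hash) for a configuration dict.

    Keys are sorted so the same values always give the same text,
    regardless of the order they were inserted in.
    """
    text = json.dumps(config, sort_keys=True, indent=2, default=str)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return text, digest


def print_config(config, label):
    """Print the configuration with a label and a hash for quick diffing."""
    text, digest = config_fingerprint(config)
    print("[{}] config sha256={}".format(label, digest))
    print(text)
    return digest


# Call once at the start of the run; the same line appears in the local
# console and later in the remote console log.
print_config({"lr": 0.001, "batch_size": 32}, label="run start")
```

If the hashes in the local and remote logs differ, the full printed text shows which value changed.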