SmugDog62 so on plain vanilla Jupyter/Lab everything seems to work.
What do you think is different in your setup?
YEY!
Hmm BitterStarfish58, what's the error you are getting?
Any chance you are over the free tier quota ?
If it helps, you can override it on the clients with an OS environment CLEARML_FILES_HOST
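For example, a minimal way to set the override from a shell (the URL here is a placeholder; point it at your own files server):

```shell
# Override the files server configured in clearml.conf for this session.
# The URL is hypothetical; use the address of your own deployment.
export CLEARML_FILES_HOST="http://files.example.com:8081"
echo "$CLEARML_FILES_HOST"
```

Any client started from this shell will upload artifacts to that address instead of the one in its clearml.conf.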
Hi RoughTiger69
unfortunately, the model was serialized with a different module structure - it was originally placed in a (root) module called `model`
....
Is this like a pickle issue?
Unfortunately, this doesn't work inside clear.ml, since there is some mechanism that overrides the import mechanism using `import_bind.__patched_import3`.
What error are you getting? (meaning why isn't it working)
Hi AverageBee39
It seems the JSON is corrupted, could that be?
Yes, sorry, that wasn't clear.
Hi JuicyFox94
You pointed to exactly the issue.
In your trains.conf
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L94
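For reference, the server connection settings in trains.conf look roughly like the fragment below (the URLs are the default local-deployment values; adjust them to your own server):

```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}
```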
. I can't find any actual model files on the server though.
What do you mean? Do you see the specific models in the web UI? Is the link valid?
Are they ephemeral, or later used by other Tasks, executions, etc.?
For example: configuration files are specific to an execution, and someone will edit them.
Initial weights files are something that multiple executions might need, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions, tasks, etc.
It seems like you treat these files as "configurations", is that right?
What's the clearml version? Is this with the latest from GitHub?
We use an empty queue to enqueue our tasks in, just to trigger the scheduler.
Its only importance is that the experiment is not enqueued anywhere else; the trigger then enqueues it.
It's just that the trigger is never triggered
(except when a new task is created - this was not the case)
Is the trigger controller running on the services queue?
BeefyCow3 see this https://allegroai-trains.slack.com/archives/CTK20V944/p1593077204051100 :)
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct?
Can you see the changes in the OmegaConf section?
What happens when you pass --args overrides="['dataset.path=abcd']"?
Hi MistakenDragonfly51
Notice that Models are their own entity, you can query them based on tags/projects/names etc.
Querying and getting Models is done by Model class:
https://clear.ml/docs/latest/docs/references/sdk/model_model#modelquery_models
`task.get_models()` is always empty.
How come there are no Models on the Task? (in other words how come this is empty?)
Okay ConfusedPig65, I found the problem. For some reason the latest TF keras `load_model`/`save_model` is not tracked.
I'll make sure we push a fix later today
GiddyTurkey39
A flag would be really cool, just in case there's any problem with the package analysis.
Trying to think if this should be a system-wide flag (i.e. trains.conf) or a flag in Task.init.
What do you think?
It reverts back, but it cannot "delete" the last reported iteration value.
Make sense ?
This is not an S3 endpoint... what is the files server you configured for it?
LudicrousParrot69 there is already `Task.add_tags`:
https://github.com/allegroai/clearml/blob/2d561bf4b3598b61525511a1a5f72a9dba74953e/clearml/task.py#L964
Thanks @<1523701713440083968:profile|PanickyMoth78> for pinging, let me check if I can find something in the commit log, I think there was a fix there...
BTW: if you only need the git diff you can just copy it from the UI into a txt file and do:
`git apply <copied-diff.txt>`
DeliciousBluewhale87 you can try:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect('test_database')
sql_query = pd.read_sql_query('''
SELECT *
FROM products
''', conn)
sql_query.to_csv(...)
```
Hi IrateBee40
What do you have in your ~/clearml.conf?
Is it pointing to your clearml-server?
Basically it hooks into any torch.save function (monkey patching in realtime)
Hi WickedElephant66
Setting the pipeline controller with `pipeline_execution_queue` as None is actually launching the pipeline controller on your "dev" machine - not sure why you have two of them?
Of course, I used "localhost"
Do not use "localhost"; use your IP. Then it will be registered with a URL that points to the IP, and then it will work.
Hi, what is host?
The IP of the machine running the ClearML server.