Reputation
Badges 1
25 × Eureka!I changed them to the one exposed to the users (the same I have in my local clearml.conf) and things work.
Nice!
But I can't really figure out why that would be the case...
So the thing is, the link to the files are generated by the clients, which means the actual code generated a link an internal link to the file server (i.e. a link that only works inside the k8s cluster). When you wanted to see the image/plot you were accessing it from outside the cluster, and the link simply ...
Hi @<1533257411639382016:profile|RobustRat47>
sorry for the delay,
Hi when we try and sign up a user with github.
wait, where are you getting this link?
task.mark_completed()
You have that at the bottom of the script, never call it on yourself, it will kill the actual process.
So what is going on you are marking your own process for termination, then it terminates itself leaving the interpreter and this is the reason for the errors you are seeing
The idea of mark_* is to mark an external Task, forcefully.
By just completing your process with exit code (0) (i.e. no error) the Task will be marked as completed anyhow, no need to call...
Yes, could you send the full log? screen grab ?
AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifacts is pandas and you forgot to either import before or add as requirement for the Task π
This is causing the artifact .get()
method to revert to returning the local path to the artifact, instead of actually de-serializing
(We should print a warning though, I'll make sure we do π )
EDIT: basically clearml failed to realize you also need pandas because it was never imported ...
wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
Yes, c...
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay this point to a migration issue from 0.17 to 1.0
First try to upgrade to 1.0 then to 1.0.2
(I would also upgrade a single apiserver instance, once it is done, then you can spin the rest)
Make sense ?
Can you share the modified help/yaml ?
Did you run any specific migration script after the upgrade ?
How many apiserver instances do you have ?
How did you configure the elastic container? is it booting?
In order to facilitate the multiple credentials one must use the Clearml SDK obviously.
Yes π
WickedGoat98 if this is the case, you can check this example. Same idea only "manual":
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
and if you add --skip-task-init
?
I think what happens is that the clearml-Task, adds a Task.init
call without the output_uri
that is called before "your" Task.init, and this is what causes it to be ignored. Could that be the case?
because it should have detected it...
Did you see "Repository and package analysis timed out ..."
SmilingFrog76
there is no internal scheduler in Trains
So obviously there is a scheduler built into Trains, this is the queues (order / priority)
What is missing from it is multi node connection, e.g. I need two agents running the exact same job working together.
(as opposed to, I have two jobs, execute them separately when a resource is available)
Actually my suggestion was to add a SLURM integration, like we did with k8s (I'm not suggesting Kubernetes as a solution for you, the op...
is how you would create different queues,
SarcasticSquirrel56 you can create them from the UI, when the server is already running
(if you are saying, how do I create them in the first installaiton, then yes you are correct, this is possible in the helm chart, I think π )
Hi UnsightlyLion90
from my understanding agent do the job of SLURM,
That is kind of correct (they overlap in some ways π )
Any guide of how to integrate both of them?
The easiest way is to just add the "Task.init()" call to your code, and use SLURM to schedule the job. this will make sure all jobs are fully logged (this can also includes automatically uploading the models, and artifact support etc)
Full SLURM support (i.e. similar to the k8s glue support), is currently ou...
Hi MysteriousBee56 , do you have Trains installed from the git?
Another question, you mentioned "it breaks my execution", I'm assuming you mean trains-agent?!
If that is the case, there is a fix for trains-agent install 0.15.2rc0
Thanks EnviousStarfish54
Let me check if I can reproduce it
MysteriousBee56
Well we don't want to ask sudo permission automatically, and usually setups do no change, but you can diffidently call this one before running the agent πsudo chmod 777 -R ~/.trains/
PanickyMoth78
Is it limited to
accounts? (
unfortunately, yes π , but I'm sure sales will be able to hook you up ...
Hi SubstantialElk6
clearml-agent was just updated, it should solve the issue.2. Notice that "torch" / "torchvision" packages are resolved by the agent based on the pytorch compatibility table. Is there a way to reproduce the issue where it fails resolving the torch version? could you send a full log?
3. If you want a specific torch version , you can put a direct link to the torch wheel, for example: https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl
docstring ?
Usually the preferred way is StorageManager
https://clear.ml/docs/latest/docs/references/sdk/storage
https://clear.ml/docs/latest/docs/integrations/storage
Hi UnsightlySeagull42
How can I reproduce this behavior ?
Are you getting all the console logs ?
Is it only the Tensorboard that is missing ?
Yeah the doctring is always the most updated π
AntsyElk37
and when i try to use --output-uri i can't pass true because obviously i can't pass a boolean only strings
hmm, that sounds right, I think we should fix that so when using --output-uri true
the value that is passed is actually True, not the string "true".
Regrading the issue itself:
are you saying --skip-task-init
is being ignored ? and it always adds the Task.init call? you can also pass --output-uri
https://files.clear.ml (which is the same as True) ,...
ok, yes, but this will install the package of the branch specified there.
Correct
So If im working on my own branch and want to run an experiment, I would have to manually put in the git path my current branch name.
When you say your own branch you mean local (i.e. not pushed to remote git repo) ?
Hi AttractiveCockroach17
In your "Installed Packages" (when the task is in draft mode, you can edit it like any requirements.txt), you need to add:package @ git+
You can also make sure you have in in the first place bu addingTask.add_requirements("package", "@ git+
") task = Task.init(...)
Let say I donβt have the data on my local machine but only S3 bucket.
You can still register it, but make sure you do not delete it from the S3 bucket because it will keep a link to it
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known')': /
what did you put in output_uri
?
StorageHelper is used internally.
I'll make sure we remove it from the examples/docs