and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a User spins the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...
Hmm, interesting use case. Why do you need to add the "--no-binary"?
Hi RoundMosquito25
This is a bit old but probably a good start:
https://clear.ml/blog/stacking-up-against-the-competition/
tl;dr
ClearML advantages (at least a few I can think of)
- Scales way better
- Enables out of the box experiment orchestration (i.e. remote execution etc.)
- Data management
- Nicer UI
- Full REST API
- Full MLOps platform
- Model serving
- Query-able model repository
Probably more.
I think my question is more about design, is a ModelPipeline class a self contained pipeline? (i.e. containing all the different steps or is it a single step in a pipeline)
You can always log it manually:
from clearml import InputModel
input_model = InputModel.import_model(weights_url='/tmp/keras_example/weight.6.hdf5')
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think that what you are looking for is multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS to all of those machines, configure the cache folder to point to it, and now you do not need to worry about affinity.
no?
Is there a way to group A and B into a sub-pipeline, h...
WickedGoat98 I suspect the main difference is that with GitHub you are cloning with https (i.e. no credentials needed), but with GitLab you are using SSH authentication to clone the repository. If you can "git clone" your repository on the machine running the trains-agent (i.e. from the command line), the trains-agent should be able to do the same (basically make sure you have the SSH keys in your ~/.ssh folder).
Are you testing the trains-agent service from (i.e. from the docker compose) o...
can we also put the path to the CA?
Yes :)
I think it would be nicer if the CLI had a subcommand to show the content of ~/.clearml_data.json.
Actually, it only stores the last dataset ID at the moment, so not much.
But maybe we should have a cmd line that just outputs the current dataset ID, which would make it easier to grab and pipe.
WDYT?
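A minimal sketch of what such a command could do, assuming the state file is plain JSON and keeps the last dataset ID under a key named "id". The key name and the helper read_last_dataset_id are assumptions for illustration, not the actual clearml-data implementation:

```python
import json
from pathlib import Path

# File location taken from the discussion above; the "id" key is a hypothetical name.
STATE_FILE = Path.home() / ".clearml_data.json"


def read_last_dataset_id(state_file=STATE_FILE, key="id"):
    """Return the stored dataset ID, or None if the file or the key is missing."""
    try:
        with open(state_file, "r", encoding="utf-8") as f:
            state = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(state, dict):
        return None
    return state.get(key)
```

Printing only the returned value (and nothing else) keeps the output easy to capture in a shell variable or pipe into another command.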
requirements specified with git repo
You mean the requirements.txt is inside the git repo? Or do you mean a link to the git repo as part of the requirements?
Can you also provide an example of the content, I think I have an idea
Thanks! I think I was able to locate the issue, but I wanted to verify.
yes I'm with you, we need to fix it asap
model_path/run_2022_07_20T22_11_15.209_0.zip , err: [Errno 28] No space left on device
Where was it running?
I take it that these files are also brought into the pipeline task's local disk?
Unless you changed the object, then no, they should not be downloaded (the "link" is passed)
SteadySeagull18 btw: in post-callback the node.job will be completed
because it is called after the Task is completed
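A minimal sketch of such a post-callback, following the callback(pipeline, node) shape used with PipelineController.add_step(post_execute_callback=...). The function name on_step_completed is made up, and the node is assumed to expose `name` and `executed` (the ID of the Task that ran):

```python
def on_step_completed(a_pipeline, a_node):
    """Post-execute callback: called only after the step's Task has completed."""
    # Assumption: the node carries its step name and the ID of the executed Task
    summary = "step '{}' completed, task id={}".format(a_node.name, a_node.executed)
    print(summary)
    return summary
```

Because the callback runs after completion, anything read from the node here reflects the finished step, not a running one.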
Hi @<1597762318140182528:profile|EnchantingPenguin77>
, but it seems like clearml always creates a virtual environment
Yes that's correct, but the new venv inside the container inherits from the system packages (so if nothing changes it does nothing)
Is there a way to have clearml-task automatically use the custom virtual environment that is already activated in my docker and run the scripts?
You can but the "correct" way to work with python and co...
I'm getting lot of bizarre errors running without a docker image attached
I think there is a mix in terminology
ClearML Agent can run in two different modes:
- virtual env - where it creates a new venv for every Task executed
- docker mode - where it spins a docker as the base environment, then inside the docker (in real time) it will fetch the code, install missing python packages etc.
There is no need to build a specific docker container, for example you can use the "python:3.10-bullseye" d...
The only workaround I can think of is:
series = series + 'IoU>X'
It doesn't look that bad.
I have to specify the full uri path?
No, it should be something like "s3://bucket"
the model files management is not fully managed like for the datasets?
They are.
The latest RC (0.17.5rc6) moved all logs into separate subprocess to improve speed with pytorch dataloaders
What about the epochs though? Is there a recommended number of epochs when you train on that new batch?
I'm assuming you are also using the "old" images ?
The main factor here is the ratio between the previously used data and the newly added data. You might also want to resample (i.e. train on more) new data vs old data. Make sense?
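One way to control that ratio is to build each epoch by sampling from the two pools separately. This is only a sketch under stated assumptions: build_epoch and new_fraction are illustrative names, and the right fraction depends on your data.

```python
import random


def build_epoch(old_samples, new_samples, new_fraction=0.5, epoch_size=None, seed=None):
    """Draw one epoch in which roughly `new_fraction` of the samples are new data."""
    if not 0.0 <= new_fraction <= 1.0:
        raise ValueError("new_fraction must be between 0 and 1")
    rng = random.Random(seed)
    if epoch_size is None:
        epoch_size = len(old_samples) + len(new_samples)
    n_new = round(epoch_size * new_fraction)
    n_old = epoch_size - n_new
    # Sample with replacement so a small new batch can be oversampled
    epoch = rng.choices(new_samples, k=n_new) + rng.choices(old_samples, k=n_old)
    rng.shuffle(epoch)
    return epoch
```

For example, with 90 old images and 10 new ones, new_fraction=0.5 makes the new images appear about as often as the old ones in every epoch instead of 10% of the time.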
GiganticTurtle0 is it just --stop that throws this error ?
btw: if you add --queue default to the command line I assume it will work. The thing is, without --queue it will look for any queue with the "default" tag on it, and since there are none, we get the error.
Regardless, that should not happen with --stop, I will make sure we fix it.
Just so we do not forget, can you please open an issue on clearml-agent github ?
I always have my notebooks in git repo but suddenly it's not running them correctly.
What do you mean?
Can I switch off git diff (change detection?)
Yes, Task.init(..., auto_connect_frameworks={"detect_repository": False})
As we use a custom CUDA image, we do not want this running on user login, and get ugly error messages about missing symlinks.
You can customize the startup bash script (running inside any container) here:
https://github.com/allegroai/clearml-agent/blob/bf07b7f76d3236c1118b81730c6d9718705a795a/docs/clearml.conf#L145
LackadaisicalOtter14 Would that help?
Hi @<1598487094601191424:profile|MysteriousCow84>
You should put it in the dedicated section:
None
I can but that is not a configuration we would want to run with in production
Agreed, I just want to isolate the issue. I think this is the bottom python interface missing some configuration or environment variables
GrotesqueOctopus42
The problem is that when I import some function from a file in another folder, that task doesn't catch the file's dependencies.
Just to be clear, if this is another file, you have to have all the files in the same git repo for the agent to actually be able to fetch them on the remote machine.
If you have a mix of notebooks and code, you have to have the local code in a git repo,
Make sense ?
Oh task_id is the Task ID of step 2.
Basically the idea is, you run your code once (let's call it debugging / programming), that run creates a task in the system, the task stores the environment definition and the arguments used. Then you can clone that Task and launch it on another machine using the Agent (that basically will set up the environment based on the Task definition and will run your code with the new arguments). The Pipeline is basically doing that for you (i.e. cloning a task chan...
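Done by hand, the clone-and-launch flow described above could look roughly like this. It is a sketch, not what the Pipeline does internally: clone_and_enqueue is a made-up helper, and the override keys (e.g. "Args/epochs") are assumptions that depend on how your Task stores its arguments.

```python
def clone_and_enqueue(template_task_id, queue_name, parameter_overrides=None):
    """Clone an existing Task, optionally override arguments, and enqueue the clone."""
    # Imported lazily so this module can be loaded where clearml is not installed
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    cloned = Task.clone(source_task=template, name="{} (clone)".format(template.name))
    if parameter_overrides:
        # Keys are assumed to look like "Args/epochs"; adjust to your Task's sections
        cloned.update_parameters(parameter_overrides)
    # An agent listening on `queue_name` picks the Task up and rebuilds the environment
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

Something like clone_and_enqueue(task_id, "default", {"Args/epochs": 5}) would then run the same code on whichever machine serves that queue, with the new argument value.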
So sharing with the agent is also not possible.
But they can see each others experiments, so why wouldn't the agent be able to have a read-only access ?
BTW:
ReassuredTiger98 you can put your user/pass into the git URL link, but I'm not sure this will solve the privacy issue.