
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay, this points to a migration issue from 0.17 to 1.0
First try upgrading to 1.0, then to 1.0.2
(I would also upgrade a single apiserver instance first; once it is done, you can spin up the rest)
Make sense?
The --template-yaml allows you to use a full k8s YAML template (the overrides file is just overrides, which do not include most of the configuration options). We should probably deprecate it.
Hi ConvolutedChicken69
but when running the script it only clones the repo the clearml task is on, how can it get the other repo also?
Do you have a wheel or a git repo you can install it from?
Hi SmallDeer34
ClearML automagical logging will work on the current python process. But in your example your Bash is running another python script (that has nothing to do with the original notebook), hence clearml automagic is not aware of it (i.e. it cannot "patch" the tensorboard calls).
In order to make it work, you should do something like:
from joeynmt import train
train.main(...)
Or something similar 🙂
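For instance, a rough sketch of what that could look like from inside the notebook (the exact joeynmt entry point and its arguments are assumptions here, and the config path is just a placeholder):
from clearml import Task
from joeynmt import train

task = Task.init(project_name='examples', task_name='joeynmt training')
# run the training in the same python process, so the automagic
# can patch the tensorboard calls joeynmt makes
train.main('configs/my_config.yaml')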
Make sense?
if you have an automation process, then you should have the Task object, no?
then you have task.id
What am I missing here?
Can you send the full log as an attachment?
task = Task.get_task(project_name='project', task_name='best_model_ever')
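i.e. something along these lines (the project/task names are placeholders):
from clearml import Task

# fetch the existing task and use its id in the automation flow
task = Task.get_task(project_name='project', task_name='best_model_ever')
print(task.id)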
GreasyPenguin14 I think the default is reporting on failed tasks only? Could that be?
JitteryCoyote63
IAM role to the web app could access
You mean the web client key/secret to access S3 data?
Can I change the parameters before executing the draft task
Yes you can, after you clone the experiment everything becomes editable, so you can edit the config in the UI.
For example, let's assume I have config.yml, and in my code I do:
my_file = task.connect_configuration('config.yml')
with open(my_file, 'rt') as f:
    ...
Then after I clone it in the UI and edit the configuration, when it is executed remotely, my_file
will contain the content of the configuration as s...
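Putting it together, a minimal sketch (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='config demo')

# locally this returns 'config.yml'; when executed remotely it returns
# a local copy holding whatever configuration was edited in the UI
my_file = task.connect_configuration('config.yml')
with open(my_file, 'rt') as f:
    config = f.read()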
Hi @<1601386194774528000:profile|AmusedPanda8>
I think the project name is ./model_training/trained_models/yolov8n-TEST_OKTODELETE/
and for some reason you have "." as a project name?
(notice nested projects are automatically created based on the project name with '/' as separator)
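For example, a project name like this (the names here are just placeholders) would show up as a nested project in the UI:
from clearml import Task

# "model_training/trained_models" creates the nested project structure
task = Task.init(
    project_name='model_training/trained_models',
    task_name='yolov8n_test',
)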
That is awesome!
If you feel like writing a bit about the use-case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂
I can't find out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
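For example, something like this in the relevant service of your docker-compose.yml (the host path and service name here are just placeholders, a sketch rather than the exact ClearML compose file):
services:
  agent-services:
    volumes:
      # map the host clearml.conf into the container
      - /opt/clearml/config/clearml.conf:/root/clearml.conf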
BroadMole98 as one can expect, a long answer as well 🙂
I have a workflow with 19000 job nodes in it.
wow, 19k job nodes? As in a single pipeline with 19k steps?
The main idea of the trains-agent is to allow multi-node workloads, to create pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), and to have a proper scheduler with priorities (which is missing from k8s)
If the first step is just "logging" all the steps, you can easily add "Task...
Hi BroadMole98 ,
what's the current setup you have? And how do you launch jobs to Snakemake?
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager); pipelines are also...
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes you are correct. Kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) for you. The missing portions (for both) are:
How do I cr...
BroadMole98 Awesome, can't wait for your findings 🙂
Is there a way to do this using ssh keys?
the .ssh of the host machine should be automatically mounted, you can force it by setting force_git_ssh_protocol: true
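i.e. in the agent section of clearml.conf, something like:
agent {
    # force cloning/fetching repositories over SSH (using the mounted ~/.ssh keys)
    # instead of HTTPS
    force_git_ssh_protocol: true
}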
It is still not working for me. Are you using Linux, Windows or macOS?
Should work for Linux, Mac and Windows, what are you using?
Is this still an issue? (If you provide a queue name, the default tag is not used, so no error should be printed.)
Actually this should be a flag
I have to specify the full uri path?
No, it should be something like "s3://bucket"
the model files management is not fully managed like for the datasets?
They are 🙂
This would be my only improvement, otherwise awesome!!!
output_model.update_weights(weights_filename=os.path.join(training_data_path, 'runs', 'train', 'yolov5s6_results', 'weights', 'best.onnx'))
do I still need to specify an OutputModel
No need, only if you want to upload a local model file (but I assume in this case, no new model is created)
BTW: what happens if you pass the same s3://bucket to Task.init output_uri? I assume you are getting the same access issue?
Could it be it checks the root target folder and you do not have permissions there, only on subfolders?
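i.e. something like (the bucket name is just a placeholder):
from clearml import Task

# all output models/artifacts are uploaded to this bucket by default
task = Task.init(
    project_name='examples',
    task_name='train',
    output_uri='s3://bucket',
)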
well from 2 to 30sec is a factor of 15, I think this is a good start 🙂
OutrageousGrasshopper93
tensorflow-gpu is not needed, it will convert tensorflow to tensorflow-gpu based on the detected cuda version (you can see it in the summary configuration when the experiment spins inside the docker)
How can I set the base python version for the newly created conda env?
You mean inside the docker?