Hi ClumsyElephant70
Is there a way to run all pipeline steps, not in isolation but consecutively in the same environment?
You mean as part of a real-time inference process ?
Hi @<1707565838988480512:profile|MeltedLizard16>
Maybe I'm missing something, but just add this to your YOLO code:
from clearml import Dataset
my_files_folder = Dataset.get("dataset_id_here").get_local_copy()
what am I missing?
C will be submitted to a different queue and I don't care as much
Is there a way to define "task affinity" in this way?
Hi RoughTiger69 ,
when you say Task affinity, you mean, I want C to be executed next to A/B? Affinity as a concept doesn't really exist, but it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one of the queues (in theory you might be able to programmatically control the queue of C), wdyt?
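A rough sketch of the queue-based idea above, assuming the agents that run A/B also pull from a second queue reserved for follow-up work. The queue names and the `pick_queue_for_c` helper are hypothetical; the only ClearML calls used are `Task.get_task` and `Task.enqueue`.

```python
def pick_queue_for_c(ab_queue, colocated_queues, fallback_queue):
    """Return the queue C should be pushed to, given the queue A/B ran from.

    colocated_queues maps the queue used by A/B to a queue served by the
    same agents (hypothetical naming, decided by whoever sets up the agents).
    """
    return colocated_queues.get(ab_queue, fallback_queue)


def enqueue_c(task_c_id, ab_queue, colocated_queues, fallback_queue="default"):
    # Imported lazily so the helper above can be used without a ClearML setup.
    from clearml import Task

    queue_name = pick_queue_for_c(ab_queue, colocated_queues, fallback_queue)
    task_c = Task.get_task(task_id=task_c_id)
    Task.enqueue(task_c, queue_name=queue_name)
    return queue_name
```

If no co-located queue is known for A/B, C simply falls back to the regular queue, which matches the "I don't care as much" case.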
Hi GrotesqueMonkey62 any chance you can be a bit more specific? Maybe a screen grab?
Here is how it works, if you look at an individual experiment scalars are grouped by title (i.e. multiple series on the same graph if they have the same title)
When comparing experiments, any unique combination of title/series will get its own graph, then the different series on the graph are the experiments themselves.
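To make the two grouping rules concrete, here is a small illustration in plain Python. It only mimics the rule described above (it is not the UI's actual code), and the function names are made up for the example.

```python
from collections import defaultdict


def group_single_experiment(scalars):
    """scalars: iterable of (title, series) -> {title: [series, ...]}.

    One graph per title; every series with that title is a line on it.
    """
    graphs = defaultdict(list)
    for title, series in scalars:
        graphs[title].append(series)
    return dict(graphs)


def group_comparison(experiments):
    """experiments: {experiment_name: iterable of (title, series)}
    -> {(title, series): [experiment_name, ...]}.

    One graph per unique title/series pair; the lines are the experiments.
    """
    graphs = defaultdict(list)
    for name, scalars in experiments.items():
        for title, series in scalars:
            graphs[(title, series)].append(name)
    return dict(graphs)
```

So an experiment reporting loss/train and loss/val shows one "loss" graph on its own, but two graphs (loss/train and loss/val) in the comparison view.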
Where do you think the problem lies?
ShaggyHare67 could you send the console log that trains-agent outputs when you run it?
Now the trains-agent is running my code but it is unable to import trains
Do you have the package "trains" listed under "installed packages" in your experiment?
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay, this points to a migration issue from 0.17 to 1.0
First try to upgrade to 1.0 then to 1.0.2
(I would also upgrade a single apiserver instance first; once that is done, you can spin up the rest)
Make sense ?
Okay I think I know what's going on (there is a race that for some reason on CoLab acts differently).
As a quick hack you can do the following:
Task._report_subprocess_enabled = False
task = Task.init(...)
task.set_initial_iteration(0)
OK, I got it by modifying the .conf file and putting the credentials on node
Nice!
For visibility, after close inspection of API calls it turns out there was no work against the saas server, hence no data
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo, but this is the gist):
StorageManager.upload_file(
    local_file=str(separated_file_posix_path),
    remote_url=remote_file_path + separated_file_posix_path.relative_to(files_rgb).as_posix()
)
Notice that you need to provide the full upload URL (including path and file name to be used on your GS storage)
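A minimal sketch of building that full upload URL, assuming the remote layout should mirror the file's path relative to a local root folder. `build_remote_url` and `upload` are hypothetical helpers and the bucket name in the usage note is a placeholder; `StorageManager.upload_file` is the ClearML call.

```python
from pathlib import PurePosixPath


def build_remote_url(bucket_prefix, local_file, local_root):
    """Join a bucket prefix with the file's path relative to local_root."""
    relative = PurePosixPath(local_file).relative_to(PurePosixPath(local_root))
    return bucket_prefix.rstrip("/") + "/" + relative.as_posix()


def upload(local_file, local_root, bucket_prefix):
    # Lazy import: ClearML is only needed when actually uploading.
    from clearml import StorageManager

    remote_url = build_remote_url(bucket_prefix, local_file, local_root)
    StorageManager.upload_file(local_file=str(local_file), remote_url=remote_url)
    return remote_url
```

For example, `build_remote_url("gs://my-bucket/data/", "/tmp/files_rgb/a/b.png", "/tmp/files_rgb")` gives `gs://my-bucket/data/a/b.png`, i.e. the URL includes both the path and the file name.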
we also provide a custom aux-config file. We also had to make sure to update the name inside config.pbtxt so that Triton is happy:
Good point, what would be the logic of the auto "config.pbtxt" patching we should employ ?
I would clone the first experiment, then in the cloned experiment, I would change the initial weights (assuming there is a parameter storing that) to point to the latest checkpoint, i.e. provide the full path/link. Then enqueue it for execution. The downside is that the iteration counter will start from 0 and not the previous run.
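The same flow could be scripted roughly as below. This is a sketch under assumptions: the parameter name `Args/initial_weights` is hypothetical (use whatever parameter your script actually stores the weights path in), and the `task_cls` argument exists only so the function can be tried with a stand-in class; by default it uses `clearml.Task`.

```python
def resume_from_checkpoint(source_task_id, checkpoint_url, queue_name,
                           weights_param="Args/initial_weights", task_cls=None):
    """Clone a task, point its initial-weights parameter at a checkpoint, enqueue it."""
    if task_cls is None:
        # Lazy import so the function can be exercised without a ClearML setup.
        from clearml import Task as task_cls

    cloned = task_cls.clone(source_task=source_task_id, name="resume from checkpoint")
    # weights_param is an assumed name; provide the full path/link as the value.
    cloned.set_parameter(weights_param, checkpoint_url)
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

As noted above, the cloned run still starts its iteration counter from 0.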
Could you disable the Windows anti-virus/firewall and test?
EnviousStarfish54
and the 8 charts are actually identical
Are you plotting the same plot 8 times?
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) pipeline logic, i.e. the code that drives the DAG, and (2) pipeline components, e.g. model verification.
The pipeline logic (1), i.e. the code that creates the DAG and the tasks and enqueues them, will be running in the git actions context; this is the automation code. The pipeline components themselves (2), e.g. model verification, training, etc., are running using the clearml agents...
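A hedged sketch of that split, assuming a function-based pipeline. The project, pipeline, and queue names are placeholders and `verify_model` is a toy component; `PipelineController`, `add_function_step`, `set_default_execution_queue`, and `start_locally` are the ClearML pieces.

```python
def verify_model(accuracy, threshold=0.9):
    """Pipeline component (2): executed by a clearml-agent, not by the CI job."""
    return accuracy >= threshold


def run_pipeline_logic():
    """Pipeline logic (1): builds the DAG and enqueues the components."""
    # Lazy import so the component above stays importable without ClearML.
    from clearml import PipelineController

    pipe = PipelineController(
        name="model verification",  # placeholder names
        project="examples",
        version="1.0.0",
    )
    pipe.set_default_execution_queue("default")  # queue served by the agents
    pipe.add_function_step(
        name="verify_model",
        function=verify_model,
        function_kwargs={"accuracy": 0.95},
        function_return=["passed"],
    )
    # Run the controller in this process (e.g. the CI job); steps go to agents.
    pipe.start_locally(run_pipeline_steps_locally=False)
```

The automation job would call `run_pipeline_logic()`, while the body of `verify_model` only ever executes on whichever agent picks the step up from the queue.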
But I'm sure there is a cleaner way to proceed.
Maybe?!
path = task.get_output_destination().replace('file://', '', 1)
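A slightly more defensive variant of the same idea, as a sketch: it only strips the scheme when the destination really starts with `file://`, so remote destinations are returned untouched. The helper name is made up for the example.

```python
def local_path_from_destination(destination):
    """Strip a leading file:// scheme; leave any other destination as-is."""
    prefix = "file://"
    if destination.startswith(prefix):
        return destination[len(prefix):]
    return destination
```

Usage would be `path = local_path_from_destination(task.get_output_destination())`.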
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the trains package version?
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
"warm" as you do not need to sync it with the dataset, every time you access the dataset, clearml
will make sure it is there in the cache; when you switch to a new dataset, the new dataset will be cached. Make sense?
DeterminedToad86
Yes I think this is the issue, on SageMaker a specific compiled version of torchvision was installed (probably part of the image)
Edit the Task (before enqueuing) and change the torchvision URL to:
torchvision==0.7.0
Let me know if it worked
Apparently the error comes when I try to access, from get_model_and_features, the pipeline component load_model. If it is not set as pipeline component and only as helper function (provided it is declared before the components that call it; I already understood that and fixed it, different from the code I sent above).
ShallowGoldfish8 so now I'm a bit confused, are you saying that now it works as expected ?
it would be nice to group experiments within projects
DilapidatedDucks58 you mean as in collapse/expand? Or something like a "sub-project"?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
The auto detection of clearml is based on the actual imported packages, not the requirements.txt of your entire python environment. This is why some of them are missing.
That said you can always manually add them
Task.add_requirements("hydra-colorlog")  # optionally add package_version="1.2.0"
task = Task.init(...)
(note that it must be called before Task.init)
Okay how do I reproduce it ?
Hi ItchyJellyfish73
The behavior should not have changed.
"force_repo_requirements_txt" was always a "catch all option" to set a behavior for an agent, but should generally be avoided
That said, I think there was an issue with v1.0 (clearml-server) where, when you cleared the "Installed Packages", it did not actually clear it, but set it to empty.
It sounds like the issue you are describing.
Could you upgrade the clearml-server and test?
This seems like the same discussion , no ?
Hi @<1655744373268156416:profile|StickyShrimp60>
My hydra OmegaConf configuration object is not always being picked up, and I am unable to consistently reproduce it.
... I am using clearml v1.14.4,
Hmm, how can we reproduce it? What are you seeing when it does "miss" the Hydra configuration, i.e. are you seeing any Hydra section? How are you running the code (manually, agent)?