Could you please add it, I really do not want to miss it 🙂
This is already part of the docker-compose file,
https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
I'll try to create a more classic image.
That is always better, though I remember we have some flag to allow that, you can try with:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent ...
Hi StickyBlackbird93
Yes, this agent version is rather old (clearml_agent v1.0.0).
It had a bug where the pytorch aarch wheel broke the agent (by default the agent in docker mode will use the latest stable version, but not in venv mode).
Basically, upgrading to the latest clearml-agent version should solve the issue:
pip3 install -U clearml-agent==1.2.3
BTW, for future debugging, this is the interesting part of the log (notice it is looking for the correct pytorch based on the auto de...
I think this is the main issue, is this reproducible ? How can we test that?
Hmm I just noticed:
'--rm', '', 'bash'
This is odd: an extra argument is passed as "empty text". How did that end up there? Could it be that you did not provide any docker image or default docker container?
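To illustrate the hypothesis above (this is only a sketch, not the actual clearml-agent code): if a command is assembled from a list and the image name is unset, the empty string ends up as its own argument, which is exactly what `'--rm', '', 'bash'` looks like. Both function names and the `python:3.10` image are made up for the example.

```python
from typing import List, Optional


def naive_docker_cmd(image: Optional[str]) -> List[str]:
    # An unset image silently becomes an empty argv entry
    return ["docker", "run", "-t", "--rm", image or "", "bash"]


def checked_docker_cmd(image: Optional[str]) -> List[str]:
    # Fail early instead of handing '' to docker
    if not image or not image.strip():
        raise ValueError("no docker image configured (task or agent default)")
    return ["docker", "run", "-t", "--rm", image, "bash"]


if __name__ == "__main__":
    print(naive_docker_cmd(None))
    print(checked_docker_cmd("python:3.10"))
```

So if you see the empty argument, the first thing to check is whether the Task or the agent's default container is actually set.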
Yea I know, I reported this
LOL, apologies, these days it's a miracle I still remember my login passwords 😉
JitteryCoyote63 Not sure how/why the X-Pack feature was on (it is not used by the system), but you can disable it with an environment variable in the docker-compose:
xpack.security.enabled=false
Should solve the problem ...
@<1720249421582569472:profile|NonchalantSeaanemone34>
dso = Dataset.create(
    dataset_project=project_name,
    dataset_name=dataset_name,
    parent_datasets=[parent_datasets_id],
)
dso = Dataset.get(
    dataset_project=project_name,
    dataset_name=dataset_name,
    only_completed=True,
    only_published=False,
    alias='latest',
)
why are you creating a dataset then getting a dataset on the same object?
it seems you are trying to upload...
Hi ShinyWhale52
This is just a suggestion, but this is what I would do:
1. Use clearml-data and create a dataset from the local CSV file:
clearml-data create ...
clearml-data sync --folder (where the csv file is)
2. Write Python code that takes the csv file from the dataset and creates a new dataset of the preprocessed data
` from clearml import Dataset
original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()
# process csv file -> generate a new csv
preproces...
SubstantialElk6
The ~<package name with first name dropped> == a.b.c is a known conda/pip temporary install issue. (Some left over from previous package install)
The easiest way is to find the site-packages folder and delete the package, or create a new virtual environment
BTW:
pip freeze will also list these broken packages
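If you want to locate the leftovers before deleting anything, here is a minimal sketch using only the standard library. It assumes the broken entries are folders whose name starts with `~` inside site-packages; the function name is made up for this example.

```python
import site
from pathlib import Path
from typing import List


def find_leftover_packages(site_packages: Path) -> List[Path]:
    """Return entries whose name starts with '~' (leftovers of an interrupted install)."""
    return sorted(p for p in site_packages.iterdir() if p.name.startswith("~"))


if __name__ == "__main__":
    # Only prints the candidates; review them before removing anything
    for folder in site.getsitepackages():
        for leftover in find_leftover_packages(Path(folder)):
            print(leftover)
```

Once you have reviewed the list you can remove the folders by hand (or with `shutil.rmtree`); creating a fresh virtual environment is still the safer route.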
We should probably add (set_task_type :))
I have a process that cleans the /tmp each day,
WackyRabbit7 the files (configuration etc.) that are mapped into the containers are stored there.
They should clean themselves. That said, we have noticed that services-mode skips this cleanup, and it will be solved in the next RC of clearml-agent.
Make sense ?
I was thinking such limitations will exist only for published
A published Task could not be "marked started", even with the force flag
Hi @<1578193378640662528:profile|MoodySeaurchin4>
but is it possible to log some metrics too, like rmse or the likes? If so, how would you do it?
Sure, I'm assuming this is part of the output? If not, it means this is part of your code, and in that case yes, you should use collect_custom_statistics_fn
`collect_custom_statistics_fn({'rmse'...
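A rough sketch of how that could look in a clearml-serving preprocess class. The method signatures follow the pattern used in the clearml-serving examples, but treat them as an assumption and check them against your version; the `x` and `y_true` request fields are hypothetical and only there so the example has something to compare against.

```python
from math import sqrt
from typing import Any, Callable, Optional


class Preprocess(object):
    """Sketch: report a custom 'rmse' statistic from the serving code."""

    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> Any:
        # Keep the (hypothetical) ground-truth values for the postprocess step
        state["y_true"] = body.get("y_true")
        return body["x"]

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> dict:
        y_pred = list(data)
        y_true = state.get("y_true")
        if collect_custom_statistics_fn and y_true:
            rmse = sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true))
            # The reported dict is what ends up as the custom metric
            collect_custom_statistics_fn({"rmse": rmse})
        return {"y": y_pred}
```

If the ground truth is not part of the request, you would report whatever value you can compute at serving time in the same way.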
But what I get with
get_local_copy()
is the following path: ...
get_local_copy() will return an immutable copy of the dataset; by definition this will not be the "source" storing the data.
(Also notice that the dataset itself is stored in zip files, and when you get the "local-copy" you get the extracted files)
Make sense ?
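If what you need is a folder you can actually edit, a minimal sketch would be to ask for a mutable copy instead. The dataset ID and target folder are placeholders, and the helper name is made up for this example.

```python
from pathlib import Path


def get_editable_copy(dataset_id: str, target_folder: str) -> Path:
    # Imported here so the sketch only needs clearml when it is actually called
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() gives a read-only cached extraction;
    # get_mutable_local_copy() copies the files into a folder you own
    return Path(dataset.get_mutable_local_copy(target_folder=target_folder))
```

Changes made in that folder stay local until you add them to a new dataset version.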
Hi SubstantialElk6
but in terms of data provenance, it's not clear how I can associate the data versions with the processes that created them.
I think DeliciousBluewhale87 ’s approach is what we are aiming for, but with code.
So using clearml-data from the CLI is basically storing/versioning of files (with differential-based storage etc., but still).
What you are after (I think) is, in your preprocessing code, using the programmatic Dataset class to create the Dataset from code, this a...
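Something along these lines, as a sketch only: the preprocessing step is a placeholder (it just drops empty lines), and the function names, project/dataset names and working folder are all made up for the example.

```python
from pathlib import Path


def preprocess_csv(src: Path, dst: Path) -> None:
    """Placeholder preprocessing: drop empty lines (stand-in for the real logic)."""
    lines = [ln for ln in src.read_text().splitlines() if ln.strip()]
    dst.write_text("\n".join(lines) + "\n")


def version_preprocessed(raw_dataset_id: str, project: str, name: str, workdir: Path) -> str:
    # Imported here so the sketch only needs clearml when it is actually called
    from clearml import Dataset

    raw_dir = Path(Dataset.get(dataset_id=raw_dataset_id).get_local_copy())
    workdir.mkdir(parents=True, exist_ok=True)
    for csv_file in raw_dir.glob("*.csv"):
        preprocess_csv(csv_file, workdir / csv_file.name)

    # parent_datasets links the new version to the raw one
    new_ds = Dataset.create(dataset_name=name, dataset_project=project,
                            parent_datasets=[raw_dataset_id])
    new_ds.add_files(path=str(workdir))
    new_ds.upload()
    new_ds.finalize()
    return new_ds.id
```

Because the dataset is created from the same code that did the preprocessing, the data version and the process that produced it are recorded together.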
Yes, which looks like a lot, but you only need to do that once.
Auto scheduler would make (1) redundant (as it would spin the instance up/down based on the jobs in the queue)
I figured out the problem...
Nice!
Unfortunately, the hyperparameters in configuration object seems to be superior to the hyperparameters in Hyperparameter section
Hmm, what do you mean by that? How did you construct the code itself? (You should be able to "prioritize" one over the other.)
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
Hi GrittyCormorant73
When I archive the pipeline and go into the archive and delete the pipeline, the artifacts are not deleted.
Which clearml-server version are you using? The artifact delete was only recently added
If this is the case, then you have to set a shared PV for the pods, this way they can actually have a persistent cache, which would also be shared.
BTW: a single function call might not be a perfect match for a pipeline component; the overhead of starting a node might not be negligible, as it needs to install the required python packages, bring the code, etc.
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don’t need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) everything seems to work
What am I missing? how do we recreate the issue ? can you verify it is still not working with the latest RC?
what format should I specify it
requirements.txt format e.g. ["package >= 1.2.3"]
Would this enforce that package on various components
This is a per-component control, so you can have different packages / containers based on the component
Would it then no longer capture import statements?
This is replacing the auto detected packages, but obviously this fails to detect your internal repo package, which is the main issue here.
How is "internal package" installed, in o...
If I edit the OmegaConf directly in the UI then the port changes correctly
This will only work if you change the Hydra/allow_omegaconf_edit to True in the UI. Did you?
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of generic "serving" solution?
FYI:
A model artifact is, usually, a weights/model file. The idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) there is usually a specific piece of code tied to that model that can use it (a.k.a. pyfunc)
A few ideas:
These days everyone is trying to build their models with generic interface, so that scik...
: For artifacts already registered, returns simply the entry and for artifacts not existing, contact server to retrieve them
This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy()
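In code, the difference looks roughly like this. It is a minimal sketch; the task ID and artifact name are placeholders and the helper name is made up for this example.

```python
def fetch_artifact_copy(task_id: str, artifact_name: str) -> str:
    # Imported here so the sketch only needs clearml when it is actually called
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # Looking up the entry only gives you the artifact's metadata
    artifact = task.artifacts[artifact_name]
    # The file itself is downloaded (and cached) at this point
    return artifact.get_local_copy()
```

Use `artifact.get()` instead if you want the deserialized object rather than a path to the downloaded file.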
You might only see it when the upload is done